
Running Head: TEST REVIEWS
Test Reviews
Albatool Abalkheel
Colorado State University
Test Reviews
For students, a certain level of proficiency in English is an essential requirement at most universities that use English as the main language of instruction, including Qassim University in Saudi Arabia. Many publishers and testing companies therefore offer tests that assess the English language abilities of adult learners. However, these published tests differ from one another in characteristics such as cost and structure, so it is important to evaluate them to identify which ones are appropriate for particular purposes and individuals.
The Test of English as a Foreign Language – Internet-Based Test (TOEFL iBT), the International English Language Testing System (IELTS), and the Michigan English Language Assessment Battery (MELAB) are all designed to assess English language abilities and are used to decide who is proficient enough in English to be admitted to an educational institution. These three tests are the focus of this paper because they all assess the four language skills (reading, writing, listening, and speaking), skills that are essential for academic success at universities that use English as the language of instruction.
This review compares these three tests, all of which are common in Saudi Arabia, to decide which is most appropriate for students who want to gain admission as English majors at Qassim University in Saudi Arabia. Several Saudi universities accept the results of any of these tests, but Qassim University wants to accept the results of only one or two standardized tests. It must then report to the Ministry of Higher Education in Saudi Arabia which tests it will accept, along with a justification of that decision. After the decision, the university will hold workshops to help students who will take these tests with strategies that may help them earn high scores.
TOEFL iBT
Table 1
Description of the TOEFL iBT™ Test

Publisher: Educational Testing Service (ETS), ETS Corporate Headquarters, 660 Rosedale Road, Princeton, NJ 08541, USA. Tel: 1-609-921-9000. Website: http://www.ets.org/toefl
Publication date: 2005
Target population: Nonnative-English-speaking students who want to study at institutions of higher education that use English as the language of instruction
Cost of the test: From $160 to $250

(Sawaki, Stricker, & Oranje, 2009)
Overview
As shown in Table 1, the TOEFL iBT was published in 2005 by the Educational Testing Service (ETS). It is accepted as proof of English proficiency by more than 9,000 institutions in
more than 130 countries, according to the test website. An extended description of the test is
provided below (see Table 2).
Table 2
Extended Description for TOEFL iBT
Test purpose: The TOEFL iBT can be used to decide who is proficient enough in English to be selected for entrance to an educational institution (www.ets.org). Another purpose of the TOEFL iBT is to measure the communicative language ability and language proficiency of people whose first language is not English (Jamieson, Jones, Kirsch, Mosenthal, & Taylor, 2000). TOEFL iBT scores are also used for admission to English-language learning programs and for certification candidates.
Test structure: The test has four parts: reading, listening, speaking, and writing. The reading section is designed to measure students' comprehension through finding main ideas, making inferences, understanding vocabulary in context, and understanding factual information. The topics of the texts vary and include hard-science topics.
In the listening section, test takers listen to conversations and lectures only once and may take notes while listening. This section is designed to measure test takers' ability to understand spoken English by recognizing a speaker's attitude, understanding the organization of given information, and making inferences about what has been said.
The speaking section is designed to assess test takers' ability to express an opinion on a familiar topic and to summarize reading or listening tasks. It deals with a variety of topics, such as campus situations.
The writing section is designed to measure test takers' ability to express their opinions on a given topic in writing, drawing on their knowledge and experience of that topic, and their ability to write a summary of a listening passage and relate it to a reading passage. Test takers type their answers on the computer.
The following table shows the tasks in each section:
Reading (60-100 minutes): 3-5 passages of about 700 words each, with 12-14 multiple-choice items per passage (36-56 items total).
Listening (60-90 minutes): 4-6 lectures with 6 multiple-choice items each and 2-3 conversations with 5 multiple-choice items each (34-51 items total).
Speaking (20 minutes): 6 tasks, 2 independent and 4 integrated. The independent tasks pose general questions on familiar topics; the first two integrated tasks use an oral and a written stimulus, and the last two use an oral stimulus only.
Writing (50 minutes): 2 tasks, 1 independent and 1 integrated. The independent task (20 minutes) is an essay responding to a general question (150-225 words); the integrated task (30 minutes) is a summary of a lecture and a reading (150-225 words).
(“TOEFL iBT™ Test Framework and Development,” n.d.; “TOEFL iBT® Test
Content,” n.d.).
Scoring of the test: The total score for this test is 120, and each section is scored on a scale of 0-30. Reading and listening tasks are scored by computer. Each of the six tasks in the speaking part is rated from 0 to 4 by ETS-certified raters based on four criteria: general description, delivery, language use, and topic development. The scores are then summed and converted to a scaled score of 0 to 30. The writing tasks are rated from 0 to 5; a human rater assesses the content and meaning of the writing tasks, while automated scoring techniques assess linguistic features. The score levels are distributed as follows:

Reading (0-30): High (22-30), Intermediate (15-21), Low (0-14)
Listening (0-30): High (22-30), Intermediate (15-21), Low (0-14)
Speaking (0-30): Good (26-30), Fair (18-25), Limited (10-17), Weak (0-9)
Writing (0-30): Good (24-30), Fair (17-23), Limited (1-16)
Total score: 0-120
(“TOEFL iBT™ Test Framework and Development,” n.d.; “TOEFL iBT® Test
Content,” n.d.).
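As an illustration of the speaking conversion described above, the short sketch below combines six 0-4 task ratings and rescales them linearly to the 0-30 section score. This is my own simplified illustration, not ETS's official conversion table.

def scale_speaking(ratings):
    """Sum six 0-4 task ratings and rescale linearly to the 0-30 section score
    (illustrative only; ETS applies its own conversion table)."""
    assert len(ratings) == 6 and all(0 <= r <= 4 for r in ratings)
    raw_total = sum(ratings)           # raw total falls between 0 and 24
    return round(raw_total / 24 * 30)  # rescale to the 0-30 reporting scale

print(scale_speaking([4, 3, 3, 4, 3, 3]))  # 25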
Statistical distribution of scores: ETS reported the following means and standard deviations for test takers who took the TOEFL iBT between January and December 2013 (Test and Score Data Summary for TOEFL iBT® Tests, 2013).

Reading: mean 20.01, SD 6.7
Listening: mean 19.7, SD 6.7
Speaking: mean 20.1, SD 4.6
Writing: mean 20.6, SD 5.0
Total: mean 81, SD 20
Standard error of measurement: The standard error of measurement (SEM) of the total score, calculated from the first year of operational data (2007), was 5.64. A closer look indicates that reading, listening, and writing have higher SEMs than speaking.

Reading (0-30): SEM 3.35
Listening (0-30): SEM 3.20
Speaking (0-30): SEM 1.62
Writing (0-30): SEM 2.76
Total (0-120): SEM 5.64

("Reliability and Comparability of TOEFL iBT™ Scores," 2011)

According to Lawrence (2011), the SEM indicates how close a test taker's observed score is likely to be to his or her true ability score. The larger the standard error, the wider the score band, which means the score is less reliable (Miller et al., 2008).
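To make the link between reliability and the SEM concrete, here is a minimal sketch of my own (not ETS's procedure) using the classical test theory relation SEM = SD * sqrt(1 - reliability). Because the SD comes from the 2013 score data and the reliability estimate from a separate analysis, the result only approximates the SEM values reported above.

import math

def sem(sd, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sem_value, k=1):
    """Band of plus or minus k SEMs around an observed score."""
    return observed - k * sem_value, observed + k * sem_value

# Reading section: SD = 6.7 (2013 data), reliability estimate = 0.85 (reported below)
reading_sem = sem(6.7, 0.85)        # roughly 2.6
print(score_band(22, reading_sem))  # roughly (19.4, 24.6)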
Evidence of reliability: The reliability estimation for the TOEFL iBT is based on two theories: item response theory (IRT) and generalizability theory (G-theory). IRT is used for the reading and listening parts, while G-theory is used for the speaking and writing parts. Two different methods are used because the test contains both constructed-response and selected-response tasks: G-theory measures score reliability for tasks in which the test taker generates answers, while IRT is used for tasks in which test takers select from a list of possible responses. A generalizability coefficient (G coefficient) is used as the index of score reliability in this framework. The TOEFL iBT has high reliability. The reliability estimates for the four parts and the total score are as follows:

Reading: 0.85
Listening: 0.85
Speaking: 0.88
Writing: 0.74
Total: 0.94

The reliability estimates for the reading, listening, speaking, and total scores are relatively high, while the reliability of the writing score is lower.
(“Reliability and Comparability of TOEFL iBT™ Scores,” 2011)
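For readers unfamiliar with G-theory, the following sketch (an illustration with invented ratings, not ETS's actual analysis) shows how a generalizability coefficient can be estimated for a simple, fully crossed persons-by-raters design of the kind used with rated speaking or writing tasks.

import numpy as np

def g_coefficient(scores):
    """Relative G coefficient for a fully crossed persons x raters design.

    scores: 2-D array, rows = persons, columns = raters.
    Variance components are estimated from the ANOVA mean squares.
    """
    scores = np.asarray(scores, dtype=float)
    n_p, n_r = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)

    ss_person = n_r * ((person_means - grand) ** 2).sum()
    ss_rater = n_p * ((rater_means - grand) ** 2).sum()
    ss_resid = ((scores - grand) ** 2).sum() - ss_person - ss_rater

    ms_person = ss_person / (n_p - 1)
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

    var_resid = ms_resid
    var_person = max((ms_person - ms_resid) / n_r, 0.0)

    # Relative decisions: the rater main effect does not enter the error term.
    return var_person / (var_person + var_resid / n_r)

# Hypothetical ratings for five test takers scored by two raters on a 0-4 scale
ratings = [[3, 3], [2, 3], [4, 4], [1, 2], [3, 4]]
print(round(g_coefficient(ratings), 2))  # 0.92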
Evidence of validity: The validation process for the TOEFL iBT started with "the conceptualization and design of the test" and continues today with an ongoing program of validation research ("Validity Evidence Supporting the Interpretation and Use of TOEFL iBT Scores," 2011). Different types of validity evidence have been collected for the TOEFL iBT. The official website of the test reports that a strong case for the validity of the proposed score interpretations and uses has been constructed and that researchers have conducted studies of "factor structure," "construct representation," "criterion-related and predictive validity," and "consequential validity."
According to Lawrence (2008), the TOEFL iBT is valid because the task design and scoring rubrics are appropriate for the purposes of the test, the test is related to academic language proficiency and linguistic knowledge, and the test structure is related to theoretical views of the relationships among English language skills (pp. 4-10).
IELTS Test
Table 3
Description of IELTS
Publisher: University of Cambridge ESOL Examinations, the British Council, and IDP: IELTS Australia. Subject Manager, University of Cambridge ESOL Examinations, 1 Hills Road, Cambridge CB1 2EU, United Kingdom; telephone 44-1223-553355; ielts@ucles.org.uk. Website: http://www.ielts.org/
Date of publication: 1989
Target population: Nonnative English speakers who want to study or work where the language of communication is English
Cost of the test: $185 through December 31, 2012; $190 from January 1, 2013
(“IELTS: Test Takers FAQs,” n.d.; “IELTS: The Proven Test with Ongoing Innovation,” n.d.)
Overview
As shown in Table 3, the test is published by the University of Cambridge ESOL Examinations, the British Council, and IDP: IELTS Australia, and it is a revised version of the test that was used in Australia, New Zealand, and the United Kingdom in the 1980s (Alderson & North, 1991). It has two modules: the Academic module and the General Training module. An extended description is provided in Table 4.
Table 4
Extended Description for IELTS
Test purpose: Both modules of the IELTS are used to assess the language ability of adults. The Academic module is for candidates who want to pursue graduate or undergraduate studies or "professional registration" in an English-speaking environment, while the General Training module is for candidates who want to migrate to or work in an English-speaking country or to study below degree level in an English-speaking environment (Choose IELTS, n.d.).
Test structure: Both modules have listening, reading, writing, and speaking parts; however, they differ in their reading and writing tasks. The listening section allows test takers 30 minutes to listen to "four recorded texts, monologues and conversations by a range of native speakers" and to answer 40 questions based on these recordings. The questions are of several kinds and assess different sub-skills.
Multiple-choice items assess "detailed understanding of specific points, or general understanding of the main points of the recording," while matching items measure test takers' ability to follow a conversation between two people and to recognize "how facts in the recording are connected to each other." Diagram-labeling questions are designed to assess test takers' ability "to understand, for example, a description of a place, and how this description relates to the visual" and their
"ability to understand explanations of where things are and follow directions." Sentence-completion tasks are designed to assess test takers' ability to "understand the important information in a recording." Short-answer questions are designed to assess test takers' ability to "listen for facts, such as places, prices or times, heard in the recording."
Regarding the reading section, both modules have three sections and allow test takers 60 minutes to read and answer questions. Each section in the Academic module has one text, and the texts "range from the descriptive and factual to the discursive and analytical." The first section in the General Training module has two or three short factual texts, the second has two short factual texts, and the third has one longer, more complex text.
Test takers are asked to complete a total of 40 questions, each of which assesses particular sub-skills. Multiple-choice tasks are designed to assess test takers' understanding of main and specific points. Identifying information (True/False/Not Given), identifying writers' views/claims, sentence completion, diagram labeling, and short-answer questions assess test takers' ability to recognize specific information given in the text, while matching headings assesses test takers' ability to scan a text to find specific information.
The writing section in the Academic module consists of two tasks that test takers have to complete in 60 minutes. In the first task, test takers are given a graph, table, chart, or diagram and must describe, summarize, or explain the information in 150 words. This task is designed to assess test takers' ability to organize, present, and compare data and to explain a process. The second task is an essay responding to a point of view, argument, or problem in 250 words. This task assesses their ability to "present a solution to a problem, present and justify an opinion, compare and contrast evidence, opinions and implications and evaluate and challenge ideas, evidence or an argument."
The writing section in the General Training module also consists of two tasks that test takers have to complete in 60 minutes. The first task is a letter responding to a given situation; the letter either explains the situation or asks for information (150 words). This task is designed to assess test takers' ability to "ask for and/or provide general factual information, express needs, wants, likes and dislikes [and] express opinions." The second task is a short essay responding to a point of view, argument, or problem (250 words). This task assesses the same skills as the second task in the Academic module.
The speaking section has three face-to-face oral interview tasks that take 11-14 minutes. The first task involves questions on familiar topics (4-5 minutes) and assesses test takers' ability to "give opinions and information on everyday topics and common experiences or situations by answering a range of questions." The second task asks test takers to speak about a particular topic based on a task card for 3-4 minutes and assesses their ability to speak at length on a given topic. The last task requires test takers to respond to further questions connected to the topic of the previous task (4-5 minutes); it assesses test takers' ability to "explain their opinions and to analyse, discuss and speculate about issues."
(“IELTS: Information for Candidates,” n.d.)
Scoring of the test: The reading and listening sections are scored by computer, while the writing and speaking sections are rated by IELTS raters who are trained and certified by Cambridge ESOL. The section scores are then averaged and rounded to produce a final score reported on a band scale ranging from 1 to 9, with a profile score for each of the four parts; 9 is the highest score. Each item in the listening and reading parts is
worth one point, and each part's raw score out of 40 is converted to the IELTS 9-band scale. Writing is assessed on task achievement, coherence and cohesion, lexical resource, and grammatical range and accuracy. Speaking is assessed on fluency and coherence, lexical resource, grammatical range and accuracy, and pronunciation ("IELTS: Information for Candidates," n.d.).
Sample conversions from raw scores to band scores are shown below, followed by a brief sketch of the conversion and averaging:
Listening (raw score out of 40 → band score): 16 → 5, 23 → 6, 30 → 7, 35 → 8
Academic Reading (raw score out of 40 → band score): 15 → 5, 23 → 6, 30 → 7, 35 → 8
General Training Reading (raw score out of 40 → band score): 15 → 4, 23 → 5, 30 → 6, 34 → 7
Speaking: no raw score; rated directly on the 0-9 band scale
Writing: no raw score; rated directly on the 0-9 band scale
Total score: overall band reported on the 9-band scale
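The sketch below is illustrative only: it hard-codes the sample listening cut points listed above (the official conversion varies by test version) and applies the rounding convention IELTS describes for the overall band, in which the average of the four section bands is rounded to the nearest half band.

import math

# Sample listening cut points from the table above: (minimum raw score, band)
LISTENING_CUTS = [(35, 8.0), (30, 7.0), (23, 6.0), (16, 5.0)]

def raw_to_band(raw, cuts):
    """Map a raw score (out of 40) to a band using descending cut points."""
    for minimum, band in cuts:
        if raw >= minimum:
            return band
    return 4.0  # placeholder for raw scores below the lowest listed cut point

def overall_band(listening, reading, writing, speaking):
    """Average the four section bands and round to the nearest half band."""
    mean = (listening + reading + writing + speaking) / 4
    return math.floor(mean * 2 + 0.5) / 2

print(raw_to_band(31, LISTENING_CUTS))    # 7.0
print(overall_band(7.0, 6.5, 6.0, 6.5))   # 6.5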
Statistical distribution of scores: The distribution of scores and the standard error of measurement for the listening and reading sections (2012) are reported. They are not reported in the same manner for writing and speaking because those sections are not item-based and "candidates' writing and speaking performances are rated by trained and standardized examiners according to detailed descriptive criteria and rating scales."

Listening: mean 6.0, SD 1.3
Academic Reading: mean 5.9, SD 1.0
General Training Reading: mean 6.1, SD 1.2
(IELTS: Test Performance, n.d.)
Standard error of measurement: The standard error of measurement for the listening and reading sections (2012) is reported as follows:

Listening: SEM 0.390
Academic Reading: SEM 0.316
General Training Reading: SEM 0.339

Each of these values amounts to less than half a band score.
(IELTS: Test Performance, n.d.)
Evidence of reliability: Cronbach's alpha is used to report the reliability of the listening and reading tests; it is "a reliability estimate which measures the internal consistency of the 40-item test." Average alphas across 16 listening versions, General Training reading versions, and Academic reading versions in 2012 are reported: the average alpha was 0.91 for listening, 0.92 for General Training reading, and 0.90 for Academic reading.
The reliability of rating in writing and speaking is "assured through the face-to-face training and certification of examiners and all must undergo a retraining and recertification process every two years," and many experimental generalizability studies have been carried out to examine the reliability of ratings. G-studies based on examiner certification show coefficients of 0.83-0.86 for speaking and 0.81-0.89 for writing ("IELTS: Test Performance," n.d.).
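To illustrate what the internal-consistency figures above mean, here is a small sketch (with invented item responses, not IELTS data) that computes Cronbach's alpha using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an examinees x items matrix of scores."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                          # number of items
    item_vars = x.var(axis=0, ddof=1)       # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical right/wrong (1/0) responses from six examinees to five items
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
print(round(cronbach_alpha(responses), 2))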
Evidence of validity: Ongoing research ensuring that the test functions as intended is reported. Research topics include the impact of IELTS on admission to higher education and on professional registration, "prediction of academic language performance," "stakeholder attitudes," and "test preparation" ("IELTS: Predictive validity," n.d.). The predictive validity of IELTS is the focus of many reported studies. Hill, Storch, and Lynch (1999) investigated the effectiveness of IELTS as a predictor of the academic success of international students (n = 66) and found a strong relationship between GPA and IELTS scores. Allwright and Banerjee (1997, as cited in Breeze & Miller, 2011) found a positive correlation between international students' English-medium academic success and their overall IELTS band scores.
MELAB
Table 5
Description of MELAB
Publisher: English Language Institute, Testing and Certification, MELAB Testing Program, University of Michigan, Ann Arbor, MI 48109-1057, USA; phone: +1 734 764 2416/763 3452; email: melabelium@umich.edu; website: http://www.cambridgemichigan.org/melab
Publication date: 1985
Target population: Adult nonnative speakers of English who want to study or work where the language of communication is English
Cost of the test: $60, plus $15 for the oral interview
Overview
The English Language Institute at the University of Michigan developed the Michigan English Language Assessment Battery (MELAB), a standardized test of English language proficiency that is administered one to three times per month at scheduled dates.
Table 6
Extended description for MELAB Test
Test purpose: The MELAB is designed to assess the advanced-level English language competence of adult nonnative speakers of English for admission purposes, to judge their fluency in English, and to assess professionals who need English for work, training, or employment purposes ("About the MELAB," n.d.; Weigle, 2000).
Test structure: The test includes three required parts, written composition; listening comprehension; and grammar, cloze, vocabulary, and reading comprehension (GCVR), plus one optional part, the speaking test. The writing part allows test takers 30 minutes to write an essay of 200-300 words; test takers choose one of two essay topics and write an opinion, a description, or an explanation of a problem. The listening part allows test takers 30-35 minutes to answer multiple-choice questions based on "questions, statements or short dialogs" and two to three extended listening texts.
The GCVR part allows test takers 80 minutes to answer 100 questions. These include 30-35 multiple-choice questions based on American conversational grammar, in which candidates must choose the grammatically correct answers; 20-25 gaps in two cloze passages, for which candidates choose the words that are correct in terms of grammar and meaning; and 30-35 multiple-choice questions in which candidates choose the best word or phrase, in terms of meaning, to complete a sentence. The remaining questions are multiple-choice comprehension items on 4-5 reading passages ("MELAB Success," n.d.).
The speaking part allows test takers 10 – 15 minutes. In a face-to-face
interview, candidates are asked about their backgrounds, future plans, and opinions
on certain issues (Johnson, 2005).
Scoring of the test: All test papers are sent to the English Language Institute, University of Michigan, for scoring, except the speaking section, which is scored locally (Weigle, 2000). The listening and GCVR parts are computer-scored; "each correct answer contributes proportionally to the final score for each section and there are no points deducted for wrong answers." A scaled score is calculated through a "mathematical model based on Item Response Theory" ("MELAB Scoring," n.d.).
For writing, two trained raters score the essay according to a "ten-step holistic scale." The scale descriptors focus on "topic development, organization, and range, accuracy and appropriateness of grammar and vocabulary" (Johnson, 2005, p. 4). This ten-point scale is then converted into intervals so that the section is on the same scale as listening and GCVR: "The writing scale is set at nearly equal intervals between 50 and 100 (53 and 97, to be exact)" (Weigle, 2000, p. 449).
For speaking, criteria of judgment include fluency and intelligibility,
grammar and vocabulary, interactional skills and functional language use or
sociolinguistic proficiency. A holistic scale from one to four is given (Johnson,
2005).
The final MELAB report includes a score for each part and the final MELAB score, "which is the average of the scores for the writing, listening, and GCVR sections" (Weigle, 2000, p. 450). The speaking test score is added to the report if the candidate has taken it. The score ranges for each MELAB section are listed below, followed by a short sketch of how the final score is computed:

Writing: 0-97
Listening: 0-100
GCVR: 0-100
Speaking: 1-4 (may include + or -)
Final MELAB: 0-99 (average of the writing, listening, and GCVR scores)

("MELAB Scoring," n.d.)
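A minimal sketch of how these reported scores fit together, assuming (for illustration only) that the ten writing steps are spaced evenly between 53 and 97 as Weigle (2000) describes, and using the stated rule that the final MELAB score is the average of the writing, listening, and GCVR section scores.

def writing_scaled(step):
    """Map a writing rating step (1-10) to the 53-97 reporting scale,
    assuming evenly spaced intervals (an illustrative assumption)."""
    assert 1 <= step <= 10
    return round(53 + (step - 1) * (97 - 53) / 9)

def final_melab(writing, listening, gcvr):
    """Final MELAB score: the average of the three section scores."""
    return round((writing + listening + gcvr) / 3)

print(writing_scaled(7))         # 82
print(final_melab(82, 78, 74))   # 78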
Statistical distribution of scores: The MELAB Report (2013) provides descriptive statistics for the writing, listening, and GCVR sections as well as for the final MELAB score.

Writing: minimum 0, maximum 97, median 75, mean 75.97, SD 8.63
Listening: minimum 31, maximum 98, median 78, mean 76.23, SD 13.28
GCVR: minimum 21, maximum 99, median 74, mean 72.22, SD 16.15
Final score: minimum 44, maximum 98, median 75, mean 74.77, SD 11.62

Distribution of MELAB speaking test scores (percentage of candidates):

1: 0.34, 1+: 0.57, 2-: 1.26, 2: 1.48, 2+: 4.68, 3-: 7.76, 3: 16.44, 3+: 20.32, 4-: 25.00, 4: 22.15
Standard error of measurement: Reliability and SEM estimates for the listening and GCVR parts are provided in the MELAB Report (2013).

January/February: Listening reliability 0.89, SEM 4.40; GCVR reliability 0.93, SEM 3.81
March/April: Listening reliability 0.89, SEM 4.46; GCVR reliability 0.94, SEM 4.03
May/June: Listening reliability 0.89, SEM 4.53; GCVR reliability 0.94, SEM 4.05
July/August: Listening reliability 0.89, SEM 4.47; GCVR reliability 0.95, SEM 3.85
September/October: Listening reliability 0.89, SEM 4.50; GCVR reliability 0.95, SEM 3.82
November/December: Listening reliability 0.87, SEM 4.95; GCVR reliability 0.93, SEM 4.04
Evidence of reliability: According to the MELAB Technical Manual (as cited in Russell, 2011), the MELAB has high reliability. The test/retest reliability coefficient for the MELAB is .91, and the alternate-form correlation between two compositions is .89, with a high level of inter-rater reliability for the writing part, ranging from .90 to .94. Garfinkel (2003) stated that support for the test's overall reliability is provided by "a moderately high and statistically significant correlation between MELAB scores and estimates made by teachers in one small sample" (p. 756, as cited in Russell, 2011).
Evidence of validity: The authors of the test provide content-related evidence of validity for each part of the test in the technical manual by "describing the nature of the skill that the test is intended to measure, the process of test development and a thorough description of the prompts and item types." They also provide construct-related evidence of validity through "a consideration of the item types on the MELAB in relation to a theory of communicative language ability," factor analysis, and native-speaker performance. In addition, they provide criterion-related evidence of validity by comparing MELAB scores with the MELAB composition and speaking test, with the TOEFL, and with teacher assessments of students' proficiency (Weigle, 2000, p. 451).
Comparison and Contrast of the Three Tests
The teaching context I will work in is Qassim University in Saudi Arabia. The students in this context are 10 teaching assistants whose ages range from 24 to 30 years. They study English in order to gain admission to universities in the United States. All are female Saudi students from different majors, including Special Education, Psychology, Educational Leadership, and Educational Technology, and all have scholarships to obtain their master's and Ph.D. degrees in the United States. This group of students also has similar levels of English language proficiency; all are considered high-intermediate to advanced learners of English. Because the program is designed by the university, classes are taught by instructors from the university's English department.
Considering this teaching context and the results of the reviews of each test, I believe that IELTS is the most appropriate test for assessing adult ESL learners in this program in Saudi Arabia, for several reasons. The IELTS and the TOEFL iBT are similar in terms of cost, the distribution of testing centers in Saudi Arabia, and organizational acceptance of scores. What makes IELTS unique among large-scale proficiency exams is that it assesses English as an international language (Uysal, 2010). Assessing students' communicative competence, which reflects their grammatical knowledge of syntax, morphology, and phonology as well as their social knowledge, is more important than assessing their use of standard English, because communicative competence is an essential element of speaking English internationally.
Test administration is another advantage of IELTS over the TOEFL iBT. The latter is an online test, while IELTS is paper-based. There is a chance of technical problems while taking the TOEFL, such as Internet disconnections, which are common in Saudi Arabia, and such problems may affect the result. In addition, test takers may become tired from focusing on the screen with only a short break after the first two sections; not everyone can work effectively on a computer for four hours. Moreover, a language learner may write well on paper but may not be familiar with typing English on a keyboard.
IELTS also has an advantage in the waiting time for results: an IELTS test taker waits only thirteen days for results, while a TOEFL test taker waits fifteen days. I therefore conclude that IELTS is a more appropriate test than the TOEFL iBT for determining the language proficiency of this group of learners in Saudi Arabia.
In contrast, the MELAB is the least helpful for these students because it is accepted by only 523 universities in the U.S. ("MELAB Recognizing Organizations," n.d.), whereas IELTS and TOEFL are accepted by over 2,000 universities and colleges in the U.S. In addition, the MELAB has limited availability outside the USA and Canada, where it is offered only as a "sponsored group test arranged by ELIUM," and the speaking section is not available outside the USA and Canada ("MELAB Recognizing Organizations," n.d.). The writing component also makes the MELAB less effective than IELTS: IELTS has two writing tasks, while the MELAB has a single task, which might not be representative of a test taker's actual writing ability.
References
Alderson, J. C., & North, B. (Eds.). (1991). Language testing in the 1990s. London:
Macmillan Publishing Ltd.
Breeze, R., & Miller, P. (2011). Report 5: Predictive validity of the IELTS listening test as an
indicator of student coping ability in Spain. Retrieved from
http://www.ielts.org/PDF/vol12_report_5.pdf.
British Council. (n.d.). Choose IELTS. Retrieved April 17, 2014, from
http://takeielts.britishcouncil.org/choose
Cambridge Michigan Language Assessments. (n.d.). About the MELAB. Retrieved April 20,
2014, from http://www.cambridgemichigan.org/melab
Cambridge Michigan Language Assessments. (n.d.). MELAB recognizing organizations.
Retrieved April 20, 2014, from
http://www.cambridgemichigan.org/sites/default/files/resources/Reports/MELAB-2013-Report.pdf
Cambridge Michigan Language Assessments. (2013). MELAB report. Retrieved April 20,
2014, from http://www.cambridgemichigan.org/sites/default/files/resources/Reports/
MELAB-2013-Report.pdf
Cambridge Michigan Language Assessments. (n.d.). MELAB scoring. Retrieved April 20,
2014, from http://www.cambridgemichigan.org/exams/melab/results
Educational Testing Service. (2011). Reliability and comparability of TOEFL iBT™ scores.
Retrieved April 17, 2014, from http://www.ets.org/s/toefl/pdf/toefl_ibt_research_s1v3.pdf
Educational Testing Service. (n.d.). TOEFL iBT® test content. Retrieved April 17, 2014, from
http://www.ets.org/toefl/ibt/about/content/
Educational Testing Service. (n.d.). TOEFL iBT™ test framework and test development (PDF).
Retrieved April 17, 2014, from
http://www.ets.org/s/toefl/pdf/toefl_ibt_research_insight.pdf
Educational Testing Service. (2011). Validity evidence supporting the interpretation and use
of TOEFL iBT™ scores. Retrieved April 17, 2014, from
http://www.ets.org/s/toefl/pdf/toefl_ibt_insight_s1v4.pdf
Hill, K., Storch, N., & Lynch, B. (1999). A comparison of IELTS and TOEFL as predictors of
academic success. IELTS Research Reports, 2, 52-63.
International English Language Testing System. (n.d.). Band descriptors, reporting and
interpretation. Retrieved April 17, 2014, from
http://www.ielts.org/researchers/score_processing_and_reporting.aspx
International English Language Testing System. (n.d.). IELTS™: Information for candidates.
Retrieved April 17, 2014, from
http://www.ielts.org/pdf/Information_for_Candidates_booklet.pdf
International English Language Testing System. (n.d.). Predictive validity.
Retrieved April 17, 2014, from
http://www.ielts.org/researchers/research/predictive_validity.aspx
International English Language Testing System. (n.d.). Test performance 2012. Retrieved April
17, 2014, from http://www.ielts.org/researchers/analysis_of_test_data/test_performance_
Educational Testing Service. (2013). Test and score data summary for TOEFL iBT® tests.
Retrieved September 16, 2014, from
http://www.ets.org/s/toefl/pdf/94227_unlweb.pdf
International English Language Testing System. (n.d.). Test takers FAQs. Retrieved April 17,
2014, from http://www.ielts.org/test_takers_information/test_takers_faqs.aspx
International English Language Testing System. (n.d.). The proven test with ongoing
innovation. Retrieved April 17, 2014, from
http://www.ielts.org/institutions/about_ielts/the_proven_test.aspx
Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., & Taylor, C. (2000). TOEFL™ framework: A
working paper. TOEFL Monograph Series Report, No.16. Princeton, NJ: Educational
Testing Service.
Johnson, J. (Ed.). (2005). Spaan Fellow working papers in second or foreign language
assessment. Retrieved April 17, 2014, from https://wwwprod.lsa.umich.edu/UMICH/eli/
Home/_Projects/Scholarships/Spaan/PDFs/Spaan_Papers_V3_2005.pdf
Lawrence, I. (2008). Validity evidence supporting the interpretation and use of TOEFL iBT™
Scores. TOEFL iBT Research Insight, 4, 1-16. Retrieved from
http://www.ets.org/s/toefl/pdf/toefl_ibt_insight_s1v4.pdf
Lawrence, I. (2011). Reliability and comparability of TOEFL iBT™ scores. TOEFL iBT
Research Insight, 3, 1-7. Retrieved from
http://www.ets.org/s/toefl/pdf/toefl_ibt_research_s1v3.pdf
MELAB Success. (n.d.). Retrieved April 20, 2014, from http://www.melabsuccess.com/
Russell, B. (2011). The Michigan English Language Assessment Battery (MELAB). Retrieved
April 17, 2014, from http://voices.yahoo.com/the-michigan-english-language-assessment-battery-melab-8793013.html?cat=9
Sawaki, Y., Stricker, L., & Oranje, A. (2009). Factor structure of the TOEFL Internet-Based
Test (iBT): Exploration in a field trial sample. Princeton, NJ: Educational Testing Service.
Uysal, H. H. (2010). A critical review of the IELTS writing test. ELT Journal, 64(3), 314-320.
Weigle, S. C. (2000). Test review: The Michigan English Language Assessment Battery
(MELAB). Language Testing, 17, 449-455.