Concurrent Validity of TOEIC and CSEPT: A Partial-range Case Study
Lecturer, College of General Education, Shu-Te University
劉冠麟
Abstract
The measurement literature identifies five conventional methods of linking the results of
different assessments: equating, calibrating, statistical moderation, benchmarking, and
social moderation (Angoff, 1971; North, 2000; Council of Europe, 2003). This study
examined the concurrent validity of TOEIC and CSEPT using a condensed two-step approach.
The first step compared the “can-do” lists of the two tests; the second was a statistical
analysis of 35 sets of scores from students who took both TOEIC and CSEPT within a span of
5 months. The Pearson product-moment correlation was computed to examine how the scores on
the two tests relate. A regression model was also produced to predict the relationship
within and beyond the range of the collected data. The results computed from the model were
compared with the scores currently stated by each test administrator in relation to the
CEFR. Suggestions and adjustments are proposed based on the findings.
Keywords: concurrent validity, TOEIC, CSEPT, proficiency test, language assessment.
1. Introduction
Various English proficiency tests exist to serve the needs of test takers from all walks of
life, each designed with its own features and target population. In Taiwan, policy holds that
citizens should possess adequate English proficiency in order to play their part as members
of the global village. English learners and educational authorities in Taiwan, where many
kinds of tests are available, therefore depend on scores from these validated tests to
determine whether someone has achieved a certain level of English. When different tests are
used for the same purpose, a rigorously established correlation among them is needed.
The comparison of scores across tests traces back to the development of standardized tests,
when different forms of the same test needed “equating” to achieve reliability
(Angoff, 1971). The idea was later extended to establishing the correlation among different
tests by computing correlation coefficients from the scores of one group of test takers who
sat each of the tests, i.e. “concurrent validity” (Bachman, 1990).
In Taiwan, several English proficiency tests are available. TOEFL (Test of English as a
Foreign Language) and TOEIC (Test of English for International Communication) were
introduced to Taiwan by ETS (Educational Testing Service) for university admission and
business workforce training. IELTS (International English Language Testing System) was later
introduced for admission to Commonwealth universities. KET (Key English Test), part of the
Cambridge main suite, has also been adopted by many preschool and children’s English
educators. In addition to these tests, the GEPT (General English Proficiency Test),
developed locally by the LTTC (Language Training and Testing Center, affiliated to National
Taiwan University), has been widely recognized as a valid English proficiency test by
Taiwanese governmental and educational institutions. Many collegiate institutions in Taiwan
use CSEPT (College Student English Proficiency Test) to assess students’ English
proficiency. Because these tests are developed by different administrators and vary in
design and structure, it is difficult to draw clear-cut equivalences, and it would be rash to
state exact corresponding scores between any two tests without robust validation.
As for studies correlating different tests, the author has so far located only the series of
comparison studies conducted by the University of Cambridge Local Examinations Syndicate
(UCLES) between UCLES and ETS tests, such as the comparison of FCE and TOEFL (Bachman,
Davidson, Ryan, and Choi, 1995). Among the six tests mentioned, the pairs TOEFL and TOEIC,
IELTS and KET, and GEPT and CSEPT each come from the same test developer (ETS, LTTC) or
from the same testing system (Cambridge ESOL, English for Speakers of Other Languages,
Examinations). Correlations within these pairs would attract fewer disputes because both
tests share an administrator. This study examines two tests from different administrators:
TOEIC of ETS and CSEPT of LTTC. Both are used and recognized in Taiwanese universities as
valid evidence of students’ proficiency in English. The correlation between the two could
serve as a reference for students who hold a score on one test and need a quick estimate of
their likely score on the other.
2. Literature review
Before proceeding to the discussion of the correlation of the two tests, some concepts
about correlating different test results should be made clear.
2.1 Concurrent validity
Tests in use today are validated under two theories: Classical Test Theory (CTT) and Item
Response Theory (IRT). The former is the conventional approach and has been likened to a
“hand tool” (Davidson, 2000) that examines the effectiveness of test items by calculating
Item Facility and the Discrimination Index. The latter works as a “power tool” that takes
probability, test-taker ability, and item difficulty into consideration.
Concurrent validity is a concept to examine whether two measures, tests, or scales are
correlated. Davies (1990) proposed that concurrent validity is achieved by having the test and
the criterion administered at the same time. This concept is usually demonstrated by using a
previously validated test to establish a second validated test. For example, this concept is used
by psychologists to establish the validity of a newly formed test on intelligence. It is also used
by researchers of social science to validate cross-lingual research instruments.
Hughes (2003) stated that there are essentially two kinds of criterion-related validity:
concurrent validity and predictive validity. Concurrent validity is established when the test
and the criterion are administered at about the same time. Nall (2003) likewise explained
that concurrent and predictive validity are both forms of criterion-related validity.
Concurrent validity is determined by comparing results from one test with those of another
instrument that is assumed to test “the same thing”, that is, to refer to the same language
construct. This is typically accomplished by examining the correlation between the two sets
of results and looking for a high positive correlation coefficient. ETS’s own TOEIC
documentation, in a section labeled “Construct-related validity”, provides a definition of
validity that does not differ from our definition of concurrent validity. It would seem that
ETS accepts concurrent validity as sufficient proof of construct validity. Therefore, we can
use the TOEIC and CSEPT scores of the same subjects to establish the concurrent validity of
the two tests. If the scores are highly correlated, we can say that the two tests have high
concurrent validity.
2.2 Correlation, comparability, and Equivalent scores
2.2.1 CEFR descriptors
The Council of Europe (2001) developed the Common European Framework of Reference (CEFR),
which describes how language users at different proficiency levels vary in performance. The
reference levels classify users as proficient, independent, or basic, with two subcategories
in each class. The language performance that reflects proficiency is set out in descriptors
for each level (see table 1). Language proficiency tests thus base their test items on these
“can-do” descriptors in order to share a common reference for language proficiency. The
reference levels are recognized within European countries, and more and more test
administrators and educational authorities outside Europe are adopting the reference table
for testing or educational purposes. In Taiwan, it has become a reference tool for
correlating, comparing, and calculating equivalent scores across different language
proficiency tests.
Table 1: Global scale of the Common Reference Level
Proficient user
C2: Can understand with ease virtually everything heard or read. Can summarize information
from different spoken and written sources, reconstructing arguments and accounts in a
coherent presentation. Can express him/herself spontaneously, very fluently and precisely,
differentiating finer shades of meaning even in more complex situations.
C1: Can understand a wide range of demanding, longer texts, and recognize implicit meaning.
Can express him/herself fluently and spontaneously without much obvious searching for
expressions. Can use language flexibly and effectively for social, academic and professional
purposes. Can produce clear, well-structured, detailed text on complex subjects, showing
controlled use of organizational patterns, connectors and cohesive devices.
Independent user
B2: Can understand the main ideas of complex text on both concrete and abstract topics,
including technical discussions in his/her field of specialization. Can interact with a
degree of fluency and spontaneity that makes regular interaction with native speakers quite
possible without strain for either party. Can produce clear, detailed text on a wide range of
subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages
of various options.
B1: Can understand the main points of clear standard input on familiar matters regularly
encountered in work, school, leisure, etc. Can deal with most situations likely to arise
whilst traveling in an area where the language is spoken. Can produce simple connected text
on topics which are familiar or of personal interest. Can describe experiences and events,
dreams, hopes and ambitions and briefly give reasons and explanations for opinions and plans.
Basic user
A2: Can understand sentences and frequently used expressions related to areas of most
immediate relevance (e.g. very basic personal and family information, shopping, local
geography, employment). Can communicate in simple and routine tasks requiring a simple and
direct exchange of information on familiar and routine matters. Can describe in simple terms
aspects of his/her background, immediate environment and matters in areas of immediate need.
A1: Can understand and use familiar everyday expressions and very basic phrases aimed at the
satisfaction of needs of a concrete type. Can introduce him/herself and others and can ask
and answer questions about personal details such as where he/she lives, people he/she knows
and things he/she has. Can interact in a simple way provided the other person talks slowly
and clearly and is prepared to help.
Source: Council of Europe
2.2.2 CEF correlation process
North (2000) and the Council of Europe (2001) stated in papers that:
The scales for the Common References Levels are intended to facilitate the description
of the level of proficiency attained in existing qualifications- and so aid comparison
between systems. The measurement literature recognizes five classic ways of linking
separate assessments: (1) equating; (2) calibrating; (3) statistical moderation; (4)
benchmarking, and (5) social moderation.
This corresponds to the measurement literature, which identifies equating and calibrating
two forms of the same test as the initial development in test score comparison
(Angoff, 1971). North continues:
The first three methods are traditional: (1) producing alternative versions of the same
test (equating), (2) linking the results from different tests to a common scale
(calibrating), and (3) correcting for the difficulty of test papers or the severity of
examiners (statistical moderation).
The last two methods involve building up a common understanding through discussion
(social moderation) and the comparison of work samples in relation to standardized
definitions and examples (benchmarking). Supporting this process of building a common
understanding is one of the aims of the Framework. This is the reason why the scales of
descriptors to be used for this purpose have been standardized with a rigorous
development methodology. In education this approach is increasingly described as
standards-oriented assessment. It is generally acknowledged that the development of a
standards-oriented approach takes time, as partners acquire a feel for the meaning of
the standards through the process of exemplification and exchange of opinions.
The use of expert judgment in social moderation and benchmarking can also be traced back to
early developments in educational measurement. Thorndike (1982) proposed a test-equating
method that uses experts’ judgments to estimate item difficulties and to equate tests, which
Rogosa (1982) described as novel and potentially important. The expert judgment method has
indeed proved important, as the developments of recent years and the issues discussed in
this paper show. Furthermore, Kolen and Brennan (2004) proposed the term “linking” for
putting scores from two or more tests on the same scale. If the tests conform to the same
content and statistical specifications, the resulting linking can be called an equating.
2.3 Tests in consideration
2.3.1 TOEIC
The TOEIC official can-do guide (ETS, 2004) introduces TOEIC (Test of English for
International Communication) as:
TOEIC measures the listening and reading comprehension skills of non-native speakers
of English. The TOEIC test is designed for use by organizations working in an
international market where English is the primary language of communication. These
organizations use TOEIC scores to make employment decisions about selection,
assignment to overseas posts, promotion, training needs, and training effectiveness.
TOEIC test consists of 200 multiple-choice questions; 100 listening comprehension
questions, and 100 reading comprehension questions. The listening comprehension
section is administered by audiotape; the reading comprehension section is administered
using a standard paper-and-pencil format. The answers from both sections are recorded
on a scannable answer sheet. Examinees receive two subscores, one each for listening
comprehension and reading comprehension, along with a total score (listening
comprehension plus reading comprehension). Each standardized subscore ranges from 5
to 495, with a total score range of 10 to 990.
2.3.2 CSEPT
CSEPT (College Student English Proficiency Test) is developed by the LTTC (Language
Training and Testing Center), affiliated to National Taiwan University. It is offered only
to college students in Taiwan and is not open to the general public, unlike the LTTC’s other
test, the GEPT (General English Proficiency Test). It is administered only at the request of
individual universities, on dates arranged jointly by the LTTC and the university
administration, to assess students’ English proficiency for curriculum or
teaching-effectiveness purposes.
The test comprises two sections, listening and reading. There are 25 multiple-choice
questions in the listening section and 60 multiple-choice questions in the reading section.
Test takers mark their responses on an optically scannable sheet for computer scoring. Score
reports give separate listening and reading scores, each with a maximum of 120, for a
maximum total of 240. More information about CSEPT can be found at
http://www.lttc.ntu.edu.tw.
3. Methodology
The five methods described in the literature (North, 2000) are regarded as the “standard” or
“legitimate” ways to correlate two tests or to establish the relationship between them.
However, they are best suited to large-scale projects and to the test administrators
themselves. This paper limits the process to two techniques. The first compares the
available can-do descriptions published by the TOEIC and CSEPT administrators. The second is
a statistical analysis that calculates the Pearson correlation between the collected scores.
A regression model is also computed to predict corresponding scores at higher levels of
performance.
3.1 Data collection
Data for this study come from two sources. The TOEIC can-do descriptors are from studies
published by ETS (www.ets.org). The CSEPT can-do list is extracted and translated from LTTC
score reports, since no other can-do descriptors for CSEPT could be located.
The statistical data were collected at a technological university in southern Taiwan that
upholds an “exit threshold for English proficiency”. In accordance with Ministry of Education
policy, students have to achieve “CEF A2 English proficiency” before graduation; after two
official test attempts, they have the option of taking a makeup course instead, for which
they must submit their score reports in order to enroll. A total of 35 sets of scores were
included in the analysis. Because the scores were submitted for this purpose, all of them
are below CSEPT 170 and TOEIC 400 (the equivalence currently adopted by the university).
Both tests report a listening score, a reading score, and a total. The paired scores can be
examined through Pearson correlation, and a regression model can be computed to predict
corresponding scores on the two tests at higher levels of performance.
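The statistical step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure that produced the figures reported in this paper: the function names and the sample scores are hypothetical. It computes the Pearson product-moment correlation for paired scores and the t statistic commonly used to test whether r differs from zero, t = r * sqrt(n - 2) / sqrt(1 - r^2).

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical paired totals (CSEPT, TOEIC), for illustration only.
csept = [120, 135, 150, 142, 160]
toeic = [230, 260, 255, 270, 300]
r = pearson_r(csept, toeic)
print(round(r, 3), round(t_statistic(r, len(csept)), 3))
```

For the total-score correlation reported in this study (r = .393, N = 35), t_statistic gives about 2.455, in line with the t value of 2.454 listed for CTOTAL in table 10.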
4. Findings and discussion
Two steps of establishing the correlation of TOEIC and CSEPT are described as follows.
4.1 Can-do list comparison
Detailed descriptors are stated for listening, speaking, reading, writing, and interacting
skills. In the TOEIC can-do list, listening and reading scores are grouped into bands of
5-100, 105-225, 230-350, 355-425, and 430-495, while the CSEPT divides listening and reading
performance into bands of below 29, 30-49, 50-69, 70-89, 90-109, and 110 and higher. Here
the 105-225 band of TOEIC and the 70-89 band of CSEPT are considered.
The TOEIC can-do list is the more detailed, with “can do”, “can do with difficulty”, and
“cannot do” categories for its “reading”, “listening”, and “interactive” descriptions. For
the 105-225 band in both reading and listening there are no clear “can do” descriptions,
only “can do with difficulty” descriptions. We would therefore say that language users have
reached this level of ability if they can perform easier, adjusted versions of the tasks
described in the TOEIC can-do list (see tables 2 and 3).
Table 2: TOEIC Reading Score of 105 - 225
Categories: Can Do; Can Do with Difficulty; Cannot Do
Reading
• read, on storefronts, the type of store or services provided (e.g., “dry cleaning,” “book store”)
• read and understand a restaurant menu
• read and understand a train or bus schedule
• find information that I need in a telephone directory
• read office memoranda written to me in which the writer has used simple words or sentences
• read and understand traffic signs
• read and understand simple, step-by-step instructions
• read and understand a travel brochure
• read and understand directions and explanations presented in computer manuals written for beginning users
• read and understand a letter of thanks from a client or customer
• read and understand an agenda for a meeting
• read and understand magazine articles like those found in Time or Newsweek, without using a dictionary
• read highly technical material in my field or area of expertise with no use or only infrequent use of a dictionary
• identify inconsistencies or differences in points of view in two newspaper interviews with politicians of opposing parties
• read and understand a popular novel
Writing
• write a list for items to take on a weekend trip
• write a one- or two-sentence thank-you note for a gift a friend sent to me
• write a brief note to a co-worker explaining why I will not be able to attend the scheduled meeting
• write a postcard to a friend describing what I have been doing on my vacation
• write clear directions on how to get to my house or apartment
• fill out an application form for a class at night school
• write a letter requesting information about hotel accommodations for a future vacation
• write a short note to a co-worker describing how to operate a standard piece of office equipment (e.g., photocopier, fax machine)
• write a memorandum to my supervisor explaining why I need a new time off from work
• write a letter introducing myself and describing my qualifications to accompany an employment application
• write a memorandum to my supervisor describing the progress being made on a current project or assignment
• write a complaint to a store manager about my dissatisfaction with an appliance I recently purchased
• write a letter to a potential client describing the services and/or products of my company
• write a 5-page formal report on a project in which I participated
• write a memorandum summarizing the main points of a meeting I recently attended
The descriptions in the listening part show that the tasks in this band are quite
fundamental and belong to daily-life contexts: for example, understanding “How are you?”,
understanding prices quoted when shopping, and ordering food in restaurants.
Table 3: TOEIC Listening Score of 105 - 225
Categories: Can Do; Can Do with Difficulty; Cannot Do
Listening
• understand simple questions in social situations such as “How are you?” “Where do you live?” and “How do you feel?”
• understand a salesperson when she or he tells me prices of various items
• understand someone speaking slowly and deliberately, who is giving me directions on how to walk to a nearby location
• understand a person’s name when she or he gives it to me over the telephone
• understand directions about what time to come to a meeting and the room in which it will be held
• understand explanations about how to perform a routine task related to my job
• understand a co-worker discussing a simple problem that arose at work
• understand announcements at a railway station indicating the track my train is on and the time it is scheduled to leave
• understand headline news broadcasts on the radio
• understand a client’s request made on the telephone for one of my company’s major products or services
• understand play-by-play descriptions on the radio of sports events that I like (e.g., soccer, baseball)
• understand an explanation given over the radio of why a road has been temporarily closed
• understand someone who is speaking slowly and deliberately about his or her hobbies, interests, and plans for the weekend
• understand a discussion of current events taking place among a group of persons speaking English
• understand an explanation of why one restaurant is better than another
Speaking
• introduce myself in social situations and use appropriate greeting and leave-taking expressions
• state simple biographical information about myself (e.g., place of birth, composition of family)
• order food at a restaurant
• describe my daily routine (e.g., when I get up, what time I eat lunch)
• describe the plot of a movie or television program that I have seen
• describe a friend in detail, including physical and personality characteristics
• describe my academic training or my present job responsibilities in detail
• talk about topics of general interest (e.g., current events, the weather)
• talk about my future professional goals and intentions (e.g., what I plan to be doing next year)
• tell a co-worker how to perform a routine job task
• telephone the airline to change my flight reservations to a different time and day
• tell a colleague at work about a humorous event that recently happened to me
• adjust my speaking to address a variety of listeners (e.g., professional staff, a friend, children)
• tell someone directions on how to get to my house or apartment
• give a prepared half-hour formal presentation on a topic of interest
However, the interactive descriptions in this 105-225 band seem more difficult than the
corresponding listening and reading descriptions: for example, “explaining policies”,
“discussing the best way to do a task”, and “bargaining prices with a real-estate agent”
(see table 4). These descriptions seem too difficult for English learners at this beginning
level to perform, if the band is taken to correspond to around CEF A2. If the interactive
descriptions involved only listening comprehension, the author would find it reasonable to
place them at this level. Even so, some of these tasks would be too complicated for English
learners at this level.
Table 4: TOEIC Interactive Score of 105 - 225
Interacting (Can Do)
• explain written company policies to a new employee
• discuss with a co-worker the best way to accomplish a job task
• meet with a doctor and explain the physical symptoms of my illness
• meet with a real-estate agent to discuss the type of house I would like to buy
• discuss world events with an English-speaking guest
• discuss with my boss ways to improve customer service or product quality
• conduct an interview with an applicant for a job in my area of expertise
• conduct simple business transactions at places such as the post office, bank, drugstore
• telephone a restaurant to make dinner reservations for a party of three
• give and take messages over the telephone
• discuss with an electronics salesperson the features I want on a new videocassette recorder (VCR)
• explain to a repairman what is wrong with an appliance that I want fixed
• request information over the telephone (e.g., check airline schedules with a travel agent)
• talk to an elementary school class about what I do for a living
• telephone a department store and find out if a certain item is currently in stock
The CSEPT can-do descriptions are shorter than those in the TOEIC can-do list. The gist of
the can-do list for the 70-89 band likewise centers on tasks in daily-life contexts.
Comparing the content of the TOEIC and CSEPT can-do lists, we find that most descriptions in
the listening and reading parts match, but those in the interactive part do not (see
table 5).
Table 5: CSEPT can-do statements (score band 70 ~ 89)
Listening
• Understand basic conversations in school and daily-life contexts.
• Be able to comprehend the main ideas and some of the important details of speech, without speakers slowing their delivery.
• However, repetition and/or explanation is still needed.
(能聽懂與學校及日常生活相關之基礎會話,說話者通常無須放慢速度,但
仍須重覆或解釋;能掌握大意及部分之重要細部資訊。)
Composite
• Be able to understand basic sentence structures, with occasional errors in using simple sentences.
• Be able to read texts with a vocabulary of about 3,000 to 4,000 words and related phrases.
• Be able to use word structure and context to predict word and sentence meaning.
• Cannot handle complex sentences.
(能瞭解基本句子之語法結構,使用簡單句時偶有錯誤;對複雜句之掌握仍
有困難。能閱讀約 3,000~4,000 字彙及相關片語之讀物,能大致利用字詞結
構及上下文推測字詞意義或句子內容。)
Note: The English can-do statements were translated by the author; the Chinese originals are
given in brackets.
Examining the descriptors in the TOEIC can-do list against the CSEPT can-do list again, we
find that the CSEPT descriptors match the TOEIC descriptors in this band. Taking the
listening and reading sections together, we can therefore say that TOEIC total scores of 210
to 450 correspond to CSEPT total scores of about 140 to 180.
4.2 Statistical data analysis
Table 6 indicates that the listening scores for TOEIC and CSEPT are not significantly
correlated. The Pearson product-moment correlation between the TOEIC and CSEPT listening
scores is .060 with p = .733 (> .05), so no relationship between the listening scores can be
claimed. This might be the result of having too few samples in this study: with so small a
sample, the fluctuation in scores cannot be averaged out.
Table 6: CSEPT and TOEIC Listening (N=35)
                                      CSEPT-LISTEN   TOEIC-LISTEN
CSEPT-LISTEN   Pearson Correlation    1.000          .060
               Sig. (2-tailed)        .              .733
Table 7 indicated that reading scores for TOEIC and CSEPT are correlated, with a
Pearson correlation of 0.342 (P=.045).
Table 7: CSEPT and TOEIC reading (N=35)
                                    CSEPT-READ   TOEIC-READ
CSEPT-READ   Pearson Correlation    1.000        .342 *
             Sig. (2-tailed)        .            .045
* Correlation is significant at the 0.05 level (2-tailed).
Table 8 indicates that the total scores for TOEIC and CSEPT are correlated, with a
Pearson correlation of .393 (p = .020). The total-score correlation is a stronger indication
that TOEIC and CSEPT scores are related, and it suggests that a score on either test can be
used to predict the likely score on the other.
Table 8: TOEIC and CSEPT correlation (N=35)
                                     CSEPT-TOTAL   TOEIC-TOTAL
CSEPT-TOTAL   Pearson Correlation    1.000         .393 *
              Sig. (2-tailed)        .             .020
* Correlation is significant at the 0.05 level (2-tailed).
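The regression analysis is consistent with the Pearson correlation test. At the .05
significance level, the critical F value is about 4.14 for 1 and 33 degrees of freedom (4.12
if the denominator degrees of freedom are taken as 35). The regression analysis reported an
F value of 6.021, which exceeds the critical value, so the model is statistically
significant (see table 9).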
Table 9: ANOVA analysis
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   7721.032         1    7721.032      6.021   .020
Residual     42318.968        33   1282.393
Total        50040.000        34
a. Predictors: (Constant), CTOTAL
b. Dependent Variable: TTOTAL
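The entries of table 9 can be cross-checked against each other with a few lines of Python. This is only a consistency sketch using the reported sums of squares and degrees of freedom; the variable names are chosen here for illustration. With a single predictor, the F value is the ratio of the two mean squares, and the square root of the explained share of variance equals the absolute value of the Pearson correlation.

```python
import math

# Values reported in Table 9 (TOEIC total regressed on CSEPT total).
ss_regression, df_regression = 7721.032, 1
ss_residual, df_residual = 42318.968, 33

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total   # share of variance explained
r = math.sqrt(r_squared)               # equals |Pearson r| with one predictor

print(round(f_value, 3), round(r_squared, 3), round(r, 3))
```

The recomputed values are F = 6.021, R squared = 0.154, and r = 0.393, which agree with tables 8 and 9; in other words, the CSEPT total accounts for roughly 15% of the variance in the TOEIC total in this sample.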
Table 10: Coefficients of TOEIC and CSEPT
Model 1      Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)   177.694            24.521                           7.247   .000
CTOTAL       .563               .229         .393                2.454   .020
a. Dependent Variable: TTOTAL
Since the model is statistically significant, we can use table 10 to arrive at a formula for
predicting scores at higher levels of performance. Table 10 gives a constant of 177.694 and
an unstandardized coefficient B = 0.563. The model is as follows:
Y = 177.694 + 0.563X
(where Y = TOEIC total score, X = CSEPT total score)
The currently adopted equivalences are CSEPT 170 = CEF A2 and TOEIC 350 = CEF A2. According
to these, CSEPT 170 should be equal to TOEIC 350. However, the regression model based on the
collected data gives a different result: when CSEPT is 170, the predicted TOEIC score is
273.404.
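The prediction can be reproduced with a short Python sketch. The function name predict_toeic is hypothetical; the intercept and slope are the values reported in table 10. Because the model was fitted only to scores below CSEPT 170 and TOEIC 400, any input above that range is an extrapolation.

```python
def predict_toeic(csept_total, intercept=177.694, slope=0.563):
    """Predicted TOEIC total from a CSEPT total, using the first regression model."""
    return intercept + slope * csept_total

# CSEPT 130 and 170 are the two reference points discussed in this paper.
for csept_total in (130, 170):
    print(csept_total, round(predict_toeic(csept_total), 3))
```

For CSEPT 170 this gives 273.404, and for CSEPT 130 it gives 250.884.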
However, the small sample may also have produced a restricted-range problem. The correlation
holds only within the data range, i.e. below TOEIC 400 and CSEPT 170. To examine the
relationship above this range, we would need data from the upper range, which were not
available when the study was done. We can, however, use hypothetical data to see how the
relationship might look. This study adds the same number of dummy samples with full marks on
both tests, i.e. 35 sets of TOEIC 990 and CSEPT 240. The resulting analysis is shown in
table 11.
Table 11: ANOVA analysis
Model 1      Sum of Squares   df   Mean Square   F          Sig.
Regression   328914.980       1    328914.980    1062.618   .000
Residual     21048.220        68   309.533
Total        349963.200       69
a. Predictors: (Constant), TOEIC
b. Dependent Variable: CSEPT
The analysis indicates that the relationship is significant: the F value of 1062.618 is far
above the critical value, so the regression model for the augmented data is statistically
significant.
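How strongly the added full-mark pairs drive these statistics can be seen in a small Python sketch. The low-range scores below are made up for illustration and are not the study's data; only the augmentation step (appending as many CSEPT 240 / TOEIC 990 pairs as there are real pairs) mirrors the procedure described above.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up low-range pairs (CSEPT total, TOEIC total); not the study's data.
csept = [100, 110, 120, 130, 140, 150, 160]
toeic = [300, 250, 350, 280, 330, 260, 340]
r_before = pearson_r(csept, toeic)

# Append the same number of full-mark pairs (CSEPT 240, TOEIC 990).
n = len(csept)
r_after = pearson_r(csept + [240] * n, toeic + [990] * n)

print(round(r_before, 2), round(r_after, 2))
```

With these made-up numbers r rises from about 0.23 to about 0.97. The sketch shows only the arithmetic effect of the augmentation; it says nothing about the actual relationship in the upper range, which can be established only with real scores.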
Table 12: Coefficients of TOEIC and CSEPT
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   60.621             4.007                            15.130   .000
TOEIC        .181               .006         .969                32.598   .000
a. Dependent Variable: CSEPT
According to the data in table 12, the regression model for the augmented data is as follows:
Y = 60.621 + 0.181X
(where Y = CSEPT total score, X = TOEIC total score)
The second model offers a very different picture of the relationship. If we take the
LTTC-stated CSEPT 170 as CEF A2, the model gives a TOEIC score of about 604 for CSEPT 170.
Moving down to CEF A1 at CSEPT 130, it gives a TOEIC score of about 383. These figures, in
contrast to those from the first model, are higher than the TOEIC scores currently stated as
corresponding to CSEPT. Again, more data are needed for further examination.
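Since the second model predicts CSEPT from TOEIC, the TOEIC figures quoted here come from solving the equation for X. A minimal Python sketch follows; the function name toeic_from_csept is hypothetical, and the coefficients are those reported in table 12.

```python
def toeic_from_csept(csept_total, intercept=60.621, slope=0.181):
    """Solve CSEPT = intercept + slope * TOEIC for TOEIC (second model)."""
    return (csept_total - intercept) / slope

# CSEPT 130 and 170 are the two reference points discussed in this paper.
for csept_total in (130, 170):
    print(csept_total, round(toeic_from_csept(csept_total), 1))
```

This yields about 604.3 for CSEPT 170 and about 383.3 for CSEPT 130. Note that inverting a regression of CSEPT on TOEIC is not the same as regressing TOEIC on CSEPT; a model fitted in the other direction on the same data would generally give somewhat different values.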
5. Conclusion
This paper investigated the concurrent validity of TOEIC and CSEPT using two condensed
validation measures. First, the “can-do” list comparison indicated that TOEIC scores of 210
to 450 correspond to CSEPT scores of 140 to 180. Second, the statistical analysis of the
collected data showed the following results. The listening scores of TOEIC and CSEPT are not
significantly correlated. The reading scores of the two tests are positively correlated, and
so are the total scores, with both correlations significant at the .05 level. On this basis
TOEIC and CSEPT show statistically significant concurrent validity, and a person’s TOEIC
score can be used to estimate her or his performance on CSEPT, or vice versa.
The regression model based on the collected scores, all below TOEIC 400 and CSEPT 170,
gives “CSEPT 170 = TOEIC 273.404” and “CSEPT 130 = TOEIC 250.884”. These TOEIC scores are
lower than the currently used “CSEPT 170 = TOEIC 350” equivalence based on the CEF. This
suggests lowering the TOEIC score treated as equivalent to a given CSEPT score.
Nevertheless, the second trial with hypothetical data turned the relationship around. After
adding an equal number of full-mark pairs for TOEIC and CSEPT, we obtain
“CSEPT 170 = TOEIC 604.304” and “CSEPT 130 = TOEIC 383”. This result, by contrast, suggests
raising the TOEIC score treated as equivalent to a given CSEPT score.
These initial results leave considerable room for improvement. First, the correlation is
valid only within the range supported by data, i.e. below TOEIC 400 and CSEPT 170. Second,
the correlation should be refined through better research design and data collection. Third,
the number of samples should be increased for more accurate calculation. Nevertheless, the
findings may serve as a quick reference for understanding how these two tests are used and
how they relate. The author hopes to continue enriching the data to obtain a more accurate
correlation in the future.
References:
Angoff, W. (1971). Scales, norms, and equivalent scores. In Thorndike, R. (Ed.) Educational
measurement. Washington D. C.: American Council on Education.
Bachman, L., Davidson, F., Ryan, K., and Choi, I. (1995). An investigation into the
comparability of two tests of English as a foreign language: the Cambridge-TOEFL
comparability study. Cambridge: Cambridge University Press.
Bachman, L. (1990). Fundamental Considerations in Language Testing. London: Oxford
University Press.
Council of Europe (2001). Common European Framework of Reference for Languages.
Council of Europe/Cambridge University Press.
Davidson, F. (2000). The language tester’s statistical toolbox. System, 28, 605-617.
Davies, A. (1990). Principles of language testing. Oxford: Basil Blackwell.
ETS (2004). TOEIC Can-Do guide: linking TOEIC scores to activities performed using
English. Chauncey Group International.
Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.
Kolen, M. & Brennan, R. (2004). Test equating, scaling, and linking: methods and practices.
NY: Springer.
LTTC (2003). Concurrent validity of GEPT. Taipei: LTTC.
Nall, T. (2003). TOEIC: a discussion and analysis. Retrieved on 20 January 2006. From
http://www.geocities.com.two centselfcafe/teach/toeic.htm.
North, B. (2000). Linking language assessments: an example in a low stakes context. System,
28, 555-577.
Rogosa, D. (1982). Discussion of “item and score conversion by pooled judgment”. In
Holland P. & Rubin, D. (Eds.) Test equating. (pp. 319-326) NY: Academic Press.
Thorndike, R. (1982) Item and Score conversion by pooled judgment. In Holland P. & Rubin,
D. (Eds.) Test equating. (pp. 309-318) NY: Academic Press.