Uploaded by Behbud Muhammedzade


Testing Language Skills: From Theory to Practice
(Part One)
Prepared by: Mousazadeh





Evaluation is a major consideration in any educational
setting.
Teachers have always wanted to know how much
their students have learned.
The government and private sectors, which pay teachers and employ the students afterwards, are interested in having precise information about students’ abilities.
Students, teachers, administrators and parents all
work toward achieving educational goals.
Measurement and evaluation are essential devices that help students, teachers, and administrators make sound educational decisions.


Some educational decisions will affect a large number of people (Ex: university entrance exams).
Some educational decisions will affect
only a single person (Ex: should Ali be
placed in an advanced group?).

A good decision is defined as one that is
based on all relevant information.



The term test is usually considered the
narrowest of the three terms (tests,
measurement, and evaluation).
Tests connote the presentation of a set of
questions to be answered.
As a result of a person’s answers to a set of
questions, we obtain a measure or a
numerical value of a characteristic of that
person.


Measurement implies measuring characteristics by means other than giving tests.
Using observations, rating scales, or other devices that allow us to obtain information in a quantitative form is measurement.



Stufflebeam et al. (1971) stated that
evaluation is “the process of delineating,
obtaining and providing useful information
for judging decision alternatives”.
Evaluation is interpreted as the determination
of the congruence between performance and
objectives.
Evaluation has also been characterized as professional judgment or as a process that allows one to make a judgment about the value of a measure.

Evaluation requires that we have a goal or objective in mind.
Both test givers and test takers benefit from the test results.
Testing will encourage the students and motivate them to learn the subject matter.
Teachers should provide positive classroom experiences for their students by giving tests.
Appropriate evaluation gives students a sense of accomplishment and alleviates their dissatisfaction with the educational program.






Testing will help the students prepare themselves
and thus learn the materials.
Repeated preparations will enable students to
master the language.
Students will benefit from the test results and the
discussion over these results.
Several tests or quizzes will make students more aware of the course objectives.
The analysis of the test results will reveal the
students’ areas of difficulty.
The students will have an opportunity to make up
for their weaknesses.

They should provide good instruction and
appropriate evaluation.

He believes that a better awareness of course
objectives and personal language needs can
help the students adjust their personal
activities towards the achievement of their
goals.






An appropriate test should provide answers
to the following questions:
Has the instruction been successful?
Were the materials for the instruction at the
right level?
Have all language skills been emphasized
equally?
What points need reviewing?
Should the same materials be used next year
or do they need some modification?

They need testing to explain and justify their
activities in class.
The analysis of the results should provide
answers for the following questions:




Were the test instructions clear?
Was the allotted time sufficient?
How did the students feel when responding
to the items?
Were the test results a reflection of the
students’ performances during the course?


They are frequently used to evaluate the students’ progress in school.
Through classroom achievement tests, teachers can measure the efficiency of the instruction.
They state that teacher-made tests are
valuable because:



They measure the students’ progress based
on the classroom activities.
They motivate students.
They provide an opportunity for the teacher
to diagnose students’ weaknesses.


The content of the test might be ambiguous.
Sometimes the tests are irrelevant to the instructional materials.

Any teacher-made test must be based on predetermined content to measure the students’ knowledge at a given point in time.


They are commercially prepared by skilled
test makers and measurement experts.
They provide samples of behavior under
uniform procedures.
They differ from teacher-made tests in the following aspects:
Directions for administration and scoring
Sampling of content
Construction
Norms
Purposes and use


In teacher-made tests, there are usually no uniform directions specified.
In standardized tests, specific instructions and standardized administration and scoring procedures are used.


In teacher-made tests, both content and sampling are determined by the classroom teacher.
In standardized tests, content is determined
by curriculum and subject matter experts and
involves extensive investigations of existing
syllabi, textbooks, and programs. Sampling
of content is done systematically.


In teacher-made tests, the construction may be hurried and haphazard; there are often no test blueprints, item tryouts, item analysis, or revision, so the quality of the test may be quite poor.
In standardized tests, meticulous construction procedures are used, including formulating objectives and test blueprints and employing item tryouts, item analysis, and item revision.


In teacher-made tests, only local classroom
norms are available.
In standardized tests, in addition to local
norms, national and district norms are
available.


In teacher-made tests, particular objectives set by the teacher are measured, for intra-class comparisons.
In standardized tests, broad curriculum objectives are measured, for inter-class, school, and national comparisons.

Standardization means that the same fixed set of questions is administered with the same directions, time limits, and scoring procedures.

Scoring is usually based on objective procedures. However, some may also include essay-type questions.


Teacher-made tests usually cover a single unit of work or the work of a term.
Standardized tests usually have a wider range of coverage (that is, they cover more material).
They assess either one year’s learning or more than one year’s learning.

Testing is viewed as a practical teaching
strategy giving learners useful opportunities
for discussion of language choices.

He believes that “language testing today
reflects current interest in teaching, but it
also reflects earlier concerns for scientifically
sound tests.”
Traditional tests
Multiple-choice tests
Testing communication





They are closely related to the Grammar-Translation Method (GTM) in language teaching.
The early stage of traditional testing is called the intuitive stage, in which the relationship between language teaching and testing is stronger.
Knowing about the language was emphasized.
Students had to memorize many language rules and lists of words.
Traditional tests also include a great deal of writing (composition) and reading comprehension.




(Ex1: convert the following statement into
past tense)
(Ex2: write the main parts of these verbs: go, buy)
(Ex3: make sentences using each of these
words: bashful, diligent)
(Ex4: translate the following sentence or
passage into Farsi)

Structural linguistics and behavioristic
psychology

These two disciplines suggested that
“language mastery could be evaluated
scientifically bit by bit” (Madsen, 1983).

Behaviorist psychologists view language learning as the formation of a set of habits.

Structural linguists analyze the components of language (sounds, morphemes, words, syntax).

Objective tests were devised to measure
different language elements.

The main reasons were the emergence of structural linguistics and behavioristic psychology and the unreliability of subjective tests.

Open-ended and multiple-choice tests





They are the most popular types of objective
tests.
The students are presented with alternatives or
options (including one correct answer and
distracters).
They are expected to choose the correct
alternative.
They measure only a single or discrete feature of
the language.
They provide the learner with restricted contexts,
usually no wider than the item context.

Constructing good test items with reasonable
distracters is very difficult.

She suggests that “the inexperienced test constructors should first prepare open-ended items and administer them to some students. The wrong answers provided by the students could be used as reasonable distracters later on.”

Uncommon and implausible distracters are dangerous instruments in language testing. Many types of multiple-choice tests expose students to many unlikely errors.

Many language tests prepared by teachers aim to examine linguistic components separately. These linguistic components constitute language skills.




These tests move toward global testing.
They make more comprehensive demands on the learners.
Two very popular types of global test are dictation and cloze.
The term cloze is derived from the Gestalt psychology notion of closure; a cloze test is based on a passage with some words deleted.



It requires perceptive and productive skills and a sound knowledge of lexical and grammatical systems.
Students should take advantage of all linguistic clues.
The students should rely on some other contextual clues, too.

Tests are constructed to enable learners to
manipulate language functions and to identify
utterances as belonging to a certain function
of language.

There is a misconception that successful language usage will lead to successful language use. However, linguistic aspects of language are only one part of the communication process.
Functions of language tests

A test is an instrument for collecting numerical information about an attribute.

The purpose is to determine the degree of
existence of an attribute.

The function of a test refers to the purpose
for which a test is designed.

Prognostic tests and evaluation-of-attainment tests

Placement tests, aptitude tests, and selection tests.

They are not directly related to a particular
course of study.

They are based on a clearly specified course
of instruction.

Achievement tests, proficiency tests, and knowledge tests

The scores are used to make decisions about
the most appropriate channel of educational
or occupational career for the testees.

The main goal is to make sound decisions
about the future success of the examinees on
the basis of their present capabilities.



The purpose is to provide information on which the examinees’ acceptance or nonacceptance into a particular program can be determined.
Ex1: taking a selection test to obtain a driver’s license
Ex2: taking a selection test to demonstrate your capabilities for employment as a typist.

A selection test has a pass/fail criterion, but a placement test does not.

Due to administrative limitations, it is not always possible to admit all applicants who pass a selection test. In other words, when the number of applicants passing a test exceeds the capacity of the educational program, the selection test turns into a competition test.



There are two options:
To increase the facilities to admit more
applicants or
To modify the passing criteria (i.e. the
difficulty level of the test can be increased).


They are used to determine the most
appropriate channel of education for the
examinees. The purpose is to help those who
need more help.
Ex: taking a placement test in a university language department before taking academic courses.

Aptitude test
To predict applicants’ success in achieving certain
objectives in the future
The examinee does not need to have prior knowledge of
the subject being tested
They can contribute to decisions about the future careers of the applicants
Ex: how good a pilot, an engineer, or a teacher can one
be?

Developing aptitude tests is a very delicate and time-consuming task. Weak tests may provide invalid and misleading information.

Evaluation-of-attainment tests deal with the extent to which examinees have learned the materials they have been taught. They are directly related to educational settings, whereas prognostic tests are not.
They are used to measure
the degree of students’
learning from a particular
set of materials.
They measure the detailed
elements of an instructional topic.
They are used to determine the strengths and weaknesses of the examinees in a particular course of study.
They are developed on the basis of
materials being taught.
They are designed to measure
students’ overall achievement in a
particular language class.
Achievement tests can be used for
both instruction and evaluation
purposes.
Achievement tests focus on measuring students’ achievement of the materials covered within the course.
Proficiency tests measure the overall
language ability of the learners.
They measure the degree of knowledge learners have gained through their language education.
They measure the degree of their capability in language components.
They measure the degree to which a person is able to demonstrate his or her knowledge of language use in practice.
Many universities use proficiency tests for admission purposes (e.g., TOEFL).
Construction of proficiency tests is more difficult than that of other tests.
It is not easy to define
proficiency.

They are used in situations where the
medium of instruction is a language other
than the learners’ mother tongue. They
measure the examinees’ knowledge in areas
other than the language itself.
Forms of Language Tests
Explanation of the concept of the item
Different classifications of item formats
The advantages and disadvantages of item format classifications
The test appearance may put the testee in an unexpected situation.
It may disappoint or encourage the testee.
The appearance of the test may be harmonious with or contrary to the testee’s presuppositions about it.

The form refers to the physical appearance of
the test.
The form depends on the nature and
varieties of attributes to be measured.
The form also depends on the
function of the test.

An item is the smallest unit of a test.

An item consists of two parts: the stem and
the response.

The purpose of the stem is to elicit
information from the examinee and to make
examinees provide the examiner with
information.

The stem can be presented as a question, a statement, an incomplete sentence, or in other forms.

The response refers to the information
elicited from the examinees.

The response can range from recognizing a
single word to providing a comprehensive
essay presenting discussion or explanation of
a complex issue.

The stem is followed by three, four or five
responses. The responses are called
alternatives, options or choices one of which
is the correct response and the others are
called distracters.
Alternatives = correct response + distracters

An alternative may or may not be the correct response; in other words, alternatives include both the correct response and the distracters, whereas distracters consist only of wrong alternatives.
Subjective versus objective items
Essay type versus multiple choice items
Suppletion versus recognition items, and
psycholinguistic classification

Translation tests were used as major
techniques of testing (translating a passage
or a set of sentences from one language into
another).
The advantage was that the content of translation
tests was relevant to the materials to be tested. In
other words the content was a valid
representation of the materials.
The main shortcoming of translation test was that
the scoring procedures were not systematic.
The scoring is not systematic.
It requires a great amount of time and energy.
Fluctuations of scores from one scorer to another create a serious problem for the consistency of test scores.
The scoring did not follow any objective criterion.
To compensate for the
inadequacies of subjective tests.
To apply psychometric principles
to language tests.
To develop consistently scored
tests.

Objectivity or subjectivity refers to the way a test is scored and has little or nothing to do with the form of a test. It is a misunderstanding to assume that all composition tests are subjective or that all multiple-choice tests are objective.

It refers to all kinds of items in which the
examinee is required to produce language
elements.

It refers to all kinds of items in which the
examinee is required to select the correct
response from among given alternatives.

There are different varieties of essay-type formats, ranging from single-word production to producing a comprehensive explanation, and each of them requires a certain type of activity. Therefore, they cannot be classified under the same category.

Recognition form items require the
examinees to recognize the correct response
from among the alternatives provided for
each stem.

They require the examinee to supply the
missing parts of the stem or complete an
incomplete stem.

The degree of production and the way they
will be scored were not clear.

In the new classification, the form of the item
is determined by taking theoretical principles
of language processing into account.

Because it takes both psychological and linguistic principles as the underlying theoretical assumptions of item formats.
Verbal / non-verbal
Perception / production of oral or written materials
Identification, analysis, recognition, comprehension
Verbal manifestation includes oral and written forms.
Non-verbal manifestations include all sorts of graphic devices.

Statistics involves collecting numerical
information called data, analyzing them and
making meaningful decisions on the basis of
the outcome of the analysis.
There are two major areas:
Descriptive statistics
Inferential statistics

According to Hatch and Farhady (1982), in
descriptive statistics we describe sample
data.



Each characteristic of the sample is called a statistic. Using the methods of inferential statistics, we can make inferences from a statistic about the characteristics of a given population.
Statistic: the characteristics of a given sample
Parameter: the characteristics of a given
population



Normally the first step in summarizing the
data is to arrange the scores in the order of
size, usually from the highest to the lowest.
For ties, ranks are averaged.
Scores | Ranks
19 | 1
17 | 2
16 | 3.5
16 | 3.5
14 | 5
13 | 6
12 | 7.5
12 | 7.5
8 | 9
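The ranking procedure, including the averaging of tied ranks, can be sketched in a few lines of Python. This is a minimal illustration, assuming the scores arrive as a plain list; the function name `rank_scores` is hypothetical. Rank 1 goes to the highest score, and tied scores share the average of the positions they occupy, which reproduces the ranks in the table.

```python
def rank_scores(scores):
    """Rank scores from highest (rank 1) to lowest.

    Tied scores receive the average of the rank positions they occupy.
    Returns a list of (score, rank) pairs in descending score order.
    """
    ordered = sorted(scores, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of scores tied with ordered[i].
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # 0-based positions i..j correspond to ranks i+1..j+1; average them.
        ranks[ordered[i]] = ((i + 1) + (j + 1)) / 2
        i = j + 1
    return [(score, ranks[score]) for score in ordered]

for score, rank in rank_scores([19, 17, 16, 16, 14, 13, 12, 12, 8]):
    print(score, rank)
```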





In addition to the time and trouble required to
determine the ranks, the list is long and
inadequate for making comparisons with other
groups or classes that are much larger or much
smaller.
Ex: ranking 19th in a class of 20 is poorer than ranking 19th in a class of 100 students.
The status of a score should not simply be
announced by the number of scores above or
below it.
Ex: Reza got the third highest mark in the class.
The number of scores in the entire distribution
would have to be made known.


Frequency is the number of times each score
occurs.
Ex: 19, 19, 19, 17, 17, 15, 15, 15, 15, 15, 13, 12, 11
f19 = 3, f17 = 2, f15 = 5, f13 = 1, f12 = 1, f11 = 1
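Tallying frequencies by hand is error-prone. As a small sketch, Python's standard `collections.Counter` counts each score in the example list exactly as it is printed; the variable names are illustrative only.

```python
from collections import Counter

# The example scores as listed in the text.
scores = [19, 19, 19, 17, 17, 15, 15, 15, 15, 15, 13, 12, 11]

freq = Counter(scores)  # maps each score to the number of times it occurs

for score in sorted(freq, reverse=True):
    print(f"f{score} = {freq[score]}")
```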

Frequency distribution
Percentile
percentage

By using percentiles, we can determine the position of a score in a given distribution and report how a given student is doing.

It refers to the frequency of each score
divided by the total number of scores.
(RF=f/N)
The relative frequency multiplied by 100 is called the percentage. With percentages, we can say what percent of the subjects passed the test or received a particular score.
(Percentage = RF × 100)

It indicates the standing of any particular
score in a group of scores. It shows how
many scores fall below the given score or
point in a distribution.

The cumulative frequency is obtained by successively adding the frequencies of the score levels. The cumulative frequency column is constructed from the bottom up, starting with the lowest score.

Lower case letter (f) is used for absolute
frequency and upper case (F) for cumulative
frequency.
To compute the percentile rank of any level or point, the corresponding F should be divided by the total number of scores and the result multiplied by 100.
(Percentile rank = F/N × 100)

The percentile rank of an individual shows what percent of the students who took the test scored at or below the level in question. The percentile rank of an individual score is often more helpful than the particular score itself.
Score (ranked order) | Frequency (f) | Relative frequency (RF) | Percentage | Cumulative frequency (F) | Percentile
19 | 2 | 2/10 | 20% | 10 | 100%
17 | 2 | 2/10 | 20% | 8 | 80%
15 | 3 | 3/10 | 30% | 6 | 60%
13 | 1 | 1/10 | 10% | 3 | 30%
12 | 1 | 1/10 | 10% | 2 | 20%
11 | 1 | 1/10 | 10% | 1 | 10%
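Every column of the table can be derived from the raw scores it summarizes (two 19s, two 17s, three 15s, and one each of 13, 12, and 11, so N = 10). The Python sketch below is one way to do it, assuming the scores are available as a list; `frequency_table` is a hypothetical name. It follows the formulas in the text: RF = f/N, percentage = RF × 100, F accumulated from the lowest score upward, and percentile rank = F/N × 100.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (score, f, RF, percentage, F, percentile rank), highest score first."""
    n = len(scores)
    freq = Counter(scores)
    rows = []
    cumulative = 0
    for score in sorted(freq):            # lowest score first: F is built bottom-up
        f = freq[score]
        cumulative += f                   # F = number of scores at or below this score
        rf = f / n                        # relative frequency RF = f / N
        rows.append((score, f, rf, rf * 100, cumulative, cumulative / n * 100))
    rows.reverse()                        # list the highest score first, as in the table
    return rows

scores = [19, 19, 17, 17, 15, 15, 15, 13, 12, 11]
for score, f, rf, pct, cf, pr in frequency_table(scores):
    print(f"{score:>3}  f={f}  RF={rf:.1f}  {pct:.0f}%  F={cf:>2}  PR={pr:.0f}%")
```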
A graph is a valuable supplement to data summaries and statistical analysis.
A graph or chart attracts the reader’s attention.
A graph is often an effective method of clarifying a point.
It is said that the pictures speak for themselves.
The picture or graph is a more concrete representation of
the data.
He states that today our attention is
called more to the limitless
possibilities in visual education.
The correct graph reveals the message
briefly and simply.
Better comprehension of data than is
possible with textual matter alone.
More analysis of subject than is possible
in a written text.
A check of accuracy.


Bar graph. In bar graphs vertical bars are
used. The height of each bar represents the
number of members or the frequency of that
class.
First, two axes should be drawn (horizontal and vertical lines).
Enter the scores on the horizontal axis and the frequency of each score on the vertical axis.
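A plotting library would normally be used to draw a bar graph. To stay dependency-free, the sketch below uses plain characters instead of a real chart, but it follows the same steps: scores along the horizontal axis and the frequency of each score as the height of its bar. The function name `text_bar_graph` and the sample scores are illustrative assumptions.

```python
from collections import Counter

def text_bar_graph(scores, mark="#"):
    """Build a character-based bar graph.

    Scores run along the horizontal axis (lowest first); the height of each
    column of marks equals the frequency of that score.
    """
    freq = Counter(scores)
    axis = sorted(freq)                    # horizontal axis: the scores
    tallest = max(freq.values())           # the highest frequency sets the vertical scale
    lines = []
    for level in range(tallest, 0, -1):    # draw rows from the top down
        row = [f"{mark:>3}" if freq[s] >= level else "   " for s in axis]
        lines.append(" ".join(row))
    lines.append(" ".join(f"{s:>3}" for s in axis))  # axis labels
    return lines

for line in text_bar_graph([19, 19, 17, 17, 15, 15, 15, 13, 12, 11]):
    print(line)
```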

The histogram is a series of columns. One
class interval is the base for each column and
the height is the number of cases or
frequency in that class. It is customary to
extend the scale one class interval above and
below the range.

In the histogram, the top of each column is indicated by a horizontal line one class interval long, and the height of the column represents the frequency in that class. When the midpoints of these tops are joined by straight lines, the result is a frequency polygon.

Usually, just two or three properties of a set of scores are singled out to describe the data. Indexes known as summary statistics describe the typical size and spread of scores.

Two properties are described: measures of central tendency and measures of dispersion (variability).
The mode, the mean, the median


The mode. The mode is the score that occurs most frequently in a set of scores.
12, 14, 15, 16, 16, 16, 20 (mode = 16)

When all of the scores in a group occur with the same frequency, the group has no mode.
11, 11, 12, 12, 13, 13, 14, 14

When two adjacent scores share the highest frequency, the mode is the average of the two adjacent scores.
12, 13, 14, 14, 15, 15, 16, 17, 18 (mode = 14.5)
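The three mode cases can be checked with a short Python sketch. It is a minimal illustration, not a standard library routine; the function name `mode` is hypothetical, "adjacent" is taken to mean neighbouring distinct scores in ranked order, and returning a list for any other kind of tie is an assumption, since the text does not cover that case.

```python
from collections import Counter

def mode(scores):
    """Mode following the three cases in the text.

    - one most frequent score: that score
    - every score occurs equally often: None (no mode)
    - two adjacent scores tie for the highest frequency: their average
    Other ties are returned as a sorted list (an assumption, not from the text).
    """
    freq = Counter(scores)
    top = max(freq.values())
    modal = sorted(s for s, f in freq.items() if f == top)
    if len(freq) > 1 and len(modal) == len(freq):
        return None                       # all scores equally frequent
    if len(modal) == 1:
        return modal[0]
    distinct = sorted(freq)               # distinct scores in ranked order
    if len(modal) == 2 and distinct.index(modal[1]) - distinct.index(modal[0]) == 1:
        return (modal[0] + modal[1]) / 2  # two adjacent modal scores
    return modal

print(mode([12, 14, 15, 16, 16, 16, 20]))
print(mode([11, 11, 12, 12, 13, 13, 14, 14]))
print(mode([12, 13, 14, 14, 15, 15, 16, 17, 18]))
```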

The median (Md) is the score at the 50th percentile in a group of scores. It is the score that divides the ranked scores into halves. Half of the scores are larger than the median and the other half are smaller.




a) If the data include an odd number of
scores:
The median is the middle score when they are
ranked.
b) If the data include an even number of
scores:
The median is the point halfway between the two central values when the scores are ranked.

10, 11, 12, 14, 17, 19, 20
Md = 14

10, 12, 13, 14, 15, 17, 19, 20
Md = 14.5
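The odd and even cases can be sketched in Python as follows; `median` here is a hypothetical helper written to mirror the two rules in the text. Python's standard library also offers `statistics.median`, which follows the same convention of averaging the two central values when N is even.

```python
def median(scores):
    """Median of a list of scores.

    Odd N: the middle score of the ranked list.
    Even N: the point halfway between the two central scores.
    """
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median([10, 11, 12, 14, 17, 19, 20]))
print(median([10, 12, 13, 14, 15, 17, 19, 20]))
```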





The mean is the arithmetic average. It is computed by dividing the sum of all scores by the number of scores. It is represented by the following formula:
$\bar{X} = \frac{\sum x}{N}$
$\bar{X}$ = mean
$\sum x$ = sum of scores
x = any individual score in a distribution
N = total number of scores
If we subtract the mean from a score, the resulting difference is a deviation score (D); it can be either positive or negative.
The sum of all N deviation scores is zero.
The sum of the squared deviations of scores from the mean is less than the sum of the squared deviations around any point other than the mean.
Score | Mean | Deviation score (D) | D squared
0 | 2 | -2 | 4
1 | 2 | -1 | 1
1 | 2 | -1 | 1
3 | 2 | 1 | 1
5 | 2 | 3 | 9
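The two properties of deviation scores can be verified on the table's own data (0, 1, 1, 3, 5, with mean 2) using a minimal Python sketch; the helper names are hypothetical. The deviations sum to zero, their squares sum to 16, and squaring deviations around any other point, such as 3, gives a larger total (21).

```python
def deviation_scores(scores):
    """Return the mean and (score, D, D squared) rows, where D = score - mean."""
    mean = sum(scores) / len(scores)
    rows = [(x, x - mean, (x - mean) ** 2) for x in scores]
    return mean, rows

def sum_sq_around(scores, point):
    """Sum of squared deviations of the scores around an arbitrary point."""
    return sum((x - point) ** 2 for x in scores)

scores = [0, 1, 1, 3, 5]
mean, rows = deviation_scores(scores)
print("mean:", mean)
print("sum of D:", sum(d for _, d, _ in rows))
print("sum of D squared:", sum(d2 for _, _, d2 in rows))
print("sum of squares around 3:", sum_sq_around(scores, 3))
```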

Range, variance, standard deviation

Measures of central tendency only locate the
center of the distribution. The location of the
center may not be adequate to provide a
logical picture of the data.
It is the difference between the largest number and the smallest number in the
distribution.
It is the simplest measure of variation to calculate since only two numbers are
used.
It doesn’t tell us anything about how the other scores vary.
If there is one extreme value in a distribution, the dispersion will appear very large. If we remove the extreme score, the dispersion may become small.
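The sensitivity of the range to a single extreme value is easy to demonstrate. In this sketch, both the helper name `score_range` and the two score lists are hypothetical examples, not data from the text: adding one extreme score (40) raises the range from 4 to 28 even though the other scores are unchanged.

```python
def score_range(scores):
    """Range = largest score minus smallest score (only two values are used)."""
    return max(scores) - min(scores)

typical = [12, 13, 14, 15, 16]        # hypothetical scores
with_outlier = typical + [40]         # the same scores plus one extreme value

print(score_range(typical), score_range(with_outlier))
```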






First, the mean of the numbers is calculated.
The mean is subtracted from each number.
The results of the subtraction are squared.
The squared results are summed and divided by N - 1 (one less than the number of scores); the result is the variance.
Standard deviation is the square root of the variance.
$\text{Variance} = \frac{\sum (x - \bar{X})^2}{N - 1}$
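The variance steps can be followed literally in Python. This is a sketch with hypothetical helper names, applied to the scores from the deviation table (0, 1, 1, 3, 5): the squared deviations sum to 16, so the variance is 16 / 4 = 4 and the standard deviation is 2. The standard library's `statistics.variance` and `statistics.stdev` also divide by N - 1, so they serve as a cross-check; `statistics.pvariance` divides by N instead.

```python
import math
import statistics

def variance(scores):
    """Sample variance: sum of squared deviations from the mean divided by N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / (n - 1)

def standard_deviation(scores):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(scores))

scores = [0, 1, 1, 3, 5]
print(variance(scores), standard_deviation(scores))
# Cross-check with the standard library, which uses the same N - 1 divisor.
print(statistics.variance(scores), statistics.stdev(scores))
```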


SD tells us about the degree of dispersion of
scores in a distribution.
By comparing the SDs of different groups, we can tell to what extent they are homogeneous.




Linear correlation
The coefficient of correlation (Pearson correlation)
Rank-order correlation
Point-biserial correlation


20, 19, 18, 17, 16, 15
15, 16, 17, 18, 19, 20
(perfect negative correlation, r = -1)

20, 19, 18, 17, 16
20, 19, 18, 17, 16
(perfect positive correlation, r = +1)
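The Pearson coefficient can be computed directly from deviation scores, which makes the two example pairs easy to check. A minimal sketch, with the hypothetical name `pearson_r`: it divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations, giving -1 for the first pair of lists and +1 for the second.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviation scores."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([20, 19, 18, 17, 16, 15], [15, 16, 17, 18, 19, 20]))
print(pearson_r([20, 19, 18, 17, 16], [20, 19, 18, 17, 16]))
```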

The function refers to the purpose of the test
and the form refers to the way an item is
presented to the examinees. The form and
the function of a test are interrelated. The
function of a test can impose certain
limitations on the form of the items of that
test.
Determining the function and the form
Planning (determining the content of the test)
Preparing the items
Reviewing the items
Pretesting
Validating the test
The characteristics of the examinees
The specific purpose of the test
The scope of the test
The nature of the population to which the test is likely to be administered (age, gender, etc.)
Level of intellectual and cognitive abilities
The examinees’ language background (doing contrastive analysis)
The examinees’ educational system

The purpose is to gather quantitative
information about the degree of the
examinees’ command in a particular area of
knowledge.
Examining the instructional objectives (ex: including major structural points covered during the instruction)
Dividing the major topics into their specific points (the level of detail depends on practical factors such as test length and test time)
Preparing a table of specifications with two dimensions (topics and subtopics are listed on one dimension; the form and number of items are described on the other)

The purpose is to assure the test developer
that the test includes a representative sample
of the materials covered in a particular
course.

Preparing items
Avoid using broad general statements
Avoid using statements which measure trivial points
Avoid using negative statements
Avoid using long and complex sentences
Make true and false statements approximately similar in length, difficulty, and distribution.
Use homogeneous materials in a single matching item.
Include an unequal number of items in each column.
Clarify how the items from the two columns are to be matched.
Keep the list brief and place the shorter column on the right.





The stem should be quite clear and state the point to be
tested unambiguously.
The stem should include as much of the item as
possible.
Negative statements should be avoided because they
are likely to be ignored by the examinees.
All of the statements should be grammatically correct
by themselves and consistent with the stem.
Every item should have one correct or clearly best
answer.






All distracters should be plausible.
All distracters should be of similar length and level of
difficulty.
Using “all of the above” or “none of the above” as an
alternative is not recommended.
Correct responses should be distributed approximately
equally but randomly among the alternatives.
The stem should not provide any grammatical clue
which might help the examinee find the correct
response without understanding the item.
The stem should not start with a blank.



Pretesting means examining or reviewing the test objectively rather than subjectively.
To determine objectively the characteristics of the individual items, including item facility (IF), item discrimination (ID), and choice distribution (CD)
Validation, which determines the characteristics of the items together and includes reliability, validity, and practicality


It refers to the easiness of an item.
It is the proportion of correct responses to
the total number of responses.




IF= ∑C/ N
IF= item facility,
∑C= sum of the correct responses,
N= total number of responses

When all examinees answer an item correctly, IF equals 1.

When no examinee answers an item correctly, IF equals zero.

Items with IF indexes above 0.63 are too easy, and items with IF indexes below 0.37 are too difficult.


It is the proportion of wrong responses to the
total number of responses.
Item difficulty= 1 – item facility
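Item facility and item difficulty follow directly from an item's responses. The sketch below assumes responses are coded 1 for correct and 0 for wrong (an assumed coding), uses hypothetical function names, and applies the 0.37 to 0.63 band from the text. For a hypothetical item answered correctly by 12 of 20 examinees, IF = 0.6, item difficulty = 0.4, and the item falls in the acceptable band.

```python
def item_facility(responses):
    """IF = sum of correct responses / total number of responses (1 = correct, 0 = wrong)."""
    return sum(responses) / len(responses)

def item_difficulty(responses):
    """Item difficulty = 1 - item facility."""
    return 1 - item_facility(responses)

def judge_facility(if_index, low=0.37, high=0.63):
    """Label an IF index using the 0.37 to 0.63 band given in the text."""
    if if_index > high:
        return "too easy"
    if if_index < low:
        return "too difficult"
    return "acceptable"

# Hypothetical item: 12 of 20 examinees answered correctly.
answers = [1] * 12 + [0] * 8
print(item_facility(answers), item_difficulty(answers), judge_facility(item_facility(answers)))
```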

It refers to the extent to which a test item
discriminates more knowledgeable examinees
from less knowledgeable ones.





$ID = \frac{CH - CL}{\frac{1}{2}N}$
ID = item discrimination,
CH = number of correct responses of the high group,
CL = number of correct responses of the low group,
½N = total number of responses divided by 2.
(An item discrimination index above 0.40 can be considered acceptable.)
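The index is computed as ID = (CH - CL) / ½N. The sketch below shows one way to obtain CH and CL, under a stated assumption: the high and low groups are taken to be the top and bottom halves of the examinees ranked by total test score, so N should be even. The function names and the six-examinee data set are hypothetical.

```python
def split_high_low(examinees):
    """Return (CH, CL) for one item.

    `examinees` is a list of (total_test_score, answered_item_correctly) pairs.
    Assumption: the high and low groups are the top and bottom halves of the
    examinees ranked by total score (N should be even).
    """
    ranked = sorted(examinees, key=lambda e: e[0], reverse=True)
    half = len(ranked) // 2
    high, low = ranked[:half], ranked[half:]
    ch = sum(1 for _, correct in high if correct)
    cl = sum(1 for _, correct in low if correct)
    return ch, cl

def item_discrimination(ch, cl, n):
    """ID = (CH - CL) / (N / 2)."""
    return (ch - cl) / (n / 2)

# Hypothetical data for six examinees.
examinees = [(30, True), (28, True), (25, False), (22, True), (20, False), (18, False)]
ch, cl = split_high_low(examinees)
print(ch, cl, item_discrimination(ch, cl, len(examinees)))
```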

It refers to the frequency with which
alternatives are selected by the examinees.
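Choice distribution can be tabulated by counting how often each alternative was chosen. A minimal sketch, assuming answers are recorded as option letters a to d; the function name and the answer data are hypothetical. An alternative that nobody selects shows up with a count of 0, which connects to the earlier guideline that all distracters should be plausible.

```python
from collections import Counter

def choice_distribution(answers, options=("a", "b", "c", "d")):
    """Count how many examinees chose each alternative; unchosen options get 0."""
    counts = Counter(answers)
    return {option: counts.get(option, 0) for option in options}

# Hypothetical answers to one item whose key is "b".
answers = ["b", "b", "a", "c", "b", "a", "b", "b"]
print(choice_distribution(answers))
```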