Type of Test Based on the Purpose and Teacher's Scoring

Final Project of Assessment
Alfian Bagus H (100221404953)
Dwi Rudi H (10221404963)
Type of test based on the purpose

Each test type below is described by its purpose, its material, its orientation, and its dimension and aspect (examples).

1. Aptitude test
Purpose: to measure one's talents in language.
Material: language as a general ability.
Orientation: to predict a person's success in exposure to the second language.
Dimension and aspect: MLAT & PLAB. The MLAT covers number learning, phonetic script, spelling cues, words in sentences, and paired associates.

2. Screening test
Purpose: to select or admit able candidates.
Material: reflecting requirements of future performance.
Orientation: to predict the future performance of the candidates.
Dimension and aspect: SNMPTN; TPA.

3. Placement test
Purpose: to place individuals in their appropriate level/class.
Material: reflecting the materials in the instructional program.
Orientation: pupils will not be misplaced, so that each pupil's learning will be optimal.
Dimension and aspect: the Second Language Placement Test at San Francisco, which assesses comprehension and production, responding through written and oral performance, open-ended and limited responses, and selection and gap-filling formats.

4. Achievement test
Purpose: to assess the extent to which the students have achieved the pre-stated learning objectives.
Material: reflecting the instructional materials in the syllabus/curriculum.
Orientation: to know whether the standards of competence have been achieved.
Dimension and aspect: formative and summative tests.

5. Diagnostic test
Purpose: identifying students' learning strengths and weaknesses in an instructional program.
Material: covering the instructional materials of a learning program.
Orientation: to know the progress of the teaching and learning activity.
Dimension and aspect: the GMAT, covering arithmetic (roots, powers, number properties, special characters, modules), statistics, word problems (min/max, overlapping sets, rate, work, mixture), geometry, probability, combinations, and algebra.

6. Proficiency test
Purpose: to know the current stage regardless of the education taken before.
Material: language as a general ability.
Orientation: for the academic future.
Dimension and aspect: TOEFL, IELTS, TOEIC, BULATS, etc. The TOEFL covers reading, listening, speaking, and writing comprehension.

7. Research purpose
Purpose: to answer the research questions.
Material: conceptualized general language ability.
Orientation: to measure the conceptualized trait.

8. Program evaluation
Purpose: to see the effectiveness of a certain program.
Material: conceptualized general language ability.
Type of test based on the teacher’s scoring
1. Subjective test
It is a test in which the learners' ability or performance is judged by the examiners' opinion and judgment. It requires the examinees to create their own responses. No single wording (or set of actions) can be regarded as the only correct response, and a response may earn full or partial credit. Responses must be scored subjectively by content experts. Examples: essay writing and short answers.
2. Objective test
It is a test in which the learners' ability or performance is measured against a specific set of answers. This test consists of factual questions requiring extremely short answers that can be quickly and unambiguously scored by anyone with an answer key, thus minimizing subjective judgments by both the person taking the test and the person scoring it. Such tests tend to focus more on specific facts than on general ideas and concepts. Examples: multiple-choice tests, true/false tests, matching, and problem-based questions.
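Because an objective test is scored purely against a key, the scoring can even be automated. A minimal Python sketch of this idea (the key and the student's responses here are invented for illustration):

```python
# A minimal sketch of objective-test scoring: with a fixed answer key,
# any scorer (human or program) produces the same result.
# The key and the student's responses are invented examples.

def score(answer_key, responses):
    """Count responses that exactly match the key."""
    return sum(1 for item, correct in answer_key.items()
               if responses.get(item) == correct)

key = {1: "A", 2: "C", 3: "B", 4: "D"}
student = {1: "A", 2: "C", 3: "A", 4: "D"}
print(score(key, student))  # 3 (items 1, 2 and 4 match the key)
```

Any number of scorers running this against the same key will report the same total, which is exactly the sense in which objective scoring is "unambiguous".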
ATTACHMENT
The Example of the Questions
1. MLAT
PART I: NUMBER LEARNING
Part I of the MLAT has 43 possible points. This part of the MLAT tests auditory and memory
abilities associated with sound-meaning relationships. In this part of the MLAT, you will learn
the names of numbers in a new language. Subsequently, you will hear the names of numbers
spoken aloud, and you will be asked to write down these numbers. For example, if you heard
someone say the number “seventeen” in English, you would write down 1 7. But in this test, you
will hear the numbers in a new language. Here’s how it will work:
You will hear some instructions read aloud. The speaker will then teach you some numbers (not
the same as these samples, of course). The speaker will say something like:
[The red text represents the voice you will hear.]
Now I will teach you some numbers in the new language. First, we will learn some single-digit
numbers:
“ba” is “one”
“baba” is “two”
“dee” is “three”
Now I will say the name of the number in the new language, and you write down the number you
hear. Try to do so before I tell you the answer:
“ba” -- That was “one”
“dee” -- That was “three”
“baba” -- That was “two”
Now we will learn some two-digit numbers:
“tu” is “twenty”
“ti” is “thirty”
“tu-ba” is “twenty-one” in this language -- because “tu” is twenty and “ba” is one.
“ti-ba” is “thirty-one” – because “ti” is thirty and “ba” is one.
Now let’s begin. Write down the number you hear.
a. ti-ba [you have only about 5 seconds to write down your answer]
b. ti-dee
c. baba
d. tu-dee
PART II: PHONETIC SCRIPT
Part II of the MLAT is a test of your ability to learn a system for writing English sounds
phonetically. There are 30 possible points in this section. First you will learn phonetic symbols
for some common English sounds. For each question, you will see a set of four separate
syllables. Each syllable is spelled phonetically. A speaker will model the sounds for you by
pronouncing each of the four syllables in a set. Then the speaker will model the sounds in the
next set.
After the speaker models the sounds in five sets, you will be asked to look back at the first set.
The speaker will go through the groups again, but this time the speaker will say only one of the 4
syllables in a set. Your task is to select the syllable that has a phonetic spelling that matches the
syllable you heard.
For example, you would look at the first five sets. They would look something like this:
1. bot but bok buk
2. bok buk bov bof
3. geet gut beet but
4. beek beev but buv
5. geeb geet buf but
[Remember, the red text represents the voice of the speaker that you will hear]
The speaker will then pronounce each of the four syllables in each of the five sets. You follow
along:
1. “bot” “but” “bok” “buk”
2. “bok” “buk” “bov” “bof”
3. “geet” “gut” “beet” “but”
4. “beek” “beev” “but” “buv”
5. “geeb” “geet” “buf” “but”
Then the speaker will go back to number 1 and pronounce just one syllable from the set of four.
So, you might hear:
1. “buk”
During the actual test, you must indicate which syllable you heard by darkening the
corresponding space on the computer answer sheet. Then you hear the next question:
PART III: SPELLING CUES
Part III of the MLAT has 50 questions. This part of the MLAT requires the ability to associate
sounds with symbols and depends somewhat on knowledge of English vocabulary. It is also
somewhat speeded, and therefore much more challenging than the following exercise,
which consists of only 4 practice questions. Nonetheless, trying these sample questions will give
you a good idea of what Part III is like.
Each question below has a group of words. The word at the top of the group is not spelled in the
usual way. Instead, it is spelled approximately as it is pronounced. Your task is to recognize the
disguised word from the spelling. In order to show that you recognize the disguised word, look
for one of the five words beneath it that corresponds most closely in meaning to the disguised
word. When you find this word or phrase, write down the letter that corresponds to your choice.
Try all four samples; then click below to check your answers.
NOW GO RIGHT AHEAD WITH THESE SAMPLE QUESTIONS. WORK RAPIDLY!
1. kloz
A. attire
B. nearby
C. stick
D. giant
E. relatives
2. restrnt
A. food
B. self-control
C. sleep
D. space explorer
E. drug
PART IV: WORDS IN SENTENCES
There are 45 questions in MLAT Part IV. The following exercise consists of only 4 practice
questions. The MLAT questions test recognition, analogy, and understanding of a far greater
range of syntactic structures than the 4 sample questions shown here.
In each of the following questions, we will call the first sentence the key sentence. One word in
the key sentence will be underlined and printed in capital letters. Your task is to select the letter
of the word in the second sentence that plays the same role in that sentence as the underlined
word in the key sentence.
Look at the following sample question:
Sample: JOHN took a long walk in the woods.
Children in blue jeans were singing and dancing in the park.
A        B             C            D           E
You would select “A.” because the key sentence is about “John” and the second sentence is
about “children.”
NOW GO RIGHT AHEAD WITH THESE SAMPLE QUESTIONS.
Write down your answers so that you can check them when you are finished.
1. MARY is happy.
From the look on your face, I can tell that you must have had a bad day.
A          B          C          D          E
2. We wanted to go out, BUT we were too tired.
Because of our extensive training, we were confident when we were out sailing,
A                  B                     C
yet we were always aware of the potential dangers of being on the lake.
D                  E
PART V. PAIRED ASSOCIATES
Part V of the MLAT focuses on the rote memory aspect of learning foreign languages. On the
actual test, you will have 2 minutes to memorize 24 words. You will then do a practice exercise.
You can look back at the vocabulary during this practice exercise, but you will not be permitted
to look at the vocabulary or at your practice sheet while you are doing the Part V questions that
follow the exercise.
Your task here is to MEMORIZE the Maya-English vocabulary below. There are only six words
to memorize on this practice test. Keep in mind that the vocabulary list on Part V of the MLAT
will be 4 times longer than this sample. Take 40 seconds to memorize this vocabulary. Then
click below to go to the questions. Do not look back at the vocabulary until you have finished
responding to the sample questions.
Vocabulary
Maya – English
c’on – gun
si’ – wood
k’ab – hand
kab – juice
bat – ax
pal – son
NOW GO RIGHT AHEAD WITH THESE SAMPLE QUESTIONS.
Write down your answers so that you can check them when you are finished.
1. bat
A. animal
B. stick
C. jump
D. ax
E. stone
2. kab
A. juice
B. cart
C. corn
D. tool
E. run
3. c?on
A. story
B. gun
C. eat
D. mix
E. bird
2. English Placement Test Practice
1 - Usage - Choose the Best Answer
Mr. Smith ___________ to the store, bought some milk, gave the clerk $5.00 and _________
back $2.25 in change.
a) gone / got
b) went / had
c) went / got
d) gone / had
2 - Usage - Which sentence is punctuated properly?
a) The suspect broke, free ran through the street, turned the corner and escaped.
b) The suspect broke free, ran through the street, turned the corner and escaped
c) The suspect broke free, ran through the street, turned the corner and escaped.
d) The suspect broke, free ran through the street, turned the corner and, escaped.
3 - Comprehension - Read the Short Paragraph and Choose the Answer That Must Be
True
Jack had 3 sisters and 3 brothers. He was not the oldest and not the youngest. All of the girls
had red hair. Jack had red hair. Everyone else had brown hair.
a) Jack had 3 older brothers and 3 younger sisters.
b) Jack's brothers all had brown hair.
c) Jack was one of 6 children.
d) Jack liked his oldest sister more than any other sibling.
3. Achievement test for 3rd grade of elementary school
Directions: Carefully read each question. Fill in the circle next to the correct answer.
1. What word is an antonym for cool?
O A. warm
O B. mild
O C. damp
2. The student misbehaved in class. What does the word misbehaved
mean?
O A. behaved well
O B. behaved quietly
O C. behaved badly
4. GMAT Diagnostic test
ARITHMETIC (ROOTS)
1. √324 + √289 = ?
(A). 32
(B). 33
(C). 34
(D). 35
(E). 36
2. √36 + √64 + √52 + √20 = ?
A. 19 + √20
B. 19√20
C. 145
D. 5√100 + √20
E. 7√5
3. If x is an integer and x × x − x = a, which of the following must be true?
I. a is Even
II. a is Positive
III. a is an Integer
A. I only
B. II only
C. III only
D. I and II
E. None of the above
5. Proficiency test (TOEFL)
SECTION 1: Reading Comprehension
1. According to the passage, how do memories get transferred to the STM?
A) They revert from the long term memory.
B) They are filtered from the sensory storage area.
C) They get chunked when they enter the brain.
D) They enter via the nervous system.
2. The word “elapses” in paragraph 1 is closest in meaning to:
A) passes
B) adds up
C) appears
D) continues
6. Objective test
1. The respiratory center in the brainstem is NOT affected by which situation?
A. high levels of carbon dioxide in the blood
B. high levels of hydrogen ions in the blood
C. low levels of oxygen molecules in the blood
2. What is a waste product normally excreted in the urine?
A. excess glucose
B. excess protein
C. red blood cells
D. urea
Test type
Test type based on test construction
1. Direct test
An item which tests the students' ability to do something, such as write a letter or make a
speech, rather than testing individual language points.
Ex: a writing test
2. Indirect test
An item that tests knowledge of the language (grammar and vocabulary) underlying the students'
ability to do things such as write a letter or make a speech.
E.g.: the “Structure & Written Expression” section of the TOEFL
Test type based on score interpretation
1. Norm-Referenced testing
Norm-referenced tests (or NRTs) compare an examinee's performance to that of other
examinees.
The goal is to rank the set of examinees so that decisions about their opportunity for success
(e.g., college entrance) can be made.
Examples: the Stanford, Metropolitan, and California Achievement Tests (SAT, MAT, and CAT), as well as the Iowa and Comprehensive Tests of Basic Skills (ITBS and CTBS).
2. Criterion-referenced tests
Criterion-referenced tests (or CRTs) differ in that each examinee's performance is compared
to a pre-defined set of criteria or a standard (SK/KD).
The goal with these tests is to determine whether or not the candidate has demonstrated
mastery of a certain skill or set of skills. Results are reported as exactly at the standard,
below the standard, or beyond the standard.
E.g.: a national board medical exam. Either the examinee has the skills to practice the
profession, in which case he or she is licensed, or does not.
3. Communicative testing
Linguistic competence: this test measures the ability to use correct and appropriate
language; it is also called grammatical competence.
Sociolinguistic competence: this test measures the ability to use correct and
appropriate language at the right time.
Strategic competence: the ability to use communication strategies so that the message
to be delivered is easy to understand.
Organizational competence: this test measures the ability to construct well-formed
statements.
4. Performance
Performance-based assessment means that you may have a difficult time distinguishing
between formal and informal assessment. A characteristic of many performance-based
language assessments is the presence of interactive tasks. In such cases, the assessment
involves learners in actually performing the behavior that we want to measure.
Ex: test takers are measured in the act of speaking, requesting, responding, combining
listening and speaking, and integrating reading and writing.
Test type based on approaches
1. Discrete-Point Test
The test is based on mastering grammar only, vocabulary only, or both. The discrete-point
test is called atomistic because the approach emphasizes dividing language into small parts.
Another characteristic of this test is the use of multiple choice.
2. Integrative testing
An item which tests more than one skill at a time
(e.g., a writing task tests the students' grammar, vocabulary, punctuation, and spelling).
3. Power test
A power test measures the level of performance with items of sufficient difficulty and ample
time to complete. It targets higher-order thinking and is very difficult; it requires much time
and much money.
Ex: research
4. Speed test
In a speed test the scope of the questions is limited and the method you need to use to
answer them is clear. Taken individually, the questions appear relatively straightforward.
Speed tests are concerned with how many questions you can answer correctly in the
limited time.
Examples: IQ tests, psychological tests
5. Computer Adaptive testing
Computerized adaptive testing (CAT) is a form of computer-based test that adapts
to the examinee's ability level. For this reason, it has also been called tailored testing.
The computer is programmed to fulfill the test design as it continuously adjusts to find
questions of appropriate difficulty for test takers at all performance levels.
In CATs, the test taker sees only one question at a time, and the computer scores each
question before selecting the next one. As a result, test takers cannot skip questions, and
once they have entered and confirmed their answer, they cannot return to that question or to
any earlier part of the test.
Example : GMAT (Graduate Management Admission Test).
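The adaptive loop described above can be sketched in Python. This is only a toy illustration of the idea, not the actual GMAT algorithm: the item difficulties, the step size, and the simple up/down ability update are all invented for the example.

```python
# A toy sketch of computer-adaptive item selection (invented data and
# update rule, not a real CAT algorithm): after each answer the ability
# estimate moves up or down, and the next item is the unused one whose
# difficulty is closest to the current estimate.

def run_cat(item_difficulties, answers_correctly, start=0.5, step=0.15):
    ability = start
    remaining = dict(item_difficulties)
    administered = []
    while remaining:
        # pick the unused item nearest the current ability estimate
        item = min(remaining, key=lambda i: abs(remaining[i] - ability))
        del remaining[item]
        administered.append(item)
        # score this single question before selecting the next one
        ability += step if answers_correctly(item) else -step
    return administered, ability

items = {"q1": 0.3, "q2": 0.5, "q3": 0.7, "q4": 0.9}
order, final = run_cat(items, lambda i: i in {"q2", "q3"})
print(order, round(final, 2))  # items served in adaptive order
```

Note how the loop mirrors the constraints in the text: one item at a time, scored immediately, with no way to return to an earlier item.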
6. Standardized test
A standardized test is a test that is administered and scored in a consistent, or "standard",
manner. Standardized tests are designed in such a way that the questions, conditions for
administering, scoring procedures, and interpretations are consistent, and they are administered
and scored in a predetermined, standard manner.
Examples: ACT (American College Testing), GRE (Graduate Record Examination)
Approaches to language testing
1. Classical approach
It emphasizes grammatical rules, memorizing vocabulary, and translating classical
texts.
It is applied in the GTM approach to learning English (concentrating on how to teach grammar
and drill vocabulary). The classical approach will produce good translators.
Attachment
Ex: He . . . plays on the computer.
1) He . . . listens to the radio.
2) They . . . read books.
3) Pete . . . gets angry.
4) Tom is . . . very friendly.
5) I . . . take sugar in my coffee.
6) Ramon is . . . hungry.
7) My grandmother . . . goes for a walk in the evening.
The weakness
 Students' language knowledge is not communicative
 Students will feel bored and the language feels difficult
 Speaking and listening are neglected
 Not natural
The strength
 The students' knowledge of the target language is undoubted
 Students' competence in writing and translating will develop
2. Discrete approach
The test is based on mastering grammar only, vocabulary only, or both. The discrete-point
test is called atomistic because the approach emphasizes dividing language into small parts.
Another characteristic of this test is the use of multiple choice.
Ex: dialog completion
Pair dialog performance
The weakness
 Time-consuming and not natural
 Success in taking the test does not mean that students can use the language in everyday life
The strength
 The test can cover a wide range of material
 Objective scoring and an efficient test
 Allows quantification of students' responses
3. Integrative Approach/ Unitary Competence Hypothesis
English cannot be broken into several components; it is a unity. The approach emphasizes
mastering integrative skills. The test usually demands that several skills be managed at once.
Ex: cloze test
Dictation
Interview
Writing composition
The weakness
 Complicated mode of test
 The weakness is merely the weakness of the particular test used
The strength
 Students' ability to communicate will develop
 Seems natural
 Challenging and can reveal students' integrative skill
Approach to language testing: Communication-Based Movement Approach
and The Performance-Based Movement Approach
1) Communication-Based Movement Approach
 Language is a means of communication
 Emphasis on the function of the language skill that is listening, speaking, reading,
and writing.
 The test has no explicit testing of language components (grammar and vocabulary)
 Applied according to the social context
Ex: At a bank between teller and customer
At a class between students and teacher
Examples
Writing test based on real life situation
Multiple choices using dialog
Ex: “That woman over there looks confused.”
“Why don’t you ask her _____?”
a) does she need help
b) if she needs help
c) whether she needs help
d) do you need help
e) she needs help or not
Advantages
 Realistic in terms of formats
 Widens the concept of language ability beyond grammatical ability
 A predictive indicator of success in communicating in real life
 The use of language skills in integration
Weaknesses
 Generalizability of the test results
 The influence of the native language
Example: if the teacher comes from the same region, they might correct the students' errors influenced by the native language.
2) The performance-based movement approach (Authentic Assessment)
The belief of the performance-based movement approach is that language is the vehicle of content.
The performance-based movement approach gives the students a chance to demonstrate their
knowledge and also discloses more in-depth information on students' academic needs.
 It emphasizes tasks and language behaviours, and also contents and learning outcomes with the present reference.
 Applied in CTL (performance-based), it is linked to tasks and language behavior and also relates to content and learning outcomes with the present reference.
 Testing practice in CTL is authentic assessment.
Class language tests
 Portfolio
 Project
 Experiment
 Extended response
Advantages
 The pillars of CTL are similar to the seven components of effective learning
 Students are under the guidance of the teacher
 Students' speaking presentation is essentially a form of public speaking
 The teacher has a vital role
Weaknesses
 Fewer questions, calling for a greater degree of subjective judgment
 There are no clear right and wrong answers
 Different teachers might grade a student's work differently
Characteristics of a Good Test: “RELIABILITY”
A good test needs to be "reliable", so what is "reliable"?
 "Reliable" means "stable" or "consistent"
 A reliable test is a test that produces stable or consistent scores
 Test scores demonstrate consistency or stability no matter who administers
the test, or when or where the test is administered
Mathematical term of a score
X=T+E
 X : Obtained score/ observed score
 T : True score
 E : Error
How to estimate reliability?
 Test-retest
The test-retest reliability estimation involves administering the same test to a
number of test takers on different testing occasions.
It is used to assess the consistency of a measure from one time to another.
Weakness
 It is not easy to create a similar condition on different testing occasions.
 It is not known exactly what the best time interval for conducting the second
test administration is: too long or too close.
Strength
 Only one set of test items has to be constructed, so less time and energy are needed.
 Parallel forms
The parallel forms technique requires the construction of two or more sets of
tests which are made equal in every aspect.
Weakness
 Making tests that are equally similar in all aspects is not an easy task; it needs
more energy and is time-consuming.
 It is not easy to keep the test takers' mental condition the same when they
respond to two sets of tests administered almost at the same time.
Strength
 The two forms can be used independently.
 Internal consistency
The internal consistency approach is based on the logic that if the items in the
test are highly correlated, the test is said to be reliable.
Split-half
In split-half estimation we randomly divide all items that purport to measure
the same construct into two sets.
Weakness
 It does not fully reflect the true reliability of the test.
Inter-item
Inter-item estimation uses all of the items on our instrument that are
designed to measure the same construct.
Inter-rater estimation is used to assess the degree to which different raters/observers
give consistent estimates of the same phenomenon.
Characteristics of a Good Test
“Validity”
Based on Kline (1993: 15), a test is said to be valid if it measures what it claims to measure.
Simply, validity is the precision of the test in measuring what is intended to be measured.
 There are four aspects or dimensions of validity (validity of instruments):
1. Face validity
'The concept of face validity relates more to what a test appears to measure than what the
test actually measures' (Cohen et al., 1988:125). The face validity of a test is thus linked to
what the test looks like. If a test looks like it measures what it is intended to measure "on the
face of it", the test can be said to be face valid. In brief, face validity refers to the extent to
which the physical appearance of the test corresponds to what it is claimed to measure.
For example: a speaking test is constructed and claims to test speaking abilities. When
test takers respond to the test by producing oral language, as the test claims, we can say
that the speaking test is face valid to the test takers.
2. Content validity
Based on Wiersma and Jurs (1990:184), content validity means the extent to which the
test is representative of a defined body of content consisting of topics and processes. For
instance, a grammar test contains the grammatical points to be tested, such as infinitives,
gerunds, modals, tenses, etc.
3. Empirical validity
Empirical validity describes how closely scores on a test correspond (correlate) with
behavior as measured in other contexts. Moreover, we can say an instrument has
empirical validity when it has been tested against such an external measure.
Example: Students' scores on a test of academic aptitude may be compared with their
school grades (a commonly used criterion).
Empirical evidence of this kind can be differentiated into two types based on the time of data
collection of the external measure: concurrent and predictive validity.
 Concurrent validity/simultaneous validity: the results are supported by other
concurrent performance beyond the assessment itself. In other words, the score
on a test is related to another score obtained at about the same time.
 Predictive validity: assesses and predicts test takers' prospects in future life.
4. Construct validity
Based on Gronlund (1985:72), construct validity is '. . . the extent to which the test
performance can be interpreted in terms of one or more psychological constructs.'
Examination of construct validity requires a complex process. Wiersma and Jurs
(1990:193) state that there are two stages required in the examination of construct
validity: logical analysis and empirical analysis.
Washback
Washback is the effect of a test on the students' learning and on the teacher's teaching
activity. A test that positively influences the students' learning and the teacher's teaching
activity has good washback.
 Some factors that influence validity instruments:
Based on Sukardi (2009:38), there are some factors that can make an evaluation test
invalid, such as internal factors, external factors, and factors from the students
themselves.
1. The internal factors from the test:
 The instruction is not clear, so it can decrease the test validity.
 The words used in the structure of the evaluation instrument are too difficult.
 The construction of the test items is not good.
 The difficulty level of the test items is not appropriate.
 The time allocated is not appropriate.
 The test items do not represent the content of the materials.
 The answers to the questions can be predicted by the students.
2. The external factors:
 The time allocated is not enough for the students.
 The assessment is not consistent.
 There is another person from outside helping the student to do the test.
3. The factors from the students themselves:
 The wrong interpretation by the students.
 The students cannot concentrate well.
ITEM ANALYSIS
Item analysis is a process which examines student responses to individual test items
(questions) in order to assess the quality of those items and of the test as a whole.
PURPOSE
1. To improve items which will be used again in later tests.
2. To eliminate ambiguous items in a single test administration.
3. To increase instructors' skills in test construction.
4. To identify specific areas of course content which need greater emphasis or clarity.
Method
1. Item difficulty
2. Item discrimination (item characteristic curve)
3. Item validity (point-biserial method)
4. Effectiveness of distractors
I. ITEM DIFFICULTY
Item difficulty is determined by the proportion of people who answer a particular test
item correctly (p).
For example, if the first question on a test was answered correctly by 76% of the class,
then the difficulty level (p or percentage passing) for that question is p = .76. If the
second question on a test was answered correctly by only 48% of the class, then the
difficulty level for that question is p = .48. The higher the percentage of people who
answer correctly, the easier the item, so that a difficulty level of .48 indicates that
question two was more difficult than question one, which had a difficulty level of .76.
Method of Item Difficulty
a) Method for Dichotomously Scored Items
b) Method for Polytomously Scored Items
c) Grouping Method
a). Method for Dichotomously Scored Item
P = R / N
where p is the difficulty of a certain item, R is the number of examinees who get that
item correct, and N is the total number of examinees.
Example 1
There are 80 high school students attending a science achievement test, and 61
students pass item 1. Please calculate the difficulty for item 1.
Answer : 0.76 (61/80 = 0.7625)
b). Method for Polytomously Scored Items
P = X̄ / X_max
where X̄ is the mean of the total examinees' scores on one item, and X_max is the
perfect score of that item.
Example 2
The perfect score of one open-ended item is 20 points, and the average score of the total
examinees on this item is 11 points. What is the item difficulty?
c). Grouping Method (Use of Extreme Groups)
Upper (U) and Lower (L) criterion groups are selected from the extremes of the
distribution of test scores or job ratings.
P = (P_U + P_L) / 2
where P_U is the proportion of examinees in the upper group who get the item correct,
and P_L is the proportion of examinees in the lower group who get the item correct.
Example 3
There are 370 examinees attending a language test. It is known that 64 examinees of the
27% upper extreme group pass item 5, and 33 examinees of the 27% lower extreme
group pass the same item. Please compute the difficulty of item 5.
Answer : 0.49
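The three worked examples above can be checked with a short Python sketch of the difficulty formulas:

```python
# The three difficulty formulas above, applied to the worked examples.

def p_dichotomous(n_correct, n_total):        # P = R / N
    return n_correct / n_total

def p_polytomous(mean_score, max_score):      # P = mean / max score
    return mean_score / max_score

def p_grouping(p_upper, p_lower):             # P = (P_U + P_L) / 2
    return (p_upper + p_lower) / 2

p1 = p_dichotomous(61, 80)                    # Example 1: 61/80 = 0.7625
p2 = p_polytomous(11, 20)                     # Example 2: 11/20 = 0.55
group = round(0.27 * 370)                     # 27% extreme group: 100 examinees
p3 = p_grouping(64 / group, 33 / group)       # Example 3: 0.485, i.e. about 0.49
print(p1, p2, p3)
```

Note that a higher p means an easier item: the value is a proportion passing, not a "hardness" score.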
II. ITEM DISCRIMINATION
Item discrimination refers to the degree to which an item differentiates correctly
among test takers in the behavior that the test is designed to measure.
Item discrimination determines whether those who did well on the entire test also did well
on a particular item.
Method of Item Discrimination
a). Index of Discrimination
D = pH - pL
We need to set one or two cutting scores to divide the examinees into
upper scoring group and lower scoring group.
pH is the proportion in the upper group who answer the item
correctly and pL is the proportion in the lower group who answer the
item correctly.
Values of D may range from -1.00 to 1.00.
Another example: 50 examinees' test data on an 8-item scale about job stress (table not reproduced here).
Question:
There are 140 students attending a world history test. (1) If we use the ratio 27% to
determine the upper and lower groups, how many examinees are there in each group?
(2) If 18 examinees in the upper group answer item 5 correctly, and 6 examinees in the
lower group answer it correctly, calculate the discrimination index for item 5.
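One way to work this question is a direct Python sketch of D = pH - pL with 27% extreme groups:

```python
# Working the question above with D = pH - pL and 27% extreme groups.
n = 140
group = round(0.27 * n)      # examinees in each extreme group
p_h = 18 / group             # upper group: 18 answered item 5 correctly
p_l = 6 / group              # lower group: 6 answered it correctly
d = p_h - p_l
print(group, round(d, 2))    # 38 examinees per group; D is about 0.32
```

A positive D like this indicates the item discriminates in the intended direction: stronger examinees pass it more often than weaker ones.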
III. ITEM VALIDITY: POINT-BISERIAL METHOD
Another way to determine the discriminability of an item is to compute the correlation
coefficient between performance on the item and performance on the whole test, that is,
the tendency of students selecting the correct answer to have high overall scores. This is
the point-biserial method.
Distractor (Incorrect Alternatives)
• Analyzing the distractors (i.e., the incorrect alternatives) is useful in determining the
relative usefulness of the decoys in each item. Items should be modified if students
consistently fail to select certain multiple-choice alternatives; such alternatives are
probably implausible and therefore of little use as decoys. A discrimination index or
discrimination coefficient should be obtained for each option in order to determine
each distractor's usefulness.
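A sketch of a simple distractor tally, with invented responses: for each option we count how often the upper and lower groups chose it. A useful distractor should attract more lower-group than upper-group examinees.

```python
from collections import Counter

# Responses chosen by the upper- and lower-scoring groups on one item.
# "B" is the key; all responses here are invented for illustration.
upper = ["B", "B", "B", "A", "B", "C", "B", "B"]
lower = ["A", "C", "B", "D", "A", "C", "D", "A"]

upper_counts, lower_counts = Counter(upper), Counter(lower)
for option in "ABCD":
    print(option, upper_counts[option], lower_counts[option])
```

Here the key "B" attracts six upper-group but only one lower-group examinee, while every distractor draws at least one lower-group examinee, so none looks totally implausible.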
Stages of Test Construction
There is a set general procedure for test construction.
1.
Statement of the problem.
In constructing a test, a test maker has to make sure about what he/she wants to know and
for what purpose. The following questions have to be answered:
a. What kind of test is it to be? ( Achievement/proficiency/diagnostic/placement test)
b. What is the precise purpose?
c. What abilities are to be tested?
d. How detailed must the results be?
e. How accurate must the results be?
f. How important is backwash?
g. What constraints are set by unavailability of expertise, facilities, or time (for
construction, administration, and scoring)?
2.
Providing a solution to the problem.
After the problems are clear, steps can be taken to solve them. Efforts should be made to
gather information on tests that have been designed for similar situations.
3.
Writing specifications for the test
The first form that the solution takes is a set of specifications for the test. This will include
information on: content, format and timing, criteria levels of performance and scoring
procedures.
a. Content
This refers not to the content of a single, particular version of a test, but to the entire
potential content of any number of versions. Sample of this content will appear in
individual versions of the test.
-
Operations: The tasks that candidates should be able to carry out.
For a reading test, these might include, for example: scan text to locate specific
information, guess meaning of unknown words from context.
-
Types of Text
For writing test, these might include: letters, forms, academic essay up to pages in
length.
-
Addressees
This refers to the kinds of people that the candidate is expected to be able to write or
speak to (for example, native speakers of the same age and status), or the people for
whom reading and listening materials are primarily intended (for example, native-speaker university students).
-
Topics
Topics are selected according to their suitability for the candidate and the type of test.
b. Format and Timing
This should specify test structure (including time allocated to components) and item
types/elicitation procedures, with examples. It should state what weighting is to be
assigned to each component. It should also say how many passages will normally be
presented (in the case of reading or listening) or required (in the case of writing), and
how many items there will be in each component.
c. Criteria Level of Performance
The required levels of performance for (different levels of) success should be specified.
This may involve a simple statement to the effect that, to demonstrate “mastery”, 80% of
the items must be responded to correctly. It may be more complex: the Basic level oral
interaction specifications of the Royal Society of Arts (RSA) Test of the Communicative
Use of English as a Foreign Language will serve as an example. These refer to accuracy,
appropriacy, range, flexibility and size. Thus:
BASIC LEVEL DESCRIPTORS
Accuracy: Pronunciation may be heavily influenced by L1 and accented, though
generally intelligible. Any confusion caused by grammatical/lexical errors can be
clarified by the candidate.
Appropriacy: Use of language broadly appropriate to function, though no subtlety
should be expected. The intention of the speaker can be perceived without excessive
effort.
Range: A severely limited range of expression is acceptable. May often have to search
for a way to convey the desired meaning.
Flexibility: Need not usually take the initiative in conversation. May take time to
respond to a change of topic. The interlocutor may have to make considerable
allowances and often adopt a supportive role.
Size: Contributions generally limited to one or two simple utterances are acceptable.
d. Scoring Procedures
These are most relevant where scoring will be subjective. The test constructors should be
clear as to how they will achieve high scorer reliability.
4.
Writing the test
a. Sampling
It is most unlikely that everything found under the heading of “Content” in the
specifications can be included in any one version of the test. Choices have to be made.
For content validity and for beneficial backwash, the important thing is to choose widely
from the whole area of content. One should not concentrate on those elements known to
be easy to test. Succeeding versions of the test should also sample widely and
unpredictably.
b. Item writing and moderation
The writing of successful items is extremely difficult. No one can expect to be able
consistently to produce perfect items. Some items will have to be rejected, others
reworked. The best way to identify items that have to be improved or abandoned is
through teamwork. Colleagues must really try to find fault, and despite the seemingly
inevitable emotional attachment that item writers develop to items they have created,
they must be open to, and ready to accept, the criticisms that are offered. Good
personal relations are a desirable quality in any test writing team.
Moderation is a process in which an attempt is made to administer the test to native
speakers of a similar educational background to future test candidates. These native
speakers should score 100%, or close to it. Items that prove difficult for comparable
native speakers almost certainly need revision or replacement.
c. Writing and moderation of the scoring key
Once the items have been agreed, the next task is to write the scoring key where this is
appropriate. Where there is intended to be only one correct response, this is a perfectly
straightforward matter. Where there are alternative acceptable responses, which may be
awarded different scores, or where partial credit may be given for incomplete responses,
greater care is necessary. Once again, the criticism of colleagues should be sought as a
matter of course.
5.
Pretesting
Pretesting is needed even after careful moderation has been carried out. There are likely to
be some problems with every test, and it is obviously better if these problems can be
identified before the test is administered to the group for which it is intended. The aim
should be to administer it first to a group as similar as possible to the one for which it is
really intended. Problems in administration and scoring are noted, the reliability
coefficients of the whole test and of its components are calculated, and individual items
are analyzed.
6.
Validation of the test
Validation of a particular test usually refers to criterion-related validity: we look for
empirical evidence that the test will perform well against some criterion. For example, an
achievement test might be validated against the ratings of students by their current language
teachers and by their future subject teachers soon after the beginning of their academic
courses.
Scoring, Grading, Test-Score Interpretation
 Scoring is the process of using a number to represent the responses made by
the test taker.
 The score is initially raw (a raw score); for the score to be meaningful,
further analyses are required.
Types of scoring
Dichotomous vs Continuous Scoring
Dichotomous Scoring  entails viewing and treating the response as one of two
distinct, exclusive categories.
Example: scoring in multiple-choice, true-false, and correct-incorrect formats (1 is
assigned to a correct answer; 0 to an incorrect one)
Continuous Scoring  views and treats the test taker's response as graded in
nature.
Example: speaking and writing tests (a response may be scored 1, 2, 3, 4, or 5 in
terms of fluency for a speaking test)
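A minimal sketch of dichotomous scoring, with an invented answer key and response set:

```python
# Dichotomous scoring: map each response to 1 (correct) or 0 (incorrect)
# against an answer key, then sum the item scores into a raw score.
# The key and responses are invented for illustration.
key       = ["A", "C", "B", "D", "A"]
responses = ["A", "C", "D", "D", "B"]

item_scores = [1 if r == k else 0 for r, k in zip(responses, key)]
raw_score = sum(item_scores)
print(item_scores, raw_score)   # [1, 1, 0, 1, 0] 3
```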
Holistic, Primary Trait and Analytic Scoring
This classification is based on how the test taker's response is viewed and treated.
Holistic scoring considers the test taker's response as a whole rather than as
consisting of fragmented parts.
Example: a speaking test scored using the Test of Spoken English (TSE)
scoring guide.
Primary trait scoring focuses on one specific feature or trait that the test
takers need to demonstrate.
For example, in writing, the teacher scores only the content of the writing
product; in speaking, the teacher scores only the clarity of expressing the idea.
Analytic scoring emphasizes individual points or components of the test
takers' response.
Both linguistic and non-linguistic features are important to score.
Grading
Grades reflect a weighted standard of quality.
In grading, you can include both achievement and non-achievement aspects.
If you pre-specify standards of performance on a numerical point system,
you are using an absolute system of grading.
For example, the points for a midterm test, the points for a final exam, and
the points accumulated for the semester are established in advance, often set
by the institution.
Relative grading is usually accomplished by ranking students in order of
performance (percentile ranks) and assigning cut-off points for grades. It
allows your own interpretation, and adjustment for the unpredicted ease or
difficulty of a test.
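Relative grading can be sketched as ranking plus quintile cut-offs. The names, scores, and the letter assigned to each fifth below are invented for illustration; real cut-off points are set by the teacher or institution.

```python
import math

# Rank students by score and assign letter grades from quintile cut-offs.
scores = {"Ana": 88, "Ben": 72, "Cia": 95, "Dian": 60, "Eko": 79}
letters = ["A", "B", "C", "D", "D"]   # top fifth -> A, next fifth -> B, ...

ranked = sorted(scores, key=scores.get, reverse=True)
n = len(ranked)
grades = {}
for rank, name in enumerate(ranked, start=1):
    quintile = math.ceil(5 * rank / n)    # 1 (top 20%) .. 5 (bottom 20%)
    grades[name] = letters[quintile - 1]
print(grades)   # {'Cia': 'A', 'Ana': 'B', 'Eko': 'C', 'Ben': 'D', 'Dian': 'D'}
```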
Test Interpretation
Steps:
1. Frequency Distribution
2. Measures of Central Tendency
3. Measures of Variability
4. Item Analysis
Frequency Distribution
A frequency distribution shows the number of students who obtained each mark
awarded. For example, out of 30 students, 6 students scored 100.
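A frequency distribution can be tallied in a few lines of Python. The marks below are invented, arranged so that six students score 100, echoing the example above.

```python
from collections import Counter

# Tally how many students obtained each mark (invented data: six 100s).
marks = [100, 90, 100, 80, 90, 100, 70, 90, 100, 100, 80, 100]

freq = Counter(marks)
for mark in sorted(freq, reverse=True):
    print(mark, freq[mark])   # e.g. "100 6"
```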
Measures of Central Tendency
Indicators of how well most students perform in the group; in brief,
indicators of group performance.
Mean
Median
Mode
Measures of Variability
Indicators of the homogeneity or heterogeneity of a group.
Range
Standard deviation
Variance
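The three measures of central tendency and the three measures of variability can be computed with Python's statistics module; the scores below are invented for illustration.

```python
import statistics

scores = [55, 60, 65, 70, 70, 75, 80, 85]

print("mean:    ", statistics.mean(scores))
print("median:  ", statistics.median(scores))
print("mode:    ", statistics.mode(scores))
print("range:   ", max(scores) - min(scores))
print("stdev:   ", statistics.stdev(scores))
print("variance:", statistics.variance(scores))
```

For these scores the mean, median, and mode are all 70, the range is 30, the sample standard deviation is 10, and the variance is 100.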
Item Analysis
A process which examines students' responses to individual test items
(questions) in order to assess the quality of those items and of the test as a
whole.
Characteristics of Authentic Assessment
1. Include a task to perform and a rubric for scoring the performance.
In this approach, the teacher should give a task to be accomplished in every
meeting, and should always prepare a scoring rubric for assessing the students'
performance (group work and individual).
There are 3 kinds of rubric:
 Holistic
 Analytic
 Primary trait
2. Construct the students’ response.
The teacher helps and guides the students to construct their own way of thinking,
not just to analyze, identify, and state something.
3. Higher Order Thinking (HOT).
The tasks given are expected to make the students use higher-order thinking
(analyzing, evaluating, and creating); thus, the students will be more creative and
construct their own knowledge.
4. Integrated skill (not integrative).
Combine several language skills in one assessment. This is different from the
integrative approach, which integrates several language components in one assessment.
5. Process and product are considered equally important.
In this approach, what matters is not only the product but also the process of
reaching the product.
6. In-depth information for the teacher.
By using the rubric, the teacher will get complete information about the students'
skills and their weaknesses and strengths in the lesson. From that information, the
teacher can work out the best way to help the students face the challenge.
The Benefit for the Students
 The students will know how well they have mastered the learning material.
 The students will know and strengthen their mastery of particular skills.
 The students will connect their learning with their experiences, their world, and
society in a wider scope.
 The students will sharpen their higher-order thinking skills.
 The students will have responsibilities and choices.
 The students will cooperate with other students.
 The students will learn to measure the level of their own performance.
Variety
1. Interview
This is essentially the speaking test in performance-based assessment. The teacher can
ask the students some questions while the lesson is in progress.
2. Retelling
Different from the previous kind of test, this one is more casual and is usually done as a
daily activity. Interviewing the students is usually done at the end of every chapter (like
a chapter test), while retelling is usually done in every lesson (if needed). This kind of
test is usually used to assess the activeness of the students. It is done after the students
have observed something, heard an explanation, listened to something, and drawn
conclusions based on their own understanding.
Three things to be evaluated in retelling:
 Students' speaking skill.
 Organization of the text.
 Students' response to the text.
3. Composing
Composing means writing an essay (narrative, descriptive, etc.) or any report in written
form. There are two approaches to this assessment:
 Global Approach
In this kind of assessment, the teacher focuses on the essay as a whole. This way
of assessing is good, but the teacher should give a detailed score for each part and
then accumulate them into the total score for the whole assignment.
 Component Approach
Unlike the Global Approach, this way of assessing focuses on only one particular
skill or aspect. For example, in assessing an essay, the teacher focuses only on the
ideas being generated and pays no attention to the generic structure or the
vocabulary used (diction).
4. Project/Exhibition
The teacher can ask students to make a project (group or individual) and hold an
exhibition so that they will feel appreciated and will work harder and more seriously
on their own project.
5. Experiment
This experiment is actually a simulation of doing something. In an English class, it is
simply the teaching of procedure texts: the students are asked to do something (a
simple experiment that is popular and easy, so that it will not cause trouble) and to
explain it by demonstrating or performing the procedure. This assessment is really
practice in doing an experiment, since students in Indonesia rarely deal with
experiments and consequently find them difficult by the end of their college years.
Experiment assessment cannot stand alone; it is part of a larger activity.
6. Written/Spoken Response
The students are asked to write or perform a report after doing an experiment or
observation. Then the teacher gives some questions (observation or comprehension
questions), and after the students finish answering all the questions, the teacher and
students can discuss them. This assessment is also part of a larger activity.
7. Observation
There are two kinds of observation:
 Spontaneous observation
Observation without complicated preparation or planning.
 Structured observation
Observation with a plan or preparation.
8. Portfolio
Collecting the students' best work and compiling it. This assessment is used to gain
information about the students' development over time.