PSYCHOLOGICAL ASSESSMENT
Dr. Münevver Başman
[email protected]
Test Development
The process of developing a test occurs in five stages:
•
The idea for a test is conceived (test conceptualization).
•
Items for the test are drafted (test construction).
•
The first draft of the test is then tried out on a group of sample testtakers (test tryout).
•
The data from the tryout are collected, and testtakers’ performance on the test as a whole and on
each item is analyzed.
•
Statistical procedures (item analysis) are employed to assist in making judgments about which items are good
as they are, which items need to be revised, and which items should be discarded.
On the basis of the item analysis and related considerations, a revision or second draft of the test
is created. This revised version of the test is tried out on a new sample of testtakers, the results
are analyzed, and the test is further revised if necessary—and so it goes.
Test Conceptualization
•
To define the construct that will be measured by the test, the
literature about the construct should be reviewed.
•
Available literature on existing tests designed to measure a
particular construct should be reviewed.
•
A structured interview to measure the construct can be developed;
open-ended interviews can also be used.
•
For psychological tests designed to be used in clinical settings,
clinicians, patients, patients’ family members, clinical staff, and
others may be interviewed for insights that could assist in item
writing.
•
A sample of testtakers can be asked to write a composition about
the construct.
Test Construction
•
A rating scale can be defined as a grouping of words,
statements, or symbols on which judgments of the strength of
a particular trait, attitude, or emotion are indicated by the
testtaker.
•
It is termed a summative scale because the final test score is
obtained by summing the ratings across all the items.
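A minimal sketch of summative scoring, assuming a hypothetical four-item scale (the item names and ratings below are made up for illustration):

```python
# Summative (Likert-type) scoring: the scale score is simply the sum
# of the testtaker's ratings across all items.
ratings = {"item1": 4, "item2": 5, "item3": 2, "item4": 3}  # hypothetical

scale_score = sum(ratings.values())
print(scale_score)  # 14
```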
Deciding the Number of Categories
•
The original Likert-type scale includes 5 response categories,
from “strongly agree” to “strongly disagree”. Later, scales with
two, three, four, six, and seven response categories were also
used.
•
Even/odd number of categories:
•
Testtakers tend to choose positive or negative categories;
those in the middle tend to leave the item blank.
•
It is better to have an odd number of response categories.
•
Information is lost at the scale level as the number of
categories falls below 5, and the differences between categories
become indistinguishable as the number rises above 5.
•
What about young people, or those with low levels of education?
•
“Smiley” faces have been used in social-psychological research
with young children and adults with limited language skills.
•
One type of summative rating scale, the Likert scale, is used
extensively in psychology, usually to scale attitudes.
Writing Items
•
Items are written using the information obtained at the end of the
literature review and composition applications.
•
Do not use statements that refer to the past rather than to the present.
•
Avoid statements that are factual or could be interpreted as factual.
•
Make sure no statement can be interpreted in more than one way.
•
Do not use filler statements that are irrelevant to the psychological
construct being investigated.
•
Avoid universals such as all, always, none, and never because they often
introduce ambiguity.
•
Words such as only, just, merely, and others of a similar evaluative nature
should be used with care and moderation.
•
Be sure to include statements that cover the entire range of the
construct of interest.
•
Keep the wording of the statements simple, clear, and direct.
•
Keep the statements short (rarely exceeding 20 words).
•
Be sure each statement contains only one complete thought.
•
Statements should be written in simple sentences rather than
compound or complex sentences unless this cannot be avoided.
•
Avoid using words that may not be understood by those who are to
be given the completed scale. Make the reading level appropriate
to the intended audience.
•
Do not use double negatives.
•
The number of positive and negative expressions should be
approximately same.
•
Both positive and negative expressions, including the same
scale, are not included in the tryout scale.
•
Check for spelling errors and awkward expressions.
•
Positive and negative items should be presented in mixed order.
•
Response categories should be labeled both numerically and
verbally (numeric labels alone may cause confusion).
•
The instruction should be easy to understand and as short as
possible, and should state:
• the aim of the scale
• the number of items in the scale
• the response categories
• the estimated response time
•
The identity of the respondents should be hidden, and this
should also be stated in the instruction.
•
The draft scale should be submitted for expert opinion.
Test tryout
•
Printed or online administration
•
Layout features such as letter size and line spacing
•
Items must be distinguishable from each other
•
Print quality and screen layout
Test tryout
•
This first draft of the test is then tried out on a group of
sample testtakers.
•
The sample to which the scale is administered should represent
the population.
•
Item statistics depend on the sample.
•
The number of testtakers should be at least five times the
number of items.
Item Analysis
•
The main purpose of analyzing the data obtained from the test
tryout is to develop a valid and reliable scale consisting of the
items with the best psychometric properties.
[Flow: scoring responses → item scores → total score → item analysis → select items]
Categories               Positive  Negative
Totally disagree            1         5
Almost totally disagree     2         4
Sometimes agree             3         3
Almost totally agree        4         2
Totally agree               5         1
•
The scale score of each respondent is the sum of the item
scores. For this, the response of each respondent to each item
should be scored.
•
Negatively worded items are reverse-scored.
•
In this way, high scale scores always indicate a higher level of
the construct.
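A small sketch of reverse scoring, assuming a 5-point scale and hypothetical responses: a rating r on a negatively worded item becomes (max + min) − r = 6 − r, so high scale scores always point in the same direction.

```python
# Reverse-scoring a negatively worded item on a 5-point Likert scale.
# Responses are hypothetical.
negative_item_responses = [1, 2, 3, 4, 5]
reverse_scored = [6 - r for r in negative_item_responses]
print(reverse_scored)  # [5, 4, 3, 2, 1]
```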
Item Analysis
•
Item Discrimination
•
Item Difficulty (for maximum performance test)
•
Item reliability
•
Item validity
Item discrimination can be examined through:
•
Correlation
•
Difference between the means of the upper and lower groups
Correlation based item analysis
•
Item-total correlation: The correlation between each item and
a scale score.
•
Corrected item-total correlation: Correlation between each
item and a scale score that excludes that item.
r = [nΣXY − ΣXΣY] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
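The item-total and corrected item-total correlations described above can be sketched in pure Python, using the raw-score Pearson formula. The data below are hypothetical (rows are testtakers, columns are three items):

```python
def pearson_r(x, y):
    """Pearson correlation via the raw-score formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5
    return num / den

# Hypothetical responses: rows are testtakers, columns are items.
data = [[5, 4, 5], [4, 4, 3], [2, 3, 2], [1, 2, 1]]
item1 = [row[0] for row in data]
totals = [sum(row) for row in data]

item_total = pearson_r(item1, totals)
# Corrected item-total: correlate item 1 with the total minus item 1.
corrected = pearson_r(item1, [t - i for t, i in zip(totals, item1)])
```

The corrected value is slightly lower, since the item no longer correlates with its own contribution to the total.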
Difference between means of Upper and
Lower level
•
Respondents are ranked from the highest score to the lowest
score based on the total scale scores.
•
The top 27% form the upper group; the bottom 27% form the lower group.
•
Is there a significant difference between the means of total
scores of upper group and lower group?
•
Upper group mean>Lower group mean
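The upper-lower split can be sketched as follows, using hypothetical total scale scores; a t test on the two groups would then check whether the mean difference is significant:

```python
# 27% upper-lower split on hypothetical total scale scores.
scores = sorted([31, 28, 27, 25, 24, 22, 21, 20, 18, 15, 14, 12],
                reverse=True)
k = round(0.27 * len(scores))        # size of each extreme group
upper, lower = scores[:k], scores[-k:]

upper_mean = sum(upper) / k
lower_mean = sum(lower) / k
# For a discriminating scale, upper_mean > lower_mean.
```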
Index of Item Discrimination   Evaluation of Item
0.40 and above                 Excellent
0.30-0.39                      Good item. Still can be improved.
0.20-0.29                      Item should be improved.
0.19 and below                 Weak. It must be removed from the test.
Item Difficulty
p = number of students who answered correctly / number of students
in the upper and lower groups

p_j = (D_u + D_l) / 2N

where D_u and D_l are the numbers of correct answers in the upper
and lower groups and N is the size of each group.

p = (29 + 9) / 108 = 0.35
Item 18       A    B*   C    D    E   Missing  Total
Upper (27%)   8    29   8    3    5   1        54
Lower (27%)   11   9    13   10   6   5        54
Total         19   38   21   13   11  6        108
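The difficulty computation for item 18 can be checked in a few lines, taking D_u and D_l as the numbers of correct (keyed B) answers in the upper and lower 27% groups and N = 54 as the size of each group:

```python
# Item difficulty p_j = (D_u + D_l) / 2N for item 18.
D_u, D_l, N = 29, 9, 54
p = (D_u + D_l) / (2 * N)
print(round(p, 2))  # 0.35
```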
Item Discrimination

r_jx = (D_u − D_l) / N

r_jx = (29 − 9) / 54 = 0.37
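The discrimination index for item 18 follows from the same counts:

```python
# Item discrimination r_jx = (D_u - D_l) / N for item 18: the difference
# in correct answers between the upper and lower 27% groups, divided by
# the group size.
D_u, D_l, N = 29, 9, 54
r_jx = (D_u - D_l) / N
print(round(r_jx, 2))  # 0.37
```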
Number of Students (100)

Group               Item 1 correct   Item 2 correct
Upper group (27%=27)     25               20
Lower group (27%=27)     15               15

Item 1: p = (25 + 15) / 54 = 0.74;  r_jx = (25 − 15) / 27 = 0.37
Item 2: p = (20 + 15) / 54 = 0.65;  r_jx = (20 − 15) / 27 = 0.19
Item standard deviation and variance
•
These give information about the variability of the item scores.
•
Item variance: s² = p_j(1 − p_j)
•
Item standard deviation: s_x = √(p_j(1 − p_j))
•
If the difficulty index of an item is .60, the item variance is
0.60 × (1 − 0.60) = 0.24.
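The worked example above can be verified directly:

```python
# Variance and standard deviation of a dichotomous item with
# difficulty p_j = 0.60, using s^2 = p_j * (1 - p_j).
p_j = 0.60
item_variance = p_j * (1 - p_j)
item_sd = item_variance ** 0.5
print(round(item_variance, 2))  # 0.24
```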
Item reliability
The reliability of each item is directly proportional to the
discrimination and the standard deviation of the item.
The standard deviation of an item reaches its highest value when
the item difficulty is 0.50. Therefore, an item difficulty at or
near 0.50 increases the reliability of the item.

r_x = r_jx × s_x
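A sketch of the item-reliability index, reusing the item 18 values from earlier (r_jx = 0.37, p = 0.35):

```python
# Item-reliability index: discrimination times item standard deviation.
r_jx = 0.37                          # discrimination (item 18 above)
p = 0.35                             # difficulty (item 18 above)
s_x = (p * (1 - p)) ** 0.5           # item standard deviation
item_reliability = r_jx * s_x
# Note: s_x peaks at p = 0.50 (where s_x = 0.50), which is why
# difficulties near 0.50 maximize item reliability.
```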
Item validity
•
Factor Analysis (for typical performance test)
•
The correlation between the score on item 1 and the score on the
criterion measure (denoted by the symbol r_1c) is multiplied by
item 1’s item-score standard deviation (s_1), and the product is
an index of the item’s validity: r_1c × s_1. (for maximum
performance tests)
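The item-validity index is a single product; a sketch with hypothetical values for r_1c and s_1:

```python
# Item-validity index: item-criterion correlation times the item-score
# standard deviation. Both values below are hypothetical.
r_1c = 0.30
s_1 = 0.45
item_validity = r_1c * s_1
print(round(item_validity, 3))  # 0.135
```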
Revision
•
On the basis of the item analysis and related considerations, a
revision or second draft of the test is created. This revised
version of the test is tried out on a new sample of testtakers,
the results are analyzed, and the test is further revised if
necessary—and so it goes.
TEST RELIABILITY
•
Test-retest
•
Parallel or alternate test forms
•
Single-administration methods
TEST VALIDITY
•
Face validity
•
Content validity
•
Criterion-related validity
•
Construct validity