Chapter 2 Issues in Test Design

Maximal Performance Tests - measure the upper limits of one's abilities and, for
that reason, are also called "ability" tests.
Achievement tests - maximal performance tests that measure how much you have
learned or how much skill you have developed in a given area. A classroom test
is an achievement test.
Aptitude tests - measure your "potential" for learning new information. A test of
mechanical ability is an aptitude test.
Whether a test is an achievement test or an aptitude test is not always clear,
and there is much debate and controversy surrounding tests such as IQ tests,
the SAT, and the GRE.
Speed vs. Power tests - A maximal performance test can be a speed or a
power test.
Speed tests - contain items that are all roughly equal in difficulty. The
outcome measure is "how many" you can answer correctly in a given amount of
time (e.g., a typing test or the WISC "digit symbol" subtest).
Power tests - contain items of increasing difficulty, so that fewer and fewer
people succeed on the later items. The SAT, GRE, and other standardized
tests are power tests.
Typical Performance Tests - measure "characteristics" of the person. These
include (1) personality, (2) attitude, and (3) vocational/interest tests.
Personality tests - may be "projective" or "objective."
Objective Personality Tests - (also called "self-report" tests) use standardized
questions and objective scoring, typically in multiple-choice, true-false, or
Likert-scale format. Favored by "trait" oriented and "statistically" oriented
psychologists. Some examples are the MMPI, Cattell's 16PF, and the NEO-PI-R.
Advantages - fast, inexpensive, easy to administer and score, can easily be
given to large numbers of people, not subject to examiner biases.
Disadvantages - subjects may not understand the instructions, and questions may
be "face valid" (their purpose is obvious), leading to biased (e.g., "fake good")
responding.
Projective Personality Tests - The subject responds to a series of "ambiguous
stimuli." Presumably, the "unconscious" is being tapped. Favored by
psychoanalytically (Freudian) oriented psychologists. Some examples are the
Rorschach and the Thematic Apperception Test (TAT).
Advantages - may provide "interesting data," can sometimes be used as a tool
to "jump start" the therapy process, and don't suffer from the "face validity" problem.
Disadvantages - time-consuming to administer, score, and interpret. They don't
fit well with the current "Zeitgeist" (world view) of managed care
psychotherapy.
There is little dispute that objective tests are far superior to projective
tests when it comes to RELIABILITY and VALIDITY.
In your instructor's opinion, use of projective tests is on the decline.
Attitude Tests - measure opinions or beliefs and usually use objective items. A bias
problem common to attitude tests is "socially correct or appropriate" responding.
Interest Tests - measure likes and dislikes and are therefore useful in making
decisions about future careers and job training.
Norm Referenced vs. Criterion Referenced Scoring (sometimes the
distinction between these two is not entirely clear)
Norm referenced scoring - what matters most is where a test taker falls in relation
to others who have also taken the test (rather than the actual raw score). Percentiles
are one type of norm referenced scoring. If a test gets "curved," it is clearly norm
referenced.
Norm Group - (or normative group) is the group the subject is being compared
to.
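To make norm-referenced scoring concrete, here is a minimal Python sketch of a percentile rank: the percentage of the norm group scoring below a given raw score, with tied scores counted as half. The tie rule is one common convention (real tests may define percentiles differently), and the function name `percentile_rank` and both score lists are hypothetical, illustrative data only.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(raw_score: float, norm_scores: list[float]) -> float:
    """Percent of the norm group scoring below raw_score, counting ties as half.

    This tie-handling rule is one common convention; published tests may use another.
    """
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)          # scores strictly lower
    tied = bisect_right(ordered, raw_score) - below  # scores exactly equal
    return 100.0 * (below + 0.5 * tied) / len(ordered)


# Hypothetical norm groups of ten test takers each (illustrative data only).
typical_group = [52, 58, 61, 63, 67, 70, 72, 75, 81, 90]
strong_group = [68, 72, 75, 78, 80, 83, 85, 88, 91, 95]

print(percentile_rank(70, typical_group))  # 55.0
print(percentile_rank(70, strong_group))   # 10.0
```

The same raw score of 70 lands at very different percentiles depending on which norm group it is compared to, which is why the choice of norm group matters so much for norm-referenced scores.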
Standardization Sample - the name for the large norm groups used with
major standardized tests such as the Stanford Binet, SAT, or GRE.
Criterion Referenced Scoring - (also called pass-fail or mastery tests) A
particular score (the "criterion"), such as 75% correct, must be reached in order to
pass. The performance of others is irrelevant. The EPPP (Examination for
Professional Practice in Psychology) and state board examinations for various
professions are examples.
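By contrast, a criterion-referenced decision needs only the test taker's own score and the cutoff. The Python sketch below assumes the 75% example criterion from the notes; the function name `passes_criterion` is hypothetical, and real licensing exams set their own passing standards.

```python
def passes_criterion(num_correct: int, num_items: int, criterion: float = 0.75) -> bool:
    """Pass/fail decision: True if the proportion correct reaches the criterion.

    Only this test taker's own score is used; nobody else's performance matters.
    The 0.75 default is an assumed example cutoff, not any real exam's standard.
    """
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return num_correct / num_items >= criterion


print(passes_criterion(150, 200))  # True  (exactly 75% correct)
print(passes_criterion(149, 200))  # False (74.5% correct)
```

Note that, unlike a percentile rank, the function takes no list of other people's scores at all.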
Ipsative Scoring (also called Forced Choice) - Questions typically take the
form: "Would you rather: A. read a book OR B. go bungee jumping?" Used ONLY
with typical performance tests that have multiple scales, so that you cannot score
high on all of the scales. Most commonly seen on vocational and interest tests.
The Myers-Briggs test (based on Carl Jung's theory) uses ipsative items.
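Why ipsative scoring prevents high scores on every scale can be shown with a small Python sketch. It assumes each forced-choice item pairs two options keyed to different scales and that choosing an option adds one point to its scale; the scale names, the items, and the function `score_ipsative` are all hypothetical.

```python
from collections import Counter

# Hypothetical forced-choice items: option A and option B are keyed to different scales.
ITEMS = [
    ("Intellectual", "Adventurous"),  # A. read a book  OR  B. go bungee jumping
    ("Social", "Intellectual"),
    ("Adventurous", "Social"),
    ("Intellectual", "Social"),
]


def score_ipsative(choices: list[str]) -> Counter:
    """Award one point to the scale keyed to each chosen option ("A" or "B")."""
    if len(choices) != len(ITEMS):
        raise ValueError("exactly one choice per item is required")
    # Start every scale at zero so unchosen scales still appear in the profile.
    scores = Counter({scale: 0 for pair in ITEMS for scale in pair})
    for (scale_a, scale_b), choice in zip(ITEMS, choices):
        if choice not in ("A", "B"):
            raise ValueError(f"invalid choice: {choice!r}")
        scores[scale_a if choice == "A" else scale_b] += 1
    return scores


profile = score_ipsative(["A", "B", "A", "A"])
print(dict(profile))          # {'Intellectual': 3, 'Adventurous': 1, 'Social': 0}
print(sum(profile.values()))  # 4, always equal to the number of items
```

Because the scale scores always add up to the number of items, a point gained on one scale is necessarily a point not given to another.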
Construct Explication - (actually, the domain may or may not be a construct).
A logical dissection and analysis of the domain of your test, identifying content
areas to be covered by the test (see Table 2.12). This should generally precede
question creation.
Individual vs. Group Administration
Individual Administration - there is one test taker and one examiner, items are
presented one at a time (e.g., Stanford Binet). Most items are verbal free
response or physical response (e.g., puzzle assembly).
Advantages - (1) the examiner can use the "basal-ceiling" approach so that time is
not wasted on items that are too easy or too hard, (2) test taker attitudes and
reactions can be observed and addressed, (3) encouragement and guidance can be given.
Disadvantages - (1) Costly and time consuming, (2) examiner behavior can
influence subject performance, (3) there is an element of subjectivity in recording
and grading responses.
Basal level - level at which the subject gets virtually all items correct.
Ceiling level - level at which the subject gets virtually all items wrong.
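The two definitions above can be sketched in Python, given item results grouped by level. This is a simplification under stated assumptions: "virtually all" is turned into adjustable thresholds (by default all correct for the basal, all wrong for the ceiling), and the basal is looked for only below the ceiling. The Stanford Binet's actual basal and ceiling rules differ, and the function names and data are hypothetical.

```python
from typing import Optional


def proportion_correct(items: list[bool]) -> float:
    """Share of items answered correctly at one level."""
    return sum(items) / len(items)


def basal_and_ceiling(
    results: dict[int, list[bool]],
    basal_min: float = 1.0,
    ceiling_max: float = 0.0,
) -> tuple[Optional[int], Optional[int]]:
    """Return (basal, ceiling) levels from item results grouped by level.

    ceiling: lowest level whose proportion correct is <= ceiling_max.
    basal: highest level below the ceiling whose proportion correct is >= basal_min.
    Either value is None if no level qualifies. Thresholds are assumptions.
    """
    levels = sorted(results)
    ceiling = next(
        (lv for lv in levels if proportion_correct(results[lv]) <= ceiling_max), None
    )
    basal = None
    for lv in levels:
        if ceiling is not None and lv >= ceiling:
            break
        if proportion_correct(results[lv]) >= basal_min:
            basal = lv  # keep overwriting so the highest qualifying level wins
    return basal, ceiling


# Hypothetical results: level -> correctness of four items at that level.
responses = {
    2: [True, True, True, True],
    3: [True, True, True, True],
    4: [True, True, False, True],
    5: [True, False, False, True],
    6: [False, False, False, False],
}

print(basal_and_ceiling(responses))                  # (3, 6)
print(basal_and_ceiling(responses, basal_min=0.75))  # (4, 6)
```

Loosening `basal_min` from "all correct" to "three of four correct" moves the basal up a level, which shows how much the exact meaning of "virtually all" matters.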
Group Administration - one examiner can test many people, usually with
paper-and-pencil, alternate-choice items. The California Achievement Test (CAT)
and Iowa Test of Basic Skills (ITBS) are used to assess achievement in students
in grades K-12.
Advantages - (1) large numbers of people can be assessed quickly and
efficiently, (2) very cost-effective, (3) no risk of grading or scoring biases.
Disadvantages - (1) critics argue that only "rote" learning is assessed and that
more complex cognitive skills cannot be assessed this way, (2) there is no way of
knowing whether motivational or other problems are affecting the test taker.
Tailored Testing (2 meanings)
1. To save time, computerized testing can be programmed to simulate the "basal-ceiling" method of testing used in the Stanford Binet.
2. Adapting a test for individuals with special needs. For example, the KABC
(Kaufman Assessment Battery for Children) has a set of "non-verbal" scales well
suited for testing children with hearing or speaking difficulties.
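Meaning 1 above (a computer simulating the basal-ceiling method) can be sketched in Python. This is a simplified, hypothetical procedure, not the Stanford Binet's actual start and stop rules: starting from an entry level, it works downward until a level is passed completely (the basal) and upward until a level is failed completely (the ceiling), and never administers the levels outside that band. The function `tailored_administration`, the start level, and the examinee data are assumptions.

```python
from typing import Callable


def tailored_administration(
    administer: Callable[[int], list[bool]],
    start_level: int,
    min_level: int,
    max_level: int,
) -> dict[int, list[bool]]:
    """Simulate a simplified computerized basal-ceiling procedure.

    Works downward from start_level until a level is passed completely (basal),
    then upward from start_level until a level is failed completely (ceiling).
    Returns only the levels actually administered, with their item results.
    """
    given: dict[int, list[bool]] = {}

    def give(level: int) -> list[bool]:
        # Administer each level at most once.
        if level not in given:
            given[level] = administer(level)
        return given[level]

    # Downward phase: stop at the first level with every item correct.
    level = start_level
    while level >= min_level and not all(give(level)):
        level -= 1

    # Upward phase: stop at the first level with every item wrong.
    level = start_level
    while level <= max_level and any(give(level)):
        level += 1

    return given


# Hypothetical examinee: the item results they would produce at each level.
EXAMINEE = {
    1: [True] * 4,
    2: [True] * 4,
    3: [True] * 4,
    4: [True, True, False, True],
    5: [True, False, False, True],
    6: [False] * 4,
    7: [False] * 4,
    8: [False] * 4,
}

given = tailored_administration(
    lambda level: EXAMINEE[level], start_level=5, min_level=1, max_level=8
)
print(sorted(given))  # [3, 4, 5, 6]: levels 1, 2, 7, and 8 are never given
```

Only four of the eight levels are administered, which is the time saving the basal-ceiling approach is meant to provide.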