Test Development

Chapter 8: Test Development
After the completion of the chapter, students should be able to:
1. Understand the process of psychological test development.
2. Create a psychological test following the procedures provided.
3. Estimate the reliability and validity of the created psychological test.
Chapter Topics:
A. Test conceptualization
B. Test construction
C. Test tryout
D. Item analysis
E. Test revision
A. Test Conceptualization
Test Development Process:
1. Test Conceptualization
2. Test Construction
3. Test Tryout
4. Item Analysis
5. Test Revision
Test Conceptualization
• Test conceptualization is the beginning of any published test.
• An emerging social phenomenon or pattern of behavior might serve as the stimulus for the development of a new test.
Preliminary Questions:
• What is the test designed to measure?
• What is the objective of the test?
• Is there a need for this test?
• Who will use this test?
• Who will take this test?
• What content will the test cover?
• How will the test be administered?
• What is the ideal format of the test?
• Should more than one form of the test be developed?
• What special training will be required of test users for administering or interpreting the test?
• What type of response will be required of test takers?
• Who benefits from an administration of this test?
• Is there any potential harm as the result of an administration of this test?
• How will meaning be attributed to scores on this test?
Pilot Work
• Also called pilot study or pilot research.
• Test items may be pilot studied (or piloted) to evaluate whether they should be included in the final form of the instrument. In developing a structured interview to measure introversion/extraversion, for example, pilot research may involve open-ended interviews with research subjects believed for some reason (perhaps based on an existing test) to be introverted or extraverted.
B. Test Construction
Scaling Methods:
• Scaling may be defined as the process of setting rules for assigning numbers in measurement.
Types of Scale:
• Age-based Scale
▪ Interest is on test performance as a function of age.
• Grade-based Scale
▪ Interest is on test performance as a function of grade.
• Stanine Scale
▪ Used when raw scores are to be transformed into scores that range from 1 to 9.
• Rating Scale
▪ A grouping of words, statements, or symbols in which
judgments of the strength of a particular trait, attitude
or emotion are indicated by the test taker.
• Summative Scale
▪ Final test score is obtained by summing the ratings of
all the items.
• Likert Scale
▪ Contains 5 to 7 alternative responses, usually arranged along a continuum such as Agree/Disagree.
• Paired Comparisons
▪ Test takers are presented with two stimuli, which they must compare in order to select one.
Select the behavior that you think is more justified:
a) Cheating on taxes if one has a chance.
b) Accepting a bribe in one’s duties.
• Comparative Scale
▪ Entails judging a stimulus in comparison with the other stimuli on the scale.
Comparative Scaling: Rank according to Beauty
_____ Angel Locsin
_____ Marian Rivera
_____ Anne Curtis
_____ Heart Evangelista
_____ Toni Gonzaga
• Categorical Scale
▪ Done by placing stimuli into alternative categories that
differ quantitatively.
Categorical Scaling
30 cards with various scenarios/situations. You are to
judge whether scenarios are:
• Guttman Scale
▪ Items are arranged so that all respondents who agree with the stronger statements will also agree with the milder statements.
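Several of the scale types above involve simple arithmetic. The Python sketch below illustrates three of them under stated assumptions: stanine conversion uses the z-score method (stanine = 2z + 5, i.e., mean 5 and SD 2, rounded and clipped to 1 through 9), summative scoring simply adds the item ratings, and the Guttman check assumes responses are coded 1 (agree) / 0 (disagree) and ordered from the mildest to the strongest statement. All function names are hypothetical.

```python
import math
import statistics


def to_stanines(raw_scores):
    """Convert raw scores to stanines (1-9) using z-scores.

    Assumes stanine = 2z + 5 (mean 5, SD 2), rounded half up and
    clipped to the 1..9 range.
    """
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    if sd == 0:
        return [5] * len(raw_scores)  # no spread: everyone sits at the mean
    stanines = []
    for x in raw_scores:
        z = (x - mean) / sd
        stanine = math.floor(2 * z + 5 + 0.5)  # round half up
        stanines.append(min(9, max(1, stanine)))
    return stanines


def summative_score(ratings, points=5):
    """Summative (e.g., Likert-type) scoring: add the ratings of all items.

    Assumes each rating is an integer from 1 to `points`.
    """
    if any(not 1 <= r <= points for r in ratings):
        raise ValueError("rating outside the 1..points range")
    return sum(ratings)


def fits_guttman(pattern):
    """Check one response pattern against the Guttman ideal.

    `pattern` holds 1 (agree) / 0 (disagree), ordered from the mildest to
    the strongest statement. Agreeing with a stronger statement implies
    agreeing with every milder one, so a 1 must never follow a 0.
    """
    return all(milder >= stronger
               for milder, stronger in zip(pattern, pattern[1:]))


print(to_stanines([10, 20, 30, 40, 50]))  # [2, 4, 5, 6, 8]
print(summative_score([4, 5, 3, 2, 5]))   # 19
print(fits_guttman([1, 1, 1, 0, 0]))      # True
print(fits_guttman([1, 0, 1, 0, 0]))      # False
```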
Writing Items:
Questions to Consider by the Test Developer:
• What range of content should the items cover?
• Which of the many different types of item formats should be employed?
• How many items should be written?
Item Pool
• Reservoir or well from which items will or will not be drawn for the final version of the test.
• Items could be derived from the test developer's personal experience or academic acquaintance with the subject matter.
• Help may also be sought from experts in their respective fields.
Item Format
• Form, plan, structure, arrangement, and layout of
individual test items.
a. Selected-Response
• Requires test takers to select a response from a set of
alternative responses.
• Multiple Choice
• Binary Choice
• Matching Type
Elements of Multiple Choice
• Stem
• Correct Option
• Several incorrect options (distractors or foils)
b. Constructed-Response
• Requires test takers to supply or create the correct answer, not merely select it.
Completion Items
• Requires examinee to provide a word or phrase that
completes a sentence.
Writing Items for Computer Administration
• Item Bank
▪ Large, easily accessible collection of test questions.
• Computer Adaptive Testing (CAT)
▪ Interactive, computer-administered test-taking process wherein the items presented to the test taker are based in part on the test taker's performance on previous items.
• Item Branching
▪ Ability of the computer to tailor the content and order
of presentation of test items.
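The item-branching idea behind CAT can be illustrated with a deliberately simplified Python sketch. It assumes a hypothetical item bank keyed by consecutive integer difficulty levels (1 = easiest) and a simple up/down rule: a correct answer branches to a harder level, an incorrect answer to an easier one. Operational CAT systems typically select items using item response theory; this sketch does not attempt that.

```python
def run_adaptive_test(item_bank, answer_fn, start_level, n_items):
    """Administer up to n_items, branching on each response.

    item_bank: dict mapping difficulty level (consecutive ints, 1 = easiest)
               to a list of item ids (hypothetical structure).
    answer_fn: callable returning True if the test taker answers the item correctly.
    Returns a list of (item, level, correct) tuples in presentation order.
    """
    lowest, highest = min(item_bank), max(item_bank)
    level = start_level
    used = set()
    history = []
    for _ in range(n_items):
        remaining = [i for i in item_bank.get(level, []) if i not in used]
        if not remaining:
            break  # no unused items left at this level
        item = remaining[0]
        used.add(item)
        correct = answer_fn(item)
        history.append((item, level, correct))
        # Branching rule: harder after a correct answer, easier after an error.
        level = min(level + 1, highest) if correct else max(level - 1, lowest)
    return history


bank = {1: ["e1", "e2"], 2: ["m1", "m2"], 3: ["h1", "h2"]}
# Simulated test taker who answers easy ("e") and medium ("m") items correctly.
print(run_adaptive_test(bank, lambda item: item[0] in "em", start_level=2, n_items=4))
# [('m1', 2, True), ('h1', 3, False), ('m2', 2, True), ('h2', 3, False)]
```

Because the simulated test taker fails every hard item, the sequence oscillates between levels 2 and 3, which is how branching homes in on the examinee's level.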
Cumulative Scoring
• The higher the score on a test, the higher the test taker is on the ability, trait, or other characteristic that the test purports to measure.
Class/Category Scoring
• Responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is similar.
Ipsative Scoring
• Comparison of a test taker's score on one scale within a test with the score on another scale within that same test.
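The three scoring models can be contrasted in a short Python sketch. The cutoffs used for class/category placement and the choice to express ipsative scores as deviations from the test taker's own mean across scales are illustrative assumptions, not fixed rules; the function and scale names are hypothetical.

```python
def cumulative_score(item_scores):
    """Cumulative model: the higher the total, the more of the measured trait."""
    return sum(item_scores)


def category_placement(total, cutoffs):
    """Class/category model: place a test taker in the highest category whose
    minimum score is reached. `cutoffs` is a hypothetical list of
    (minimum_score, label) pairs."""
    for minimum, label in sorted(cutoffs, reverse=True):
        if total >= minimum:
            return label
    return None  # below every cutoff


def ipsative_profile(scale_scores):
    """Ipsative model: compare scales within one test taker by expressing each
    scale score as a deviation from that person's own mean across scales."""
    own_mean = sum(scale_scores.values()) / len(scale_scores)
    return {scale: score - own_mean for scale, score in scale_scores.items()}


print(cumulative_score([1, 0, 1, 1]))                    # 3
print(category_placement(3, [(0, "low"), (3, "high")]))  # high
print(ipsative_profile({"a": 10, "b": 20, "c": 30}))     # {'a': -10.0, 'b': 0.0, 'c': 10.0}
```

Note that ipsative scores say nothing about how a test taker compares with other people; they only show which of that person's own scales is relatively high or low.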
C. Test Tryout
• Test should be tried out on people similar in critical
respects to the people for whom the test was designed.
• There should be no fewer than 5 subjects, and ideally as many as 10, for each item on the test. The more subjects, the better.
• Tryout should be executed under conditions as identical as possible to those under which the standardized test will be administered.
D. Item Analysis
a. Item-Difficulty Index
• Obtained by calculating the proportion of the total number
of test takers who got the item right.
• Values range from 0 to 1.
• Optimal item difficulty should be determined with respect to the number of options.
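The item-difficulty index (p) is a simple proportion. For optimal difficulty, the sketch assumes the common rule of thumb that it lies midway between the probability of answering correctly by chance (1 divided by the number of options) and 1.0. For a four-option multiple-choice item that gives (1 + 0.25) / 2 = 0.625. Responses are assumed to be coded 1 (correct) and 0 (incorrect); the function names are hypothetical.

```python
def item_difficulty(responses):
    """Item-difficulty index p: proportion of test takers who got the item right.
    `responses` holds 1 (correct) or 0 (incorrect) for each test taker."""
    return sum(responses) / len(responses)


def optimal_difficulty(n_options):
    """Rule of thumb (assumed): midway between chance success and 1.0."""
    chance = 1 / n_options
    return (1 + chance) / 2


print(item_difficulty([1, 1, 0, 1, 0, 1, 1, 0]))  # 0.625
print(optimal_difficulty(4))  # 0.625 for four-option multiple choice
print(optimal_difficulty(2))  # 0.75 for binary choice (e.g., true/false)
```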
b. Item-Reliability Index
• Provides an indication of internal consistency of a test.
The higher the index, the greater the internal consistency.
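• Equal to the product of the item-score standard deviation and the correlation between the item score and the total test score; factor analysis can further help determine whether items appear to measure the same thing.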
c. Item-Validity Index
• A statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure; the higher the item-validity index, the greater the test's criterion-related validity.
d. Item-Discrimination Index
• Indicates how adequately an item separates, or discriminates, between high scorers and low scorers on the entire test.
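The item-discrimination index is often computed as d = (U - L) / n, where U and L are the numbers of test takers in the upper and lower scoring groups who answered the item correctly, and n is the size of each group. The sketch below assumes the widely used convention of taking the top and bottom 27% of total scores as those groups. In this form, d ranges from -1 to +1, and higher positive values indicate better discrimination. The function name is hypothetical.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """d = (U - L) / n for one item.

    Test takers are ranked by total score; the top and bottom `fraction`
    (27% by assumed convention) form the upper and lower groups of size n.
    item_correct holds 1 (correct) or 0 (incorrect) for the item.
    """
    ranked = sorted(zip(total_scores, item_correct), key=lambda pair: pair[0])
    n = max(1, round(len(ranked) * fraction))
    lower_group, upper_group = ranked[:n], ranked[-n:]
    upper_correct = sum(correct for _, correct in upper_group)
    lower_correct = sum(correct for _, correct in lower_group)
    return (upper_correct - lower_correct) / n


totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(discrimination_index(totals, item))                # 1.0
print(discrimination_index(totals, item, fraction=0.5))  # 0.8
```

A negative d would flag an item that low scorers answer correctly more often than high scorers, which usually calls for revision.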
e. Qualitative Item Analysis
• Nonstatistical procedures designed to explore how individual test items work.
• "Think aloud" test administration
• Expert panels
E. Test Revision
• Characterize each item according to its strengths and weaknesses.
• Balance various strengths and weaknesses across items.
• Administer the revised test under standardized conditions to a second appropriate sample of examinees.
Characteristics of Tests that Are Due for Revision:
• Current test takers cannot relate to the test.
• The vocabulary is not readily understood by current test takers.
• Changes in popular culture have given some words in the test inappropriate meanings.
• Test norms are no longer adequate as a result of group membership changes.
• Test norms are no longer adequate as a result of age-related shifts.
• Reliability and validity could be significantly improved by a revision.
• The theory on which the test was based has been improved.