Chapter 8: Test Development

Objectives: After completing this chapter, students should be able to:
1. Understand the process of psychological test development.
2. Create a psychological test following the procedures provided.
3. Perform reliability and validity estimates on the created psychological test.

Chapter Topics:
A. Test conceptualization
B. Test construction
C. Test tryout
D. Item analysis
E. Test revision

TEST DEVELOPMENT

Test Development Process:
1. Test Conceptualization
2. Test Construction
3. Test Tryout
4. Item Analysis
5. Test Revision

A. TEST CONCEPTUALIZATION

Test Conceptualization
• This is the beginning of any published test. An emerging social phenomenon or pattern of behavior might serve as the stimulus for the development of a new test.

Preliminary Questions:
• What is the test designed to measure?
• What is the objective of the test?
• Is there a need for this test?
• Who will use this test?
• Who will take this test?
• What content will the test cover?
• How will the test be administered?
• What is the ideal format of the test?
• Should more than one form of the test be developed?
• What special training will be required of test users for administering or interpreting the test?
• What type of response will be required of test takers?
• Who benefits from an administration of this test?
• Is there any potential harm as a result of an administration of this test?
• How will meaning be attributed to scores on this test?

Pilot Work
• Also called pilot study or pilot research.
• Test items may be pilot studied (or piloted) to evaluate whether they should be included in the final form of the instrument.
• In developing a structured interview to measure introversion/extraversion, for example, pilot research may involve open-ended interviews with research subjects believed for some reason (perhaps on the basis of an existing test) to be introverted or extraverted.

B. TEST CONSTRUCTION

Scaling
• Scaling may be defined as the process of setting rules for assigning numbers in measurement.

Types of Scales:
• Age-based Scale
  ▪ Interest is in test performance as a function of age.
• Grade-based Scale
  ▪ Interest is in test performance as a function of grade.
• Stanine Scale
  ▪ Raw scores are transformed into scores that range from 1 to 9 (a conversion sketch appears after the scaling methods below).

Scaling Methods:
• Rating Scale
  ▪ A grouping of words, statements, or symbols on which the test taker indicates judgments of the strength of a particular trait, attitude, or emotion.
• Summative Scale
  ▪ The final test score is obtained by summing the ratings across all the items (a scoring sketch appears after this list).
• Likert Scale
  ▪ Presents 5 to 7 alternative responses, usually along an Agree/Disagree or Approve/Disapprove continuum.
• Paired Comparisons
  ▪ Test takers are presented with two stimuli, which they must compare in order to select one.
  ▪ Example: Select the behavior that you think is more justified: (a) cheating on taxes if one has the chance, or (b) accepting a bribe in the course of one's duties.
• Comparative Scale
  ▪ Entails judging a stimulus in comparison with every other stimulus on the scale.
  ▪ Example: Rank the following according to beauty: Angel Locsin, Marian Rivera, Anne Curtis, Heart Evangelista, Toni Gonzaga.
• Categorical Scale
  ▪ Done by placing stimuli into alternative categories that differ quantitatively.
  ▪ Example: Thirty cards describing various scenarios are sorted by the respondent into the categories Beautiful, Average, or Ugly.
• Guttman Scale
  ▪ Arranged so that all respondents who agree with a stronger statement will also agree with the milder statements.
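The summative and Likert-type scales above imply a simple scoring rule: reverse any oppositely worded items, then add the ratings. Below is a minimal sketch in Python, assuming a hypothetical 4-item extraversion questionnaire rated on a 5-point Agree/Disagree scale; the item names and the reverse-keyed item are illustrative assumptions, not part of the chapter.

```python
# Minimal sketch: summative (Likert-type) scoring.
# Assumes a 5-point scale (1-5) and a hypothetical 4-item questionnaire;
# item names and the reverse-keyed item are illustrative only.

LIKERT_MAX = 5  # highest response option on a 5-point scale

# Responses keyed by item; item "i3" is worded in the opposite direction,
# so it must be reverse-scored before summing.
responses = {"i1": 4, "i2": 5, "i3": 2, "i4": 3}
reverse_keyed = {"i3"}

def summative_score(responses, reverse_keyed, max_option=LIKERT_MAX):
    """Sum item ratings, reversing items keyed in the opposite direction."""
    total = 0
    for item, rating in responses.items():
        if item in reverse_keyed:
            rating = (max_option + 1) - rating  # 5 -> 1, 4 -> 2, ...
        total += rating
    return total

print(summative_score(responses, reverse_keyed))  # prints 16
```

Reversing oppositely worded items before summing is a common practice so that every item contributes to the total score in the same direction.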
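For the stanine scale mentioned above, raw scores are commonly converted by way of percentile ranks. The sketch below assumes the conventional stanine percentages (4-7-12-17-20-17-12-7-4), which yield the cumulative cut points shown; the example percentile ranks are illustrative.

```python
# Minimal sketch: converting a percentile rank to a stanine (1-9),
# using the conventional cumulative percentage boundaries.
import bisect

CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]  # cumulative % boundaries

def stanine(percentile_rank):
    """Return the stanine (1-9) for a percentile rank between 0 and 100."""
    return bisect.bisect_right(CUTOFFS, percentile_rank) + 1

for pr in (2, 50, 95, 99):
    print(pr, "->", stanine(pr))
# 2 -> 1, 50 -> 5, 95 -> 8, 99 -> 9
```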
TEST CONSTRUCTION: WRITING ITEMS

Questions to Consider by the Test Developer:
• What range of content should the items cover?
• Which of the many different item formats should be employed?
• How many items should be written?

Item Pool
• The reservoir or well from which items will be drawn for, or discarded from, the final version of the test.
• Items may be derived from the test developer's personal experience or academic acquaintance with the subject matter.
• Help may also be sought from experts in the relevant fields.

Item Format
• The form, plan, structure, arrangement, and layout of individual test items.

a. Selected-Response Format
• Requires test takers to select a response from a set of alternative responses.
• Multiple choice
• Binary choice
• Matching type

Elements of a Multiple-Choice Item:
• Stem
• Correct option
• Several incorrect options (distractors or foils)

b. Constructed-Response Format
• Requires test takers to supply or create the correct answer.
• Completion items require the examinee to provide a word or phrase that completes a sentence.

Writing Items for Computer Administration
• Item Bank
  ▪ A large, easily accessible collection of test questions.
• Computer Adaptive Testing (CAT)
  ▪ An interactive, computer-administered test-taking process in which the items presented to the test taker are based, in part, on the test taker's performance on previous items.
• Item Branching
  ▪ The ability of the computer to tailor the content and order of presentation of test items.

TEST CONSTRUCTION: SCORING ITEMS

• Cumulative Scoring
  ▪ The higher the score on the test, the higher the ability or trait being measured.
• Class/Category Scoring
  ▪ Responses earn credit toward placement in a particular class or category with other test takers whose patterns of responses are similar.
• Ipsative Scoring
  ▪ Comparison of a test taker's score on one scale within a test with that same test taker's score on another scale within the same test.

C. TEST TRYOUT

Test Tryout
• The test should be tried out on people similar in critical respects to the people for whom the test was designed.
• Subjects should number no fewer than 5, and ideally 10 or more; in general, the more subjects, the better.
• The tryout should be executed under conditions as identical as possible to the conditions under which the standardized test will later be administered.

D. ITEM ANALYSIS

a. Item-Difficulty Index
• Obtained by calculating the proportion of the total number of test takers who answered the item correctly.
• Values range from 0 to 1.
• The optimal item difficulty should be determined with respect to the number of response options (see the sketch after this section).

b. Item-Reliability Index
• Provides an indication of the internal consistency of a test; the higher the index, the greater the internal consistency.
• May be obtained with the aid of factor analysis.

c. Item-Validity Index
• A statistic designed to provide an indication of the degree to which a test measures what it purports to measure; the higher the item-validity index, the greater the test's criterion-related validity.

d. Item-Discrimination Index
• Indicates how adequately an item separates, or discriminates, between high scorers and low scorers on the entire test (see the sketch after this section).

e. Qualitative Item Analysis
• Nonstatistical procedures designed to explore how individual test items work.
• "Think aloud" test administration.
• Expert panels.
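As a rough illustration of the item-difficulty and item-discrimination indices described above, the sketch below computes both for a small, made-up matrix of dichotomously scored responses. The split into upper and lower groups is simplified to halves of the sample; published guidelines often use roughly the top and bottom 27% of total scorers instead.

```python
# Minimal sketch: item-difficulty (p) and item-discrimination (d) indices
# for dichotomously scored items (1 = correct, 0 = incorrect).
# The response matrix is illustrative, not real data.

# rows = test takers, columns = items
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]

n_items = len(scores[0])

# Item-difficulty index: proportion of all test takers answering the item correctly.
p = [sum(row[j] for row in scores) / len(scores) for j in range(n_items)]

# Item-discrimination index: proportion correct in the upper-scoring group
# minus proportion correct in the lower-scoring group (groups formed from total scores).
ranked = sorted(scores, key=sum, reverse=True)
half = len(ranked) // 2
upper, lower = ranked[:half], ranked[half:]

d = [
    sum(row[j] for row in upper) / len(upper)
    - sum(row[j] for row in lower) / len(lower)
    for j in range(n_items)
]

for j in range(n_items):
    print(f"item {j + 1}: p = {p[j]:.2f}, d = {d[j]:+.2f}")
```

An item answered correctly by nearly everyone has p close to 1. A negative d (as for item 4 in this made-up data) means low scorers outperformed high scorers on that item, which ordinarily flags the item for revision or removal.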
E. TEST REVISION

Test Revision
• Characterize each item according to its strengths and weaknesses.
• Balance the various strengths and weaknesses across items.
• Administer the revised test under standardized conditions to a second appropriate sample of examinees.

Characteristics of Tests That Are Due for Revision:
• Current test takers can no longer relate to the test.
• The vocabulary is not readily understood by test takers.
• The meaning of words has shifted with changes in popular culture.
• Test norms are no longer adequate as a result of group membership changes.
• Test norms are no longer adequate as a result of age-related shifts.
• Reliability or validity could be improved by revision.
• The theory on which the test was based has been improved or refined.