UNIT IV ITEM ANALYSIS IN TEST DEVELOPMENT
CHAP 14: ITEM ANALYSIS
CHAP 15: INTRODUCTION TO ITEM RESPONSE THEORY
CHAP 16: DETECTING ITEM BIAS

CHAPTER 14 ITEM ANALYSIS
*The goal of test construction is to create a test of minimum length with good reliability and validity.
*Item Analysis is the computation and examination of any statistical property of an item response distribution.
*Item Analysis is the process we go through when constructing a new test, or subtests, from a pool of items so that the result has good reliability and validity.

*Categories of Item Parameters
*Item parameters (indices) fall into 3 categories.
1. Indices that describe the distribution of responses to a single item (e.g., mean and variance of item responses).
2. Indices that describe the degree of relationship between the response to the item and some criterion of interest.
Ex. The relationship between the questions (items) and the criterion of interest, i.e., depression, in Factor Analysis.
3. Indices that are a function of both the item variance/mean and the relationship to a criterion of interest.
Ex. First, find the variance/mean of your items; then calculate the relationship between the item variances and the criterion of interest (i.e., depression) for two groups.

Item Difficulty “P”
P = f/N, i.e., the number of examinees who answered an item correctly / the total number of examinees (see your midterm item analysis and Chap 5).
The higher the P value, the easier the item.

*Steps in Item Analysis
In a typical item analysis the test developer will take 7 steps (they are similar to the process of test construction in Chapter 4).
FYI: PROCESS OF TEST CONSTRUCTION (CHAP 4)
1. Identifying the purposes of test score use
2. Identifying behaviors to represent the construct
3. Preparing test specifications, e.g., Bloom's Taxonomy
4. Item construction
5. Item review
6. Preliminary item tryouts
7. Field test
8. Statistical analysis
9. Reliability and validity
10. Guidelines

*7 Steps in Item Analysis
1. Describe what properties of the test scores are of greatest importance.
Ex. When I select questions for your midterm/final exam, I look for similarity to the questions on the qualifying/comprehensive exam or the EPPP.
2. Identify the item parameters (e.g., mean, variance) most relevant to these properties.
3. Administer the items to a sample of examinees representative of those for whom the test is intended.
Ex. An IQ test for children or a depression test for adults.
4. Estimate, for each item, the parameters identified in step 2 (e.g., variance).
5. Establish a plan for item selection.
Ex. Use the item difficulties (P) from the Item Analysis to select the items.
6. Select the final subset of items, or use the data (the items in your Item Analysis) for test revision.
Ex. Take out all questions with very high or very low item difficulties.
7. Conduct a cross-validation (validity) study.
Ex. Use SPSS and compare the results of 2 tests or 2 classes (e.g., this year's class and last year's class), i.e., Confirmatory Factor Analysis.
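Ex. The item difficulty formula P = f/N and the selection idea in steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration: the response matrix is made-up data, the function names are hypothetical, and the cutoffs 0.2 and 0.8 are assumed values, not ones given in these slides.

```python
from typing import List


def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Return P = f / N for each item.

    responses[e][i] is 1 if examinee e answered item i correctly, else 0.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_examinees
        for i in range(n_items)
    ]


def select_items(p_values: List[float], low: float = 0.2, high: float = 0.8) -> List[int]:
    """Keep the indices of items whose P lies in [low, high].

    The cutoffs are assumptions for illustration; choose them to fit
    the purpose of the test.
    """
    return [i for i, p in enumerate(p_values) if low <= p <= high]


# Hypothetical scored data: 5 examinees (rows) x 4 items (columns).
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]

p_values = item_difficulty(scores)  # item 0 is answered by everyone, item 2 by no one
kept = select_items(p_values)       # items with very high or very low P are dropped
print(p_values, kept)
```

Items 0 and 2 are removed because everyone or no one answered them correctly, so they do not separate examinees from each other.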
UNIT V TEST SCORING AND INTERPRETATION
CHAP 17: CORRECTING FOR GUESSING AND OTHER SCORING METHODS
CHAP 18: SETTING STANDARDS
CHAP 19: NORMS AND STANDARD SCORES
CHAP 20: EQUATING SCORES FROM DIFFERENT TESTS

CHAPTER 19 NORMS AND STANDARD SCORES
*Alfred Binet (1910): Ratio IQ = ratio of MA/CA
*Lewis Terman: Ratio IQ = (MA/CA) × 100; he standardized it.
*Deviation IQ = uses norms to estimate the IQ
We use norms when we want to compare an examinee's score (raw score) on a test to the distribution of scores (scaled or standard scores) for a sample from a well-defined population.
Ex. When we want to estimate the IQ of a 20-year-old person, we compare his/her raw scores on the subtests of an IQ test with the scores of people of his/her age, who are his/her norm group (standard scores). This technique tells us where this person stands among people of his/her age.

*9 BASIC STEPS IN CONDUCTING A NORMING STUDY (P. 432)
1. Identify the population of interest.
Ex. Students, employees of a company, inmates, patients, etc.
2. Identify the most critical statistics that will be computed for the sample data.
Ex. Standard deviation σ, σ², M, SS, p
3. Decide on the tolerable amount of sampling error, that is, the discrepancy M − µ between the sample statistic (M) and the population parameter (µ).
The Central Limit Theorem describes 3 characteristics of the distribution of sample means:
1. Central tendency (the mean of the sample means equals µ)
2. The shape of the distribution (normal)
3. Variability, or standard error of the mean (σm)
4. Devise a procedure for drawing a sample from the population of interest. There are 4 types of probability sampling:
I Simple Random Sampling: Give everyone in the population an equal chance of being selected. Ex. Draw names from a hat.
II Systematic Sampling (N/n): Select every Kth name on the list, where K = N/n. Ex. CAU population N = 1500 and your sample size n = 150; N/n = 1500/150 = 10, so select every 10th student.
III Stratified Sampling: “Strata” means different layers. We use Stratified Sampling when we want to compare 2 different groups (e.g., male and female CAU doctoral students). First we randomly select males, then we randomly select females.
IV Cluster Sampling: We use Cluster Sampling when the population consists of units rather than individuals, such as classes. Ex. Miami-Dade school district. If we want to conduct research with the Miami-Dade 2nd graders (1000 2nd-grade classes), we randomly select about 10 of these 1000 classes to be in our sample, and then we conduct the research.
5. Estimate the minimum sample size (n) required to hold the sampling error within the specified limits. There are different statistical procedures to estimate n. n should be ≥ 30.
1. n = (σ/d)², d = effect size, d = (M − µ)/σ
2. n = (σ/σm)²
σm = σ/√n: standard error of the mean for a population. Ex. z-score
Sm = S/√n: estimated standard error of the mean for a sample. Ex. t-distribution
[Slide: THE EFFECT SIZE, EX. TWO INDEPENDENT-SAMPLES T-TEST]
6. Draw the sample and collect the data.
7. Compute the values of the group statistics of interest and their standard errors: Sm = S/√n or σm = σ/√n. This is the standard error of the mean, which estimates the expected discrepancy between M and µ, also known as sampling error.
8. Identify the types of normative scores that will be needed, and prepare the normative score conversion table (see Types of Normative Scores).
9. Prepare written documentation of the normative scores.
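Ex. Steps 4, 5 and 7 can be tried out with a short Python sketch. It uses the CAU example from the slides (N = 1500, n = 150) for systematic sampling, and the relation σm = σ/√n, rearranged to n = (σ/σm)², for the minimum sample size. The function names, the student ID list, and the values σ = 15 and σm = 1.5 are assumptions made up for illustration.

```python
import math
import random
from typing import List, Sequence


def systematic_sample(population: Sequence, n: int, start: int = 0) -> List:
    """Select every K-th member of the list, where K = N // n."""
    if n <= 0 or n > len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n
    if not 0 <= start < k:
        raise ValueError("start must be in the range 0..K-1")
    return [population[i] for i in range(start, start + k * n, k)]


def standard_error_of_mean(sd: float, n: int) -> float:
    """sigma_m = sigma / sqrt(n) (or S_m = S / sqrt(n) for a sample)."""
    return sd / math.sqrt(n)


def minimum_sample_size(sigma: float, tolerable_se: float) -> int:
    """n = (sigma / sigma_m)^2, rounded up to a whole examinee."""
    return math.ceil((sigma / tolerable_se) ** 2)


# Hypothetical list of student IDs 1..1500 standing in for the CAU population.
population = list(range(1, 1501))

# Simple random sampling: every student has an equal chance (names from a hat).
simple = random.Random(42).sample(population, 150)

# Systematic sampling: K = 1500 / 150 = 10, so every 10th student is taken.
systematic = systematic_sample(population, 150)

# Assumed values: sigma = 15 and a tolerable standard error of 1.5.
n_needed = minimum_sample_size(15, 1.5)
se_check = standard_error_of_mean(15, n_needed)
print(len(simple), systematic[:3], n_needed, se_check)
```

With these assumed values the required n is 100, and plugging n = 100 back into σ/√n returns the tolerable standard error of 1.5, so the two formulas agree.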
NORMS AND STANDARD SCORES
Types of Normative Scores
Raw Score: score on a subtest or a test.
Scaled Score: normative score for a specific age.

NORMATIVE SCORES (Wechsler)

*Usefulness of Scaled Scores
Scaled scores are useful for two purposes:
1. Scaled scores relate the examinee's performance to the percentile rank scores of the norm group and their grade level.
2. In evaluation and research, the mean scaled score is a better estimate of average group performance than the mean raw score.

NORMATIVE SCORES
Multiply by 5 to convert to percentile. This means neither the USA nor Iran uses a normal distribution in its grading system: the USA distribution is negatively skewed and Iran's is positively skewed.

*Echternacht (1971): 3-Step Process for Grade and Age Equivalent Scores
1. Convert the raw scores to scaled scores.
2. Calculate the median scaled score for each grade level, and plot the medians on a bivariate scatter plot.
3. Connect the points and draw a smooth curve.
It is similar to Deviation IQ, i.e., the child's performance is compared with that of others at a particular age or grade level.
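Ex. The grade-equivalent steps above can be sketched in Python. This is a rough illustration only: the scaled scores are made-up data, the function names are hypothetical, and straight-line interpolation between the grade medians is used as a simple stand-in for the smooth curve in step 3. It also assumes that the medians increase from one grade to the next.

```python
from statistics import median
from typing import Dict, List


def grade_medians(scaled_by_grade: Dict[int, List[float]]) -> Dict[int, float]:
    """Step 2: median scaled score for each grade level."""
    return {grade: median(scores) for grade, scores in sorted(scaled_by_grade.items())}


def grade_equivalent(scaled_score: float, medians: Dict[int, float]) -> float:
    """Step 3 (simplified): read a grade equivalent off the median curve.

    Straight lines between neighbouring medians replace the smooth curve.
    Scores outside the observed medians are clamped to the lowest or
    highest grade instead of being extrapolated.
    """
    points = sorted(medians.items())
    if scaled_score <= points[0][1]:
        return float(points[0][0])
    if scaled_score >= points[-1][1]:
        return float(points[-1][0])
    for (g0, m0), (g1, m1) in zip(points, points[1:]):
        if m0 <= scaled_score <= m1 and m1 > m0:
            return g0 + (scaled_score - m0) * (g1 - g0) / (m1 - m0)
    raise ValueError("medians must increase with grade level")


# Hypothetical scaled scores (step 1 already done) for grades 2, 3 and 4.
scaled = {
    2: [90, 100, 110],
    3: [110, 120, 130],
    4: [130, 140, 150],
}

medians = grade_medians(scaled)
ge = grade_equivalent(110, medians)  # 110 lies halfway between the grade 2 and grade 3 medians
print(medians, ge)
```

A scaled score of 110 falls halfway between the grade 2 median (100) and the grade 3 median (120), so the sketch returns a grade equivalent of 2.5.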