*What is Test Theory?
The study of measurement problems, the influence of these problems on psychological inventories, and how to create methods to minimize them.

UNIT I: INTRODUCTION TO MEASUREMENT THEORY
CHAP 1: WHAT IS TEST THEORY
CHAP 2: STATISTICAL CONCEPTS FOR TEST THEORY
CHAP 3: INTRODUCTION TO SCALING
CHAP 4: PROCESS OF TEST CONSTRUCTION
CHAP 5: TEST SCORES AS COMPOSITES

UNIT II: RELIABILITY
CHAP 6: RELIABILITY AND THE CLASSICAL TRUE SCORE MODEL
CHAP 7: PROCEDURES FOR ESTIMATING RELIABILITY
CHAP 8: INTRODUCTION TO GENERALIZABILITY THEORY
CHAP 9: RELIABILITY COEFFICIENTS FOR CRITERION-REFERENCED TESTS

UNIT III: VALIDITY
CHAP 10: INTRODUCTION TO VALIDITY
CHAP 11: STATISTICAL PROCEDURES FOR PREDICTION AND CLASSIFICATION
CHAP 12: BIAS IN SELECTION
CHAP 13: FACTOR ANALYSIS

UNIT IV: ITEM ANALYSIS IN TEST DEVELOPMENT
CHAP 14: ITEM ANALYSIS
CHAP 15: INTRODUCTION TO ITEM RESPONSE THEORY
CHAP 16: DETECTING ITEM BIAS

UNIT V: TEST SCORING AND INTERPRETATION
CHAP 17: CORRECTING FOR GUESSING AND OTHER SCORING METHODS
CHAP 18: SETTING STANDARDS
CHAP 19: NORMS AND STANDARD SCORES
CHAP 20: EQUATING SCORES FROM DIFFERENT TESTS

Introduction to Classical and Modern Test Theory
Chapter 1

Historic Origins
Pioneer countries in test theory are Germany, England, France, and the United States.

Germany
Wilhelm Wundt, Ernst Weber, and Gustav Fechner used procedures for collecting observations in a standard way for all subjects, such as reading the instructions at the top of the test page (see next slide).

Germany Cont.
Multiple Choice
Identify the choice that best completes the statement or answers the question.
1. The type of sensation you experience depends on which area of the brain is activated. This is known as
a. sensory localization.
b. transduction.
c. sensory adaptation.
d. cerebralization.
2. A hypnic jerk usually occurs during
a. light sleep.
b. deep sleep.
c. episodes of hypersomnia.
d. episodes of sleep apnea.
See p. 14, Exercise 4-b.

Germany
p. 14, Exercise 4-b
4. Consider the following testing practices and indicate which nineteenth-century psychological researcher should probably be credited with their origin.
b. A teacher about to give a test reads aloud from the test manual: “Please read the instructions at the top of the page silently while I read them aloud…” (see previous slide)

England
Karl Pearson: Pearson correlation
Charles Spearman: Spearman correlation. Used factor analysis in his “Theory of Intelligence.”
Galton: categorizing; half cousin to Darwin

France
Alfred Binet & Theodore Simon (1905) developed the first IQ test.
IQ = (MA / CA) × 100
MA = Mental Age
CA = Chronological Age
*The difference between Ratio IQ and Deviation IQ (Normative IQ)

United States
James McKeen Cattell: “Mental Testing”
Thorndike: An Introduction to the Theory of Mental and Social Measurement
Trial and Error
A Theory of Learning

Key Terms
Test
Optimal Performance
Typical Performance
Observable Performance
Constructs
Measurement

Key Terms
Test: A test is a procedure for obtaining a sample of an individual’s performance.
Optimal Performance: Refers to performance on aptitude tests (GRE, SAT, ACT) or achievement tests (WRAT, WIAT).

Key Terms
Typical Performance: Refers to performance on questionnaires and inventories used to report one’s feelings, attitudes, interests, or reactions to a situation.
Observable Performance: Refers to performance shown in observable behavior (e.g., watching children interact with each other; naturalistic observation).

Key Terms
Measurement: Quantifying an observable behavior, that is, assigning a quantitative value to a behavior.
See Exercises 1 & 2 on p. 14.

Heavy drinkers die at a younger age

Confounding Variables
Confounding variables are variables that the researcher failed to control or eliminate, damaging the internal validity of an experiment.
Also known as a third variable, a confounding variable can adversely affect the relation between the independent variable and the dependent variable. A confounder is not the same as a mediator (intervening) variable, which is defined on a later slide.
Ex.: see next slide.

Ex.: A research group might design a study to determine whether heavy drinkers die at a younger age. Heavy drinkers may be more likely to smoke or eat junk food, all of which could be factors in reducing longevity. A third variable may have adversely influenced the results.

Intervening Variables
A variable that explains a relation or provides a causal link between other variables. Also called a “mediating variable” or “intermediary variable.”
Ex.: see next slide.

Intervening Variables
Ex.: The statistical association between income and longevity needs to be explained, because just having money does not make one live longer. Other variables intervene between money and long life. People with high incomes tend to have better medical care than those with low incomes. Medical care is an intervening variable: it mediates the relation between income and longevity.

Key Terms
Constructs: Constructs are hypothetical concepts or psychological attributes/traits, such as personality, anxiety, or depression. They are difficult to measure. Constructs are not physical attributes such as height and weight.

*Why do we have Measurement Problems in Psychology?
1. There is no single universal way of defining a psychological construct.
2. Psychological measurements are based on samples of behavior.
3. Sampling of behavior results in errors of measurement.
4. The units (scales) of measurement are not well defined.
5. The measurements must have a demonstrated relationship to other variables to have meaning.

Role of Test Theory in Research & Evaluation
– Selecting a Problem
– Operational Definitions of Variables
– Instruments
– Accuracy of the Instruments
– Data Collection
– Use of Statistics

Chapter 2
Statistical Concepts for Test Theory

Population, Sample

Population and Sample
Population: The set of all individuals of interest for a particular study. Measurements related to a population are PARAMETERS.
Sample: A set of individuals selected from a population. Measurements related to a sample are STATISTICS.

Statistics
The people chosen for a study are its subjects or participants, collectively called a sample.
– The sample must be representative.

Statistics
Descriptive: Describes the distribution of scores and values such as the mean, median, and mode.
Inferential: Infers or draws a conclusion from a sample.

Key Terms
Constant: e.g., temperature in a study of learning and hunger
Variable: the IV (independent variable) is manipulated; the DV (dependent variable) is measured
Discrete numbers: 1, 2, 3, 14
Continuous numbers: 1.3, 3.6

CONTINUOUS VERSUS DISCRETE VARIABLES
Discrete variables (categorical)
– Values are defined by category boundaries
– E.g., gender
Continuous variables
– Values can range along a continuum
– E.g., height

Statistics
– Scales of Measurement
– Frequency Distributions and Graphs
– Measures of Central Tendency
– Standard Deviations and Variances
– Z Score
– Correlations: 1. Pearson 2. Spearman

Scales of Measurement (NOIR)

NOMINAL SCALE
Qualities: Assignment of labels
Examples: Gender (male or female); preference (like or dislike); voting record (for or against)
What You Can Say: Each observation belongs in its own category
What You Can’t Say: An observation represents “more” or “less” than another observation

ORDINAL SCALE
Qualities: Assignment of values along some underlying dimension (order)
Examples: Rank in college; order of finishing a race
What You Can Say: One observation is ranked above or below another.
What You Can’t Say: The amount by which one observation is more or less than another

INTERVAL SCALE
Qualities: Equal distances between points; arbitrary zero
Examples: Number of words spelled correctly on a test; intelligence test scores; temperature
What You Can Say: One score differs from another on some measure that has equally appearing intervals
What You Can’t Say: The amount of difference is an exact representation of differences of the variable being studied

RATIO SCALE
Qualities: Meaningful and nonarbitrary zero (absolute zero)
Examples: Age; weight; time?
What You Can Say: One value is twice as much as another, or no quantity of that variable can exist
What You Can’t Say: Not much!

LEVELS OF MEASUREMENT
Level of Measurement: For Example (Quality of Level)
Ratio: Rachael is 5’ 10” and Gregory is 5’ 5” (absolute zero)
Interval: Rachael is 5” taller than Gregory (an inch is an inch is an inch)
Ordinal: Rachael is taller than Gregory (greater than)
Nominal: Rachael is tall and Gregory is short (different from)
– Variables are measured at one of these four levels
– Qualities of one level are characteristic of the next level up
– The more precise (higher) the level of measurement, the more accurate the measurement process

WHAT IS ALL THE FUSS?
– Measurement should be as precise as possible
– In psychology, most variables are probably measured at the nominal or ordinal level
– But how a variable is measured can determine the level of precision

Frequency Distributions and Graphs
Histogram
Polygon
Platykurtic, Mesokurtic, Leptokurtic

Frequency Distributions
Frequency distribution (ƒ) of the scores 2, 4, 3, 2, 5, 3, 6, 1, 1, 3, 5, 2, 4, 2
Σƒ = N = 14
P = ƒ/N (P = proportion)
% = P × 100

Frequency Distributions
X   ƒ   ƒX   P = ƒ/N      % = P × 100   Cum %
6   1   6    1/14 = .07   7%
5   2
4   2
3   3
2   4
1   2

Frequency Distribution Table: Cumulative %
X   ƒ   ƒX   P = ƒ/N      % = P × 100   Cum %
6   1   6    1/14 = .07   7%            7%
5   2   10   2/14 = .14   14%           21%
4   2   8    2/14 = .14   14%           35%

How do you Calculate Cumulative Percent?
• Add each new individual percent to the running tally of the percentages that came before it.
• For example, if your dataset consisted of the four numbers 100, 200, 150, and 50, then their individual values, expressed as a percent of the total (in this case 500), are 20%, 40%, 30%, and 10%.
• For each value, compute (1) the proportion and (2) the percentage; the cumulative percent is then:
• 100: 100/500 = 0.2, and 0.2 × 100 = 20%
• 200: 20% from the step before + 40% = 60%
• 150: 60% from the step before + 30% = 90%
• 50: 90% from the step before + 10% = 100%

Frequency Distributions
X = 2, ƒ = 4, N = 14: P = ƒ/N = 4/14 = .29, % = P × 100 = 29%
X = 3, ƒ = 3, N = 14: P = 3/14 = .21, % = 21%
μ = ΣƒX/Σƒ

Mean
Measures of Central Tendency
Mean: interval or ratio scale
– The sum of the values divided by the number of values, often called the "average": μ = ΣX/N
– Add all of the values together. Divide by the total number of values to obtain the mean.
– Example: X = 7, 12, 24, 20, 19. The mean is ???

Statistics
The mean is μ = ΣX/N = (7 + 12 + 24 + 20 + 19)/5 = 82/5 = 16.4.

Median
Measures of Central Tendency
Median (middle): ordinal scale
– Divides the values into two equal halves, with half of the values being lower than the median and half higher than the median.
– Sort the values into ascending order. If you have an odd number of values, the median is the middle value. If you have an even number of values, the median is the arithmetic mean (see above) of the two middle values.
– Ex.: The median of the same five numbers (7, 12, 24, 20, 19) is ???

Mode
The median is 19.
Mode: nominal scale
– The most frequently occurring value (or values). Calculate the frequencies for all of the values in the data. The mode is the value (or values) with the highest frequency.
– Example: For individuals having the following ages: 18, 18, 19, 20, 20, 20, 21, and 23, the mode is ???

CHARACTERISTICS OF MODE
– Nominal scale
– Discrete variable
– Describing shape

The Range
The mode is 20.
The Range: the difference between the highest number and the lowest number, plus 1 (highest − lowest + 1).
2, 4, 7, 8, and 10 → discrete numbers
2, 4.6, 7.3, 8.4, and 10 → continuous numbers
The range is the difference between the upper real limit of the highest number and the lower real limit of the lowest number.

Variability
Range, interquartile range, semi-interquartile range, standard deviation, and variance are the measures of variability.
Variability is a measure of the dispersion or spread of scores around the mean, and has 2 purposes:
1. It describes the distribution. (Next slide)

Variability
2. It shows how well an individual score (or group of scores) represents the entire distribution, e.g., in a z score.
Ex.: In inferential statistics we collect information from a small sample, then generalize the results obtained from the sample to the entire population. (Next slide)

Variability: SS, Standard Deviations and Variances
Data: X = 1, 2, 4, 5
SS = sum of squared deviations from the mean
Definitional formula: SS = Σ(X − μ)²
Computational formula: SS = ΣX² − (ΣX)²/N
Population: variance σ² = SS/N; standard deviation σ = √(SS/N)
Sample: variance s² = SS/(n − 1) = SS/df; standard deviation s = √(SS/df)
Variance (σ²) is the mean of the squared deviations (MS).

Suppose you earned a score of X = 54 on an exam. Which set of parameters would give you the highest grade? a. μ = 50 and σ = 2 (σ² = 4); b.
μ = 50 and σ = 4 (σ² = 16); c. μ = 54 and σ = 2 (σ² = 4); d. μ = 54 and σ = 4 (σ² = 16)

Suppose you earned a score of X = 46 on an exam. Which set of parameters would give you the highest grade?
a. μ = 50 and σ = 2 (σ² = 4)
b. μ = 50 and σ = 4 (σ² = 16)
c. μ = 54 and σ = 2 (σ² = 4)
d. μ = 54 and σ = 4 (σ² = 16)

Covariance
Correlation is based on a statistic called covariance (cov_xy or s_xy).
Covariance: cov_xy = SP/(n − 1)
Correlation: r = SP/√(SSx · SSy)
Covariance is a number that reflects the degree to which 2 variables vary together.
Original Data
X   Y
8   1
1   0
3   6
0   1

Spearman Correlation
Rank-order the data, then proceed.
X   Y
1   1
2   3
3   2
4   4

Ranking / Monotonic Transformation
Score   Rank position   Final rank
3       1               1.5
3       2               1.5
5       3               3
6       4               5
6       5               5
6       6               5
12      7               7

Z Scores
Single score: z = (X − μ)/σ
Sample mean (research): z = (M − μ)/σM, where σM = σ/√n
We use the z score when σ is known.

Z Scores
X = σz + μ
μ = X − σz
σ = (X − μ)/z
If X = 60, μ = 50, σ = 5, then z = ?

Z Score for Research
Computations/calculations, or: collect data and compute sample statistics.
M = 115, n = 25

Z Score for Research
Standard error (σM)

Stanines
Stanines are used to compare an individual student’s achievement with the results obtained by a national reference sample chosen to represent a certain year level (e.g., 2nd level, 3rd level).
A stanine is a nine-point scale used for normalized test scores, with 1-3 below average, 4-6 average, and 7-9 above average. It is a nine-point standard-score scale with a mean of 5 and an SD of 2.
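
The z-score and standard-error formulas above can be checked with a short sketch. The function names are illustrative and not from the slides. The values μ = 100 and σ = 15 in the sample-mean example are hypothetical, since the slides give only M = 115 and n = 25. The stanine conversion assumes the common linear approximation 2z + 5, rounded and clipped to 1-9; operational stanines are usually assigned from percentile bands, so treat that function as a rough illustration only.

```python
import math

def z_score(x, mu, sigma):
    """z = (X - mu) / sigma for a single score."""
    return (x - mu) / sigma

def standard_error(sigma, n):
    """sigma_M = sigma / sqrt(n), the standard error of the sample mean."""
    return sigma / math.sqrt(n)

def z_for_sample_mean(m, mu, sigma, n):
    """z = (M - mu) / sigma_M for a sample mean."""
    return (m - mu) / standard_error(sigma, n)

def stanine_from_z(z):
    """ASSUMPTION: linear approximation (mean 5, SD 2), rounded half up, clipped to 1..9."""
    raw = math.floor(2 * z + 5 + 0.5)
    return max(1, min(9, raw))

# Slide example: X = 60, mu = 50, sigma = 5
z_example = z_score(60, 50, 5)  # 2.0

# Exam question: which (mu, sigma) pair gives the highest relative standing?
options = {"a": (50, 2), "b": (50, 4), "c": (54, 2), "d": (54, 4)}

def best_option(x):
    """Return the option whose parameters give x the largest z score."""
    return max(options, key=lambda k: z_score(x, *options[k]))

# Hypothetical population values (mu = 100, sigma = 15) for M = 115, n = 25
z_mean = z_for_sample_mean(115, 100, 15, 25)

print(z_example, best_option(54), best_option(46), z_mean, stanine_from_z(z_example))
```

Under these assumptions, a score above the mean benefits from a small σ and a score below the mean benefits from a large σ, which is the point of the two exam questions.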

The Correlational Method
Correlational data can be graphed and a “line of best fit” can be drawn.
1. Pearson correlation
2. Spearman correlation

The Correlational Method
Correlation is the degree to which events or characteristics vary with each other.
– Measures the strength of a relationship
– Does not imply cause and effect

Positive Correlation
– Positive correlation = variables change in the same direction

Negative Correlation
– Negative correlation = variables change in opposite directions

No Correlation
– Unrelated = no consistent relationship

The Correlational Method
The magnitude (strength) of a correlation is also important.
– High magnitude = variables that vary closely together; points fall close to the line of best fit
– Low magnitude = variables that do not vary as closely together; points are loosely scattered around the line of best fit

The Correlational Method
Direction and magnitude of a correlation are often calculated statistically.
– Called the “correlation coefficient,” symbolized by the letter “r”
– Sign (+ or −) indicates direction
– Number (from 0.00 to 1.00) indicates magnitude
– 0.00 = no consistent relationship
– +1.00 = perfect positive correlation
– −1.00 = perfect negative correlation
Most correlations found in psychological research fall far short of “perfect.”

The Correlational Method
Correlations can be trusted based on statistical probability.
– “Statistical significance” means that the finding is unlikely to have occurred by chance
– By convention, if there is less than a 5% probability that findings are due to chance (p < 0.05), results are considered “significant” and thought to reflect the larger population
– Generally, confidence increases with the size of the sample (n) and the magnitude of the correlation (r)

The Correlational Method
Advantages of correlational studies:
– Have high external validity: can generalize findings
– Can repeat (replicate) studies on other samples
Difficulties with correlational studies:
– Lack internal validity: results describe but do not explain a relationship

External & Internal Validity
*External Validity
External validity addresses the ability to generalize your study to other people and other situations.
*Internal Validity
Internal validity addresses the "true" causes of the outcomes that you observed in your study. Strong internal validity means that you have not only reliable measures of your independent and dependent variables but also a strong justification that causally links your independent variables (IV) to your dependent variables (DV).

The Correlational Method
Pearson: r = SP/√(SSx · SSy)
Original Data
X   Y
1   3
2   6
4   4
5   7
SP requires 2 sets of data.
SS requires only one set of data.

The Correlational Method
Spearman: r = SP/√(SSx · SSy), computed on the ranks
Original Data   Ranks
X   Y           X   Y
1   3           1   1
2   6           2   3
4   4           3   2
5   7           4   4
SP requires 2 sets of data.
SS requires only one set of data.

Regression and Prediction
Regression line: Y = bX + a

Three Levels of Analysis for Prediction
INPUTS → PROCESSES → OUTCOMES
Ex.: Stress (INPUT) is an unpleasant psychological process (PROCESS) that occurs in response to environmental pressures (job) and can lead to withdrawal (OUTCOME).

Prognosis
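
As a worked check of the Pearson and Spearman slides, the sketch below computes SP, SS, and r for the data X = 1, 2, 4, 5 and Y = 3, 6, 4, 7, and reproduces the tied-rank table (3, 3, 5, 6, 6, 6, 12). Function names are illustrative. The slope and intercept use the standard least-squares formulas b = SP/SSx and a = M_Y − b·M_X, which the slides do not state; they are added here as an assumption about how Y = bX + a would be fitted.

```python
import math

def sum_of_products(x, y):
    """SP = sum of (X - Mx)(Y - My); needs two sets of data."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

def sum_of_squares(x):
    """SS = sum of (X - Mx)^2; needs only one set of data."""
    return sum_of_products(x, x)

def pearson_r(x, y):
    """r = SP / sqrt(SSx * SSy)."""
    return sum_of_products(x, y) / math.sqrt(sum_of_squares(x) * sum_of_squares(y))

def average_ranks(values):
    """Rank from lowest to highest; tied scores share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman correlation: the Pearson formula applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def regression_line(x, y):
    """ASSUMPTION: least-squares fit, b = SP/SSx and a = My - b*Mx."""
    b = sum_of_products(x, y) / sum_of_squares(x)
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return b, a

X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
b, a = regression_line(X, Y)
print(pearson_r(X, Y), spearman_r(X, Y), b, a)
print(average_ranks([3, 3, 5, 6, 6, 6, 12]))
```

For these data SP = 6 and SSx = SSy = 10, so the Pearson r is 0.6; on the ranks SP = 4 and SSx = SSy = 5, so the Spearman value is 0.8.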