Psych Assessment C4 & C5 Reviewer – Psychological Assessment (Far Eastern University)

PSYCHOLOGICAL ASSESSMENT – CHAPTER 4: RELIABILITY

History and Theory of Reliability

Conceptualization of Error
- Psychology researchers pursue complex traits such as intelligence or aggressiveness, which one can neither see nor touch.
- The concern with reliability has been a particular obsession for psychologists and provides evidence of the advanced scientific status of the field.

Spearman's Early Studies
- Charles Spearman - advanced the development of reliability assessment
- Abraham De Moivre - introduced the basic notion of sampling error
- Karl Pearson - developed the product moment correlation
- Edward L. Thorndike - wrote An Introduction to the Theory of Mental and Social Measurements
- Sophisticated mathematical models have been developed to quantify "latent" variables based on multiple measures.
- The greater the number of items, the higher the reliability.
- Reliability can be estimated from the correlation of the observed test score with the true score.

Item Response Theory
- the most important new development
- the computer is used to focus on the range of item difficulty that helps assess an individual's ability level
- the method requires a bank of items that have been systematically evaluated for level of difficulty

Models of Reliability
- Reliability coefficient - the ratio of the variance of the true scores on a test to the variance of the observed scores

Basics of Test Score Theory
- Classical test score theory assumes that each person has a true score that would be obtained if there were no errors in measurement; the observed score is the true score plus random measurement error (X = T + E).

Sources of Error
- Errors of measurement are random.
- Standard error of measurement - the basic measure of error; we usually assume that the distribution of random errors will be the same for all persons.

The Domain Sampling Model
- considers the problems created by using a limited number of items to represent a larger and more complicated construct
- uses a sample of items from the domain (for example, a sample of words)
- An observed score may differ from a true score for many reasons (e.g., situational factors).

Three general ways of estimating reliability:
- Test-retest method - considers the consistency of the test results when the test is administered on different occasions
- Parallel forms - evaluates the test across different forms of the test
- Internal consistency - how people perform on similar subsets of items selected from the same form of the measure

Time Sampling: The Test–Retest Method
- used to evaluate the error associated with administering a test at two different times
- applies only to measures of stable traits
- Carryover effect - occurs when the first testing session influences scores from the second session
- The time interval between testing sessions must be selected and evaluated carefully.

Item Sampling: Parallel Forms Method
- involves making sure that the test scores do not represent any one particular set of items or a subset of items from the entire domain
- Parallel forms reliability - compares two equivalent forms of a test that measure the same attribute
- Equivalent (parallel) forms - when two forms of the test are available, one can compare performance on one form versus the other
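Both the test-retest and parallel-forms estimates come down to correlating two sets of scores from the same people. A minimal sketch of that idea, with invented scores and assuming numpy is available:

```python
import numpy as np

# Hypothetical scores for the same five examinees.
form_a = np.array([12, 18, 25, 30, 22])   # first administration / Form A
form_b = np.array([14, 17, 27, 29, 20])   # retest weeks later / Form B

# Test-retest (or parallel-forms) reliability is estimated as the
# Pearson correlation between the two sets of observed scores.
reliability = np.corrcoef(form_a, form_b)[0, 1]
print(f"estimated reliability: {reliability:.2f}")
```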
Split-Half Method
- a test is given and divided into halves that are scored separately; the results of one half of the test are then compared with the results of the other
- Odd-even system - one subscore is obtained for the odd-numbered items in the test and another for the even-numbered items
- Spearman-Brown formula - allows you to estimate what the correlation between the two halves would have been if each half had been the length of the whole test
- Cronbach's coefficient alpha (α) - used when the two halves of a test have unequal variances; remains the most commonly used reliability index

KR20 (Kuder-Richardson 20) Formula
- simultaneously considers all possible ways of splitting the items
- used when items are dichotomous, scored 0 or 1 (usually for right or wrong)
- Difficulty - the percentage of test takers who pass the item
- Formula 21 (KR21) - a special case of the reliability formula that does not require the calculation of the p's and q's for every item; instead it uses an approximation of the sum of the pq products (the mean test score) and assumes that the average item difficulty level is 50%

Coefficient Alpha
- Cronbach developed a more general reliability estimate; coefficient alpha is the most general method of finding estimates of reliability through internal consistency
- Factor analysis - one popular method for dealing with the situation in which a test apparently measures several different characteristics

Reliability of a Difference Score
- Difference score - obtained by subtracting one test score from another

Reliability in Behavioral Observation Studies
- Behavioral observation systems are frequently unreliable because of discrepancies between true scores and the scores recorded by the observer.
- Reliability estimates: interrater, interscorer, interobserver, interjudge

Kappa Statistic
- the best method for assessing the level of agreement among several observers
- introduced by J. Cohen
- a measure of agreement between two judges who each rate a set of objects using nominal scales
- Kappa - the actual agreement as a proportion of the potential agreement following correction for chance agreement
- Values vary between 1 (perfect agreement) and −1 (less agreement than can be expected on the basis of chance alone).

Connecting Sources of Error with Reliability Assessment Method
- Each source of error has a matching assessment method covered above: time sampling (test-retest), item sampling (parallel forms), internal consistency (split-half, KR20, alpha), and observer differences (interrater agreement, kappa).

What to Do about Low Reliability
- Increase the Number of Items - the larger the sample of items, the more likely that the test will represent the true characteristic
- Factor and Item Analysis - the reliability of a test depends on the extent to which all of the items measure one common characteristic
  o Tests are most reliable if they are unidimensional: one factor should account for considerably more of the variance than any other factor.
  o Discriminability analysis - examine the correlation between each item and the total score for the test
- Correction for Attenuation - if a test is unreliable, information obtained with it is of little or no value; we say that potential correlations are attenuated, or diminished, by measurement error

Using Reliability Information

Standard Errors of Measurement and the Rubber Yardstick
- The larger the standard error of measurement, the less certain we can be about the accuracy with which an attribute is measured.
- A small standard error of measurement tells us that an individual score is probably close to the measured value.
- The wider the confidence interval around a score, the lower the reliability of the score.

How Reliable Is Reliable?
- Reliability estimates in the range of .70 to .80 are good enough for most purposes in basic research.
- The most useful index of reliability for the interpretation of individual scores is the standard error of measurement.
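The split-half, KR20/alpha, kappa, and standard-error ideas in this chapter can be pulled together in one short sketch. This is only an illustration of the formulas as stated above, not a scoring routine; the response matrix and all numbers are invented, and numpy is assumed:

```python
import numpy as np

def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

def coefficient_alpha(items):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum(item variances) / variance(total)).
    For dichotomously scored items this reduces to the KR20 formula (item variance = p*q)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * np.sqrt(1 - reliability)

def cohen_kappa(p_observed, p_chance):
    """Kappa: agreement beyond chance as a proportion of possible beyond-chance agreement."""
    return (p_observed - p_chance) / (1 - p_chance)

# Invented 6-person x 4-item matrix of dichotomous (right/wrong) responses.
X = np.array([[1, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 1, 0, 0],
              [1, 1, 0, 1]])

alpha = coefficient_alpha(X)
totals = X.sum(axis=1)
print(f"alpha (KR20)                   = {alpha:.2f}")
print(f"SEM of the total score         = {sem(totals.std(ddof=1), alpha):.2f}")
print(f"split-half r of .70 stepped up = {spearman_brown(0.70):.2f}")
print(f"kappa (80% observed, 50% chance agreement) = {cohen_kappa(0.80, 0.50):.2f}")
```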
PSYCHOLOGICAL ASSESSMENT – CHAPTER 5: VALIDITY

Validity
- the agreement between a test score or measure and the quality it is believed to measure
- "Does the test measure what it is supposed to measure?"
- the evidence for inferences made about a test score
- 3 types of evidence: (1) construct related, (2) criterion related, and (3) content related

Standards
- organized into 3 sections: foundations, operations, and applications
- considers how tests are designed and built and how they are administered, scored, and reported

Aspects of Validity

Face Validity
- the mere appearance that a measure has validity
- it is crucial to have a test that "looks like" it is valid; these appearances can help motivate test takers because they can see that the test is relevant

Content-Related Evidence for Validity
- considers the adequacy of representation of the conceptual domain the test is designed to cover
- content validity evidence has been of greatest concern in educational testing and more recently in tests developed for medical settings
- Construct underrepresentation - the failure to capture important components of a construct
- Construct-irrelevant variance - occurs when scores are influenced by factors irrelevant to the construct

Criterion-Related Evidence for Validity
- Criterion validity evidence - how well a test corresponds with a particular criterion; such evidence is provided by high correlations between a test and a well-defined criterion measure
- Criterion - the standard against which the test is compared

Predictive and Concurrent Evidence
- Predictive validity evidence - the forecasting function of tests
  o Ex. the SAT Critical Reading Test provides predictive validity evidence as a college admissions test
  o SAT = predictor variable; GPA = criterion
- Concurrent validity evidence - applies when the test and the criterion can be measured at the same time

Validity Coefficient
- The relationship between a test and a criterion is usually expressed as a correlation, which tells the extent to which the test is valid for making statements about the criterion.
- Coefficients of .30 to .40 are generally considered adequate.

Evaluating Validity Coefficients
1. Look for Changes in the Cause of Relationships - be aware that the conditions of a validity study are never exactly reproduced
2. What Does the Criterion Mean? - criterion-related validity studies mean nothing at all unless the criterion is valid and reliable
3. Review the Subject Population in the Validity Study - the validity study might have been done on a population that does not represent the group to which inferences will be made
4. Be Sure the Sample Size Was Adequate - be cautious about a validity coefficient that is based on a small number of cases
5. Never Confuse the Criterion with the Predictor
6. Check for Restricted Range on Both Predictor and Criterion - a variable has a "restricted range" if all scores for that variable fall very close together (see the sketch after this list)
7. Review Evidence for Validity Generalization - criterion-related validity evidence obtained in one situation may not generalize to other similar situations
  o Generalizability - the evidence that the findings obtained in one situation can be generalized (i.e., applied) to other situations
8. Consider Differential Prediction - predictive relationships may not be the same for all demographic groups
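A validity coefficient is simply the predictor-criterion correlation, and point 6 above (restricted range) can be seen by recomputing it on a truncated sample. A rough simulation sketch; all values and variable names are invented, and numpy is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 500 applicants: an admissions-test predictor and a criterion
# (e.g., later GPA) that is moderately related to it plus noise.
predictor = rng.normal(500, 100, size=500)
criterion = 0.004 * predictor + rng.normal(0, 0.5, size=500)

def validity(x, y):
    """Validity coefficient = Pearson correlation of predictor with criterion."""
    return np.corrcoef(x, y)[0, 1]

print(f"full applicant pool:    r = {validity(predictor, criterion):.2f}")

# Restriction of range: if only admitted students (top scorers) are studied,
# the predictor varies less and the observed validity coefficient shrinks.
admitted = predictor > np.percentile(predictor, 75)
print(f"admitted students only: r = {validity(predictor[admitted], criterion[admitted]):.2f}")
```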
Construct-Related Evidence for Validity
- Construct - something built by mental synthesis
- Construct validity evidence - established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it
- Construct validation - involves assembling evidence about what a test means; this is done by showing the relationship between a test and other tests and measures

Convergent Evidence
- obtained when a measure correlates well with other tests believed to measure the same construct
- measures of the same construct converge, or narrow in, on the same thing

Discriminant Evidence (Divergent Validation)
- a demonstration of uniqueness: shows that the measure taps something other than the tests used in the convergent evidence studies
- a test should have low correlations with measures of unrelated constructs; this is evidence for what the test does not measure

Criterion-Referenced Tests
- have items that are designed to match certain specific instructional objectives
- Validity studies for criterion-referenced tests compare scores on the test to scores on other measures that are believed to be related to the test.

Relationship between Reliability and Validity
- Attempting to define the validity of a test will be futile if the test is not reliable.
- We can have reliability without validity.
- It is logically impossible to demonstrate that an unreliable test is valid.
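One standard way to express this dependence, supplementing the notes above, is the correction for attenuation mentioned in Chapter 4: it estimates the correlation that would be observed if both the test and the criterion were perfectly reliable, and it implies that an observed validity coefficient cannot exceed the square root of the product of the two reliabilities.

```latex
% r_{xy}: observed validity coefficient; r_{xx}, r_{yy}: reliabilities of test and criterion
\[
  \hat{r}_{xy} \;=\; \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
  \qquad\Longrightarrow\qquad
  r_{xy} \;\le\; \sqrt{r_{xx}\, r_{yy}}
\]
```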