FINAL TERM REVIEWER

Reliability
- Deals with the consistency of a measure, regardless of exactly what it is measuring.

Validity
- Deals with what a test measures, specifically whether it measures what it intends to measure.

Classical Test Theory
- Observed score = True score + Error (X = T + E).
- Total variance = true variance + error variance.
- Observed score: the score the individual actually obtained.
- True score: the portion of the observed score that reflects whatever ability, trait, or characteristic the test assesses.
- Error: the component of the observed score that does not reflect the test taker's true ability or trait being measured.
- Two types of error:
  - Random error: caused by unpredictable fluctuations and inconsistencies of other variables in the measurement.
  - Systematic error: typically constant, or proportionate to what is presumed to be the true value of the variable being measured.

Sources of Variance
- Test construction/test content: variation within items on a test or between tests.
- Test administration/test environment: test-taker variables and examiner-related variables.
- Test scoring and interpretation: scorers or raters differ in subjectivity.

Reliability Estimates
- Used to estimate how much of the observed-score variance is error variance.
- Test-retest reliability: the same test administered at two different times (e.g., 6 months apart).
  - Must consider a suitable interval, the test taker's motivation, practice effects, and newly acquired skills or learning.
- Alternate-forms reliability: two tests administered at different times (e.g., 2 to 3 weeks apart).
  - The two versions of the test have been constructed to be parallel.
- Internal consistency estimates: the test is administered only once.
  - Assess the internal consistency of the test items.
  - Methods:
    - Split-half: one test is divided into two equivalent halves, the half-scores are correlated, and the Spearman-Brown formula steps the correlation up to full test length.
    - Kuder-Richardson formula 20 (KR-20): determines the inter-item consistency of dichotomous items.
    - Coefficient alpha: the mean of all possible split-half correlations, corrected by the Spearman-Brown formula.
    - Average Proportional Distance (APD): the degree of difference between scores on test items.
- Inter-scorer reliability: the degree of agreement or consistency between two or more scorers with regard to a particular measure.

Item Response Theory
Generalizability Theory

Validity
- How accurately a test measures what it claims to measure.
- Ensures that a test is genuinely assessing what it intends to assess.

Assumption 1:
- Validity is a characteristic of the test itself, not just of how its scores are interpreted.
- Problem: this holds only as long as the validation data support the intended purpose and the test is used with the specified population.

*Insert Assumption 2*

Assumption 3:
- The accuracy of test scores depends, to some extent, on the test author's understanding of the constructs they intend to measure.

Problems with the Classical Definition:
- Assumptions 1 and 2 are only justified for tests that clearly link behavior to psychological constructs.
- Can lead to confusing measurement consistency (reliability) with validity.
- May also cause misinterpretation of test titles, attaching validity to what tests claim to measure rather than to the actual scores.

Other Definitions of Validity:
- Cohen: a judgment or estimate of how well a test measures in a specific context.
- Urbina: the degree to which accumulated evidence supports the intended interpretation of test scores.
- Domino & Domino: an integrated judgment of the adequacy and appropriateness of interpretations and actions based on assessment measures.
- The Standards for Educational and Psychological Testing: the degree to which evidence and theory support interpretations of test scores for proposed uses.

Validation:
- The process of gathering and evaluating evidence about the accuracy of a test.
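Looping back to the reliability section: the internal-consistency estimates listed there (split-half with the Spearman-Brown correction, KR-20, and coefficient alpha) can be sketched numerically. The tiny matrix of dichotomous responses and the `pearson` helper below are illustrative inventions, not part of the reviewer:

```python
# Illustrative sketch only: split-half reliability with the Spearman-Brown
# correction, KR-20, and coefficient alpha, computed on a small made-up
# matrix of dichotomous (0/1) item responses.
import statistics

# Rows = test takers, columns = items (hypothetical data).
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Split-half: correlate odd-item totals with even-item totals, then step
# the correlation up to full test length with Spearman-Brown:
#   r_full = 2 * r_half / (1 + r_half)
odd = [row[0] + row[2] for row in scores]
even = [row[1] + row[3] for row in scores]
r_half = pearson(odd, even)
r_full = 2 * r_half / (1 + r_half)

# KR-20 (dichotomous items): (k / (k - 1)) * (1 - sum(p*q) / total variance)
k = len(scores[0])
totals = [sum(row) for row in scores]
var_total = statistics.pvariance(totals)
sum_pq = 0.0
for item in zip(*scores):          # iterate over item columns
    p = sum(item) / len(item)      # proportion answering correctly
    sum_pq += p * (1 - p)
kr20 = (k / (k - 1)) * (1 - sum_pq / var_total)

# Coefficient alpha generalizes KR-20 beyond dichotomous items:
#   alpha = (k / (k - 1)) * (1 - sum(item variances) / total variance)
sum_item_var = sum(statistics.pvariance(item) for item in zip(*scores))
alpha = (k / (k - 1)) * (1 - sum_item_var / var_total)

print(round(r_full, 3), round(kr20, 3), round(alpha, 3))  # → 0.89 0.79 0.79
```

Because the variance of a dichotomous item equals p(1 - p), coefficient alpha and KR-20 coincide on 0/1 data; alpha is simply the more general formula.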
Validity vs. Validation:
- Validity refers to assumptions about what is being measured.
- Validation refers to assumptions about how it should be measured.

Factors Impacting Validity:
- Internal validity: ensures that the tested variables are not influenced by other factors.
- External validity: determines the degree of confidence in applying test results to broader contexts.

Categories of Validity:

1. Content Validity:
- Based on evaluating the subjects, topics, or content covered by the test items.
- Deals with the relationship between test content and a well-defined domain of knowledge or behavior.
- A judgment of how well a test samples behavior representative of its intended universe.
- Lawshe method: raters judge each item as to whether it is
  - Essential
  - Useful but not essential
  - Not necessary
- Content validity methods:
  - Expert judgment
  - Content Validity Index (CVI)
  - Delphi technique
  - Cognitive interviews
  - Pilot testing
  - Literature review
  - Focus groups

2. Criterion-Related Validity:
- Involves a standard against which a test or test score is evaluated.
- Expresses the relationship between test scores and status on another criterion reflecting the construct of interest.
- Characteristics of an adequate criterion:
  - Relevant
  - Valid for the intended purpose
  - Uncontaminated
- Can be determined concurrently or predictively:
  - Concurrent validity: how well a test score relates to a criterion measured at the same time.
  - Predictive validity: how well a test score predicts a criterion or outcome measured in the future.
- Validity coefficient: measures the relationship between test scores and the criterion measure.
- Incremental validity: assesses how much an additional predictor explains beyond the existing predictors.
- Expectancy data: provide information for evaluating criterion-related validity.
- Criterion contamination: refers to undesirable situations where test scores unfairly influence the criterion.
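Two of the quantities above can be made concrete with a short worked sketch: the content validity ratio from Lawshe's method for a single item, and a validity coefficient computed as the correlation between test scores and a criterion measure. The panel size and score values below are hypothetical:

```python
# Illustrative sketch only, with hypothetical numbers.
import statistics

# Lawshe's content validity ratio (CVR) for one item:
#   CVR = (n_essential - N/2) / (N/2), where N is the number of panelists.
# Ranges from -1 (no panelist rates the item "essential") to +1 (all do).
n_panelists = 10
n_essential = 8          # panelists who rated the item "essential"
cvr = (n_essential - n_panelists / 2) / (n_panelists / 2)  # → 0.6

# Validity coefficient: correlation between test scores and a criterion
# (e.g., performance ratings gathered concurrently or later).
test_scores = [50, 60, 70, 80, 90]
criterion = [2, 3, 3, 4, 5]

mx, my = statistics.mean(test_scores), statistics.mean(criterion)
num = sum((a - mx) * (b - my) for a, b in zip(test_scores, criterion))
den = (sum((a - mx) ** 2 for a in test_scores)
       * sum((b - my) ** 2 for b in criterion)) ** 0.5
validity_coefficient = num / den   # ≈ 0.97 in this toy data
```

A high coefficient here would support concurrent validity if the criterion was measured at the same time as the test, or predictive validity if it was measured later.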
3. Construct Validity:
- Involves the test's ability to measure a theorized construct.
- Constructs: informed, scientific ideas developed to describe or explain behavior.
- High and low scorers should behave as theorized if the test is a valid measure of the construct.
- All types of validity evidence fall under the umbrella of construct validity.
- Evidence of construct validity:
  - Homogeneity: the extent to which a test consistently measures a single concept.
  - Changes with age: some constructs are expected to change over time.
  - Pretest/posttest changes: test scores change due to experiences between a pretest and a posttest.
  - Distinct group differences: scores on the test vary predictably based on membership in a specific group.
  - Convergent evidence: the test correlates highly with existing tests measuring the same trait.
  - Discriminant evidence: test scores do not correlate strongly with unrelated variables; the test discriminates between different constructs.
  - Factor analysis: a new test should load on a common factor with other tests measuring the same construct.
    - Exploratory factor analysis: uncovers the underlying structure of observed variables.
    - Confirmatory factor analysis: validates pre-existing theories by testing whether observed variables align with the hypothesized factor structure.
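The convergent and discriminant evidence bullets above lend themselves to a small numeric illustration (all scores fabricated): a new test should correlate highly with an established measure of the same trait and only weakly with a variable the construct is not supposed to predict:

```python
# Illustrative sketch with fabricated scores: convergent evidence is a high
# correlation with an established measure of the same trait; discriminant
# evidence is a low correlation with an unrelated variable.
import statistics

new_test      = [10, 12, 14, 16, 18]   # scores on the new test
same_trait    = [11, 13, 13, 17, 19]   # established test of the same construct
unrelated_var = [5, 2, 9, 1, 7]        # variable the construct should not predict

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

r_convergent = pearson(new_test, same_trait)       # high → convergent evidence
r_discriminant = pearson(new_test, unrelated_var)  # near zero → discriminant evidence
```

Factor analysis pushes the same logic further: instead of inspecting correlations pairwise, it asks whether all measures of one construct load together on a common factor.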