Correlation & Prediction REVIEW • Correlation • Bivariate Direct/Indirect Cause/Effect • Strength of relationships (is + stronger than negative?) • Coefficient of determination (r2); Predicts what? • Linear vs Curvilinear relationships Inferential Statistics Used to infer sample characteristics to a population Table 5-2 Variable Classification Independent Dependent Presumed cause The antecedent Manipulated/measured by researcher Predicted from Predictor X Presumed effect The consequence Outcome (measured) Predicted to Criterion Y Common Statistical Tests • Chi-Square Determine association between two nominally scaled variables. • Independent t-test Determine differences in one continuous DV between ONLY two groups. • Dependent t-test Compare 2 related (paired) groups on one continuous DV. • One-Way ANOVA Examine group differences between 1 continuous DV & 1 nominal IV. Can handle more than two groups of data. What Analysis? IV DV Statistical Test 1 Nominal 1 Nominal Chi-Square 1 Nominal (2 groups) 1 Nominal (>2 groups) 1 continuous t-test 1 continuous One-Way ANOVA Some Examples • Chi-Square Gender and knee injuries in collegiate basketball players (Q angle) • Independent t-test Differences in girls and boys (independent groups; mutually exclusive) on PACER laps • Dependent t-test Pre and Post measurement of same group or matched pairs (siblings) on number of push-ups completed • One-Way ANOVA Major (AT, ES, PETE; IV >2 levels) and pre-test grade in this class Norm-Referenced Measurement HPHE 3150 Dr. Ayers Topics for Discussion • Reliability (variance & PPM correlation support reliability & validity) • Consistency • Repeatability • Validity • Truthfulness • Objectivity • Inter-rater reliability • Relevance • Degree to which a test pertains to its objectives Reliability Observed, Error, and True Scores Observed Score = True Score + Error Score ALL scores have true and error portions True scores are impossible to measure 2 O b s e r v e d S c o r e V a r i a n c e = S o 2 E r r o r S c o r e V a r i a n c e = S e 2 T r u e S c o r e V a r i a n c e = S t Reliability THIS IS HUGE!!!! rxx ' S S 2 2 true observed S 2 observed S 2 S 2 error observed Reliability is that proportion of observed score variance that is true score variance TIP: use algebra to move S2t to stand alone as shown in formula above (subtract S2e from both sides of equation ) S2 o = S2 t + S 2 e •Desirable reliability > .80 •There is variation in observed, true & error scores •Error can be +(↑ observed scores) or –(↓ observed scores) •Error scores contribute little to observed variation •Error score mean is 0 •S2o = S2t + S2e •Validity depends on reliability and relevance •Observed variance is necessary •Generally, longer tests are more reliable (fosters variance) Table 6-1 Systolic Blood Pressure Recordings for 10 Subjects Subject 1 2 3 4 5 6 7 8 9 10 Sum (S) Mean (M) Variance (S2) S Observed BP 103 117 116 123 127 125 135 126 133 145 1250 125.0 133.6 11.6 = True BP 105 115 120 125 125 125 125 130 135 145 1250 125.0 116.7 + Error BP -2 +2 -4 -2 +2 0 +10 -4 -2 0 0 0 16.9╣ 10.8 4.1 ╣ Se is square root of S2e Reliability Coefficients • Interclass Reliability • Correlates 2 trials • Intraclass Reliability • Correlates >2 trials Interclass Reliability (Pearson Product Moment) • Test Retest (administer test 2x & correlate scores) • See Excel document (Norm-ref msmt examples) • Time, fatigue, practice effect • Equivalence (create 2 “equivalent” test forms) • Odd/Even test items on a single test • Addresses most of the test/retest issues • Reduces test size 50% (not desirable); longer tests are > reliable • Split Halves • Spearman-Brown prophecy formula Index of Reliability rxx ' The theoretical correlation between observed scores and true scores High I of R = low error Square root of the reliability coefficient If r=.81, I of R=.9 Compared to the Coefficient of Determination: r2 (shared variance) I of R vs C of Det. If r=.81 I of R =? C of Det=? Reliability So What? Find a friend and talk about: 1 thing you “got” today 1 thing you “missed” today; can they help? Reliability REVIEW • Inferential • Infer sample findings to entire population • Chi Square (2 nominal variables) • t-test (1 nominal variable for 2 groups, 1 continuous) • ANOVA (1 nominal variable for 2+ groups, 1 continuous) • Correlation • Are two variables related? • What happens to Y when X changes? • Linear relationship between two variables • Quantifies the RELIABILITY & VALIDITY of a test or measurement • Reliability (0-1; .80+ goal) • All scores: observed = true + error • rxx=S2t/S2o • proportion of observed score variance that is true score variance • Interclass reliability coefficients (correlates 2 trials) • Test/retest time, fatigue, practice effect • Equivalent reduces test length by 50% • Split-halves • Index of Reliability • Tells you what? • Related to C of D how? rxx ' Standard Error of Measurement RELIABILITY MEASURE SEM S 1 rxx ' S=standard deviation of the test rxx’=reliability of the test Reflects the degree to which a person's observed score fluctuates as a result of measurement errors EXAMPLE: Test standard deviation=100 SEM = 100 =100(.16) =100(.4) =40 1 .84 r=.84 SEM is the standard deviation of the measurement errors around an observed score EXAMPLE: Average test score=500 SEM=40 68% of all scores should fall between 460-540 (500+40) 95% of all scores range between: ? 420-580 Standard Error of Estimate (reflects accuracy of estimating a score on the criterion measure) VALIDITY MEASURE Standard Error Standard Error of Prediction SEE S 1 r 2 xy Standard Errors both are standard deviations SE of Measurement (reliability) SEM S 1 rxx ' SE of Estimate (criterion-related validity) SEE S 1 r 2 xy Factors Affecting Test Reliability 1) 2) 3) 4) 5) 6) 7) 8) Fatigue ↓ Practice ↑ Subject variability homogeneous ↓, heterogeneous ↑ Time between testing more time= ↓ Circumstances surrounding the testing periods change=↓ Test difficulty too hard/easy= ↓ Precision of measurement precise= ↑ Environmental conditions change=↓ SO WHAT? A test must first be reliable to be valid Validity Types THIS SLIDE IS HUGE!!!! • Content-Related Validity (a.k.a., face validity) • Should represent knowledge to be learned • Criterion for content validity rests w/ interpreter • Use “experts” to establish • Criterion-Related Validity • Test has a statistical relationship w/ trait measured • Alternative measures validated w/ criterion measure • Concurrent: criterion/alternate measured same time • Predictive: criterion measured in future • Construct-Related Validity • Validates theoretical measures that are unobservable Methods of Obtaining a Criterion Measure • Actual participation (game play) • Skills tests, expert judges • Perform the criterion (treadmill test) • Distance runs, sub-maximal swim, run, cycle • Heart disease (developed later in life) • Present diet, behaviors, BP, family history • Success in grad school • GRE scores, UG GPA Interpreting the “r” you obtain THIS IS HUGE!!!! Correlation Matrix for Development of a Golf Skill Test (From Green et al., 1987) Playing golf Long putt Chip shot Pitch shot Middle distance shot Playing golf 1.00 Long putt .59 1.00 Chip shot .58 .47 1.00 Pitch shot .54 .37 .35 1.00 Middle distance shot .66 .55 .61 .40 1.00 Drive -.65 -.62 -.48 -.52 -.79 Drive What are these? Concurrent Validity coefficients 1.00 Interpret these correlations Actual golf score Criterion Putting Trial 1 Putting Trial 2 Actual golf score 1.00 Putting T1 .78 1.00 Putting T2 .74 .83 1.00 Driving T1 .58 .21 .25 Driving T2 .68 .25 .30 Observer 1 .48 .34 .40 Observer 2 .39 .30 .41 Driving Trial 1 Driving Trial 2 Observer Observer 1 2 What are these? 1.00 Concurrent .70 1.00 Validity coefficients .43 .38 .47 .35 1.00 .50 1.00 Interpret these correlations Actual golf score Putting Trial 1 Actual golf score 1.00 Putting T1 .78 1.00 Putting T2 .74 .83 Putting Trial 2 Driving Trial 1 Driving Trial 2 Observer Observer 1 2 What are these? 1.00 Reliability coefficients Driving T1 .58 .21 .25 1.00 Driving T2 .68 .25 .30 .70 1.00 Observer 1 .48 .34 .40 .43 .38 1.00 Observer 2 .39 .30 .41 .47 .35 .50 1.00 Interpret these correlations Actual golf score Actual golf score 1.00 Putting T1 .78 Putting Trial 1 Putting Trial 2 Driving Trial 1 Driving Trial 2 Observer Observer 1 2 1.00 What is this? Putting T2 .74 .83 1.00 Driving T1 .58 .21 .25 1.00 Driving T2 .68 .25 .30 .70 1.00 Objectivity coefficient Observer 1 .48 .34 .40 .43 .38 1.00 Observer 2 .39 .30 .41 .47 .35 .50 1.00 Concurrent Validity This square represents variance in performance in a skill (e.g., golf) Concurrent Validity The different colors and patterns represent different parts of a skills test battery to measure the criterion (e.g., golf) Concurrent Validity Error The orange color represents ERROR or unexplained variance in the criterion (e.g., golf) Remember: ↑error = ↓ validity Concurrent Validity A B C D Consider the Concurrent validity of the above 4 possible skills test batteries Concurrent Validity D – it has the MOST error and requires 4 tests to be administered A B C D Which test battery would you be LEAST likely to use? Why? Concurrent Validity C – it has the LEAST error but it requires 3 tests to be administered A B C Which test battery would you be MOST likely to use? Why? D Concurrent Validity A or B – requires 1 or 2 tests to be administered but you lose some validity A B C Which test battery would you use if you are limited in time? D