Correlation and Regression
Quantitative Methods in HPELS
440:210

Agenda
  Introduction
  The Pearson Correlation
  Hypothesis Tests with the Pearson Correlation
  Regression
  Instat
  Nonparametric versions

Introduction
  Correlation: statistical technique used to measure and describe a relationship between two variables
  Direction of relationship: positive or negative
  Form of relationship: linear, quadratic, ...
  Degree of relationship: ranges from -1.0 through 0.0 to +1.0

Uses of Correlations
  Prediction
  Validity
  Reliability

The Pearson Correlation

Statistical Notation
  Recall from ANOVA:
  r = Pearson correlation
  SP = sum of products of deviations
  Mx = mean of X scores
  SSx = sum of squares of X scores

Pearson Correlation: Formula Considerations
  Recall from ANOVA:
  SP = Σ(X - Mx)(Y - My)      (definitional formula)
  SP = ΣXY - (ΣX)(ΣY)/n       (computational formula)
  SSx = Σ(X - Mx)²
  SSy = Σ(Y - My)²
  r = SP / √(SSx · SSy)

Pearson Correlation
  Step 1: Calculate SP
  Step 2: Calculate SS for X and Y values
  Step 3: Calculate r

Step 1: SP
  Computational formula (ΣX = 30, ΣY = 10, n = 5):
    ΣXY = (0*1) + (10*3) + (4*1) + (8*2) + (8*3)
    ΣXY = 0 + 30 + 4 + 16 + 24
    ΣXY = 74
    SP = ΣXY - (ΣX)(ΣY)/n
    SP = 74 - [30(10)]/5
    SP = 74 - 60
    SP = 14
  Definitional formula (Mx = 6, My = 2):
    SP = Σ(X - Mx)(Y - My)
    SP = (-6*-1) + (4*1) + (-2*-1) + (2*0) + (2*1)
    SP = 6 + 4 + 2 + 0 + 2
    SP = 14

Step 2: SSx and SSy
  SSx = (-6)² + (4)² + (-2)² + (2)² + (2)² = 64
  SSy = (-1)² + (1)² + (-1)² + (0)² + (1)² = 4

Step 3: r
  r = SP / √(SSx · SSy)
  r = 14 / √((64)(4))
  r = 14 / √256
  r = 14 / 16
  r = 0.875

Interpretation of r
  Correlation ≠ causality
  Restricted range: be wary if the data do not represent the full range of scores
  Outliers can have a dramatic effect (Figure 16.9)
  Correlation and variability: coefficient of determination (r²)

Hypothesis Tests with the Pearson Correlation

The Process
  Step 1: State hypotheses
    Non-directional:
      H0: ρ = 0 (no population correlation)
      H1: ρ ≠ 0 (population correlation exists)
    Directional:
      H0: ρ ≤ 0 (no positive population correlation)
      H1: ρ > 0 (positive population correlation exists)
  Step 2: Set criteria
    α = 0.05
  Step 3: Collect data and calculate statistic
    r
  Step 4: Make decision
    Accept or reject H0

Example
  Researchers are interested in determining whether leg strength is related to jumping ability.
  Researchers measure leg strength with 1RM squat (lbs) and vertical jump height (inches) in 5 subjects (n = 5).

Step 1: State Hypotheses
  Non-directional
  H0: ρ = 0
  H1: ρ ≠ 0

Step 2: Set Criteria
  Alpha (α) = 0.05
  Critical value: use the Critical Values for Pearson Correlation table, Appendix B.6 (p 697)
  Information needed:
    df = n - 2 = 3
    Alpha (α) = 0.05
    Directional or non-directional?
  Critical value = 0.878

Step 3: Collect Data and Calculate Statistic
  Data (X = 1RM squat in lbs, Y = vertical jump in inches):

      X      Y      XY
      200    25     5000
      180    22     3960
      225    27     6075
      300    27     8100
      160    25     4000
    Σ 1065   126    27135

  Calculate SP
    SP = ΣXY - (ΣX)(ΣY)/n
    SP = 27135 - [1065(126)]/5
    SP = 27135 - 26838
    SP = 297

  Calculate SSx (Mx = 213)

      X      X-Mx   (X-Mx)²
      200    -13    169
      180    -33    1089
      225    12     144
      300    87     7569
      160    -53    2809
    Σ               11780

  Calculate SSy (My = 25.2)

      Y      Y-My   (Y-My)²
      25     -0.2   0.04
      22     -3.2   10.24
      27     1.8    3.24
      27     1.8    3.24
      25     -0.2   0.04
    Σ               16.8

  Calculate r
    r = SP / √(SSx · SSy)
    r = 297 / √(11780(16.8))
    r = 297 / √197904
    r = 297 / 444.86
    r = 0.667

Step 4: Make Decision
  0.667 < 0.878
  Accept or reject H0?

Regression

Recall
  Several uses of correlation:
    Prediction
    Validity
    Reliability
  Regression attempts to predict one variable based on information about the other variable
  Line of best fit

Regression
  The line of best fit can be described with the following linear equation:
    Y = bX + a
  where:
    Y = predicted Y value
    b = slope of line
    X = any X value
    a = intercept

  Example: Y = bX + a, where:
    Y = cost (?)
    b = cost per hour ($5)
    X = number of hours (?)
    a = membership cost ($25)
  Y = 5X + 25
    For X = 10: Y = 5(10) + 25 = 50 + 25 = 75
    For X = 30: Y = 5(30) + 25 = 150 + 25 = 175

  The line of best fit minimizes the distances of the points from the line

Calculation of the Regression Line
  Regression line = line of best fit = linear equation
  SP = Σ(X - Mx)(Y - My)
  SSx = Σ(X - Mx)²
  b = SP / SSx
  a = My - bMx

Example 16.14, p 557
  Mx = 5, My = 6
  SP = Σ(X - Mx)(Y - My) = 16
  SSx = Σ(X - Mx)² = 10
  b = SP / SSx = 16 / 10 = 1.6
  a = My - bMx = 6 - 1.6(5) = -2
  Y = bX + a
  Y = 1.6(X) - 2

Instat

Instat - Correlation
  Type data from sample into a column.
  Label column appropriately:
    Choose “Manage”
    Choose “Column Properties”
    Choose “Name”
  Choose “Statistics”
  Choose “Regression”
  Choose “Correlation”
  Choose the appropriate variables to be correlated
  Click OK
  Interpret the p-value

Instat - Regression
  Type data from sample into a column.
  Label column appropriately:
    Choose “Manage”
    Choose “Column Properties”
    Choose “Name”
  Choose “Statistics”
  Choose “Regression”
  Choose “Simple”
  Choose appropriate variables for:
    Response (Y)
    Explanatory (X)
  Check “significance test”
  Check “ANOVA table”
  Check “Plots”
  Click OK
  Interpret the p-value

Reporting Correlation Results
  Information to include:
    Value of the r statistic
    Sample size
    p-value
  Example:
    A correlation of the data revealed that strength and jumping ability were not significantly related (r = 0.667, n = 5, p > 0.05)
  Correlation matrices are used when interrelationships of several variables are tested (Table 1, p 541)

Nonparametric Versions
  Spearman rho: when at least one of the data sets is ordinal
  Point biserial correlation: when one set of data is ratio/interval and the other is dichotomous
    Male vs. female
    Success vs. failure
  Phi coefficient: when both data sets are dichotomous

Violation of Assumptions
  Nonparametric Version: Friedman Test (Not covered)
  When to use the Friedman Test:
    Related-samples design with three or more groups
    Scale of measurement assumption violation: ordinal data
    Normality assumption violation: regardless of scale of measurement

Textbook Assignment
  Problems: 5, 7, 10, 23 (with post hoc)
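
Checking the hand calculations in code
  The hand calculations in these slides (SP, SS, r, the comparison against the critical value, and the regression slope and intercept) can be checked with a short script. The Python sketch below is only an illustration and is not part of the course materials or Instat. The function and variable names are hypothetical, the first data set is read off the ΣXY terms in the Pearson example, and the critical value 0.878 is taken from the slides rather than computed.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean (M) of a list of scores."""
    return sum(values) / len(values)


def sum_of_products(x, y):
    """SP, definitional formula: sum of (X - Mx)(Y - My)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))


def sum_of_products_computational(x, y):
    """SP, computational formula: sum(XY) - sum(X) * sum(Y) / n."""
    n = len(x)
    return sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n


def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def pearson_r(x, y):
    """r = SP / sqrt(SSx * SSy)."""
    return sum_of_products(x, y) / sqrt(sum_of_squares(x) * sum_of_squares(y))


def slope_intercept(sp, ssx, mx, my):
    """Regression line Y = bX + a, with b = SP / SSx and a = My - b * Mx."""
    b = sp / ssx
    a = my - b * mx
    return b, a


# First worked example: scores read off the terms of the sum of XY.
x1 = [0, 10, 4, 8, 8]
y1 = [1, 3, 1, 2, 3]
r1 = pearson_r(x1, y1)  # 14 / sqrt(64 * 4) = 0.875

# Leg strength example: 1RM squat (lbs) and vertical jump (inches).
squat = [200, 180, 225, 300, 160]
jump = [25, 22, 27, 27, 25]
r2 = pearson_r(squat, jump)

# Critical value copied from the slides (df = 3, alpha = 0.05,
# non-directional); it is looked up in a table, not computed here.
CRITICAL_R = 0.878
reject_h0 = abs(r2) > CRITICAL_R

# Regression line from the summary values of Example 16.14.
b, a = slope_intercept(sp=16, ssx=10, mx=5, my=6)

print(f"Example 1: r = {r1:.3f}")
print(f"Leg strength: r = {r2:.4f}, reject H0: {reject_h0}")
print(f"Regression line: Y = {b}X + ({a})")
```

  Both SP formulas are included so that the definitional and computational results can be compared on the same data, as in Step 1 of the Pearson example. The script does not produce the p-value that Instat reports.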