Overview • Correlation -Definition -Deviation Score Formula, Z score formula -Hypothesis Test • Regression -Intercept and Slope -Unstandardized Regression Line -Standardized Regression Line -Hypothesis Tests Associations among Continuous Variables 3 characteristics of a relationship Direction Positive(+) Negative (-) Degree of association Between –1 and 1 Absolute values signify strength Form Linear Non-linear Direction Positive Negative C1 vs C2 C1 vs C2 120.0 20.0 80.0 C2 C2 13.3 40.0 6.7 0.0 0.0 4.0 8.0 12.0 C1 0.0 0.0 83.3 166.7 250.0 C1 Large values of X = large values of Y, Small values of X = small values of Y. Large values of X = small values of Y Small values of X = large values of Y - e.g. IQ and SAT -e.g. SPEED and ACCURACY Degree of association Strong (tight cloud) Weak (diffuse cloud) C1 vs C2 C1 vs C2 120.0 20.0 80.0 C2 C2 13.3 40.0 6.7 0.0 0.0 4.0 8.0 C1 12.0 0.0 0.0 4.0 8.0 C1 12.0 Form Linear Non- linear Regression & Correlation The 2001 Mets 250 225 Weight 200 in Pounds 175 150 67 68 69 70 71 72 73 74 75 76 77 Height in Inches What is the best fitting straight line? Regression Equation: Y = a + bX How closely are the points clustered around the line? Pearson’s R Correlation Correlation - Definition Correlation: a statistical technique that measures and describes the degree of linear relationship between two variables Scatterplot Obs A B C D E F X 1 1 3 4 6 7 Y 1 3 2 5 4 5 Y Dataset X Pearson’s r A value ranging from -1.00 to 1.00 indicating the strength and direction of the linear relationship. Absolute value indicates strength +/- indicates direction The Logic of Correlation MEAN of X MEAN of Y Below average on X Above average on X Above average on Y Above average on Y Below average on X Above average on X Below average on Y Below average on Y Cross-Product = ( X X )(Y Y ) For a strong positive association, the cross-products will mostly be positive The Logic of Correlation MEAN of X MEAN of Y Below average on X Above average on X Above average on Y Above average on Y Below average on X Above average on X Below average on Y Below average on Y Cross-Product = ( X X )(Y Y ) For a strong negative association, the cross-products will mostly be negative The Logic of Correlation MEAN of X MEAN of Y Below average on X Above average on X Above average on Y Above average on Y Below average on X Above average on X Below average on Y Below average on Y Cross-Product = ( X X )(Y Y ) For a weak association, the cross-products will be mixed Pearson’s r covariatio n of X, Y r total variation of X, Y Deviation score formula SP r SSXSSY SP (sum of products) =∑ (X – X)(Y – Y) Deviation Score Formula Femur Humerus ( X X ) A 38 41 B 56 63 C 59 70 D 64 72 E 74 84 mean 58.2 66.00 (Y Y ) ( X X )2 SSX SP r SSXSSY (Y Y )2 ( X X )(Y Y ) SSY SP Deviation Score Formula Femur Humerus ( X X ) A 38 41 -20.2 B 56 63 -2.2 C 59 70 0.8 D 64 72 5.8 E 74 84 15.8 mean 58.2 66.00 SP r SSXSSY = .99 (Y Y ) ( X X )2 (Y Y )2 ( X X )(Y Y ) -25 408.04 4.84 -3 .64 4 625 9 16 505 6.6 3.2 6 33.64 18 249.64 696.8 SSX 36 324 1010 SSY 34.8 284.4 834 SP Pearson’s r Below average on X Above average on X Below Average on Y Above average on Y Below average on X Above average on X Below average on Y Below average on Y Deviation score formula SP r SSXSSY For a strong positive association, the SP will be a big positive number SP (sum of products) =∑ (X – X)(Y – Y) Pearson’s r Below average on X Above average on X Below Average on Y Above average on Y Below average on X Above average on X Below average on Y Below average on Y Deviation score formula SP r SSXSSY For a strong negative association, the SP will be a big negative number SP (sum of products) =∑ (X – X)(Y – Y) Pearson’s r Below average on X Above average on X Below Average on Y Above average on Y Below average on X Above average on X Below average on Y Below average on Y Deviation score formula SP r SSXSSY For a weak association, the SP will be a small number (+ and – will cancel each other out) SP (sum of products) =∑ (X – X)(Y – Y) Pearson’s r covariatio n of X, Y r total variation of X, Y Z score formula Z r X Zy n -1 Standardized cross-products Z-score formula Femur Humerus A 38 41 B 56 63 C 59 70 D 64 72 E 74 84 mean 58.2 66.00 s 13.20 15.89 Z r X Zy n -1 ZX ZY ZXZY Z-score formula Femur Humerus ZX ZY A 38 41 -1.530 -1.573 B 56 63 -0.167 -0.189 C 59 70 0.061 0.252 D 64 72 0.439 0.378 E 74 84 1.197 1.133 mean 58.2 66.00 s 13.20 15.89 Z r X Zy n -1 ZXZY Z-score formula Femur Humerus ZX ZY ZXZY A 38 41 -1.530 -1.573 2.408 B 56 63 -0.167 -0.189 0.031 C 59 70 0.061 0.252 0.015 D 64 72 0.439 0.378 0.166 E 74 84 1.197 1.133 1.356 mean 58.2 66.00 ∑=3.976 s 13.20 15.89 Z r X Zy n -1 r = .99 Formulas for R Deviations formula SP r SS X SSY Z score formula Z r X Zy n -1 Interpretation of R • A measure of strength of association: how closely do the points cluster around a line? •A measure of the direction of association: is it positive or negative? Interpretation of R • r = .10 very small association, not usually reliable • r = .20 small association • r = .30 typical size for personality and social studies • r = .40 moderate association • r = .60 you are a research rock star • r = .80 hmm, are you for real? Interpretation of R-squared r 2 • The amount of covariation compared to the amount of total variation • “The percent of total variance that is shared variance” • E.g. “If r = .80, then X explains 64% of the variability in Y” (and vice versa) Hypothesis testing with r Hypotheses H0: = 0 HA : ≠ 0 Test statistic = r r t SEr 1 r 2 SEr n2 df n 2 Or just use table E.2 to find critical values of r Practice A B C D E mean alcohol 6.47 6.13 6.19 4.89 5.63 tobacco ( X X ) 4.03 3.76 3.77 3.34 3.47 (Y Y ) ( X X )2 SSX r SP SS X SSY df n 2 (Y Y )2 ( X X )(Y Y ) SSY SP Practice A B C D E mean alcohol tobacco ( X X ) 6.47 4.03 6.13 3.76 6.19 3.77 4.89 3.34 5.63 3.47 r .95 (Y Y ) ( X X )2 1.55 SSX df 3 (Y Y )2 ( X X )(Y Y ) .30 SSY .64 SP Properties of R • A standardized statistic – will not change if you change the units of X or Y. (bc based on z-scores) •The same whether X is correlated with Y or vice versa • Fairly unstable with small n •Vulnerable to outliers • Has a skewed distribution Linear Regression Linear Regression • But how do we describe the line? • If two variables are linearly related it is possible to develop a simple equation to predict one variable from the other • The outcome variable is designated the Y variable, and the predictor variable is designated the X variable • E.g. centigrade to Fahrenheit: F = 32 + 1.8C this formula gives a specific straight line The Linear Equation Where a = intercept b = slope X = the predictor Y = the criterion a and b are constants in a given line; X and Y change Criterion • F = 32 + 1.8(C) • General form is Y = a + bX • The prediction equation: Y’ = a+ bX Predictor The Linear Equation Where a = intercept b = slope X = the predictor Y = the criterion Criterion • F = 32 + 1.8(C) • General form is Y = a + bX • The prediction equation: Y’ = a + bX Different b’s… Predictor The Linear Equation Where a = intercept b = slope X = the predictor Y = the criterion Criterion • F = 32 + 1.8(C) • General form is Y = a + bX • The prediction equation: Y’ = a + bX Different a’s… Predictor The Linear Equation Where a = intercept b = slope X = the predictor Y = the criterion Criterion • F = 32 + 1.8(C) • General form is Y = a + bX • The prediction equation: Y’ = a + bX Different a’s and b’s … Predictor Slope and Intercept • Equation of the line Y ' a bX The slope b: the amount of change in y with one unit change in x The intercept a: the value of y when x is zero Slope and Intercept • Equation of the line The slope b r The intercept Y ' a bX sy SP s x SS X a Y bX The slope is influenced by r, but is not the same as r When there is no linear association (r = 0), the regression line is horizontal. b=0. The 2001 Mets 45 40 Age in Years 35 30 25 20 68 69 70 71 72 73 74 75 Height in Inches and our best estimate of age is 29.5 at all heights. 76 77 When the correlation is perfect (r = ± 1.00), all the points fall along a straight line with a s slope b r s y x The 2001 Mets 200 195 190 Height in 185 centimeters 180 175 170 68 69 70 71 72 73 Height in Inches 74 75 76 77 When there is some linear association (0<|r|<1), the regression line fits as close to the points as possible s and has a slope b r y sx The 2001 Mets 250 225 Weight 200 in Pounds 175 150 67 68 69 70 71 72 73 Height in Inches 74 75 76 77 Where did this line come from? It is a straight line which is drawn through a scatterplot, to summarize the relationship between X and Y It is the line that minimizes the squared deviations (Y’ – Y)2 We call these vertical deviations “residuals” Regression lines Minimizing the squared vertical distances, or “residuals” Unstandardized Regression Line • Equation of the line The slope b r The intercept Y ' a bX sy SP s x SS X a Y bX Properties of b (slope) • An unstandardized statistic – will change if you change the units of X or Y. •Depends on whether Y is regressed on X or vice versa Standardized Regression Line • Equation of the line The slope r The intercept none ZY ' (Z X ) A person 1 stdev above the mean on height would be how many stdevs above the mean on weight? Properties of β (standardized slope) • A standardized statistic – will not change if you change the units of X or Y. •Is equal to r, in simple linear regression Exercise X Y 1 3 1 4 5 4 5 6 9 6 9 Calculate: r= b= a= β= Write the regression equation: Write the standardized equation: 7 Y ' a bX br sy sx a Y bX Exercise X Calculate: Y 1 3 1 4 5 4 5 6 9 6 9 r = .866 b = .375 a = 3.125 β = .866 Write the regression equation: Y ' 3.125 .375 X Write the standardized equation: Z .866Z 7 Y' Y ' a bX br sy sx a Y bX X Regression Coefficients Table Predictor Unstandardized Coefficient Standard Standardized error Coefficient Intercept a SEa - Variable X b SEb t a SEa b t SEb t sig Summary Correlation: Pearson’s r r Z r SP SS X SSY X ZY n 1 Unstandardized Regression Line a Y bX b SP SS X br sy sx Standardized Regression Line r t r SEr Y ' a bX t a SEa t b SEb ZY ' (Z X ) Exercise in Excel X Y 1 -1.5 1 0 2 -3 4 -4 7 -2 9 -6 Calculate: r= b= a= β= Write the regression equation: Write the standardized equation: Sketch the scatterplot and regression line