Psychology 820
Correlation, Regression & Prediction

Concept of Correlation
A coefficient of correlation (r, or ρ "rho" for a population) is a statistical summary of the degree and direction of the relationship, or association, between two variables (X and Y).

Degree of Relationship
The degree (strength) of a correlation ranges from 0 to 1.00 in absolute value; taking direction into account, r ranges from -1.00 to +1.00.

Direction of Relationship
Positive (+) relationship: a high score on X goes with a high score on Y.
Negative (-) relationship: a high score on X goes with a low score on Y.

The Bivariate Normal Distribution
A family of three-dimensional surfaces.

Scatterplots
The chief purpose of the scatterplot is to study the nature of the relationship between two variables.

Components of r
Pearson Product-Moment Correlation: r is the covariance of X and Y divided by the product of their standard deviations, r = cov(X, Y) / (s_X s_Y).

Additional Measures of Relationship
Spearman rank correlation: both X and Y are ranks.
Phi coefficient: both X and Y are true dichotomies.
Point-biserial coefficient: one true dichotomous variable and one continuous measure.
Biserial correlation: one artificial dichotomy and one continuous measure.
Tetrachoric coefficient: both X and Y are artificial dichotomies.

Linear and Curvilinear Relationships
Only the degree of linear relationship is described by r or ρ. If there is a substantial nonlinear relationship between two variables, a different coefficient (such as the correlation ratio, eta, η) should be used.

Linear Transformations and Correlation
Any linear transformation of X or Y leaves the magnitude of the correlation coefficient unchanged. This includes transformations to z-scores or T-scores, and adding, subtracting, multiplying, or dividing all values by a constant (nonzero for multiplication and division). Multiplying or dividing by a negative constant reverses the sign of r but not its magnitude.

Effects of Variability on Correlation
The variability (heterogeneity) of the sample has an important influence on r.
Range restriction: restricting the range of scores on X or Y typically lowers r.

Causation and Correlation
Correlation must be carefully distinguished from causation.
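The properties of r listed above can be checked numerically. The following is a minimal sketch in plain Python (standard library only): `pearson_r` computes r from its components (the sum of cross-products of deviation scores divided by the square root of the product of the two sums of squares), and `spearman_rho` applies the same formula to ranks. The function names and the data are hypothetical illustrations, not from the course materials, and the rank helper assumes there are no tied scores.

```python
import math

def pearson_r(x, y):
    """Pearson r: sum of cross-products of deviations / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products
    ss_x = sum((a - mx) ** 2 for a in x)                  # sum of squares for X
    ss_y = sum((b - my) ** 2 for b in y)                  # sum of squares for Y
    return sp / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Assign ranks 1..n; assumes no tied values (a simplifying assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks of X and Y."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical toy data: Y equals X plus a fixed alternating +3 / -3 "error".
x = list(range(1, 21))
y = [xi + (3 if xi % 2 else -3) for xi in x]
r_full = pearson_r(x, y)

# Linear transformations: z-scores, a positive rescaling, and a sign flip.
n = len(x)
mean_x = sum(x) / n
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
z_x = [(a - mean_x) / sd_x for a in x]
r_z = pearson_r(z_x, y)                                # equals r_full (up to rounding)
r_rescaled = pearson_r([2.5 * a + 100 for a in x], y)  # equals r_full (up to rounding)
r_negated = pearson_r([-a for a in x], y)              # same magnitude, opposite sign

# Range restriction: keep only the cases with the six lowest X scores.
r_restricted = pearson_r(x[:6], y[:6])

print(round(r_full, 3), round(r_restricted, 3))  # 0.879 0.278
print(round(spearman_rho(x, y), 3))
```

With the full range of X the toy data give r of about .88; keeping only the six lowest X scores leaves the same "error" pattern but far less variability in X, and r drops to about .28.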
Third Variable Factor
An observed correlation between X and Y may be produced by a third variable that influences both.

Effect of Outliers
A single extreme score can substantially raise or lower r.

Regression and Prediction
Prediction and correlation are two sides of the same coin.
Regression is usually the statistical method of choice when the predicted variable is measured on an ordinal, interval, or ratio scale.
Simple linear regression (one IV and one DV) extends to multiple regression (more than one IV).

The Regression Effect
The sons of tall fathers tend to be taller than average, but shorter than their fathers. The sons of short fathers tend to be shorter than average, but taller than their fathers.

Regression to the Mean

Regression Equation
Y' = bX + c (the equation of a straight line), where Y' is the predicted value of Y, b is the slope, and c is the intercept. For the least-squares line, b = r(s_Y / s_X) and c = Ȳ - bX̄.
Also called: line of best fit; least-squares line; prediction equation.

Proportion of Variance Interpretation of Correlation
The coefficient of determination (r²) is the proportion of variance in Y that can be accounted for by knowing X and, conversely, the proportion of variance in X that can be accounted for by knowing Y.
The coefficient of nondetermination (k² = 1 - r²) is the proportion of variance "not accounted for". For example, r = .60 gives r² = .36 and k² = .64.

Homoscedasticity
In a bivariate normal distribution, the variance of Y scores is the same for every value of X; this property (equal variance of Y scores at each value of X) is known as homoscedasticity. The assumption means that the variance around the regression line is the same for all values of the predictor variable (X).
The plot on the right shows a violation of this assumption: for the lower values on the X-axis, the points all lie very near the regression line, while for the higher values on the X-axis there is much more variability around the regression line.

Part Correlation
Part (semipartial) correlation is the correlation of X1 (IQ) with X2 (achievement posttest) after the portion of the posttest that can be predicted from X3 (the pretest) has been removed.

Partial Correlation
A simple extension of part correlation: the correlation of X1 and X2 with X3 "held constant", removed, or partialed out of both variables is a partial correlation.
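The regression equation, the r² / k² split, and the part and partial correlations described above fit together in one small sketch. It is a minimal illustration under stated assumptions: the scores for IQ (X1), posttest (X2), and pretest (X3) are hypothetical, and the helper names (`fit_line`, `residuals`) are invented for this example. Part and partial correlations are computed two ways: by correlating residuals from least-squares lines, and from the standard formulas using the three pairwise correlations.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def fit_line(x, y):
    """Least-squares Y' = bX + c, with b = r * (s_Y / s_X) and c = mean(Y) - b * mean(X)."""
    mx, my = mean(x), mean(y)
    # Square roots of the sums of squares; the 1/n factors cancel in the ratio.
    sd_x = math.sqrt(sum((a - mx) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - my) ** 2 for b in y))
    b = pearson_r(x, y) * sd_y / sd_x
    return b, my - b * mx

def residuals(x, y):
    """The part of Y that cannot be predicted from X (Y minus Y')."""
    b, c = fit_line(x, y)
    return [yi - (b * xi + c) for xi, yi in zip(x, y)]

# Hypothetical scores for 8 students (illustrative only).
iq = [95, 100, 105, 110, 115, 120, 125, 130]   # X1
post = [50, 54, 55, 60, 63, 64, 71, 70]        # X2, achievement posttest
pre = [40, 45, 43, 50, 52, 55, 60, 58]         # X3, pretest

# Simple regression of the posttest on the pretest, and the variance accounted for.
b, c = fit_line(pre, post)
r = pearson_r(pre, post)
predicted = [b * p + c for p in pre]
ss_total = sum((y - mean(post)) ** 2 for y in post)
ss_resid = sum((y - yp) ** 2 for y, yp in zip(post, predicted))
r_squared = r ** 2       # coefficient of determination
k_squared = 1 - r ** 2   # coefficient of nondetermination

# Part correlation: X1 with the part of X2 not predictable from X3.
part = pearson_r(iq, residuals(pre, post))
# Partial correlation: X3 removed from both X1 and X2.
partial = pearson_r(residuals(pre, iq), residuals(pre, post))

# The same quantities from the three pairwise correlations.
r12, r13, r23 = pearson_r(iq, post), pearson_r(iq, pre), pearson_r(post, pre)
part_formula = (r12 - r13 * r23) / math.sqrt(1 - r23 ** 2)
partial_formula = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

print(round(part, 3), round(partial, 3))
```

The residual route mirrors the verbal definitions (remove what X3 predicts, then correlate what is left); the formula route is the shortcut usually shown on paper. Because the partial correlation also removes X3 from X1, its magnitude is never smaller than that of the part correlation.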
Multiple Regression
Multiple regression is the statistical method most commonly employed for predicting Y from two or more independent variables.

Multiple Correlation
The correlation between Y and predicted Y (Y') when the prediction is based on two or more independent variables is termed the multiple correlation (R).
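As a sketch of the multiple correlation defined above, the example below fits a least-squares equation with two predictors by solving the 2x2 normal equations on deviation scores, then computes R as the ordinary correlation between Y and Y'. The data are hypothetical and the helper names (`cross`, `fit_two_predictors`) are invented for this illustration; with more than two predictors one would normally use a matrix solver instead of the closed-form 2x2 solution.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def cross(u, v):
    """Sum of cross-products of deviation scores."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def fit_two_predictors(x1, x2, y):
    """Least-squares Y' = b1*X1 + b2*X2 + a, solving the 2x2 normal equations."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2  # zero if X1 and X2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b1, b2, a

# Hypothetical data (illustrative only): predict the posttest from IQ and the pretest.
iq = [95, 100, 105, 110, 115, 120, 125, 130]
pre = [40, 45, 43, 50, 52, 55, 60, 58]
post = [50, 54, 55, 60, 63, 64, 71, 70]

b1, b2, a = fit_two_predictors(iq, pre, post)
predicted = [b1 * x1 + b2 * x2 + a for x1, x2 in zip(iq, pre)]

# Multiple correlation R: the simple correlation between Y and Y'.
R = pearson_r(post, predicted)
print(round(R, 3), round(R ** 2, 3))
```

R² here is the proportion of variance in Y accounted for by the two predictors together, and R can never be smaller than the absolute simple correlation of Y with either predictor alone.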