Correlation Review

Psychology 820
Correlation
Regression & Prediction
Concept of Correlation
A coefficient of correlation (r or ρ ‘rho’) is a
statistical summary of the degree and direction of
relationship or association between two variables
(X and Y)
Degree of Relationship
The size of a correlation ranges from 0 to 1.00 in absolute value
(the coefficient itself runs from -1.00 to +1.00)
Direction of Relationship
Positive (+) relationship: High score on X goes with a
High score on Y
Negative (-) relationship: High score on X goes with Low
score on Y
The Bivariate Normal Distribution
A family of three-dimensional surfaces
Scatterplots
The chief purpose of the scatterplot is
for the study of the nature of the
relationship between two variables.
Components of r
Pearson Product Moment Correlation
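As a minimal sketch, r can be computed as the covariance of X and Y divided by the product of their standard deviations. The function name pearson_r and the five data points are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r = covariance(X, Y) / (SD of X * SD of Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Mean cross-product of deviations (population covariance)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    # np.std defaults to the population SD, matching the covariance above
    return float(cov_xy / (x.std() * y.std()))

# Hypothetical scores for five people
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```

The result agrees with np.corrcoef(x, y)[0, 1], because the choice of n or n - 1 cancels out of the ratio.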
Additional Measures of
Relationships
Spearman Rank Correlation
Both X and Y are ranks
Phi Coefficient
Both X and Y are dichotomies
Point-Biserial Coefficient
One dichotomous variable and one continuous measure
Biserial Correlation
One artificial dichotomy and one continuous measure
Tetrachoric Coefficient
Both X and Y are artificial dichotomies
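Two of these measures are the Pearson formula applied to specially scored data: Spearman is Pearson r computed on ranks, and phi is Pearson r computed on two 0/1 variables. The sketch below illustrates both with made-up data; the helper ranks is hypothetical and assumes there are no tied values.

```python
import numpy as np

def ranks(values):
    """Ranks 1..n for the values; assumes no ties."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r

# Monotonic but not linear relationship (illustrative data)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([1.0, 8.0, 27.0, 64.0, 125.0])
pearson = np.corrcoef(x, y)[0, 1]
spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]  # Pearson r on the ranks

# Two dichotomies scored 0/1 (illustrative data)
a = np.array([1, 1, 1, 0, 0, 0, 1, 0])
b = np.array([1, 1, 0, 0, 0, 1, 1, 0])
phi = np.corrcoef(a, b)[0, 1]  # Pearson r on the 0/1 scores

print(pearson, spearman, phi)
```

Because the ranks of X and Y agree perfectly here, the Spearman value is 1 even though the Pearson value on the raw scores is below 1.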
Linear and Curvilinear
Relationships
Only the degree of linear relationship
is described by r or ρ
If there is a substantial nonlinear
relationship between two variables, a
different correlation coefficient (such
as eta η) should be used
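The point can be illustrated with a U-shaped relationship, where r is zero even though Y depends strongly on X. The sketch below uses made-up data and a hypothetical helper eta that computes the correlation ratio as the square root of the between-group sum of squares over the total sum of squares, treating each distinct X value as a group.

```python
import numpy as np

def eta(x, y):
    """Correlation ratio: sqrt(SS_between / SS_total), grouping Y by X value."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    grand_mean = y.mean()
    ss_total = np.sum((y - grand_mean) ** 2)
    ss_between = sum(
        np.sum(x == g) * (y[x == g].mean() - grand_mean) ** 2
        for g in np.unique(x)
    )
    return float(np.sqrt(ss_between / ss_total))

# Illustrative U-shaped data: two Y scores at each X value
x = np.array([-2, -2, -1, -1, 0, 0, 1, 1, 2, 2])
y = np.array([4.5, 3.5, 1.5, 0.5, 0.5, -0.5, 0.5, 1.5, 3.5, 4.5])

r = np.corrcoef(x, y)[0, 1]
print(r, eta(x, y))
```

Here r is zero (up to rounding error) while eta is above 0.9, so the linear coefficient alone would badly understate the relationship.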
Linear Transformations and
Correlation
Any linear transformation of X or Y
leaves the size of the correlation
coefficient unchanged
This includes transformations to z-scores or T-scores, adding or
subtracting a constant, and
multiplying or dividing by a non-zero
constant (a negative constant reverses the sign of r)
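A quick numerical check of this property, using made-up scores: converting X to z-scores or T-scores, or adding a constant, leaves r unchanged, while multiplying by a negative constant keeps the size of r and flips its sign.

```python
import numpy as np

# Illustrative raw scores
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 6.0, 8.0])

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

z = (x - x.mean()) / x.std()   # z-scores
t = 50 + 10 * z                # T-scores (mean 50, SD 10)

r_raw = r(x, y)
r_z = r(z, y)
r_t = r(t, y)
r_shift = r(x + 100, y)        # add a constant
r_neg = r(-3 * x, y)           # multiply by a negative constant

print(r_raw, r_z, r_t, r_shift, r_neg)
```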
Effects of Variability on Correlation
The variability (heterogeneity) of the
sample has an important influence on
r
Range restriction
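Range restriction can be illustrated with a small simulation. The sketch below assumes simulated standard-normal scores with a population correlation of about 0.6 (an arbitrary choice), then recomputes r using only the cases with X above 1.0.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 5000

# Simulated scores; the 0.6 / 0.8 weights give a population r of about 0.6
x = rng.standard_normal(n)
y = 0.6 * x + 0.8 * rng.standard_normal(n)

r_full = np.corrcoef(x, y)[0, 1]

# Keep only the high scorers on X (a restricted, more homogeneous sample)
keep = x > 1.0
r_restricted = np.corrcoef(x[keep], y[keep])[0, 1]

print(r_full, r_restricted)
```

The restricted sample has much less variability on X, and its correlation is noticeably smaller than the full-sample value.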
Causation and Correlation
Correlation must be carefully
distinguished from causation.
Third Variable Factor
Effect of Outliers
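A single extreme case can change r substantially. The sketch below uses made-up data: six cases with a weak relationship, then the same cases plus one outlier at (20, 20).

```python
import numpy as np

# Six illustrative cases with a weak positive relationship
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.0, 1.0, 4.0, 2.0, 5.0, 3.0])
r_without = np.corrcoef(x, y)[0, 1]

# Add one extreme case that is high on both variables
x_out = np.append(x, 20.0)
y_out = np.append(y, 20.0)
r_with = np.corrcoef(x_out, y_out)[0, 1]

print(r_without, r_with)
```

In this example the outlier inflates r; an outlier lying off the trend of the other points can just as easily deflate it, which is why a scatterplot should be inspected before interpreting r.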
Regression and Prediction
Prediction and correlation are opposite
sides of the same coin
Regression is usually the statistical method
of choice when the predicted variable is an
ordinal, interval, or ratio scale.
Simple linear regression (1 IV & 1 DV)
extends to multiple regression (more than 1
IV)
The Regression Effect
The sons of tall fathers tend to be
taller than average, but shorter than
their fathers.
The sons of short fathers tend to be
shorter than average, but taller than
their fathers.
Regression to the Mean
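The regression effect follows from predicting in z-score form: the predicted z-score on Y is r times the z-score on X, so whenever r is below 1 the prediction lies closer to the mean than the predictor does. The sketch below uses assumed values (mean height 68 inches, SD 3 inches for both generations, r = 0.5) purely for illustration.

```python
def predict_son(father_height, mean=68.0, sd=3.0, r=0.5):
    """Predicted son's height; mean, sd, and r are assumed values."""
    z_father = (father_height - mean) / sd
    z_son_predicted = r * z_father          # shrinks toward the mean when r < 1
    return mean + z_son_predicted * sd

tall = predict_son(74.0)    # father 2 SDs above the mean
short = predict_son(62.0)   # father 2 SDs below the mean
print(tall, short)
```

Under these assumptions the son of a 74-inch father is predicted to be 71 inches (taller than average, shorter than his father), and the son of a 62-inch father is predicted to be 65 inches.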
Regression Equation
Y = b X + c (the equation of a straight line)
Line of best fit
Line of least-squares
Prediction equation
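For the least-squares line, the slope is b = r (SD of Y / SD of X) and the intercept is c = mean of Y minus b times the mean of X. A minimal sketch with made-up data, checked against NumPy's own straight-line fit:

```python
import numpy as np

# Illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = np.corrcoef(x, y)[0, 1]
b = r * y.std() / x.std()        # slope
c = y.mean() - b * x.mean()      # intercept

# Same line from NumPy's least-squares polynomial fit of degree 1
b_fit, c_fit = np.polyfit(x, y, 1)

y_predicted = b * x + c
print(b, c)
```

For these data the line is approximately Y = 0.6 X + 2.2.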
Proportion of Variance
Interpretation of Correlation
The coefficient of determination (r²) is the
proportion of variance in Y that can be
accounted for by knowing X and,
conversely, the proportion of variance in X
that can be accounted for by knowing Y.
The coefficient of nondetermination (k² = 1 − r²) is
the proportion of variance “not accounted
for”
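The two proportions can be checked numerically: the squared correlation equals the share of Y's variance reproduced by the regression line, and one minus that value equals the share left in the residuals. A minimal sketch with made-up data:

```python
import numpy as np

# Illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2               # coefficient of determination
k_squared = 1.0 - r_squared      # coefficient of nondetermination

# Least-squares predictions and residuals
b, c = np.polyfit(x, y, 1)
residuals = y - (b * x + c)

ss_total = np.sum((y - y.mean()) ** 2)
ss_residual = np.sum(residuals ** 2)
unexplained_share = ss_residual / ss_total

print(r_squared, k_squared, unexplained_share)
```

For these data r squared is about 0.60, so about 40% of the variance in Y is not accounted for by X.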
Homoscedasticity
In a bivariate normal distribution the variance of scores on Y
is the same for all values of X. This equal variance of Y
scores for each value of X is known as homoscedasticity.
This assumption means that the
variance around the regression line
is the same for all values of the
predictor variable (X). The plot on the
right shows a violation of this
assumption. For the lower values on
the X-axis, the points are all very
near the regression line. For the
higher values on the X-axis, there is
much more variability around the
regression line.
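The pattern described above can be reproduced with simulated data. The sketch below assumes an error standard deviation that grows in proportion to X (an arbitrary choice that violates homoscedasticity), then compares the residual variance in the lower and upper halves of X.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed so the run is repeatable
n = 2000

x = rng.uniform(1.0, 10.0, n)
# Error SD is 0.5 * x, so the scatter widens as X increases
y = 2.0 * x + rng.standard_normal(n) * 0.5 * x

b, c = np.polyfit(x, y, 1)
residuals = y - (b * x + c)

low = x < np.median(x)
var_low = residuals[low].var()     # spread around the line for low X
var_high = residuals[~low].var()   # spread around the line for high X

print(var_low, var_high)
```

With homoscedastic data the two variances would be roughly equal; here the variance for high X values is several times larger.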
Part Correlation
Part (semipartial) correlation is the correlation of X1 (IQ) with X2
(achievement posttest) after the
portion of the posttest that can be
predicted from the pretest (X3) has been
removed.
Partial Correlation
Simple extension of part correlation
The correlation of X1 and X2 with X3
“held constant”, removed, or partialed
out is a partial correlation.
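Both coefficients can be computed from regression residuals. In the sketch below, the part correlation removes X3 from X2 only, and the partial correlation removes X3 from both X1 and X2. The data are simulated and the helper residualize is hypothetical; the variable roles (X1 = IQ, X2 = posttest, X3 = pretest) follow the example above.

```python
import numpy as np

def residualize(y, x):
    """Residuals of y after removing what a straight line in x predicts."""
    b, c = np.polyfit(x, y, 1)
    return y - (b * x + c)

rng = np.random.default_rng(3)   # fixed seed so the run is repeatable
n = 200
x3 = rng.standard_normal(n)                               # pretest
x2 = 0.7 * x3 + rng.standard_normal(n)                    # posttest
x1 = 0.5 * x3 + 0.3 * x2 + rng.standard_normal(n)         # IQ

x2_res = residualize(x2, x3)
x1_res = residualize(x1, x3)

part = np.corrcoef(x1, x2_res)[0, 1]         # X3 removed from X2 only
partial = np.corrcoef(x1_res, x2_res)[0, 1]  # X3 removed from X1 and X2

print(part, partial)
```

The same values follow from the simple correlations: with n = r12 - r13 r23, the part correlation is n / sqrt(1 - r23²) and the partial correlation is n / sqrt((1 - r13²)(1 - r23²)).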
Multiple Regression
Multiple regression is the statistical
method most commonly employed for
predicting Y from two or more
independent variables.
Multiple Correlation
The correlation between Y and predicted Y (Ŷ)
when the prediction is based on two
or more independent variables is
termed multiple correlation.
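A minimal sketch of both ideas with simulated data: the regression weights are found by least squares, and the multiple correlation R is the ordinary correlation between Y and the predicted values. The two predictors and their weights are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)   # fixed seed so the run is repeatable
n = 100
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.standard_normal(n)

# Design matrix: a column of ones for the intercept, then the predictors
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ coef                          # predicted Y
R = np.corrcoef(y, y_hat)[0, 1]           # multiple correlation

# Proportion of variance in Y accounted for by the predictors
r_squared = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(coef, R, r_squared)
```

R squared equals the proportion of variance accounted for, and R is never smaller in absolute value than the correlation of Y with any single predictor in the equation.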