Correlation and Regression Analysis

Chapter 9:
Correlation and Regression Analysis
Correlation
• Correlation is a numerical way to measure the
strength and direction of a linear association
between two numerical variables, often labeled the
independent and the dependent variable.
• The Pearson Correlation Coefficient (r) is the most
widely used measure of correlation.
• The sign of the correlation coefficient (r) indicates
the direction of the association.
• The magnitude (absolute value) of the correlation
coefficient (r), which ranges from 0 to 1, indicates
the strength of the association.
Direction
• A positive correlation indicates that as one
variable increases, the other one increases.
– An example might be height and weight.
• A negative correlation indicates that as one
variable increases, the other one decreases.
– An example might be a team's turnovers and its wins.
• No correlation means that there is no linear
association between the two variables.
– An example might be intelligence and the amount of
coffee you drink.
Strength
Examples of positive, linear associations with different
amounts of strength.
Scatterplots
• The graphs on the previous slide are known as
scatterplots, or scatter diagrams.
• Scatterplots are graphical displays that show the
relationship between two numerical variables.
• The independent variable is plotted on the x-axis
and the dependent variable is plotted on the y-axis.
Calculating the Correlation Coefficient
• The correlation coefficient is obtained by dividing
the sample covariance by the product of the
standard deviations of the two variables.
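The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the slide's worked example: the function name `pearson_r` and the five data points are hypothetical, and the sample (n - 1) versions of the covariance and standard deviations are assumed.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance of x and y (divides by n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations of x and y (also divide by n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (sd_x * sd_y)

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 2]
r = pearson_r(x, y)
print(round(r, 3))
```

For this made-up data y falls steadily as x rises, so r comes out close to -1. The sketch fails with a division by zero if either variable is constant, because its standard deviation is then 0.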
Example: Construct a scatter diagram and
calculate the correlation coefficient for the data
set below.
Interpretation
• So what does an r of -0.918 mean?
1) There is a very strong negative correlation
between x and y.
2) Larger values of x tend to correspond to
smaller values of y and smaller values of x
tend to correspond to larger values of y.
Example: Now try it yourself. Find the correlation coefficient of the
data set below.
The Coefficient of Determination
• The coefficient of determination is the square
of the correlation coefficient.
• The coefficient of determination indicates the
percent of the variation in the dependent
variable that can be explained by the variation
in the independent variable.
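As a small sketch of the squaring step, the snippet below uses the r of -0.918 quoted in the interpretation example earlier. The function name `coefficient_of_determination` is hypothetical, introduced only for illustration.

```python
def coefficient_of_determination(r):
    """Square a Pearson correlation coefficient to get r^2."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return r * r

r = -0.918                      # value quoted in the interpretation example
r_squared = coefficient_of_determination(r)
unexplained = 1.0 - r_squared   # share of variation not explained by the linear model

print(f"explained:   {r_squared:.1%}")
print(f"unexplained: {unexplained:.1%}")
```

Squaring removes the sign, so an r of -0.918 and an r of +0.918 both give a coefficient of determination of about 0.843, or roughly 84.3% of the variation explained.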
So what exactly does this mean?
Suppose we randomly selected 16 trucks and looked at
the relationship between the miles on each truck and
its price.
• As one would expect, price tends to decrease as miles increase.
• The coefficient of determination for this data set is 0.664, or
66.4%.
• We can interpret this by saying that 66.4% of the variation in
price is accounted for by the linear model relating price to
miles driven.
• Another way to look at this is that 33.6% of the variation in
truck price is accounted for by factors other than miles.
Regression
• Regression analysis is a statistical technique
for estimating and predicting the value of one
variable (dependent variable) on the basis of
the knowledge of another variable
(independent variable).
• The goal is to obtain the equation for the line
of best fit (also called the regression line or least
squares line) from the scatterplot of the two variables.
Equation of the Regression Line
a and b are known as regression coefficients.
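One common way to write the line is shown below. It assumes that a denotes the y-intercept and b the slope; some texts swap the roles or the order of the two terms.

```latex
\hat{y} = a + bx
```

Here \hat{y} is the predicted value of the dependent variable for a given value x of the independent variable.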
How to Find the Regression Coefficients
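A minimal sketch of the usual least-squares formulas, assuming the line is written as y_hat = a + b*x with b the slope and a the intercept. Under that assumption, b is the sum of the products of the x and y deviations divided by the sum of squared x deviations, and a is mean(y) - b * mean(x). The function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y_hat = a + b*x that minimizes squared error."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 2]
a, b = least_squares_line(x, y)
predicted = a + b * 6          # prediction for a new x value of 6
print(f"y_hat = {a:.2f} + ({b:.2f})x")
```

The slope computed this way equals r times the ratio of the standard deviation of y to that of x, so its sign always matches the sign of the correlation coefficient.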
Let’s return to a previous example and try to write the
regression equation.
Now you try with this one.
SPSS Output Example