Correlation

What is correlation?
• Correlation is the measure of whether and
how strongly pairs of variables are related.
Types of correlation
• Positive / negative
• Strong / weak
Correlation coefficient r
• Correlation coefficient r is a number that indicates the
strength and direction of the linear relationship between
two variables
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear
relationship
• The closer to 1, the stronger the positive linear
relationship
• The closer to 0, the weaker any linear
relationship
Correlation coefficient r
• 1 is a perfect positive correlation
• 0 is no linear correlation (the values don't seem
linearly linked at all)
• -1 is a perfect negative correlation
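The three reference values can be checked with a small sketch. The function name `pearson_r` and the data below are made up for illustration and do not come from the slides; the function is one plain-Python way to compute r, not necessarily the method used in the original deck.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and of squares of the deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]                               # illustrative x values
r_up = pearson_r(xs, [2 * x + 1 for x in xs])      # points on a rising line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])   # points on a falling line
r_flat = pearson_r(xs, [2, 1, 3, 1, 2])            # no linear trend

print(round(r_up, 6), round(r_down, 6), round(r_flat, 6))
```

Points that lie exactly on a rising line give r = 1, points on a falling line give r = -1, and the third data set, which has no linear trend, gives r = 0 (up to floating-point rounding).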
Correlation and causation
• Causation is a relationship that describes and
analyses cause and effect
• Correlation is NOT causation
Examples
• Temperature & ice-cream sales
• Ice-cream sales & shark attacks
• Runny nose & headache
How to calculate r
• Step 1: Find the mean of x, and the mean of y
• Step 2: Subtract the mean of x from every x
value (call them "a"), do the same for y (call
them "b")
• Step 3: Calculate a × b, a² and b² for every
value
• Step 4: Sum up a × b, sum up a² and sum up b²
• Step 5: Divide the sum of a × b by the square
root of [(sum of a²) × (sum of b²)]
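The five steps can be followed line by line in a short sketch. The x and y values are made-up illustrative data, not figures from the slides, and the variable names simply mirror the "a" and "b" used in the steps.

```python
from math import sqrt

# Made-up illustrative data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Step 1: mean of x and mean of y
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Step 2: subtract the means ("a" for x, "b" for y)
a = [xi - mean_x for xi in x]
b = [yi - mean_y for yi in y]

# Step 3: a × b, a² and b² for every value
ab = [ai * bi for ai, bi in zip(a, b)]
a2 = [ai ** 2 for ai in a]
b2 = [bi ** 2 for bi in b]

# Step 4: sum each column
sum_ab = sum(ab)
sum_a2 = sum(a2)
sum_b2 = sum(b2)

# Step 5: divide the sum of a × b by the square root of (sum of a²) × (sum of b²)
r = sum_ab / sqrt(sum_a2 * sum_b2)

print(sum_ab, sum_a2, sum_b2)  # 6.0 10.0 6.0
print(round(r, 4))             # 0.7746
```

For this data the result is about 0.77, a fairly strong positive linear relationship.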
How to calculate r
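• Formula for r: $r = \dfrac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$
• $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ is the sum of the squares of the differences
between the $x_i$ and the mean of x, for all i from 1 to n.
• $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the sum of the squares of the differences
between the $y_i$ and the mean of y, for all i from 1 to n.
• $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ is the sum of the products of the
differences between the $x_i$ and the mean of x and the differences
between the $y_i$ and the mean of y, for all i from 1 to n.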
Example
• A local milkshake shop keeps track of the number of milkshakes it sells
against the temperature on that day. Below are its sales and temperature
figures for the last 12 days. Comment on the relationship.
Solution
Question
• Find the correlation coefficient of the
following data.
Linear Regression
• In regression, one variable is considered the
independent (= predictor) variable X and the
other the dependent (= outcome) variable Y.
• In statistics, linear regression is an approach for
modeling the relationship between a scalar
dependent variable y and one or more
explanatory (independent) variables,
denoted X.
• The case of one explanatory variable is called
simple linear regression.
Linear Regression
• Remember this?
• Y = mX + B
• m is the slope of the line
• B is the Y-intercept (the value of Y when X = 0)
What is a slope?
• A slope of 2 means that every 1-unit change in
X yields a 2-unit change in Y.
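A quick sketch of what the slope means in Y = mX + B. The helper name `line` and the values m = 2, B = 1 are hypothetical, chosen only to match the "slope of 2" in the bullet.

```python
def line(x, m, b):
    """Value of Y on the line Y = m*X + b."""
    return m * x + b

m, b = 2, 1  # hypothetical slope and intercept

# Moving X by one unit (3 -> 4) moves Y by m units
change = line(4, m, b) - line(3, m, b)
print(change)  # 2
```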
Prediction
• If you know the value of x, the model gives
you a predicted value of y
• Extrapolation: attempting to use a regression
equation to predict values outside of the
observed range; such predictions are unreliable
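One way to keep prediction and extrapolation apart is to flag any x that falls outside the range of the data the line was fitted on. This is a minimal sketch: the function name `predict`, the slope, the intercept and the observed range are all hypothetical.

```python
def predict(x, m, b, x_min, x_max):
    """Return (predicted y, extrapolated?) for the line y = m*x + b.

    x_min and x_max are the smallest and largest x values that were
    observed when the line was fitted.
    """
    extrapolated = not (x_min <= x <= x_max)
    return m * x + b, extrapolated

# Hypothetical model fitted on x values between 0 and 5
m, b = 2.0, 1.0
inside = predict(3, m, b, x_min=0, x_max=5)    # within the observed range
outside = predict(10, m, b, x_min=0, x_max=5)  # extrapolation

print(inside)   # (7.0, False)
print(outside)  # (21.0, True)
```

The arithmetic is the same in both cases; the flag only marks the second prediction as one the data cannot support.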
How to calculate linear regression
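The content of this slide did not survive, so the following is only a sketch of the usual least-squares approach for simple linear regression, not necessarily the method shown in the original deck: the slope is m = Sxy / Sxx and the intercept is B = (mean of y) − m × (mean of x). The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx          # slope
    b = mean_y - m * mean_x  # intercept: the line passes through the means
    return m, b

# Made-up illustrative data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

m, b = fit_line(x, y)
print(round(m, 4), round(b, 4))  # 0.6 2.2
```

For this data the fitted line is Y = 0.6X + 2.2.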
Exercise
• Find the linear regression of the following data.