Pearson's correlation

Topics for Today
Scatterplots
Relationship between 2 Continuous Variables
Pearson’s Correlation
Facts and Myths
Correlation as a Statistic
Stat203
Fall2011 – Week 9, Lecture 1
Page 1 of 28
Two Continuous Variables
Using the 2-sample Chi-square test we were able to
investigate the relationship between two discrete variables.
E.g.:
- radio format and age
- weather and city
Now we will examine the relationship between two
__________ variables.
The first tool we will discuss is called ___________.
But even before that … Scatter Plots
Shows the relationship between 2 continuous variables
measured on the same ___________.
Values of the one variable (X) are plotted on the horizontal
axis and values of the other variable (Y) are plotted on the
vertical axis. Each individual appears as a single point.
Let’s look at this in SPSS …
Let’s look at a dataset called Detroit that has information
from the city for years 1961 to 1973. It contains 6 variables:
- year
- homicide rate (per 100,000 population)
- # of police (per 100,000 population)
- unemployment rate (%)
- # registered handguns (per 10,000 population)
- average weekly income ($)
Let’s create a scatterplot of two of these variables.
A scatterplot of the # of registered handguns and the # of
police officers:
Let’s look at the first row of the data table, and then identify
that point (circle it) in the scatterplot above.
Each row in the data table corresponds to exactly one point
in the scatter plot.
What sort of relationship between the # of registered
handguns and the # of police officers does this scatterplot
show?
Correlation
The term ___________ is often used in common language
and has a general interpretation as implying a
____________ between two events … including two
discrete events:
“Autism is correlated with vaccination”
… or things that can’t really be measured
“there’s a correlation between my mood and my partner’s
behavior”
However in statistics the term correlation means
something specific.
Statistical Correlation
___________ measures the _________ and ________ of
a ______ relationship between two continuous variables
(X and Y). Pearson’s correlation is the most commonly
used:
$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}}
$$
Note:
- this measures ONLY a linear relationship
- there are many types of relationships that are not linear
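Although we will let software do the work, it can help to see that Pearson's formula is only a few lines of arithmetic. The sketch below is one possible translation into plain Python; the function name `pearson_r` and the small hours/grade dataset are made up for illustration and are not part of the course data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of the products of deviations from the two means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustration data (NOT the Detroit dataset).
hours = [1, 2, 3, 4, 5]
grade = [52, 60, 61, 70, 77]

r = pearson_r(hours, grade)
print(round(r, 3))
```

The points rise almost in a straight line, so r comes out close to 1.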
I only give you the formula for completeness; we will not be
calculating it by hand (it is extremely tedious).
In this class, as whenever you analyze data in the future,
we will make the software calculate the correlation.
However, it is important that you understand that it’s just
another statistic calculated from the data, just like the
mean, the standard deviation, or the odds ratio.
Some Facts about Correlation
1. Correlation can only be used when both variables are
interval or ratio level
2. Correlation does not change when we change the units
of measurement of X and Y
Height in cm or inches will give the same correlation with weight in kg or lbs
3. Positive correlation indicates positive association
between the variables and negative correlation indicates
negative association
4. Correlation is always between __ and _. Values near 0
indicate a very ____ relationship
A correlation of -1 or 1 occurs only if the points fall exactly on a straight line
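Fact 2 is easy to check numerically. The sketch below converts some hypothetical heights and weights (invented for illustration) from metric to imperial units and computes the correlation both ways; `pearson_r` is a helper name chosen here, not something from SPSS.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation between two equal-length lists of numbers."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurements for five people (illustration only).
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 90]

# The same people, measured in different units.
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]

r_metric = pearson_r(height_cm, weight_kg)
r_imperial = pearson_r(height_in, weight_lb)

# The two values agree up to floating-point rounding error.
print(r_metric, r_imperial)
```

Rescaling multiplies the numerator and the denominator of the formula by the same factor, which is why the units cancel out.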
Examples
The following are scatter plots of two variables with the
correlation between the two listed above the plot.
Pearson Correlation of 1
As in the definition, correlation is the strength of the linear
relationship. All of these figures have the ____ correlation!
Important note! The strength of the correlation doesn’t
depend on the slope of the line, just on how _______
clustered the points are around a _____________ … any straight
line!
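To see that the slope does not matter, the sketch below puts the same x values on three different straight lines (the slopes and intercepts are arbitrary choices for illustration) and computes the correlation for each. The helper `pearson_r` is a name chosen here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation between two equal-length lists of numbers."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [0, 1, 2, 3, 4]

# Three perfectly straight lines with very different slopes.
steep = [10 * x + 3 for x in xs]      # slope 10
shallow = [0.1 * x - 2 for x in xs]   # slope 0.1
falling = [-5 * x + 1 for x in xs]    # slope -5

r_steep = pearson_r(xs, steep)
r_shallow = pearson_r(xs, shallow)
r_falling = pearson_r(xs, falling)

# Both rising lines give r = 1 and the falling line gives r = -1
# (up to floating-point rounding), whatever the size of the slope.
print(r_steep, r_shallow, r_falling)
```

Only the sign of the slope shows up in r; its size does not.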
Examples of a relationship with
Pearson Correlation of 0
Facts in a video
http://www.youtube.com/watch?v=Ypgo4qUBt5o
Let’s do some examples – Correlation guessing
Q15, pg 370 – correlation between poverty and rates of
teen pregnancy in 8 US states.
a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)
Q16, pg 370 (edited) – Hours studied and exam grade
a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)
Q19, pg 371 – Hours watching TV vs # books read
a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)
An Example
In which of these two scatter plots is the correlation higher?
[Figure: two scatter plots of y against x, shown side by side]
The correlation of the x and y in the two figures is
_________, only the _____ of the axes is different!
Don’t trust your eye, always calculate the correlation.
… but don’t trust the correlation … always check by eye.
Myths about Correlation
1. Correlation implies causation
There could be a third, unknown variable which
influences both X and Y
2. A correlation coefficient of zero implies no
relationship between two variables
WRONG! It only implies no LINEAR relationship!
Remember the funky shaped figures!
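Myth 2 can be checked with a tiny made-up example: below, y is completely determined by x (y = x squared), yet the correlation is zero because the relationship is not linear. The data and the helper name `pearson_r` are chosen here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation between two equal-length lists of numbers."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# x is symmetric around 0 and y depends on x exactly: y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # zero: a perfect U-shaped relationship, but no linear one
```

A scatterplot of these points shows the U shape immediately, which is why we always check by eye as well.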
Myths explained in video
http://www.youtube.com/watch?v=MTbZoKEOkUg
http://www.youtube.com/watch?v=VW1IEqKuf6s
(Only to 2:48)
Correlation as a statistic
As with the mean, the Odds Ratio and the other statistics we
have looked at, a correlation is a characteristic of a
population that we estimate with our ______:
               Population     Sample
               (Parameter)    (Statistic)
Mean           µ              X̄
Proportion     p              p̂
Odds Ratio                    OR
Correlation    _              _
The r tells part of the story
Remember, the correlation (r) we calculate from a sample is
only one of the _____________ correlations we could have
obtained from one of many possible _______. It’s possible that
the true population correlation, ρ, has another value … say
0, or ρ0.
So … there is some variability in our estimate r: its
standard error.
$$
\widehat{se}(r) = \sqrt{\frac{1 - r^2}{n - 2}}
$$
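As a quick sketch of how the standard error behaves, the snippet below evaluates it for one case. The function name `se_r` and the value r = 0.6 are hypothetical; n = 13 is simply the number of years in the Detroit data.

```python
from math import sqrt

def se_r(r, n):
    """Estimated standard error of a sample correlation r from n pairs."""
    return sqrt((1 - r ** 2) / (n - 2))

r = 0.6   # hypothetical sample correlation (illustration only)
n = 13    # number of (x, y) pairs, e.g. 13 years of data

se = se_r(r, n)
print(round(se, 4))

# With the same r, a larger sample gives a smaller standard error.
se_large = se_r(r, 130)
print(round(se_large, 4))
```

The standard error shrinks as n grows, and also as r gets closer to -1 or 1.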
Hypotheses for Associations
between Continuous Variables
H0: there is no linear relationship between X and Y
Ha: there is a linear relationship between X and Y
Is the same as:
H0: ρ = 0
Ha: ρ ≠ 0
And as in our other hypothesis tests, we will use a
_________ (r) to approximate a _________ (ρ).
Testing for Correlation = 0
Recall that for our hypothesis tests of μ = 0, we used a t-test:
$$
t = \frac{\bar{x} - 0}{se(\bar{x})} = \frac{\bar{x}}{s / \sqrt{n}}
$$
If both X and Y are normally distributed, the test for H0: ρ = 0
is very similar:
$$
t = \frac{r - 0}{se(r)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}
$$
and we look up our t value in the appropriate table to find
the p-value!
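The test statistic can be sketched in a few lines. The values r = 0.6 and n = 13 below are hypothetical, the function name `t_for_correlation` is chosen here, and the use of n - 2 degrees of freedom is an assumption based on the n - 2 in the standard error. The critical value 2.201 is the usual t-table entry for 11 degrees of freedom at the two-sided 5% level.

```python
from math import sqrt

def t_for_correlation(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    se = sqrt((1 - r ** 2) / (n - 2))  # standard error of r
    t = (r - 0) / se
    df = n - 2                         # assumed degrees of freedom
    return t, df

r = 0.6   # hypothetical sample correlation (illustration only)
n = 13    # number of (x, y) pairs

t, df = t_for_correlation(r, n)
print(round(t, 3), df)

# Compare |t| with the tabled critical value for df = 11 (two-sided, 5%).
critical_value = 2.201
reject_h0 = abs(t) > critical_value
print(reject_h0)
```

In practice the software reports the p-value directly, so the table lookup is only there to mirror what we do by hand.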
New Topics Covered Today
Pearson’s Correlation
- Most commonly calculated correlation statistic
- No definition of response or predictor
- Always between -1 and 1
Hypothesis testing for Correlation
- Does a correlation exist? Reject null = a non-zero correlation
Reading:
Chapter 10 up to page 360