Bivariate Linear Correlation

Linear Function
• Y = a + bX
Fixed and Random Variables
• A FIXED variable is one for which you
have every possible value of interest in
your sample.
– Example: Subject sex, female or male.
• A RANDOM variable is one where the
sample values are randomly obtained from
the population of values.
– Example: Height of subject.
Correlation & Regression
• If Y is random and X is fixed, the model is
a regression model.
• If both Y and X are random, the model is a
correlation model.
• Psychologists generally do not know this
• They think
– Correlation = compute the corr coeff, r
– Regression = find an equation to predict Y
from X
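Scatter Plot
[Figure: Perfect Positive Linear, Y plotted against X]
[Figure: Perfect Negative Linear, Y plotted against X]
[Figure: Perfect Positive Monotonic, Y plotted against X]
[Figure: Perfect Negative Monotonic, Y plotted against X]
Nonmonotonic Relationship
[Figure: Performance (Y axis) plotted against Test Anxiety (X axis)]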
For the data plotted below, the linear r = 0, but
the quadratic r = 1.
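A minimal sketch of how a linear r of 0 can coexist with a perfect quadratic fit. The five data points are hypothetical (an exact inverted U), not the values plotted on the slide, and `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(x, y):
    """Pearson r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sscp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sscp / math.sqrt(ss_x * ss_y)

# HYPOTHETICAL data: Y rises, peaks, then falls symmetrically around X = 0.
x = [-2, -1, 0, 1, 2]
y = [4 - v ** 2 for v in x]          # [0, 3, 4, 3, 0]

linear_r = pearson_r(x, y)           # Y against X

# Y is an exact linear function of X squared, so the multiple R of the
# quadratic model equals |r(X^2, Y)| for these data.
x_squared = [v ** 2 for v in x]
quadratic_r = abs(pearson_r(x_squared, y))

print(linear_r, quadratic_r)
```

The linear coefficient is zero because the rising and falling halves cancel exactly, even though Y is perfectly predictable from X.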
Burgers (X) and Beer (Y)
Subject    X      Y      XY
1          5      8      40
2          4      10     40
3          3      4      12
4          2      6      12
5          1      2      2
Sum        15     30     106
Mean       3      6
St. Dev.   1.581  3.162
A Scatter Plot of Our Data
[Figure: Beers (Y axis) plotted against Burgers (X axis)]
Burger (X)-Beer (Y) Correlation
$SS_Y = \sum (Y - \bar{Y})(Y - \bar{Y}) = \sum YY - \frac{(\sum Y)(\sum Y)}{N}$
$SSCP = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{N} = 106 - \frac{15(30)}{5} = 16$
$COV(X, Y) = \frac{SSCP}{N - 1} = \frac{16}{4} = 4$
$r = \frac{COV(X, Y)}{s_x s_y} = \frac{4}{1.581(3.162)} = .80$
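The hand calculation can be checked with a short script. This is a sketch using only the five burger and beer scores from the slides; the variable names and the `sample_sd` helper are illustrative.

```python
import math

x = [5, 4, 3, 2, 1]     # burgers
y = [8, 10, 4, 6, 2]    # beers
n = len(x)

sum_x, sum_y = sum(x), sum(y)                  # 15 and 30
sum_xy = sum(a * b for a, b in zip(x, y))      # 40 + 40 + 12 + 12 + 2

# Computational formula: SSCP = sum(XY) - (sum X)(sum Y) / N
sscp = sum_xy - sum_x * sum_y / n
cov = sscp / (n - 1)

def sample_sd(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

s_x, s_y = sample_sd(x), sample_sd(y)
r = cov / (s_x * s_y)

print(sum_xy, sscp, cov, round(s_x, 3), round(s_y, 3), round(r, 2))
```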
H0: ρ = 0
$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{.8\sqrt{3}}{\sqrt{1 - .64}} = 2.309$
• df = n – 2 = 3
• Now get an exact p value and construct a
confidence interval
Get Exact p Value
• COMPUTE p=2*CDF.T(-ABS(t),df).
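For readers without SPSS, the same two-tailed p can be approximated in plain Python. This sketch integrates the t density numerically with Simpson's rule; the function names and the step count are illustrative choices, not part of the original slides.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p = 2 * P(T >= |t|); steps must be even for Simpson's rule."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # P(0 <= T <= |t|)
    return 2 * (0.5 - area)

r, n = 0.8, 5
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p = two_tailed_p(t, n - 2)

print(round(t, 3), round(p, 3))
```

Taking the absolute value of t keeps the result a proper two-tailed probability whether t is entered as positive or negative.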
Go To Vassar
• http://vassarstats.net/
N increased to 10.
Presenting the Results
• The correlation between my friends’
burger consumption and their beer
consumption fell short of statistical
significance, r(n = 5) = .8, p = .10,
95% CI [-.28, .99].
• Among my friends, beer consumption was
positively, significantly related to burger
consumption, r(n = 10) = .8, p = .006,
95% CI [.34, .95].
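The reported intervals can be reproduced with the Fisher r-to-z transformation. The slides do not name the method, so treating Fisher's z as the method behind these intervals is an assumption; `fisher_ci` is an illustrative name.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for rho via Fisher's z (assumed method)."""
    z = math.atanh(r)                 # r-to-z transformation
    se = 1 / math.sqrt(n - 3)         # standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

lo5, hi5 = fisher_ci(0.8, 5)
lo10, hi10 = fisher_ci(0.8, 10)

print(round(lo5, 2), round(hi5, 2))
print(round(lo10, 2), round(hi10, 2))
```

With the same r, doubling n from 5 to 10 is enough to pull the lower limit above zero, which matches the change from a nonsignificant to a significant result.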
Assumptions
1. Homoscedasticity across Y|X
2. Normality of Y|X
3. Normality of Y ignoring X
4. Homoscedasticity across X|Y
5. Normality of X|Y
6. Normality of X ignoring Y
• The first three should look familiar; we
made them with the pooled variances t.
Bivariate Normal
When Do Assumptions Apply?
• Only when employing t or F.
• That is, obtaining a p value
• or constructing a confidence interval.
Shrunken r²
$\text{shrunken } r^2 = 1 - \frac{(1 - r^2)(n - 1)}{(n - 2)} = 1 - \frac{(1 - .64)(4)}{3} = .52$
• This reduces the bias in estimation of ρ²
• As sample size increases (n-1)/(n-2)
approaches 1, and the amount of
correction is reduced.
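A small sketch of the shrinkage correction, showing how it fades as n grows. The n = 100 case is a hypothetical comparison, not a value from the slides.

```python
def shrunken_r2(r, n):
    """Shrunken r squared: 1 - (1 - r^2)(n - 1)/(n - 2)."""
    return 1 - (1 - r ** 2) * (n - 1) / (n - 2)

small_sample = shrunken_r2(0.8, 5)      # 1 - (.36)(4)/3
large_sample = shrunken_r2(0.8, 100)    # HYPOTHETICAL n, for comparison only

print(round(small_sample, 2), round(large_sample, 3))
```

With n = 5 the estimate drops from .64 to .52; with n = 100 it stays close to the unadjusted .64.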
Do not use Pearson r if the relationship is not linear. If it is
monotonic, use Spearman rho.
Every time X increases, Y decreases; accordingly, we have here a
perfect, negative, monotonic relationship.
Pearson r measures the strength of the
linear relationship. Notice that it is NOT
perfect here.
Spearman rho measures the strength of
monotonic relationship. Notice that it IS
perfect here.
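The contrast can be sketched by computing Pearson r on raw scores and again on ranks, which is how Spearman rho is defined when there are no ties. The data are hypothetical (Y = 1/X), not the points on the slide, and the simple `ranks` helper assumes no tied values.

```python
import math

def pearson_r(x, y):
    """Pearson r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sscp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sscp / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Ranks from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# HYPOTHETICAL data: Y falls every time X rises, but not along a line.
x = [1, 2, 3, 4, 5]
y = [1 / v for v in x]

pearson = pearson_r(x, y)                     # strength of LINEAR relationship
spearman = pearson_r(ranks(x), ranks(y))      # strength of MONOTONIC relationship

print(round(pearson, 2), round(spearman, 2))
```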
Uses of Correlation Analysis
• Measure the degree of linear association
• Correlation does not, by itself, imply causation
– Necessary but not sufficient
– Third variable problems
• Reliability
• Validity
• Independent Samples t – point biserial r
– Y = a + b Group (Group is 0 or 1)
• Contingency tables -- φ
Rows = a + b Columns
• Multiple correlation/regression
$Y = a + b_1X_1 + b_2X_2 + \cdots + b_pX_p$
$GPA_{ECU} = a + b_1\,SAT_{Verbal} + b_2\,SAT_{Math} + \cdots + b_p\,GPA_{HighSchool}$
• Analysis of variance (ANOVA)
$Y = a + b_1\,\text{Group}_1? + b_2\,\text{Group}_2? + \cdots + b_{k-1}\,\text{Group}_{k-1}?$
• PolitConserv = a + b1 Republican? + b2 Democrat?
k = 3, the third group is all others
• Canonical correlation/regression
$(a_1X_1 + a_2X_2 + \cdots) = (b_1Y_1 + b_2Y_2 + \cdots)$
• (homophobia, homo-aggression) =
(psychopathic deviance, masculinity, hypomania,
clinical defensiveness)
• High homonegativity = hypomanic, unusually frank,
stereotypically masculine, psychopathically deviant
(antisocial)
Factors Affecting Size of r
• Range restrictions
– Without variance there can’t be covariance
• Extraneous variance
– The more things affecting Y (other than X),
the smaller the r.
• Interactions – the relationship between X
and Y is modified by Z
– If not included in the model, reduces the r.
Power Analysis
$\delta = \rho\sqrt{n - 1}$
Cohen’s Guidelines
• .10 – small but not trivial
• .30 – medium
• .50 – large
PSYC 6430 Addendum
• The remaining slides cover material I do
not typically cover in the undergraduate
course.
Correcting for Measurement Error
• If reliability is not 1, the r will
underestimate the correlation between the
latent variables.
• We can estimate the correlation between
the true scores this way:
$r_{X_tY_t} = \frac{r_{XY}}{\sqrt{r_{XX}\,r_{YY}}}$
• $r_{XX}$ and $r_{YY}$ are reliabilities
Example
• r between misanthropy and support for
animal rights = .36 among persons with a
nonidealistic ethical ideology
$r_{X_tY_t} = \frac{.36}{\sqrt{.78(.93)}} = .42$
Comparing Correlation/Regression Coefficients
• Weaver, B., & Wuensch, K. L. (2013). SPSS and SAS programs for
comparing Pearson correlations and OLS regression coefficients.
Behavior Research Methods, 45, 880-895.
doi:10.3758/s13428-012-0289-7
H :  1 =  2
• Is the correlation between X and Y the
same in one population as in another?
• The correlation between misanthropy and
support for animal rights was significantly
greater in nonidealists (r = .36) than in
idealists (r = .02)
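One common way to test whether two independent correlations differ is a z test on Fisher-transformed values. The slides do not state which test or which sample sizes produced the reported result, so both the method and the sample sizes below are assumptions; the n values are hypothetical and chosen only to make the sketch runnable.

```python
import math

def compare_independent_r(r1, n1, r2, n2):
    """z test for two correlations from independent samples (Fisher's z)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))    # two-tailed normal p value
    return z, p

# r values come from the slide; BOTH sample sizes are HYPOTHETICAL.
z, p = compare_independent_r(0.36, 100, 0.02, 100)
print(round(z, 2), round(p, 3))
```

The standard error depends only on the two sample sizes, so the same pair of r values can be significant or not depending on n.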
H: WX = WY
• We have data on three variables. Does
the correlation between X and W differ
from that between Y and W.
• W is GPA, X is SATverbal, Y is SATmath.
• See Williams’ procedure in our text.
• See other procedures referenced in my
handout.
H: WX = YZ
• Raghunathan, T. E, Rosenthal, R, & and
Rubin, D. B. (1996). Comparing correlated
but nonoverlapping correlations,
Psychological Methods, 1, 178-183.
• Example: is the correlation between
verbal aptitiude and math aptitude the
same at 10 years of age as at twenty
years of age (longitudinal data)
H:  = nonzero value
• A meta-analysis shows that the correlation
between X and Y averages .39.
• You suspect it is not .39 in the population
in which you are interested.
• H:  = .39.