Correlation

Outline
Correlation and Covariance
 Bivariate Correlation Coefficient
 Types of Correlation
 Correlation Coefficient Formula
 Correlation Coefficient Computation
 Short-cut Formula
 Linear Function (Intercept and Slope)

(c) 2007 IUPUI SPEA K300 (4392)
Correlation and Covariance
It asks how two variables are related
 When x changes, how does y change?
 Underlying information is covariance
 Cov(x,y)=E[(x-xbar)(y-ybar)]
 Cov(x,y)=Cov(y,x)
 Cov(x,x)=Var(x): variance is a special case of covariance (the covariance of a variable with itself)
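
The three covariance properties above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names and the toy data are hypothetical, and it assumes the expectation is taken as a plain average over n observations (a sample covariance would divide by n-1 instead).

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(x, y):
    """Average of (x - xbar)(y - ybar), i.e. E[(x-xbar)(y-ybar)] over the data."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)


# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

cov_xy = covariance(x, y)   # Cov(x,y)
cov_yx = covariance(y, x)   # Cov(y,x), same value by symmetry
var_x = covariance(x, x)    # Cov(x,x), which is Var(x)

print(cov_xy, cov_yx, var_x)
```

Swapping the arguments leaves the result unchanged, and passing the same variable twice gives its variance.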

Bivariate Correlation Coefficient






The (Karl Pearson product-moment) correlation coefficient
The bivariate correlation coefficient (BCC) applies to two interval/ratio variables
It differs from Spearman's rank correlation coefficient, which is nonparametric
It differs from the partial correlation coefficient, which controls for the impact of other variables
No causal direction is imposed: X→Y or Y→X
BCC is used for prediction
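
To make the contrast with Spearman's rank correlation concrete, here is a small Python sketch. It is an illustration only: the helper names and the data are hypothetical, and the ranking helper assumes there are no tied values. Spearman's coefficient is computed as the Pearson coefficient of the ranks.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return sp_xy / math.sqrt(ss_xx * ss_yy)


def ranks(values):
    """Rank each value from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [xi ** 3 for xi in x]

r_pearson = pearson_r(x, y)
rho_spearman = spearman_rho(x, y)
print(f"Pearson r = {r_pearson:.4f}, Spearman rho = {rho_spearman:.4f}")
```

Because y increases monotonically with x, the rank-based coefficient is 1, while Pearson's r falls short of 1 since the relationship is not linear.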
Bivariate Correlation Coefficient






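BCC ranges from -1 to 1 (so does Gamma γ)
The covariance component can be negative, so r can be negative
+ means a positive relationship: when x increases by one standard deviation, y tends to increase by |r| standard deviations
0 means no linear relationship
- means a negative relationship: when x increases by one standard deviation, y tends to decrease by |r| standard deviations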
http://noppa5.pc.helsinki.fi/koe/corr/cor7.html
Positive relationship
[Figure: scatter plot of y against x, both axes running from 0 to 5; r=1.0 (positive relationship)]
Negative relationship
[Figure: scatter plot of y against x, both axes running from 0 to 5; r=-1.0 (negative relationship)]
No relationship
[Figure: scatter plot of y against x, x axis running from 0 to 5; r=.0 (no relationship)]
Correlation Coefficient

Ratio of the covariance component of x and y to the square root of the product of the variance components of x and y

$$r = \frac{SP_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

$$SP_{xy} = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}$$

$$SS_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x}) = \sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

$$SS_{yy} = \sum_{i=1}^{n}(y_i-\bar{y})(y_i-\bar{y}) = \sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$
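
The second equality in the SP_xy line may not be obvious at first sight. The short derivation below is added as a reading aid (it is not in the original slides) and uses only the definitions of the two means, $\bar{x} = \sum x_i / n$ and $\bar{y} = \sum y_i / n$.

```latex
\begin{aligned}
SP_{xy} &= \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) \\
        &= \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\,\bar{x}\,\bar{y} \\
        &= \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}
                        - \frac{\sum x_i \sum y_i}{n}
                        + \frac{\sum x_i \sum y_i}{n} \\
        &= \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}
\end{aligned}
```

Setting y equal to x in the same steps gives the corresponding result for SS_xx, and likewise for SS_yy.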
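Correlation Coefficient (short-cut)
The textbook suggests the short-cut formula below, but it is not recommended.

$$r = \frac{SP_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$

$$SP_{xy} = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n}$$

$$SS_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n}$$

$$SS_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} = \frac{n\sum y_i^2 - \left(\sum y_i\right)^2}{n}$$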
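
As a rough check on the short-cut formula, the Python sketch below evaluates it from raw sums, using the x and y values of example 10-2 from these slides. The variable names are hypothetical. The slides do not say why the short-cut is not recommended; one commonly cited drawback of raw-sum formulas in general is that subtracting two large, nearly equal totals can lose precision in floating-point arithmetic, which may or may not be the concern here.

```python
import math

# Data from example 10-2 in the slides.
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n = len(x)

# Raw sums required by the short-cut formula.
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# r = [n*Sxy - Sx*Sy] / sqrt([n*Sx2 - Sx^2] * [n*Sy2 - Sy^2])
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_shortcut = numerator / denominator

print(f"r (short-cut) = {r_shortcut:.4f}")
```

With these integer data every intermediate sum is exact, so the short-cut and the deviation-based formula agree to four decimals.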
Illustration: example 10-2, p.526

No    x     y      (x-xbar)  (y-ybar)  (x-xbar)^2  (y-ybar)^2  (x-xbar)(y-ybar)
1     43    128    -14.5     -8.5      210.25      72.25       123.25
2     48    120    -9.5      -16.5     90.25       272.25      156.75
3     56    135    -1.5      -1.5      2.25        2.25        2.25
4     61    143    3.5       6.5       12.25       42.25       22.75
5     67    141    9.5       4.5       90.25       20.25       42.75
6     70    152    12.5      15.5      156.25      240.25      193.75
Sum   345   819                        561.5       649.5       541.5
Mean  57.5  136.5

SSxx = 561.5, SSyy = 649.5, SPxy = 541.5
Correlation coefficient: r = 541.5 / sqrt(561.5 × 649.5) = 0.8967
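
The worked table can be reproduced with a few lines of Python. This is a sketch added for illustration, with hypothetical variable names; the x and y values are those of example 10-2. Note that the mean of y computes to 136.5, which is the value the (y-ybar) deviations in the table imply.

```python
import math

# Data from example 10-2 in the slides.
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n = len(x)

x_bar = sum(x) / n   # mean of x
y_bar = sum(y) / n   # mean of y

# One tuple per observation: deviations, squared deviations, cross product.
rows = []
for xi, yi in zip(x, y):
    dx, dy = xi - x_bar, yi - y_bar
    rows.append((xi, yi, dx, dy, dx ** 2, dy ** 2, dx * dy))

ss_xx = sum(row[4] for row in rows)   # sum of (x-xbar)^2
ss_yy = sum(row[5] for row in rows)   # sum of (y-ybar)^2
sp_xy = sum(row[6] for row in rows)   # sum of (x-xbar)(y-ybar)
r = sp_xy / math.sqrt(ss_xx * ss_yy)

for number, row in enumerate(rows, start=1):
    print(number, *row)
print(f"SSxx={ss_xx} SSyy={ss_yy} SPxy={sp_xy} r={r:.4f}")
```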
Hypothesis Test
How reliable is a correlation coefficient?
 r is a random variable computed from the sample; ρ is the corresponding population parameter
 H0: ρ =0, Ha: ρ ≠ 0
 The test statistic (TS) follows the t distribution with df=n-2
 If H0 is not rejected, r is not reliable regardless of its magnitude (ρ =0 cannot be ruled out)

$$t_r = r\sqrt{\frac{n-2}{1-r^2}} \sim t(n-2)$$
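
The test statistic depends on the sample size as well as on r, which is why magnitude alone does not settle reliability. The Python sketch below illustrates this under stated assumptions: the function name and the two (r, n) pairs are hypothetical, chosen only to show the same r giving different t values.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need n > 2 so that df = n - 2 is positive")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Same r, two hypothetical sample sizes.
t_small = t_statistic(0.5, 6)    # df = 4
t_large = t_statistic(0.5, 30)   # df = 28

print(f"n=6:  t = {t_small:.4f}")
print(f"n=30: t = {t_large:.4f}")
```

The larger sample yields a much larger test statistic for the same r, so it is more likely to exceed the critical value.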
Illustration: Example 10-3, p.529
Step 1. H0: ρ =0, Ha: ρ ≠ 0
 Step 2. α=.05, df=4 (=6-2), CV=2.776
 Step 3. TS=4.059, r=.897
 Step 4. TS>CV, reject H0 at the .05 level
 Step 5. ρ ≠ 0

$$t_r = r\sqrt{\frac{n-2}{1-r^2}} = .897\sqrt{\frac{6-2}{1-.897^2}} = 4.059 \sim t(n-2)$$
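
The five steps of this example can be traced in Python. The sketch below is illustrative only, with hypothetical variable names: r, n, and the critical value 2.776 are taken from the slide as given, and the critical value is not recomputed from the t distribution.

```python
import math

r = 0.897                 # sample correlation from the slide
n = 6                     # number of observations
critical_value = 2.776    # two-tailed CV at alpha = .05, df = 4, as given on the slide

df = n - 2
test_statistic = r * math.sqrt(df / (1 - r ** 2))

# Two-tailed test: reject H0 (rho = 0) when |TS| exceeds the critical value.
reject_h0 = abs(test_statistic) > critical_value

print(f"df = {df}, TS = {test_statistic:.3f}, reject H0: {reject_h0}")
```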
Linear function
A function transforms input into output in its own way
 Ex: y=square_root(x). When you put x (input) into the function square_root(), you get y (output).
 A linear function consists of an intercept and a linear combination of variables weighted by their slopes: Y = a + bX_1 + cX_2 + …
 Slopes are constant

Intercept and Slope of a function
A linear model: Y = a + b X
 Dependent variable Y to be explained
 Independent variable X that explains Y
 Y-intercept a: the Y coordinate of the point at which the line intersects the Y axis (the value of Y when X = 0)
 Slope b: the change in the dependent variable Y per unit change in the independent variable X
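
The two definitions can be read off directly from a small Python sketch. It is added for illustration: the function name is hypothetical, and the default intercept 2 and slope .5 are borrowed from the line Y = 2 + .5X used in the illustration slide.

```python
def linear(x, intercept=2.0, slope=0.5):
    """Evaluate the linear model Y = a + bX."""
    return intercept + slope * x


# The intercept is the value of Y where the line crosses the Y axis (X = 0).
y_at_zero = linear(0)

# The slope is the change in Y for each one-unit increase in X,
# and it is the same wherever on the line it is measured.
changes = [linear(x + 1) - linear(x) for x in range(-1, 5)]

print(f"Y at X=0: {y_at_zero}")
print(f"Change in Y per unit of X: {changes}")
```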

Illustration
[Figure: plot of the line Y = 2 +.5X on x and y axes, annotated with a rise of .5 for a run of 1]