Chapter 9
Correlation and Regression
Chapter 9, Sections 9-1 and 9-2. Triola, Essentials of Statistics, Second Edition. Copyright 2004. Pearson/Addison Wesley
Chapter 9
Correlation and Regression
9-1 Overview
9-2 Correlation
9-3 Regression
Overview
Paired Data
• Is there a relationship?
• If so, what is the equation?
• Use the equation for prediction.
Example: Lengths and Weights of Male Bears
x Length (in.)   53.0   67.5   72.0   72.0   73.5   68.5   73.0   37.0
y Weight (lb)    80     344    416    348    262    360    332    34
9-2 Correlation
Definition
• Correlation exists between two variables when one of them is related to the other in some way.
Definition
• A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.
Example: Lengths and Weights of Male Bears
x Length (in.)   53.0   67.5   72.0   72.0   73.5   68.5   73.0   37.0
y Weight (lb)    80     344    416    348    262    360    332    34
(x, y) = (Length, Weight): (53.0, 80), (67.5, 344), (72.0, 416), etc.
Scatter Diagram of Paired Data
Lengths and Weights of Male Bears
[Figure: scatterplot. Horizontal axis: Length (in.), 35 to 75. Vertical axis: Weight (lb), 0 to 500. Labeled points: (37, 34), (53, 80), (67.5, 344), (68.5, 360), (72, 348), (72, 416), (73, 332), (73.5, 262).]
Positive Linear Correlation
[Figure 9-2, Scatter Plots: (a) Positive, (b) Strong positive, (c) Perfect positive]
Negative Linear Correlation
[Figure 9-2, Scatter Plots: (d) Negative, (e) Strong negative, (f) Perfect negative]
No Linear Correlation
[Figure 9-2, Scatter Plots: (g) No correlation, (h) Nonlinear correlation]
Definition
• The linear correlation coefficient r measures the strength of the linear relationship between paired x- and y-quantitative values in a sample.
Definition
• The linear correlation coefficient r is sometimes referred to as the Pearson product moment correlation coefficient.
Assumptions
1. The sample of paired (x, y) data is a random sample.
2. The pairs of (x, y) data have a bivariate normal distribution.
Notation for the Linear Correlation Coefficient
n       number of pairs of data presented.
Σ       denotes the addition of the items indicated.
Σx      denotes the sum of all x values.
Σx²     indicates that each x score should be squared and then those squares added.
(Σx)²   indicates that the x scores should be added and the total then squared.
Σxy     indicates that each x score should be first multiplied by its corresponding y score. After obtaining all such products, find their sum.
r       represents the linear correlation coefficient for a sample.
ρ       represents the linear correlation coefficient for a population.
Definition
Linear Correlation Coefficient r
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} $$
Formula 9-1
Calculators can compute r.
ρ (rho) is the linear correlation coefficient for all paired data in the population.
Rounding the Linear Correlation Coefficient r
• Round to three decimal places so that it can be compared to critical values in Table A-5.
• Use a calculator or computer if possible.
Example: Lengths and Weights of Male Bears
x Length (in.)   53.0   67.5   72.0   72.0   73.5   68.5   73.0   37.0
y Weight (lb)    80     344    416    348    262    360    332    34
r = 0.897 using your calculator
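For readers without a statistics calculator, the following is a minimal Python sketch of Formula 9-1 applied to the bear data. The function name `linear_correlation` is a hypothetical helper chosen for illustration, not something from the textbook.

```python
from math import sqrt

# Bear data from the example: lengths (in.) and weights (lb).
x = [53.0, 67.5, 72.0, 72.0, 73.5, 68.5, 73.0, 37.0]
y = [80, 344, 416, 348, 262, 360, 332, 34]


def linear_correlation(x, y):
    """Linear correlation coefficient r, computed term by term as in Formula 9-1."""
    n = len(x)
    sum_x = sum(x)                              # Σx
    sum_y = sum(y)                              # Σy
    sum_x2 = sum(v * v for v in x)              # Σx²
    sum_y2 = sum(v * v for v in y)              # Σy²
    sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


r = linear_correlation(x, y)
print(round(r, 3))  # 0.897, matching the calculator result
```

Note the difference between `sum_x2` (square first, then add) and `sum_x ** 2` (add first, then square), which mirrors the distinction between Σx² and (Σx)² in the notation.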
Interpreting the Linear Correlation Coefficient
• If the absolute value of r exceeds the value in Table A-5, conclude that there is a significant linear correlation.
• Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.
TABLE A-5 Critical Values of the Pearson Correlation Coefficient r
n      α = .05   α = .01
4      .950      .999
5      .878      .959
6      .811      .917
7      .754      .875
8      .707      .834
9      .666      .798
10     .632      .765
11     .602      .735
12     .576      .708
13     .553      .684
14     .532      .661
15     .514      .641
16     .497      .623
17     .482      .606
18     .468      .590
19     .456      .575
20     .444      .561
25     .396      .505
30     .361      .463
35     .335      .430
40     .312      .402
45     .294      .378
50     .279      .361
60     .254      .330
70     .236      .305
80     .220      .286
90     .207      .269
100    .196      .256
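The decision rule based on Table A-5 can be sketched as a simple lookup. The dictionary below is only an excerpt of the table (n = 4 through 12), and the names `TABLE_A5` and `is_significant` are hypothetical, introduced here for illustration.

```python
# Excerpt of Table A-5: n -> (critical r at alpha = .05, critical r at alpha = .01).
TABLE_A5 = {
    4: (0.950, 0.999),
    5: (0.878, 0.959),
    6: (0.811, 0.917),
    7: (0.754, 0.875),
    8: (0.707, 0.834),
    9: (0.666, 0.798),
    10: (0.632, 0.765),
    11: (0.602, 0.735),
    12: (0.576, 0.708),
}


def is_significant(r, n, alpha=0.05):
    """Return True if |r| exceeds the Table A-5 critical value for n and alpha."""
    if alpha not in (0.05, 0.01):
        raise ValueError("Table A-5 only lists alpha = .05 and alpha = .01")
    crit_05, crit_01 = TABLE_A5[n]  # KeyError if n is outside this excerpt
    critical = crit_05 if alpha == 0.05 else crit_01
    return abs(r) > critical


# Bear data: r = 0.897 with n = 8 pairs.
print(is_significant(0.897, 8))          # True: 0.897 > 0.707
print(is_significant(-0.5, 8))           # False: |-0.5| does not exceed 0.707
```

The comparison uses the absolute value of r, so a strong negative correlation is treated the same way as a strong positive one.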
Properties of the Linear Correlation Coefficient r
1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.
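Properties 1 to 3 can be checked numerically. The sketch below reuses the bear data, converts inches to centimeters and pounds to kilograms (the unit conversions are an illustrative choice, not from the slides), and then swaps the roles of x and y. The helper `linear_correlation` is a hypothetical name.

```python
from math import isclose, sqrt


def linear_correlation(x, y):
    """Linear correlation coefficient r via Formula 9-1."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


lengths_in = [53.0, 67.5, 72.0, 72.0, 73.5, 68.5, 73.0, 37.0]
weights_lb = [80, 344, 416, 348, 262, 360, 332, 34]

r = linear_correlation(lengths_in, weights_lb)

# Property 2: a change of scale (inches -> cm, pounds -> kg) leaves r unchanged.
lengths_cm = [2.54 * v for v in lengths_in]
weights_kg = [0.45359237 * v for v in weights_lb]
r_rescaled = linear_correlation(lengths_cm, weights_kg)

# Property 3: interchanging x and y leaves r unchanged.
r_swapped = linear_correlation(weights_lb, lengths_in)

print(-1 <= r <= 1)              # Property 1
print(isclose(r, r_rescaled))    # Property 2, up to floating-point rounding
print(isclose(r, r_swapped))     # Property 3
```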
Common Errors Involving Correlation
1. Causation: It is incorrect to conclude that correlation implies causality.
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
Common Errors Involving Correlation
[Figure: Scatterplot of Distance above Ground and Time for Object Thrown Upward. Horizontal axis: Time (seconds), 0 to 8. Vertical axis: Distance (feet), 0 to 250.]
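The linearity error can be illustrated with made-up numbers. The sketch below assumes a hypothetical height function d = 128t - 16t² (feet, with t in seconds); these values are not the data behind the slide's scatterplot, only an example with the same shape. Distance is completely determined by time, yet r comes out as zero because the relationship is not linear.

```python
from math import sqrt


def linear_correlation(x, y):
    """Linear correlation coefficient r via Formula 9-1."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical data: time in seconds, height in feet from d = 128t - 16t^2.
times = list(range(9))                             # 0, 1, ..., 8
distances = [128 * t - 16 * t * t for t in times]  # rises, peaks at t = 4, falls

r = linear_correlation(times, distances)
print(distances)
print(r)  # zero: no linear correlation despite an exact relationship
```

Because the rise and the fall are symmetric about t = 4, the positive and negative contributions to the numerator of Formula 9-1 cancel.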
Formal Hypothesis Test
• To determine whether there is a significant linear correlation between two variables
• Two methods
• Both methods let
  H0: ρ = 0 (no significant linear correlation)
  H1: ρ ≠ 0 (significant linear correlation)
Method 1: Test Statistic is t
(follows format of earlier chapters)
Test statistic:
$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$
Critical values: use Table A-3 with degrees of freedom = n - 2
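As a rough illustration of Method 1, the sketch below evaluates the t statistic for r = 0.842 and n = 8, the values used in the Garbage Project example in this section. Table A-3 is not reproduced in these slides, so the critical value 2.447 is hard-coded here as the standard two-tailed t value for α = 0.05 with 6 degrees of freedom; `t_statistic` is a hypothetical helper name.

```python
from math import sqrt


def t_statistic(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2)) for H0: rho = 0."""
    return r / sqrt((1 - r ** 2) / (n - 2))


r, n = 0.842, 8
t = t_statistic(r, n)

# Assumption: two-tailed critical t for alpha = 0.05 and df = n - 2 = 6,
# taken from a standard t table rather than from these slides.
critical_t = 2.447

print(round(t, 2))            # about 3.82
print(abs(t) > critical_t)    # True, so H0: rho = 0 is rejected
```

Both methods test the same hypotheses, so this conclusion agrees with the Table A-5 comparison of r against its critical value.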
Method 2: Test Statistic is r
(uses fewer calculations)
• Test statistic: r
• Critical values: Refer to Table A-5 (no degrees of freedom)
[Figure: number line for r from -1 to 1 with critical values r = -0.811 and r = 0.811. Reject ρ = 0 in the two tails beyond the critical values; fail to reject ρ = 0 between them. Sample data: r = 0.828.]
Testing for a Linear Correlation
[Figure: procedure for testing for a linear correlation, referring to Table A-5]
Is there a significant linear correlation?
Data from the Garbage Project
x Plastic (lb)       0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
y Household size     2      3      3      6      4      2      1      5
n = 8
α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Is there a significant linear correlation?
n = 8
α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = -0.707 and 0.707
(Table A-5 with n = 8 and α = 0.05)
Is there a significant linear correlation?
0.842 > 0.707
The test statistic does fall within the critical region.
Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
[Figure: number line for r from -1 to 1 with critical values r = -0.707 and r = 0.707. Reject ρ = 0 in the two tails beyond the critical values; fail to reject ρ = 0 between them. Sample data: r = 0.842.]
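The whole Method 2 test for the Garbage Project data can be sketched in a few lines. The pairing of plastic weights with household sizes below is read from the example's data table, and the critical value 0.707 is the Table A-5 entry for n = 8 and α = 0.05; `linear_correlation` is a hypothetical helper name.

```python
from math import sqrt


def linear_correlation(x, y):
    """Linear correlation coefficient r via Formula 9-1."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household_size = [2, 3, 3, 6, 4, 2, 1, 5]

r = linear_correlation(plastic_lb, household_size)
critical_r = 0.707  # Table A-5, n = 8, alpha = 0.05

print(round(r, 3))            # 0.842
if abs(r) > critical_r:
    print("Reject H0: significant linear correlation")
else:
    print("Fail to reject H0")
```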
Justification for r Formula
Formula 9-1 is developed from
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$
where (x̄, ȳ) is the centroid of the sample points.
[Figure: scatterplot with x from 0 to 7 and y from 0 to 24, divided into Quadrants 1 to 4 by the lines x̄ = 3 and ȳ = 11 through the centroid (x̄, ȳ). For the point (7, 23): x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12.]
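The agreement between the deviation form and Formula 9-1 can be checked numerically. This sketch uses the bear data from earlier in the section and Python's standard `statistics` module for the sample means and sample standard deviations; the two function names are hypothetical.

```python
from math import isclose, sqrt
from statistics import mean, stdev


def r_from_deviations(x, y):
    """r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)   # centroid of the sample points
    s_x, s_y = stdev(x), stdev(y)     # sample standard deviations
    total = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return total / ((n - 1) * s_x * s_y)


def r_formula_9_1(x, y):
    """r computed with the shortcut sums of Formula 9-1."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


x = [53.0, 67.5, 72.0, 72.0, 73.5, 68.5, 73.0, 37.0]
y = [80, 344, 416, 348, 262, 360, 332, 34]

print(isclose(r_from_deviations(x, y), r_formula_9_1(x, y)))  # True
```

In the deviation form, each point contributes the product of its horizontal and vertical distances from the centroid; for the point (7, 23) in the figure that product is 4 × 12 = 48, which is positive because the point lies above and to the right of the centroid.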