Chapter 9 Correlation and Regression

ELEMENTARY
STATISTICS
Chapter 9
Correlation and Regression
MARIO F. TRIOLA
EIGHTH
Chapter 9. Section 9-1 and 9-2. Triola, Elementary Statistics, Eighth Edition. Copyright 2001. Addison Wesley Longman
EDITION
Chapter 9
Correlation and Regression
9-1 Overview
9-2 Correlation
9-3 Regression
9-4 Variation and Prediction Intervals
9-5 Multiple Regression
9-6 Modeling
9-1
Overview
Paired Data
- Is there a relationship?
- If so, what is the equation?
- Use the equation for prediction.
9-2
Correlation
Definition
Correlation exists between two variables when one of them is related to the other in some way.
Assumptions
1. The sample of paired data (x, y) is a random sample.
2. The pairs of (x, y) data have a bivariate normal distribution.
Definition
A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.
[Figure: Scatter Diagram of Paired Data]
Positive Linear Correlation
Figure 9-1 Scatter Plots (x on the horizontal axis, y on the vertical axis): (a) Positive, (b) Strong positive, (c) Perfect positive
Negative Linear Correlation
Figure 9-1 Scatter Plots (x on the horizontal axis, y on the vertical axis): (d) Negative, (e) Strong negative, (f) Perfect negative
No Linear Correlation
Figure 9-1 Scatter Plots (x on the horizontal axis, y on the vertical axis): (g) No Correlation, (h) Nonlinear Correlation
Definition
Linear Correlation Coefficient r
measures the strength of the linear relationship between paired x and y values in a sample.

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$
Formula 9-1

Calculators can compute r.
ρ (rho) is the linear correlation coefficient for all paired data in the population.
Notation for the Linear Correlation Coefficient
n = number of pairs of data presented
Σ denotes the addition of the items indicated.
Σx denotes the sum of all x values.
Σx² indicates that each x score should be squared and then those squares added.
(Σx)² indicates that the x scores should be added and the total then squared.
Σxy indicates that each x score should first be multiplied by its corresponding y score. After obtaining all such products, find their sum.
r represents the linear correlation coefficient for a sample.
ρ represents the linear correlation coefficient for a population.
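As a hedged illustration of how the sums in the notation combine in Formula 9-1, the following minimal Python sketch computes r directly from those sums. The function name `linear_correlation` and the four-point data set are assumptions made for this example and do not come from the text.

```python
import math

def linear_correlation(xs, ys):
    """Sample linear correlation coefficient r, computed with Formula 9-1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)                                   # number of pairs
    sum_x = sum(xs)                               # sum of x
    sum_y = sum(ys)                               # sum of y
    sum_x2 = sum(x * x for x in xs)               # sum of x squared
    sum_y2 = sum(y * y for y in ys)               # sum of y squared
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of the products xy
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Illustrative data (hypothetical): y is exactly 2x, a perfect positive line.
r = linear_correlation([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r, 3))  # 1.0
```

Note the difference between `sum_x2` (square first, then add) and `sum_x ** 2` (add first, then square), which mirrors the distinction between Σx² and (Σx)² in the notation.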
Rounding the Linear Correlation Coefficient r
- Round to three decimal places so that it can be compared to critical values in Table A-6.
- Use a calculator or computer if possible.
Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation.
Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.
TABLE A-6 Critical Values of the Pearson Correlation Coefficient r

n      α = .05   α = .01
4      .950      .999
5      .878      .959
6      .811      .917
7      .754      .875
8      .707      .834
9      .666      .798
10     .632      .765
11     .602      .735
12     .576      .708
13     .553      .684
14     .532      .661
15     .514      .641
16     .497      .623
17     .482      .606
18     .468      .590
19     .456      .575
20     .444      .561
25     .396      .505
30     .361      .463
35     .335      .430
40     .312      .402
45     .294      .378
50     .279      .361
60     .254      .330
70     .236      .305
80     .220      .286
90     .207      .269
100    .196      .256
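The decision rule that uses Table A-6 can be sketched in a few lines of Python. This is only an illustration: the dictionary `CRITICAL_R_05` holds the α = .05 column of the table, and the name `is_significant` is an assumption for this example. The sketch does not interpolate between tabulated sample sizes.

```python
# Critical values of r for alpha = .05, copied from Table A-6.
CRITICAL_R_05 = {
    4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666,
    10: 0.632, 11: 0.602, 12: 0.576, 13: 0.553, 14: 0.532, 15: 0.514,
    16: 0.497, 17: 0.482, 18: 0.468, 19: 0.456, 20: 0.444, 25: 0.396,
    30: 0.361, 35: 0.335, 40: 0.312, 45: 0.294, 50: 0.279, 60: 0.254,
    70: 0.236, 80: 0.220, 90: 0.207, 100: 0.196,
}

def is_significant(r, n, table=CRITICAL_R_05):
    """True when |r| exceeds the tabulated critical value for sample size n."""
    if n not in table:
        # Hypothetical policy for this sketch: refuse rather than interpolate.
        raise KeyError(f"n = {n} is not tabulated in this sketch")
    return abs(r) > table[n]

print(is_significant(0.842, 8))   # True, because 0.842 > 0.707
print(is_significant(-0.500, 8))  # False, because 0.500 < 0.707
```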
Properties of the Linear Correlation Coefficient r
1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.
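Properties 1 to 3 can be checked numerically. The sketch below is a minimal illustration using the Garbage Project data from this section; the helper `linear_correlation` and the pounds-to-kilograms conversion are assumptions introduced for the example.

```python
import math

def linear_correlation(xs, ys):
    """Sample r from Formula 9-1 (helper assumed for this sketch)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2))

plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household_size = [2, 3, 3, 6, 4, 2, 1, 5]

r = linear_correlation(plastic_lb, household_size)

# Property 2: converting x from pounds to kilograms leaves r unchanged.
plastic_kg = [x * 0.45359237 for x in plastic_lb]
r_rescaled = linear_correlation(plastic_kg, household_size)

# Property 3: interchanging x and y leaves r unchanged.
r_swapped = linear_correlation(household_size, plastic_lb)

print(-1 <= r <= 1)                                 # Property 1
print(math.isclose(r, r_rescaled, rel_tol=1e-9))    # Property 2
print(math.isclose(r, r_swapped, rel_tol=1e-9))     # Property 3
```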
Common Errors Involving Correlation
1. Causation: It is wrong to conclude that correlation implies causality.
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
Common Errors Involving Correlation
FIGURE 9-2 Scatterplot of Distance above Ground and Time for Object Thrown Upward
(Horizontal axis: Time (seconds), 0 to 8. Vertical axis: Distance (feet), 0 to 250.)
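The linearity error can be made concrete with a small sketch. The figure's actual data points are not recoverable here, so the heights below are generated from an assumed formula, h = 128t - 16t² (an object thrown upward at 128 ft/s), which is a hypothetical stand-in chosen to match the shape of Figure 9-2. Distance is completely determined by time, yet r comes out as zero.

```python
import math

def linear_correlation(xs, ys):
    """Sample r from Formula 9-1 (helper assumed for this sketch)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2))

# Hypothetical data: height in feet at t = 0, 1, ..., 8 seconds.
times = list(range(9))
heights = [128 * t - 16 * t ** 2 for t in times]

r = linear_correlation(times, heights)
print(heights)  # [0, 112, 192, 240, 256, 240, 192, 112, 0]
print(r)        # 0.0: a perfect quadratic relationship, no linear correlation
```

The rise and the fall cancel exactly in the numerator of Formula 9-1, which is why a scatterplot should be inspected before r is interpreted.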
Formal Hypothesis Test
- To determine whether there is a significant linear correlation between two variables
- Two methods
- Both methods let
  H0: ρ = 0 (no significant linear correlation)
  H1: ρ ≠ 0 (significant linear correlation)
Method 1: Test Statistic is t
(follows format of earlier chapters)
Test statistic:
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
Critical values: use Table A-3 with degrees of freedom = n - 2
Figure 9-4
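A minimal Python sketch of the Method 1 test statistic follows. The function name `t_statistic` is an assumption for this example, and the inputs r = 0.842 and n = 8 are taken from the Garbage Project example in this section. Table A-3 is not reproduced in these slides; the critical value 2.447 used in the comment is the standard two-tailed t value for α = 0.05 with 6 degrees of freedom.

```python
import math

def t_statistic(r, n):
    """Method 1 test statistic: t = r / sqrt((1 - r^2) / (n - 2))."""
    if n <= 2:
        raise ValueError("n must exceed 2 so that n - 2 degrees of freedom exist")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r / math.sqrt((1 - r * r) / (n - 2))

t = t_statistic(0.842, 8)
print(round(t, 3))       # about 3.823
print(abs(t) > 2.447)    # compare with the two-tailed critical t for 6 df
```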
Method 2: Test Statistic is r
(uses fewer calculations)
Test statistic: r
Critical values: Refer to Table A-6 (no degrees of freedom)
Figure 9-5: Reject ρ = 0 for r < -0.811 or r > 0.811; fail to reject ρ = 0 for -0.811 ≤ r ≤ 0.811. Sample data: r = 0.828.
FIGURE 9-3 Testing for a Linear Correlation
1. Start. Let H0: ρ = 0 and H1: ρ ≠ 0.
2. Select a significance level α.
3. Calculate r using Formula 9-1.
4. Choose a method.
   METHOD 1: The test statistic is
   $$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
   Critical values of t are from Table A-3 with n - 2 degrees of freedom.
   METHOD 2: The test statistic is r. Critical values of r are from Table A-6.
5. If the absolute value of the test statistic exceeds the critical values, reject H0: ρ = 0. Otherwise, fail to reject H0.
6. If H0 is rejected, conclude that there is a significant linear correlation. If you fail to reject H0, then there is not sufficient evidence to conclude that there is a linear correlation.
Is there a significant linear correlation?
Data from the Garbage Project

x Plastic (lb)     0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y Household size   2     3     3     6     4     2     1     5

n = 8
α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = -0.707 and r = 0.707
(Table A-6 with n = 8 and α = 0.05)
Is there a significant linear correlation?
0.842 > 0.707; that is, the test statistic does fall within the critical region.
Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
[Figure: Reject ρ = 0 for r < -0.707 or r > 0.707; fail to reject ρ = 0 for -0.707 ≤ r ≤ 0.707. Sample data: r = 0.842.]
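The whole Method 2 test for the Garbage Project data can be reproduced with a short script. This is a sketch rather than a definitive implementation: the variable names are assumptions, the data are the eight pairs listed in this section, and the critical value 0.707 is the Table A-6 entry for n = 8 and α = 0.05.

```python
import math

plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]   # x
household_size = [2, 3, 3, 6, 4, 2, 1, 5]                       # y

n = len(plastic_lb)
sum_x = sum(plastic_lb)
sum_y = sum(household_size)
sum_xy = sum(x * y for x, y in zip(plastic_lb, household_size))
sum_x2 = sum(x * x for x in plastic_lb)
sum_y2 = sum(y * y for y in household_size)

# Formula 9-1
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2))

critical_r = 0.707                 # Table A-6, n = 8, alpha = 0.05
reject_h0 = abs(r) > critical_r    # Method 2 decision rule

print(f"r = {r:.3f}, reject H0: {reject_h0}")  # r = 0.842, reject H0: True
```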
Justification for r Formula
Formula 9-1 is developed from
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$
where (x̄, ȳ) is the centroid of the sample points.

FIGURE 9-6 Scatterplot divided into Quadrants 1 to 4 by the lines x = x̄ = 3 and y = ȳ = 11, which cross at the centroid (x̄, ȳ). For the point (7, 23): x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12.
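As a rough numerical check that the deviation form agrees with Formula 9-1, the sketch below evaluates it for the Garbage Project data from this section. The function name `r_from_deviations` is an assumption; `statistics.stdev` from the Python standard library returns the sample standard deviation (divisor n - 1), which is what s_x and s_y denote here.

```python
import statistics

def r_from_deviations(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    x_bar = statistics.mean(xs)      # centroid, x coordinate
    y_bar = statistics.mean(ys)      # centroid, y coordinate
    s_x = statistics.stdev(xs)       # sample standard deviation of x
    s_y = statistics.stdev(ys)       # sample standard deviation of y
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household_size = [2, 3, 3, 6, 4, 2, 1, 5]

r = r_from_deviations(plastic_lb, household_size)
print(round(r, 3))  # 0.842, matching the value obtained with Formula 9-1
```

Each product (x - x̄)(y - ȳ) is positive for points in Quadrants 1 and 3 and negative for points in Quadrants 2 and 4, so the sign of the sum reflects the direction of the linear pattern.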