Uploaded by Julia Ibrahim

Chapter 13 Simple Regression

Chapter Thirteen
Linear Regression and Correlation
GOALS
When you have completed this chapter, you will be able to:
1. Draw a scatter diagram.
2. Understand and interpret the terms dependent variable and independent variable.
3. Calculate the least squares regression line and interpret the slope and intercept values.
4. Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.
5. Conduct a test of hypothesis to determine if the population coefficient of correlation is different from zero.
Relationship between two variables



- Family income and expenditure
- Advertising spending and total sales volume
- Car speed and mileage
Regression Analysis
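Regression analysis is the method of building a mathematical model that describes the relationship between a dependent (response) variable and an independent (predictor) variable. The independent variable x is used to predict the dependent variable y.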
Example:
A corporation owns several companies. The strategic planner for the corporation believes
dollars spent on advertising can to some extent be a predictor of total sales dollars. As an
aid in long-term planning, she gathers the following sales and advertising information
from several of the companies for 2015 (in $ millions).
ADVERTISING   SALES
12.5          148
3.7           55
21.6          338
60.0          994
37.6          541
6.1           89
16.8          126
41.2          379
Develop the equation of the simple regression line to predict sales from advertising
expenditures using these data.
Scatter Diagram
Regression Equation
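$$\hat{y} = a + bx$$
where ŷ is the predicted value of y, b is the slope, a is the y-intercept, and
$$b = \frac{\sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}, \qquad a = \bar{y} - b\bar{x}$$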
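The computational formulas for b and a translate almost line for line into code. The sketch below is a minimal Python version that works on plain lists of paired observations. The function name `least_squares` is hypothetical. As a usage example, it is applied to the advertising data from the earlier example.

```python
def least_squares(x, y):
    """Return (a, b) for the line y-hat = a + b*x via the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: [sum(xy) - sum(x)*sum(y)/n] / [sum(x^2) - sum(x)^2/n]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: a = y-bar - b * x-bar
    a = sum_y / n - b * sum_x / n
    return a, b


# Advertising (x) and sales (y), both in $ millions, from the example
advertising = [12.5, 3.7, 21.6, 60.0, 37.6, 6.1, 16.8, 41.2]
sales = [148, 55, 338, 994, 541, 89, 126, 379]

a, b = least_squares(advertising, sales)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = -46.29 + 15.24x
```

Within the range of these data, each additional $1 million of advertising is associated with roughly $15.24 million more in sales.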
Example:
An economist is interested in the relationship between the
disposable income of a family and the amount of money spent
annually on food. For a preliminary study, the economist takes
a random sample of eight middle-income families of the same
size (father, mother, two children). The results are as follows,
where x denotes disposable income, in thousands of dollars,
and y denotes food expenditure, in hundreds of dollars.
x     y
30    55
36    60
27    42
20    40
16    37
24    26
19    39
25    43
a) Identify the predictor and response variables.
b) Graph the regression equation and the data points.
c) Determine the regression equation for the data.
Describe the apparent relationship between disposable
income and annual food expenditure.
d) What does the slope of the regression line represent in
terms of disposable income and annual food
expenditure?
e) Use the regression equation to predict the annual food expenditure of a family with a disposable income of $33,000.
x     y     xy       x²
30    55    1,650    900
36    60    2,160    1,296
27    42    1,134    729
20    40    800      400
16    37    592      256
24    26    624      576
19    39    741      361
25    43    1,075    625
Σx = 197    Σy = 342    Σxy = 8,776    Σx² = 5,143
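b = 1.2137 and a = 12.8625, so ŷ = 12.8625 + 1.2137x.
Slope: each additional $1,000 of disposable income is associated with about 1.2137 hundred dollars (about $121) more in annual food expenditure.
Prediction for a disposable income of $33,000 (x = 33): ŷ = 12.8625 + 1.2137(33) ≈ 52.9146 hundred dollars, or about $5,291.46.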
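The sums, coefficients, and prediction above can be checked with a short Python sketch that uses plain lists and no libraries. The sketch makes the unit conversion explicit: x = 33 stands for $33,000, and ŷ is in hundreds of dollars.

```python
x = [30, 36, 27, 20, 16, 24, 19, 25]  # disposable income, $ thousands
y = [55, 60, 42, 40, 37, 26, 39, 43]  # food expenditure, $ hundreds
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Least squares slope and intercept, left unrounded
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

y_hat_33 = a + b * 33               # income of $33,000 -> x = 33
predicted_dollars = y_hat_33 * 100  # convert hundreds of dollars to dollars

print(sum_x, sum_y, sum_xy, sum_x2)   # 197 342 8776 5143
print(round(b, 4), round(a, 4))       # 1.2137 12.8625
print(round(predicted_dollars, 2))    # 5291.48
```

The unrounded coefficients give $5,291.48. The $5,291.46 figure comes from plugging in a and b already rounded to four decimals.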
Standard Error of Estimate
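It measures how well the regression line fits the data: the smaller Se is, the closer the data points lie to the line.
$$S_e = \sqrt{\frac{\sum \left(y - \hat{y}\right)^2}{n - 2}}$$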
Computing Standard Error: Method 1
x     y     ŷ          y − ŷ       (y − ŷ)²
30    55    49.2735     5.7265      32.7928
36    60    56.5557     3.4443      11.8632
27    42    45.6324    −3.6324      13.1943
20    40    37.1365     2.8635       8.1996
16    37    32.2817     4.7183      22.2623
24    26    41.9913   −15.9913     255.7217
19    39    35.9228     3.0772       9.4691
25    43    43.2050    −0.2050       0.0420
Totals                  0.0011     353.5450
(The residuals sum to zero apart from rounding.)
Se = √(353.5450 / (8 − 2)) = 7.6762
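Method 1 is a direct loop over the residuals. The following minimal Python sketch reproduces the table with unrounded coefficients. It assumes the same eight income and expenditure pairs.

```python
import math

x = [30, 36, 27, 20, 16, 24, 19, 25]  # disposable income, $ thousands
y = [55, 60, 42, 40, 37, 26, 39, 43]  # food expenditure, $ hundreds
n = len(x)

# Unrounded least squares coefficients
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
b = s_xy / s_xx
a = sum(y) / n - b * sum(x) / n

y_hat = [a + b * xi for xi in x]                   # fitted values
residuals = [yi - fi for yi, fi in zip(y, y_hat)]  # y - y-hat
sse = sum(e ** 2 for e in residuals)               # sum of squared residuals
se = math.sqrt(sse / (n - 2))                      # standard error of estimate

print(round(sse, 4), round(se, 4))  # 353.5452 7.6762
```

With unrounded a and b, the residuals sum to zero up to floating-point error. This is why the table's 0.0011 total is attributed to rounding.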
How is the standard error used?
If a residual (the error term y − ŷ) falls outside ±2Se (or, under a stricter rule, ±3Se), the corresponding data point can be regarded as a possible outlier.
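The ±2Se / ±3Se rule reduces to a simple filter over the residuals. The sketch below applies it to the income data. The helper `flag_outliers` and its parameter `k` (the multiple of Se) are hypothetical names chosen for illustration.

```python
import math

x = [30, 36, 27, 20, 16, 24, 19, 25]  # disposable income, $ thousands
y = [55, 60, 42, 40, 37, 26, 39, 43]  # food expenditure, $ hundreds
n = len(x)

s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
b = s_xy / s_xx
a = sum(y) / n - b * sum(x) / n
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
se = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))


def flag_outliers(residuals, se, k):
    """Return the indices of residuals lying outside the band +/- k*se."""
    return [i for i, e in enumerate(residuals) if abs(e) > k * se]


print(flag_outliers(residuals, se, 2))  # [5]  (x = 24, y = 26; residual about -15.99)
print(flag_outliers(residuals, se, 3))  # []
```

Under the ±2Se rule, the family with x = 24 and y = 26 is flagged, since |−15.99| > 2(7.6762) ≈ 15.35. Under the stricter ±3Se rule, no point is flagged.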
A simpler formula to calculate Se
$$S_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
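Computing Standard Error: Method 2
x     y     xy       y²
30    55    1,650    3,025
36    60    2,160    3,600
27    42    1,134    1,764
20    40    800      1,600
16    37    592      1,369
24    26    624      676
19    39    741      1,521
25    43    1,075    1,849
Σx = 197    Σy = 342    Σxy = 8,776    Σy² = 15,404
Se = √((15,404 − 12.8625 × 342 − 1.2137 × 8,776) / (8 − 2)) = √(353.5938 / 6) ≈ 7.6767
(The small difference from the Method 1 value, 7.6762, comes from rounding a and b to four decimals.)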
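This sketch shows how much the shortcut formula depends on how a and b are rounded. It evaluates the formula once with unrounded coefficients and once with four-decimal coefficients. The helper name `se_shortcut` is hypothetical. The numerator subtracts two large, nearly equal quantities (15,404 versus roughly 15,050), so small rounding errors in a and b are magnified.

```python
import math

# Column totals from the Method 2 table (n = 8 families)
n = 8
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 197, 342, 8776, 5143, 15404


def se_shortcut(a, b):
    """Standard error of estimate: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))."""
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


# Unrounded coefficients from the computational formulas
b_exact = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a_exact = sum_y / n - b_exact * sum_x / n

print(round(se_shortcut(a_exact, b_exact), 4))  # 7.6762
print(round(se_shortcut(12.8625, 1.2137), 4))   # 7.6767 (a, b rounded to 4 decimals)
```

With unrounded coefficients, the shortcut agrees with Method 1. In hand calculations, keep extra decimals in a and b when using this formula.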
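Hypothesis test for the slope of the regression line
$$t = \frac{b}{S_e / \sqrt{S_{xx}}}, \qquad \text{where } S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$
The test statistic has n − 2 degrees of freedom.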
Example (income and expenditure): At the 5% significance level, do the data provide sufficient evidence to conclude that income is useful as a predictor of food expenditure?
Hypothesis Test for b
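H0: β = 0, Ha: β ≠ 0
Critical values: ±t_{0.025,6} = ±2.447
$$t = \frac{1.2137}{7.6767 / \sqrt{5143 - \frac{197 \times 197}{8}}} = \frac{1.2137}{7.6767 / \sqrt{5143 - 4851.125}} = \frac{1.2137}{7.6767 / 17.0843} = \frac{1.2137}{0.4493} \approx 2.7013$$
Since 2.7013 > 2.447, reject H0. At the 5% level, the data provide sufficient evidence that income is useful as a predictor of food expenditure.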
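The slope test can be scripted end to end, as in the minimal sketch below. The critical value 2.447 is copied from the slide's t table rather than computed, so that no statistics library is needed. The sketch uses the unrounded Se.

```python
import math

x = [30, 36, 27, 20, 16, 24, 19, 25]  # disposable income, $ thousands
y = [55, 60, 42, 40, 37, 26, 39, 43]  # food expenditure, $ hundreds
n = len(x)

s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                  # 291.875
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 354.25
b = s_xy / s_xx
a = sum(y) / n - b * sum(x) / n
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))

t = b / (se / math.sqrt(s_xx))  # test statistic for H0: beta = 0
t_crit = 2.447                  # t_{0.025,6}, taken from the slide's t table
print(round(t, 3), abs(t) > t_crit)  # 2.701 True
```

Using Se = 7.6762 instead of 7.6767 changes t only in the fourth decimal, and the decision to reject H0 is the same.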
Coefficient of Determination
This measures the proportion (often expressed as a percentage) of the variation in the observed values of the dependent variable that is explained by the regression.
Coefficient of Determination
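$$r^2 = 1 - \frac{SSE}{SST}$$
where
$$SST = \sum \left(y - \bar{y}\right)^2 = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$$
$$SSE = \sum \left(y - \hat{y}\right)^2 = SST - \frac{\left[\sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}\right]^2}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}$$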
Coefficient of determination
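$$SST = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 15404 - \frac{342^2}{8} = 15404 - 14620.5 = 783.5$$
$$SSE = 783.5 - \frac{\left(8776 - \frac{197 \times 342}{8}\right)^2}{5143 - \frac{197 \times 197}{8}} = 783.5 - \frac{\left(8776 - 8421.75\right)^2}{5143 - 4851.125} = 783.5 - \frac{125493.06}{291.875} = 783.5 - 429.9548 = 353.5452$$
$$r^2 = 1 - \frac{353.5452}{783.5} = 1 - 0.4512 = 0.5488$$
About 54.9% of the variation in annual food expenditure is explained by the regression on disposable income.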
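This minimal Python sketch follows the same route as the calculation above, working from the column totals alone: it computes SST and SSE, then r². The totals are the ones from the tables for the income example.

```python
n = 8
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 197, 342, 8776, 5143, 15404

sst = sum_y2 - sum_y ** 2 / n      # total sum of squares
s_xy = sum_xy - sum_x * sum_y / n  # 354.25
s_xx = sum_x2 - sum_x ** 2 / n     # 291.875
sse = sst - s_xy ** 2 / s_xx       # error sum of squares
r2 = 1 - sse / sst                 # coefficient of determination

print(sst, round(sse, 4), round(r2, 4))  # 783.5 353.5452 0.5488
```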
Correlation Coefficient
[Figure: number line for r marked at −1.0, −0.50, 0, 0.50, and 1.0. Perfect negative correlation at −1.0, no correlation at 0, and perfect positive correlation at 1.0. The negative side is divided into strong, moderate, and weak negative correlation; the positive side into weak, moderate, and strong positive correlation.]
Perfect Positive Correlation
[Figure: scatter plot of y against x, both axes from 0 to 10, with all points on an upward-sloping straight line; r = +1]
Perfect Negative Correlation
[Figure: scatter plot of y against x, both axes from 0 to 10, with all points on a downward-sloping straight line; r = −1]
Strong Positive Correlation
[Figure: scatter plot of y against x, both axes from 0 to 10; 0.5 < r < 1.0]
Zero Correlation
[Figure: scatter plot of y against x, both axes from 0 to 10; r = 0]
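Formula for the correlation coefficient
$$r = \frac{\sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}}{\sqrt{\left[\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right]\left[\sum y^2 - \frac{\left(\sum y\right)^2}{n}\right]}}$$
For the income data:
$$r = \frac{8776 - 8421.75}{\sqrt{\left(5143 - 4851.125\right)\left(15404 - 14620.5\right)}} = \frac{354.25}{\sqrt{291.875 \times 783.5}} = \frac{354.25}{478.21} \approx 0.7408$$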
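The same three corrected sums (S_xy, S_xx, S_yy) give r directly. This is a minimal sketch that starts from the column totals of the income example. It also checks that r² matches the coefficient of determination found from SSE and SST.

```python
import math

n = 8
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 197, 342, 8776, 5143, 15404

s_xy = sum_xy - sum_x * sum_y / n  # 354.25
s_xx = sum_x2 - sum_x ** 2 / n     # 291.875
s_yy = sum_y2 - sum_y ** 2 / n     # 783.5
r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient

print(round(r, 4), round(r ** 2, 4))  # 0.7408 0.5488
```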

Another formula for the coefficient of determination:
Coefficient of determination = (correlation coefficient)² = r²
For the income data, r² = 0.7408² ≈ 0.5488, the same value obtained from SSE and SST.
Correlation Coefficient Test
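Step 1: State the null hypothesis, H0: ρ = 0.
State the alternative hypothesis as Ha: ρ ≠ 0, or Ha: ρ < 0, or Ha: ρ > 0.
Step 2: Decide on the significance level, α.
Step 3: Find the critical value with n − 2 degrees of freedom: ±t_{α/2} (two-tailed test), −t_α (left-tailed test), or +t_α (right-tailed test).
Step 4: Compute the value of the test statistic
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
Step 5: Reject H0 if the test statistic falls beyond the critical value, in the rejection region.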
Correlation Coefficient test
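H0: ρ = 0
Ha: ρ ≠ 0
Critical values: ±t_{0.05,6} = ±1.943 (df = n − 2 = 6)
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.7408}{\sqrt{\dfrac{1 - 0.5488}{6}}} = \frac{0.7408}{0.2742} \approx 2.70$$
Since 2.70 > 1.943, reject H0. The data indicate a significant linear correlation between disposable income and food expenditure.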
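The minimal Python sketch below carries out the correlation test. It assumes the income data's column totals. The critical value 1.943 (t_{0.05,6}) is hard-coded from a t table instead of being computed, so the block has no dependencies.

```python
import math

n = 8
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 197, 342, 8776, 5143, 15404

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
r = s_xy / math.sqrt(s_xx * s_yy)

t = r / math.sqrt((1 - r ** 2) / (n - 2))  # test statistic for H0: rho = 0
t_crit = 1.943                             # t_{0.05,6}, taken from a t table
print(round(t, 3), abs(t) > t_crit)        # 2.701 True
```

The statistic matches the t from the slope test (about 2.701). This is expected: in simple linear regression, testing ρ = 0 and testing β = 0 are equivalent.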
Self Test
Many studies have been done that indicate the maximum heart rate
an individual can reach during intensive exercise decreases with age.
A physician decided to do his own study and recorded the ages and
peak heart rates of 10 randomly selected people. The results are
shown in the following table, where x denotes age, in years, and y
denotes peak heart rate.
x     y
30    186
38    183
41    171
38    177
29    191
39    177
46    175
41    176
42    171
24    196
- Graph the regression equation and the data points.
- Determine the regression equation for the data.
- Describe the apparent relationship between age and peak heart rate.
- What does the slope of the regression line represent in terms of age and peak heart rate?
- Use the regression equation to predict the peak heart rate of a 28-year-old person.
- Identify the predictor and response variables.