
12/14/98 252z9871

2. Data from 1) is repeated below, with $y$ = salary, $x_1$ = months of service and $x_2$ = gender ($x_2 = 0$ for men, $x_2 = 1$ for women). The sums are

$\sum y = 82.6$, $\sum y^2 = 786.64$, $\sum x_1 y = 1306.5$, $\sum x_2 y = 34.1$, $\sum x_1 = 130.0$, $\sum x_1^2 = 2364.0$, $\sum x_1 x_2 = 54.0$, $\sum x_2 = 4.0$, $\sum x_2^2 = 4.0$ and $n = 9$.

a. Do a multiple regression of salary against months and gender. (12)

b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom. Compare both with the values for the previous problem. (5)

c. Do an F test to see if gender helps explain salary. (6)

d. Use your equation to predict salary for a female employee with 30 months of service. (2)

e. Using the method suggested in the text, make your answer to d into prediction and confidence intervals. (4)

f. Using the results from this problem and the previous one, draw a graph with 3 regression lines showing salary against months for (i) employees in general, (ii) male employees and (iii) female employees. (4)

Solution: a) First, we compute the means: $\bar X_1 = \frac{\sum X_1}{n} = \frac{130}{9} = 14.4444$, $\bar X_2 = \frac{\sum X_2}{n} = \frac{4}{9} = 0.44444$ and $\bar Y = \frac{\sum Y}{n} = \frac{82.6}{9} = 9.17778$.

Second, we compute $\sum X_1^2 - n\bar X_1^2 = 2364 - 1877.7778 = 486.2222$, $\sum X_2^2 - n\bar X_2^2 = 4 - 1.7778 = 2.2222$ and $\sum Y^2 - n\bar Y^2 = 786.64 - 758.0844 = 28.5556$.

Third, we compute $\sum X_1Y - n\bar X_1\bar Y = 1306.5 - 1193.1111 = 113.3889$, $\sum X_2Y - n\bar X_2\bar Y = 34.1 - 36.7111 = -2.6111$ and $\sum X_1X_2 - n\bar X_1\bar X_2 = 54 - 57.7778 = -3.7778$.
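The first three steps can be checked with a short Python sketch. It is an illustration only; the variable names and the helper `spare_part` are made up here. Each quantity is computed as $\sum UV - (\sum U)(\sum V)/n$, which equals $\sum UV - n\bar U\bar V$ but keeps rounded means out of the subtraction.

```python
# Raw sums for Problem 2: Y = salary, X1 = months, X2 = gender (1 = female)
n = 9
sum_y, sum_y2 = 82.6, 786.64
sum_x1, sum_x1sq = 130.0, 2364.0
sum_x2, sum_x2sq = 4.0, 4.0
sum_x1y, sum_x2y, sum_x1x2 = 1306.5, 34.1, 54.0

# First: the means
x1_bar, x2_bar, y_bar = sum_x1 / n, sum_x2 / n, sum_y / n


def spare_part(sum_uv, sum_u, sum_v, n):
    """Hypothetical helper: sum(UV) - n*Ubar*Vbar, written as sum(UV) - sum(U)*sum(V)/n."""
    return sum_uv - sum_u * sum_v / n


# Second: corrected sums of squares
s_x1x1 = spare_part(sum_x1sq, sum_x1, sum_x1, n)  # about 486.2222
s_x2x2 = spare_part(sum_x2sq, sum_x2, sum_x2, n)  # about 2.2222
s_yy = spare_part(sum_y2, sum_y, sum_y, n)        # about 28.5556

# Third: corrected sums of cross products
s_x1y = spare_part(sum_x1y, sum_x1, sum_y, n)     # about 113.3889
s_x2y = spare_part(sum_x2y, sum_x2, sum_y, n)     # about -2.6111
s_x1x2 = spare_part(sum_x1x2, sum_x1, sum_x2, n)  # about -3.7778

print(round(s_x1x1, 4), round(s_x2x2, 4), round(s_yy, 4))
print(round(s_x1y, 4), round(s_x2y, 4), round(s_x1x2, 4))
```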

Fourth, we substitute these numbers into the Simplified Normal Equations:

$$\sum X_1Y - n\bar X_1\bar Y = b_1\left(\sum X_1^2 - n\bar X_1^2\right) + b_2\left(\sum X_1X_2 - n\bar X_1\bar X_2\right)$$

$$\sum X_2Y - n\bar X_2\bar Y = b_1\left(\sum X_1X_2 - n\bar X_1\bar X_2\right) + b_2\left(\sum X_2^2 - n\bar X_2^2\right)$$

which are

$$113.3889 = 486.2222\,b_1 - 3.7778\,b_2$$

$$-2.6111 = -3.7778\,b_1 + 2.2222\,b_2$$

and solve them as two equations in two unknowns for $b_1$ and $b_2$. We do this by multiplying the second equation by 1.70, which is 3.7778 divided by 2.2222, so that the two equations become

$$113.3889 = 486.2222\,b_1 - 3.7778\,b_2$$

$$-4.4389 = -6.4222\,b_1 + 3.7778\,b_2$$

We then add these two together to get $108.95 = 479.80\,b_1$; solving, $b_1 = \frac{108.95}{479.80} = 0.22707$. The first of the two normal equations can now be rearranged to $b_2 = \frac{486.2222\,b_1 - 113.3889}{3.7778}$, which gives us $b_2 = -0.78897$. Finally, we get $b_0$ by $b_0 = \bar Y - b_1\bar X_1 - b_2\bar X_2 = 9.17778 - 0.22707(14.4444) - (-0.78897)(0.44444) = 6.24853$.
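The elimination just described can be checked with the sketch below. The helper name is hypothetical; it mirrors the hand method (multiply the second equation by $3.7778/2.2222$, add it to the first to cancel $b_2$, then rearrange the first equation for $b_2$) and assumes $\sum X_1X_2 - n\bar X_1\bar X_2 \neq 0$, which holds here. Differences in the last digit come from the rounded inputs.

```python
# Corrected sums from steps 1-3, as printed in the solution
s11, s12, s22 = 486.2222, -3.7778, 2.2222   # X1X1, X1X2, X2X2
s1y, s2y = 113.3889, -2.6111                # X1Y, X2Y
x1_bar, x2_bar, y_bar = 14.4444, 0.44444, 9.17778


def solve_simplified_normal_equations(s11, s12, s22, s1y, s2y):
    """Solve s1y = b1*s11 + b2*s12 and s2y = b1*s12 + b2*s22 by elimination."""
    factor = -s12 / s22  # 3.7778 / 2.2222, about 1.70
    # Adding factor * (second equation) to the first cancels b2
    b1 = (s1y + factor * s2y) / (s11 + factor * s12)
    # Rearrange the first equation for b2 (requires s12 != 0)
    b2 = (s1y - b1 * s11) / s12
    return b1, b2


b1, b2 = solve_simplified_normal_equations(s11, s12, s22, s1y, s2y)
b0 = y_bar - b1 * x1_bar - b2 * x2_bar
print(f"Yhat = {b0:.5f} + {b1:.5f} X1 + ({b2:.5f}) X2")
```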

Thus our equation is $\hat Y = 6.24853 + 0.22707X_1 - 0.78897X_2$.

b) The coefficient of determination is

$$R^2 = \frac{b_1\left(\sum X_1Y - n\bar X_1\bar Y\right) + b_2\left(\sum X_2Y - n\bar X_2\bar Y\right)}{\sum Y^2 - n\bar Y^2} = \frac{0.22707(113.3889) - 0.78897(-2.6111)}{28.5556} = .97379$$

(The standard error is given by $s_e^2 = \frac{\left(\sum Y^2 - n\bar Y^2\right)\left(1 - R^2\right)}{n - 3}$, but we don't need it yet.) Our results can be summarized below as:

| Regression | $R^2$ | $n$ | $k$ | $\bar R^2$ |
|---|---|---|---|---|
| Previous problem ($X_1$ only) | .92601 | 9 | 1 | .9154 |
| This problem ($X_1$ and $X_2$) | .97379 | 9 | 2 | .9651 |
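As a check on part b), this sketch recomputes $R^2$, $s_e^2$ and both values of $\bar R^2$ using $\bar R^2 = \frac{(n-1)R^2 - k}{n-k-1}$. The inputs are the rounded values from part a), and the helper name is illustrative.

```python
n = 9
s_yy = 28.5556                 # sum(Y^2) - n*Ybar^2
b1, b2 = 0.22707, -0.78897     # slopes from part a)
s1y, s2y = 113.3889, -2.6111   # corrected cross products with Y


def adjusted_r_squared(r2, n, k):
    """R-bar squared = ((n - 1) R^2 - k) / (n - k - 1); k = number of independent variables."""
    return ((n - 1) * r2 - k) / (n - k - 1)


# Coefficient of determination for the two-variable regression
r2_two = (b1 * s1y + b2 * s2y) / s_yy
# s_e^2 = (sum Y^2 - n Ybar^2)(1 - R^2) / (n - 3), i.e. n - k - 1 with k = 2
se2 = s_yy * (1 - r2_two) / (n - 3)

r2_one = 0.92601  # months-only regression from the previous problem
rbar2_one = adjusted_r_squared(r2_one, n, k=1)
rbar2_two = adjusted_r_squared(r2_two, n, k=2)
print(round(r2_two, 5), round(rbar2_one, 4), round(rbar2_two, 4), round(se2, 4))
```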


$\bar R^2$, which is $R^2$ adjusted for degrees of freedom, has the formula $\bar R^2 = \frac{(n-1)R^2 - k}{n-k-1}$, where $k$ is the number of independent variables. $\bar R^2$ seems to show that our second regression is better.

c) The easiest way to do the F test and have it look right is to note that $\sum Y^2 - n\bar Y^2 = 28.55556$. For the regression with one independent variable, the regression sum of squares is $R^2\left(\sum Y^2 - n\bar Y^2\right) = .92601(28.55556) = 26.443$. For the regression with two independent variables, the regression sum of squares is $R^2\left(\sum Y^2 - n\bar Y^2\right) = .97379(28.55556) = 27.807$. The difference between these, 1.364, is the additional variation explained by $X_2$. The remaining unexplained variation is $28.556 - 27.807 = 0.749$. The ANOVA table is

| Source | SS | DF | MS | F | $F_{.05}$ |
|---|---|---|---|---|---|
| $X_1$ | 26.443 | 1 | 26.443 | | |
| $X_2$ | 1.364 | 1 | 1.364 | 10.93 | $F^{(1,6)}_{.05} = 5.99$ |
| Error | 0.749 | 6 | 0.1248 | | |
| Total | 28.556 | 8 | | | |

Since our computed F is larger than the table F, we reject our null hypothesis that $X_2$ has no effect.
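The F test in part c) can be reproduced from the two $R^2$ values alone. This is a sketch under the solution's own inputs; the critical value is the table value $F_{.05}$ with 1 and 6 degrees of freedom, 5.99, and the variable names are illustrative.

```python
sst = 28.556          # total sum of squares, sum(Y^2) - n*Ybar^2
r2_months = 0.92601   # R^2 with months only (previous problem)
r2_both = 0.97379     # R^2 with months and gender (part b)
n, k_both = 9, 2
f_crit = 5.99         # F.05 with (1, 6) degrees of freedom, from the table

ssr_months = r2_months * sst        # variation explained by X1 alone
ssr_both = r2_both * sst            # variation explained by X1 and X2
ss_gender = ssr_both - ssr_months   # extra variation explained by X2 (1 df)
sse = sst - ssr_both                # unexplained variation
df_error = n - k_both - 1           # 6 degrees of freedom
f_stat = (ss_gender / 1) / (sse / df_error)

reject_h0 = f_stat > f_crit
print(f"SS(X2) = {ss_gender:.3f}, SSE = {sse:.3f}, F = {f_stat:.2f}, reject H0: {reject_h0}")
```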

d) $\hat Y = b_0 + b_1X_1 + b_2X_2 = 6.24853 + 0.22707X_1 - 0.78897X_2 = 6.24853 + 0.22707(30) - 0.78897(1) = 12.2717$.

e) According to the text, we can use the following:

Confidence interval: $\hat Y \pm t^{(n-k-1)}_{\alpha/2}\frac{s_e}{\sqrt n}$

Prediction interval: $\hat Y \pm t^{(n-k-1)}_{\alpha/2}s_e$

f) We get our general equation from the last problem, and then take the equation from this problem with $X_2 = 0$ for men and $X_2 = 1$ for women.

| | Equation | If $X_1 = 0$ | If $X_1 = 20$ |
|---|---|---|---|
| General | $\hat Y = 5.81 + 0.233X_1$ | $\hat Y = 5.81$ | $\hat Y = 10.47$ |
| Men | $\hat Y = 6.25 + 0.227X_1$ | $\hat Y = 6.25$ | $\hat Y = 10.79$ |
| Women | $\hat Y = 5.46 + 0.227X_1$ | $\hat Y = 5.46$ | $\hat Y = 10.00$ |

These points enable us to graph the lines.
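Parts d), e) and f) all come down to evaluating fitted equations, as in the sketch below. `predict` is a hypothetical helper, and the interval half-widths follow the text's approximate formulas. Two values are assumptions not stated in the solution: a 95% level, so $t^{(6)}_{.025} = 2.447$, and $s_e^2 = (1 - .97379)(28.556)/(n-3)$ built from the part b) quantities.

```python
import math

# Fitted equation from part a): Yhat = b0 + b1*X1 + b2*X2 (X2 = 1 for women)
b0, b1, b2 = 6.24853, 0.22707, -0.78897
n = 9

# ASSUMPTIONS for part e): 95% level (t.025 with 6 df = 2.447) and
# s_e^2 = (1 - R^2)(sum Y^2 - n Ybar^2) / (n - 3) with the part b) values
t_crit = 2.447
se = math.sqrt((1 - 0.97379) * 28.556 / (n - 3))


def predict(x1, x2):
    """Hypothetical helper: fitted salary for x1 months of service and gender x2."""
    return b0 + b1 * x1 + b2 * x2


# d) a female employee with 30 months of service
y_hat = predict(30, 1)

# e) the text's approximate intervals: Yhat +/- t*s_e/sqrt(n) and Yhat +/- t*s_e
ci_half = t_crit * se / math.sqrt(n)
pi_half = t_crit * se
conf_int = (y_hat - ci_half, y_hat + ci_half)
pred_int = (y_hat - pi_half, y_hat + pi_half)

# f) endpoints of each line at X1 = 0 and X1 = 20
general_b0, general_b1 = 5.81, 0.233  # months-only equation from the previous problem
lines = {
    "general": [general_b0 + general_b1 * x for x in (0, 20)],
    "men": [predict(x, 0) for x in (0, 20)],
    "women": [predict(x, 1) for x in (0, 20)],
}

for name, (start, end) in lines.items():
    print(f"{name}: {start:.2f} at X1 = 0, {end:.2f} at X1 = 20")
```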


3. Data from the previous problem is repeated again, but with sales replacing gender!

a) Compute the correlation between sales and salary. Is it significant? (5)

b) Compute a rank correlation between sales and salary. Is it significant? Why might we expect rank correlation to be higher than the conventional correlation? (5)

c) Compute Kendall's W for this data and test it for significance. (6)

| $y$ (salary) | $x_1$ (months) | $x_2$ (sales) | $x_2y$ | $x_2^2$ |
|---|---|---|---|---|
| 7.5 | 6 | 2.25 | **16.875** | **5.0625** |
| 8.6 | 10 | 2.58 | **22.188** | **6.6564** |
| 9.1 | 12 | 2.73 | **24.843** | **7.4529** |
| 10.3 | 18 | 3.09 | **31.827** | **9.5481** |
| 13.0 | 30 | 3.90 | **50.700** | **15.2100** |
| 6.2 | 5 | 2.86 | **17.732** | **8.1796** |
| 8.7 | 13 | 3.81 | **33.147** | **14.5161** |
| 9.4 | 15 | 3.82 | **35.908** | **14.5924** |
| 9.8 | 21 | 3.94 | **38.612** | **15.5236** |
| 82.6 | | 28.98 | **271.832** | **96.7416** |

Boldface data is additional computations.

Solution: a) $\sum Y = 82.6$, $\sum Y^2 = 786.64$, $\sum X_2 = 28.98$, $\sum X_2^2 = 96.7416$, $\sum X_2Y = 271.832$ and $n = 9$. This means that $\bar X_2 = \frac{28.98}{9} = 3.22$. Other sums come from previous problems.

$$r_{xy}^2 = \frac{\left(\sum X_2Y - n\bar X_2\bar Y\right)^2}{\left(\sum X_2^2 - n\bar X_2^2\right)\left(\sum Y^2 - n\bar Y^2\right)} = \frac{\left(271.832 - 9(3.22)(9.17778)\right)^2}{\left(96.7416 - 9(3.22)^2\right)(28.55556)} = .3510$$

so $r_{xy} = \sqrt{.3510} = .59246$.

For the significance test, $t = \frac{r}{\sqrt{\frac{1 - r^2}{n-2}}} = \frac{.5925}{\sqrt{\frac{1 - .3510}{7}}} = 1.945$. For a 1-sided test of $H_0: \rho \le 0$ against $H_1: \rho > 0$, we find $t^{(n-2)}_{.05} = t^{(7)}_{.05} = 1.895$. Since the computed t is more than the critical value, we reject $H_0$ and conclude that the correlation is significant. However, for a 2-sided test of $H_0: \rho = 0$ against $H_1: \rho \ne 0$, we find $t^{(n-2)}_{.025} = t^{(7)}_{.025} = 2.365$. Since the computed t is less than the critical value, we accept $H_0$ and conclude that the correlation is not significant.
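A short Python sketch can recompute the correlation and its t statistic straight from the data table. The lists are copied from the table; the critical values are the $t$ table values used above, and the variable names are illustrative.

```python
import math

# Problem 3 data: salary (Y) and sales (X2), n = 9
salary = [7.5, 8.6, 9.1, 10.3, 13.0, 6.2, 8.7, 9.4, 9.8]
sales = [2.25, 2.58, 2.73, 3.09, 3.90, 2.86, 3.81, 3.82, 3.94]
n = len(salary)

sum_x, sum_y = sum(sales), sum(salary)
s_xx = sum(x * x for x in sales) - sum_x ** 2 / n
s_yy = sum(y * y for y in salary) - sum_y ** 2 / n
s_xy = sum(x * y for x, y in zip(sales, salary)) - sum_x * sum_y / n

r = s_xy / math.sqrt(s_xx * s_yy)
# t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 = 7 degrees of freedom
t = r / math.sqrt((1 - r ** 2) / (n - 2))

t_one_sided = 1.895   # t.05 with 7 df
t_two_sided = 2.365   # t.025 with 7 df
significant_one_sided = t > t_one_sided
significant_two_sided = abs(t) > t_two_sided
print(f"r = {r:.5f}, t = {t:.3f}")
```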

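b) Computations for both b) and c) appear in the table below.

| $r_y$ | $r_{x_2}$ | $d$ | $d^2$ | $r_y$ | $r_{x_1}$ | $r_{x_2}$ | $SR$ | $SR^2$ |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 1 | 2 | 2 | 1 | 5 | 25 |
| 3 | 2 | 1 | 1 | 3 | 3 | 2 | 8 | 64 |
| 5 | 3 | 2 | 4 | 5 | 4 | 3 | 12 | 144 |
| 8 | 5 | 3 | 9 | 8 | 7 | 5 | 20 | 400 |
| 9 | 8 | 1 | 1 | 9 | 9 | 8 | 26 | 676 |
| 1 | 4 | -3 | 9 | 1 | 1 | 4 | 6 | 36 |
| 4 | 6 | -2 | 4 | 4 | 5 | 6 | 15 | 225 |
| 6 | 7 | -1 | 1 | 6 | 6 | 7 | 19 | 361 |
| 7 | 9 | -2 | 4 | 7 | 8 | 9 | 24 | 576 |
| | | 0 | 34 | | | | 135 | 2507 |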


The first 4 columns are the rank correlation. We rank the items in columns; $d$ is the difference between the rankings, and the $d$ values must sum to zero.

$$r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(34)}{9(81 - 1)} = 1 - .2833 = .7167$$

From the table for $n = 9$ and a 1-sided test, the 5% critical value is 0.600, so the correlation is significant.
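The rank correlation can be sketched the same way. The `ranks` helper is illustrative and assumes no ties, which holds for this data; 0.600 is the 1-sided 5% table value for $n = 9$.

```python
salary = [7.5, 8.6, 9.1, 10.3, 13.0, 6.2, 8.7, 9.4, 9.8]
sales = [2.25, 2.58, 2.73, 3.09, 3.90, 2.86, 3.81, 3.82, 3.94]


def ranks(values):
    """Hypothetical helper: rank 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


r_y, r_x = ranks(salary), ranks(sales)
d = [a - b for a, b in zip(r_y, r_x)]   # differences must sum to zero
n = len(d)
sum_d2 = sum(di * di for di in d)
r_s = 1 - 6 * sum_d2 / (n * (n * n - 1))

critical = 0.600  # 5% one-sided critical value for n = 9
significant = r_s > critical
print(f"sum d^2 = {sum_d2}, r_s = {r_s:.4f}, significant: {significant}")
```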

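c) To compute Kendall's W, we ranked the data in columns and added across columns to get row sums. We square and sum these row sums. $\overline{SR} = \frac{\sum SR}{n} = \frac{135}{9} = 15$, $S = \sum SR^2 - n\overline{SR}^2 = 2507 - 9(15)^2 = 482$, and if $k$ is the number of columns (judges),

$$W = \frac{S}{\frac{1}{12}k^2\left(n^3 - n\right)} = \frac{482}{\frac{1}{12}\left(3^2\right)(729 - 9)} = .8926$$

According to the table, the 5% critical value for $S$ is 54.0, indicating significant agreement, since our computed $S$ is higher. (We reject $H_0$, which is disagreement.)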
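Kendall's W follows the same pattern, with salary, months and sales as the $k = 3$ judges ranking $n = 9$ employees. This sketch only computes $S$ and $W$; the `ranks` helper is illustrative and again assumes no ties.

```python
salary = [7.5, 8.6, 9.1, 10.3, 13.0, 6.2, 8.7, 9.4, 9.8]
months = [6, 10, 12, 18, 30, 5, 13, 15, 21]
sales = [2.25, 2.58, 2.73, 3.09, 3.90, 2.86, 3.81, 3.82, 3.94]


def ranks(values):
    """Hypothetical helper: rank 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


columns = [ranks(salary), ranks(months), ranks(sales)]  # one ranking per judge
k, n = len(columns), len(salary)

# Add across columns to get the row sums SR
row_sums = [sum(col[i] for col in columns) for i in range(n)]
mean_sr = sum(row_sums) / n
S = sum(sr * sr for sr in row_sums) - n * mean_sr ** 2
W = S / (k * k * (n ** 3 - n) / 12)
print(f"S = {S:.0f}, W = {W:.4f}")
```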

**Solution Continues in 252zz9871**
