#  12/14/98 252z9871

2. Data from 1) is repeated below.

$\sum y = 82.6$, $\sum x_1 y = ?$, $\sum x_2 y = 34.1$, $\sum y^2 = 786.64$, $\sum x_1 = 130.0$, $\sum x_1 x_2 = 54.0$, $\sum x_1^2 = 2364.0$, $\sum x_2 = 4.0$, $\sum x_2^2 = 4.0$ and $n = 9$.

a. Do a multiple regression of salary against months and gender. (12)

b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom. Compare both with values for the previous problem. (5)

c. Do an F test to see if gender helps explain salary. (6)

d. Use your equation to predict salary for a female employee with 30 months service. (2)

e. Using the method suggested in the text, make your answer to d into prediction and confidence intervals. (4)

f. Using the results from this problem and the previous one, draw a graph with 3 regression lines showing salary against months for (i) employees in general, (ii) male employees and (iii) female employees. (4)

Solution: a) First, we compute $\bar Y = \frac{82.6}{9} = 9.17778$, $\bar X_1 = \frac{130}{9} = 14.44444$ and $\bar X_2 = \frac{4}{9} = 0.44444$. Second, we compute $\sum X_1 Y = 1306.50$, $\sum X_2 Y = 34.1$, $\sum Y^2 = 786.64$, $\sum X_1^2 = 2364$, $\sum X_1 X_2 = 54.0$ and $\sum X_2^2 = 4.0$. Third, we compute

$$\begin{aligned}
\sum X_1 Y - n\bar X_1 \bar Y &= 1306.50 - 9(14.44444)(9.17778) = 113.38889,\\
\sum X_2 Y - n\bar X_2 \bar Y &= 34.1 - 9(0.44444)(9.17778) = -2.61111,\\
\sum Y^2 - n\bar Y^2 &= 786.64 - 9(9.17778)^2 = 28.55556,\\
\sum X_1^2 - n\bar X_1^2 &= 2364 - 9(14.44444)^2 = 486.22222,\\
\sum X_2^2 - n\bar X_2^2 &= 4.0 - 9(0.44444)^2 = 2.22222,\\
\sum X_1 X_2 - n\bar X_1 \bar X_2 &= 54.0 - 9(14.44444)(0.44444) = -3.77778.
\end{aligned}$$

Fourth, we substitute these numbers into the Simplified Normal Equations:

$$\begin{aligned}
\sum X_1 Y - n\bar X_1 \bar Y &= b_1\left(\sum X_1^2 - n\bar X_1^2\right) + b_2\left(\sum X_1 X_2 - n\bar X_1 \bar X_2\right)\\
\sum X_2 Y - n\bar X_2 \bar Y &= b_1\left(\sum X_1 X_2 - n\bar X_1 \bar X_2\right) + b_2\left(\sum X_2^2 - n\bar X_2^2\right)
\end{aligned}$$

which are

$$\begin{aligned}
113.3889 &= 486.2222\, b_1 - 3.7778\, b_2\\
-2.6111 &= -3.7778\, b_1 + 2.2222\, b_2
\end{aligned}$$

and solve them as two equations in two unknowns for $b_1$ and $b_2$. We do this by multiplying the second equation by 1.70, which is 3.7778 divided by 2.2222, so that the two equations become

$$\begin{aligned}
113.3889 &= 486.2222\, b_1 - 3.7778\, b_2\\
-4.4389 &= -6.4222\, b_1 + 3.7778\, b_2
\end{aligned}$$

We then add these two together to get $108.95 = 479.80\, b_1$, so that $b_1 = 0.22707$. The first of the two normal equations can now be rearranged to get $3.7778\, b_2 = 486.2222(0.22707) - 113.3889$, which gives us $b_2 = -0.78897$. Finally we get $b_0$ by solving

$$b_0 = \bar Y - b_1 \bar X_1 - b_2 \bar X_2 = 9.17778 - 0.22707(14.4444) - (-0.78897)(0.4444) = 6.24853$$
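As a cross-check on the hand solution, the simplified normal equations can also be solved directly. The following is a minimal Python sketch, not the method used in this solution: it applies Cramer's rule in place of the elimination steps, and the helper name `spread` and the variable names are illustrative assumptions. The sums are the ones quoted in this problem.

```python
# Sums quoted in the problem (n = 9 employees).
n = 9
sum_y = 82.6
sum_x1, sum_x2 = 130.0, 4.0
sum_x1_sq, sum_x2_sq = 2364.0, 4.0
sum_x1y, sum_x2y, sum_x1x2 = 1306.5, 34.1, 54.0


def spread(sum_ab, sum_a, sum_b, n):
    """Return sum(a*b) - n * mean(a) * mean(b) (hypothetical helper)."""
    return sum_ab - sum_a * sum_b / n


s11 = spread(sum_x1_sq, sum_x1, sum_x1, n)   # sum X1^2 - n * mean(X1)^2
s22 = spread(sum_x2_sq, sum_x2, sum_x2, n)   # sum X2^2 - n * mean(X2)^2
s12 = spread(sum_x1x2, sum_x1, sum_x2, n)    # sum X1X2 - n * mean(X1) * mean(X2)
s1y = spread(sum_x1y, sum_x1, sum_y, n)      # sum X1Y - n * mean(X1) * mean(Y)
s2y = spread(sum_x2y, sum_x2, sum_y, n)      # sum X2Y - n * mean(X2) * mean(Y)

# Solve  s1y = b1*s11 + b2*s12  and  s2y = b1*s12 + b2*s22  by Cramer's rule.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = sum_y / n - b1 * sum_x1 / n - b2 * sum_x2 / n

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

Because the sketch does not round intermediate values, its coefficients may differ from a hand solution in the last digit or two.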

Thus our equation is $\hat Y = b_0 + b_1 X_1 + b_2 X_2 = 6.24853 + 0.22707 X_1 - 0.78897 X_2$.

b) The coefficient of determination is

$$R^2 = \frac{b_1\left(\sum X_1 Y - n\bar X_1 \bar Y\right) + b_2\left(\sum X_2 Y - n\bar X_2 \bar Y\right)}{\sum Y^2 - n\bar Y^2} = \frac{0.22707(113.3889) - 0.78897(-2.6111)}{28.5556} = .97379.$$

(The standard error is $s_e^2 = \dfrac{\sum Y^2 - n\bar Y^2 - b_1\left(\sum X_1 Y - n\bar X_1 \bar Y\right) - b_2\left(\sum X_2 Y - n\bar X_2 \bar Y\right)}{n - 3}$, but we don't need it yet.)

$\bar R^2$, which is $R^2$ adjusted for degrees of freedom, has the formula $\bar R^2 = \dfrac{(n-1)R^2 - k}{n - k - 1}$, where $k$ is the number of independent variables. Our results can be summarized below as:

| Regression | $R^2$ | $n$ | $k$ | $\bar R^2$ |
|---|---|---|---|---|
| Previous problem ($X_1$ only) | .92601 | 9 | 1 | .9154 |
| This problem ($X_1$ and $X_2$) | .97379 | 9 | 2 | .9651 |

$\bar R^2$ adjusted for degrees of freedom seems to show that our second regression is better.

c) The easiest way to do the F test and have it look right is to note that $\sum Y^2 - n\bar Y^2 = 28.55556$. For the regression with one independent variable the regression sum of squares is $R^2\left(\sum Y^2 - n\bar Y^2\right) = .92601(28.55556) = 26.443$. For the regression with two independent variables the regression sum of squares is $R^2\left(\sum Y^2 - n\bar Y^2\right) = .97379(28.55556) = 27.807$. The difference between these is 1.364. The remaining unexplained variation is 28.556 – 27.807 = 0.749. The ANOVA table is

| Source | SS | DF | MS | F | $F_{.05}$ |
|---|---|---|---|---|---|
| $X_1$ | 26.443 | 1 | 26.443 | | |
| $X_2$ | 1.364 | 1 | 1.364 | 10.93 | $F^{(1,6)}_{.05} = 5.99$ |
| Error | 0.749 | 6 | 0.1248 | | |
| Total | 28.556 | 8 | | | |
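The quantities in parts b) and c) can be recomputed from the original sums as a check. This is a minimal Python sketch under stated assumptions: the function name `adjusted_r2` and the constant `F_CRIT` are illustrative, the critical value 5.99 is the table value quoted in the ANOVA table, and nothing here is taken from the text's own software. With unrounded intermediate values the error sum of squares comes out at about 0.75 and the F statistic at about 10.9.

```python
n = 9

# Corrected sums of squares and cross products, from the sums in the problem.
syy = 786.64 - 82.6 ** 2 / n
s11 = 2364.0 - 130.0 ** 2 / n
s22 = 4.0 - 4.0 ** 2 / n
s12 = 54.0 - 130.0 * 4.0 / n
s1y = 1306.5 - 130.0 * 82.6 / n
s2y = 34.1 - 4.0 * 82.6 / n

# Slopes of the two-variable regression (Cramer's rule on the normal equations).
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det


def adjusted_r2(r2, n, k):
    """R-squared adjusted for degrees of freedom, k independent variables."""
    return ((n - 1) * r2 - k) / (n - k - 1)


ssr_one = s1y ** 2 / s11            # regression SS with months alone
ssr_two = b1 * s1y + b2 * s2y       # regression SS with months and gender
sse = syy - ssr_two                 # unexplained variation
r2_one, r2_two = ssr_one / syy, ssr_two / syy

# Partial F test for adding gender: 1 and n - 3 degrees of freedom.
f_stat = (ssr_two - ssr_one) / (sse / (n - 3))
F_CRIT = 5.99                       # 5% table value for (1, 6) df

print(f"R2: {r2_one:.4f} -> {r2_two:.4f}")
print(f"adjusted R2: {adjusted_r2(r2_one, n, 1):.4f} -> {adjusted_r2(r2_two, n, 2):.4f}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
```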

Since our computed F is larger than the table F, we reject our null hypothesis that $X_2$ has no effect.

d) For a female employee with 30 months service, $X_1 = 30$ and $X_2 = 1$, so

$$\hat Y = b_0 + b_1 X_1 + b_2 X_2 = 6.24853 + 0.22707 X_1 - 0.78897 X_2 = 6.24853 + 0.22707(30) - 0.78897(1) = 12.2717.$$

e) According to the text, we can use the following:

Confidence interval: $\hat Y \pm t^{(n-k-1)}_{\alpha/2} \dfrac{s_e}{\sqrt{n}}$

Prediction interval: $\hat Y \pm t^{(n-k-1)}_{\alpha/2}\, s_e$

f) We get our general equation from the last problem, and then take the equation from this problem with $X_2 = 0$ for men and $X_2 = 1$ for women.

| | Equation | If $X_1 = 0$ | If $X_1 = 20$ |
|---|---|---|---|
| General | $\hat Y = 5.81 + 0.233 X_1$ | $\hat Y = 5.81$ | $\hat Y = 10.47$ |
| Men | $\hat Y = 6.25 + 0.227 X_1$ | $\hat Y = 6.25$ | $\hat Y = 10.79$ |
| Women | $\hat Y = 5.46 + 0.227 X_1$ | $\hat Y = 5.46$ | $\hat Y = 10.00$ |

These points enable us to graph the lines.
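Parts d) through f) can be illustrated numerically. The sketch below is a hedged example, not part of the original solution: the 95% confidence level, the table value `T_CRIT = 2.447` (two-sided, 6 degrees of freedom) and the identity $s_e^2 = (1 - R^2)(\sum Y^2 - n\bar Y^2)/(n - k - 1)$ used to obtain the standard error are assumptions introduced here, and the names `predict`, `lines` and `endpoints` are illustrative.

```python
import math

n, k = 9, 2
b0, b1, b2 = 6.24853, 0.22707, -0.78897    # fitted equation from part a)


def predict(months, female):
    """Predicted salary; female is 1 for women and 0 for men."""
    return b0 + b1 * months + b2 * female


y_hat = predict(30, 1)                      # part d)

# Part e): standard error from R^2 and the total sum of squares (assumed identity).
r2, syy = 0.97379, 28.55556
s_e = math.sqrt((1 - r2) * syy / (n - k - 1))
T_CRIT = 2.447                              # assumed 95% level, 6 degrees of freedom
ci_half = T_CRIT * s_e / math.sqrt(n)       # confidence interval half-width
pi_half = T_CRIT * s_e                      # prediction interval half-width

print(f"prediction: {y_hat:.4f}")
print(f"confidence interval: {y_hat - ci_half:.3f} to {y_hat + ci_half:.3f}")
print(f"prediction interval: {y_hat - pi_half:.3f} to {y_hat + pi_half:.3f}")

# Part f): two points (X1 = 0 and X1 = 20) are enough to draw each line.
lines = {"General": (5.81, 0.233), "Men": (6.25, 0.227), "Women": (5.46, 0.227)}
endpoints = {name: (a, a + 20 * b) for name, (a, b) in lines.items()}
for name, (at_0, at_20) in endpoints.items():
    print(f"{name}: ({0}, {at_0:.2f}) and ({20}, {at_20:.2f})")
```

The men's and women's lines share the slope 0.227 and differ only in intercept, which is why they plot as parallel lines.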


3. Data from the previous problem is repeated again but with sales replacing gender!

a) Compute the correlation between sales and salary. Is it significant? (5)

b) Compute a rank correlation between sales and salary. Is it significant? Why might we expect rank correlation to be higher than the conventional correlation? (5)

c) Compute Kendall's W for this data and test it for significance. (6)

| $y$ (Salary) | $x_1$ (months) | $x_2$ (sales) | $x_2 y$ | $x_2^2$ |
|---|---|---|---|---|
| 7.5 | 6 | 2.25 | 16.875 | 5.0625 |
| 8.6 | 10 | 2.58 | 22.188 | 6.6564 |
| 9.1 | 12 | 2.73 | 24.843 | 7.4529 |
| 10.3 | 18 | 3.09 | 31.827 | 9.5481 |
| 13.0 | 30 | 3.90 | 50.700 | 15.2100 |
| 6.2 | 5 | 2.86 | 17.732 | 8.1796 |
| 8.7 | 13 | 3.81 | 33.147 | 14.5161 |
| 9.4 | 15 | 3.82 | 35.908 | 14.5924 |
| 9.8 | 21 | 3.94 | 38.612 | 15.5236 |
| 82.6 | | 28.98 | 271.832 | 96.7416 |

The $x_2 y$ and $x_2^2$ columns and the totals in the last row are additional computations.

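Solution: a) $\sum Y = 82.6$, $\sum Y^2 = 786.64$, $\sum X_2 = 28.98$, $\sum X_2^2 = 96.7416$, $\sum X_2 Y = 271.832$ and $n = 9$.

This means that $\bar X_2 = \frac{28.98}{9} = 3.22$. Other sums come from previous problems.

$$r_{xy}^2 = \frac{\left(\sum X_2 Y - n\bar X_2 \bar Y\right)^2}{\left(\sum X_2^2 - n\bar X_2^2\right)\left(\sum Y^2 - n\bar Y^2\right)} = \frac{\left(271.832 - 9(3.22)(9.17778)\right)^2}{\left(96.7416 - 9(3.22)^2\right)(28.55556)} = .3510, \qquad r_{xy} = \sqrt{.3510} = .59246$$

For the significance test,

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.5925}{\sqrt{\dfrac{1 - .3510}{7}}} = 1.945.$$

For a 1-sided test $H_0: \rho \le 0$, $H_1: \rho > 0$ we find $t^{(n-2)}_{.05} = t^{(7)}_{.05} = 1.895$. Since the computed t is more than the critical value we reject $H_0$ and conclude that the correlation is significant. However, for a 2-sided test $H_0: \rho = 0$, $H_1: \rho \ne 0$ we find $t^{(n-2)}_{.025} = t^{(7)}_{.025} = 2.365$. Since the computed t is less than the critical value we do not reject $H_0$ and conclude that the correlation is not significant.

b) Computations for both b) and c) appear in the table below.

| $r_y$ | $r_{x_2}$ | $d$ | $d^2$ | $r_y$ | $r_{x_1}$ | $r_{x_2}$ | $SR$ | $SR^2$ |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 1 | 2 | 2 | 1 | 5 | 25 |
| 3 | 2 | 1 | 1 | 3 | 3 | 2 | 8 | 64 |
| 5 | 3 | 2 | 4 | 5 | 4 | 3 | 12 | 144 |
| 8 | 5 | 3 | 9 | 8 | 7 | 5 | 20 | 400 |
| 9 | 8 | 1 | 1 | 9 | 9 | 8 | 26 | 676 |
| 1 | 4 | -3 | 9 | 1 | 1 | 4 | 6 | 36 |
| 4 | 6 | -2 | 4 | 4 | 5 | 6 | 15 | 225 |
| 6 | 7 | -1 | 1 | 6 | 6 | 7 | 19 | 361 |
| 7 | 9 | -2 | 4 | 7 | 8 | 9 | 24 | 576 |
| | | 0 | 34 | | | | 135 | 2507 |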
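The ranks, differences and row sums tabulated above can be reproduced from the raw data. This is a small Python sketch for checking the table, assuming no tied values (true for this data set); the helper name `ranks` is illustrative.

```python
salary = [7.5, 8.6, 9.1, 10.3, 13.0, 6.2, 8.7, 9.4, 9.8]
months = [6, 10, 12, 18, 30, 5, 13, 15, 21]
sales = [2.25, 2.58, 2.73, 3.09, 3.90, 2.86, 3.81, 3.82, 3.94]


def ranks(values):
    """Rank 1 for the smallest value; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


r_y, r_x1, r_x2 = ranks(salary), ranks(months), ranks(sales)

# Columns for the rank correlation between salary and sales.
d = [a - b for a, b in zip(r_y, r_x2)]
d_sq = [v * v for v in d]

# Columns for Kendall's W: sum the three rankings across each row.
row_sums = [a + b + c for a, b, c in zip(r_y, r_x1, r_x2)]
row_sums_sq = [v * v for v in row_sums]

print("sum d =", sum(d), " sum d^2 =", sum(d_sq))
print("sum SR =", sum(row_sums), " sum SR^2 =", sum(row_sums_sq))
```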


The first 4 columns are the rank correlation. We rank the items in columns. $d$ is the difference between the rankings and must sum to zero.

$$r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(34)}{9(81 - 1)} = 1 - .2833 = .7167.$$

From the table for a 1-sided test and $n = 9$, the 5% critical value is 0.600, so the correlation is significant.

c) To compute Kendall's $W$, we ranked the data in columns and added across columns to get row sums. We square and sum these row sums. $\overline{SR} = \frac{\sum SR}{n} = \frac{135}{9} = 15$, $S = \sum SR^2 - n\,\overline{SR}^2 = 2507 - 9(15)^2 = 482$ and if $k$ is the number of columns (judges),

$$W = \frac{S}{\frac{1}{12} k^2 \left(n^3 - n\right)} = \frac{482}{\frac{1}{12}(9)(729 - 9)} = .8926.$$

According to the table, the 5% critical value for $S$ is 54.0, indicating significant agreement, since our computed $S$ is higher. (We reject $H_0$, disagreement.)
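The three measures in this problem (conventional correlation, rank correlation and Kendall's W) can be checked together from the sums and table totals quoted above. This is a minimal Python sketch with illustrative variable names; the critical values in the final lines are the table values cited in the solution, not computed here.

```python
import math

n = 9

# a) Conventional correlation between sales (X2) and salary (Y), from the sums.
sum_y, sum_y_sq = 82.6, 786.64
sum_x2, sum_x2_sq, sum_x2y = 28.98, 96.7416, 271.832
sxy = sum_x2y - sum_x2 * sum_y / n
sxx = sum_x2_sq - sum_x2 ** 2 / n
syy = sum_y_sq - sum_y ** 2 / n
r = sxy / math.sqrt(sxx * syy)
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))

# b) Rank correlation from the sum of squared rank differences.
sum_d_sq = 34
r_s = 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))

# c) Kendall's W from the row sums of the k = 3 rankings.
k = 3
sum_sr, sum_sr_sq = 135, 2507
mean_sr = sum_sr / n
s = sum_sr_sq - n * mean_sr ** 2
w = s / (k ** 2 * (n ** 3 - n) / 12)

print(f"r = {r:.4f}, t = {t_stat:.3f}")
print(f"r_s = {r_s:.4f}, S = {s:.0f}, W = {w:.4f}")

# Comparisons against the table values quoted in the solution.
print("1-sided t test significant:", t_stat > 1.895)
print("2-sided t test significant:", t_stat > 2.365)
print("rank correlation significant:", r_s > 0.600)
print("Kendall's S significant:", s > 54.0)
```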

Solution Continues in 252zz9871
