12/14/98 252z9871
2. Data from 1) is repeated below.

$\sum y = 82.6$, $\sum x_1 y = ?$, $\sum x_2 y = 34.1$, $\sum y^2 = 786.64$, $\sum x_1 = 130.0$, $\sum x_1 x_2 = 54.0$, $\sum x_1^2 = 2364.0$, $\sum x_2 = 4.0$, $\sum x_2^2 = 4.0$ and $n = 9$.

a. Do a multiple regression of salary against months and gender. (12)
b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom. Compare both with values for the previous problem. (5)
c. Do an F test to see if gender helps explain salary. (6)
d. Use your equation to predict salary for a female employee with 30 months service. (2)
e. Using the method suggested in the text, make your answer to d into prediction and confidence intervals. (4)
f. Using the results from this problem and the previous one, draw a graph with 3 regression lines showing salary against months for (i) employees in general, (ii) male employees and (iii) female employees. (4)
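Solution: a) First, we compute the means:
\[ \bar Y = \frac{\sum Y}{n} = \frac{82.6}{9} = 9.17778, \quad \bar X_1 = \frac{\sum X_1}{n} = \frac{130.0}{9} = 14.44444, \quad \bar X_2 = \frac{\sum X_2}{n} = \frac{4.0}{9} = 0.44444. \]
Second, we compute or copy $\sum X_1 Y = 1306.50$, $\sum X_2 Y = 34.1$, $\sum Y^2 = 786.64$, $\sum X_1^2 = 2364.0$, $\sum X_2^2 = 4.0$ and $\sum X_1 X_2 = 54$.
Third, we compute
\[ \sum X_1 Y - n \bar X_1 \bar Y = 1306.50 - 9(14.44444)(9.17778) = 113.38889, \]
\[ \sum X_2 Y - n \bar X_2 \bar Y = 34.1 - 9(0.44444)(9.17778) = -2.61111, \]
\[ \sum Y^2 - n \bar Y^2 = 786.64 - 9(9.17778)^2 = 28.55556, \]
\[ \sum X_1^2 - n \bar X_1^2 = 2364.0 - 9(14.44444)^2 = 486.22222, \]
\[ \sum X_2^2 - n \bar X_2^2 = 4.0 - 9(0.44444)^2 = 2.22222, \]
\[ \sum X_1 X_2 - n \bar X_1 \bar X_2 = 54 - 9(14.44444)(0.44444) = -3.77778 \]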
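The means and corrected sums in part a) can be checked from the raw data. The following is a minimal sketch rather than part of the original solution: salary and months are taken from the data table in problem 3, while the gender coding (1 for the last four employees) is an assumption inferred from $\sum x_2 = 4$, $\sum x_2 y = 34.1$ and $\sum x_1 x_2 = 54$. The helper names `mean` and `corrected_sum` are illustrative.

```python
# Salary and months are the values listed in problem 3 of this document.
# ASSUMPTION: gender is coded 1 for the last four employees; this coding is
# inferred from sum(x2) = 4, sum(x2*y) = 34.1 and sum(x1*x2) = 54.
salary = [7.5, 8.6, 9.1, 10.3, 13.0, 6.2, 8.7, 9.4, 9.8]
months = [6, 10, 12, 18, 30, 5, 13, 15, 21]
gender = [0, 0, 0, 0, 0, 1, 1, 1, 1]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def corrected_sum(u, v):
    """Return sum(u*v) - n*mean(u)*mean(v) for two equal-length lists."""
    raw = sum(a * b for a, b in zip(u, v))
    return raw - len(u) * mean(u) * mean(v)


means = {"Y": mean(salary), "X1": mean(months), "X2": mean(gender)}
sums = {
    "X1Y": corrected_sum(months, salary),
    "X2Y": corrected_sum(gender, salary),
    "YY": corrected_sum(salary, salary),
    "X1X1": corrected_sum(months, months),
    "X2X2": corrected_sum(gender, gender),
    "X1X2": corrected_sum(months, gender),
}

for name, value in {**means, **sums}.items():
    print(f"{name:5s} {value:11.5f}")
```

Under the stated gender coding, the printed values should agree with the six quantities in the third step to within rounding.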
Fourth, we substitute these numbers into the Simplified Normal Equations:
\[ \sum X_1 Y - n \bar X_1 \bar Y = b_1 \left( \sum X_1^2 - n \bar X_1^2 \right) + b_2 \left( \sum X_1 X_2 - n \bar X_1 \bar X_2 \right) \]
\[ \sum X_2 Y - n \bar X_2 \bar Y = b_1 \left( \sum X_1 X_2 - n \bar X_1 \bar X_2 \right) + b_2 \left( \sum X_2^2 - n \bar X_2^2 \right) \]
which are
\[ 113.3889 = 486.2222\, b_1 - 3.7778\, b_2 \]
\[ -2.6111 = -3.7778\, b_1 + 2.2222\, b_2 , \]
and solve them as two equations in two unknowns for $b_1$ and $b_2$. We do this by multiplying the second equation by 1.70, which is 3.7778 divided by 2.2222, so that the two equations become
\[ 113.3889 = 486.2222\, b_1 - 3.7778\, b_2 \]
\[ -4.4389 = -6.4222\, b_1 + 3.7778\, b_2 . \]
We then add these two together to get $108.95 = 479.80\, b_1$, so that
\[ b_1 = \frac{108.95}{479.80} = 0.22707 . \]
The first of the two normal equations can now be rearranged to $3.7778\, b_2 = 486.2222\, b_1 - 113.3889$, which (using the unrounded $b_1$) gives us $b_2 = -0.78897$. Finally we get $b_0$ by solving
\[ b_0 = \bar Y - b_1 \bar X_1 - b_2 \bar X_2 = 9.17778 - 0.22707(14.4444) - (-0.78897)(0.4444) = 6.24853 \]
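As a check on the elimination above, the two simplified normal equations can also be solved by Cramer's rule. This is only a sketch using a different technique from the multiply-and-add method in the text; the variable names are illustrative and the inputs are the rounded values quoted in the solution.

```python
# Corrected sums as quoted in the solution (rounded to five decimals).
s11 = 486.22222   # sum(X1^2)  - n * mean(X1)^2
s22 = 2.22222     # sum(X2^2)  - n * mean(X2)^2
s12 = -3.77778    # sum(X1*X2) - n * mean(X1) * mean(X2)
s1y = 113.38889   # sum(X1*Y)  - n * mean(X1) * mean(Y)
s2y = -2.61111    # sum(X2*Y)  - n * mean(X2) * mean(Y)
y_bar, x1_bar, x2_bar = 9.17778, 14.44444, 0.44444

# Cramer's rule for the 2x2 system
#   s1y = b1*s11 + b2*s12
#   s2y = b1*s12 + b2*s22
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# The intercept follows from the means.
b0 = y_bar - b1 * x1_bar - b2 * x2_bar

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

Because the inputs are rounded, the coefficients should match the hand computation to about four decimal places rather than exactly.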
Thus our equation is
\[ \hat Y = b_0 + b_1 X_1 + b_2 X_2 = 6.24853 + 0.22707 X_1 - 0.78897 X_2 . \]

b) The coefficient of determination is
\[ R^2 = \frac{ b_1 \left( \sum X_1 Y - n \bar X_1 \bar Y \right) + b_2 \left( \sum X_2 Y - n \bar X_2 \bar Y \right) }{ \sum Y^2 - n \bar Y^2 } = \frac{ 0.22707(113.3889) + (-0.78897)(-2.6111) }{ 28.5556 } = .97379 . \]
(The standard error is $s_e^2 = \dfrac{ \left( \sum Y^2 - n \bar Y^2 \right) \left( 1 - R^2 \right) }{ n - 3 }$, but we don’t need it yet.) Our results can be summarized below as:

$R^2$      $n$    $k$    $\bar R^2$
.92601     9      1      .9154
.97379     9      2      .9651
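The two rows of the summary can be reproduced with a few lines of arithmetic. The sketch below assumes the adjustment $\bar R^2 = \frac{(n-1)R^2 - k}{n-k-1}$, which is consistent with the tabulated values; the function names are illustrative.

```python
def r_squared(coefs, s_xy, s_yy):
    """Explained variation over total variation.

    coefs: slope coefficients b1, b2, ...
    s_xy:  matching corrected sums sum(Xi*Y) - n*mean(Xi)*mean(Y)
    s_yy:  sum(Y^2) - n*mean(Y)^2
    """
    return sum(b * s for b, s in zip(coefs, s_xy)) / s_yy


def adjusted_r_squared(r2, n, k):
    """R-squared adjusted for degrees of freedom, k = number of regressors."""
    return ((n - 1) * r2 - k) / (n - k - 1)


n = 9
# Two-variable regression, using the rounded values from the solution.
r2_two = r_squared([0.22707, -0.78897], [113.38889, -2.61111], 28.55556)
adj_two = adjusted_r_squared(r2_two, n, k=2)

# One-variable regression: R-squared is quoted from the previous problem.
r2_one = 0.92601
adj_one = adjusted_r_squared(r2_one, n, k=1)

print(f"k=1: R2 = {r2_one:.5f}, adjusted = {adj_one:.4f}")
print(f"k=2: R2 = {r2_two:.5f}, adjusted = {adj_two:.4f}")
```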
$\bar R^2$, which is $R^2$ adjusted for degrees of freedom, has the formula $\bar R^2 = \dfrac{(n-1)R^2 - k}{n-k-1}$, where $k$ is the number of independent variables. $R^2$ adjusted for degrees of freedom seems to show that our second regression is better.

c) The easiest way to do the F test and have it look right is to note that $\sum Y^2 - n \bar Y^2 = 28.55556$. For the regression with one independent variable the regression sum of squares is $R^2 \left( \sum Y^2 - n \bar Y^2 \right) = .92601(28.55556) = 26.443$. For the regression with two independent variables the regression sum of squares is $R^2 \left( \sum Y^2 - n \bar Y^2 \right) = .97379(28.55556) = 27.807$. The difference between these is 1.364. The remaining unexplained variation is $28.556 - 27.807 = 0.749$. The ANOVA table is

Source    SS        DF    MS        F        $F_{.05}$
$X_1$     26.443    1     26.443
$X_2$      1.364    1      1.364    10.93    $F^{(1,6)}_{.05} = 5.99$
Error      0.749    6      0.1248
Total     28.556    8

Since our computed F is larger than the table F, we reject our null hypothesis that $X_2$ has no effect.
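The sums of squares and the F ratio for the added variable can be recomputed from the two $R^2$ values. This is a minimal sketch: the critical value 5.99 is the table value quoted in the solution, not computed here, and the variable names are illustrative.

```python
sst = 28.55556     # total variation, sum(Y^2) - n*mean(Y)^2
r2_one = 0.92601   # R-squared with months only (previous problem)
r2_two = 0.97379   # R-squared with months and gender
n, k = 9, 2        # observations, regressors in the larger model

ssr_one = r2_one * sst           # explained by X1 alone
ssr_two = r2_two * sst           # explained by X1 and X2
ss_added = ssr_two - ssr_one     # extra variation explained by X2
sse = sst - ssr_two              # unexplained variation
df_error = n - k - 1

mse = sse / df_error
f_stat = (ss_added / 1) / mse    # one numerator degree of freedom
f_crit = 5.99                    # F(.05; 1, 6) as quoted in the text

print(f"SS(X2 | X1) = {ss_added:.3f}, SSE = {sse:.3f}, MSE = {mse:.4f}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > f_crit}")
```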
d) For a female employee with 30 months of service, $X_1 = 30$ and $X_2 = 1$, so
\[ \hat Y = b_0 + b_1 X_1 + b_2 X_2 = 6.24853 + 0.22707 X_1 - 0.78897 X_2 = 6.24853 + 0.22707(30) - 0.78897(1) = 12.2717 . \]

e) According to the text, we can use the following:
Confidence interval: $\hat Y \pm t^{(n-k-1)}_{\alpha/2} \dfrac{s_e}{\sqrt{n}}$
Prediction interval: $\hat Y \pm t^{(n-k-1)}_{\alpha/2}\, s_e$

f) We get our general equation from the last problem, and then take the equation from this problem with $X_2 = 0$ for men and $X_2 = 1$ for women.
            Equation                          If $X_1 = 0$         If $X_1 = 20$
General     $\hat Y = 5.81 + 0.233 X_1$       $\hat Y = 5.81$      $\hat Y = 10.47$
Men         $\hat Y = 6.25 + 0.227 X_1$       $\hat Y = 6.25$      $\hat Y = 10.79$
Women       $\hat Y = 5.46 + 0.227 X_1$       $\hat Y = 5.46$      $\hat Y = 10.00$

These points enable us to graph the lines.
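Parts d) through f) can be checked with a short script. This is a sketch under stated assumptions, not the original solution: a 95% level is assumed for the intervals, the value $t^{(6)}_{.025} = 2.447$ is the standard table value, $s_e$ is derived from $R^2$ and the total variation quoted earlier, and the intervals use the approximate formulas $\hat Y \pm t\, s_e/\sqrt{n}$ and $\hat Y \pm t\, s_e$. The function name `predict` is illustrative.

```python
import math

b0, b1, b2 = 6.24853, 0.22707, -0.78897   # coefficients from part a)
n, k = 9, 2


def predict(months, female):
    """Fitted salary; female is 1 for women and 0 for men."""
    return b0 + b1 * months + b2 * female


# d) female employee with 30 months of service
y_hat = predict(30, 1)

# e) approximate intervals (ASSUMPTION: 95% level, t(.025, 6 df) = 2.447)
sse = (1 - 0.97379) * 28.55556
s_e = math.sqrt(sse / (n - k - 1))
t_crit = 2.447
ci = (y_hat - t_crit * s_e / math.sqrt(n), y_hat + t_crit * s_e / math.sqrt(n))
pi = (y_hat - t_crit * s_e, y_hat + t_crit * s_e)

# f) end points of the three lines at X1 = 0 and X1 = 20
lines = {
    "General": (5.81, 0.233),    # equation from the previous problem
    "Men": (b0, b1),             # X2 = 0
    "Women": (b0 + b2, b1),      # X2 = 1
}
endpoints = {
    name: (round(a, 2), round(a + 20 * b, 2)) for name, (a, b) in lines.items()
}

print(f"prediction = {y_hat:.4f}")
print(f"confidence interval = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"prediction interval = ({pi[0]:.3f}, {pi[1]:.3f})")
for name, (at_0, at_20) in endpoints.items():
    print(f"{name:8s} X1=0: {at_0:.2f}   X1=20: {at_20:.2f}")
```

Plotting each pair of end points and joining them gives the three lines asked for in part f).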
3. Data from the previous problem is repeated again but with sales replacing gender!
a) Compute the correlation between sales and salary. Is it significant? (5)
b) Compute a rank correlation between sales and salary. Is it significant? Why might we expect rank correlation to be higher than the conventional correlation? (5)
c) Compute Kendall's W for this data and test it for significance. (6)

$y$       $x_1$     $x_2$    $x_2 y$    $x_2^2$
Salary    months    sales
 7.5       6        2.25     16.875      5.0625
 8.6      10        2.58     22.188      6.6564
 9.1      12        2.73     24.843      7.4529
10.3      18        3.09     31.827      9.5481
13.0      30        3.90     50.700     15.2100
 6.2       5        2.86     17.732      8.1796
 8.7      13        3.81     33.147     14.5161
 9.4      15        3.82     35.908     14.5924
 9.8      21        3.94     38.612     15.5236
82.6               28.98    271.832     96.7416

The $x_2 y$ and $x_2^2$ columns and the totals row are additional computations (boldface in the original).
Solution: a) $\sum Y = 82.6$, $\sum Y^2 = 786.64$, $\sum X_2 = 28.98$, $\sum X_2^2 = 96.7416$, $\sum X_2 Y = 271.832$ and $n = 9$. This means that $\bar X_2 = \dfrac{28.98}{9} = 3.22$. Other sums come from previous problems.
\[ r_{xy}^2 = \frac{ \left( \sum X_2 Y - n \bar X_2 \bar Y \right)^2 }{ \left( \sum X_2^2 - n \bar X_2^2 \right) \left( \sum Y^2 - n \bar Y^2 \right) } = \frac{ \left( 271.832 - 9(3.22)(9.17778) \right)^2 }{ \left( 96.7416 - 9(3.22)^2 \right) (28.55556) } = .3510 , \qquad r_{xy} = \sqrt{.3510} = .59246 . \]
For the significance test,
\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.5925}{\sqrt{\dfrac{1 - .3510}{7}}} = 1.945 . \]
For a 1-sided test $H_0: \rho \le 0$, $H_1: \rho > 0$ we find $t^{(n-2)}_{.05} = t^{(7)}_{.05} = 1.895$. Since the computed t is more than the critical value we reject $H_0$ and conclude that the correlation is significant. However, for a 2-sided test $H_0: \rho = 0$, $H_1: \rho \ne 0$ we find $t^{(n-2)}_{.025} = t^{(7)}_{.025} = 2.365$. Since the computed t is less than the critical value we accept $H_0$ and conclude that the correlation is not significant.

b) Computations for both b) and c) appear in the table below.
$r_y$   $r_{x_2}$   $d$    $d^2$      $r_y$   $r_{x_1}$   $r_{x_2}$   $SR$    $SR^2$
 2       1           1      1          2       2           1            5       25
 3       2           1      1          3       3           2            8       64
 5       3           2      4          5       4           3           12      144
 8       5           3      9          8       7           5           20      400
 9       8           1      1          9       9           8           26      676
 1       4          -3      9          1       1           4            6       36
 4       6          -2      4          4       5           6           15      225
 6       7          -1      1          6       6           7           19      361
 7       9          -2      4          7       8           9           24      576
                     0     34                                         135     2507
The first 4 columns are the rank correlation. We rank the items in columns. $d$ is the difference between the rankings and must sum to zero.
\[ r_s = 1 - \frac{6 \sum d^2}{n \left( n^2 - 1 \right)} = 1 - \frac{6(34)}{9(81 - 1)} = 1 - .2833 = .7167 . \]
From the table for a 1-sided test and $n = 9$, the 5% critical value is 0.600, so the correlation is significant.
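The conventional correlation from part a) and the rank correlation from part b) can be checked side by side from the data table. The following is a minimal sketch; the helper names `pearson` and `ranks` are illustrative, and `ranks` assumes there are no ties, which holds for this data.

```python
import math

salary = [7.5, 8.6, 9.1, 10.3, 13.0, 6.2, 8.7, 9.4, 9.8]
sales = [2.25, 2.58, 2.73, 3.09, 3.90, 2.86, 3.81, 3.82, 3.94]
n = len(salary)


def pearson(x, y):
    """Product-moment correlation from corrected sums."""
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    s_xy = sum(a * b for a, b in zip(x, y)) - m * x_bar * y_bar
    s_xx = sum(a * a for a in x) - m * x_bar ** 2
    s_yy = sum(b * b for b in y) - m * y_bar ** 2
    return s_xy / math.sqrt(s_xx * s_yy)


def ranks(values):
    """Rank from 1 (smallest) to n; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


# a) conventional correlation and its t statistic with n - 2 df
r = pearson(sales, salary)
t = r / math.sqrt((1 - r ** 2) / (n - 2))

# b) rank correlation from the squared rank differences
rank_y, rank_x = ranks(salary), ranks(sales)
d = [a - b for a, b in zip(rank_y, rank_x)]
sum_d2 = sum(v * v for v in d)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"r = {r:.5f}, t = {t:.3f}")
print(f"sum d = {sum(d)}, sum d^2 = {sum_d2}, r_s = {r_s:.4f}")
```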
c) To compute Kendall’s W, we ranked the data in columns and added across columns to get row sums. We square and sum these row sums. $\overline{SR} = \dfrac{\sum SR}{n} = \dfrac{135}{9} = 15$, $S = \sum SR^2 - n \overline{SR}^2 = 2507 - 9(15)^2 = 482$ and, if $k$ is the number of columns (judges),
\[ W = \frac{S}{\frac{1}{12} k^2 \left( n^3 - n \right)} = \frac{482}{\frac{1}{12} (3)^2 (729 - 9)} = .8926 . \]
According to the table, the 5% critical value for $S$ is 54.0, indicating significant agreement, since our computed $S$ is higher. (We reject $H_0$, disagreement.)
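The row sums, $S$ and $W$ can be verified from the three columns of ranks in the table. This is a minimal sketch; the dictionary layout and variable names are illustrative, and the critical value 54.0 is the table value quoted in the solution.

```python
# Ranks of the nine employees under each of the k = 3 "judges".
rank_columns = {
    "salary": [2, 3, 5, 8, 9, 1, 4, 6, 7],
    "months": [2, 3, 4, 7, 9, 1, 5, 6, 8],
    "sales": [1, 2, 3, 5, 8, 4, 6, 7, 9],
}
k = len(rank_columns)   # number of columns (judges)
n = 9                   # number of items ranked

# Add across columns to get one rank sum per employee.
row_sums = [sum(col[i] for col in rank_columns.values()) for i in range(n)]
mean_sr = sum(row_sums) / n

# S is the sum of squared deviations of the row sums from their mean.
s_stat = sum(sr ** 2 for sr in row_sums) - n * mean_sr ** 2

# Kendall's coefficient of concordance.
w = s_stat / (k ** 2 * (n ** 3 - n) / 12)

s_crit = 54.0           # 5% critical value for S as quoted in the text
print(f"row sums = {row_sums}")
print(f"S = {s_stat:.0f}, W = {w:.4f}, significant: {s_stat > s_crit}")
```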
Solution Continues in 252zz9871