C22.0015 Information for Final Exam 2011.MAY.04

The problems below are taken from previously used exams, or were considered for
exams. Solutions follow the problem list.
1. In the following analysis of variance table, there are some missing entries. Fill in
those entries, and be sure to leave empty those positions that are left empty by convention.

SOURCE OF     DEGREES OF    SUM OF      MEAN        F
VARIATION     FREEDOM       SQUARES     SQUARES
Regression     3                        45.00       5.00
Error
Total         23
2. Suppose that you have three random variables X1, X2, X3, independent and normally
distributed. It is known that the standard deviation is σ = 10 for each value, but the
means μ1, μ2, and μ3 are not known. Find the likelihood ratio test criterion for
H0: μ1 = μ2 versus HA: μ1 ≠ μ2. Show that −2 log Λ is exactly chi-squared on one degree
of freedom.
3. Consider the regression model Yi = β xi + εi, where the xi's are known nonrandom
quantities and the ε's are independent normal N(0, σ²). Suppose that the value of σ is
known.
(a) Find the maximum likelihood estimate of β.
(b) Find Fisher's information and thus give the limiting variance of the maximum
likelihood estimate.
(c) Find the exact variance of your estimate from (a).
(d) If (b) and (c) do not produce exactly the same variance, show how these two
solutions are reconciled as n → ∞.
4. There are two independent samples of data from normal populations. We have
X1, X2, …, Xm as a sample from N(μ, σ²)
Y1, Y2, …, Yn as a sample from N(ν, τ²)
Observe that the sample sizes are m and n, which are in general not equal. The standard
deviations are, of course, σ and τ.
This problem asks about some of the steps needed to do the likelihood ratio test of
H0: σ² = τ². (The final form of the test is not part of the problem, however.)


(a) Write the likelihood for this problem.
(b) Give the maximum likelihood estimates for μ, σ², ν, and τ². (If you remember
these, you can simply state the result without any derivation.)
(c) Write the maximized likelihood. That is, substitute your answers in (b) into the
likelihood of (a).
(d) What would be the maximum likelihood estimate of σ² under H0: σ² = τ²?
(Obviously this is also the maximum likelihood estimate of τ².)
5. A certain biochemical measurement is used to determine whether or not a person
has been exposed to a certain disease. If the person has been exposed to the disease,
then the measurement (call it X) follows a normal distribution with μ = 4.0 and σ = 1.2.
If the person has not been exposed, then X follows a normal distribution with μ = 2.2
and σ = 1.2. Since the only difference is in the mean, the problem has been set up as a
hypothesis test of H0: μ = 4.0 versus HA: μ = 2.2. (For medical considerations the
"exposed" version has been set up as the null hypothesis.)
(a) Based on a single observation X, give the details of the best test at level α = 0.01.
(Here "details" means that you should give cutoff points as numbers.)
(b) For the rule in (a), give the probability of type II error.
(c) This is somewhat harder. You'll find that the solution to (b) is not very
satisfying. Suppose that you could take a whole sample of independent values
X1, X2, ..., Xn and base the procedure on X̄. What is the smallest sample size n
for which you would be able to get the probability of type II error at or below 0.01
(while still keeping α = 0.01)?
6. The model for Poisson regression is that x1, x2, ..., xn are n positive known values,
and Y1, Y2, ..., Yn are n independent random variables, with Yi ~ Poisson(λxi).
(a) Find the least squares estimate of λ. HINT: E Yi = λxi.
(b) Find the maximum likelihood estimate of λ.

7. The random variable U takes its values in the set {0, 1, 2, 3, 4, 5, 6}. Two possible
probability laws are under consideration.

u    |   0     1     2     3     4     5     6
f0   |  1/16  2/16  3/16  4/16  3/16  2/16  1/16
f1   |  1/7   1/7   1/7   1/7   1/7   1/7   1/7

(a) Based on one observation of U, you wish to test H0: f0 versus HA: f1. What is the
best test of size α = 1/8?
(b) For the test that you found in (a), give the power, meaning 1 − P(Type II error).
(c) Find the best test of size α = 3/8.
(d) For the test in (c), give the power.
8. Suppose that you have exactly one observation X on a random variable which has the
triangular distribution on [0, θ]. The density is f(x | θ) = 2(θ − x)/θ² · I(0 ≤ x ≤ θ). You
want to test H0: θ = 10 versus H1: θ = 12 at significance level α = 0.05. Give the details
of the best test.
9. Suppose that you have exactly one observation X on a random variable which is
uniform on the interval [0, θ]. You want to test H0: θ = 10 versus H1: θ = 12 at
significance level α = 0.05. Give the details of the best test.
10. You have a sample X1, X2, …, Xn from the normal probability density N(θ, θ²). Note
that the mean and standard deviation are the same. Assume that θ > 0. Find the
maximum likelihood estimate of θ.
11. Consider the regression of Y on X1, X2, X3, X4, and X5. The units on the variables are
these:

Y     hours
X1    inches
X2    (none)
X3    weeks
X4    hours
X5    dollars

Here X2 has no units; its values are pure numbers.


Give the units for each of the following statistical quantities.
(a) b1
(b) b2
(c) b4
(d) s
(e) R²
(f) ei (a residual)
(g) ri (a Studentized residual)
(h) F (the F statistic)
(i) SSE (error sum of squares)
(j) det( X´X )
12. Consider the regression model noted as

Y = X β + ε

where Y is n×1, X is n×p, β is p×1, and ε is n×1.
Two people are making independent attempts to analyze this model.
Andrea assumes that Var ε = σ² I and thus uses the ordinary least squares estimate
bLS = (X´X)⁻¹ X´Y.
Barbara assumes that Var ε = σ² V, where V is a specified (known) n-by-n matrix, and
she uses the generalized least squares estimate given by bV = (X´V⁻¹X)⁻¹ X´V⁻¹Y.
(a) Suppose that Andrea is correct and that Var ε really is σ² I. Exactly what do we
mean when we say that bLS is better than bV?
(b) Again assuming that Andrea's assumption is correct, what is Var(bV)?
(c) Suppose that Barbara is really correct in that Var ε = σ² V. Find Var(bV) in this
case.
13. This problem concerns the regression given by the model

Yi = β0 + βA Ai + βB Bi + βC Ci + βD Di + εi

(a) Compare the R² statistic for the model to the R² statistic after omitting variable B.
(b) How would you test the null hypothesis H0: βD = 0?
(c) How would you test the combined null hypothesis H0: βC = 0 and βD = 0?
(d) How would you test the null hypothesis H0: βC = βD? (Assume that variables C
and D are in the same units.)
14. You have data (x1, y1), …, (x24, y24), and you have done the regression of Y on X.
This resulted in the fitted regression equation Ŷ = 13.2 + 4.8 x. The standard error of
regression was reported as s = 2.2. The estimated variance matrix for (b0, b1)´ was
given as

[  16     −4    ]
[  −4     2.25  ]

Give a 95% confidence interval for the linear combination 2β0 + 3β1.
15. The multiple regression model is

Yi = β0 + β1 xi1 + β2 xi2 + … + βk xik + εi

for i = 1, 2, …, n. It is assumed that the xij's are all nonrandom and that ε1, ε2, …, εn is
a sample from a normal N(0, σ²) population.
What are the sufficient statistics for this model?
16. This problem considers the regression of YIELD on the predictors ARCH, BERRY,
CRUST, DEPLETE, and EDGAR. There is some concern as to whether we can accept
the hypothesis H0: βARCH = 0, βBERRY = 0, βCRUST = 0. There is a separate question as to
whether we can accept H0: βDEPLETE = 0, βEDGAR = 0. Observe that each of these
hypotheses involves more than a single β.
The regression of YIELD on all five variables yielded this output:

The regression equation is
YIELD = 734 + 1.36 ARCH - 3.26 BERRY - 0.050 CRUST + 0.0123 DEPLETE - 0.258 EDGAR

Analysis of Variance
Source        DF        SS        MS       F      P
Regression     5   20540.6    4108.1    6.60  0.000
Error         76   47297.3     622.3
Total         81   67837.9
The regression of YIELD on ARCH, BERRY, and CRUST gave this:

The regression equation is
YIELD = 720 + 1.10 ARCH - 3.34 BERRY - 0.004 CRUST

Analysis of Variance
Source        DF        SS        MS        F      P
Regression     3   19948.5    6649.5    10.83  0.000
Error         78   47889.4     614.0
Total         81   67837.9

The regression of YIELD on DEPLETE and EDGAR gave this:

The regression equation is
YIELD = 598 - 0.0531 DEPLETE + 0.611 EDGAR

Analysis of Variance
Source        DF        SS        MS       F      P
Regression     2    6478.0    3239.0    4.17  0.019
Error         79   61359.9     776.7
Total         81   67837.9

(a) Based on these findings give the test for H0: βARCH = 0, βBERRY = 0, βCRUST = 0
and indicate whether we would accept or reject H0 at the 0.05 level of significance.
(b) Based on these findings give the test for H0: βDEPLETE = 0, βEDGAR = 0 and indicate
whether we would accept or reject H0 at the 0.05 level of significance.
Solutions begin below.


SOLUTIONS
1. The completed table is this:

SOURCE OF     DEGREES OF    SUM OF      MEAN        F
VARIATION     FREEDOM       SQUARES     SQUARES
Regression     3            135.00      45.00       5.00
Error         20            180.00       9.00
Total         23            315.00
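As a quick check, the missing entries can be recovered from the four given values (a Python sketch, not part of the original exam):

```python
# Given entries: df_reg = 3, df_total = 23, MS_reg = 45.00, F = 5.00.
df_reg, df_total = 3, 23
ms_reg, f_stat = 45.00, 5.00

df_err = df_total - df_reg     # 23 - 3 = 20
ss_reg = ms_reg * df_reg       # MS = SS/df, so SS_reg = 135.00
ms_err = ms_reg / f_stat       # F = MS_reg/MS_err, so MS_err = 9.00
ss_err = ms_err * df_err       # 9.00 * 20 = 180.00
ss_total = ss_reg + ss_err     # 135.00 + 180.00 = 315.00

print(df_err, ss_reg, ms_err, ss_err, ss_total)
```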
2. It is tempting to drop X3 from the problem because it is irrelevant to the hypothesis. In
this case, it's a legitimate decision, but we'll show the results while keeping X3. The
likelihood is

L = Π(i=1..3) [ 1/(10 √(2π)) ] exp( −(xi − μi)²/200 )

Under the alternative HA, the best (maximum likelihood) estimate for each μi is xi. The
exponential part of the maximized likelihood will drop out completely, and the
maximized likelihood is

LA,max = Π(i=1..3) 1/(10 √(2π)) = 1/( 1,000 (2π)^(3/2) )

Under H0, you'll find μ̂1 = μ̂2 = (x1 + x2)/2 and μ̂3 = x3. Observe then that

(x1 − μ̂1)² = ( x1 − (x1 + x2)/2 )² = ( (x1 − x2)/2 )² = (x1 − x2)²/4

Similarly, (x2 − μ̂2)² = (x1 − x2)²/4. Of course (x3 − μ̂3)² = 0. This results in a
maximized likelihood under H0 which is

L0,max = 1/( 1,000 (2π)^(3/2) ) exp( −(1/200) [ (x1 − x2)²/4 + (x1 − x2)²/4 + 0 ] )
       = 1/( 1,000 (2π)^(3/2) ) exp( −(x1 − x2)²/400 )

It follows that the likelihood ratio test statistic is

Λ = L0,max / LA,max = exp( −(x1 − x2)²/400 )

It follows immediately that −2 log Λ = (x1 − x2)²/200. In random variable form, this is
(X1 − X2)²/200. However, under H0, it happens that X1 − X2 is N(0, 2×100), and so it
follows that −2 log Λ ~ χ²(1) exactly.
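The likelihood-ratio algebra can be verified numerically. The sketch below uses arbitrary illustrative values for x1, x2, x3 (they are not from the exam):

```python
import math

sigma = 10.0
x1, x2, x3 = 3.0, 17.0, 5.0   # made-up data for the check

def lik(mu, x):
    # normal density with mean mu and standard deviation sigma
    return math.exp(-(x - mu)**2 / (2*sigma**2)) / (sigma*math.sqrt(2*math.pi))

# Maximized likelihood under HA: each mean estimated by its own observation.
LA = lik(x1, x1) * lik(x2, x2) * lik(x3, x3)
# Maximized likelihood under H0: mu1 = mu2 = (x1 + x2)/2, mu3 = x3.
m = (x1 + x2) / 2
L0 = lik(m, x1) * lik(m, x2) * lik(x3, x3)

stat = -2 * math.log(L0 / LA)
print(stat, (x1 - x2)**2 / 200)   # the two agree
```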
3. We must begin by writing the likelihood. This is

L = Π(i=1..n) [ 1/(σ √(2π)) ] exp( −(yi − βxi)²/(2σ²) ) = 1/( σⁿ (2π)^(n/2) ) exp( −(1/(2σ²)) Σ (yi − βxi)² )

For part (a), we need the maximum likelihood estimate. This is done in the usual way.

log L = −n log σ − (n/2) log(2π) − (1/(2σ²)) Σ (yi − βxi)²

(∂/∂β) log L = (1/σ²) Σ xi (yi − βxi), which we set equal to zero.

This condition can be written as

β Σ xi² = Σ xi yi

The solution is then β̂ML = Σ xi Yi / Σ xi², in random variable form.

For part (b), we need Fisher's information. There are several ways to do this, but
probably the easiest is to exploit the second derivative.

I(β) = E[ −(∂²/∂β²) log L ] = E[ (1/σ²) Σ xi² ] = (Σ xi²)/σ²

It follows that the limiting variance of the maximum likelihood estimate is the reciprocal
of this, namely σ²/Σ xi².

For part (c) we'll find the exact variance of the maximum likelihood estimate. This is

Var( Σ xi Yi / Σ xi² ) = (1/(Σ xi²)²) Var( Σ xi Yi ) = (1/(Σ xi²)²) Σ xi² Var(Yi)
                      = (σ² Σ xi²)/(Σ xi²)² = σ²/Σ xi²

Finally, for part (d) we note that the exact and asymptotic variances are exactly the same.
Thus, there is no need to consider what happens as n → ∞.
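A small Monte Carlo sketch (with made-up x values and parameter values) confirms that the empirical variance of the estimate matches σ²/Σ xi²:

```python
import random

random.seed(1)
beta, sigma = 2.0, 3.0          # made-up true values for the check
xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up fixed x's
sxx = sum(x*x for x in xs)

reps = 20000
ests = []
for _ in range(reps):
    ys = [beta*x + random.gauss(0, sigma) for x in xs]
    ests.append(sum(x*y for x, y in zip(xs, ys)) / sxx)

mean_est = sum(ests) / reps
emp_var = sum((e - mean_est)**2 for e in ests) / reps
theo_var = sigma**2 / sxx       # the exact (and limiting) variance
print(mean_est, emp_var, theo_var)
```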
4. The likelihood is

L = [ Π(i=1..m) 1/(σ √(2π)) exp( −(xi − μ)²/(2σ²) ) ] × [ Π(j=1..n) 1/(τ √(2π)) exp( −(yj − ν)²/(2τ²) ) ]
  = 1/( σ^m (2π)^(m/2) ) exp( −(1/(2σ²)) Σ (xi − μ)² ) × 1/( τ^n (2π)^(n/2) ) exp( −(1/(2τ²)) Σ (yj − ν)² )

For part (b), the maximum likelihood estimates from the X's are μ̂ = X̄ and
σ̂² = (1/m) Σ (Xi − X̄)²; these are given in random variable form. From the Y's, the
maximum likelihood estimates are ν̂ = Ȳ and τ̂² = (1/n) Σ (Yj − Ȳ)².

The maximized likelihood (call it LA,max) is

LA,max = 1/( σ̂^m (2π)^(m/2) ) exp( −(1/(2σ̂²)) Σ (xi − x̄)² ) × 1/( τ̂^n (2π)^(n/2) ) exp( −(1/(2τ̂²)) Σ (yj − ȳ)² )
       = 1/( σ̂^m (2π)^(m/2) ) e^(−m/2) × 1/( τ̂^n (2π)^(n/2) ) e^(−n/2)
       = 1/( σ̂^m τ̂^n (2π)^((m+n)/2) ) e^(−(m+n)/2)

For part (d), we've got to consider the likelihood with σ² = τ². This is

L0 = 1/( σ^(m+n) (2π)^((m+n)/2) ) exp( −(1/(2σ²)) [ Σ (xi − μ)² + Σ (yj − ν)² ] )

Now, in finding the maximum likelihood estimate, we'll still have to have μ̂ = X̄ and
ν̂ = Ȳ. The likelihood at this intermediate stage of maximization is

L0(intermediate) = 1/( σ^(m+n) (2π)^((m+n)/2) ) exp( −(1/(2σ²)) [ Σ (xi − x̄)² + Σ (yj − ȳ)² ] )

To get the maximum likelihood estimate for σ (or σ²), let's take the logarithm.

log L0(intermediate) = −(m + n) log σ − ((m + n)/2) log(2π) − (1/(2σ²)) [ Σ (xi − x̄)² + Σ (yj − ȳ)² ]

Then the derivative:

(∂/∂σ) log L0(intermediate) = −(m + n)/σ + (1/σ³) [ Σ (xi − x̄)² + Σ (yj − ȳ)² ], set equal to zero.

The solution is

σ̂0² = (1/(m + n)) [ Σ (xi − x̄)² + Σ (yj − ȳ)² ]

The details of the likelihood ratio test would be straightforward at this point, though
messy.
5. The best test will be Neyman–Pearson, and H0 will be rejected for

[ 1/(1.2 √(2π)) exp( −(x − 4.0)²/(2 × 1.2²) ) ] / [ 1/(1.2 √(2π)) exp( −(x − 2.2)²/(2 × 1.2²) ) ] ≤ k

This quickly reduces to rejecting H0 if X < k (for some k). This form of the rule could be
deemed obvious just from the statement of the problem.

The relevant detail comes in the selection of k so that the level of the test is 0.01. We set
up the following condition:

P[ X < k | μ = 4.0 ] = P[ (X − 4.0)/1.2 < (k − 4.0)/1.2 | μ = 4.0 ] = P[ Z < (k − 4.0)/1.2 ]

and we want this to be 0.01. The normal table gives P[ Z < −2.33 ] = 0.01, so we solve
(k − 4.0)/1.2 = −2.33, for which the solution is k = 4.0 − 2.796 = 1.204. This takes care
of (a). We note that the cutoff k is not between the null and alternative values (4.0 and
2.2), so we expect very high probability of Type II error.

For (b), we need the probability of type II error. This is found as

P[ X ≥ 1.204 | μ = 2.2 ] = P[ (X − 2.2)/1.2 ≥ (1.204 − 2.2)/1.2 | μ = 2.2 ] = P[ Z ≥ −0.83 ]
 = 0.7967 ≈ 80%
For part (c), you know that the rule will be to reject H0 if X̄ < k. You also know that
SD(X̄) = σ/√n = 1.2/√n. Now find

P[ X̄ < k | μ = 4.0 ] = P[ (X̄ − 4.0)/(1.2/√n) < (k − 4.0)/(1.2/√n) ] = P[ Z < √n (k − 4.0)/1.2 ]

and we want this to be 0.01. As P[ Z < −2.33 ] = 0.01, we solve √n (k − 4.0)/1.2 = −2.33,
giving k = 4.0 − 2.796/√n.

We will now use this k in dealing with the type II error probability. We want

P[ X̄ ≥ 4.0 − 2.796/√n | μ = 2.2 ]
 = P[ (X̄ − 2.2)/(1.2/√n) ≥ (4.0 − 2.796/√n − 2.2)/(1.2/√n) ]
 = P[ Z ≥ (1.8 − 2.796/√n) √n/1.2 ] = P[ Z ≥ (1.8 √n − 2.796)/1.2 ]

to be at most 0.01. As P[ Z ≥ 2.33 ] = 0.01, we solve (1.8 √n − 2.796)/1.2 ≥ 2.33. The
solution is √n ≥ 3.107.

The objective can be reached if n ≥ (3.107)² ≈ 9.65. Thus we would need at least 10
observations in order to make both misclassification probabilities 1% or less. This is not
a massive sample size. Since the null and alternative values are (4.0 − 2.2)/1.2 = 1.5
standard deviations apart, we do not need a lot of data to decide between H0 and H1.
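The sample-size arithmetic can be reproduced with exact normal quantiles from the Python standard library; the 2.33 used above is the table rounding of 2.3263:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf(0.99)   # about 2.3263
mu0, mu1, sigma = 4.0, 2.2, 1.2

# Both error probabilities are at most 0.01 once
# (mu0 - mu1) * sqrt(n) / sigma >= 2 z.
n_exact = (2 * z * sigma / (mu0 - mu1))**2
n = ceil(n_exact)
print(n_exact, n)
```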
6. The least squares criterion is simply Q = Σ (yi − λxi)². Routine differentiation will
lead to

λ̂LS = Σ xi yi / Σ xi²

To obtain the maximum likelihood estimate, we start naturally with the likelihood:

L = Π(i=1..n) e^(−λxi) (λxi)^(yi) / yi!  =  e^(−λ Σ xi) λ^(Σ yi) Π xi^(yi) / Π yi!

Then we find

log L = −λ Σ xi + (Σ yi) log λ + Σ yi log xi − Σ log(yi!)

Then

(∂/∂λ) log L = −Σ xi + (Σ yi)/λ, which we set equal to zero.

This solves as λ̂ML = Σ yi / Σ xi = ȳ/x̄.


7. The best tests are based on the likelihood ratio. This table gives the relevant
information.

u      |   0      1      2      3      4      5      6
f0     |  1/16   2/16   3/16   4/16   3/16   2/16   1/16
f1     |  1/7    1/7    1/7    1/7    1/7    1/7    1/7
f0/f1  |  7/16  14/16  21/16  28/16  21/16  14/16   7/16

The null hypothesis is to be rejected for small values of f0/f1. The obvious most extreme
rejection set is {0, 6}. This has probability 2/16 = 1/8 under H0, solving (a).

The set {0, 6} has probability 2/7 under HA; this is the power of the test, as requested
in (b).

If we enlarge the rejection set to {0, 1, 5, 6}, this has probability 6/16 = 3/8 under H0 and
thus is our solution to (c).

For part (d), we simply notice that this set has probability 4/7 under HA.
8. This is clearly a Neyman–Pearson problem. The null hypothesis likelihood is

f(x | θ = 10) = (10 − x)/50 · I(0 ≤ x ≤ 10)

The alternative likelihood is

f(x | θ = 12) = (12 − x)/72 · I(0 ≤ x ≤ 12)

The best test rejects H0 whenever

Λ = f(x | θ = 10) / f(x | θ = 12) ≤ c

The value c can be chosen to bring the test to level α = 0.05. However, we see easily
that

Λ = [ (10 − x)/50 · I(0 ≤ x ≤ 10) ] / [ (12 − x)/72 · I(0 ≤ x ≤ 12) ]
  = (72/50) · [ (10 − x) I(0 ≤ x ≤ 10) ] / [ (12 − x) I(0 ≤ x ≤ 12) ]

We can absorb the numerical constant into c, so that the best test requires that we reject
H0 when

[ (10 − x) I(0 ≤ x ≤ 10) ] / [ (12 − x) I(0 ≤ x ≤ 12) ] ≤ c

There are four critical ranges of x to consider.

Range          Numerator   Denominator   Ratio
x < 0           0           0            irrelevant
0 ≤ x ≤ 10      10 − x      12 − x       (10 − x)/(12 − x)
10 < x ≤ 12     0           12 − x       0
12 < x          0           0            irrelevant

We will ignore the "irrelevant" sets, since they are impossible under both H0 and also
under H1. The set { 10 < x ≤ 12 } should certainly be included in the rejection set, since
a value of x in that range is a clear indicator that H0 is false. We should also reject for the
condition

(10 − x)/(12 − x) < c

This condition is quickly rewritten as x > (10 − 12c)/(1 − c). This fraction is just another
constant, so we might as well describe the condition for rejection as x > c.

For this final use of c, we need to pick the c so that P[ X > c | θ = 10 ] = 0.05. The
condition we'll work with is

0.05 = P[ X > c | θ = 10 ] = ∫ from c to 10 of (10 − x)/50 dx
     = (1/50) [ 10x − x²/2 ] evaluated from x = c to x = 10
     = (1/50) [ (100 − 50) − (10c − c²/2) ] = (1/50) ( 50 − 10c + c²/2 )

An equivalent statement is

2.50 = c²/2 − 10c + 50,  or  c² − 20c + 95 = 0

This is a routine quadratic whose roots are

c = [ 20 ± √(20² − 4 × 1 × 95) ] / 2 = (20 ± √20)/2 = 10 ± √5

Numerically, these are 7.7639 and 12.2361. For this example, it's the smaller root that
matters. Thus, we'll reject H0 whenever x > 7.7639.
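The cutoff can be checked numerically (a verification sketch, not part of the original solution):

```python
import math

c = 10 - math.sqrt(5)   # smaller root of c^2 - 20c + 95 = 0

def tail(c):
    # P[X > c | theta = 10]: integral of (10 - x)/50 from c to 10
    return (10 - c)**2 / 100

print(c, tail(c))   # cutoff about 7.7639, tail probability 0.05
```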
9. The null hypothesis likelihood is f(x | θ = 10) = (1/10) I(0 ≤ x ≤ 10). The alternative
likelihood is f(x | θ = 12) = (1/12) I(0 ≤ x ≤ 12). The best test rejects H0 whenever

Λ = f(x | θ = 10) / f(x | θ = 12) ≤ c

The value c can be chosen to bring the test to level α = 0.05. However, we see easily that

Λ = [ (1/10) I(0 ≤ x ≤ 10) ] / [ (1/12) I(0 ≤ x ≤ 12) ] = (6/5) · I(0 ≤ x ≤ 10) / I(0 ≤ x ≤ 12)

We can absorb the numerical constant into c so that the best test requires that we reject
H0 when

I(0 ≤ x ≤ 10) / I(0 ≤ x ≤ 12) ≤ c

However, the numerator and denominator can only take values of 0 or 1. Thus, there are
only four possible values.

Range          Numerator   Denominator   Ratio
x < 0           0           0            irrelevant
0 ≤ x ≤ 10      1           1            1
10 < x ≤ 12     0           1            0
12 < x          0           0            irrelevant

We will choose to ignore the possibilities x < 0 and 12 < x, as these are impossible under
both H0 and H1. Certainly the ratio is small on the set 10 < x ≤ 12, so we should include
this interval in our rejection set. This takes nothing out of our level of significance, since
P[ 10 < X ≤ 12 | H0 ] = 0. This means that we can spend the entire α = 0.05 on any subset
of the interval [0, 10]. It seems most natural to reject H0 on the set { x > 9.5 }, since
P[ X > 9.5 | H0 ] = 0.05. We could however create other equally good rejection sets.
Some odd examples might be these:

Reject H0 on the set { x < 0.5 } ∪ { x > 10 }.
Reject H0 on the set { 0.5 < x < 1.0 } ∪ { x > 10 }.
Reject H0 on the set { 1.0 < x < 1.25 } ∪ { 3.5 < x < 3.75 } ∪ { x > 10 }.

Each of these also has probability 0.05 under H0. This is a consequence of the uniform
distribution, as it has a completely flat likelihood.


10. The likelihood is

L = Π(i=1..n) [ 1/(θ √(2π)) ] exp( −(xi − θ)²/(2θ²) ) = 1/( θⁿ (2π)^(n/2) ) exp( −(1/(2θ²)) Σ (xi − θ)² )

The maximizing will almost certainly be easier through log L:

log L = −n log θ − (n/2) log(2π) − (1/(2θ²)) Σ (xi − θ)²
      = −n log θ − (n/2) log(2π) − (1/(2θ²)) [ Σ xi² − 2θ Σ xi + nθ² ]
      = −n log θ − (Σ xi²)/(2θ²) + (Σ xi)/θ + terms without θ

Now we can differentiate:

(d/dθ) log L = −n/θ + (Σ xi²)/θ³ − (Σ xi)/θ², which we set equal to zero.

If we multiply through by −θ³, we produce this quadratic equation:

nθ² + (Σ xi) θ − Σ xi² = 0

This looks a little better if we divide by n. This gets us to

θ² + x̄ θ − (1/n) Σ xi² = 0

The roots of this are

θ = [ −x̄ ± √( x̄² + (4/n) Σ xi² ) ] / 2

The item under the square root looks a little better if rearranged as follows:

x̄² + (4/n) Σ xi² = x̄² + (4/n) [ Σ xi² − n x̄² + n x̄² ] = x̄² + (4/n) [ (n − 1) s² + n x̄² ]
                = 5 x̄² + 4 ((n − 1)/n) s²

Thus, we can re-identify the roots as

θ = [ −x̄ ± √( 5 x̄² + 4 ((n − 1)/n) s² ) ] / 2

It's clear that the + part of the ± is relevant here, as the − part will produce a negative
answer. Thus, we present our maximum likelihood estimate as

θ̂ML = [ −x̄ + √( 5 x̄² + 4 ((n − 1)/n) s² ) ] / 2

This answer makes sense in data for which x̄ ≫ s. Then the item under the square root is
about 9 x̄², and the estimate is about (−x̄ + √(9 x̄²))/2 = (−x̄ + 3x̄)/2, which is x̄.
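A numeric sketch on made-up data confirms that the closed-form expression does maximize the log likelihood:

```python
import math

xs = [8.1, 9.4, 10.2, 11.7, 12.6]   # made-up data for the check
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar)**2 for x in xs) / (n - 1)   # usual sample variance

theta_hat = (-xbar + math.sqrt(5*xbar**2 + 4*(n - 1)/n*s2)) / 2

def loglik(t):
    # N(theta, theta^2) log likelihood, without the constant term
    return -n*math.log(t) - sum((x - t)**2 for x in xs) / (2*t**2)

print(theta_hat)
```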
11. Units are shown here.

(a) b1: hours per inch
(b) b2: hours
(c) b4: no units
(d) s: hours
(e) R²: no units
(f) ei (a residual): hours
(g) ri (a Studentized residual): no units
(h) F (the F statistic): no units
(i) SSE (error sum of squares): hours²
(j) det( X´X ): inches² × weeks² × hours² × dollars²

12.
(a) The Gauss–Markov theorem states that bLS is best in the sense that
Var(b*) − Var(bLS) is a positive semi-definite matrix. In this statement, b* is any
other linear unbiased estimate, including bV.

(b) Using Andrea's assumptions and using Andrea's calculations,

Var(bV) = Var( (X´V⁻¹X)⁻¹ X´V⁻¹ Y )
 = (X´V⁻¹X)⁻¹ X´V⁻¹ Var(Y) [ (X´V⁻¹X)⁻¹ X´V⁻¹ ]´
 = (X´V⁻¹X)⁻¹ X´V⁻¹ σ²I V⁻¹ X (X´V⁻¹X)⁻¹
 = σ² (X´V⁻¹X)⁻¹ X´V⁻¹ V⁻¹ X (X´V⁻¹X)⁻¹
 = σ² (X´V⁻¹X)⁻¹ X´V⁻² X (X´V⁻¹X)⁻¹

This cannot be simplified further. This derivation used (X´)´ = X,
[ (X´V⁻¹X)⁻¹ ]´ = (X´V⁻¹X)⁻¹, and the symmetry of V⁻¹.

(c) Using Barbara's assumptions, and Barbara's calculations,

Var(bV) = Var( (X´V⁻¹X)⁻¹ X´V⁻¹ Y )
 = (X´V⁻¹X)⁻¹ X´V⁻¹ Var(Y) [ (X´V⁻¹X)⁻¹ X´V⁻¹ ]´
 = (X´V⁻¹X)⁻¹ X´V⁻¹ σ²V V⁻¹ X (X´V⁻¹X)⁻¹
 = σ² (X´V⁻¹X)⁻¹ X´V⁻¹ X (X´V⁻¹X)⁻¹
 = σ² (X´V⁻¹X)⁻¹
13.
(a) The R² for the full model is bigger.

(b) There are two ways to do this. One is to use the t value listed for variable D; this
gives an ordinary t statistic, and we reject H0 if |t| ≥ t(α/2; n − 5). The other way uses the
reduced sum of squares method:

F = [ SSresid(model with A, B, C) − SSresid(model with A, B, C, D) ]
    / [ SSresid(model with A, B, C, D) / (n − 5) ]
  = [ SSresid(model with A, B, C) − SSresid(model with A, B, C, D) ]
    / MSresid(model with A, B, C, D)

The null hypothesis is to be rejected if this calculation equals or exceeds F(α; 1, n − 5),
the upper α point for the F distribution with 1 and n − 5 degrees of freedom.

(c) For this part, you MUST use the reduced sum of squares method:

F = { [ SSresid(model with A, B) − SSresid(model with A, B, C, D) ] / 2 }
    / [ SSresid(model with A, B, C, D) / (n − 5) ]

The null hypothesis is to be rejected if this calculation equals or exceeds F(α; 2, n − 5),
the upper α point for the F distribution with 2 and n − 5 degrees of freedom.

(d) Observe that βC Ci + βD Di = βC (Ci + Di) + (βD − βC) Di.
Thus, by defining a new variable Ti = Ci + Di and a new coefficient βE = βD − βC, we can
rewrite the model as

Yi = β0 + βA Ai + βB Bi + βC Ti + βE Di + εi

Now we have changed the hypothesis to H0: βE = 0. This is now a question about a
single coefficient, and the method of part (b) can be used.
14. Certainly the point estimate for 2β0 + 3β1 is 2b0 + 3b1 = 2(13.2) + 3(4.8) = 26.4 +
14.4 = 40.8. Next observe that

Var(2b0 + 3b1) = 4 Var(b0) + 12 Cov(b0, b1) + 9 Var(b1)

This is estimated as

4(16.0) + 12(−4.0) + 9(2.25) = 36.25

Then SE(2b0 + 3b1) = √36.25 ≈ 6.0208.

With n = 24, the estimate of σ must have 22 degrees of freedom. Thus, the 95%
confidence interval is

40.8 ± t(0.025; 22) × 6.0208, or 40.8 ± 2.0744 × 6.0208, or 40.8 ± 12.4895

You could reasonably give the interval as 40.8 ± 12.49, meaning (28.31, 53.29). Observe
that you do not explicitly use the value of s. This was incorporated into the estimated
variance matrix of (b0, b1)´.
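The interval arithmetic can be reproduced directly; the t value 2.0744 is the table entry quoted above:

```python
import math

est = 2*13.2 + 3*4.8                      # point estimate, 40.8
var = 4*16.0 + 12*(-4.0) + 9*2.25         # estimated variance, 36.25
se = math.sqrt(var)                       # about 6.0208
t = 2.0744                                # t table value, 22 df, two-sided 95%
lo, hi = est - t*se, est + t*se
print(round(lo, 2), round(hi, 2))
```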
15. The likelihood for this model is

L = Π(i=1..n) [ 1/(σ √(2π)) ] exp( −(yi − β0 − β1 xi1 − β2 xi2 − … − βk xik)²/(2σ²) )
  = 1/( σⁿ (2π)^(n/2) ) exp( −(1/(2σ²)) Σ (yi − β0 − β1 xi1 − … − βk xik)² )

It's clear that all the sufficient statistics must come from the exponent. This exponent is

−(1/(2σ²)) Σ (yi − β0 − β1 xi1 − β2 xi2 − … − βk xik)²
 = −(1/(2σ²)) [ Σ yi² − 2β0 Σ yi − 2β1 Σ xi1 yi − 2β2 Σ xi2 yi − … − 2βk Σ xik yi
                + (terms with β's and no y's) ]

Thus the sufficient statistics are these:

Σ yi², Σ yi, Σ xi1 yi, Σ xi2 yi, …, Σ xik yi

There are k + 2 of these. This makes perfect sense, since there are k + 2 parameters,
namely β0, β1, …, βk, σ.

Many people also include in the sufficient statistics the sums Σ xij, Σ xij², and Σ xig xih.
Technically, we don't use these as statistics, since the x's are modeled as nonrandom.
However, we will still need these sums to compute the estimates, so there is no harm
in including them.


16. Each of these tests is based on the reduced sum of squares method. For part (a) we
examine

F = { [ SSresid(model using D, E) − SSresid(model using A, B, C, D, E) ] / 3 }
    / [ SSresid(model using A, B, C, D, E) / 76 ]

Here we use A for ARCH, B for BERRY, and so on. Numerically, the statistic is

F = [ (61,359.9 − 47,297.3) / 3 ] / [ 47,297.3 / 76 ] ≈ 7.532

This is to be compared to F(0.05; 3, 76) = 2.7249. As our computed value of 7.532
exceeds this, we reject H0. We cannot accept H0: βARCH = 0, βBERRY = 0, βCRUST = 0.

Part (b) is similar. We use this statistic:

F = { [ SSresid(model using A, B, C) − SSresid(model using A, B, C, D, E) ] / 2 }
    / [ SSresid(model using A, B, C, D, E) / 76 ]

The numeric work is this:

F = [ (47,889.4 − 47,297.3) / 2 ] / [ 47,297.3 / 76 ] ≈ 0.476

As this is less than 1, we will certainly accept H0. Just for the record, the comparison
point is F(0.05; 2, 76) = 3.1170.
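Both F statistics can be reproduced from the printed residual sums of squares:

```python
ss_full = 47297.3   # residual SS, model with A, B, C, D, E (76 df)
ss_de   = 61359.9   # residual SS, model with D and E only
ss_abc  = 47889.4   # residual SS, model with A, B, and C only

f_a = ((ss_de - ss_full) / 3) / (ss_full / 76)    # tests ARCH, BERRY, CRUST
f_b = ((ss_abc - ss_full) / 2) / (ss_full / 76)   # tests DEPLETE, EDGAR
print(round(f_a, 3), round(f_b, 3))
```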

gs2011