Lab 2

MTH U481, Summer 1, 2005,
Computer Lab 2
Estimation and hypothesis-testing simulation for the one-parameter beta
distribution using SPSS.
Part 1. Comparison of Mean Squared Errors for the MLE and MOM estimates under the
one-parameter beta distribution using SPSS.
Introduction.
This lab deals with a one-parameter beta distribution, which is described by the following
pdf:
fX(x) = θ*x^(θ-1), 0 < x < 1.        (1)
The maximum likelihood estimate for θ is θ_mle = –n / sum(ln(Xi)).
The method of moments estimate for θ is θ_mom = E(x) / (1 – E(x)), where E(x) is the
sample mean of the X-sample.
(Both of these estimates were the subject of your homework: Problems 5.8.5 and 5.8.12.)
1. Find the cumulative distribution function if the pdf is given by (1).
In this lab you have to find out experimentally which estimate for θ gives more accurate
results.
Procedure.
0. Open SPSS
1. Create a sample of size 100 with the pdf given in (1) with θ = 2 (fX(x) = 2*x).
a) First fill one column with 100 1's. The easiest way to accomplish this is to fill 10
rows with 1, then select all 10 rows, copy, and paste 10 times at the end.
b) Now create a 100-sample with the uniform pdf: first select the Transform menu,
Compute submenu. Type the name of the new variable in the Target Variable box,
for example "uniform". Select the RV.UNIFORM function and move it to the
Numeric Expression box by pressing the button with the black triangle below it.
Fill in the min and max parameters; they are 0 and 1 in this case.
c) Finally, transform the uniform sample to get the sample with pdf = 2x, 0 < x < 1.
This is based on the following fact: if Y is a random variable with continuous cumulative
distribution function F(y) = P(Y < y), then F(Y) is uniformly distributed over [0,1]
(prove this if you have forgotten the proof given in class).
Inverting this, if X is uniformly distributed over [0,1], then F^(-1)(X) has cumulative
distribution function F. In our case the cdf corresponding to pdf = 2x is F(x) = x^2,
whose inverse is the square root, so we obtain the following rule:
to get the sample with pdf = 2x, we need to compute the square root of each
value in the uniform sample. To do this, first select the Transform menu,
Compute submenu. Type the name of the new variable in the Target Variable box,
for example "beta2". Select the SQRT function and move it to the Numeric Expression
box by pressing the button with the black triangle below it. You need to take the
square root of the uniform variable, so copy the variable name ("uniform") inside
the parentheses of the SQRT function. You can check the pdf of the resulting sample
by looking at a histogram (select the Graphs menu, Histogram option, and choose the
beta2 variable).
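If you want to double-check this transformation outside SPSS, the same step can be sketched
in Python with numpy (the variable names simply mirror the SPSS variables above and are
only illustrative):

import numpy as np

rng = np.random.default_rng()       # random number generator
uniform = rng.uniform(0, 1, 100)    # analogue of RV.UNIFORM(0, 1), 100 values
beta2 = np.sqrt(uniform)            # inverse-cdf transform: beta2 has pdf = 2x on (0, 1)
# A histogram of beta2 should rise roughly linearly, matching pdf = 2x.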
2. Compute the maximum likelihood estimate for θ. (Pretend that you forgot that
θ = 2, and let us estimate the parameter from the sample. We assume only that the
sample has pdf = θ*x^(θ-1), and find the θ that best fits the data.)
a) The estimate is θ_mle = –n / sum(ln(Xi)). To compute it, first create a new
variable with the natural logarithm of the sample. (Use the Compute Variable
utility, select LN as the function, beta2 as the initial variable, and "lnbeta2" as the
target variable.)
b) Now compute the sum of ln(Xi) by selecting Analyze, Descriptive Statistics,
Descriptives. Select the lnbeta2 variable. Press the Options button, select
Sum, and deselect all other statistics to suppress the unnecessary output.
(This sum will be used in Part 2, so record it!)
c) Now you have the sum of the logs and n = 100, so use the formula above to
compute the maximum likelihood estimate for θ. It should be close to 2
if you have not made any mistakes.
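For reference, the whole of step 2 can be sketched outside SPSS in a few lines of Python
with numpy (the names are illustrative and not part of the SPSS procedure):

import numpy as np

rng = np.random.default_rng()
beta2 = np.sqrt(rng.uniform(0, 1, 100))    # sample with pdf = 2x, as in step 1
n = len(beta2)
theta_mle = -n / np.sum(np.log(beta2))     # MLE: -n / sum(ln(Xi))
print(theta_mle)                           # should be close to 2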
3. Compute the method of moments estimate for θ.
The formula is
θ_mom = E(x) / (1 – E(x)).
d) Compute the sample mean of the beta2 sample. (Select Analyze, Descriptive
Statistics, Descriptives. Select the beta2 variable. Press the Options button,
select Mean, and deselect all other statistics to suppress the unnecessary
output.) Now you have E(x). Use the formula above to find the method of
moments estimate for θ. It should be close to 2.
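The method of moments estimate can be sketched the same way (again in Python with numpy,
with illustrative names):

import numpy as np

rng = np.random.default_rng()
beta2 = np.sqrt(rng.uniform(0, 1, 100))    # sample with pdf = 2x, as in step 1
xbar = beta2.mean()                        # sample mean, the E(x) in the formula
theta_mom = xbar / (1 - xbar)              # method of moments estimate
print(theta_mom)                           # should also be close to 2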
4. Compare the two estimates to the real θ = 2.
In order to get more meaningful results, repeat steps 1–3 ten times to get 20
different estimates for θ (10 MLE and 10 MOM estimates). (You don't have to
fill in the 100 1's every time, since they stay the same.)
NOTE: Compute and record the values of the mean for both the uniform and
the beta2 samples in all 10 trials. Also, in all ten trials, compute the sum of
the logs for both the uniform sample and the sample with pdf = 2x.
(They will be used in Part 2 of this lab. For each trial you should record 4 values:
2 means and 2 sums of natural logs.)
Now compute the mean square error (MSE) for your MLE and MOM estimates:
MSE = (1/10) * sum of (est - θ)^2 over all 10 values (θ = 2).
Compare the MSE values for the two estimators: which method is better, i.e., has the lower
mean squared error?
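For comparison, the whole Part 1 simulation can be sketched in Python with numpy as follows
(a sketch only, assuming 10 trials of size 100 and true θ = 2 as above):

import numpy as np

rng = np.random.default_rng()
theta, n, trials = 2.0, 100, 10
mle, mom = [], []
for _ in range(trials):
    x = rng.uniform(0, 1, n) ** (1 / theta)    # inverse-cdf sample with pdf = theta*x^(theta-1)
    mle.append(-n / np.sum(np.log(x)))         # maximum likelihood estimate
    mom.append(x.mean() / (1 - x.mean()))      # method of moments estimate
mse_mle = np.mean((np.array(mle) - theta) ** 2)
mse_mom = np.mean((np.array(mom) - theta) ** 2)
print(mse_mle, mse_mom)    # the estimator with the smaller MSE is the more accurate one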
Part 2. Hypothesis testing.
Introduction
In this part, you will use the results from Part 1 to test the following hypotheses in two
different ways.
The null hypothesis: the data is uniformly distributed.
The alternative hypothesis: the data has pdf = θ*x^(θ-1), 0 < x < 1, with θ > 1.
You will use two different tests: one based on the means of the samples, and another based
on the product of the sample values.
1. Hypothesis testing based on sample means.
You have the 10 means of the uniform samples and the 10 means of the samples with
pdf = 2x, which corresponds to θ = 2.
The uniform samples have mean = 0.5 and variance = 1/12. Thus the mean of a
sample of size 100 has variance = 1/(12*100). Using the CLT, the
z-score is z = (mean - 0.5) * sqrt(1200). Use the 10% significance level to reject or
accept the null hypothesis. The critical z-value is 1.28. (A short sketch of this test
outside SPSS appears after item c) below.)
a) Open a new data file; enter the means of your 10 uniform samples into one variable, and
the means of the samples with θ = 2 into another variable.
b) Compute the z-score for each of the 10 means of the uniform samples, and decide
whether to reject Ho or not based on the critical value above. In how many of the
samples was Ho rejected?
c) Compute the z-score for each of the samples with θ = 2, and decide whether to
reject Ho or not, based on the same critical value. In how many samples
was Ho rejected?
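A minimal sketch of this z-test in Python with numpy (shown for one uniform sample and one
θ = 2 sample; in the lab you apply it to all 10 recorded means of each kind):

import numpy as np

def z_score(sample_mean, n=100):
    # z = (mean - 0.5) * sqrt(12 * n); for n = 100 this is sqrt(1200)
    return (sample_mean - 0.5) * np.sqrt(12 * n)

rng = np.random.default_rng()
uniform_mean = rng.uniform(0, 1, 100).mean()           # mean of a uniform sample (Ho true)
beta2_mean = np.sqrt(rng.uniform(0, 1, 100)).mean()    # mean of a sample with pdf = 2x
print(z_score(uniform_mean) > 1.28)    # Ho true: rejected only about 10% of the time
print(z_score(beta2_mean) > 1.28)      # theta = 2: rejected almost every time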
2. Hypothesis testing based on the Likelihood Ratio Test.
From the discussion in class, the LRT provides the maximal power. Moreover, in this
example it is uniformly most powerful over all θ > 1. We should reject Ho if
the product of the sample values is greater than a critical value, t. But what should t be to
make the significance level 10%? Also, the product of 100 random numbers from (0,1) is too
small for the computer.
Taking the log of both sides, the sum of the logs of the sample values should be greater than
ln(t). Under Ho each ln(Xi) has mean -1 and variance 1, so by the CLT the test becomes
whether [sum(ln(Xi)) + n] / sqrt(n) > (ln(t) + n) / sqrt(n)
(see the class handouts).
To get a 10% significance level, set (ln(t) + n) / sqrt(n) = 1.28. (A short sketch of this
test outside SPSS appears after the steps below.)
a) Enter the values of the sum of the logs for the 2 distributions into 2 different
variables.
b) Apply the test above to each of the sums of the logs of the uniform samples, and find in
how many of them Ho was rejected.
c) Apply the test above to each of the sums of the logs of the samples with pdf = 2x, and
find in how many of them Ho was rejected.
d) (Optional) Do the same as in c) for θ = 4 and significance level 0.15.
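A minimal sketch of the likelihood ratio test in Python with numpy (again for a single sample
of each kind; the names are illustrative):

import numpy as np

def lrt_reject(sample, z_crit=1.28):
    # Reject Ho when (sum(ln(Xi)) + n) / sqrt(n) exceeds the critical z-value.
    n = len(sample)
    return (np.sum(np.log(sample)) + n) / np.sqrt(n) > z_crit

rng = np.random.default_rng()
uniform_sample = rng.uniform(0, 1, 100)            # data from Ho
beta2_sample = np.sqrt(rng.uniform(0, 1, 100))     # data with pdf = 2x (theta = 2)
print(lrt_reject(uniform_sample))    # Ho true: rejected about 10% of the time
print(lrt_reject(beta2_sample))      # theta = 2: rejected almost every time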
Remark. We did the simulation to confirm theoretical results. More frequently, simulation is
applied with another aim, for example, when we cannot evaluate the critical
value analytically. Then the critical value can be found numerically to any given
precision from the condition that the frequency of null-hypothesis rejections
is as specified. This method is called Monte-Carlo estimation.
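For example, the 10% critical value ln(t) for the sum of the logs could be estimated by
Monte-Carlo simulation instead of the CLT, along the following lines (a sketch in Python with
numpy; the number of repetitions is arbitrary):

import numpy as np

rng = np.random.default_rng()
n, reps = 100, 100_000
# Simulate the test statistic sum(ln(Xi)) many times under Ho (uniform data).
stats = np.array([np.sum(np.log(rng.uniform(0, 1, n))) for _ in range(reps)])
crit = np.quantile(stats, 0.90)    # reject Ho when the sum of logs exceeds this value
print(crit)                        # should be close to the CLT value -n + 1.28*sqrt(n)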