Assignment 2 - City University of Hong Kong

MA3518: Applied Statistics
Department of Mathematics
Faculty of Science and Engineering
City University of Hong Kong
MA 3518: Applied Statistics
Assignment 2
Question 1:
Construct a SAS input to generate 30 random numbers from the normal distribution with
mean=100 and sd=3 by using the function mean+sd*rannor(seed).
(a) Describe the random sample thus obtained. What are the sample values of the mean, sd,
skewness and excess kurtosis? Do the data satisfy the normality assumption?
(b) Construct a SAS input to select about 40% of the 30 numbers without replacement
(say, between 10 and 14 of them). To show its randomness, your input
should produce a different subset when re-run.
(c) Test the hypothesis that the average value of the subset in (b) is 100 at a significance
level of 5%.
(5 marks)
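The formula mean+sd*rannor(seed) in Question 1 shifts and scales a standard normal draw. The following is a minimal Python sketch of the same idea, not SAS code. The function names (simulate, describe, random_subset, t_statistic) are hypothetical, and drawing the subset size uniformly from 10 to 14 is an assumption about how "about 40%" is meant. Skewness and excess kurtosis are the plain moment-based versions; a statistical package may report bias-adjusted values that differ slightly.

```python
import math
import random
import statistics


def simulate(n=30, mean=100.0, sd=3.0, seed=None):
    """Draw n values as mean + sd * z, where z is standard normal."""
    rng = random.Random(seed)  # seed=None gives a different sample on each run
    return [mean + sd * rng.gauss(0.0, 1.0) for _ in range(n)]


def describe(xs):
    """Mean, sd (n - 1 divisor), moment-based skewness and excess kurtosis."""
    n = len(xs)
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return {
        "mean": m,
        "sd": statistics.stdev(xs),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


def random_subset(xs, low=10, high=14, rng=None):
    """Pick between low and high items without replacement (assumed rule)."""
    if rng is None:
        rng = random.Random()  # unseeded, so a re-run gives another subset
    k = rng.randint(low, high)
    return rng.sample(xs, k)


def t_statistic(xs, mu0=100.0):
    """One-sample t statistic for H0: population mean equals mu0."""
    n = len(xs)
    return (statistics.fmean(xs) - mu0) / (statistics.stdev(xs) / math.sqrt(n))


sample = simulate()
subset = random_subset(sample)
summary = describe(sample)
t_value = t_statistic(subset)  # compare with a t distribution, len(subset) - 1 df
```

The t statistic would be referred to a t distribution with len(subset) - 1 degrees of freedom for the two-sided test at the 5% level.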
Question 2:
Download the data set containing the daily open, high, low and close values for the
NASDAQ Index and the S&P500 Index from 23 March 2001 to 23 March 2004
(Remember to quote the source of the data set) and import the data set to the SAS work
library. Alternatively, use the data file A2Q2.csv.
Suppose we are interested in investigating the proportion of the total variation of the daily
squared returns of the S&P500 Index (S2_t) explained by the daily squared returns of the
NASDAQ Index (S1_t), the daily logarithmic returns of the S&P500 Index (LR2_t), the
daily ranges of the S&P500 Index (R2_t), the daily logarithmic returns of the NASDAQ
Index (LR1_t) and the daily ranges of the NASDAQ Index (R1_t).
(a) Fit a multiple linear regression model to facilitate the investigation. Are all of the
factors significant in explaining the total variation of the daily volatility of the
S&P500 Index at the 10% significance level?
(b) Create a plot of the observed daily squared returns of the S&P500 Index against the
corresponding predicted values and comment on the plot.
(c) Create a plot of the studentized residuals against the daily logarithmic returns of the
NASDAQ Index and comment on the plot.
(5 marks)
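The variables in Question 2 are derived from the daily index values, and the "proportion of the total variation explained" is the R-squared of the fitted regression. The following is a minimal Python sketch under stated assumptions: the logarithmic return is taken as ln(C_t / C_{t-1}), the squared return as its square, and the daily range as High minus Low. The assignment does not define these, so other definitions may be intended. The function names derived_series and ols are hypothetical.

```python
import math


def derived_series(high, low, close):
    """Return (log_returns, squared_returns, ranges) for days 2..n.

    Assumed definitions: log return = ln(C_t / C_{t-1}),
    squared return = (log return)^2, range = High_t - Low_t.
    """
    log_ret = [math.log(close[t] / close[t - 1]) for t in range(1, len(close))]
    sq_ret = [r * r for r in log_ret]
    ranges = [high[t] - low[t] for t in range(1, len(close))]
    return log_ret, sq_ret, ranges


def ols(y, columns):
    """Least squares of y on an intercept plus the given predictor columns.

    Solves the normal equations by Gauss-Jordan elimination and returns
    (coefficients, r_squared). Fine for a sketch; poorly conditioned data
    would call for a more stable solver.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[k][p] / A[k][k] for k in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return beta, 1.0 - sse / sst


# Hypothetical toy prices, only to show the call pattern (not real index data).
lr, s, r = derived_series(
    high=[101.0, 112.0, 125.0], low=[99.0, 108.0, 120.0], close=[100.0, 110.0, 121.0]
)
```

With the real data, ols would be called with the S&P500 squared returns as y and the five other series as columns.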
Question 3:
Consider the problem in Question 2 again. Suppose we are interested in investigating the
best model for the daily squared returns of the S&P500 Index.
(a) Select the “best” regression model by best subset selection, using each of the
following criteria:
(i) R^2
(ii) Adjusted R^2
(iii) Mallows' Cp
(b) Select the “best” regression model by forward selection with significance level 5%.
(c) Select the “best” regression model by backward elimination with significance level
10%.
(d) Select your best choice and verify that all factors are significant in explaining the
total variation of the Index at the 10% significance level.
(5 marks)
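The three criteria in part (a) can be written as short formulas. The following is a minimal Python sketch using the usual textbook definitions, where p counts the predictors excluding the intercept. The function names are hypothetical, and the predictor labels are the ones introduced in Question 2.

```python
from itertools import combinations

PREDICTORS = ["S1", "LR2", "R2", "LR1", "R1"]  # labels from Question 2


def all_subsets(names):
    """Every non-empty subset of predictors considered by best subset selection."""
    return [c for k in range(1, len(names) + 1) for c in combinations(names, k)]


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors (plus intercept), n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows' Cp: SSE of the subset model over the full-model MSE, minus n,
    plus twice the number of fitted parameters (p predictors + intercept)."""
    return sse_subset / mse_full - n + 2 * (p + 1)


candidates = all_subsets(PREDICTORS)  # 2^5 - 1 = 31 candidate models
```

A subset model with little bias has Cp close to p + 1, and the full model has Cp equal to p + 1 by construction.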
Question 4:
Consider the following radiation readings from a nuclear source, measured by
a laboratory meter under various levels of background radiation:
Low: 1.25 1.17 1.32 1.18 1.62 1.11 1.32 1.31 1.33
Medium High: 1.28 1.12 1.36 1.33 1.12 1.26 1.22 1.30 1.36 1.28 1.21 1.18 1.33 1.10 1.28 1.16 1.13 1.62
Suppose you are a statistical consultant of the manufacturer of the laboratory meter and
want to provide some advice on how the readings of the meter are affected by the
background radiation levels.
(a) State the statistical model you are going to use to analyze the problem. As the sample
size is not very large, you may want to look at both parametric and non-parametric
methods to see whether the conclusions are the same.
(b) Perform an F-test of whether there is a significant difference in the readings of the
meter under the three background radiation levels at the 5% significance level.
(c) Does the non-parametric test give the same result?
(5 marks)
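Parts (b) and (c) contrast a parametric and a non-parametric comparison of three groups. The following is a minimal Python sketch of the two test statistics, assuming a one-way layout and taking the Kruskal-Wallis test as the non-parametric counterpart. The assignment does not name a specific test, the function names are hypothetical, and the demonstration data are made up rather than taken from the table.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b = len(groups) - 1
    df_w = n_total - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with average ranks for ties and the usual tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    rank = {}
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        sum(rank[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3.0 * (n_total + 1)
    return h / (1.0 - tie_term / (n_total ** 3 - n_total))


# Made-up demonstration data, not the readings from the table.
demo = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_value, df_between, df_within = one_way_f(demo)
h_value = kruskal_wallis_h(demo)
```

With k groups, F is referred to an F distribution with (k - 1, N - k) degrees of freedom, and H is approximately chi-square with k - 1 degrees of freedom.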
Question 5:
2^k factorial designs are widely used in experiments with k factors and only two levels
(high and low) for each factor. The simplest case is when k=2.
Consider an investigation into the effect of the concentration of the reactant (factor A)
and the amount of the catalyst (factor B) on the conversion (yield) in a chemical process.
The following data were obtained:
Treatments          Yield (Replicates)
_________________________________________________
A low, B low        28  25  27
A high, B low       36  32  32
A low, B high       18  19  23
A high, B high      31  30  29
_________________________________________________
Perform a two-way ANOVA on the data and make your own conclusions using a
significance level of 10% throughout.
(5 marks)
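For a 2^2 design with n replicates, the effects and sums of squares follow from three contrasts of the four treatment totals. The following is a minimal Python sketch of that arithmetic on the yields above. The function name two_by_two_anova and the dictionary layout are hypothetical, and the code stops at the F ratios without drawing a conclusion.

```python
# Keys are (level of A, level of B); values are the three replicate yields.
yields = {
    ("low", "low"): [28, 25, 27],
    ("high", "low"): [36, 32, 32],
    ("low", "high"): [18, 19, 23],
    ("high", "high"): [31, 30, 29],
}


def two_by_two_anova(cells):
    """Effects, sums of squares, error SS and F ratios for a replicated 2^2 design."""
    n = len(cells[("low", "low")])  # replicates per treatment
    one = sum(cells[("low", "low")])  # treatment totals
    a = sum(cells[("high", "low")])
    b = sum(cells[("low", "high")])
    ab = sum(cells[("high", "high")])
    contrasts = {
        "A": ab + a - b - one,
        "B": ab + b - a - one,
        "AB": ab + one - a - b,
    }
    effects = {k: c / (2.0 * n) for k, c in contrasts.items()}
    ss = {k: c ** 2 / (4.0 * n) for k, c in contrasts.items()}
    all_y = [y for ys in cells.values() for y in ys]
    ss_total = sum(y * y for y in all_y) - sum(all_y) ** 2 / len(all_y)
    ss_error = ss_total - sum(ss.values())
    df_error = 4 * (n - 1)
    mse = ss_error / df_error
    f_ratios = {k: v / mse for k, v in ss.items()}  # each has (1, df_error) df
    return effects, ss, ss_error, f_ratios


effects, ss, ss_error, f_ratios = two_by_two_anova(yields)
```

Each F ratio would be compared with an F distribution on 1 and 8 degrees of freedom at the 10% level.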
~ End of Assignment 2~