MA3518: Applied Statistics

Department of Mathematics
Faculty of Science and Engineering
City University of Hong Kong

MA 3518: Applied Statistics
Assignment 2

Question 1:

Construct a SAS input to generate 30 random numbers from the normal distribution with mean = 100 and sd = 3 by using the function mean + sd*rannor(seed).

(a) Describe the random sample thus obtained. What are the sample values of the mean, sd, skewness and excess kurtosis? Does the data satisfy the normality assumption?

(b) Construct a SAS input to select about 40% of the 30 numbers without replacement (say, between 10 and 14 of them). To demonstrate randomness, your input should produce a different subset each time it is re-run.

(c) Test the hypothesis that the average value of the subset in (b) is 100 at the 5% significance level.

(5 marks)

Question 2:

Download the data set containing the daily open, high, low and close values for the NASDAQ Index and the S&P500 Index from 23 March 2001 to 23 March 2004 (remember to quote the source of the data set) and import the data set into the SAS work library. Alternatively, use the data file A2Q2.csv.

Suppose we are interested in investigating the proportion of the total variation of the daily squared returns of the S&P500 Index (S2_t) explained by the daily squared returns of the NASDAQ Index (S1_t), the daily logarithmic returns of the S&P500 Index (LR2_t), the daily ranges of the S&P500 Index (R2_t), the daily logarithmic returns of the NASDAQ Index (LR1_t) and the daily ranges of the NASDAQ Index (R1_t).

(a) Fit a multiple linear regression model to facilitate the investigation. Are all of the factors significant in explaining the total variation of the daily volatility of the S&P500 Index at the 10% significance level?

(b) Create a plot of the observed daily squared returns of the S&P500 Index against the corresponding predicted values and comment on the plot.

(c) Create a plot of the studentized residuals against the daily logarithmic returns of the NASDAQ Index and comment on the plot.

(5 marks)

Question 3:

Consider the problem in Question 2 again. Suppose we are interested in finding the best model for the daily squared returns of the S&P500 Index.

(a) Select the “best” regression model using best subset selection under each of the following criteria:
    (i) R²
    (ii) Adjusted R²
    (iii) Mallows' Cp

(b) Select the “best” regression model by forward selection with a significance level of 5%.

(c) Select the “best” regression model by backward elimination with a significance level of 10%.

(d) Select your best choice and verify that all factors are significant in explaining the total variation of the Index at the 10% significance level.

(5 marks)

Question 4:

Consider the following radiation readings from a nuclear source, measured by a laboratory meter under various levels of background radiation:

    Low 1.25 1.17 1.32 1.18 1.62 1.11 1.32 1.31 1.33
    Medium High 1.28 1.12 1.36 1.33 1.12 1.26 1.22 1.30 1.36 1.28 1.21 1.18 1.33 1.10 1.28 1.16 1.13 1.62

Suppose you are a statistical consultant to the manufacturer of the laboratory meter and want to provide some advice on how the readings of the meter are affected by the background radiation level.

(a) State the statistical model you are going to use to analyze the problem. As the sample size is not very large, you may want to look at both parametric and non-parametric methods to see whether the conclusions are the same.

(b) Perform an F-test of whether there is a significant difference in the readings of the meter under the three background radiation levels at the 5% significance level.

(c) Does the non-parametric test give the same result?

(5 marks)

Question 5:

2^k factorial designs are widely used in experiments with k factors and only two levels (high and low) for each factor. The simplest case is when k = 2.
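
As a rough illustration of the k = 2 case, the four treatment combinations and the interaction contrast can be enumerated as in the sketch below. It is written in Python rather than as SAS input, and the coding (-1 = low, +1 = high) and the helper names design_2k and interaction_column are assumptions made for illustration only; they are not part of the assignment.

```python
from itertools import product

def design_2k(k):
    """Return all 2**k treatment combinations, coded -1 (low) and +1 (high)."""
    # product() varies the last position fastest; reversing each tuple makes
    # the first factor (A) alternate fastest, i.e. the usual standard order.
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

def interaction_column(design):
    """Product of all factor codes in each run (highest-order interaction contrast)."""
    column = []
    for run in design:
        value = 1
        for code in run:
            value *= code
        column.append(value)
    return column

runs = design_2k(2)            # (A, B) codes for the four treatments
ab = interaction_column(runs)  # AB contrast for each treatment

for (a, b), inter in zip(runs, ab):
    print(f"A={a:+d}  B={b:+d}  AB={inter:+d}")
```

With k = 2 the runs come out in the order (A low, B low), (A high, B low), (A low, B high), (A high, B high). In a two-way ANOVA the A, B and AB columns correspond to the two main effects and the interaction, each with one degree of freedom.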

Consider an investigation into the effect of the concentration of the reactant (factor A) and the amount of the catalyst (factor B) on the conversion (yield) in a chemical process. The following data were obtained:

    Treatments          Yield (Replicate)
    _________________________________________________
    A low,  B low       28   25   27
    A high, B low       36   32   32
    A low,  B high      18   19   23
    A high, B high      31   30   29
    _________________________________________________

Perform a two-way ANOVA on the data and draw your own conclusions, using a significance level of 10% throughout.

(5 marks)

~ End of Assignment 2 ~