MA3518: Applied Statistics

Department of Mathematics
Faculty of Science and Engineering
City University of Hong Kong

MA 3518: Applied Statistics
Tutorial 3 (Suggested Solutions)

Question 1:

(a) Let $\mu$ denote the average saving rate. The null hypothesis is $H_0: \mu = 0.2$ and the alternative hypothesis is $H_1: \mu < 0.2$. This is a one-sided test.

(b) Since the sample size $N = 100$ is greater than 30, the Central Limit Theorem implies that the sampling distribution of the sample average $\bar{X}$ can be accurately approximated by a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2 / N$, where $\sigma^2$ is the unknown population variance. An unbiased estimate of $\sigma^2$ is the sample variance $s^2$.

Test statistic under $H_0$:

$$T = \frac{\bar{X} - 0.2}{s / \sqrt{N}} \sim t(N - 1)$$

where $t(N - 1)$ denotes a Student's t-distribution with $N - 1$ degrees of freedom. Since $N$ is large, $t(N - 1)$ is close to a standard normal distribution. Hence, $T$ follows a standard normal distribution approximately.

The observed value $T_{obs}$ of $T$ is given by:

$$T_{obs} = \frac{0.16 - 0.2}{0.08 / \sqrt{100}} = -5$$

The p-value of the test is given by:

$$P(T < T_{obs} \mid H_0) = P(T < -5 \mid H_0) = \Phi(-5) \approx 0$$

where $\Phi$ is the standard normal distribution function. The p-value can be interpreted as the probability that the test statistic is less than its observed value when $H_0$ is true.

(c) Since the p-value is less than 0.05, we reject $H_0$ and conclude that there is enough evidence to refute the economist's claim at the 5% significance level.

Question 2:

(a) Let $p$ denote the unknown proportion of first-time buyers in the whole population of home buyers over the past three years. The null hypothesis is $H_0: p = 0.4$ and the alternative hypothesis is $H_1: p \neq 0.4$. This is a two-sided test.

(b) Let $X$ denote the number of first-time buyers in the sample and $p_e = X / 500$ denote an estimator for the unknown population proportion $p$. Clearly, $X \sim \mathrm{Bin}(500, p)$. Then

$$E(p_e) = E(X / 500) = p \quad \text{and} \quad \mathrm{Var}(p_e) = \frac{p(1 - p)}{500}$$

Since the sample size is greater than 30, the sampling distribution of $p_e$ can be accurately approximated by a normal
distribution with unknown mean $p$ and unknown variance $p(1 - p) / 500$.

Test statistic under $H_0$:

$$Z = \frac{p_e - 0.4}{\sqrt{0.4(1 - 0.4) / 500}} \sim N(0, 1) \text{ approximately}$$

The observed value $Z_{obs}$ of $Z$ is given by:

$$Z_{obs} = \frac{100/500 - 0.4}{\sqrt{0.4(1 - 0.4) / 500}} = -9.1287$$

For the two-sided test, the p-value is given by:

$$P(|Z| > |Z_{obs}| \mid H_0) = 2 P(Z > |Z_{obs}| \mid H_0) = 2 P(Z > 9.1287 \mid H_0) \approx 0$$

Since the p-value is less than 0.05, we reject $H_0$ and conclude, at the 5% significance level, that the percentage of home sales to first-time buyers has changed from what it was three years ago.

Question 3:

(a) Let $\sigma_A$ and $\sigma_B$ denote the daily volatility of Stock A and Stock B, respectively. The null hypothesis is $H_0: \sigma_A = \sigma_B$ and the alternative hypothesis is $H_1: \sigma_A > \sigma_B$. This is a one-sided test.

(b) Test statistic under $H_0$:

$$F = \frac{s_1^2}{s_2^2} \sim F(287, 287)$$

where $F(287, 287)$ is an F-distribution with degrees of freedom 287 and 287.

The observed value $F_{obs}$ of $F$ is given by:

$$F_{obs} = \frac{(0.5882)^2}{(0.3256)^2} = 3.2635$$

The p-value is given by:

$$P(F > F_{obs} \mid H_0) = P(F > 3.2635 \mid H_0) \approx 0$$

Since the p-value is less than 0.05, we reject the null hypothesis at the 5% significance level and conclude that there is not enough evidence to refute the claim made by the investment advisor.

Question 4:

(a) The data were obtained from Yahoo Finance.
It can be viewed on the course website under the name ‘T3Q4.csv’. We use the ‘Import Data’ option to import the data directly into the SAS Work Library and create a dataset with the name ‘T3Q4’.

The SAS procedure is shown as follows:

    PROC UNIVARIATE Data = T3Q4;
    RUN;

The SAS output is given by:

    The SAS System          15:55 Wednesday, February 11, 2004   1

    The UNIVARIATE Procedure
    Variable: Close

                             Moments
    N                      3276    Sum Weights             3276
    Mean               784.4712    Sum Observations  2569927.65
    Std Deviation    375.291338    Variance          140843.588
    Skewness         0.39966166    Kurtosis          -1.3135314
    Uncorrected SS   2477296978    Corrected SS       461262751
    Coeff Variation  47.8400402    Std Error Mean    6.55687031

                  Basic Statistical Measures
        Location                    Variability
    Mean     784.4712     Std Deviation            375.29134
    Median   667.4800     Variance                    140844
    Mode     375.3500     Range                         1232
                          Interquartile Range      676.66000

    NOTE: The mode displayed is the smallest of 2 modes with a count of 3.

                  Tests for Location: Mu0=0
    Test           -Statistic-      -----p Value------
    Student's t    t   119.6411     Pr > |t|     <.0001
    Sign           M       1638     Pr >= |M|    <.0001
    Signed Rank    S    2683863     Pr >= |S|    <.0001

    Quantiles (Definition 5)
    Quantile        Estimate
    100% Max         1527.46
    99%              1493.74
    95%              1419.89
    90%              1347.35
    75% Q3           1122.71
    50% Median        667.48
    25% Q1            446.05
    10%               377.75
    5%                340.08
    1%                312.49
    0% Min            295.46

                  Extreme Observations
    -----Lowest-----        -----Highest-----
     Value      Obs           Value      Obs
    295.46     3108         1517.68      609
    298.76     3104         1520.77      608
    298.92     3105         1523.86      719
    300.03     3107         1527.35      721
    300.40     3109         1527.46      720

(b) From the SAS output, the skewness is 0.39966166. Hence, the distribution of the data is slightly positively skewed.

(c) From the SAS output, the excess kurtosis is -1.3135314. Hence, the distribution of the data has lighter tails than a normal distribution.

Question 5:

(a) The data were obtained from Yahoo Finance.
It can be viewed on the course website under the name ‘T3Q5.csv’. We use the ‘Import Data’ option to import the data directly into the SAS Work Library and create a dataset with the name ‘T3Q5’.

The SAS procedure is shown as follows:

    PROC MEANS Data = T3Q5 alpha = 0.05 CLM;
    RUN;

From the SAS output, a 95% confidence interval for the mean of the daily close values for NASDAQ is (2181.08, 2270.02).

(b) The SAS procedure is shown as follows:

    PROC UNIVARIATE Data = T3Q5;
    RUN;

The part of the SAS output for the t test is shown as follows:

                  Tests for Location: Mu0=0
    Test           -Statistic-      -----p Value------
    Student's t    t   119.6411     Pr > |t|     <.0001
    Sign           M       1638     Pr >= |M|    <.0001
    Signed Rank    S    2683863     Pr >= |S|    <.0001

From the SAS output, the p-value for the t test is less than 0.0001. Hence, we reject the null hypothesis that the mean is zero and conclude, at the 5% significance level, that the mean of the daily close values for NASDAQ is different from zero (and, since the t statistic is positive, greater than zero).

(c) The result in part (b) is only approximate, since we do not know whether the data come from a normal distribution.

Question 6:

(a) The SAS procedure is shown as follows:

    Data Prices;
      INPUT Company $ Close @@;
      Datalines;
    A 21.1 A 28.3 A 17.1 A 16.6 A 28.5 A 25.1 A 13.5 A 30.2 A 20.8 A 16.6 A 12.2
    B 20.2 B 36.6 B 29.8 B 28.8 B 38.8 B 36.8 B 38.8 B 37.8 B 35.8 B 38.2 B 28.7
    ;
    RUN;

(b) The SAS procedure to perform the t test is shown as follows:

    PROC TTEST Data = Prices COCHRAN;
      CLASS Company;
      VAR Close;
    RUN;

The SAS output is shown as follows:

    The SAS System          15:55 Wednesday, February 11, 2004  14

    The TTEST Procedure

                                        Statistics
                                Lower CL            Upper CL   Lower CL             Upper CL
    Variable  Company       N       Mean     Mean       Mean    Std Dev   Std Dev    Std Dev   Std Err
    Close     A            11     16.668   20.909      25.15     4.4112    6.3132     11.079    1.9035
    Close     B            11     29.644   33.664     37.683     4.1804     5.983       10.5    1.8039
    Close     Diff (1-2)          -18.23   -12.75     -7.284     4.7054    6.1503     8.8815    2.6225

                                 T-Tests
    Variable  Method          Variances      DF    t Value    Pr > |t|
    Close     Pooled          Equal          20      -4.86      <.0001
    Close     Satterthwaite   Unequal      19.9      -4.86      <.0001
    Close     Cochran         Unequal        10      -4.86      0.0007

                          Equality of Variances
    Variable  Method      Num DF    Den DF    F Value    Pr > F
    Close     Folded F        10        10       1.11    0.8684

From the SAS output, the p-value for the variance ratio test is 0.8684 > 0.05. Hence, we do not reject the null hypothesis of equal variances: at the 5% significance level there is no evidence that the population variances of the two samples differ.

Treating the population variances as equal but unknown, the t test with the pooled estimate of the population variance is appropriate. From the SAS output, its p-value is less than 0.0001. Hence, we reject the null hypothesis and conclude that the means of the daily close prices of the two stocks are different at the 5% significance level.

~ End of the Solutions ~
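
The test statistics quoted in Questions 1, 2, 3 and 6 can be cross-checked outside SAS. The following is a minimal Python sketch using only the standard library; it is not part of the original solutions, and all variable and function names in it are illustrative assumptions. It recomputes the observed statistics from the figures given above, plus the normal-approximation p-values for Questions 1 and 2. The F and t p-values are left out because the standard library has no F or t distribution functions.

```python
import math
from statistics import mean, variance


def normal_cdf(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Question 1: one-sided test of H0: mu = 0.2 against H1: mu < 0.2
t_obs = (0.16 - 0.2) / (0.08 / math.sqrt(100))
p_q1 = normal_cdf(t_obs)  # lower-tail p-value under the normal approximation

# Question 2: two-sided test of H0: p = 0.4, with 100 first-time buyers out of 500
p0, n = 0.4, 500
z_obs = (100 / n - p0) / math.sqrt(p0 * (1 - p0) / n)
p_q2 = 2 * normal_cdf(-abs(z_obs))  # two-sided p-value

# Question 3: variance ratio of the two sample standard deviations
f_obs = 0.5882 ** 2 / 0.3256 ** 2

# Question 6: pooled two-sample t statistic and folded F from the raw prices
a = [21.1, 28.3, 17.1, 16.6, 28.5, 25.1, 13.5, 30.2, 20.8, 16.6, 12.2]
b = [20.2, 36.6, 29.8, 28.8, 38.8, 36.8, 38.8, 37.8, 35.8, 38.2, 28.7]
na, nb = len(a), len(b)
var_a, var_b = variance(a), variance(b)  # sample variances (divisor n - 1)
pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
t_pooled = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
folded_f = max(var_a, var_b) / min(var_a, var_b)  # larger variance over smaller

print(f"Q1: T_obs = {t_obs:.4f}, p-value = {p_q1:.3g}")
print(f"Q2: Z_obs = {z_obs:.4f}, p-value = {p_q2:.3g}")
print(f"Q3: F_obs = {f_obs:.4f}")
print(f"Q6: pooled t = {t_pooled:.2f}, folded F = {folded_f:.2f}")
```

The pooled t statistic and folded F computed this way should agree with the Pooled row of the T-Tests table and the Folded F row of the Equality of Variances table in Question 6, up to rounding.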