MA3518: Applied Statistics

Department of Mathematics
Faculty of Science and Technology
City University of Hong Kong

MA 3518: Applied Statistics
Assignment 1 (Suggested Solutions)

Question 1: (5 marks)

(a) The SAS procedure is given by:

    Data Portfolio;
    INPUT Date $ Close;
    CARDS;
    14Sep03 918.63
    15Sep03 814.81
    16Sep03 829.32
    17Sep03 925.97
    18Sep03 1018.38
    19Sep03 1033.16
    22Sep03 1012.12
    23Sep03 1033.03
    24Sep03 889.38
    25Sep03 906.27
    ;
    RUN;

    PROC PRINT Data = Portfolio;
    RUN;

The SAS output is given by:

    The SAS System                 20:34 Friday, March 5, 2004   1

    Obs    Date         Close
      1    14Sep03     918.63
      2    15Sep03     814.81
      3    16Sep03     829.32
      4    17Sep03     925.97
      5    18Sep03    1018.38
      6    19Sep03    1033.16
      7    22Sep03    1012.12
      8    23Sep03    1033.03
      9    24Sep03     889.38
     10    25Sep03     906.27

(b) The SAS procedure is given by:

    PROC MEANS Data = Portfolio MEAN STD alpha = 0.05 CLM;
    RUN;

The SAS output is given by:

    The SAS System                 20:34 Friday, March 5, 2004   5

    The MEANS Procedure

    Analysis Variable : Close

                                      Lower 95%        Upper 95%
           Mean         Std Dev     CL for Mean      CL for Mean
    ------------------------------------------------------------
    938.1070000      82.2722205     879.2529989      996.9610011
    ------------------------------------------------------------

(c) The SAS procedure is given by:

    PROC UNIVARIATE Data = Portfolio;
    RUN;

The SAS output is given by:

    The SAS System                 20:34 Friday, March 5, 2004  10

    The UNIVARIATE Procedure
    Variable: Close

    Moments

    N                          10    Sum Weights               10
    Mean                  938.107    Sum Observations     9381.07
    Std Deviation      82.2722205    Variance          6768.71827
    Skewness            -0.170138    Kurtosis          -1.4222355
    Uncorrected SS      8861365.9    Corrected SS      60918.4644
    Coeff Variation    8.77002522    Std Error Mean    26.0167605

    Basic Statistical Measures

        Location                    Variability
    Mean      938.1070     Std Deviation            82.27222
    Median    922.3000     Variance                     6769
    Mode      .            Range                   218.35000
                           Interquartile Range     129.00000

    Tests for Location: Mu0=0

    Test           -Statistic-      -----p Value------
    Student's t    t   36.05779     Pr > |t|     <.0001
    Sign           M          5     Pr >= |M|    0.0020
    Signed Rank    S       27.5     Pr >= |S|    0.0020

    Quantiles (Definition 5)

    Quantile       Estimate
    100% Max       1033.160
    99%            1033.160
    95%            1033.160
    90%            1033.095
    75% Q3         1018.380
    50% Median      922.300
    25% Q1          889.380
    10%             822.065
    5%              814.810
    1%              814.810
    0% Min          814.810

    The SAS System                 20:34 Friday, March 5, 2004  11

    The UNIVARIATE Procedure
    Variable: Close

    Extreme Observations

    -----Lowest-----      -----Highest-----
     Value      Obs         Value      Obs
    814.81        2        925.97        4
    829.32        3       1012.12        7
    889.38        9       1018.38        5
    906.27       10       1033.03        8
    918.63        1       1033.16        6

From the SAS output, the skewness and the excess kurtosis are given by -0.170138 and -1.4222355, respectively. Hence, the distribution of the data is negatively skewed and has a lighter tail than the tail of a normal distribution.

Question 2: (5 marks)

(a) The data were obtained from Yahoo Finance.
The SAS procedure is given by:

    Data NASDAQ;
    INPUT Date $ Open High Low Close Volume;
    CARDS;
    3Oct03  1864.54 1891.62 1864.54 1880.57 20145800
    2Oct03  1828.94 1842.55 1823.64 1836.22 16040900
    1Oct03  1797.07 1832.25 1796.09 1832.25 18217400
    30Sep03 1812.81 1812.81 1783.46 1786.94 18642400
    29Sep03 1801.55 1824.59 1786.57 1824.56 16669300
    26Sep03 1816.75 1821.57 1792.06 1792.07 18415300
    25Sep03 1849.39 1856.22 1817.20 1817.24 20330600
    24Sep03 1903.81 1904.13 1843.43 1843.70 22079700
    23Sep03 1877.44 1901.73 1875.15 1901.72 18688000
    22Sep03 1881.42 1881.42 1866.88 1874.62 17200800
    ;
    RUN;

    PROC PRINT Data = NASDAQ (obs = 10);
    RUN;

The SAS output is given by:

    The SAS System                 20:54 Friday, March 5, 2004   2

    Obs    Date          Open       High        Low      Close      Volume
      1    3Oct03     1864.54    1891.62    1864.54    1880.57    20145800
      2    2Oct03     1828.94    1842.55    1823.64    1836.22    16040900
      3    1Oct03     1797.07    1832.25    1796.09    1832.25    18217400
      4    30Sep03    1812.81    1812.81    1783.46    1786.94    18642400
      5    29Sep03    1801.55    1824.59    1786.57    1824.56    16669300
      6    26Sep03    1816.75    1821.57    1792.06    1792.07    18415300
      7    25Sep03    1849.39    1856.22    1817.20    1817.24    20330600
      8    24Sep03    1903.81    1904.13    1843.43    1843.70    22079700
      9    23Sep03    1877.44    1901.73    1875.15    1901.72    18688000
     10    22Sep03    1881.42    1881.42    1866.88    1874.62    17200800

(b) The SAS procedure is given by:

    PROC UNIVARIATE Data = NASDAQ plot;
    RUN;

The SAS output is given by (to save space, only the results for 'High' are displayed):

    The SAS System                 20:54 Friday, March 5, 2004   6

    The UNIVARIATE Procedure
    Variable: High

    Moments

    N                          10    Sum Weights               10
    Mean                 1856.889    Sum Observations    18568.89
    Std Deviation      35.1144002    Variance           1233.0211
    Skewness           0.22757532    Kurtosis          -1.8025124
    Uncorrected SS     34491464.8    Corrected SS      11097.1899
    Coeff Variation    1.89103388    Std Error Mean    11.1041483

    Basic Statistical Measures

        Location                    Variability
    Mean      1856.889     Std Deviation            35.11440
    Median    1849.385     Variance                     1233
    Mode      .            Range                    91.32000
                           Interquartile Range      67.03000

    Tests for Location: Mu0=0

    Test           -Statistic-      -----p Value------
    Student's t    t   167.2248     Pr > |t|     <.0001
    Sign           M          5     Pr >= |M|    0.0020
    Signed Rank    S       27.5     Pr >= |S|    0.0020

    Quantiles (Definition 5)

    Quantile       Estimate
    100% Max        1904.13
    99%             1904.13
    95%             1904.13
    90%             1902.93
    75% Q3          1891.62
    50% Median      1849.39
    25% Q1          1824.59
    10%             1817.19
    5%              1812.81
    1%              1812.81
    0% Min          1812.81

From the SAS output, the skewness and excess kurtosis for 'Open' are given by 0.27472513 and -1.3856295, respectively. Hence, the distribution for 'Open' is positively skewed and has a lighter tail than the tail of a normal distribution.

The skewness and excess kurtosis for 'High' are given by 0.22757532 and -1.8025124, respectively. Hence, the distribution for 'High' is positively skewed and has a lighter tail than the tail of a normal distribution.

The skewness and excess kurtosis for 'Low' are given by 0.25460319 and -1.7312382, respectively. Hence, the distribution for 'Low' is positively skewed and has a lighter tail than the tail of a normal distribution.

The skewness and excess kurtosis for 'Close' are given by 0.28149912 and -0.7160249, respectively. Hence, the distribution for 'Close' is positively skewed and has a lighter tail than the tail of a normal distribution.

The skewness and excess kurtosis for 'Volume' are given by 0.46777939 and 0.0263034, respectively. Hence, the distribution for 'Volume' is positively skewed. Its excess kurtosis is positive but very close to zero, so its tail is marginally heavier than, and practically comparable to, the tail of a normal distribution.

The plot option additionally produces a stem-and-leaf plot, a box plot and a normal probability plot for each variable.
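The signs of the skewness and excess kurtosis values quoted above can be read against the sample formulas behind them. The following is a sketch, assuming PROC UNIVARIATE is run with its default divisor (VARDEF=DF), as it is in the procedure call shown. The symbols z_i, g_1 and g_2 are notation introduced here for illustration and do not appear in the SAS output.

```latex
% z_i: standardized observation, g_1: skewness, g_2: excess kurtosis
% (illustrative notation; s is the sample standard deviation with divisor n - 1)
\[
z_i = \frac{x_i - \bar{x}}{s}, \qquad
g_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} z_i^{3}, \qquad
g_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} z_i^{4}
      - \frac{3(n-1)^{2}}{(n-2)(n-3)}
\]
% With n = 10 observations, as in this question:
\[
g_1 = \frac{10}{72} \sum_{i=1}^{10} z_i^{3}, \qquad
g_2 = \frac{110}{504} \sum_{i=1}^{10} z_i^{4} - \frac{243}{56}
\]
```

Under this convention a normal distribution has g_2 close to 0, so g_2 < 0 suggests lighter tails and g_2 > 0 suggests heavier tails. A value such as 0.0263034 for 'Volume' is positive but so close to zero that, with only 10 observations, it gives little evidence of a departure from normal tails.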
(c) The SAS procedure is given by:

    Data NASDAQ;
    INPUT Date $ Open High Low Close Volume;
    Difference = Open - Close;
    CARDS;
    3Oct03  1864.54 1891.62 1864.54 1880.57 20145800
    2Oct03  1828.94 1842.55 1823.64 1836.22 16040900
    1Oct03  1797.07 1832.25 1796.09 1832.25 18217400
    30Sep03 1812.81 1812.81 1783.46 1786.94 18642400
    29Sep03 1801.55 1824.59 1786.57 1824.56 16669300
    26Sep03 1816.75 1821.57 1792.06 1792.07 18415300
    25Sep03 1849.39 1856.22 1817.20 1817.24 20330600
    24Sep03 1903.81 1904.13 1843.43 1843.70 22079700
    23Sep03 1877.44 1901.73 1875.15 1901.72 18688000
    22Sep03 1881.42 1881.42 1866.88 1874.62 17200800
    ;
    RUN;

    PROC TTEST Data = NASDAQ;
    VAR Difference;
    RUN;

The relevant part of the SAS output is given by:

    T-Tests

    Variable        DF    t Value    Pr > |t|
    Difference       9       0.45      0.6617

From the SAS output, the p-value of the two-sided paired t-test for the equality of the two population means is 0.6617, which is greater than 5%. Hence, we do not reject the null hypothesis and conclude that there is no significant difference between the two population means at the 5% significance level.

Question 3: (5 marks)

(a) The SAS procedure is given by:

    Data Performance;
    INPUT Portfolio $ Return @@;
    Datalines;
    A 8.5 A 10.3 A 12.1 A 8.6 A 7.3 A 11.3 A 12.1 A 11.6 A 10.8 A 8.6 A 8.2
    B 6.2 B 6.8 B 7.1 B 8.6 B 10.2 B 6.8 B 8.1 B 7.5 B 6.5 B 10.2 B 11.7
    ;
    RUN;

(b) The SAS procedure is given by:

    PROC TTEST Data = Performance;
    Class Portfolio;
    Var Return;
    RUN;

The SAS output is given by:

    The SAS System                 16:27 Saturday, March 6, 2004   2

    The TTEST Procedure

    Statistics

                                 Lower CL             Upper CL    Lower CL              Upper CL
    Variable  Portfolio      N       Mean      Mean       Mean     Std Dev    Std Dev    Std Dev    Std Err
    Return    A             11     8.7728    9.9455     11.118      1.2196     1.7455     3.0632     0.5263
    Return    B             11     6.9359    8.1545     9.3732      1.2675      1.814     3.1835      0.547
    Return    Diff (1-2)           0.2076    1.7909     3.3742      1.3619     1.7801     2.5706      0.759

    T-Tests

    Variable    Method           Variances    DF    t Value    Pr > |t|
    Return      Pooled           Equal        20       2.36      0.0286
    Return      Satterthwaite    Unequal      20       2.36      0.0286

    Equality of Variances

    Variable    Method      Num DF    Den DF    F Value    Pr > F
    Return      Folded F        10        10       1.08    0.9054

From the SAS output, the p-value of the test of the equality of variances is 0.9054, which is much greater than the 5% significance level. Hence, we do not reject the null hypothesis that the population variances of the two portfolios are equal. In this case, we adopt the pooled t-test for testing the equality of the population means. The p-value of the two-sided t-test is 0.0286, which is less than the 5% significance level. Hence, we reject the null hypothesis and conclude that there is a significant difference between the annual percentage returns from the two portfolios at the 5% significance level.

Question 4: (5 marks)

(a) The data were obtained from Yahoo Finance and sorted in ascending order. The SAS procedure is given by:

    PROC REG Data = A1Q4;
    MODEL S_t = R_t;
    RUN;

The SAS output is given by:

    The SAS System                 19:57 Saturday, March 6, 2004   1

    The REG Procedure
    Model: MODEL1
    Dependent Variable: S_t

    Analysis of Variance

                                  Sum of          Mean
    Source               DF      Squares        Square    F Value    Pr > F
    Model                 1       409955        409955    1038.39    <.0001
    Error              1009       398352     394.79854
    Corrected Total    1010       808306

    Root MSE            19.86954    R-Square    0.5072
    Dependent Mean      12.97410    Adj R-Sq    0.5067
    Coeff Var          153.14774

    Parameter Estimates

                       Parameter     Standard
    Variable     DF     Estimate        Error    t Value    Pr > |t|
    Intercept     1    -18.76952      1.16658     -16.09      <.0001
    R_t           1      6.79358      0.21082      32.22      <.0001

The fitted linear regression model is given by:

$$\hat{S}_t = -18.76952 + 6.79358\,R_t$$

(b) From the SAS output, the value of R² is 0.5072, which is not close to zero: the model explains about half of the variation in S_t. Hence, the regression model fits the data reasonably well.

(c) The p-value of the F-test for the full model is less than 0.0001. Hence, we reject the null hypothesis and conclude that the regression coefficient is significantly different from zero at the 5% significance level.

The p-value of the t-test for the intercept is less than 0.0001.
Hence, we reject the null hypothesis and conclude that the intercept is significantly different from zero at the 5% significance level.

The p-value of the t-test for the coefficient of R_t is less than 0.0001. Hence, we reject the null hypothesis and conclude that the coefficient is significantly different from zero at the 5% significance level.

Question 5: (5 marks)

(a) The data were obtained from Yahoo Finance. The SAS procedure is given by:

    PROC REG Data = A1Q5;
    MODEL S_t = LNR_t R_t V_t;
    RUN;

The SAS output is given by:

    The SAS System                 20:32 Saturday, March 6, 2004   1

    The REG Procedure
    Model: MODEL1
    Dependent Variable: S_t

    Analysis of Variance

                                  Sum of          Mean
    Source               DF      Squares        Square    F Value    Pr > F
    Model                 3    173291707      57763902     742.69    <.0001
    Error              2009    156252667         77776
    Corrected Total    2012    329544374

    Root MSE           278.88410    R-Square    0.5259
    Dependent Mean     179.36669    Adj R-Sq    0.5251
    Coeff Var          155.48266

    Parameter Estimates

                        Parameter       Standard
    Variable     DF      Estimate          Error    t Value    Pr > |t|
    Intercept     1    -248.13599       11.20760     -22.14      <.0001
    LNR_t         1     803.17273      513.35501       1.56      0.1178
    R_t           1      27.45329        0.58860      46.64      <.0001
    V_t           1    -3.0534E-8    8.753463E-9      -3.49      0.0005

From the SAS output, the fitted regression model is given by:

$$S_t = -248.13599 + 803.17273\,\mathrm{LNR}_t + 27.45329\,R_t - 3.0534 \times 10^{-8}\,V_t + e_t$$

(b) Since the sample size is large, there is little difference between R² and the adjusted R², even though we have several explanatory variables. From the SAS output, the value of R² (adjusted R²) is 0.5259 (0.5251). Hence, the regression model fits the data reasonably well.

(c) The p-value of the F-test for the full model is less than 0.0001. Hence, we reject the null hypothesis that all the regression coefficients of the explanatory variables are zero and conclude that the full model is significant at the 5% significance level.

The p-value of the t-test for the intercept is less than 0.0001. Hence, we reject the null hypothesis and conclude that the intercept is significantly different from zero at the 5% significance level.

The p-value of the t-test for the coefficient of LNR_t is 0.1178. Hence, we do not reject the null hypothesis and conclude that the coefficient is not significantly different from zero at the 5% significance level.

The p-value of the t-test for the coefficient of R_t is less than 0.0001. Hence, we reject the null hypothesis and conclude that the coefficient is significantly different from zero at the 5% significance level.

The p-value of the t-test for the coefficient of V_t is 0.0005. Hence, we reject the null hypothesis and conclude that the coefficient is significantly different from zero at the 5% significance level.

~ End of Solutions to Assignment 1 ~
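As a quick arithmetic check on the Question 5 output, the t values and the two R² figures can be recomputed from the other numbers printed in the SAS tables. This is only a sketch: the symbols for the estimated coefficient, its standard error and the adjusted R² are notation introduced here, and every numeric input is copied from the Analysis of Variance and Parameter Estimates tables.

```latex
% Each t value is the parameter estimate divided by its standard error.
\[
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}: \qquad
t_{\mathrm{LNR}} = \frac{803.17273}{513.35501} \approx 1.56, \qquad
t_{R} = \frac{27.45329}{0.58860} \approx 46.64, \qquad
t_{V} = \frac{-3.0534 \times 10^{-8}}{8.753463 \times 10^{-9}} \approx -3.49
\]
% R-square is the model sum of squares over the corrected total sum of squares;
% the adjusted version uses the mean squares (degrees of freedom 2009 and 2012).
\[
R^2 = \frac{173291707}{329544374} \approx 0.5259, \qquad
\bar{R}^2 = 1 - \frac{156252667 / 2009}{329544374 / 2012} \approx 0.5251
\]
```

The two R² values differ only in the fourth decimal place because the degrees-of-freedom ratio 2012/2009 is very close to 1, which is the point made in part (b) about the large sample size.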