MA3518: Applied Statistics
Page 1
Department of Mathematics
Faculty of Science and Technology
City University of Hong Kong
MA 3518: Applied Statistics
Assignment 1 (Suggested Solutions)
Question 1: (5 marks)
(a) The SAS procedure is given by:
Data Portfolio;
INPUT Date $ Close;
CARDS;
14Sep03 918.63
15Sep03 814.81
16Sep03 829.32
17Sep03 925.97
18Sep03 1018.38
19Sep03 1033.16
22Sep03 1012.12
23Sep03 1033.03
24Sep03 889.38
25Sep03 906.27
;
RUN;
PROC PRINT Data = Portfolio;
RUN;
The SAS output is given by:
The SAS System                    20:34 Friday, March 5, 2004 1

Obs    Date         Close
  1    14Sep03     918.63
  2    15Sep03     814.81
  3    16Sep03     829.32
  4    17Sep03     925.97
  5    18Sep03    1018.38
  6    19Sep03    1033.16
  7    22Sep03    1012.12
  8    23Sep03    1033.03
  9    24Sep03     889.38
 10    25Sep03     906.27
(b) The SAS procedure is given by:
PROC MEANS Data = Portfolio MEAN STD alpha = 0.05 CLM;
RUN;
The SAS output is given by:
The SAS System                    20:34 Friday, March 5, 2004 5
The MEANS Procedure
Analysis Variable : Close

                                  Lower 95%        Upper 95%
       Mean         Std Dev     CL for Mean      CL for Mean
-------------------------------------------------------------
938.1070000      82.2722205     879.2529989      996.9610011
-------------------------------------------------------------
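As a cross-check on the PROC MEANS output, the 95% confidence limits can be reproduced by hand from mean ± t × s/√n. The sketch below is written in Python rather than SAS and is not part of the original solution. The critical value 2.262157 (upper 2.5% point of Student's t with 9 degrees of freedom) is taken from a standard t table and hard-coded as an assumption.

```python
import math
import statistics

# Closing values of the portfolio, copied from the CARDS block in part (a).
close = [918.63, 814.81, 829.32, 925.97, 1018.38,
         1033.16, 1012.12, 1033.03, 889.38, 906.27]

n = len(close)
mean = statistics.mean(close)
std_dev = statistics.stdev(close)   # sample standard deviation (divisor n - 1)
std_err = std_dev / math.sqrt(n)    # standard error of the mean

# Assumption: upper 2.5% point of Student's t with n - 1 = 9 degrees of
# freedom, read from a t table (the standard library has no t quantile).
T_CRIT_9DF = 2.262157

lower = mean - T_CRIT_9DF * std_err
upper = mean + T_CRIT_9DF * std_err

print(f"Mean = {mean:.4f}, Std Dev = {std_dev:.4f}, Std Err = {std_err:.4f}")
print(f"95% CL for Mean: ({lower:.4f}, {upper:.4f})")
```

The printed limits can be compared with the Lower and Upper 95% CL for Mean columns of the MEANS output.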
(c) The SAS procedure is given by:
PROC UNIVARIATE Data = Portfolio;
RUN;
The SAS output is given by:
The SAS System                    20:34 Friday, March 5, 2004 10
The UNIVARIATE Procedure
Variable: Close

Moments
N                          10    Sum Weights                10
Mean                  938.107    Sum Observations      9381.07
Std Deviation      82.2722205    Variance           6768.71827
Skewness            -0.170138    Kurtosis           -1.4222355
Uncorrected SS      8861365.9    Corrected SS       60918.4644
Coeff Variation    8.77002522    Std Error Mean     26.0167605

Basic Statistical Measures
     Location                     Variability
Mean      938.1070    Std Deviation             82.27222
Median    922.3000    Variance                      6769
Mode       .          Range                    218.35000
                      Interquartile Range      129.00000
Tests for Location: Mu0=0
Test            -Statistic-      -----p Value------
Student's t     t   36.05779     Pr > |t|     <.0001
Sign            M          5     Pr >= |M|    0.0020
Signed Rank     S       27.5     Pr >= |S|    0.0020

Quantiles (Definition 5)
Quantile        Estimate
100% Max        1033.160
99%             1033.160
95%             1033.160
90%             1033.095
75% Q3          1018.380
50% Median       922.300
25% Q1           889.380
10%              822.065
5%               814.810
1%               814.810
0% Min           814.810
The SAS System                    20:34 Friday, March 5, 2004 11
The UNIVARIATE Procedure
Variable: Close

Extreme Observations
-----Lowest-----        -----Highest-----
  Value     Obs            Value     Obs
 814.81       2           925.97       4
 829.32       3          1012.12       7
 889.38       9          1018.38       5
 906.27      10          1033.03       8
 918.63       1          1033.16       6
From the SAS output, the skewness and the excess kurtosis are -0.170138 and
-1.4222355, respectively. Hence, the distribution of the data is slightly negatively
skewed and, because the excess kurtosis is negative, has lighter tails than a normal
distribution.
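The two shape statistics quoted here can be recomputed from the raw data. The sketch below is in Python and is not part of the original solution. It assumes the bias-adjusted sample formulas that PROC UNIVARIATE applies under its default divisor (VARDEF=DF); the function name is hypothetical.

```python
import math

def sample_skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) using bias-adjusted sample formulas.

    Assumption: these are the formulas used by PROC UNIVARIATE with the
    default VARDEF=DF setting.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    z3 = sum(((x - mean) / s) ** 3 for x in values)   # sum of cubed z-scores
    z4 = sum(((x - mean) / s) ** 4 for x in values)   # sum of z-scores to the 4th
    skewness = n / ((n - 1) * (n - 2)) * z3
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skewness, kurtosis

# Closing values of the portfolio from Question 1(a).
close = [918.63, 814.81, 829.32, 925.97, 1018.38,
         1033.16, 1012.12, 1033.03, 889.38, 906.27]

skew, kurt = sample_skewness_kurtosis(close)
print(f"Skewness = {skew:.6f}, Kurtosis = {kurt:.7f}")
print("negatively skewed" if skew < 0 else "positively skewed")
print("lighter tails than normal" if kurt < 0 else "heavier tails than normal")
```

A negative first value indicates a longer left tail, and a negative second value indicates tails lighter than those of a normal distribution, which is the reading given in the solution.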
Question 2: (5 marks)
(a) The data were obtained from Yahoo Finance.
The SAS procedure is given by:
Data NASDAQ;
INPUT Date $ Open High Low Close Volume;
CARDS;
3Oct03 1864.54 1891.62 1864.54 1880.57 20145800
2Oct03 1828.94 1842.55 1823.64 1836.22 16040900
1Oct03 1797.07 1832.25 1796.09 1832.25 18217400
30Sep03 1812.81 1812.81 1783.46 1786.94 18642400
29Sep03 1801.55 1824.59 1786.57 1824.56 16669300
26Sep03 1816.75 1821.57 1792.06 1792.07 18415300
25Sep03 1849.39 1856.22 1817.20 1817.24 20330600
24Sep03 1903.81 1904.13 1843.43 1843.70 22079700
23Sep03 1877.44 1901.73 1875.15 1901.72 18688000
22Sep03 1881.42 1881.42 1866.88 1874.62 17200800
;
RUN;
PROC PRINT Data = NASDAQ (obs = 10);
RUN;
The SAS output is given by:
The SAS System                    20:54 Friday, March 5, 2004 2

Obs    Date          Open       High        Low      Close      Volume
  1    3Oct03     1864.54    1891.62    1864.54    1880.57    20145800
  2    2Oct03     1828.94    1842.55    1823.64    1836.22    16040900
  3    1Oct03     1797.07    1832.25    1796.09    1832.25    18217400
  4    30Sep03    1812.81    1812.81    1783.46    1786.94    18642400
  5    29Sep03    1801.55    1824.59    1786.57    1824.56    16669300
  6    26Sep03    1816.75    1821.57    1792.06    1792.07    18415300
  7    25Sep03    1849.39    1856.22    1817.20    1817.24    20330600
  8    24Sep03    1903.81    1904.13    1843.43    1843.70    22079700
  9    23Sep03    1877.44    1901.73    1875.15    1901.72    18688000
 10    22Sep03    1881.42    1881.42    1866.88    1874.62    17200800
(b) The SAS procedure is given by:
PROC UNIVARIATE Data = NASDAQ plot;
RUN;
The SAS output is given by (to save space, only the results for ‘High’ are displayed):
The SAS System                    20:54 Friday, March 5, 2004 6
The UNIVARIATE Procedure
Variable: High

Moments
N                          10    Sum Weights                10
Mean                 1856.889    Sum Observations     18568.89
Std Deviation      35.1144002    Variance            1233.0211
Skewness           0.22757532    Kurtosis           -1.8025124
Uncorrected SS     34491464.8    Corrected SS       11097.1899
Coeff Variation    1.89103388    Std Error Mean     11.1041483

Basic Statistical Measures
     Location                     Variability
Mean      1856.889    Std Deviation             35.11440
Median    1849.385    Variance                      1233
Mode       .          Range                     91.32000
                      Interquartile Range       67.03000
Tests for Location: Mu0=0
Test            -Statistic-      -----p Value------
Student's t     t   167.2248     Pr > |t|     <.0001
Sign            M          5     Pr >= |M|    0.0020
Signed Rank     S       27.5     Pr >= |S|    0.0020

Quantiles (Definition 5)
Quantile        Estimate
100% Max         1904.13
99%              1904.13
95%              1904.13
90%              1902.93
75% Q3           1891.62
50% Median       1849.39
25% Q1           1824.59
10%              1817.19
5%               1812.81
1%               1812.81
0% Min           1812.81
From the SAS output, the skewness and excess kurtosis for ‘Open’ are 0.27472513 and
-1.3856295, respectively. Hence, the distribution for ‘Open’ is positively skewed and
has lighter tails than a normal distribution.
The skewness and excess kurtosis for ‘High’ are 0.22757532 and -1.8025124,
respectively. Hence, the distribution for ‘High’ is positively skewed and has lighter
tails than a normal distribution.
The skewness and excess kurtosis for ‘Low’ are 0.25460319 and -1.7312382,
respectively. Hence, the distribution for ‘Low’ is positively skewed and has lighter
tails than a normal distribution.
The skewness and excess kurtosis for ‘Close’ are 0.28149912 and -0.7160249,
respectively. Hence, the distribution for ‘Close’ is positively skewed and has lighter
tails than a normal distribution.
The skewness and excess kurtosis for ‘Volume’ are 0.46777939 and -0.0263034,
respectively. Hence, the distribution for ‘Volume’ is positively skewed and has
marginally lighter tails than a normal distribution; its excess kurtosis is close to zero.
The PLOT option additionally produces a stem-and-leaf plot, a box plot and a normal
probability plot for each variable.
(c) The SAS procedure is given by:
Data NASDAQ;
INPUT Date $ Open High Low Close Volume;
Difference = Open - Close;
CARDS;
3Oct03 1864.54 1891.62 1864.54 1880.57 20145800
2Oct03 1828.94 1842.55 1823.64 1836.22 16040900
1Oct03 1797.07 1832.25 1796.09 1832.25 18217400
30Sep03 1812.81 1812.81 1783.46 1786.94 18642400
29Sep03 1801.55 1824.59 1786.57 1824.56 16669300
26Sep03 1816.75 1821.57 1792.06 1792.07 18415300
25Sep03 1849.39 1856.22 1817.20 1817.24 20330600
24Sep03 1903.81 1904.13 1843.43 1843.70 22079700
23Sep03 1877.44 1901.73 1875.15 1901.72 18688000
22Sep03 1881.42 1881.42 1866.88 1874.62 17200800
;
RUN;
PROC TTEST Data = NASDAQ;
VAR Difference;
RUN;
The relevant part of the SAS output is given by:
T-Tests
Variable        DF    t Value    Pr > |t|
Difference       9       0.45      0.6617
From the SAS output, the p-value of the two-sided paired t-test for the equality of the
population means of ‘Open’ and ‘Close’ is 0.6617, which is greater than 5%. Hence, we do
not reject the null hypothesis and conclude that there is no significant difference
between the two population means at the 5% significance level.
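The paired t-test used here is a one-sample t-test on the daily differences Open - Close, with t = mean(d) / (s_d / √n) on n - 1 degrees of freedom. The Python sketch below recomputes that statistic and is not part of the original solution. Two assumptions are hard-coded: the Open value for 2Oct03 is taken as 1828.94 (it has to lie between that day's Low of 1823.64 and High of 1842.55), and the 5% two-sided critical value 2.262157 for 9 degrees of freedom comes from a t table.

```python
import math
import statistics

# (Open, Close) pairs in the order listed in the CARDS block.
# Assumption: Open for 2Oct03 is 1828.94, a value between that day's Low and High.
open_close = [
    (1864.54, 1880.57), (1828.94, 1836.22), (1797.07, 1832.25),
    (1812.81, 1786.94), (1801.55, 1824.56), (1816.75, 1792.07),
    (1849.39, 1817.24), (1903.81, 1843.70), (1877.44, 1901.72),
    (1881.42, 1874.62),
]

# Same definition as the SAS statement: Difference = Open - Close.
diffs = [o - c for o, c in open_close]

n = len(diffs)
df = n - 1
mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(n)
t_value = mean_diff / std_err

# Assumption: two-sided 5% critical value of Student's t with 9 df (t table).
T_CRIT_9DF = 2.262157

print(f"DF = {df}, mean difference = {mean_diff:.3f}, t = {t_value:.2f}")
print("reject H0" if abs(t_value) > T_CRIT_9DF else "do not reject H0")
```

Comparing |t| with the critical value leads to the same decision as comparing the p-value with 5%.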
Question 3: (5 marks)
(a) The SAS procedure is given by:
Data Performance;
INPUT Portfolio $ Return @@;
Datalines;
A 8.5 A 10.3 A 12.1 A 8.6 A 7.3 A 11.3 A 12.1 A 11.6 A 10.8 A 8.6 A 8.2
B 6.2 B 6.8 B 7.1 B 8.6 B 10.2 B 6.8 B 8.1 B 7.5 B 6.5 B 10.2 B 11.7
;
RUN;
(b) The SAS procedure is given by:
PROC TTEST Data = Performance;
Class Portfolio;
Var Return;
RUN;
The SAS output is given by:
The SAS System                    16:27 Saturday, March 6, 2004 2
The TTEST Procedure

Statistics
                                 Lower CL              Upper CL    Lower CL               Upper CL
Variable    Portfolio       N        Mean       Mean       Mean     Std Dev    Std Dev     Std Dev    Std Err
Return      A              11      8.7728     9.9455     11.118      1.2196     1.7455      3.0632     0.5263
Return      B              11      6.9359     8.1545     9.3732      1.2675      1.814      3.1835      0.547
Return      Diff (1-2)             0.2076     1.7909     3.3742      1.3619     1.7801      2.5706      0.759

T-Tests
Variable    Method           Variances    DF    t Value    Pr > |t|
Return      Pooled           Equal        20       2.36      0.0286
Return      Satterthwaite    Unequal      20       2.36      0.0286
Equality of Variances
Variable    Method      Num DF    Den DF    F Value    Pr > F
Return      Folded F        10        10       1.08    0.9054
From the SAS output, the p-value of the test of the equality of variances is 0.9054, which
is much greater than the 5% significance level. Hence, we do not reject the null hypothesis;
the data are consistent with equal population variances for the two portfolios.
In this case, we adopt the pooled t-test for testing the equality of the population means.
The p-value of the two-sided t-test is 0.0286, which is less than the 5% significance level.
Hence, we reject the null hypothesis and conclude that there is a significant difference
between the mean annual percentage returns of the two portfolios at the 5% significance level.
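Both statistics used in this argument can be recomputed from the raw returns: the folded F is the larger sample variance divided by the smaller, and the pooled t uses the pooled variance on n_A + n_B - 2 degrees of freedom. The Python sketch below is a cross-check only and is not part of the original solution; the p-values are left to the SAS output because the standard library has no F or t distribution functions.

```python
import math
import statistics

# Annual percentage returns from the Datalines block in part (a).
a = [8.5, 10.3, 12.1, 8.6, 7.3, 11.3, 12.1, 11.6, 10.8, 8.6, 8.2]
b = [6.2, 6.8, 7.1, 8.6, 10.2, 6.8, 8.1, 7.5, 6.5, 10.2, 11.7]

n_a, n_b = len(a), len(b)
var_a, var_b = statistics.variance(a), statistics.variance(b)

# Folded F statistic: larger sample variance over the smaller one.
f_value = max(var_a, var_b) / min(var_a, var_b)

# Pooled-variance t statistic for the difference in means (A minus B).
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_value = (statistics.mean(a) - statistics.mean(b)) / std_err

print(f"Folded F = {f_value:.2f} on ({n_a - 1}, {n_b - 1}) DF")
print(f"Pooled Std Dev = {math.sqrt(pooled_var):.4f}, Std Err = {std_err:.4f}")
print(f"t = {t_value:.2f} on {df} DF")
```

The pooled standard deviation and standard error printed here correspond to the Diff (1-2) row of the Statistics table.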
Question 4: (5 marks)
(a) The data were obtained from Yahoo Finance and sorted in ascending order.
The SAS procedure is given by:
PROC REG Data = A1Q4;
MODEL S_t = R_t;
RUN;
The SAS output is given by:
The SAS System                    19:57 Saturday, March 6, 2004 1
The REG Procedure
Model: MODEL1
Dependent Variable: S_t

Analysis of Variance
                                Sum of          Mean
Source                DF       Squares        Square    F Value    Pr > F
Model                  1        409955        409955    1038.39    <.0001
Error               1009        398352     394.79854
Corrected Total     1010        808306

Root MSE           19.86954    R-Square    0.5072
Dependent Mean     12.97410    Adj R-Sq    0.5067
Coeff Var         153.14774
Parameter Estimates
                     Parameter      Standard
Variable      DF      Estimate         Error    t Value    Pr > |t|
Intercept      1     -18.76952       1.16658     -16.09      <.0001
R_t            1       6.79358       0.21082      32.22      <.0001
The fitted linear regression model is given by:
Ŝ_t = -18.76952 + 6.79358 R_t
(b) From the SAS output, the value of R^2 is 0.5072, which is not close to zero: the model
explains about 51% of the variation in S_t. Hence, the regression model fits the data
moderately well.
(c) The p-value of the F-test for the full model is less than 0.0001. Hence, we reject the
null hypothesis and conclude that the regression coefficient is significantly different from
zero at the 5% significance level.
The p-value of the t-test for the intercept is less than 0.0001. Hence, we reject the null
hypothesis and conclude that the intercept is significantly different from zero at the 5%
significance level.
The p-value of the t-test for the coefficient of R_t is less than 0.0001. Hence, we reject
the null hypothesis and conclude that the coefficient is significantly different from
zero at the 5% significance level.
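The summary figures in the REG output follow directly from the Analysis of Variance table: R^2 = SS(Model) / SS(Total), F = MS(Model) / MS(Error) and Root MSE = √MS(Error). The Python sketch below redoes that arithmetic from the printed sums of squares and is not part of the original solution; small discrepancies in the last digits are expected because the table values are rounded.

```python
import math

# Values copied from the Analysis of Variance table for MODEL S_t = R_t.
ss_model, df_model = 409955.0, 1
ss_error, df_error = 398352.0, 1009
ss_total = 808306.0

ms_model = ss_model / df_model
ms_error = ss_error / df_error

r_square = ss_model / ss_total      # share of variation explained by the model
f_value = ms_model / ms_error       # overall F statistic
root_mse = math.sqrt(ms_error)      # residual standard deviation

# With a single regressor, the F statistic is the square of the slope's
# t value (estimate divided by standard error), up to rounding.
t_slope = 6.79358 / 0.21082

print(f"R-Square = {r_square:.4f}")
print(f"F Value  = {f_value:.2f}")
print(f"Root MSE = {root_mse:.5f}")
print(f"t(slope)^2 = {t_slope ** 2:.2f}")
```

The last line illustrates why the F-test and the t-test for R_t lead to the same conclusion in a simple linear regression.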
Question 5: (5 marks)
(a) The data were obtained from Yahoo Finance.
The SAS procedure is given by:
PROC REG Data = A1Q5;
MODEL S_t = LNR_t R_t V_t;
RUN;
The SAS output is given by:
The SAS System                    20:32 Saturday, March 6, 2004 1
The REG Procedure
Model: MODEL1
Dependent Variable: S_t

Analysis of Variance
                                Sum of          Mean
Source                DF       Squares        Square    F Value    Pr > F
Model                  3     173291707      57763902     742.69    <.0001
Error               2009     156252667         77776
Corrected Total     2012     329544374

Root MSE          278.88410    R-Square    0.5259
Dependent Mean    179.36669    Adj R-Sq    0.5251
Coeff Var         155.48266
Parameter Estimates
                     Parameter        Standard
Variable      DF      Estimate           Error    t Value    Pr > |t|
Intercept      1    -248.13599        11.20760     -22.14      <.0001
LNR_t          1     803.17273       513.35501       1.56      0.1178
R_t            1      27.45329         0.58860      46.64      <.0001
V_t            1    -3.0534E-8     8.753463E-9      -3.49      0.0005
From the SAS output, the fitted regression model is given by:
S_t = -248.13599 + 803.17273 LNR_t + 27.45329 R_t - 3.0534 × 10^(-8) V_t + e_t
(b) Since the sample size is large, there is little difference between R^2 and the
adjusted R^2, even though the model has several explanatory variables. From the SAS
output, the value of R^2 (adjusted R^2) is 0.5259 (0.5251). Hence, the regression
model fits the data reasonably well.
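The remark about sample size can be made concrete with the usual definition adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1), where n is the number of observations and p the number of explanatory variables. The Python sketch below applies it to the figures in the Analysis of Variance table and is not part of the original solution. The second call, with n = 20, is a purely hypothetical comparison showing how much larger the adjustment would be in a small sample.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_model = 173291707.0
ss_total = 329544374.0
n_obs = 2012 + 1           # corrected total DF plus one
n_predictors = 3           # LNR_t, R_t and V_t

r_square = ss_model / ss_total
adj_large = adjusted_r_square(r_square, n_obs, n_predictors)

# Hypothetical: the same R^2 obtained from only 20 observations.
adj_small = adjusted_r_square(r_square, 20, n_predictors)

print(f"R-Square = {r_square:.4f}")
print(f"Adj R-Sq with n = {n_obs}: {adj_large:.4f}")
print(f"Adj R-Sq with n = 20 (hypothetical): {adj_small:.4f}")
```

With about two thousand observations the factor (n - 1) / (n - p - 1) is very close to one, so the two measures nearly coincide.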
(c) The p-value of the F-test for the full model is less than 0.0001. Hence, we reject the
null hypothesis and conclude that at least one of the regression coefficients is
significantly different from zero at the 5% significance level.
The p-value of the t-test for the intercept is less than 0.0001. Hence, we reject the null
hypothesis and conclude that the intercept is significantly different from zero at the 5%
significance level.
The p-value of the t-test for the coefficient of LNR_t is 0.1178. Hence, we do not reject
the null hypothesis and conclude that the coefficient is not significantly different from
zero at the 5% significance level.
The p-value of the t-test for the coefficient of R_t is less than 0.0001. Hence, we
reject the null hypothesis and conclude that the coefficient is significantly different
from zero at the 5% significance level.
The p-value of the t-test for the coefficient of V_t is 0.0005. Hence, we reject the null
hypothesis and conclude that the coefficient is significantly different from zero at the 5%
significance level.
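Each t value in the Parameter Estimates table is the estimate divided by its standard error, so the four decisions can be rechecked by hand. The Python sketch below does this and is not part of the original solution. It assumes that, with 2009 error degrees of freedom, the two-sided 5% critical value of Student's t is approximately the normal value 1.96.

```python
# (estimate, standard error) pairs copied from the Parameter Estimates table.
estimates = {
    "Intercept": (-248.13599, 11.20760),
    "LNR_t": (803.17273, 513.35501),
    "R_t": (27.45329, 0.58860),
    "V_t": (-3.0534e-8, 8.753463e-9),
}

# Assumption: with 2009 error degrees of freedom, the two-sided 5% critical
# value of Student's t is approximately the standard normal value 1.96.
T_CRIT = 1.96

t_values = {name: est / se for name, (est, se) in estimates.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_values.items()}

for name, t in t_values.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name:<10} t = {t:8.2f}  {verdict} at the 5% level")
```

Only LNR_t falls inside the ±1.96 band, which agrees with its p-value of 0.1178.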
~ End of Solutions to Assignment 1~