STATISTICS 479 Exam II (100 points)

Fall 2011
Name
1. A SAS data set world was created using the following input statement and various demographic
variables for 60 countries:
input country $ popurban percgnp birthrat deathrat lifeexp;
Category variables popgrp and gnpgrp were created corresponding to % urban population and per
capita GNP. Answer parts (a) to (e) below.
(a) (2) Give the name of a SAS procedure that enables you to create a variety of tables, that may
be highly customized, of many different statistics computed for variables such as birthrat,
classified by variables such as popgrp and gnpgrp.
(b) (2) Give the name of a SAS procedure that allows you to produce side-by-side boxplots in high
resolution graphics (without using SAS/GRAPH statements such as symbol or axis).
(c) (2) Give the name of a SAS procedure (that is not a SAS/GRAPH procedure) that you may
use to obtain a normal probability plot of a variable such as popurban in high resolution
graphics.
(d) (2) Give the name of a SAS procedure that you may use to obtain a scatter plot matrix of the
4 variables popurban, percgnp, birthrat, and deathrat, in high resolution graphics.
(e) (2) Give the name of a SAS procedure that enables you to assign descriptive strings (e.g.
"Medium") to be displayed instead of the numeric values of category variables such as popgrp
and gnpgrp on output.
2. (a) (3) The following statement is included in a proc freq step, where the data are observed
counts for combinations of popgrp and gnpgrp in the SAS data set world:
tables popgrp*gnpgrp/chisq expected cellchi2 norow nocol nopercent;
Explain the purpose for which you may use the χ² statistic that this statement will produce.
(b) (3) The following statement is included in a proc tabulate step analyzing the SAS data set
world:
table popgrp*gnpgrp,birthrat*(n*f=4.0 mean*f=9.4 stderr*f=9.4);
Explain as much as possible the SAS output this statement will produce.
(c) (3) The following statement is included in a proc gchart step using the world data set as
input:
vbar lifeexp/midpoints= 50 to 80 by 5 type=freq;
Explain as much as possible the SAS output this statement will produce.
(d) (3) The following statement is included in a proc sgplot step using the world data set as
input:
hbar popgrp/response=deathrat stat=mean group=gnpgrp;
Explain as much as possible the SAS output this statement will produce.
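As background for part 2(a): in proc freq, chisq requests chi-square tests of association, expected prints the expected cell counts under independence, and cellchi2 prints each cell's contribution to the Pearson statistic. The arithmetic behind these quantities can be sketched in Python as below. The 3×3 table of counts is hypothetical (the exam gives neither the counts nor the number of levels of popgrp and gnpgrp), and the function name chi_square_independence is an illustrative choice, not anything from SAS.

```python
# Hypothetical counts for 60 countries (rows: popgrp, columns: gnpgrp).
# These numbers are invented for illustration; they are NOT the exam data.
observed = [
    [12, 5, 3],
    [6, 8, 6],
    [2, 7, 11],
]


def chi_square_independence(table):
    """Pearson chi-square test quantities for a two-way table of counts.

    Returns (statistic, degrees of freedom, expected counts, cell contributions).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Each cell contributes (observed - expected)^2 / expected.
    cell_chi2 = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    statistic = sum(sum(row) for row in cell_chi2)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected, cell_chi2


statistic, df, expected, cell_chi2 = chi_square_independence(observed)
print(f"chi-square = {statistic:.3f} on {df} degrees of freedom")
```

The statistic is compared with a chi-square distribution on (rows − 1)(columns − 1) degrees of freedom; large cell contributions show which combinations depart most from independence.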
3. Daily ozone concentrations (in ppb.) for an Eastern city in the U.S. for a period of 152 consecutive
days are graphed in a box plot. Answer the questions given below:
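[Figure: box plot of the daily ozone concentrations. The axis runs from 10 to 110 in steps of 10; the plot symbols + and ∗ appear in the original. Plot not reproduced here.]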
(a) (3) Give the name and value of a measure of the location of this distribution.
(b) (3) What is the shape of the distribution of the ozone measurements suggested by the boxplot?
(c) (3) Give the name and value of a measure of spread of this distribution.
(d) (3) For how many days during this period is the daily ozone concentration below 30 ppb.?
4. (4) Use the following plot to say in what way the distribution of the data is different from that of
a normal distribution:
[Figure: "Normal Probability Plot" with axes labeled Ordered Data (ticks 0 to 10) and Normal Percentiles (ticks −2 to 2). Plot not reproduced here.]
5. Below is a Q-Q plot of carbon monoxide (CO) concentrations in the air, measured on successive
Sundays and Fridays for several months in Linden, Mass.:
(a) (2) What does this tell you about the shapes of distributions of CO measurements on these
two days?
(b) (2) Using the above graph, compare the median CO concentrations on the two days.
(c) (2) Using the above graph, compare the variability of CO concentrations on the two days.
6. Below is a Quantile plot of the Stamford ozone data used in class:
(a) (2) Explain what is plotted on the vertical axis.
(b) (2) Estimate, approximately, the 40th percentile.
(c) (2) Estimate, approximately, the percentage of days having ozone concentrations of 150 ppb or
more.
7. Consider the following data:
  x     10    20    21    27    29    33    40    44    52    56    62    68
  y    120   115   250   210   300   330   295   400   380   460   125   510
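As a check on the data as listed, the least-squares line can be computed directly. The following Python sketch is not part of the SAS analysis; the variable names are illustrative, and it assumes the twelve (x, y) pairs are read in the order shown. Its slope, intercept, R-square and F value can be compared with the REG output attached at the end of this question.

```python
# The twelve (x, y) pairs listed for question 7.
x = [10, 20, 21, 27, 29, 33, 40, 44, 52, 56, 62, 68]
y = [120, 115, 250, 210, 300, 330, 295, 400, 380, 460, 125, 510]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates for the straight-line model.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Analysis-of-variance quantities.
ss_model = b1 * sxy            # regression sum of squares, 1 df
ss_error = syy - ss_model      # residual sum of squares, n - 2 df
mse = ss_error / (n - 2)       # estimate of the error variance
r_square = ss_model / syy
f_value = ss_model / mse

print(f"intercept = {b0:.5f}, slope = {b1:.5f}")
print(f"SS(model) = {ss_model:.0f}, SS(error) = {ss_error:.0f}, MSE = {mse:.0f}")
print(f"R-square = {r_square:.4f}, F = {f_value:.2f}")
```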
The procedure reg in SAS was used to perform an analysis of this set of data using the model
y = β0 + β1 x + ε
where the εi are assumed to be independently distributed as N(0, σ²) variables. Answer the following
questions based on the results appearing in the output attached to the end of the question.
i) (2) Give the regression sum of squares and its degrees of freedom. What is the value of R²?
ii) (2) Give the residual sum of squares and its degrees of freedom. What is the estimate s² of σ²?
iii) (2) What is the F statistic for testing H0: β1 = 0 vs. Ha: β1 ≠ 0? Make a decision based on
the p-value.
iv) (4) Using the estimate of β1 and its standard error, compute the t-statistic for testing
H0: β1 = 0 vs. Ha: β1 ≠ 0.
v) (4) Using the estimate of β1 and its standard error, compute a 95% confidence interval for β1.
vi) (4) Use the value of h22 to compute the standard error of the residual for observation 2.
vii) (4) Use the value of h22 to compute the standard error of ŷ2.
viii) (4) Use the residual for observation 2 and its standard error to calculate the corresponding
studentized residual.
ix) (4) Use RStudent to conduct the Bonferroni test procedure for y-outliers, using the supplied
Table B.10 with α = .05.
x) (4) Identify any cases that may be x-outliers, explaining why you selected them.
xi) (4) Identify any cases that may be influential, explaining why you selected them.
xii) (4) If you find any case to be influential, explain why this case should be examined further.
xiii) (4) Examine the residuals vs. predicted values plot. State what you think this plot indicates
about the fit of the data to a straight line model.
xiv) (4) Examine the normal probability plot of the studentized residuals. Does it show that the assumptions made about the model are plausible? Explain why or why not.
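Parts vi) to viii) rest on the standard results se(e_i) = s·sqrt(1 − h_ii) and se(ŷ_i) = s·sqrt(h_ii), where s is the Root MSE. A minimal Python sketch of that arithmetic follows; it uses values read from the attached output rather than recomputing them, so small differences in the last digits come from the rounding of h22. The variable names are illustrative.

```python
import math

# Values read from the attached REG output (rounded as printed there).
root_mse = 108.05335   # s, the estimate of sigma
resid_2 = -91.2497     # residual for observation 2
h22 = 0.1769           # Hat Diag H for observation 2

# Standard error of the residual: s * sqrt(1 - h_ii).
se_resid_2 = root_mse * math.sqrt(1 - h22)

# Standard error of the fitted value: s * sqrt(h_ii).
se_fit_2 = root_mse * math.sqrt(h22)

# Internally studentized residual: residual divided by its standard error.
student_2 = resid_2 / se_resid_2

print(f"se(residual 2)       = {se_resid_2:.3f}")
print(f"se(fitted value 2)   = {se_fit_2:.4f}")
print(f"studentized residual = {student_2:.3f}")
```

The three results can be checked against the Std Error Residual, Std Error Mean Predict and Student Residual columns for observation 2.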
Simple Linear Regression of Data for Exam II: Fall 2011

The REG Procedure

Number of Observations Read    12
Number of Observations Used    12

                      Analysis of Variance

                             Sum of         Mean
Source              DF      Squares       Square    F Value    Pr > F
Model                1        77201        77201       6.61    0.0278
Error               10       116755        11676
Corrected Total     11       193956

Root MSE          108.05335    R-Square    0.3980
Dependent Mean    291.25000    Adj R-Sq    0.3378
Coeff Var          37.09986

                      Parameter Estimates

                   Parameter      Standard
Variable     DF     Estimate         Error    t Value    Pr > |t|
Intercept     1    114.35740      75.53323       1.51      0.1610
x             1      4.59461       1.78680       2.57      0.0278
Output Statistics

        Dependent    Predicted       Std Error                  Std Error     Student
Obs      Variable        Value    Mean Predict     Residual      Residual    Residual
  1      120.0000     160.3035         59.7176     -40.3035        90.052      -0.448
  2      115.0000     206.2497         45.4494     -91.2497        98.030      -0.931
  3      250.0000     210.8443         44.1668      39.1557        98.614       0.397
  4      210.0000     238.4119         37.3522     -28.4119         101.4      -0.280
  5      300.0000     247.6012         35.5119      52.3988         102.1       0.513
  6      330.0000     265.9796         32.7038      64.0204         103.0       0.622
  7      295.0000     298.1419         31.3073      -3.1419         103.4     -0.0304
  8      400.0000     316.5204         32.7038      83.4796         103.0       0.811
  9      380.0000     353.2773         39.4312      26.7227         100.6       0.266
 10      460.0000     371.6557         44.1668      88.3443        98.614       0.896
 11      125.0000     399.2234         52.3078    -274.2234        94.549      -2.900
 12      510.0000     426.7911         61.2484      83.2089        89.018       0.935

                             Cook's                  Hat Diag
Obs       -2-1 0 1 2              D     RStudent            H
  1    |      |      |        0.044      -0.4289       0.3054
  2    |     *|      |        0.093      -0.9240       0.1769
  3    |      |      |        0.016       0.3797       0.1671
  4    |      |      |        0.005      -0.2669       0.1195
  5    |      |*     |        0.016       0.4937       0.1080
  6    |      |*     |        0.019       0.6015       0.0916
  7    |      |      |        0.000      -0.0288       0.0839
  8    |      |*     |        0.033       0.7956       0.0916
  9    |      |      |        0.005       0.2529       0.1332
 10    |      |*     |        0.080       0.8862       0.1671
 11    | *****|      |        1.287      -6.9047       0.2343
 12    |      |*     |        0.207       0.9283       0.3213
Table B.10. 5% critical values based on the Bonferroni bounds for the t-test for a
single outlier using externally studentized residual in a linear regression model.
                                           k
  n      1      2      3      4      5      6      7      8      9     10     11     12
  5   9.92  63.66
  6   6.23  10.89  76.39
  7   5.07   6.58  11.77  89.12
  8   4.53   5.26   6.90  12.59 101.86
  9   4.22   4.66   5.44   7.18  13.36 114.59
 10   4.03   4.32   4.77   5.60   7.45  14.09 127.32
 11   3.90   4.10   4.40   4.88   5.75   7.70  14.78 140.05
 12   3.81   3.96   4.17   4.49   4.98   5.89   7.94  15.44 152.79
 13   3.74   3.86   4.02   4.24   4.56   5.08   6.02   8.16  16.08 165.52
 14   3.69   3.79   3.91   4.07   4.30   4.63   5.16   6.14   8.37  16.69 178.25
 15   3.65   3.73   3.83   3.95   4.12   4.36   4.70   5.25   6.25   8.58  17.28 190.98
 16   3.62   3.68   3.77   3.87   4.00   4.17   4.41   4.76   5.33   6.36   8.77  17.85
 17   3.59   3.65   3.72   3.80   3.90   4.04   4.21   4.46   4.82   5.40   6.47   8.95
 18   3.57   3.62   3.68   3.75   3.83   3.94   4.08   4.26   4.51   4.88   5.47   6.57
 19   3.56   3.60   3.65   3.71   3.78   3.86   3.97   4.11   4.30   4.55   4.93   5.54
 20   3.54   3.58   3.62   3.67   3.73   3.81   3.89   4.00   4.15   4.33   4.59   4.98
 21   3.53   3.57   3.60   3.65   3.70   3.76   3.83   3.92   4.03   4.18   4.37   4.64
 22   3.52   3.55   3.59   3.63   3.67   3.72   3.78   3.86   3.95   4.06   4.21   4.40
 23   3.52   3.54   3.57   3.61   3.65   3.69   3.75   3.81   3.88   3.98   4.09   4.24
 24   3.51   3.53   3.56   3.59   3.63   3.67   3.71   3.77   3.83   3.91   4.00   4.12
 25   3.50   3.53   3.55   3.58   3.61   3.65   3.69   3.73   3.79   3.85   3.93   4.02
 26   3.50   3.52   3.54   3.57   3.60   3.63   3.66   3.70   3.75   3.81   3.87   3.95
 27   3.50   3.52   3.54   3.56   3.58   3.61   3.65   3.68   3.72   3.77   3.83   3.89
 28   3.50   3.51   3.53   3.55   3.58   3.60   3.63   3.66   3.70   3.74   3.79   3.84
 29   3.49   3.51   3.53   3.55   3.57   3.59   3.62   3.64   3.68   3.71   3.76   3.81
 30   3.49   3.51   3.52   3.54   3.56   3.58   3.60   3.63   3.66   3.69   3.73   3.77
 31   3.49   3.50   3.52   3.54   3.55   3.57   3.59   3.62   3.64   3.67   3.71   3.74
 32   3.49   3.50   3.52   3.53   3.55   3.57   3.59   3.61   3.63   3.66   3.69   3.72
 33   3.49   3.50   3.52   3.53   3.54   3.56   3.58   3.60   3.62   3.64   3.67   3.70
 34   3.49   3.50   3.51   3.53   3.54   3.56   3.57   3.59   3.61   3.63   3.66   3.68
 35   3.49   3.50   3.51   3.52   3.54   3.55   3.57   3.58   3.60   3.62   3.64   3.67
 36   3.49   3.50   3.51   3.52   3.54   3.55   3.56   3.58   3.60   3.61   3.63   3.66
 37   3.49   3.50   3.51   3.52   3.53   3.55   3.56   3.57   3.59   3.61   3.62   3.65
 38   3.49   3.50   3.51   3.52   3.53   3.54   3.56   3.57   3.58   3.60   3.62   3.64
 39   3.49   3.50   3.51   3.52   3.53   3.54   3.55   3.57   3.58   3.59   3.61   3.63
 40   3.49   3.50   3.51   3.52   3.53   3.54   3.55   3.56   3.58   3.59   3.60   3.62
 50   3.51   3.51   3.52   3.53   3.53   3.54   3.54   3.55   3.56   3.57   3.57   3.58
 60   3.53   3.53   3.54   3.54   3.54   3.55   3.55   3.56   3.56   3.57   3.57   3.58
 70   3.55   3.55   3.55   3.56   3.56   3.56   3.56   3.57   3.57   3.57   3.58   3.58
 80   3.57   3.57   3.57   3.57   3.58   3.58   3.58   3.58   3.58   3.59   3.59   3.59
 90   3.59   3.59   3.59   3.59   3.59   3.59   3.60   3.60   3.60   3.60   3.60   3.60
100   3.60   3.60   3.60   3.61   3.61   3.61   3.61   3.61   3.61   3.61   3.62   3.62
(continued)
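The entries of Table B.10 appear consistent with the upper α/(2n) point of a t distribution on n − k − 2 degrees of freedom, with k read as the number of predictors excluding the intercept. That reading is an assumption inferred from a few entries, not something the table states. The Python sketch below evaluates this quantity with the standard library only, using Simpson's rule for the t distribution function and bisection for the quantile; the function names are illustrative.

```python
import math


def t_cdf(x, df, steps=4000):
    """P(T <= x) for x >= 0, by composite Simpson's rule on [0, x]."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(t):
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric about 0, so the left half contributes 0.5.
    return 0.5 + total * h / 3


def bonferroni_critical(n, k, alpha=0.05, upper=250.0):
    """Upper alpha/(2n) point of t with n - k - 2 df (assumed table rule)."""
    df = n - k - 2
    target = 1 - alpha / (2 * n)
    lo, hi = 0.0, upper
    for _ in range(50):  # bisection on the monotone distribution function
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# The regression in question 7 has n = 12 observations and k = 1 predictor.
print(f"n = 12, k = 1: {bonferroni_critical(12, 1):.2f}")
```

In the Bonferroni procedure, the largest |RStudent| is compared with the table entry for the relevant n and k.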