Statistics 479
Assignment #4
Answer Key
Fall 2013
1.
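/* Libref for the folder that holds the permanent data set fueldat */
libname mylib "U:\Documents\Stat479\";
/* Bin Percent into three license groups: <=54, (54,58], >58 */
data fuelnew;
set mylib.fueldat;
if Percent <= 54 then LicGrp=1;
else if 54 < Percent <= 58 then LicGrp=2;
else LicGrp=3;
label LicGrp="No. of Drivers";
run;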
proc format;
value ig 1 = 'Low Income'
2 = 'Middle Income'
3 = 'High Income';
value lg 1='below 54%'
2='54 to 58%'
3='above 58%' ;
run;
ods rtf file="U:\Documents\Classwork\stat479\F13\as4n1_out.rtf";
proc tabulate data=fuelnew;
var Roads;
class IncomGrp LicGrp;
table IncomGrp*LicGrp,Roads*(n="Sample Size"*f=3. mean="Sample Mean"*f=8.4
std="Sample Standard Deviation"*f=8.2 stderr="Standard Error of the Mean"*f=8.2
max="Largest"*f=5.);
title "Analysis of Miles of Primary Highways by Per Capita Income Level and Number of Licensed Drivers";
format IncomGrp ig. LicGrp lg.;
run;
ods rtf close;
Miles of Primary Highways (in thsnds.)

Per capita      No. of      Sample  Sample   Sample Standard  Standard Error  Largest
Income          Drivers     Size    Mean     Deviation        of the Mean
Low Income      below 54%   7       4.4223   1.88             0.71            7
Low Income      54 to 58%   5       4.9550   1.04             0.47            7
Low Income      above 58%   1       3.2740   .                .               3
Middle Income   below 54%   2       5.3125   0.89             0.63            6
Middle Income   54 to 58%   7       7.0839   5.78             2.18            18
Middle Income   above 58%   9       5.7708   2.73             0.91            10
High Income     below 54%   5       7.8862   5.39             2.41            14
High Income     54 to 58%   6       4.2115   2.87             1.17            9
High Income     above 58%   6       5.1810   3.42             1.40            10
2.
data jurors;
input Ethnicity Count @@;
datalines;
1 2030 2 313 3 10 4 57 5 90
;
proc format;
value ef 1='White'
2='Black'
3='Native American'
4='Asian'
5='Other';
run;
ods rtf file="U:\Documents\Classwork\stat479\F13\as4n2_out.rtf";
proc freq data=jurors;
weight Count;
tables Ethnicity/nocum testp=(80.29 12.06 0.79 2.92 3.94);
format Ethnicity ef.;
title 'Ethnicity Distribution of Jurors';
run;
ods rtf close;
Ethnicity         Frequency   Percent   Test Percent
White             2030        81.20     80.29
Black             313         12.52     12.06
Native American   10          0.40      0.79
Asian             57          2.28      2.92
Other             90          3.60      3.94

Chi-Square Test for Specified Proportions
Chi-Square    9.7501
DF            4
Pr > ChiSq    0.0449
The hypotheses tested are
H0: p1 = .8029, p2 = .1206, p3 = .0079, p4 = .0292, p5 = .0394 vs.
Ha: at least one of these equalities does not hold,
where p1, p2, p3, p4, p5 are the multinomial probabilities that a randomly
selected juror belongs to each of the five ethnic classes.
We reject the null hypothesis at α = .05, since the p-value of .0449 is less
than .05. Thus the data provide evidence that the ethnic distribution of the
jurors is different from the ethnic distribution of people in the whole county.
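As a cross-check on the PROC FREQ output, the test statistic can be recomputed by hand. The following is a minimal sketch in Python (not part of the original SAS solution); the variable names are illustrative, and the closed-form p-value shortcut is valid only because DF = 4 here.

```python
import math

# Observed juror counts and hypothesized county percentages, copied from the
# PROC FREQ step above (White, Black, Native American, Asian, Other).
observed = [2030, 313, 10, 57, 90]
test_pct = [80.29, 12.06, 0.79, 2.92, 3.94]

n = sum(observed)
expected = [n * p / 100 for p in test_pct]

# Pearson chi-square: sum of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# For DF = 4 the upper-tail chi-square probability has the closed form
# exp(-x/2) * (1 + x/2); this shortcut does not apply to other DF values.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"chi-square = {chi2:.4f}, DF = {df}, p-value = {p_value:.4f}")
```

The largest contribution to the statistic comes from the Native American category, where 10 jurors were observed against an expected count of 19.75.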
3.
The problem set-up is the same as in the class example. We want to test the
hypothesis that the observed numbers of quadrats containing 0, 1, ..., 5, and 6
or more plants agree with the frequencies expected under a Poisson distribution
with mean parameter λ = 1.9, which was estimated from the data.
The Poisson probabilities calculated for 0, 1, 2, 3, 4, 5, and >=6 plants are:
0.15 0.284 0.27 0.171 0.081 0.031 0.013
data cactus;
input Plants $ Quadrats @@;
datalines;
0 18 1 38 2 49 3 21 4 12 5 5 >=6 1
;
ods rtf file="U:\Documents\Classwork\stat479\F13\as4n3_out.rtf";
proc freq data=cactus;
weight Quadrats;
tables Plants/nocum expected testp=(15.0 28.4 27.0 17.1 8.1 3.1 1.3);
title 'Barrel Cacti Distribution in Quadrats';
run;
ods rtf close;
Plants   Frequency   Percent   Test Percent
0        18          12.50     15.00
1        38          26.39     28.40
2        49          34.03     27.00
3        21          14.58     17.10
4        12          8.33      8.10
5        5           3.47      3.10
>=6      1           0.69      1.30

Chi-Square Test for Specified Proportions
Chi-Square    4.4528
DF            6
Pr > ChiSq    0.6156

WARNING: 29% of the cells have expected counts less than 5. Chi-Square may not
be a valid test.
Since the p-value of .6156 is not smaller than .05 we fail to reject the null
hypothesis that the observed frequencies were sampled from a Poisson
distribution.
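The Poisson probabilities and the test statistic above can be reproduced with a short calculation. This is a minimal sketch in Python (not part of the original SAS solution) with illustrative variable names. It rounds the probabilities to three decimals, as in the TESTP= list, and uses DF = 6 to match the SAS output; it makes no adjustment to the degrees of freedom for the estimated λ.

```python
import math

lam = 1.9  # Poisson mean quoted in the text (estimated from the data)

# P(X = k) for k = 0..5, then P(X >= 6) as the remaining upper-tail mass.
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(6)]
probs.append(1 - sum(probs))
rounded = [round(p, 3) for p in probs]

# Observed quadrat counts for 0, 1, 2, 3, 4, 5 and >=6 plants.
observed = [18, 38, 49, 21, 12, 5, 1]
n = sum(observed)
expected = [n * p for p in rounded]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Closed-form upper tail for DF = 6 only: exp(-h) * (1 + h + h^2 / 2), h = x/2.
h = chi2 / 2
p_value = math.exp(-h) * (1 + h + h * h / 2)

# Share of cells with an expected count below 5 (the source of the warning).
sparse_share = sum(e < 5 for e in expected) / len(expected)

print(rounded)
print(f"chi-square = {chi2:.4f}, DF = {df}, p-value = {p_value:.4f}")
print(f"cells with expected count < 5: {sparse_share:.0%}")
```

The two sparse cells are the "5" and ">=6" categories, which is why SAS prints the warning about expected counts.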
4.
data company;
input Grade Status Count @@;
label Grade="Job Grade" Status="Marital Status";
datalines;
1 1 58 1 2 874 1 3 15 1 4 9
2 1 222 2 2 3927 2 3 70 2 4 20
3 1 50 3 2 2396 3 3 34 3 4 10
4 1 7 4 2 533 4 3 7 4 4 4
;
proc format;
value ms 1 = 'Single'
2 = 'Married'
3 = 'Divorced'
4 = 'Widowed'
;
run;
ods rtf file="U:\Documents\Classwork\stat479\F13\as4n4_out.rtf";
proc freq data=company;
weight Count;
tables Grade*Status/chisq expected cellchi2 nocol nopercent norow;
title "Dependency of Job Grade on Marital Status";
format Status ms.;
run;
ods rtf close;
Table of Grade by Status
Each cell shows: Frequency / Expected / Cell Chi-Square

Grade         Status (Marital Status)
(Job Grade)   Single    Married   Divorced  Widowed   Total
1             58        874       15        9         956
              39.118    897.27    14.626    4.9913
              9.1148    0.6033    0.0096    3.2196
2             222       3927      70        20        4239
              173.45    3978.6    64.851    22.132
              13.589    0.6683    0.4088    0.2053
3             50        2396      34        10        2490
              101.89    2337      38.094    13
              26.423    1.4885    0.4399    0.6924
4             7         533       7         4         551
              22.546    517.15    8.4296    2.8768
              10.719    0.4859    0.2424    0.4386
Total         337       7730      126       43        8236

Statistic                     DF   Value     Prob
Chi-Square                    9    68.7484   <.0001
Likelihood Ratio Chi-Square   9    76.0113   <.0001
Mantel-Haenszel Chi-Square    1    19.5564   <.0001
Phi Coefficient                    0.0914
Contingency Coefficient            0.0910
Cramer's V                         0.0527
The p-value for the χ² statistic is very small (<.0001), so the null
hypothesis that Job Grade is independent of Marital Status is rejected;
that is, an association exists between these two variables.
Phi, C, and V are all measures of the strength of association between nominal
variables. Although the χ² statistic is large, Phi, C, and V are all very
small, which suggests that the association is very weak. All three statistics
have the sample size n as the denominator or part of the denominator, and here
n = 8236 is large, so a large χ² value does not by itself indicate a strong
association.
Since Job Grade is an ordinal variable, more powerful tests of association
that use this ordering could be applied instead of these three.
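To make the role of n concrete, the χ² statistic and the three measures can be recomputed from the cell counts. This is a minimal sketch in Python (not part of the original SAS solution) with illustrative variable names, using the usual definitions Phi = sqrt(χ²/n), C = sqrt(χ²/(χ² + n)), and V = sqrt(χ²/(n·min(r−1, c−1))).

```python
import math

# Observed counts from the Grade by Status table (rows: Job Grade 1-4;
# columns: Single, Married, Divorced, Widowed).
table = [
    [58, 874, 15, 9],
    [222, 3927, 70, 20],
    [50, 2396, 34, 10],
    [7, 533, 7, 4],
]

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
n = sum(row_tot)

# Pearson chi-square with expected count = row total * column total / n.
chi2 = sum(
    (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
    / (row_tot[i] * col_tot[j] / n)
    for i in range(len(row_tot))
    for j in range(len(col_tot))
)

# All three measures scale chi-square by the sample size n.
k = min(len(row_tot), len(col_tot)) - 1
phi = math.sqrt(chi2 / n)
contingency_c = math.sqrt(chi2 / (chi2 + n))
cramers_v = math.sqrt(chi2 / (n * k))

print(f"chi-square = {chi2:.4f}, n = {n}")
print(f"Phi = {phi:.4f}, C = {contingency_c:.4f}, V = {cramers_v:.4f}")
```

Because each measure divides χ² by a quantity of order n, a χ² near 69 with n = 8236 still yields values below 0.1.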
5.
/* Same grouping of Percent into LicGrp as in Problem 1 */
libname mylib "U:\Documents\Stat479\";
data fuelnew;
set mylib.fueldat;
if Percent <= 54 then LicGrp=1;
else if 54 < Percent <= 58 then LicGrp=2;
else LicGrp=3;
label LicGrp="No. of Drivers";
run;
proc format;
value ig 1 = 'Low Income'
2 = 'Middle Income'
3 = 'High Income';
value lg 1='below 54%'
2='54 to 58%'
3='above 58%' ;
run;
ods rtf file="U:\Documents\Classwork\stat479\F13\as4n5_out.rtf";
proc freq data=fuelnew;
tables IncomGrp*LicGrp/chisq expected
cellchi2 nocol nopercent norow measures;
test gamma kentb scorr;
format IncomGrp ig. LicGrp lg.;
title 'Analysis of Association between Per Capita Income and Number of Licensed Drivers';
run;
ods rtf close;
Table of Incomgrp by LicGrp
Each cell shows: Frequency / Expected / Cell Chi-Square

Incomgrp              LicGrp (No. of Drivers)
(Per capita Income)   below 54%   54 to 58%   above 58%   Total
Low Income            7           5           1           13
                      3.7917      4.875       4.3333
                      2.7147      0.0032      2.5641
Middle Income         2           7           9           18
                      5.25        6.75        6
                      2.0119      0.0093      1.5
High Income           5           6           6           17
                      4.9583      6.375       5.6667
                      0.0004      0.0221      0.0196
Total                 14          18          16          48

Statistic                     DF   Value    Prob
Chi-Square                    4    8.8452   0.0651
Likelihood Ratio Chi-Square   4    9.8932   0.0423
Mantel-Haenszel Chi-Square    1    2.4728   0.1158
Phi Coefficient                    0.4293
Contingency Coefficient            0.3945
Cramer's V                         0.3035

WARNING: 44% of the cells have expected counts less than 5. Chi-Square may not
be a valid test.

Statistic                           Value    ASE
Gamma                               0.2814   0.1841
Kendall's Tau-b                     0.1941   0.1295
Stuart's Tau-c                      0.1927   0.1294
Somers' D C|R                       0.1945   0.1291
Somers' D R|C                       0.1937   0.1299
Pearson Correlation                 0.2294   0.1386
Spearman Correlation                0.2162   0.1456
Lambda Asymmetric C|R               0.1333   0.1642
Lambda Asymmetric R|C               0.1667   0.0913
Lambda Symmetric                    0.1500   0.1108
Uncertainty Coefficient C|R         0.0943   0.0526
Uncertainty Coefficient R|C         0.0946   0.0527
Uncertainty Coefficient Symmetric   0.0944   0.0526

Gamma
Gamma                   0.2814
ASE                     0.1841
95% Lower Conf Limit    -0.0795
95% Upper Conf Limit    0.6422
Test of H0: Gamma = 0
ASE under H0            0.1890
Z                       1.4891
One-sided Pr > Z        0.0682
Two-sided Pr > |Z|      0.1365

Kendall's Tau-b
Tau-b                   0.1941
ASE                     0.1295
95% Lower Conf Limit    -0.0598
95% Upper Conf Limit    0.4480
Test of H0: Tau-b = 0
ASE under H0            0.1303
Z                       1.4891
One-sided Pr > Z        0.0682
Two-sided Pr > |Z|      0.1365

Spearman Correlation Coefficient
Correlation             0.2162
ASE                     0.1456
95% Lower Conf Limit    -0.0693
95% Upper Conf Limit    0.5016
Test of H0: Correlation = 0
ASE under H0            0.1463
Z                       1.4776
One-sided Pr > Z        0.0698
Two-sided Pr > |Z|      0.1395

Sample Size = 48
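The ordinal measures reported above are built from counts of concordant and discordant pairs of observations. The following is a minimal sketch in Python (not part of the original SAS solution) with illustrative names. It uses the standard pair-count definitions of Gamma and Kendall's Tau-b and does not attempt the ASE or Z calculations.

```python
import math

# Observed counts from the Incomgrp by LicGrp table (rows: Low, Middle, High
# income; columns: below 54%, 54 to 58%, above 58%).
table = [
    [7, 5, 1],
    [2, 7, 9],
    [5, 6, 6],
]
rows, cols = len(table), len(table[0])

# Compare each cell with every cell in a lower row: a higher column means the
# pairs are concordant, a lower column means discordant, same column is a tie.
concordant = 0
discordant = 0
for i in range(rows):
    for j in range(cols):
        for k in range(i + 1, rows):
            for m in range(cols):
                if m > j:
                    concordant += table[i][j] * table[k][m]
                elif m < j:
                    discordant += table[i][j] * table[k][m]

gamma = (concordant - discordant) / (concordant + discordant)


def pairs(x):
    """Number of unordered pairs that can be formed from x observations."""
    return x * (x - 1) / 2


# Tau-b corrects the denominator for pairs tied on the row or column variable.
n = sum(map(sum, table))
row_ties = sum(pairs(sum(r)) for r in table)
col_ties = sum(pairs(sum(c)) for c in zip(*table))
tau_b = (concordant - discordant) / math.sqrt(
    (pairs(n) - row_ties) * (pairs(n) - col_ties)
)

print(f"concordant = {concordant}, discordant = {discordant}")
print(f"Gamma = {gamma:.4f}, Tau-b = {tau_b:.4f}")
```

Gamma ignores tied pairs entirely, which is why it is larger in magnitude than Tau-b for the same table.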
6.
libname mylib "U:\Documents\Stat479\";
ods pdf file="U:\Documents\Classwork\stat479\F13\as4n6_out.pdf";
proc univariate data=mylib.fueldat noprint;
var Roads Numlic;
histogram Roads Numlic;
probplot Roads/normal(mu=est sigma=est);
probplot Numlic/normal(mu=est sigma=est);
title 'Graphics for the Distributions of Roads and NumLic';
run;
ods pdf close;
See the attached graphical output from SAS. The interpretations are:
• The Roads distribution appears to be right-skewed, as shown clearly by the
histogram and the bowl (cup) shape of the normal probability plot.
• The Numlic distribution also appears to be right-skewed, as shown clearly by
the histogram and the bowl (cup) shape of the normal probability plot.