SELF EXAM I Answers

Statistical Methods I
Confidence Intervals and Sample Size Calculations
Self Check – Answers
Question 1: Refer to the CHOLEST dataset
a) Calculate the 95% interval for the cholesterol levels for the individuals who were
two days past a heart attack.
Answer:
This is a simple confidence interval using one sample mean. Using the SAS code
attached, you will generate the following output:
Analysis Variable : twoday twoday

Lower 95% CL for Mean    Upper 95% CL for Mean    Mean
234.4554788              265.2368289              249.8461538
This should be interpreted as – “We are 95% confident that the true mean cholesterol
level for people who are two days past a heart attack is between 234.46 and 265.24”.
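The limits above can also be rebuilt step by step, which makes the formula behind the CLM option visible: mean plus or minus the t critical value times the standard error. The following is a minimal sketch, assuming the same jlp library is assigned; the data set names work.stats and work.ci and the variables se, t_crit, lower and upper are illustrative names, not part of the attached code.

```sas
* Step 1: save n, the mean and the standard error of the mean;
proc means data=jlp.cholest noprint;
   var twoday;
   output out=work.stats n=n mean=mean stderr=se;
run;

* Step 2: build the 95% interval as mean +/- t * se, with n - 1 degrees of freedom;
data work.ci;
   set work.stats;
   t_crit = tinv(0.975, n - 1);   * 0.975 leaves 2.5% in each tail;
   lower  = mean - t_crit * se;
   upper  = mean + t_crit * se;
run;

proc print data=work.ci;
   var n mean se t_crit lower upper;
run;
```

The lower and upper values should agree with the Lower 95% and Upper 95% CL for Mean printed by Proc Means.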
b) Calculate the 95% interval for the cholesterol levels for the individuals who were
14 days past a heart attack.
Answer:
This is a simple confidence interval using one sample mean. Using the SAS code
attached, you will generate the following output:
Analysis Variable : fourteenday fourteenday

Lower 95% CL for Mean    Upper 95% CL for Mean    Mean
203.2742596              237.1701848              220.2222222
This should be interpreted as: “We are 95% confident that the true mean cholesterol
level for people who are 14 days past a heart attack is between 203.27 and 237.17”.
Question 2: Refer to the CHOLEST Dataset
a) Calculate the 99% interval for the cholesterol difference between the
measurements taken at 2 days after the attack and then again 4 days after the
attack (same people). Is zero included in the interval?
Answer:
This is a confidence interval using two paired sample means. Using the SAS code
attached, you will generate the following output:
Analysis Variable : Diff

Lower 99% CL for Mean    Upper 99% CL for Mean    Mean
2.9484804                35.5130581               19.2307692
This should be interpreted as: “We are 99% confident that the true mean difference in
cholesterol level between two days and four days past a heart attack is between 2.95
and 35.51”. Since the difference was calculated as two day minus four day and both
limits are positive, this indicates that the cholesterol levels go down from day 2 to
day 4. In addition, since zero is not included in the interval, a mean difference of 0
is not a plausible value at this level of confidence, so there appears to be a true
drop in cholesterol levels from day 2 to day 4.
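As a cross-check, the same paired interval can be requested without first creating a difference variable. This is a sketch only, assuming the twoday and fourday variables in jlp.cholest that the attached code uses; Proc Ttest forms the difference in the order given on the PAIRED statement (twoday minus fourday).

```sas
* Paired 99% interval for the mean of twoday - fourday;
proc ttest data=jlp.cholest alpha=0.01;
   paired twoday*fourday;
run;
```

Provided the same observations are used, the confidence limits reported for the mean difference should match those from Proc Means on the Diff variable.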
b) Calculate the 99% interval for the cholesterol difference between the
measurements taken at 4 days after the attack and then 14 days after the
attack (same people). Compare these results to what you found in a).
Answer:
This is a confidence interval using two paired sample means. Using the SAS code
attached, you will generate the following output:
Analysis Variable : Diff1

Lower 99% CL for Mean    Upper 99% CL for Mean    Mean
-12.0294231              23.1405342               5.5555556
This should be interpreted as: “We are 99% confident that the true mean difference in
cholesterol level between four days and fourteen days past a heart attack is between
-12.03 and 23.14”. The difference was calculated as four day minus fourteen day. Since
the interval runs from negative to positive values, the cholesterol levels may go up
(negative difference) or may go down (positive difference). Since 0 is included in the
interval, we cannot conclude that the cholesterol level changes as a patient moves
from day 4 after a heart attack to day 14. Comparing these results to part a), it
appears that cholesterol levels go down between day 2 and day 4 after a heart attack,
but we cannot say that the levels continue to go down after that.
Question 3: Refer to the PennState3 Dataset
a) Calculate the 95% interval for the proportion of students who proclaim to believe
in the supernatural.
Answer:
This is a simple one sample proportion confidence interval. Using the SAS Code
attached, you will generate the following output:
Supernat1    Frequency    Percent    Cumulative Frequency    Cumulative Percent
1-yes        142          73.96      142                     73.96
2-no         50           26.04      192                     100.00

Binomial Proportion for Supernat1 = 1-yes
Proportion                 0.7396
ASE                        0.0317
95% Lower Conf Limit       0.6775
95% Upper Conf Limit       0.8017

Exact Conf Limits
95% Lower Conf Limit       0.6715
95% Upper Conf Limit       0.8001
There is additional output created which is not required to answer the problem.
Using the first set of confidence limits (the asymptotic limits, which are based on the
ASE; the second set, the Exact Conf Limits, are Clopper-Pearson limits computed
directly from the binomial distribution), we would report that “We are 95% confident
that the true proportion of students who believe in the supernatural is between
67.75% and 80.17%.”
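The asymptotic limits can be checked by hand from the frequency table, since they are the sample proportion plus or minus z times the ASE. A minimal sketch, using the counts 142 and 192 from the output above; the variable names are illustrative.

```sas
* Asymptotic (Wald) 95% interval for one proportion, from the counts;
data _null_;
   x = 142;                       * number answering yes;
   n = 192;                       * number of non-missing responses;
   p = x / n;                     * sample proportion;
   ase = sqrt(p * (1 - p) / n);   * asymptotic standard error;
   z = probit(0.975);             * about 1.96 for 95% confidence;
   lower = p - z * ase;
   upper = p + z * ase;
   put p= ase= lower= upper=;
run;
```

The values written to the log should reproduce the Proportion, ASE and the first pair of limits (0.6775 and 0.8017) up to rounding.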
b) Calculate the 90% interval for the difference between males and females who
proclaim to believe in the supernatural.
Answer:
This is the confidence interval around the difference between two sample proportions.
Using the SAS Code attached, you will generate the following output (note that the
output below represents only portions of the complete output that will be generated
from SAS):
Table of Sex by Supernat

Each cell lists, from top to bottom: Frequency, Percent, Row Pct, Col Pct.

Sex        Supernat = no    Supernat = yes    Total
Female     28               101               129
           14.58            52.60             67.19
           21.71            78.29
           56.00            71.13
Male       22               41                63
           11.46            21.35             32.81
           34.92            65.08
           44.00            28.87
Total      50               142               192
           26.04            73.96             100.00

Frequency Missing = 35
From this first frequency table, we can see that the proportion of females who believe in
the supernatural is 78.29%, while the proportion of males who believe is 65.08%.
Because the “yes” response is located in the second column, we will examine the
Column 2 Risk Estimates table to find the Confidence Limits:
Column 2 Risk Estimates

                                   (Asymptotic) 90%       (Exact) 90%
              Risk      ASE        Confidence Limits      Confidence Limits
Row 1         0.7829    0.0363     0.7232    0.8426       0.7147    0.8411
Row 2         0.6508    0.0601     0.5520    0.7496       0.5402    0.7504
Total         0.7396    0.0317     0.6875    0.7917       0.6823    0.7912
Difference    0.1322    0.0702     0.0167    0.2476

Difference is (Row 1 - Row 2)
From this output, we would report the following: “We are 90% confident that the
difference between the proportion of women and the proportion of men who believe in
the supernatural (women minus men) is between 1.67 and 24.76 percentage points.”
Since the value of 0 is not included in the interval, we would conclude that there is
a true difference, with a higher proportion of women than men believing in the
supernatural.
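The Difference row can be verified by hand from the cell counts: the standard error of the difference combines the two row variances, and the 90% limits use z of about 1.645. This is a sketch with illustrative variable names, using the counts from the Sex by Supernat table.

```sas
* Asymptotic 90% interval for the difference of two proportions (Row 1 - Row 2);
data _null_;
   x1 = 101; n1 = 129;            * females answering yes, total females;
   x2 = 41;  n2 = 63;             * males answering yes, total males;
   p1 = x1 / n1;
   p2 = x2 / n2;
   diff = p1 - p2;
   ase = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2);
   z = probit(0.95);              * 5% in each tail for 90% confidence;
   lower = diff - z * ase;
   upper = diff + z * ase;
   put diff= ase= lower= upper=;
run;
```

The log values should reproduce the Difference row (0.1322, 0.0702, 0.0167, 0.2476) up to rounding.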
c) Returning to part a, what would the new sample size need to be if you wanted
to achieve the same Margin of Error at a 99% level of confidence? Please do this
calculation by hand.
Answer:
Let's begin this problem by identifying “p”, the proportion of people who said
“yes”. From the output, this was 73.96%. The upper end of the 95% Confidence Interval
was 80.17%. The difference between 73.96% and 80.17% is 6.21%. This is the same
value as the difference between 73.96% and the lower end of the interval:
73.96% - 67.75% = 6.21%. This value is the Margin of Error. If we want to achieve this
same Margin of Error at a higher level of confidence, we must increase the sample size.
The new sample size calculation is:
.0621 = 2.575*SQRT((.7396*.2604)/n) or
n = .7396*.2604*2.575^2 /.0621^2
n = 331.14, which rounds up to 332
This is the new sample size that we would need to achieve a Margin of Error of
.0621, and therefore the same interval as you found in part a), at the new 99% level
of confidence.
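The hand calculation can be checked with a short data step. This is a sketch under the same inputs as the text (p = .7396 and a Margin of Error of .0621); the variable names are illustrative, and probit(0.995) is used in place of the rounded table value 2.575, so n may differ slightly in the decimals.

```sas
* Sample size for a proportion at 99% confidence and a fixed margin of error;
data _null_;
   p = 0.7396;                    * proportion from part a;
   moe = 0.0621;                  * margin of error to hold fixed;
   z = probit(0.995);             * 0.5% in each tail for 99% confidence;
   n = (z**2) * p * (1 - p) / (moe**2);
   n_up = ceil(n);                * always round a sample size up;
   put z= n= n_up=;
run;
```

The value of n_up is the required sample size under these inputs.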
SAS CODE for Confidence Intervals Self Test
*Question 1 a) Calculate the 95% interval for the cholesterol levels for
the individuals who were 2 days past a heart attack;
ODS RTF;
Proc means data=jlp.cholest clm mean;
Var twoday;
Run;
ODS RTF CLOSE;
*Question 1 b)
Calculate the 95% interval for the cholesterol levels for
the individuals who were 14 days past a heart attack.;
ODS RTF;
Proc means data=jlp.cholest clm mean;
Var fourteenday;
Run;
ODS RTF CLOSE;
*Question 2 a)Calculate the 99% interval for the cholesterol difference
between the measurements taken at 2 days after the attack and then again 4
days after the attack (same people);
Data jlp.cholest1;
set jlp.cholest;
Diff = twoday - fourday;
run;
ODS RTF;
Proc means data=jlp.cholest1 clm mean alpha = .01;
Var diff;
Run;
ODS RTF CLOSE;
*Question 2 b) Calculate the 99% interval for the cholesterol difference
between the measurements taken at 4 days after the attack and then 14 days
after the attack (same people);
Data jlp.cholest1;
set jlp.cholest;
Diff1 = fourday - fourteenday;
run;
ODS RTF;
Proc means data=jlp.cholest1 clm mean alpha = .01;
Var diff1;
Run;
ODS RTF CLOSE;
/* Note: although not asked, you should have recognized that the data was
   not conveniently formatted to generate a two sample interval. To organize
   the data into a structure that would accommodate a two sample CI, such as
   the difference between the cholesterol level for the control group and
   the level at 2 days (2 different groups of people), you could have
   executed the following. */
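* Keep the control values and label them as one group;
DATA TEMP1;
SET jlp.cholest (KEEP = control);
RENAME control = cholest;
LENGTH GROUP $8;
GROUP = 'control';
RUN;
* Keep the two-day values and label them as the other group;
DATA TEMP2;
SET jlp.cholest (KEEP = twoday);
RENAME twoday = cholest;
LENGTH GROUP $8;
GROUP = 'twoday';
RUN;
* Stack both groups into one long data set with a grouping variable;
DATA jlp.cholest2;
SET TEMP1 TEMP2;
RUN;
PROC PRINT DATA = jlp.cholest2;
RUN;
ODS RTF;
* Two sample interval for the difference between the group means;
PROC TTEST DATA = jlp.cholest2;
CLASS group;
VAR Cholest;
RUN;
ODS RTF CLOSE;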
*Question 3 a)Calculate the 95% interval for the proportion of students who
proclaim to believe in the supernatural;
Data jlp.pennstate3a;
set jlp.pennstate3;
if Supernat = "yes" then Supernat1 = "1-yes";
else if Supernat = "no" then Supernat1 = "2-no";
Run;
ODS RTF;
Proc Freq data=jlp.pennstate3a;
Tables supernat1/binomial alpha=.05;
Run;
ODS RTF CLOSE;
*Question 3 b)Calculate the 90% interval for the difference between males and
females who proclaim to believe in the supernatural;
ODS RTF;
Proc Freq data=jlp.pennstate3;
Tables Sex*Supernat/CHISQ PDIFF alpha=.1;
Run;
ODS RTF CLOSE;