Statistical Methods I
Confidence Intervals and Sample Size Calculations
Self Check – Answers

Question 1: Refer to the CHOLEST dataset

a) Calculate the 95% interval for the cholesterol levels for the individuals who were two days past a heart attack.

Answer: This is a simple confidence interval using one sample mean. Using the SAS code attached, you will generate the following output:

Analysis Variable : twoday twoday

Lower 95% CL for Mean    Upper 95% CL for Mean    Mean
234.4554788              265.2368289              249.8461538

This should be interpreted as: “We are 95% confident that the true mean cholesterol level for people who are two days past a heart attack is between 234.46 and 265.24.”

b) Calculate the 95% interval for the cholesterol levels for the individuals who were 14 days past a heart attack.

Answer: This is a simple confidence interval using one sample mean. Using the SAS code attached, you will generate the following output:

Analysis Variable : fourteenday fourteenday

Lower 95% CL for Mean    Upper 95% CL for Mean    Mean
203.2742596              237.1701848              220.2222222

This should be interpreted as: “We are 95% confident that the true mean cholesterol level for people who are fourteen days past a heart attack is between 203.27 and 237.17.”

Question 2: Refer to the CHOLEST Dataset

a) Calculate the 99% interval for the cholesterol difference between the measurements taken at 2 days after the attack and then again 4 days after the attack (same people). Is zero included in the interval?

Answer: This is a confidence interval using two paired sample means. Using the SAS code attached, you will generate the following output:

Analysis Variable : Diff

Lower 99% CL for Mean    Upper 99% CL for Mean    Mean
2.9484804                35.5130581               19.2307692

This should be interpreted as: “We are 99% confident that the true mean difference in cholesterol level between two days past a heart attack and four days past a heart attack is between 2.95 and 35.51.”
Since the difference was calculated as two day - four day, and both limits are positive, this indicates that cholesterol levels go down from day 2 to day 4. In addition, since zero is not included in the interval, it appears that there is a true drop in cholesterol levels from day 2 to day 4; a mean difference of 0 is not a plausible value.

b) Calculate the 99% interval for the cholesterol difference between the measurements taken at 4 days after the attack and then 14 days after the attack (same people). Compare these results to what you found in a).

Answer: This is a confidence interval using two paired sample means. Using the SAS code attached, you will generate the following output:

Analysis Variable : Diff1

Lower 99% CL for Mean    Upper 99% CL for Mean    Mean
-12.0294231              23.1405342               5.5555556

This should be interpreted as: “We are 99% confident that the true mean difference in cholesterol level between four days past a heart attack and fourteen days past a heart attack is between -12.03 and 23.14.”

The difference was calculated as four day - fourteen day. Since the interval runs from negative to positive values, cholesterol levels may go up (negative difference) or may go down (positive difference). Since 0 is included in the interval, we cannot conclude that the cholesterol level changes as a patient moves from day 4 after a heart attack to day 14.

Comparing these results to part a), it appears that cholesterol levels go down between day 2 and day 4 after a heart attack, but we cannot say that the levels continue to go down after that.

Question 3: Refer to the PennState3 Dataset

a) Calculate the 95% interval for the proportion of students who proclaim to believe in the supernatural.

Answer: This is a simple one sample proportion confidence interval.
Using the SAS Code attached, you will generate the following output:

Supernat1    Frequency    Percent    Cumulative Frequency    Cumulative Percent
1-yes        142          73.96      142                     73.96
2-no         50           26.04      192                     100.00

Binomial Proportion for Supernat1 = 1-yes

Proportion                0.7396
ASE                       0.0317
95% Lower Conf Limit      0.6775
95% Upper Conf Limit      0.8017

Exact Conf Limits
95% Lower Conf Limit      0.6715
95% Upper Conf Limit      0.8001

There is additional output created which is not required to answer the problem. Using the first set of confidence limits (the second set, the Exact Confidence Limits, is computed directly from the binomial distribution by the Clopper-Pearson method rather than from the normal approximation), we would report: “We are 95% confident that the true proportion of students who believe in the supernatural is between 67.75% and 80.17%.”

b) Calculate the 90% interval for the difference between males and females who proclaim to believe in the supernatural.

Answer: This is the confidence interval around the difference between two sample proportions. Using the SAS Code attached, you will generate the following output (note that the output below represents only portions of the complete output that will be generated from SAS):

Table of Sex by Supernat
(each cell shows Frequency, Percent, Row Pct, Col Pct)

Sex         no       yes      Total
Female      28       101      129
            14.58    52.60    67.19
            21.71    78.29
            56.00    71.13
Male        22       41       63
            11.46    21.35    32.81
            34.92    65.08
            44.00    28.87
Total       50       142      192
            26.04    73.96    100.00

Frequency Missing = 35

From this first frequency table, we can see that the proportion of females who believe in the supernatural is 78.29%, while the proportion of males who believe is 65.08%.
Because the “yes” response is located in the second column, we will examine the Column 2 Risk Estimates table to find the Confidence Limits:

Column 2 Risk Estimates

                                 (Asymptotic) 90%        (Exact) 90%
              Risk      ASE      Confidence Limits       Confidence Limits
Row 1         0.7829    0.0363   0.7232    0.8426        0.7147    0.8411
Row 2         0.6508    0.0601   0.5520    0.7496        0.5402    0.7504
Total         0.7396    0.0317   0.6875    0.7917        0.6823    0.7912
Difference    0.1322    0.0702   0.0167    0.2476

Difference is (Row 1 - Row 2)

From this output, we would report the following: “We are 90% confident that the difference (females minus males) in the proportion who believe in the supernatural is between 1.67 and 24.76 percentage points.” Since the value of 0 is not included in the interval, we would conclude that there is a true difference in the proportion of men and women who believe in the supernatural, with the proportion higher among women.

c) Returning to part a), what would the new sample size need to be if you wanted to achieve the same Margin of Error at a 99% level of confidence? Please do this calculation by hand.

Answer: Let's begin this problem by identifying “p”, the proportion of people who said “yes”. From the output, this was 73.96%. The upper end of the 95% Confidence Interval was 80.17%. The difference between 80.17% and 73.96% is 6.21%. This is the same value as the difference between 73.96% and the lower end of the interval: 73.96% - 67.75% = 6.21%. This value is the Margin of Error. If we want to achieve this same Margin of Error at a higher level of confidence, we must increase the sample size. The new sample size calculation is:

0.0621 = 2.575 * SQRT((0.7396 * 0.2604) / n)

or

n = 0.7396 * 0.2604 * 2.575^2 / 0.0621^2
n = 331.14, which rounds up to 332

This is the new sample size that we would need to achieve a Margin of Error of 0.0621, and therefore the same interval as you found in part a), at the new 99% level of confidence.
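The hand calculation in Question 3 c) can also be checked in a short DATA step. This is only a sketch and is not part of the attached code: the variable names (p, moe, z, n_raw, n) are made up for illustration, p and moe are the rounded values read from the PROC FREQ output, and z = 2.575 is the table value used in the answer above.

```sas
* Sketch: sample size needed to keep the same margin of error at 99% confidence;
data _null_;
  p     = 0.7396;                      * sample proportion from part a;
  moe   = 0.0621;                      * margin of error to preserve;
  z     = 2.575;                       * z value for 99% confidence, from a table;
  n_raw = p * (1 - p) * z**2 / moe**2; * solve moe = z*sqrt(p*(1-p)/n) for n;
  n     = ceil(n_raw);                 * always round a sample size up;
  put n_raw= 8.2 n=;                   * results are written to the SAS log;
run;
```

Rounding up with CEIL, rather than to the nearest integer, keeps the margin of error at or below the target. Replacing the table value with probit(0.995) would use a slightly more precise z.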
SAS CODE for Confidence Intervals Self Test

*Question 1 a) Calculate the 95% interval for the cholesterol levels for the individuals who were 2 days past a heart attack;
ODS RTF;
Proc means data=jlp.cholest clm mean;
  Var twoday;
Run;
ODS RTF CLOSE;

*Question 1 b) Calculate the 95% interval for the cholesterol levels for the individuals who were 14 days past a heart attack.;
ODS RTF;
Proc means data=jlp.cholest clm mean;
  Var fourteenday;
Run;
ODS RTF CLOSE;

*Question 2 a) Calculate the 99% interval for the cholesterol difference between the measurements taken at 2 days after the attack and then again 4 days after the attack (same people);
Data jlp.cholest1;
  set jlp.cholest;
  Diff = twoday - fourday;
Run;
ODS RTF;
Proc means data=jlp.cholest1 clm mean alpha=.01;
  Var diff;
Run;
ODS RTF CLOSE;

*Question 2 b) Calculate the 99% interval for the cholesterol difference between the measurements taken at 4 days after the attack and then 14 days after the attack (same people);
Data jlp.cholest1;
  set jlp.cholest;
  Diff1 = fourday - fourteenday;
Run;
ODS RTF;
Proc means data=jlp.cholest1 clm mean alpha=.01;
  Var diff1;
Run;
ODS RTF CLOSE;

*****************************************************************************;
* Note: although not asked, you should have recognized that the data was not
  conveniently formatted to generate a two sample interval. To organize the
  data into a structure that would accommodate a two sample CI, such as the
  difference between the cholesterol level for the control group and the
  level at 2 days (2 different groups of people), you could have executed
  the following;
*****************************************************************************;
DATA TEMP1;
  SET jlp.cholest (KEEP = control);
  RENAME control = cholest;
  LENGTH GROUP $8;
  GROUP = 'control';
RUN;
DATA TEMP2;
  SET jlp.cholest (KEEP = twoday);
  RENAME twoday = cholest;
  LENGTH GROUP $8;
  GROUP = 'twoday';
RUN;
DATA jlp.cholest2;
  SET TEMP1 TEMP2;
RUN;
PROC PRINT;
RUN;
ODS RTF;
PROC TTEST DATA = jlp.cholest2;
  CLASS group;
  VAR Cholest;
RUN;
QUIT;
ODS RTF CLOSE;

*Question 3 a) Calculate the 95% interval for the proportion of students who proclaim to believe in the supernatural;
Data jlp.pennstate3a;
  set jlp.pennstate3;
  if Supernat = "yes" then Supernat1 = "1-yes";
  else if Supernat = "no" then Supernat1 = "2-no";
Run;
ODS RTF;
Proc Freq data=jlp.pennstate3a;
  Tables supernat1 / binomial alpha=.05;
Run;
ODS RTF CLOSE;

*Question 3 b) Calculate the 90% interval for the difference between males and females who proclaim to believe in the supernatural;
ODS RTF;
Proc Freq data=jlp.pennstate3;
  Tables Sex*Supernat / CHISQ PDIFF alpha=.1;
Run;
ODS RTF CLOSE;
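
As a cross-check on the Question 3 b) output, the asymptotic interval for the difference can be rebuilt from the cell counts in the Sex by Supernat table. This is a sketch under stated assumptions, not part of the original program: the variable names are illustrative, and the formula is the usual unpooled (Wald) standard error for a difference of two proportions.

```sas
* Sketch: 90% Wald interval for the female minus male difference in proportions;
data _null_;
  x1 = 101; n1 = 129;                  * females answering yes, total females;
  x2 = 41;  n2 = 63;                   * males answering yes, total males;
  p1 = x1 / n1;
  p2 = x2 / n2;
  diff  = p1 - p2;                     * Row 1 minus Row 2, as in the SAS table;
  ase   = sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2);
  z     = probit(1 - 0.10/2);          * z value for 90% confidence;
  lower = diff - z * ase;
  upper = diff + z * ase;
  put diff= 6.4 ase= 6.4 lower= 6.4 upper= 6.4;  * results go to the SAS log;
run;
```

The values written to the log should agree, up to rounding, with the Difference row of the Column 2 Risk Estimates table (0.1322, 0.0702, 0.0167, 0.2476).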