Stat 250.3 November 12, 2003
Activity 8: CIs and Hypothesis Tests in Minitab

PART I: CI and hypothesis test for one proportion.

Whenever you want to calculate a CI or test a hypothesis for one proportion in MINITAB, select Stat > Basic Statistics > 1 Proportion…. If the data are in the MINITAB worksheet, just select the variable of interest (remember that this variable must have ONLY 2 levels, i.e., two possible values). If the data are summarized (you already know the number of successes and the sample size), select Summarized data and fill in the relevant information.

For a CI, under Options select the confidence level (95%, 99%, etc.), set the alternative to "not equal", AND MAKE SURE you select "Use test and interval based on normal distribution". If you just want a CI, you do not need to specify a value for the test proportion.

For a hypothesis test, under Options you can specify the test proportion (p0) and the alternative hypothesis. Again, MAKE SURE you select "Use test and interval based on normal distribution".

Situation 1: Penn State University states that 30% of its classes have 20 or fewer students. A random sample of 200 classes revealed that 49 had 20 or fewer students.

A) Conduct a hypothesis test to determine whether the proportion of classes with 20 or fewer students is different from 30%. [State the null and alternative hypotheses, check the conditions, and use the Minitab output to draw your conclusions using alpha = 0.05.]

    Test and CI for One Proportion

    Test of p = 0.3 vs p not = 0.3

    Sample   X    N  Sample p        95.0% CI          Z-Value  P-Value
    1       49  200  0.245000  (0.185394, 0.304606)     -1.70    0.090

H0: p = .30 vs Ha: p ≠ .30

Conditions: We assume that the sample is random, and np0 = 200(.3) = 60 > 10 and n(1 - p0) = 200(.7) = 140 > 10, so the conditions are satisfied.

p-value = .090 > .05. Therefore, we fail to reject the null hypothesis: we cannot conclude that the proportion of classes with 20 or fewer students differs from 30%.
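If you want to check Minitab's one-proportion numbers by hand, the sketch below applies the large-sample normal formulas: the z statistic uses p0 in the standard error, while the interval uses the sample proportion. The function name `one_proportion_z` is hypothetical (this is not Minitab's code), and the sketch assumes the normal-approximation conditions checked above hold.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0, alternative="two-sided", conf=0.95):
    """Large-sample z test and normal-based (Wald) CI for one proportion.

    x: number of successes, n: sample size, p0: hypothesized proportion.
    alternative: "two-sided", "less", or "greater".
    Returns (p_hat, z, p_value, (ci_low, ci_high)); the CI is always two-sided.
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    p_hat = x / n

    # Test statistic: the standard error uses the null value p0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")

    # Confidence interval: the standard error uses the sample proportion p_hat.
    z_star = std_normal.inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z, p_value, (p_hat - margin, p_hat + margin)


# Situation 1: 49 of 200 sampled classes, H0: p = 0.30 vs Ha: p != 0.30
p_hat, z, p_value, ci = one_proportion_z(49, 200, 0.30)
print(f"p-hat = {p_hat:.6f}  z = {z:.2f}  p-value = {p_value:.3f}")
print(f"95% CI: ({ci[0]:.6f}, {ci[1]:.6f})")
```

With the Situation 1 counts this gives z = -1.70, p-value = 0.090 and the interval (0.185394, 0.304606), in line with the Minitab output. The same function can be used for Situation 2, e.g. `one_proportion_z(35, 1000, 0.05, alternative="less")`.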
B) DRAW an illustration of the p-value, label the test statistic, and shade in the appropriate regions.

[Figure: standard normal density curve (x-axis: Z-Stat, y-axis: Density) with the test statistic -1.7 and its mirror value 1.7 marked and both tails beyond them shaded; p-value = .09]

C) Report and interpret the 95% C.I. of the proportion of classes with 20 or fewer students.

The 95% C.I. is (0.185, 0.305). We are 95% confident that between 18.5% and 30.5% of PSU classes have 20 or fewer students.

Situation 2: A trial is undertaken to determine the effectiveness of a new anti-cavity gum. Of 1000 patients who use the gum, only 35 get cavities. The company wants to show that the population proportion of people who get cavities is less than 5%.

A) Conduct an appropriate hypothesis test. [State the null and alternative hypotheses, check the conditions, and use the Minitab output to draw your conclusions (use alpha = 0.05).]

    Test and CI for One Proportion

    Test of p = 0.05 vs p < 0.05

                                     95.0%
    Sample   X     N  Sample p  Upper Bound  P-Value
    1       35  1000  0.035000     0.046138    0.014

H0: p = .05 vs Ha: p < .05

Conditions: We assume that the sample is random. Also, np0 = 1000(.05) = 50 > 10 and n(1 - p0) = 1000(.95) = 950 > 10, so the conditions are satisfied.

p-value = .014 < .05. Therefore, we reject the null hypothesis and conclude that the proportion of people who get cavities is less than 5%.

B) Based on your conclusion, what type of error could you possibly have made?

Type I error: we rejected the null hypothesis and concluded the alternative, so the only error we could have made is rejecting a null hypothesis that is actually true.

C) Report and interpret the 95% C.I. of the proportion of people who get cavities. (To obtain a CI you must set the alternative to "not equal".)

    Test of p = 0.05 vs p not = 0.05

    Sample   X     N  Sample p        95.0% CI          Z-Value  P-Value
    1       35  1000  0.035000  (0.023609, 0.046391)     -2.18    0.030

We are 95% confident that the proportion of people who get cavities is between 2.4% and 4.6%.

PART II: CI for 1 mean and 1-sample t-test.

First, the data should be in the MINITAB worksheet.
Whenever you want to calculate a confidence interval or test a hypothesis for one mean in MINITAB, select Stat > Basic Statistics > 1-Sample t…. Then select the variable of interest in the "Variable" box. For a CI, under "Options" select the confidence level (95%, 99%, etc.) and set the alternative to "not equal". If you just want a CI, you do not need to specify a value for the test mean. For a hypothesis test, specify a value for the test mean, and under "Options" specify the alternative hypothesis.

First, download the survey data from the course web site, then copy and paste it into the Minitab worksheet.

Situation 3:

A) Construct a 90% confidence interval for the mean GPA of all Stat 200 students.

Use Stat > Basic Statistics > 1-Sample t… and under Options change the confidence level to 90%. You do NOT have to specify a test mean! Note that the output reproduced below shows a 95% interval; with the confidence level set to 90%, Minitab reports a narrower 90.0% CI.

    One-Sample T: GPA

    Variable    N    Mean   StDev  SE Mean       95.0% CI
    GPA       206  3.0013  0.5093   0.0355  (2.9314, 3.0713)

B) If you were going to do this problem by hand, you could have obtained the necessary components (x-bar, s, n) by selecting Stat > Basic Statistics > Display Descriptive Statistics. Do this now for the variable GPA. Notice that the standard error of the mean is also provided in the output. Verify MINITAB's calculation of SE Mean using the sample standard deviation and n.

    Descriptive Statistics: GPA

    Variable    N  N*    Mean  Median  TrMean   StDev  SE Mean
    GPA       206   2  3.0013  3.0500  3.0168  0.5093   0.0355

    Variable  Minimum  Maximum      Q1      Q3
    GPA        0.0000   3.9500  2.6650  3.3425

Note that s.e.(x-bar) = s/sqrt(n) = 0.5093/sqrt(206) = 0.0355.

C) Test the hypothesis that the mean GPA for Stat 200 students is greater than 3.00.
Use Stat > Basic Statistics > 1-Sample t…, enter 3 as the test mean, and under Options set the alternative to "greater than".

    One-Sample T: GPA

    Test of mu = 3 vs mu > 3

                                               95.0%
    Variable    N    Mean   StDev  SE Mean  Lower Bound     T      P
    GPA       206  3.0013  0.5093   0.0355       2.9427  0.04  0.485

H0: μ = 3.00 vs Ha: μ > 3.00. From the output, the test statistic is t = 0.04 and the p-value = 0.485, which is large, so we fail to reject the null. There is not enough evidence to claim that the mean GPA for Stat 200 students is greater than 3.00.

Situation 4:

A) Consider the variables height and ideal height. What type of data structure is this?

Paired data (quantitative): each person provides both an actual height and an ideal height.

B) Construct a 95% confidence interval for the mean difference between ideal height and height.

First compute the differences between the columns 'Ideal Height' and 'Height(in)': use Calc > Calculator…, store the result in a variable named "Difference", and enter the expression 'Ideal Height' - 'Height(in)'. Then compute a CI for one mean on 'Difference', following the same steps as above. This is partial Minitab output (here the differences were stored in column C50):

    Variable    N   Mean  StDev  SE Mean       95.0% CI
    C50       206  2.002  7.190    0.501  (1.015, 2.990)

C) Write a sentence that interprets this interval.

We are 95% confident that the mean difference between ideal and actual height (ideal minus actual) is between 1.015 and 2.990 inches. Because the interval does not contain 0, we can reject the null hypothesis that the mean difference is 0 (H0: μd = 0) and claim that it is NOT EQUAL to 0 (Ha: μd ≠ 0) at alpha = 0.05. Also, since the entire interval lies above 0, we can reject H0: μd = 0 and claim that the mean difference is GREATER than 0 (Ha: μd > 0) at alpha = 0.025.
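To reproduce the Part II numbers by hand (Situation 3's GPA test and the paired-difference interval in Situation 4), the sketch below computes the one-sample t statistic, p-value and t interval from the summary statistics x-bar, s and n. Since the Python standard library has no t distribution, the helpers `t_cdf` and `t_quantile` (hypothetical names) integrate the t density numerically and invert it by bisection; this is an illustrative assumption, not how Minitab computes it, and the quantile search assumes the critical value is below 50 (true for the large df here).

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + t * t / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on the density from 0 to t (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(prob, df):
    """Value q with P(T <= q) = prob, for 0.5 <= prob < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t(mean, sd, n, mu0=0.0, alternative="two-sided", conf=0.95):
    """One-sample t test and two-sided t interval from summary statistics."""
    se = sd / sqrt(n)  # standard error of x-bar
    df = n - 1
    t = (mean - mu0) / se
    if alternative == "two-sided":
        p_value = 2 * (1 - t_cdf(abs(t), df))
    elif alternative == "greater":
        p_value = 1 - t_cdf(t, df)
    elif alternative == "less":
        p_value = t_cdf(t, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    t_star = t_quantile(1 - (1 - conf) / 2, df)
    ci = (mean - t_star * se, mean + t_star * se)  # two-sided, whatever the alternative
    return se, t, max(p_value, 0.0), ci


# Situation 3: GPA summary statistics from the Descriptive Statistics output
se, t, p, ci90 = one_sample_t(3.0013, 0.5093, 206, mu0=3.0, alternative="greater", conf=0.90)
print(f"SE = {se:.4f}  t = {t:.2f}  p = {p:.3f}  90% CI = ({ci90[0]:.4f}, {ci90[1]:.4f})")

# Situation 4: paired differences (ideal height minus height), H0: mu_d = 0
se_d, t_d, p_d, ci_d = one_sample_t(2.002, 7.190, 206)
print(f"SE = {se_d:.3f}  t = {t_d:.2f}  p = {p_d:.4f}  95% CI = ({ci_d[0]:.3f}, {ci_d[1]:.3f})")
```

The GPA call gives SE = 0.0355, t = 0.04 and p = 0.485, and a 90% interval of roughly (2.943, 3.060). Its lower end agrees with the 95% lower bound 2.9427 in part C, because a one-sided 95% bound uses the same t critical value as a two-sided 90% interval.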
Minitab output for H0: μd = 0 vs Ha: μd ≠ 0 ("Test mean" in this case is 0):

    One-Sample T: Difference

    Test of mu = 0 vs mu not = 0

    Variable      N   Mean  StDev  SE Mean       95.0% CI         T      P
    Difference  206  2.002  7.190    0.501  (1.015, 2.990)  4.00  0.000

Minitab output for H0: μd = 0 vs Ha: μd > 0 ("Test mean" in this case is 0):

    One-Sample T: Difference

    Test of mu = 0 vs mu > 0

                                              95.0%
    Variable      N   Mean  StDev  SE Mean  Lower Bound     T      P
    Difference  206  2.002  7.190    0.501        1.175  4.00  0.000

PART III: CI for the difference of 2 independent means and 2-sample t-test.

Whenever you want to calculate a confidence interval or test a hypothesis for the difference between 2 means in MINITAB, select Stat > Basic Statistics > 2-Sample t…. There are essentially two ways to enter the data when using Minitab for two-sample procedures. The most natural way (in my opinion) is to enter the two samples into two different columns; for example, put sample 1 in column C1 and sample 2 in column C2. The other way is to put all of the observations into one column, say C1, and then specify a "subscripting variable" in column C2. The subscripting variable can be a column containing, for example, 1's and 2's indicating whether the corresponding observation in C1 comes from sample 1 or sample 2. The contents of C2 can also be non-numeric, e.g., A's and B's, or male and female.

If the samples are in one column, click the circle next to "Samples in one column". Enter the column of data in "Samples" and the column with the subscripting variable in "Subscripts". If the samples are in different columns, click the circle next to "Samples in different columns". Enter the column of data for the first sample in "First" and the column for the second sample in "Second". Confidence intervals and hypothesis tests are obtained in the same manner as in the previous cases.
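As a hand check of the two-sample procedure, the sketch below computes the unpooled (Welch) two-sample t statistic, the Welch-Satterthwaite degrees of freedom and the t interval from summary statistics, using the group sizes, means and standard deviations from the height-by-gender output in this part. The DF = 201 in that output (rather than 112 + 94 - 2 = 204) suggests the unpooled procedure. The function names are hypothetical, and the t distribution helpers are a numerical-integration assumption, not Minitab's implementation.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + t * t / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on the density from 0 to t."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(prob, df):
    """Bisection for q with P(T <= q) = prob (0.5 <= prob < 1, q assumed < 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < prob else (lo, mid)
    return (lo + hi) / 2


def welch_two_sample_t(m1, s1, n1, m2, s2, n2, conf=0.95):
    """Unpooled (Welch) two-sample t test of mu1 - mu2 = 0 and t interval."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each mean
    se = sqrt(v1 + v2)
    # Welch-Satterthwaite approximate degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = m1 - m2
    t = diff / se
    p_value = max(2 * (1 - t_cdf(abs(t), df)), 0.0)
    margin = t_quantile(1 - (1 - conf) / 2, df) * se
    return diff, df, t, p_value, (diff - margin, diff + margin)


# Height(in) by Gender: sample 1 = female, sample 2 = male
diff, df, t, p, ci = welch_two_sample_t(64.08, 3.70, 112, 70.27, 3.50, 94)
print(f"diff = {diff:.2f}  df = {df:.1f}  t = {t:.2f}  p = {p:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the means in the output are rounded to two decimals, the sketch gives an estimate of -6.19 rather than -6.191, a df of about 201.1 (shown as 201 by Minitab), t of about -12.32 and an interval within about 0.002 of (-7.182, -5.200).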
Situation 5:

A) Construct a 95% confidence interval for the difference in the average heights of males and females.

Use Stat > Basic Statistics > 2-Sample t… with "Samples in one column": the "Samples" field should contain the response variable (Height(in)), and the "Subscripts" field the categorical variable (Gender).

    Two-Sample T-Test and CI: Height(in), Gender

    Two-sample T for Height(in)

    Gender    N   Mean  StDev  SE Mean
    female  112  64.08   3.70     0.35
    male     94  70.27   3.50     0.36

    Difference = mu (female) - mu (male)
    Estimate for difference: -6.191
    95% CI for difference: (-7.182, -5.200)
    T-Test of difference = 0 (vs not =): T-Value = -12.32  P-Value = 0.000  DF = 201

So a 95% C.I. for the difference between the two means (female minus male) is (-7.182, -5.200).

B) Write a sentence that interprets this confidence interval.

We are 95% confident that the average height for females is between 5.20 and 7.18 inches lower than the average height for males.

PART IV: CI and hypothesis test for 2 proportions.

As with the 1-proportion procedures in Minitab, we will enter the data into Minitab in summarized form. That is, we only need the sample size and the number of successes from each sample. If the data are not summarized, you can select Stat > Tables > Cross Tabulation…, and in Classification variables select the column with the response variable (the characteristic of interest) and the column with the subscripting variable (indicating which population each unit comes from).

Whenever you want to calculate a CI or test a hypothesis for 2 proportions in MINITAB, select Stat > Basic Statistics > 2 Proportions…, then select Summarized data and fill in the relevant information for the two samples. Confidence intervals and hypothesis tests are obtained in the same manner as in the previous cases. If you are interested in a test for the 2 proportions, make sure to select "Use pooled estimate of p for the test".

Situation 6:

A) Consider the variables Gender and DUI.
Construct a 90% confidence interval for the difference in the proportion of DUI in the past between males and females.

First, let's get the data summary. Select Stat > Tables > Cross Tabulation…, and in Classification variables select Gender and DUI. We obtain the following output:

    Tabulated Statistics: Gender, DUI

    Rows: Gender   Columns: DUI

               No   Yes   All
    female     68    46   114
    male       28    65    93
    All        96   111   207

    Cell Contents -- Count

Then use Stat > Basic Statistics > 2 Proportions…, select Summarized data, and fill in the boxes (first sample: males, 93 trials and 65 events; second sample: females, 114 trials and 46 events). Click Options, and select confidence level 90.0 and the "not equal" alternative. We have the following output:

    Test and CI for Two Proportions

    Sample   X    N  Sample p
    1       65   93  0.698925
    2       46  114  0.403509

    Estimate for p(1) - p(2): 0.295416
    90% CI for p(1) - p(2): (0.186632, 0.404200)
    Test for p(1) - p(2) = 0 (vs not = 0): Z = 4.47  P-Value = 0.000

Thus the 90% CI for the difference in the proportions of DUI (males minus females) is (0.186632, 0.404200).

B) Test the hypothesis that the proportion of males with a DUI is higher than the proportion of females with a DUI.

Follow similar steps as in part (A). In the box "Test difference" leave the default value 0.0, select the "greater than" alternative, and select "Use pooled estimate of p for the test". We have the following output:

    Test and CI for Two Proportions

    Sample   X    N  Sample p
    1       65   93  0.698925
    2       46  114  0.403509

    Estimate for p(1) - p(2): 0.295416
    90% lower bound for p(1) - p(2): 0.210659
    Test for p(1) - p(2) = 0 (vs > 0): Z = 4.24  P-Value = 0.000

The z-statistic is 4.24 and the p-value is very small (0.000), so the test is significant: we reject H0: p1 - p2 = 0 and conclude that the proportion of males with a DUI is higher than that of females. Note that using the CI in part A, (0.186632, 0.404200), we could also reject the null hypothesis (since 0 is not in the interval) and conclude Ha: p1 - p2 > 0 (since the whole interval is above 0) at alpha = (1 - .90)/2 = .10/2 = .05.
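To see where the numbers in the DUI situation come from, the sketch below computes the normal-based interval and one-sided lower bound for p1 - p2 with the unpooled standard error, and the z statistic both unpooled and with the pooled estimate of p. The function name `two_proportions` is hypothetical and the code is a hand check under the large-sample normal approximation, not Minitab's implementation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportions(x1, n1, x2, n2, conf=0.90):
    """Normal-based CI and z tests for p1 - p2 from summarized counts."""
    std_normal = NormalDist()
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Unpooled standard error: used for the interval and the one-sided bound.
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_two_sided = std_normal.inv_cdf(1 - (1 - conf) / 2)
    ci = (diff - z_two_sided * se_unpooled, diff + z_two_sided * se_unpooled)
    lower_bound = diff - std_normal.inv_cdf(conf) * se_unpooled

    # Pooled standard error: estimates the common p under H0: p1 = p2.
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z_pooled = diff / se_pooled

    return {
        "diff": diff,
        "ci": ci,
        "lower_bound": lower_bound,
        "z_unpooled": diff / se_unpooled,
        "z_pooled": z_pooled,
        "p_greater_pooled": 1 - std_normal.cdf(z_pooled),  # Ha: p1 - p2 > 0
    }


# Sample 1 = males (65 DUI out of 93), sample 2 = females (46 out of 114)
res = two_proportions(65, 93, 46, 114, conf=0.90)
print(f"estimate = {res['diff']:.6f}")
print(f"90% CI = ({res['ci'][0]:.6f}, {res['ci'][1]:.6f})")
print(f"90% lower bound = {res['lower_bound']:.6f}")
print(f"z (unpooled) = {res['z_unpooled']:.2f}  z (pooled) = {res['z_pooled']:.2f}")
```

With these counts the sketch reproduces the estimate 0.295416, the 90% CI (0.186632, 0.404200), the lower bound 0.210659, Z = 4.47 for the unpooled test in part A and Z = 4.24 for the pooled test in part B, which is why selecting the pooled option changes only the test statistic, not the interval.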