Lab Activity #11: More on Confidence Intervals and Hypothesis Tests

Stat 250.3
November 12, 2003
Activity 8: CIs and Hypothesis Tests in Minitab.
PART I: CI and hypothesis test for one proportion.
Whenever you want to calculate a CI or test a hypothesis for 1 proportion in MINITAB, select Stat>Basic Statistics>1 Proportion… If the data is in the MINITAB worksheet, just select the variable of interest (remember that this variable
must have ONLY 2 levels, i.e., two possible values). If the data is summarized (you already know the number of successes
and the sample size), select Summarized data and fill in the relevant information.
- For a CI, under Options select the confidence level (95%, 99%, etc.), set the alternative to “not equal”, AND MAKE SURE you select “Use test and interval based on normal distribution”. If you just want a CI, you do not need to specify a value for the test proportion.
- For a hypothesis test, under Options specify the test proportion (p0) and the alternative hypothesis. MAKE SURE you select “Use test and interval based on normal distribution”.
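As a cross-check on Minitab, the normal-based test and interval can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the function name `one_prop_z` is hypothetical, and it assumes Minitab's normal option uses p0 in the test's standard error and the sample proportion in the interval's standard error. Under those assumptions it agrees with the Situation 1 output below up to rounding.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z(x, n, p0, conf=0.95):
    """Two-sided one-proportion z-test and normal-approximation CI (hypothetical helper)."""
    std_normal = NormalDist()
    p_hat = x / n
    # Test statistic: standard error computed under H0, i.e., with p0
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Interval: standard error computed with the sample proportion
    z_star = std_normal.inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - margin, p_hat + margin), z, p_value

# Situation 1: 49 of 200 classes, H0: p = 0.30 vs Ha: p != 0.30
p_hat, ci, z, p = one_prop_z(49, 200, 0.30)
print(f"Sample p={p_hat:.6f} CI=({ci[0]:.6f}, {ci[1]:.6f}) Z={z:.2f} P={p:.3f}")
```

The same call with `one_prop_z(35, 1000, 0.05)` gives the two-sided interval requested in Situation 2, part C.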
Situation 1:
The Penn State University states that 30% of its classes have sizes of 20 or smaller. A random sample of 200 classes
revealed that 49 had class sizes of 20 or less.
A) Conduct a hypothesis test to determine whether the proportion of classes with 20 or fewer students is different from
30%. [State the null and alternative hypotheses, check the conditions, and use the Minitab output to draw your
conclusions using alpha = 0.05.]
Test and CI for One Proportion
Test of p = 0.3 vs p not = 0.3
Sample   X    N  Sample p  95.0% CI              Z-Value  P-Value
1       49  200  0.245000  (0.185394, 0.304606)    -1.70    0.090
H0: p = .30 vs Ha: p ≠ .30
Conditions: We assume that the sample is random, and we have np0 = 200(.3) = 60 > 10 and n(1-p0) = 200(.7) = 140 > 10, so
the conditions are satisfied.
p-value = .090 > .05. Therefore, we fail to reject the null hypothesis: there is not enough evidence to conclude that the
proportion of classes with 20 or fewer students differs from 30%.
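B) DRAW an illustration of the p-value, label the test statistic, and shade in the appropriate regions.
[Figure: standard normal density curve (x-axis: Z-Stat, -4 to 4; y-axis: Density, 0.0 to 0.4) marked at z = -1.7 and z = 1.7; the two tail areas beyond ±1.7 together make up the p-value = .09.]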
C) Report and interpret the 95% C.I. for the proportion of classes with 20 or fewer students.
The 95% CI is (0.185, 0.305): we are 95% confident that between 18.5% and 30.5% of PSU classes have 20 or fewer students.
Situation 2:
A trial is undertaken to determine the effectiveness of a new anti-cavity gum. Of 1000 patients who use the gum, only 35
get cavities. The company wants to show that the population proportion of people who get cavities is less than 5%.
A) Conduct an appropriate hypothesis test. [State the null and alternative hypotheses, check the conditions, and use
the Minitab output to draw your conclusions (use alpha = 0.05).]
Test and CI for One Proportion
Test of p = 0.05 vs p < 0.05
Sample   X     N  Sample p  95.0% Upper Bound  P-Value
1       35  1000  0.035000           0.046138    0.014
H0: p = .05 vs Ha: p < .05
Conditions: We assume that the sample is random. Also, we have np0 = 1000(.05) = 50 > 10 and
n(1-p0) = 1000(.95) = 950 > 10, so the conditions are satisfied.
p-value = .014 < .05. Therefore, we reject the null hypothesis and conclude that the proportion of people who get cavities
is less than 5%.
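For a lower-tailed alternative, the p-value is the area to the left of z. Below is a minimal Python sketch (the helpers `large_sample_ok` and `one_prop_z_lower` are hypothetical names) that checks the lab's conditions and runs the normal-based test. Under this textbook formula, z is about -2.18, the p-value comes out near 0.015, and the normal-based 95% upper bound is about 0.0446; the printed 0.014 and 0.046138 appear instead to come from Minitab's exact (binomial) method, which is an inference from the numbers, not something the output states. At alpha = 0.05 the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ok(n, p0, minimum=10):
    """The lab's conditions: n*p0 > 10 and n*(1 - p0) > 10."""
    return n * p0 > minimum and n * (1 - p0) > minimum

def one_prop_z_lower(x, n, p0, conf=0.95):
    """Normal-based test of H0: p = p0 vs Ha: p < p0, plus a one-sided upper bound."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # SE under H0
    p_value = NormalDist().cdf(z)               # area to the LEFT of z
    upper = p_hat + NormalDist().inv_cdf(conf) * sqrt(p_hat * (1 - p_hat) / n)
    return z, p_value, upper

# Situation 2: 35 of 1000 patients, H0: p = 0.05 vs Ha: p < 0.05
ok = large_sample_ok(1000, 0.05)
z, p, upper = one_prop_z_lower(35, 1000, 0.05)
print(f"conditions ok: {ok}; z = {z:.2f}, p-value = {p:.4f}, upper bound = {upper:.4f}")
```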
B) Based on your conclusion, what type of error could you have possibly made?
A Type I error: we rejected the null hypothesis and concluded the alternative, so if the null hypothesis is actually true, we made a Type I error.
C) Report and interpret the 95% C.I. for the proportion of people who get cavities. (To obtain a two-sided CI, you must set the
alternative to “not equal”.)
Test of p = 0.05 vs p not = 0.05
Sample   X     N  Sample p  95.0% CI              Z-Value  P-Value
1       35  1000  0.035000  (0.023609, 0.046391)    -2.18    0.030
We are 95% confident that the proportion of people who get cavities is between 2.4% and 4.6%.
PART II: CI for 1 mean and 1-sample t-test.
First, the data must be in the MINITAB worksheet.
Whenever you want to calculate a confidence interval or test a hypothesis for 1 mean in MINITAB, select Stat>Basic
Statistics>1-sample t… Then select the variable of interest in the “Variable” box.
- For a CI, under “Options” select the confidence level (95%, 99%, etc.) and set the alternative to “not equal”. If you just want a CI, you do not need to specify a value for the test mean.
- For a hypothesis test, specify a value for the test mean, and under “Options” specify the alternative hypothesis.
First, download the survey data from the course web site, then copy and paste it into the Minitab worksheet.
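The 1-sample t procedure can be reproduced from summary statistics alone. Python's standard library has no Student t distribution, so this sketch (an illustrative assumption, not Minitab's algorithm) approximates the t CDF with Simpson's rule and inverts it by bisection; all function names are hypothetical. If SciPy is available, `scipy.stats.t.ppf` and `scipy.stats.t.sf` compute the same quantities. With the GPA summary from Situation 3 (mean 3.0013, s = 0.5093, n = 206) it should match that output up to rounding of the inputs.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_ppf(q, df):
    """Inverse CDF by bisection on [-50, 50] (adequate for the quantiles used here)."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t(mean, sd, n, mu0=0.0, conf=0.95):
    """Two-sided t CI, plus the t statistic and p-value for Ha: mu > mu0."""
    df = n - 1
    se = sd / sqrt(n)                        # SE Mean
    t_star = t_ppf(1 - (1 - conf) / 2, df)   # two-sided critical value
    ci = (mean - t_star * se, mean + t_star * se)
    t = (mean - mu0) / se
    p_greater = max(0.0, 1 - t_cdf(t, df))   # upper-tail area
    return se, ci, t, p_greater

# GPA summary: mean 3.0013, s = 0.5093, n = 206; test mean 3.00
se, ci, t, p = one_sample_t(3.0013, 0.5093, 206, mu0=3.0)
print(f"SE={se:.4f} CI=({ci[0]:.4f}, {ci[1]:.4f}) T={t:.2f} P={p:.3f}")
```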
Situation 3:
A) Construct a 90% confidence interval for the mean GPA of all stat 200 students. Use Stat>Basic Statistics>1-sample
t…; under Options, change the confidence level to 90%. You do NOT have to specify a test mean!
One-Sample T: GPA
Variable    N    Mean   StDev  SE Mean  95.0% CI
GPA       206  3.0013  0.5093   0.0355  (2.9314, 3.0713)
B) If you were going to do this problem by hand you could have gotten the necessary components (x-bar, s, n) by
selecting Stat>Basic Statistics>Display Descriptive Statistics. Do this now for the variable GPA. Notice that the
standard error of the mean is also provided in the output. Verify MINITAB’s calculation of the SE Mean by using the
sample standard deviation and n.
Descriptive Statistics: GPA
Variable    N  N*    Mean  Median  TrMean   StDev
GPA       206   2  3.0013  3.0500  3.0168  0.5093

Variable  SE Mean  Minimum  Maximum      Q1      Q3
GPA        0.0355   0.0000   3.9500  2.6650  3.3425
Note that s.e.(x-bar) = s/sqrt(n) = 0.5093/sqrt(206) = 0.0355.
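The same SE Mean check can be done in Python. `statistics.stdev` uses the n - 1 divisor, like Minitab's StDev. Minitab excludes the N* = 2 missing GPA values; this sketch assumes missing values are stored as `None` and drops them first. The helper name `se_mean` and the toy list are hypothetical, for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

def se_mean(values):
    """SE of the mean, s / sqrt(n); missing values (None) are dropped first."""
    observed = [v for v in values if v is not None]  # like Minitab's N* exclusion
    return stdev(observed) / sqrt(len(observed))

# Check with the summary numbers from the output: s = 0.5093, n = 206
print(round(0.5093 / sqrt(206), 4))

# Hypothetical toy GPAs (two missing), for illustration only
toy = [2.8, 3.1, None, 3.4, 2.9, 3.6, None]
print(round(mean([v for v in toy if v is not None]), 2), round(se_mean(toy), 4))
```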
C) Test the hypothesis that the mean GPA for stat 200 students is greater than 3.00. Use Stat>Basic Statistics>1-sample t…, enter 3.00 as the test mean, and under Options set the alternative to “greater than”.
One-Sample T: GPA
Test of mu = 3 vs mu > 3
Variable    N    Mean   StDev  SE Mean
GPA       206  3.0013  0.5093   0.0355

Variable  95.0% Lower Bound     T      P
GPA                  2.9427  0.04  0.485
From the output, the test statistic is t = 0.04 and the p-value = 0.485, so we fail to reject the null hypothesis. We
conclude that there is not enough evidence to claim that the mean GPA for stat 200 students is greater than 3.00.
Situation 4:
A) Consider the variables height and ideal height. What type of data structure is this?
Paired data (quantitative).
B) Construct a 95% confidence interval for the mean difference between height and ideal height.
First, compute the differences between the columns Ideal Height and Height: use Calc>Calculator…, store the result in the
variable “Differences”, and enter the expression 'Ideal Height' - 'Height(in)'. Then compute a CI for 1 mean on
“Differences”, following the same steps as above.
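Computing the differences is the whole trick of the paired procedure: once they are in one column, it is a one-sample problem. A minimal sketch of the Calc>Calculator step followed by the summary statistics, using hypothetical toy heights (not the survey data) and hypothetical variable names:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy values in inches, NOT the survey data
ideal_height = [70, 66, 72, 64, 68]
height_in = [68, 65, 69, 64, 67]

# Same as Calc>Calculator with expression 'Ideal Height' - 'Height(in)'
differences = [ideal - actual for ideal, actual in zip(ideal_height, height_in)]

# The paired analysis is now a one-sample analysis of the differences
n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)
se_d = s_d / sqrt(n)
print(differences, d_bar, round(s_d, 4), round(se_d, 4))
```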
This is partial Minitab output:
Variable    N   Mean  StDev  SE Mean  95.0% CI
C50       206  2.002  7.190    0.501  (1.015, 2.990)
C) Write a sentence that interprets this interval.
We are 95% confident that the average difference between ideal height and actual height (ideal minus actual) is
between 1.015 and 2.990 inches.
Based on this interval, we can reject the null hypothesis that the average difference is 0 (H0: μd = 0) and claim that it
is NOT EQUAL to 0 (Ha: μd ≠ 0) at alpha = 0.05, because 0 is not in the interval.
We can also reject H0: μd = 0 and claim that the average difference is GREATER than 0 (Ha: μd > 0) at alpha = 0.025,
because the entire interval lies above 0.
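The link between the interval and the tests can be checked by hand from the summary numbers. The t statistic for H0: μd = 0 is computed below, together with the general equivalence between "0 is outside the interval" and "|t| exceeds the critical value t*":

```latex
t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}
  = \frac{2.002}{7.190/\sqrt{206}}
  = \frac{2.002}{0.501}
  \approx 4.00,
\qquad
0 \notin \bar{d} \pm t^{*}\,\mathrm{SE} \iff |t| > t^{*}
```

Since |t| ≈ 4.00 is well above t* ≈ 1.97 (df = 205), 0 falls outside the 95% interval, which is the same decision as the two-sided test at alpha = 0.05.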
Minitab output for H0: μd=0 vs Ha: μd≠0 (“Test mean” in this case is 0)
One-Sample T: Difference
Test of mu = 0 vs mu not = 0
Variable      N   Mean  StDev  SE Mean
Difference  206  2.002  7.190    0.501

Variable    95.0% CI           T      P
Difference  (1.015, 2.990)  4.00  0.000
Minitab output for H0: μd=0 vs Ha: μd > 0 (“Test mean” in this case is 0)
One-Sample T: Difference
Test of mu = 0 vs mu > 0
Variable      N   Mean  StDev  SE Mean
Difference  206  2.002  7.190    0.501

Variable    95.0% Lower Bound     T      P
Difference              1.175  4.00  0.000
PART III: CI for the difference of 2 independent means and 2-sample t-test.
Whenever you want to calculate a confidence interval or test a hypothesis for the difference between 2 means in MINITAB,
select Stat>Basic Statistics>2-sample t…
There are essentially two ways to enter the data when using Minitab for two-sample procedures. The most natural way (in
my opinion) is to enter the two samples into two different columns. In other words, you could put sample 1 in column C1
and sample 2 in column C2. The other way is to put all of the observations into 1 column, say C1, and then specify a
“subscripting variable” in column C2. The “subscripting variable” can be a column containing, for example, 1s and 2s,
indicating whether each observation in C1 comes from sample 1 or sample 2. The contents of C2 can also be non-numeric,
e.g., A's and B's, or male and female.
If the samples are in one column, click the circle next to Samples in one column. Enter the column of data in
Samples and the column with the “subscripting variable” in Subscripts.
If the samples are in different columns, click the circle next to Samples in different columns. Enter the column of data
for the first sample in First column and the column for the second sample in Second column.
Confidence Intervals and Hypothesis tests are obtained in the same manner as in the previous cases.
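The two data layouts carry the same information. A minimal Python sketch converting the “samples in one column” layout into one list per group; the `unstack` helper and the toy values are hypothetical, for illustration only.

```python
from collections import defaultdict

def unstack(values, subscripts):
    """Split a 'samples in one column' layout into one list per subscript value."""
    groups = defaultdict(list)
    for value, label in zip(values, subscripts):
        groups[label].append(value)
    return dict(groups)

# Hypothetical toy data: C1 holds heights, C2 the subscripting variable
c1 = [64, 70, 62, 71, 66]
c2 = ["female", "male", "female", "male", "female"]
samples = unstack(c1, c2)
print(samples)
```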
Situation 4:
A) Construct a 95% confidence interval for the difference in the average heights of males and females. Use
Stat>Basic Statistics>2-Sample t…; the “Samples” field should contain the response variable, and the “Subscripts” field
the categorical variable.
Two-Sample T-Test and CI: Height(in), Gender
Two-sample T for Height(in)
Gender    N   Mean  StDev  SE Mean
female  112  64.08   3.70     0.35
male     94  70.27   3.50     0.36

Difference = mu (female) - mu (male)
Estimate for difference: -6.191
95% CI for difference: (-7.182, -5.200)
T-Test of difference = 0 (vs not =): T-Value = -12.32  P-Value = 0.000  DF = 201
So a 95% C.I. for the difference between the two means (female minus male) is (-7.182, -5.200).
B) Write a sentence that interprets this confidence interval.
We are 95% confident that the average height for females is between 5.20 and 7.18 inches lower than the average height
for males.
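The DF = 201 in the output suggests that Minitab did not assume equal variances and used the Welch-Satterthwaite degrees of freedom, rounded down; that reading is an assumption. Under it, the sketch below reproduces the output from the summary statistics. The Python standard library has no t distribution, so one is approximated here with Simpson's rule and bisection (SciPy's `scipy.stats.t` would be the usual choice); all function names are hypothetical.

```python
from math import exp, floor, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) by Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_ppf(q, df):
    """Inverse CDF by bisection on [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def welch_two_sample(m1, s1, n1, m2, s2, n2, conf=0.95):
    """Unpooled two-sample t: CI for mu1 - mu2 and two-sided test of 0."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    # Welch-Satterthwaite df, rounded down (assumed to match Minitab's DF)
    df = floor((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))
    diff = m1 - m2
    t_star = t_ppf(1 - (1 - conf) / 2, df)
    t = diff / se
    # Clamp tiny negative values caused by numerical integration error
    p_value = max(0.0, 2 * (1 - t_cdf(abs(t), df)))
    return diff, (diff - t_star * se, diff + t_star * se), t, p_value, df

# female vs male summary statistics from the output above
diff, ci, t, p, df = welch_two_sample(64.08, 3.70, 112, 70.27, 3.50, 94)
print(f"diff={diff:.3f} CI=({ci[0]:.3f}, {ci[1]:.3f}) T={t:.2f} P={p:.3f} DF={df}")
```

Small differences in the third decimal come from using the rounded means and standard deviations printed in the output.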
PART IV: CI and hypothesis test for 2 proportions.
Like the 1-proportion procedures, we will enter the data into Minitab in summarized form; that is, we only need the sample
size and the number of successes from each sample. If the data is not summarized, select Stat>Tables>Cross Tabulation…,
and in Classification variables select the column with the response variable (the “trait” of interest) and the column with
the subscripting variable (indicating the population each unit comes from).
Whenever you want to calculate a CI or test a hypothesis for 2 proportions in MINITAB, select Stat>Basic Statistics>2
Proportions…, then select Summarized data and fill in the relevant information for the two samples. Confidence
intervals and hypothesis tests are obtained in the same manner as in the previous cases. If you want a test comparing
the 2 proportions, make sure to select “Use pooled estimate of p for the test”.
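The two-proportion output in Situation 5 can be reproduced as below, assuming (consistent with the numbers printed in this lab) that the interval and the one-sided lower bound use the unpooled standard error, while the test with “Use pooled estimate of p for the test” uses the pooled proportion (x1 + x2)/(n1 + n2). The function name `two_prop` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_prop(x1, n1, x2, n2, conf=0.90):
    """CI and lower bound (unpooled SE) plus pooled z-test of p1 - p2 = 0 vs > 0."""
    std_normal = NormalDist()
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_two = std_normal.inv_cdf(1 - (1 - conf) / 2)
    ci = (diff - z_two * se_unpooled, diff + z_two * se_unpooled)
    lower_bound = diff - std_normal.inv_cdf(conf) * se_unpooled
    # Pooled estimate of p, used only for the test statistic
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_greater = 1 - std_normal.cdf(z)
    return diff, ci, lower_bound, z, p_greater

# Sample 1 = males (65 of 93), sample 2 = females (46 of 114)
diff, ci, lb, z, p = two_prop(65, 93, 46, 114)
print(f"diff={diff:.6f} CI=({ci[0]:.6f}, {ci[1]:.6f}) lower={lb:.6f} Z={z:.2f} P={p:.4f}")
```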
Situation 5:
A) Consider the variables Gender and DUI. Construct a 90% confidence interval for the difference between males and
females in the proportion who have had a DUI in the past.
First, let’s get the data summary. Select Stat>Tables>Cross Tabulation…, and in Classification variables
select Gender and DUI. We obtain the following output:
Tabulated Statistics: Gender, DUI
Rows: Gender   Columns: DUI

          No   Yes   All
female    68    46   114
male      28    65    93
All       96   111   207

Cell Contents -- Count
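Cross Tabulation simply counts each (row, column) combination. A minimal Python sketch using `collections.Counter` on hypothetical toy records (not the survey data) turns raw records into the X (number of “Yes”) and N (row total) values entered in the 2 Proportions dialog:

```python
from collections import Counter

# Hypothetical toy records (Gender, DUI), NOT the survey data
records = [("female", "No"), ("male", "Yes"), ("male", "Yes"),
           ("female", "Yes"), ("male", "No"), ("female", "No")]

counts = Counter(records)  # one count per (row, column) cell

# X = number of "Yes" answers, N = row total, for each gender
summary = {
    gender: (counts[(gender, "Yes")], counts[(gender, "Yes")] + counts[(gender, "No")])
    for gender in ("female", "male")
}
print(summary)
```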
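Use Stat>Basic Statistics>2 Proportions…, select Summarized data, and enter the male counts (sample size 93, with 65 who
had a DUI) as the first sample and the female counts (sample size 114, with 46 who had a DUI) as the second sample. Then
click Options, select confidence level 90.0 and the “not equal” alternative. We have the following output: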
Test and CI for Two Proportions
Sample   X    N  Sample p
1       65   93  0.698925
2       46  114  0.403509

Estimate for p(1) - p(2): 0.295416
90% CI for p(1) - p(2): (0.186632, 0.404200)
Test for p(1) - p(2) = 0 (vs not = 0): Z = 4.47  P-Value = 0.000
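Thus, the 90% CI for the difference in the proportions of DUI between males and females (males minus females) is (0.186632, 0.404200): we are 90% confident that the proportion of males with a past DUI is between 18.7 and 40.4 percentage points higher than that of females.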
B) Test the hypothesis that the proportion of males with a past DUI is higher than the proportion of females with a past DUI.
Follow similar steps as in part (A). In the “Test difference” box, leave the default value 0.0, select the “greater than”
alternative, and select “Use pooled estimate of p for the test”. We have the following output:
Test and CI for Two Proportions
Sample   X    N  Sample p
1       65   93  0.698925
2       46  114  0.403509

Estimate for p(1) - p(2): 0.295416
90% lower bound for p(1) - p(2): 0.210659
Test for p(1) - p(2) = 0 (vs > 0): Z = 4.24  P-Value = 0.000
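The z-statistic is 4.24 and the p-value is very small (0.000 to three decimals), so the test is significant: we reject H0 and conclude that the proportion of males with a past DUI is higher than the proportion of females.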
Note that, using the 90% CI from part A, (0.186632, 0.404200), we could also reject the null hypothesis (since 0 is not in
the interval) and conclude Ha: p1 - p2 > 0 (since the entire interval lies above 0) at alpha = (1 - .90)/2 = .10/2 = .05.