Hypothesis Testing for the Proportion of Two Samples

MAT 155 Statistical Analysis
Chapter 9 Inferences from Two Samples
0901-NotesCh9.doc
Consider the following problem. The Engineering Department at Sims Software, Inc.,
recently developed two chemical solutions designed to increase the usable life of computer disks. A
sample of disks treated with the first solution lasted 86, 78, 66, 83, 84, 81, 84, 109, 65, and 102 hours.
Those treated with the second solution lasted 91, 71, 75, 76, 87, 79, 73, 76, 79, 78, 87, 90, 76, and
72 hours. At the 0.10 significance level, can we conclude that there is a difference in the length of
time the two types of treatment lasted?
The question asks us to compare the mean number of hours that disks lasted under each
treatment. So we use a hypothesis test of whether the two population means are equal
("...can we conclude that there is a difference...?").
SOLUTION:
In this problem, we are given the raw data but not the sample means or standard deviations.
Thus, we calculate the mean and standard deviation of each sample before beginning our 5-step
procedure. Using the TI-84 calculator or Statdisk, we find the means and standard deviations
(rounded to one decimal place) listed in the table below.
Treatment                        1        2
Sample Size                     10       14
Mean (hours)                  83.8     79.3
Standard Deviation (hours)    13.7      6.7
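The table values can be reproduced directly from the raw data. The following is a minimal sketch using Python's standard `statistics` module in place of the TI-84 or Statdisk; the variable and function names are illustrative and are not part of the original notes.

```python
import statistics

# Disk lifetimes in hours, copied from the problem statement.
treatment_1 = [86, 78, 66, 83, 84, 81, 84, 109, 65, 102]
treatment_2 = [91, 71, 75, 76, 87, 79, 73, 76, 79, 78, 87, 90, 76, 72]

def summarize(sample):
    """Return (size, mean, sample standard deviation), rounded to one decimal place."""
    n = len(sample)
    mean = round(statistics.mean(sample), 1)
    # statistics.stdev uses the n - 1 divisor (sample standard deviation).
    sd = round(statistics.stdev(sample), 1)
    return n, mean, sd

summary_1 = summarize(treatment_1)
summary_2 = summarize(treatment_2)
print("Treatment 1:", summary_1)
print("Treatment 2:", summary_2)
```

Each tuple holds the sample size, mean, and standard deviation for one treatment, in the same order as the rows of the table.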
We do not know the population standard deviations. However, the histograms, boxplots, and
normal quantile plots produced by Statdisk indicate that both samples may be treated as coming
from normally distributed populations. Thus, we use the t statistic as the test statistic.
Step 1. State the null hypothesis (contains =, ≤, or ≥) and alternate hypothesis (contains ≠, <, or >).
Think of the statement “can we conclude that there is a difference in the length of time the two
types of treatment lasted?”
From “…there is a difference…,” we write the alternate hypothesis as H1: μ1 ≠ μ2.
H0: μ1 = μ2
H1: μ1 ≠ μ2 (Claim)
Step 2. Select a level of significance.
Stated in the problem as 10%, or α = 0.10.
Step 3. Identify the statistical test to use.
Since the two sets of data are not related, paired, or matched, we consider the two samples to be
independent. Also, Statdisk’s Prelim F-test rejects the assumption of equal variances.
Thus, we calculate the test statistic with the following formula, which does not pool the variances.
$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$
Step 4. Formulate a decision rule.
Since the alternate hypothesis states H1: μ1 ≠ μ2, this is a two-tailed test. For the degrees of
freedom, we use the smaller of n1 – 1 and n2 – 1. Since n1 – 1 = 10 – 1 = 9 and n2 – 1 = 14 – 1 = 13, we
use df = 9 and α = 0.10 for a two-tailed test to find the critical values t = ±1.833 from Table A-3.
[Note: Statdisk yields critical values t = ±1.781.] So the decision rule is: reject the null hypothesis if the
computed t-value is less than –1.833 or greater than 1.833.
Step 5. Take a sample; arrive at a decision.
The two sample means and standard deviations, shown in the table above Step 1, are
used to compute the t-value for comparison with the critical t-values in Step 4.
$$ t = \frac{(83.8 - 79.3) - 0}{\sqrt{\dfrac{13.7^2}{10} + \dfrac{6.7^2}{14}}} = 0.9599 $$
Since the computed t = 0.960 is between the critical t-values of –1.833 and 1.833, we do not reject the
null hypothesis that the means are equal at the 10% level of significance. Thus, there is not sufficient
evidence to conclude that there is a difference in the lengths of time the two types of treatment lasted.
[Note: Both Statdisk and TI-84 Plus yield the same computed t-value of 0.9599 and the same p-value of 0.3559.]
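Steps 4 and 5 can be checked numerically. The sketch below recomputes the test statistic from the rounded summary statistics in the table above Step 1 and applies the decision rule with the Table A-3 critical value of 1.833. The function name `welch_t` is an illustrative choice, not something taken from Statdisk or the TI-84.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Two-sample t statistic computed without pooling the variances."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

# Rounded summary statistics from the table above Step 1.
t_value = welch_t(83.8, 13.7, 10, 79.3, 6.7, 14)

# Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1.
df = min(10 - 1, 14 - 1)

# Critical value for df = 9, alpha = 0.10, two tails (read from Table A-3).
critical_t = 1.833
reject_null = abs(t_value) > critical_t

print(f"t = {t_value:.4f}, df = {df}, reject H0: {reject_null}")
```

The critical value is entered by hand because the Python standard library has no t-distribution quantile function.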
If it is determined that the population variances are equal, we pool the sample variances using the
following formulas.
$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} $$
where
$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} \quad \text{(pooled variance)} $$
and the number of degrees of freedom is given by df = n1 + n2 - 2.
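To make the pooled formulas concrete, the sketch below applies them to the disk-treatment summary statistics. This is for illustration only: in the worked example the Prelim F-test rejected equal variances, so the pooled test would not be the method of choice for these data. The function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Return (t, pooled variance, df) for the pooled two-sample t test."""
    df = n1 + n2 - 2  # same as (n1 - 1) + (n2 - 1)
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return t, pooled_var, df

# Rounded summary statistics from the disk-treatment example.
t_pooled, sp2, df_pooled = pooled_t(83.8, 13.7, 10, 79.3, 6.7, 14)
print(f"pooled variance = {sp2:.2f}, t = {t_pooled:.3f}, df = {df_pooled}")
```

Pooling changes both the standard error and the degrees of freedom, so the t-value and the critical value generally differ from the unpooled case.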
[The following notes are from the Help session of Statdisk.]
Hypothesis Testing for the Proportion of Two Samples
This test is used to test a claim about two different population proportions (or percentages), such as this claim: "The
percentage of males who smoke is equal to the percentage of females who smoke."
First select the format of the original claim that is to be tested. Note that there are 6 different choices, which can be
viewed by clicking on the small box to the right of the default claim ("Pop. Proportion 1= Pop. Proportion 2"). Use the
up/down arrow keys to highlight the desired choice, then click on that desired choice. STATDISK will determine the
appropriate null hypothesis from the claim selected.
Significance (level): Enter a positive value in decimal form between 0.00001 and 0.50, such as 0.05.
For Sample 1, make these entries:
Sample Size, n1: Enter the size of the FIRST sample as a positive whole number.
Num Successes, x1: Enter the number of successes as a positive whole number.
For example, if testing the claim that "the percentage of males who smoke is equal to the percentage of females who
smoke," and the first sample consists of 200 males with 80 of them smoking, then n1 = 200 and x1 = 80.
For Sample 2, make the corresponding entries as described above.
Click on the Evaluate button to obtain the results.
Click on the Print button to print the results.
Click on the Plot button to display a graph showing the distribution with the test statistic and critical value(s).
To close the window, click on the X at the top of the module window.
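The help text does not state the formula behind this module. A common textbook approach for a claim of equal proportions is the pooled two-proportion z test, and the sketch below assumes that approach. Sample 1 uses the values from the example above (n1 = 200, x1 = 80). The sample 2 values (n2 = 250, x2 = 90) are hypothetical, chosen only for illustration, and the function name is not part of STATDISK.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and two-tailed p-value for H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    # Standard normal CDF evaluated at |z| via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Sample 1 from the help text; sample 2 is hypothetical.
z, p_value = two_proportion_z(80, 200, 90, 250)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

The null hypothesis is rejected when the p-value is less than or equal to the significance level entered in the module.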
Hypothesis Testing for the Mean of Two Matched Pairs
Note: This test requires that paired data are already listed in the STATDISK data window.
If the data are not already there, close this window and enter or open data sets so that they are listed in columns of
the STATDISK Data Window. (To open a data set from Appendix B in the textbook, click on "Datasets" at the top. To
manually enter a data set, click on Data, then click on Sample Editor. The data can also be copied into STATDISK
from another application, such as Minitab or Excel.)
This test is used to test a claim about the mean of the differences between matched pairs of data. The data must be
matched or paired. For example, in testing the effectiveness of a new diet, obtain sample before and after weights for
each subject, then proceed to test the claim that the differences have a mean that is positive. [Matched or paired data
arise, for example, in a before-and-after experiment that uses the same subjects in both conditions. In the diet
example, each “before” weight is matched or paired with the “after” weight because each before/after pair of
measurements comes from the same person.]
First select the format of the original claim that is to be tested. Note that there are 6 different choices, which can be
viewed by clicking on the small box to the right of the default claim ("Mean of Differences = 0"). Use the up/down
arrow keys to highlight the desired choice, then click on that desired choice. STATDISK will determine the appropriate
null hypothesis from the claim selected.
Significance (level): Enter a positive value in decimal form, such as 0.05.
Choose the two columns to be used as sample data. The columns must be matched or paired, so they must have the
same number of entries.
Click on the Evaluate button to obtain the results.
Click on the Print button to print the results.
Click on the Plot button to display a graph showing the distribution with the test statistic and critical value(s).
To close the window, click on the X at the top of the module window.
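The calculation behind a matched-pairs test can be sketched as follows, assuming the usual t statistic for the mean of the differences (the mean difference divided by its standard error, with df = n - 1). The before and after weights are hypothetical values invented for illustration, and `paired_t` is not a STATDISK function.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for H0: the mean of the differences equals 0."""
    if len(before) != len(after):
        raise ValueError("matched samples must have the same number of entries")
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    mean_d = statistics.mean(differences)
    sd_d = statistics.stdev(differences)  # sample standard deviation
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

# Hypothetical weights for five subjects, before and after a diet.
before = [200, 185, 170, 210, 190]
after = [195, 180, 172, 200, 186]
t_paired, df_paired = paired_t(before, after)
print(f"t = {t_paired:.3f}, df = {df_paired}")
```

The length check mirrors the requirement that the two columns have the same number of entries.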
Hypothesis Testing: Mean Two Independent Samples
This test is used to test a claim about the equality of the means of two independent populations. (If the data are
somehow matched or paired, do not use this test; use the hypothesis test for "Mean Matched Pairs.") For example,
this test could be used for the claim that "Men have a mean IQ score that is equal to the mean IQ score of women,"
which would be denoted as "Pop. Mean 1 = Pop. Mean 2."
First select the format of the original claim that is to be tested. Note that there are 6 different choices, which can be
viewed by clicking on the small box to the right of the default claim ("Pop. Mean 1= Pop. Mean 2"). Use the up/down
arrow keys to highlight the desired choice, then click on that desired choice. STATDISK will determine the appropriate
null hypothesis from the claim selected.
Significance (level): Enter a positive value in decimal form, such as 0.05.
For Sample 1, make these entries:
Sample Size, n1: Enter the sample size as a positive whole number.
Sample 1 mean: Enter the sample mean of the first sample.
Sample 1 StDev: Enter the standard deviation of the first sample as a positive number.
Population StDev: Enter the value of the standard deviation of the first POPULATION only if that value is
known. In reality, this value is usually not known, so this box is usually left blank. DO NOT ENTER THE VALUE OF
THE SAMPLE STANDARD DEVIATION HERE.
For the second sample, make the corresponding entries as described above.
If the two POPULATION standard deviations are entered, the results will be based on calculations using a normal
distribution, as described in the textbook. If the two POPULATION standard deviations are not both known (as is
usually the case), you must choose among these options:
Not Eq vars: NO POOL (That is, do not assume that the two population variances are equal, and do not use a pooled
estimate of a common population variance.)
Eq. vars: POOL (That is, assume that the two population variances are equal, and use a pooled estimate of the
common population variance.)
Prelim F-Test (That is, have STATDISK do a preliminary F test to decide whether the two population variances
appear to be equal. Then, based on the result, proceed by either assuming that the two population variances are not
equal (NO POOL) or by assuming that the two population variances are equal (POOL).)
Among the above three options, the NO POOL option generally yields the best results and it is the recommended
option.
Note: If the NO POOL option is used, STATDISK proceeds with calculations using the t distribution with the number
of degrees of freedom found from the formula included in the textbook (instead of using the simplified "df = smaller of
n1-1 and n2-1"). STATDISK displays the number of degrees of freedom. Because STATDISK uses the more accurate
formula for df, the results may differ somewhat from those given in the textbook.
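The note does not reproduce the textbook formula for df. The sketch below assumes it is the Welch-Satterthwaite approximation, which is the usual choice for the unpooled test, and applies it to the disk-treatment example from these notes. The p-value is obtained by integrating the t density with Simpson's rule so that the example needs only the standard library; this is a stand-in, not a description of how STATDISK computes it. All function names are illustrative.

```python
import math

def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    a = sd1**2 / n1
    b = sd2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, intervals=2000):
    """Two-tailed p-value; integrates the density over [0, |t|] by Simpson's rule.

    The number of intervals must be even.
    """
    upper = abs(t)
    h = upper / intervals
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t|
    return 1 - 2 * central_area

# Summary statistics and t-value from the disk-treatment example.
df_welch = welch_df(13.7, 10, 6.7, 14)
p_value = two_tailed_p(0.9599, df_welch)
print(f"df = {df_welch:.2f}, p-value = {p_value:.4f}")
```

With these inputs the degrees of freedom come out a little above 12 rather than the simplified df = 9, which is consistent with Statdisk reporting critical values of ±1.781 instead of ±1.833. The p-value should land close to the 0.3559 reported by Statdisk and the TI-84 Plus.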
Click on the Evaluate button to obtain the results.
Click on the Print button to print the results.
Click on the Plot button to display a graph showing the distribution with the test statistic and critical value(s).
To close the window, click on the X at the top of the module window.