Assumptions and Conditions

Distinguish Between Assumptions and Conditions
All of mathematics is based on "If..., then..." statements. Students
have seen such statements before. For example, they know (we hope) not to apply the
Pythagorean theorem unless they have a right triangle. The fact that it's a right triangle is the
assumption that guarantees the equation a² + b² = c² works, so we should always check to be
sure we are working with a right triangle before proceeding.
The same is true in statistics -- we just don't see the theorems themselves very often. All those
theorems, though, are in that same "If..., then..." form. The "If" part sets out the underlying
assumptions used to prove that the statistical method works. If those assumptions are violated,
the method may fail. The assumptions are about populations and models, things that are
unknown and usually unknowable. And that presents us with a big problem, because we will
probably never know whether an assumption is true.
In fact, there are three types of assumptions:
1. Unverifiable. We must simply accept these as reasonable -- after careful thought.
2. Plausible, based on evidence. We test a condition to see if it's reasonable to believe that
the assumption is true.
3. False, but close enough. We know the assumption is not true, but some procedures can
provide very reliable results even when an assumption is not fully met. In such cases a
condition may offer a rule of thumb that indicates whether or not we can safely override
the assumption and apply the procedure anyway.
A condition, then, is a testable criterion that supports or overrides an assumption.
Inference for Means
Whenever samples are involved, we check the Random Sample Condition and the 10 Percent
Condition. Beyond that, inference for means is based on t-models, because we can never know
the standard deviation of the population: we estimate it with s, and the standardized sample
mean t = (x̄ − μ)/(s/√n) then follows a t-distribution rather than a Normal model. The
theorems proving this are based on the...
Normal Population Assumption: The data were drawn from a population that's Normal.
We can never know if this is true, but we can look for any warning signals. We've done that
earlier in the course, so students should know how to check the...
Nearly Normal Condition: A histogram of the data appears to be roughly unimodal, symmetric,
and without outliers.
If so, it's okay to proceed with inference based on a t-model. But what does "nearly" Normal
mean? If the sample is small, we must worry about outliers and skewness, but as the sample size
increases, the t-procedures become more robust. By the time the sample gets to be 30-40 or
more, we really need not be too concerned. Then our Nearly Normal Condition can be supplanted
by the...
Large Sample Condition: The sample size is at least 30 (or 40, depending on your text).
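To make the check-then-infer routine concrete, here is a minimal sketch in Python, using an invented sample of heart rates (the data and names are hypothetical, not from any real study): plot the histogram to check the Nearly Normal Condition, then compute the one-sample t-interval x̄ ± t*·s/√n.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical data: resting heart rates (bpm) for n = 24 students
sample = np.array([62, 71, 68, 75, 80, 66, 73, 69, 77, 64,
                   70, 72, 81, 67, 74, 76, 63, 79, 65, 78,
                   68, 72, 70, 75])

# Nearly Normal Condition: the histogram should look roughly
# unimodal and symmetric, with no outliers.
plt.hist(sample, bins=8, edgecolor="black")
plt.title("Checking the Nearly Normal Condition")
plt.xlabel("Resting heart rate (bpm)")
plt.show()

# One-sample t-interval: xbar +/- t* * s/sqrt(n), with df = n - 1
n = len(sample)
xbar = sample.mean()
s = sample.std(ddof=1)                   # sample standard deviation
t_star = stats.t.ppf(0.975, df=n - 1)    # critical value for 95%
margin = t_star * s / np.sqrt(n)
print(f"95% CI for the mean: ({xbar - margin:.1f}, {xbar + margin:.1f})")
```

The histogram check comes first on purpose: the interval arithmetic is trivial, and it's the condition check that decides whether the t-model applies at all.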
Note that understanding why we need these assumptions and how to check the corresponding
conditions helps students know what to do. And it prevents the "memory dump" approach in
which they list every condition they ever saw -- like np ≥ 10 for means, a clear indication that
there's little if any comprehension there.
Inference for the Difference of Two Means
By now students know the basic issues. They check the Random Condition (a random sample
or random allocation to treatment groups) and the 10 Percent Condition (for samples) for both
groups. They must also check either the Nearly Normal Condition, by showing two separate
histograms, or the Large Sample Condition for each group, to be sure it's okay to use t. And
there's more.
As was the case for two proportions, determining the standard error for the difference between
two group means requires adding variances, and that's legitimate only if we feel comfortable with
the Independent Groups Assumption. Again there's no condition to check. We have to think about
the way the data were collected.
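Concretely, once we accept the Independent Groups Assumption, adding variances gives the standard error SE(x̄₁ − x̄₂) = √(s₁²/n₁ + s₂²/n₂). Here is a short sketch with invented scores; scipy's ttest_ind with equal_var=False runs the corresponding (Welch) two-sample t-test, which does not assume equal variances.

```python
import numpy as np
from scipy import stats

# Hypothetical test scores for two independently sampled groups
group1 = np.array([84, 79, 91, 73, 88, 77, 85, 90, 82, 76])
group2 = np.array([71, 83, 69, 78, 74, 80, 72, 77, 75, 81])

# Independent Groups Assumption lets us add the variances:
# SE(xbar1 - xbar2) = sqrt(s1^2/n1 + s2^2/n2)
se = np.sqrt(group1.var(ddof=1) / len(group1) +
             group2.var(ddof=1) / len(group2))
print(f"SE of the difference: {se:.2f}")

# Welch's two-sample t-test; equal_var=False means the variances
# are not pooled, so no equal-variance assumption is needed
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```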
Inference for Matched Pairs
Whenever the two sets of data are not independent, we cannot add variances, and hence the
independent sample procedures won't work. Such situations appear often. We might collect data
from husbands and their wives, or before and after someone has taken a training course, or from
individuals performing tasks with both their left and right hands. Matching is a powerful design
because it controls many sources of variability, but we cannot treat the data as though they came
from two independent groups. Instead we have the...
Paired Data Assumption: The data come from matched pairs.
There's no condition to be tested. Instead students must think carefully about the design. This
helps them understand that there is no "choice" between two-sample procedures and matched
pairs procedures. Either the data were from groups that were independent or they were paired.
The design dictates the procedure we must use. Looking at the paired differences gives us just
one set of data, so we apply our one-sample t-procedures. We already know the appropriate
assumptions and conditions.
Independence Assumption: The individuals are independent of each other. We base plausibility
on the Random Condition.
Normal Distribution Assumption: The population of all such differences can be described by a
Normal model. We verify this assumption by checking the...
Nearly Normal Condition: The histogram of the differences looks roughly unimodal and
symmetric.
Note that there's just one histogram for students to show here. We don't care about the two
groups separately as we did when they were independent. As before, the Large Sample
Condition may apply instead.
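A brief sketch of the matched-pairs analysis, again with invented before/after data: form the single set of differences, check its one histogram, and run an ordinary one-sample t-test on it. scipy's ttest_rel gives the identical paired result.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical before/after scores for the same ten individuals
before = np.array([20, 24, 19, 27, 22, 25, 18, 23, 21, 26])
after = np.array([23, 26, 22, 28, 25, 27, 20, 25, 24, 29])

# Paired Data Assumption: reduce to a single set of differences
diffs = after - before

# Nearly Normal Condition: one histogram, of the differences only
plt.hist(diffs, bins=5, edgecolor="black")
plt.title("Differences (after - before)")
plt.show()

# A one-sample t-test on the differences...
t_stat, p_value = stats.ttest_1samp(diffs, popmean=0)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# ...matches scipy's built-in paired t-test
t_rel, p_rel = stats.ttest_rel(after, before)
print(f"paired: t = {t_rel:.2f}, p = {p_rel:.3f}")
```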