Chapter 4: Alternatives to the t-Tools

The t-tools have a broad range of application, but situations still arise where they should not be applied because their assumptions are clearly violated.

4.1

4.1.1 Space Shuttle O-Ring Failures
• Scope of Inference?

4.1.2 Cognitive Load Theory in Teaching
• Scope of Inference?

4.3 Other Alternatives for Two Independent Samples

4.3.1 Permutation Tests
• A permutation test finds a p-value as the proportion of regroupings of the observed n1 + n2 numbers into two groups of sizes n1 and n2 that lead to test statistics as extreme as or more extreme than the observed one.
• Permutation tests require no distributional assumptions or special conditions.
• They are always available (though they may require too much computational effort).
• STEPS:
  1. Decide on a test statistic.
  2. Compute its value from the two samples (= observed test statistic).
  3. List all regroupings of the n1 + n2 numbers into groups of sizes n1 and n2.
  4. Recompute the test statistic for each regrouping.
  5. Count the number of regroupings that produce a test statistic as extreme as or more extreme than the observed test statistic.
  6. Calculate the p-value by dividing the count in step 5 by the total number of regroupings.
• Combinations, written $C_{n,k}$ or $\binom{n}{k}$:
  $$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)(n-2)\cdots 1}{k(k-1)\cdots 1 \times (n-k)(n-k-1)\cdots 1}$$
  This counts the number of ways we can choose k objects out of n. For example, if we have the letters {A, B, C, D}, there are $C_{4,2} = \frac{4!}{2!\,2!} = 6$ possible pairs of letters, {AB, AC, AD, BC, BD, CD}, where order is not considered; BD and DB are the same set. In a permutation test, the total number of regroupings in step 3 is $C_{n_1+n_2,\,n_1}$.
• Example: the O-ring case study:

4.3.2 Welch's t-Test for Comparing Two Normal Populations with Unequal Spreads
• Offers an alternative to pooling: use the sample standard deviations as separate estimates of the population standard deviations.
• New formula for the standard error:
  $$SE_W(\bar{Y}_2 - \bar{Y}_1) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• New formula for the degrees of freedom (Satterthwaite's approximation):
  $$d.f._W = \frac{[SE_W(\bar{Y}_2 - \bar{Y}_1)]^4}{\dfrac{[SE(\bar{Y}_2)]^4}{n_2 - 1} + \dfrac{[SE(\bar{Y}_1)]^4}{n_1 - 1}}$$
• Compute the t-test and confidence intervals as usual for the two-sample t-test, using $SE_W$ and $d.f._W$.
• Why didn't we just do this to begin with? Even when populations are normal, the exact distribution of Welch's t-ratio is unknown. We approximate it with the t-distribution with $d.f._W$ degrees of freedom.
• Why may it be inadequate to compare populations with different means AND different spreads?

4.5 Related Issues

4.5.1 Practical and Statistical Significance
• p-values indicate statistical significance (strength of evidence against a null hypothesis).
• Does statistical significance imply practical significance?
• Practical significance refers to the practical importance of the effect in question:
• What is the connection between sample size and statistical significance?
• Three practical points:
  1. p-values are sample-size dependent.
  2. A result with a p-value of 0.08 can have more scientific relevance than one with a p-value of 0.001.
  3. Tests of hypotheses by themselves rarely convey the full significance of the results. They should be accompanied by confidence intervals to indicate the range of likely effects and to facilitate the assessment of practical significance.
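The Welch formulas from Section 4.3.2 also give a quick way to see practical point 1 in action. The sketch below is illustrative only: the function names (`welch_se`, `welch_df`, `welch_t`) and the summary statistics are hypothetical, not taken from the case studies. It holds the sample means and standard deviations fixed and changes only the sample size, so the estimated effect is identical while the t-ratio (and hence the p-value) changes.

```python
import math


def welch_se(s1, n1, s2, n2):
    """Welch standard error of (Ybar2 - Ybar1): sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def welch_df(s1, n1, s2, n2):
    """Satterthwaite's approximate degrees of freedom."""
    se1_sq = s1**2 / n1  # [SE(Ybar1)]^2
    se2_sq = s2**2 / n2  # [SE(Ybar2)]^2
    # [SE_W]^4 is the square of the summed squared standard errors.
    numerator = (se1_sq + se2_sq) ** 2
    denominator = se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1)
    return numerator / denominator


def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Welch t-ratio for the null hypothesis of equal population means."""
    return (mean2 - mean1) / welch_se(s1, n1, s2, n2)


# Hypothetical summary statistics: the same means and SDs at two sample sizes.
mean1, s1 = 10.0, 2.0
mean2, s2 = 10.5, 4.0

for n in (10, 1000):
    t = welch_t(mean1, s1, n, mean2, s2, n)
    df = welch_df(s1, n, s2, n)
    print(f"n per group = {n:5d}  difference = {mean2 - mean1:.2f}  "
          f"t-ratio = {t:.3f}  d.f._W = {df:.1f}")
```

The estimated difference is 0.5 in both runs, yet the t-ratio grows with the square root of the sample size. Whether a difference of 0.5 matters in practice is a separate question that a confidence interval addresses more directly than the p-value does.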
4.5.2 Reporting Statistical Findings

4.5.4 Survey Sampling
• Definition: selecting members of a specific (finite) population to be included in a survey.
• Examples:
  – Simple random sample (SRS) (a list of the entire population is often unrealistic to obtain)
  – More complicated sampling designs:
    ∗ Stratified: separate SRSs selected from each stratum
    ∗ Multistage: SRSs at different stages
    ∗ Cluster: SRS of clusters, and then all members of each selected cluster are included
  – The finite population correction (FPC):
    ∗ The formulas presented for SEs so far assume SRS with replacement (wr).
    ∗ Usually a subject cannot be chosen twice, so sampling is without replacement (wor), which changes the SE.
    ∗ The variance of an average under wor differs from that under wr by the factor
      $$FPC = \frac{N - n}{N}$$
    ∗ When is the FPC near one?
  – Usual SEs are inappropriate for data from complex sampling designs, and often are smaller than they should be.

Moral: If you are randomly sampling from a finite population (known N), consult a sampling textbook for the correct standard error formulas.

• Non-response bias:
  – Individuals who respond to surveys often have very different views from those who choose not to respond.
  – Question: Is there something about the units we missed (or that didn't respond) that is related to the response?
  – The Fix: spend more time and money to go back to at least some of the nonresponders and entice them to respond. Then check whether they differ in a systematic way from the responders.
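The effect of the finite population correction in Section 4.5.4 can be sketched numerically. This is a minimal illustration, assuming the correction is the variance-scale factor (N − n)/N for an SRS without replacement, so that the standard error is the usual s/√n multiplied by the square root of that factor. The function names (`fpc`, `se_mean_wor`) and the numbers are hypothetical.

```python
import math


def fpc(N, n):
    """Finite population correction on the variance scale: (N - n) / N."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return (N - n) / N


def se_mean_wor(s, n, N):
    """SE of a sample mean under SRS without replacement.

    Assumption: the usual SE, s / sqrt(n), is multiplied by sqrt(FPC).
    """
    return (s / math.sqrt(n)) * math.sqrt(fpc(N, n))


# Hypothetical survey: sample SD of 10 from n = 100 respondents,
# drawn from populations of different sizes N.
s, n = 10.0, 100
for N in (200, 1_000, 1_000_000):
    print(f"N = {N:>9,d}  FPC = {fpc(N, n):.4f}  "
          f"usual SE = {s / math.sqrt(n):.3f}  "
          f"SE (wor) = {se_mean_wor(s, n, N):.3f}")
```

In this sketch the correction matters only when the sample is a sizeable fraction of the population. This applies to simple random samples only; it does not repair standard errors for stratified, multistage, or cluster designs.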