Common Statistical Mistakes

Mistake #1

• Failing to investigate data for data entry or recording errors.
• Failing to graph data and calculate basic descriptive statistics before analyzing data.

Example: Wrong Decision Due to Error

Test of mu = 26.000 vs mu not = 26.000

Variable    N     Mean   StDev   SE Mean       T       P   95.0 % CI
With       16   25.625   3.964     0.991   -0.38    0.71   (23.513, 27.737)
Without    15   24.733   1.792     0.463   -2.74   0.016   (23.741, 25.725)

Mistake #2

• Using the wrong statistical procedure in analyzing your data.
• Includes failing to check that necessary assumptions are met.

Example: Wrong Decision Due to Wrong Analysis

Pulse Rates Before and After Marching

Student     1    2    3    4
BEFORE     60   56   90   78
AFTER      78   66   96   88
DIFFA-B    18   10    6   10

Paired Data Design, so analyze with Paired t-test.

Paired T for AFTER - BEFORE

              N    Mean   StDev   SE Mean
AFTER         4   82.00   12.96      6.48
BEFORE        4   71.00   15.87      7.94
Difference    4   11.00    5.03      2.52

95% CI for mean difference: (2.99, 19.01)
T-Test of mean difference = 0 (vs not = 0): T-Value = 4.37  P-Value = 0.02

Conclude mean pulse rate after is greater than mean pulse rate before.

Two sample T for AFTER vs BEFORE

          N   Mean   StDev   SE Mean
AFTER     4   82.0    13.0       6.5
BEFORE    4   71.0    15.9       7.9

95% CI for mu AFTER - mu BEFORE: (-15.3, 37.3)
T-Test mu AFTER = mu BEFORE (vs not =): T = 1.07  DF = 5  P = 0.33

Conclude no difference in mean pulse rates before and after marching.

Mistake #3

• Failing to design your study so that it has high enough power to call meaningful differences “significantly different.”
• Includes concluding that the null hypothesis is true. Should be “not enough evidence to say the null is false.”

Example: Low Power

Success = Yes, I recycle.
Gender     X    N   Sample p
Male      33   59   0.559322
Female    54   79   0.683544

Estimate for p(1) - p(2): -0.124222
95% CI for p(1) - p(2): (-0.287215, 0.0387704)
Test for p(1) - p(2) = 0 (vs not = 0): Z = -1.49  P-Value = 0.135

A number of students said that they were surprised that the hypothesis test said “no difference in percentages.”

Power and Sample Size

Test for Two Proportions
Testing proportion 1 = proportion 2 (versus not =)
Calculating power for: proportion 1 = 0.55 and proportion 2 = 0.70
Alpha = 0.05  Difference = -0.15

Sample Size    Power
60            0.4366
70            0.4911
80            0.5421

*Sample size = # in EACH group

Mistake #4

• Failing to report a confidence interval as well as the P-value.
• P-value tells you if statistically significant.
• Confidence interval tells you what the population value might be.

Example: A Significant, but Potentially Meaningless Difference

Two sample T for Phone

Gender     N   Mean   StDev   SE Mean
Male      59     79     162        21
Female    80    153     247        28

95% CI for mu (1) - mu (2): (-142, -5)
T-Test mu (1) = mu (2) (vs not =): T = -2.11  P = 0.036  DF = 135

P-value tells us significant difference, but confidence interval tells us that the difference in the averages could be as small as 5 minutes.

Incidentally… Outliers

Removing Outliers …

Two sample T for Phone

Gender     N   Mean   StDev   SE Mean
Male      58   59.9    66.5       8.7
Female    79    129     133        15

95% CI for mu (1) - mu (2): (-103.7, -35)
T-Test mu (1) = mu (2) (vs not =): T = -4.02  DF = 121  P = 0.0001

The difference in male and female phone usage becomes even more significant. We are 95% confident that the difference in the averages is now more than 35 minutes.

Mistake #5

• “Fishing” for significant results. That is, performing several hypothesis tests on a data set, and reporting only those results that are significant.
• If α = P(Type I error) = 0.05, and we perform 20 tests on the same data set, we can expect to make 1 Type I error even when every null hypothesis is true (0.05 × 20 = 1).
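The arithmetic behind the fishing problem can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the second calculation additionally assumes that the tests are independent and that every null hypothesis is true.

```python
# Sketch: how Type I errors pile up when many tests are run.
# Assumptions (not from the slides): every null hypothesis is true,
# and, for the second function, the tests are independent.

def expected_type1_errors(alpha: float, n_tests: int) -> float:
    """Expected number of false positives: alpha * number of tests."""
    return alpha * n_tests


def prob_at_least_one_type1(alpha: float, n_tests: int) -> float:
    """Chance of one or more false positives among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


alpha, n_tests = 0.05, 20
print(f"Expected Type I errors: {expected_type1_errors(alpha, n_tests):.2f}")
print(f"P(at least one Type I error): {prob_at_least_one_type1(alpha, n_tests):.3f}")
```

Under these assumptions, the chance of at least one false positive in 20 tests is roughly 64%, so a handful of "significant" results from a fishing expedition is about what chance alone would produce.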
Example: Results Obtained from Fishing

• Primary driver of $10,000 vehicle and going away for Spring Break are related (P = 0.01).
• Virginity and supporting self through school are related (P = 0.045).
• Virginity and graduating in four years are related (P = 0.041).
• Virginity and attending non-football PSU sports events are related (P = 0.016).

Mistake #6

• Overstating the results of an observational study.
  – That is, suggesting that one variable “caused” the differences in the other variable.
  – As opposed to correctly saying that the two variables are “associated” or “correlated.”
• Don’t forget that a significant result may be “spurious.”

Example: Misleading Headlines

• Virgins don’t support themselves through school.
• Non-virgins too busy to go to non-football PSU sporting events.
• Non-virgins also too busy to graduate in four years.

Mistake #7

• Using a non-random or unrepresentative sample.
• Includes extending the results of an unrepresentative sample to the population.

Example: Unrepresentative Sample

• Shere Hite wrote a book in 1987 called “Women and Love.”
• 100,000 questionnaires about love, sex, and relationships sent to women’s groups. Only 4,500 questionnaires returned.
• Entire book devoted to results of survey.
• Examples: 91% of divorcees initiated the divorce; 70% of women married 5 years committed adultery.

Mistake #8

• Failing to use all of the basic principles of experiments, including randomization, blinding, and controlling.
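As one illustration of the randomization principle, the sketch below randomly splits a list of subjects into a treatment group and a control group. It is a minimal example under stated assumptions: the subject labels, the function name `randomize`, and the even two-group split are all hypothetical and do not come from the slides.

```python
import random

# Sketch: random assignment of subjects to treatment and control.
# The subject IDs and the 50/50 split are illustrative assumptions.

def randomize(subjects, seed=None):
    """Shuffle a copy of the subjects and split it into two groups."""
    rng = random.Random(seed)      # seeded generator so the split can be reproduced
    shuffled = list(subjects)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


subjects = [f"S{i:02d}" for i in range(1, 21)]   # 20 hypothetical subject IDs
groups = randomize(subjects, seed=42)
print("Treatment:", sorted(groups["treatment"]))
print("Control:  ", sorted(groups["control"]))
```

Letting a random number generator decide the groups, instead of the researcher or the subjects, keeps lurking variables from lining up with the treatment. Keeping the resulting assignment hidden from whoever measures the outcome is one way to support blinding.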
