Two Independent Samples

Question
In 2000, did men and women differ in terms of their body mass index?

Populations: 1. Female, 2. Male
Samples: drawn from each population by random selection
Inference: from the samples back to the populations

Body Mass Index
                       Females               Males
Sample size            $n_1 = 50$            $n_2 = 50$
Sample mean            $\bar{Y}_1 = 27.484$  $\bar{Y}_2 = 26.868$
Standard deviation     $s_1 = 7.860$         $s_2 = 7.215$
Pooled std. deviation  $s_p = 7.544$

95% Confidence Interval
$$(\bar{Y}_1 - \bar{Y}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$t^*$ from the t-table with $df = n_1 + n_2 - 2$

$$(27.484 - 26.868) \pm 1.9845\,(7.544)\sqrt{\frac{1}{50} + \frac{1}{50}}$$
$$0.616 \pm 1.9845\,(1.509)$$
$$0.616 \pm 2.995$$
$$-2.38 \text{ to } 3.61$$

Interpretation
We are 95% confident that the difference in population mean BMI for women compared to men is between -2.38 and 3.61.
Women could have a mean BMI as much as 2.38 lower than men or as much as 3.61 higher than men.

Difference?
Because zero is in the confidence interval, there could be no difference in population mean BMI for women compared to men.
This agrees with the test of hypothesis.

Two-sample model
$$Y = \mu_i + \varepsilon$$
• $Y$ represents a value of the variable of interest
• $\mu_i$ represents the $i$th population mean
• $\varepsilon$ represents the random error associated with an observation

Conditions
The random error term, $\varepsilon$, is
• Independent
• Identically distributed
• Normally distributed with standard deviation $\sigma$

Residuals
Estimate of error
Residual = Observation - Fit
$$\hat{\varepsilon} = Y - \bar{Y}_i$$

Checking Conditions
Independence.
• Hard to check this, but the fact that we obtained the data through separate random samples of women and men assures us that the statistical methods should work.
Identically distributed.
• Check using an outlier box plot. Unusual points may come from a different distribution.
• Check using a histogram. Bimodal shape could indicate two different distributions.
Normally distributed.
• Check with a histogram. Symmetric and mounded in the middle.
• Check with a normal quantile plot. Points falling close to a diagonal line.
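The residual definition and the box plot check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BMI values are invented (the survey data are not reproduced in these slides), the function names are hypothetical, and the 1.5 x IQR fence is the common box plot convention, which may differ slightly from what a given statistics package uses.

```python
import statistics

# Hypothetical BMI values, invented for illustration only (NOT the survey data).
bmi = {
    "Female": [21.5, 24.0, 27.3, 30.1, 35.6],
    "Male": [22.8, 25.1, 26.4, 28.0, 31.2],
}


def residuals_by_group(groups):
    """Residual = Observation - Fit, where the fit is the group's sample mean."""
    result = {}
    for name, values in groups.items():
        fit = statistics.fmean(values)
        result[name] = [y - fit for y in values]
    return result


def flag_outliers(values):
    """Return values outside the 1.5 * IQR fences (assumed box plot rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


residuals = residuals_by_group(bmi)

# Pool the residuals from both groups, as in a plot of "BMI centered by Gender".
pooled = [r for group in residuals.values() for r in group]

print("Residuals by group:", residuals)
print("Potential outliers:", flag_outliers(pooled))
```

Centering each group at its own mean before pooling is what lets a single histogram or normal quantile plot be used to judge the error term for both samples at once.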
Distributions
[Figure: BMI centered by Gender, shown as a histogram (vertical axis: Count) and a normal quantile plot]

BMI Residuals
• Histogram is skewed left and mounded to the right of zero.
• Box plot is fairly symmetric with two potential outliers on the high side.
• Normal quantile plot has points following the diagonal line for the first part but then wiggles around for larger values.
The conditions for statistical inference may not be met for these data.

Consequences
• The P-value for the test may not be correct. Even so, there is not much of a difference between women and men, and I would not change my conclusion from the test of hypothesis.
• The stated confidence level may not give the true coverage rate. I would still use the confidence interval but recognize that the true coverage rate is probably less than 95%.
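To see how small the observed difference is relative to its uncertainty, the confidence interval can be recomputed from the summary statistics on the Body Mass Index slide. The sketch below uses only those numbers and the value t* = 1.9845 quoted in the slides. The test statistic at the end is an assumption: the slides mention a test of hypothesis without showing it, so the code assumes the usual pooled two-sample t statistic for no difference in population means.

```python
import math

# Summary statistics from the Body Mass Index slide.
n1, ybar1, s1 = 50, 27.484, 7.860  # females
n2, ybar2, s2 = 50, 26.868, 7.215  # males

# t* for df = n1 + n2 - 2 = 98, taken from the slides (read from a t-table).
T_STAR = 1.9845


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation of two independent samples."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


diff = ybar1 - ybar2                      # difference in sample means
sp = pooled_sd(n1, s1, n2, s2)            # pooled standard deviation
se = sp * math.sqrt(1 / n1 + 1 / n2)      # standard error of the difference
margin = T_STAR * se                      # margin of error
lower, upper = diff - margin, diff + margin

# Assumed form of the test: pooled t statistic for H0 "no difference in means".
t_stat = diff / se

print(f"pooled sd = {sp:.3f}, standard error = {se:.3f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
print(f"t = {t_stat:.3f}, |t| < t*: {abs(t_stat) < T_STAR}")
```

Rounded to two decimals, the interval matches the slide's -2.38 to 3.61. Under the assumed form of the test, the statistic falls well inside plus or minus t*, which is consistent with zero lying inside the interval.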