Methods for Comparing a Single Numeric Variable across Two Groups – Part II

We have already examined the methods used to compare a single numeric variable across two groups when the samples are dependent. Next, we will look at the methods used to compare a single numeric variable across two groups when the samples are independent of one another.

Example: We’ve already looked at the data collected about the time spent on Facebook in a day from the Student Data Survey. Let’s look at these data again, but this time investigate whether males and females spend different amounts of time on Facebook in a day. The data can be found in the file Facebook_means.jmp on the course website.

Research Question – Is there evidence that on average males and females spend different amounts of time on Facebook in a day?

First, let’s take a look at the data in JMP. Choose Analyze > Fit Y by X and put Gender in the X, Factor box and Time Spent on Facebook in the Y, Response box as shown below. After you click OK, you should get the following output.

Questions:

1. Looking at the plot, what can you say about the amount of time males and females spend on Facebook?

We could also look at boxplots of the data to get a better “picture” of the data for each group. You can do this by clicking on the red drop-down arrow and choosing Display Options > Box Plots. You should then get the following output.

2. Is there any additional information that is learned about the two groups from this plot?

The means and standard deviations for each group might also be helpful in investigating the relationship between the amount of time spent on Facebook and gender. We can get JMP to provide this information by clicking on the red drop-down arrow and choosing the Means and Std Dev option. You should then get the following output added below the graph.

3.
Using the means and standard deviations, as well as the above plot, what can you say about the amount of time males and females spend on Facebook in a day?

4. Do you think the variability of each group is similar enough to do an outright comparison between the groups? Explain.

5. Why should we even care about the variability for each group in the decision making process? Explain.

Step 0: Define the research question.

Is there evidence that on average males and females spend different amounts of time on Facebook in a day?

Step 1: Determine the appropriate null and alternative hypotheses.

Since we have two separate groups (population means) to consider, we’ll have to use two population means in our hypotheses.

$H_0: \mu_{\text{males}} = \mu_{\text{females}}$
$H_a: \mu_{\text{males}} \neq \mu_{\text{females}}$

Step 2: Check the appropriate assumptions and find the test statistic.

The type of hypothesis test that is most appropriate to conduct will depend on whether the two groups have similar variability or not. The formal test for equal variances (similar variability) is given below.

$H_0$: The variances ____________ equal (the same)
$H_a$: The variances ____________ equal (the same)

To carry out this test in JMP, choose Unequal Variances from the red drop-down menu. You should then get the following output.

If any of the p-values circled above are significant (i.e., less than 0.05), then there is evidence that the variances are unequal.

6. Is there evidence of unequal variances for the Facebook data? Explain.

Next, we’ll need to check that either __________ samples are sufficiently large or ___________ populations are normally distributed. We will use the methods discussed previously for checking normality.

7. Which condition/assumption has been met for these data?

Note: You can get the summary statistics, histograms, and normal quantile plots from JMP by choosing Analyze > Distribution and putting Time Spent on Facebook in the Y, Columns box and Gender in the By box as shown below.
Females: [JMP Distribution output]

Males: [JMP Distribution output]

Since the sample sizes are sufficiently large and we have found no evidence of unequal variances, we can now carry out what is called the pooled t-test. In the Fit Y by X output, click on the red drop-down arrow and choose Means/Anova/Pooled t. You should get the following output.

Notes: The test statistic is based on the difference in the sample means ($\bar{x}_{\text{male}} - \bar{x}_{\text{female}}$), denoted by Difference in the output above. The standard error of the difference, denoted Std Err Diff in the output above, measures the variability of that difference, assuming the two groups have the same variability. The t Ratio given in the above output is the test statistic and is computed by dividing Difference by Std Err Diff.

8. Give the value of the test statistic and p-value from the above output.

Step 3: Find the p-value.

The p-values are given in the same order as in the previous hypothesis testing output we’ve seen.

9. What is the p-value for this test?

Step 4: Report the conclusion in context of the research question.

Example: Consider the data in the file BirthWeights.jmp on the course website. The data summarize the birth weights of babies and whether the mother was an ex-smoker or a non-smoker.

Step 0: Define the research question.

Is there evidence that the average birth weight for babies born to ex-smokers is smaller than the average birth weight of babies born to non-smokers?

Step 1: Determine the appropriate null and alternative hypotheses.

$H_0: \mu_{\text{ex-smokers}} \geq \mu_{\text{non-smokers}}$
$H_a: \mu_{\text{ex-smokers}} < \mu_{\text{non-smokers}}$

Note: We’ll have to make sure the order of the groups in the hypotheses matches the order in which JMP computed the difference so that we use the appropriate p-value.

Step 2: Check the appropriate assumptions and find the test statistic.

First, we’ll need to determine whether the variances are the same (equal) or not (unequal). What hypotheses do we need to test?

10. What conclusion can be made regarding the assumption of equal variances?
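To see what a formal test for equal variances actually computes, the following Python sketch calculates one common statistic of this kind, the Brown-Forsythe statistic (an ANOVA F statistic applied to each observation's absolute deviation from its group median). It is an illustration only and is not meant to reproduce JMP's output. The function name and the two small samples are made up for this example; they are not taken from BirthWeights.jmp.

```python
from statistics import mean, median


def brown_forsythe_statistic(*groups):
    """Return (F, df1, df2) for the Brown-Forsythe test of equal variances.

    Assumes every group has at least two values and that the absolute
    deviations are not all identical within every group.
    """
    # Absolute deviation of each observation from its own group's median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    group_means = [mean(d) for d in deviations]

    # One-way ANOVA sums of squares computed on the deviations.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for d, m in zip(deviations, group_means) for z in d)

    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2


# Hypothetical samples (made up): group_b is far more spread out than group_a.
group_a = [1, 2, 3, 4, 5]
group_b = [-7, -2, 3, 8, 13]
f_stat, df1, df2 = brown_forsythe_statistic(group_a, group_b)
print(f_stat, df1, df2)
```

An F value near 0 indicates that the groups have similar spread. Larger values point toward unequal variances, and the p-value would come from an F distribution with the returned degrees of freedom.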
Next, we’ll need to determine whether the sample sizes are sufficiently large or the data are normally distributed.

Ex-smokers: [JMP Distribution output]

Non-smokers: [JMP Distribution output]

11. Which condition, large samples or normal distributions, has been met?

12. What is the test statistic for this test? How did JMP compute it?

Step 3: Find the p-value.

Note: JMP found the difference in sample means by taking $\bar{x}_{\text{NonSmoker}} - \bar{x}_{\text{ExSmoker}}$, so the non-smoker group should be listed first in our hypotheses. The test in JMP is therefore testing the following hypotheses:

$H_0: \mu_{\text{non-smokers}} \leq \mu_{\text{ex-smokers}}$
$H_a: \mu_{\text{non-smokers}} > \mu_{\text{ex-smokers}}$

Step 4: Report the conclusion in context of the research question.

Example: Another question on the Student Data Survey asked about the number of Facebook friends a person has. The data can be found in the file Facebook_Friends.jmp on the course website.

Step 0: Define the research question.

Is there evidence that females on average have more Facebook friends than males?

Step 1: Determine the appropriate null and alternative hypotheses.

Step 2: Check the appropriate assumptions and find the test statistic.

First, we’ll need to determine whether the variances are the same (equal) or not (unequal). What hypotheses do we need to test?

13. Has the equal variance assumption been met?

Since the equal variance assumption has not been met, we’ll have to carry out the hypothesis test assuming unequal variances. To carry out this test in JMP, choose t Test from the red drop-down menu. You should get the following output.

Note: JMP computed the difference in sample means by taking $\bar{x}_{\text{male}} - \bar{x}_{\text{female}}$, so the groups in our hypotheses should be ordered with males first and females second.

14. Give the value for the test statistic. How did JMP compute this value?

Step 3: Find the p-value.

Step 4: Report the conclusion in context.
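The t-test that does not assume equal variances (often called Welch's t-test) can be sketched in Python as follows. This is the standard textbook calculation, offered as a sketch and not as a description of JMP's internals. The function name and the two small samples are made up for this example; they are not taken from Facebook_Friends.jmp.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(first, second):
    """Return (difference, std_err, t_ratio, df) without assuming equal variances.

    The difference is computed as mean(first) - mean(second), so the order
    of the arguments determines the sign of the t ratio.
    """
    n1, n2 = len(first), len(second)
    # Each group contributes its own sample variance divided by its sample size.
    v1 = variance(first) / n1
    v2 = variance(second) / n2

    difference = mean(first) - mean(second)
    std_err = sqrt(v1 + v2)
    # Satterthwaite approximation for the degrees of freedom (usually not a whole number).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return difference, std_err, difference / std_err, df


# Hypothetical samples (made up for illustration).
group_one = [1, 2, 3, 4, 5]
group_two = [2, 4, 6, 8, 10]
print(welch_t(group_one, group_two))
print(welch_t(group_two, group_one))  # same magnitude, opposite sign
```

Swapping the two groups flips the sign of the difference and of the t ratio, which is why the order of the groups in the hypotheses has to match the order used to compute the difference.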
Example: The data set Walleye.jmp from the course website contains information about the walleye population in two Minnesota rivers, the Minnesota and the Mississippi. It is of interest to compare the mercury levels (HGPPM) of the walleye from the two rivers.

Research Question – Is there evidence that there is a difference in the average mercury levels of the fish in the Minnesota and Mississippi Rivers?

Example: Another question on the survey asked each individual to report how many songs they have on their iPod. You can find the data in the file ipod.jmp on the course website.

Research Question – Is there evidence that on average men have more songs on their iPod than women?
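For checking answers outside JMP, the pooled t-test used in the earlier examples can be sketched in Python as follows. The sketch computes the difference in sample means, its standard error under the equal-variance assumption, the t ratio, and the two-sided and one-sided p-values. The function names, the dictionary keys, and the two small samples are made up for this example, and the t distribution is approximated by numerical integration so that only the standard library is needed.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, variance


def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for a t distribution using Simpson's rule."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x):
        return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

    # Integrate the density from 0 to t (steps must be even for Simpson's rule).
    h = t / steps
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def pooled_t(first, second):
    """Pooled t-test for mean(first) - mean(second), assuming equal variances."""
    n1, n2 = len(first), len(second)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(first) + (n2 - 1) * variance(second)) / df
    difference = mean(first) - mean(second)
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_ratio = difference / std_err
    lower = t_cdf(t_ratio, df)
    return {
        "difference": difference,
        "std_err_diff": std_err,
        "t_ratio": t_ratio,
        "df": df,
        "p_two_sided": 2 * min(lower, 1 - lower),  # Ha: the means differ
        "p_greater": 1 - lower,                    # Ha: first mean > second mean
        "p_less": lower,                           # Ha: first mean < second mean
    }


# Hypothetical samples (made up for illustration).
for name, value in pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]).items():
    print(name, value)
```

Which one-sided p-value applies depends on the order in which the groups were passed in, just as it depends on the order in which JMP computed the difference.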