Methods for Comparing a Single Numeric Variable across Two Groups – Part II
We have already examined the methods used to compare a single numeric variable across two groups
when the samples are dependent. Next, we will look at the methods which are used to compare a single
numeric variable across two groups when the samples are independent of one another.
Example: We’ve already looked at the data collected about the time spent on Facebook in a day from
the Student Data Survey. Let’s look at these data again, but investigate whether males and females spend
different amounts of time on Facebook in a day. The data can be found in the file Facebook_means.jmp
on the course website.
Research Question – Is there evidence that on average males and females spend different
amounts of time on Facebook in a day?
First, let’s take a look at the data in JMP. Choose Analyze → Fit Y by X and put Gender in the X, Factor
box and Time Spent on Facebook in the Y, Response box as shown below.
After you click OK, you should get the following output.
Questions:
1. Looking at the plot, what can you say about the amount of time males and females spend on
Facebook?
We could also look at boxplots of the data to get a better “picture” of the data for each group. You can
do this by clicking on the red drop-down arrow and choosing Display Options → Box Plots. You should
then get the following output.
2. Is there any additional information that is learned about the two groups from this plot?
Some additional information that might be helpful in investigating the relationship between the amount
of time spent on Facebook and gender are the means and standard deviations for each group. We can
get JMP to provide this information by clicking on the red drop-down arrow and choosing the Means
and Std Dev option. You should then get the following output added below the graph.
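For readers who want to see what the Means and Std Dev output summarizes, here is a minimal Python sketch of the same per-group summary. The values are hypothetical and are not taken from Facebook_means.jmp; the variable names and the unit (minutes per day) are assumptions made only for illustration.

```python
from statistics import mean, stdev

# Hypothetical (gender, minutes per day) records; NOT the survey data.
records = [
    ("Male", 30), ("Male", 45), ("Male", 60), ("Male", 20), ("Male", 50),
    ("Female", 40), ("Female", 55), ("Female", 70), ("Female", 65), ("Female", 35),
]

# Group the responses by gender.
groups = {}
for gender, minutes in records:
    groups.setdefault(gender, []).append(minutes)

# Sample size, sample mean, and sample standard deviation for each group.
summary = {
    gender: {"n": len(values), "mean": mean(values), "std_dev": stdev(values)}
    for gender, values in groups.items()
}

for gender, stats in summary.items():
    print(f"{gender}: n={stats['n']}, mean={stats['mean']:.1f}, std dev={stats['std_dev']:.2f}")
```

Comparing the two standard deviations side by side is an informal first look at whether the groups have similar variability.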
3. Using the means and standard deviations, as well as the above plot, what can you say about the
amount of time males and females spend on Facebook in a day?
4. Do you think the variability of each group is similar enough to do an outright comparison
between the groups? Explain.
5. Why should we even care about the variability for each group in the decision making process?
Explain.
Step 0: Define the research question.
Is there evidence that on average males and females spend different amounts of time on
Facebook in a day?
Step 1: Determine the appropriate null and alternative hypotheses.
Since we have two separate groups (population means) to consider, we’ll have to use two population
means in our hypotheses.
H0: µmales = µfemales
Ha: µmales ≠ µfemales
Step 2: Check the appropriate assumptions and find the test statistic.
The type of hypothesis test that is most appropriate to conduct will depend on whether the two groups
have similar variability or not. The formal test for equal variances (similar variability) is given below.
H0: The variances ____________ equal (the same)
Ha: The variances ____________ equal (the same)
To carry out this test in JMP, choose Unequal Variances from the red drop-down menu. You should
then get the following output.
If any of the p-values circled above are significant (i.e., less than 0.05), then there is evidence that the
variances are unequal.
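One of the tests commonly reported for equal variances is Levene's test. As a rough sketch of the idea, assuming the form of the test that uses absolute deviations from each group's mean, the statistic could be computed as follows. The function name and the data are hypothetical, and the p-value (which comes from an F distribution) is left to JMP.

```python
from statistics import mean

def levene_statistic(*groups):
    """Levene's W, assuming absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group mean.
    abs_devs = []
    for g in groups:
        center = mean(g)
        abs_devs.append([abs(x - center) for x in g])

    group_means = [mean(zs) for zs in abs_devs]
    grand_mean = mean([z for zs in abs_devs for z in zs])

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zs) * (m - grand_mean) ** 2
                  for zs, m in zip(abs_devs, group_means))
    within = sum((z - m) ** 2
                 for zs, m in zip(abs_devs, group_means) for z in zs)

    return (n_total - k) / (k - 1) * between / within

# Hypothetical minutes per day; NOT the survey data.
males = [30, 45, 60, 20, 50]
females = [40, 55, 70, 65, 35]
w = levene_statistic(males, females)
print(f"Levene W = {w:.4f}")
```

A value of W near zero means the typical spread around the group mean is about the same in each group; large values point toward unequal variances.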
6. Is there evidence of unequal variances for the Facebook data? Explain.
Next, we’ll need to check that either __________ samples are sufficiently large or ___________
populations are normally distributed. We will use the methods discussed previously for checking
normality.
7. Which condition/assumption has been met for these data?
Note: You can get the summary statistics, histograms, and normal quantile plots from JMP by choosing
Analyze → Distribution and putting Time Spent on Facebook in the Y, Columns box and Gender in the By
box as shown below.
Females:
Males:
Since the sample sizes are sufficiently large and we have found no evidence of unequal variances, we
can now carry out what is called the pooled t-test. Looking at the output from the Fit Y by X, click on
the red drop-down arrow and choose Means/Anova/Pooled t. You should get the following output.
Notes:
• The test statistic is based on the difference in the sample means (x̄male − x̄female), denoted
by Difference in the output above.
• The standard error of the difference, denoted Std Err Diff in the output above, summarizes the
variability of the two groups assuming the variability is the same.
• The t Ratio given in the above output is the test statistic and is computed by dividing Difference
by Std Err Diff.
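The three quantities in these notes can be reproduced by hand. The following Python sketch uses the standard pooled-variance formulas; the data are hypothetical (not the values in Facebook_means.jmp), and the variable names simply mirror the JMP labels.

```python
import math
from statistics import mean, variance

# Hypothetical minutes per day for each group; NOT the survey data.
males = [30, 45, 60, 20, 50]
females = [40, 55, 70, 65, 35]

n1, n2 = len(males), len(females)

# Difference in sample means, in the order male minus female.
difference = mean(males) - mean(females)

# Pooled variance: a weighted average of the two sample variances,
# which is only appropriate when the group variances are similar.
pooled_var = ((n1 - 1) * variance(males) + (n2 - 1) * variance(females)) / (n1 + n2 - 2)

# Standard error of the difference under the equal-variance assumption.
std_err_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# The test statistic: Difference divided by Std Err Diff.
t_ratio = difference / std_err_diff
df = n1 + n2 - 2

print(f"Difference = {difference}, Std Err Diff = {std_err_diff:.3f}, "
      f"t Ratio = {t_ratio:.3f}, DF = {df}")
```

Note that the sign of the t Ratio depends on which group is subtracted from which, so the order of the groups matters when reading one-sided p-values.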
8. Give the value of the test statistic and p-value from the above output.
Step 3: Find the p-value.
The p-values are given in the same order as in the previous hypothesis testing output we’ve seen.
9. What is the p-value for this test?
Step 4: Report the conclusion in context of the research question.
Example: Consider the data in the file BirthWeights.jmp on the course website. The data
summarize the birth weights of babies and whether the mother was an ex-smoker or a non-smoker.
Step 0: Define the research question.
Is there evidence the average birth weight for babies born to ex-smokers is smaller than
the average birth weight of babies born to non-smokers?
Step 1: Determine the appropriate null and alternative hypotheses.
H0: µex-smokers ≥ µnon-smokers
Ha: µex-smokers < µnon-smokers
Note: We’ll have to make sure the order of the groups in the hypotheses matches the order in which
JMP computed the difference so that we use the appropriate p-value.
Step 2: Check the appropriate assumptions and find the test statistic.
First, we’ll need to determine whether the variances are the same (equal) or not (unequal). What
hypotheses do we need to test?
10. What conclusion can be made regarding the assumption of equal variances?
Next, we’ll need to determine whether the sample sizes are sufficiently large or the data are normally
distributed.
Ex-smokers:
Non-smokers:
11. Which condition, large samples or normal distributions, has been met?
12. What is the test statistic for this test? How did JMP compute it?
Step 3: Find the p-value.
Note: JMP found the difference in sample means by taking x̄NonSmoker − x̄ExSmoker. Therefore, the non-smoker group should be listed first in our hypotheses, and the test in JMP is testing
the following hypotheses:
H0: µnon-smokers ≤ µex-smokers
Ha: µnon-smokers > µex-smokers
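To see how the direction of the alternative hypothesis maps onto a p-value, the sketch below computes the two-sided, upper-tail, and lower-tail p-values for a given t ratio and degrees of freedom. It approximates the t distribution by numerically integrating its density with Simpson's rule, which is an illustrative shortcut and not how JMP does the calculation. The function names, the labels, and the example inputs are all assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) using Simpson's rule on [0, |t|] (steps is even)."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area

def p_values(t_ratio, df):
    """Two-sided, upper-tail, and lower-tail p-values for a t ratio."""
    lower = t_cdf(t_ratio, df)
    upper = 1 - lower
    return {"Prob > |t|": 2 * min(lower, upper), "Prob > t": upper, "Prob < t": lower}

# Hypothetical t ratio and degrees of freedom, NOT from BirthWeights.jmp.
for label, p in p_values(1.8, 40).items():
    print(f"{label}: {p:.4f}")
```

For Ha: µnon-smokers > µex-smokers, with the difference computed as non-smoker minus ex-smoker, the relevant value is the upper-tail one (Prob > t). If the groups were subtracted in the opposite order, the sign of the t ratio would flip and the lower-tail value would be the one to use.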
Step 4: Report the conclusion in context of the research question.
Example: Another question on the Student Data Survey asked about the number of Facebook friends a
person has. The data can be found in the file Facebook_Friends.jmp on the course website.
Step 0: Define the research question.
Is there evidence that females on average have more Facebook friends than males?
Step 1: Determine the appropriate null and alternative hypotheses.
Step 2: Check the appropriate assumptions and find the test statistic.
First, we’ll need to determine whether the variances are the same (equal) or not (unequal). What
hypotheses do we need to test?
13. Has the equal variance assumption been met?
Since the equal variance assumption has not been met, we’ll have to carry out the hypothesis test
assuming unequal variances. To carry out this test in JMP, choose t Test from the red drop-down menu.
You should get the following output.
Note: JMP computed the difference in sample means by taking x̄male − x̄female; therefore, the groups in
our hypotheses should be ordered with males first and females second.
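When the variances are not assumed equal, the usual approach (often called Welch's t-test) keeps each group's variance separate in the standard error and adjusts the degrees of freedom with the Satterthwaite approximation. The sketch below assumes this is the calculation behind the unequal-variance output; the data are hypothetical and are not taken from Facebook_Friends.jmp.

```python
import math
from statistics import mean, variance

# Hypothetical numbers of Facebook friends; NOT the survey data.
males = [150, 300, 220, 180, 250]
females = [400, 900, 150, 650, 1200, 300]

n1, n2 = len(males), len(females)

# Each group's contribution to the variance of the difference in means.
v1 = variance(males) / n1
v2 = variance(females) / n2

# Difference in the order male minus female.
difference = mean(males) - mean(females)

# Standard error WITHOUT pooling the two variances.
std_err_diff = math.sqrt(v1 + v2)
t_ratio = difference / std_err_diff

# Satterthwaite approximation to the degrees of freedom (usually not a whole number).
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"Difference = {difference}, Std Err Diff = {std_err_diff:.2f}, "
      f"t Ratio = {t_ratio:.3f}, DF = {df:.2f}")
```

The only changes from the pooled calculation are the standard error and the degrees of freedom; the t ratio is still the difference divided by its standard error.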
14. Give the value for the test statistic. How did JMP compute this value?
Step 3: Find the p-value.
Step 4: Report the conclusion in context.
Example: The data set Walleye.jmp from the course website contains information about the walleye
population in two Minnesota rivers: the Minnesota and Mississippi Rivers. It is of interest to compare the
mercury levels (HGPPM) of the walleye from the two rivers.
Research Question – Is there evidence that there is a difference in the average mercury levels of
the fish contained in the Minnesota and Mississippi Rivers?
Example: Another question on the survey asked each individual to report how many songs they have on
their iPod. You can find the data in the file ipod.jmp on the course website.
Research Question – Is there evidence that on average men have more songs on their iPod than
women?