Lab Assignment 3

Lab #3 Two-Sample Hypothesis Testing, ANOVA, and Chi Square
POL SCI 390 BENESH
This lab assignment has several goals: 1) Provide another opportunity to explore your research questions
from the GSS; 2) Provide practice in conducting hypothesis tests on real data; 3) Experience ANOVA via
STATA (so that you don’t have to compute it by hand!); 4) Start to get comfortable with questions of the
relationship between variables; 5) Learn to obtain and read a crosstab in STATA, used very often in research.
To that end, consider your second lab, in which you were required to identify an independent variable, a
dependent variable, and two control variables. In this lab, we’ll go further than we were able to go toward
seeing whether your research question is borne out by testing differences between and among groups and the
relationship between two of your variables. (See the Lab 2 handout for the list of variables. Use your revised
dataset, which you should have saved after Lab 2.)
SECTION 1: Two-Sample Hypothesis Testing
Recall your hypothesis and the variables in which you are interested and again remind us what they are (or,
change them and type your new research question/theory/hypothesis here): My research question is
______________________________________, with the theory that ___________________________.
My main hypothesis is ___________________________________________. Hence, my DEPENDENT variable
(DV) is _____________________________________, my INDEPENDENT variable (IV) is
________________________________, and I control for CONTROL 1 __________________________
and CONTROL 2 _____________________________________________________________________.
Step 1: Choose a variable that could be recoded into two categories as your IV. (This could be your IV or
one of your controls.) Recode the variable into A NEW VARIABLE with two categories. (Male/Female,
North/South, High/Low, Democrat/Republican, etc.) (I used CAPS here because you’ll want to keep your
originally coded variable for now as you may wish to code it differently for the ANOVA analysis to come,
should you choose to use the same variables in that section.) Your DV should be as close to interval-ratio
level as possible. (A seven-category scale will be sufficient for these purposes.) Recode the DV as needed as
well (to eliminate missing cases, make the values easier to interpret, etc.). You likely did this already for Lab 2.
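As a rough sketch of Step 1, the commands below recode a multi-category variable into a NEW two-category variable and then check the result. The variable names (degree, college), the codes, and the cut points are assumptions for illustration only; substitute your own variable and its actual codes.

```stata
* Hypothetical example: degree is assumed to be coded 0-4 in your dataset.
* generate() creates a NEW variable, so the original coding is preserved.
recode degree (0/2 = 0 "No BA") (3/4 = 1 "BA or higher"), generate(college)

* Cross-check the old variable against the new one (including missing values).
tabulate degree college, missing
```

Checking the new variable against the original is an easy way to confirm that every category landed where you intended before you run any tests.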
Step 2: Conduct a two-sample hypothesis test using the FIVE STEP model from your text. (Step 1: State
the assumptions. Step 2: State the null hypothesis (and the research hypothesis). Step 3: Select the sampling
distribution and find the critical value (from the Appendices) of the test statistic. (We’ll use t because STATA
does not easily provide z.) Step 4: Have STATA compute the test statistic (t obtained). Step 5: Make a
decision and interpret the result.) Do not skimp on interpretation. This is the first time you get a real
handle on your hypothesis. Do you find some preliminary support for it, using the two-sample difference of
means test? Be sure to copy and paste all STATA output into your Word document for every step you take
(including the recoding).
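For Step 4 of the five-step model, a minimal sketch of the STATA commands might look like the following. The variable names (realinc for the DV, college for the two-category IV) are assumptions, not names from your dataset, and the last line is only an optional check on the critical value you look up in the Appendices.

```stata
* Hypothetical names: realinc = DV, college = two-category IV from Step 1.
* Two-sample difference of means test; reports t obtained and its degrees of freedom.
ttest realinc, by(college)

* Optional check on the appendix value: two-tailed critical t at alpha = .05.
* r(df_t) is the degrees of freedom saved by the ttest command just run.
display invttail(r(df_t), 0.025)
```

The by() variable must have exactly two categories, which is why the recode in Step 1 comes first.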
Michael’s Slide 5 provides his example. His question is whether income differs significantly between women
and men. He tests the null that women’s income and men’s income come from the same population and
obtains a t(observed) of 3.08, which exceeds the critical value of 1.96. Hence, he concludes that women’s
income and men’s income come from different populations. Indeed, according to the p values STATA
provides, we can be 99.9% confident that men’s income comes from a population with a higher mean income
than women’s. (He goes on to refine his test.)
SECTION 2: ANOVA
Recall that, for ANOVA, you need a categorical IV that has more than two categories. Recode either the IV
you used in Section 1 or one of your other variables (the IV or one of the controls) into three or more
categories. ANOVA seeks to compare the variation within categories to the variation across categories, right?
So STATA will provide you with a SSW and SSB and then the ratio of their mean squares (the F statistic). Again, follow your
book’s 5 steps to test whether the categories of your IV come from different populations with respect to the
DV. (Step 1: State the assumptions (one of which is that the population variances are equal – STATA will
test this assumption for you when you use the oneway command). Step 2: State the null hypothesis (and
the research hypothesis). Step 3: Select the sampling distribution and find the critical value of the test
statistic. Step 4: Have STATA compute the test statistic (F obtained). Step 5: Make a decision and interpret
the result.) Again, don’t skimp on interpretation.
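The ANOVA step can be sketched as follows, using the oneway command mentioned above. The variable names (polviews for the DV, degree for the IV with three or more categories) are assumptions for illustration; the last line is an optional check on the critical value from the appendix.

```stata
* Hypothetical names: polviews = DV, degree = IV with three or more categories.
* tabulate adds a table of group means and standard deviations to the output.
* The output also includes Bartlett's test for equal variances.
oneway polviews degree, tabulate

* Optional check on the appendix value: critical F at alpha = .05.
* r(df_m) = between-groups df, r(df_r) = within-groups df, both saved by oneway.
display invFtail(r(df_m), r(df_r), 0.05)
```

In the output, the "Between groups" and "Within groups" rows give SSB and SSW, and F is the ratio of the two mean squares (each SS divided by its degrees of freedom).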
Slide 15 shows you Michael’s example where he considered whether education influences ideology. He has 5
categories in his IV (rs highest degree) and he seeks to test whether the mean ideology for each
category comes from the same population (such that education has no influence on ideology). The null, then,
is that the population means for each group are the same. He obtains a SSB and SSW (which you could
calculate, provided the N weren’t so large!!) and then the ratio of their mean squares, reported as F = 8.32. His value exceeds the
critical value for F (which is what? – consult the appropriate appendix) and so he rejects the null, concluding
that at least one of the means is significantly different from the others. Note the Bartlett’s test for equal
variances. The null hypothesis is that the population variances for each education category ARE equal, which
is an assumption made by ANOVA. He CANNOT REJECT that null, and so the ANOVA is appropriate.
Slide 17 explores the differences among the means to see whether they are all different from one another or
whether, instead, only one or two are different from the others. He finds that the differences in the means
are attributable to the graduate degree group, which is statistically different from (more liberal than) all of the other
groups. (The other groups are indistinguishable from one another.) You should run and interpret this test
as well.
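One way to run this kind of follow-up comparison is sketched below. This handout does not say which multiple-comparison test Slide 17 uses, so the bonferroni option here is an assumption, as are the variable names; check the slide and use the option that matches it.

```stata
* Hypothetical names: polviews = DV, degree = IV with three or more categories.
* bonferroni requests pairwise comparisons of the group means with
* Bonferroni-adjusted p values; scheffe and sidak are alternative options.
oneway polviews degree, bonferroni
```

Each cell of the comparison table shows the difference between two group means and, beneath it, the adjusted p value for that difference.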
SECTION 3: Chi Square
Finally, you’ll use the Chi Square test for independence to see whether your variables are related to one
another. Do a Chi Square test for each of your IVs on your DV. (So, run a Chi Square between (1) your
IV and DV, (2) your Control 1 and DV, and (3) your Control 2 and DV.) Again, use your book’s 5 steps to
determine whether your variables are statistically significantly dependent. (Step 1: State the assumptions.
Step 2: State the null hypothesis (and the research hypothesis). Step 3: Select the sampling distribution and
find the critical value of the test statistic (from the appendix). Step 4: Have STATA compute the test statistic
(χ² obtained). Step 5: Make a decision and interpret the result.) In what direction do the relationships run?
(Add column percents to figure this out.) Considering all three of the chi square tests you ran, what do you
now know about the relationships between your IVs and your DV and what does that mean for your research
question?
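The three tests can be sketched as shown here. The names dv, iv, control1, and control2 are placeholders, not variables in your dataset; replace them with your own. The final line is an optional check on the critical value from the appendix and applies to the last table run.

```stata
* Placeholder names: dv = your dependent variable; iv, control1, control2 = yours.
* With the DV as the row variable and the IV as the column variable,
* column percents show how each IV group is distributed across the DV.
tabulate dv iv, chi2 column
tabulate dv control1, chi2 column
tabulate dv control2, chi2 column

* Optional check on the appendix value: critical chi-square at alpha = .05.
* df = (rows - 1) * (columns - 1); r(r) and r(c) are saved by tabulate.
display invchi2tail((r(r) - 1) * (r(c) - 1), 0.05)
```

Putting the IV in the columns matters: column percents then compare the IV groups to one another on each value of the DV.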
Michael’s Slide 23 reports the result of his test for independence where he seeks to ascertain whether gender
is related to whether or not the respondent has ever been unfaithful. As shown there, the χ² statistic is large
and statistically significant. He concludes that men and women are statistically significantly different in their
propensity to cheat. But remember, χ² doesn’t give us a direction for that relationship. Instead, we must
calculate column percentages to see how the two groups behave on the DV, as Michael does in Slide 24.
There, he shows that 22% of men versus 14% of women reported cheating on their spouse. Hence, men are
significantly more likely to engage in this behavior. You should do this as well. (Note that you can run the
crosstab with the column option right away so that you have this information readily available.)
SECTION 4: Putting it all together
You’ve now run three different tests to ascertain whether or not your research question is empirically
supported. What can you conclude, and why? (And what explains any differences ACROSS tests that you
find?)