Practice for the Mid-Term

• SPSS DATA ANALYSIS: You will only be tested in SPSS on chi-square, t-test, and analysis of variance. All of the questions will use the socialsurvey.sav data set.
• 1. Open the socialsurvey.sav data set in SPSS Data Editor. Suppose you think that people who like to go out like to go to a variety of different events, but some people are just stay-at-homes. Test the hypothesis at the p < .05 level of significance that there is an association between having visited an art gallery or museum in the last year (the variable (#48) is called “Visit Art Museum or Gallery in Last Yr (visitart)”) and having attended a sporting event in the last year (the variable (#47) is called “Attended Sports Event in Last Year (attsprts)”). Is the obtained relationship in the expected direction (art gallery visitors went to more sports events, relative to their numbers, than nonvisitors)? Is the relationship statistically significant? Give the value of the test statistic and its associated probability level.
• Create a new Word document and type Problem # 1. Below that, copy and paste your SPSS output. Save the document as “SPSSOutputYourLastName.doc.”

Chi-Square: Appropriate Test for the Impact of a Nominal Level IV on another Nominal Level Variable

• To answer this question you will run a Chi-Square Analysis. You can figure this out by first considering the level of measurement of the two variables. In the “Variable View” window look at the two variables, and note that each is measured not in terms of ranking or numeric values but in terms of discrete categories (whether they did or did not attend a sports event, and whether they did or did not visit a museum or gallery). For two nominal scale variables like this, the correct statistic to analyze their relationship is Chi-square.
Running a Chi-Square Analysis in SPSS

• Now that you have decided to run a Chi-square test, go to Analyze/Descriptive Statistics/Crosstabs
• Move the Visited Art Galleries variable into the Columns box (you do this because the column variable is the one you are treating as independent, and in your hypothesis you have asked if there is a significant association between going to art galleries and going to sporting events). In the case of two variables like this there probably is no causal relationship, so which variable is the column variable is more or less arbitrary, but if your hypothesis were about the effect of, say, gender, then gender is the obvious independent variable and would have to go in the Columns box
• Move the Attended Sport Events variable into the Rows box
• Under Cells, click Observed, Expected, Row, Column and Total and click Continue (you do this so that you will have all the information you need to see if the direction of the relationship you predicted is in fact correct. In this case you want to know whether there is an association between visiting art museums and going to sports events that is greater than what you might expect by chance)

How to Make Your Decisions

• Under Statistics, click Chi-square, then Continue, and OK. (Chi-square is your test statistic)
• To confirm your hypothesis, you need the obtained value of Chi-square to be significant at the .05 level. SPSS will print out the exact probability level for you. You need p to be less than .05
• However, if after getting a significant Chi-square you look over the table of observed versus expected counts and note that the trend is going in the wrong direction (that is, museum goers are less likely than expected to go to sports events), then although you can say that there may be an association, you have not established that more art visits are associated with more sporting event attendance.
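The comparison of observed with expected counts described above can be sketched outside SPSS. The following is a minimal Python illustration, not the SPSS procedure itself. The counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not the socialsurvey.sav results. The function names are illustrative, and the p-value shortcut applies only to a 2x2 table (df = 1).

```python
import math

def expected_counts(table):
    """Expected count for each cell = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

def p_value_df1(chi2):
    """Upper-tail probability of chi-square with 1 df (2x2 tables only)."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# HYPOTHETICAL counts (not from socialsurvey.sav).
# Rows: attended a sports event (yes, no); columns: visited art museum (yes, no).
observed = [[30, 20],
            [20, 30]]

expected = expected_counts(observed)
chi2 = pearson_chi_square(observed)
p = p_value_df1(chi2)

# Direction check: among art visitors, did more attend than expected?
in_expected_direction = observed[0][0] > expected[0][0]

print("Expected counts:", expected)
print("Chi-square:", round(chi2, 3), "p:", round(p, 4))
print("In expected direction:", in_expected_direction)
```

With these made-up counts, the observed number of art visitors who attended a sports event (30) exceeds the expected count (25), so the relationship would be in the predicted direction. The decision about significance still rests on comparing p with .05.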
Comparing Obtained (Count) to Expected (Expected Count) to Determine Direction of Relationship between the Two Variables

On the right is what your output should look like for checking the direction of the relationship. Note that within the group of art visitors that went to a sports event, the count is higher than the expected count (orange), while within the group of nonvisitors to art museums, the count of those who went to sports events is lower than the expected count (green). (The expected count for a cell is obtained by multiplying the cell's row total by its column total and dividing by the total number of cases; see arrows.) So your obtained results are in the direction you were expecting. Now it remains to be seen if this is a statistically significant relationship

Is the Obtained Value of Chi-square Significant?

• Is the observed positive relationship between visiting a museum or gallery and attending a sports event statistically significant beyond the .05 level?
• According to the output, you have a chi-square of 79.414 (df = 1), whose associated probability is less than .0005 (SPSS prints it as .000), well below the .05 level. If it were not significant, the value in the fourth column below (Asymp. Sig.) would read .050 or larger. So it’s fair to say that you have confirmed the hypothesis that there is a significant association between visiting art museums and attending sporting events, and that the trend is for the relationship to be positive

Chi-Square Tests
                               Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             79.414(b)   1    .000
Continuity Correction(a)       78.474      1    .000
Likelihood Ratio               80.584      1    .000
Fisher's Exact Test                                           .000         .000
Linear-by-Linear Association   79.361      1    .000
N of Valid Cases               1487
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 282.17.

Difference of Means Test for Two Levels of the IV: T Test

• For the exam, we will only consider the t test for independent samples in doing an SPSS application.
• A t test is used when you have only two groups on the IV (two categories of a nominal-level independent variable, such as gender), and an interval or ratio level DV
• There are three varieties of t test in your SPSS list of options: independent; dependent or paired (such as pre-post comparisons); and single sample (such as comparing a sample mean to an assumed population mean or other known parameter)

Here’s a sample test question: Test the hypothesis that people with a college degree (the variable (#59) is called “College Degree (degree2)”) watch fewer hours of TV (the variable (#35) is called “Hours per Day Watching TV (tvhours)”) than people without a college degree. Test the hypothesis at the .01 level of significance. Report the test statistic, df, probability level and the means for the two levels of the independent variable. Use the Levene test to determine which statistic you should report

Determine the Fit between Levels of Measurement of IV and DV and the Statistical Test

• Look at the variables in Variable View. You see that the IV, college degree, has two categories and that it is a nominal level variable (ignore the “ordinal” tag that is attached to all of the variables in the data file; it is not true of all of them) because there are only two discrete categories: have the degree and don’t have the degree
• Similarly, look at the DV, hours spent watching TV. This is at least an interval level measure.
• Therefore, this question is suitable for the t test, which is for testing the impact of a two-level nominal level IV on variation in an interval or better DV (alternatively, testing whether the means of two “groups” (levels of the IV) differ significantly on the DV, that is, are drawn from different populations)

Running a t Test for Independent Samples in SPSS

• In Data Editor, go to Analyze/Compare Means/Independent-Samples T Test
• Move the College Degree (degree2) variable into the Grouping Variable box and click on the Define Groups button to assign values to levels of the variable.
Use the values from the Variable View, which assign 0 to Group 1 (no degree) and 1 to Group 2 (college degree)
• Move the Hours per Day Watching TV variable into the Test Variable(s) box
• Under Options, set the confidence interval to 99% and click Continue, and then OK
• Use your output to answer the question

Finding Answers to the Question in Your SPSS T Test Output

First, check to see if the variances of the two groups on the IV are significantly different based on Levene’s statistic: they are (note the probability level, .000, which shows the statistic fell into the critical region). This determines which value of t you will report. The test shows that you can’t assume equal variances (they are significantly different), so you have to use the value of t which is calculated assuming unequal variances: 12.275

Independent Samples Test (Hours Per Day Watching TV)
                                           Equal variances   Equal variances
                                           assumed           not assumed
Levene's Test for Equality of Variances
  F                                        66.470
  Sig.                                     .000
t-test for Equality of Means
  t                                        8.861             12.275
  df                                       1484              1149.777
  Sig. (2-tailed)                          .000              .000
  Mean Difference                          1.18              1.18
  Std. Error Difference                    .134              .096
  99% Confidence Interval, Lower           .839              .935
  99% Confidence Interval, Upper           1.528             1.433

Levene’s test shows a significant difference of variances between the two levels of the IV

Answering Your Question

Group Statistics (Hours Per Day Watching TV)
College Degree      N      Mean   Std. Deviation   Std. Error Mean
No College degree   1140   3.17   2.392            .071
College degree      346    1.99   1.217            .065

Here are the group means

• Here’s a way to write the answer: A test of the hypothesis that people with a college degree would differ from people without a college degree on number of hours spent per day watching TV indicated that there was a significant difference in number of hours spent watching TV (t (unequal variances) = 12.275, df = 1149.777, p < .0005).
(Note that since your test is one-tailed (you predicted a direction), you have to “cut the probability in half,” since SPSS only reports the two-tailed test.) Persons with a college degree spent an average of 1.99 hours per day watching TV, while persons without a college degree spent an average of 3.17 hours per day watching TV.

Univariate Analysis of Variance: an Appropriate Test of the Impact of a 3- or More Level IV on an Interval or Ratio Level DV: Sample Problem

• A Sample Problem for the Midterm: Test the hypothesis that religious preference of respondent (the variable (#27) “Religious Preference (relig)”) has a significant impact on hours per day watching TV (the variable (#35) “Hours per Day Watching TV (tvhours)”). Make a decision in advance to reject the null hypothesis (and confirm the research hypothesis) if the obtained value of the test statistic falls into the critical region at the .01 level

Sample Problem, cont’d

• Write up your results as if you were writing for a journal. Include
  • The value of the test statistic
  • The degrees of freedom
  • The level of significance associated with the test statistic
  • The effect size (amount of variance accounted for, the partial eta squared)
  • The statistical power associated with your test
  • The results of the test for equality of variances
  • The means for each condition (level of the variable “religious preference”)
  • Run post hoc tests using Scheffe to see if there are significant pair-wise differences in mean TV hours watched among the levels of the independent variable (religious preference) and report which ones are significant. If there is anything in your printout that suggests that the Scheffe tests might not be appropriate, run the more appropriate type of post hoc test.
  • Make an assessment as to the importance of the observed relationship between religious preference and hours watching TV based on the effect size

What Kind of Variables Do I Have?
• First consult the Variable View in SPSS Data Editor to find out what kinds of variables these are (look under the value labels). You will see that religious preference is a nominal level variable with five categories, and TV hours is a ratio level variable. The appropriate analysis for studying the effect of a nominal level variable with more than two levels (categories) on an interval or ratio level variable is an analysis of variance (ANOVA). So you make the decision that you will run an ANOVA and treat religious preference as the IV and hours watching TV as the DV

Running ANOVA in SPSS

• Go to Analyze/General Linear Model/Univariate
• Move the Religious Preference variable into the Fixed Factor(s) window (this is where “fixed” IVs go)
• Move the Hours Per Day Watching TV variable into the Dependent Variable box (you are saying TV hours watched is “dependent” on religious preference)
• Don’t make any changes under Model, Contrasts, or Plots
• Under Options, move (OVERALL) and relig to the Display Means for window
• Also under Options/Display, select descriptive statistics, estimates of effect size, observed power, and homogeneity tests. You know to ask for these because the question asked you to provide them.
This will give you the mean TV hours according to religion, the effect size (how much variance in TV hours you can explain with religion), how much power you had to detect a difference if there is one, and whether or not your levels of the IV have different variances, in which case you need to do the alternative tests (Tamhane, not Scheffe, for example)
• Finally, under Options set the significance level to .01 and click Continue

More ANOVA in SPSS

• Click the Post Hoc button and move relig into the Post Hoc Tests for window
• Under Equal Variances Assumed select Scheffe (you will use this test if the group variances do not differ significantly according to the Levene test; the Levene test will show up on your output)
• Under Equal Variances Not Assumed select Tamhane’s T2 (you will use this test if the group variances are significantly different according to the Levene test)
• Click Continue and then OK
• Consult your output to answer the question

Getting the Answers from Your SPSS Output: F, Significance, eta square, Power

Here is the overall F statistic and its associated level of significance. It is not significant at the .01 level you set up because the obtained probability (.031) is larger than .01, so you can’t reject the null hypothesis

Tests of Between-Subjects Effects
Dependent Variable: Hours Per Day Watching TV
Source            Type III Sum   df     Mean       F         Sig.   Partial Eta   Noncent.    Observed
                  of Squares            Square                      Squared       Parameter   Power(a)
Corrected Model   53.308(b)      4      13.327     2.664     .031   .007          10.656      .522
Intercept         2741.557       1      2741.557   548.020   .000   .270          548.020     1.000
RELIG             53.308         4      13.327     2.664     .031   .007          10.656      .522
Error             7393.922       1478   5.003
Total             19921.000      1483
Corrected Total   7447.230       1482
a. Computed using alpha = .01
b. R Squared = .007 (Adjusted R Squared = .004)

Here are the partial eta squared (proportion of variance in the DV explained by the IV) and the power estimate

More Answers from the SPSS output: Means on the DV by Level of the IV; Equality of Variances Test

Descriptive Statistics
Dependent Variable: Hours Per Day Watching TV
Religious Preference   Mean   Std. Deviation   N
Protestant             2.90   2.061            947
Catholic               2.75   2.133            332
Jewish                 2.45   1.410            31
None                   3.42   3.472            139
Other                  2.62   2.104            34
Total                  2.90   2.242            1483

Levene's Test of Equality of Error Variances(a)
Dependent Variable: Hours Per Day Watching TV
F       df1   df2    Sig.
6.850   4     1478   .000
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+RELIG

Here are the group means, which show how average hours watching TV varied as a function of religious preference. This table also gives you a numerical breakdown of the religious preference categories

Here is the Levene test of equality of variances between levels of the IV, which in this case is significant beyond the .001 level. This means the variances can’t be assumed to be equal, so instead of the Scheffe post hoc tests you use a test like the Tamhane for unequal group variances

Post-hoc Comparisons when Equal Variances Can’t be Assumed

• If the overall effect is not significant, the post-hoc pairwise comparisons (for example, comparing Catholics to Protestants) probably won’t be either, but let’s look at the output anyhow. Look at the table in your output called Post Hoc Tests, Religious Preference, Multiple Comparisons (the bottom half). We have already established that we have to use the post hoc tests which assume unequal group variances, so we will look only at the Tamhane tests in the bottom half of the table.
Look at the column called Significance and you will see that none of the tests yields a significance value less than .05

Answer to the Question: What your “Results” Section Would Say

• Answer: To test the hypothesis that religious preference has a significant impact on hours spent watching TV, a one-way analysis of variance was conducted. The obtained value of F(4, 1478) = 2.664 was not significant at the .01 level (p = .031). The effect size (partial eta squared) was .007. Power to detect the effect was .522. The Levene test for the equality of variances among the levels of the independent variable (religious preference) found that the variances were significantly different (F = 6.850, p < .001), suggesting that an alternative post hoc test for pair-wise differences of means should be used. The mean TV hours watched by religious preference were: Jewish, 2.45; “other,” 2.62; Catholic, 2.75; Protestant, 2.90; and “none,” 3.42. Tamhane tests of post-hoc differences indicated that there were no significant differences in TV hours watched between any levels of the independent variable. However, power to detect a between-groups effect was low (.522) despite the large sample size, so the issue might be revisited in a new study with more power to detect a small effect.
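As a check on how the reported numbers relate to one another, F and partial eta squared can be recomputed from the sums of squares and degrees of freedom in the Tests of Between-Subjects Effects table. The following is only an arithmetic sketch in Python. The variable names are illustrative and are not SPSS output fields. It does not reproduce the significance level or the observed power, which SPSS derives from the F distribution.

```python
# Values copied from the Tests of Between-Subjects Effects table above.
ss_effect = 53.308    # Type III sum of squares for RELIG
ss_error = 7393.922   # Type III sum of squares for Error
df_effect = 4         # number of groups minus 1 (5 religious categories)
df_error = 1478       # N minus number of groups (1483 - 5)

# Mean square = sum of squares / degrees of freedom.
ms_effect = ss_effect / df_effect
ms_error = ss_error / df_error

# F is the ratio of the effect mean square to the error mean square.
f_ratio = ms_effect / ms_error

# Partial eta squared = SS_effect / (SS_effect + SS_error).
partial_eta_squared = ss_effect / (ss_effect + ss_error)

print("Mean square (RELIG):", round(ms_effect, 3))
print("Mean square (Error):", round(ms_error, 3))
print("F:", round(f_ratio, 3))
print("Partial eta squared:", round(partial_eta_squared, 3))
```

The recomputed values agree with the table to rounding (F of about 2.664, partial eta squared of about .007), which is why the write-up describes religious preference as explaining less than 1 percent of the variance in TV hours.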