Tutorial Two Answers

Exercise One

In this question you are comparing proportions. Since both variables are nominal level, i.e. gender (men/women) and seen/not seen, the appropriate test is a chi-square test.

The null hypothesis is that there is no difference between men and women as to whether they’ve seen the movie.
The alternative hypotheses are:
One-tailed: a higher proportion of men than women have seen the movie.
Two-tailed: there is a difference between men and women as to whether they’ve seen the movie.
The significance level (alpha) can be set at .05.

Chi-Square Tests

                           Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                          (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square         .586(b)   1    .444
Continuity Correction(a)   .082      1    .774
Likelihood Ratio           .600      1    .439
Fisher's Exact Test                                     .642         .392
N of Valid Cases           20

a. Computed only for a 2x2 table
b. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 2.80.

The continuity correction corrects for an overestimate of the chi-square value in a 2x2 table. This is the figure to use when examining for significance. In the table above the corrected value is .082 with an associated significance level of .774. To be significant, the Sig. value needs to be .05 or smaller. In this case .774 is larger than the alpha value of .05, so we can conclude that the result is not significant, i.e. there is no evidence of a gender difference in who has seen the movie.

Note, however, that two cells have expected counts of less than 5, which violates one of the assumptions of the chi-square test. In this case we should use Fisher’s Exact Test. Its two-sided significance is .642, which is also larger than .05, so the conclusion is unchanged.

For the second part of the exercise you need to collapse a ratio variable (age) into a categorical variable (newage).

In the data window open the Transform menu, then click on Recode into Different Variables.
Click on the variable (age) and move it into the box labeled Input Variable -> Output Variable.
Type in a new name for this grouped variable, e.g. Agegrp.
Type in a label description and then click on Change.
Then click on Old and New Values.
Under the Old Value section click on Range, lowest through ________ and enter 27.
In the New Value section enter a value of 1 and then click on Add.
Going back to the Old Value section, click on Range, _________ through highest and enter 28 in the space.
In the New Value section enter a value of 2 and then click on Add.
Then click on Continue and then OK.

A new variable should appear in the data window. It can now be used to run a chi-square test to see whether age makes a difference.

age group * Seen Matrix Crosstabulation

                                         Seen Matrix
                                         no        yes       Total
age group 1.00   Count                   1         10        11
                 Expected Count          3.9       7.2       11.0
                 % within age group      9.1%      90.9%     100.0%
                 % within Seen Matrix    14.3%     76.9%     55.0%
                 % of Total              5.0%      50.0%     55.0%
age group 2.00   Count                   6         3         9
                 Expected Count          3.2       5.9       9.0
                 % within age group      66.7%     33.3%     100.0%
                 % within Seen Matrix    85.7%     23.1%     45.0%
                 % of Total              30.0%     15.0%     45.0%
Total            Count                   7         13        20
                 Expected Count          7.0       13.0      20.0
                 % within age group      35.0%     65.0%     100.0%
                 % within Seen Matrix    100.0%    100.0%    100.0%
                 % of Total              35.0%     65.0%     100.0%

Chi-Square Tests

                           Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                           (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square         7.213(b)   1    .007
Continuity Correction(a)   4.904      1    .027
Likelihood Ratio           7.739      1    .005
Fisher's Exact Test                                      .017         .012
N of Valid Cases           20

a. Computed only for a 2x2 table
b. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 3.15.

Since two cells have an expected cell frequency of less than 5, we have to use Fisher’s Exact Test. Its two-sided significance value is .017, which is less than .05, so we can conclude that there is a significant difference between the age groups as to who has seen the movie. More specifically, older people (age group 2, aged 28 and over), regardless of gender, are less likely to have seen it: 33.3% compared with 90.9% of the younger group.

Exercise Two

In this case we want to test whether there is a difference in means between the first and second test scores. The scores are not independent, because the same person takes both tests.
The appropriate test is therefore a paired-samples t-test.

The null hypothesis is that there is no difference between the scores before and after social skills training.
The alternative hypothesis is that there is a difference (two-tailed) or that there is an improvement (one-tailed).

From the Analyze menu click on Compare Means and then Paired-Samples T Test.
Click on the two variables of interest and move them into the Paired Variables box.
Click OK.

Paired Samples Statistics

                             Mean     N    Std. Deviation   Std. Error Mean
Pair 1   1st SE test score   24.69    16   3.400            .850
         2nd SE test score   26.88    16   3.384            .846

Paired Samples Test (Pair 1: 1st SE test score - 2nd SE test score)

Paired Differences
  Mean                                         -2.19
  Std. Deviation                               2.562
  Std. Error Mean                              .640
  95% Confidence Interval of the Difference    Lower -3.55, Upper -.82
t                                              -3.416
df                                             15
Sig. (2-tailed)                                .004

In this example you need to look at the last column, Sig. (2-tailed). This is the probability value. Since this value (.004) is less than .05, we can conclude that there is a significant difference in the scores from time one to time two. The mean score rose from 24.69 to 26.88, so she can conclude that social skills training for six weeks improves social skills learning.

Exercise Three

There are two things that you need to do here. You need to describe your data and also decide whether any differences are statistically significant. It is best to do these things separately.

The first part of this question asks you to compare means in a table format.

From the Analyze menu click on Compare Means and then Means.
The independent variable in this case is occupation. Respiratory and gastric problems are the dependent variables.
Move the variables across into the appropriate boxes.
Click on Options and ensure that the standard deviation and number of cases will be calculated.

Report

Occupation                   Respiratory Disorders   Gastric Disorders
1        Mean                1.82                    4.64
         N                   11                      11
         Std. Deviation      1.722                   2.976
2        Mean                4.00                    2.33
         N                   9                       9
         Std. Deviation      2.550                   1.658
Total    Mean                2.80                    3.60
         N                   20                      20
         Std. Deviation      2.353                   2.683

For question 2 you want to look just at plumbers and then at bankers.
This means that you will have to split your data file.

From the Data View open the Data menu and click on the Split File option.
Click on Compare Groups and specify the grouping variable, i.e. occupation.
Click on OK.

Until this option is turned off, any tests that you run will be performed on the two groups separately.

The next part asks you to see whether plumbers take more days off due to respiratory or stomach problems. Since the people involved are the same in both cases, i.e. the measurements are not independent, you need to run a paired-samples t-test. See the procedure above. This time, however, the results for both plumbers (occupation 1) and bankers (occupation 2) will appear in one table.

Paired Samples Test (Pair 1: Respiratory Disorders - Gastric Disorders)

             Paired Differences
                                       Std. Error   95% Confidence Interval
                                                    of the Difference
Occupation   Mean     Std. Deviation   Mean         Lower     Upper     t        df   Sig. (2-tailed)
1            -2.82    3.763            1.135        -5.35     -.29      -2.484   10   .032
2            1.67     2.739            .913         -.44      3.77      1.826    8    .105

For plumbers there is a significant difference (Sig. = .032): they take more days off due to gastric problems than due to respiratory problems. For bankers the difference is not significant (Sig. = .105).

To turn the Split File function off, go back into the Data menu, click on Split File and select the first radio button: Analyze all cases, do not create groups.

The last part of the question asks you to simply compare days off between bankers and plumbers. This requires adding days off due to stomach problems and days off due to respiratory problems together into a new variable.

From the Transform menu click on Compute.
Give a name to the new target variable, e.g. Daysoff.
Move the respiratory variable over to the Numeric Expression box.
Then insert the + symbol, move the gastric problem variable over and click OK.

The next step involves comparing the means of the two groups (bankers and plumbers). Since the two groups are independent, the appropriate test is an independent-samples t-test.

The null hypothesis is that there is no difference between the two groups.
The alternative hypothesis is that there is a difference.
From the Analyze menu click on Compare Means and then Independent-Samples T Test.
Move the dependent (continuous) variable (i.e. daysoff) into the area labeled Test Variable.
Move the independent (categorical) variable into the section labeled Grouping Variable.
Click on Define Groups and type the numbers used in the data set to code for each group, i.e. 1 for plumbers and 2 for bankers.
Click on Continue and then OK.

Group Statistics

                 Occupation   N    Mean     Std. Deviation   Std. Error Mean
total days off   1            11   6.4545   3.07778          .92799
                 2            9    6.3333   3.31662          1.10554

Independent Samples Test (total days off)

Levene's Test for Equality of Variances
  F      .049
  Sig.   .827

t-test for Equality of Means
                                                                                      95% Confidence Interval
                                                           Mean         Std. Error    of the Difference
                              t      df       Sig.         Difference   Difference    Lower       Upper
                                              (2-tailed)
Equal variances assumed       .085   18       .933         .1212        1.43207       -2.88745    3.12987
Equal variances not assumed   .084   16.637   .934         .1212        1.44339       -2.92914    3.17157

Levene’s test for equality of variances tests whether the variances of the two groups are the same (an assumption of the t-test). The outcome of this test determines which of the t-values you need to use. If the Sig. value is larger than .05 you should use the first line in the table, i.e. equal variances assumed. If it is .05 or less (e.g. .001), the variances of the two groups are not the same and the t-statistic you need to use is the one on the second line, equal variances not assumed.

To find out whether there is a significant difference between the two groups, use the Sig. (2-tailed) column. Here Levene’s Sig. value is .827, so the equal variances assumption has not been violated and you should use the top line. If the Sig. (2-tailed) value is equal to or less than .05 then there is a significant difference in the mean scores. Since the value here (.933) is above .05, we can conclude that there is no significant difference in the number of days taken off between plumbers and bankers.
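
The two lines of the Independent Samples Test table can be rechecked by hand from the Group Statistics alone. The following is a minimal sketch in Python, which is not part of the SPSS procedure in this tutorial; the function names pooled_t and welch_t are illustrative, and the only inputs are the means, standard deviations and group sizes printed in the table above. It reproduces the t, df and standard error columns, but not the Sig. values, because those need the t distribution, which is not in the Python standard library.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal variances assumed: returns (t, df, std. error of the difference)."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, se

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal variances not assumed: returns (t, df, std. error of the difference)."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / se, df, se

# Summary statistics copied from the Group Statistics table (mean, SD, N).
plumbers = (6.4545, 3.07778, 11)   # occupation 1
bankers = (6.3333, 3.31662, 9)     # occupation 2

t_eq, df_eq, se_eq = pooled_t(*plumbers, *bankers)
t_w, df_w, se_w = welch_t(*plumbers, *bankers)

print(f"Equal variances assumed:     t = {t_eq:.3f}, df = {df_eq}, SE = {se_eq:.5f}")
print(f"Equal variances not assumed: t = {t_w:.3f}, df = {df_w:.3f}, SE = {se_w:.5f}")
```

Both t-values are very close to zero, which agrees with the conclusion that the groups do not differ in total days off.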