SOCY2200: Statistics
Instructor: Natasha Sarkisian

Worksheet 8: Hypothesis Testing for Correlation and Regression

Hypothesis Testing for Correlation

Whenever we want to find out whether there is a relationship between two interval/ratio (“scale” in SPSS) variables in the population, we use the correlation coefficient with the corresponding hypothesis test.

Step by Step

1. State your null and research hypotheses.
   Null: In the population, the two variables are not related.
   H0: ρ = 0
   The research hypothesis can be either non-directional:
   In the population, the two variables are related.
   H1: ρ ≠ 0
   Or directional:
   In the population, the two variables are positively (or negatively) correlated.
   H1: ρ > 0 or H1: ρ < 0
2. Select the alpha level (the level of risk of Type I error that you are willing to take): the most common choice is 0.05; for more stringent tests, use 0.01 or 0.001; for smaller samples, we sometimes use 0.10.
3. Identify the test statistic that you need to use: here, it is the correlation coefficient itself.
4. Compute the test statistic: we already know how to compute the correlation coefficient!
5. Use the table to find the critical value: We need to use Table B4 and find the critical value based on df = n - 2, our selected alpha level, and the one-tailed vs. two-tailed choice.
6. Compare the absolute value of the computed value with the critical value.
7. State your decision about H0: If the absolute value of your computed value is larger than the critical value, reject H0 in favor of H1. If it is smaller than the critical value, fail to reject H0. (For a directional hypothesis, the sign of r must also match the direction stated in H1.)

Example

We are interested in finding out whether the length of marriage is related to marital satisfaction among U.S. adults. We draw a random sample of 100 and calculate a sample r = -.225. Can we conclude that there is a relationship between the two variables in the population?

Solution:

1. State hypotheses:
   H0: ρ = 0
   In words: There is no relationship between the length of marriage and marital satisfaction among U.S. adults.
   H1: ρ ≠ 0 (two-tailed test)
   In words: There is a relationship between the length of marriage and marital satisfaction among U.S. adults.
2. Select alpha: The problem does not specify one, so let’s pick the most common one, 0.05.
3. Test statistic: The correlation coefficient itself.
4. The correlation coefficient is r = -.225.
5. Use the table to find the critical value: Table B4 (df = n - 2 = 100 - 2 = 98, alpha = .05, two-tailed) gives 0.1946.
6. Compare the computed and critical values: |-.225| = .225 > 0.1946.
7. State your decision about H0: Reject H0 in favor of H1.

We report: r = -.225, p < .05

Conclusion: The length of marriage has a weak negative relationship to marital satisfaction, and this relationship is statistically significant at the .05 level.

Hypothesis Testing for Regression

In regression, we use hypothesis tests to determine whether our independent variable X has an effect on the dependent variable Y in the population. In a bivariate regression model, we obtain two coefficients: the constant (intercept) a and the slope b. In principle, we could test hypotheses for both of them, but usually we are not interested in whether our constant, a, is different from 0. So the more important test is for the slope b. If we find that the slope is significantly different from zero, we conclude that there is in fact a linear relationship between the independent and dependent variables in the population.

Step by Step

1. State your null and research hypotheses.
   Null: In the population, the independent variable has no effect on the dependent variable.
   H0: β = 0
   The research hypothesis can be either non-directional:
   In the population, the independent variable has an effect on the dependent variable.
   H1: β ≠ 0
   Or directional:
   In the population, an increase in the independent variable is associated with an increase (or decrease) in the dependent variable.
   H1: β > 0 or H1: β < 0
2. Select the alpha level (the level of risk of Type I error that you are willing to take): the most common choice is 0.05; for more stringent tests, use 0.01 or 0.001; for smaller samples, we sometimes use 0.10.
3. Identify the test statistic that you need to use: Student’s t.
4. Formula: t = b/SEb
   SEb is the standard error of our regression slope b; if you manually calculated b, you would have to calculate SEb as
   $$SE_b = \frac{\sqrt{\sum (y - y')^2 / (n - 2)}}{\sqrt{\sum (x - \bar{x})^2}}$$
   Here, y is the values of our y, y’ is the predicted values of y based on the regression equation, x is the values of x, x̄ is the mean of x, and n is the sample size. We won’t calculate it by hand in this class, though.
5. Use the table to find the critical value: We need to use Table B2 and find the critical value based on df = n - 2, our selected alpha level, and the one-tailed vs. two-tailed choice.
6. Compare the absolute value of the computed value with the critical value.
7. State your decision about H0: If the absolute value of your computed value is larger than the critical value, reject H0 in favor of H1. If it is smaller than the critical value, fail to reject H0. (For a directional hypothesis, the sign of t must also match the direction stated in H1.)

Example

We expect that among employed Americans, those who spend more time commuting by car have higher levels of stress as a result. We use a nationally representative sample of 75 employed individuals and regress stress on time spent commuting by car. We obtain a slope b = 0.247, with a standard error of .133. Can we conclude that commuting time increases stress among Americans?

Solution:

1. State hypotheses:
   H0: β = 0
   In words: Commuting time has no effect on stress among Americans.
   H1: β > 0 (one-tailed test)
   In words: Commuting time increases stress among Americans.
2. Select alpha: 0.05
3. Test statistic: Student’s t
4. t = b/SEb = .247/.133 = 1.86
5. Use the table to find the critical value: Table B2 (df = n - 2 = 75 - 2 = 73, alpha = .05, one-tailed) gives 1.666.
6. Compare the computed and critical values: 1.86 > 1.666
7. State your decision: We reject H0 in favor of H1.
Conclusion: The time spent commuting by car is associated with increased levels of stress among employed Americans, and this relationship is statistically significant at the .05 level.

Exercises

1. We are interested in finding out whether one’s income is related to life satisfaction. We draw a random sample of 50 and calculate a sample r = .455. Can we conclude that there is a relationship between the two variables in the population?
2. We expect that higher levels of job autonomy lead to higher levels of employee satisfaction. We use a nationally representative sample of 65 employed individuals and regress employee satisfaction on job autonomy. We obtain a slope b = 0.372, with a standard error of .199. Can we conclude that job autonomy increases employee satisfaction?
3. Use variables AGEKDBRN and TVHOURS to evaluate whether, in the U.S. population, there is a relationship between the age when one has their first child and the number of hours spent watching TV. [Note: State hypotheses, use SPSS, and state your conclusions.]
4. Use variables AGE and TVHOURS to evaluate whether, as Americans get older, they watch more hours of TV per day. [Note: State hypotheses, use SPSS, and state your conclusions.]
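
Optional check in Python: The worksheet itself relies on SPSS and printed tables, so the following is only a minimal sketch for re-checking the arithmetic in the two worked examples above. The function names (r_to_t, reject_null, slope_standard_error) are hypothetical helpers made up for this illustration, the critical values are typed in by hand from the examples rather than computed, and the small x/y dataset is invented purely to show the standard error calculation described in step 4 of the regression procedure.

```python
import math


def r_to_t(r, n):
    """Convert a sample correlation r into the equivalent t statistic (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def reject_null(statistic, critical_value, direction="two-tailed"):
    """Apply the decision rule from steps 6 and 7.

    critical_value is the positive value read from the table.
    direction is "two-tailed", "greater" (H1: parameter > 0),
    or "less" (H1: parameter < 0).
    """
    if direction == "two-tailed":
        return abs(statistic) > critical_value
    if direction == "greater":
        return statistic > critical_value
    if direction == "less":
        return statistic < -critical_value
    raise ValueError(f"unknown direction: {direction}")


def slope_standard_error(x, y):
    """Standard error of the bivariate regression slope, from raw data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    # Least-squares slope b and constant a
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
    a = mean_y - b * mean_x
    # Sum of squared differences between observed y and predicted y'
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_resid / (n - 2)) / math.sqrt(ss_x)


# Correlation example: r = -.225, n = 100, critical r = 0.1946 (from the example)
corr_reject = reject_null(-0.225, 0.1946, "two-tailed")
corr_t = r_to_t(-0.225, 100)  # same test expressed as a t statistic

# Regression example: b = 0.247, SEb = .133, critical t = 1.666 (from the example)
t_slope = 0.247 / 0.133
slope_reject = reject_null(t_slope, 1.666, "greater")

# Invented data, used only to illustrate the SEb calculation
demo_x = [1, 2, 3, 4, 5]
demo_y = [2, 4, 5, 4, 5]
demo_se = slope_standard_error(demo_x, demo_y)

print(f"Correlation example: reject H0? {corr_reject} (equivalent t = {corr_t:.2f})")
print(f"Regression example: t = {t_slope:.2f}, reject H0? {slope_reject}")
print(f"Demo SEb from invented data: {demo_se:.3f}")
```

The same reject_null helper can be reused for Exercises 1 and 2 once you have looked up the appropriate critical values in Tables B4 and B2.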