Sociology 3211q Final Exam
Dec 11, 2013
NAME___________________________

Instructions: Answer all questions in the space provided, or go to the back or use a blank sheet of paper if necessary. Show your work; for questions that don't involve calculations, explain the reasoning behind your answer. If you're not sure how to interpret a question, ask me. There are 75 points total.

Dependent variable: self-rated health (5=excellent, 4=very good, 3=good, 2=fair, 1=poor)

Independent variables:
age in years
female (1=woman; 0=man)
Income (1=under 10K; 2=10-15K; 3=15-20K; 4=20-25K; 5=25-35K; 6=35-50K; 7=50-75K; 8=over 75K)
Educr: Education (1=not high school grad; 2=hs grad; 3=some college; 4=college grad)
BMI=Body mass index (calculated from height and weight; 25 or over is considered overweight)
Dummy variables for race/ethnicity:
  White: Non-hispanic white
  Black
  Hispanic
  Orace: all other race/ethnicity

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .505a   ***        .253                .95947
a. Predictors: (Constant), COMPUTED BODY MASS INDEX, orace, hispanic, female, REPORTED AGE IN YEARS, black, educr, INCOME LEVEL

ANOVAa
Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    1101.929         8      137.741       149.622   .000b
Residual      3217.468         3495   .921
Total         4319.397         3503
a. Dependent Variable: health
b. Predictors: (Constant), COMPUTED BODY MASS INDEX, orace, hispanic, female, REPORTED AGE IN YEARS, black, educr, INCOME LEVEL

Coefficientsa
                            Unstandardized         Standardized
                            Coefficients           Coefficients
Model 1                     B        Std. Error    Beta           t          Sig.
(Constant)                  2.705    .131                         20.591     .000
female                      .076     .034          .034           2.257      .024
black                       -.175    .066          -.039          -2.633     .009
hispanic                    -.144    .070          -.031          -2.063     .039
orace                       -.231    .069          -.050          -3.377     .001
educr                       .153     .018          .136           8.288      .000
INCOME LEVEL                .157     .009          .299           17.549     .000
REPORTED AGE IN YEARS       -.010    .001          -.156          -10.131    .000
COMPUTED BODY MASS INDEX    -.037    .003          -.193          -13.016    .000
a. Dependent Variable: health

1.
(6) What is the predicted value of health for a 30-year-old black woman who makes $40,000 a year, has graduated from college, and has a body mass index of 25?
2.705+.076-.175+.153*4+.157*6-.010*30-.037*25=2.935

2. (4) Suppose that the woman in question 1 reports that her health is “very good.” What is her residual?
4-2.935=1.065

3. (4) Suppose that twenty years later, the woman has the same BMI and makes $55,000 per year. Is the predicted value of her health at that time higher, lower, or the same as in question 1?
You could calculate the predicted value in the same way as question 1. But it would be quicker to note that only two things would be different: she would be 20 years older, which would reduce the predicted value by .010*20=.2, and one unit higher in income, which would increase it by .157. Since .2 is more than .157, the predicted value would be lower.

4. (4) According to this regression, what kind of person is predicted to have the best health?
Female, white, well educated, high income, young, low BMI

5. (4) I have replaced the number for the R square by ***. What number should go there?
1101.929/4319.397=.255 (or you could square R)

6. (4) According to this regression, which variable has more effect on health, education or BMI? Explain how you can tell.
BMI, because the absolute value of the standardized coefficient is bigger.

7. (6) Rank the racial/ethnic groups from highest to lowest in terms of self-rated health.
Non-hisp white   0
Hispanic         -.144
Black            -.175
Other race       -.231

8. (4) Suppose that I computed a new variable: Age divided by 10, and used it in the regression instead of age. What would change in the model summary, ANOVA table, and table of coefficients, and what would stay the same?
The only change is that the coefficient for the new variable (and its standard error) would be ten times as big as the coefficient for age. Everything else would stay exactly the same.

9. (6) Construct a 95% confidence interval for the coefficient of “black.”
-.175-2*.066= -.307 to -.175+2*.066= -.043

10.
(4) Here are the ANOVA tables from regressions that are exactly the same as the one on p. 1 except that they used different transformations of age. Which regression is the best, the one using age, the one using age squared, or the one using the square root of age? Explain how you can tell.
The one with the square root of age, because the regression sum of squares is bigger (and the residual sum of squares is smaller).

ANOVAa
Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    1091.364         8      136.420       147.703   .000b
Residual      3228.033         3495   .924
Total         4319.397         3503
With Age squared

ANOVAa
Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    1105.161         8      138.145       150.212   .000b
Residual      3214.236         3495   .920
Total         4319.397         3503
With the square root of age

ANOVAa
Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    1101.103         6      183.517       199.410   .000b
Residual      3218.294         3497   .920
Total         4319.397         3503
Predictors: (Constant), COMPUTED BODY MASS INDEX, REPORTED AGE IN YEARS, female, educr, white, INCOME LEVEL

11. (4) Here is the ANOVA table from a regression including a dummy variable for white, plus the age, female, education, income, and BMI variables. Which is better: this regression or the regression on p. 1? Explain.
This one, because the Mean Square Residual is lower.

Coefficientsa
                            Unstandardized         Standardized
                            Coefficients           Coefficients
Model 1                     B        Std. Error    Beta           t          Sig.
(Constant)                  2.402    .084                         28.529     .000
female                      .001     .033          .000           .028       .978
black                       -.391    .065          -.089          -6.066     .000
hispanic                    -.322    .068          -.070          -4.705     .000
orace                       -.284    .067          -.061          -4.208     .000
educr                       .303     .017          .270           18.317     .000
REPORTED AGE IN YEARS       -.014    .001          -.211          -14.220    .000

12. (6) Here are the coefficients from a regression without income and BMI. Based on these results and the results on p. 1, give the direct, indirect, and total effects of education on health.
Direct .153
Total .303
Therefore Indirect=.303-.153=.150
The regression on p.
1-2 gives the direct effects because it includes the variables that come after education (that is, are potentially influenced by education).

13. (4) Suppose someone said we should add a variable for smoking status to the regression. If we did, would the coefficient for income become larger, smaller, or stay exactly the same? Or is it impossible to know until you do the regression including smoking status?
You can’t know until you do it.

Coefficientsa
                            Unstandardized         Standardized
                            Coefficients           Coefficients
Model 1                     B        Std. Error    Beta           t          Sig.
(Constant)                  1.584    .168                         9.441      .000
female                      .098     .034          .043           2.845      .004
black                       -.173    .067          -.039          -2.589     .010
hispanic                    -.145    .070          -.031          -2.058     .040
orace                       -.234    .069          -.050          -3.397     .001
educr                       .158     .019          .140           8.509      .000
INCOME LEVEL                .158     .009          .301           17.596     .000
REPORTED AGE IN YEARS       -.010    .001          -.153          -9.888     .000
lonorm                      .225     .139          .067           1.613      .107
hinorm                      .215     .136          .078           1.577      .115
loover                      .238     .136          .088           1.746      .081
hiover                      .053     .138          .017           .384       .701
obese                       -.235    .135          -.095          -1.737     .083
a. Dependent Variable: health

14. (4) I made dummy variables for BMI ranges: underweight (under 18.5), low normal (18.5 to 22), high normal (22 to 25), low overweight (25 to 28), high overweight (28 to 30), and obese (over 30) and included them in the regression. According to the results given above, which of these groups has the best health, on the average?
Low overweight

15. (3) Suppose I made a variable for BMI squared and included it in the regression with the BMI variable (and did not include the BMI category dummies). Would the coefficient for the BMI squared variable be positive, negative, or zero?

16. (4) Here are some numbers representing the ages of the employees at a company and their salaries (in thousands). Calculate the correlation between age and salary.
Age      28  33  37  47  54  57  61  67
Salary   61  44  38  58  66  64  77  56
The mean age is 48, the mean salary is 58. The sums of squared deviations from the mean are 1394 for age and 1070 for salary.
The sum of (age-mean(age))*(salary-mean(salary))=681. So the correlation is 681/sqrt(1394*1070)=.558

17. (4) Suppose you did a regression with salary as the dependent variable and age as the independent variable. Calculate the regression coefficient for age.
681/1394=.489
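
If you want to check the arithmetic in questions 16 and 17, a short Python script will do it. This is only a sketch using the eight age and salary values given in question 16; the helper names (mean, sum_of_products) are made up for illustration and are not part of the exam or of any statistics package.

```python
from math import sqrt

# Data from question 16 (salary in thousands).
ages = [28, 33, 37, 47, 54, 57, 61, 67]
salaries = [61, 44, 38, 58, 66, 64, 77, 56]


def mean(values):
    """Arithmetic mean of a list of numbers (hypothetical helper)."""
    return sum(values) / len(values)


def sum_of_products(xs, ys):
    """Sum of (x - mean(x)) * (y - mean(y)) over paired values (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


# Sums of squared deviations are the sum of products of a variable with itself.
ss_age = sum_of_products(ages, ages)
ss_salary = sum_of_products(salaries, salaries)
sp_age_salary = sum_of_products(ages, salaries)

# Question 16: correlation = sum of products / sqrt(SS_age * SS_salary).
correlation = sp_age_salary / sqrt(ss_age * ss_salary)

# Question 17: slope of salary on age = sum of products / SS_age.
slope = sp_age_salary / ss_age

print("mean age:", mean(ages), "mean salary:", mean(salaries))
print("SS age:", ss_age, "SS salary:", ss_salary, "sum of products:", sp_age_salary)
print("correlation:", round(correlation, 3))
print("slope:", round(slope, 3))
```

The correlation and the regression coefficient share the same numerator (681); only the denominator differs, which is why the two answers can be computed from the same three sums.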