Sociology 3211q NAME___________________________ Final

Sociology 3211q
Final Exam
Dec 11, 2013
NAME___________________________
Instructions: Answer all questions in the space provided, or go to the back or use a blank sheet
of paper if necessary. Show your work; for questions that don't involve calculations, explain the
reasoning behind your answer. If you're not sure how to interpret a question, ask me. There
are 75 points total.
Dependent variable: self-rated health (5=excellent, 4=very good, 3=good, 2=fair, 1=poor)
Independent variables:
age in years
female (1=woman; 0=man)
Income (1=under 10K; 2=10-15K; 3=15-20K; 4=20-25K; 5=25-35K; 6=35-50K; 7=50-75K;
8=over 75K)
Educr: Education (1=not high school grad; 2=hs grad; 3=some college; 4=college grad)
BMI=Body mass index (calculated from height and weight; 25 or over is considered overweight)
Dummy variables for race/ethnicity:
White: Non-hispanic white
Black
Hispanic
Orace: all other race/ethnicity
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .505a   ***        .253                .95947
a. Predictors: (Constant), COMPUTED BODY MASS INDEX, orace,
hispanic, female, REPORTED AGE IN YEARS, black, educr, INCOME
LEVEL
ANOVAa
Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    1101.929         8      137.741       149.622   .000b
Residual      3217.468         3495   .921
Total         4319.397         3503
a. Dependent Variable: health
b. Predictors: (Constant), COMPUTED BODY MASS INDEX, orace, hispanic, female,
REPORTED AGE IN YEARS, black, educr, INCOME LEVEL
Coefficientsa
                            Unstandardized           Standardized
                            Coefficients             Coefficients
Model 1                     B        Std. Error      Beta           t          Sig.
(Constant)                  2.705    .131                           20.591     .000
female                      .076     .034            .034           2.257      .024
black                       -.175    .066            -.039          -2.633     .009
hispanic                    -.144    .070            -.031          -2.063     .039
orace                       -.231    .069            -.050          -3.377     .001
educr                       .153     .018            .136           8.288      .000
INCOME LEVEL                .157     .009            .299           17.549     .000
REPORTED AGE IN YEARS       -.010    .001            -.156          -10.131    .000
COMPUTED BODY MASS INDEX    -.037    .003            -.193          -13.016    .000
a. Dependent Variable: health
1. (6) What is the predicted value of health for a 30-year-old black
woman who makes $40,000 a year, has graduated from college, and has a
body mass index of 25?
2.705+.076-.175+.153*4+.157*6-.010*30-.037*25=2.935
2. (4) Suppose that the woman in question 1 reports that her health is
“very good.” What is her residual?
4-2.935=1.065
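The arithmetic in questions 1 and 2 can be checked with a short script. This is only a sketch: the dictionary COEF and the function predict_health are names made up for illustration, and the coefficients are copied by hand from the table on p. 1.

```python
# Unstandardized coefficients (B) copied from the table on p. 1.
COEF = {
    "constant": 2.705,
    "female": 0.076,
    "black": -0.175,
    "hispanic": -0.144,
    "orace": -0.231,
    "educr": 0.153,
    "income": 0.157,
    "age": -0.010,
    "bmi": -0.037,
}


def predict_health(person):
    """Predicted self-rated health for a dict of predictor values.

    Predictors left out of the dict are treated as 0.
    """
    return COEF["constant"] + sum(COEF[name] * value for name, value in person.items())


# Question 1: black woman, age 30, $40,000 (income category 6),
# college graduate (educr 4), BMI 25.
woman = {"female": 1, "black": 1, "educr": 4, "income": 6, "age": 30, "bmi": 25}
predicted = predict_health(woman)

# Question 2: "very good" is coded 4; residual = observed - predicted.
residual = 4 - predicted

print(round(predicted, 3), round(residual, 3))
```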
3. (4) Suppose that twenty years later, the woman has the same BMI
and makes $55,000 per year. Is the predicted value of her health at
that time higher, lower, or the same as in question 1?
You could calculate the predicted value in the same way as question 1.
But it would be quicker to note that only two things would be
different: she would be 20 years older, which would reduce the
predicted value by .010*20=.2, and one unit higher in income (category
7 instead of 6), which would increase it by .157. So the predicted
value would be lower (by .043).
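The shortcut in question 3 amounts to multiplying each changed variable by its coefficient and adding the results. A minimal sketch, with variable names chosen only for illustration:

```python
# Coefficients for the two variables that change (from the table on p. 1).
B_AGE = -0.010
B_INCOME = 0.157

age_change = 20         # twenty years older
income_change = 7 - 6   # $55,000 is category 7; $40,000 was category 6

# Change in the predicted value; a negative number means lower predicted health.
change = B_AGE * age_change + B_INCOME * income_change
print(round(change, 3))
```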
4. (4) According to this regression, what kind of person is predicted
to have the best health?
Female, white, well educated, high income, young, low BMI
5. (4) I have replaced the number for the R square by ***. What
number should go there?
1101.929/4319.397=.255 (or you could square R)
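Both routes to the R square can be checked numerically. The sketch below also recomputes the adjusted R square from the ANOVA table using the standard formula 1 - (residual mean square / total mean square), which is not part of the question but should match the .253 shown in the model summary.

```python
# Values copied from the model summary and ANOVA table on p. 1.
ss_regression = 1101.929
ss_residual = 3217.468
ss_total = 4319.397
df_residual = 3495
df_total = 3503
r = 0.505

# R square two ways: explained share of the total sum of squares, and R squared.
r_square_from_anova = ss_regression / ss_total
r_square_from_r = r ** 2

# Adjusted R square from the same table (standard formula).
adjusted_r_square = 1 - (ss_residual / df_residual) / (ss_total / df_total)

print(round(r_square_from_anova, 3), round(r_square_from_r, 3), round(adjusted_r_square, 3))
```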
6. (4) According to this regression, which variable has more effect
on health, education or BMI? Explain how you can tell.
BMI, because the absolute value of the standardized coefficient is
bigger.
7. (6) Rank the racial/ethnic groups from highest to lowest in terms
of self-rated health.
Non-hisp white   0
Hispanic         -.144
Black            -.175
Other race       -.231
8. (4) Suppose that I computed a new variable: Age divided by 10,
and used it in the regression instead of age. What would change in
the model summary, ANOVA table, and table of coefficients, and what
would stay the same?
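The coefficient for the new variable would be ten times as big as the
coefficient for age (-.10 instead of -.010), and its standard error
would also be ten times as big. Everything else would stay exactly the
same: the model summary, the ANOVA table, the standardized coefficient,
t, and significance for age, and all of the other coefficients.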
9. (6) Construct a 95% confidence interval for the coefficient of
“black.”
-.175-2*.066= -.307 to -.175+2*.066= -.043
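The interval is the coefficient plus or minus about two standard errors. A small sketch of the calculation follows; the multiplier of 2 is the approximation used in the answer above (1.96 would be the usual large-sample value and gives nearly the same interval).

```python
# Coefficient and standard error for "black" from the table on p. 1.
b = -0.175
se = 0.066
multiplier = 2  # approximation used in the answer; 1.96 is the large-sample value

lower = b - multiplier * se
upper = b + multiplier * se
print(round(lower, 3), round(upper, 3))
```

Because the whole interval is below zero, it agrees with the significance level of .009 reported for "black."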
10. (4) Here are the ANOVA tables from regressions that are exactly
the same as the one on p. 1 except that they used different
transformations of age. Which regression is the best, the one using
age, the one using age squared, or the one using the square root of
age? Explain how you can tell.
The one with the square root of age, because the regression sum of
squares is bigger (and the residual sum of squares is smaller).
With age squared
ANOVAa
Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    1091.364         8      136.420       147.703   .000b
Residual      3228.033         3495   .924
Total         4319.397         3503

With the square root of age
ANOVAa
Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    1105.161         8      138.145       150.212   .000b
Residual      3214.236         3495   .920
Total         4319.397         3503
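Since all three regressions in question 10 have the same number of predictors and the same total sum of squares, comparing them comes down to comparing regression sums of squares. A sketch, assuming the first table above is the age-squared regression and the second is the square-root regression (the reading that matches the answer given):

```python
# Regression sums of squares from the ANOVA tables (p. 1 and question 10).
ss_regression = {
    "age": 1101.929,
    "age squared": 1091.364,
    "square root of age": 1105.161,
}
ss_total = 4319.397  # the same in all three regressions

# R square for each version, and the version that explains the most.
r_square = {name: ss / ss_total for name, ss in ss_regression.items()}
best = max(ss_regression, key=ss_regression.get)

for name, value in r_square.items():
    print(name, round(value, 4))
print("best:", best)
```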
ANOVAa
Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    1101.103         6      183.517       199.410   .000b
Residual      3218.294         3497   .920
Total         4319.397         3503
Predictors: (Constant), COMPUTED BODY MASS INDEX, REPORTED AGE IN YEARS, female,
educr, white, INCOME LEVEL
11. (4) Here is the ANOVA table from a regression including a dummy
variable for white, plus the age, female, education, income, and BMI
variables. Which is better: this regression or the regression on p.
1? Explain.
This one, because the Mean Square Residual is lower.
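The two regressions in question 11 have different numbers of predictors, so the comparison uses the residual mean square (residual sum of squares divided by its degrees of freedom) rather than the raw sums of squares. A sketch with labels chosen only for illustration:

```python
# Residual sum of squares and residual df from the two ANOVA tables.
models = {
    "p. 1 (black, hispanic, orace)": {"ss_residual": 3217.468, "df_residual": 3495},
    "white dummy only": {"ss_residual": 3218.294, "df_residual": 3497},
}

# Residual mean square for each model; lower is better.
mean_square_residual = {
    name: m["ss_residual"] / m["df_residual"] for name, m in models.items()
}
better = min(mean_square_residual, key=mean_square_residual.get)

for name, value in mean_square_residual.items():
    print(name, round(value, 5))
print("better:", better)
```

The difference only shows up in the fourth decimal place, which is why the printed tables (.921 and .920) barely distinguish the two.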
Coefficientsa
                            Unstandardized           Standardized
                            Coefficients             Coefficients
Model 1                     B        Std. Error      Beta           t          Sig.
(Constant)                  2.402    .084                           28.529     .000
female                      .001     .033            .000           .028       .978
black                       -.391    .065            -.089          -6.066     .000
hispanic                    -.322    .068            -.070          -4.705     .000
orace                       -.284    .067            -.061          -4.208     .000
educr                       .303     .017            .270           18.317     .000
REPORTED AGE IN YEARS       -.014    .001            -.211          -14.220    .000
12. (6) Here are the coefficients from a regression without income
and BMI. Based on these results and the results on p. 1, give the
direct, indirect, and total effects of education on health.
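Direct     .153
Total      .303
Therefore
Indirect=.303-.153=.150
The regression on p. 1-2 gives the direct effect because it controls
for variables that come after education (that is, are potentially
influenced by education), namely income and BMI. The regression without
those variables gives the total effect.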
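The decomposition in question 12 is a single subtraction, sketched here with the two education coefficients copied from the tables (variable names are illustrative only):

```python
# Coefficient for educr with income and BMI controlled (p. 1): direct effect.
direct = 0.153
# Coefficient for educr without income and BMI (question 12 table): total effect.
total = 0.303

# The indirect effect is the part that operates through income and BMI.
indirect = total - direct
share_indirect = indirect / total

print(round(indirect, 3), round(share_indirect, 2))
```

So roughly half of the total effect of education runs through income and BMI.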
13. (4) Suppose someone said we should add a variable for smoking
status to the regression. If we did, would the coefficient for income
become larger, smaller, or stay exactly the same? Or is it impossible
to know until you do the regression including smoking status?
You can’t know until you do it.
Coefficientsa
                            Unstandardized           Standardized
                            Coefficients             Coefficients
Model 1                     B        Std. Error      Beta           t          Sig.
(Constant)                  1.584    .168                           9.441      .000
female                      .098     .034            .043           2.845      .004
black                       -.173    .067            -.039          -2.589     .010
hispanic                    -.145    .070            -.031          -2.058     .040
orace                       -.234    .069            -.050          -3.397     .001
educr                       .158     .019            .140           8.509      .000
INCOME LEVEL                .158     .009            .301           17.596     .000
REPORTED AGE IN YEARS       -.010    .001            -.153          -9.888     .000
lonorm                      .225     .139            .067           1.613      .107
hinorm                      .215     .136            .078           1.577      .115
loover                      .238     .136            .088           1.746      .081
hiover                      .053     .138            .017           .384       .701
obese                       -.235    .135            -.095          -1.737     .083
a. Dependent Variable: health
14. (4) I made dummy variables for BMI ranges: underweight (under
18.5), low normal (18.5 to 22), high normal (22 to 25), low overweight
(25 to 28), high overweight (28 to 30), and obese (over 30) and
included them in the regression. According to the results given
above, which of these groups has the best health, on the average?
Low overweight
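Each dummy coefficient is the difference from the omitted group, which here is underweight, so the omitted group counts as 0 in the comparison. A sketch of the ranking, using the coefficients from the table above:

```python
# Coefficients for the BMI dummies; underweight is the omitted (reference) group.
bmi_effects = {
    "underweight": 0.0,
    "lonorm": 0.225,
    "hinorm": 0.215,
    "loover": 0.238,
    "hiover": 0.053,
    "obese": -0.235,
}

# Rank groups from best to worst predicted health, other variables held constant.
ranking = sorted(bmi_effects, key=bmi_effects.get, reverse=True)
best = ranking[0]
print(ranking)
```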
15. (3) Suppose I made a variable for BMI squared and included it in
the regression with the BMI variable (and did not include the BMI
category dummies). Would the coefficient for the BMI squared variable
be positive, negative, or zero?
16. (4) Here are some numbers representing the ages of the employees
at a company and their salaries (in thousands). Calculate the
correlation between age and salary.
Age      28   33   37   47   54   57   61   67
Salary   61   44   38   58   66   64   77   56
The mean age is 48, the mean salary is 58. The sums of squared
deviations from the mean are 1394 for age and 1070 for salary. The
sum of (age-mean(age))*(salary-mean(salary))=681. So the correlation
is 681/sqrt(1394*1070)=.558
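The sums quoted in the answer can be reproduced directly from the data. This sketch uses only the standard library and the definition of the correlation as the sum of cross-products divided by the square root of the product of the two sums of squares:

```python
from math import sqrt

ages = [28, 33, 37, 47, 54, 57, 61, 67]
salaries = [61, 44, 38, 58, 66, 64, 77, 56]

n = len(ages)
mean_age = sum(ages) / n
mean_salary = sum(salaries) / n

# Sums of squared deviations and the sum of cross-products.
ss_age = sum((a - mean_age) ** 2 for a in ages)
ss_salary = sum((s - mean_salary) ** 2 for s in salaries)
cross = sum((a - mean_age) * (s - mean_salary) for a, s in zip(ages, salaries))

r = cross / sqrt(ss_age * ss_salary)
print(mean_age, mean_salary, ss_age, ss_salary, cross, round(r, 3))
```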
17. (4) Suppose you did a regression with salary as the dependent
variable and age as the independent variable. Calculate the
regression coefficient for age.
681/1394=.489
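The regression coefficient uses the same cross-product sum as the correlation, divided by the sum of squares for age alone. A self-contained sketch:

```python
ages = [28, 33, 37, 47, 54, 57, 61, 67]
salaries = [61, 44, 38, 58, 66, 64, 77, 56]

n = len(ages)
mean_age = sum(ages) / n
mean_salary = sum(salaries) / n

# Slope = sum of cross-products / sum of squared deviations of the independent variable.
cross = sum((a - mean_age) * (s - mean_salary) for a, s in zip(ages, salaries))
ss_age = sum((a - mean_age) ** 2 for a in ages)
slope = cross / ss_age

print(round(slope, 3))
```

In words, each additional year of age goes with about .489 thousand dollars (roughly $489) more in predicted salary.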