Solutions

Statistics 404
Fall 2009
SECOND EXAM
Solutions
Name _______________________________
The U.S. Census Bureau estimates there are about 96,000
centenarians (.03% of the population) in the US—a number
that is predicted to more than quadruple by 2030, reaching
1.15 million by 2050. Is it the glass of sherry or wine they
drink each evening before going to bed, or is it centenarians’
genetic makeup that accounts for their longevity? Or is it
centenarians’ cultural background? (For example, although
the proportion of centenarians in the Japanese population is
about the same as that in the US, in 2050 this proportion is projected to be double
that of the US!) Then there is the issue of centenarians’ quality of life. For
instance, although centenarians are much more likely to be female than male,
men who reach 100 years of age are usually healthier than are same-aged women.
You have been hired by the National Centenarian Awareness Project (NCAP) to
investigate centenarians’ “secrets to longevity.” The NCAP provides you funds
sufficient to conduct a cross-cultural study based on face-to-face interviews with
parallel random samples of centenarians in the US and Japan. You conduct
interviews with 50 female and 50 male centenarians in each country, yielding a
total sample size of 200. Beyond variables that indicate each subject’s nationality
(N=1 if US, N=2 if Japan) and gender (G=1 if male, G=2 if female), you have data
on these 200 subjects’ responses to the following 9 questions:
BIRTHYR (B) = In what year were you born? (values range from 1895 to 1909)
ADMITHOS (A) = How many times have you been admitted as a patient in a hospital during the past year? (values range from 0 to 8)
HAPPY (H) = On a scale from 1 to 10, where 1 is the least happy of persons and 10 is the most happy of persons, how happy would you say you are? (values range from 1 to 10)
MDEATH (M) = How old was your mother when she died?
FDEATH (F) = How old was your father when he died?
RECACTIV (R) = How often have you joined others in recreational activities (e.g., golf, card playing, shuffleboard, etc.) during the past week?
SHERRY (S) = Do you generally drink a glass of sherry or wine daily? (values: 1=‘yes’ or 0=‘no’)
ENJOY (E) = Do you generally enjoy being alone? (values: 1=‘yes’ or 0=‘no’)
OTHLIKE (O) = How many visitors have you had during the past week?
a. You decide to use BIRTHYR, ADMITHOS, and HAPPY (i.e., variables
associated with centenarians’ responses to the first 3 of the above questions)
as the dependent variables in your investigation. However, even before
selecting independent variables for your regression models, you notice that
you will probably need to transform one of these 3 variables to ensure that
your data are homoscedastic. Which variable is this? In the space provided
below, sketch the heteroscedastic pattern that you suspect this variable will
produce if used as a dependent variable. Explain why you suspect the
variable will produce this pattern. What variable transformation would you
use to correct for this pattern? (Hint: Be sure to label the axes in your sketch!
Also note that in subsequent parts of the exam, analyses with this “suspected
variable” will have been performed using the required variance stabilizing
transformation.)
[weight 3]
Variable that will probably produce a heteroscedastic pattern: ADMITHOS
Sketch of the pattern you suspect:
[Sketch: scatterplot of residuals (ê) on the vertical axis against predicted ADMITHOS (Â) on the horizontal axis.]
Why you suspect the above-sketched pattern:
ADMITHOS is a Poisson random variable (a measure of counts [hospital
admissions] within a fixed time period [the past year]). As such, its variance
increases linearly with the magnitude of its mean.
Transformation used to correct this pattern:
The appropriate variable transformation would be $Y = \sqrt{\mathrm{ADMITHOS}}$ (or, if some
values of ADMITHOS equal zero, $Y = \sqrt{\mathrm{ADMITHOS}} + \sqrt{\mathrm{ADMITHOS} + 1}$), to be used
as the dependent variable in regression models instead of ADMITHOS.
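The variance-stabilizing claim can be checked numerically. The following is a minimal sketch, not part of the exam key: it computes the exact variance of a Poisson count and of its square root by summing over the probability mass function. The means 2, 4, and 6 are hypothetical values chosen to fall inside the 0 to 8 range reported for ADMITHOS, and `poisson_moments` is an illustrative helper name.

```python
import math


def poisson_moments(mean, transform, cutoff=200):
    """Exact mean and variance of transform(Y) for Y ~ Poisson(mean).

    Sums over the probability mass function up to `cutoff`, far beyond
    the point where the mass is negligible for the small means used here.
    """
    prob = math.exp(-mean)           # P(Y = 0)
    first = second = 0.0
    for k in range(cutoff):
        value = transform(k)
        first += prob * value
        second += prob * value * value
        prob *= mean / (k + 1)       # P(Y = k + 1) obtained from P(Y = k)
    return first, second - first * first


means = [2, 4, 6]  # hypothetical mean admission counts (assumption)
raw_variances = [poisson_moments(m, float)[1] for m in means]
sqrt_variances = [poisson_moments(m, math.sqrt)[1] for m in means]

for m, raw, root in zip(means, raw_variances, sqrt_variances):
    print(f"mean={m}: var(Y)={raw:.3f}  var(sqrt(Y))={root:.3f}")
```

The raw variances track the means, while the variances of the square roots stay within a narrow band. The stabilization is approximate and improves as the mean grows, approaching 1/4.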
b. You begin your analysis by investigating the “genetics argument” that
people live longer because they inherited genes from parents who themselves
lived long. Following this line of thinking, you regress BIRTHYR (B) on
FDEATH (F) and MDEATH (M) and obtain the following regression equation:
$\hat{B} = 1906 + .5F - .4M$
Express the meaning of the partial slope between BIRTHYR and FDEATH (i.e.,
$\hat{b}_F = .5$) in words.
[weight 6]
After adjusting the centenarians’ fathers’ longevity as if their mothers all died
at the same age (i.e., after removing from their fathers’ “genetic tendency to
live longer” these fathers’ tendency to have—perhaps instinctually—selected
a wife with a similar genetic tendency toward longevity), one would estimate
centenarians to have been born a half-year later (i.e., to have been a half-year
younger) for each additional year that their father lived.
c. If you interpreted $\hat{b}_F$ correctly in part b, it likely sounds counterintuitive to
you. (Shouldn’t centenarians live longer if their fathers lived longer?) Upon
closer examination, you note that FDEATH and MDEATH are strongly
correlated ($r_{FM} = .85$), suggesting to you that centenarians nearly always inherit
longevity from both parents. This said, explain the problem in how you
specified the regression model in part b. Three techniques for remedying this
problem are mentioned in your lecture notes. Please name 2 of them.
[weight 3]
The problem is one of repetitiveness. If centenarians’ genetic-based longevity
consistently originates from both parents, FDEATH and MDEATH need to be
combined into a single measure of centenarians’ “genetic inheritance of
longevity.” Three techniques for remedying repetitiveness are (1) summing
variables, (2) factor analysis, and (3) principal components analysis.
d. You next investigate factors that contribute to centenarians’ physiological
quality of life (as measured [negatively] by ADMITHOS). Your thinking is that
exercise (as measured by RECACTIV) and the artery-cleaning effects of
drinking sherry or wine (as measured by SHERRY) have health benefits that
will keep centenarians out of hospitals. You regress ADMITHOS (A) on
RECACTIV (R) and SHERRY (S) both separately and in a multiple regression,
yielding the following 3 regression equations:
$\hat{A} = 1 + .5R$
$\hat{A} = 3 + 2S$
$\hat{A} = 4 - .7R + 6S$
In the space below, sketch a plot of A, R, and S that is consistent with the
numbers provided in these 3 equations. (Hint: Indicate data points as “1” if
S=1, and as “0” if S=0.)
[weight 3]
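[Sketch: scatterplot with ADMITHOS (A) on the vertical axis (up to 8) and RECACTIV (R) on the horizontal axis (up to 15); data points are plotted as “1” (S=1) and “0” (S=0).]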
e. Referring to the first and third regression equations listed in part d, give a
theoretical explanation for why in the first equation the bivariate slope
between A and R is positive, whereas in the third equation the partial slope
between A and R is negative. (Hint: Be sure that your explanation is
consistent with the sketch you drew in answering part d.)
[weight 3]
Centenarians’ sherry or wine drinking distorts the negative relation between
the frequency of their weekly activities and the (square root—as per part a)
frequency of their annual hospital admissions. Our findings in part d suggest
that drinking sherry or wine daily has detrimental health effects for
centenarians, not health benefits (e.g., from cleaning their arteries). That is, the
detrimental health effects of drinking are likely the reason why centenarians
are admitted to a hospital more frequently if they drink sherry or wine each day
than if they do not. However, since recreationally active
centenarians are more likely to drink sherry or wine than less recreationally
active centenarians, the fact that “the more recreationally active centenarians
ended up in hospitals” is due not to their recreational activities but to their
drinking. Among centenarians who drink the same (i.e., who exclusively either
do or do not drink a glass of sherry or wine daily), recreational activity
decreases the frequency with which they were admitted to a hospital during the
past year.
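The sign reversal described above can be reproduced with a small made-up data set. This is a sketch under stated assumptions: the eight data points are hypothetical, and A is generated without error from an assumed reading of the third equation in part d (intercept 4, slope -.7 on R, slope 6 on S). The partial slope is obtained by removing each S group's mean from both R and A before regressing, which for a single binary control gives the same slope as the multiple regression.

```python
def slope(x, y):
    """Least-squares slope from regressing y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def center_within(values, groups):
    """Subtract each group's mean (residualize on a categorical variable)."""
    centered = []
    for v, g in zip(values, groups):
        members = [vj for vj, gj in zip(values, groups) if gj == g]
        centered.append(v - sum(members) / len(members))
    return centered


# Hypothetical data: drinkers (S=1) are both more active and more often admitted.
S = [0, 0, 0, 0, 1, 1, 1, 1]
R = [0, 1, 2, 3, 6, 7, 8, 9]
A = [4 - 0.7 * r + 6 * s for r, s in zip(R, S)]  # assumed form of the third equation

bivariate_slope = slope(R, A)                                    # ignores S
partial_slope = slope(center_within(R, S), center_within(A, S))  # holds S constant

print(f"bivariate slope of A on R: {bivariate_slope:.3f}")
print(f"partial slope of A on R:   {partial_slope:.3f}")
```

In this construction the bivariate slope is positive only because the drinkers sit both further right and higher up; within each drinking group the slope is negative.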
f. Your next step is to examine centenarians’ psychological wellbeing (as
measured by HAPPY). Your thinking is that centenarians’ happiness will be
enhanced if they are liked by others (as measured by OTHLIKE). Unfortunately,
such an analysis would be complicated by the fact that centenarians’ visits
from others and their own psychological wellbeing are both influenced by their
physical health. That is, not only are healthy people more likely to be happy,
others may have reasons besides liking (e.g., selfish reasons such as hopes of
an inheritance) for visiting centenarians when they are unhealthy. As a result,
the effect of OTHLIKE on HAPPY may be due to this common prior cause
(namely, health—a variable that [you should be sure to assume that] you have
no adequate measure of in your data set). Explain how you might proceed with
an analysis of the effect of OTHLIKE on HAPPY in a way that would not violate
the assumption that $E(X^{T}\tilde{e}) = E(X^{T}) \cdot E(\tilde{e})$. (Hints: What instrumental variable
might you use? Why would this variable make a good instrument? How would
you use the variable to ensure that the $E(X^{T}\tilde{e}) = E(X^{T}) \cdot E(\tilde{e})$ assumption is not
violated?)

[weight 3]
Two-stage least squares (2SLS) is called for here. In this case, an instrumental
variable is needed that is (a) related to others’ liking but (b) unrelated to the
centenarians’ health. Accordingly, ENJOY (E) might work as an instrumental
variable for the following reasons:
(a) Someone who enjoys solitude is less likely to be liked by others than
someone who enjoys others’ company. (Note: An instrumental variable
may have a positive or negative linear association with the variable for
which it is an instrument.)
(b) If the enjoyment of being alone comprises a centenarian’s general
character trait, it would not vary according to her or his health.
The 2SLS procedure could be implemented as follows:
(1) Stage 1: Regress OTHLIKE on ENJOY, and obtain the predicted values (O-hat) from this regression.
(2) Stage 2: Regress HAPPY on the O-hat values obtained in Stage 1.
Note: Unlike OTHLIKE, O-hat will not be linearly associated with health-related
variance in HAPPY.
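The two stages can be sketched on simulated data. Everything numeric below is an assumption made for illustration: the sample size, the coefficients, and the simulated `health` variable, which stands in for the unmeasured common cause and is never used in either regression. The helper `ols` is a plain bivariate least-squares fit.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from a least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


rng = random.Random(404)
n = 20000
true_effect = 0.5  # hypothetical effect of OTHLIKE on HAPPY

health = [rng.gauss(0, 1) for _ in range(n)]   # unmeasured in the real data set
enjoy = [rng.randint(0, 1) for _ in range(n)]  # instrument, generated independently of health
# Unhealthy subjects get more visitors; those who enjoy solitude get fewer.
othlike = [5 - 2 * e - 1.5 * h + rng.gauss(0, 1) for e, h in zip(enjoy, health)]
# Happiness depends on visitors and, separately, on health.
happy = [5 + true_effect * o + 2 * h + rng.gauss(0, 1) for o, h in zip(othlike, health)]

# Naive regression of HAPPY on OTHLIKE is contaminated by health.
_, naive_slope = ols(othlike, happy)

# Stage 1: regress OTHLIKE on ENJOY and keep the predicted values (O-hat).
a1, b1 = ols(enjoy, othlike)
o_hat = [a1 + b1 * e for e in enjoy]

# Stage 2: regress HAPPY on O-hat.
_, tsls_slope = ols(o_hat, happy)

print(f"naive slope: {naive_slope:.3f}")
print(f"2SLS slope:  {tsls_slope:.3f}  (simulated true effect: {true_effect})")
```

In this simulation the naive slope is pulled away from the simulated effect, while the second-stage slope lands near it. Standard errors taken directly from the second-stage regression are not valid without adjustment, so this sketch reports slopes only.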
g. Finally, you decide to describe differences in psychological wellbeing
among Japanese women, Japanese men, US women, and US men. Average
scores on the HAPPY variable for each of these four groups are as follows:
Group             Average HAPPY score
Japanese women    8
Japanese men      7
US women          4
US men            5
Keeping in mind that there are exactly 50 people in each of these groups, do
the following while treating the four groups as if they comprise a single
nominal-level variable with 4 attributes of “Japanese woman,” “Japanese
man,” “US woman,” and “US man” (i.e., do not consider them as the two
distinct variables of nationality [N] and gender [G]): First, explain how you
would construct effect measures from this variable. Second, obtain constant
and slope estimates from the regression of HAPPY on these effect measures.
Finally, after computing the “effect” associated with each of the four groups,
explain the meaning of each effect in words. (Hint: Be sure to show how the
effect variables were constructed and how you calculated your estimates.)
[weight 4]
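Effect measures:
$E_{JW} = \begin{cases} 1 & \text{if Japanese woman} \\ -1 & \text{if US man} \\ 0 & \text{otherwise} \end{cases}$
$E_{JM} = \begin{cases} 1 & \text{if Japanese man} \\ -1 & \text{if US man} \\ 0 & \text{otherwise} \end{cases}$
$E_{UW} = \begin{cases} 1 & \text{if US woman} \\ -1 & \text{if US man} \\ 0 & \text{otherwise} \end{cases}$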
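Constant and slope estimates:
$\hat{a} = \dfrac{50(8) + 50(7) + 50(4) + 50(5)}{200} = 6$
$\hat{H}_{JW} = 8 = \hat{a} + \hat{b}_{JW}E_{JW} + \hat{b}_{JM}E_{JM} + \hat{b}_{UW}E_{UW} = 6 + \hat{b}_{JW}(1) + \hat{b}_{JM}(0) + \hat{b}_{UW}(0) = 6 + \hat{b}_{JW} \Rightarrow \hat{b}_{JW} = 2$
$\hat{H}_{JM} = 7 = \hat{a} + \hat{b}_{JW}E_{JW} + \hat{b}_{JM}E_{JM} + \hat{b}_{UW}E_{UW} = 6 + \hat{b}_{JW}(0) + \hat{b}_{JM}(1) + \hat{b}_{UW}(0) = 6 + \hat{b}_{JM} \Rightarrow \hat{b}_{JM} = 1$
$\hat{H}_{UW} = 4 = \hat{a} + \hat{b}_{JW}E_{JW} + \hat{b}_{JM}E_{JM} + \hat{b}_{UW}E_{UW} = 6 + \hat{b}_{JW}(0) + \hat{b}_{JM}(0) + \hat{b}_{UW}(1) = 6 + \hat{b}_{UW} \Rightarrow \hat{b}_{UW} = -2$
The regression model is thus $\hat{H} = 6 + 2E_{JW} + 1E_{JM} - 2E_{UW}$.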
The effects and “meanings in words” associated with each group are as follows:
$\hat{b}_{JW} = 2$: Japanese women’s happiness scores were 2 points above the overall average.
$\hat{b}_{JM} = 1$: Japanese men’s happiness scores were 1 point above the overall average.
$\hat{b}_{UW} = -2$: US women’s happiness scores were 2 points below the overall average.
$\hat{b}_{UM} = -\sum_{i=1}^{3} \hat{b}_i = -(2 + 1 - 2) = -1$: US men’s happiness scores were 1 point below the overall average.
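The arithmetic in part g can be verified with a short script. This is a sketch that follows the hand calculation rather than fitting a regression: because the model has one parameter per group, the fitted values equal the group means, and with equal group sizes each slope is the group mean minus the constant. The dictionary keys and the `predict` helper are illustrative names.

```python
group_means = {"JW": 8, "JM": 7, "UW": 4, "UM": 5}
n_per_group = 50

# Effect codes (E_JW, E_JM, E_UW); US men are coded -1 on all three measures.
codes = {
    "JW": (1, 0, 0),
    "JM": (0, 1, 0),
    "UW": (0, 0, 1),
    "UM": (-1, -1, -1),
}

total_n = n_per_group * len(group_means)
constant = sum(n_per_group * m for m in group_means.values()) / total_n

# Slopes for the three coded groups: group mean minus the constant.
slopes = [group_means[g] - constant for g in ("JW", "JM", "UW")]

# The effect for the omitted group is minus the sum of the other effects.
um_effect = -sum(slopes)


def predict(group):
    """Predicted HAPPY for a group from the effect-coded equation."""
    return constant + sum(b * e for b, e in zip(slopes, codes[group]))


print("constant:", constant)
print("slopes (JW, JM, UW):", slopes)
print("US men effect:", um_effect)
for g in group_means:
    print(g, "predicted:", predict(g), "observed:", group_means[g])
```

Each predicted value reproduces the corresponding group mean, which confirms that the coding and the estimates agree.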