Exam 2 Solutions Stat 557 Fall 2000 Problem 1

Problem 1
As expected, most students used a single log-linear model to analyze these data. Other
approaches were also used. Some students fit two separate log-linear models, one where the table was
collapsed across the levels of use of other health care providers (O) and the other where
the table was collapsed across the levels of use of doctor visits (D). The data can also
be analyzed with logistic regression models, in which case the model searching methods in the SAS
LOGISTIC procedure could be used. One approach would fit two logistic regression models,
one using the log-odds of a doctor visit as the response and the other using the log-odds
of a visit to an alternative health care provider as the response. The joint response for visits
to doctors and other health care providers, however, has four categories, and three logistic
regression models would be needed to replicate the log-linear model analysis. A key part of
this approach is choosing an informative set of logits. We will only comment on the log-linear
model approach in this solution.
(a) Summary: There is essentially no difference in demand for doctor visits for single
adults covered by the government and private health insurance plans in Australia. The
data provide only tentative evidence of a small increase in the demand for services
from other health care providers by single adults covered by private health insurance.
These results hold after making adjustments for age, sex, and the presence or absence
of chronic illness and recent short term illness. About 55.7% of the respondents were
covered by government health insurance. Government health insurance covers a slightly
higher proportion of males (58.6%) than females (53.0%). Overall, females are 60
(b) Before fitting complicated models, a good way to start the analysis is an examination
of percentages and two-way tables. This provides the following information:
In this sample of single adults in Australia, 20.2% of the respondents visited a
doctor and 9.1% visited some other health care provider in the last two weeks.
20.37% of the 2892 single adults covered by government programs and
8.85% of the 2892 single adults covered by government programs and
Females (24.4Females (11.6
Demand for both doctor and other health care services tends to increase with age.
Respondents with recent illness are about 4 times more likely to visit a doctor and
3 times more likely to visit other health care providers than respondents without
recent illness.
Respondents with chronic conditions are about 2 times more likely to visit a doctor
and 3 times more likely to visit other health care providers than respondents
without chronic conditions.
About 55.7% of the respondents were covered by government health insurance.
A slightly higher proportion of males (58.6%) than females (53.0%) were covered
by government health insurance. There was essentially no difference in the proportion of respondents with chronic conditions (55.5%) and the proportion of
respondents without chronic conditions (56.0%) who were covered by government
health insurance. There was also very little difference in the proportion of respondents with recent illness (56.3%) and the proportion of respondents with no
recent illness (54.3%) who were covered by government health insurance. The
government provided health care coverage for 56% of the 20-29 year old respondents, about 40% of the 30-49 year old respondents, 52% of the 50-59 year old
respondents, 61% of the 60-69 year old respondents, and 66% of the respondents
over 70 years old.
These results suggest that a log-linear model will have to account for effects of age, sex,
recent illness and chronic illness on demand for services from doctors and other health
care providers, but there may be little difference between government and private
insurance coverage. They also suggest that a log-linear analysis will have to account
for age and sex differences in coverage rates for the government and private insurance
plans. These results give little insight into higher order associations. Of course,
conditional associations identified by a log-linear analysis could differ from the marginal
associations.
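The last point, that conditional and marginal associations can disagree, can be illustrated with a small sketch. The counts below are hypothetical and are not taken from the exam data; the function name `odds_ratio` is introduced here only for illustration.

```python
# Hypothetical counts, not taken from the exam data.
def odds_ratio(table):
    """Odds ratio of a 2x2 table [[a, b], [c, d]], computed as (a*d)/(b*c)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Rows: group 1, group 2; columns: visit yes, visit no; one table per stratum.
stratum_1 = [[81, 6], [234, 36]]
stratum_2 = [[192, 71], [55, 25]]

# Collapsing over the stratifying factor gives the marginal table.
marginal = [[stratum_1[r][c] + stratum_2[r][c] for c in range(2)]
            for r in range(2)]

conditional_ors = [odds_ratio(stratum_1), odds_ratio(stratum_2)]
marginal_or = odds_ratio(marginal)
print(conditional_ors, marginal_or)
```

In this made-up example both conditional odds ratios exceed 1 while the marginal odds ratio is below 1, which is why the two-way tables alone cannot settle the conditional associations.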
A few students made good use of mosaic plots to visually examine conditional associations with the demand for doctor visits or visits with other health care providers.
(c) Most people used the step( ) or stepAIC( ) functions in S-PLUS to choose a model,
starting with some simple model like complete independence. For these data, this
approach tended to make the model too complicated. There are a number of interaction
terms that are close to being significant at the .05 level that are put into the model.
Consequently, there is a rather wide range of models that seem to provide a good
description of these data. It was okay to select one of the more complicated models in
this range for your answer as long as you clearly distinguished between the substantial
effects and the weaker effects. I took a model identified by the stepAIC( ) function and
used backward elimination, dropping the least significant interaction at each step
without violating the hierarchical modeling criterion, to select a simpler model:
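\begin{align*}
\log(m_{ijklmrt}) = {} & \lambda + \lambda^{S}_{i} + \lambda^{A}_{j} + \lambda^{H}_{k} + \lambda^{I}_{l} + \lambda^{C}_{m} + \lambda^{D}_{r} + \lambda^{O}_{t} \\
& + \lambda^{SH}_{ik} + \lambda^{SC}_{im} + \lambda^{HC}_{km} + \lambda^{SA}_{ij} + \lambda^{SI}_{il} + \lambda^{AI}_{jl} \\
& + \lambda^{IC}_{lm} + \lambda^{AO}_{jt} + \lambda^{SO}_{it} + \lambda^{SD}_{ir} + \lambda^{AD}_{jr} + \lambda^{AC}_{jm} \\
& + \lambda^{AH}_{jk} + \lambda^{ID}_{lr} + \lambda^{IO}_{lt} + \lambda^{CD}_{mr} + \lambda^{CO}_{mt} + \lambda^{DO}_{rt} \\
& + \lambda^{SHC}_{ikm} + \lambda^{SAI}_{ijl} + \lambda^{SAD}_{ijr} + \lambda^{AIC}_{jlm}
\end{align*}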
Using the coding given in the statement of the problem,
S = sex
A = age group
H = type of health insurance
I = presence/absence of recent illness
C = presence/absence of chronic illness
D = doctor visits
O = visits to other health care providers
This model implies that, conditional on the levels of the other factors, demands for
doctor services are about the same for single adults covered by government and private
insurance plans. It also implies that, conditional on the levels of other factors, demands
for services from other health care providers are about the same for single adults covered
by government and private insurance plans. This coincides with results from the two-way tables of marginal counts.
There are three two-factor interaction terms with doctor visits in the model that are
not involved in higher order interactions. The strength and direction of these associations are consistent across the levels of the other factors. Estimates were obtained
from the GENMOD procedure in SAS, where the interaction terms are constrained to
be zero at the highest level of any factor involved in the interaction.
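The count of two-factor terms that stand outside every three-factor term can be checked mechanically. The following is a minimal sketch, assuming the term list of the selected model as written with the factor letters S, A, H, I, C, D, O; the helper name `free_two_factor_terms` is hypothetical.

```python
# Terms of the selected model, written with the factor letters used above.
TWO_FACTOR = ["SH", "SC", "HC", "SA", "SI", "AI", "IC", "AO", "SO", "SD",
              "AD", "AC", "AH", "ID", "IO", "CD", "CO", "DO"]
THREE_FACTOR = ["SHC", "SAI", "SAD", "AIC"]

def free_two_factor_terms(factor):
    """Two-factor terms containing `factor` that sit inside no three-factor term."""
    return [t for t in TWO_FACTOR
            if factor in t and not any(set(t) <= set(h) for h in THREE_FACTOR)]

doctor_terms = free_two_factor_terms("D")
other_terms = free_two_factor_terms("O")
print(doctor_terms)
print(other_terms)
```

For doctor visits (D) this leaves the terms with recent illness, chronic illness, and visits to other providers. The same check is also what backward elimination needs at each step, since only terms not nested in a higher order term may be dropped without violating the hierarchy.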
Recent illness: $\hat{\lambda}^{ID}_{11} = 1.38$ corresponds to an estimated odds ratio of 3.97, and
an approximate 95% confidence interval for the odds ratio is (3.19, 4.94). This
implies that single adults with a recent illness are from 3 to 5 times as likely to
visit a doctor as single adults without a recent illness.
Chronic illness: $\hat{\lambda}^{CD}_{11} = 0.396$ corresponds to an estimated odds ratio of 1.49,
and an approximate 95% confidence interval for the odds ratio is (1.20, 1.84).
This implies that single adults with a chronic illness are from 20 to 84 percent
more likely to visit a doctor than single adults without a chronic illness.
Visit to other health care providers: $\hat{\lambda}^{DO}_{11} = 0.241$ corresponds to an estimated
odds ratio of 1.27, and an approximate 95% confidence interval for the odds
ratio is (1.08, 1.50). This implies that single adults who visited another health
care provider in the last two weeks are from 8 to 50 percent more likely to visit
a doctor than single adults who did not visit some other health care provider.
Sex and Age are involved in a three-way interaction with demand for doctor visits.
This implies, for example, that changes in demand for doctor visits across age
groups are not the same for single males and single females in Australia. The
following table of PROC GENMOD estimates of the $\hat{\lambda}^{AD}_{jr}$ terms shows that
demand for doctor visits by single males in Australia tends to become stronger as
age increases.
visit          20-29    30-39    40-49    50-59    60-69    70+
yes (r = 1)   -1.158   -0.951   -0.747   -0.521   -0.726   0.00
no (r = 2)      0.00     0.00     0.00     0.00     0.00   0.00
Applying the exponential function to these estimates to obtain mle's of odds ratios, we see that males older than 70 are about 3 times as likely as 20-39 year
old males to visit doctors and about 2 times as likely as 40-69 year old males to
visit doctors. There is a weaker and slightly different trend across age groups for
single women. This is seen by computing the corresponding table of values for
$\hat{\lambda}^{AD}_{jr} + \hat{\lambda}^{SAD}_{1jr}$ shown below.
visit          20-29    30-39    40-49    50-59    60-69    70+
yes (r = 1)    -0.46    -0.42    -0.74    -0.17    -0.30   0.00
no (r = 2)      0.00     0.00     0.00     0.00     0.00   0.00
Applying the exponential function to these estimates to obtain mle's of odds
ratios, we see that females older than 70 are about 50% more likely to visit
doctors than 20-39 year old females, about 2 times as likely to visit doctors as
40-49 year old females, and about 20% to 35% more likely to visit doctors than
50-69 year old females.
Alternatively, you could examine how differences between male and female demands for doctor services differ across age groups. We will not show those results
here.
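The conversion from the tabled estimates to odds ratios can be sketched as follows. The estimates are copied from the two tables above; because the 70+ column is the zero reference, the odds ratio for the 70+ group versus a younger group is the exponential of minus that group's estimate. The function name is introduced here for illustration only.

```python
import math

AGE_GROUPS = ["20-29", "30-39", "40-49", "50-59", "60-69"]
# Estimates for "visit = yes" relative to the 70+ reference group.
MALE = [-1.158, -0.951, -0.747, -0.521, -0.726]
FEMALE = [-0.46, -0.42, -0.74, -0.17, -0.30]

def odds_ratio_vs_oldest(estimates):
    """Odds ratio for the 70+ group versus each younger group: exp(0 - estimate)."""
    return [round(math.exp(-e), 2) for e in estimates]

male_or = odds_ratio_vs_oldest(MALE)
female_or = odds_ratio_vs_oldest(FEMALE)
for age, m, f in zip(AGE_GROUPS, male_or, female_or):
    print(f"70+ vs {age}: males {m:.2f}, females {f:.2f}")
```

The male odds ratios run from roughly 3.2 down to about 1.7, and the female odds ratios stay between roughly 1.2 and 2.1, in line with the verbal summaries above.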
For the model we selected, the $\lambda^{HO}_{kt}$ term was not quite significant at the .05 level. There
is only weak evidence that demand for services from other health care providers by single adults was
higher (about 15
Other two-factor interactions involving demand for other health care providers did
not involve three-factor interactions. Hence, these two-factor associations are approximately consistent across the levels of the other factors.
Sex: $\hat{\lambda}^{SO}_{11} = 0.28$ corresponds to an estimated odds ratio of 1.32, and an approximate
95% confidence interval for the odds ratio is (1.07, 1.64). This implies
that single females are from 7 to 64 percent more likely to visit other health care
providers than single males.
Age: The following table of PROC GENMOD estimates of the $\hat{\lambda}^{AO}_{jt}$ terms shows that
demand for visits to other health care providers by single adults in Australia is
weakest in the 20-29 age group and strongest in the 70+ age group.
visit          20-29    30-39    40-49    50-59    60-69    70+
yes (t = 1)   -0.867   -0.614   -0.578   -0.656   -0.431   0.00
no (t = 2)      0.00     0.00     0.00     0.00     0.00   0.00
Recent illness: $\hat{\lambda}^{IO}_{11} = 0.729$ corresponds to an estimated odds ratio of 2.07, and
an approximate 95% confidence interval for the odds ratio is (1.55, 2.77). This
implies that single adults with a recent illness are approximately 2 times as likely
to visit a non-doctor health care provider as single adults without a recent
illness.
Chronic illness: $\hat{\lambda}^{CO}_{11} = 0.590$ corresponds to an estimated odds ratio of 1.80,
and an approximate 95% confidence interval for the odds ratio is (1.42, 2.29).
This implies that single adults with a chronic illness are from 42 to 129 percent
more likely to visit a non-doctor health care provider than single adults without
a chronic illness.
Visit with a doctor: $\hat{\lambda}^{DO}_{11} = 0.241$ corresponds to an estimated odds ratio of
1.27, and an approximate 95% confidence interval for the odds ratio is (1.08, 1.50).
This implies that single adults who visited a doctor in the last two weeks are from
8 to 50 percent more likely to visit some other health care provider than single
adults who did not visit a doctor.
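The odds ratios and intervals reported above follow the usual pattern of exponentiating an estimate and its interval endpoints. The sketch below assumes the reported intervals are Wald intervals (estimate plus or minus 1.96 standard errors on the log scale); the standard errors are not given in this solution, so one is backed out from the reported interval. Both function names are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal quantile for an approximate 95% interval

def odds_ratio_ci(estimate, se):
    """Odds ratio and Wald interval from a log odds ratio and its standard error."""
    return (math.exp(estimate),
            math.exp(estimate - Z_95 * se),
            math.exp(estimate + Z_95 * se))

def se_from_ci(lower, upper):
    """Standard error implied by a reported Wald interval for an odds ratio."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)

# Recent illness by visits to other providers: estimate 0.729,
# reported interval (1.55, 2.77). Assumption: this is a Wald interval.
se_io = se_from_ci(1.55, 2.77)
or_io, lo_io, hi_io = odds_ratio_ci(0.729, se_io)
print(round(or_io, 2), round(lo_io, 2), round(hi_io, 2))
```

Under that assumption the implied standard error is close to 0.15, and the recomputed odds ratio and interval agree with the reported values to two decimals.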
The log-linear analysis also provides insight into differences among single adults covered
by government and private health insurance in Australia with respect to sex, age,
recent illness, and chronic illness. These differences would have been of greater interest
if the conditional associations between insurance plans and demand for doctor visits
and visits from other health care providers had not agreed so well with the results
from the two-way marginal tables of counts. We will briefly report the results implied
by the log-linear model we selected for these data.
Age: The following table of PROC GENMOD estimates of the $\hat{\lambda}^{AH}_{jk}$ terms shows that
the enrollment rate in private health insurance is highest for the 30-39 age group and
that it decreases as age increases. This corresponds to what was seen in the corresponding two-way marginal table of counts.
insurance      20-29    30-39    40-49    50-59    60-69    70+
gov. (k = 1)  -0.583   -1.305   -1.084   -0.642   -0.253   0.00
pri. (k = 2)    0.00     0.00     0.00     0.00     0.00   0.00
Recent illness: The $\lambda^{HI}_{kl}$ interaction was deleted from the model because it
was not significant at the 0.15 level. This implies that incidence rates of illness in
the last two weeks were about the same for single adults covered by the government
and private insurance plans.
Sex and chronic illness are involved in a three-way interaction with health care coverage. This implies, for example, that the difference between incidence rates of
chronic disease for single adults enrolled in the government and private health
insurance plans is not the same for males and females. The PROC GENMOD
estimate $\hat{\lambda}^{HC}_{11} = -0.256$ corresponds to an estimated odds ratio of 0.774,
and an approximate 95% confidence interval for the odds ratio is (0.66, 0.91).
The incidence of chronic disease is lower for single males covered by government insurance than for single males covered by private health insurance. Since
$\hat{\lambda}^{HC}_{11} + \hat{\lambda}^{SHC}_{111} = -0.256 + 0.329 = 0.073$ corresponds to an estimated odds ratio of
1.07, the incidence of chronic disease is slightly higher among single females
covered by government insurance than among single females covered by private
health insurance.
(d) Since over 50% of the estimates of the mean counts are smaller than 5 and many are
smaller than 0.5, the chi-square approximation to the distribution of the Pearson $X^2$ statistic or the $G^2$
statistic may not provide a reliable p-value for testing the fit of the selected model
against the general alternative of 384 independent Poisson counts with arbitrary positive means. Nevertheless, for this model $X^2 = 279.56$ and $G^2 = 297.32$ are both
smaller than 314, the degrees of freedom for the chi-square approximation. Hence,
the model appears to provide a good summary of the variation in the observed counts.
There is no indication of extra-Poisson variation and it is not necessary to entertain
negative binomial distributions for the counts or some other distribution to allow variances of the counts to be larger than the means. An examination of the various types
of residuals produced by the GENMOD procedure in SAS revealed no extreme values.
A plot of the observed counts versus the estimated means shows little variation from
a 45 degree line.
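The 314 degrees of freedom can be recovered by counting parameters. This is a sketch under two assumptions: age has the six groups shown in the tables and every other factor has two levels, and each term contributes the product of (levels - 1) over its factors, as it does under the GENMOD reference-level constraints.

```python
from math import prod

LEVELS = {"S": 2, "A": 6, "H": 2, "I": 2, "C": 2, "D": 2, "O": 2}
TERMS = (list("SAHICDO")
         + ["SH", "SC", "HC", "SA", "SI", "AI", "IC", "AO", "SO", "SD",
            "AD", "AC", "AH", "ID", "IO", "CD", "CO", "DO"]
         + ["SHC", "SAI", "SAD", "AIC"])

def n_params(term):
    """Free parameters for one term: product of (levels - 1) over its factors."""
    return prod(LEVELS[f] - 1 for f in term)

cells = prod(LEVELS.values())                  # number of Poisson counts
params = 1 + sum(n_params(t) for t in TERMS)   # intercept plus all model terms
df = cells - params
print(cells, params, df)

# Ratios of the fit statistics to their degrees of freedom.
for name, stat in [("X2", 279.56), ("G2", 297.32)]:
    print(name, round(stat / df, 3))
```

The count gives 384 cells and 70 parameters, hence 314 degrees of freedom, and both ratios fall below 1, consistent with the absence of extra-Poisson variation noted above.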
Problem 2
(a) Summary: For both species, the proportion of eggs that produce male turtles decreases as temperature increases. The change occurs over a smaller temperature interval for species 1 than for species 2. For species 1 eggs collected in Iowa, the proportion
of male hatchlings decreases from 95 percent to 5 percent as the incubation temperature increases from about 27.69 °C to 29.04 °C, but for species 2 eggs collected in
Iowa, the proportion of male hatchlings decreases from 95 percent to 5 percent as the
incubation temperature increases from about 26.95 °C to 30.25 °C. The results are
similar for eggs collected in Louisiana, but the temperature intervals are shifted by
about 1.5 °C for both species.
(b) A search for a good model might begin by plotting the observed proportion of male
hatchlings against temperature for each species within each location. One could also
examine plots of empirical logits, omitting cases where the observed percentage of male
hatchlings was either 0 or 100 percent. The next step is to fit four separate logistic
regression models, one for each species within each location. For species 2 eggs, these
preliminary analyses revealed that logistic regression models with just a linear temperature trend were sufficient to model the decrease in the proportion of male hatchlings
as incubation temperature increases. Furthermore, the curves for species 2 were nearly
parallel, suggesting that the temperature coefficient might be the same at both locations. A logistic regression model with different intercepts for the two locations and
the same coefficient for the linear temperature component was found to provide an adequate model for the species 2 eggs. The preliminary analyses revealed a much sharper
temperature effect for the species 1 eggs. At each location, both linear and quadratic
temperature trends were needed in the logistic regression models. Further comparison
of those logistic regression models showed that the same intercept could be used for both
locations and the coefficient for the quadratic temperature effect could also be the same
for both locations, but the linear temperature effects required different coefficients for
the two locations.
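The empirical logits mentioned at the start of this part can be computed as in the sketch below. The records are hypothetical (the exam data are not reproduced in this solution), and the function name is introduced here for illustration.

```python
import math

def empirical_logits(records):
    """Return (temperature, log(males/females)) pairs, skipping 0% and 100% male cases."""
    out = []
    for temp, males, total in records:
        if 0 < males < total:
            out.append((temp, math.log(males / (total - males))))
    return out

# Hypothetical (temperature, males, eggs incubated) records
# for one species at one location.
records = [(27.0, 20, 20), (28.0, 15, 20), (29.0, 10, 20),
           (30.0, 2, 20), (31.0, 0, 20)]
logits = empirical_logits(records)
for temp, value in logits:
    print(temp, round(value, 3))
```

Plotting these values against temperature gives a rough check on whether a straight line (as for species 2) or a curve (as for species 1) is needed on the logit scale.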
(c) The final model is shown below with a standard error shown in parentheses beneath
each estimated coefficient. Here $\pi_{ij}$ denotes the conditional probability that a male
turtle emerges from an egg collected from the i-th species in the j-th location when it
is incubated at the specified temperature.
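Species 1 in Iowa:
\[
\log\left(\frac{\hat{\pi}_{11}}{1-\hat{\pi}_{11}}\right) = \underset{(1.5854)}{2.0157} + \underset{(1.3996)}{3.3088}\,(\text{temperature} - 26) - \underset{(0.3160)}{1.6286}\,(\text{temperature} - 26)^2
\]
Species 1 in Louisiana:
\[
\log\left(\frac{\hat{\pi}_{12}}{1-\hat{\pi}_{12}}\right) = \underset{(1.5854)}{2.0157} + \underset{(1.6201)}{6.0170}\,(\text{temperature} - 26) - \underset{(0.3160)}{1.6286}\,(\text{temperature} - 26)^2
\]
Species 2 in Iowa:
\[
\log\left(\frac{\hat{\pi}_{21}}{1-\hat{\pi}_{21}}\right) = \underset{(0.2876)}{4.6402} - \underset{(0.0997)}{1.7840}\,(\text{temperature} - 26)
\]
Species 2 in Louisiana:
\[
\log\left(\frac{\hat{\pi}_{22}}{1-\hat{\pi}_{22}}\right) = \underset{(0.4258)}{7.2936} - \underset{(0.0997)}{1.7840}\,(\text{temperature} - 26)
\]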
These curves are displayed in the following figure.
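As a numerical companion to the curves, the sketch below evaluates the four fitted models and finds the temperatures at which the estimated proportion of males equals 95 and 5 percent. The coefficients are copied from the fitted equations; the dictionary layout and function names are assumptions made for this illustration.

```python
import math

# (intercept, linear, quadratic) coefficients in (temperature - 26).
CURVES = {
    ("species 1", "Iowa"):      (2.0157, 3.3088, -1.6286),
    ("species 1", "Louisiana"): (2.0157, 6.0170, -1.6286),
    ("species 2", "Iowa"):      (4.6402, -1.7840, 0.0),
    ("species 2", "Louisiana"): (7.2936, -1.7840, 0.0),
}

def prob_male(curve, temperature):
    """Estimated probability of a male hatchling at the given temperature."""
    b0, b1, b2 = CURVES[curve]
    x = temperature - 26.0
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x + b2 * x * x)))

def temperature_at(curve, p):
    """Temperature on the decreasing part of the curve where the male proportion is p."""
    b0, b1, b2 = CURVES[curve]
    target = math.log(p / (1.0 - p))
    if b2 == 0.0:
        return 26.0 + (target - b0) / b1
    # Solve b2*x^2 + b1*x + (b0 - target) = 0. With b2 < 0 the larger
    # root lies to the right of the vertex, on the descending branch.
    disc = math.sqrt(b1 * b1 - 4.0 * b2 * (b0 - target))
    roots = ((-b1 + disc) / (2.0 * b2), (-b1 - disc) / (2.0 * b2))
    return 26.0 + max(roots)

for curve in CURVES:
    print(curve, round(temperature_at(curve, 0.95), 2),
          round(temperature_at(curve, 0.05), 2))
```

For the Iowa eggs this reproduces, to within rounding, the temperature intervals quoted in the summary in part (a).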
For species 2 eggs, from either Iowa or Louisiana, a one degree C increase in incubation
temperature corresponds to about an 83 percent decrease in the odds for males (a
factor of exp(-1.784) = 0.168). The intercepts for the Iowa and Louisiana curves, 4.64
and 7.29, respectively, correspond to hatch rates of more than 99 percent males at 26
°C. For species 1 eggs, an intercept of 2.0157 corresponds to hatch rates of about 88
percent males at 26 °C, but this is not well estimated. For species 1 eggs from Iowa,
an increase in the incubation temperature from T to T + 1 corresponds to a change of
about 1.680 - 3.257(T - 26) in the log-odds that a male hatches from an egg. For
species 1 eggs from Louisiana, an increase in the incubation temperature from T to
T + 1 corresponds to a change of about 4.388 - 3.257(T - 26) in the log-odds that
a male hatches from an egg.
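The change in log-odds for species 1 can be checked by differencing the fitted quadratic. The following short derivation writes $x = T - 26$ and uses $\beta_0$, $\beta_1$, $\beta_2$ as shorthand (labels introduced here, not in the original) for the intercept, linear, and quadratic coefficients of the fitted species 1 models.

```latex
\begin{align*}
\Delta(T) &= \bigl[\beta_0 + \beta_1 (x+1) + \beta_2 (x+1)^2\bigr]
           - \bigl[\beta_0 + \beta_1 x + \beta_2 x^2\bigr], \qquad x = T - 26 \\
          &= \beta_1 + \beta_2 (2x + 1) \\
\text{Iowa:}\quad \Delta(T) &= 3.3088 - 1.6286\,(2x + 1) = 1.6802 - 3.2572\,(T - 26) \\
\text{Louisiana:}\quad \Delta(T) &= 6.0170 - 1.6286\,(2x + 1) = 4.3884 - 3.2572\,(T - 26)
\end{align*}
```

For example, at T = 28 the Iowa expression gives 1.6802 - 6.5144 = -4.834, which equals the difference between the fitted log-odds at 29 °C (-2.715) and at 28 °C (2.119).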
(d) Plots of the observed proportions overlaid with the estimated curves described above
do not reveal any obvious deficiencies in the proposed model. Examination of residuals
does not reveal any obvious deficiencies in the proposed model either. In this case, it would
be a good idea to make a separate plot of the residuals versus incubation temperature
for each species in each location and use a data smoother to put a smooth curve on
each residual plot. (Nobody did this.) The deviance value is $G^2 = 21.72$ for testing the
null hypothesis that the four curves provided by our model are appropriate against the
general alternative that there are 40 independent binomial random variables for the
observed number of eggs that produce male turtles (for the ten incubation temperatures
used with eggs collected from each of the two species at each of the two locations).
This test has 33 degrees of freedom. The presence of some small estimates of expected
counts for either males or females may prevent a large sample chi-square approximation
from providing a reliable p-value. Nevertheless, the $G^2$ value is smaller than the 33
degrees of freedom, which suggests that the observed counts are consistent with the
proposed model and there is no need to consider models to account for extra-binomial
variation.
Scores: Here is a stem-leaf display of the scores for this exam. Each problem was scored on
a 50 point scale.
9 | 1
8 | 888
8 | 111122
7 | 6667888999
7 | 222234
6 | 56688
6 | 02