Preface

The Department of Defense has been conducting a series of studies of health
effects among veterans who served in the Persian Gulf War. As part of that effort,
RAND has been working with the Office of the Special Assistant to the Deputy
Secretary of Defense for Gulf War Illnesses to compile a series of literature
reviews and policy papers. Three government-sponsored studies were
published in the New England Journal of Medicine in 1996 and 1997. These studies
were critiqued by R.W. Haley, and this critique was published in The American
Journal of Epidemiology, along with responses by the authors of the three articles in
question and a reply by Haley to those responses. The Special Assistant asked
RAND to review R.W. Haley’s critique along with the responses to that critique
by the authors of the three studies and Haley’s reply. This document reports the
results of that review. This research was begun in 1998, and a completed draft
was provided to the sponsor in May 1999.
This work is sponsored by the Office of the Special Assistant and was carried out
jointly by RAND Health’s Center for Military Health Policy Research and the
Forces and Resources Policy Center of the National Defense Research Institute.
The latter is a federally funded research and development center sponsored by
the Office of the Secretary of Defense, the Joint Staff, the unified commands, and
the defense agencies.
Contents
Preface
Summary
AN ASSESSMENT OF TECHNICAL ISSUES RAISED IN R.W. HALEY’S CRITIQUE OF THREE STUDIES OF HEALTH EFFECTS OF THE GULF WAR
Calculation Errors
The “Healthy Warrior” Effect
Superpopulation Models
Analysis of Mortality Rates
Deaths Attributable to External Causes
Alternative Explanations
The Unequal Follow-Up Issue
Appendix
DERIVATION OF CONFIDENCE INTERVALS FOR RISK RATIOS
References
Summary
In a 1998 article in the American Journal of Epidemiology, R.W. Haley challenged
the validity of three government-sponsored studies that found that military
personnel deployed to the Persian Gulf region in connection with the 1991 Gulf
War experienced no excess risk of adverse health effects. The three studies,
which were published in the New England Journal of Medicine in 1996 and 1997,
used multivariate statistical procedures to contrast postwar rates of death,
hospitalization, and birth defects among Gulf War veterans with those for other
military personnel who were deployed elsewhere. Haley claimed that the
authors’ statistical methods were flawed and their findings were distorted by
various biases. The authors of the three studies published rebuttals, and Haley
replied to those rebuttals, all in the same issue of the American Journal of
Epidemiology.
This study undertook a thorough review of the three original studies to examine
the technical issues that Haley raised, focusing on his criticisms of the statistical
work and the authors’ rebuttals. In essence, Haley argues that the studies’
authors—in calculating relative health risk ratios for Gulf War veterans and for
other veterans—did not account for the fact that the database they used to assess
Gulf War illness resulted from a complete sample of Gulf War veterans and
approximately a 50% random sample of the nondeployed veterans.
Haley treats the sampling variability in the mortality and hospitalization rates
based on these two huge samples as the only source of randomness affecting the
relative risk ratio estimates. He maintains that the correct formulas for
calculating confidence intervals and for gauging statistical inferences are those
tailored to this narrowly specified sampling situation. This review examined the
validity of this argument and delineated counterarguments for basing statistical
analyses on more general formulations, including the superpopulation models
that the studies in question used.
This review concludes that, in the context of assessing adverse health effects
based on observational data, Haley’s formulation exaggerates the precision of
statistical measures, ignores numerous sources of random error affecting these
measures, and constitutes an unsatisfactory basis for statistical analyses.
Moreover, even if one accepts his calculations, the paper fails to make the case
that revised analyses would invalidate the other studies’ overall findings of no
adverse health effects linked to Gulf War deployment. While Haley’s work
alleges that the studies also are distorted by biases that the authors have not
properly accounted for, a hard look at this argument reveals little or no basis for
this criticism. In sum, this review supports the authors’ rebuttals of Haley’s
criticisms and concludes that they stem mainly from erroneous suppositions and
misunderstandings.
AN ASSESSMENT OF TECHNICAL ISSUES RAISED
IN R.W. HALEY’S CRITIQUE OF THREE STUDIES OF
HEALTH EFFECTS OF THE GULF WAR
Dr. R.W. Haley’s critique [1], “Bias from the ‘Healthy-Warrior Effect’ and
Unequal Follow-Up in Three Government Studies of Health Effects of the Gulf
War,” was published in the August 15, 1998, issue of the American Journal of
Epidemiology, along with the responses to Haley’s paper [2, 3, 4] by the authors of
the studies in question [6, 7, 8] and a reply to those responses by Haley [5]. This
review attempts to clarify some of the technical issues raised in these
commentaries and provides an independent appraisal of the arguments from a
statistician’s perspective.
The three studies, published in the New England Journal of Medicine in 1996 and
1997, examined possible health consequences of Gulf War deployment on
military personnel by contrasting postwar rates of death, hospitalization, and
birth defects among Gulf War veterans with those for “nondeployed” veterans,
i.e., those who served during the same period but did not deploy to the Gulf
War. Haley levels several criticisms against the studies:
A joint review of the three papers . . . indicates that the three
studies were strongly biased toward finding no excess risk in the
deployed veterans. The biases resulted from errors in the
calculation of confidence intervals for tests of statistical
significance, a failure to appreciate a more pertinent application of
the “healthy-soldier effect,” and the unequal effects of excluding
hospitalizations in nonmilitary hospitals (p. 315).
As this quotation illustrates, Haley makes heavy use of the word “bias” in a
colloquial sense. By saying that the studies were “strongly biased toward
finding no excess risk,” Haley implies that the authors of all three studies skewed
their findings to downplay adverse health effects among Gulf War veterans. By
asserting biases in analyses, as in the boldface section headings “Bias in analyses
of hospitalization” and “Bias in analyses of mortality,” he impugns not only the
statistical work but also the analysts themselves, alleging that they used faulty
methods, incorrect formulas, and erroneous interpretations. Fortunately, Haley
backs up his charges with very clear expositions of the statistical issues, and the
authors of the three studies have responded in kind, so the allegations can be
examined one by one to determine their validity.
CALCULATION ERRORS
Haley’s main criticism of the statistical work in the mortality and hospitalization
studies [6, 7] was that the authors’ analyses and interpretations were flawed
because they omitted “finite population correction factors” in calculating
confidence intervals for certain “relative risk ratios.” The meanings of these
statistical concepts and their relevance to Haley’s arguments will be spelled out
later, but the nub of the issue is whether Haley’s formula on p. 316 for evaluating
confidence intervals for risk ratios is valid and applicable in this situation. If so,
the authors have understated the implied precision of the relative risk ratios in
such a way that some statistically significant differences between the
hospitalization and mortality rates of Gulf War veterans and those of
nondeployed veterans would be interpreted as insignificant. Because Haley cites
these technical “errors” to buttress his claims that the authors erroneously
discounted the “excess risks” of Gulf War deployment, the validity and
applicability of Haley’s formula are key issues in understanding the dialogue
between Haley and the authors.
The statistical framework that Haley has in mind is a narrowly defined sampling
situation in which N1 persons in a population of N persons have undergone
some “treatment” (e.g., Gulf War deployment) and the remaining N0 = N – N1
have not. Let θ1 denote the mortality or hospitalization rate over some period
among the N1 treated members, and let θ0 denote the analogous rate for the
untreated group. Then the relative risk ratio ρ = θ1/θ0 is the ratio of the two rates.
Next, suppose that, to estimate the population parameters θ1, θ0, and ρ, one can
observe random samples of sizes n1 and n0 taken without replacement from the
two groups. Then the sample means (proportions) P1 and P0 can be used to
estimate the population means, and the sample relative risk ratio R = P1/P0 can
be used to estimate the population risk ratio ρ.
As is shown in the appendix, it follows from these assumptions that, for large
values of n1 and n0, an approximate 95% confidence interval for ρ is given by
\[ R \cdot \exp\left(\pm 1.96 \sqrt{T_1 + T_0}\right) \]
where
\[ T_i = \frac{1 - P_i}{n_i P_i} \cdot \frac{N_i - n_i}{N_i - 1}. \]
This formula for the confidence interval endpoints agrees with Haley’s formula
on p. 316 for the case α = .05, except that the denominator Ni – 1 in the formula
for Ti has been replaced by Ni. The second factor in the formula for Ti is called
the finite population correction (fpc) factor. It is noteworthy that the fpc factor
shrinks to zero (and the confidence interval endpoints merge) as the sample sizes
ni approach the population sizes Ni.
In the present case, where n1 = N1 and n0 ≈ N0/2 (i.e., all Gulf War veterans were
included in the study, and about half of the nondeployed veterans were
randomly selected for inclusion), the term T1 vanishes and T0 is only about half
as large as it would be if the fpc factors were omitted, so that the inclusion of the
fpc factors leads to markedly shorter confidence intervals. An implication of this
observation for statistical inferences follows from the fact that, given a 95%
confidence interval for a relative risk ratio, one can test the hypothesis of no
differences in the population proportions θi (or equivalently that ρ = 1) at the 5%
significance level by observing whether or not the confidence interval covers the
value 1.0. Thus, the inclusion of the fpc factors in calculating several confidence
intervals, say, for rates of mortality attributable to various causes of death, could
lead to identifying more statistically significant differences than if the fpc factors
were omitted.
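
The size of this effect can be made explicit with a short calculation. The sketch below writes T_i^* for the term T_i with the fpc factor omitted (notation introduced here for illustration only) and treats the 50% sampling fraction as exact, which is an approximation.

```latex
\begin{align*}
n_1 = N_1 &\;\Rightarrow\; \frac{N_1 - n_1}{N_1 - 1} = 0 \;\Rightarrow\; T_1 = 0, \\
n_0 = N_0/2 &\;\Rightarrow\; \frac{N_0 - n_0}{N_0 - 1} = \frac{N_0/2}{N_0 - 1} \approx \frac{1}{2}
  \;\Rightarrow\; T_0 \approx \tfrac{1}{2}\, T_0^{*}, \\
\frac{\text{log-scale half-width with fpc}}{\text{log-scale half-width without fpc}}
  &\approx \frac{1.96\sqrt{T_0^{*}/2}}{1.96\sqrt{T_1^{*} + T_0^{*}}}
  = \sqrt{\frac{T_0^{*}}{2\,(T_1^{*} + T_0^{*})}} .
\end{align*}
```

When the two groups have similar event counts, so that T_1^* ≈ T_0^*, this ratio is about 1/2: the interval for log ρ is roughly half as wide with the fpc factors as without them.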
To sum up, Haley’s formula for a 95% confidence interval for θ1/θ0 is valid in a
narrowly defined sampling situation. However, this confidence interval is for a
ratio of population means, θ1/θ0, not for “adjusted” means. Unless the treated
and untreated groups are balanced on the key risk factors related to the
outcomes of interest (to assure that θ1 and θ0 would be about the same if it were
not for the effects directly attributable to Gulf War deployment), there is no basis
for inferring that sample risk ratios P1/P0 considerably greater than (or less than)
1.0 reflect the magnitudes of the treatment effects.
THE “HEALTHY WARRIOR” EFFECT
In the three studies that Haley criticizes, the crude sample risk ratios P1/P0 were
presented in conjunction with detailed multivariate analyses to control for
differences between the Gulf War veterans and their nondeployed counterparts
in risk factors associated with the health-related outcomes under investigation.
In this case, there were marked differences between the two groups in terms of
personal attributes (age, race, gender, type of unit, occupational specialty)—and
presumably in unmeasured health status measures as well—that had to be
accounted for in gauging the effects of Gulf War deployment. Haley clearly
understands the need to adjust the crude relative risk ratios for these differences,
arguing that the treated and untreated groups are definitely not comparable
here because of what he calls the “healthy-warrior effect.”
Statisticians have a wide variety of models and methodologies (e.g., loglinear
models, logistic regression, proportional hazards models, analysis of covariance,
indirect or direct standardization) for incorporating “covariates” and risk factors
into analyses of this type. These methodologies provide “adjusted” rates for
both groups and adjusted risk ratios analogous to P1/P0 that control for
differences in risk factors between the groups. If the covariates can be shown to
be irrelevant to the assessment of the treatment effects, then the adjusted risks
would coincide with the unadjusted risks, and the adjusted relative risk ratios
would be the same as the crude ratios P1/P0. In reporting the results of such an
analysis, most applied statisticians would report confidence intervals and
standard errors for the crude ratios that are consistent with their modeling
assumptions and analytic framework. And, since most statisticians use
superpopulation models in which the individual observations are treated as being
independent (or at least uncorrelated), the correct formula for the confidence
interval endpoints, under those assumptions, would omit the finite population
correction factors. This is the essence of the authors’ responses to Haley on this
matter.
Superpopulation Models
Thus, the applicability of Haley’s formula depends in part on the tenability of
superpopulation models and “model-based” analyses. Haley argues that
superpopulation models are inappropriate here, because “Gulf War veterans
constitute a unique, finite population, one that has never existed before, for
which the defining circumstances are unlikely to recur, and for which we can
identify all members.” Gray et al. argue otherwise, while acknowledging that the
choice between finite population and superpopulation models is a matter of debate
among some statisticians:
Briefly, this philosophical debate as applied to this study concerns
whether one wishes to consider the hospitalization experience of
Gulf War veterans and nondeployed veterans to be one
deterministic experience (finite population model) for which we
have complete data or one realization of a stochastic experience
(superpopulation model). . . Under the finite population model,
there is essentially no random variability, except that the
nondeployed veteran population was sampled at a 50 percent rate,
which results in (essentially) null confidence intervals. Under the
superpopulation model, there is stochastic variability, and the
confidence intervals reported in the hospitalization paper apply
(p. 328).
Actually, there is little debate among applied statisticians on this issue. They
routinely adopt superpopulation models, which are commonly referred to as
“survival models” or “hazard rate models” in this context, as a basis for
analyzing categorical data that arise from counting processes of the types
considered here, namely, counts of deaths, accidents, illnesses, hospitalizations,
birth defects, etc., during some time interval (see references 5 through 8). In
applications of these types, the models commonly adopted reflect variability in
the severity, timing, classification, and resolution of the health-related episodes
underlying the cell counts, and the counts themselves are treated as realizations
of stochastic processes (e.g., time-dependent Poisson processes). Does the
adoption of these models affect the calculation of standard errors and confidence
intervals for relative risk ratios? The answer is that, if the population means θi
are small (as they are here), the net effect of assuming that the hospitalization or
mortality category counts have Poisson distributions (in lieu of hypergeometric
distributions) is to omit the fpc factors in the formula for Ti.
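
A brief sketch of why this is so, using the delta method. Here D_i = n_i P_i denotes the observed event count in group i; this symbol and the assumption that the two counts are independent are introduced for illustration and are not taken from the studies.

```latex
\begin{align*}
D_i \sim \mathrm{Poisson}(n_i \theta_i)
  &\;\Rightarrow\; \operatorname{Var}(\log D_i) \approx \frac{\operatorname{Var}(D_i)}{(n_i \theta_i)^2}
  = \frac{1}{n_i \theta_i}, \\
\operatorname{Var}(\log R) = \operatorname{Var}(\log D_1) + \operatorname{Var}(\log D_0)
  &\approx \frac{1}{n_1 \theta_1} + \frac{1}{n_0 \theta_0}
  \;\approx\; \frac{1}{n_1 P_1} + \frac{1}{n_0 P_0} .
\end{align*}
```

Each term matches T_i with the fpc factor dropped and with 1 − P_i replaced by 1, a negligible change when P_i is small.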
To support his contention that fpc factors are required, Haley cites W.G.
Cochran’s monograph on sampling techniques [13], which is restricted primarily
to simple sampling situations. Nevertheless, Cochran makes it clear that he
views superpopulation models as viable alternative frameworks for analyzing
complex survey data; in Sections 6.7 and 7.8, he shows the concurrence of
“design-based” (finite population) estimators with best linear unbiased
estimators in simple linear regression models, and he notes the simplicity and
exactness of model-based variance calculations. For more general discussions
pointing to the tradeoffs between model-based and design-based methodologies
for complex sampling designs, see references 14 and 15.
Analysis of Mortality Rates
Haley presents his calculations of confidence intervals for relative mortality rate
ratios in Table 2, a key table in the dialogue because it constitutes the basis for
Haley’s claim that the postwar mortality rates are distorted by “selection biases”
due to the healthy-warrior effect. To separate issues here, I have checked that the
95% confidence intervals for the mortality rate ratios listed in the table are
consistent with the reported numbers of deaths and sample sizes, so the
correctness of Haley’s confidence intervals is not at issue here. However, Haley
barely mentions these confidence intervals in his discussions of the rates.
By listing them under pairs of crude and adjusted rates, he may have
intended to invite the reader to conjecture that the adjusted rates would have
about the same relative precision as the crude rates, but there is no basis for that
conjecture.
To make his case, he argues that the pattern of the crude, cause-specific mortality
rate ratios in Table 2, in conjunction with the adjusted mortality rate ratios taken
from the Kang and Bullman study [6], indicates excess postwar mortality among
Gulf War veterans. He reasons that the very low crude mortality rate ratios for
infectious diseases (0.22), cancers (0.59), diseases of the digestive system (0.61),
and diseases of the circulatory system (0.87) show that the “magnitudes of the
selection biases were large.” He goes on to assert on p. 319 that, given the
pattern of the crude and adjusted cause-specific mortality rates, other ratios that
are near 1.0 or larger would be substantially higher if it were not for the
healthy-warrior effects that are not fully accounted for in the Kang-Bullman analysis:
The rate ratios were close to 1.0 for death from diseases of the
respiratory system, suicide, and homicide and substantially greater
than 1.0 for death from motor vehicle accidents (Table 2). Since the
“healthy-warrior effect” must have included an excess of personnel
with chronic respiratory illness and major depression in the
nondeployed group, postwar mortality rate ratios near 1.0 suggest
that the deployed group suffered excess postwar death from
respiratory illness. . . . In contrast, since death from homicide and
motor vehicle accidents is not known to have antecedents that
would prevent a soldier from being deployed to the war zone (no
“healthy-warrior effect”), their postwar rate ratios are probably
unbiased estimators of the true excess mortality risk due to deployment.
(Italics added.)
This argument has several holes. First, Haley exaggerates the potential selection
biases from the healthy-warrior effect. Challenging Haley on this score, Kang
and Bullman contend that “the effects of the potential selection bias on the
mortality outcomes are minimal and negligible” (p. 325), and they present a table
showing a remarkable concordance between the cause-specific mortality rates of
activated reservists and those for unactivated reservists during the 1991–1993
postwar period, thereby refuting Haley’s contention. Gray et al. also challenge
the basis for Haley’s supposition and present additional information, notably
Figure 1, to show that the selection effect on hospitalizations was “transient and
largely resolved by the conclusion of the conflict” (p. 328).
Second, Haley glosses over the fact that he is dealing with very small numbers of
deaths relative to the huge sample sizes. Because the cause-specific death rates
are tiny, the ratios are suspect not only because of their tiny denominators but
also because of possible classification errors in the cause-specific death counts. In
particular, Haley’s comment in the paragraph cited above regarding deaths from
diseases of the respiratory system was based on just 14 deaths among the 695,516
deployed veterans and the same number among the 746,291 other veterans in the
sample. Thus, the crude mortality rates for the two groups are 0.0000201 and
0.0000188, and the relative risk ratio is 1.07, for which Haley reports a 95%
confidence interval of (0.74, 1.56), as compared with my calculation of (0.51, 2.25)
when the fpc factors are omitted. Kang and Bullman in [6, p. 1499] reported the
analogous 95% confidence interval for the adjusted mortality rate ratio, 1.27, to
be (0.60, 2.70). No matter which interval estimates are used, it is hard to see how
Haley could infer that these numbers “suggest that the deployed group suffered
excess postwar death from respiratory illness.”
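
The two crude intervals quoted above can be checked with a few lines of Python. This is a minimal sketch of the formula R · exp(±1.96 · sqrt(T1 + T0)) given earlier. The function name `risk_ratio_ci` and its parameters are hypothetical, and the nondeployed population size N0 is assumed to be exactly twice the sample size, since the text gives the sampling rate only as about 50%.

```python
import math


def risk_ratio_ci(d1, n1, d0, n0, N1=None, N0=None, z=1.96):
    """Return (R, lower, upper) for the crude risk ratio R = P1 / P0.

    d1, d0: event counts; n1, n0: sample sizes.
    N1, N0: population sizes; pass None to omit the fpc factor for that group.
    """
    p1, p0 = d1 / n1, d0 / n0
    ratio = p1 / p0

    def t(p, n, N):
        # T_i = (1 - P_i) / (n_i P_i), times the fpc factor when N is given.
        fpc = 1.0 if N is None else (N - n) / (N - 1)
        return (1 - p) / (n * p) * fpc

    half_width = z * math.sqrt(t(p1, n1, N1) + t(p0, n0, N0))
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


# Respiratory-disease deaths quoted in the text: 14 in each group.
n1, n0 = 695_516, 746_291

# Superpopulation-style interval: no fpc factors.
no_fpc = risk_ratio_ci(14, n1, 14, n0)

# Finite-population interval. ASSUMPTION: the nondeployed sample is exactly
# half of its population (N0 = 2 * n0); the text says only "about half".
with_fpc = risk_ratio_ci(14, n1, 14, n0, N1=n1, N0=2 * n0)

for label, (r, lo, hi) in (("without fpc", no_fpc), ("with fpc", with_fpc)):
    covers_one = lo <= 1.0 <= hi  # 5% test of rho = 1 via interval coverage
    print(f"{label}: R = {r:.2f}, 95% CI = ({lo:.2f}, {hi:.2f}), covers 1.0: {covers_one}")
```

Under that assumption the interval without fpc factors rounds to (0.51, 2.25), and the interval with them comes out close to the (0.74, 1.56) that Haley reports. Both cover 1.0.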
While the death counts for other disease-related causes are somewhat higher, the
cell counts are still very small, and even the confidence intervals that Haley
provides are wide enough to raise doubts as to whether Table 2 provides
evidence supporting Haley’s argument. Moreover, the cell counts themselves
may not be reliable, given that the causes of death were determined from death
certificates that may have misreported the principal cause of death or may have
reported multiple causes. Kang and Bullman concede possible classification
errors in their study, asserting that “death certificates dependably establish the
fact of a person’s death, but their accuracy in recording the cause is variable”
[6, p. 1503]. Given that the causes of death were then computerized using ICD-9
codes (from the International Classification of Diseases, 9th Revision) and given the
possibilities of coding and recording errors, not only in the ICD-9 codes, but also
in the Social Security numbers of the deceased veterans and the unit designation
codes used to classify the veterans into the deployed and nondeployed groups,
there is room to question the reliability of the cell counts.
If there is fuzziness in the categorization of the causes of the disease-related
deaths, and if there is reason to expect that the effects of Gulf War deployment
on mortality rates might be relatively uniform over categories, then those effects
would manifest themselves in the mortality rates and ratios derived from the
pooled counts over all disease-related categories. There were 337 deaths from all
disease-related causes among the 695,516 Gulf War veterans and 534 among the
746,291 nondeployed veterans, so that the crude mortality rate ratio was 0.68.
While that would seem to support Haley’s case for sizable healthy-warrior
effects, Kang and Bullman report that the analogous adjusted mortality rate ratio
for the pooled counts was 0.88, and the associated 95% interval estimate was
(0.77, 1.02), which provides weak evidence to support Haley’s supposition of
healthy-warrior effects and no evidence of excess deaths attributable to Gulf War
deployment.
Deaths Attributable to External Causes
The death counts attributed to external causes are larger, but there were only
1,765 deaths from all causes among Gulf War veterans and only 1,729 in the
comparison group, so that the overall mortality rates were 0.00254 and 0.00232.
A substantial majority of those deaths, 1,317 and 1,081, were attributed to
accidents, suicides, and homicides, so their linkages to Gulf War deployment are
questionable.
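
The crude rates and ratios quoted in this discussion follow directly from the death counts and sample sizes. The short script below is only an arithmetic check of those figures; the variable names are illustrative, and no adjustment for covariates is attempted.

```python
# Counts and sample sizes as quoted in the text for the mortality study.
N_DEPLOYED, N_NONDEPLOYED = 695_516, 746_291

deaths = {
    # category: (Gulf War veterans, nondeployed veterans)
    "all disease-related causes": (337, 534),
    "all causes": (1_765, 1_729),
    "accidents, suicides, homicides": (1_317, 1_081),
}

crude = {}
for category, (d1, d0) in deaths.items():
    p1, p0 = d1 / N_DEPLOYED, d0 / N_NONDEPLOYED
    crude[category] = (p1, p0, p1 / p0)
    print(f"{category}: {p1:.5f} vs {p0:.5f}, crude ratio {p1 / p0:.2f}")

# Share of all deaths attributed to external causes in each group.
share_deployed = 1_317 / 1_765
share_nondeployed = 1_081 / 1_729
print(f"external share: {share_deployed:.0%} vs {share_nondeployed:.0%}")
```

The pooled disease-related ratio rounds to 0.68 and the all-cause rates to 0.00254 and 0.00232, in agreement with the figures given above.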
Of the three externally caused death categories that Haley lists in Table 2, deaths
attributed to motor vehicle accidents were the most numerous—549 Gulf War
veterans versus 398 other veterans—and yielded the highest mortality rate ratio,
1.48, for which the associated 95% interval estimate is (1.38, 1.59) if the fpc factors
are included, and (1.30, 1.69) otherwise. Since the analogous adjusted mortality
rate ratio and interval estimate reported in [6] are 1.31 and (1.14, 1.49), I see no
basis for Haley’s claim that the 1.48 figure represents a “probably unbiased”
estimate of the “true excess mortality risk due to deployment.” In any case, the
crude and adjusted mortality rate ratios for deaths due to accidents are quite
high, and they raise further questions as to whether other factors must be
considered in assessing the effects of Gulf War deployment.
What was there about Gulf War deployment that might account for about 150
more deaths from motor vehicle accidents during the three-year period
1991–1993? And how can one explain the high relative risk ratios for other
accidents and the significantly higher externally caused death rates for female
Gulf War veterans [6, p. 1501]? Although Kang and Bullman dismiss their
findings of higher risk ratios for males as statistically insignificant on the basis
of their Cox regression analyses, and although they cite earlier studies that found
increased postwar mortality from accidents among veterans of previous wars, their
explanation seems unsatisfactory.
Alternative Explanations
A more plausible explanation for these findings is that they reflect what might be
termed “separation effects.” According to Table 1 in the hospitalization study
[7], the Gulf War veterans separated from service at a considerably higher rate
(42.5%) through 1993 than those in the comparison group (36.4%). Applying
those rates to the numbers of Gulf War veterans and other veterans in the
mortality study (695,516 and 746,291), I infer that approximately 296,000 of the
Gulf War veterans had returned to civilian life through 1993, outnumbering the
corresponding figure (241,000) for other veterans by 22%.
One implication of the difference in separation rates is that more Gulf War
veterans underwent separation physical examinations. Haley alleges that some
veterans would fail to report serious illnesses on those exams, but Gray et al.
defend the rigor of the exams and argue that the veterans had considerable
incentives to report their medical problems fully (p. 330). If these exams led to
identifying and treating some of the serious medical conditions, thereby
preventing later complications and deaths, this would supplant healthy-warrior
effects as an explanation for the somewhat lower disease-related mortality rates
for Gulf War veterans than for other veterans. The adjusted mortality rate ratio
for disease-related deaths was 0.88, and the 95% interval estimate was (0.77,
1.02).
Another implication of the difference in separation rates is that many more Gulf
War veterans became subject to the perils of civilian life. If we assume for the
moment that the separatees, being free of military constraints on personal
behavior and living under less-controlled environments, would have higher
mortality rates due to external causes (accidents, suicide, and homicide), then
one would expect proportionately more deaths in these categories among Gulf
War veterans than among other veterans. The premise that civilian life is more
hazardous is partially substantiated by Table 4 in [6], which shows that the
standardized mortality ratios for all external causes were 0.64 for the Gulf War
veterans and 0.55 for other veterans. Thus, even though the death counts for
Gulf War veterans through 1993 included large numbers of deaths after they had
returned to civilian life, and they included the relatively high death counts for
the women in this group, the Gulf War veterans still had 36% fewer deaths than
one would predict based on mortality rates for civilians having the same age, sex,
and race attributes.
Based on the standardized cause-specific mortality rate ratios in Table 4 of [6]
and Kang and Bullman’s citation of a study of U.S. Army soldiers showing that
the mortality rate of soldiers in 1986 was only half the rate of their civilian
counterparts (p. 1503), I conjecture that a reexamination of the mortality data
would support the hypothesis that there was a sizable jump in the hazard rate
(force of mortality) for externally caused deaths at the separation date. If
analyses of the timing of deaths among both groups of veterans support the
conclusion that the hazard rates for external causes were about twice as high
after the separation date as they were before, that finding, in conjunction with
the higher separation rates among Gulf War veterans, would account for the
higher adjusted mortality rate for all external causes among male Gulf War
veterans, and it might even account for the very high mortality rates for the
women who served in the Gulf.
Carrying this argument another step, suppose that it can be shown that both
groups of veterans had exactly the same cause-specific mortality rates while they
remained on active duty and they had the same (greatly elevated) mortality rates
after they separated. In addition, suppose that it can be shown that Gulf War
deployment caused higher separation rates among Gulf War veterans. Then it
would follow that the higher mortality rates among Gulf War veterans would be
attributable to Gulf War deployment, even though Gulf War deployment had no
effect whatsoever on the mortality rates of veterans either before or after they left
the service. The point of this argument is not to explain away the differences in
the mortality rates, but to pinpoint another salient factor, in addition to Haley’s
healthy-warrior effect, that merits consideration in weighing the mortality rate
ratios.
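
The mechanism can be written out under deliberately crude assumptions, introduced here only for illustration: every veteran in both groups faces the same hazard h for externally caused death while on active duty and the same hazard kh (with k > 1) after separation, and a fraction s_i of group i's follow-up time is spent in civilian life. None of these symbols appears in the studies.

```latex
\begin{align*}
\text{average hazard in group } i &= (1 - s_i)\,h + s_i\,k h = h\,\bigl[1 + (k - 1)\,s_i\bigr], \\
\text{mortality rate ratio} &= \frac{1 + (k - 1)\,s_1}{1 + (k - 1)\,s_0} \;>\; 1
  \quad \text{whenever } s_1 > s_0 \text{ and } k > 1 .
\end{align*}
```

The ratio exceeds 1 even though deployment changes neither h nor k; it acts only through the separation fractions s_1 and s_0.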
Of course, not all veterans experienced equally hazardous conditions before or
after separation, and it seems reasonable to expect that, if there is, on average, a
doubling of the hazard rates at the separation point, then there might be no
increase whatsoever for many veterans but a tenfold or hundredfold increase for
others, say, a female military police officer who leaves the service to become an
inner-city cop. When one considers the extent to which individuals vary in their
lifestyles, environments, and exposures to hazards, a case can be made for more
detailed analyses of the mortality data to account for variability in those factors,
but I see nothing in the Kang-Bullman study indicating that they glossed over
important risk factors in their assessment. In fact, I see numerous details in their
report indicating that they strove to turn up adverse effects of Gulf War
deployment. Witness their telling statement on p. 1502 in [6]: “Of the 10 deaths
attributed to infectious or parasitic disease, none were reported as due to
leishmaniasis or other infectious diseases endemic to the Middle East, or as due
to the effects of biologic warfare agents.”
THE UNEQUAL FOLLOW-UP ISSUE
Haley’s primary criticism of the hospitalization and birth defects studies was that
they were distorted by “biases from unequal follow-up.” He makes his case as
follows:
Whereas virtually all deaths were equally ascertained in both
comparative populations for the first study, the records of
hospitalizations, births, and birth defects for the second and third
studies were obtained only from military hospitals serving
personnel remaining on active duty; the hospital records of
personnel who separated from active duty during the follow-up
period and were treated in nonmilitary hospitals were excluded
(p. 315).
This is a valid criticism, especially in the case of the birth defects study [8], which
would have to be a long-term study to be conclusive, perhaps requiring follow-ups
of the two veteran populations for ten or twenty years. However, as the
preceding discussion of separation effects shows, analysts would have to
separate the effects of Gulf War deployment from those of other salient
health-related factors. Given the complexity of that task and, perhaps more important,
given “the absence of a clearly defined hypothesis regarding measurable
exposures and specific birth effects” (p. 327), I share the authors’ concerns about
the practicality of undertaking a long-term study. In my view, Cowan et al.’s
analyses in [8] and their responses to Haley’s criticisms are well-conceived and
well-documented.
Insofar as the hospitalization study [7] is concerned, it is not clear that the
restriction to military hospitalizations is a serious shortcoming. In fact, there
would seem to be some advantages from an analytic viewpoint, since this
restriction assures greater uniformity in the reporting of hospitalizations. While
Gray et al. have taken steps to augment their database with computerized
hospitalization records from the state of California and the Department of
Veterans Affairs (p. 330), it will be difficult, if not impossible, to gauge the effects
of Gulf War deployment on postservice hospitalizations allowing for the
multitude of separation effects that one can hypothesize. Moreover, it seems
unlikely that additional postservice data will reveal significant effects of Gulf
War deployment on the health outcome measures under consideration, given
that those effects have not manifested themselves in either the restricted
hospitalization data or the unrestricted mortality data for both groups of
veterans through September 1993. Of course, one cannot rule out the possibility
that additional follow-up data might facilitate identifying less serious effects of
Gulf War deployment in the form of illnesses that do not rise to the level of
requiring hospitalization, but those illnesses are not the subject of the studies
under review.
Among the specific causes of hospitalizations for which the relative risk ratios
were high, the high counts of hospitalizations for mental disorders stand out. If
one assumes that the criteria for distinguishing, say, personality disorders from
neurotic disorders are somewhat fuzzy, then one must also concede that some
higher risk ratios might stem from classification errors that distort the
subcategory counts, especially when the counts are as small as they are in these
studies. Perhaps the significantly higher risk ratio for hospitalizations in 1992
due to adjustment reactions might be attributable to Gulf War deployment, but the
vagueness of the categorization raises more questions about the reliability of the
counts. (According to the ICD-9-CM manual, adjustment reactions “are usually
closely related in time and content to stresses such as bereavement, migration, or
other experiences.”) In any case, there is only weak evidence of excess
hospitalizations for Gulf War veterans in the reported standardized risk ratios.
To account for higher relative risk ratios for genitourinary disorders, the authors
explain that “the observed differences between cohorts with regard to rates of
diagnoses suggest that medical care for some conditions was deferred until after
the war” [7, p. 1511].
Gray et al. provide a multifaceted response to Haley’s criticism that the
hospitalization study did not fully account for the healthy-warrior effects. First,
they challenge the basis of Haley’s supposition that there were marked
differences between the deployed and nondeployed groups in the prevalence of
chronic diseases. Second, they point to their efforts to control prewar selection
effects in their analysis by including a “surrogate health status covariate, prewar
hospitalization, as a statistical adjustment” (p. 328). Third, they report the results
of additional analyses that address the specific issue of healthy-warrior effects.
To carry out the additional analyses, they exploited a virtue of the hospitalization
database in that it covered a nearly five-year period from November 1988
through September 1993, thereby permitting comparisons of hospitalization rates
between the deployed and nondeployed groups before, during, and after the
Gulf War. To allow for the fact that the data were right-censored for veterans
who separated from service before September 1993, they employed Cox’s
regression procedure to estimate the effects of Gulf War deployment on first
hospitalizations. Then they extended their analysis to include second and later
hospitalizations by using logistic regression to compare the probabilities of
hospitalization for deployed and nondeployed personnel during each three-month
interval from November 1988 to September 1993. They found that Gulf
War veterans experienced slightly lower hospitalization rates prior to
deployment, lower rates during the military build-up and conflict (from August
1990 to July 1991), and slightly lower rates after July 1991. Averaging the
estimated quarterly hospitalization rates across time intervals, they found that
the average probability of hospitalization for Gulf War veterans before August
1990 was 0.0194, while after August 1990 it was 0.0189, whereas the comparable
averages for the nondeployed veterans were 0.0218 and 0.0235. They concluded
that there was a selection effect stemming from the fact that “only the most fit
service members were deployed; however, this effect was transient” (p. 329).
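The four averages quoted above are easier to compare as ratios and percentage changes. The short sketch below only restates the arithmetic on the reported figures; the variable names are illustrative, and nothing here reproduces the authors' regression models.

```python
# Average quarterly hospitalization probabilities as quoted in the text.
deployed = {"before": 0.0194, "after": 0.0189}
nondeployed = {"before": 0.0218, "after": 0.0235}


def ratio(period):
    """Deployed-to-nondeployed ratio of average probabilities for a period."""
    return deployed[period] / nondeployed[period]


def percent_change(group):
    """Percentage change in a group's average from 'before' to 'after'."""
    return 100.0 * (group["after"] - group["before"]) / group["before"]


print(f"ratio before August 1990: {ratio('before'):.3f}")
print(f"ratio after August 1990:  {ratio('after'):.3f}")
print(f"change, deployed:    {percent_change(deployed):+.1f}%")
print(f"change, nondeployed: {percent_change(nondeployed):+.1f}%")
```

On these figures the deployed-to-nondeployed ratio is roughly 0.89 before August 1990 and roughly 0.80 afterward, so the deployed group's average stays below the nondeployed group's in both periods.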
To a certain extent, these findings substantiate Haley’s position that healthy-warrior
effects exist and must be accounted for in definitive analyses of Gulf War
illness. However, the evidence from the three studies indicates that Haley has
exaggerated the importance of healthy-warrior effects, and he has downplayed
the efforts that the authors undertook to account for those effects. Also, while
Haley’s criticisms pertaining to unequal follow-up in the hospitalization and
birth defect studies are valid, the authors of those studies have responded fully
to his criticisms, citing additional analyses to support their findings and
interpretations. Of course, questions remain as to whether Gulf War veterans
have experienced significantly more chronic illnesses that do not ordinarily entail
hospitalization and whether they have suffered or will suffer more negative
long-term health outcomes than their nondeployed counterparts. The clear
message that emerges from this review is that Haley’s concerns about healthy-warrior
effects and unequal follow-ups must be addressed in studies that
attempt to answer those questions.
Appendix
DERIVATION OF CONFIDENCE INTERVALS FOR RISK RATIOS
Let $\rho = \theta_1/\theta_0$ denote the risk ratio of interest, where $\theta_1$ and $\theta_0$ are the population
proportions to be estimated by sample proportions $P_1$ and $P_0$ based on random
samples of sizes $n_1$ and $n_0$ taken without replacement from populations of sizes
$N_1$ and $N_0$. Then we know from sampling theory results [13, p. 51] that $P_i$ is
unbiased for $\theta_i$ with variance

\[
\operatorname{Var}(P_i) = \frac{\theta_i(1-\theta_i)}{n_i} \cdot \frac{N_i - n_i}{N_i - 1}
\]

and that $P_i$ is asymptotically normal for large values of $n_i$ and $N_i$. It follows that
$Z = \ln R = \ln(P_1/P_0) = \ln P_1 - \ln P_0$ is asymptotically normal with mean
$\ln\rho$ and variance $\sigma_Z^2 = \operatorname{Var}(\ln P_1) + \operatorname{Var}(\ln P_0)$. Hence, if $\sigma_Z$ were known,
the end points $Z \pm 1.96\,\sigma_Z$ would define a 95% confidence interval for $\ln\rho$ so
that a 95% confidence interval for $\rho$ would be given by

\[
\exp(Z \pm 1.96\,\sigma_Z) = (P_1/P_0)\exp(\pm 1.96\,\sigma_Z) = R \cdot \exp(\pm 1.96\,\sigma_Z).
\]

Moreover, the same asymptotic result holds if one replaces $\sigma_Z$ by the standard
error $\hat{\sigma}_Z$ derived by replacing population means by sample means in the
formula for $\sigma_Z^2$. Applying the Taylor’s formula linear approximation for
$f(x) = \ln x$ around $x = \theta$, namely,

\[
f(x) \approx f(\theta) + f'(\theta)(x - \theta) = f(\theta) + (1/\theta)(x - \theta),
\]

we first use the representation $\ln P_i \approx \ln(\theta_i) + (P_i - \theta_i)/\theta_i$ to approximate

\[
\operatorname{Var}(\ln P_i) = \frac{1}{\theta_i^2}\operatorname{Var}(P_i) = \frac{1-\theta_i}{n_i\theta_i} \cdot \frac{N_i - n_i}{N_i - 1},
\]

which leads to $\hat{\sigma}_Z = \sqrt{\operatorname{Est.Var}(Z)} = \sqrt{T_1 + T_0}$, where $T_i$ is specified by the
formula on page 2.
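The derivation above translates directly into a short calculation. The sketch below assumes that each term $T_i$ is the expression for $\operatorname{Var}(\ln P_i)$ with the sample proportion $P_i$ substituted for $\theta_i$; the function names and the example counts are hypothetical and are not taken from the studies under review.

```python
import math


def t_term(events, n, N):
    """Estimated Var(ln P) for one group, with finite population correction.

    Assumption: this is the appendix formula for Var(ln P_i) with the
    sample proportion P = events / n plugged in for theta.
    """
    if not (0 < events <= n <= N) or N < 2:
        raise ValueError("need 0 < events <= n <= N and N >= 2")
    p = events / n
    return (1.0 - p) / (n * p) * (N - n) / (N - 1)


def risk_ratio_ci(events1, n1, N1, events0, n0, N0, z=1.96):
    """Return (R, lower, upper) for the risk ratio R = P1 / P0.

    The interval is R * exp(-z * se) to R * exp(+z * se), where
    se = sqrt(T1 + T0) is the estimated standard error of ln R.
    """
    r = (events1 / n1) / (events0 / n0)
    se = math.sqrt(t_term(events1, n1, N1) + t_term(events0, n0, N0))
    return r, r * math.exp(-z * se), r * math.exp(z * se)


# Hypothetical example: samples of 1,000 drawn from populations of 50,000.
r, lower, upper = risk_ratio_ci(30, 1000, 50000, 20, 1000, 50000)
print(f"R = {r:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

One consequence of the finite population correction is worth noting: when a sample exhausts its population ($n_i = N_i$), the corresponding term vanishes, and if both groups are fully enumerated the interval collapses to the point estimate $R$.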
References
1. Haley RW. Bias from the “Healthy-Warrior Effect” and Unequal Follow-Up in Three Government Studies of Health Effects of the Gulf War. Am J Epidemiol 1998; 148: 315–323.
2. Kang HK, Bullman TA. Counterpoint: Negligible “Healthy-Warrior Effect” on Gulf War Veterans’ Mortality. Am J Epidemiol 1998; 148: 324–325.
3. Cowan DN, Gray GC, DeFraites RF. Counterpoint: Responding to Inadequate Critique of Birth Defects Paper. Am J Epidemiol 1998; 148: 326–327.
4. Gray GE, Knoke JD, et al. Counterpoint: Responding to Suppositions and Misunderstandings. Am J Epidemiol 1998; 148: 328–333.
5. Haley RW. Countercounterpoint: Haley Replies. Am J Epidemiol 1998; 148: 334–338.
6. Kang HK, Bullman TA. Mortality Among U.S. Veterans of the Persian Gulf War. N Engl J Med 1996; 335: 1498–1504.
7. Gray GC, Coate BD, Anderson CM, et al. The Postwar Hospitalization Experience of U.S. Veterans of the Persian Gulf War. N Engl J Med 1996; 335: 1505–1513.
8. Cowan DN, DeFraites RF, Gray GC, et al. The Risk of Birth Defects Among Children of Persian Gulf War Veterans. N Engl J Med 1997; 336: 1650–1656.
9. Agresti A. Categorical Data Analysis. New York: Wiley, 1990.
10. Cox DR and Oakes D. Analysis of Survival Data. New York: Chapman and Hall, 1984.
11. Elandt-Johnson RC and Johnson NL. Survival Models and Data Analysis. New York: Wiley, 1980.
12. Chiang CL. Introduction to Stochastic Processes in Biostatistics. New York: Wiley, 1968.
13. Cochran WG. Sampling Techniques. 3d ed. New York: Wiley, 1977.
14. Sarndal CE. On π-Inverse Weighting Versus Best Linear Unbiased Weighting in Probability Sampling. Biometrika 1980; 67: 639–650.
15. Smith TMF. Present Position and Potential Developments: Some Personal Views—Sample Surveys. J Royal Statist Soc A 1984; 147: 208–221.