Comments on Harris DE, Aboueissa, AM and Hartley, D (2008)
Igor Kovasov
Institute of Biology of the Southern Seas
National Academy of Sciences of Ukraine
This note points out that the wrong type of statistical regression was used in “Myocardial
infarction and heart failure hospitalization rates in Maine, USA – variability along the
urban-rural continuum” by Harris DE, Aboueissa AM and Hartley D, published in Rural and
Remote Health, 8: 980 (online), 2008.
The response (dependent) variable in this study is the hospitalization rate.
Clearly, normal distribution-based linear regression models are incorrect here, simply
because a rate is NOT normally distributed and, moreover, there is no guarantee
that the predicted rate will lie between 0 and 1. In fact, Poisson regression is the
prototype model for rate data (see Chapter 12 of Fleiss, Levin and Paik, 2003, and
Chapter 9 of Agresti, 2002).
It is well known that an incidence rate (such as a hospitalization rate, rare disease rate,
accident rate, etc.) is defined as the ratio of the frequency (X) to the size of the
population or subpopulation (denoted by n). Even in elementary statistics textbooks, the
frequency (X) of occurrences (the number of people hospitalized in this case) is treated
by default as a Poisson random variable (not a normal random variable), while the size (n)
is NOT a random variable. Therefore, the rate X/n is a scalar multiple of a Poisson random
variable. The rate is NOT a normal random variable. From a mathematical point of view,
a rate is defined on the interval [0, 1], while a normal random variable is defined on (-∞, +∞).
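The statement that the rate is a scalar multiple of a Poisson random variable can be illustrated with a short calculation. This is only a sketch under assumed values: the population size n = 1000 and the expected count mu = 5 are hypothetical and are not taken from the study. It tabulates the distribution of X/n and checks that the rate takes only the non-negative values k/n, with mean mu/n and variance mu/n^2.

```python
import math

def poisson_pmf(k, mu):
    """Probability that a Poisson(mu) count equals k."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

n = 1000    # hypothetical fixed population size (an assumption, not study data)
mu = 5.0    # hypothetical expected number of hospitalizations

# Truncate the support at k = 59; the remaining tail mass is negligible for mu = 5.
ks = range(60)
probs = [poisson_pmf(k, mu) for k in ks]
rates = [k / n for k in ks]          # possible values of the rate X / n

mean_rate = sum(r * p for r, p in zip(rates, probs))
var_rate = sum((r - mean_rate) ** 2 * p for r, p in zip(rates, probs))

print("smallest possible rate:", min(rates))
print("mean of rate:", mean_rate, "compared with mu / n =", mu / n)
print("variance of rate:", var_rate, "compared with mu / n**2 =", mu / n ** 2)
```

The rate inherits its discreteness, its lower bound of 0, and its mean-dependent variance from the count X, none of which a normal random variable shares.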
A few more details on why the Poisson model is correct and the normal linear model is
incorrect. Note that log(rate) = log(freq) – log(n). Since freq < n, it follows that
-∞ < log(rate) = log(freq) – log(n) < 0 and consequently exp(-∞) < rate < exp(0),
i.e. 0 < rate < 1. This makes both statistical and mathematical sense! In the linear
regression model E[rate] = β_0 + β_1 X_1 + ∙∙∙ + β_k X_k, by contrast, there is no guarantee
that the mean is always in [0, 1] for every pattern of values of the independent variables!
Hence the linear regression model is wrong from both a statistical and a practical point
of view.
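The contrast between the two mean structures can be sketched numerically. The data below are hypothetical and are not from the study; the coefficients of the log-scale model are obtained here by ordinary least squares on log(rate), a simple stand-in for a Poisson maximum-likelihood fit, which is an assumption made only to keep the example self-contained. The point is the form of the prediction: a linear mean can fall below 0, while an exponentiated linear predictor is always positive.

```python
import math

# Hypothetical data, NOT taken from the study: one covariate and observed rates.
x = [0.0, 1.0, 2.0, 3.0]
rate = [0.020, 0.010, 0.005, 0.0025]

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Linear model on the rate scale: E[rate] = b0 + b1 * x
b0, b1 = least_squares(x, rate)

# Log-scale model: log E[rate] = a0 + a1 * x (stand-in for a Poisson fit)
a0, a1 = least_squares(x, [math.log(r) for r in rate])

x_new = 5.0                                  # covariate value outside the observed range
linear_pred = b0 + b1 * x_new                # falls below 0 for these data
loglinear_pred = math.exp(a0 + a1 * x_new)   # exp(.) is always positive

print(f"linear model prediction:    {linear_pred:.5f}")
print(f"log-scale model prediction: {loglinear_pred:.6f}")
```

For these made-up values the linear model predicts a negative rate at x = 5, which is impossible, whereas the log-scale prediction remains a small positive number.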
In summary, the normal-based linear regression used in the data analysis is obviously the
wrong model! Consequently, the results based on that model are statistically incorrect.
References
Fleiss, JL, Levin, B and Paik, MC. 2003. Statistical Methods for Rates and Proportions,
3rd Ed. John Wiley.
Agresti, A. 2002. Categorical Data Analysis, 2nd Ed. John Wiley.