by Harris DE, Aboueissa AM and Hartley D published in Rural and

Comments on Harris DE, Aboueissa AM and Hartley D (2008)

Igor Kovasov
Institute of Biology of the Southern Seas, National Academy of Sciences of Ukraine

This note reports that the wrong statistical regression was used in “Myocardial infarction and heart failure hospitalization rates in Maine, USA – variability along the urban-rural continuum” by Harris DE, Aboueissa AM and Hartley D, published in Rural and Remote Health, 8: 980 (online), 2008.

The response (dependent) variable in that study is the hospitalization rate. Clearly, linear regression models based on the normal distribution are incorrect here, simply because a rate does NOT follow a normal distribution and, moreover, there is no guarantee that the predicted rate will lie between 0 and 1. In fact, Poisson regression is the prototype model for rate data (see Chapter 12 of Fleiss, Levin and Paik, 2003, and Chapter 9 of Agresti, 2002).

It is well known that an incidence rate (such as a hospitalization rate, rare disease rate, accident rate, etc.) is defined as the ratio of the frequency (X) to the size of the population or subpopulation (n). Even in elementary statistics textbooks, the frequency (X) of occurrences (the number of people hospitalized in this case) is by default a Poisson random variable (not a normal random variable), while the size (n) is NOT a random variable. Therefore, the rate is a scalar multiple of a Poisson random variable. The rate is NOT a normal random variable. From a mathematical point of view, a rate is defined on the interval [0, 1], while a normal random variable is defined on (-∞, +∞).

A few more details on why Poisson is the correct model and the normal linear model is incorrect. Note that log(rate) = log(freq) - log(n). Since freq < n, it follows that -∞ < log(rate) = log(freq) - log(n) < 0 and, consequently, exp(-∞) < rate < exp(0), i.e. 0 < rate < 1. This makes both statistical and mathematical sense!
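
The contrast between the two models can be illustrated with a small sketch. Everything in it is an assumption made for illustration only: the five areas, their counts and populations are hypothetical (they are NOT the Maine data), and the helper names fit_poisson_rate and fit_linear_rate are invented here. The Poisson model is fitted as log(E[X]) = log(n) + b0 + b1*x, with log(n) as a fixed offset, using plain Newton-Raphson. The linear model is fitted to the observed rates by ordinary least squares.

```python
import math

# Hypothetical data (not from Harris et al.): five areas, urban (0) to rural (4).
rurality = [0, 1, 2, 3, 4]
population = [10000, 10000, 10000, 10000, 10000]  # n, treated as fixed
events = [200, 120, 70, 40, 20]                   # X, hospitalization counts


def fit_poisson_rate(x, counts, sizes, max_iter=50, tol=1e-12):
    """Fit log(E[X]) = log(n) + b0 + b1*x by Newton-Raphson; log(n) is the offset."""
    b0 = math.log(sum(counts) / sum(sizes))  # start at the pooled log rate
    b1 = 0.0
    for _ in range(max_iter):
        mu = [n * math.exp(b0 + b1 * xi) for xi, n in zip(x, sizes)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(counts, mu, x))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(m * xi for m, xi in zip(mu, x))
        i11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def fit_linear_rate(x, rates):
    """Ordinary least squares for E[rate] = a0 + a1*x."""
    k = len(x)
    x_bar = sum(x) / k
    r_bar = sum(rates) / k
    sxy = sum((xi - x_bar) * (ri - r_bar) for xi, ri in zip(x, rates))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sxy / sxx
    return r_bar - a1 * x_bar, a1


observed_rates = [y / n for y, n in zip(events, population)]
b0, b1 = fit_poisson_rate(rurality, events, population)
a0, a1 = fit_linear_rate(rurality, observed_rates)

x_new = 5  # one step beyond the observed range
poisson_rate = math.exp(b0 + b1 * x_new)  # exp(.) is positive for any b0, b1
linear_rate = a0 + a1 * x_new             # nothing constrains the sign here

print(f"Poisson (log link) predicted rate at x={x_new}: {poisson_rate:.5f}")
print(f"Linear predicted rate at x={x_new}: {linear_rate:.5f}")
```

With these made-up numbers the straight line predicts a negative rate once the covariate moves slightly beyond the observed range, while the log-link prediction is an exponential and therefore stays positive. In practice one would use an established routine for Poisson regression with an offset instead of this hand-written fit.
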

In the linear regression model E[rate] = β0 + β1X1 + ∙∙∙ + βkXk, however, there is no guarantee that the mean always lies in [0, 1] for every pattern of values of the independent variables! Hence the linear regression model is wrong from both a statistical and a practical point of view.

In summary, the normal-based linear regression used in the data analysis is obviously the wrong model! Consequently, the results based on the wrong model are statistically incorrect.

References

Fleiss, JL, Levin, B and Paik, MC. 2003. Statistical Methods for Rates and Proportions, 3rd Ed. John Wiley.
Agresti, A. 2002. Categorical Data Analysis, 2nd Ed. John Wiley.
