How to write up models with negative binomial variance

Fred Watson, 2010-2014

Negative binomial distributions are appropriate when your response variable is a non-negative integer with possibly large variance. Count data are typical of this. Conversely, if your response is a count, it is not appropriate to model it using continuous distributions (e.g. gamma, log-normal) or distributions that are defined below zero (e.g. normal).

(Poisson distributions can also be used for count data, and are historically more common. Poisson models would be the best choice for equi-dispersed data, where the variance equals the mean, since they use one fewer parameter than negative binomial models and are thus more parsimonious. You could choose between Poisson and negative binomial using either a dispersion test or an AIC comparison between two models that differ only in being either Poisson or negative binomial.)

Let’s say you want to model the influence of weather on observed counts of a bird species. The response is the bird count, a non-negative integer. The predictor is weather, which we’ll say is a binary variable taking values of either 0 (clear weather) or 1 (cloudy weather).

You could write:

I modeled variation in bird count, $Y_i$, at site $i$, as a function of weather, $W_i$, using a linear model with negative-binomial variance:

$$Y_i \sim \mathrm{NB}(\mu_i, \theta)$$

$$\mu_i = \beta_C + \beta_W W_i$$

where $\mathrm{NB}(\mu_i, \theta)$ is a negative binomial distribution with mean $\mu_i$ and dispersion parameter $\theta$ (Venables & Ripley 2002); the mean count, $\mu_i$, is modeled as a linear function of weather, $W_i$; and $\beta_C$ and $\beta_W$ are fitted coefficients representing the mean count on clear days and the effect of cloudy weather on the count, respectively. The model was fitted using the glm.nb function in the R statistical package (R Development Core Team 2011) using the code:

    library("MASS")
    glm.nb(Y ~ W, link = "identity")

It’s probably more common to use a log link function, which leads to an exponential relationship between $\mu_i$ and the predictor variable(s).
So if you did that, you’d need to change the equation for the mean, the description of the mean and its coefficients, and the link argument in the code, as shown below:

I modeled variation in bird count, $Y_i$, at site $i$, as a function of weather, $W_i$, using a log-linear model with negative-binomial variance:

$$Y_i \sim \mathrm{NB}(\mu_i, \theta)$$

$$\mu_i = \exp(\beta_C + \beta_W W_i)$$

where $\mathrm{NB}(\mu_i, \theta)$ is a negative binomial distribution with mean $\mu_i$ and dispersion parameter $\theta$ (Venables & Ripley 2002); the mean count, $\mu_i$, is modeled as an exponential function of weather, $W_i$; and $\beta_C$ and $\beta_W$ are fitted coefficients representing the log of the mean count on clear days and the effect of cloudy weather on the log of the mean count, respectively. The model was fitted using the glm.nb function in the R statistical package (R Development Core Team 2011) using the code:

    library("MASS")
    glm.nb(Y ~ W, link = "log")

A more complex way to write the identity-link version might be:

I modeled variation in bird count, $Y_i$, at site $i$, as a function of weather, $W_i$, using a linear model with negative-binomial variance:

$$f_{Y_i}(y \mid \mu_i, \theta) = \mathrm{NB}(y \mid \mu_i, \theta) = \frac{\Gamma(\theta + y)}{\Gamma(\theta)\, y!} \, \frac{\mu_i^{y}\, \theta^{\theta}}{(\mu_i + \theta)^{\theta + y}}$$

$$\mu_i = \beta_C + \beta_W W_i$$

where $\mu_i$ is a linear model for the mean count; $\beta_C$ and $\beta_W$ are fitted coefficients representing the mean count on clear days and the effect of cloudy weather on the count, respectively; $f_{Y_i}(y \mid \mu_i, \theta)$ is the probability mass function for $Y_i$; $\theta$ is a dispersion parameter; and $\Gamma(\cdot)$ is the gamma function. This model leads to:

$$\mathrm{E}(Y_i) = \mu_i$$

$$\mathrm{var}(Y_i) = \mu_i + \mu_i^2 / \theta$$

where $\mathrm{E}(Y_i)$ denotes the expected value of $Y_i$, and $\mathrm{var}(Y_i)$ denotes the variance of $Y_i$, from which it can be seen that the variance increases with the mean and can be larger than the mean (Venables & Ripley 2002). The model was fitted using the glm.nb function in the R statistical package (R Development Core Team 2011) using the code:

    library("MASS")
    glm.nb(Y ~ W, link = "identity")
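
If you want to try the fitting code before writing it up, here is a minimal sketch in R. It is only an illustration under stated assumptions: the data are simulated rather than real bird counts, and the sample size, seed, coefficient values, dispersion value, and the data frame name `birds` are all made up for the example. It fits the log-link negative binomial model and a Poisson model that differs only in its distribution, so the AIC comparison mentioned earlier can be tried out.

```r
library(MASS)  # provides glm.nb

set.seed(1)                          # arbitrary seed, for reproducibility only
n  <- 500                            # hypothetical number of sites
W  <- rbinom(n, 1, 0.5)              # weather: 0 = clear, 1 = cloudy
mu <- exp(1.5 + 0.7 * W)             # hypothetical true coefficients, log link
Y  <- rnbinom(n, size = 2, mu = mu)  # 'size' plays the role of theta

birds <- data.frame(Y = Y, W = W)    # hypothetical data frame name

# Negative binomial model; link = log is the default, written out here
# to mirror the write-up.
fit_nb <- glm.nb(Y ~ W, data = birds, link = log)

# Poisson model that differs only in the assumed distribution.
fit_pois <- glm(Y ~ W, data = birds, family = poisson(link = "log"))

coef(fit_nb)            # estimates of beta_C and beta_W on the log scale
fit_nb$theta            # estimated dispersion parameter theta
AIC(fit_pois, fit_nb)   # lower AIC indicates the preferred model
```

Because the simulated counts are strongly over-dispersed, the negative binomial model should have the lower AIC here; with equi-dispersed data you would expect the two AIC values to be close, with the Poisson model favored for having one fewer parameter.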