Count Data Regression Poisson, Negative

Count Data Regression: Poisson and Negative-Binomial Regression Models Using R

Asma Alfadhel
Timothy Monday

Outline
• Theoretical Background
  ◦ Count Data
  ◦ Count Regression Models
  ◦ Over-Dispersion
• Implementation in R
  ◦ Simulated Data
  ◦ Real Data

Part I: THEORETICAL BACKGROUND

Count Data
• Observations can take only nonnegative integer values: 0, 1, 2, 3, ...
• These integers arise from counting rather than ranking.
• Count data differ from:
  – Binary data, y ∈ {0, 1}
  – Ordinal data, where ranking is important (logistic/probit regression)
http://en.wikipedia.org/wiki/Ordinal_data

Count Data
• Examples:
  – The number of DVD purchases in a store, observed over several months.
  – The number of doctor visits, observed over several days.

Count Regression Models
• Generalized Linear Models
  – Poisson regression model (equal dispersion)
  – Negative binomial (NB) regression model (over-dispersion)
• Two-Part Models (excess zeroes)
  – Hurdle Poisson model
  – Hurdle negative binomial model
  – Zero-inflated Poisson model
  – Zero-inflated negative binomial model

Poisson Distribution
• The Poisson distribution has one parameter, μ: P(Y = y) = e^(−μ) μ^y / y!, for y = 0, 1, 2, ...
• E(Y) = μ = Var(Y)
• e = 2.71828…
• y! = 1 × 2 × 3 × … × y
• Small mean: small counts, many zeroes → Poisson regression
• Large mean: large counts, few or no zeroes, distribution ≈ normal → OLS regression

Poisson Regression Model
• It models the relationship between a Poisson-distributed response variable and one or more explanatory variables.
• The explanatory variables can be either numeric or categorical.
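As a minimal sketch of the Poisson regression model described above, the R code below simulates a Poisson-distributed response that depends on one numeric and one categorical explanatory variable, then fits the model with glm(). The variable names (x, group), the sample size, and the true coefficients are illustrative assumptions, not values from the slides.

```r
# Simulated data: all names and coefficient values here are hypothetical.
set.seed(1)
n     <- 5000
x     <- runif(n, 0, 5)                               # numeric explanatory variable
group <- factor(sample(c("a", "b"), n, replace = TRUE))  # categorical explanatory variable

# The log of the expected count is linear in the explanatory variables.
mu <- exp(0.5 + 0.3 * x + 0.4 * (group == "b"))
y  <- rpois(n, lambda = mu)

# family = poisson uses the log link by default.
fit <- glm(y ~ x + group, family = poisson)

# Estimates on the log scale; they should be close to 0.5, 0.3 and 0.4.
print(coef(fit))

# exp(coefficient) is the multiplicative change in the expected count
# for a one-unit increase in x (or for group "b" relative to group "a").
print(exp(coef(fit)))
```

With a sample this large, the estimates should land close to the simulated values; on real data the same call applies with `data = ...` pointing to a data frame.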
http://www.instantr.com/category/statistical-models/

Poisson Regression Coefficient Interpretation
• Example 1: y_i ~ Poisson(exp(2.5 + 0.18 X_i))
  – e^0.18 ≈ 1.197, and 1.197 − 1 = 0.197
  – A one-unit increase in X increases the expected count of y by about 20%.
• Example 2: y_i ~ Poisson(exp(2.5 − 0.18 X_i))
  – e^(−0.18) ≈ 0.835, and 1 − 0.835 = 0.165
  – A one-unit increase in X decreases the expected count of y by about 16.5%.

Over-Dispersion
• Observed variance > theoretical variance
• The variation in the data exceeds what the Poisson model predicts:
  Var(Y) = μ + α · f(μ), where α is the dispersion parameter
• α = 0 indicates equal dispersion (Poisson model)
• α > 0 indicates over-dispersion (common in real data; negative-binomial model)
• α < 0 indicates under-dispersion (not common)

Negative-Binomial vs. Poisson Distribution
• Many zeroes, small mean, small counts → Poisson regression
• Many zeroes, small mean, more variability in counts → NB regression

Negative-Binomial vs. Poisson Distribution
• Many zeroes, large mean → NB regression
• Few or no zeroes, large mean → OLS regression

Negative-Binomial Distribution
• One formulation of the negative binomial distribution can be used to model count data with over-dispersion.
http://www.ats.ucla.edu/stat/stata/seminars/count_presentation/count.htm

Negative-Binomial Regression Model
• Estimation method
• Parameter estimation
  – For Poisson regression: β0, β1, …, βn
  – For negative-binomial regression: β0, β1, …, βn, and α

Goodness of Fit
• LL_NB > LL_Poisson (log-likelihoods of the NB and Poisson fits)
• AIC = 2k − 2 ln(L)
  ◦ k: number of parameters
  ◦ L: maximized likelihood
http://en.wikipedia.org/wiki/Akaike_information_criterion

Two-Part Models
• Used to handle the excess-zeroes issue.
• Instead of assuming that the count outcome comes from a single data-generating process, two-part models consider the count outcome to be generated by two systematically different statistical processes.
• Excess zeroes: there are more zeroes in the data than a Poisson or negative-binomial model predicts.
• [Diagram: a logit model combined with a Poisson or NB model]
• Hurdle regression (zero-truncated)
• Zero-inflated regression
http://www2.sas.com/proceedings/forum2008/371-2008.pdf

Part II: IMPLEMENTATION IN R

Models in R
• Poisson model
    glm(Y ~ X, family = poisson)
• Negative binomial model (MASS package)
    glm.nb(Y ~ X)
• Hurdle models (pscl package)
    hurdle(Y ~ X | X1, link = "logit", dist = "poisson")
    hurdle(Y ~ X | X1, link = "logit", dist = "negbin")
• Zero-inflated models (pscl package)
    zeroinfl(Y ~ X | X1, link = "logit", dist = "poisson")
    zeroinfl(Y ~ X | X1, link = "logit", dist = "negbin")

Thank you for listening. Any questions?

References
• Liu, W., & Cela, J. Count Data Models in SAS. http://www2.sas.com/proceedings/forum2008/371-2008.pdf
• Gelman, A., & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
• http://www.ats.ucla.edu/stat/stata/seminars/count_presentation/count.htm
• http://en.wikipedia.org/wiki/Ordinal_data
• http://en.wikipedia.org/wiki/Poisson_distribution
• http://www.instantr.com/category/statistical-models/
• http://en.wikipedia.org/wiki/Akaike_information_criterion

Generalized Linear Models
A generalized linear model involves:
1. A data vector y = (y1, y2, …, yn)
2. Predictors X and coefficients β, forming a linear predictor Xβ
3. A link function g, yielding a vector of transformed data ŷ = g⁻¹(Xβ)
4. A data distribution, p(y | ŷ)
5. Possibly other parameters (variances, overdispersions, and cutpoints) involved in the predictors, link function, and data distribution
Gelman, A., & Hill, J. (2007)
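The GLM components listed above map fairly directly onto an R model call: the formula supplies the linear predictor, the family supplies the data distribution and link, and the negative-binomial fit adds an overdispersion parameter. The sketch below uses simulated over-dispersed counts to compare a Poisson and a negative-binomial fit by AIC. It assumes the MASS package is installed; the variable names, the sample size, and the value theta = 2 are illustrative choices, not values from the slides.

```r
library(MASS)  # provides glm.nb(); assumed to be installed

set.seed(42)
n  <- 2000
x  <- rnorm(n)
mu <- exp(1 + 0.5 * x)   # inverse of the log link applied to the linear predictor

# Negative-binomial counts with Var(Y) = mu + mu^2 / theta.
# theta = 2 is a hypothetical value chosen to give clear over-dispersion.
y <- rnbinom(n, size = 2, mu = mu)

pois_fit <- glm(y ~ x, family = poisson(link = "log"))
nb_fit   <- glm.nb(y ~ x)

# Pearson dispersion statistic for the Poisson fit;
# values well above 1 suggest over-dispersion.
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
print(dispersion)

# Lower AIC indicates the better trade-off between fit and number of parameters.
print(AIC(pois_fit, nb_fit))

# Estimated overdispersion parameter of the negative-binomial fit.
print(nb_fit$theta)
```

Because the data are generated with extra variance, the negative-binomial fit should have the lower AIC here; on equally dispersed data the two fits would be much closer.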
