Lecture 6

Quantitative Methods
Analyzing event counts
Event Count Analysis
Event counts involve a non-negative, integer-valued
random variable. Examples are the number of bills
introduced by a legislator, the number of car
accidents, etc. Trivia: one of the earliest recorded
uses of the Poisson distribution was an 1898 analysis
of the number of Prussian soldiers who were kicked
to death by horses.
OLS generally cannot be used for event count
analysis because it will produce biased and
inconsistent estimates. (The dependent variable is
not really interval / continuous: it is bounded below
at zero, in effect left censored, and the data are
heteroskedastic.)
Event Count Analysis
Poisson models
Poisson regression—another example
Poisson Models
The Poisson distribution function:
$$P(Y = y) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots$$
(A Poisson distribution has a mean and
variance equal to λ. As λ increases, the
distribution is approximately normal.)
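The claim that the mean and variance both equal λ can be checked numerically. The following is a minimal sketch, assuming the standard Poisson probability mass function P(Y = y) = e^(-λ) λ^y / y!; the value λ = 3.5 and the function names are illustrative only.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def mean_and_variance(lam, y_max=100):
    """Approximate the mean and variance by summing over y = 0..y_max."""
    probs = [poisson_pmf(y, lam) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

lam = 3.5  # illustrative rate, not taken from the lecture
mean, var = mean_and_variance(lam)
print(f"lambda={lam}: mean={mean:.4f}, variance={var:.4f}")
```

Both printed values should agree with λ up to rounding, since the probability mass above y_max is negligible for a rate this small.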
Poisson Models
The predicted counts (or “incidence
rates”) can be calculated from the results
as follows:
$$\hat{\mu} = \exp(b_0 + b_1 x_1 + \cdots + b_k x_k)$$
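As a rough sketch of that calculation, assuming the usual log link (the expected count is the exponential of the linear predictor): the intercept, coefficients, and covariate values below are hypothetical and chosen only to show the arithmetic.

```python
import math

def predicted_count(intercept, coefs, x):
    """Expected count exp(b0 + b1*x1 + ... + bk*xk) under a log link."""
    linear = intercept + sum(coefs[name] * x[name] for name in coefs)
    return math.exp(linear)

# Hypothetical numbers, not estimates from the lecture.
intercept = 2.0
coefs = {"x1": -0.4, "x2": -0.015}
case = {"x1": 1, "x2": 50.0}
print(round(predicted_count(intercept, coefs, case), 3))
```

Because the prediction is an exponential, it is always positive, which is one reason this form suits count outcomes better than a linear prediction.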
Poisson Models
One can compare incidence rates with the
“incidence rate ratios”. The incidence
rate ratio for a one-unit change in x_i, with
all other variables in the model held
constant, is $e^{b_i}$.
Poisson Models—an example
-------------------------------------------------------------------
 daysabs |        b         z     P>|z|     e^b   e^bStdX     SDofX
---------+---------------------------------------------------------
  gender | -0.40935    -8.489     0.000  0.6641    0.8147    0.5006
  angnce | -0.01467   -11.342     0.000  0.9854    0.7686   17.9392
-------------------------------------------------------------------
Poisson Models—an example
Being male decreases the expected # of days absent by a
factor of .66; that is, it decreases the expected # of
days absent by 100*(.66-1)% = -34%.
For each one-point increase in the language score, the
expected # of days absent decreases by a factor
of .985 (an expected decrease of
100*(.985-1)% = -1.5%).
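The factor and percent changes can be recomputed from the b and SDofX columns of the table. This is a small sketch; the function names are illustrative, and the variable names are copied from the table as printed.

```python
import math

# Coefficients (b) and standard deviations (SDofX) as reported in the table.
results = {
    "gender": {"b": -0.40935, "sd": 0.5006},
    "angnce": {"b": -0.01467, "sd": 17.9392},
}

def irr(b, delta=1.0):
    """Factor change in the expected count for a change of delta in x."""
    return math.exp(b * delta)

def percent_change(b, delta=1.0):
    """Percent change in the expected count for a change of delta in x."""
    return 100.0 * (irr(b, delta) - 1.0)

for name, r in results.items():
    print(f"{name}: e^b={irr(r['b']):.4f}, "
          f"pct={percent_change(r['b']):.1f}%, "
          f"e^bStdX={irr(r['b'], r['sd']):.4f}")
```

Setting delta to the standard deviation of x gives the e^bStdX column, the factor change for a one-standard-deviation increase.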
Negative Binomial Regression
Often, there is overdispersion, where the variance > mean. In
practice, this usually means one of two things. First,
it’s possible that there is some unobserved variable that
makes some observations have higher counts than others
(e.g., number of publications of professors, or # of RBIs on a
sports team: one can’t assume the mean # is the same across
observations).
Essentially, this is common with pooled data and unobserved
variables, and it will look like heteroskedasticity.
(Example: the school from which one graduates.)
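To see why an unobserved grouping variable produces overdispersion, consider pooling counts from two groups that each follow a Poisson distribution but with different means. The sketch below uses hypothetical rates of 2 and 10 with equal group sizes and computes the pooled mean and variance directly from the mixture probabilities.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def mixture_moments(rates, weights, y_max=150):
    """Mean and variance of counts pooled from several Poisson groups."""
    probs = [
        sum(w * poisson_pmf(y, lam) for lam, w in zip(rates, weights))
        for y in range(y_max + 1)
    ]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Two hypothetical groups with different underlying rates, pooled 50/50.
mean, var = mixture_moments(rates=[2.0, 10.0], weights=[0.5, 0.5])
print(f"pooled mean={mean:.3f}, pooled variance={var:.3f}")
```

Within each group the variance equals the mean, but the pooled variance also picks up the spread between the group means, so it exceeds the pooled mean.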
Negative Binomial Regression
The second possibility is that if you have
one event, it increases or decreases
the probability that you will have
others (e.g., bill sponsorship counts).
Negative Binomial Regression
A negative binomial regression (NBR) analysis
is appropriate in these cases (and if
there is no “overdispersion”, an NBR
will collapse down to a Poisson).
(Note: there are also alternatives, such
as zero-inflated (many, many zeros)
and zero-truncated (no zeros) NBR.)
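The collapse to a Poisson can be illustrated numerically. The sketch below assumes the common "NB2" parameterization, in which the variance is mu + alpha * mu^2 and alpha is the dispersion parameter; the lecture does not say which parameterization it uses, so treat this as one possible choice. The mean of 4 is illustrative.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def negbin_pmf(y, mu, alpha):
    """NB2 pmf with mean mu and variance mu + alpha * mu**2 (alpha > 0)."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)

def largest_gap(mu, alpha, y_max=30):
    """Largest absolute difference between the two pmfs over y = 0..y_max-1."""
    return max(abs(negbin_pmf(y, mu, alpha) - poisson_pmf(y, mu))
               for y in range(y_max))

mu = 4.0  # illustrative mean
for alpha in (1.0, 0.1, 0.001):
    print(f"alpha={alpha}: largest pmf gap from Poisson = {largest_gap(mu, alpha):.6f}")
```

As alpha shrinks toward zero the gap shrinks too, which is the sense in which the negative binomial nests the Poisson.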
Negative Binomial Regression
Zero-inflated models are based
on the assumption that there is an “always
zero” category of cases and a “sometimes
zero” category of cases.
Zero-truncated models: an example would be an
online survey of web usage, where every respondent
necessarily has some usage, so zero counts cannot
be observed.
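Both ideas can be written as simple modifications of a count distribution. For brevity, the sketch below uses a Poisson base rather than the negative binomial named above; the same construction applies to either. The rate and the share of "always zero" cases are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zero_inflated_pmf(y, lam, p_always_zero):
    """Mixture of an 'always zero' group and a Poisson 'sometimes zero' group."""
    base = (1.0 - p_always_zero) * poisson_pmf(y, lam)
    return p_always_zero + base if y == 0 else base

def zero_truncated_pmf(y, lam):
    """Poisson pmf conditional on y >= 1 (zeros cannot be observed)."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

lam, p0 = 2.0, 0.3  # hypothetical values
print("P(0): Poisson", round(poisson_pmf(0, lam), 4),
      "| zero-inflated", round(zero_inflated_pmf(0, lam, p0), 4),
      "| zero-truncated", zero_truncated_pmf(0, lam))
```

The zero-inflated version puts extra probability on zero, while the zero-truncated version removes zero and rescales the remaining probabilities so they still sum to one.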