Interpreting and Extending the Poisson Model

Extending the Poisson Model
I. Poisson Extension #1. Varying Lengths of Observation Periods
A. Concepts Behind the Extension. We can relax the first assumption, which
constrained our intervals to all be of the same length. There are two ways that we can think
of relaxing this assumption. One of them will change the systematic component of our
model, and fits with the conceptualization of event counts as the outcome of a number of
binary trials. The other will change the stochastic component, and fits with the
conceptualization of event counts as a realization of a random rate of occurrence. Both will
ask us to supply additional information giving each interval's length, and both have the same
log likelihood function. Which concept you pick, and thus what sort of information you
supply, will depend on how you think of your count variable.
B. Incorporating the Number of Possible Events into the Systematic
Component. Suppose you think of your event count as the outcome of a series of binary
trials. King’s example is the count of US House members who switch parties in any given
session. The length of Congressional sessions has stayed (approximately) constant over US history,
but the size of the House has grown. This produces more potential party switchers, and we
can generally denote the number of trials in a time interval as Ni. It is alright that N is finite,
in this case, because the chances are extremely small that the number of events in any
observed interval would approach this upper bound (i.e., that all members of Congress would
switch their party). But Ni should still influence the rate of occurrence in any interval, λi, and
thus can become part of the systematic component of our model. We can now rewrite this
component in one of two equivalent ways:
\[
E(Y_i)/N_i = \exp(x_i\beta)
\quad\text{or}\quad
E(Y_i) = \exp(x_i\beta + \ln N_i)
\]
Substituting this new parameterization into our joint density function, we end up with the
following new log-likelihood function:
\[
\ln L(\tilde{\beta} \mid y) = \sum_{i=1}^{n} \left\{ -N_i \exp(x_i\tilde{\beta}) + y_i (x_i\tilde{\beta}) \right\}
\]
C. Incorporating the Length of the Interval into the Stochastic Component.
Alternatively, if you think of your event count as a realization of a random rate of occurrence
over intervals that differ in their lengths, you can model this in the stochastic component.
You may be counting the number of wars that countries such as Britain, the United States,
Botswana, and Uzbekistan have fought. You would want to incorporate the length of time,
ti, for which each country has existed into your stochastic component and rewrite it, just as we
truncated the normal distribution when we had truncated data. The new stochastic
component and new log likelihood are given by:
\[
\Pr(Y \mid \lambda, t) = \prod_{i=1}^{n} \frac{e^{-\lambda_i t_i}\,(\lambda_i t_i)^{y_i}}{y_i!}
\]
\[
\ln L(\tilde{\beta} \mid y) = \sum_{i=1}^{n} \left\{ -t_i \exp(x_i\tilde{\beta}) + y_i (x_i\tilde{\beta}) \right\}
\]
D. Estimating this Model in Stata. But what if I note that some of the more
professional legislatures have more time to introduce bills, and thus that the intervals here
vary in length? I can incorporate my information about session lengths into this model
through the “exposure” option in Stata, which allows me to input a variable, totalday,
which tells Stata that some legislatures were exposed to longer sessions than others.
poisson introreg salary stafflo senhear, exposure(totalday)
Iteration 0:   log likelihood = -62897.255
Iteration 1:   log likelihood = -62822.792
Iteration 2:   log likelihood = -62822.727
Iteration 3:   log likelihood = -62822.727

Poisson regression                              Number of obs  =         50
                                                LR chi2(3)     =    6650.69
                                                Prob > chi2    =     0.0000
Log likelihood = -62822.727                     Pseudo R2      =     0.0503

------------------------------------------------------------------------------
    introreg |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      salary |   1.08e-06   1.34e-07     8.07   0.000     8.17e-07    1.34e-06
     stafflo |   .3192996   .0072353    44.13   0.000     .3051187    .3334806
     senhear |  -.1994782   .0059506   -33.52   0.000    -.2111412   -.1878152
       _cons |   3.071918   .0042217   727.66   0.000     3.063644    3.080192
    totalday |  (exposure)
------------------------------------------------------------------------------
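As the second parameterization above suggests, exposure(totalday) is shorthand for entering ln(totalday) in the model with its coefficient constrained to 1. Here is a sketch of the equivalent specification using Stata's offset() option, where lntotal is simply a name chosen here for the constructed variable:

* equivalent model: enter ln(totalday) with its coefficient fixed at 1
generate lntotal = ln(totalday)
poisson introreg salary stafflo senhear, offset(lntotal)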
II. Poisson Extension #2. When Events are Correlated
A. Relaxing the Independence Assumption. In many situations, you may want to
relax the assumption that one event occurring in an interval has no effect on the number of
subsequent events in that interval. Consider a model that looks at the number of ethnic
violence killings in one country over a number of years. Features of the social system such
as unemployment and police deployments may drive the rate of killings, but when one killing
occurs, it could set off a cycle of reprisals. This is an example of positive “contagion,”
where one killing can lead to more killings over the rest of the period than we would expect,
given the social system. An example of negative contagion might be the occurrence of a
war, which then dissipates its combatants’ appetites for war over the rest of the decade.
B. Contagion Changes the Variance of Event Counts. We can think of positive
contagion either as involving correlation rather than independence across events, or as
events that change the rate of occurrence λi over an interval. In either case, the variable Yi is
now “overdispersed”: its variance is larger than one would expect under the independence
assumption. (If we have negative contagion, Yi becomes underdispersed.) The way to
model this is by relaxing an assumption, built into the Poisson distribution, that the dispersion
parameter σ2, which is the ratio of the variance of Yi to its mean, is equal to 1. If we do not
want to assume that our events are independent, and thus that σ2 = 1, we can select as our
stochastic component a “super-distribution” that
King proposes. He calls this the “generalized event count” distribution. It allows σ2 to vary,
and uses three “nested” distributions depending on the value of σ2:
i. if 0 < σ2 < 1, Yi is distributed “continuous parameter binomial”
ii. if σ2 = 1, Yi is distributed Poisson
iii. if σ2 > 1, Yi is distributed “negative binomial”
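Loosely put, and as a sketch of King's parameterization, σ2 is the ratio of the variance of Yi to its mean, so the mean function is the same in all three cases while the variance scales (at least approximately) with σ2:

\[
E(Y_i) = \lambda_i = \exp(x_i\beta), \qquad \mathrm{Var}(Y_i) \approx \sigma^2 \lambda_i
\]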
C. Does this Complicated Function Make My Life Easier? Yes. First, it’s very
easy to estimate this model using Stata. Maximizing the likelihood will produce estimates of
σ2 in addition to the estimates of β. If it happens that your events are independent, σ2 will
not be significantly different from 1, and your β will be about the same as if you had simply
estimated a Poisson model. If it turns out that there seems to be over- or underdispersion, your
new estimates will tell your substantive story much better. Why? In this case, if you had
simply run a Poisson model, your β estimates would have been consistent, but inefficient,
and your estimates of their standard errors would be biased. King’s empirical example on
page 129 shows how much your substantive conclusions can be improved by the efficiency
gains of estimating a generalized event count model that lets σ2 vary and thus provides a more
realistic model of the data generating process.
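If King's generalized event count estimator is not available in your copy of Stata, a rough, one-sided check is to fit the negative binomial branch (the σ2 > 1 case) with nbreg, which reports a likelihood-ratio test of whether its dispersion parameter alpha equals zero, i.e., whether the model collapses back to the Poisson:

* checks only for overdispersion (the negative binomial branch),
* reusing the legislative variables from the example above
nbreg introreg salary stafflo senhear, exposure(totalday)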