Maximum Likelihood in Concept and Practice

(cribbed mostly from Gary King’s Unifying Political Methodology)
Monday, March 29: The Goals and Foundations of Maximum Likelihood
I. Definitions and Notation
II. The Linear Model in a General Form
III. Why Inverse Probability Doesn’t Work
IV. Why the Likelihood Model of Inference Works
I. Definitions and Notation
A. Developed by statistician R.A. Fisher in the 1920s, borrowed first by economists,
and finally imported into political science, maximum likelihood provides a fundamental
rationale for using appropriate estimators and gives us lots of flexibility to match our
statistical models to the process that we think generated our data. There are a number of
overarching approaches designed to unify statistical methodologies, but this is by far the
most familiar to political scientists. It requires you to make explicit choices about how you
think your dependent variable is distributed (the model’s stochastic component) and what
the relationship is between your independent and dependent variables (the model’s
systematic component). Then, based on some basic rules of probability, it teaches you how
to write out a likelihood function and find the set of parameters most likely to have
generated observed data, given your assumed model. Before we learn the step-by-step
process of getting maximum likelihood estimates, we need to learn a bit of notation.
B. Let Yi be a “random variable.” It is random in that there is stochastic variation in
it across many experiments for a single observation, and a variable in that it varies across
observations in a single experiment. Let yi be one draw from the random variable. Let xi be
one draw (consisting of one or more explanatory factors) from the social system X.
C. Hypothesize some model M about how the social system produces the random
variable. We can partition this model into M*, the part of the model that we will assume,
and θ, the part of the model composed of parameters that we will estimate. A fully
“restrictive” model has all of its assumptions specified (it is all M* and no θ); it is the most
parsimonious model and omits all of the variables. An “unrestrictive” model estimates
everything, is all θ and no M*, and is more interesting but demands more from the data. In
hypothesis testing, we will often compare a fairly unrestrictive model to a slightly more
restrictive model.
II. The Linear Model in a General Form
A. You should be familiar with this way of writing out an OLS regression:
Yi = xiβ + εi
where xiβ is the systematic component and εi is the stochastic (or random) component.
King refers to it as the linear normal regression model, breaking it down into its
linear systematic component and its normally distributed stochastic component. The
stochastic component’s distribution is given by εi ~ fn(ei|0, σ2). This should be read as “the
errors are distributed normally with a mean of zero and a variance of σ2,” which is elsewhere
written as εi ~ N(0, σ2).
B. A more general way of writing out an identical linear normal model is:
Yi ~ fn(yi|μi, σ2) where μi = xiβ
Note that this expression models the randomness in Yi directly, rather than through
εi. Econometrics textbook writers like Goldberger show that this assumption of normality in
the distribution of Yi around its expected value is equivalent to assuming that the errors are
normally distributed around zero. King uses this style of presentation because the maximum
likelihood process requires you to make a substantive assumption about how the data are
generated (and thus distributed), and thinking about the dependent variable itself is usually
more natural than thinking about its errors.
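Since the two formulations are equivalent, a quick simulation can make that concrete. The following Python sketch is my own illustration, not from King's text; the exact match relies on reseeding the generator, since CPython computes random.gauss(mu, sigma) as mu plus a scaled standard-normal draw.

```python
import random

random.seed(42)
beta, sigma = 2.0, 1.5
x = [i / 100 for i in range(1000)]

# Formulation 1: y_i = x_i*beta + eps_i, with eps_i ~ N(0, sigma^2)
y1 = [xi * beta + random.gauss(0.0, sigma) for xi in x]

random.seed(42)
# Formulation 2: y_i ~ N(mu_i, sigma^2), with mu_i = x_i*beta
y2 = [random.gauss(xi * beta, sigma) for xi in x]

# Same seed, same draws: the two ways of writing the model coincide
print(max(abs(a - b) for a, b in zip(y1, y2)))
```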
i. The systematic component in the expression above is a statement of how μi (θi in the general notation) varies
over observations as a function of a vector of explanatory variables. It says that xi and Yi are
“parametrically related” through E(Yi) = μi = xiβ. It can be written in a general functional
form as θ=g(X,β).
ii. The stochastic component should not be viewed merely as an annoyance, but as
an expression that contains substantive information. It can be written generally as
Yi ~ fi(yi|θi,αi) where θ is the vector of parameters of interest, like μi in the linear case, and α
is the vector of ancillary parameters, like σ2 in the linear case.
III. Why Inverse Probability Doesn’t Work
A. Wouldn’t it be great if we could determine the absolute probability of some
parameter vector θ, given our data y and model M*? If we could do that, we could conduct
a poll, assume that the variables that we didn’t measure are irrelevant, and then make
statements like, “There is a 0.8237 probability that the effect of getting a PhD on your
expected annual income is -$32,689.” This would be an inverse probability statement, and
for a while was the holy grail of statistics. It can be formalized as Pr(θ|y, M*), though
because M* is assumed, it is usually suppressed and an inverse probability is written as
Pr(θ|y).
B. Using some basic rules of probability, we can see what we would need to calculate
in order to calculate an inverse probability:
Pr(θ|y) = Pr(θ,y) / Pr(y)
by the rule that Pr(a|b)=Pr(a,b)/Pr(b)
Pr(θ|y) = Pr(θ)Pr(y|θ) / Pr(y)
by substituting Pr(θ,y)=Pr(y,θ)=Pr(y|θ)Pr(θ)
This is “Bayes Theorem,” and statisticians thought it would give them a way to
calculate an inverse probability. It is possible to calculate Pr(y|θ), which is the probability of
observing your data given a hypothesized parameter vector, referred to as the
“traditional probability.” We can put Pr(y) in terms of Pr(y|θ) and Pr(θ), but this leaves us
with the tricky Pr(θ), one’s prior belief about θ. There is a raging debate in statistics between
the “frequentists,” who define probability as the relative frequency of an event in
hypothetical repetitions of an experiment, and those who say probability is only the
subjective judgment of individuals. But no matter which camp you come from, you cannot
use Pr(θ) to assign an absolute value to the inverse probability.
IV. Why the Likelihood Model of Inference Works
A. Without a method for calculating absolute inverse probabilities, we must be
content with relative measures of uncertainty. This is what the likelihood model of inference
gives us. Now we will let θ̃ be hypothetical values of the single, unobserved true value θ and
let θ̂ be the point estimator for it. We can write out the “Likelihood Axiom”:
L(θ̃|y, M*) ≡ L(θ̃|y)
L(θ̃|y) = k(y) Pr(y|θ̃)
L(θ̃|y) ∝ Pr(y|θ̃)
In this axiom, k(y) can be treated as a constant, because it is an unknown function of
the data, which makes the likelihood of the true parameter given the data only proportional
to the traditional probability. For a given set of observed data, k(y) is the same over many
possible values of θ̃. It varies, though, with different datasets, and this is what makes
likelihood statements relative (relative to the likelihoods of other parameters given the same
dataset). L(θ̃|y) is the likelihood of a hypothetical model having generated the data,
assuming M*. It can take on any value, and you can only compare likelihoods for the same
dataset.
B. A “likelihood function” summarizes θ, allowing us to plot values of θ̃ by the
likelihood of each, given the data. “Maximum likelihood estimation” is a theory of point
estimation that finds the maximum value of the likelihood function.
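To see what “finding the maximum of the likelihood function” means in practice, here is a minimal Python sketch (an illustration with simulated data, not from the text) that evaluates the normal log-likelihood of the mean over a grid of hypothetical values μ̃ and picks the largest:

```python
import math, random

random.seed(0)
sigma = 1.0
y = [random.gauss(5.0, sigma) for _ in range(500)]

def log_likelihood(mu):
    # ln L(mu~ | y): sum of normal log-densities, k(y) ignored
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (yi - mu) ** 2 / (2 * sigma ** 2) for yi in y)

# Evaluate over a grid of hypothetical values mu~ from 4.0 to 6.0
grid = [4.0 + 0.001 * k for k in range(2001)]
mle = max(grid, key=log_likelihood)
print(mle)   # lands next to the sample mean, the analytical MLE
```

The grid maximum sits on top of the sample mean, which is the analytical maximum likelihood estimate of a normal mean.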
Wednesday, March 31st. Constructing a Likelihood Function
V. A Survey of Stochastic Components
VI. The Likelihood Function for a Normal Variable
VII. Summarizing a Likelihood Function
V. A Survey of Stochastic Components
A. Approach and Notation. King’s Chapter 3 looks at many possible forms that the
stochastic component of one observation yi of the random variable Yi could take, while
Chapter 4 introduces the systematic component of yi as it varies over N observations. Let S
be the sample space, the set of all possible events, and let zki be one event. This event is a set
of outcomes of type k. Let yji be a real number representing one possible outcome of an
experiment.
B. Useful Axioms of Probability. These tell us what the univariate probability
distributions that we will survey look like in general.
i. For any event zki, Pr(zki) ≥ 0.
ii. Pr(S) = 1.0
iii. If z1i, …, zki are k mutually exclusive events, then Pr(z1i ∪ z2i ∪ … ∪ zki) = Pr(z1i) + Pr(z2i) + … + Pr(zki)
C. Results for stochastically independent random variables. These will be most
useful when we construct likelihood functions.
i. Pr(Y1i, Y2i) = Pr(Y1i)Pr(Y2i)
ii. Pr(Y1i|Y2i) = Pr(Y1i)
D. What is a Univariate Probability Distribution? It is a complete accounting of the
Pr that Yi takes on any particular value yi. For discrete outcomes, you can write
out (and graph) a probability mass function (pmf). For continuous outcomes,
you can write out or graph a probability density function (pdf). Each function is
derived from substantive assumptions about the underlying “data generating
process.” You can develop your own distribution, or select one of the many
off-the-shelf functions that King surveys, such as:
i. The Bernoulli Distribution. This is used when a variable can only take on two mutually exclusive and exhaustive outcomes, such as a two-party election where either one party wins or the other does. The algebraic representation of the pmf incorporates a systematic component that measures how fair the die was that determines which outcome takes place.
ii. The Binomial Distribution. The process that generates this sort of data is a set of many Bernoulli random variables, and we observe their sum. It could be how many heads you get in six flips of a coin, or how many times a person voted over six elections. It requires the assumption that the Bernoulli trials are “i.i.d.,” or independent and identically distributed. This means that the coin (or person) has no memory, and that the probability of getting heads (or voting) is the same in each trial.
iii. Extended Beta Binomial. This relaxes the i.i.d. assumption of the Binomial Distribution, and could be useful for looking at yes or no votes cast by 100 Senators.
iv. Poisson. For a count with no upper bound, when the occurrence of one event has no influence on the expected number of subsequent events.
v. Negative Binomial. Just like the Poisson, but the rate of occurrence of events varies according to the gamma distribution.
vi. Normal Distribution. This is a continuous variable, where the disturbance term is the sum of a large number of independent but unobserved factors. A possible substantive example is presidential approval over time. The random variable is symmetric, and has events with nonzero probability occurring everywhere. This means that a normal distribution cannot generate a random variable that is discrete, skewed, or bounded. Its pdf can be written out as:
fN(yi|μ, σ²) = (2πσ²)^(-1/2) exp[-(yi - μ)²/(2σ²)]
where π ≈ 3.14159 and exp[a] = e^a.
vii. Log-Normal Distribution. It is like the Normal Distribution, but with no negative values.
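The Bernoulli-to-Binomial data generating process described above is easy to simulate. This short Python sketch (illustrative values of my own, not from the text) builds a Binomial draw as the sum of i.i.d. Bernoulli trials:

```python
import random

random.seed(1)

def bernoulli(pi):
    # One trial: returns 1 with probability pi, else 0
    return 1 if random.random() < pi else 0

def binomial(n, pi):
    # A Binomial draw: the observed sum of n i.i.d. Bernoulli trials
    return sum(bernoulli(pi) for _ in range(n))

# Six fair coin flips, repeated 10,000 times
draws = [binomial(6, 0.5) for _ in range(10000)]
print(sum(draws) / len(draws))   # should be close to n*pi = 3
```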
VI. The Likelihood Function for a Normal Variable
A. Begin by writing out the stochastic component in its functional form (the pdf that
returns the probability of getting yi in any single observation, given μi). This is the traditional
probability, and it is proportional to the likelihood.
fN(yi|μ, σ²) = (2πσ²)^(-1/2) exp[-(yi - μ)²/(2σ²)]
B. If we can assume that there is stochastic independence across our observations
(no autocorrelation), we can use the Pr(Y1i, Y2i) = Pr(Y1i)Pr(Y2i) rule to build a joint
probability distribution for our observations:
f(y|μ) = Π fN(yi|μi)
f(y|μ) = Π (2πσ²)^(-1/2) exp[-(yi - μi)²/(2σ²)]
C. In the next step, called “reparameterization,” we substitute in a systematic
component for our generic parameter. In this case, we substitute a linear function.
f(y|β) = Π (2πσ²)^(-1/2) exp[-(yi - xiβ)²/(2σ²)]
D. Now we take this traditional probability, which returns an absolute probability,
and use the likelihood axiom to get something that is proportional to the inverse probability.
We also want to work with an expression that is mathematically tractable, and since any
monotonic function of the traditional probability can serve as a relative measure of
likelihood, for convenience we will take the natural log of this function. We can also use the
“Fisher-Neyman Factorization Lemma,” which proves that in a likelihood function, we can
drop every term not depending on the parameters, to get rid of k(y). Finally, we are also
going to use algebraic tricks like ln(abc)=ln(a)+ln(b)+ln(c) and ln(ab)=bln(a).
L(β̃, σ̃²|y) = k(y) Pr(y|β̃, σ̃²)
L(β̃, σ̃²|y) = k(y) Π(i=1 to n) fN(yi|xiβ̃, σ̃²)
L(β̃, σ̃²|y) ∝ Π(i=1 to n) fN(yi|xiβ̃, σ̃²)
L(β̃, σ̃²|y) ∝ Π(i=1 to n) (2πσ̃²)^(-1/2) exp[-(yi - xiβ̃)²/(2σ̃²)]
ln L(β̃, σ̃²|y) = Σ(i=1 to n) ln{(2πσ̃²)^(-1/2) exp[-(yi - xiβ̃)²/(2σ̃²)]}
ln L(β̃, σ̃²|y) = Σ(i=1 to n) {-(1/2)ln(2πσ̃²) - (yi - xiβ̃)²/(2σ̃²)}
ln L(β̃, σ̃²|y) = Σ(i=1 to n) {-(1/2)ln(2π) - (1/2)ln(σ̃²) - (1/(2σ̃²))(yi - xiβ̃)²}
ln L(β̃, σ̃²|y) ∝ Σ(i=1 to n) {-(1/2)ln(σ̃²) - (1/(2σ̃²))(yi - xiβ̃)²}
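The Fisher-Neyman factorization step, dropping every term that does not depend on the parameters, can be verified numerically. This Python sketch (toy data of my own, not King's) shows that the full and reduced log-likelihoods differ only by a constant, so they rank parameter values identically:

```python
import math

# Toy data: y_i roughly 2*x_i (illustrative, not from the text)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

def lnL(beta, sig2):
    # Final line of the derivation: the (1/2)ln(2*pi) constant dropped
    return sum(-0.5 * math.log(sig2) - (yi - xi * beta) ** 2 / (2 * sig2)
               for xi, yi in zip(x, y))

def lnL_full(beta, sig2):
    # Same expression, keeping the constant term for each observation
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * math.log(sig2)
               - (yi - xi * beta) ** 2 / (2 * sig2)
               for xi, yi in zip(x, y))

# The gap between the two is the same at any parameter values, so
# dropping the constant never changes which parameters look best
diff1 = lnL_full(2.0, 1.0) - lnL(2.0, 1.0)
diff2 = lnL_full(1.5, 2.0) - lnL(1.5, 2.0)
print(diff1, diff2)
```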
VII. Summarizing a Likelihood Function
A. This log likelihood function is an expression representing a function that could be
graphed, but you would need as many dimensions as: the likelihood value + constant term +
# of independent variables + # of ancillary parameters like σ2 (we have assumed
homoskedasticity in this model, making σ² constant, but we could have chosen to model its
variation across observations). So instead of using all of the information in the function, we
will summarize the function by finding its maximum, the value of β that gives us the greatest
likelihood of having generated the data.
B. Analytical Method. For relatively simple likelihood functions, we can find a
maximum by going through the following four steps.
i. Take the derivative of the log-likelihood with respect to your parameter
vector. The reason that we took the log of the likelihood function is mainly because taking
the derivative of a sum is much easier than taking the derivative of a product.
ii. Set the derivative equal to zero.
iii. Solve for your parameter, thus finding an extreme point.
iv. Find out whether this extreme point is a maximum or a minimum by taking the
second derivative of the log-likelihood function. If it is negative, the function bows
downward before and after the extreme point and you have a (possibly local) maximum.
For our linear model, the analytical solution for the variance parameter is:
σ̂² = (1/n) Σ(yi - xiβ̂)², which should look familiar! You are trying to minimize
the squared error here, and thus OLS can be justified by maximum likelihood as well as by
the fact that it is a convenient way to summarize a relationship and that it has all of the
properties that are desirable in an estimator.
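For the one-variable, no-intercept case, the analytical solution can be computed directly. This sketch uses the standard closed form β̂ = Σxᵢyᵢ / Σxᵢ² (toy data, my own illustration, not from the text):

```python
# Toy data for a one-variable, no-intercept linear model (made-up numbers)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(y)

# Closed-form solution: beta_hat = sum(x*y) / sum(x^2)
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# ML variance estimate: the mean squared residual, as in the formula above
resid = [yi - xi * beta_hat for xi, yi in zip(x, y)]
sigma2_hat = sum(e * e for e in resid) / n
print(beta_hat, sigma2_hat)
```

Setting the derivative of the log-likelihood to zero is equivalent to making the residuals orthogonal to x, which is exactly the OLS first-order condition.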
C. Numerical/Computational Methods. This is what Stata does, because some
likelihood functions do not have an analytical solution. You can write out a likelihood
function and then begin with a starting value (or vector of values) for the parameter of
interest. Then you can use an algorithm (a recipe for a repeated process) to try out better
and better combinations of parameter values until you maximize the likelihood. The
Newton-Raphson and Gauss-Newton algorithms are common ones, and they use linear
algebra to take derivatives of matrices. Basically, they start with a parameter vector, get its
likelihood, then look at the gradient of the likelihood function to see which direction they
should move in order to get a higher likelihood value, and keep going until they can’t move
in either direction and get a higher likelihood.
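A minimal Newton-Raphson iteration of the kind those algorithms use can be written in a few lines. This sketch is my own, for the simplest possible case (estimating a normal mean with known variance, made-up data); it steps along the gradient until the gradient vanishes:

```python
# Newton-Raphson for the simplest case: a normal mean with known variance
y = [4.7, 5.1, 4.9, 5.3, 5.0, 4.8]
sigma2 = 1.0

def gradient(mu):
    # d lnL / d mu = sum of (y_i - mu) / sigma^2
    return sum((yi - mu) / sigma2 for yi in y)

def hessian(mu):
    # Second derivative: -n / sigma^2 (always negative, so a maximum)
    return -len(y) / sigma2

mu = 0.0                       # arbitrary starting value
for _ in range(50):
    step = gradient(mu) / hessian(mu)
    mu -= step                 # move uphill on the log-likelihood
    if abs(step) < 1e-10:
        break
print(mu)                      # converges to the sample mean
```

Because this log-likelihood is quadratic in μ, Newton-Raphson lands on the maximum in a single step; for harder likelihoods the same loop simply takes more iterations.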
Monday, April 5. What Good is a Likelihood Function?
VIII. Properties of Maximum Likelihood Estimators
IX. Likelihood Ratio Test
X. Interpreting Functional Forms (The Hard Way)
XI. Interpreting Functional Forms (The Easy Way)
VIII. Properties of Maximum Likelihood Estimators
A. Gary King prefers justifying maximum likelihood based on its deep philosophical
justifications, but statisticians have traditionally justified estimators based on their properties
or criteria. Here are some of the basic properties, (and note that maximum likelihood
estimators don’t always possess all of them):
B. Finite Sample Properties:
i. Invariance to Reparameterization. You can “trick” maximum likelihood by estimating β, and then taking the natural log of β in order to estimate ln(β).
ii. Invariance to Sampling (Size) Plans. ML estimators don’t depend on the sampling size rule, so it’s OK if you run out of dissertation funding and have to collect a smaller dataset.
iii. Minimum Variance Unbiasedness. If a minimum variance unbiased estimator exists, then ML picks it. You may have seen a proof in an earlier class that the least squares estimator is a MVUE, and you should be comforted by the fact that ML picked it, given assumptions of linearity and normally distributed errors.
C. Asymptotic Properties: (these look at the properties of more and more θ̂s
estimated from datasets with larger and larger ns)
i. Consistency. As n goes to infinity, the sampling distribution of an estimator θ̂n converges to a spike over the true θ. ML estimators can violate consistency when you want to estimate as many parameters as you have cases, but in these rare instances, no other estimator is consistent.
ii. Asymptotic Normality. For a very large n, the sampling distribution of θ̂n is normal.
iii. Asymptotic Efficiency. An ML estimator has a smaller asymptotic variance than any consistent and uniformly Normal estimator.
IX. Likelihood Ratio Test
A. Since maximum likelihood is a relative concept, we are going to have to compare
hypotheses about the same data in order to judge the precision of estimates. Specifically, we
can compare an unrestrictive model to a more restrictive model representing the null
hypothesis that some parameter is fixed at zero (meaning that the variable is omitted). We
can do this in three ways:
i. Wald’s test corresponds to using standard errors of coefficients. (Greene,
page 486)
ii. A Rao’s Score/Lagrange Multiplier test uses only the null model. (Greene,
page 489)
iii. A Likelihood Ratio Test compares both models’ likelihoods. (Greene,
page 484)
B. “Likelihood Ratios.” In the same dataset, these allow us to compare the
likelihoods of two hypothetical values of the parameters in the same units as the
corresponding traditional probabilities, using simple math. This allows us to conduct
hypothesis tests. The likelihood ratio is:
L(θ̃1|y) / L(θ̃2|y) = [k(y) Pr(y|θ̃1)] / [k(y) Pr(y|θ̃2)]
L(θ̃1|y) / L(θ̃2|y) = Pr(y|θ̃1) / Pr(y|θ̃2)
C. The Likelihood Ratio Test. Let L* be the maximum of the likelihood function of
the unrestrictive model representing the alternative hypothesis. Let L*R be the maximum of
the likelihood function of the restrictive model representing the null hypothesis. We know
that L* is greater than or equal to L*R, because using an additional explanatory variable
cannot hurt your model. The question is (as in any hypothesis test), how do we know that
the improvement in the likelihood that we get by adding this variable is sufficiently large that
it is not due to chance alone? We rely on a result from distribution theory:
Likelihood Ratio R = -2 ln(L*R / L*)
R = 2(ln(L*) - ln(L*R)), and this is distributed chi-square with df = m
This likelihood ratio is distributed according to the chi-square distribution with m degrees of
freedom, where m is the number of restrictions (parameters fixed under the null). The expected value of R is m, so if you get
an R that is substantially bigger than m, you should probably reject the null and
adopt the unrestricted model. You can look at a chi-square table with m degrees of freedom
to find the probability of getting a given R by chance alone, assuming that the null
hypothesis is true. [Note that this is a traditional probability statement, and we are able to
assign an absolute probability here].
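The arithmetic of the test is simple enough to script. In this Python sketch the two maximized log-likelihoods are hypothetical numbers chosen for illustration; for m = 1 restriction the chi-square(1) tail probability has the closed form erfc(√(R/2)), so no statistics library is needed:

```python
import math

# Hypothetical maximized log-likelihoods (illustrative numbers only)
lnL_restricted = -734.2     # null model: one parameter fixed at zero
lnL_unrestricted = -728.9   # alternative model: parameter estimated

R = 2 * (lnL_unrestricted - lnL_restricted)

# With m = 1 restriction, the chi-square(1) survival function is
# erfc(sqrt(R/2)), so the p-value needs only the math module
p_value = math.erfc(math.sqrt(R / 2))
print(R, p_value)
```

Here R = 10.6 is far above its expected value of 1 under the null, and the p-value falls well below conventional thresholds, so we would reject the restricted model.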
X. Interpreting Functional Forms (The Hard Way)
A. Here’s the axe that Gary King has so profitably ground: “If β has no substantive
interpretation for a particular model, then that model should not be estimated” (p. 102). He
is reacting to the previously common practice of reporting coefficients from ML models like
probit and logit, which are not obviously intuitive, discussing their sign and significance, and
then throwing up your hands at how to make sense of the coefficient’s point value. Of
course, he really isn’t advising anyone not to estimate any models. And his solution to this
problem is not to run OLS, with its easy-to-interpret coefficients, but to run the ML model
that fits your data generating assumptions, and then do some work in order to interpret
them. In the bad old days before CLARIFY, you had to get out your calculator to do this.
We are going to apply the logit function to discrete, stochastically independent outcomes
and learn how to make meaningful statements about ML coefficients.
B. Suppose you are trying to explain a phenomenon that can only result in two
outcomes. Rather than selecting a continuous univariate probability distribution (like the
Normal) to model its stochastic component, you’d want to select a discrete distribution with
two outcomes like the Bernoulli distribution. Remember that if yi=0,1, the probability of the
outcome given some parameter πi is given by:
πi^yi (1 - πi)^(1-yi)
and takes on a value of zero elsewhere. Note that πi is really just the probability that your
variable takes on the value of one, and thus πi must be some number between zero and one.
We also need to supply a systematic component here, something that tells us how πi, the
chances of a particular outcome, varies across observations. If we used a linear systematic
component, it would return values of π that were potentially less than zero or greater than
one. We also might want a systematic component that represents a much larger marginal
effect of an explanatory variable when it varies in the middle of its range, but smaller effects
at the bottom and top ends of its range. A systematic component that fits this substantive
story is the logit functional form, substantively similar to probit but more tractable
mathematically.
πi =1/[1+exp(-xi β)]
If we can assume that all of our observations are generated by independent Bernoulli
processes, we write out a joint distribution, and (after omitted algebraic steps) turn it into a
log likelihood function.
Pr(Y|π) = Π(i=1 to n) πi^yi (1 - πi)^(1-yi)
ln L(β̃|y) = Σ(i=1 to n) {-yi ln[1 + exp(-xiβ̃)] - (1 - yi) ln[1 + exp(xiβ̃)]}
C. Interpreting coefficients. One reason this is not as simple as in the linear case is
because the effect of an explanatory variable can depend on its level and upon the level of
other explanatory variables. Because effects vary in the case of logit, to isolate the impact of
one variable, we have to hold the other variables constant at some value. Holding variables
constant at their means is one intuitive choice. If some of these explanatory variables are
categorical or dichotomous, you might want to hold them constant at their median or modal
values. Once you do this, you can report the effects of changes in your key explanatory
variables in one of three ways:
i. Graphical Methods. You can graph the predicted probability of one
outcome, π, by different values Xji of variable Xj by using the following formula
(where X* is the vector of all other k-1 explanatory variables):
π̂ = 1 / (1 + exp[-X*β̂* - Xjiβ̂j])
ii. Fitted Values. You can plug in just a few key values of the key explanatory
variable, run them through the equation above, and report the predicted probability
of one outcome.
iii. First Differences. This tells you how much a parameter like π changes
due to a change in your key explanatory variable. To compute it, you subtract the
fitted value at point Xja from the fitted value at point Xjb:
First Difference = 1 / (1 + exp[-X*β̂* - Xjbβ̂j]) - 1 / (1 + exp[-X*β̂* - Xjaβ̂j])
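A first difference is just two fitted values subtracted. This Python sketch uses hypothetical coefficient values of my own (Xb_star standing in for the contribution of the other variables held at their means):

```python
import math

# Hypothetical values: Xb_star is the other variables' contribution
# (held at their means); b_j is the key variable's coefficient
Xb_star = -0.5
b_j = 0.8

def predicted_prob(xj):
    # Fitted value from the logit functional form
    return 1.0 / (1.0 + math.exp(-(Xb_star + xj * b_j)))

# First difference: the shift in Pr(y=1) as X_j moves from 1.0 to 2.0
fd = predicted_prob(2.0) - predicted_prob(1.0)
print(round(fd, 4))
```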
XI. Interpreting Functional Forms (The Easy Way)
A. All it takes is a set of coefficients, descriptive measures of your variables, and a
calculator to find these first differences. But we are lazy political scientists, and this laziness
led many researchers to stop after getting the coefficients. So 11 years after writing UPM,
Gary King, Michael Tomz, and Jason Wittenberg wrote a very useful program called
CLARIFY that works inside of Stata to calculate things like first differences for us. You can
go to http://GKing.harvard.edu, watch Gary’s face get assembled, and download it. Be sure
to get the documentation as well. Clarify estimates an ML model and then simulates 1000
vectors of parameters (rather than basing its confidence intervals on the standard errors of
coefficients).
i. In order to do this, just type estsimp at the beginning of the Stata
command line that you would normally enter to run an ML model
(e.g., estsimp logit y x1 x2).
ii. The next step asks you to use the setx command to hold the
explanatory variables constant at some value (e.g., setx mean).
iii. Finally, use the simqi command to simulate some quantity of
interest, conditional on how you have set the explanatory variables.
For instance, simqi fd(predval(1)) changex(x1 67 263) asks it to
simulate the change in the probability that Y=1 brought by an
increase in variable x1 from a value of 67 to a value of 263.
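The simulation logic behind CLARIFY can be sketched in a few lines: draw many parameter values from the estimated sampling distribution, compute the quantity of interest for each draw, and summarize the draws. The coefficient, standard error, and constant below are hypothetical stand-ins, not output from the models above:

```python
import math, random

random.seed(7)

# Hypothetical logit estimates: coefficient and standard error for the
# key variable; `const` stands in for all other variables at their means
b_hat, se = 0.8, 0.2
const = -0.5

def prob(b, xj):
    # Predicted Pr(y = 1) under the logit functional form
    return 1.0 / (1.0 + math.exp(-(const + xj * b)))

# Draw 1000 parameter values, compute the first difference for each
fds = []
for _ in range(1000):
    b = random.gauss(b_hat, se)          # one simulated parameter draw
    fds.append(prob(b, 2.0) - prob(b, 1.0))
fds.sort()

mean_fd = sum(fds) / len(fds)
lo, hi = fds[25], fds[974]               # approximate 95% interval
print(mean_fd, lo, hi)
```

This mirrors the Mean and [95% Conf. Interval] columns that simqi reports, except that CLARIFY draws whole parameter vectors from a multivariate normal rather than a single coefficient.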
We will go over how to use CLARIFY in a lab, but here is an example from some of
my research of what Stata/CLARIFY output and my notes on it look like, as well as a table
that presents this information. Stata commands are in bold.
estsimp mlogit exit adsalary totalday staffper ptedge ptloss turn1n
leaddem caucus house size appoint money k6
setx mean
For each outcome, I move continuous variables from one standard deviation below
their mean to one above, while holding other variables constant at their mean,
and predict the effect. For dichotomous variables, I simulate the effects of
moving from zero to one.
Effects on the Chances of Losing Power Even Though Party Retains Control
Salaries:
$2307 North Dakota, 1992
$24374 Illinois, 1992
simqi fd(prval(1)) changex(adsalary 2307 24374)
First Difference: adsalary 2307 24374
Quantity of Interest |     Mean      Std. Err.   [95% Conf. Interval]
---------------------+-----------------------------------------------
dPr(exit2 = 2)       |  -.001879    .0033777    -.0090399    .0043757
Session Lengths:
simqi fd(prval(1)) changex(totalday 67 263)
First Difference: totalday 67 263
Quantity of Interest |     Mean      Std. Err.   [95% Conf. Interval]
---------------------+-----------------------------------------------
dPr(exit = 1)        | -.0972276    .0079422    -.1131462   -.0815021
Turnover in the Subsequent Session:
simqi fd(prval(1)) changex(turn1n 12 34)
First Difference: turn1n 12 34
Quantity of Interest |     Mean      Std. Err.   [95% Conf. Interval]
---------------------+-----------------------------------------------
dPr(exit = 1)        |   .129404    .0089526     .1123719    .1465948
Appointment Power:
simqi fd(prval(1)) changex(appoint 0 1)
First Difference: appoint 0 1
Quantity of Interest |     Mean      Std. Err.   [95% Conf. Interval]
---------------------+-----------------------------------------------
dPr(exit = 1)        | -.0887804    .0079899    -.1038538   -.072931
Size of House:
simqi fd(prval(1)) changex(size 51 180)
First Difference: size 51 180
Quantity of Interest |     Mean      Std. Err.   [95% Conf. Interval]
---------------------+-----------------------------------------------
dPr(exit = 1)        | -.1348364    .0071777    -.1484254   -.1201581
Table 3.5. Effects of Significant Predictors of the Probability that Leader Loses
Power Even Though Party Retains Control.
Variable                      Shift in Variable (from, to)   Shift in the Probability of Losing Power
Session Length                (67, 263)                      10% lower
Turnover Rate                 (12%, 34%)                     13% higher
Size of House (in members)    (51, 180)                      13% lower
Committee Appointment Power   (0, 1)                         9% lower