Supplemental Digital Content 1: Where do the probabilities come from: A brief nontechnical introduction to Markov Chain Monte Carlo parameter estimation
In our context, a parameter is a measurable quantity that determines the specific form of a
model. To illustrate, assume we believe sample data come from a normal probability distribution
with an unknown mean m and standard deviation s. The general model is that the data are
normally distributed. However, to use the model to describe the data or make inferences about
the population from which the data came, we have to estimate values for the two model
parameters m and s.
Letting X be the data, we could write this model as X ~ N(m,s). N signifies that the data
are from a normal probability distribution and m and s are the unknown parameters of this
distribution. We could also represent the model in terms of a likelihood function: L(m,s;X).
The likelihood function is the probability of the data given the model parameters, which can
also be written as P(X | m, s). In our case, L(m,s;X) is a normal probability density with mean m
and standard deviation s.
A common method to estimate the unknown model parameters is to find the values of m
and s that maximize the likelihood function, that is, the values for m and s that make the
likelihood of seeing what we saw (the data) as large as possible. This is called the maximum
likelihood approach. You can often solve analytically for the unknown parameters that
maximize the likelihood. When the likelihood function is complicated, there are algorithms that
will converge to the maximum likelihood estimates of the unknown parameters.
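For a normal likelihood, the maximizing values can be written down in closed form: the sample mean, and the maximum likelihood standard deviation (which divides by n rather than n - 1). A minimal Python sketch, with simulated data standing in for observations (the "true" values 5 and 2 are assumptions chosen for the demonstration):

```python
import math
import random

random.seed(1)
# Simulated observations; in practice these would be the observed data.
data = [random.gauss(5, 2) for _ in range(10_000)]

# For a normal likelihood, the maximizing values have closed forms:
# the sample mean, and the ML standard deviation (divide by n, not n - 1).
n = len(data)
m_hat = sum(data) / n
s_hat = math.sqrt(sum((x - m_hat) ** 2 for x in data) / n)
```

With 10,000 observations the estimates land very close to the values used to generate the data.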
As an alternative to the maximum likelihood approach for estimating parameters of interest, one
could use a Bayesian approach. Underlying this approach is Bayes’ theorem, which in its
simplest form can be stated as follows:
P(parameters | data) = P(data | parameters) × P(parameters) / P(data)
The probability of the data given the parameters, P(data | parameters), is the likelihood function. The probability of
the parameters P(parameters) is called the prior probability density function. The probability of
the parameters given the data is called the posterior probability density function. The
probability of the data, P(data), is a constant, called a “normalizing” constant, which makes the
posterior probability distribution a “legitimate” probability distribution (technically a distribution
that integrates to 1).
The logic of Bayes’ theorem is that you start off with some guesses about the probability
distributions of the unknown parameters and then based on the data you modify your initial
guesses. If you have only very vague guesses about the prior distributions, the posterior
distributions are largely determined by the data; if you have strong feelings about the prior
distributions, it takes a lot of data to move the posterior distribution far from your prior
estimates. For example, one might guess that the mean is equally likely to be anywhere between
-1,000 and +1,000 (i.e., it follows a uniform distribution over this range) and the standard
deviation could be anywhere between 0 and 10,000. With these types of vague or flat prior
distributions, the posterior distributions are largely determined from the data.
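One way to see this with a flat prior is a brute-force calculation: evaluate the likelihood of the data on a grid of candidate values for the mean (holding s fixed at 1 for simplicity, an assumption made only for this sketch) and normalize. With a flat prior, the posterior is just the normalized likelihood, and its mean lands essentially on the sample mean:

```python
import math
import random

random.seed(2)
data = [random.gauss(3.0, 1.0) for _ in range(200)]

# Candidate means from -10 to 10 in steps of 0.01; s is held at 1.
grid = [i / 100 for i in range(-1000, 1001)]

# Log-likelihood of the data at each candidate mean (up to a constant).
log_like = [-0.5 * sum((x - m) ** 2 for x in data) for m in grid]

# Flat prior: the posterior is the normalized likelihood.
mx = max(log_like)                        # subtract the max to avoid underflow
weights = [math.exp(v - mx) for v in log_like]
total = sum(weights)
posterior = [w / total for w in weights]

post_mean = sum(m * p for m, p in zip(grid, posterior))
```

The division by the total plays the role of the normalizing constant P(data): it makes the grid of weights integrate (here, sum) to 1.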
In non-Bayesian statistics, one makes inferences based on a sampling distribution (i.e., a
probability distribution that describes how a sample statistic varies). The values of key
parameters of this sampling distribution are those in the null hypothesis. The Bayesian approach
estimates relevant parameters directly from the data rather than from a null hypothesis.
Similar to maximum likelihood estimation, in some cases one can analytically solve for
the posterior distributions. However, in many cases this is not possible and a simulation
approach called Markov Chain Monte Carlo (MCMC) is used. Essentially, you simulate draws
from the posterior distribution many times, compute the average values of m and s, and then use
these averages as estimates of the unknown parameters. The Monte Carlo part of the method is
the random draws from the posterior distribution. The Markov Chain part of the method is that
the selected values form a Markov Chain (i.e., a random process in which the future values
depend only on the current state of the system and not the path that it took to arrive at the current
state) whose distribution converges to the posterior distribution of the parameters.
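The text does not commit to a particular MCMC algorithm at this point. As an illustrative sketch only (not the authors' method), one of the simplest schemes, a random-walk Metropolis sampler for the unknown mean m, with s fixed at 1 and a flat prior (both assumptions made for brevity), shows the "draw many times, then average" idea:

```python
import math
import random

random.seed(3)
data = [random.gauss(2.0, 1.0) for _ in range(100)]

def log_post(m):
    # Flat prior, s fixed at 1: log posterior = log likelihood + constant.
    return -0.5 * sum((x - m) ** 2 for x in data)

m = 0.0                   # arbitrary starting value
draws = []
for step in range(20_000):
    proposal = m + random.gauss(0, 0.3)           # propose a small random step
    # Accept the move with the usual Metropolis probability.
    if math.log(random.random()) < log_post(proposal) - log_post(m):
        m = proposal
    if step >= 2_000:                             # discard burn-in draws
        draws.append(m)

estimate = sum(draws) / len(draws)                # posterior mean of m
```

Each draw depends only on the current value of m, not on the path taken to reach it, which is exactly the Markov property described above.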
To illustrate, suppose someone tells you they rolled 3 dice and the sum turned out to be
10. What is the probability the values of the dice are 3, 5 and 2? To answer this question, you
could simulate the values of 3 different dice conditional on their sum equal to 10. The
connection to the statistical problem is that the requirement that the dice sum to 10 plays the role of
the data we can see. The three rolls are the parameters that we can't see. Because of the
condition, the values of the simulated draws are dependent on each other. That is, the roll of the
first die impacts permissible values for the second and third die. One way to do the simulation
would be to simulate the rolls of 3 dice until their sum equals 10 and then output the 3 different
values for the dice when this happens. Among all the rolls with a sum of 10, you could calculate
the proportion where the values are 3, 5 and 2. But, we “waste” a lot of rolls since the chance of
getting a sum exactly equal to 10 is not going to occur most of the time. MCMC methods can
facilitate the process. We start by arbitrarily selecting numbers for the 3 dice that add up to 10,
say 4, 5 and 1. At each step in the process, we choose two dice at random (say the last 2) and
roll one of them (say the 2nd). We force the 3rd die to take a value so that the sum adds up to 10.
For example, if the second die turns up 3, then we force the 3rd die to be 3 (since the first die still shows 4). If it is not possible to
find a value for the 3rd die so that the sum adds up to 10, we return all the dice to their previous
values. We then repeat this process over and over again. After a large number of repetitions in
which the sum of the dice always equals 10, it turns out that the values on each of the dice will
have a distribution very close to their posterior distribution given the condition that the sum of
the dice equals 10. Notice that at the beginning of this process the values of the dice depend
heavily on their initial conditions. After many repetitions, the impact of the initial conditions
becomes less and less. This MCMC approach for simulating a conditional distribution is called
the “Gibbs sampler” approach.
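The dice scheme can be simulated directly. The sketch below follows the recipe above: pick two dice at random, reroll one, force the other so the sum stays 10, and keep the previous values when no legal forced value exists. Treating "3, 5 and 2" as an unordered set (an assumption about the question), the long-run proportion settles near 6/27 ≈ 0.22, because conditional on the sum being 10 all 27 ordered triples are equally likely and 6 of them are permutations of (2, 3, 5):

```python
import random
from collections import Counter

random.seed(4)
dice = [4, 5, 1]          # arbitrary starting values that sum to 10
counts = Counter()
n_iter = 200_000

for _ in range(n_iter):
    # Choose two dice at random; the third one is left untouched.
    i, j = random.sample(range(3), 2)
    k = 3 - i - j                      # index of the untouched die
    new_i = random.randint(1, 6)       # reroll one of the chosen dice
    forced_j = 10 - dice[k] - new_i    # force the other so the sum is 10
    if 1 <= forced_j <= 6:
        dice[i], dice[j] = new_i, forced_j
    # Otherwise keep the previous values, as the text describes.
    counts[tuple(sorted(dice))] += 1

# Long-run proportion of iterations showing the values {2, 3, 5}.
p = counts[(2, 3, 5)] / n_iter
```

Note that rejected moves still count as iterations at the old values; that is what makes the long-run proportions come out right.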
As noted, the analogy to the statistical model above is to consider the unknown
parameters as the dice and the observed data as the condition that the sum equals 10. At each step, one
of the parameters at a time is chosen and then randomly resampled according to its conditional
distribution given all the other parameters and the data values (this is equivalent to rolling one of
the 2 randomly selected dice). If this process is repeated many times, it turns out that the
distribution of the unknown parameters will converge to the posterior distribution of the
unknown parameters given the data. If there are many parameters in a model, it can be very
difficult to resample all of them simultaneously, because computing their joint
distribution is hard. The beauty of the MCMC Gibbs sampler approach is that only
one unknown parameter at a time needs to be resampled.
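For the normal model from the beginning, one full Gibbs iteration draws m from its conditional distribution given s and the data, then the variance from its conditional given m and the data. The sketch below assumes a flat prior on m and the common reference prior proportional to 1/s² on the variance (assumptions for this sketch), under which both conditionals are standard distributions:

```python
import math
import random

random.seed(5)
# Simulated data; the true mean 10 and standard deviation 3 are assumptions.
data = [random.gauss(10.0, 3.0) for _ in range(500)]
n = len(data)
xbar = sum(data) / n

m, s2 = 0.0, 1.0          # arbitrary starting values
m_draws, s_draws = [], []
for step in range(5_000):
    # m | s2, data ~ Normal(xbar, s2 / n)
    m = random.gauss(xbar, math.sqrt(s2 / n))
    # s2 | m, data ~ Inverse-gamma(n/2, SS/2), sampled via a gamma draw
    ss = sum((x - m) ** 2 for x in data)
    s2 = 1.0 / random.gammavariate(n / 2, 2.0 / ss)
    if step >= 500:       # discard burn-in draws
        m_draws.append(m)
        s_draws.append(math.sqrt(s2))

m_hat = sum(m_draws) / len(m_draws)
s_hat = sum(s_draws) / len(s_draws)
```

Each update touches one parameter at a time, conditioning on the current value of the other, exactly as in the dice analogy.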
In terms of our problem, the parameters of interest are the measures of quality at each
facility. Once we have gone through one complete iteration of the Gibbs sampler – that is, a
random draw from the conditional distribution of the parameter measuring quality at each of the
facilities – we can rank the facilities in order from highest quality to lowest and record for each
facility whether its quality is in some top quantile or not. When the entire process is complete,
we can calculate for each facility the proportion of the times its quality was in the top quantile.
This is the probability metric for profiling facilities that we illustrate.
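As a toy sketch of this bookkeeping (the facility count, quality values, and posterior spread below are all made up for illustration), each iteration ranks one simulated draw per facility and tallies who lands in the top quartile:

```python
import random

random.seed(6)
# Hypothetical posterior draws for 8 facilities; in the real analysis the
# draws would come from the Gibbs sampler, and these numbers are made up.
true_quality = [0.2, 0.5, 0.9, 1.4, 1.1, 0.3, 0.8, 1.6]
n_fac, n_draws = len(true_quality), 4_000
top_k = n_fac // 4        # "top quantile" here = top quartile, best 2 of 8

in_top = [0] * n_fac
for _ in range(n_draws):
    draw = [random.gauss(q, 0.5) for q in true_quality]
    # Rank this iteration's draws and tally who lands in the top quartile.
    ranked = sorted(range(n_fac), key=lambda f: draw[f], reverse=True)
    for f in ranked[:top_k]:
        in_top[f] += 1

# For each facility, the proportion of iterations spent in the top quartile.
prob_top = [c / n_draws for c in in_top]
```

Because ranks are recomputed within every iteration, the resulting proportions automatically reflect the uncertainty in each facility's quality, not just a single point estimate.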