Chapter 7: Bayesian Econometrics
Christophe Hurlin
University of Orléans
June 26, 2014
Section 1
Introduction
1. Introduction
The outline of this chapter is the following:
Section 2. Prior and posterior distributions
Section 3. Posterior distributions and inference
Section 4. Applications (VAR and DSGE)
Section 5. Numerical simulations
1. Introduction
References
Geweke J. (2005), Contemporary Bayesian Econometrics and Statistics. New York: John Wiley and Sons. (advanced)
Geweke J., Koop G. and Van Dijk H. (2011), The Oxford Handbook of Bayesian Econometrics. Oxford University Press.
Greene W. (2007), Econometric Analysis, sixth edition, Pearson Prentice Hall.
Greenberg E. (2008), Introduction to Bayesian Econometrics, Cambridge University Press. (recommended)
Koop G. (2003), Bayesian Econometrics. New York: John Wiley and Sons.
Lancaster T. (2004), An Introduction to Modern Bayesian Inference. Oxford University Press.
1. Introduction
Notations: In this course, I will (try to...) follow some conventions of notation.

Y — random variable
y — realisation
f_Y(y) — probability density function or probability mass function
F_Y(y) — cumulative distribution function
Pr(.) — probability
y — vector
Y — matrix

Problem: this system of notations does not allow one to distinguish between a vector (matrix) of random elements and a vector (matrix) of non-stochastic elements (realisations).

Abadir and Magnus (2002), Notation in econometrics: a proposal for a standard, Econometrics Journal.
Section 2
Prior and posterior distributions
2. Prior and posterior distributions
Objectives
The objectives of this section are the following:
1. Introduce the concept of prior distribution
2. Define the hyperparameters
3. Define the concept of posterior distribution
4. Define the concept of unnormalised posterior distribution
2. Prior and posterior distributions
In statistics, there is a distinction between two concepts of probability:
1. Frequentist probability
2. Subjective probability
2. Prior and posterior distributions
Frequentist probability
Frequentists restrict the assignment of probabilities to statements that
describe the outcome of an experiment that can be repeated.
Example
Statement A1: “A coin tossed three times will come up heads either two
or three times.” We can imagine repeating the experiment of tossing a
coin three times and recording the number of times that two or three
heads were reported. Then:
Pr(A₁) = lim_{n→∞} (number of times two or three heads occurs) / n
2. Prior and posterior distributions
Subjective probability
1. Those who take the subjective view of probability believe that probability theory is applicable to any situation in which there is uncertainty.
2. Outcomes of repeated experiments fall in that category, but so do statements about tomorrow's weather, which are not the outcomes of repeated experiments.
3. Calling probabilities "subjective" does not imply that they can be set arbitrarily, and probabilities set in accordance with the axioms are consistent.
2. Prior and posterior distributions
Reminder
The probability of event A is denoted by Pr (A). Probabilities are assumed
to satisfy the following axioms:
1. 0 ≤ Pr(A) ≤ 1
2. Pr(A) = 1 if A represents a logical truth
3. If A and B describe disjoint events (events that cannot both occur), then Pr(A ∪ B) = Pr(A) + Pr(B)
4. Let Pr(A|B) denote "the probability of A, given (or conditioned on the assumption) that B is true." Then

Pr(A|B) = Pr(A ∩ B) / Pr(B)
2. Prior and posterior distributions
Reminder
Definition
The union of A and B is the event that A or B (or both) occur; it is denoted by A ∪ B.

Definition
The intersection of A and B is the event that both A and B occur; it is denoted by A ∩ B.
2. Prior and posterior distributions
Questions
1. What is the fundamental idea of the posterior distribution?
2. How can it be computed from the likelihood function and the prior distribution?
2. Prior and posterior distributions
Example (Subjective view of probability)
Let Y be a binary variable with Y = 1 if a coin toss results in a head and 0 otherwise, and let

Pr(Y = 1) = θ    Pr(Y = 0) = 1 − θ

which is assumed to be constant for each trial. In this model, θ is a parameter and the value of Y is the data (realisation y).
2. Prior and posterior distributions
Example (cont'd)
Under these assumptions, Y is said to have the Bernoulli distribution, written as

Y ∼ Be(θ)

with a probability mass function (pmf)

f_Y(y; θ) = Pr(Y = y) = θ^y (1 − θ)^{1−y}

We consider a sample of i.i.d. variables (Y₁, .., Yₙ) that corresponds to n repeated experiments. The realisation is denoted by (y₁, .., yₙ).
2. Prior and posterior distributions
Frequentist approach
1. From the frequentist point of view, probability theory can tell us something about the distribution of the data for a given θ:

Y ∼ Be(θ)

2. The parameter θ is an unknown number between zero and one.
3. It is not given a probability distribution of its own, because it is not regarded as being the outcome of a repeated experiment.
2. Prior and posterior distributions
Fact (Frequentist approach)
In a frequentist approach, the parameters θ are considered as constant
terms and the aim is to study the distribution of the data given θ,
through the likelihood of the sample.
2. Prior and posterior distributions
Reminder
The likelihood function is defined to be:

Lₙ : Θ × ℝⁿ → ℝ⁺
(θ; y₁, .., yₙ) ↦ Lₙ(θ; y₁, .., yₙ) = f_{Y₁,..,Yₙ}(y₁, y₂, .., yₙ; θ)

Under the i.i.d. assumption:

Lₙ(θ; y₁, .., yₙ) = ∏_{i=1}^{n} f_Y(yᵢ; θ)
2. Prior and posterior distributions
Remark: the (log-)likelihood function depends on two arguments: the
sample (realisation) and θ
2. Prior and posterior distributions
Reminder
In our example, Y is a discrete variable and the likelihood can be interpreted as the joint probability of observing the sample (realisation) (y₁, .., yₙ) given a value of θ:

Lₙ(θ; y₁, .., yₙ) = Pr((Y₁ = y₁) ∩ ... ∩ (Yₙ = yₙ))

If the sample (Y₁, .., Yₙ) is i.i.d., then:

Lₙ(θ; y₁, .., yₙ) = ∏_{i=1}^{n} Pr(Yᵢ = yᵢ) = ∏_{i=1}^{n} f_Y(yᵢ; θ)
2. Prior and posterior distributions
Reminder
In our example, the likelihood of the sample (y₁, .., yₙ) is

Lₙ(θ; y₁, .., yₙ) = ∏_{i=1}^{n} f_Y(yᵢ; θ) = ∏_{i=1}^{n} θ^{yᵢ} (1 − θ)^{1−yᵢ}

or equivalently

Lₙ(θ; y₁, .., yₙ) = θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)}
2. Prior and posterior distributions
Notations:
In the rest of the chapter, I will use the following alternative notations:

Lₙ(θ; y) ≡ L(θ; y₁, .., yₙ) ≡ Lₙ(θ)
ℓₙ(θ; y) ≡ ln Lₙ(θ; y) ≡ ln L(θ; y₁, .., yₙ) ≡ ln Lₙ(θ)
2. Prior and posterior distributions
Subjective approach
From the subjective point of view, however, θ is an unknown quantity.
Since there is uncertainty over its value, it can be regarded as a random variable and assigned a probability distribution.
Before seeing the data, it is assigned a prior distribution

π(θ) with 0 ≤ θ ≤ 1
2. Prior and posterior distributions
Definition (Prior distribution)
In a Bayesian framework, the parameters θ associated with the distribution of the data are considered as random variables. Their distribution is called the prior distribution and is denoted by π(θ).
2. Prior and posterior distributions
Remark
In our example, the endogenous variable Yᵢ is discrete (0 or 1):

Yᵢ ∼ Be(θ)

but the parameter θ = Pr(Yᵢ = 1) can be considered as a continuous random variable over [0, 1]: in this case π(θ) is a pdf.
2. Prior and posterior distributions
Example (Prior distribution)
For instance, we may postulate that:

θ ∼ U_[0,1]
2. Prior and posterior distributions
Remark
Whatever the type of the distribution of the endogenous variable (discrete
or continuous), the prior distribution is generally continuous.
π (θ ) = probability density function (pdf)
2. Prior and posterior distributions
Definition (Uninformative prior)
An uninformative prior is a flat prior. Example: θ ∼ U_[0,1]
2. Prior and posterior distributions
Remark
In most cases, the prior distribution is parametrised, i.e. the pdf π(θ; γ) depends on a set of parameters γ.

Definition (Hyperparameters)
The parameters of the prior distribution, called hyperparameters, are supplied by the researcher.
2. Prior and posterior distributions
Example (Hyperparameters)
If θ ∈ ℝ and if the prior distribution is normal,

π(θ; γ) = (1 / (σ√(2π))) exp(−(θ − µ)² / (2σ²))

with γ = (µ, σ²)′ the vector of hyperparameters.
2. Prior and posterior distributions
Example (Beta prior distribution)
If θ ∈ [0, 1], a common (parametrised) prior distribution is the Beta distribution, denoted B(α, β):

π(θ; γ) = (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1},    α, β > 0,  θ ∈ [0, 1]

with γ = (α, β)′ the vector of hyperparameters.
2. Prior and posterior distributions
Beta distribution: reminder
The gamma function, denoted Γ(p), is defined to be:

Γ(p) = ∫₀^{+∞} e^{−x} x^{p−1} dx,    p > 0

with:

Γ(p) = (p − 1) Γ(p − 1)
Γ(p) = (p − 1)! = (p − 1)(p − 2) ... 1 if p ∈ ℕ
Γ(1/2) = √π
2. Prior and posterior distributions
The Beta distribution B(α, β) has very interesting features:
1. Depending on the choice of α and β, this prior can capture beliefs that indicate θ is centered at 1/2, or it can shade θ toward zero or one;
2. It can be highly concentrated, or it can be spread out;
3. When both parameters are less than one, it can have two modes.
2. Prior and posterior distributions
The shape of a Beta distribution can be understood by examining its mean and variance. If θ ∼ B(α, β):

E(θ) = α / (α + β)    V(θ) = αβ / ((α + β)² (α + β + 1))

1. The mean is equal to 1/2 if α = β
2. A larger α (β) shades the mean toward 1 (0)
3. The variance decreases as α or β increases.
2. Prior and posterior distributions
[Figure: Beta(α, β) densities for (α, β) = (0.5, 0.5), (1.0, 1.0), (5.0, 5.0), (30.0, 30.0), (10.0, 5.0) and (1.0, 30.0).]
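The six panels above can be reproduced with a few lines of Matlab. The following is a minimal sketch (not part of the original slides) that codes the Beta pdf directly via the built-in gamma function, so no toolbox is required:

% Plot Beta(a,b) densities for the six (a,b) pairs shown above
theta = linspace(0.001, 0.999, 500);
pairs = [0.5 0.5; 1 1; 5 5; 30 30; 10 5; 1 30];
for k = 1:6
    a = pairs(k,1); b = pairs(k,2);
    f = gamma(a+b)/(gamma(a)*gamma(b)) * theta.^(a-1) .* (1-theta).^(b-1);
    subplot(3,2,k); plot(theta, f);
    title(sprintf('a = %.1f  b = %.1f', a, b)); axis([0 1 0 6]);
end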
2. Prior and posterior distributions
Remark
In some models, the hyperparameters are stochastic: like the parameters of interest θ, they have a distribution. These models are called hierarchical models.
2. Prior and posterior distributions
Definition (Hierarchical models)
In a hierarchical model, we add one or more additional levels, where the hyperparameters themselves are given a prior distribution depending on another set of hyperparameters.
2. Prior and posterior distributions
Example (hierarchical model)
An example of a hierarchical model is given by:

y : pdf f(y|θ)
θ : pdf π(θ|α)
α : pdf π(α|β)

where the hyperparameters β are constant terms.
2. Prior and posterior distributions
Definition (Posterior distribution)
Bayesian inference centers on the posterior distribution π(θ|y), which is the conditional distribution of the random variable θ given the data (realisation of the sample) y = (y₁, .., yₙ):

θ | (Y₁ = y₁, .., Yₙ = yₙ) ∼ posterior distribution
2. Prior and posterior distributions
Remark
When there is more than one parameter, the posterior distribution is a
joint conditional distribution of all the parameters given the observed data.
2. Prior and posterior distributions
Notations

π(θ) — prior distribution = pdf of θ
π(θ|y) — posterior distribution = conditional pdf

Discrete endogenous variable:
p(y; θ) — probability mass function (pmf)
Pr(A) — probability of event A

Continuous endogenous variable:
f_Y(y; θ) — probability density function (pdf)
F_Y(y; θ) — cumulative distribution function
2. Prior and posterior distributions
Warning
For all the general definitions and the general results, we will employ the notation f_Y(y; θ) for both probability mass and density functions. Be careful with the interpretation of f_Y(y; θ): a density (continuous case) or directly a probability (discrete case).
2. Prior and posterior distributions
Example (Discrete random variable)
If Y ∼ Be(θ), the pmf given by

f_Y(y; θ) = Pr(Y = y) = θ^y (1 − θ)^{1−y}

is a probability.
2. Prior and posterior distributions
Example (Continuous random variable)
If Y ∼ N(µ, σ²) with θ = (µ, σ²)′, the pdf

f_Y(y; θ) = (1 / (σ√(2π))) exp(−(y − µ)² / (2σ²))

is not a probability. The probability is given by

Pr(Y ≤ y) = F_Y(y; θ) = ∫_{−∞}^{y} f_Y(x; θ) dx

Pr(Y = y) = 0
2. Prior and posterior distributions
The posterior density function π(θ|y) is computed by Bayes' theorem:

Theorem (Bayes' theorem)
For events A and B, the conditional probability of event A given that B has occurred is

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)
2. Prior and posterior distributions
Bayes' theorem:

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)

By setting A = θ and B = y, we have:

π(θ|y) = f_{Y|θ}(y|θ) π(θ) / f_Y(y)

where

f_Y(y) = ∫ f_{Y|θ}(y|θ) π(θ) dθ
2. Prior and posterior distributions
Definition (Posterior distribution)
For one observation yᵢ, the posterior distribution is the conditional distribution of θ given yᵢ, defined to be:

π(θ|yᵢ) = f_{Yᵢ|θ}(yᵢ|θ) π(θ) / f_{Yᵢ}(yᵢ)

where

f_{Yᵢ}(yᵢ) = ∫_Θ f_{Yᵢ|θ}(yᵢ|θ) π(θ) dθ

and Θ is the support of the distribution of θ.
2. Prior and posterior distributions
Remark
π(θ|yᵢ) = f_{Yᵢ|θ}(yᵢ|θ) π(θ) / f_{Yᵢ}(yᵢ)

The term f_{Yᵢ|θ}(yᵢ|θ) corresponds to the likelihood of the observation yᵢ:

f_{Yᵢ|θ}(yᵢ|θ) = Lᵢ(θ; yᵢ)
2. Prior and posterior distributions
Remark
The effect of dividing by f_{Yᵢ}(yᵢ) is to make π(θ|yᵢ) a normalised probability distribution: integrating π(θ|yᵢ) with respect to θ yields 1:

∫ π(θ|yᵢ) dθ = ∫ (f_{Yᵢ|θ}(yᵢ|θ) π(θ) / f_{Yᵢ}(yᵢ)) dθ = (1 / f_{Yᵢ}(yᵢ)) ∫ f_{Yᵢ|θ}(yᵢ|θ) π(θ) dθ = 1

since f_{Yᵢ}(yᵢ) = ∫_Θ f_{Yᵢ|θ}(yᵢ|θ) π(θ) dθ.
2. Prior and posterior distributions
Discrete variable
For a discrete random variable Yᵢ, by setting A = θ and B = yᵢ, we have:

π(θ|yᵢ) [posterior, pdf] = p(yᵢ|θ) [conditional probability] × π(θ) [prior, pdf] / p(yᵢ) [probability]

where

p(yᵢ) = ∫ p(yᵢ|θ) π(θ) dθ
2. Prior and posterior distributions
Definition (Prior predictive distribution)
The term p(yᵢ) or f_{Yᵢ}(yᵢ) is sometimes called the prior predictive distribution:

p(yᵢ) = ∫ p(yᵢ|θ) π(θ) dθ    f_{Yᵢ}(yᵢ) = ∫ f_{Yᵢ|θ}(yᵢ|θ) π(θ) dθ
2. Prior and posterior distributions
Remark
1. In general, we consider an i.i.d. sample Y = (Y₁, .., Yₙ) with a realisation (data) y = (y₁, .., yₙ), and not only one observation.
2. In this case, the posterior distribution can be written as a function of the likelihood of the i.i.d. sample y = (y₁, .., yₙ):

Lₙ(θ; y₁, .., yₙ) = ∏_{i=1}^{n} Lᵢ(θ; yᵢ) = ∏_{i=1}^{n} f_{Yᵢ|θ}(yᵢ|θ)
2. Prior and posterior distributions
Definition (Posterior distribution, sample)
For the sample (y₁, .., yₙ), the posterior distribution is the conditional distribution of θ given (y₁, .., yₙ), defined to be:

π(θ|y₁, .., yₙ) = Lₙ(θ; y₁, .., yₙ) π(θ) / f_{Y₁,..,Yₙ}(y₁, .., yₙ)

where Lₙ(θ; y₁, .., yₙ) is the likelihood of the sample,

f_{Y₁,..,Yₙ}(y₁, .., yₙ) = ∫_Θ Lₙ(θ; y₁, .., yₙ) π(θ) dθ

and Θ is the support of the distribution of θ.
2. Prior and posterior distributions
Notations

π(θ|y₁, .., yₙ) = Lₙ(θ; y₁, .., yₙ) π(θ) / f_{Y₁,..,Yₙ}(y₁, .., yₙ)

For simplicity, we skip the notation (y₁, .., yₙ) for the sample and write only the generic term y:

π(θ|y) = f_{Y|θ}(y|θ) π(θ) / f_Y(y) = Lₙ(θ; y) π(θ) / f_Y(y)
2. Prior and posterior distributions
Remark

π(θ|y₁, .., yₙ) = Lₙ(θ; y₁, .., yₙ) π(θ) / f_{Y₁,..,Yₙ}(y₁, .., yₙ)

In this setting, the data (y₁, .., yₙ) are viewed as constants whose (marginal) distribution does not involve the parameters of interest θ:

f_{Y₁,..,Yₙ}(y₁, .., yₙ) = constant
2. Prior and posterior distributions
Remark (cont'd)
As a consequence, Bayes' theorem

Pr(parameters|data) = Pr(data|parameters) Pr(parameters) / Pr(data)

implies that

Pr(parameters|data) ∝ Pr(data|parameters) × Pr(parameters)
[posterior density]     [likelihood]             [prior density]

where the symbol "∝" means "is proportional to."
2. Prior and posterior distributions
Definition (Unnormalised posterior distribution)
The unnormalised posterior distribution is the product of the likelihood of the sample and the prior distribution:

π(θ|y₁, .., yₙ) ∝ Lₙ(θ; y₁, .., yₙ) × π(θ)

or, with simpler notation,

π(θ|y) ∝ Lₙ(θ; y) × π(θ)

where the symbol "∝" means "is proportional to."
2. Prior and posterior distributions
Example (Posterior distribution)
Consider an i.i.d. sample (Y₁, .., Yₙ) of binary variables, such that Yᵢ ∼ Be(θ) and:

f_{Yᵢ}(yᵢ; θ) = Pr(Yᵢ = yᵢ) = θ^{yᵢ} (1 − θ)^{1−yᵢ}

We assume that the (uninformative) prior distribution for θ is a (continuous) uniform distribution over [0, 1].

Question: write the pdf associated with the unnormalised posterior distribution and the posterior distribution.
2. Prior and posterior distributions
Solution

π(θ|y₁, .., yₙ) = Lₙ(θ; y₁, .., yₙ) π(θ) / f_{Y₁,..,Yₙ}(y₁, .., yₙ)

The sample (y₁, .., yₙ) is i.i.d., so its likelihood is defined to be:

Lₙ(θ; y₁, .., yₙ) = ∏_{i=1}^{n} f_{Yᵢ}(yᵢ) = ∏_{i=1}^{n} θ^{yᵢ} (1 − θ)^{1−yᵢ}

So, we have:

Lₙ(θ; y₁, .., yₙ) = θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)}
2. Prior and posterior distributions
Solution (cont'd)

π(θ|y₁, .., yₙ) = Lₙ(θ; y₁, .., yₙ) π(θ) / f_{Y₁,..,Yₙ}(y₁, .., yₙ)

The (uninformative) prior distribution is θ ∼ U_[0,1], with pdf:

π(θ) = 1 if θ ∈ [0, 1], 0 otherwise
2. Prior and posterior distributions
Solution (cont'd)

π(θ|y₁, .., yₙ) ∝ Lₙ(θ; y₁, .., yₙ) × π(θ)

Lₙ(θ; y₁, .., yₙ) = θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)}

π(θ) = 1 if θ ∈ [0, 1], 0 otherwise

The unnormalised posterior distribution is

Lₙ(θ; y₁, .., yₙ) π(θ) = θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)} if θ ∈ [0, 1], 0 otherwise
2. Prior and posterior distributions
Solution (cont'd)

π(θ|y₁, .., yₙ) = Lₙ(θ; y₁, .., yₙ) π(θ) / f_{Y₁,..,Yₙ}(y₁, .., yₙ)

The joint density of (Y₁, .., Yₙ) evaluated at (y₁, .., yₙ) is

f_{Y₁,..,Yₙ}(y₁, .., yₙ) = ∫₀¹ f_{Y₁,..,Yₙ|θ}(y₁, .., yₙ|θ) π(θ) dθ
= ∫₀¹ Lₙ(θ; y₁, .., yₙ) π(θ) dθ
= ∫₀¹ θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)} dθ
2. Prior and posterior distributions
Solution (cont'd)

Finally, we have

π(θ|y₁, .., yₙ) = θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)} / ∫₀¹ θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)} dθ    if θ ∈ [0, 1]

π(θ|y₁, .., yₙ) = 0    if θ ∉ [0, 1]
2. Prior and posterior distributions
Example (Posterior distribution)
Consider an i.i.d. sample (Y₁, .., Yₙ) of binary variables, such that Yᵢ ∼ Be(θ) with θ = 0.3. We assume that the (uninformative) prior distribution for θ is a uniform distribution over [0, 1].

Question: write a Matlab code in order (i) to generate a sample of size n = 100, and (ii) to display the pdf associated with the prior and the posterior distribution (a possible solution is sketched below).
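A possible Matlab solution (a sketch, not from the original slides; the posterior is normalised numerically with trapz rather than via the closed-form Beta result):

% (i) Generate a sample of size n = 100 from a Bernoulli(0.3)
n = 100; theta0 = 0.3;
y = (rand(n,1) < theta0);
s = sum(y);                                % number of successes
% (ii) Prior and posterior pdfs on a grid for theta
theta = linspace(0, 1, 1000);
prior = ones(size(theta));                 % U[0,1] prior: pi(theta) = 1
upost = theta.^s .* (1-theta).^(n-s);      % unnormalised posterior
post  = upost / trapz(theta, upost);       % normalise numerically
plot(theta, prior, '--', theta, post, '-');
xlabel('\theta'); legend('Prior', 'Posterior');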
2. Prior and posterior distributions
[Figure: unnormalised posterior distribution of θ (vertical scale ×10⁻²⁸).]
2. Prior and posterior distributions
[Figure: posterior distribution of θ.]
2. Prior and posterior distributions
Example (Posterior distribution)
Consider an i.i.d. sample (Y₁, .., Yₙ) of binary variables, such that Yᵢ ∼ Be(θ) and:

f_{Yᵢ}(yᵢ; θ) = Pr(Yᵢ = yᵢ) = θ^{yᵢ} (1 − θ)^{1−yᵢ}

We assume that the prior distribution for θ is a Beta distribution B(α, β) with pdf:

π(θ; γ) = (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1},    α, β > 0,  θ ∈ [0, 1]

with γ = (α, β)′ the vector of hyperparameters.

Question: write the pdf associated with the unnormalised posterior distribution.
2. Prior and posterior distributions
Solution
The likelihood of the sample (y₁, .., yₙ) is:

Lₙ(θ; y₁, .., yₙ) = θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)}

The prior distribution is:

π(θ; γ) = (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1}

So the unnormalised posterior distribution is:

π(θ|y₁, .., yₙ) ∝ Lₙ(θ; y₁, .., yₙ) π(θ) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1} θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)}
2. Prior and posterior distributions
Solution (cont'd)

π(θ|y₁, .., yₙ) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1} θ^{Σyᵢ} (1 − θ)^{Σ(1−yᵢ)}

or equivalently:

π(θ|y₁, .., yₙ) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1+Σyᵢ} (1 − θ)^{β−1+Σ(1−yᵢ)}

Note that the term Γ(α + β) / (Γ(α)Γ(β)) does not depend on θ. So the unnormalised posterior can also be written as:

π(θ|y₁, .., yₙ) ∝ θ^{(α+Σyᵢ)−1} (1 − θ)^{(β+Σ(1−yᵢ))−1}
2. Prior and posterior distributions
Solution (cont'd)

π(θ|y₁, .., yₙ) ∝ θ^{(α+Σyᵢ)−1} (1 − θ)^{(β+Σ(1−yᵢ))−1}

Recall that the pdf of a Beta B(α, β) distribution is

(Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1}

The posterior distribution is in the form of a Beta distribution with parameters

α₁ = α + ∑_{i=1}^{n} yᵢ    β₁ = β + n − ∑_{i=1}^{n} yᵢ

This is an example of a conjugate prior, where the posterior distribution is in the same family as the prior distribution.
2. Prior and posterior distributions
Definition (Conjugate prior)
A conjugate prior is such that the posterior distribution is in the same family as the prior distribution.
2. Prior and posterior distributions
Example (Posterior distribution)
Consider an i.i.d. sample (Y₁, .., Yₙ) of binary variables, such that Yᵢ ∼ Be(θ) and:

f_{Yᵢ}(yᵢ; θ) = Pr(Yᵢ = yᵢ) = θ^{yᵢ} (1 − θ)^{1−yᵢ}

We assume that the prior distribution for θ is a Beta distribution B(α, β).

Question: determine the mean of the posterior distribution.
2. Prior and posterior distributions
Solution
We know that:

π(θ|y₁, .., yₙ) ∝ θ^{α₁−1} (1 − θ)^{β₁−1}    with α₁ = α + ∑_{i=1}^{n} yᵢ,  β₁ = β + n − ∑_{i=1}^{n} yᵢ

A simple way to normalise this posterior distribution consists in writing:

π(θ|y₁, .., yₙ) = (Γ(α₁ + β₁) / (Γ(α₁)Γ(β₁))) θ^{α₁−1} (1 − θ)^{β₁−1}

This expression corresponds to the pdf of a B(α₁, β₁) distribution and it is normalised to 1 by definition:

∫₀¹ π(θ|y₁, .., yₙ) dθ = 1
2. Prior and posterior distributions
Solution (cont'd)

π(θ|y₁, .., yₙ) = (Γ(α₁ + β₁) / (Γ(α₁)Γ(β₁))) θ^{α₁−1} (1 − θ)^{β₁−1}

α₁ = α + ∑_{i=1}^{n} yᵢ    β₁ = β + n − ∑_{i=1}^{n} yᵢ

Using the properties of the B(α₁, β₁) distribution, we have:

E(θ|y₁, .., yₙ) = α₁ / (α₁ + β₁) = (α + ∑_{i=1}^{n} yᵢ) / (α + ∑_{i=1}^{n} yᵢ + β + n − ∑_{i=1}^{n} yᵢ) = (α + ∑_{i=1}^{n} yᵢ) / (α + β + n)
2. Prior and posterior distributions
Solution (cont'd)

E(θ|y₁, .., yₙ) = (α + ∑_{i=1}^{n} yᵢ) / (α + β + n)

We can express the posterior mean as a function of the ML estimator, ȳₙ = n⁻¹ ∑_{i=1}^{n} yᵢ, as follows:

E(θ|y₁, .., yₙ) [posterior mean] = ((α + β) / (α + β + n)) × (α / (α + β)) [prior mean] + (n / (α + β + n)) × ȳₙ [MLE]
2. Prior and posterior distributions
Remark

E(θ|y₁, .., yₙ) [posterior mean] = ((α + β) / (α + β + n)) × (α / (α + β)) [prior mean] + (n / (α + β + n)) × ȳₙ [MLE]

1. If n → ∞, then the weight on the prior mean approaches zero and the weight on the MLE approaches one, implying

lim_{n→∞} E(θ|y₁, .., yₙ) = ȳₙ

2. If the sample size is very small, n → 0, then we have

lim_{n→0} E(θ|y₁, .., yₙ) = α / (α + β)
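As a quick numerical check of this weighted-average decomposition, the following Matlab sketch (with illustrative values of α, β, n and simulated data; not from the original slides) computes the posterior mean both ways:

alpha = 2; beta = 2; n = 50;
y = (rand(n,1) < 0.3); s = sum(y); ybar = s/n;      % simulated Bernoulli data
direct   = (alpha + s) / (alpha + beta + n);        % posterior mean, closed form
weighted = (alpha+beta)/(alpha+beta+n) * alpha/(alpha+beta) ...
         + n/(alpha+beta+n) * ybar;                 % prior-mean / MLE weighting
disp([direct weighted])                             % identical up to rounding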
2. Prior and posterior distributions
Example (Posterior distribution)
Consider an i.i.d. sample (Y₁, .., Yₙ) of binary variables, such that Yᵢ ∼ Be(θ) and

∑_{i=1}^{n} yᵢ = 3 if n = 10    ∑_{i=1}^{n} yᵢ = 15 if n = 50

We assume that the prior distribution for θ is a Beta distribution B(2, 2).

Question: write a Matlab code to plot (i) the prior, (ii) the posterior distribution and (iii) the likelihood in the two cases (a possible solution is sketched below).
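A possible Matlab solution (a sketch, not from the original slides; the likelihood is rescaled to integrate to one so that the three curves fit on a common axis):

theta = linspace(0.001, 0.999, 1000);
cases = [10 3; 50 15];                        % [n, sum(y)] for the two cases
for k = 1:2
    n = cases(k,1); s = cases(k,2);
    lik   = theta.^s .* (1-theta).^(n-s);     % likelihood
    lik   = lik / trapz(theta, lik);          % rescaled for display
    prior = 6 * theta .* (1-theta);           % Beta(2,2) pdf
    a1 = 2 + s; b1 = 2 + n - s;               % conjugate posterior parameters
    post = gamma(a1+b1)/(gamma(a1)*gamma(b1)) * theta.^(a1-1) .* (1-theta).^(b1-1);
    subplot(1,2,k);
    plot(theta, lik, ':', theta, prior, '--', theta, post, '-');
    title(sprintf('n = %d', n)); xlabel('\theta');
    legend('Likelihood (rescaled)', 'Prior', 'Posterior');
end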
2. Prior and posterior distributions
[Figure: n = 10 — likelihood (×10⁻³), prior and posterior distributions of θ.]
2. Prior and posterior distributions
[Figure: n = 50 — likelihood (×10⁻¹¹), prior and posterior distributions of θ.]
2. Prior and posterior distributions
Key concepts of Section 2
1. Frequentist versus subjective probability
2. Prior distribution
3. Hyperparameters
4. Prior predictive distribution
5. Posterior distribution
6. Unnormalised posterior distribution
7. Conjugate prior
Section 3
Posterior distributions and inference
3. Posterior distributions and inference
Objectives
The objectives of this section are the following:
1. Generalise the Bayesian approach to a vector of parameters
2. Generalise the Bayesian approach to a regression model
3. Introduce the Bayesian updating mechanism
4. Study the posterior distribution in the case of a large sample
5. Discuss inference in a Bayesian framework
3. Posterior distributions and inference
The concept of posterior distribution can be generalized to:
1. a case with a vector of parameters θ = (θ₁, .., θ_d)′;
2. a model with exogenous variables and/or lagged endogenous variables (linear regression model, time series models, DSGE, etc.).
Sub-Section 3.1
Generalisation to a vector of parameters
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Vector of parameters
Consider a model/variable with a pdf/pmf that depends on a vector of parameters θ = (θ₁, .., θ_d)′.
The previous definitions of likelihood, prior, and posterior distributions still apply.
But they are now, respectively, the joint prior distribution and joint posterior distribution of the multivariate random variable θ.
From the joint distributions, we may derive marginal and conditional distributions.
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Definition (Marginal posterior distribution)
The marginal posterior distribution of θ₁ can be found by integrating out the remainder of the parameters from the joint posterior distribution:

π(θ₁|y) = ∫ π(θ₁, .., θ_d|y) dθ₂ .. dθ_d
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Definition (Conditional posterior distribution)
The conditional posterior distribution of θ₁ is defined to be:

π(θ₁|θ₂, .., θ_d, y) = π(θ₁, .., θ_d|y) / π(θ₂, .., θ_d|y)

where the denominator on the right-hand side is the marginal posterior distribution of (θ₂, .., θ_d), obtained by integrating θ₁ out of the joint distribution.
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Remark
In most applications, the marginal distribution of a parameter is more
useful than its conditional distribution because the marginal takes into
account the uncertainty over the values of the remaining parameters, while
the conditional sets them at particular values.
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Example (Marginal posterior distribution)
Consider the multinomial distribution Mn(.), which generalizes the Bernoulli example discussed above. In this model, each trial, assumed independent of the other trials, results in one of d outcomes, labeled 1, 2, .., d, with probabilities θ₁, .., θ_d, where ∑_{i=1}^{d} θᵢ = 1. When the experiment is repeated n times and outcome i arises yᵢ times, the likelihood function is

Lₙ(θ|y₁, .., yₙ) = θ₁^{y₁} θ₂^{y₂} ... θ_d^{y_d}    with ∑_{i=1}^{d} yᵢ = n
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Example (cont'd)
Consider a Dirichlet prior distribution (a generalisation of the Beta) with:

π(θ) = (Γ(∑_{i=1}^{d} αᵢ) / ∏_{i=1}^{d} Γ(αᵢ)) θ₁^{α₁−1} θ₂^{α₂−1} ... θ_d^{α_d−1},    αᵢ > 0,  ∑_{i=1}^{d} θᵢ = 1

Question: determine the marginal posterior distribution of θ₁.
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Solution
Following the previous procedure, we find the (unnormalised) joint posterior distribution given the data y = (y₁, .., yₙ):

π(θ|y₁, .., yₙ) ∝ Lₙ(θ|y₁, .., yₙ) × π(θ) ∝ θ₁^{y₁} θ₂^{y₂} ... θ_d^{y_d} × (Γ(∑_{i=1}^{d} αᵢ) / ∏_{i=1}^{d} Γ(αᵢ)) θ₁^{α₁−1} θ₂^{α₂−1} ... θ_d^{α_d−1}

Or equivalently:

π(θ|y₁, .., yₙ) ∝ θ₁^{y₁+α₁−1} θ₂^{y₂+α₂−1} ... θ_d^{y_d+α_d−1}

Remark: the Dirichlet prior is a conjugate prior for the multinomial model.
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Solution (cont'd)
The marginal distribution of θ₁ is defined to be:

π(θ₁|y₁, .., yₙ) = ∫ π(θ|y₁, .., yₙ) dθ₂ .. dθ_d

In this context, we have:

π(θ₁|y₁, .., yₙ) = (Γ(∑_{i=1}^{d} (αᵢ + yᵢ)) / ∏_{i=1}^{d} Γ(αᵢ + yᵢ)) θ₁^{y₁+α₁−1} ∫ θ₂^{y₂+α₂−1} ... θ_d^{y_d+α_d−1} dθ₂ .. dθ_d

But we can also use some general results about the Dirichlet distribution.
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Solution (cont'd)

Definition (Dirichlet distribution)
The Dirichlet distribution generalizes the Beta distribution. Let x = (x₁, .., x_p) with 0 ≤ xᵢ ≤ 1 and ∑_{i=1}^{p} xᵢ = 1. Then x ∼ D(α₁, .., α_p) if

f(x; α₁, .., α_p) = (Γ(∑_{i=1}^{p} αᵢ) / ∏_{i=1}^{p} Γ(αᵢ)) x₁^{α₁−1} x₂^{α₂−1} ... x_p^{α_p−1},    αᵢ > 0

Marginally, we have

xᵢ ∼ B(αᵢ, ∑_{k≠i} α_k)
3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters
Solution (cont'd)

π(θ|y₁, .., yₙ) ∝ θ₁^{y₁+α₁−1} θ₂^{y₂+α₂−1} ... θ_d^{y_d+α_d−1}

θ|y₁, .., yₙ ∼ D(y₁ + α₁, .., y_d + α_d)

The marginal posterior distribution of θ₁ is a Beta distribution:

θ₁|y₁, .., yₙ ∼ B(y₁ + α₁, ∑_{i=2}^{d} (yᵢ + αᵢ))
Sub-Section 3.2
Generalisation to a model
3. Posterior distributions and inference
3.2. Generalisation to a model
Remark: we can also aim at estimating the parameters of a model (with dependent and explanatory variables) such that:

y = g(x; θ) + ε

where θ denotes the vector of parameters, X a set of explanatory variables, ε an error term, and g(.) the link function.

In this case, we generally consider the conditional distribution of Y given X, which is equivalent to the unconditional distribution of the error term ε:

Y|X ∼ D ⟺ ε ∼ D
3. Posterior distributions and inference
3.2. Generalisation to a model
Notations (model)
Let us consider two continuous random variables Y and X.
We assume that Y has a conditional distribution given X = x with a pdf denoted f_{Y|x}(y; θ), for y ∈ ℝ.
θ = (θ₁ .. θ_K)′ is a K × 1 vector of unknown parameters. We assume that θ ∈ Θ ⊆ ℝ^K.
Let us consider a sample {Xᵢ, Yᵢ}_{i=1}^{n} of i.i.d. random variables and a realisation {xᵢ, yᵢ}_{i=1}^{n}.
3. Posterior distributions and inference
3.2. Generalisation to a model
Definition (Conditional likelihood function)
The (conditional) likelihood function of the i.i.d. sample {Xᵢ, Yᵢ}_{i=1}^{n} is defined to be:

Lₙ(θ; y|x) = ∏_{i=1}^{n} f_{Y|X}(yᵢ|xᵢ; θ)

where f_{Y|X}(yᵢ|xᵢ; θ) denotes the conditional pdf of Yᵢ given Xᵢ.

Remark: the conditional likelihood function is the joint conditional density of the data given θ.
3. Posterior distributions and inference
3.2. Generalisation to a model
Example (Linear regression model)
Consider the following linear regression model:

yᵢ = xᵢ′β + εᵢ

where Xᵢ is a K × 1 vector of random variables and β = (β₁ .. β_K)′ a K × 1 vector of parameters. We assume that the εᵢ are i.i.d. with εᵢ ∼ N(0, σ²). Then, the conditional distribution of Yᵢ given Xᵢ = xᵢ is:

Yᵢ|xᵢ ∼ N(xᵢ′β, σ²)

Lᵢ(θ; y|x) = f_{Y|x}(yᵢ|xᵢ; θ) = (1 / (σ√(2π))) exp(−(yᵢ − xᵢ′β)² / (2σ²))

where θ = (β′, σ²)′ is a (K + 1) × 1 vector.
3. Posterior distributions and inference
3.2. Generalisation to a model
Example (Linear regression model, cont'd)
Then, if we consider an i.i.d. sample {yᵢ, xᵢ}_{i=1}^{n}, the corresponding conditional (log-)likelihood is defined to be:

Lₙ(θ; y|x) = ∏_{i=1}^{n} f_{Y|X}(yᵢ|xᵢ; θ) = ∏_{i=1}^{n} (1 / (σ√(2π))) exp(−(yᵢ − xᵢ′β)² / (2σ²))
= (σ²2π)^{−n/2} exp(−(1 / (2σ²)) ∑_{i=1}^{n} (yᵢ − xᵢ′β)²)

ℓₙ(θ; y|x) = −(n/2) ln σ² − (n/2) ln(2π) − (1 / (2σ²)) ∑_{i=1}^{n} (yᵢ − xᵢ′β)²
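The sketch below evaluates this conditional log-likelihood in Matlab for simulated data (illustrative parameter values; not part of the original slides):

% Conditional log-likelihood of a Gaussian linear regression model
beta = [1; 0.5]; sigma2 = 0.8; n = 200;     % illustrative "true" values
x = [ones(n,1) randn(n,1)];                 % constant plus one regressor
y = x*beta + sqrt(sigma2)*randn(n,1);       % simulated dependent variable
res = y - x*beta;                           % residuals at the evaluated beta
loglik = -n/2*log(sigma2) - n/2*log(2*pi) - sum(res.^2)/(2*sigma2);
disp(loglik)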
3. Posterior distributions and inference
3.2. Generalisation to a model
Remark: given this principle, we can derive the (conditional) likelihood and the log-likelihood functions associated with a specific sample for any type of econometric model in which the conditional distribution of the dependent variable is known:
Dichotomous models: probit, logit models, etc.
Censored regression models: Tobit, etc.
Time series models: AR, ARMA, VAR, etc.
GARCH models
...
3. Posterior distributions and inference
3.2. Generalisation to a model
Definition (Posterior distribution, model)
For the sample {yᵢ, xᵢ}_{i=1}^{n}, the posterior distribution is the conditional distribution of θ given y, defined to be:

π(θ|y, x) = Lₙ(θ; y|x) π(θ) / f_{Y|X}(y|x)

where Lₙ(θ; y|x) is the likelihood of the sample,

f_{Y|X}(y|x) = ∫_Θ Lₙ(θ; y|x) π(θ) dθ

and Θ is the support of the distribution of θ.
Sub-Section 3.3
Bayesian updating
3. Posterior distributions and inference
3.3. Bayesian updating
Bayesian updating
A very attractive feature of Bayesian inference is the way in which
posterior distributions are updated as new information becomes available.
3. Posterior distributions and inference
3.3. Bayesian updating
Bayesian updating
As usual, for the first observation y₁ we have

π(θ|y₁) ∝ f(y₁|θ) π(θ)

Next, suppose that a new observation y₂ is obtained, and we wish to compute the posterior distribution given the complete data set, π(θ|y₁, y₂):

π(θ|y₁, y₂) ∝ f(y₁, y₂|θ) π(θ)

The posterior can be rewritten as:

π(θ|y₁, y₂) ∝ f(y₂|y₁, θ) f(y₁|θ) π(θ) ∝ f(y₂|y₁, θ) π(θ|y₁)
3. Posterior distributions and inference
3.3. Bayesian updating
Definition (Bayesian updating)
The posterior distribution is updated as new information becomes available as follows:

π(θ|y₁, .., yₙ) ∝ f(yₙ|y₁, .., y_{n−1}, θ) π(θ|y₁, .., y_{n−1})

If the observations are independent (i.i.d. sample), then f(yₙ|y₁, .., y_{n−1}, θ) = f(yₙ|θ) and

π(θ|y₁, .., yₙ) ∝ f(yₙ|θ) π(θ|y₁, .., y_{n−1})
3. Posterior distributions and inference
3.3. Bayesian updating
Example (Bayesian updating)
Consider an i.i.d. sample (Y₁, .., Yₙ) of binary variables, such that Yᵢ ∼ Be(θ) and:

f_{Yᵢ}(yᵢ; θ) = Pr(Yᵢ = yᵢ) = θ^{yᵢ} (1 − θ)^{1−yᵢ}

We assume that the prior distribution for θ is a Beta distribution B(α, β) with pdf:

π(θ; γ) = (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1},    α, β > 0,  θ ∈ [0, 1]

with γ = (α, β)′ the vector of hyperparameters.

Question: write the posterior of θ given (y₁, y₂) as a function of the posterior given y₁.
3. Posterior distributions and inference
3.3. Bayesian updating
Solution
The likelihood of (y₁, y₂) is:

L(θ; y₁, y₂) = θ^{y₁+y₂} (1 − θ)^{2−y₁−y₂}

The prior distribution is:

π(θ; γ) = (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1}

So the unnormalised posterior distribution is:

π(θ|y₁, y₂) ∝ L(θ; y₁, y₂) π(θ) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1} θ^{y₁+y₂} (1 − θ)^{2−y₁−y₂}
3. Posterior distributions and inference
3.3. Bayesian updating
Solution (cont'd)

π(θ|y₁, y₂) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1} θ^{y₁+y₂} (1 − θ)^{2−y₁−y₂}

or equivalently:

π(θ|y₁, y₂) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α+y₁+y₂−1} (1 − θ)^{β+2−y₁−y₂−1}

So,

θ|y₁, y₂ ∼ B(α + y₁ + y₂, β + 2 − y₁ − y₂)
3. Posterior distributions and inference
3.3. Bayesian updating
Solution (cont'd)

π(θ|y₁, y₂) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α+y₁+y₂−1} (1 − θ)^{β+2−y₁−y₂−1}

θ|y₁, y₂ ∼ B(α + y₁ + y₂, β + 2 − y₁ − y₂)

Given the observation y₁, we have:

π(θ|y₁) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α+y₁−1} (1 − θ)^{β+1−y₁−1}

θ|y₁ ∼ B(α + y₁, β + 1 − y₁)
3. Posterior distributions and inference
3.3. Bayesian updating
Solution (cont'd)

π(θ|y₁, y₂) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α+y₁+y₂−1} (1 − θ)^{β+2−y₁−y₂−1}

π(θ|y₁) ∝ (Γ(α + β) / (Γ(α)Γ(β))) θ^{α+y₁−1} (1 − θ)^{β+1−y₁−1}

The updating mechanism is given by:

π(θ|y₁, y₂) ∝ π(θ|y₁) θ^{y₂} (1 − θ)^{1−y₂}

or equivalently

π(θ|y₁, y₂) ∝ π(θ|y₁) f_{Y₂}(y₂; θ)
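The updating mechanism is easy to verify numerically. The Matlab sketch below (illustrative data; not from the original slides) updates the Beta posterior one observation at a time and checks that it matches the posterior computed from the full sample in one step:

alpha = 2; beta = 2;
y = [1 0 1 1 0];                          % illustrative binary observations
a = alpha; b = beta;
for i = 1:numel(y)                        % sequential Bayesian updating:
    a = a + y(i);                         % today's posterior is
    b = b + 1 - y(i);                     % tomorrow's prior
end
a_batch = alpha + sum(y);                 % one-step (batch) posterior
b_batch = beta + numel(y) - sum(y);
disp([a b; a_batch b_batch])              % the two rows coincide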
Sub-Section 3.4
Large sample
3. Posterior distributions and inference
3.4. Large sample
Large samples
Although all statistical results for Bayesian estimators are necessarily "finite sample" (they are conditioned on the sample data), it remains of interest to consider how the estimators behave in large samples.
What is the behavior of the posterior distribution π(θ|y₁, .., yₙ) when n is large?
3. Posterior distributions and inference
3.4. Large sample
Fact (Large sample)
Greenberg (2008) summarises the behavior of the posterior distribution when n is large as follows:
(1) the prior distribution plays a relatively small role in determining the posterior distribution;
(2) the posterior distribution converges to a degenerate distribution at the true value of the parameter;
(3) the posterior distribution is approximately normally distributed with mean θ̂, the MLE of θ.
3. Posterior distributions and inference
3.4. Large sample
Large samples
What is the intuition behind the first two results?
1. The prior distribution plays a relatively small role in determining the posterior distribution.
2. The posterior distribution converges to a degenerate distribution at the true value of the parameter.
3. Posterior distributions and inference
3.4. Large sample
Definition (Log-likelihood function)
The log-likelihood function is defined to be:

ℓₙ : Θ × ℝⁿ → ℝ
(θ; y₁, .., yₙ) ↦ ℓₙ(θ; y₁, .., yₙ) = ln(Lₙ(θ; y₁, .., yₙ)) = ∑_{i=1}^{n} ln f_Y(yᵢ; θ)
3. Posterior distributions and inference
3.4. Large sample
Large samples
Introduce the mean log-likelihood contribution (cf. chapter 3):

ℓₙ(θ; y) = ∑_{i=1}^{n} ln f_Y(yᵢ; θ) = n ℓ̄(θ; y)

The posterior distribution can be written as

π(θ|y) ∝ exp(n ℓ̄(θ; y)) × π(θ)

where the first term depends on n and the second does not.
3. Posterior distributions and inference
3.4. Large sample
Large samples

π(θ|y) ∝ exp(n ℓ̄(θ; y)) × π(θ)

For large n, the exponential term dominates π(θ):
The prior distribution will play a relatively smaller role than the data (likelihood function) when the sample size is large.
Conversely, the prior distribution has relatively greater weight when n is small.
3. Posterior distributions and inference
3.4. Large sample
If we denote the true value of θ by θ₀, it can be shown that

lim_{n→∞} n ℓ̄(θ; y) = n ℓ̄(θ₀; y)

Then, for n large, we have:

π(θ|y) ∝ exp(n ℓ̄(θ₀; y)) × π(θ)    ∀ θ ∈ Θ

where the exponential term does not depend on θ: whatever the value of θ, the value of π(θ|y) tends to a constant times the prior.
3. Posterior distributions and inference
3.4. Large sample
Example (Large sample)
Consider an i.i.d. sample (Y₁, .., Yₙ) of binary variables, such that Yᵢ ∼ Be(θ) with θ = 0.3 and:

f_{Yᵢ}(yᵢ; θ) = Pr(Yᵢ = yᵢ) = θ^{yᵢ} (1 − θ)^{1−yᵢ}

We assume that the prior distribution for θ is a Beta distribution B(2, 2).

Question: write a Matlab code to illustrate the changes in the posterior distribution with n (a possible solution is sketched below).
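A possible Matlab solution (a sketch, not from the original slides; gammaln is used instead of gamma to avoid overflow for large n):

theta0 = 0.3; alpha = 2; beta = 2;
theta = linspace(0.001, 0.999, 1000);
ns = [10 50 200 1000]; hold on;
for k = 1:numel(ns)
    n = ns(k);
    y = (rand(n,1) < theta0); s = sum(y);       % simulated sample of size n
    a1 = alpha + s; b1 = beta + n - s;          % Beta(a1, b1) posterior
    lp = gammaln(a1+b1) - gammaln(a1) - gammaln(b1) ...
         + (a1-1)*log(theta) + (b1-1)*log(1-theta);
    plot(theta, exp(lp));                       % posterior concentrates as n grows
end
hold off; xlabel('\theta');
legend('n = 10', 'n = 50', 'n = 200', 'n = 1000');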
Sub-Section 3.5
Inference
3. Posterior distributions and inference
3.5. Inference
The output of the Bayesian estimation is the posterior distribution π(θ|y).
1. In some particular cases, the posterior distribution is a standard distribution (Beta, normal, etc.) and its pdf has an analytic form.
2. In other cases, the posterior distribution is obtained from numerical simulations (cf. Section 5).
3. Posterior distributions and inference
3.5. Inference
Whatever the case (analytical or numerical), the outcome of the Bayesian estimation procedure may be:
1. the graph of the posterior density π(θ|y) for all values of θ ∈ Θ;
2. some particular moments (expectation, variance, etc.) or quantiles of this distribution.
3. Posterior distributions and inference
3.5. Inference
[Figure omitted.] Source: Gerali et al. (2010), Credit and Banking in a DSGE Model of the Euro Area, JMCB, 42(s1), 107-141.
3. Posterior distributions and inference
3.5. Inference
But we may also be interested in estimating one parameter of the model. The Bayesian approach to this problem uses the idea of a loss function L(θ̂, θ), where θ̂ is the Bayesian estimator (cf. chapter 2).
3. Posterior distributions and inference
3.5. Inference
Example (Absolute loss function)
The absolute loss function is defined to be:

L(θ̂, θ) = |θ̂ − θ|

Example (Quadratic loss function)
The quadratic loss function is defined to be:

L(θ̂, θ) = (θ̂ − θ)²
3. Posterior distributions and inference
3.5. Inference
Definition (Bayes estimator)
The Bayes estimator of θ is the value of θ̂ that minimizes the expected value of the loss, where the expectation is taken over the posterior distribution of θ; that is, θ̂ is chosen to minimize

E[L(θ̂, θ)] = ∫ L(θ̂, θ) π(θ|y) dθ
3. Posterior distributions and inference
3.5. Inference
Idea
The idea is to minimise the average loss over the possible values of θ:

E[L(θ̂, θ)] = ∫ L(θ̂, θ) π(θ|y) dθ
3. Posterior distributions and inference
3.5. Inference
Under quadratic loss, we minimize

E[L(θ̂, θ)] = ∫ (θ̂ − θ)² π(θ|y) dθ

By differentiating the function with respect to θ̂ and setting the derivative equal to zero,

2 ∫ (θ̂ − θ) π(θ|y) dθ = 0

or equivalently

θ̂ = ∫ θ π(θ|y) dθ = E(θ|y)
3. Posterior distributions and inference
3.5. Inference
Definition (Bayes estimator, quadratic loss)
For a quadratic loss function, the optimal Bayes estimator is the expectation of the posterior distribution:

θ̂ = E(θ|y) = ∫ θ π(θ|y) dθ
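When the posterior is only known up to its normalising constant, the posterior mean can be computed by grid integration. A Matlab sketch (Bernoulli data with a uniform prior, illustrative values; not from the original slides):

n = 100; s = 30;                              % e.g. 30 successes in 100 trials
theta = linspace(0, 1, 2000);
upost = theta.^s .* (1-theta).^(n-s);         % unnormalised posterior
post  = upost / trapz(theta, upost);          % normalise numerically
theta_hat = trapz(theta, theta .* post);      % Bayes estimator E(theta|y)
disp([theta_hat (s+1)/(n+2)])                 % closed form: Beta(s+1, n-s+1) mean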
3. Posterior distributions and inference
3.5. Inference
[Figure omitted.] Source: Greene W. (2007), Econometric Analysis, sixth edition, Pearson Prentice Hall.
3. Posterior distributions and inference
3.5. Inference
In addition to reporting a point estimate of a parameter θ, it is often useful to report an interval estimate of the form

Pr(θ_L ≤ θ ≤ θ_U) = 1 − α

Bayesians call such intervals credibility intervals (or Bayesian confidence intervals) to distinguish them from a quite different concept that appears in frequentist statistics, the confidence interval (cf. chapter 2).
3. Posterior distributions and inference
3.5. Inference
Definition (Bayesian confidence interval)
If the posterior distribution is unimodal, then a Bayesian confidence interval, or credibility interval, on the value of θ is given by the two values θ_L and θ_U such that:

Pr(θ_L ≤ θ ≤ θ_U) = 1 − α

where α is the level of risk.
3. Posterior distributions and inference
3.5. Inference
For a Bayesian, the values of θ_L and θ_U can be determined so as to obtain the desired probability from the posterior distribution.
If more than one pair is possible, the pair that results in the shortest interval may be chosen; such a pair yields the highest posterior density interval (h.p.d.).
3. Posterior distributions and inference
3.5. Inference
Definition (Highest posterior density interval)
The highest posterior density interval (hpd) is the smallest region H such that

Pr(θ ∈ H) = 1 − α

where α is the level of risk. If the posterior distribution is multimodal, the hpd may be disjoint.
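A related, simpler object is the equal-tailed credibility interval, which can be read off the posterior cdf numerically. A Matlab sketch (the posterior below is illustrative, and for a symmetric unimodal posterior the equal-tailed interval is close to the hpd; not from the original slides):

alphaRisk = 0.05;
theta = linspace(0, 1, 5000);
post = theta.^30 .* (1-theta).^70;            % illustrative unnormalised posterior
post = post / trapz(theta, post);
cdf = cumtrapz(theta, post); cdf = cdf/cdf(end);
[cdfu, iu] = unique(cdf);                     % guard against flat tails
thL = interp1(cdfu, theta(iu), alphaRisk/2);       % lower bound (2.5% quantile)
thU = interp1(cdfu, theta(iu), 1 - alphaRisk/2);   % upper bound (97.5% quantile)
fprintf('95%% credibility interval: [%.3f, %.3f]\n', thL, thU)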
3. Posterior distributions and inference
3.5. Inference
[Figure omitted.] Source: Colletaz et Hurlin (2005), Modèles non linéaires et prévision, Rapport pour l'Institut pour la Recherche CDC.
3. Posterior distributions and inference
3.5. Inference
Another basic issue in statistical inference is the prediction of new data values.

Definition (Forecasting)
The general form of the pdf/pmf of Yₙ₊₁ given y₁, .., yₙ is:

f(yₙ₊₁|y) = ∫ f(yₙ₊₁|θ, y) π(θ|y) dθ

where π(θ|y) is the posterior distribution of θ.
3. Posterior distributions and inference
3.5. Inference
Forecasting

f(yₙ₊₁|y) = ∫ f(yₙ₊₁|θ, y) π(θ|y) dθ

1. If Y is a discrete variable, this formula gives the conditional probability of Yₙ₊₁ = yₙ₊₁ given y₁, .., yₙ:

Pr(Yₙ₊₁ = yₙ₊₁|y) = ∫ p(yₙ₊₁|θ, y) π(θ|y) dθ

2. If Y is a continuous variable, this formula gives the conditional density of Yₙ₊₁ given y₁, .., yₙ. In this case, we can compute the expected value of Yₙ₊₁ as:

E(yₙ₊₁|y) = ∫ f(yₙ₊₁|y) yₙ₊₁ dyₙ₊₁
3. Posterior distributions and inference
3.5. Inference
Forecasting

f(yₙ₊₁|y) = ∫ f(yₙ₊₁|θ, y) π(θ|y) dθ

If Yₙ₊₁ and Y₁, .., Yₙ are independent, then f(yₙ₊₁|θ, y) = f(yₙ₊₁|θ) and we have:

f(yₙ₊₁|y) = ∫ f(yₙ₊₁|θ) π(θ|y) dθ
3. Posterior distributions and inference
3.5. Inference
Example (Forecasting)
Consider an i.i.d. sample (Y₁, .., Yₙ) of binary variables, such that Yᵢ ∼ Be(θ). We assume that the prior distribution for θ is a Beta distribution B(α, β). We want to forecast the value of Yₙ₊₁ given the realisations (y₁, .., yₙ).

Question: determine the probability Pr(Yₙ₊₁ = 1|y₁, .., yₙ).
3. Posterior distributions and inference
3.5. Inference
Solution
In this example, the trials are independent, so Yₙ₊₁ is independent of (Y₁, .., Yₙ):

Pr(Yₙ₊₁ = 1|y) = ∫ Pr(Yₙ₊₁ = 1|θ, y) π(θ|y) dθ = ∫ Pr(Yₙ₊₁ = 1|θ) π(θ|y) dθ

The posterior distribution of θ is in the form of a Beta distribution:

π(θ|y) = (Γ(α₁ + β₁) / (Γ(α₁)Γ(β₁))) θ^{α₁−1} (1 − θ)^{β₁−1}

with

α₁ = α + ∑_{i=1}^{n} yᵢ    β₁ = β + n − ∑_{i=1}^{n} yᵢ
Solution (cont'd)
Pr(Y_{n+1} = 1 | y) = ∫ Pr(Y_{n+1} = 1 | θ) π(θ | y) dθ
π(θ | y) = [Γ(α₁ + β₁) / (Γ(α₁) Γ(β₁))] θ^(α₁−1) (1 − θ)^(β₁−1)
Since Pr(Y_{n+1} = 1 | θ) = θ, we have
Pr(Y_{n+1} = 1 | y) = [Γ(α₁ + β₁) / (Γ(α₁) Γ(β₁))] ∫₀¹ θ · θ^(α₁−1) (1 − θ)^(β₁−1) dθ
= [Γ(α₁ + β₁) / (Γ(α₁) Γ(β₁))] ∫₀¹ θ^(α₁) (1 − θ)^(β₁−1) dθ
Solution (cont'd)
Pr(Y_{n+1} = 1 | y) = [Γ(α₁ + β₁) / (Γ(α₁) Γ(β₁))] ∫₀¹ θ^(α₁) (1 − θ)^(β₁−1) dθ
We admit that:
∫₀¹ θ^(α₁) (1 − θ)^(β₁−1) dθ = Γ(α₁ + 1) Γ(β₁) / Γ(α₁ + β₁ + 1)
So, we have
Pr(Y_{n+1} = 1 | y) = [Γ(α₁ + β₁) / (Γ(α₁) Γ(β₁))] × [Γ(α₁ + 1) Γ(β₁) / Γ(α₁ + β₁ + 1)]
Solution (cont'd)
Pr(Y_{n+1} = 1 | y) = [Γ(α₁ + β₁) / (Γ(α₁) Γ(β₁))] × [Γ(α₁ + 1) Γ(β₁) / Γ(α₁ + β₁ + 1)]
Since Γ(p) = (p − 1) Γ(p − 1), we have:
Pr(Y_{n+1} = 1 | y) = [Γ(α₁ + β₁) / (Γ(α₁) Γ(β₁))] × [Γ(α₁) α₁ Γ(β₁) / (Γ(α₁ + β₁) (α₁ + β₁))]
= α₁ / (α₁ + β₁)
Solution (cont'd)
Pr(Y_{n+1} = 1 | y) = α₁ / (α₁ + β₁) = (α + ∑_{i=1}^n y_i) / (α + β + n)
So, we found that
Pr(Y_{n+1} = 1 | y) = E(θ | y) = (α + ∑_{i=1}^n y_i) / (α + β + n)
The estimate of Pr(Y_{n+1} = 1 | y) is the mean of the posterior distribution of θ.
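This closed form is easy to check by simulation: draw θ from the Beta(α₁, β₁) posterior and average the draws. A minimal MATLAB sketch, assuming for illustration a uniform prior (α = β = 1), n = 50 observations and a true θ of 0.3; betarnd requires the Statistics Toolbox:

```matlab
alpha = 1; beta = 1;                      % hypothetical uniform prior
n = 50; y = rand(n, 1) < 0.3;             % simulated Bernoulli(0.3) sample
a1 = alpha + sum(y);                      % posterior parameters
b1 = beta + n - sum(y);
p_analytic = a1 / (a1 + b1);              % (alpha + sum(y)) / (alpha + beta + n)
theta = betarnd(a1, b1, 1e5, 1);          % draws from the Beta posterior
p_mc  = mean(theta);                      % Monte Carlo estimate of Pr(Y_{n+1} = 1 | y)
fprintf('analytic = %.4f, Monte Carlo = %.4f\n', p_analytic, p_mc)
```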
Key concepts of Section 3
1. Marginal and conditional posterior distribution
2. Bayesian updating
3. Bayes estimator
4. Bayesian confidence interval
5. Highest posterior density interval
6. Bayesian prediction
Section 4
Applications
4. Applications
Objectives
The objectives of this section are the following:
1. To discuss the Bayesian estimation of VAR models
2. To propose various priors for this type of model
3. To introduce the issue of numerical simulations of the posterior distribution
Consider a typical VAR(p):
Y_t = B₁Y_{t−1} + B₂Y_{t−2} + .. + B_pY_{t−p} + Dz_t + ε_t,   t = 1, .., T
Y_t is an n × 1 vector of endogenous variables
ε_t is an n × 1 vector of error terms, i.i.d. with ε_t ∼ IIN(0_{n×1}, Σ_{n×n})
z_t is a d × 1 vector of exogenous variables
B_i, for i = 1, .., p, is an n × n matrix of parameters
D is an n × d matrix of parameters
Classical estimation of the parameters B₁, .., B_p, D, Σ may yield imprecisely estimated relations that fit the data well only because of the large number of variables included: a problem known as overfitting.
The number of parameters to be estimated, n(np + d + (n + 1)/2), grows geometrically with the number of variables (n) and proportionally with the number of lags (p).
When the number of parameters is relatively high and the sample information is relatively loose (macro-data), it is likely that the estimates are influenced by noise as opposed to signal.
A Bayesian approach to VAR estimation was originally advocated by Litterman (1980) as a solution to the overfitting problem.
Litterman R. (1980), Techniques for Forecasting with Vector Autoregressions, University of Minnesota, Ph.D. dissertation.
Bayesian estimation is a solution to overfitting that avoids imposing exact zero restrictions on the parameters.
The researcher cannot be sure that some coefficients are zero and should not ignore their possible range of variation.
A Bayesian perspective fits precisely this view through the prior distribution.
Y_t = B₁Y_{t−1} + B₂Y_{t−2} + .. + B_pY_{t−p} + Dz_t + ε_t
Rewrite the VAR in a compact form:
Y_t = X_tβ + ε_t
X_t = I_n ⊗ W'_{t−1} is n × nk, with k = np + d
W_{t−1} = (Y'_{t−1}, .., Y'_{t−p}, z'_t)' is k × 1
β = vec(B₁, .., B_p, D) is an nk × 1 vector
Definition (Likelihood)
Under the normality assumption, the conditional distribution of Y_t = X_tβ + ε_t given X_t and the set of parameters β is normal:
Y_t | X_t, β ∼ N(X_tβ, Σ)
and the likelihood of the sample of realisations (Y_1, .., Y_T), denoted Y, is:
L_T(Y | X, β, Σ) ∝ |Σ|^(−T/2) exp( −(1/2) ∑_{t=1}^T (Y_t − X_tβ)' Σ^(−1) (Y_t − X_tβ) )
By convention, we will denote L_T(Y | X, β, Σ) = L_T(Y, β, Σ).
Definition (Joint posterior distribution)
Given a prior π(β, Σ), the joint posterior distribution of the parameters (β, Σ) is given by
π(β, Σ | Y) = L_T(Y, β, Σ) π(β, Σ) / f(Y)
or equivalently
π(β, Σ | Y) ∝ L_T(Y, β, Σ) π(β, Σ)
Marginal posterior distribution
Given π(β, Σ | Y), the marginal posterior distributions for β and Σ can be obtained by integration:
π(β | Y) = ∫ π(β, Σ | Y) dΣ
π(Σ | Y) = ∫ π(β, Σ | Y) dβ
where dΣ and dβ correspond to integration over all the elements of Σ or β respectively.
Bayesian estimates
The location and dispersion of π(β | Y) and π(Σ | Y) can easily be analysed to yield point estimates (under quadratic loss) of the parameters of interest and measures of precision, comparable to those obtained using the classical approach to estimation. In particular:
E(π(β | Y))     V(π(β | Y))     E(π(Σ | Y))
Two problems arise:
1. The numerical integration of the marginal posterior distribution
2. The choice of the prior
Fact (Numerical integration)
In most cases, the numerical integration of π(β, Σ | Y) may be difficult or even impossible to implement. For instance, if n = 1 and p = 4, we have
π(Σ | Y) = ∫ π(β, Σ | Y) dβ = ∫∫∫∫ π(β, Σ | Y) dβ₁ dβ₂ dβ₃ dβ₄
This problem, however, can often be solved by using numerical integration based on Monte Carlo simulations.
One particular MCMC (Markov Chain Monte Carlo) estimation method is the Gibbs sampler, which is a particular version of the Metropolis-Hastings algorithm (see Section 5).
Definition (Gibbs sampler)
The Gibbs sampler is a recursive Monte Carlo method that allows one to generate simulated values of (β, Σ) from the joint posterior distribution. This method requires only knowledge of the full conditional posterior distributions of the parameters of interest, π(β | Y, Σ) and π(Σ | Y, β).
Definition (Gibbs algorithm)
Suppose that β and Σ are scalar. The Gibbs sampler starts from arbitrary values β⁰, Σ⁰, and samples alternately from the density of each element of the parameter vector, conditional on the value of the other element sampled in the previous iteration. Thus, the Gibbs sampler samples recursively as follows:
β¹ from π(β | Y, Σ⁰)
Σ¹ from π(Σ | Y, β¹)
β² from π(β | Y, Σ¹)
...
β^m from π(β | Y, Σ^(m−1))
Fact (Gibbs sampler)
The vectors (β^m, Σ^m) form a Markov chain and, for a sufficiently large number of iterations m ≥ M, can be regarded as draws from the true joint posterior distribution π(β, Σ | Y). Given a large sample of draws from this limiting distribution, (β^m, Σ^m)_{m=M+1}^{M+G}, any posterior moment or marginal density of interest can then be easily estimated consistently with its corresponding sample average. For instance:
(1/G) ∑_{m=M+1}^{M+G} β^m → E(β | Y)
Remark
The process must be started somewhere, though it does not matter much where.
Nonetheless, a burn-in period is required to eliminate the influence of the starting values, so the first M values (β^m, Σ^m) are discarded.
Example (Gibbs sampler)
We first consider the bivariate normal distribution. Suppose we wish to draw a random sample from the population
(X₁, X₂)' ∼ N( (0, 0)', [1 ρ; ρ 1] )
Question: write a Matlab code to generate a sample of n = 1,000 observations of (x₁, x₂)' with a Gibbs sampler.
Solution
The Gibbs sampler takes advantage of the result:
X₁ | x₂ ∼ N(ρx₂, 1 − ρ²)
X₂ | x₁ ∼ N(ρx₁, 1 − ρ²)
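A minimal MATLAB sketch of this sampler, assuming for illustration ρ = 0.6 and a burn-in of M = 100 draws:

```matlab
rho = 0.6; n = 1000; M = 100;
x  = zeros(n + M, 2);
x2 = 0;                                        % arbitrary starting value
for m = 1:(n + M)
    x1 = rho * x2 + sqrt(1 - rho^2) * randn;   % draw X1 | x2
    x2 = rho * x1 + sqrt(1 - rho^2) * randn;   % draw X2 | x1
    x(m, :) = [x1, x2];
end
x = x(M+1:end, :);                             % discard the burn-in draws
```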
[Figure: scatter plot of the n = 1,000 simulated draws of (x₁, x₂)]
Priors
A second problem in implementing Bayesian estimation of VAR models is the choice of the prior distribution for the model's parameters, π(β, Σ).
We distinguish two types of priors:
1. Some priors lead to an analytical formula for the posterior distribution:
   1. Diffuse prior
   2. Natural conjugate prior
   3. Minnesota prior
2. For other priors there is no analytical formula for the posterior distribution and the Gibbs sampler is required:
   1. Independent Normal-Wishart prior
Definition (Full Bayesian estimation)
A full Bayesian estimation requires specifying a prior distribution for the hyperparameters of the prior distribution of the parameters of interest.
Definition (Empirical Bayesian estimation)
Empirical Bayesian estimation is based on a preliminary estimation of the hyperparameters γ. These estimates could be obtained by OLS, maximum likelihood, GMM or another estimation method. Then, π(θ; γ) is substituted by π(θ; γ̂). Note that the uncertainty in the estimation of the hyperparameters is not taken into account in the posterior distribution.
Empirical Bayesian estimation
In this section, we consider only empirical Bayesian estimation methods:
Y_t = X_tβ + ε_t,   ε_t ∼ IIN(0, Σ)
Notations:
β̂ = ( ∑_{t=1}^T X_t'X_t )^(−1) ( ∑_{t=1}^T X_t'Y_t )   OLS estimator of β
Σ̂ = (1/(T − k)) ∑_{t=1}^T (Y_t − X_tβ̂)(Y_t − X_tβ̂)'   OLS estimator of Σ
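A minimal MATLAB sketch of these OLS quantities, assuming the data have been stacked with one row per observation: Y is T × n and W is T × k, where row t of W contains W'_{t−1} (all variable names hypothetical):

```matlab
[T, n] = size(Y);
k = size(W, 2);
Bhat  = (W' * W) \ (W' * Y);      % k x n: column j holds the OLS coefficients of equation j
E     = Y - W * Bhat;             % T x n matrix of residuals
Sigma = (E' * E) / (T - k);       % n x n OLS estimator of Sigma
beta  = Bhat(:);                  % stacked coefficient vector (one vec convention)
```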
Matlab codes for Bayesian estimation of VAR models are proposed by Koop and Korobilis (2010):
1. Analytical results:
   1. Code BVAR_ANALYT.m gives posterior means and variances of parameters and predictives, using the analytical formulas.
   2. Code BVAR_FULL.m estimates the BVAR model combining all the priors discussed below, and provides predictions and impulse responses.
2. Gibbs sampler: Code BVAR_GIBBS.m estimates this model with the Gibbs sampler and also allows the user to set the prior mean and covariance.
Koop G. and Korobilis D. (2010), Bayesian Multivariate Time Series Methods for Empirical Macroeconomics.
Sub-Section 4.1
Minnesota Prior
Fact (Stylised facts)
Litterman (1986) specifies his prior by appealing to three statistical regularities of macroeconomic time series data:
(1) the trending behavior of macroeconomic time series;
(2) more recent values of a series usually contain more information on the current value of the series than past values;
(3) past values of a given variable contain more information on its current state than past values of other variables.
Litterman R. (1986), Forecasting with Bayesian Vector Autoregressions, JBES, 4, 25-38.
A Bayesian researcher specifies these regularities by assigning a probability distribution to the parameters in such a way that:
1. the mean of the coefficients assigned to all lags other than the first one is equal to zero;
2. the variance of the coefficients depends inversely on the number of lags;
3. the coefficients of variable j in equation g are assigned a lower prior variance than those of variable g.
These requirements will be controlled by the hyperparameters:
π = (π₁, .., π_d)'  ⟹  prior on B₁, .., B_p, D, Σ
Definition (Minnesota prior, Litterman 1986)
In the Minnesota prior, the variance-covariance matrix Σ is assumed to be fixed and equal to Σ̂. Denote by β_k, for k = 1, .., n, the vector of parameters of the kth equation; the corresponding prior distribution is:
β_k ∼ N(β̲_k, Ω_k)
where β̲_k is the prior mean and Ω_k the prior covariance matrix.
Remark
1. Given these assumptions, there is prior and posterior independence between equations.
2. The equations can be estimated separately.
3. By assuming Ω_k^(−1) = 0, the posterior mean of β_k corresponds to the OLS estimator of β_k.
Litterman (1986) then assigns numerical values to the hyperparameters given the previous stylised facts.
For the prior mean:
1. The traditional choice is to set β̲_k = 0 for all the parameters if the kth variable is a growth rate (random walk behavior).
2. If the variable is in level, all the hyperparameters are set to 0, except the parameter associated with the first own lag (typically set to one).
The prior covariance matrix Ω_k is assumed to be diagonal, with elements ω_{i,i} for i = 1, .., k:
ω_{i,i} = a₁ / r²                     for coefficients on own lag r = 1, .., p
        = (a₂ / r²) (σ_{ii} / σ_{jj})   for coefficients on lag r of variable j ≠ i, r = 1, .., p
        = a₃ σ_{ii}                    for coefficients on exogenous variables
This prior simplifies the complicated choice of fully specifying all the elements of Ω_k to choosing three scalars, a₁, a₂ and a₃.
Theorem (Posterior distribution)
If we assume the Minnesota prior, the posterior distribution of β_k is:
β_k | Y ∼ N(β̃_k, Ω̃_k)
β̃_k = Ω̃_k ( Ω_k^(−1) β̲_k + σ_k^(−2) X'Y_k )
Ω̃_k = ( Ω_k^(−1) + σ_k^(−2) X'X )^(−1)
where σ_k² is the kth diagonal element of Σ̂, X is the T × k matrix of regressors and Y_k the vector of observations of the kth endogenous variable.
Sub-Section 4.2
Diffuse Prior
Definition (Diffuse prior)
The diffuse prior is defined as:
π(β, Σ) ∝ |Σ|^(−(n+1)/2)
Remark
π(β, Σ) ∝ |Σ|^(−(n+1)/2)
Contrary to the Minnesota prior:
1. Σ is not assumed to be fixed
2. The equations are not independent
Theorem (Posterior distribution)
For a diffuse prior distribution, the joint posterior distribution is given by:
π(β, Σ | Y) = π(β | Y, Σ) π(Σ | Y)
β | Y, Σ ∼ N(β̂, Σ)
Σ | Y ∼ IW(Σ̂, T − k)
where IW denotes the Inverted Wishart distribution.
Sub-Section 4.3
Natural conjugate prior
Definition (Natural conjugate prior)
The natural conjugate prior has the form:
β | Σ ∼ N(β̲, Σ ⊗ Ω̲)
Σ ∼ IW(Σ̲, α)
with α > n, where IW denotes the Inverted Wishart distribution with α degrees of freedom. The hyperparameters are β̲, Ω̲, Σ̲ and α.
Theorem (Posterior distribution)
The posterior distribution associated with the natural conjugate prior is:
β | Σ, Y ∼ N(β̃, Σ ⊗ Ω̃)
Σ | Y ∼ IW(Σ̃, T + α)
where
β̃ = Ω̃ ( Ω̲^(−1) β̲ + X'X β̂_ols )
Ω̃ = ( Ω̲^(−1) + X'X )^(−1)
Sub-Section 4.4
Independent Normal-Wishart Prior
Definition (Independent Normal-Wishart prior)
The independent Normal-Wishart prior is defined as:
π(β, Σ^(−1)) = π(β) π(Σ^(−1))
where
β ∼ N(β̲, Ω̲)
Σ^(−1) ∼ W(Σ̲^(−1), α)
and W(·, α) denotes the Wishart distribution with α degrees of freedom.
Remark
1. Note that this prior allows the prior covariance matrix Ω̲ to be anything the researcher chooses, rather than the restrictive Σ ⊗ Ω̲ form of the natural conjugate prior. For instance, the researcher could choose a prior similar in spirit to the Minnesota prior.
2. A noninformative prior can be obtained by setting β̲ = Ω̲^(−1) = Σ̲^(−1) = 0.
Theorem (Posterior distribution)
In the case of the independent Normal-Wishart prior, the joint posterior distribution cannot be derived analytically. We can only derive the conditional posterior distributions (used in the Gibbs sampler):
β | Σ, Y ∼ N(β̃, Ω̃)
Σ | Y, β ∼ IW(Σ̃, T + α)
where
β̃ = Ω̃ ( Ω̲^(−1) β̲ + ∑_{t=1}^T X_t' Σ^(−1) Y_t )
Ω̃ = ( Ω̲^(−1) + ∑_{t=1}^T X_t' Σ^(−1) X_t )^(−1)
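A minimal sketch of the resulting Gibbs sampler, under the following assumptions: n, k, T, the burn-in M and the number of retained draws G are set; Y is T × n; the cell array X{t} holds the n × nk matrix X_t; and beta0, Omega0, S0, alpha0 are user-chosen hyperparameters (all names hypothetical). wishrnd requires the Statistics Toolbox:

```matlab
Sigma = eye(n);                              % arbitrary starting value
draws_beta = zeros(n*k, G);                  % storage for post-burn-in draws
iOm0 = inv(Omega0);
for m = 1:(M + G)
    % Step 1: draw beta | Sigma, Y from N(beta_tilde, Omega_tilde)
    A = iOm0; b = iOm0 * beta0;
    for t = 1:T
        A = A + X{t}' * (Sigma \ X{t});
        b = b + X{t}' * (Sigma \ Y(t, :)');
    end
    Omega_tilde = inv(A);
    beta = Omega_tilde * b + chol(Omega_tilde, 'lower') * randn(n*k, 1);
    % Step 2: draw Sigma | beta, Y from the inverted Wishart
    S = S0;
    for t = 1:T
        e = Y(t, :)' - X{t} * beta;
        S = S + e * e';
    end
    Sigma = inv(wishrnd(inv(S), T + alpha0));    % Sigma ~ IW(S, T + alpha0)
    if m > M, draws_beta(:, m - M) = beta; end
end
```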
Section 5
Simulation methods
5. Simulation methods
Objectives
The objectives of this section are the following:
1. Introduce the Probability Integral Transform (PIT) method
2. Introduce the Accept-Reject (AR) method
3. Introduce the Importance sampling method
4. Introduce the Gibbs algorithm
5. Introduce the Metropolis-Hastings algorithm
In Bayesian econometrics, we may distinguish two cases, given the prior:
1. When we use a conjugate prior, the posterior distribution is sometimes "standard" (normal, gamma, beta, etc.) and the standard routines of statistical software may be used to compute the density π(θ | y), its moments (for instance E_π(θ | y)), the hpd, etc.
2. In most cases, the posterior density π(θ | y) is obtained numerically through simulation methods.
Sub-Section 5.1
Probability Integral Transform (PIT) method
Probability Integral Transform (PIT) method
Suppose we wish to draw a sample of values from a continuous random variable X that has cdf F_X(·), assumed to be nondecreasing. The PIT method allows us to generate a sample of X from a sample of random values drawn from U, where U has a uniform distribution over [0, 1]:
U ∼ U[0,1]
Indeed, assume that the random variable U is a function of the random variable X such that:
U = F_X(X)
What is the distribution of U?
F_X(x) = Pr(X ≤ x) = Pr(F_X(X) ≤ F_X(x)) = Pr(U ≤ F_X(x))
So, we have:
Pr(U ≤ F_X(x)) = F_X(x)
We know that if U has a uniform distribution, then Pr(U ≤ u) = u. So, U has a uniform distribution.
Definition (Probability Integral Transform (PIT))
If X is a continuous random variable with cdf F_X(·), then the transformed (probability integral transformation) variable U = F_X(X) has a uniform distribution over [0, 1]:
U = F_X(X) ∼ U[0,1]
How do we get a draw of X from a draw of U?
Definition (PIT algorithm)
In order to get a realisation x of the random variable X with cdf F_X(·) from a realisation u of the variable U, with U ∼ U[0,1], the following procedure has to be adopted:
(1) Draw u from U[0, 1].
(2) Compute x = F_X^(−1)(u).
Example (PIT method)
Suppose we wish to draw a sample from a random variable with density function
f_X(x) = (3/8) x²   if 0 ≤ x ≤ 2
       = 0          otherwise
Question: Write a Matlab code to generate a sample (x₁, .., x_n) with n = 100.
Solution
First, determine the cdf of X:
F_X(x) = (3/8) ∫₀ˣ t² dt = x³/8
So, we have:
U = F_X(X) = X³/8 ∼ U[0,1]
Then, determine the probability inverse transformation:
X = F_X^(−1)(U) = 2U^(1/3)
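A minimal MATLAB sketch of the two PIT steps for this example:

```matlab
n = 100;
u = rand(n, 1);        % step 1: draws from U[0,1]
x = 2 * u.^(1/3);      % step 2: apply the inverse cdf x = F^{-1}(u) = 2u^{1/3}
histogram(x)           % the draws pile up near x = 2, as the density implies
```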
Solution (cont'd)
[Figure: the n = 100 simulated values xᵢ = 2uᵢ^(1/3), all lying in [0, 2]]
Remark
Note that a multivariate random variable cannot be simulated by this method, because its cdf is not one-to-one and therefore not invertible.
Remark
An important application of this method is the problem of sampling from a truncated distribution.
Suppose that X has cdf F_X(x) and that we want to generate restricted values of X such that c₁ ≤ X ≤ c₂.
Remark (cont'd)
The cdf of the truncated variable is
[F_X(x) − F_X(c₁)] / [F_X(c₂) − F_X(c₁)]   for c₁ ≤ x ≤ c₂
Then, we have:
U = [F_X(X) − F_X(c₁)] / [F_X(c₂) − F_X(c₁)] ∼ U[0,1]
and the truncated variable can be defined as:
X_trunc = F_X^(−1)( F_X(c₁) + U (F_X(c₂) − F_X(c₁)) )
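A minimal MATLAB sketch, taking as an illustration a standard normal truncated to a hypothetical interval [c₁, c₂] = [−1, 2]; norminv and normcdf require the Statistics Toolbox:

```matlab
c1 = -1; c2 = 2; n = 1000;
u = rand(n, 1);
x = norminv(normcdf(c1) + u .* (normcdf(c2) - normcdf(c1)));  % F^{-1}(F(c1) + U(F(c2) - F(c1)))
% all(x >= c1 & x <= c2) returns true
```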
Recommendation
1. Why use the PIT method?
- To generate a sample of simulated values from a given distribution,
- when this distribution is not available in your statistical software.
2. What are the prerequisites of the PIT?
- The functional form of the cdf is known, and we need to know F_X^(−1)(·).
Sub-Section 5.2
Accept-reject method
Definition (Accept-reject method)
The accept-reject (AR) algorithm can be used to simulate values from a density function f_X(·) if it is possible to simulate values from a density g(·) and if a number c can be found such that
f_X(x) ≤ c g(x)
for all x in the support of f_X(·).
Definition (Target and source densities)
The density f_X(·) is called the target density and g(·) is called the source density.
Posterior distribution
In the context of Bayesian econometrics, we have:
π(θ | y) ≤ c g(θ | y)   ∀ θ ∈ Θ
c = sup_{θ∈Θ} π(θ | y) / g(θ | y)
where:
1. Target density = posterior distribution π(θ | y)
2. Source density = g(θ | y)
Definition (AR algorithm, posterior distribution)
The Accept-Reject algorithm for the posterior distribution is the following:
(1) Generate a value θ^s from g(θ | y).
(2) Draw a value u from U[0,1].
(3) Return θ^s as a draw from π(θ | y) if
u ≤ π(θ^s | y) / (c g(θ^s | y))
If not, reject it and return to step 1. (The effect of this step is to accept θ^s with probability π(θ^s | y)/(c g(θ^s | y)).)
Example (AR algorithm)
We aim at simulating realisations from a N(0, 1) distribution (target distribution) with a source density given by a Laplace distribution with pdf:
g(x) = (1/2) exp(−|x|)
Question: write a Matlab code to simulate a sample of n = 1,000 realisations of the normal distribution.
Solution
In order to simplify the problem, we generate only positive values of x. The source density can then be transformed as follows (exponential density with λ = 1):
g(x) = exp(−x)
Since the normal and the Laplace are symmetric about zero, if the proposal (x > 0) is accepted, it is assigned a positive value with probability one half and a negative value with probability one half.
Solution (cont'd)
The pdfs of the target and source distributions are, for x > 0:
f(x) = (1/√(2π)) exp(−x²/2)
g(x) = exp(−x)
Determination of c: determine the maximum value of f(x)/g(x):
f(x)/g(x) = (1/√(2π)) exp(−x²/2 + x)
The maximum is reached at x = 1, so we have:
c = (1/√(2π)) exp(1/2)
[Figure: the ratio f(x)/g(x) for x ∈ [0, 10], which peaks at x = 1]
[Figure: f(x) and c·g(x) for x ∈ [0, 10]; c·g(x) envelopes f(x)]
Solution (cont'd)
The AR algorithm is the following:
1. Generate x from an exponential distribution with parameter λ = 1.
2. Generate u from a uniform distribution U[0,1].
3. If
u ≤ f(x)/(c g(x)) = [ (1/√(2π)) exp(−x²/2) ] / [ (1/√(2π)) exp(1/2) exp(−x) ] = exp(−x²/2 + x − 1/2) = exp(−(x − 1)²/2)
then accept x; with an independent uniform draw v, return x if v > 1/2 and −x if v ≤ 1/2. Otherwise, reject x.
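A minimal MATLAB sketch of this AR scheme, using the simplification f(x)/(c g(x)) = exp(−(x − 1)²/2) and an independent uniform for the sign:

```matlab
n = 1000; x = zeros(n, 1); i = 0;
while i < n
    z = -log(rand);                       % exponential(1) proposal (PIT draw)
    u = rand;
    if u <= exp(-(z - 1)^2 / 2)           % accept with probability f(z)/(c*g(z))
        i = i + 1;
        s = 2 * (rand > 0.5) - 1;         % independent random sign
        x(i) = s * z;
    end
end
```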
[Figure: left panel, the generated sample of 1,000 draws; right panel, kernel density estimate of the simulated sample]
Fact (Unnormalised posterior distribution)
An interesting feature of the AR algorithm is that it allows us to simulate values of θ from the posterior distribution π(θ | y) by using only the unnormalised posterior:
π(θ | y) = f(y | θ) π(θ) / f(y)
π(θ | y) ∝ f(y | θ) π(θ)
Intuition
π(θ | y) = [1/f(y)] × f(y | θ) π(θ),  where 1/f(y) is unknown and f(y | θ) π(θ) is known.
If a value of θ generated from g(θ | y) is accepted with probability f(y | θ) π(θ)/(c g(θ | y)), the accepted values of θ are a sample from π(θ | y).
This method can therefore be used even if the normalizing constant of the target distribution is unknown.
Example (AR and posterior distribution)
Consider an i.i.d. sample (Y₁, .., Y_n) of binary variables, such that Y_i ∼ Be(θ) with θ = 0.3. We assume that the (uninformative) prior distribution for θ is a uniform distribution over [0, 1]. Then, we have
π(θ | y) = L_n(θ; y₁, .., y_n) π(θ) / f(y₁, .., y_n) = θ^(∑yᵢ) (1 − θ)^(∑(1−yᵢ)) / ∫₀¹ θ^(∑yᵢ) (1 − θ)^(∑(1−yᵢ)) dθ
We can use the pdf of the unnormalised posterior π̃(θ | y) to simulate values (θ¹, .., θ^S) from the posterior distribution:
π̃(θ | y) = θ^(∑yᵢ) (1 − θ)^(∑(1−yᵢ))
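A minimal MATLAB sketch of the AR draws from this unnormalised posterior, with a U[0,1] source density; the bound c is the unnormalised posterior evaluated at its mode, here ∑yᵢ/n (hypothetical sample of n = 50):

```matlab
n = 50; y = rand(n, 1) < 0.3; s = sum(y);
post = @(t) t.^s .* (1 - t).^(n - s);     % unnormalised posterior
c    = post(s / n);                       % bound: post(t) <= c * 1 on [0, 1]
S = 5000; draws = zeros(S, 1); i = 0;
while i < S
    t = rand;                             % proposal from the uniform source
    if rand <= post(t) / c                % accept-reject step
        i = i + 1; draws(i) = t;
    end
end
```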
Recommendation
1. Why use the Accept-Reject method?
- To generate a sample of simulated values from a given distribution,
- when this distribution is not available in your statistical software,
- or to generate samples of θ from the unnormalised posterior distribution.
2. What are the prerequisites of the AR method?
- The functional form of the pdf of the target distribution is known.
Sub-Section 5.3
Importance sampling
Suppose that one is interested in calculating the value of the integral
I = E(h(θ) | y) = ∫_Θ h(θ) π(θ | y) dθ
where h(θ) is a continuous function.
Example (Importance sampling)
Suppose that we want to compute the expectation and the variance of the posterior distribution:
E(θ | y) = ∫_Θ θ π(θ | y) dθ
E(θ² | y) = ∫_Θ θ² π(θ | y) dθ
V(θ | y) = E(θ² | y) − E²(θ | y)
I = ∫_Θ h(θ) π(θ | y) dθ
Consider a source density g(θ | y) that is easy to sample from and which is a close match to π(θ | y). Write:
I = ∫_Θ [ h(θ) π(θ | y) / g(θ | y) ] g(θ | y) dθ
This integral can be approximated by drawing a sample of G values from g(θ | y), denoted θ_g^1, .., θ_g^G, and computing:
I ≈ (1/G) ∑_{i=1}^G h(θ_g^i) π(θ_g^i | y) / g(θ_g^i | y)
Definition (Importance sampling)
The moment of the posterior distribution
E(h(θ) | y) = ∫_Θ h(θ) π(θ | y) dθ
can be approximated by drawing G realisations θ_g^1, .., θ_g^G from a source density g(θ | y) and computing:
E(h(θ) | y) ≈ (1/G) ∑_{i=1}^G h(θ_g^i) π(θ_g^i | y) / g(θ_g^i | y)
This expression can be regarded as a weighted average of the h(θ_g^i), where the importance weights are π(θ_g^i | y)/g(θ_g^i | y).
Example (Truncated exponential)
Consider a continuous random variable X with an exponential distribution X ∼ exp(1) truncated to [0, 1]. We want to approximate
E( 1/(1 + X²) )
by the importance sampling method with a source density B(2, 3), because it is defined over [0, 1] and because, for this choice of parameters, the match between the beta density and the target density is good.
Question: write a Matlab code to approximate this integral. Compare it to the value obtained by numerical integration.
Solution
For an exponential distribution with a rate parameter of 1, we have
f(x) = exp(−x),   F(x) = 1 − exp(−x)   for x > 0
If this density is truncated to [0, 1] (truncated at the right), we have
π(x) = f(x)/F(1) = exp(−x) / (1 − exp(−1))
So, we aim at computing:
E( 1/(1 + X²) ) = ∫₀¹ [1/(1 + x²)] [exp(−x)/(1 − exp(−1))] dx
Solution (cont'd)
The importance sampling algorithm is the following:
1. Generate a sample of G values x₁, .., x_G from a Beta distribution B(2, 3).
2. Compute
E( 1/(1 + X²) ) ≈ (1/G) ∑_{i=1}^G h(xᵢ) π(xᵢ)/g(xᵢ)
with h(xᵢ) = 1/(1 + xᵢ²), π(xᵢ) = exp(−xᵢ)/(1 − exp(−1)) and g(xᵢ) = g_{2,3}(xᵢ), where g_{α,β}(x) is the pdf of the B(α, β) distribution evaluated at x:
g_{α,β}(x) = x^(α−1) (1 − x)^(β−1) / B(α, β)
and B(α, β) is the beta function.
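A minimal MATLAB sketch of this estimator, compared with numerical integration; betarnd and betapdf require the Statistics Toolbox:

```matlab
G   = 1e5;
x   = betarnd(2, 3, G, 1);                     % draws from the source B(2,3)
h   = 1 ./ (1 + x.^2);
ppi = exp(-x) / (1 - exp(-1));                 % truncated exponential target density
w   = ppi ./ betapdf(x, 2, 3);                 % importance weights pi(x)/g(x)
I_is  = mean(h .* w);
I_num = integral(@(t) (1 ./ (1 + t.^2)) .* exp(-t) / (1 - exp(-1)), 0, 1);
fprintf('importance sampling = %.5f, numerical = %.5f\n', I_is, I_num)
```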
Recommendation
1. Why use the importance sampling method?
- To compute the moments of a given distribution (typically the posterior distribution).
2. What are the prerequisites of importance sampling?
- The functional form of the pdf of the target distribution is known.
End of Chapter 7