Chapter 7: Bayesian Econometrics
Christophe Hurlin, University of Orléans
June 26, 2014

Section 1. Introduction

The outline of this chapter is the following:
Section 2. Prior and posterior distributions
Section 3. Posterior distributions and inference
Section 4. Applications (VAR and DSGE)
Section 5. Numerical simulations

References
Geweke J. (2005), Contemporary Bayesian Econometrics and Statistics. New York: John Wiley and Sons. (advanced)
Geweke J., Koop G. and Van Dijk H. (2011), The Oxford Handbook of Bayesian Econometrics. Oxford University Press.
Greene W. (2007), Econometric Analysis, sixth edition. Pearson - Prentice Hall.
Greenberg E. (2008), Introduction to Bayesian Econometrics. Cambridge University Press. (recommended)
Koop G. (2003), Bayesian Econometrics. New York: John Wiley and Sons.
Lancaster T. (2004), An Introduction to Modern Bayesian Inference. Oxford University Press.

Notations: in this course, I will (try to...) follow some conventions of notation:

  Y       random variable
  y       realisation
  fY(y)   probability density function or probability mass function
  FY(y)   cumulative distribution function
  Pr(.)   probability
  y       vector
  Y       matrix

Problem: this system of notation does not allow one to distinguish between a vector (matrix) of random elements and a vector (matrix) of non-stochastic elements (realisations). See Abadir and Magnus (2002), "Notation in econometrics: a proposal for a standard", Econometrics Journal.

Section 2. Prior and posterior distributions

Objectives. The objectives of this section are the following:
1. Introduce the concept of prior distribution
2. Define the hyperparameters
3. Define the concept of posterior distribution
4. Define the concept of unnormalised posterior distribution

In statistics, there is a distinction between two concepts of probability:
1. Frequentist probability
2. Subjective probability

Frequentist probability. Frequentists restrict the assignment of probabilities to statements that describe the outcome of an experiment that can be repeated.

Example. Statement A1: "A coin tossed three times will come up heads either two or three times." We can imagine repeating the experiment of tossing a coin three times and recording the number of times that two or three heads were reported. Then:

  Pr(A1) = lim_{n→∞} (number of times two or three heads occur) / n

Subjective probability.
1. Those who take the subjective view of probability believe that probability theory is applicable to any situation in which there is uncertainty.
2. Outcomes of repeated experiments fall in that category, but so do statements about tomorrow's weather, which are not the outcomes of repeated experiments.
3. Calling probabilities "subjective" does not imply that they can be set arbitrarily: probabilities set in accordance with the axioms below are mutually consistent.

Reminder. The probability of event A is denoted Pr(A). Probabilities are assumed to satisfy the following axioms:
1. 0 ≤ Pr(A) ≤ 1
2. Pr(A) = 1 if A represents a logical truth
3. If A and B describe disjoint events (events that cannot both occur), then Pr(A ∪ B) = Pr(A) + Pr(B)
4. Let Pr(A|B) denote "the probability of A, given (or conditioned on the assumption) that B is true." Then

  Pr(A|B) = Pr(A ∩ B) / Pr(B)

Definition. The union of A and B is the event that A or B (or both) occur; it is denoted by A ∪ B.

Definition. The intersection of A and B is the event that both A and B occur; it is denoted by A ∩ B.

Questions:
1. What is the fundamental idea of the posterior distribution?
2. How can it be computed from the likelihood function and the prior distribution?

Example (Subjective view of probability). Let Y be a binary variable with Y = 1 if a coin toss results in a head and 0 otherwise, and let

  Pr(Y = 1) = θ    Pr(Y = 0) = 1 − θ

which is assumed to be constant for each trial. In this model, θ is a parameter and the value of Y is the data (realisation y).

Example (cont'd). Under these assumptions, Y is said to have a Bernoulli distribution, written Y ∼ Be(θ), with probability mass function (pmf)

  fY(y; θ) = Pr(Y = y) = θ^y (1 − θ)^(1−y)

We consider a sample of i.i.d. variables (Y1, .., Yn) that corresponds to n repeated experiments. The realisation is denoted (y1, .., yn).

Frequentist approach.
1. From the frequentist point of view, probability theory can tell us something about the distribution of the data for a given θ: Y ∼ Be(θ).
2. The parameter θ is an unknown number between zero and one.
3. It is not given a probability distribution of its own, because it is not regarded as being the outcome of a repeated experiment.

Fact (Frequentist approach). In a frequentist approach, the parameters θ are considered as constant terms and the aim is to study the distribution of the data given θ, through the likelihood of the sample.

Reminder. The likelihood function is defined to be:

  Ln : Θ × R^n → R+
  (θ; y1, .., yn) ↦ Ln(θ; y1, .., yn) = f_{Y1,..,Yn}(y1, y2, .., yn; θ)

Under the i.i.d. assumption,
  Ln(θ; y1, .., yn) = ∏_{i=1}^n fY(yi; θ)

Remark: the (log-)likelihood function depends on two arguments: the sample (realisation) and θ.

Reminder. In our example, Y is a discrete variable and the likelihood can be interpreted as the joint probability of observing the sample (realisation) (y1, .., yn) given a value of θ:

  Ln(θ; y1, .., yn) = Pr((Y1 = y1) ∩ ... ∩ (Yn = yn))

If the sample (Y1, .., Yn) is i.i.d., then:

  Ln(θ; y1, .., yn) = ∏_{i=1}^n Pr(Yi = yi) = ∏_{i=1}^n fY(yi; θ)

Reminder. In our example, the likelihood of the sample (y1, .., yn) is

  Ln(θ; y1, .., yn) = ∏_{i=1}^n fY(yi; θ) = ∏_{i=1}^n θ^{yi} (1 − θ)^{1−yi}

or equivalently

  Ln(θ; y1, .., yn) = θ^{Σyi} (1 − θ)^{Σ(1−yi)}

Notations: in the rest of the chapter, I will use the following alternative notations:

  Ln(θ; y) ≡ L(θ; y1, .., yn) ≡ Ln(θ)
  ℓn(θ; y) ≡ ln Ln(θ; y) ≡ ln L(θ; y1, .., yn) ≡ ln Ln(θ)

Subjective approach. From the subjective point of view, however, θ is an unknown quantity. Since there is uncertainty over its value, it can be regarded as a random variable and assigned a probability distribution. Before seeing the data, it is assigned a prior distribution π(θ), with 0 ≤ θ ≤ 1.

Definition (Prior distribution). In a Bayesian framework, the parameters θ associated with the distribution of the data are considered as random variables. Their distribution is called the prior distribution and is denoted π(θ).

Remark. In our example, the endogenous variable Yi is discrete (0 or 1), Yi ∼ Be(θ), but the parameter θ = Pr(Yi = 1) can be considered as a continuous random variable over [0, 1]: in this case π(θ) is a pdf.

Example (Prior distribution). For instance, we may postulate that θ ∼ U[0,1].

Remark. Whatever the type of the distribution of the endogenous variable (discrete or continuous), the prior distribution is generally continuous: π(θ) is a probability density function (pdf).

Definition (Uninformative prior). An uninformative prior is a flat prior. Example: θ ∼ U[0,1].

Remark. In most cases, the prior distribution is parametrised, i.e. the pdf π(θ; γ) depends on a set of parameters γ.
Definition (Hyperparameters). The parameters of the prior distribution, called hyperparameters, are supplied by the researcher.

Example (Hyperparameters). If θ ∈ R and the prior distribution is normal,

  π(θ; γ) = (1/(σ√(2π))) exp( −(θ − µ)² / (2σ²) )

with γ = (µ, σ²)⊤ the vector of hyperparameters.

Example (Beta prior distribution). If θ ∈ [0, 1], a common (parametrised) prior distribution is the Beta distribution, denoted B(α, β):

  π(θ; γ) = (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1},  α, β > 0,  θ ∈ [0, 1]

with γ = (α, β)⊤ the vector of hyperparameters.

Beta distribution: reminder. The gamma function, denoted Γ(p), is defined to be:

  Γ(p) = ∫_0^{+∞} e^{−x} x^{p−1} dx,  p > 0

with:

  Γ(p) = (p − 1) Γ(p − 1)
  Γ(p) = (p − 1)! = (p − 1)(p − 2) ... 2 × 1  if p ∈ N
  Γ(1/2) = √π

The Beta distribution B(α, β) has very interesting features:
1. Depending on the choice of α and β, this prior can capture beliefs that indicate θ is centered at 1/2, or it can shade θ toward zero or one;
2. It can be highly concentrated, or it can be spread out;
3. When both parameters are less than one, it can have two modes.

The shape of a Beta distribution can be understood by examining its mean and variance. If θ ∼ B(α, β),

  E(θ) = α/(α + β)    V(θ) = αβ / ((α + β)² (α + β + 1))

1. The mean is equal to 1/2 if α = β;
2. A larger α (β) shades the mean toward 1 (0);
3. The variance decreases as α or β increases.

[Figure: Beta(α, β) densities for (α, β) = (0.5, 0.5), (1, 1), (5, 5), (30, 30), (10, 5), (1, 30)]

Remark. In some models, the hyperparameters are stochastic: like the parameters of interest θ, they have a distribution of their own. These models are called hierarchical models.

Definition (Hierarchical models). In a hierarchical model, we add one or more additional levels, where the hyperparameters themselves are given a prior distribution depending on another set of hyperparameters.

Example (Hierarchical model). An example of a hierarchical model is given by:

  y : pdf f(y | θ)
  θ : pdf π(θ | α)
  α : pdf π(α | β)

where the hyperparameters β are constant terms.
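A minimal Matlab sketch of simulating one draw through such a hierarchy (the specific distributions below are illustrative choices, not taken from the slides; gamrnd and betarnd require the Statistics and Machine Learning Toolbox):

  % Simulate one path through a toy hierarchy: alpha -> theta -> y
  rng(1);
  b1 = 2; b2 = 2;                  % fixed top-level hyperparameters (beta)
  a     = gamrnd(b1, b2);          % alpha | beta  ~ Gamma(b1, b2)  (illustrative)
  theta = betarnd(a, a);           % theta | alpha ~ Beta(alpha, alpha)  (illustrative)
  y     = rand(10, 1) < theta;     % y | theta ~ Bernoulli(theta)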
Definition (Posterior distribution). Bayesian inference centers on the posterior distribution π(θ|y), which is the conditional distribution of the random variable θ given the data (realisation of the sample) y = (y1, .., yn):

  θ | (Y1 = y1, .., Yn = yn)  ∼  posterior distribution

Remark. When there is more than one parameter, the posterior distribution is a joint conditional distribution of all the parameters given the observed data.

Notations:

  π(θ)     prior distribution = pdf of θ
  π(θ|y)   posterior distribution = conditional pdf
  Discrete endogenous variable:   p(y; θ) probability mass function (pmf); Pr(A) probability of event A
  Continuous endogenous variable: fY(y; θ) probability density function (pdf); FY(y; θ) cumulative distribution function

Warning. For all the general definitions and the general results, we will employ the notation fY(y; θ) for both probability mass and density functions. Be careful with the interpretation of fY(y; θ): it is a density (continuous case) or directly a probability (discrete case).

Example (Discrete random variable). If Y ∼ Be(θ), the pmf, given by fY(y; θ) = Pr(Y = y) = θ^y (1 − θ)^{1−y}, is a probability.

Example (Continuous random variable). If Y ∼ N(µ, σ²) with θ = (µ, σ²)⊤, the pdf

  fY(y; θ) = (1/(σ√(2π))) exp( −(y − µ)²/(2σ²) )

is not a probability. The probability is given by

  Pr(Y ≤ y) = FY(y; θ) = ∫_{−∞}^{y} fY(x; θ) dx    Pr(Y = y) = 0

The posterior density function π(θ|y) is computed by Bayes' theorem.

Theorem (Bayes' theorem). For events A and B, the conditional probability of event A given that B has occurred is

  Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)

By setting A = θ and B = y, we have:

  π(θ|y) = f_{Y|θ}(y|θ) π(θ) / fY(y)    where fY(y) = ∫ f_{Y|θ}(y|θ) π(θ) dθ

Definition (Posterior distribution). For one observation yi, the posterior distribution is the conditional distribution of θ given yi, defined to be:

  π(θ|yi) = f_{Yi|θ}(yi|θ) π(θ) / f_{Yi}(yi)    where f_{Yi}(yi) = ∫_Θ f_{Yi|θ}(yi|θ) π(θ) dθ

and Θ is the support of the distribution of θ.
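To see Bayes' rule at work on a toy discrete parameter space (illustrative numbers, not from the slides), take a two-point prior for θ and a single Bernoulli observation y = 1:

  % Toy numeric check of Bayes' rule with a two-point prior on theta
  prior = [0.5 0.5];                         % Pr(theta = 0.2) and Pr(theta = 0.7)
  lik   = [0.2 0.7];                         % Pr(y = 1 | theta) for the observation y = 1
  post  = lik .* prior / sum(lik .* prior)   % posterior: [0.2222 0.7778]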
Remark. In

  π(θ|yi) = f_{Yi|θ}(yi|θ) π(θ) / f_{Yi}(yi)

the term f_{Yi|θ}(yi|θ) corresponds to the likelihood of the observation yi:

  f_{Yi|θ}(yi|θ) = Li(θ; yi)

Remark. The effect of dividing by f_{Yi}(yi) is to make π(θ|yi) a normalised probability distribution: integrating π(θ|yi) with respect to θ yields 1:

  ∫ π(θ|yi) dθ = ∫ f_{Yi|θ}(yi|θ) π(θ) / f_{Yi}(yi) dθ = f_{Yi}(yi) / f_{Yi}(yi) = 1

since f_{Yi}(yi) = ∫_Θ f_{Yi|θ}(yi|θ) π(θ) dθ.

Discrete variable. For a discrete random variable Yi, by setting A = θ and B = yi, we have:

  π(θ|yi)  =  p(yi|θ) × π(θ) / p(yi)
  (posterior pdf = conditional probability × prior pdf / probability)

where p(yi) = ∫ p(yi|θ) π(θ) dθ.

Definition (Prior predictive distribution). The term p(yi) or f_{Yi}(yi) is sometimes called the prior predictive distribution:

  p(yi) = ∫ p(yi|θ) π(θ) dθ    f_{Yi}(yi) = ∫ f_{Yi|θ}(yi|θ) π(θ) dθ

Remark.
1. In general, we consider an i.i.d. sample Y = (Y1, .., Yn) with a realisation (data) y = (y1, .., yn), and not only one observation.
2. In this case, the posterior distribution can be written as a function of the likelihood of the i.i.d. sample y = (y1, .., yn):

  Ln(θ; y1, .., yn) = ∏_{i=1}^n Li(θ; yi) = ∏_{i=1}^n f_{Yi}(yi; θ)

Definition (Posterior distribution, sample). For a sample (y1, .., yn), the posterior distribution is the conditional distribution of θ given (y1, .., yn), defined to be:

  π(θ|y1, .., yn) = Ln(θ; y1, .., yn) π(θ) / f_{Y1,..,Yn}(y1, .., yn)

where Ln(θ; y1, .., yn) is the likelihood of the sample,

  f_{Y1,..,Yn}(y1, .., yn) = ∫_Θ Ln(θ; y1, .., yn) π(θ) dθ

and Θ is the support of the distribution of θ.

Notations. For simplicity, we skip the notation (y1, .., yn) for the sample and write only the generic term y:

  π(θ|y) = f_{Y|θ}(y|θ) π(θ) / fY(y) = Ln(θ; y) π(θ) / fY(y)

Remark. In this setting, the data (y1, .., yn) are viewed as constants whose (marginal) distribution does not involve the parameters of interest θ:

  f_{Y1,..,Yn}(y1, .., yn) = constant
Remark (cont'd). As a consequence, Bayes' theorem,

  Pr(parameters|data) = Pr(data|parameters) Pr(parameters) / Pr(data)

implies that

  Pr(parameters|data) ∝ Pr(data|parameters) × Pr(parameters)
  (posterior density  ∝ likelihood × prior density)

where the symbol "∝" means "is proportional to."

Definition (Unnormalised posterior distribution). The unnormalised posterior distribution is the product of the likelihood of the sample and the prior distribution:

  π(θ|y1, .., yn) ∝ Ln(θ; y1, .., yn) π(θ)

or, with the simplest notations, π(θ|y) ∝ Ln(θ; y) π(θ), where the symbol "∝" means "is proportional to."

Example (Posterior distribution). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) and f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^{yi}(1 − θ)^{1−yi}. We assume that the (uninformative) prior distribution for θ is a (continuous) uniform distribution over [0, 1]. Question: write the pdf associated with the unnormalised posterior distribution and the posterior distribution.

Solution. The sample (y1, .., yn) is i.i.d., so its likelihood is defined to be:

  Ln(θ; y1, .., yn) = ∏_{i=1}^n f_{Yi}(yi) = ∏_{i=1}^n θ^{yi}(1 − θ)^{1−yi}

So we have:

  Ln(θ; y1, .., yn) = θ^{Σyi} (1 − θ)^{Σ(1−yi)}

Solution (cont'd). The (uninformative) prior distribution is θ ∼ U[0,1], with pdf:

  π(θ) = 1 if θ ∈ [0, 1], 0 otherwise

[Figure: pdf of the uniform U[0,1] distribution. Source: Wikipedia]

Solution (cont'd). The unnormalised posterior distribution is

  Ln(θ; y1, .., yn) π(θ) = θ^{Σyi} (1 − θ)^{Σ(1−yi)} if θ ∈ [0, 1], 0 otherwise

Solution (cont'd). The joint density of (Y1, .., Yn) evaluated at (y1, .., yn) is

  f_{Y1,..,Yn}(y1, .., yn) = ∫_0^1 f_{Y1,..,Yn|θ}(y1, .., yn|θ) π(θ) dθ
                           = ∫_0^1 Ln(θ; y1, .., yn) π(θ) dθ
                           = ∫_0^1 θ^{Σyi} (1 − θ)^{Σ(1−yi)} dθ

Solution (cont'd). Finally, we have

  π(θ|y1, .., yn) = θ^{Σyi}(1 − θ)^{Σ(1−yi)} / ∫_0^1 θ^{Σyi}(1 − θ)^{Σ(1−yi)} dθ  if θ ∈ [0, 1]
  π(θ|y1, .., yn) = 0  if θ ∉ [0, 1]

Example (Posterior distribution). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) with θ = 0.3. We assume that the (uninformative) prior distribution for θ is a uniform distribution over [0, 1]. Question: write a Matlab code (i) to generate a sample of size n = 100 and (ii) to display the pdf associated with the prior and the posterior distribution.
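One possible answer (a minimal sketch; the seed is arbitrary, and the normalising constant is computed by numerical integration rather than analytically):

  % (i) Generate n = 100 Bernoulli(0.3) draws; (ii) plot the posterior under a U[0,1] prior
  rng(1); theta0 = 0.3; n = 100;
  y = rand(n,1) < theta0; s = sum(y);
  theta  = linspace(0, 1, 1000);
  kernel = theta.^s .* (1-theta).^(n-s);       % unnormalised posterior (flat prior)
  post   = kernel / trapz(theta, kernel);      % normalise by numerical integration
  subplot(2,1,1), plot(theta, kernel), title('Unnormalised posterior distribution')
  subplot(2,1,2), plot(theta, post), title('Posterior distribution'), xlabel('\theta')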
[Figure: unnormalised posterior distribution (scale of order 10^-28)]
[Figure: normalised posterior distribution]

Example (Posterior distribution). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) and f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^{yi}(1 − θ)^{1−yi}. We assume that the prior distribution for θ is a Beta distribution B(α, β) with pdf:

  π(θ; γ) = (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1},  α, β > 0,  θ ∈ [0, 1]

with γ = (α, β)⊤ the vector of hyperparameters. Question: write the pdf associated with the unnormalised posterior distribution.

Solution. The likelihood of the sample (y1, .., yn) is:

  Ln(θ; y1, .., yn) = θ^{Σyi} (1 − θ)^{Σ(1−yi)}

The prior distribution is:

  π(θ; γ) = (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1}

So the unnormalised posterior distribution is:

  π(θ|y1, .., yn) ∝ Ln(θ; y1, .., yn) π(θ)
                  ∝ (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1} θ^{Σyi} (1 − θ)^{Σ(1−yi)}

Solution (cont'd). Or equivalently:

  π(θ|y1, .., yn) ∝ (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1+Σyi} (1 − θ)^{β−1+Σ(1−yi)}

Note that the term Γ(α + β)/(Γ(α)Γ(β)) does not depend on θ. So the unnormalised posterior can also be written as:

  π(θ|y1, .., yn) ∝ θ^{(α+Σyi)−1} (1 − θ)^{(β+Σ(1−yi))−1}

Solution (cont'd). Recall that the pdf of a Beta B(α, β) distribution is (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1}(1 − θ)^{β−1}. The posterior distribution is therefore in the form of a Beta distribution with parameters

  α1 = α + Σ_{i=1}^n yi    β1 = β + n − Σ_{i=1}^n yi

This is an example of a conjugate prior, where the posterior distribution is in the same family as the prior distribution.

Definition (Conjugate prior). A conjugate prior is such that the posterior distribution is in the same family as the prior distribution.
Example (Posterior distribution). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) and f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^{yi}(1 − θ)^{1−yi}. We assume that the prior distribution for θ is a Beta distribution B(α, β). Question: determine the mean of the posterior distribution.

Solution. We know that:

  π(θ|y1, .., yn) ∝ θ^{α1−1} (1 − θ)^{β1−1}    with α1 = α + Σ_{i=1}^n yi, β1 = β + n − Σ_{i=1}^n yi

A simple way to normalise this posterior distribution consists in writing:

  π(θ|y1, .., yn) = (Γ(α1 + β1)/(Γ(α1)Γ(β1))) θ^{α1−1} (1 − θ)^{β1−1}

This expression corresponds to the pdf of a B(α1, β1) distribution, and it is normalised to 1 by definition:

  ∫_0^1 π(θ|y1, .., yn) dθ = 1

Solution (cont'd). Using the properties of the B(α1, β1) distribution, we have:

  E(θ|y1, .., yn) = α1/(α1 + β1) = (α + Σ_{i=1}^n yi) / (α + Σ_{i=1}^n yi + β + n − Σ_{i=1}^n yi)
                  = (α + Σ_{i=1}^n yi) / (α + β + n)

Solution (cont'd). We can express the mean as a function of the MLE, ȳn = n⁻¹ Σ_{i=1}^n yi, as follows:

  E(θ|y1, .., yn) = (α + β)/(α + β + n) × α/(α + β) + n/(α + β + n) × ȳn
  (posterior mean = weight × prior mean + weight × MLE)

Remark.
1. If n → ∞, then the weight on the prior mean approaches zero and the weight on the MLE approaches one, implying lim_{n→∞} E(θ|y1, .., yn) = ȳn.
2. If the sample size is very small, n → 0, then lim_{n→0} E(θ|y1, .., yn) = α/(α + β).

Example (Posterior distribution). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ), with Σ_{i=1}^n yi = 3 if n = 10 and Σ_{i=1}^n yi = 15 if n = 50. We assume that the prior distribution for θ is a Beta distribution B(2, 2). Question: write a Matlab code to plot (i) the prior, (ii) the posterior distribution, and (iii) the likelihood in the two cases.

[Figure: prior, posterior, and likelihood for n = 10]
[Figure: prior, posterior, and likelihood for n = 50]
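One possible answer (a minimal sketch; betapdf requires the Statistics and Machine Learning Toolbox, and the likelihood is rescaled to be comparable with the two densities):

  % Prior B(2,2), posterior B(2+s, 2+n-s), and rescaled likelihood for the two cases
  a = 2; b = 2; ns = [10 3; 50 15];            % rows: (n, sum of yi)
  theta = linspace(0.001, 0.999, 500);
  for i = 1:2
      n = ns(i,1); s = ns(i,2);
      lik = theta.^s .* (1-theta).^(n-s);
      subplot(2,1,i)
      plot(theta, betapdf(theta, a, b), ...
           theta, betapdf(theta, a+s, b+n-s), ...
           theta, lik/trapz(theta, lik))
      title(sprintf('n = %d', n)), legend('Prior', 'Posterior', 'Likelihood (rescaled)')
  end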
Key concepts of Section 2:
1. Frequentist versus subjective probability
2. Prior distribution
3. Hyperparameters
4. Prior predictive distribution
5. Posterior distribution
6. Unnormalised posterior distribution
7. Conjugate prior

Section 3. Posterior distributions and inference

Objectives. The objectives of this section are the following:
1. Generalise the Bayesian approach to a vector of parameters
2. Generalise the Bayesian approach to a regression model
3. Introduce the Bayesian updating mechanism
4. Study the posterior distribution in the case of a large sample
5. Discuss inference in a Bayesian framework

The concept of posterior distribution can be generalised to:
1. a case with a vector of parameters θ = (θ1, .., θd)⊤;
2. a model with exogenous variables and/or lagged endogenous variables (linear regression model, time series models, DSGE, etc.).

Sub-section 3.1. Generalisation to a vector of parameters

Vector of parameters. Consider a model/variable with a pdf/pmf that depends on a vector of parameters θ = (θ1, .., θd)⊤. The previous definitions of the likelihood, prior, and posterior distributions still apply, but the latter two are now, respectively, the joint prior distribution and joint posterior distribution of the multivariate random variable θ. From the joint distributions, we may derive marginal and conditional distributions.

Definition (Marginal posterior distribution). The marginal posterior distribution of θ1 can be found by integrating the remaining parameters out of the joint posterior distribution:

  π(θ1|y) = ∫ π(θ1, .., θd|y) dθ2 .. dθd

Definition (Conditional posterior distribution). The conditional posterior distribution of θ1 is defined to be:

  π(θ1|θ2, .., θd, y) = π(θ1, .., θd|y) / π(θ2, .., θd|y)

where the denominator on the right-hand side is the marginal posterior distribution of (θ2, .., θd), obtained by integrating θ1 out of the joint distribution.

Remark. In most applications, the marginal distribution of a parameter is more useful than its conditional distribution, because the marginal takes into account the uncertainty over the values of the remaining parameters, while the conditional sets them at particular values. A grid-based sketch of this marginalisation is given below.
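A minimal Matlab sketch of marginalisation by numerical integration, for an illustrative two-parameter unnormalised posterior (the kernel below is an assumed example, not a model from the slides):

  % Marginal posterior pi(theta1 | y) by integrating theta2 out on a grid
  t1 = linspace(0.01, 0.99, 200); t2 = linspace(0.01, 0.99, 200);
  [T1, T2] = meshgrid(t1, t2);                       % T1 varies across columns
  joint = T1.^3 .* (1-T1).^7 .* T2.^5 .* (1-T2).^5;  % illustrative unnormalised joint
  joint = joint / trapz(t2, trapz(t1, joint, 2));    % normalise over the grid
  marg1 = trapz(t2, joint, 1);                       % integrate theta2 out
  plot(t1, marg1), xlabel('\theta_1'), ylabel('\pi(\theta_1 | y)')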
Example (Marginal posterior distribution). Consider the multinomial distribution Mn(·), which generalises the Bernoulli example discussed above. In this model each trial, assumed independent of the other trials, results in one of d outcomes, labeled 1, 2, .., d, with probabilities θ1, .., θd, where Σ_{i=1}^d θi = 1. When the experiment is repeated n times and outcome i arises yi times, the likelihood function is

  Ln(θ|y1, .., yn) = θ1^{y1} θ2^{y2} ... θd^{yd}    with Σ_{i=1}^d yi = n

Example (cont'd). Consider a Dirichlet prior distribution (a generalisation of the Beta):

  π(θ) = (Γ(Σ_{i=1}^d αi) / ∏_{i=1}^d Γ(αi)) θ1^{α1−1} θ2^{α2−1} ... θd^{αd−1},  αi > 0,  Σ_{i=1}^d θi = 1

Question: determine the marginal posterior distribution of θ1.

Solution. Following the previous procedure, we find the (unnormalised) joint posterior distribution given the data y = (y1, .., yn):

  π(θ|y1, .., yn) ∝ Ln(θ|y1, .., yn) π(θ)
                  ∝ θ1^{y1} θ2^{y2} ... θd^{yd} × (Γ(Σ_{i=1}^d αi) / ∏_{i=1}^d Γ(αi)) θ1^{α1−1} θ2^{α2−1} ... θd^{αd−1}

Or equivalently:

  π(θ|y1, .., yn) ∝ θ1^{y1+α1−1} θ2^{y2+α2−1} ... θd^{yd+αd−1}

Remark: the Dirichlet prior is a conjugate prior for the multinomial model.

Solution (cont'd). The marginal distribution of θ1 is defined to be:

  π(θ1|y1, .., yn) = ∫ π(θ|y1, .., yn) dθ2 .. dθd

We could evaluate this integral directly, but we can also use some general results about the Dirichlet distribution.

Definition (Dirichlet distribution). The Dirichlet distribution generalises the Beta distribution. Let x = (x1, .., xd) with 0 ≤ xi ≤ 1 and Σ_{i=1}^d xi = 1. Then x ∼ D(α1, .., αd) if

  f(x; α1, .., αd) = (Γ(Σ_{i=1}^d αi) / ∏_{i=1}^d Γ(αi)) x1^{α1−1} x2^{α2−1} ... xd^{αd−1},  αi > 0

Marginally, we have xi ∼ B(αi, Σ_{k≠i} αk).

Solution (cont'd). Since

  π(θ|y1, .., yn) ∝ θ1^{y1+α1−1} θ2^{y2+α2−1} ... θd^{yd+αd−1}

we have θ|y1, .., yn ∼ D(y1 + α1, .., yd + αd), and the marginal posterior distribution of θ1 is a Beta distribution:

  θ1|y1, .., yn ∼ B(y1 + α1, Σ_{i=2}^d (yi + αi))
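A minimal Matlab sketch of this result (illustrative counts and a flat Dirichlet prior; gamrnd requires the Statistics and Machine Learning Toolbox, and the element-wise division uses implicit expansion, R2016b or later; a Dirichlet draw is obtained by normalising independent Gamma draws):

  % Sample from the Dirichlet posterior D(y + alpha); theta1 is marginally Beta
  y = [12 30 58]; alpha = [1 1 1];           % illustrative counts and prior
  a1 = y + alpha;
  g = gamrnd(repmat(a1, 1e5, 1), 1);         % Gamma(a1(j), 1) draws, one column per outcome
  draws = g ./ sum(g, 2);                    % each row is one Dirichlet draw
  [mean(draws(:,1)), a1(1)/sum(a1)]          % simulated vs exact B(a1(1), a1(2)+a1(3)) mean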
Sub-section 3.2. Generalisation to a model

Remark. We can also aim at estimating the parameters of a model (with dependent and explanatory variables) such that

  y = g(x; θ) + ε

where θ denotes the vector of parameters, X a set of explanatory variables, ε an error term, and g(·) the link function. In this case, we generally consider the conditional distribution of Y given X, which is equivalent to the unconditional distribution of the error term ε:

  Y|X ∼ D  ⟺  ε ∼ D

Notations (model). Let us consider two continuous random variables Y and X. We assume that Y has a conditional distribution given X = x, with a pdf denoted f_{Y|x}(y; θ) for y ∈ R. θ = (θ1 .. θK)⊤ is a K × 1 vector of unknown parameters; we assume that θ ∈ Θ ⊆ R^K. Let us consider a sample {Xi, Yi}_{i=1}^n of i.i.d. random variables and a realisation {xi, yi}_{i=1}^n.

Definition (Conditional likelihood function). The (conditional) likelihood function of the i.i.d. sample {Xi, Yi}_{i=1}^n is defined to be:

  Ln(θ; y|x) = ∏_{i=1}^n f_{Y|X}(yi|xi; θ)

where f_{Y|X}(yi|xi; θ) denotes the conditional pdf of Yi given Xi. Remark: the conditional likelihood function is the joint conditional density of the data given θ.

Example (Linear regression model). Consider the following linear regression model:

  yi = xi⊤β + εi

where Xi is a K × 1 vector of random variables and β = (β1 .. βK)⊤ a K × 1 vector of parameters. We assume the εi are i.i.d. with εi ∼ N(0, σ²). Then the conditional distribution of Yi given Xi = xi is Yi|xi ∼ N(xi⊤β, σ²), and

  Li(θ; yi|xi) = f_{Y|x}(yi|xi; θ) = (1/(σ√(2π))) exp( −(yi − xi⊤β)²/(2σ²) )

where θ = (β⊤, σ²)⊤ is a (K + 1) × 1 vector.

Example (Linear regression model, cont'd). Then, for an i.i.d. sample {yi, xi}_{i=1}^n, the corresponding conditional (log-)likelihood is defined to be:

  Ln(θ; y|x) = ∏_{i=1}^n f_{Y|X}(yi|xi; θ) = (2πσ²)^{−n/2} exp( −(1/(2σ²)) Σ_{i=1}^n (yi − xi⊤β)² )

  ℓn(θ; y|x) = −(n/2) ln σ² − (n/2) ln(2π) − (1/(2σ²)) Σ_{i=1}^n (yi − xi⊤β)²

Remark. Given this principle, we can derive the (conditional) likelihood and log-likelihood functions associated with a specific sample for any type of econometric model in which the conditional distribution of the dependent variable is known:
- dichotomous models: probit, logit, etc.
- censored regression models: Tobit, etc.
- time series models: AR, ARMA, VAR, etc.
- GARCH models
- ...

Definition (Posterior distribution, model). For the sample {yi, xi}_{i=1}^n, the posterior distribution is the conditional distribution of θ given y and x, defined to be:

  π(θ|y, x) = Ln(θ; y|x) π(θ) / f_{Y|X}(y|x)

where Ln(θ; y|x) is the likelihood of the sample,

  f_{Y|X}(y|x) = ∫_Θ Ln(θ; y|x) π(θ) dθ

and Θ is the support of the distribution of θ.
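A minimal Matlab sketch of the conditional log-likelihood of the linear regression example above (simulated data; all parameter values are illustrative):

  % Conditional log-likelihood of the Gaussian linear regression at given (beta, sigma2)
  rng(1); n = 200;
  X = [ones(n,1) randn(n,1)];                  % K = 2 regressors, with an intercept
  beta0 = [1; 0.5]; sigma2 = 0.25;
  y = X*beta0 + sqrt(sigma2)*randn(n,1);       % simulated sample
  loglik = @(b, s2) -n/2*log(s2) - n/2*log(2*pi) - sum((y - X*b).^2)/(2*s2);
  loglik(beta0, sigma2)                        % evaluate at the true parameters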
Sub-section 3.3. Bayesian updating

Bayesian updating. A very attractive feature of Bayesian inference is the way in which posterior distributions are updated as new information becomes available.

As usual, for the first observation y1 we have

  π(θ|y1) ∝ f(y1|θ) π(θ)

Next, suppose that a new data point y2 is obtained and we wish to compute the posterior distribution given the complete data set, π(θ|y1, y2):

  π(θ|y1, y2) ∝ f(y1, y2|θ) π(θ)

The posterior can be rewritten as:

  π(θ|y1, y2) ∝ f(y2|y1, θ) f(y1|θ) π(θ) ∝ f(y2|y1, θ) π(θ|y1)

Definition (Bayesian updating). The posterior distribution is updated as new information becomes available as follows:

  π(θ|y1, .., yn) ∝ f(yn|y_{n−1}, θ) π(θ|y1, .., y_{n−1})

If the observations yn and y_{n−1} are independent (i.i.d. sample), then f(yn|y_{n−1}, θ) = f(yn|θ) and

  π(θ|y1, .., yn) ∝ f(yn|θ) π(θ|y1, .., y_{n−1})

Example (Bayesian updating). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) and f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^{yi}(1 − θ)^{1−yi}. We assume that the prior distribution for θ is a Beta distribution B(α, β) with pdf

  π(θ; γ) = (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1}(1 − θ)^{β−1},  α, β > 0,  θ ∈ [0, 1]

with γ = (α, β)⊤ the vector of hyperparameters. Question: write the posterior of θ given (y1, y2) as a function of the posterior given y1.

Solution. The likelihood of (y1, y2) is:

  L(θ; y1, y2) = θ^{y1+y2} (1 − θ)^{2−y1−y2}

The prior distribution is:

  π(θ; γ) = (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1}(1 − θ)^{β−1}

So the unnormalised posterior distribution is:

  π(θ|y1, y2) ∝ L(θ; y1, y2) π(θ)
              ∝ (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1}(1 − θ)^{β−1} θ^{y1+y2}(1 − θ)^{2−y1−y2}

Solution (cont'd). Or equivalently:

  π(θ|y1, y2) ∝ (Γ(α + β)/(Γ(α)Γ(β))) θ^{α+y1+y2−1} (1 − θ)^{β+2−y1−y2−1}

So θ|y1, y2 ∼ B(α + y1 + y2, β + 2 − y1 − y2).

Solution (cont'd). Given the observation y1 alone, we have:

  π(θ|y1) ∝ (Γ(α + β)/(Γ(α)Γ(β))) θ^{α+y1−1} (1 − θ)^{β+1−y1−1}

so θ|y1 ∼ B(α + y1, β + 1 − y1).

Solution (cont'd). Comparing the two expressions, the updating mechanism is given by:

  π(θ|y1, y2) ∝ π(θ|y1) θ^{y2} (1 − θ)^{1−y2}

or equivalently

  π(θ|y1, y2) ∝ π(θ|y1) f_{Y2}(y2; θ)
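A minimal Matlab sketch of this sequential updating for the Beta-Bernoulli case (seed, sample size, and true θ are illustrative): each new observation turns the current posterior into the prior for the next step.

  % Sequential Beta-Bernoulli updating: the Beta(a,b) posterior after each observation
  a = 2; b = 2;                          % prior hyperparameters
  rng(1); y = rand(20,1) < 0.3;          % 20 Bernoulli(0.3) observations
  for i = 1:numel(y)
      a = a + y(i);                      % posterior Beta(a,b) becomes the new prior
      b = b + 1 - y(i);
  end
  fprintf('Posterior after %d obs: Beta(%g, %g), mean %.3f\n', numel(y), a, b, a/(a+b))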
Sub-section 3.4. Large sample

Large samples. Although all statistical results for Bayesian estimators are necessarily "finite sample" (they are conditioned on the sample data), it remains of interest to consider how the estimators behave in large samples. What is the behavior of the posterior distribution π(θ|y1, .., yn) when n is large?

Fact (Large sample). Greenberg (2008) summarises the behavior of the posterior distribution when n is large as follows: (1) the prior distribution plays a relatively small role in determining the posterior distribution; (2) the posterior distribution converges to a degenerate distribution at the true value of the parameter; (3) the posterior distribution is approximately normally distributed with mean θ̂, the MLE of θ.

Large samples. What is the intuition behind the first two results?
1. The prior distribution plays a relatively small role in determining the posterior distribution.
2. The posterior distribution converges to a degenerate distribution at the true value of the parameter.

Definition (Log-likelihood function). The log-likelihood function is defined to be:

  ℓn : Θ × R^n → R
  (θ; y1, .., yn) ↦ ℓn(θ; y1, .., yn) = ln(Ln(θ; y1, .., yn)) = Σ_{i=1}^n ln fY(yi; θ)

Large samples. Introduce the mean log-likelihood contribution (cf. chapter 3), ℓ̄(θ; y), defined by

  ℓn(θ; y) = Σ_{i=1}^n ln fY(yi; θ) = n ℓ̄(θ; y)

The posterior distribution can then be written as

  π(θ|y) ∝ exp( n ℓ̄(θ; y) ) π(θ)

where the exponential term depends on n and the prior π(θ) does not.

For large n, the exponential term dominates π(θ): the prior distribution plays a relatively smaller role than the data (likelihood function) when the sample size is large. Conversely, the prior distribution has relatively greater weight when n is small.

If we denote the true value of θ by θ0, it can be shown that

  lim_{n→∞} n ℓ̄(θ; y) = n ℓ̄(θ0; y)

Then, for n large,

  π(θ|y) ∝ exp( n ℓ̄(θ0; y) ) π(θ)  for all θ ∈ Θ

where the exponential term no longer depends on θ: whatever the value of θ, the value of π(θ|y) tends to a constant times the prior.
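A minimal Matlab sketch of this concentration effect for the Beta-Bernoulli model of the example that follows (the seed is illustrative; betapdf requires the Statistics and Machine Learning Toolbox):

  % Posterior B(2+s, 2+n-s) for growing n: the B(2,2) prior is quickly dominated
  rng(1); theta0 = 0.3; theta = linspace(0.001, 0.999, 500);
  hold on
  for n = [10 100 1000]
      s = sum(rand(n,1) < theta0);               % number of successes in n draws
      plot(theta, betapdf(theta, 2+s, 2+n-s))    % posterior concentrates near theta0
  end
  hold off, xlabel('\theta'), legend('n = 10', 'n = 100', 'n = 1000')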
Example (Large sample). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) with θ = 0.3 and f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^{yi}(1 − θ)^{1−yi}. We assume that the prior distribution for θ is a Beta distribution B(2, 2). Question: write a Matlab code to illustrate the changes in the posterior distribution with n (see the sketch above).

[Animation: evolution of the posterior distribution as n increases]

Sub-section 3.5. Inference

The output of the Bayesian estimation is the posterior distribution π(θ|y).
1. In some particular cases, the posterior distribution is a standard distribution (Beta, normal, etc.) and its pdf has an analytic form.
2. In other cases, the posterior distribution is obtained from numerical simulations (cf. section 5).

Whatever the case (analytical or numerical), the outcome of the Bayesian estimation procedure may be:
1. the graph of the posterior density π(θ|y) for all values of θ ∈ Θ;
2. some particular moments (expectation, variance, etc.) or quantiles of this distribution.

[Figures: prior and posterior distributions of estimated parameters. Source: Gerali et al. (2010), Credit and Banking in a DSGE Model of the Euro Area, JMCB, 42(s1), 107-141.]

But we may also be interested in estimating one parameter of the model. The Bayesian approach to this problem uses the idea of a loss function L(θ̂, θ), where θ̂ is the Bayesian estimator (cf. chapter 2).

Example (Absolute loss function). The absolute loss function is defined to be: L(θ̂, θ) = |θ̂ − θ|.

Example (Quadratic loss function). The quadratic loss function is defined to be: L(θ̂, θ) = (θ̂ − θ)².
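As a numerical aside (these minimisers are standard results, derived for the quadratic case just below, and stated without proof for the absolute case): expected quadratic loss is minimised by the posterior mean, and expected absolute loss by the posterior median. For the B(5, 9) posterior obtained in the n = 10 example of Section 2 (betainv requires the Statistics and Machine Learning Toolbox):

  % Bayes point estimates of theta under the two loss functions, B(5,9) posterior
  a1 = 5; b1 = 9;
  est_quad = a1/(a1 + b1)                 % posterior mean: 5/14, about 0.357
  est_abs  = betainv(0.5, a1, b1)         % posterior median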
Definition (Bayes estimator). The Bayes estimator of θ is the value of θ̂ that minimises the expected value of the loss, where the expectation is taken over the posterior distribution of θ; that is, θ̂ is chosen to minimise

  E( L(θ̂, θ) ) = ∫ L(θ̂, θ) π(θ|y) dθ

Idea. The idea is to minimise the average loss over all the possible values of θ:

  E( L(θ̂, θ) ) = ∫ L(θ̂, θ) π(θ|y) dθ

Under quadratic loss, we minimise

  E( L(θ̂, θ) ) = ∫ (θ̂ − θ)² π(θ|y) dθ

Differentiating this function with respect to θ̂ and setting the derivative equal to zero,

  ∫ 2(θ̂ − θ) π(θ|y) dθ = 0

or equivalently

  θ̂ = ∫ θ π(θ|y) dθ = E(θ|y)

Definition (Bayes estimator, quadratic loss). For a quadratic loss function, the optimal Bayes estimator is the expectation of the posterior distribution:

  θ̂ = E(θ|y) = ∫ θ π(θ|y) dθ

[Illustration. Source: Greene W. (2007), Econometric Analysis, sixth edition. Pearson - Prentice Hall.]

In addition to reporting a point estimate of a parameter θ, it is often useful to report an interval estimate of the form

  Pr(θL ≤ θ ≤ θU) = 1 − α

Bayesians call such intervals credibility intervals (or Bayesian confidence intervals) to distinguish them from a quite different concept that appears in frequentist statistics, the confidence interval (cf. chapter 2).

Definition (Bayesian confidence interval). If the posterior distribution is unimodal, then a Bayesian confidence interval, or credibility interval, for the value of θ is given by two values θL and θU such that

  Pr(θL ≤ θ ≤ θU) = 1 − α

where α is the level of risk.

For a Bayesian, the values of θL and θU can be determined so as to obtain the desired probability from the posterior distribution. If more than one pair is possible, the pair that results in the shortest interval may be chosen; such a pair yields the highest posterior density interval (h.p.d.).

Definition (Highest posterior density interval). The highest posterior density interval (hpd) is the smallest region H such that Pr(θ ∈ H) = 1 − α, where α is the level of risk. If the posterior distribution is multimodal, the hpd may be disjoint.

[Figure: highest posterior density region. Source: Colletaz and Hurlin (2005), Modèles non linéaires et prévision, Rapport pour l'Institut pour la Recherche CDC.]
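A minimal Matlab sketch for the same B(5, 9) posterior (betainv requires the Statistics and Machine Learning Toolbox): the equal-tailed interval fixes 2.5% in each tail, while a crude search over intervals with exact 95% coverage locates the shortest one, which is the hpd interval in this unimodal case.

  % 95% equal-tailed credibility interval for theta | y ~ B(5,9)
  a1 = 5; b1 = 9; alpha = 0.05;
  ci_et = betainv([alpha/2, 1 - alpha/2], a1, b1)
  % Crude hpd search: shift a window of fixed 95% coverage to minimise its length
  lo = linspace(0, alpha, 501);
  len = betainv(lo + 1 - alpha, a1, b1) - betainv(lo, a1, b1);
  [~, i] = min(len);
  ci_hpd = betainv([lo(i), lo(i) + 1 - alpha], a1, b1)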
Another basic issue in statistical inference is the prediction of new data values.

Definition (Forecasting). The general form of the pdf/pmf of Y_{n+1} given y1, .., yn is:

  f(y_{n+1}|y) = ∫ f(y_{n+1}|θ, y) π(θ|y) dθ

where π(θ|y) is the posterior distribution of θ.

Forecasting.
1. If Y is a discrete variable, this formula gives the conditional probability of Y_{n+1} = y_{n+1} given y1, .., yn:

  Pr(Y_{n+1} = y_{n+1}|y) = ∫ p(y_{n+1}|θ, y) π(θ|y) dθ

2. If Y is a continuous variable, this formula gives the conditional density of Y_{n+1} given y1, .., yn. In this case, we can compute the expected value of Y_{n+1} as:

  E(y_{n+1}|y) = ∫ f(y_{n+1}|y) y_{n+1} dy_{n+1}

Forecasting. If Y_{n+1} and (Y1, .., Yn) are independent, then f(y_{n+1}|θ, y) = f(y_{n+1}|θ), and we have:

  f(y_{n+1}|y) = ∫ f(y_{n+1}|θ) π(θ|y) dθ

Example (Forecasting). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ). We assume that the prior distribution for θ is a Beta distribution B(α, β). We want to forecast the value of Y_{n+1} given the realisations (y1, .., yn). Question: determine the probability Pr(Y_{n+1} = 1|y1, .., yn).

Solution. In this example the trials are independent, so Y_{n+1} is independent of (Y1, .., Yn):

  Pr(Y_{n+1} = 1|y) = ∫ Pr(Y_{n+1} = 1|θ, y) π(θ|y) dθ = ∫ Pr(Y_{n+1} = 1|θ) π(θ|y) dθ

The posterior distribution of θ is in the form of a Beta distribution:

  π(θ|y) = (Γ(α1 + β1)/(Γ(α1)Γ(β1))) θ^{α1−1} (1 − θ)^{β1−1}

with α1 = α + Σ_{i=1}^n yi and β1 = β + n − Σ_{i=1}^n yi.

Solution (cont'd). Since Pr(Y_{n+1} = 1|θ) = θ, we have

  Pr(Y_{n+1} = 1|y) = (Γ(α1 + β1)/(Γ(α1)Γ(β1))) ∫_0^1 θ θ^{α1−1}(1 − θ)^{β1−1} dθ
                    = (Γ(α1 + β1)/(Γ(α1)Γ(β1))) ∫_0^1 θ^{α1}(1 − θ)^{β1−1} dθ

Solution (cont'd). We admit that:

  ∫_0^1 θ^{α1}(1 − θ)^{β1−1} dθ = Γ(α1 + 1)Γ(β1) / Γ(α1 + β1 + 1)

So we have

  Pr(Y_{n+1} = 1|y) = (Γ(α1 + β1)/(Γ(α1)Γ(β1))) × Γ(α1 + 1)Γ(β1) / Γ(α1 + β1 + 1)

Solution (cont'd). Since Γ(p) = (p − 1)Γ(p − 1), we have:

  Pr(Y_{n+1} = 1|y) = (Γ(α1 + β1)/(Γ(α1)Γ(β1))) × α1 Γ(α1) Γ(β1) / ((α1 + β1) Γ(α1 + β1))
                    = α1/(α1 + β1)

Solution (cont'd). Therefore

  Pr(Y_{n+1} = 1|y) = α1/(α1 + β1) = (α + Σ_{i=1}^n yi)/(α + β + n)

So we found that

  Pr(Y_{n+1} = 1|y) = E(θ|y) = (α + Σ_{i=1}^n yi)/(α + β + n)

The estimate of Pr(Y_{n+1} = 1|y) is the mean of the posterior distribution of θ.
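A quick Monte Carlo check of this result for the same illustrative B(5, 9) posterior (betarnd requires the Statistics and Machine Learning Toolbox):

  % Predictive Pr(Y_{n+1} = 1 | y) by simulation vs. the closed-form posterior mean
  a1 = 5; b1 = 9;
  theta_draws = betarnd(a1, b1, 1e6, 1);       % draws from the posterior of theta
  pred_mc    = mean(theta_draws)               % Monte Carlo estimate of the integral
  pred_exact = a1/(a1 + b1)                    % closed form: 5/14, about 0.357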
Key concepts of Section 3:
1. Marginal and conditional posterior distribution
2. Bayesian updating
3. Bayes estimator
4. Bayesian confidence interval
5. Highest posterior density interval
6. Bayesian prediction

Section 4 Applications

Objectives. The objectives of this section are the following:
1. To discuss the Bayesian estimation of VAR models
2. To propose various priors for this type of model
3. To introduce the issue of numerical simulation of the posterior

Consider a typical VAR(p):

Y_t = B_1 Y_{t-1} + B_2 Y_{t-2} + .. + B_p Y_{t-p} + D z_t + \varepsilon_t, \qquad t = 1, .., T

where Y_t is an n×1 vector of endogenous variables, ε_t is an n×1 vector of error terms with ε_t ~ i.i.d. N(0, Σ), z_t is a d×1 vector of exogenous variables, B_i for i = 1, .., p is an n×n matrix of parameters, and D is an n×d matrix of parameters.

Classical estimation of the parameters B_1, .., B_p, D, Σ may yield imprecisely estimated relations that fit the data well only because of the large number of variables included: a problem known as overfitting. The number of parameters to be estimated, n(np + d + (n+1)/2), grows with the square of the number of variables (n) and proportionally with the number of lags (p). When the number of parameters is relatively high and the sample information is relatively loose (macro data), it is likely that the estimates are influenced by noise as opposed to signal.

A Bayesian approach to VAR estimation was originally advocated by Litterman (1980) as a solution to the overfitting problem.

Litterman R. (1980), Techniques for Forecasting with Vector Autoregressions, University of Minnesota, Ph.D. dissertation.

Bayesian estimation is a solution to overfitting that avoids imposing exact zero restrictions on the parameters: the researcher cannot be sure that some coefficients are zero and should not ignore their possible range of variation. A Bayesian perspective fits precisely this view through the prior distribution.
Rewrite the VAR in a compact form:

Y_t = X_t \beta + \varepsilon_t

where X_t = I_n \otimes W_{t-1}' is n × nk with k = np + d, W_{t-1} = (Y_{t-1}', .., Y_{t-p}', z_t')' is a k×1 vector, and β = vec(B_1, .., B_p, D) is an nk×1 vector.

Definition (Likelihood). Under the normality assumption, the conditional distribution of Y_t = X_t β + ε_t given X_t and the set of parameters β is normal,

Y_t | X_t, \beta \sim N(X_t \beta, \Sigma)

and the likelihood of the sample of realisations (Y_1, .., Y_T), denoted Y, is:

L_T(Y | X, \beta, \Sigma) \propto |\Sigma|^{-T/2} \exp\left( -\frac{1}{2} \sum_{t=1}^{T} (Y_t - X_t \beta)' \Sigma^{-1} (Y_t - X_t \beta) \right)

By convention, we will denote L_T(Y | X, β, Σ) = L_T(Y, β, Σ).

Definition (Joint posterior distribution). Given a prior π(β, Σ), the joint posterior distribution of the parameters is given by

\pi(\beta, \Sigma | Y) = \frac{L_T(Y, \beta, \Sigma)\, \pi(\beta, \Sigma)}{f(Y)}

or equivalently π(β, Σ | Y) ∝ L_T(Y, β, Σ) π(β, Σ).

Marginal posterior distributions. Given π(β, Σ | Y), the marginal posterior distributions of β and Σ can be obtained by integration:

\pi(\beta | Y) = \int \pi(\beta, \Sigma | Y)\, d\Sigma \qquad \pi(\Sigma | Y) = \int \pi(\beta, \Sigma | Y)\, d\beta

where the integrals are taken over all the elements of Σ or β respectively.

Bayesian estimates. The location and dispersion of π(β | Y) and π(Σ | Y) can easily be analysed to yield point estimates (quadratic loss) of the parameters of interest and measures of precision, comparable to those obtained using the classical approach to estimation. In particular: E(β | Y), V(β | Y) and E(Σ | Y).

Two problems arise:
1. The numerical integration of the marginal posterior distribution
2. The choice of the prior

Fact (Numerical integration). In most cases, the numerical integration of π(β, Σ | Y) may be difficult or even impossible to carry out analytically. For instance, if n = 1 and p = 4, we have

\pi(\Sigma | Y) = \int \pi(\beta, \Sigma | Y)\, d\beta = \int \int \int \int \pi(\beta, \Sigma | Y)\, d\beta_1\, d\beta_2\, d\beta_3\, d\beta_4

This problem, however, can often be solved by using numerical integration based on Monte Carlo simulations.

One particular MCMC (Markov-Chain Monte Carlo) estimation method is the Gibbs sampler, which is a particular case of the Metropolis-Hastings algorithm (see Section 5).

Definition (Gibbs sampler). The Gibbs sampler is a recursive Monte Carlo method that generates simulated values of (β, Σ) from the joint posterior distribution. This method requires only knowledge of the full conditional posterior distributions of the parameters of interest, π(β | Y, Σ) and π(Σ | Y, β).
Definition (Gibbs algorithm). Suppose that β and Σ are scalar. The Gibbs sampler algorithm starts from arbitrary values β^0, Σ^0, and samples alternately from the density of each element of the parameter vector, conditional on the value of the other element sampled in the previous iteration. Thus, the Gibbs sampler samples recursively as follows:

β^1 from π(β | Y, Σ^0)
Σ^1 from π(Σ | Y, β^1)
β^2 from π(β | Y, Σ^1)
...
β^m from π(β | Y, Σ^{m-1})

Fact (Gibbs sampler). The vectors (β^m, Σ^m) form a Markov chain and, for a sufficiently large number of iterations m ≥ M, can be regarded as draws from the true joint posterior distribution π(β, Σ | Y). Given a large sample of draws from this limiting distribution, (β^m, Σ^m)_{m=M+1}^{G}, any posterior moment or marginal density of interest can then be estimated consistently by its corresponding sample average. For instance:

\frac{1}{G - M} \sum_{m=M+1}^{G} \beta^m \longrightarrow E(\beta | Y)

Remark. The process must be started somewhere, though it does not matter much where. Nonetheless, a burn-in period is required to eliminate the influence of the starting values, so the first M values of (β^m, Σ^m) are discarded.

Example (Gibbs sampler). We consider the bivariate normal distribution first. Suppose we wished to draw a random sample from the population

\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)

Question: write a Matlab code to generate a sample of n = 1,000 observations of (x_1, x_2)' with a Gibbs sampler.

Solution. The Gibbs sampler takes advantage of the result

X_1 | x_2 \sim N(\rho x_2,\, 1 - \rho^2) \qquad X_2 | x_1 \sim N(\rho x_1,\, 1 - \rho^2)

[Figure: scatter plot of the generated sample.]
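A minimal Matlab sketch answering the question above (the value ρ = 0.6 and the burn-in length are assumed for illustration):

```matlab
% Gibbs sampler for the bivariate normal (sketch)
rho = 0.6;                          % assumed correlation
n = 1000; M = 500;                  % draws kept and burn-in length
s = sqrt(1 - rho^2);                % conditional standard deviation
x = zeros(n + M, 2);                % chain, started at (0, 0)
for m = 2:(n + M)
    x(m, 1) = rho*x(m-1, 2) + s*randn;   % draw X1 | X2
    x(m, 2) = rho*x(m, 1)   + s*randn;   % draw X2 | X1
end
x = x(M+1:end, :);                  % discard the burn-in draws
% scatter(x(:,1), x(:,2)) reproduces the figure above
```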
Priors. A second problem in implementing Bayesian estimation of VAR models is the choice of the prior distribution for the model's parameters, π(β, Σ).

We distinguish two types of priors:
1. Some priors lead to an analytical formula for the posterior distribution:
   - Diffuse prior
   - Natural conjugate prior
   - Minnesota prior
2. For other priors there is no analytical formula for the posterior, and the Gibbs sampler is required:
   - Independent Normal-Wishart prior

Definition (Full Bayesian estimation). A full Bayesian estimation requires specifying a prior distribution for the hyperparameters of the prior distribution of the parameters of interest.

Definition (Empirical Bayesian estimation). Empirical Bayesian estimation is based on a preliminary estimation of the hyperparameters γ. These estimates can be obtained by OLS, maximum likelihood, GMM or another estimation method. Then π(θ; γ) is substituted by π(θ; γ̂). Note that the uncertainty in the estimation of the hyperparameters is not taken into account in the posterior distribution.

Empirical Bayesian estimation. In this section, we consider only empirical Bayesian estimation methods:

Y_t = X_t \beta + \varepsilon_t \qquad \varepsilon_t \sim \text{i.i.d. } N(0, \Sigma)

Notations:

\hat{\beta} = \left( \sum_{t=1}^{T} X_t' X_t \right)^{-1} \sum_{t=1}^{T} X_t' Y_t \qquad \text{(OLS estimator of } \beta\text{)}

\hat{\Sigma} = \frac{1}{T - k} \sum_{t=1}^{T} (Y_t - X_t \hat{\beta})(Y_t - X_t \hat{\beta})' \qquad \text{(OLS estimator of } \Sigma\text{)}
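These two estimators take only a few lines to compute. The sketch below uses a bivariate VAR(1) without exogenous variables and placeholder data (everything here is an assumed illustration, not part of the original text):

```matlab
% Sketch: OLS estimators for a VAR(1) on placeholder data (no exogenous terms)
rng(1);
Yall = cumsum(randn(201, 2));         % hypothetical data, n = 2
Y = Yall(2:end, :);                   % Y_t,  T x n
W = Yall(1:end-1, :);                 % W_{t-1} = Y_{t-1}, T x k with k = np
[T, k] = size(W);
Bhat = (W'*W) \ (W'*Y);               % k x n; column j holds equation j
E = Y - W*Bhat;                       % OLS residuals
Sigma_hat = (E'*E)/(T - k);           % OLS estimator of Sigma
beta_hat = Bhat(:);                   % stacked coefficients (beta, up to ordering)
```

With the same regressors in every equation, this equation-by-equation OLS coincides with the system estimator implied by the compact form.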
Matlab codes for Bayesian estimation of VAR models are proposed by Koop and Korobilis (2010):
1. Analytical results:
   - Code BVAR_ANALYT.m gives posterior means and variances of parameters and predictives, using the analytical formulas.
   - Code BVAR_FULL.m estimates the BVAR model combining all the priors discussed below, and provides predictions and impulse responses.
2. Gibbs sampler: code BVAR_GIBBS.m estimates the model with the Gibbs sampler, and also allows the prior mean and covariance to be specified.

Koop G. and Korobilis D. (2010), Bayesian Multivariate Time Series Methods for Empirical Macroeconomics.

Sub-Section 4.1 Minnesota Prior

Fact (Stylised facts). Litterman (1986) specifies his prior by appealing to three statistical regularities of macroeconomic time series data: (1) the trending behaviour of macroeconomic time series; (2) more recent values of a series usually contain more information on the current value of the series than past values; (3) past values of a given variable contain more information on its current state than past values of other variables.

Litterman R. (1986), Forecasting with Bayesian Vector Autoregressions, JBES, 4, 25-38.

A Bayesian researcher encodes these regularities by assigning a probability distribution to the parameters in such a way that:
1. the mean of the coefficients assigned to all lags other than the first one is equal to zero;
2. the variance of the coefficients depends inversely on the number of lags;
3. the coefficients of variable j in equation g are assigned a lower prior variance than those of variable g.

These requirements are controlled by the hyperparameters π = (π_1, .., π_d)', which induce a prior on B_1, .., B_p, D, Σ.

Definition (Minnesota prior, Litterman 1986). In the Minnesota prior, the variance-covariance matrix Σ is assumed to be fixed and equal to Σ̂. Denote by β_k, for k = 1, .., n, the vector of parameters of the k-th equation; the corresponding prior distribution is:

\beta_k \sim N(\bar{\beta}_k, \bar{\Omega}_k)

Remark.
1. Given these assumptions, there is prior and posterior independence between equations.
2. The equations can be estimated separately.
3. Letting Ω̄_k^{-1} → 0 (a diffuse prior), the posterior mean of β_k tends to the OLS estimator of β_k.

Litterman (1986) then assigns numerical values to the hyperparameters in line with the stylised facts above. For the prior mean:
1. The traditional choice is to set β̄_k = 0 for all the parameters if the k-th variable is a growth rate (the level then behaves as a random walk).
2. If the variable is in level, all the hyperparameters are set to 0, except the one associated with the first own lag, which is set to one.

The prior covariance matrix Ω̄_k is assumed to be diagonal with elements ω_{i,i}:

\omega_{i,i} = \begin{cases} a_1 / r^2 & \text{for coefficients on own lag } r = 1, .., p \\ (a_2 / r^2)\,(\sigma_{ii} / \sigma_{jj}) & \text{for coefficients on lag } r \text{ of variable } j \neq i,\ r = 1, .., p \\ a_3\, \sigma_{ii} & \text{for coefficients on exogenous variables} \end{cases}

This prior simplifies the complicated choice of fully specifying all the elements of Ω̄_k to choosing three scalars, a_1, a_2 and a_3.

Theorem (Posterior distribution). Under the Minnesota prior, the posterior distribution of β_k is:

\beta_k | Y \sim N(\tilde{\beta}_k, \tilde{\Omega}_k)

\tilde{\beta}_k = \tilde{\Omega}_k \left( \bar{\Omega}_k^{-1} \bar{\beta}_k + \sigma_{k,k}^{-2}\, X' Y_k \right) \qquad \tilde{\Omega}_k = \left( \bar{\Omega}_k^{-1} + \sigma_{k,k}^{-2}\, X' X \right)^{-1}

where X stacks the regressors of equation k and Y_k its observations.
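A sketch of this posterior computation for one equation, with all inputs assumed (placeholder data, an illustrative prior mean and diagonal prior covariance, and σ_{k,k} fixed at an assumed value):

```matlab
% Sketch: Minnesota posterior for equation k (all inputs hypothetical)
T = 200; k = 3;
X  = randn(T, k);  Yk = randn(T, 1);    % placeholder regressors and data
bbar = [1; 0; 0];                       % prior mean (first own lag at one)
Obar = diag([0.5, 0.5, 0.5]);           % prior variances, e.g. a1/r^2 terms
skk  = 1.0;                             % sigma_{k,k}, fixed via Sigma hat

Otil = inv(inv(Obar) + skk^(-2)*(X'*X));        % posterior covariance
btil = Otil*(Obar\bbar + skk^(-2)*(X'*Yk));     % posterior mean
```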
Sub-Section 4.2 Diffuse Prior

Definition (Diffuse prior). The diffuse prior is defined as:

\pi(\beta, \Sigma) \propto |\Sigma|^{-(n+1)/2}

Remark. Contrary to the Minnesota prior:
1. Σ is not assumed to be fixed
2. The equations are not independent

Theorem (Posterior distribution). For a diffuse prior distribution, the joint posterior distribution is given by π(β, Σ | Y) = π(β | Y, Σ) π(Σ | Y) with

\beta | Y, \Sigma \sim N\left( \hat{\beta},\ \Sigma \otimes (X'X)^{-1} \right) \qquad \Sigma | Y \sim IW(\hat{\Sigma},\, T - k)

where IW denotes the Inverted Wishart distribution.

Sub-Section 4.3 Natural conjugate prior

Definition (Natural conjugate prior). The natural conjugate prior has the form:

\beta | \Sigma \sim N(\bar{\beta},\ \Sigma \otimes \bar{\Omega}) \qquad \Sigma \sim IW(\bar{\Sigma}, \bar{\alpha})

with ᾱ > n, where IW denotes the Inverted Wishart distribution with ᾱ degrees of freedom. The hyperparameters are β̄, Ω̄, Σ̄ and ᾱ.

Theorem (Posterior distribution). The posterior distribution associated with the natural conjugate prior is:

\beta | \Sigma, Y \sim N(\tilde{\beta},\ \Sigma \otimes \tilde{\Omega}) \qquad \Sigma | Y \sim IW(\tilde{\Sigma},\, T + \bar{\alpha})

where

\tilde{\beta} = \tilde{\Omega} \left( \bar{\Omega}^{-1} \bar{\beta} + X'X\, \hat{\beta}_{ols} \right) \qquad \tilde{\Omega} = \left( \bar{\Omega}^{-1} + X'X \right)^{-1}

Sub-Section 4.4 Independent Normal-Wishart Prior

Definition (Independent Normal-Wishart prior). The independent Normal-Wishart prior is defined as:

\pi(\beta, \Sigma^{-1}) = \pi(\beta)\, \pi(\Sigma^{-1})

where

\beta \sim N(\bar{\beta}, \bar{\Omega}) \qquad \Sigma^{-1} \sim W(\bar{\Sigma}^{-1}, \bar{\alpha})

and W(., ᾱ) denotes the Wishart distribution with ᾱ degrees of freedom.

Remark.
1. Note that this prior allows the prior covariance matrix Ω̄ to be anything the researcher chooses, rather than the restrictive Σ ⊗ Ω̄ form of the natural conjugate prior. For instance, the researcher could choose a prior similar in spirit to the Minnesota prior.
2. A noninformative prior can be obtained by setting β̄ = Ω̄^{-1} = Σ̄^{-1} = 0.

Theorem (Posterior distribution). In the case of the independent Normal-Wishart prior, the joint posterior distribution cannot be derived analytically. We can only derive the conditional posterior distributions (used in the Gibbs sampler):

\beta | \Sigma, Y \sim N(\tilde{\beta}, \tilde{\Omega}) \qquad \Sigma | Y, \beta \sim IW(\tilde{\Sigma},\, T + \bar{\alpha})

where

\tilde{\beta} = \tilde{\Omega} \left( \bar{\Omega}^{-1} \bar{\beta} + \sum_{t=1}^{T} X_t' \Sigma^{-1} Y_t \right) \qquad \tilde{\Omega} = \left( \bar{\Omega}^{-1} + \sum_{t=1}^{T} X_t' \Sigma^{-1} X_t \right)^{-1}
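These two conditionals are all the Gibbs sampler needs. A compact sketch follows, with placeholder data and assumed priors; the posterior scale Σ̃ is taken to be Σ̄ plus the residual sum of squares, the standard form of this conditional, and wishrnd is from the Statistics Toolbox:

```matlab
% Sketch: Gibbs sampler for the independent Normal-Wishart VAR posterior
rng(1); n = 2; p = 1; nk = n*(n*p);     % bivariate VAR(1), k = np
Yall = cumsum(randn(101, n)); Y = Yall(2:end,:); W = Yall(1:end-1,:);
T = size(Y, 1);
bbar = zeros(nk, 1); Obar = 10*eye(nk); % assumed prior for beta
Sbar = eye(n); abar = n + 2;            % assumed prior for Sigma^{-1}
Sigma = eye(n); G = 2000; draws = zeros(G, nk);
for m = 1:G
    % beta | Sigma, Y
    Si = inv(Sigma); A = inv(Obar); b = Obar\bbar;
    for t = 1:T
        Xt = kron(eye(n), W(t,:));      % n x nk, as in the compact form
        A = A + Xt'*Si*Xt;  b = b + Xt'*Si*Y(t,:)';
    end
    Otil = inv(A); Otil = (Otil + Otil')/2;  % enforce symmetry for chol
    btil = Otil*b;
    beta = btil + chol(Otil, 'lower')*randn(nk, 1);
    % Sigma | Y, beta: draw Sigma^{-1} from its Wishart conditional
    S = Sbar;
    for t = 1:T
        Xt = kron(eye(n), W(t,:));
        e = Y(t,:)' - Xt*beta; S = S + e*e';
    end
    Sigma = inv(wishrnd(inv(S), T + abar));
    draws(m, :) = beta';
end
beta_post = mean(draws(501:end, :))';   % posterior mean after a burn-in
```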
Section 5 Simulation methods

Objectives. The objectives of this section are the following:
1. Introduce the Probability Integral Transform (PIT) method
2. Introduce the Accept-Reject (AR) method
3. Introduce the Importance sampling method
4. Introduce the Gibbs algorithm
5. Introduce the Metropolis-Hastings algorithm

In Bayesian econometrics, we may distinguish two cases given the prior:
1. When we use a conjugate prior, the posterior distribution is sometimes "standard" (normal, gamma, beta, etc.) and the standard packages of statistical software can be used to compute the density π(θ|y), its moments (for instance E(θ|y)), the hpd, etc.
2. In most cases, the posterior density π(θ|y) has to be obtained numerically through simulation methods.

Sub-Section 5.1 Probability Integral Transform (PIT) method

Suppose we wish to draw a sample of values from a continuous random variable X that has cdf F_X(.), assumed to be nondecreasing. The PIT method allows us to generate a sample of X from a sample of random values drawn from U, where U has a uniform distribution over [0, 1]: U ~ U_{[0,1]}.

Indeed, assume that the random variable U is a function of the random variable X such that U = F_X(X). What is the distribution of U?

F_X(x) = Pr(X \leq x) = Pr(F_X(X) \leq F_X(x)) = Pr(U \leq F_X(x))

So we have Pr(U ≤ F_X(x)) = F_X(x). We know that if U has a uniform distribution then Pr(U ≤ u) = u. So U has a uniform distribution.

Definition (Probability Integral Transform). If X is a continuous random variable with cdf F_X(.), then the transformed (probability integral transformation) variable U = F_X(X) has a uniform distribution over [0, 1]:

U = F_X(X) \sim U_{[0,1]}

How do we get a draw of X from a draw of U?

Definition (PIT algorithm). In order to get a realisation x of the random variable X with cdf F_X(.) from a realisation u of the variable U ~ U_{[0,1]}, the following procedure has to be adopted: (1) draw u from U_{[0,1]}; (2) compute x = F_X^{-1}(u).

Example (PIT method). Suppose we wish to draw a sample from a random variable with density function

f_X(x) = \begin{cases} \frac{3}{8} x^2 & \text{if } 0 \leq x \leq 2 \\ 0 & \text{otherwise} \end{cases}

Question: write a Matlab code to generate a sample (x_1, .., x_n) with n = 100.

Solution. First, determine the cdf of X:

F_X(x) = \frac{3}{8} \int_0^x t^2\, dt = \frac{x^3}{8}

So we have U = F_X(X) = X^3/8 ~ U_{[0,1]}. Then determine the probability inverse transformation:

X = F_X^{-1}(U) = 2\, U^{1/3}

[Figure: plot of the generated sample.]

Remark. Note that a multivariate random variable cannot be simulated by this method, because its cdf is not one-to-one and therefore not invertible.

Remark. An important application of this method is the problem of sampling from a truncated distribution. Suppose that X has cdf F_X(x) and that we want to generate values of X restricted to c_1 ≤ X ≤ c_2. The cdf of the truncated variable is

\frac{F_X(x) - F_X(c_1)}{F_X(c_2) - F_X(c_1)} \qquad \text{for } c_1 \leq x \leq c_2

Then we have

U = \frac{F_X(X) - F_X(c_1)}{F_X(c_2) - F_X(c_1)} \sim U_{[0,1]}

and the truncated variable can be generated as:

X_{trunc} = F_X^{-1}\left( F_X(c_1) + U\, (F_X(c_2) - F_X(c_1)) \right)

Recommendation.
1. Why use the PIT method? To generate a sample of simulated values from a given distribution that is not available in one's statistical software.
2. What are the prerequisites of the PIT? The functional form of the cdf must be known, and we need to know F_X^{-1}(.).
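A sketch answering the question above, together with the truncation trick (the exp(1) truncation example is an assumed illustration; only base Matlab functions are used):

```matlab
% PIT sampling from f(x) = (3/8) x^2 on [0, 2] (sketch)
n = 100;
u = rand(n, 1);                 % step 1: draw u from U[0,1]
x = 2*u.^(1/3);                 % step 2: invert the cdf F(x) = x^3/8

% Truncated sampling, e.g. an exp(1) variable restricted to [c1, c2]
c1 = 0; c2 = 1;
F    = @(t) 1 - exp(-t);        % cdf of the exp(1) distribution
Finv = @(q) -log(1 - q);        % its inverse
xtr  = Finv(F(c1) + rand(n, 1).*(F(c2) - F(c1)));
```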
Sub-Section 5.2 Accept-reject method

Definition (Accept-reject method). The accept-reject (AR) algorithm can be used to simulate values from a density function f_X(.) if it is possible to simulate values from a density g(.) and if a number c can be found such that f_X(x) ≤ c g(x) for all x in the support of f_X(.).

Definition (Target and source densities). The density f_X(.) is called the target density and g(.) is called the source density.

Posterior distribution. In the context of Bayesian econometrics, we have:

\pi(\theta | y) \leq c\, g(\theta | y) \quad \forall \theta \in \Theta, \qquad c = \sup_{\theta \in \Theta} \frac{\pi(\theta | y)}{g(\theta | y)}

where the target density is the posterior distribution π(θ|y) and the source density is g(θ|y).

Definition (AR algorithm, posterior distribution). The accept-reject algorithm for the posterior distribution is the following: (1) generate a value θ^s from g(θ|y); (2) draw a value u from U_{[0,1]}; (3) return θ^s as a draw from π(θ|y) if

u \leq \frac{\pi(\theta^s | y)}{c\, g(\theta^s | y)}

If not, reject it and return to step 1. (The effect of this step is to accept θ^s with probability π(θ^s|y) / (c g(θ^s|y)).)

Example (AR algorithm). We aim at simulating realisations from an N(0,1) distribution (target distribution) with a source density given by a Laplace distribution with pdf:

g(x) = \frac{1}{2} \exp(-|x|)

Question: write a Matlab code to simulate a sample of n = 1,000 realisations of the normal distribution.

Solution. In order to simplify the problem, we generate only positive values of x. The source density can then be taken to be an exponential density with λ = 1:

g(x) = \exp(-x)

Since the normal and the Laplace distributions are symmetric about zero, if the proposal (x > 0) is accepted, it is assigned a positive value with probability one half and a negative value with probability one half.

The pdfs of the target and source distributions are, for x > 0:

f(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) \qquad g(x) = \exp(-x)

Determination of c: we determine the maximum value of f(x)/g(x):

\frac{f(x)}{g(x)} = \frac{1}{\sqrt{2\pi}} \exp\left( x - \frac{x^2}{2} \right)

The maximum is reached at x = 1, so we have:

c = \frac{1}{\sqrt{2\pi}} \exp\left( \frac{1}{2} \right)

[Figures: the ratio f(x)/g(x), and f(x) against c·g(x).]
Solution (cont'd). The AR algorithm is the following:
1. Generate x from an exponential distribution with parameter λ = 1.
2. Generate u from a uniform distribution U_{[0,1]}.
3. Accept x if

u \leq \frac{f(x)}{c\, g(x)} = \frac{\frac{1}{\sqrt{2\pi}} \exp(-x^2/2)}{\frac{1}{\sqrt{2\pi}} \exp(1/2) \exp(-x)} = \exp\left( -\frac{x^2}{2} + x - \frac{1}{2} \right)

and then assign it a random sign (return x or −x, each with probability one half). Otherwise, reject x.

[Figures: the generated sample and kernel density estimates.]

Fact (Unnormalised posterior distribution). An interesting feature of the AR algorithm is that it allows us to simulate values of θ from the posterior distribution π(θ|y) by using only the unnormalised posterior:

\pi(\theta | y) = \frac{f(y | \theta)\, \pi(\theta)}{f(y)} \qquad \pi(\theta | y) \propto f(y | \theta)\, \pi(\theta)

Intuition:

\pi(\theta | y) = \underbrace{\frac{1}{f(y)}}_{\text{unknown}}\ \underbrace{f(y | \theta)\, \pi(\theta)}_{\text{known}}

Assume that f(y|θ) π(θ) is known, but 1/f(y) is unknown. If a value of θ generated from g(θ|y) is accepted with probability f(y|θ) π(θ) / (c g(θ|y)), the accepted values of θ are a sample from π(θ|y). This method can therefore be used even if the normalising constant of the target distribution is unknown.

Example (AR and posterior distribution). Consider an i.i.d. sample (Y_1, .., Y_n) of binary variables, such that Y_i ~ Be(θ) with θ = 0.3. We assume that the (uninformative) prior distribution for θ is a uniform distribution over [0, 1]. Then we have

\pi(\theta | y) = \frac{L_n(\theta; y_1, .., y_n)\, \pi(\theta)}{f_{Y_1,..,Y_n}(y_1, .., y_n)} = \frac{\theta^{\sum y_i} (1 - \theta)^{\sum (1 - y_i)}}{\int_0^1 \theta^{\sum y_i} (1 - \theta)^{\sum (1 - y_i)}\, d\theta}

We can use the pdf of the unnormalised posterior

\tilde{\pi}(\theta | y) = \theta^{\sum y_i} (1 - \theta)^{\sum (1 - y_i)}

to simulate values (θ_1, .., θ_S) from the posterior distribution.

Recommendation.
1. Why use the accept-reject method? To generate a sample of simulated values from a given distribution that is not available in one's statistical software, and in particular to generate samples of θ from the unnormalised posterior distribution.
2. What are the prerequisites of the AR method? The functional form of the pdf of the target distribution is known (possibly only up to its normalising constant), together with a source density g and a constant c such that f ≤ c g.
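A sketch of this accept-reject scheme in Matlab (only base functions are used; −log(rand) draws the exponential proposal):

```matlab
% Accept-reject draws from N(0,1) with an exp(1) source (sketch)
n = 1000; out = zeros(n, 1); i = 0;
while i < n
    x = -log(rand);                   % exponential(1) proposal, x > 0
    if rand <= exp(-x^2/2 + x - 1/2)  % accept with prob. f(x)/(c*g(x))
        i = i + 1;
        s = 2*(rand > 0.5) - 1;       % random sign, +/-1 with prob. 1/2
        out(i) = s*x;
    end
end
% histogram(out) should match the standard normal density
```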
Sub-Section 5.3 Importance sampling

Suppose that one is interested in calculating the value of the integral

I = E(h(\theta) | y) = \int_{\Theta} h(\theta)\, \pi(\theta | y)\, d\theta

where h(θ) is a continuous function.

Example (Importance sampling). Suppose that we want to compute the expectation and the variance of the posterior distribution:

E(\theta | y) = \int_{\Theta} \theta\, \pi(\theta | y)\, d\theta \qquad E(\theta^2 | y) = \int_{\Theta} \theta^2\, \pi(\theta | y)\, d\theta \qquad V(\theta | y) = E(\theta^2 | y) - E^2(\theta | y)

Consider a source density g(θ|y) that is easy to sample from and is a close match to π(θ|y). Write:

I = \int_{\Theta} h(\theta)\, \frac{\pi(\theta | y)}{g(\theta | y)}\, g(\theta | y)\, d\theta

This integral can be approximated by drawing a sample of G values θ_1^g, .., θ_G^g from g(θ|y) and computing:

I \simeq \frac{1}{G} \sum_{i=1}^{G} h(\theta_i^g)\, \frac{\pi(\theta_i^g | y)}{g(\theta_i^g | y)}

Definition (Importance sampling). The moment E(h(θ)|y) of the posterior distribution can be approximated by drawing G realisations θ_1^g, .., θ_G^g from a source density g(θ|y) and computing

E(h(\theta) | y) \simeq \frac{1}{G} \sum_{i=1}^{G} h(\theta_i^g)\, \frac{\pi(\theta_i^g | y)}{g(\theta_i^g | y)}

This expression can be regarded as a weighted average of the h(θ_i^g), where the importance weights are π(θ_i^g|y) / g(θ_i^g|y).

Example (Truncated exponential). Consider a continuous random variable X with an exponential distribution X ~ exp(1) truncated to [0, 1]. We want to approximate

E\left( \frac{1}{1 + X^2} \right)

by the importance sampling method with a source density B(2, 3), because it is defined over [0, 1] and because, for this choice of parameters, the match between the beta density and the target density is good. Question: write a Matlab code to approximate this integral, and compare the result to the value obtained by numerical integration.

Solution. For an exponential distribution with a rate parameter of 1, we have

f(x) = \exp(-x) \qquad F(x) = 1 - \exp(-x) \qquad \text{for } x > 0

If this density is truncated to [0, 1] (truncated from the right), we have

\pi(x) = \frac{f(x)}{F(1)} = \frac{\exp(-x)}{1 - \exp(-1)}

So we aim at computing:

E\left( \frac{1}{1 + X^2} \right) = \int_0^1 \frac{1}{1 + x^2}\, \frac{\exp(-x)}{1 - \exp(-1)}\, dx

Solution (cont'd). The importance sampling algorithm is the following:
1. Generate a sample of G values x_1, .., x_G from a Beta distribution B(2, 3).
2. Compute

E\left( \frac{1}{1 + X^2} \right) \simeq \frac{1}{G} \sum_{i=1}^{G} h(x_i)\, \frac{\pi(x_i)}{g_{2,3}(x_i)}

with h(x_i) = 1/(1 + x_i^2) and π(x_i) = exp(−x_i)/(1 − exp(−1)), where g_{α,β}(x_i) is the pdf of the B(α, β) distribution evaluated at x_i,

g_{\alpha,\beta}(x) = \frac{x^{\alpha-1} (1 - x)^{\beta-1}}{B(\alpha, \beta)}

and B(α, β) is the beta function.

[Figure: results.]

Recommendation.
1. Why use the importance sampling method? To compute the moments of a given distribution (typically the posterior distribution).
2. What are the prerequisites of importance sampling? The functional form of the pdf of the target distribution is known, and a source density that is easy to sample from and close to the target is available.
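A sketch of this computation (betarnd and betapdf are from the Statistics Toolbox; the base function integral supplies the deterministic check requested in the question):

```matlab
% Importance sampling for E[1/(1+X^2)], X ~ exp(1) truncated to [0,1]
G = 1e5;
x = betarnd(2, 3, G, 1);                     % draws from the B(2,3) source
h = 1./(1 + x.^2);
target = exp(-x)/(1 - exp(-1));              % truncated exponential pdf
w = target./betapdf(x, 2, 3);                % importance weights pi/g
I_is = mean(h.*w);

% Check against deterministic numerical integration
I_num = integral(@(t) 1./(1 + t.^2).*exp(-t)/(1 - exp(-1)), 0, 1);
```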
End of Chapter 7