Bayesian analysis of an autoregressive panel data model: a simulation study

Fabyano Fonseca e Silva1, Thelma Sáfadi2, Joel Augusto Muniz2, Luis Henrique de Aquino2

INTRODUCTION

Panel data typically refer to data containing time series observations on a number of individuals. Observations in panel data therefore involve at least two dimensions: a cross-sectional dimension, indicated by the subscript i, and a time series dimension, indicated by the subscript t. Such data consist of several time series generated by the same type of model, for example autoregressive (AR), moving average (MA), autoregressive integrated moving average (ARIMA), or other more complex models. The key advantage of modeling several series simultaneously is the possibility of pooling information from all series: pooling the data yields more accurate predictions for individual outcomes than predictions based only on the data of the individual in question. The pooling takes place because the parameters of the time series model are assumed to arise from the same distribution (Liu, 1980). The specification of this distribution gives the Bayesian procedure a theoretical advantage over the classical procedure, independent of convenience, because the classical perspective focuses on the sampling distribution of an estimator, while the Bayesian procedure provides exact information about the distribution of the parameters. In the Bayesian analysis of AR(p) models, usually only an approximate likelihood function is used, because the unconditional (exact) function does not yield conditional distributions in closed form, leading to a more complex estimation process. In a panel data study, however, conditioning on the initial observations, whose number is defined by the order p of each series, represents a larger loss of information.
Therefore, even though it increases the complexity of the Bayesian analysis, it is important to retain the exact likelihood. The key element of a Bayesian analysis is the choice of prior. A commonly used informative prior for the parameters of autoregressive models is the multivariate normal distribution (Ni & Sun, 2003), but others can be used, for example the multivariate Student's t (Barreto & Andrade, 2004) and independent rescaled beta distributions (Liu, 1980). The choice of prior distribution is therefore crucial, and specific comparison methods, for example the Bayes Factor (Gelfand, 1996), have been used.

In the present study we propose a full Bayesian analysis of an autoregressive, AR(p), panel data model. The methodology considers the exact likelihood function, a comparative analysis of prior distributions, and predictive distributions of future observations.

1 Professor Adjunto, Dep. Informática, Setor de Estatística, Universidade Federal de Viçosa. fabyano@dpi.ufv.br
2 Professor(a), Departamento de Ciências Exatas, Universidade Federal de Lavras.

Methodology

The autoregressive panel data model for this situation is presented by Liu (1980):

y_{it} = \phi_{i1} y_{i(t-1)} + \phi_{i2} y_{i(t-2)} + \cdots + \phi_{ip} y_{i(t-p)} + e_{it},

where y_{it} is the current value of the stochastic process for individual i, whose past values are y_{i(t-1)}, y_{i(t-2)}, \ldots, y_{i(t-p)}; \phi_{i1}, \phi_{i2}, \ldots, \phi_{ip} are the autoregressive coefficients of each individual; and e_{it} is an error term, e_{it} ~ iid N(0, \sigma_e^2).

The exact likelihood function, in matrix form, is:

L(Y \mid \phi, \sigma_e^2) \propto \pi(\phi, \sigma_e^2 \mid Y_p)\, (\sigma_e^2)^{-m(n-p)/2} \exp\left\{ -\frac{1}{2\sigma_e^2} (Y_1 - X\phi)'(Y_1 - X\phi) \right\},

where:

\pi(\phi, \sigma_e^2 \mid Y_p) \propto (\sigma_e^2)^{-mp/2}\, |V_p|^{1/2} \exp\left\{ -\frac{1}{2\sigma_e^2}\, Y_p' V_p Y_p \right\},

Y_p = [y_{11}, y_{12}, \ldots, y_{1p},\; y_{21}, y_{22}, \ldots, y_{2p},\; \ldots,\; y_{m1}, y_{m2}, \ldots, y_{mp}]'_{mp \times 1},

Y_1 = [y_{1,p+1}, y_{1,p+2}, \ldots, y_{1n},\; y_{2,p+1}, y_{2,p+2}, \ldots, y_{2n},\; \ldots,\; y_{m,p+1}, y_{m,p+2}, \ldots, y_{mn}]'_{m(n-p) \times 1},

X = \mathrm{diag}(X_1, X_2, \ldots, X_m)_{m(n-p) \times mp}, \quad X_i = \begin{bmatrix} y_{ip} & y_{i,p-1} & \cdots & y_{i1} \\ y_{i,p+1} & y_{ip} & \cdots & y_{i2} \\ \vdots & \vdots & & \vdots \\ y_{i,n-1} & y_{i,n-2} & \cdots & y_{i,n-p} \end{bmatrix}_{(n-p) \times p},

\phi = [\phi_{11}, \phi_{12}, \ldots, \phi_{1p},\; \phi_{21}, \phi_{22}, \ldots, \phi_{2p},\; \ldots,\; \phi_{m1}, \phi_{m2}, \ldots, \phi_{mp}]'_{mp \times 1}.
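To illustrate how the stacked vectors Y_p and Y_1 and the block-diagonal design matrix X fit together, the following sketch assembles them for a small panel. The original analysis was implemented in R; this is a minimal Python illustration with arbitrary panel values, and the helper name build_design is our own.

```python
# Sketch: assembling Y_p, Y_1 and the block-diagonal X for an AR(p) panel.
# Assumptions: `panel` holds m series of length n as plain lists; p is the AR order.

def build_design(panel, p):
    """Return (Y_p, Y_1, X) as nested lists for the stacked AR(p) panel model."""
    m = len(panel)
    n = len(panel[0])
    # Y_p: the first p observations of every series, stacked.
    Y_p = [y for series in panel for y in series[:p]]
    # Y_1: the remaining n - p observations of every series, stacked.
    Y_1 = [y for series in panel for y in series[p:]]
    # X: block diagonal; the block for individual i has rows
    # [y_{i,t-1}, ..., y_{i,t-p}] for t = p+1, ..., n.
    rows, cols = m * (n - p), m * p
    X = [[0.0] * cols for _ in range(rows)]
    for i, series in enumerate(panel):
        for t in range(p, n):                          # response is y_{i,t+1} (1-based)
            row = i * (n - p) + (t - p)
            for k in range(p):
                X[row][i * p + k] = series[t - 1 - k]  # lag k + 1
    return Y_p, Y_1, X

# Tiny example: m = 2 series, n = 5 observations, AR order p = 2.
panel = [[0.2, 0.5, 0.1, 0.4, 0.3],
         [1.0, 0.8, 0.9, 0.7, 0.6]]
Y_p, Y_1, X = build_design(panel, p=2)
print(len(Y_p), len(Y_1), len(X), len(X[0]))   # 4 6 6 4
```

Note how the zero blocks of X keep the coefficients of different individuals separated, so a single stacked regression Y_1 = Xφ + e reproduces the m individual AR(p) regressions.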
The matrix V_p is obtained from the Yule-Walker equations (Box, Jenkins & Reinsel, 1994). In the present work we generalized this matrix to panel data using a block diagonal structure, illustrated here for AR(2) models:

V_p = \mathrm{diag}(V_p^{(1)}, V_p^{(2)}, \ldots, V_p^{(m)}), \quad V_p^{(i)} = \begin{bmatrix} 1 - \phi_{i2}^2 & -\phi_{i1}(1 + \phi_{i2}) \\ -\phi_{i1}(1 + \phi_{i2}) & 1 - \phi_{i2}^2 \end{bmatrix}.

In this study we compared the hierarchical multivariate Normal - Inverse Gamma prior (model 1) and the independent multivariate Student's t - Inverse Gamma prior (model 2). The conditional posterior distributions for model 1 are given by:

\phi \mid \sigma_e^2, Y \sim \pi(\phi, \sigma_e^2 \mid Y_p) \times \text{multivariate Normal}(\hat\phi,\; \sigma_e^2 (X'X)^{-1}), \quad \hat\phi = (X'X)^{-1} X' Y_1,

\sigma_e^2 \mid \phi, Y \sim \text{Inverse Gamma}\left( \frac{mp + mn}{2} + 1,\;\; \frac{1}{2} Y_p' V_p Y_p + D + \frac{1}{2} (\phi - \hat\phi)'(\phi - \hat\phi) \right).

The conditional posterior distributions for model 2 are given by:

\phi \mid \sigma_e^2, Y \sim \pi(\phi, \sigma_e^2 \mid Y_p) \times \text{mult. Normal}(\hat\phi,\; \sigma_e^2 (X'X)^{-1}) \times \text{mult. Student's } t(\mu,\; P^{-1}),

\sigma_e^2 \mid \phi, Y \sim \text{Inverse Gamma}\left( \frac{mn}{2} + 1,\;\; \frac{1}{2}\left[ Y_p' V_p Y_p + (\phi - \hat\phi)'(X'X)(\phi - \hat\phi) + (Y_1 - \hat Y_1)'(Y_1 - \hat Y_1) \right] \right),

\hat Y_1 = X\hat\phi = X (X'X)^{-1} X' Y_1.

For each prior considered, one chain with starting values obtained by maximum likelihood estimation was run. After several trials, the length of each chain was set to 50,000 iterations. The burn-in period was 20,000 iterations, higher than the minimum burn-in required according to the method of Raftery & Lewis (1992), and convergence was tested using the Gelman & Rubin (1992) criterion. The Gibbs sampler and Metropolis-Hastings algorithms were implemented in the free software R (R Development Core Team, 2006) using its matrix language.

In the present panel data situation, for a specific individual i, the predictive distribution of one future observation is given by:

P(Y_{(n+1)} \mid Y) = \iint (2\pi \sigma_e^2)^{-m/2} \exp\left\{ -\frac{1}{2\sigma_e^2} (Y_{(n+1)} - X\phi)'(Y_{(n+1)} - X\phi) \right\} P(\phi, \sigma_e^2 \mid Y)\; d\phi\; d\sigma_e^2.
This integral has no analytical solution, but, in agreement with Heckman & Leamer (2001), it is possible to obtain an approximation via an MCMC algorithm by sampling from the distribution:

Y_{(n+1)}^{(q)} \mid Y \sim N(X\phi^{(q)},\; \sigma_e^{2(q)} I),

where I is the mp x mp identity matrix. The set of values obtained at each MCMC iteration q constitutes a sample from the posterior predictive distribution of the future observation. The point estimate, given by the mean of these samples, is \hat P(Y_{(n+1)} \mid Y).

To compare the priors, the Bayes Factor (BF) was used under the approach presented by Barreto & Andrade (2004), which uses the MCMC sample to obtain the normalization factor, P(Y \mid M_p), for a specific prior p. The Bayes Factor expression is:

BF_{12} = \frac{\hat P(Y \mid M_1)}{\hat P(Y \mid M_2)} = \frac{\frac{1}{Q} \sum_{q=1}^{Q} L(Y \mid \theta^{(q)}, M_1)}{\frac{1}{Q} \sum_{q=1}^{Q} L(Y \mid \theta^{(q)}, M_2)},

where \theta^{(q)} indicates the values generated in the qth iteration (q = 1, 2, \ldots, Q) for each compared prior. Index 1 refers to model 1 and index 2 to model 2; the term L(Y \mid \theta^{(q)}, M_p) corresponds to the likelihood function evaluated at the MCMC draws of the parameters.

A simulation study was conducted to evaluate the proposed methodology. The AR(2) model was used because it is the simplest multiparametric autoregressive approach. It is given by:

Y_{it} = \phi_{i1} Y_{i(t-1)} + \phi_{i2} Y_{i(t-2)} + e_{it}, \quad i = 1, 2, \ldots, 10, \quad t = 1, 2, \ldots, 12, \quad \phi_i = [\phi_{i1}, \phi_{i2}]',

subject to the stationarity conditions \phi_{i1} + \phi_{i2} < 1; \phi_{i2} - \phi_{i1} < 1; -1 < \phi_{i2} < 1.

The parameter values \phi_{i1} and \phi_{i2} were generated from a multivariate normal distribution (model 1) and a multivariate Student's t distribution (model 2):

\phi_i \sim N\left( \begin{bmatrix} 0,5 \\ 0 \end{bmatrix}, \begin{bmatrix} 0,025 & 0 \\ 0 & 0,010 \end{bmatrix} \right) \quad \text{and} \quad \phi_i \sim t\text{-Student}\left( \begin{bmatrix} 0,5 \\ 0 \end{bmatrix}, \begin{bmatrix} 0,025 & 0 \\ 0 & 0,010 \end{bmatrix},\; df = m(n-1) \right).

The residuals were normally distributed, e_{it} ~ N(0, \sigma_e^2), where \sigma_e^2 is an Inverse Gamma random number, \sigma_e^2 ~ IG(3, 2). This simulation study also provides a way to evaluate predictive capacity, verified by excluding the last observation (Y_{i12}) of each series.
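The simulation design above can be sketched in code. The original study used R; the sketch below is a Python illustration with the same settings (m = 10 individuals, n = 12 time points, AR(2) coefficients drawn from independent normals centered at 0.5 and 0, stationarity region enforced by rejection), and it shows the hold-out of the last observation used to check predictive capacity. The fixed residual variance and the plug-in one-step predictor based on the true coefficients are simplifications for illustration, not the paper's full MCMC procedure.

```python
import math
import random

random.seed(42)

def draw_stationary_ar2():
    """Draw (phi1, phi2) from independent normals (means 0.5 and 0.0, variances
    0.025 and 0.010, as in the simulation design), rejecting draws outside the
    AR(2) stationarity region: phi1+phi2 < 1, phi2-phi1 < 1, |phi2| < 1."""
    while True:
        p1 = random.gauss(0.5, math.sqrt(0.025))
        p2 = random.gauss(0.0, math.sqrt(0.010))
        if p1 + p2 < 1 and p2 - p1 < 1 and abs(p2) < 1:
            return p1, p2

def simulate_panel(m=10, n=12, sigma2=0.5):
    """Simulate m AR(2) series of length n; return the panel and true coefficients.
    sigma2 is fixed here in place of the Inverse Gamma IG(3, 2) draw of the paper."""
    panel, coefs = [], []
    sd = math.sqrt(sigma2)
    for _ in range(m):
        p1, p2 = draw_stationary_ar2()
        y = [random.gauss(0, sd), random.gauss(0, sd)]  # two starting values
        for _ in range(n - 2):
            y.append(p1 * y[-1] + p2 * y[-2] + random.gauss(0, sd))
        panel.append(y)
        coefs.append((p1, p2))
    return panel, coefs

panel, coefs = simulate_panel()
# Hold out the last observation of each series, as in the predictive check.
held_out = [y[-1] for y in panel]
training = [y[:-1] for y in panel]
# Plug-in one-step prediction using the true coefficients (illustration only).
preds = [p1 * y[-1] + p2 * y[-2] for (p1, p2), y in zip(coefs, training)]
print(len(training[0]), len(preds))   # 11 10
```

In the study itself, the predictions come from the posterior predictive distribution rather than from the true coefficients, so comparing `preds` with `held_out` here only mimics the evaluation scheme, not the Bayesian machinery behind it.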
Therefore, the predicted values (\hat Y_{i12}) can be compared with the true values.

RESULTS AND DISCUSSION

Table 1. Models considered in the simulation and comparison criteria given by the Bayes Factor (BF).

Data simulated from                                                  P̂(Y|M1)      P̂(Y|M2)        FB12     FB21
Model 1 (hierarchical multivariate Normal - Inverse Gamma prior)     1.336.129     5.133.331       0,2602   -
Model 2 (independent multivariate Student's t - Inverse Gamma prior) 474.798       2.973.740,179   -        6,263167

Table 2. Last observation true values (y12), posterior mean estimates (ŷ12) and 95% credibility intervals (LL = lower limit, UL = upper limit).

                       Model 1                           Model 2
Series    y12     ŷ12     LL      UL        y12     ŷ12     LL      UL
1         0,69    0,43    0,12    0,74      0,60    0,46    0,24    0,68
2         0,28    0,12   -0,13    0,46     -1,58   -1,17   -1,63   -0,91
3         0,42    0,25    0,06    0,44     -0,56    0,02   -0,44    0,48
4         0,66    1,04    0,69    1,39      1,49    1,92    1,35    2,49
5         1,18    1,36    1,13    1,59     -0,70   -0,92   -1,24   -0,60
6         1,18    1,41    0,89    1,93      0,03    0,14   -0,19    0,47
7         1,43    1,02    0,75    1,29     -0,54    0,06   -0,46    0,51
8        -0,55   -0,08   -0,42    0,26     -0,94   -0,69   -1,00   -0,38
9         1,27    0,97    0,65    1,30     -0,04    0,13   -0,06    0,33
10        0,63    1,25    0,95    1,55     -0,96   -1,19   -1,54   -0,84

Table 1 shows the superiority of model 2, even when the data were generated using model 1. In general, the literature reports the good quality of the Student's t prior for the parameters of autoregressive time series models; among these studies, Barreto & Andrade (2004) showed its greater robustness. Table 2 allows comparison of the predictive capacity of the models, because the true value of the last observation of each time series, which was deleted in the analysis process, is known. For model 1, the credibility intervals contain the true values in 60% of the cases, while for model 2 this proportion is 80%. The joint evaluation of the two models thus produces an efficiency of 70%. This efficiency is similar to that of other studies that used the same form of evaluation of predictive capacity (de Alba, 1993; Hay & Pettitt, 2001).

REFERENCES

BARRETO, G.; ANDRADE, M.G.
Robust Bayesian Approach for AR(p) Models Applied to Streamflow Forecasting. Journal of Applied Statistical Science, New York, v.12, n.3, p.269-292, Mar. 2004.

BOX, G.E.P.; JENKINS, G.M.; REINSEL, G.C. Time Series Analysis: Forecasting and Control. 3.ed. San Francisco, USA: Holden-Day, 1994. 500p.

de ALBA, E. Constrained forecasting in autoregressive time series models: A Bayesian analysis. International Journal of Forecasting, New York, v.9, n.1, p.95-108, Apr. 1993.

RAFTERY, A.E.; LEWIS, S.M. How many iterations in the Gibbs sampler? In: Bayesian Statistics 4 (eds. J.M. Bernardo et al.), Oxford, USA: University Press, p.763-773, 1992.