A Comparison of INLA and MCMC for the Estimation of Smoothed Risk Maps in Epidemiology

Rebeca Ramis1,3, Virgilio Gómez-Rubio2, Peter J. Diggle1 & Gonzalo López-Abente3

Centro de Investigación Biomédica en red Epidemiología y Salud Pública (CIBERESP)
1 Division of Medicine, Lancaster University, UK. {r.ramis,p.diggle}@lancaster.ac.uk
2 Departamento de Matemáticas, Universidad de Castilla-La Mancha, Spain. Virgilio.Gomez@uclm.es
3 Centro Nacional de Epidemiología, Instituto de Salud 'Carlos III', Madrid, Spain. glabente@isciii.es

Background
In epidemiology, spatial disease mapping models are commonly used to estimate smoothed risk maps. In this context, the Besag, York and Mollié model has been widely used:

$$O_i \sim \mathrm{Po}(\mu_i), \qquad \log(\mu_i) = \rho + h_i + b_i,$$
$$h_i \sim N(0, v_h), \qquad b_i \sim \mathrm{CAR}(v_b),$$

where O_i is the number of cases in area i, ρ is the prevalence of the disease, h_i is a heterogeneity random effect and b_i is a spatially correlated random effect.

[Figure: maps of oesophageal cancer standardized mortality ratios (RME) in Spanish towns, Besag, York and Mollié model; legend classes range from 0 to 0.67 up to 1.5 to 200.]

As with many other Bayesian models, MCMC is often used to estimate the posterior distribution of the parameters of interest. Its main disadvantage is its computational cost, especially when the number of areas is large (we work with the 8068 Spanish towns). In contrast, INLA (Rue, Martino and Chopin, 2009, JRSS-B, 71:319–392) has recently offered an alternative with a negligible computational burden.

Our aim is to compare the performance and accuracy of both techniques using a factorial experiment over the real geography of the 8068 small areas into which Spain is divided.
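To make the model concrete, here is a minimal Python sketch that simulates counts from it on a hypothetical 4 × 4 grid of areas. It assumes the linear predictor log(μ_i) = ρ + h_i + b_i. Because the intrinsic CAR prior of the Besag, York and Mollié model is improper and cannot be sampled directly, the sketch swaps in a proper CAR with precision (D − αW)/v_b, where W is the adjacency matrix, D holds its row sums and α = 0.9 is an illustrative choice. The grid, α and all function names are hypothetical; this is not the simulation code used in the study.

```python
import math
import random


def grid_neighbours(nrow, ncol):
    """Rook adjacency for a hypothetical nrow x ncol lattice of areas."""
    nb = {}
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            nb[i] = [rr * ncol + cc
                     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= rr < nrow and 0 <= cc < ncol]
    return nb


def cholesky(a):
    """Lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_proper_car(nb, v_b, alpha, rng):
    """Draw b ~ N(0, Q^{-1}) with precision Q = (D - alpha * W) / v_b (assumed proper CAR)."""
    n = len(nb)
    Q = [[0.0] * n for _ in range(n)]
    for i, js in nb.items():
        Q[i][i] = len(js) / v_b
        for j in js:
            Q[i][j] = -alpha / v_b
    L = cholesky(Q)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Back-substitution for L^T b = z, so Cov(b) = (L L^T)^{-1} = Q^{-1}.
    b = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * b[k] for k in range(i + 1, n))
        b[i] = (z[i] - s) / L[i][i]
    return b


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_bym(nb, rho, v_h, v_b, alpha=0.9, seed=0):
    """Simulate O_i ~ Po(mu_i) with log(mu_i) = rho + h_i + b_i."""
    rng = random.Random(seed)
    b = sample_proper_car(nb, v_b, alpha, rng)           # spatial effect
    h = [rng.gauss(0.0, math.sqrt(v_h)) for _ in nb]     # heterogeneity effect
    mu = [math.exp(rho + h[i] + b[i]) for i in range(len(nb))]
    obs = [poisson(m, rng) for m in mu]
    return obs, mu


if __name__ == "__main__":
    nb = grid_neighbours(4, 4)
    obs, mu = simulate_bym(nb, rho=1.0, v_h=0.1, v_b=0.2)
    print(len(obs), min(obs) >= 0)  # 16 True
```

Varying rho, v_h and v_b in calls like this one is the kind of manipulation a factorial simulation design relies on.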
Factorial experiment
We carry out a factorial experiment with 3 factors: ρ (prevalence), v_h (variance of the heterogeneity term) and v_b (variance of the spatial autocorrelation term). Each factor has 3 levels: low (1), medium (2) and high (3). These combinations of factors aim to reproduce a range of real scenarios of chronic disease outcomes in Spain. We simulate 25 datasets for each of the 27 scenarios, compute the 90% CI empirical coverage, and take the mean over the 25 replications in each scenario:

$$y_{ijk} = \operatorname{mean}\left(\text{90\% CI empirical coverage for scenario } ijk\right)$$

where:
• i = 1, 2, 3 indexes the levels of the spatial autocorrelation term variance (v_b)
• j = 1, 2, 3 indexes the levels of the heterogeneity term variance (v_h)
• k = 1, 2, 3 indexes the levels of prevalence (ρ)

We repeat the experiment with INLA and with MCMC (WinBUGS) to assess and compare their performance.

Results
Our results show that both techniques produce similar estimates. Examples from some of the simulated datasets show that μ̂_INLA and μ̂_MCMC are very similar, but the standard errors (se) behave differently: for INLA, μ̂ and se show no apparent relationship, whereas for MCMC the se increases with μ̂.
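The factorial design described above can be organised in code roughly as follows. This minimal Python sketch enumerates the 27 (i, j, k) scenarios and computes y_ijk from per-dataset intervals. It assumes that the empirical coverage of one dataset is the proportion of areas whose true value falls inside the fitted 90% interval; the poster does not spell out this definition, and the model-fitting step (INLA or WinBUGS) is left out. Function names are illustrative.

```python
from itertools import product
from statistics import mean

LEVELS = (1, 2, 3)  # low, medium, high


def scenarios():
    """All 27 (i, j, k) scenarios: i -> v_b level, j -> v_h level, k -> rho level."""
    return list(product(LEVELS, repeat=3))


def empirical_coverage(true_values, lower, upper):
    """Proportion of areas whose true value lies inside its interval [lower, upper]."""
    hits = [lo <= t <= up for t, lo, up in zip(true_values, lower, upper)]
    return sum(hits) / len(hits)


def scenario_response(replications):
    """y_ijk: mean coverage over the replications of one scenario.

    Each replication is a (true_values, lower, upper) triple of equal-length lists.
    """
    return mean(empirical_coverage(t, lo, up) for t, lo, up in replications)


if __name__ == "__main__":
    grid = scenarios()
    print(len(grid), grid[0], grid[-1])  # 27 (1, 1, 1) (3, 3, 3)
    # Two toy replications with two areas each (made-up numbers, not study data).
    reps = [
        ([1.0, 2.0], [0.5, 2.5], [1.5, 3.0]),  # second area missed -> 0.5
        ([1.0, 2.0], [0.5, 1.0], [1.5, 3.0]),  # both areas covered -> 1.0
    ]
    print(scenario_response(reps))  # 0.75
```

In the study each scenario would contribute 25 replications to scenario_response, once for the INLA fits and once for the MCMC fits.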
[Figure: for Scenario.1.1.1, Scenario.2.2.2 and Scenario.3.3.3 (simulation 1 in each case), scatterplots of INLA vs MCMC estimated means, and of mean vs standard error for INLA and for MCMC.]

90% CI empirical coverage, y_ijk (scenarios are labelled i.j.k)

INLA (y_INLA)
        k    i=1    i=2    i=3
j=1     1    0.899  0.913  0.912
        2    0.918  0.920  0.914
        3    0.929  0.928  0.924
j=2     1    0.919  0.937  0.911
        2    0.929  0.940  0.915
        3    0.932  0.940  0.920
j=3     1    0.933  0.966  0.923
        2    0.943  0.965  0.941
        3    0.943  0.962  0.945

MCMC (y_MCMC)
        k    i=1    i=2    i=3
j=1     1    0.895  0.905  0.904
        2    0.901  0.899  0.904
        3    0.901  0.899  0.902
j=2     1    0.895  0.908  0.894
        2    0.895  0.906  0.897
        3    0.897  0.901  0.899
j=3     1    0.945  0.925  0.873
        2    0.939  0.910  0.905
        3    0.932  0.911  0.909

• The 90% CI empirical coverage of the INLA estimates exceeds 90% for all combinations except 1.1.1.
• MCMC coverage is close to 90% for scenarios with j=1 and j=2; for j=3 it is higher, except for scenario 3.3.1.
• INLA results vary across factor levels. Increases in the heterogeneity term variance (j) and in prevalence (k) increase the 90% CI empirical coverage, but increases in the spatial autocorrelation term variance (v_b, index i) do not have the same effect.
• MCMC results show no consistent trend across factor levels.

Concluding Remarks
For situations with a large number of small areas, the following points should be taken into account when choosing a technique to estimate risk maps:
• INLA and MCMC techniques estimate similar smoothed risk maps.
• INLA standard errors are larger.
• For scenarios with higher heterogeneity, both techniques produce wider intervals.
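The readings of the coverage tables in the bullet points can be checked mechanically. Below is a small Python sketch that transcribes the two tables, lists the scenarios whose mean coverage falls below the nominal 90%, and averages INLA coverage over the levels of each factor. The numbers are the values reported in the tables; the variable and function names are illustrative.

```python
from statistics import mean

# Mean 90% CI empirical coverage, transcribed from the tables.
# Key (j, k) -> coverage for i = 1, 2, 3.
INLA_ROWS = {
    (1, 1): (0.899, 0.913, 0.912), (1, 2): (0.918, 0.920, 0.914), (1, 3): (0.929, 0.928, 0.924),
    (2, 1): (0.919, 0.937, 0.911), (2, 2): (0.929, 0.940, 0.915), (2, 3): (0.932, 0.940, 0.920),
    (3, 1): (0.933, 0.966, 0.923), (3, 2): (0.943, 0.965, 0.941), (3, 3): (0.943, 0.962, 0.945),
}
MCMC_ROWS = {
    (1, 1): (0.895, 0.905, 0.904), (1, 2): (0.901, 0.899, 0.904), (1, 3): (0.901, 0.899, 0.902),
    (2, 1): (0.895, 0.908, 0.894), (2, 2): (0.895, 0.906, 0.897), (2, 3): (0.897, 0.901, 0.899),
    (3, 1): (0.945, 0.925, 0.873), (3, 2): (0.939, 0.910, 0.905), (3, 3): (0.932, 0.911, 0.909),
}


def flatten(rows):
    """Turn {(j, k): (y_1jk, y_2jk, y_3jk)} into {(i, j, k): y_ijk}."""
    return {(i, j, k): y
            for (j, k), values in rows.items()
            for i, y in enumerate(values, start=1)}


INLA = flatten(INLA_ROWS)
MCMC = flatten(MCMC_ROWS)


def below(table, nominal=0.90):
    """Scenarios (i, j, k) whose mean coverage is under the nominal level."""
    return sorted(s for s, y in table.items() if y < nominal)


def mean_by(table, axis):
    """Mean coverage per level of one factor: axis 0 = i (v_b), 1 = j (v_h), 2 = k (rho)."""
    return {level: mean(y for s, y in table.items() if s[axis] == level)
            for level in (1, 2, 3)}


if __name__ == "__main__":
    print(below(INLA))              # [(1, 1, 1)]
    print(min(MCMC, key=MCMC.get))  # (3, 3, 1)
    # For INLA, the j and k means increase with the level; the i means do not.
    for name, axis in (("i", 0), ("j", 1), ("k", 2)):
        print(name, {lvl: round(m, 3) for lvl, m in mean_by(INLA, axis).items()})
```

Averaging over the other two factors is a simple main-effects summary; it does not replace a formal analysis of the factorial experiment.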