Convergence Diagnostics of Bayesian MCMC Chains

Convergence Diagnostics for PHASE MCMC Chains

We tested the convergence of the MCMC chains with the CODA library (Best et al., 1995), as implemented for the R v.2.11.1 environment (Plummer et al., 2010; R Development Core Team, 2010). The CODA convergence diagnostics considered in our analyses were: heterogeneity among chains (Gelman and Rubin, 1992), the convergence test for individual chains (Geweke, 1992), the stationarity and half-width interval tests (Heidelberger and Welch, 1983), and inspection of the density distributions and of the auto- and cross-correlation plots. Point estimates of the parameters in the different applications were calculated as the medians of the second half of all stationary chains.

Haplotype Inferences with PHASE

Haplotypes were obtained with the coalescent-based approach of Stephens and Donnelly (2003). The use of different priors (c, µ, f) for the haplotype reconstructions had little effect on the final set of inferred haplotypes. The differences were restricted to a few genotypes that produced alternative pairs of inferred haplotypes under the different models. The four linkage groups (LGs) investigated differed in homozygosity and number of markers, which led us to run chains of different lengths and thinning intervals. Burn-in iterations, thinning intervals and final chain lengths of the Bayesian chains that produced the reported results were as follows: LG2 (-X10, 1000 25 250); LG9 (-X10, 1000 25 500); LG10 (-X10, 1000 15 250); LG12 (-X10, 500 10 100). We measured the goodness of fit of the estimated haplotypes to an approximate coalescent with recombination using the posterior pseudo-likelihood of the data under the model (Stephens and Donnelly, 2003), and examined it with the convergence tools available within CODA (not shown).

Recombination Estimates with PHASE

Recombination was estimated from population data (inferred haplotypes) using the general recombination model from Li and Stephens (2003) and Crawford et al.
(2004), which allows hotspots and coldspots of recombination to occur independently in different segments. We used four different sets of priors by combining two recombination probabilities per base pair (c: the default value and the value obtained from the ratio of the oak genome content to its linkage length) with two priors for the population genetics recombination parameter (µ) and for the difference allowed (f) between the estimated population genetics recombination parameter and its prior (µ). The default values for the hotspot priors were used in all cases. See the PHASE documentation for a summary of the recombination models and the priors.

Our first attempts to calculate the population genetics recombination parameters were made with the thinning intervals, final chain lengths and burn-in iterations shown in Table S5-1. The background recombination point estimates suggest that informative priors may be essential to reach convergence, at least with this type of data. Only the LG9 and LG10 point estimates obtained with the “oak priors” reached convergence. The Gelman-Rubin shrink factor test and Geweke's Z-scores indicated convergence failure in LG2 and LG12 (Figure S5-1 and Table S5-2). The Heidelberger-Welch stationarity test failed only for the recombination estimate in the second segment of LG9 (Table S5-3). The half-width tests failed for LG2 and LG12 (Table S5-3).

Afterwards, we ran a longer final set of iterations to confirm convergence in LG9 and LG10 and to attain it in LG2 and LG12. We used a thinning interval of 500 and a final chain length of 10^9 after 5 x 10^8 burn-in iterations for all LGs. The results obtained are shown in the main text (Table 2). Convergence was confirmed for LG9 and LG10 (Figure S5-2, Tables S5-4, S5-5), and the point estimates of the recombination parameters varied only slightly. Furthermore, LG12 also passed the CODA convergence tests, although longer simulations would be needed to obtain accurate point estimates.
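To make the between-chain check concrete, the following is a minimal sketch of a Gelman-Rubin shrink factor and of a second-half median point estimate. It is not the CODA implementation: the function names `shrink_factor` and `second_half_median` are hypothetical, the shrink factor is the basic textbook form without the degrees-of-freedom correction that CODA applies, and pooling the second halves of all chains before taking the median is an assumption about how the point estimates were combined.

```python
import random
import statistics


def shrink_factor(chains):
    """Basic Gelman-Rubin shrink factor for one parameter.

    `chains` is a list of equal-length lists of draws. This is the plain
    sqrt(var_plus / W) form; CODA additionally applies a
    degrees-of-freedom correction, which is omitted here.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B/n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * within + between_over_n
    return (var_plus / within) ** 0.5


def second_half_median(chains):
    """Median of the pooled second halves of the chains (assumed pooling)."""
    pooled = []
    for c in chains:
        pooled.extend(c[len(c) // 2:])
    return statistics.median(pooled)


if __name__ == "__main__":
    random.seed(1)
    # Four chains sampling the same distribution: factor close to 1.
    mixed = [[random.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
    # Four chains stuck around different values: factor well above 1.
    stuck = [[random.gauss(2.0 * k, 1.0) for _ in range(2000)] for k in range(4)]
    print("well-mixed chains:", round(shrink_factor(mixed), 3))
    print("stuck chains:", round(shrink_factor(stuck), 3))
    print("point estimate:", round(second_half_median(mixed), 3))
```

Values of the shrink factor close to 1 indicate that the between-chain variance is small relative to the within-chain variance, which is the pattern expected for chains that have converged to the same distribution.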
Our attempts to obtain recombination estimates with the modified recombination probability per base pair (c) failed for three out of the four LGs, even though we used much longer simulations (Table S5-6). Only the LG9 simulations obtained with the oak priors seemed to reach convergence, with a point estimate rather close to the one obtained with the default recombination probability value. We did not pursue convergence further in the other segments because a single simulation would take well over one month on our computers.

REFERENCES S2

Best MG, Cowles MK, Vines SK (1995) CODA Manual version 0.30. MRC Biostatistics Unit, Cambridge, UK.
Crawford D, Bhangale T, Li N, Hellenthal G, Rieder M, Nickerson D, Stephens M (2004) Evidence for substantial fine-scale variation in recombination rates across the human genome. Nature Genetics, 36, 700-706.
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-472.
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In Bayesian Statistics 4 (Eds. Bernardo JM, Berger JO, Dawid AP, Smith AFM). Clarendon Press, Oxford, UK.
Heidelberger P, Welch P (1983) Simulation run length control in the presence of an initial transient. Operations Research, 31, 1109-1144.
Li N, Stephens M (2003) Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics, 165, 2213-2233.
Plummer M, Best N, Cowles K, Vines K (2006) CODA: Output analysis and diagnostics for MCMC. R package version 0.13-5. URL http://CRAN.R-project.org/package=coda
R Development Core Team (2010) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
Stephens M, Donnelly P (2003) A comparison of Bayesian methods for haplotype reconstruction. American Journal of Human Genetics, 73, 1162-1169.

Table S5-1 Point estimates of the recombination parameters, together with burn-in periods and final chain lengths, for the first recombination estimates obtained with the default recombination probability per base pair (c=1E-08).

Long Chain  Burn-in  Linkage Group  Background Recombination  r1  r2
Default priors (µ=4E-04, f=1E06)
5E+07    2.5E+05  LG2   2.4E-08
2.5E+07  1.3E+06  LG9   3.4E-08
2.5E+07  1.3E+06  LG10  4.1E-08
2.5E+07  1.3E+06  LG12  2.2E-08
r1 r2: 0.82 0.81 0.62 0.88 0.54 0.85 NA 0.43
Oak priors (µ=0.04, f=1E04)
1E+08 5E+07 5E+07 2.5E+06 1E+08 5E+07 5E+08 2.5E+06
0.76 1.00 0.75 0.85 0.43 0.99 NA 0.38
LG2 LG9 LG10 LG12
4.8E-08 0.0331 0.0028 4.5E-08

Figure S5-1 Gelman-Rubin-Brooks plots of the Gelman and Rubin (1992) shrink factor for the Bayesian chains used in the recombination estimates shown in Table S5-1 (only for the analyses with the “oak priors”). The pseudo-likelihood tests the goodness of fit of the inferred haplotypes to an approximate coalescent with recombination (Stephens and Donnelly, 2003).

Table S5-2 Geweke Z-scores for the Bayesian chains used to estimate the recombination parameters shown in Table S5-1 (only for the analyses with the “oak priors”). Test failures are indicated in bold characters.

Haplotype  Parameter  Z-scores (Chain 1, Chain 2, Chain 3, Chain 4, Chain 5)
LG2 (Background, r1, r2): -1.000 0.998 0.323 1.000 2.230 -0.079 0.130 0.537 -0.258 -0.835 -2.995 -0.125 0.920 1.160 -0.080
LG9 (Background, r1, r2): 1.390 0.600 -0.610 -1.986 1.440 0.070 -1.172 0.964 -1.194 1.426 -0.537 -1.648 0.899 0.450 -0.451
LG10 (Background, r1): 0.106 1.492 -1.194 0.570 -0.269 0.055 0.423 0.913 -1.178 0.540
LG12 (Background, r1, r2): -0.242 0.218 0.594 1.163 2.581 1.174 1.270 -0.330 1.335 1.564 0.378 1.587 0.859 -0.623 0.722

Table S5-3 Heidelberger and Welch stationarity and half-width tests for the Bayesian chains used in the recombination estimates shown in Table S5-1 (only for the analyses with the “oak priors”). Failures to pass the tests are indicated by bold characters.

Haplotype  Parameter   Stationarity test  P-value (C1 C2 C3 C4 C5)       Half-width test  Mean (C1 C2 C3 C4 C5)               Half-width (C1 C2 C3 C4 C5)
LG2        Background  passed             0.362 0.758 0.184 0.786 0.996  passed/failed    1.04E-07 1.08E-07 0.000 0.028 0.002 0.001 0.055 7.96E-09 2.07E-09 0.003
LG2        r1          passed             0.715 0.288 0.732 0.163 0.339  passed           1.348 1.341 1.330 1.280 1.273       0.016 0.016 0.025 0.014 0.013
LG2        r2          passed             0.684 0.300 0.460 0.684 0.966  passed           0.682 0.671 0.666 0.688 0.682       0.009 0.008 0.013 0.006 0.007
LG9        Background  passed             0.261 0.084 0.465 0.777 0.870  passed           12.360 12.760 12.860 13.030 13.190  0.620 0.878 0.594 0.436 0.672
LG9        r1          passed             0.563 0.084 0.549 0.072 0.643  passed           1.910 1.910 1.930 1.930 1.920       0.027 0.023 0.031 0.036 0.031
LG9        r2          passed/failed      0.558 0.718 0.065 0.215 0.048  passed/NA        1.920 1.940 1.910 1.940 NA          0.026 0.022 0.028 0.042 NA
LG10       Background  passed             0.676 0.614 0.604 0.672 0.686  passed           8.200 9.100 10.210 7.940 8.660      0.728 0.976 0.627 0.416 0.699
LG10       r1          passed             0.342 0.571 0.618 0.965 0.405  passed           1.480 1.640 1.580 1.430 1.570       0.047 0.039 0.033 0.044 0.100
LG12       Background  passed             0.611 0.754 0.661 0.306 0.871  passed/failed    1.576 1.190 1.426 1.330 2.229       0.261 0.243 0.401 0.242 0.783
LG12       r1          passed             0.617 0.714 0.304 0.954 0.394  passed           1.552 1.530 1.539 1.550 1.554       0.025 0.026 0.027 0.019 0.032
LG12       r2          passed             0.342 0.107 0.609 0.512 0.984  passed           0.683 0.687 0.683 0.680 0.787       0.030 0.031 0.040 0.027 0.070

Figure S5-2 Gelman-Rubin-Brooks plots of the Gelman and Rubin (1992) shrink factor for the Bayesian chains used in the reported recombination estimates (Table 2, main text; only for the analyses with the “oak priors”).

Table S5-4 Geweke Z-scores for the Bayesian chains used to estimate the recombination parameters shown in Table 2 of the main text (only for the analyses with the “oak priors”). Test failures are indicated in bold characters.

Haplotype  Parameter  Z-scores (Chain 1, Chain 2, Chain 3, Chain 4)
LG2 (Background, r1, r2): 0.074 0.785 -0.508 0.371 0.499 0.857 0.411 -0.020 0.691 -1.206 -0.669 -2.297
LG9 (Background, r1, r2): 0.229 -0.783 0.482 1.267 -1.791 0.256 0.492 -0.786 -1.312 1.267 -1.791 0.256
LG10 (Background, r1): -0.652 -0.388 -0.285 -0.727 0.726 -1.218 -1.169 -0.847
LG12 (Background, r1, r2): 1.727 -0.246 0.342 -2.315 -1.862 -1.319 1.399 -0.453 0.707 1.152 0.987 1.160

Table S5-5 Heidelberger and Welch stationarity and half-width tests for the Bayesian chains used in the recombination estimates shown in Table 2 of the main text (only for the analyses with the “oak priors”). Failures to pass the tests are indicated by bold characters.

Haplotype  Parameter   Stationarity test  P-value (C1 C2 C3 C4)    Half-width test  Mean (C1 C2 C3 C4)           Half-width (C1 C2 C3 C4)
LG2        Background  passed             0.622 0.307 0.757 0.555  failed           0.025 0.032 0.225 0.010      0.025 0.029 0.160 0.013
LG2        r1          passed             0.345 0.778 0.620 0.695  passed           1.270 1.346 1.326 1.295      0.013 0.008 0.009 0.009
LG2        r2          passed             0.841 0.387 0.507 0.414  passed           0.695 0.703 0.679 0.668      0.012 0.005 0.011 0.005
LG9        Background  passed             0.543 0.392 0.134 0.366  passed           12.720 12.820 13.000 13.040  0.186 0.437 0.237 0.343
LG9        r1          passed             0.730 0.433 0.145 0.406  passed           1.910 1.930 1.910 1.920      0.012 0.010 0.012 0.012
LG9        r2          passed             0.064 0.902 0.985 0.400  passed           1.910 1.930 1.920 1.930      0.011 0.011 0.015 0.011
LG10       Background  passed             0.776 0.146 0.694 0.719  passed           9.560 8.490 7.970 8.250      0.725 0.351 0.218 0.269
LG10       r1          passed             0.934 0.866 0.666 0.529  passed           1.640 1.490 1.490 1.480      0.031 0.025 0.133 0.033
LG12       Background  passed             0.183 0.340 0.227 0.204  passed           8.550 8.250 8.610 8.100      0.371 0.232 0.294 0.244
LG12       r1          passed             0.579 0.458 0.225 0.121  passed           1.800 1.790 1.790 1.730      0.013 0.012 0.016 0.025
LG12       r2          passed             0.663 0.491 0.984 0.154  passed           1.520 1.460 1.490 1.400      0.015 0.017 0.017 0.020

Table S5-6 Recombination estimates, together with burn-in periods and final chain lengths, obtained using an estimate of the oak recombination probability per base pair based on published data (ratio of the DNA content to the linkage length). Only the longest simulations for each segment are shown.

Long Chain  Burn-in  LG(1)  Background Recombination  r1  r2
Default priors (µ=4E-04, f=1E06)
1.0E+09  5.0E+08  LG9   2.1E-07
5.0E+08  5.0E+09  LG10  2.7E-07
1.0E+09  5.0E+08  LG12  2.5E-08
r1 r2: 0.92 0.83 0.89 0.94 NA 0.50
Oak priors (µ=0.04, f=1E04)
2E+09  1E+09    LG2
1E+09  2.5E+09  LG9
1E+09  2.5E+10  LG10
1E+09  5.0E+08  LG12
0.85 0.91 0.62 0.93 0.63 0.95 NA 0.60
1.6E-06 0.0119 5.3E-06 1.6E-06

(1) Only simulations for LG9 with the “oak priors” seemed to reach convergence.
