ICES CM 2000/V:08 Final Aug 30/2000 Do Different Methods Provide Accurate Probability Statements in the Short Term? V.R. Restrepo, K.R. Patterson, C.D. Darby, S. Gavaris, L.T. Kell, P. Lewy, B. Mesnil, A.E. Punt, R.M. Cook, C.M. O´Brien, D.W. Skagen, and G. Stefánsson Abstract.- The performance of uncertainty estimation procedures was evaluated with respect to accuracy. A confidence statement is said to be accurate if the confidence point achieves the desired probability coverage. A Monte Carlo experiment with 100 trials was conducted with a “true” population that experienced contrast between low and high fishing mortality. Observations for the last 25 years were drawn stochastically by adding measurement error. The assessment approaches were VPA-based (ADAPT and XSA with errors on the effort data). The “Delta Method”, parametric bootstrap and non-parametric bootstrap (NPB), and a Bayesian approach were used to quantify coverage and assess the accuracy of confidence limits of estimated interest parameters (F0.1, SSB and TACF0.1 in year 26) by comparing against the “true” values. Variations of the Delta Method and bootstrap were used to account for statistical estimation bias. The results indicated that accurate inference statements are possible with the different approaches and that bias correction can improve accuracy when it can be applied. The bias-corrected Delta-ADAPT and bias-corrected NPB-ADAPT applications performed best. Inference statements about F0.1 were more accurate than those for SSB or TAC. V. Restrepo: International Commission for the Conservation of Atlantic Tunas, Corazón de María 8-Sixth, 28002 Madrid, Spain [tel. +34 91 4165600, fax +34 91 4152612, e-mail: victor.restrepo@iccat.es]. K. Patrterson and R. Cook: FRS Marine Laboratory, Aberdeen, UK. C. Darby, L. Kell and C. O´Brien: CEFAS Laboratory, Lowestoft, UK. S. Gavaris: DFO Biological Station, St. Andrews, Canada. P. Lewy: DIFRES, Charlottenlund, Denmark. B. Mesnil: IFREMER, Nantes, France. A. Punt: CSIRO Marine Research, Hobart, Australia. D. Skagen: IMR, Bergen, Norway. G. Stefánsson: MRI and UI, Reykjavik, Iceland. 1 INTRODUCTION Before evaluating the performance of any uncertainty estimation procedure in real situations, it is a good idea to evaluate it in controlled experiments, as are afforded by Monte Carlo simulation studies. In this study we are interested in evaluating the accuracy of various estimation procedures given that the assessment method in which they are embedded make the same structural assumptions made in generating the simulated data. That is, we try to avoid confounding of these two potential sources of inaccuracy: the mismatch between assumed and real processes, and the intrinsic differences in uncertainty estimation procedures. Naturally, assumptions about the way the real world operates are a very important part of stock assessments and cannot be ignored in real applications, but our more narrowly-focused study should be a first step in this type of performance evaluation. One way of categorizing age-structured assessment methods in vogue today is as either admitting or ignoring measurement errors in catch-at-age observations. It is well known that the analysis of age-structured information is over-parameterized and both of these types of approaches aim to reduce the number of parameters to be estimated by making particular assumptions. Methods that do not admit catch-at-age errors are sometimes called “VPA-based”. By not admitting catch errors, they greatly reduce the number of parameters that require estimation. Separable methods allow for the statistical modeling of the admitted error in the catch-at-age observations, thus increasing the number of data being “fitted” by the model, but reduce the number of estimated parameters by assuming that fishing mortality can be split into two-way age-specific and yearspecific components. The analyses conducted for this study are done with “VPA-based” methods only. Although we did not examine separable-type models, testing based on these could be conducted in a similar fashion. METHODS Monte Carlo Test Procedure Our study examines the quality of inference statements judged with respect to accuracy. A confidence statement is said to be accurate if the confidence point achieves the desired probability coverage, Pr (η ≤ ηˆ[α ]) ≈ α . To compare the accuracy of alternative methods of making inference statements, a Monte Carlo experiment with 100 trials was conducted. The specification of the simulation structure for this experiment is described below. Each of the data sets was analyzed using an assessment procedure that made the same assumptions made in generating the data. The “Delta Method”, parametric bootstrap and non-parametric bootstrap (NPB), and a Bayesian approach were used to quantify coverage and assess the accuracy of confidence limits of estimated interest parameters. The results from the Monte Carlo experiment can be used to quantify coverage and assess the accuracy of confidence limits. For example, under the frequentist paradigm, using a correct model we would expect 10 of the 100 cumulative frequency distributions to contain the ‘true’ SSB between the 20th and 30th percentile confidence limits and similarly for each of the ten deciles from 0% to 100% (Figure 1). These same results can be plotted on a cumulative basis (Figure 1), in which case one would expect a straight line. Using an incorrect model for inference statements would result in patterns that would deviate from the straight lines shown in Figure 1. Differences in accuracy could be attributed to either 2 differences in the location of the point estimate with respect to the true value, or differences in spread, or both. Figure 2 depicts eight possible diagnostic patterns, labelled A to H. Note that in practice it may be difficult to distinguish the cause of patterns C, E or G based on these plots alone, because they all look similar. Dynamics of simulated population The dynamics for the stock have fixed growth, maturity and natural mortality: Length at age: l a = 100(1 − e −0.15 a ) 3 Weight at age: wa = 0.00001 l a (for the purpose of computing yield, mid-year weight computations are made, i.e. using [a+0.5]; for computing SSB, beginning of the year [a] is used) M=0.2 for all ages, 1 to 15 (there is no plus group) Proportion mature: 0, a=1 to 4, 0.3, a=5 0.5, a=6 0.7, a=7 0.9, a=8 0.95, a=9 1, a=10 and older. The only stochastic process was in recruitment, which was governed by a Beverton-Holt relationship with a steepness of 0.7 having lognormal errors (CV=0.6) and following an AR(1) process with ρ=0.5. The deterministic relationship was given by Rt +1 = SSBt , 0.67945 + 5.6621× 10 −5 SSBt which results in a virgin biomass of 100,000, and a maximum recruitment of 15,769. A single fleet with fixed selection pattern over time, Sa = 0.05, 0.1, 0.3, 0.7, 0.9, for a=1 to 5 and Sa=1.0 for older ages, exploits the stock. Catchability was held constant and there were no stochastic components to the fleet dynamics. With the above parameters, the following reference points are estimated: Fmax 0.335 F0.1 0.193 FMSY 0.166 Fcrash 0.473. Initially, the operating model was run until the population reached equilibrium at ½FMSY. Then, over a 25-year period, the stock experienced a "2-way trip" to 2FMSY and back to ½FMSY. The purpose of this was to allow substantial contrast in the data. 3 Observations used for model fitting For all 25 years, surveys (CPUE) indexing start of the year abundance for ages 1-15 were generated. These had lognormal errors with age-specific CVs ranging between 0.5 and 0.75 (higher CVs for young and old ages, lower CVs for intermediate ages). No survey data were produced for year 26. For testing of the VPA-based models, the true catch at age (ages 1-15, without a plus group) was used. Performance Statistics Although the individual assessment realizations generated a large number of outputs, we limited the analysis of results to the following three interest statistics: - SSB at start of year 26 (true = 31,801), F0.1 (calculated using the estimated selectivity for year 25, true = 0.193), and TAC in year 26 corresponding to the estimated F0.1 (true = 9,334). The rationale for selecting these three was the following: SSB is one of the population variables that is monitored in most assessments, as it indexes the reproductive potential of the stock at any one time. F0.1 is a biological reference point that is used as a target by various fishery management organizations. Lastly, the TAC corresponding to the target fishing mortality is the quantity that managers are ultimately interested in. Computation of the TAC in year 26 required a projection of recruits, which was fixed at zero for simplicity. Inference Methods We examined frequentist and Bayesian inference methods. Frequentist confidence distributions of interest parameters are derived from sampling distributions of their estimators (Efron 1998, Schweder and Hjort 1999) and represent the probability that the true value of the interest parameter (assumed to be a single valued constant) would be contained within the specified limits under repeated sampling. Bayesian posterior distributions of interest parameters are derived from a synthesis of the likelihood for the observed data and specified prior distributions for model parameters (Punt and Hilborn 1997) and represent the probability distribution of the interest parameter (assumed to be a random variable). Though the frequentist confidence distribution and the Bayesian posterior distribution have different interpretations, both are used in a similar way to support fisheries management decisions. For the purposes of evaluating accuracy, analysts provided the percentile in which the true value of the statistic (SSB, F0.1 or TAC) fell in each of the 100 trials. Various techniques have been devised to obtain inferences under the frequentist or Bayesian paradigms. The frequentist methods examined included Delta and bootstrap variants while Bayesian methods were based on numerical integration using either Sampling Importance Resampling (Rubin 1987) or Markov Chain Monte Carlo algorithms (Gilks et al. 1996). The Delta method is a technique for deriving approximate estimates of variance for complex model parameters and/or interest statistics. These estimates of variance may then be coupled 4 with some assumption about the sampling distribution of the model parameter or interest statistic to construct confidence distributions. The bootstrap is a data based simulation technique that can be used to obtain confidence distributions of interest parameters. The idea is to substitute a simple data based estimate for the sampling distribution of a parameter. Parametric variants of the bootstrap assume a parametric form of the distribution but obtain estimates of the defining parameters from observed data. Nonparametric variants of the bootstrap use the observed data, or residuals about the model fit, directly to define the distribution completely. Sampling Importance Resampling and Markov Chain Monte Carlo are techniques for integrating over the posterior distribution of model parameters given the observed data. Sampling Importance Resampling uses an importance function of model parameters to obtain importance ratios that can be used as weights in resampling. In Markov Chain Monte Carlo, samples are drawn from required distributions, constructed using Markov chains for a long time, and averaged to approximate expectations. Fisheries assessment models involve relationships that are not linear in the model parameters and the interest parameters. Estimation for such models will result in confidence distributions or posterior distributions that are displaced. The frequentist notion associated with this characteristic is bias. Some adjustment for statistical estimation bias was attempted with several of the frequentist methods, and in those cases the results are presented separately. Bayesian analogues for adjustment of a displacement are not available. VPA-type Assessment Models Although VPA based methods succeed in reducing the number of parameters to be estimated by assuming that the error in the catch at age is negligible, experience has shown that further assumptions are often required to obtain reliable estimates of stock status. Two variants of VPA type methods are common. One imposes additional conditioning by constraining the oldest age fishing mortality to be equal to some function of fishing mortality for younger ages in the same year. The other imposes additional conditioning by constraining the index catchabilities to be equal over specified older ages. As indicated earlier, the intention was to apply assessment methods which made correct structural assumptions. For this simulation study, assessment methods were tailored to make the correct assumption that expected fishing mortality for age 15 was equal to fishing mortality over younger ages, i.e. ages 6 - 14, in the same year. All index catchabilities for ages 1 - 15 were estimated independently without constraints. Additional structural assumptions required for the assessment model were made to be consistent with the operational model. The annual natural mortality rate, M, is assumed constant and equal to 0.2 over all ages and years. The errors in the abundance indices, Ia,y , are assumed independent and identically distributed with age specific variance across all years, σ a2 , after taking natural logarithms of the values. The methods were tasked with estimating age-specific CVs (i.e. make the same structural assumption made during data generation, but were not given the true values to use as input). The F-constrained VPA assessments were carried out using either the ADAPT adaptive framework (Gavaris 1988) with various software implementations or the XSA extended survivors iterative algorithm (Shepherd 1999) as implemented in the Lowestoft assessment suite (Darby and Flatman 1994). It should be noted that there are differences in the implementation of various algorithms by different scientists (for instance in stopping criteria) which can lead to differences in the results that could complicate interpretation of results. We carried out simple 5 comparisons of the individual iteration results obtained with the various ADAPT and XSA implementations and found that, while differences existed, they were not substantial. ADAPT specification Model parameters: θ a , 26 = ln N a,26 = ln population abundance , ages a = 2 - 15 at the beginning of year 26, κ a = ln q a = ln calibration constant , for ages a = 1 - 15. A weighted least squares solution for the parameters was obtained by minimizing the sum of squared differences between the natural logarithm of observed abundance indices and the natural logarithm of population abundance adjusted for catchability by the calibration constants: Ψ( a, y ) ( ( ( ))) 1 θˆ, κˆ = å 2 ln I a , y − κˆ a + ln N a , y θˆ a , y σˆ a 2 or equivalently, for the Bayesian methods, using the likelihood function: L( D | θ , κ ) ∝ ∏ σˆ1a exp(− 2 (σˆ1 ) 2 [ln I a , y − ln(q a N a , y )]2 ) . a a, y At the beginning of year 26, the population abundance was obtained directly from the parameter θˆ estimates, N a , 26 = e a , 26 . For all other times, the population abundance was computed using the VPA algorithm, which incorporates the common exponential decay model N a +1, y +1 = N a , y e ( − Fa , y + M ) . The annual fishing mortality rate, Fa , y , was obtained by solving the catch equation, Ca , y ( ( − F F N 1 − e a,y = a, y a, y (Fa, y + M ) +M ) ). The fishing mortality rate for age 15 was assumed equal to the average for ages 6 - 14 during that time interval 14 F15, y = å Fa , y 9 . a =6 The interest statistics, SSB26, F0.1 and TAC26 were subsequently calculated. SSB26 = å ma wa N a , 26 a where m is proportion mature at age and w is weight at age. F0.1 is the fishing mortality corresponding to one tenth the slope at the origin of the yield per recruit as a function of fishing mortality. Yield per recruit is 6 ( ) YPR = å wa +0.5 N a FPRa 1 − e −( FPRa + M ) FPRa + M a where N1 is set to 1 and the partial recruitment to the fishery is derived from the estimated fishing mortality at age for year 25 1 15 PRa = F25 å Fa 10 a =6 The TAC corresponding to F0.1 is ( ) TAC 26 = å wa +0.5 N a , 26 F0.1 PRa 1 − e − ( F0.1PRa + M ) F0.1 PRa + M a Delta- ADAPT This implementation of the Delta method derives approximate estimates of variance for the interest parameters and couples that with an assumption about the parametric form of their sampling distribution to derive confidence distributions. The covariance matrix of the model parameter set, {θ,κ} were estimated using the common linear approximation (Kennedy and Gentle 1980 p.476) [ ( ) ( )] ( ) cov θˆ, κˆ = σˆ 2 J T θˆ, κˆ J θˆ, κˆ −1 ( ) where σ 2 is the mean square residual and J θ ,κ is the Jacobian matrix of the vector of ( ) residuals. The variance of an interest parameter, ηˆ = g θˆ, κˆ where g is the transformation function, were estimated using the Delta approximation (Ratkowsky 1983): [ ( )] Var (ηˆ ) = tr GG T cov θˆ, κˆ where G is the vector of first derivatives of g with respect to parameters. Assuming a Gaussian distribution, confidence distributions of the interest parameters were approximated as ( ) N ~ ηˆ , Var (ηˆ ) . As indicated above, estimation bias is expected. A bias adjusted Delta confidence distribution was constructed by shifting the Delta confidence distribution to account for the magnitude of the estimated bias and ignoring any increase in variance associated with the bias estimate. The bias of the model parameters was estimated using Box’s (1971) approximation, which assumes that the errors are normally distributed: −1 −1 ù − σˆ 2 æ ö æ ö éæ ö T ˆ T ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ Bias θ , κ = ç å J i′ θ , κ J i′ θ , κ ÷ ç å J i θ , κ ÷tr êç å J i′ θ , κ J i′ θ , κ ÷ H i θˆ, κˆ ú 2 è i′ ø è i ø ëêè i′ ø úû ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) where J i θˆ, κˆ = J i′ θˆ, κˆ are vectors of the first derivatives for each residual and H i θˆ, κˆ are the Hessian matrices of second derivatives for each residual. The bias of interest parameters is then derived using the method described in Ratkowsky (1983): ( ) [ ( )] Bias (ηˆ ) = G T Bias θˆ, κˆ + tr W cov θˆ, κˆ 2 7 where W is the matrix of second derivatives of g with respect to parameters. Again assuming a Gaussian distribution, confidence distributions of the interest parameters were approximated as N ~ ηˆ − Bias(ηˆ ), Var (ηˆ ) . ( ) PB-ADAPT This method is explained in Restrepo et al. (1992) and Mesnil (1995). The method requires externally provided CVs in order to generate replicates of the dataset. Here, the same CVs as those used in generating the artificial data sets were used. NPB-ADAPT Efron (1979) introduced the non-parametric bootstrap percentile method as a simple automatic data based technique to calculate confidence distributions for complex statistics. Let η represent the interest parameter, (with estimate η̂ corresponding to the least-squares solution) . Its cumulative frequency distribution is derived from the bootstrap replicate estimates η̂ b . The replicates are computed by applying the estimation formulae to bootstrap samples. Nonparametric bootstrap replications are obtained when bootstrap samples are generated by random sampling with replacement from the observed data. Here we generated modelconditioned bootstrap replications, which are obtained by sampling with replacement from all the observed abundance index residuals (nonparametric) and adding these to the model predicted values for the abundance indices. As the residuals were weighted by the inverse standard error at each age in the minimization objective function, the resampled weighted residuals used to construct the bootstrap samples were multiplied by the standard error before being added to the predicted index values. For the percentile method, the confidence distribution of the interest parameter is defined as the proportion of bootstrap replicates, η̂ b , less than or equal to that b ˆ ( x ) = Prob{ηˆ ≤ x} = # ηˆ ≤ x . value, Ω B { } The bias-corrected percentile method of Efron (1982), improves on the percentile method by adjusting for differences between the median of the bootstrap percentile density function and the estimate obtained with the original data sample. The confidence distribution of the interest parameter is obtained with the bias-corrected percentile method by constructing the paired b values (ηˆ BC , α ) . The α are the respective probability levels equal to 1 B , 2 B , 3 B , L , B − 1 B where B is the total number of bootstrap replicates. For each α, calculate the bias adjusted b ˆ −1 (Φ(2 z + z )) . Here, Φ is the cumulative distribution function of a standard =Ω quantity, ηˆ BC 0 α −1 ˆ (η̂ ) . The term z0 achieves the bias adjustment. normal variate, z = Φ (α ) and z = Φ −1 Ω α 0 ( ) ˆ −1 ( ) or Φ −1 ( ) is used to represent the inverse distribution function, i.e. the The notation Ω critical value corresponding to the specified probability level. Note that computations are not carried out for α = B B because zα = Φ −1 (α = 1) is not defined. Bayes - ADAPT The priors for the ADAPT analysis are assumed to be: Parameter {θ 26,a : a = 2,3,..,15} Prior distribution U[-∞, ∞] 8 {κ a : a = 1,2..,15} U[-∞, ∞] σ ∝ σ12 2 a a XSA Specification The design of XSA was intended to implement a q-constrained VPA estimation method. The default algorithm assumes that index catchability for the oldest age is equal to that for the next oldest age. For this study, the algorithm was modified to estimate catchability for all ages independently. XSA also estimates the survivors for all year-classes by default. However, an Fconstrained VPA can be mimicked through the shrinkage option. Shrinkage to the mean F was applied to the oldest age in all years other than the terminal year. The weighting for F shrinkage was derived assuming a standard error of 0.01. Observations from the CPUE series were weighted using inverse variances obtained for each age. The variances were derived from the standard errors of the log catchability which are assumed to be log normal. In addition the standard errors were assumed to have a minimum value of 0.3. Delta-XSA Numbers at age in the last data year are assumed to have a log normal distribution i.e. Ln Nmc ~ Ln N + D (0,σ2) where N is the XSA estimate of the survivors for a cohort. D(0,σ2) represents the normal distribution with unit mean and variance σ2 as derived from the XSA model. Within XSA two estimates of standard error are identified - “internal” and “external”. To be conservative the larger of the two values at each age was adopted as the estimate of variance. Covariances are ignored. To estimate statistics of interest draws are made from the distributions of N. These Ns are then used to solve the catch equation using the given catches and natural mortalities. This ensures that fishing mortality is consistent with the observed catches. The parameters of interest, F0.1, SSB in the last year and the TAC in the year after the last data year are then derived using these simulated Ns and Fs. NPB-XSA The nonparametric bootstrap of XSA models uncertainty in the CPUE indices and hence other quantities of interest by performing a nonparametric bootstrap of the relationship between CPUE and population size. CPUE is transformed to relate the population abundance during the time at which the catch was taken to the population abundance at the beginning of the year. Time series weighting was not used so that the residuals were selected with equal probability. The bootstrap algorithm 1) 2) 3) 4) Run XSA and estimate log catchability Sample by year from the “log catchability residuals” within age and fleet with replacement Generate new CPUE indices Re-run XSA and generate statistics of interest 9 5) Repeat steps 2)-5) RESULTS AND CONCLUSIONS Figures 3-12 show the results obtained for the different procedures. Referring to Figure 2, the following patterns were observed in the results: Patterns in the results.A “+” denotes good performance; a “B” indicates that the parameter is overestimated - refer to Figure 2 for diagnostic patterns Method SSB F0.1 TAC Delta-ADAPT B + B Delta-ADAPT-BC + + + NPB-ADAPT B + B NPB-ADAPT-BC + + + PB-ADAPT B + B Bayes-ADAPT B + B Delta-XSA B + B NPB-XSA B + B The following table provides an indication of the accuracy with which the performance statistics were estimated. Each entry is the number of simulations out of 100 for which the true value was smaller than the estimated 50th percentile (i.e. entries equal to 50 indicate accurate results; higher values indicate over-estimation). Method Delta-ADAPT Delta-ADAPT-BC NPB-ADAPT NPB-ADAPT-BC PB-ADAPT Bayes-ADAPT Delta-XSA NPB-XSA SSB 56 50 71 51 68 57 75 59 F0.1 49 52 45 51 52 54 56 59 TAC 61 53 71 54 74 66 77 63 For these VPA-based methods, inference statements about F0.1 tended to be more accurate than those for SSB or TAC, which tended to be overestimated (except for bias corrected NPBADAPT and bias corrected Delta-ADAPT applications). The Bias Corrected Delta-ADAPT and Bias Corrected NPB-ADAPT applications performed better than the others. Bias correction, when it can be applied, tends to help improve the accuracy of inference statements. 10 It is evident that accurate inference statements about some parameters can be obtained with the approaches examined here. Methods with bias correction appeared to perform well, but a troubling result is that the other methods overestimated SSB and TAC. These methods are in quite widespread use for advisory purposes. The results tabulated above imply that, for these data, choosing a TAC corresponding to the median, for example, could result in fishing mortality levels that exceed the target in about 7 out of 10 cases (instead of the 50% intended by choosing the median). We do not know the degree to which this result can be generalized because we only examined one simulated population with a given trajectory and a given set of observation errors. It may be informative to repeat this exercise with different population trajectories to see what patterns emerge. Nevertheless, these simulations suggest that the magnitude of the bias can be of concern. As mentioned earlier, we conducted these tests ensuring that the structural models in the assessment procedures were specified in the same way as the simulated population was generated. We believe that the issue of model mis-specification needs to be confronted in studies of the type we have conducted here. However, the model components that can be mis-specified are numerous, making this an onerous exercise. ACKNOWLEDGMENTS The study has been carried out with financial support from the Commission of the European Communities, Agriculture and Fisheries (FAIR) specific RTD programme, FAIR-PL98-4231, “Evaluation and Comparison of Methods for Estimating Uncertainty in Harvesting Fish from Natural Populations”. It does not necessarily reflect its views and in no way anticipates the Commission´s future policy in this area. REFERENCES Box, M.J. (1971) Bias in nonlinear estimation. Journal of the Royal Statistical Society, Series B 33, 171-201. Darby, C.D. and Flatman, S. (1994) Virtual Population Analysis: version 3.1 (Windows/Dos) user guide. Information Technology Series, MAFF Directorate of Fisheries Research, Lowestoft No. 1, 85pp. Efron, B. (1982) The jacknife, the bootstrap and other resampling plans (CBMS Monograph No. 38). Society for Industrial and Applied Mathematics, Philadelphia. Efron, B. (1998). R.A. Fisher in the 21st. century. Statistical Science 13(2), 95-122. Gavaris, S. (1988) An adaptive framework for the estimation of population size. Canadian Atlantic Fisheries Scientific Advisory Committee Research Document No. 88/29, 12 pp. Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. (1996). Markov Chaine Monte Carlo in Practise. Chapmann & Hall, London. Kennedy, W.J., Jr. and J.E. Gentle. 1980. Statistical computing. Marcel Dekker. New York. 591 pp. 11 Lewy, P. and Nielsen, A. (2000). A Bayesian approach to fish stock assessment of North Sea plaice using Markov Chain Monte Carlo and graphical models. Working Paper for the EU concerted action FAIR PL98-4231 project meeting in Madrid, February 2000. Mesnil, B. (1995) Analyse de risque en évaluation de stocks et gestion de pêcheries. Utilisation des méthodes de Monte Carlo. In: Les recherches françaises en évaluation quantitative et modélisation des ressources et des systèmes halieutiques (Actes du 1er Forum Halieumétrique, ENSA, Rennes 29 juin - 1er juillet 1993) D. Gascuel, D., J.L. Durand and A. Fonteneau, eds. ORSTOM, Colloques et séminaires, Rennes. 105-125. [In French.] Punt, A.E. and Hilborn, R. (1997) Fisheries stock assessment and decision analysis: A review of the Bayesian approach. Reviews in Fish Biology and Fisheries 7, 35-63. Ratkowsky, D.A. (1983) Nonlinear regression modelling. Marcel Dekker, New York. Restrepo, V.R., Hoenig, J.M., Powers, J.E., Baird, J.W. and Turner, S.C. (1992) A simple simulation approach to risk and cost analysis, with applications to swordfish and cod fisheries. Fishery Bulletin, US. 90, 736-748. Rubin, D. B. (1987). Comment: The calculation of posterior distributions by data augmentation. J. Am. Statist. Assoc. 82: 543-546. Schweder, T. and N.L. Hjort (1999). Frequentist analogues of priors and posteriors. (Statistical Research Report No. 8), University of Oslo, Oslo. Shepherd, J. G. (1999) Extended survivors analysis: An improved method for the analysis of catch-at-age data and abundance indices. ICES Journal of Marine Science 56, 584-591. 12 Observed frequency Observed cumulative dist. 100 75 50 25 0 0.00 0.25 0.50 0.75 50 40 30 20 10 0 1 1.00 2 3 4 5 6 7 8 9 10 Prob. Interval (10% bins) True probability Figure 1. Example characterization of the results for a model with perfect accuracy. The left panel shows the cumulative number of simulation realizations resulting in a given true probability statement; the right panel shows the same information categorized in 10 intervals of equal size. Observed frequency True µ higher than µ estimate True σ equal to σ estimate Observed cumulative dist. 100 75 50 25 A 0 0.00 0.25 0.50 0.75 50 45 40 35 30 25 20 15 10 5 0 1 1.00 2 3 Observed frequency Observed cumulative dist. 100 True µ smaller than µ estimate True σ equal to σ estimate 75 50 25 B 0 0.00 0.25 0.50 0.75 1 1.00 Observed frequency Observed cumulative dist. 75 50 25 C 0 0.25 0.50 2 3 0.75 1 1.00 Observed frequency Observed cumulative dist. 75 50 25 D 0.25 0.50 Fractile 8 9 10 4 5 6 7 8 9 10 2 3 4 5 6 7 8 9 10 8 9 10 Prob. Interval (10% bins) 100 0 0.00 7 50 45 40 35 30 25 20 15 10 5 0 Fractile True µ equal to µ estimate True σ smaller than σ estimate 6 Prob. Interval (10% bins) 100 0.00 5 50 45 40 35 30 25 20 15 10 5 0 Fractile True µ equal to µ estimate True σ higher than σ estimate 4 Prob. Interval (10% bins) Fractile 0.75 1.00 50 45 40 35 30 25 20 15 10 5 0 1 2 3 4 5 6 7 Prob. Interval (10% bins) Figure 2 (a). Example simulation results for methods that do not perform well. The text on the left of each pair of graphs explains the cause of inaccuracy, in terms of the location (µ) and dispersion (σ) parameters. 13 Observed frequency True µ higher than µ estimate True σ higher than σ estimate Observed cumulative dist. 100 75 50 25 E 0 0.00 0.25 0.50 0.75 50 45 40 35 30 25 20 15 10 5 0 1 1.00 2 3 Observed frequency Observed cumulative dist. 100 True µ higher than µ estimate True σ smaller than σ estimate 75 50 25 F 0 0.00 0.25 0.50 0.75 1 1.00 Observed frequency Observed cumulative dist. 75 50 25 G 0.25 0.50 2 3 0.75 1 1.00 Observed frequency Observed cumulative dist. 75 50 25 H 0.25 0.50 Fractile 8 9 10 4 5 6 7 8 9 10 2 3 4 5 6 7 8 9 10 8 9 10 Prob. Interval (10% bins) 100 0 0.00 7 50 45 40 35 30 25 20 15 10 5 0 Fractile True µ smaller than µ estimate True σ smaller than σ estimate 6 Prob. Interval (10% bins) 100 0 0.00 5 50 45 40 35 30 25 20 15 10 5 0 Fractile True µ smaller than µ estimate True σ higher than σ estimate 4 Prob. Interval (10% bins) Fractile 0.75 1.00 50 45 40 35 30 25 20 15 10 5 0 1 2 3 4 5 6 7 Prob. Interval (10% bins) Figure 2 (b). Example simulation results for methods that do not perform well. The text on the left of each pair of graphs explains the cause of inaccuracy , in terms of the location (µ) and dispersion (σ) parameters. 14 Delta-ADAPT 100 50 40 Count SSB Count 75 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 100 Count Count 50 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.5 0.6 Prob. interval 100 50 40 Count 75 Count 0.7 40 25 TAC 0.6 50 75 F0.1 0.5 Prob. interval 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.5 0.6 Prob. interval Figure 3. Simulation results for Delta-ADAPT Delta-ADAPT Bias Corrected 100 50 40 Count SSB Count 75 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 100 40 Count Count 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.6 50 40 Count 75 Count 0.5 Prob. interval 100 TAC 0.6 50 75 F0.1 0.5 Prob. interval 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 1.0 0.1 0.2 Fractile 0.3 0.4 0.5 0.6 Prob. interval Figure 4. Simulation results for Delta-ADAPT, bias-corrected. 15 NPB-ADAPT 100 50 40 Count SSB Count 75 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 100 Count Count 50 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.5 0.6 Prob. interval 100 50 40 Count 75 Count 0.7 40 25 TAC 0.6 50 75 F0.1 0.5 Prob. interval 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.5 0.6 Prob. interval Figure 5. Simulation results for NPB-ADAPT. NPB-ADAPT Bias Corrected 100 50 40 Count SSB Count 75 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 100 40 Count Count 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.6 50 40 Count 75 Count 0.5 Prob. interval 100 TAC 0.6 50 75 F0.1 0.5 Prob. interval 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 1.0 0.1 0.2 Fractile 0.3 0.4 0.5 0.6 Prob. interval Figure 6. Simulation results for NPB-ADAPT, bias-corrected. 16 PB ADAPT 100 50 40 Count SSB Count 75 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 100 Count Count 50 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.5 0.6 Prob. interval 100 50 40 Count 75 Count 0.7 40 25 TAC 0.6 50 75 F0.1 0.5 Prob. interval 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.5 0.6 Prob. interval Figure 7. Simulation results for PB-ADAPT Bayes ADAPT 100 50 40 Count SSB Count 75 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 100 40 Count Count 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.6 50 40 Count 75 Count 0.5 Prob. interval 100 TAC 0.6 50 75 F0.1 0.5 Prob. interval 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 Fractile 1.0 0.1 0.2 0.3 0.4 0.5 0.6 Prob. interval Figure 8. Simulation results for Bayes-ADAPT. 17 Delta XSA 100 50 40 Count SSB Count 75 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 100 Count Count 50 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 0.7 0.8 0.9 1 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.5 0.6 Prob. interval 100 50 40 Count 75 Count 0.7 40 25 TAC 0.6 50 75 F0.1 0.5 Prob. interval 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.5 0.6 Prob. interval Figure 9. Simulation results for Delta-XSA. NPB XSA 100 50 40 Count SSB Count 75 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 100 40 Count Count 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 0.1 1.0 0.2 0.3 0.4 Fractile 0.6 50 40 Count 75 Count 0.5 Prob. interval 100 TAC 0.6 50 75 F0.1 0.5 Prob. interval 50 25 30 20 10 0 0 0.0 0.3 0.5 0.8 Fractile 1.0 0.1 0.2 0.3 0.4 0.5 0.6 Prob. interval Figure 10. Simulation results for NPB-XSA. 18 19