Supplemental Materials

An Exemplar-Familiarity Model Predicts Short-Term and Long-Term Probe Recognition Across Diverse Forms of Memory Search

By R. M. Nosofsky et al., 2014, Journal of Experimental Psychology: Learning, Memory, and Cognition

http://dx.doi.org/10.1037/xlm0000015

Hierarchical Bayesian Implementation of the EBRW

In order to verify the parameter estimates for the exemplar-based random walk (EBRW) model obtained from least-squares fitting, as presented in the main body of the paper, we implemented the model as a hierarchical Bayesian model (for a general reference, see Kruschke, 2011; for examples of hierarchical Bayesian cognitive modeling, see Lee & Wagenmakers, forthcoming). In this model, we obtain not point estimates for each parameter but rather a complete posterior distribution over parameter values. This is done both at the group level, for each condition, and at the individual level, for each participant. That is, we assume that each participant has a value for each parameter of the EBRW. These values, in turn, are drawn from a group distribution on each parameter that has a characteristic mean and variance (thus, we assume the group distributions are unimodal). The group distributions are assumed to be independent of one another. This situation is depicted schematically in Figure S1.

In order to do Bayesian inference on the parameters of the model, we need two things: a likelihood function and a prior. The likelihood gives the joint probability of an observed response and response time (RT) on a given trial, conditioned on the parameters. The prior specifies the initial "naive" probabilities assigned to different values of each parameter. We address each of these components in turn.

Likelihood

In the version of the EBRW presented in the main body of the paper, the model predicts discrete RTs, as each step of the random walk is assumed to take a constant amount of time. Although this approximation of continuous time poses no problem when fitting mean RT, it is a difficulty when performing Bayesian inference. The underlying reason is that we need to compute the likelihood of the observed data (response and RT) on each trial, rather than in aggregate, and an observed RT that falls between two steps of the random walk would be assigned zero likelihood. Intuitively, this is a problem of translating between discrete and continuous time.

Many solutions to the problem are possible. For example, one might interpolate between the peaks of the discrete RT distribution and renormalize it, such that RTs that fall between two steps of the random walk have nonzero likelihood. We, however, choose to use a different continuous approximation to the random walk, namely, the Wiener diffusion process. The Wiener diffusion process was introduced to psychological modeling of RT by Ratcliff (1978), who explicated the model as the continuous limit of a discrete-time, discrete-space random walk exactly like the EBRW. The basic idea is to take the mean and variance of each step of the random walk and approximate the step distribution with a Gaussian possessing the same mean and variance.

In the EBRW, the probability p of taking a step toward the "Old" boundary is given by Equation 3 in the main text. Let X be a Bernoulli random variable representing the steps of the EBRW: X takes the value +1 with probability p and −1 with probability (1 − p). The mean of X is then

\[ E[X] = p - (1 - p) = 2p - 1, \tag{S1} \]

and its variance is

\[ \mathrm{Var}[X] = E[X^2] - E[X]^2 = p + (1 - p) - (2p - 1)^2 = 4p(1 - p). \tag{S2} \]
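As a quick numerical check of Equations S1 and S2, the step statistics can be recovered by simulating the Bernoulli steps directly. The sketch below is a minimal Python illustration; the step probability, sample size, and seed are arbitrary choices, not values estimated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
p = 0.6                         # illustrative step probability toward "Old"

# Draw one million +1/-1 steps with P(+1) = p.
steps = np.where(rng.random(1_000_000) < p, 1, -1)

print(steps.mean(), 2 * p - 1)       # ~0.20, matching Equation S1
print(steps.var(), 4 * p * (1 - p))  # ~0.96, matching Equation S2
```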
Imagine that, instead of taking a single step (+1 or −1) in each time interval [t, t + dt], the amount by which we move is an average of n equally spaced steps during that interval. The average of n steps is also a random variable and, by the central limit theorem, as n approaches infinity, this variable will be distributed as a Gaussian with its mean and variance given by Equations S1 and S2. Thus, we can approximate a discrete-time, discrete-space random walk with an infinite number of infinitesimal Gaussian increments. This is the Wiener diffusion process.

Let μ be the mean step size given by Equation S1 and σ² the variance in Equation S2. Further, assume that the "New" boundary lies at zero, the "Old" boundary lies at a value A, and the walk begins at a point z = Ac between 0 and A (i.e., 0 < c < 1). Then, the likelihood of eventually reaching the "Old" boundary is

\[ p(\mathrm{Old}) = \frac{\exp\left(\frac{2\mu z}{\sigma^2}\right) - 1}{\exp\left(\frac{2\mu z}{\sigma^2}\right) - \exp\left(-\frac{2\mu(A - z)}{\sigma^2}\right)}. \tag{S3} \]

This expression is given in a slightly different form in Ratcliff (1978). Alone, Equation S3 gives us only the likelihood of the final response, not the RT. Assuming, as in the main text, that there is a certain residual non-decision time Tr, the likelihood that a response will occur at a time t after this residual time, conditioned on the final response (old or new), is given by

\[ f(t \mid \mathrm{Old}) = \frac{\pi\sigma^2}{A^2} \exp\left(-\frac{(1 - c)A\mu}{\sigma^2}\right) \sum_{k=1}^{\infty} k \sin\bigl(k\pi(1 - c)\bigr) \exp\left[-\frac{t}{2}\left(\frac{\mu^2}{\sigma^2} + \frac{\pi^2 k^2 \sigma^2}{A^2}\right)\right] \tag{S4a} \]

\[ f(t \mid \mathrm{New}) = \frac{\pi\sigma^2}{A^2} \exp\left(-\frac{cA\mu}{\sigma^2}\right) \sum_{k=1}^{\infty} k \sin(k\pi c) \exp\left[-\frac{t}{2}\left(\frac{\mu^2}{\sigma^2} + \frac{\pi^2 k^2 \sigma^2}{A^2}\right)\right] \tag{S4b} \]

Notice that Equations S4a and S4b contain infinite sums; these must be evaluated numerically. We use a package developed specifically for this purpose by Wabersich and Vandekerckhove (2013). In this way, we have moved from the original discrete-time, discrete-space EBRW to a continuous approximation that allows us to compute the joint likelihood of a response and its RT, given a set of model parameters.
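For concreteness, the sketch below evaluates Equations S3, S4a, and S4b in Python, truncating the infinite sums after a fixed number of terms. It is an illustration only: the truncation level K and the example parameter values are arbitrary choices, and the actual analyses used the JAGS distribution provided by Wabersich and Vandekerckhove (2013) rather than this code.

```python
import numpy as np

def p_old(mu, sigma, A, c):
    """Probability of absorption at the 'Old' boundary (Equation S3)."""
    z = A * c
    num = np.exp(2 * mu * z / sigma**2) - 1
    den = np.exp(2 * mu * z / sigma**2) - np.exp(-2 * mu * (A - z) / sigma**2)
    return num / den

def rt_density(t, mu, sigma, A, c, boundary="Old", K=200):
    """Decision-time density at the 'Old' or 'New' boundary (Equations S4a/S4b).

    t is the decision time (observed RT minus the residual time Tr); the
    infinite sum is truncated after K terms (K = 200 is an arbitrary choice).
    """
    w = 1 - c if boundary == "Old" else c  # starting-point term for this boundary
    k = np.arange(1, K + 1)
    prefactor = (np.pi * sigma**2 / A**2) * np.exp(-w * A * mu / sigma**2)
    series = k * np.sin(k * np.pi * w) * np.exp(
        -(t / 2) * (mu**2 / sigma**2 + np.pi**2 * k**2 * sigma**2 / A**2))
    return prefactor * series.sum()

# Example: a step probability of p = .6 gives drift and variance via S1 and S2.
p = 0.6
mu, sigma = 2 * p - 1, np.sqrt(4 * p * (1 - p))
print(p_old(mu, sigma, A=10.0, c=0.5))             # probability of an "Old" response
print(rt_density(50.0, mu, sigma, A=10.0, c=0.5))  # density at decision time t = 50
```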
Prior

We assume that each participant has his or her own value for each of the nine model parameters. These are all drawn from a group-level distribution that is independent of the other groups and parameterized by a mean and precision. For ease of implementation, we first transform each model parameter to lie on the real line, such that the group-level distributions are Gaussian. We then place a very broad prior on the mean and precision of these group-level Gaussian distributions, such that the posterior estimates are driven entirely by the data. The model parameters and their associated ranges, transformations, and priors are given in Table S1. One exception to this procedure is the residual-time parameter Tr, for which we do not assume any group-level distribution. Instead, it is estimated independently for each participant, with a prior that is uniform over the range of 0 to the minimum observed RT for that participant. The group mean residual time is, then, just the average of the Tr estimated for each participant in the group.

Posterior Inference

This hierarchical Bayesian version of the EBRW was implemented in JAGS (Plummer, 2011), which estimates the joint posterior distribution by drawing a large number of samples from the posterior using Gibbs sampling. We drew 5,000 samples from the posterior after 2,000 steps of burn-in.

Results

Posterior Distributions on Group Parameters

Because our primary interest is in the differences in model parameters between conditions, our analysis focuses on the posterior distributions for the group-level means on each parameter. We report the posterior distributions of the untransformed parameters, such that they lie in the same range as the parameter estimates reported in the main body of the paper (i.e., we examine the group-mean similarity parameter, μs, rather than the transformed version, logit(s)). The distributions are summarized by their mean and 95% highest density interval (HDI) in Table S2. We can determine whether the group means are credibly different between conditions by examining the 95% HDI of their difference: If the 95% HDI of the difference excludes zero, we say the two group means are credibly different (this is the logic emphasized by Kruschke, 2011). These qualitative differences in group means (whether they are greater than, less than, or equal to each other) are also indicated in Table S2. The pattern of parameter estimates reported in Table S2 was discussed in the main text of our article and is not repeated here.

Predictions

In addition to estimating EBRW parameters for each participant and group, we can produce predictions based on the samples drawn from the posterior. Before attempting to interpret the posterior distribution on model parameters, we must verify that the hierarchical Bayesian version of the EBRW matches the data well. To achieve this aim, we start by performing a posterior predictive check, producing predictions for mean error rate and correct RT that can be visually compared to the observed data. Each sample from the posterior represents the parameters of the EBRW that are inferred for each participant in each condition. Thus, for each trial a participant completes, we can form a prediction of the probability that the participant will judge the probe item on that trial as "old," as well as the participant's mean correct and error RT for the trial. Furthermore, for the Wiener diffusion process, the mean RT, conditional on hitting either the "Old" or "New" boundary, can be obtained by solving the Kolmogorov equations for a Gaussian diffusion process, as described by Grasman, Wagenmakers, and van der Maas (2009; analogous expressions are given by Palmer, Huk, & Shadlen, 2005). These are (writing Tr for the residual time)

\[ E[RT \mid \mathrm{Old}] = T_r + \frac{1}{\mu}\left[\frac{A}{\tanh(\mu A / \sigma^2)} - \frac{z}{\tanh(\mu z / \sigma^2)}\right] \tag{S5a} \]

\[ E[RT \mid \mathrm{New}] = T_r + \frac{1}{\mu}\left[\frac{A}{\tanh(\mu A / \sigma^2)} - \frac{A - z}{\tanh\bigl(\mu(A - z) / \sigma^2\bigr)}\right]. \tag{S5b} \]

By averaging the predicted p(Old) and mean RT over all the posterior samples, we effectively marginalize over the posterior to obtain a predicted p(Old) and mean correct RT for each trial and each participant. Finally, we average these predictions over the participants in each condition to obtain the predicted p(Old) and mean correct RT in each condition. The predicted p(Old) is converted into an error rate (1 − p(Old) for old probe items, p(Old) for new probe items). The predicted mean error rates and correct RTs are shown in Figure S2. These correspond quite closely with the observed data.
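To illustrate Equations S5a and S5b, the sketch below evaluates the conditional mean decision times for a single illustrative parameter set and checks them against a brute-force simulation of the Gaussian-increment walk. The parameter values, time step, and trial count are arbitrary choices, and the Euler discretization introduces a small bias, so the two computations should agree only approximately.

```python
import numpy as np

def mean_rt(mu, sigma, A, c, boundary="Old", t_r=0.0):
    """Conditional mean RT from Equations S5a/S5b (t_r = residual time Tr)."""
    z = A * c
    coth = lambda x: 1.0 / np.tanh(x)
    first = A * coth(mu * A / sigma**2)
    if boundary == "Old":
        second = z * coth(mu * z / sigma**2)              # Equation S5a
    else:
        second = (A - z) * coth(mu * (A - z) / sigma**2)  # Equation S5b
    return t_r + (first - second) / mu

# Brute-force check: simulate walks with Gaussian increments until absorption.
rng = np.random.default_rng(1)
mu, sigma, A, c, dt = 0.2, 1.0, 10.0, 0.4, 0.1  # illustrative values only
times, hit_old = [], []
for _ in range(5_000):
    x, t = A * c, 0.0
    while 0.0 < x < A:
        x += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    times.append(t)
    hit_old.append(x >= A)
times, hit_old = np.array(times), np.array(hit_old)

print(times[hit_old].mean(), mean_rt(mu, sigma, A, c, "Old"))   # simulated vs. S5a
print(times[~hit_old].mean(), mean_rt(mu, sigma, A, c, "New"))  # simulated vs. S5b
```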
The present method also allows us to compute the complete predicted group RT distributions rather than just the mean RTs. Figure S3 shows predicted and observed RT quantiles for correct responses as a function of condition, probe type (old/new), and set size. The observed quantiles are the result of "Vincentizing"; that is, quantiles are computed for each participant and then averaged to produce group quantile estimates (although this method is not always the best way to construct group distributions, it suffices for visualization purposes because we are not directly fitting the estimated group quantiles; Rouder & Speckman, 2004). Error bars for the observed RT quantiles represent ±1 within-subject standard error.

Predicted quantiles are computed in the same way: For each sample from the Gibbs sampler, predicted RT quantiles are computed for each participant for each trial, using the algorithm for the cumulative first-passage time distribution of a Wiener process developed by Blurton, Kesselmeier, and Gondan (2012). The predicted quantile for each trial is, then, the average over the quantiles computed at each step of the sampler. The final predicted group quantiles are again Vincentized by taking the average RT quantile over each trial from each participant. Thus, the error bars for the predicted RT quantiles represent ±1 predicted within-subject standard error.

Given how well the model predicts mean RT, it is no surprise that it is able to closely match observed median RTs (the .5 quantile; see Figure S3). Predictions for the leading and trailing edges of the RT distributions (the .1 and .9 quantiles) are not as good quantitatively in some conditions but clearly capture the overall trend. This analysis suggests that the EBRW not only provides a good account of mean accuracy and RT but also captures important variation in the complete RT distributions.
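Vincentizing itself is simple to express in code. The sketch below computes group quantile estimates from per-participant RT samples; the simulated data, quantile levels, and participant count are purely illustrative, and this sketch does not implement the Blurton, Kesselmeier, and Gondan (2012) algorithm used for the predicted quantiles.

```python
import numpy as np

def vincentize(rts_by_participant, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Group RT quantiles: compute quantiles per participant, then average."""
    per_subject = np.array([np.quantile(rts, probs) for rts in rts_by_participant])
    return per_subject.mean(axis=0)  # one group estimate per quantile level

# Illustrative data: shifted lognormal RTs (in ms) for three fake participants.
rng = np.random.default_rng(2)
data = [300 + rng.lognormal(5.5, 0.4, size=200) for _ in range(3)]
print(vincentize(data))  # Vincentized group quantile estimates
```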
References

Blurton, S. P., Kesselmeier, M., & Gondan, M. (2012). Fast and accurate calculations for cumulative first-passage time distributions in Wiener diffusion models. Journal of Mathematical Psychology, 56, 470–475.

Grasman, R. P. P. P., Wagenmakers, E.-J., & van der Maas, H. L. J. (2009). On the mean and variance of response times under the diffusion model with an application to parameter estimation. Journal of Mathematical Psychology, 53, 55–68.

Kruschke, J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS. Academic Press.

Lee, M. D., & Wagenmakers, E.-J. (forthcoming). Bayesian cognitive modeling: A practical course. Cambridge University Press.

Palmer, J., Huk, A. C., & Shadlen, M. N. (2005). The effect of stimulus strength on the speed and accuracy of a perceptual decision. Journal of Vision, 5, 376–404.

Plummer, M. (2011). JAGS: Just another Gibbs sampler. Retrieved from http://mcmc-jags.sourceforge.net/

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.

Rouder, J. N., & Speckman, P. L. (2004). An evaluation of the Vincentizing method of forming group-level response time distributions. Psychonomic Bulletin & Review, 11, 419–427.

Wabersich, D., & Vandekerckhove, J. (2013). Extending JAGS: A tutorial on adding custom distributions to JAGS (with a diffusion model example). Behavior Research Methods.

Table S1
Functions Used to Transform the Gaussian Group-Level Distributions Back Into Their Natural Range, as Well as the Priors Placed on the Mean and Precision of the Group-Level Gaussian Distributions

EBRW parameter   Range    Transformation   Prior on mean      Prior on precision
s                [0, 1]   Logit            μs ~ N(0, 1000)    τs ~ Gamma(0.001, 0.001)
α                [0, ∞)   Log              μα ~ N(0, 1000)    τα ~ Gamma(0.001, 0.001)
β                [0, ∞)   Log              μβ ~ N(0, 1000)    τβ ~ Gamma(0.001, 0.001)
B                [0, ∞)   Log              μB ~ N(0, 1000)    τB ~ Gamma(0.001, 0.001)
u                [0, ∞)   Log              μu ~ N(0, 1000)    τu ~ Gamma(0.001, 0.001)
v                [0, ∞)   Log              μv ~ N(0, 1000)    τv ~ Gamma(0.001, 0.001)
A                [0, ∞)   Log              μA ~ N(0, 1000)    τA ~ Gamma(0.001, 0.001)
c                [0, 1]   Logit            μc ~ N(0, 1000)    τc ~ Gamma(0.001, 0.001)

Note. s = similarity; α = memory-strength asymptote; β = memory-strength decay rate; B = background activation; u = criterion-activation intercept; v = criterion-activation slope; A = old response threshold; c = starting-point proportion.

Table S2
Posterior Means and 95% HDIs (in Parentheses) for the Group-Level Mean of Each Parameter, Transformed Back Into Its Natural Range (i.e., After Inverting the Transformations in Table S1)

Parameter   Varied condition         All-new condition         Consistent condition
s           0.304 (0.274–0.333)   >  0.093 (0.046–0.139)   >   0.001 (0–0.002)
α           1.35 (1.19–1.51)      =  0.963 (0.648–1.25)    =   0.989 (0.660–1.29)
β           1.90 (1.42–2.44)      >  0.972 (0.640–1.36)    >   0.343 (0.079–0.657)
B           2.89 (2.56–3.16)      <  4.29 (4.08–4.62)      =   4.51 (4.08–4.89)
u           3.94 (3.58–4.21)      <  5.09 (4.85–5.48)      =   5.59 (5.12–6.10)
v           0.394 (0.348–0.433)   >  0.105 (0.037–0.182)   >   0.002 (0–0.005)
A           54.9 (51.6–58.2)      =  57.5 (53.4–61.1)      =   57.3 (52.9–61.7)
c           0.483 (0.466–0.500)   =  0.490 (0.472–0.507)   <   0.553 (0.532–0.575)
Tr          381 (376–386)         >  336 (331–341)         >   288 (281–294)

Note. The results of pairwise comparisons are indicated between the columns. Two distributions are considered credibly different (greater than or less than one another, as indicated) if the 95% highest density interval (HDI) of their difference excludes zero. s = similarity; α = memory-strength asymptote; β = memory-strength decay rate; B = background activation; u = criterion-activation intercept; v = criterion-activation slope; A = old response threshold; c = starting-point proportion; Tr = residual time.

Figure S1. Schematic depiction of the hierarchical EBRW model.

Figure S2. Predicted error rates (top) and mean correct RT (bottom) from the full hierarchical model. Conditions: 1 = All-new, 2 = Consistent, 3 = Varied. Note that a lag of "zero" indicates a new probe item.

Figure S3. Observed and predicted correct RT quantiles plotted as a function of condition, probe type (new/old), and set size.