
Supplemental Materials
An Exemplar-Familiarity Model Predicts Short-Term and Long-Term Probe
Recognition Across Diverse Forms of Memory Search
By R. M. Nosofsky et al., 2014, Journal of Experimental Psychology: Learning,
Memory, and Cognition
http://dx.doi.org/10.1037/xlm0000015
Hierarchical Bayesian Implementation of the EBRW
In order to verify the parameter estimates for the exemplar-based random walk
(EBRW) model obtained from least-squares fitting, as presented in the main body of the
paper, we implemented the model as a hierarchical Bayesian model (for a general
reference, see Kruschke, 2011; for examples of hierarchical Bayesian cognitive
modeling, see Lee & Wagenmakers, forthcoming). In this model, we obtain not a point
estimate for each parameter but rather a complete posterior distribution over parameter
values. This is done both at the group level, for each condition, and at the individual
level, for each participant. Thus, we assume that each participant has a value for each
parameter of the EBRW. These values, in turn, are drawn from a group distribution on
each parameter that has a characteristic mean and variance (thus, we assume the group
distributions are unimodal). The group distributions are assumed to be independent. This
situation is depicted schematically in Figure S1.
In order to do Bayesian inference on the parameters of the model, we need two
things: a likelihood function and a prior. The likelihood gives the joint probability of an
observed response and response time (RT) on a given trial, conditioned on the
parameters. The prior specifies the initial “naive” probabilities assigned to different
values of each parameter. We address each of these components in turn.
Likelihood
In the version of the EBRW presented in the main body of the paper, the model
predicts discrete RTs, as each step of the random walk is assumed to take a constant
amount of time. While this approximation of continuous time poses no problems when
fitting mean RT, it creates a difficulty for Bayesian inference. The underlying
reason is that we need to compute the likelihood for the observed data (response and RT)
on each trial, rather than in aggregate. If the observed RT on a trial falls between two
steps of the random walk, it will be assigned zero likelihood. Intuitively, this is a problem
of translating between discrete and continuous time.
Many solutions to the problem are possible. For example, one might interpolate
between the peaks of the discrete RT distribution and renormalize it, such that RTs that
fall between two steps of the random walk have non-zero likelihood. We, however,
choose to use a different continuous approximation to the random walk; namely, the
Wiener diffusion process.
The Wiener diffusion process was introduced to psychological modeling of RT by
Ratcliff (1978), who explicated the model as the continuous limit of a discrete-time,
discrete-space random walk exactly like the EBRW. The basic idea is to take the mean
and variance of each step of the random walk and approximate this step distribution with
a Gaussian possessing the same mean and variance. In the EBRW, the probability p of
taking a step toward the “Old” boundary is given by Equation 3 in the main text. Let X be
a Bernoulli random variable representing the steps of the EBRW. X takes the value +1 with
probability p and −1 with probability (1 − p). The mean of X is then:

\[
\mathrm{E}[X] = p - (1 - p) = 2p - 1 \tag{S1}
\]

and its variance is

\[
\mathrm{Var}[X] = \mathrm{E}[X^2] - \mathrm{E}[X]^2 = p + (1 - p) - (2p - 1)^2 = 4p(1 - p). \tag{S2}
\]
Imagine that, instead of taking a single step (+1 or −1) at each time interval [t, t +
dt], the amount by which we move is an average of n equally spaced steps during that
interval. The average of n steps is also a random variable and, by the central limit
theorem, as n approaches infinity, this variable will be distributed as a Gaussian with its
mean and variance given by Equations S1 and S2. Thus, we can approximate a discrete-time,
discrete-space random walk with an infinite number of infinitesimal Gaussian
increments. This is the Wiener diffusion process.
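To make the construction concrete, the following minimal Python simulation (ours, not part of the original supplement; all numerical values are arbitrary) checks that a discrete ±1 walk and its Gaussian-increment approximation are absorbed at the "Old" boundary in nearly the same proportion of walks:

```python
# Illustrative simulation (ours, not from the original supplement): a discrete
# +1/-1 random walk and its Gaussian-increment approximation produce similar
# absorption probabilities. All numerical values are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
p = 0.55                   # probability of stepping toward the "Old" boundary
mu = 2 * p - 1             # mean step size (Equation S1)
sigma2 = 4 * p * (1 - p)   # step variance (Equation S2)
A, z = 10.0, 5.0           # "Old" boundary and starting point; "New" at 0

def prop_old(step, n_walks=4000):
    """Proportion of walks absorbed at the 'Old' boundary A."""
    hits = 0
    for _ in range(n_walks):
        x = z
        while 0.0 < x < A:
            x += step()
        hits += x >= A
    return hits / n_walks

discrete = prop_old(lambda: 1.0 if rng.random() < p else -1.0)
dt = 0.5                   # Gaussian increments over small time slices
gaussian = prop_old(lambda: rng.normal(mu * dt, np.sqrt(sigma2 * dt)))
print(discrete, gaussian)  # both proportions should nearly agree (~0.73 here)
```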
Let μ be the mean step size given by Equation S1 and σ² be the variance given by
Equation S2. Further, assume that the "New" boundary lies at zero, the "Old" boundary
lies at a value A, and the walk begins at a point z = Ac between 0 and A (i.e., 0 < c < 1).
Then, the likelihood of eventually reaching the "Old" boundary is

\[
p(\mathrm{Old}) = \frac{\exp\left(\frac{2\mu z}{\sigma^2}\right) - 1}{\exp\left(\frac{2\mu z}{\sigma^2}\right) - \exp\left(-\frac{2\mu (A - z)}{\sigma^2}\right)}. \tag{S3}
\]

This expression is given in a slightly different form in Ratcliff (1978).
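For illustration, Equation S3 is straightforward to evaluate numerically; the sketch below (ours, not from the original supplement) uses an algebraically equivalent form that avoids overflow for large 2μz/σ² and handles the drift-free limit, in which p(Old) reduces to z/A:

```python
# Sketch (ours) of evaluating Equation S3. Dividing numerator and denominator
# by exp(2*mu*z/sigma2) gives (1 - e^{-kz}) / (1 - e^{-kA}) with k = 2*mu/sigma2,
# which is numerically safer; as mu -> 0 the expression tends to z / A.
import numpy as np

def p_old(mu, sigma2, A, z):
    if abs(mu) < 1e-12:
        return z / A                              # drift-free limit
    k = 2.0 * mu / sigma2
    return np.expm1(-k * z) / np.expm1(-k * A)    # stable form of Equation S3

print(p_old(0.1, 0.99, 10.0, 5.0))  # ~0.733 for the example values used above
```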
Alone, Equation S3 gives us only the likelihood of the final response, not the RT.
Assuming, as in the main text, that there is a certain residual non-decision time Tr, the
likelihood that a response will occur at a time t after this residual time, conditioned on
the final response (old or new), is given by

\[
f(t \mid \mathrm{Old}) = \frac{\pi \sigma^2}{A^2} \exp\left(\frac{(1 - c) A \mu}{\sigma^2}\right) \sum_{k=1}^{\infty} k \sin\big(k\pi(1 - c)\big) \exp\left[-\frac{t}{2}\left(\frac{\mu^2}{\sigma^2} + \frac{\pi^2 k^2 \sigma^2}{A^2}\right)\right] \tag{S4a}
\]

\[
f(t \mid \mathrm{New}) = \frac{\pi \sigma^2}{A^2} \exp\left(-\frac{c A \mu}{\sigma^2}\right) \sum_{k=1}^{\infty} k \sin(k\pi c) \exp\left[-\frac{t}{2}\left(\frac{\mu^2}{\sigma^2} + \frac{\pi^2 k^2 \sigma^2}{A^2}\right)\right] \tag{S4b}
\]
Notice that Equations S4a and S4b contain infinite sums; these must be evaluated
numerically. We use a package developed specifically for this purpose by Wabersich and
Vandekerckhove (2013).
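As a rough illustration of what evaluating these series involves (the JAGS module of Wabersich and Vandekerckhove, 2013, does this far more carefully and efficiently), the following sketch (ours) simply truncates the sum at a fixed number of terms:

```python
# Sketch (ours) of Equations S4a/S4b with the infinite sum truncated at k_max
# terms. Truncation error grows as t -> 0, where the series converges slowly;
# this simple version is for illustration only.
import numpy as np

def wiener_fpt_density(t, mu, sigma2, A, c, boundary="Old", k_max=500):
    """First-passage density at decision time t (t > 0) for the given boundary."""
    if boundary == "Old":
        w, sign = 1.0 - c, 1.0    # upper boundary at A
    else:
        w, sign = c, -1.0         # lower ("New") boundary at 0
    k = np.arange(1, k_max + 1)
    series = np.sum(k * np.sin(k * np.pi * w)
                    * np.exp(-0.5 * t * np.pi**2 * k**2 * sigma2 / A**2))
    prefactor = (np.pi * sigma2 / A**2) * np.exp(sign * w * A * mu / sigma2
                                                 - 0.5 * t * mu**2 / sigma2)
    return prefactor * series

print(wiener_fpt_density(20.0, 0.1, 0.99, 10.0, 0.5))  # arbitrary test values
```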
In this way, we have moved from the original discrete-time, discrete-space
EBRW to a continuous approximation that allows us to compute the joint likelihood of a
response and its RT, given a set of model parameters.
Prior
We assume that each participant has his or her own value for each of the nine
model parameters. These are all drawn from group-level distributions, one per parameter,
each independent of the others and parameterized by a mean and precision. For ease of
implementation, we first transform each model parameter to lie on the real line, such that
the group-level distributions are Gaussian. We then place a very broad prior on the mean
and precision of these group-level Gaussian distributions, such that the posterior
estimates are driven entirely by the data. The model parameters and their associated
ranges, transformations, and priors are given in Table S1.
One exception to this procedure is the residual time parameter Tr, for which we do
not assume any group-level distribution. Instead, it is estimated independently for each
participant, with a prior that is uniform over the range of 0 to the minimum observed RT
for that participant. The group mean residual time is, then, just the average of the Tr
estimated for each participant in the group.
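To make the transformation scheme concrete, the following sketch (ours; the actual model was implemented in JAGS, not Python) shows how a real-valued group-level draw is mapped back to a parameter's natural range; the group mean and standard deviation used here are illustrative only:

```python
# Sketch (ours) of the Table S1 transformations: group-level Gaussians live on
# the real line, and inverse transforms return draws to each parameter's range.
import numpy as np

def to_natural(theta, transform):
    if transform == "logit":      # s and c: real line -> [0, 1]
        return 1.0 / (1.0 + np.exp(-theta))
    if transform == "log":        # alpha, beta, B, u, v, A: real line -> [0, inf)
        return np.exp(theta)
    raise ValueError(f"unknown transform: {transform}")

rng = np.random.default_rng(0)
theta_s = rng.normal(loc=0.0, scale=1.0)  # illustrative group mean and SD only
s = to_natural(theta_s, "logit")          # guaranteed to lie in [0, 1]
```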
Posterior Inference
This hierarchical Bayesian version of the EBRW was implemented in JAGS
(Plummer, 2011), which estimates the joint posterior distribution by drawing a large
number of samples from the posterior using Gibbs sampling. We drew 5,000 samples
from the posterior after 2,000 steps of burn-in.
Results
Posterior Distributions on Group Parameters
Because our primary interest is in the differences in model parameters between
conditions, our analysis focuses on the posterior distributions for the group-level means
on each parameter. We report the posterior distributions of the untransformed parameters,
such that they lie in the same range as the parameter estimates reported in the main body
of the paper (i.e., we examine the group-mean similarity parameter, μs, rather than the
transformed version, logit(s)). The distributions are summarized by their mean and 95%
highest density interval (HDI) in Table S2. We can determine whether the value of the
group means is credibly different between conditions by examining the 95% HDI of their
difference. If the 95% HDI of their difference excludes zero, we say the two group means
are credibly different (this is the logic emphasized by Kruschke, 2011). These qualitative
differences in group means (whether they are greater than, less than, or equal to each
other) are also indicated in Table S2. The pattern of parameter estimates reported in
Table S2 was discussed in the main text of our article and is not repeated here.
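The comparison logic can be sketched in a few lines (ours; the Gaussian draws below are stand-ins for actual MCMC output, with means and spreads loosely echoing the similarity parameter s in Table S2):

```python
# Sketch (ours) of the HDI-based comparison: the 95% highest density interval
# of the difference between two group-mean posteriors is checked against zero.
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing the requested posterior mass."""
    x = np.sort(np.asarray(samples))
    n_in = int(np.ceil(mass * len(x)))
    widths = x[n_in - 1:] - x[:len(x) - n_in + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + n_in - 1]

rng = np.random.default_rng(0)
mu_s_varied = rng.normal(0.30, 0.015, size=5000)   # stand-ins for MCMC draws
mu_s_allnew = rng.normal(0.09, 0.024, size=5000)
lo, hi = hdi(mu_s_varied - mu_s_allnew)
credibly_different = (lo > 0) or (hi < 0)          # True for these stand-ins
```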
Predictions
In addition to estimating EBRW parameters for each participant and group, we
can produce predictions based on the samples drawn from the posterior. Before
attempting to interpret the posterior distribution on model parameters, we must verify that
the hierarchical Bayesian version of the EBRW matches the data well. To achieve this
aim, we begin with a posterior predictive check, producing predictions for mean error
rate and correct RT that can be compared visually to the observed data.
Each sample from the posterior represents the parameters of the EBRW that are
inferred for each participant in each condition. Thus, for each trial a participant
completes, we can form a prediction of the probability that the participant will judge the
probe item on that trial as “old,” as well as the participant’s mean correct and error RT
for the trial.
Furthermore, for the Wiener diffusion process, the mean RT, conditional on
hitting either the “old” or “new” boundary, can be obtained by solving the Kolmogorov
equations for a Gaussian diffusion process, as described by Grasman, Wagenmakers, and
van der Maas (2009; analogous expressions are given by Palmer, Huk, and Shadlen,
2005). These are
\[
\mathrm{E}[RT \mid \mathrm{Old}] = T_r + \frac{1}{\mu}\left[\frac{A}{\tanh\left(\frac{\mu A}{\sigma^2}\right)} - \frac{z}{\tanh\left(\frac{\mu z}{\sigma^2}\right)}\right] \tag{S5a}
\]

\[
\mathrm{E}[RT \mid \mathrm{New}] = T_r + \frac{1}{\mu}\left[\frac{A}{\tanh\left(\frac{\mu A}{\sigma^2}\right)} - \frac{A - z}{\tanh\left(\frac{\mu (A - z)}{\sigma^2}\right)}\right]. \tag{S5b}
\]
By averaging the predicted p(Old) and mean RT over all the posterior samples, we
effectively marginalize over the posterior to obtain a predicted p(Old) and mean correct
RT for each trial and each participant. Finally, we average these predictions across
participants to obtain the predicted p(Old) and mean correct RT in each condition.
The predicted p(Old) is converted into an error rate (1 – p(Old) for old probe items,
p(Old) for new probe items). The predicted mean error rates and correct RT are shown in
Figure S2. These correspond quite closely with the observed data.
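In code, this marginalization amounts to evaluating Equations S3 and S5a/S5b at each posterior draw and then averaging; the sketch below (ours, with fabricated draws standing in for the sampler output and arbitrary parameter values) shows the idea:

```python
# Sketch (ours) of marginalizing trial predictions over posterior samples,
# using Equation S3 for p(Old) and Equations S5a/S5b for mean decision time.
import numpy as np

def p_old(mu, sigma2, A, z):                       # Equation S3, stable form
    k = 2.0 * mu / sigma2
    return np.expm1(-k * z) / np.expm1(-k * A)

def mean_rt(mu, sigma2, A, z, t_r, boundary="Old"):
    """Equations S5a/S5b: conditional mean RT, including residual time t_r."""
    coth = lambda x: 1.0 / np.tanh(x)
    x = z if boundary == "Old" else A - z
    return t_r + (A / mu) * coth(mu * A / sigma2) - (x / mu) * coth(mu * x / sigma2)

rng = np.random.default_rng(0)
draws = [dict(mu=rng.normal(0.05, 0.01), sigma2=1.0, A=55.0,
              c=rng.normal(0.49, 0.01), t_r=0.35) for _ in range(200)]

pred_p = np.mean([p_old(d["mu"], d["sigma2"], d["A"], d["c"] * d["A"])
                  for d in draws])
pred_rt_old = np.mean([mean_rt(d["mu"], d["sigma2"], d["A"], d["c"] * d["A"],
                               d["t_r"], "Old") for d in draws])
```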
The present method also allows us to compute the complete predicted group RT
distributions rather than just the mean RTs. Figure S3 shows predicted and observed RT
quantiles for correct responses as a function of condition, probe type (old/new), and set
size. The observed quantiles are the result of “Vincentizing”; that is, quantiles are
computed for each participant and then averaged to produce group quantile estimates
(although this method is not always the best way to construct group distributions, it
suffices for visualization purposes because we are not directly fitting the estimated group
quantiles; Rouder & Speckman, 2004). Error bars for the observed RT quantiles represent
±1 within-subject standard error.
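For concreteness, the Vincentizing computation can be sketched as follows (ours; the per-participant RTs are simulated placeholders, and the plain between-subject standard error computed here is not the exact within-subject correction used in the figures):

```python
# Sketch (ours) of Vincentizing: quantiles are computed per participant and
# then averaged across participants to form the group quantile estimates.
import numpy as np

def vincentize(rts_by_participant, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    per_pp = np.array([np.quantile(rts, probs) for rts in rts_by_participant])
    group = per_pp.mean(axis=0)                              # group quantiles
    se = per_pp.std(axis=0, ddof=1) / np.sqrt(len(per_pp))   # plain SE only
    return group, se

rng = np.random.default_rng(0)
fake_rts = [0.3 + rng.lognormal(-0.5, 0.3, size=200) for _ in range(20)]
group_q, group_se = vincentize(fake_rts)   # 20 simulated participants, seconds
```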
Predicted quantiles are computed in the same way: For each sample from the
Gibbs sampler, predicted RT quantiles are computed for each participant for each trial,
using the algorithm for the cumulative first-passage time distribution of a Wiener process
developed by Blurton, Kesselmeier, and Gondan (2012). The predicted quantile for each
trial is, then, the average over the quantiles computed at each step. The final predicted
group quantiles are again Vincentized by taking the average RT quantile over each trial
from each participant. Thus, the error bars for the predicted RT quantiles represent ±1
predicted within-subject standard error.
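Given a numerically evaluated cumulative first-passage time distribution (here, via the Blurton et al., 2012, algorithm), a predicted quantile is simply a root of F(t) = q. A generic bisection sketch (ours; the exponential CDF is used only as a toy check) conveys this step; for conditional RT quantiles, the defective first-passage CDF would first be normalized by the corresponding response probability:

```python
# Sketch (ours): recovering a quantile from a numerically evaluated, increasing
# CDF by bisection. Any root-finder would do; bisection keeps the idea plain.
import math

def quantile_from_cdf(cdf, prob, lo=0.0, hi=10.0, tol=1e-8):
    """Smallest t in [lo, hi] with cdf(t) >= prob (cdf assumed increasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy check against a known distribution: the exponential median is ln 2.
median = quantile_from_cdf(lambda t: 1.0 - math.exp(-t), 0.5)  # ~0.6931
```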
Given how well the model predicts mean RT, it is no surprise that it is able to
closely match observed median RTs (the .5 quantile; see Figure S3). Predictions for the
leading and trailing edges of the RT distributions (the .1 and .9 quantiles) are not as good
quantitatively in some conditions but clearly capture the overall trend. This analysis
suggests that the EBRW not only provides a good account of mean accuracy and RT but
also captures important variation in the complete RT distributions.
References
Blurton, S. P., Kesselmeier, M., & Gondan, M. (2012). Fast and accurate calculations for
cumulative first-passage time distributions in Wiener diffusion models. Journal of
Mathematical Psychology, 56, 470–475.
Grasman, R. P. P. P., Wagenmakers, E.-J., & van der Maas, H. L. J. (2009). On the mean
and variance of response times under the diffusion model with an application to
parameter estimation. Journal of Mathematical Psychology, 53, 55–68.
Kruschke, J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS.
Academic Press.
Lee, M. D., & Wagenmakers, E.-J. (forthcoming). Bayesian cognitive modeling: A
practical course. Cambridge University Press.
Palmer, J., Huk, A. C., & Shadlen, M. N. (2005). The effect of stimulus strength on the
speed and accuracy of a perceptual decision. Journal of Vision, 5, 376–404.
Plummer, M. (2011). JAGS: Just another Gibbs sampler. Retrieved from http://mcmc-jags.sourceforge.net/
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.
Rouder, J. N., & Speckman, P. L. (2004). An evaluation of the Vincentizing method of
forming group-level response time distributions. Psychonomic Bulletin & Review,
11, 419–427.
Wabersich, D., & Vandekerckhove, J. (2013). Extending JAGS: A tutorial on adding
custom distributions to JAGS (with a diffusion model example). Behavior
Research Methods.
Table S1
Functions Used to Transform the Gaussian Group-Level Distributions Back Into Their
Natural Range, as Well as the Priors Placed on the Mean and Precision of the Group-Level
Gaussian Distributions

EBRW parameter   Range    Transformation   Prior on mean      Prior on precision
s                [0, 1]   Logit            μs ~ N(0, 1000)    τs ~ Gamma(0.001, 0.001)
α                [0, ∞)   Log              μα ~ N(0, 1000)    τα ~ Gamma(0.001, 0.001)
β                [0, ∞)   Log              μβ ~ N(0, 1000)    τβ ~ Gamma(0.001, 0.001)
B                [0, ∞)   Log              μB ~ N(0, 1000)    τB ~ Gamma(0.001, 0.001)
u                [0, ∞)   Log              μu ~ N(0, 1000)    τu ~ Gamma(0.001, 0.001)
v                [0, ∞)   Log              μv ~ N(0, 1000)    τv ~ Gamma(0.001, 0.001)
A                [0, ∞)   Log              μA ~ N(0, 1000)    τA ~ Gamma(0.001, 0.001)
c                [0, 1]   Logit            μc ~ N(0, 1000)    τc ~ Gamma(0.001, 0.001)
Note. s = similarity; α = memory-strength asymptote; β = memory-strength decay rate; B
= background activation; u = criterion-activation intercept; v = criterion-activation slope;
A = old response threshold; c = starting-point proportion.
Table S2
Posterior Means and 95% HDI (in Parentheses) for the Group-Level Mean of Each
Parameter, Transformed Back Into Their Natural Range (i.e., After Inverting the
Transformation in Table S1)

Parameter   Varied condition          All-new condition         Consistent condition
s           0.304 (0.274–0.333)   >   0.093 (0.046–0.139)   >   0.001 (0–0.002)
α           1.35 (1.19–1.51)      =   0.963 (0.648–1.25)    =   0.989 (0.660–1.29)
β           1.90 (1.42–2.44)      >   0.972 (0.640–1.36)    >   0.343 (0.079–0.657)
B           2.89 (2.56–3.16)      <   4.29 (4.08–4.62)      =   4.51 (4.08–4.89)
u           3.94 (3.58–4.21)      <   5.09 (4.85–5.48)      =   5.59 (5.12–6.10)
v           0.394 (0.348–0.433)   >   0.105 (0.037–0.182)   >   0.002 (0–0.005)
A           54.9 (51.6–58.2)      =   57.5 (53.4–61.1)      =   57.3 (52.9–61.7)
c           0.483 (0.466–0.500)   =   0.490 (0.472–0.507)   <   0.553 (0.532–0.575)
Tr          381 (376–386)         >   336 (331–341)         >   288 (281–294)

Note. The results of pairwise comparisons are indicated between the columns. Two
distributions are considered credibly different (greater than or less than one another, as
indicated) if the 95% highest density interval (HDI) of their difference excludes zero.
s = similarity; α = memory-strength asymptote; β = memory-strength decay rate; B =
background activation; u = criterion-activation intercept; v = criterion-activation slope; A
= old response threshold; c = starting-point proportion; Tr = residual time.
Figure S1. Schematic depiction of the hierarchical EBRW model.
Figure S2. Predicted error rates (top) and mean correct RT (bottom) from the full
hierarchical model. Conditions: 1 = All-new, 2 = Consistent, 3 = Varied. Note that a lag
of “zero” indicates a new probe item.
Figure S3. Observed and predicted correct RT quantiles plotted as a function of
condition, probe type (new/old), and set size.