Massachusetts Institute of Technology
Department of Economics
Working Paper Series
A MEMORY BASED MODEL OF
BOUNDED RATIONALITY
Sendhil Mullainathan
Working Paper 01-28
September 2000
Room E52-251
50 Memorial Drive
Cambridge, MA 02142
This paper can be downloaded without charge from the
Social Science Research Network Paper Collection at
http://papers.ssrn.com/paper.taf?abstract_id=277357
A Memory Based Model of Bounded Rationality
Sendhil Mullainathan*
September 20, 2000
Abstract
How do memory limitations affect economic behavior? I develop a model of memory grounded in psychology and biology research to investigate this question. Using this model, I study the case where people apply Bayes rule to the history they recall as if it were the true history. The resulting beliefs exhibit over-reaction on average. They also exhibit under-reaction with the model providing enough structure to allow predictions about which effect dominates when. I then apply this general framework to an otherwise standard model of consumption. It predicts the broad structure of consumption predictability as well as differences in marginal propensity to consume across different income streams. Most importantly, because it ties the extent of bias to a measurable aspect of the stochastic process being forecasted, the model makes novel, testable empirical predictions.
*Massachusetts Institute of Technology and National Bureau of Economic Research, e-mail: mullain@mit.edu. This paper was written while I was a graduate student at Harvard University. I am indebted to my thesis committee, Drew Fudenberg, Lawrence Katz and Andrei Shleifer for their generous advice and encouragement, and to Marianne Bertrand, Edward Glaeser, David Laibson, and Richard Thaler for many helpful discussions. I have also greatly benefitted from comments by three anonymous referees, George Baker, John Campbell, Caroline Hoxby, Erzo F.P. Luttmer, and participants at numerous seminars. Financial support from the Chiles Foundation is gratefully acknowledged.
1 Introduction
Memory limitations appear to affect the way we form forecasts. Consider purchasing a car. Some information used, such as the prices, may come from outside sources. But much will come from our recollections, everything from newspaper reports about a recall to recollections of particular friendly acts by a boss (perhaps suggesting a promotion and, therefore, the ability to buy a more expensive car). Despite its obvious importance in many facets of economic life, however, memory limitations are largely ignored.^1 In this paper, I attempt to develop a tractable model of human memory, one that has testable predictions about economic behavior.
To build this model, I cull two stylized facts from the broad research on memory by biologists and psychologists. The first fact, termed rehearsal, states that remembering an event once makes it easier to remember that event again. Most students studying for an exam, by reading their lecture notes and repeatedly attempting to recall the material, take advantage of rehearsal. The second fact, termed associativeness, states that similarity of the memory to current events facilitates recall. Cues in today's events trigger memories that contain similar cues. Hearing your friend lament about how his Fiat has turned out to be a lemon may remind you of other Fiat horror stories.
While these facts describe the technology of memory, they don't tell us how memory is used. People may be naive and treat the histories they recall as true, and apply Bayes' rule to them. Alternatively, they may be sophisticated, by possessing complete knowledge of the recall technology and correct for distortions in recall when forming forecasts. Each of these decision rules (as well as "partial adjustment" rules) has its appeal and undoubtedly a characterization of both is necessary. This paper takes the first step and draws out the implications of the naive model.^2
^1 See Conlisk (1996) for a general survey of previous work on bounded rationality. Dow (1991) also presents a model of memory limitations, which examines optimal storage of information (in the context of search) given limited capacity.
^2 I chose to examine the naive rule first since experimental evidence suggests that individuals have neither accurate
models of memory, nor correct for their memory mistakes in laboratory settings, making the naive model a natural
first model to study. Of course, in cases with repetition and room for learning, sophistication may come to have
more descriptive power. This makes it the next natural model to study and work in progress is examining it. This
first pass also abstracts from recall effort. Individuals may work harder to remember certain events in the past over
others. Such effort may take the form of mental exertion or the use of diaries to keep track of important information.
Section 5 discusses these and other extensions to be pursued in future work.
When memory is used in this way, several interesting features result. First, associativeness means that events affect beliefs not just through the information they convey, but also through the memories they evoke. Receiving a devastating referee report likely evokes many bad memories, other instances that erode self confidence. More generally, associativeness generates an over-reaction (on average) to information as each event draws forth similar, supporting, memories. Bad news cultivates pessimism by selectively evoking negative information. In a related vein, completely uninformative signals can influence beliefs if they affect what is recalled. Viewing a fictional speech by a laid-off worker on the difficulty of finding a decent job may affect beliefs because it may trigger other, more informative memories.
Second, because of rehearsal, the memories evoked by an event linger, causing errors in belief to persist. So the over-reaction caused by associativeness dissipates only slowly. More generally, events will continue to have an effect long after the information they contain has been discredited. A smear campaign can have lingering effects even after the "facts" proclaimed have been thoroughly debunked. The unflattering memories it brought to mind stay, casting a negative shadow on the target. As another example, consider a judge instructing the jury to disregard the testimony they just heard. Even a well-intentioned jury would find it hard to fully comply with such a request. These results, and others derived through similar reasoning, match many of the experimentally found biases in human inference, such as the greater effect of salient information, the hot hand or belief perseverance.
Most interestingly, though, the model makes a set of out of sample predictions. It relates the extent of such phenomena to the stochastic process that individuals are forecasting. When the stochastic process requires little use of history (such as a random walk), there will be little use of memory and hence little of the biases described. This ability to relate the extent of the bias to a measurable aspect of the stochastic process being studied makes the model refutable and is formalized in Section 3.4.
To assess the effectiveness of the general model in economic contexts, I apply it to consumption within a simple Permanent Income Hypothesis framework. Memory distortions here generate violations of the standard orthogonality predictions: consumption changes can be predicted using lagged information. Intuitively, when individuals receive good news about their personal income, such as a glowing compliment from the boss, they are more likely to remember other good news, causing them to over-forecast their income. This generates predictability of future consumption changes. The model also predicts that when there are multiple income sources, the marginal propensity to consume permanent income changes will be different for each stream. Income sources will show a high marginal propensity to consume when memories play a large role in forecasting it, such as with personal labor income rather than with gains in one's 401(k). Most importantly, as in the general case, the model makes a set of out of sample predictions. It relates the extent of consumption predictability to a parameter of the income process that measures the importance of the history. Moreover, since income streams with a high MPC connote over-reaction, it further predicts that the income streams with the largest MPCs should also show the greatest negative correlation with future consumption. Section 4 discusses these predictions.
These results suggest that bounds on human memory can be fruitfully modeled. The results match psychological findings as well as empirical facts in consumption. It also generates out of sample predictions that can be tested on standard data sets. Models based on bounded rationality often invoke fears of post hoc rationalization, fears that with a sufficiently flexible set of assumptions almost any behavior can be "explained". The out of sample predictions are a first step in alleviating such fears. As a whole, the findings suggest that models incorporating realistic limitations on recall have strong, testable implications about economic behavior.
2 Setup
The basic framework examines an individual who forms expectations about a state variable. I will take this variable to be synonymous with permanent income in future discussions, but it can be many other things: a firm's earning power, macroeconomic conditions, or an employee's abilities are just a few examples. Forecasts of income clearly influence many decisions (savings, job search, or portfolio choice) and in Section 4, I explicitly study the consumption decision. Labor income moves for a variety of reasons, such as macroeconomic shocks, technological innovations, or changes in expectations about an individual's ability. As these examples indicate, forming forecasts requires combining a diverse set of information. Some of this information is, loosely speaking, "hard" or readily available in records: income in prior months, unemployment or GDP. Other information is "soft" or harder to capture in records: a friend in a similar position being fired or a boss telling you that you are one of the best employees he has seen. This disjunction between hard and soft will be useful for the model that follows. Knowledge of soft information depends on memory, while knowledge of hard information typically does not.
The remainder of this section formalizes the setup.
2.1 Environment
Let $y_t$, income, obey the stochastic process:
$$y_t = \sum_{k=1}^{t} \nu_k + \varepsilon_t \qquad (1)$$
where $\varepsilon_t$ is a transitory shock distributed $N(0, \sigma_\varepsilon^2)$ and $\nu_k$ is a permanent shock, whose structure I will describe shortly. $y_t$ is observed by the individual and represents the hard information. Assume that individuals have priors about the value of $y$ which are normally distributed with mean $y_0$ and variance $\sigma_0^2$.
Each period with probability $p$ an event $e_t$ occurs, which will provide "soft" information about income. Each event $e_t$ has an informative, $x_t$, and non-informative, $n_t$, component. Conditional on an event occurring, they are distributed:
$$e_t = (x_t, n_t) \sim N(0, \Sigma); \qquad \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xn} \\ \sigma_{xn} & \sigma_n^2 \end{pmatrix}$$
where $E[x_t] = E[n_t] = 0$.^3 When there is no event, I will write $e_t = (0, 0)$. The covariance term, $\sigma_{xn} = \mathrm{cov}(x, n)$, measures whether the neutral component typically appears with positive or negative information.
^3 In Mullainathan (1998), this assumption is discussed in greater detail. Normality implies that $x_t$ and $n_t$ are symmetrically distributed. Symmetry implies that neither good nor bad events are more memorable (in the sense of both evocativeness and vividness discussed below).
A concrete example of an event might be hearing a friend describing his recent unemployment experience. The length of his unemployment spell would be informative ($x_t$), while the fact that he has a pregnant wife with medical bills piling up would be uninformative ($n_t$).^4 The model includes uninformative, or neutral components, because they will affect recall probabilities.
The permanent shock at time $t$ will be defined as:
$$\nu_t = x_t + z_t$$
where $z_t \sim N(0, \sigma_z^2)$.^5 Thus, while the informative part of the event tells her something about the shock that period, its informativeness is incomplete and depends on $\sigma_x^2/\sigma_z^2$.
At the beginning of each period, she observes the event, $e_t$. She then combines this information with past information to form a forecast, $\hat{y}_t$. At the end of the period the true value of income is observed. The game is then repeated.
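As a rough illustration of the environment just described, the following sketch simulates one path of events and income under equation (1). It is not part of the original analysis: the function name `simulate_income`, the seed handling, and the choice to normalize the prior mean to zero and hold the variance of $z_t$ fixed across periods are all assumptions made for the example.

```python
import random


def simulate_income(T, p, sigma_x, sigma_n, sigma_xn, sigma_z, sigma_eps, seed=0):
    """Simulate y_t = sum_{k<=t} nu_k + eps_t with nu_t = x_t + z_t.

    Hypothetical helper: returns (events, incomes), where events[t] is the
    pair (x_t, n_t) and a non-event is stored as (0.0, 0.0).
    """
    rng = random.Random(seed)
    # Build (x, n) from two independent standard normals so that
    # var(x) = sigma_x^2, var(n) = sigma_n^2 and cov(x, n) = sigma_xn.
    b = sigma_xn / sigma_x
    resid = max(sigma_n ** 2 - b ** 2, 0.0) ** 0.5
    events, incomes = [], []
    permanent = 0.0  # assumption: income starts from a level of zero
    for _ in range(T):
        if rng.random() < p:  # an event occurs with probability p
            u, v = rng.gauss(0, 1), rng.gauss(0, 1)
            x, n = sigma_x * u, b * u + resid * v
        else:  # no event this period
            x, n = 0.0, 0.0
        z = rng.gauss(0, sigma_z)           # unobserved part of the permanent shock
        permanent += x + z                  # nu_t = x_t + z_t accumulates
        eps = rng.gauss(0, sigma_eps)       # transitory shock
        events.append((x, n))
        incomes.append(permanent + eps)
    return events, incomes


if __name__ == "__main__":
    ev, inc = simulate_income(10, 0.5, 1.0, 1.0, 0.3, 0.5, 1.0, seed=1)
    for t, (e, y) in enumerate(zip(ev, inc), start=1):
        print(t, e, round(y, 3))
```

The events list plays the role of the soft information and the income list the hard information; only the former is subject to forgetting in what follows.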
2.1.1 Perfect Memory Forecasts
The perfect memory forecast will serve as a useful base case against which one can compare the forgetful forecast. The process in equation (1) generates a signal extraction problem: the individual must separate out the permanent shocks to $y_t$ from the transitory ones. A 5% income drop may represent a negative shock to permanent income or may only affect current income. Knowledge of both past events and $y_k$ helps to solve this inference problem. Events $e_k$ are useful because they allow one to extract a component of the time $k$ innovation ($x_k$) that is for sure permanent. Past income realizations $y_k$ are useful because they allow one to tease out the remainder of the permanent shock ($z_k$) but with less certainty. Repeated observations of high income will suggest a permanent rise.^6
^4 Of course, as the example also illustrates, every part of an event will have some information content, and the dichotomy between $x_t$ and $n_t$ merely simplifies this spectrum.
^5 A slight awkwardness in this definition should be noted. The variance of $\nu_t$ is higher when there is a shock than when there is none. In this sense, "signal" may not be a completely accurate word. This assumption does not drive the results; I use it so that the residual variance of $\nu_t$ conditional on observing $x_t$ is constant, allowing a more transparent analysis.
^6 Contrast with the case where $y_t$ follows a standard random walk. Then, $y_{t-1}$ is the only information in the past needed to forecast $y_t$. For the model formulated in this paper, $\hat{y}_{t-1}$ is a sufficient statistic for all past information. This is an artifact, however, of the simplicity of the model. If we complicate it by assuming that different events have different levels of mean reversion rather than all being permanent, this will no longer be true. Forecasts must then rely directly on all past $y_k$ and $e_k$.
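The optimal forecast can be easily derived using the Kalman Filter (see Harvey 1993). The posterior at time $t$ will be normally distributed with mean $\hat{y}_t$ and variance $\hat{\sigma}_t^2$. In steady state beliefs these will equal (see Lemma 1 in the appendix):
$$\hat{y}_t(h_t, e_t) = x_t + \sum_{k=1}^{t-1} \left[ \lambda^{t-k} x_k + (1 - \lambda^{t-k}) \Delta y_k \right] \qquad (2)$$
$$\hat{\sigma}_t^2(h_t, e_t) = \bar{\sigma}^2 \qquad (3)$$
where $\Delta y_k = y_k - y_{k-1}$, $\bar{\sigma}^2$ is a constant, and $\lambda$ is a constant between zero and one determined by the variances of the shocks [expression illegible in source].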
Understanding the marginal impact of different variables will improve intuition about the forecast rule. First, $x_k$ influences forecasts one-for-one. Its impact is the sum of two terms. There is a direct effect which contributes $\lambda^{t-k} x_k$ and an indirect effect from $\Delta y_k = x_k + z_k + \varepsilon_k - \varepsilon_{k-1}$ that contributes $(1 - \lambda^{t-k}) x_k$. Summing shows that the total coefficient on $x_k$ is unity. Second, $\Delta y_k$ enters forecasts with weight $1 - \lambda^{t-k} < 1$ as is clear from the formula. Third, $y_k$ influences forecasts at $\lambda^{t-k-1}(1 - \lambda) < 1$ because it enters in $\Delta y_k$ and in $\Delta y_{k+1}$. Both $y_k$ and $\Delta y_k$ have a less than one-for-one impact because both are noisy estimates of permanent income (or permanent income changes). That $x_k$ has greater impact reiterates the importance of events in separating signal from noise. They show the individual a portion of the income change that is for sure permanent. Fourth, $n_k$ has zero impact as expected: neutral components convey no information.
Finally, $\lambda$ measures the importance of history in forecasts. As $\lambda$ increases, older $y_k$ receive greater weight. This is intuitive. When $\lambda$ approaches zero, most of the variance comes from permanent shocks, and hence the process resembles a random walk. In this case, history matters the least and past values should receive the least weight.
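The weights described above can be checked numerically. The sketch below evaluates the forecast rule in equation (2) directly and confirms that bumping a past income realization $y_k$ by one unit moves the forecast by $\lambda^{t-k-1}(1-\lambda)$. The function names `perfect_forecast` and `income_weight`, and the normalization $y_0 = 0$, are assumptions of this example rather than anything specified in the text.

```python
def perfect_forecast(x_t, xs, ys, lam, y0=0.0):
    """Equation (2): x_t + sum_{k=1}^{t-1} [lam^(t-k) x_k + (1 - lam^(t-k)) dy_k].

    xs[k-1] and ys[k-1] hold x_k and y_k for k = 1, ..., t-1.
    y0 is the starting level used for dy_1 (assumed zero by default).
    """
    t = len(ys) + 1
    total, prev = x_t, y0
    for k, (x_k, y_k) in enumerate(zip(xs, ys), start=1):
        w = lam ** (t - k)
        total += w * x_k + (1.0 - w) * (y_k - prev)  # dy_k = y_k - y_{k-1}
        prev = y_k
    return total


def income_weight(k, t, lam):
    """Net coefficient on y_k implied by its appearance in dy_k and dy_{k+1}."""
    return lam ** (t - k - 1) * (1.0 - lam)


if __name__ == "__main__":
    xs = [0.3, -0.1, 0.0, 0.7]
    ys = [1.0, 1.4, 0.9, 1.8]
    lam, t = 0.6, 5
    base = perfect_forecast(0.2, xs, ys, lam)
    for k in range(1, t):
        bumped = list(ys)
        bumped[k - 1] += 1.0
        print(k, perfect_forecast(0.2, xs, bumped, lam) - base, income_weight(k, t, lam))
```

Because the rule is linear, the finite difference equals the coefficient exactly (up to floating point), and the weights on older $y_k$ shrink geometrically in $\lambda$.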
2.2 Memory
2.2.1 Formal setup
Memory will be modeled as a stochastic map that transforms true history into perceived history.^7 Let history, $h_t$, be a vector that includes $y_k$ and $e_k$ for $k < t$:
$$h_t = (y_1, \ldots, y_{t-1}, e_1, \ldots, e_{t-1})$$
Memory maps $h_t$ into a random variable $h_t^F$. I begin by making mathematical assumptions about the nature of this map and then use experimental evidence to characterize the remainder.
As discussed, past values of income are hard information readily available in records. I, therefore, assume that $y_k$ will be recalled perfectly. Events, on the other hand, characterize soft information, and are more prone to be forgotten. Formally, write recalled history as $h_t^F = (e_1^F, e_2^F, \ldots, e_{t-1}^F, y_1, \ldots, y_{t-1})$. Notice that in the recalled history, $y_k$ is unaffected, whereas $e_k$ is transformed into a random variable, $e_k^F$, whose value is governed by:
$$e_k^F = \begin{cases} e_k & \text{with probability } r_{kt} \\ (0, 0) & \text{with probability } 1 - r_{kt} \end{cases}$$
The probability that event $e_k$ is recalled at time $t$ is denoted by $r_{kt}$, where these probabilities are applied independently across events, though algebraically the probabilities may be linked.^8 When an event is forgotten, it is exactly as if no event occurred that period.
A metaphor may help. Picture history as a series of boxes, one for every time period. Each box contains the details of that period's event. An empty box signifies that no event occurred that period. Memory goes to each box and flips a coin with weight $r_{kt}$ to determine if the event in that box will be remembered; if forgotten, the box appears empty to the individual.
Notice some of the implicit assumptions made in this specification. Individuals do not remember distorted versions of events: they either remember them or not. They also do not "remember" events that never happened. Finally, a forgotten event matches a non-event, so that there is no feeling of "I think something happened but I'm not sure what". Weakening of these assumptions might all be useful tasks for the future.
To complete the model, I need to specify $r_{kt}$. I turn to the scientific evidence for motivation on how to specify this.
^7 The recall process readily lends itself to a probabilistic interpretation. Casual conversation consists of phrases such as "more likely to remember" and experimental work supports this. James (1890) seems to present the first probabilistic interpretation of memory, though of course he does not use this terminology.
^8 Formally, conditional on $r_{kt}$ and $r_{j\tau}$, $R_{kt}$ and $R_{j\tau}$ are independent.
2.2.2 Evidence on Memory
Research by biologists and psychologists has generated much knowledge about memory.^9 I will focus on two of these features, which in my opinion are by far the most relevant ones for economists.
The first, rehearsal, states that recalling a memory increases future recall probabilities. Students quickly recognize this property: repetition strengthens memories. Experimental evidence on rehearsal can be found even at the neuronal level. Repeated firings between neurons strengthen their synaptic connections, or the strength of the "memory" stored there.^10 At a more macro level, experiments with humans show similar behavior. Two groups of subjects memorize the same list of words. One group then practices recall of this list periodically, while the other does not (both see it only once). After the same time has elapsed for both groups, the group that has been periodically recalling the list shows higher recall of the list. That these findings should seem so obvious is a testament to the intuitive appeal of the rehearsal assumption.
The second, associativeness, states that events more similar to current events are easier to recall. For example, hearing a friend talk about his vacation will invoke memories of one's own vacations. Associativeness may arise because events serve as cues that help "find" lost memories. The importance of associativeness in every day recall has been emphasized by Tulving and his colleagues, who study the role of cues in recall. Subjects learn a list of words in which each target word is paired with a cue word. The subjects are then asked to remember the target words, and are either provided with the associated cue or not. A broad set of such experiments finds that recall of the target words is higher when the paired word is present.^11 A related example of this phenomenon is subjects who learn the sentence (Anderson et al., 1976)
The fish attacked the swimmer
They are more likely to remember this sentence if given the cue "shark" than if given no cue at all. Notice, however, that "shark" never appears in the sentence, which illustrates that associativeness likely operates also through conceptual similarity.^12
As these studies demonstrate, both rehearsal and associativeness have a strong experimental basis.^13 In fact, the most popular models of memory (and neural function generally), Parallel Distributed Processing Models, possess both features (Rumelhart and McClelland, 1986). Nevertheless, I do not mean to imply that these are the only "important" facts about memory, merely that these appear to be the most relevant to economists.^14
^9 Schacter (1996) presents an excellent overview of this literature, one that I draw upon.
^10 See Kandel, Schwartz and Jessell (1991) for a discussion. A contrasting effect is habituation, wherein synaptic strength decreases with frequency. This corresponds to the idea that novel stimulus receives notice which lessens as the novelty wears off. I ignore this property because it is a property of attention rather than of memory.
2.2.3 Formalism
Three parameters appear in the formalism: $m$ (the baseline recall probability), $\rho$ (which quantifies rehearsal), and $\chi$ (which quantifies associativeness). Assume that all are between zero and one and that $m + \rho + \chi < 1$. Let $R_{kt}$ denote the random variable which equals 1 iff event $k$ is recalled at time $t$, with $R_{(t-1)t} = 1$ and $r_{(t-1)t} = 1$. Note that $E[R_{kt}] = r_{kt}$. With this notation, we can write:
$$r_{kt} = m + \rho R_{k(t-1)} + \chi a_{kt} \qquad (4)$$
^11 These paired words sometimes share a natural connection, such as "brain" and "mind" or "brain" and "drain", and sometimes are unrelated, such as "brain" and "doughnut". The findings hold in both cases though the effect is stronger when the words are connected.
^12 Laibson (1997) derives a theory of consumption based on preferences that exhibit a form of conditioning, which is related to associativeness (MacKintosh, 1983). Our papers differ because I focus on expectations rather than preferences. The similarity is interesting, however, and suggests that a memory model, in which individuals must use past experience to forecast preferences, potentially provides one microfoundation to the preferences used by Laibson (1997).
^13 The evolutionary advantage of these two properties is easy to understand. Frequently encountered phenomena and memories similar to current circumstances are both more relevant. I have not formally pursued such intuitive notions to get at a more evolutionary or optimizing basis of memory. Such a model would require a precise understanding of the constraints on what memory mechanisms are even biologically feasible.
^14 Let me cite the two most interesting omissions. First, researchers now believe that certain memories are episodic (the time you tasted caviar), while others are semantic (you dislike caviar). This distinction is interesting because semantic memories may not possess all the episodes that gave rise to them. Second, memory seems to be reconstructive in nature (Neisser 1967). The process of reconstruction uses a priori theories to put together the pieces, so facts that deviate from these theories will more likely be forgotten. In a seminal experiment, Bartlett (1932) demonstrated how in recalling stories, subjects often edit out inconsistent parts. I ignore these for the time being, however, because they lack the mass of evidence that supports the other two assumptions and because they are analytically more vague.
The first term equals the baseline recall probability for all memories, $m$. The second term captures rehearsal. Events recalled in the last period get a "boost" of $\rho$. This formalism of rehearsal may seem awkward. Consider two events: $e_{t-2}$ which occurred two days ago and $e_{t-20}$ which occurred twenty days ago, and suppose neither is remembered yesterday. Then (holding the third term constant) both have the same recall probability. Recall appears to display sharp, rather than smooth, decay. This awkwardness is superficial. Proposition 6 demonstrates that in expectation, recall probabilities do exhibit exponential decay. Alternatively, I could build exponential decay directly into the dynamics of memorability, so that it occurs not only in expectation but on every realization and results would not change.
The third term captures associativeness where $a_{kt}$ measures the similarity of event $e_k$ to $e_t$. The events $e_k = (x_k, n_k)$ and $e_t = (x_t, n_t)$ are two points on a plane. Similarity can then be defined as a negative function of the distance between the points. Let $c : (-\infty, \infty) \to (0, 1)$ be a closeness function (that is, an inverse distance function). Then similarity is defined as:
$$a_{kt} = \tfrac{1}{2}\left( c(x_t - x_k) + c(n_t - n_k) \right)$$
with the assumption that $a_{kt} = 0$ if either $e_k$ or $e_t$ is a non-event. I will take the specific function $c(x) = e^{-x^2}$ which allows one to write: $a_{kt} = \tfrac{1}{2}\left[ e^{-(x_t - x_k)^2} + e^{-(n_t - n_k)^2} \right]$. Thus $0 < a_{kt} < 1$ and $a_{tt} = 1$.
Substituting back in to the original equation provides:
$$r_{kt} = m + \rho R_{k(t-1)} + \chi \tfrac{1}{2}\left( c(x_t - x_k) + c(n_t - n_k) \right) \qquad (5)$$
and recall that I assume that $m + \rho + \chi < 1$. It will also be useful to define the forgetting probability, $f_{kt} = 1 - r_{kt}$ and similarly $F_{kt} = 1 - R_{kt}$. Further define the constant $\bar{f} = 1 - m - \rho$, so that we can write
$$f_{kt} = \bar{f} + \rho F_{k(t-1)} - \chi \tfrac{1}{2}\left( c(x_t - x_k) + c(n_t - n_k) \right).$$
Finally, note that unlike the other basic assumptions of the model, the choice of functional form here is arbitrary. I could well have included an interaction term between associativeness and rehearsal or higher order terms. As another example, I might have allowed for limited capacity so that only a finite set of memories can be recalled at any time, which would generate crowd out. The intuition behind the results that follow does not rely on the functional form, though some of these extensions are clearly worth investigating.
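The recall technology in equations (4) and (5) is simple enough to simulate. The following is a minimal sketch, assuming the specific closeness function $c(x) = e^{-x^2}$ from the text; the function names, the zero-based period indexing, and the representation of a non-event as the pair `(0.0, 0.0)` are choices made for the example and are not taken from the paper.

```python
import math
import random

NON_EVENT = (0.0, 0.0)  # assumed encoding of "no event occurred"


def closeness(d):
    """c(x) = exp(-x^2), the closeness function used in the text."""
    return math.exp(-d * d)


def similarity(e_k, e_t):
    """a_kt: average closeness of the two components; 0 if either is a non-event."""
    if e_k == NON_EVENT or e_t == NON_EVENT:
        return 0.0
    return 0.5 * (closeness(e_t[0] - e_k[0]) + closeness(e_t[1] - e_k[1]))


def recall_prob(m, rho, chi, recalled_last, e_k, e_t):
    """Equation (4): baseline + rehearsal boost + associativeness."""
    return m + rho * recalled_last + chi * similarity(e_k, e_t)


def recall_path(events, k, m, rho, chi, rng):
    """Simulate R_kt for periods t = k+1, ..., len(events)-1 (zero-based).

    The event is recalled with certainty in the period right after it occurs,
    mirroring the convention R_{(t-1)t} = 1.
    """
    path, recalled = [], 1
    for t in range(k + 1, len(events)):
        if t > k + 1:
            r = recall_prob(m, rho, chi, recalled, events[k], events[t])
            recalled = 1 if rng.random() < r else 0
        path.append(recalled)
    return path


if __name__ == "__main__":
    events = [(1.0, 0.5), NON_EVENT, (0.9, 0.4), (-1.2, 0.1), (1.1, 0.6)]
    print(recall_path(events, 0, 0.2, 0.3, 0.4, random.Random(0)))
```

Each entry of the returned path corresponds to one coin flip in the box metaphor: a 1 means the box for event $k$ is seen as full that period, a 0 means it appears empty.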
3 Basic Results
3.1 Dynamics of Recall
Before moving to forecasts, it will be useful to see how recall works. Simple recursive substitution yields:
$$E[f_{kt} \mid e_k] = \left( \bar{f} - \chi E[a_{kt} \mid e_k] \right) \frac{1 - \rho^{t-k-1}}{1 - \rho}, \qquad \lim_{t \to \infty} E[f_{kt} \mid e_k] = \frac{\bar{f} - \chi E[a_{kt} \mid e_k]}{1 - \rho} \qquad (6)$$
for all $k < t$. Recall probabilities decay exponentially over time: further back memories have higher chances of being forgotten. Experimental evidence on recall probabilities indicates that exponential decay of memories fits the data rather well.^15 Also, $E[a_{kt} \mid e_k]$ increases memorability. This term, which I will define as vividness, $V(e_k)$, measures how strongly associativeness affects a memory. Memories that are very similar to a randomly drawn event will be more vivid. They are more likely to be triggered through associativeness and, therefore, more memorable.^16
^15 See Crovitz and Schiffman (1974). A power function, however, seems to fit better.
^16 This result, however, has the unfortunate property that outliers, very unusual events, have lower recall probability, contrary to one's intuition. One resolution to this problem may be found in allowing for the possibility that unusual events may receive greater attention, and that attention may increase memorability.
While vividness captures the strength of associations that an event possesses, it will also be useful to define the average information of these associated events. Define the evocativeness of event $e_t$ as:
$$\mathcal{E}(e_t) = E[x_k a_{kt} \mid e_t].$$
If today's event is $e_t$, then $a_{kt}$ measures the strength of its association with event (memory) $e_k$, while $x_k$ measures the information content of that past event. The expectation of this product, therefore, measures the average information content of memories brought forth by associativeness.
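Both objects defined in this subsection can be explored numerically. The sketch below does two things under stated assumptions: it iterates the recursion $E[f_{kt}] = \bar{f} + \rho E[f_{k(t-1)}] - \chi V$ from a starting value of zero one period after the event and compares it with the closed form in equation (6), and it estimates evocativeness $E[x_k a_{kt} \mid e_t]$ by Monte Carlo. The function names, the use of unit variances for $x$ and $n$, and treating vividness as a given number are all simplifications introduced for the example.

```python
import math
import random


def expected_forgetting(lag, m, rho, chi, vividness):
    """Iterate the expected forgetting recursion; lag = t - k >= 1."""
    fbar = 1.0 - m - rho
    f = 0.0  # one period after the event it is recalled for sure
    for _ in range(lag - 1):
        f = fbar + rho * f - chi * vividness
    return f


def closed_form_forgetting(lag, m, rho, chi, vividness):
    """Closed form: (fbar - chi V) (1 - rho^(lag-1)) / (1 - rho)."""
    fbar = 1.0 - m - rho
    return (fbar - chi * vividness) * (1.0 - rho ** (lag - 1)) / (1.0 - rho)


def evocativeness_mc(e_t, p, sigma_xn, draws=20000, seed=0):
    """Monte Carlo estimate of E[x_k a_kt | e_t].

    Assumes var(x) = var(n) = 1 and cov(x, n) = sigma_xn, with |sigma_xn| < 1.
    A past period contains an event only with probability p.
    """
    rng = random.Random(seed)
    resid = math.sqrt(1.0 - sigma_xn ** 2)
    total = 0.0
    for _ in range(draws):
        if rng.random() >= p:  # no past event: similarity is zero
            continue
        x = rng.gauss(0, 1)
        n = sigma_xn * x + resid * rng.gauss(0, 1)
        a = 0.5 * (math.exp(-(e_t[0] - x) ** 2) + math.exp(-(e_t[1] - n) ** 2))
        total += x * a
    return total / draws


if __name__ == "__main__":
    for lag in (1, 2, 5, 20):
        print(lag, expected_forgetting(lag, 0.2, 0.3, 0.4, 0.5),
              closed_form_forgetting(lag, 0.2, 0.3, 0.4, 0.5))
    for s in (-0.8, 0.0, 0.8):
        print(s, evocativeness_mc((1.0, 1.0), 1.0, s))
```

In this sketch the estimated evocativeness of the event $(1, 1)$ rises with the covariance between the informative and neutral components, since a positive neutral component then tends to pull in memories with positive information.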
To illustrate evocativeness, consider the event $e_t = (x_t, n_t) = (1, 1)$. The evocativeness of this event has two parts. First, $x_t = 1$ clearly implies positive evocativeness. Other $x_k$ close to 1 will be evoked, leading to an oversampling of positive memories and positive evocativeness. Second, $n_t = 1$ can have a positive or negative impact on evocativeness depending on $\sigma_{xn}$. Since $n_t = 1$, other events with positive $n_k$ are triggered, but the information content of these events depends on $\sigma_{xn}$. When $\sigma_{xn}$ is zero, knowing that $n_k > 0$ says nothing about $x_k$, so that the effect on evocativeness is zero. If $\sigma_{xn}$ is positive, knowing $n_k > 0$ tells us that $x_k > 0$: positive events are selectively triggered causing a positive effect on evocativeness. Finally, when $\sigma_{xn}$ is negative, $n_k > 0$ tells us that $x_k < 0$, generating a negative effect on evocativeness.
3.2 Limited Memory Expectations
We now turn to the forecasts of the forgetful individual. I will make a crucial assumption here: the forgetful individual applies the forecasting rule in equation 2 to the recalled history. In other words, she takes the recalled history as the true history. Let $\hat{y}_t^F(h_t^F, e_t)$ denote the mean and $\hat{\sigma}_t^{2F}(h_t^F, e_t)$ denote the variance of a (naive) forgetful Bayesian's posteriors. This assumption can then be stated formally as:
$$\hat{y}_t^F(h_t^F, e_t) = \hat{y}_t(h_t^F, e_t) \quad \text{and} \quad \hat{\sigma}_t^{2F}(h_t^F, e_t) = \hat{\sigma}_t^2(h_t^F, e_t).$$
I have referred to this as the naive decision maker, in contrast to the sophisticated decision maker, who completely knows the model of memory and corrects his forecast rule accordingly. Deciding between these two polar models will be an important task, and one that requires a complete characterization of behavior in both cases.
I have chosen to investigate the naive case first because experimental evidence suggests that it describes behavior at least in the laboratory. Studies of individuals' judgements of their own memories reveal inaccuracy in understanding their memory process (see, for example, Reder 1996). Similarly, experiments have manipulated the memorability of information and tested whether individuals' decisions correct for this manipulation. Supportive of the naivete assumption, decisions are insensitive to this manipulation.^17 Of course, repetition and room for learning may dramatically alter these findings. Nonetheless, the findings suggest that characterizing the naive decision maker would be a useful first step.^18
^17 See, for example, Trope (1978). This also resembles findings by Kahneman and Tversky (1982) on the availability heuristic, that individuals take more easily remembered events to also be more probable.
Simple substitution gives the formula
for the forgetful forecast:
t-i
=
yf(/if,e()
+ 5^
X,
[i?fc,A'-^Xfc
+
(1
-
A'-'=)Ayfc]
(8)
k=\
=
a/«(/if,e,)
ai
(9)
In words, forgetful forecasts look just like perfect recall forecasts except that forgotten events ($R_{kt} = 0$) are excluded. Note that $\hat{y}^F_t$ is a random variable. Taking expectations over this random variable implies that events are weighted by their recall probability.
In order to contrast the perfect recall and forgetful forecasts, it will be useful to define a memory error, that is the difference in forecasts caused by memory problems. First, let $err_t = y_t - \hat{y}_t$ and $err^F_t = y_t - \hat{y}^F_t$ be the forecast errors of the perfect recall and forgetful forecasts respectively. Now define:

$$err^M_t = \hat{y}_t - \hat{y}^F_t$$

to be the memory error of the forgetful individual. Note that $err^F_t = err_t + err^M_t$, so that the memory error also measures how memory distorts the forecast error. With these definitions in hand, I now examine the determinants of beliefs.
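The error definitions can be checked numerically. The following is a minimal Python sketch, assuming that the perfect recall forecast is the forgetful forecast rule with every recall indicator $R_{kt}$ set to one. The function names and the toy history are hypothetical and serve only as illustration.

```python
import random

def forecasts(x, dy, recalled, lam):
    """Return (perfect-recall forecast, forgetful forecast) at time t = len(x).

    x[k-1]        informative component x_k of event k, k = 1..t
    dy[k-1]       income change Delta y_k, k = 1..t-1
    recalled[k-1] recall indicator R_kt (1 = recalled, 0 = forgotten), k = 1..t-1
    lam           weight lambda on past events, 0 < lam < 1
    """
    t = len(x)
    perfect = forgetful = x[-1]  # the current event x_t is always used
    for k in range(1, t):
        w = lam ** (t - k)
        perfect += w * x[k - 1] + (1 - w) * dy[k - 1]
        forgetful += recalled[k - 1] * w * x[k - 1] + (1 - w) * dy[k - 1]
    return perfect, forgetful

def memory_error(x, recalled, lam):
    """err^M_t: sum over past k of lambda^(t-k) * (1 - R_kt) * x_k."""
    t = len(x)
    return sum(lam ** (t - k) * (1 - recalled[k - 1]) * x[k - 1]
               for k in range(1, t))

# Toy history (assumed values, for illustration only).
rng = random.Random(0)
t, lam = 6, 0.8
x = [rng.gauss(0, 1) for _ in range(t)]
dy = [rng.gauss(0, 1) for _ in range(t - 1)]
recalled = [rng.randint(0, 1) for _ in range(t - 1)]
y_t = rng.gauss(0, 1)  # realized income; its value does not affect the identity

y_hat, y_hat_f = forecasts(x, dy, recalled, lam)
err, err_f = y_t - y_hat, y_t - y_hat_f   # perfect-recall and forgetful errors
err_m = y_hat - y_hat_f                   # memory error
```

In this sketch the memory error depends only on the forgotten events, which is why the income changes drop out of the difference between the two forecasts.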
Proposition 1 The impact of event $e_t$ on time $t$ beliefs does not depend on its vividness, but does depend (positively) on its evocativeness. On the other hand, its impact on time $t+j$ beliefs depends on both vividness and evocativeness.
Proof: From Lemma 3

$$E[\hat{y}^F_t \mid e_t] = x_t + \chi\,\mathcal{E}(e_t)\,\lambda\,\frac{1 - \lambda^{t-1}}{1 - \lambda}$$
''Preliminary results suggest that even more nuanced results may arise in the sophisticated model. For example, suppose that forecasts are not remembered but that zero-one decisions which condition on forecasts are remembered with certainty. Then, a herding problem akin to Banerjee (1992) arises. Consider an individual who remembers choosing 1 several times but currently faces information that suggests 0 is the best choice. For certain histories and parameter values, the weight of having chosen 1 in the past ("I must have had some reason to do it") will dominate and he will choose 1 again. But this implies that the signal that he received this period will be jammed and he will be stuck in a herding equilibrium.
showing the dependence on evocativeness, and the absence of a vividness effect. The intuition here is simple. Evocativeness influences what memories are triggered and, therefore, has a direct effect on beliefs. Vividness only operates through increased memorability, which of course cannot have an impact on contemporaneous beliefs.
For the impact on future beliefs, Lemma 4 in the appendix gives the corresponding expression, where we see as before the dependence on evocativeness. This is because the memories triggered at time $t$ were rehearsed and, therefore, continue to have higher recall probability even at time $t + j$. Consistent with this, note that as $\rho \to 0$, the effect disappears.

We also see here that vividness now plays a role. As we saw in Proposition 6, vividness increases memorability. Thus it increases the marginal impact of $x_t$ by making it more likely to be recalled and used in forming beliefs. One implication is that when $x_t = 0$, changes in vividness have no impact: whether or not the event is recalled, it does not influence beliefs.
This proposition and its proof make several points that are worth reiterating. Vividness, how associated a memory is, plays no role in how an event influences beliefs at the time it occurs. It only matters as time passes and a chance to forget the event appears. By increasing memorability, vividness influences whether or not an event is remembered and thereby whether or not the information it conveys is used in the future.
Evocativeness, the average information content of memories associated with an event, does influence beliefs contemporaneously. An event with positive evocativeness, for example, disproportionately draws forth positive memories leading to a more positive forecast. Moreover, since these triggered memories persist (by rehearsal), evocativeness also influences future beliefs, though its effect diminishes over time (the $\rho^j$ exponent). Summarizing, the current model decomposes the intuitive notion of "salience" into two components: vividness, which captures increased memorability, and evocativeness, which captures the ability of events to trigger supporting evidence. Both affect an event's impact on beliefs but do so in different ways. An interesting implication is that even completely uninformative signals can affect beliefs.

Proposition 2 Let $e_t = (0, n_t)$ be an uninformative event but with non-zero neutral component ($n_t \neq 0$). This event influences beliefs if and only if $\sigma_{xn} \neq 0$. The sign of this influence equals $\mathrm{sign}(\sigma_{xn} n_t)$.
Proof: Appealing to Lemma 3, uninformative events can influence beliefs only if their evocativeness is non-zero. The evocativeness of an uninformative event equals

$$E[x_k a_{kt} \mid e_t = (0, n_t)] = \tfrac{1}{2} E[x_k c(0 - x_k) \mid e_t] + \tfrac{1}{2} E[x_k c(n_k - n_t) \mid e_t]$$

The first term is zero by symmetry of $c(\cdot)$ and the symmetry of the $x_k$ distribution. To evaluate the second term, apply the law of iterated expectations and condition on $n_t$ and $n_k$.

$$E[x_k c(n_k - n_t) \mid e_t] = E\big[ E[x_k \mid n_t, n_k]\, c(n_k - n_t) \mid e_t \big] = \sigma_{xn} E[n_k c(n_k - n_t) \mid e_t]$$

As Lemma 5 shows, this is non-zero whenever $n_t \neq 0$ and the sign of this term (and hence the event's evocativeness) is $\mathrm{sign}(\sigma_{xn} n_t)$, which establishes the first part.

The logic here is simple. Even though individuals disregard a signal with $x_t = 0$ as completely uninformative, their beliefs are still shaped by the memories these events trigger. The mediator in this process is $\sigma_{xn}$, which determines whether the neutral cue tends to appear with positive or negative information. For example, if $\sigma_{xn} > 0$, a positive neutral cue ($n_t > 0$) selectively evokes other $n_k > 0$ memories. These memories will (on average) have $x_k > 0$, and hence the event is selectively evoking positive information.

These results relate to experimental findings that salient information has a greater effect on beliefs. Two experiments highlight the differential effect of evocativeness and vividness. Thompson, Reyes and Bower (1979) place subjects into the role of jurors, who are asked to read defense and prosecution witness testimony about a drunk driving case. One side's case was manipulated to be salient while the other's was manipulated to be pallid.^^ After reading the two sides, subjects rate the guilt of the defendant and are asked to return the next day. When they return, they are asked to perform the rating again (they do not read the testimony again). Thompson et al. find that the salience manipulation has no effect on the first day's ratings. The lack of an immediate impact is comforting since it suggests that the salience manipulation did not also manipulate perceived informativeness. For example, we can rule out the possibility that subjects felt that a witness whose testimony contains more details was more reliable. The salience manipulation did, however, affect the second day's judgements of guilt: when the prosecution's (defense's) case was more salient, judgements of guilt rose (fell).

One interpretation of these results is that the presence of additional cues (guacamole on white carpet) facilitates recall by marshaling associativeness.^^ Vividness, as I have defined it, increases because these (irrelevant) cues (for example, spilling something on a carpet) are commonly encountered ones. The increased vividness of one side's case means that memories over-represent evidence supporting that side.
Hamill, Wilson, and Nisbett (1979) present another experiment, one that resembles evocativeness more than vividness. One set of subjects is presented with a description of a welfare recipient. As Nisbett and Ross (1980, p. 57) summarize: "The central figure was an obese, friendly, emotional, and irresponsible Puerto Rican woman who had been on welfare for many years. Middle-aged now, she had lived with a succession of 'husbands,' typically also unemployed, and had borne children by each of them. Her home was a nightmare of dirty and dilapidated plastic furniture bought on time at outrageous prices, filthy kitchen appliances, and cockroaches walking about in the daylight. Her children...attended school off and on and had begun to run afoul of the law in their early teens, with the older children now thoroughly enmeshed in a life of heroin, numbers-running and
'^The salience manipulation was performed through adding inconsequential details to one side's testimony. For
example, in describing the defendant about to leave a party and drive home, the pallid version states that he bumped
into a table, and knocked a bowl to the floor. The salient version, on the other hand, states that he knocked a bowl
of guacamole dip off a table and onto a white carpet.
""A weakness of the current model of the experiment should be pointed out. The guacamole on white carpet cue
is effective not because it associates with current events but because it associates with past events. In other words,
the model needs to allow not only for current events to form associations that facilitate recall, but also memories
themselves should form associations that further facilitate recall. I expand on this when I discuss future extensions
in Section 5.
welfare." Another group was given summary statistics on welfare recipients documenting the short median stay (two years) on welfare and the small proportion that are on welfare rolls for long periods of time (only 10% for longer than four years). These statistics contrasted sharply with the priors of control subjects. When the groups are asked to state their attitudes about welfare recipients, those receiving the story expressed far more unfavorable attitudes than a control group. Those receiving the statistics showed no difference.

Evocativeness provides one interpretation of these findings. The story that subjects read is overflowing with cues commonly found in evidence that paints welfare recipients in a poor light (drug use by children, immigrant, obese) whereas the pallid statistics lack such evocative cues. The story thereby triggers evidence from the past that also contains these cues, evidence that will generally be negative. It, therefore, prompts more negative attitudes towards welfare recipients. In this interpretation, it is not that the single case study is taken as informative. Queried, subjects should state that of course they recognize that one story (especially a manufactured one) proves nothing, but that it reminds them of other previously encountered evidence. Of course, the pallid statistics do not possess such cues and, therefore, have lower evocativeness.^^
Together, these experiments illustrate the contrast between vividness and evocativeness. The inessential cues in the testimony (e.g. guacamole on the carpet) will not (on average) trigger other memories that condemn or exonerate the defendant. They do, however, make the testimony more vivid, making it more likely to be remembered and influence beliefs in the future. On the other hand, the welfare mother story will selectively trigger memories. Its evocativeness means that it will have greater contemporaneous impact.
^^That they have no effect, however, indicates either that individuals do not put much faith in the statistics (numbers can be manipulated) or that other factors are at play there. It is also worth noting that an implication of this interpretation is that the effect of the manipulation (seeing the story) should disappear over time. If subjects were brought in at later dates, the difference between treatment and control should diminish and eventually vanish.
3.3 Over-reaction and Under-reaction
The previous propositions illustrate how information content alone does not determine an event's impact; the memories it triggers also matter. But these propositions do not tell us about how forecast errors will be biased. Associativeness implies that events trigger memories that convey similar information. Such an effect causes an over-reaction to news: today's events cause similar evidence to be over-represented in memory. The following proposition formalizes this idea.

Proposition 3 Forecast errors are negatively correlated with the information in the latest event:

$$Cov(y_t - \hat{y}^F_t, x_t) = Cov(err^F_t, x_t) < 0$$

The extent of this over-reaction increases with $\chi$ and $\lambda$:

$$\frac{\partial Cov(err^F_t, x_t)}{\partial \chi} < 0 \quad \text{and} \quad \frac{\partial Cov(err^F_t, x_t)}{\partial \lambda} < 0.$$

Proof: Note that $err^F_t = err_t + err^M_t$ and that $err_t$ is independent of $x_t$. Therefore, $Cov(err^F_t, x_t) = Cov(err^M_t, x_t)$. Calculating this:

$$E[err^M_t x_t] = \sum_{k=1}^{t-1} \lambda^{t-k} E[f_{kt} x_k x_t]$$

Using the fact that $x_k$ and $x_t$ are independent, we can write the summand as: $-\chi E[x_k x_t a_{kt}]$. Intuitively, $E[x_k x_t a_{kt}]$ is positive because $a_{kt}$ measures similarity. See Lemma 6. This implies that the overall covariance is negative. To get the comparative statics, let's complete the calculation:

$$E[err^M_t x_t] = -\chi E[x_k x_t a_{kt}](\lambda + \lambda^2 + \cdots + \lambda^{t-1}) = -\chi E[x_k x_t a_{kt}]\,\frac{\lambda(1 - \lambda^{t-1})}{1 - \lambda}$$

Partial differentiation shows that this decreases with $\chi$ and $\lambda$. The effect of $\lambda$ is interesting. It happens because when $\lambda$ is large, the selective sampling of past memories becomes more important, since these memories enter with greater weight into the forecast rule.
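The sign of this covariance can be illustrated by simulation. The sketch below is not the paper's calibration: it assumes standard normal $x_k$, a hypothetical similarity function $a_{kt} = \exp(-(x_k - x_t)^2/2)$, and a forgetting probability of $f - \chi a_{kt}$, all chosen only so that similar events are less likely to be forgotten.

```python
import math
import random

def simulate_cov(f=0.5, chi=0.3, lam=0.8, n_past=5, n_draws=20000, seed=1):
    """Monte Carlo estimate of Cov(err^M_t, x_t) under assumed primitives.

    Assumptions (not from the paper): x_k ~ N(0, 1) i.i.d., similarity
    a_kt = exp(-(x_k - x_t)^2 / 2), forgetting probability f - chi * a_kt.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x_t = rng.gauss(0, 1)
        err_m = 0.0
        for age in range(1, n_past + 1):  # age = t - k
            x_k = rng.gauss(0, 1)
            a_kt = math.exp(-0.5 * (x_k - x_t) ** 2)
            forgotten = rng.random() < f - chi * a_kt
            if forgotten:
                err_m += lam ** age * x_k  # forgotten events enter err^M
        total += err_m * x_t
    # x_t has mean zero, so the mean product estimates the covariance.
    return total / n_draws

cov_with_assoc = simulate_cov(chi=0.3)   # associativeness switched on
cov_without = simulate_cov(chi=0.0)      # no associativeness
```

With associativeness switched off the estimate should hover around zero, while a positive $\chi$ pushes it below zero, in line with the proposition.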
Intuitively, good information may lead to a rosier view of the past, which leads to forecasts that are too large, which leads to a negative forecast error.^^ Over-reaction increases as $\chi$ rises because $\chi$ quantifies the importance of associativeness. Finally, the effect of $\lambda$ arises because it measures the importance of history and thereby the importance of selective recall. This is an extremely important point, which we will return to in Section 3.4.

^^Evidence on over-reaction can be found in studies of the representativeness heuristic by Kahneman and Tversky (1972, 1973), Tversky and Kahneman (1971) and Grether (1980). These studies find that in forming assessments individuals place too little weight on base rate evidence and too much weight on the latest piece of information. A similar phenomenon arises in the form of perceptions of a "hot hand": individuals seeing a streak expect it to continue. While this model does not provide compelling evidence of all the actual experimental evidence (in many of these, the relevant information is directly available and memory plays no role), it generates behavior in real settings that resembles the findings.

The previous proposition paints a picture of individuals over-reacting to information. Rehearsal, however, generates under-reaction. To see this, consider an individual who faces an uninformative event $e_t = (0, n_t)$ at time $t$. Suppose that this event evokes positive memories so that $\mathcal{E}(e_t) > 0$. Proposition 2 illustrates how beliefs over-react to this non-information: the positive memories it triggers result in forecasts that are too large. Since these memories are rehearsed, they will experience higher recall probabilities in future periods, meaning that forecasts will continue to be too large. As time goes on, they will decay towards the true value as the effect of the rehearsal on recall probabilities diminishes.

To an outsider, the belief changes in later periods will seem as if they were under-reaction. At both times $t + j$ and $t + j + 1$, she will see a downward adjustment, as the memories decay in each of those periods. The observer, therefore, sees a negative change followed by another predictable negative change, an apparent under-reaction to the first change. Formally, note from Lemma 4 that the impact of $e_t$ on time $t + j$ beliefs is proportional to $(\lambda\rho)^j$. Notice that if $\rho$ were zero, this term would be zero, emphasizing the role of rehearsal. If we difference this over time, we find:

$$E[\Delta \hat{y}^F_{t+j+1} \mid e_t = (0, n_t)] = \chi\,\frac{\lambda(1 - \lambda^{t-1})}{1 - \lambda}\,\mathcal{E}(e_t)(\lambda\rho)^j(\lambda\rho - 1)$$

which illustrates the negative "drift" in beliefs that follows an over-reaction. In other words, future belief revisions are negatively proportional to the initial evocativeness. Beliefs will, therefore, appear to drift towards some equilibrium. The intuition behind this finding is that there is more
information in her forecast errors than the individual realizes:

$$err^F_t = err_t + err^M_t$$

As with the perfect recall individual, the forecast error tells the forgetful individual that some change has occurred in the permanent component ($err_t$). But, it also tells the individual the way in which their memory is systematically biased ($err^M_t$). If she is positively surprised, the forgetful individual should infer both that there probably has been a positive shock and that she is systematically under-sampling positive memories. I discuss this further in Section 3.4.

Slow adjustment arises even more intuitively in a slightly modified version of the model. Suppose that before observing the true event $e_t$, there is a period where the individual observes a noisy event $e'_t$ (perhaps a rumor). Abstracting away from $n_t$ for now, suppose that $x'_t$ equals $x_t$ plus noise. An example might be the announcement of a government statistic followed by a revision. In this setup, once $x_t$ is revealed the individual should pay no attention to $x'_t$. But rehearsal combined with associativeness will imply that beliefs will still depend partly on $x'_t$ even after $x_t$ is revealed. Why? Because, even though the individual discards the information contained in $x'_t$, the set of memories it evoked have been rehearsed and they continue to have an impact in later periods.^^
^^This result bears a little resemblance to the findings on belief perseverance (e.g. Ross, Lepper and Hubbard, 1975), the curse of knowledge (e.g. Camerer, Loewenstein and Weber, 1990) and on hindsight bias (Fischoff, 1982). Experiments in all three of these categories emphasize the inability of subjects to "undo" information. See Mullainathan (1998) for a discussion.

Finally, the following proposition shows that forecast errors can be positively auto-correlated.

Proposition 4 Let $T > t$. When events are very memorable ($f$ low, $\chi$ and $\rho$ large), then

$$Cov[err^F_t, err^F_T] > 0$$

Proof: (Sketch) I will present a proof for the case where $t \to \infty$ to abstract from details. The proof for the finite $t$ case is exactly the same but with more constants (that tend to zero as $t$ gets large) involved. The general strategy of the proof is as follows: (1) Use the fact that $err^M_{t+1}$ can be written as a function of $err^M_t$ plus some terms; (2) Substitute into $E[err^M_{t+1} err^M_t]$ to get $E[err^M_t err^M_t]$ plus some terms that resemble $E[x_k err^M_t]$; (3) these generate opposing signs so that the variance term tends to dominate whenever the probability of forgetting is small.

For the first step in the proof, see Lemma 7 which shows that:
t-i
err^i
^ XkH -
= pAerr™ +
xak{t+i))
k=l
Substitution into E[err^ierr^] gives (step
pWarierrD +
2):
^ X'+'-'E[x,{1 -
xak(t+i))errr]
k=l
Substitution for the latter gives:
t-i (-1
pXVarierrD +
The
E
XE[{1 - xat(t+i))^terr^]
EE
X^'^'-'^-^ E[xkX,{l
-
xafc(t+i))/j*]
third term can be written as:
X''+''''E[xl{l-xak(t+i))fkt] +
E E >^''^'~'~'p'''E[xkXjil-xam+i)){l-Xajk)]
fc=lj=l
k=l
where since x^ and Xj are independent
E
+
X'^+'-^'^Elxlil
-
xak(t+v)fkt]
-
this
can be written
EE
as:
X''+'--'-^p'-'E[x,x,{l
"
Xakit+i))xajk)]
k=\j=l
fc=l
Putting these terms together gives:
t-\
pXVarierrD
+
E
^'*"^'"''^[^^(/
"
Xafc(.+ i))/a]
k=l
+
-
XE[{l-xatit+i))xterr^]
E E X''+'-''^p'-'E[xkXj(l -
xakit+i))xajk)]
fc=ij=i
Now the first and second terms are clearly positive, whereas the third and fourth terms are clearly negative. The key insight is that the negative terms tend to zero as memorability gets large (as $f \to 0$) since these predicate on having forgotten $x_t$ or $x_k$. Therefore, when memorability is sufficiently high, the overall expression is positive.
Positive covariance can be understood as overlapping samples. Forgetting is analogous to sampling events from history. Since the samples at times $t$ and $T$ draw from overlapping histories, correlations arise. Moreover, rehearsal implies that memories that were forgotten will be forgotten again, increasing the autocorrelation in forecast errors. The condition that $f$ must be sufficiently low occurs for the following reason. Suppose that $f$ is very large. Then, the $x_t$ from the past will likely be forgotten and hence $x_t$ shows up with large weight in $err^M_T$. We know from Proposition 3 that $x_t$ is negatively correlated to $err^F_t$. This implies a negative auto-correlation.
The results so far illustrate two conflicting forces: over- and under-reaction. One advantage of a model such as this is that it allows us to trade off such effects and figure out when we expect one to arise over the other. The following proposition quantifies when belief changes are negatively (over-reaction) and when they are positively correlated (under-reaction) to lagged information.

Proposition 5 Suppose that events are easily forgotten, so that $f$ is high and $\rho$ and $\chi$ are low. Then:

$$Cov[\Delta \hat{y}^F_{t+1}, \Delta y_{t-1}] > 0 \qquad (10)$$

When events are highly memorable, however:

$$Cov[\Delta \hat{y}^F_{t+1}, \Delta y_{t-1}] < 0 \qquad (11)$$

When the covariance is negative, then a change in $\lambda$ makes it more negative:

$$\frac{\partial Cov[\Delta \hat{y}^F_{t+1}, \Delta y_{t-1}]}{\partial \lambda} < 0 \qquad (12)$$

Proof: Now, $\Delta \hat{y}^F_{t+1} = \Delta \hat{y}_{t+1} - err^M_{t+1} + err^M_t$ and $\Delta \hat{y}_{t+1}$ is independent of all lagged information. Therefore, the covariance equals:

$$E[err^M_t \Delta y_{t-1}] - E[err^M_{t+1} \Delta y_{t-1}]$$

From Lemma 7, we can write $err^M_{t+1}$ in terms of $err^M_t$. Substituting for this gives:
t
(1
-
\p)E\x,.,err^\
- J^
\'-''^'E[{f_
k=\
-
xak(i+i))xkXt-i\
Reapplying
Lemma
7 to
err^
gives:
k=l
fc=l
Note that x^ and Xt^i are independent
(1
-
Define
+
Xp)XpE[xt_,errr.,]
C
to be E[xf_i(l
-
(1
summations
in the
- Ap)A£[xti(/ -
xa{t-i)t)]
for
xa(t-i)t)]
which also equals
-
A;
7^ i
—
1,
A2i;[x2_i(/
E[x^_-^{f^
-
leaving:
-
xait-i)t)]
This
xa(<-i)((+i))]-
gives:
(1
Substituting for the
-
first
+ CA(1 -
\p)XpE[xt-ierrYLi]
A(l
+ p))
part from the proof of Proposition 3 gives:
~^ ^^\~_\
^
E[aktXkXt]
+ CA(1 -
A(l
+
(13)
p))
Suppose events are very memorable, so that the forgetting probability, $f$, is low and $\chi$ and $\rho$ are high. Then

$$\lambda(1 - \lambda(1 + \rho))C = \lambda(1 - \lambda(1 + \rho))E[x^2_{t-1}(f - \chi a_{(t-1)t})]$$

is small or even negative. The first term is already negative, so that in this case the correlation is negative. Suppose, on the other hand, that events are easy to forget so that $f$ is high and $\chi$ and $\rho$ are low. Then, the first term tends to zero, while the second term is positive, implying a positive correlation.
Differentiating with respect to A gives:
1
-
A*-i
-E[aktXkXt]
1which
1
is
the
— A(H-p)
same
as equation (13) except
has been replaced by
second only makes
Consequently
if
it
is
(i) it
has been divided through by A and
1— 2A(l+p). The
more negative since
equation (13)
+ C{l-2X{l+p))
C
is
first
positive
has no result on the sign and the
and
l-A(l+p) >
negative the derivative with respect to A
(ii)
is
1
— 2A(l+p).
also negative.
The intuition behind this proposition is that there are two effects that govern belief dynamics: forgetting and over-reaction. On the one hand, yesterday's information may have been forgotten, which means that the individual must learn it again. This induces a positive correlation between beliefs and lagged information. On the other, over-reaction means that the individual responded too much to $x_t$ when it occurred, meaning that beliefs must correct for this. This induces a negative correlation. When events are on average quite memorable, the over-reaction effect dominates. When events are readily forgotten, the individual must relearn old information. The dependency on $\lambda$ reflects the discussion in Section 3.4. A greater emphasis on history implies over-reaction is larger and takes a longer time to undo.^^

3.4 The Role of History

Recall that $\lambda$ measures the weight put on past values in the forecast. In this section I will examine how $\lambda$ mediates over-reaction and slow learning. I will argue that $\lambda$ can be measured easily and therefore the empirical tests involving $\lambda$ can actually be implemented.
Let's begin by considering the impact of forgetting an event. The memory error equals:

$$err^M_t = \hat{y}_t - \hat{y}^F_t = \sum_{k=1}^{t-1} \lambda^{t-k}(1 - R_{kt})x_k = \sum_{k=1}^{t-1} \lambda^{t-k} F_{kt} x_k$$

If $F_{kt} = 1$, so that event $e_k$ is forgotten, the memory error would go up by $\lambda^{t-k}x_k$. This shows that the impact of forgetting an event declines as time passes ($t - k$ gets large). Moreover, the rate of decline depends on $\lambda$. The larger is $\lambda$, the larger is the effect of forgetting an event in the distant past.

Why does this happen? Events provide perfect signals of the permanent shocks, so forgetting them means that this perfect signal is lost. In the absence of this signal, the individual learns about the event through observations of $y_t$ instead. Since $y_t$ is noisy, however, this learning is slow. As time proceeds, the information lost due to forgetting is slowly re-learned through $y_t$. The more distant the memory, the more time there has been to learn about the permanent shock. This establishes why there will be slow learning. This learning occurs at rate $\lambda$ because $\lambda$ measures the noise-signal ratio in $y_t$. When it is large, $y$ is a very noisy signal of permanent income and forgotten events are learned about very slowly. To summarize, $\lambda$ captures how quickly a forgotten event can be relearned through $y$, and hence how quickly errors in memory are corrected.

^^Given both these over- and under-reaction biases, it is natural to ask whether an individual might not be better off by simply ignoring their memory completely. Mullainathan (1998) shows that ignoring memory may reduce bias but almost surely raises variance (since information is lost).

Let's now return to rederiving how beliefs respond to an event $e_t$:
$$E[\hat{y}^F_t \mid e_t] = x_t - \sum_{k=1}^{t-1} \lambda^{t-k} E[x_k f_{kt} \mid e_t] = x_t + \chi \sum_{k=1}^{t-1} \lambda^{t-k} E[x_k a_{kt} \mid e_t] = x_t + \chi\,\mathcal{E}(e_t)(\lambda + \lambda^2 + \lambda^3 + \ldots + \lambda^{t-1})$$

To get the second equation, we exploit the fact that $x_k$ is independent of $x_j$, as is the component of $f_{kt}$ unrelated to $a_{kt}$. The third equation comes from the definition of $\mathcal{E}(e_t)$. To interpret this equation, notice that at time $k$, associativeness results in a selective sampling that has effect equal to the evocativeness, $\mathcal{E}(e_t)$. But as we've seen the impact of recall mistakes on beliefs depends on $\lambda$. In the formula, we see that selectively recalling the events at time $k$ has impact $\lambda^{t-k}\mathcal{E}(e_t)$. Taking $t \to \infty$ for simplicity, the impact of selective recall is:

$$\chi\,\mathcal{E}(e_t)(\lambda + \lambda^2 + \lambda^3 + \ldots) = \chi\,\mathcal{E}(e_t)\,\frac{\lambda}{1 - \lambda}$$

Therefore, as $\lambda$ increases, the importance of selective recall increases. Intuitively, when $\lambda$ is large, the triggering of certain types of memories over others has bigger impacts because the past matters more.
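The dependence of selective recall on $\lambda$ is easy to tabulate. The following Python sketch evaluates the geometric sum for a finite history and its large-$t$ limit; the function name `selective_recall_impact` and the parameter values are hypothetical and chosen only for illustration.

```python
def selective_recall_impact(chi, evoc, lam, t=None):
    """Impact of selective recall on the time-t forecast.

    Finite t: chi * evoc * (lam + lam^2 + ... + lam^(t-1)).
    t=None:   the limit chi * evoc * lam / (1 - lam) as t grows large.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must lie in [0, 1)")
    if t is None:
        return chi * evoc * lam / (1 - lam)
    return chi * evoc * sum(lam ** j for j in range(1, t))

# Illustrative values only: the impact grows quickly as lam approaches one.
for lam in (0.2, 0.5, 0.8):
    print(lam, selective_recall_impact(chi=0.3, evoc=1.0, lam=lam))
```

Because the limit is $\lambda/(1-\lambda)$ times $\chi\,\mathcal{E}(e_t)$, the impact is increasing and convex in $\lambda$, which is the sense in which a greater weight on history magnifies selective recall.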
We, therefore, see two basic properties of $\lambda$. It measures both the extent of over-reaction and how slowly individuals adjust their memory mistakes. These two observations are especially interesting since $\lambda$ can be measured in standard data sets.^^

^^Carroll and Samwick (1995) present a technique that allows us to estimate $\lambda$. Notice that the variance of the $d$-period income difference, $Var(\Delta^d y_t)$, equals $d$ times the variance of the permanent shock plus twice the variance of the transitory shock. By computing the variance of $\Delta^d y_t$ in the data for many different $d$ and regressing them on $d$, one can back out the variances.
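The regression idea in this footnote can be sketched on simulated data. The Python example below is not Carroll and Samwick's procedure in detail: it assumes income is a random walk in the permanent component plus i.i.d. transitory noise, and all function names and parameter values are hypothetical.

```python
import random

def simulate_income(n, perm_sd, trans_sd, seed=0):
    """Income = cumulative permanent shocks + i.i.d. transitory noise (assumed)."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, perm_sd)
        path.append(level + rng.gauss(0, trans_sd))
    return path

def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def back_out_variances(y, max_d):
    """Regress Var(y_{t+d} - y_t) on d by least squares.

    Under the assumed process the slope estimates the permanent shock
    variance and half the intercept estimates the transitory variance.
    """
    ds = list(range(1, max_d + 1))
    vs = [variance([y[i + d] - y[i] for i in range(len(y) - d)]) for d in ds]
    d_bar, v_bar = sum(ds) / len(ds), sum(vs) / len(vs)
    slope = (sum((d - d_bar) * (v - v_bar) for d, v in zip(ds, vs))
             / sum((d - d_bar) ** 2 for d in ds))
    intercept = v_bar - slope * d_bar
    return slope, intercept / 2

y = simulate_income(100000, perm_sd=1.0, trans_sd=1.0)
perm_var, trans_var = back_out_variances(y, max_d=6)
```

The two estimates should land near the variances used to simulate the data; how they map into $\lambda$ depends on the forecasting rule in the text and is not computed here.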
4 Application to Consumption
Having developed the general results, I now apply the model to the consumption decision of individuals.^^ Let $i$ index individuals, and $y_{it}$ represent income, $c_{it}$ denote consumption and $u(c)$ be the instantaneous utility function taken to be the same across individuals. Suppose the individual maximizes discounted (subjective) expected utility, where the discount rate is $\delta$. Assume that she faces no borrowing or savings constraints and can borrow or save risklessly at a rate $r$, and that $\delta = \frac{1}{1+r}$. Further, impose a no-Ponzi game condition so that there is no infinite borrowing. Under these conditions, marginal utility of consumption will be equated: $u'(c_t) = u'(c_T)$ for all $t$, $T$. Taking a quadratic or log-utility function implies that consumption will be equalized across time. In the current model, this allows us to write consumption as:
$$c_{it} = \frac{r}{1+r} A_{it} + \hat{y}^F_{it}$$

where $A_{i0} = 0$ and $A_{i(t+1)} = (1 + r)(A_{it} + y_{it} - c_{it})$ is the assets. Differencing across time gives:

$$\Delta c_{it} = \hat{y}^F_{it} - \hat{y}^F_{i(t-1)} + r\,(y_{i(t-1)} - \hat{y}^F_{i(t-1)}) = \Delta \hat{y}^F_{it} + r\,err^F_{i(t-1)}$$

In other words, the change in consumption equals the change in income expectations plus a term proportional to the time $t - 1$ forecast error. This is intuitive since permanent income considerations completely determine consumption in this model.

Now, suppose that the income process is the sum of two components: one specific to the individual and an aggregate component. Letting $y_t$ be the aggregate component, and $y^o_{it}$ be the individual specific one, we write $y_{it} = y^o_{it} + \alpha_i y_t$, where $\alpha_i$ measures how much the aggregate shock influences the individual. Both the aggregate and individual income components follow processes described so far and the individual income components are iid across people.^^ Events are observed for both processes. Let $c_t$ be aggregate consumption.
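The differencing step for consumption can be verified numerically. The sketch below assumes the consumption rule $c_{it} = \frac{r}{1+r}A_{it} + \hat{y}^F_{it}$ and the asset recursion $A_{i(t+1)} = (1+r)(A_{it} + y_{it} - c_{it})$ with $A_{i0} = 0$; the function name and the randomly drawn income and forecast paths are hypothetical.

```python
import random

def consumption_path(y, y_hat_f, r):
    """Consumption implied by the assumed rule c_t = r/(1+r) * A_t + forecast_t.

    Assets start at zero and evolve as A_{t+1} = (1 + r) * (A_t + y_t - c_t).
    """
    assets = 0.0
    path = []
    for y_t, forecast_t in zip(y, y_hat_f):
        c_t = r / (1 + r) * assets + forecast_t
        path.append(c_t)
        assets = (1 + r) * (assets + y_t - c_t)
    return path

rng = random.Random(0)
r = 0.05
y = [rng.gauss(10, 1) for _ in range(8)]        # income realizations
y_hat_f = [rng.gauss(10, 1) for _ in range(8)]  # arbitrary forecasts
c = consumption_path(y, y_hat_f, r)

# Change in consumption versus change in forecast plus r times lagged error.
gaps = [
    (c[t] - c[t - 1])
    - ((y_hat_f[t] - y_hat_f[t - 1]) + r * (y[t - 1] - y_hat_f[t - 1]))
    for t in range(1, len(c))
]
```

Under these assumptions every entry of `gaps` is zero up to rounding, whatever the forecasts are, so the identity is an accounting result rather than a property of the memory model.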
^^Mullainathan (1998) presents an application to asset pricing as well.
^^There is a slight oddness in the results here. Income is normally distributed meaning that it might well be
negative. Using a log-normal distribution would generate all the results here but with added technical complications.
The goal here is simply to illustrate the kinds of results that arise rather than to flesh out a structural model.
In this simple Permanent Income setup, consumption changes should be unpredictable. Since they essentially represent belief changes, one should not be able to predict them on the basis of lagged information available to consumers. In contrast, the errors of the forgetful forecaster lead to consumption predictability, and the pattern of this predictability can be pinned down under certain conditions.
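Prediction 1 Suppose:

1. Personal events are highly memorable and aggregate events are not very memorable; and
2. $\alpha_i$ is small

then at the micro level:

$$Cov(\Delta c_{i(t+k)}, \Delta y_{it}) < 0, \qquad \frac{\partial Cov(\Delta c_{i(t+k)}, \Delta y_{it})}{\partial \lambda} < 0, \qquad \frac{\partial Cov(\Delta c_{i(t+k)}, \Delta y_{it})}{\partial \alpha_i} > 0$$

while at the aggregate level:

$$Cov(\Delta c_{t+k}, \Delta y_t) > 0$$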
To
see
how
this prediction works, note that:
=
Cov{Ac^t+k),^yrt)
Taking the
>
first
term,
we can break
E[Ayf^^^^)Ayit]
into the
it
+ E[err^t^f^_,)Ay,t]
components due to the aggregate shock and the
parts due to the idiosyncratic component:
E[Ayll^^,^Ayu]
where because of independence,
5,
we know
that the
first
I
=
expression
Therefore,
if
term here
a,
is
+
a^E[A§^^,^,^Ay°]
have dropped terms such as E[Ay^^_^_|^^Ay^]. Applying Proposition
is
memorable), and that the second term
forgotten).
£[AyO«^,)AyO]
negative (we have assumed that personal events are very
is
positive (we have
assume that aggregate events are
small, the whole expression
is
negative.
is:
£;[errO- ,_i)AyO]
+ a^E[efr^^^^,_,^Am]
28
The second term
easily
in the
where errif^
the
is
memory
error for the idiosyncratic
error for the aggregate component.
negative
first
when
events are
term here
is
Just as in the proof of Proposition
memorable and positive when events are easy
negative and the second term
negative sign for the sum. Putting this
respect to A^
come
income component and efr"^
clearly
all
is
5,
5,
memory
these correlations are
to forget.
Therefore, the
positive with the smallness of Oi generating a
together gives Cov{/Sc,(t+k)^^yit)
from Proposition
the
is
<
0-
The
partial with
whereas the partial with respect to o, comes from
the fact that the aggregate contribution to the covariance
is
positive.
Suppose now that we aggregate up consumption and income. Since the idiosyncratic components of income and its forecasts are iid across people, aggregation produces zero for these. This gives the aggregate covariance, where a is the average of the individual a terms. Reapplying Proposition 5 as before tells us that this is positive. This establishes the aggregate results.

Intuitively, the over-reaction term dominates for the idiosyncratic components of income since these are memorable. The dominant effect is that individuals over-react to their private information. Their boss calls them in, tells them that they have a bright future, and this causes them to selectively recall other information that makes them think they have high ability, and hence, high permanent income. As one aggregates up, the idiosyncratic over-reactions cancel out. Macro-covariances, therefore, depend on recall of the aggregate component. Because aggregate information will be forgotten, there is under-reaction to it. This leads to a positive covariance at the aggregate level. At the micro level, the smallness of $\sigma_a^2$ guarantees that the reaction to the aggregate information does not matter.

The first assumption of differential memorability can be justified only by appeal to intuition (or perhaps through surveys): personal events may hold more memorability for consumers because they deal with many more everyday events than aggregate events. The second assumption receives some support in the data, as Pischke (1995) and others have argued that the aggregate component of individual income is small.
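The cancellation of idiosyncratic components under aggregation can be illustrated with a small simulation. The sketch below is not part of the model: the function name `aggregate_income_change`, the normally distributed shocks, and the parameter values are all assumptions chosen for illustration. It shows only that the average of iid idiosyncratic shocks shrinks toward zero as the number of people grows, so the average income change approaches the common aggregate shock.

```python
import random
import statistics


def aggregate_income_change(n_people, sigma_idio=1.0, sigma_agg=0.1, seed=0):
    """Return (average income change, common aggregate shock) for one period.

    Hypothetical setup: each person's income change is a shared aggregate
    shock plus an idiosyncratic shock that is iid across people.
    """
    rng = random.Random(seed)
    agg_shock = rng.gauss(0.0, sigma_agg)  # shared by everyone
    idio = [rng.gauss(0.0, sigma_idio) for _ in range(n_people)]  # iid
    changes = [agg_shock + e for e in idio]
    return statistics.fmean(changes), agg_shock


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        avg, agg = aggregate_income_change(n)
        # The gap is the average idiosyncratic shock, which shrinks with n.
        print(n, abs(avg - agg))
```

The aggregate shock is deliberately small relative to the idiosyncratic one, mirroring the assumption that the aggregate component of individual income is small.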
At the micro level, the first part of this prediction resembles "rule of thumb" consumers, ones who consume more of their income than permanent income considerations would justify. The prediction has generally (though not always) found support in the literature (Hall and Mishkin 1982, Hayashi 1985a, 1985b, Jappelli and Pagano 1988, and Mariger and Shaw 1990). The second and third predictions, however, have not been tested as far as I know. Finally, the macro prediction has received support, as seen in Campbell and Mankiw (1989). Deaton (1992) summarizes this evidence.
Now, suppose that we go back to a single individual, set $\sigma_a^2 = 0$, and allow for several different income streams. The marginal propensity to consume out of these income streams will depend on the extent of that stream's evocativeness. Note, from Proposition 1, that the stronger the recruitment effect the larger the forecast error and hence the stronger the mean reversion. Define $y_{st}$ to be income stream $s$ and $MPC_s$ to be the marginal propensity to consume out of stream $s$. Then:

Prediction 2 In general $MPC_s \neq MPC_{s'}$. Moreover,
$$MPC_s > MPC_{s'} \Rightarrow Cov(\Delta c_t, \Delta y_{s(t-1)}) < Cov(\Delta c_t, \Delta y_{s'(t-1)})$$

To see how this works note that:
$$MPC_s = Cov(\Delta c_t, \Delta y_{st}) = E[\Delta \hat{y}^m_{st} \Delta y_{st}] + E[err^m_{s(t-1)} \Delta y_{st}]$$
Since $err^m_{s(t-1)}$ is independent of $\Delta y_{st}$ we can drop the second term. This leaves us with the first term, which we can write as:
$$E[\Delta y_{st} \Delta y_{st}] - E[err^m_{st} \Delta y_{st}]$$
The first term here is the appropriate MPC in the absence of any memory mistakes. The second term represents the distortion, $E[err^m_{st} \Delta y_{st}]$, which is proportional to $\chi E[\mathcal{L}(e)x]$. This will in general be different for different income streams, especially since $E[\mathcal{L}(e)x]$ will vary. In other words, streams that have high evocativeness, where information about earnings in that stream relies heavily on soft information that has many cues, will have larger MPCs. The implication for greater negative lagged correlation comes directly from the discussion to date. The greater $E[\mathcal{L}(e)x]$, the greater the over-reaction and hence the greater the correlation to lagged income changes.
Intuitively, the prediction follows because changes in different income streams invoke different "visceral" reactions. Empirically, differences in MPC have received some support (Thaler, 1990). Serious empirical difficulties arise, however. Empirical differences in MPCs may represent true differences in propensities to consume permanent income. Alternatively, they may represent differences in the informativeness of income changes. Yet another possibility is that they may represent differences in information between the econometrician and the individual due either to measurement error or private information. This makes testing such predictions heavily reliant on structural assumptions about the income process. On the other hand, the relationship between MPC and excess sensitivity has not been tested as far as I know, and the empirical difficulties here may be less severe.
5 Conclusion

To summarize, this paper has built a simple model of memory limitations. The model has been based on two basic facts drawn from scientific research on the topic: rehearsal and association. Interestingly, these two facts in combination generate several of the experimentally found biases in decision making under uncertainty. This suggests that memory limitations might be an important component for realistic models attempting a unified treatment of bounded rationality. The model also generates relevant predictions in the economic applications we have examined: consumption and asset pricing. We have also seen how previously untested predictions arise. Testing these predictions will provide a way of refuting this model. Many other applications are possible that have not been pursued here: advertising, subjective performance evaluation (where assessments of an individual may depend on intangible aspects of past performance), and bargaining situations (where opponents may disagree on the past) are a few of the examples. Each of these has its own subtleties.
Let me conclude by outlining two directions of future work. First, this paper has focused on the naive case. What does behavior in the sophisticated case look like? I have already given a flavor of the kinds of results that might arise in footnote 18. As pointed out there, the deviations from full rationality become no less interesting. Another point to be made here is that in the case of outsiders manipulating memory limitations, even if the mean effect is "taken out" due to sophistication, the possibility for manipulation can still have real effects. For example, if firms attempt to use advertising to manipulate memories but individuals attempt to undo it, the Nash Equilibrium can result in positive levels of advertising even though there will be no equilibrium distortion in beliefs. In other words, a standard "signal jamming" argument can be applied when advertising attempts to manipulate sophisticated players.
Second, associativeness as formulated in this paper has a failing. While current events can trigger related memories, the memories that one recalls cannot themselves trigger other memories, an extension I refer to as association chains. Allowing for such chains raises the possibility of multiple steady states in recall. Consider a world in which there are only two types of events, good and bad. For a fixed history, one possible steady state is that good events by chance have had high recall and bad events have had low recall. By rehearsal, good events also have high current recall probabilities $r_{kt}$. Such an individual appears optimistic since he systematically overrecalls good events. Moreover, when he encounters a good event, it will have higher recall in the future. The existing stock of good events have high recall and will, therefore, trigger this new event frequently through association chains, generating a great deal of rehearsal and raising its steady state recall probability. Similarly, a bad event, by virtue of its association chains being with low recall probability bad events, will tend towards a low recall steady state. This optimist, therefore, not only systematically recalls positive information he has already received, he also has a propensity to better recall any good information he receives in the future. In other words, good information "sticks" to him while bad information "slides" off him. Symmetrically, there would be a pessimistic steady state. To understand the local dynamics between these steady states, consider an optimist who encounters a long sequence of negative information. Their recency makes these bad events very memorable, and they form an association chain that can raise the recall probabilities of all bad events. Thus, a sequence of such events may push the individual to a pessimistic steady state. This sketch illustrates the possibilities of this approach.
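The possibility of multiple steady states can be made concrete with a toy feedback map. The sketch below is purely illustrative and is not the model of this paper: the function names `rehearsal_feedback` and `recall_steady_state`, the persistence weight `rho`, and the specific S-shaped feedback function are all assumptions. It captures only the qualitative idea that a type of event with high recall gets rehearsed more and so keeps high recall, while a type with low recall drifts lower.

```python
def rehearsal_feedback(r):
    """Hypothetical S-shaped feedback: well-recalled events are triggered
    (and hence rehearsed) disproportionately often."""
    return r * r / (r * r + (1.0 - r) * (1.0 - r))


def recall_steady_state(r0, rho=0.5, steps=200):
    """Iterate r <- rho * r + (1 - rho) * feedback(r) from initial recall r0."""
    r = r0
    for _ in range(steps):
        r = rho * r + (1.0 - rho) * rehearsal_feedback(r)
    return r


if __name__ == "__main__":
    # Recall that starts slightly above one half drifts to the high steady
    # state; recall that starts slightly below drifts to the low one.
    print(recall_steady_state(0.6))
    print(recall_steady_state(0.4))
```

In this toy map the point one half is an unstable steady state, so a sufficiently large shock to recall (for example, a run of recent and therefore memorable bad events) can move the individual from one basin to the other.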
References
Akerlof, George (1991). "Procrastination and Obedience," American Economic Review, 81(2), May, pp. 1-19.
Anderson, R. C., Pichert, J. W., Goetz, E. T., Schallert, D. L., Stevens, K. V., and Trollip, S. R. (1976). "Instantiation of General Terms," Journal of Verbal Learning and Verbal Behavior, 15, 667-679.
Banerjee, Abhijit (1992). "A Simple Model of Herd Behavior," Quarterly Journal of Economics, 107(3), 797-817.
Barberis, Nick, Shleifer, Andrei and Vishny, Robert (1997). "A Model of Investor Sentiment," NBER Working Paper # 5926. Cambridge, MA: NBER.
Bernard, Victor (1993). "Stock Price Reactions to Earnings Announcements: A Summary of Recent Anomalous Evidence and Possible Explanations," in Thaler (1993), pp. 303-40.
Bartlett, F.C. (1932). Remembering. Cambridge: Cambridge University Press.
Camerer, Colin (1995). "Individual Decision Making," in Kagel and Roth (1995).
Camerer, Colin, Loewenstein, George, and Weber, Martin (1989). "The Curse of Knowledge in Economic Settings: An Experimental Analysis," Journal of Political Economy, 97(5), 1232-54.
Campbell, John, and Mankiw, N. Gregory (1989). "Consumption, Income, and Interest Rates," in NBER Macroeconomics Annual 1989, ed. by Olivier Blanchard and Stanley Fischer. Cambridge, MA: MIT Press, 185-216.
Campbell, John, and Shiller, Robert (1988a). "The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors," Review of Financial Studies, 1, 195-227.
Campbell, John, and Shiller, Robert (1988b). "Stock Prices, Earnings and Expected Dividends," Journal of Finance, 43, 661-676.
Carroll, Christopher, and Samwick, Andrew (1995). "The Nature of Precautionary Wealth," NBER Working Paper # 5193. Cambridge, MA: National Bureau of Economic Research.
Conlisk, John (1996). "Why Bounded Rationality?" Journal of Economic Literature, 34(2), 669-700.
Crovitz, H. F., and Schiffman, H. (1974). "Frequency of Episodic Memories as a Function of their Age," Bulletin of the Psychonomic Society, 4, pp. 517-518.
Cutler, David, Poterba, James, and Summers, Lawrence (1989). "What Moves Stock Prices?" Journal of Portfolio Management, 15(3), 4-12.
De Long, Bradford, Shleifer, Andrei, Summers, Lawrence, and Waldmann, Robert (1990). "Noise Trader Risk in Financial Markets," Journal of Political Economy, 98(4), 703-38.
Deaton, Angus (1992). Understanding Consumption. Oxford: Oxford University Press.
Dow, James (1991). "Search Decisions with Limited Memory," Review of Economic Studies, 58(1), January, pp. 1-14.
Fischhoff, Baruch (1982). "For Those Condemned to Study the Past: Heuristics and Biases in Hindsight," in Kahneman, Slovic and Tversky (1982).
Fudenberg, Drew and Levine, David (1997). Theory of Learning in Games, mimeo, Harvard University.
Grether, David (1980). "Bayes Rule as a Descriptive Model: The Representativeness Heuristic," Quarterly Journal of Economics, 95(3), November, 537-57.
Hall, Robert and Mishkin, Frederic (1982). "The Sensitivity of Consumption to Transitory Income: Estimates from Panel Data on Households," Econometrica, 50, 461-81.
Hayashi, Fumio (1985a). "The Effect of Liquidity Constraints on Consumption: A Cross-Sectional Analysis," Quarterly Journal of Economics, 100, 183-206.
Hayashi, Fumio (1985b). "The Permanent Income Hypothesis and Consumption Durability," Quarterly Journal of Economics, 100, 1083-1113.
Hamill, R., Wilson, T.D., and Nisbett, R.E. (1979). "Ignoring Sample Bias: Inferences about Collectivities from Atypical Cases," unpublished manuscript, University of Michigan.
Harvey, Andrew (1993). Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
James, William (1890). The Principles of Psychology. Reprinted, Cambridge, MA: Harvard University Press (1983).
Jappelli, Tullio and Pagano, Marco (1988). "Liquidity Constrained Households in an Italian Cross-Section," Centre for Economic Policy Research, Discussion Paper: 257, August.
Kagel, John, and Roth, Alvin, eds. (1995). The Handbook of Experimental Economics. Princeton, NJ: Princeton University Press.
Kahneman, Daniel, and Tversky, Amos (1972). "Subjective Probability: A Judgement of Representativeness," Cognitive Psychology, 3, 430-54. Reprinted in Kahneman, Slovic, and Tversky (1982).
Kahneman, Daniel, and Tversky, Amos (1973). "Availability: A Heuristic for Judging Frequency and Probability," Cognitive Psychology, 4, 207-32. Reprinted in Kahneman, Slovic, and Tversky (1982).
Kahneman, Daniel, Slovic, Paul, and Tversky, Amos (1982). Judgement Under Uncertainty: Heuristics and Biases. New York, NY: Cambridge University Press.
Kandel, Eric, Schwartz, James, and Jessell, Thomas (1991). Principles of Neural Science, 3rd ed. New York, NY: Elsevier Science Publishing.
Laibson, David (1997). "A Cue Theory of Consumption," mimeo, Harvard University.
Lakonishok, Josef, Shleifer, Andrei and Vishny, Robert (1994). "Contrarian Investment, Extrapolation, and Risk," Journal of Finance, 49(5), pp. 1541-78.
Lo, Andrew and MacKinlay, A. Craig (1990). "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 431-468.
Mackintosh, N.J. (1983). Conditioning and Associative Learning. Oxford: Clarendon Press.
Mariger, Randall, and Shaw, Kathryn (1990). "Unanticipated Aggregate Disturbances and Tests of the Life-Cycle Model Using Panel Data," Board of Governors of the Federal Reserve, mimeo.
Muth, John (1960). "Optimal Properties of Exponentially Weighted Forecasts," Journal of the American Statistical Association, 55 (290).
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Nisbett, Richard, and Ross, Lee (1980). Human Inference: Strategies and Shortcomings of Social Judgement. Englewood Cliffs, NJ: Prentice-Hall.
Pischke, Jorn-Steffen (1995). "Individual Income, Incomplete Information, and Aggregate Consumption," Econometrica, 63(4), July, pp. 805-840.
Rabin, Matthew (1997). "Psychology and Economics," mimeo, University of California-Berkeley.
Reder, Lynne, editor (1996). Implicit Memory and Metacognition, Carnegie Mellon Symposia on Cognition. Mahwah, N.J.: Lawrence Erlbaum.
Roll, Richard (1984). "Orange Juice and Weather," American Economic Review, 74, 861-80.
Ross, L., Lepper, M. R., and Hubbard, M. (1975). "Perseverance in Self Perception and Social Perception: Biased Attributional Processes in the Debriefing Paradigm," Journal of Personality and Social Psychology, 32, pp. 880-892.
Rumelhart, David, McClelland, James, and the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.
Sargent, Thomas (1993). Bounded Rationality in Macroeconomics. Arne Ryde Memorial Lectures. Oxford and New York: Oxford University Press.
Schacter, Daniel (1996). Searching for Memory: The Brain, the Mind, and the Past. New York, NY: Basic Books.
Shiller, Robert (1989). Market Volatility. Cambridge, MA: MIT Press.
Shleifer, Andrei, and Vishny, Robert (1997). "The Limits of Arbitrage," Journal of Finance, 52(1), 35-55.
Stigler, George; Becker, Gary (1977). "De Gustibus Non Est Disputandum," American Economic Review, 67(2), 76-90.
Thaler, Richard (1990). "Saving, Fungibility, and Mental Accounts," Journal of Economic Perspectives, 4(1), 193-205.
Thaler, Richard (1993). Advances in Behavioral Finance. New York, NY: Russell Sage.
Thompson, W. C., Reyes, R. M., and Bower, G. H. (1979). "Delayed Effects of Availability on Judgement". Manuscript, Stanford University.
Trope, Yaacov (1978). "Inferences of Personal Characteristics on the Basis of Information Retrieved from One's Memory," Journal of Personality and Social Psychology, 36, 93-106. Reprinted in Kahneman, Slovic and Tversky (1982), 378-390.
Tulving, E. and Schacter, D. L. (1990). "Priming and Human Memory Systems," Science, 247, 301-306.
Tulving, E. and Thomson, D. M. (1973). "Encoding Specificity and Retrieval Processes in Episodic Memory," Psychological Review, 80, 352-373.
Tversky, Amos, and Kahneman, Daniel (1971). "Belief in the Law of Small Numbers," Psychological Bulletin, 2, 105-110. Reprinted in Kahneman, Slovic, and Tversky (1982).
Waldfogel, Joel (1993). "The Deadweight Loss of Christmas," American Economic Review, 83(5), December, pp. 1328-36.
Appendix
Lemma
The optimal forecast
1
satisfies:
t-i
yt{huet)
=
+ ^[wk,tXk +
xt
{l-Wk,t){yk-yk-\)]
k=\
f
ol{huet)
=
ol
+
d1^^
2
I
^t-i
+ ^?
2
where nst
where A
is
=
jt^,
the error to truth ratio:
In^al
=
ol
}rmw,(t^k)
=
A
and
define: Wk^t
= \(al +
=
HjILo
^ol{al
'^Syt+j
.
In the limit,
+ Aa^,))
ai+at
Computing the optimal
Proof:
filter;
see ch. 4,
setting d^
=
Harvey
=
a^^j
forecast
is
a straightforward apphcation of the
Given the forecast
(1993).^^
Kalman
computing the steady requires
rule,
o^:
2
af
+ at
Solving the resulting quadratic provides:
As
Lemma
f
—>
oo,
Tikt
-^
2*7
a
meaning that Wkt -^ X*~^.
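The steady state step in this proof can be illustrated numerically for the textbook random-walk-plus-noise case treated in Muth (1960). The sketch below is an assumption-laden illustration rather than the income process of this paper: the state follows a random walk with innovation variance `sigma_eta2`, it is observed with noise variance `sigma_eps2`, and the function names are hypothetical. It iterates the Kalman prediction-variance recursion to its fixed point and compares the result with the positive root of the resulting quadratic.

```python
import math


def steady_state_variance(sigma_eps2, sigma_eta2, tol=1e-12, max_iter=10_000):
    """Iterate p <- p - p^2 / (p + sigma_eps2) + sigma_eta2 until it settles.

    p is the one-step-ahead prediction variance of the state in a
    random-walk-plus-noise (local level) model.
    """
    p = sigma_eta2
    for _ in range(max_iter):
        p_next = p - p * p / (p + sigma_eps2) + sigma_eta2
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def closed_form_variance(sigma_eps2, sigma_eta2):
    """Positive root of p^2 - sigma_eta2 * p - sigma_eta2 * sigma_eps2 = 0,
    obtained by setting p equal to its own update in the recursion above."""
    disc = sigma_eta2 ** 2 + 4.0 * sigma_eta2 * sigma_eps2
    return 0.5 * (sigma_eta2 + math.sqrt(disc))


if __name__ == "__main__":
    p = steady_state_variance(1.0, 0.5)
    gain = p / (p + 1.0)  # steady state weight on the newest observation
    print(p, closed_form_variance(1.0, 0.5), gain)
```

With a constant steady state gain, the weight placed on an observation declines geometrically with its age, which is the sense in which the limiting forecast is exponentially weighted.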
2 Forgetting probabilities satisfy:
'
E[fk{t+j)Xk\et]
=
~
.
Proof:
When
k
>
t,
fk{t+j)'-^k
i-p
(1
-
P^J^t
-pJX^(et)
depends only on events at time greater than
dence across time, therefore, shows that E[fk(t+j)Xk\et]
E[fk{t+j)Xk\et]
E[ft{t+j)\et]
'A derivation
=
for the
=
ootE[f^t+j)\et]-
E[f_
-
ifk>t
lfk = t
ifk<t
xat(t+j)\et]
=
in this case.
t.
Indepen-
When
k
=
t,
Breaking this apart:
+ pE[f_ -
steady state can be found
in
xati^i+3-\)\ct]
Muth
(1960).
+
+
f^^^E[f_
-
xO((t+i)k(]
-
This equals \:^{f_
= E[xk{l -
E[xkfk(t+j)\et]
xak{t+j))\et]
+p>E[xk{f_
By independence,
=
E[xkf\et]
Lemma
0.
all
This
time
et,
E[yt\et]
=
Xt-
•
•
xak{t+j-i))\et]
y/^
=
=
—
yt
+
+ p'^^-''~^E[xk{l -
xak(k+i))
-
Even
X<^kt)\et\-
here,
-XP'^iet)-
beliefs satisfy:
t
=
+ x£{et)^{l -
xt
A*-^)
err^. This allows writing:
=
E[y('\et]
Now,
+ pE[xk{l -
- xakt)\ei\ +
-xp^ E[aktXk\et]
E[y^\et\
Notice that
that
terms here are zero except fPE[xk{f_
gives:
3 Conditioning on
Proof:
when k <t, note
X^i^t))- Finally,
E[yt\et]-E[errr\et]
The second term can be written
as:
t-i
= -
-E[err^\et]
^
X'->'E[xkfkt\et]
k=l
By Lemma
2,
E[xkfkt\^t]
=
—X^i^t)- Substitution provides that:
E[ynet]
=
+ x£{et){X +
xt
X'
+
---X <-ii
= xt+x£{et)Y^{l-X'-')
Lemma
4 Conditioning on
E[yr+j\et]
Proof:
first
et,
time
t
+j
--
=x,\l- ^ ^_^
Again, notice that yj^
term with respect to
et
equals
beliefs satisfy:
(1
=
Vt
xt-
- p')Xn + {pxyx£{et)y-^(l - V"^)
~
^''"''t+j-
The
'^^^ conditional expectation of the
conditional expectation of the second term
equals:
t+j-i
-E[errr+j\et]
Applying
This
Lemma
=-
2 tells us that the
£
X'+^-'E[xkfkit+j)\et]
summands
in this
summation are zero
leaves:
t-i
-X^xtE[fkt\et]
-
A^'
^
k=l
X'-'=E[xkfkit+j)\et]
for k
>
t.
Lemma
Again applying
+
-X^xtE[fkt\et]
2 to E[xkfk{t+j)\et] provides:
X'p^X^iet)
E
^'"'
=
->^'^tE[ht\et]
+ [XpY X^ {et)
^
{I
-
X'-')
Putting these together:
-E[errr+j\et]
Finally,
Lemma
=
x,(l
-
A^.B[/fc(,+,)|e,])
allows us to write:
2,
+ {Xpyx£iet)Y^{l =
E[fm_f.j)\et]
-
—
^_
(1
-
A'-^)
Substitution
p').
gives the stated formula:
E[y(lAet]
Lemma
=
x, (l
-
I=^^(i _ ^)aA
+ [pXyxE[e^)-^[l -
X'-')
Lemma 5 The following are true:
$$sign(E[x_k c(x - x_k) \mid x]) = sign(x)$$
$$sign(E[n_k c(n - n_k) \mid n]) = sign(n)$$
Proof: I will show the first of the two equations; the proof for the second is exactly the same. Write:
$$E[x_k c(x - x_k) \mid x] = \int_{-\infty}^{\infty} x_k c(x - x_k)\, dF_k$$
Breaking the integral at zero and applying symmetry of the $x$ distribution gives:
$$\int_0^{\infty} x_k [c(x - x_k) - c(x + x_k)]\, dF_k$$
Since $x_k > 0$ in this equation, the sign of it equals $sign(c(x - x_k) - c(x + x_k))$. Now, $c(x - x_k) - c(x + x_k) > 0$ if and only if $x$ is closer to $x_k$ than to $-x_k$. Formally, since $c(\cdot)$ measures closeness, $c(x - x_k) > c(x + x_k)$ if and only if $|x + x_k| > |x - x_k|$. Squaring both sides gives $(x + x_k)^2 - (x - x_k)^2 > 0$; this is equivalent to $(2x)(2x_k) > 0$. Since $x_k > 0$, this happens if and only if $x$ is positive. This shows that:
$$sign(E[x_k c(x - x_k)]) = sign(x)$$
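The sign result of Lemma 5 can be checked numerically. The sketch below rests on assumptions that are not in the text: the closeness function is taken to be $c(d) = e^{-d^2}$, the $x_k$ are drawn from a standard normal, and the names `closeness` and `expected_weighted_recall` are hypothetical. Symmetry of the distribution is imposed exactly by pairing each draw with its mirror image, which is the sample analogue of breaking the integral at zero.

```python
import math
import random


def closeness(d):
    """Assumed closeness function: even in d and decreasing in |d|."""
    return math.exp(-d * d)


def expected_weighted_recall(x, n_pairs=10_000, seed=0):
    """Sample analogue of E[x_k * c(x - x_k) | x] under a symmetric law."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        xk = abs(rng.gauss(0.0, 1.0))
        # Each draw xk > 0 is paired with -xk, so the pair contributes
        # xk * [c(x - xk) - c(x + xk)], as in the proof.
        total += xk * closeness(x - xk) + (-xk) * closeness(x + xk)
    return total / (2 * n_pairs)


if __name__ == "__main__":
    for x in (-0.7, 0.0, 0.7):
        print(x, expected_weighted_recall(x))
```

Because every mirrored pair has the sign of $x$, the sample average has the sign of $x$ for any seed, not just on average.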
Lemma 6 Associativeness implies:
$$E[x_k x_t a_{kt}] > 0$$
Proof: Note that:
$$E[x_k x_t a_{kt}] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_k x_t c(x_k - x_t)\, dF_k\, dF_t$$
Breaking apart the integrals allows us to write:
$$\left(\int_0^{\infty}\int_0^{\infty} + \int_0^{\infty}\int_{-\infty}^{0} + \int_{-\infty}^{0}\int_0^{\infty} + \int_{-\infty}^{0}\int_{-\infty}^{0}\right) x_k x_t c(x_k - x_t)\, dF_k\, dF_t$$
Perform the integral transformation $x_t \mapsto -x_t$ in the integrals over negative $x_t$. By symmetry of the $F$ distribution and $c(\cdot)$, this becomes:
$$\int_0^{\infty}\left(\int_{-\infty}^{\infty} x_k [c(x_k - x_t) - c(x_k + x_t)]\, dF_k\right) x_t\, dF_t$$
Performing the transformation $x_k \mapsto -x_k$ now gives:
$$2\int_0^{\infty}\left(\int_0^{\infty} x_k [c(x_k - x_t) - c(x_k + x_t)]\, dF_k\right) x_t\, dF_t$$
which as we saw in the previous proof is positive since for positive $x_k$ and $x_t$, $c(x_k - x_t) > c(x_k + x_t)$.

Lemma 7 Forecast errors satisfy:
$$err^m_{t+1} = \rho\lambda\, err^m_t + \sum_{k=1}^{t} \lambda^{t+1-k} x_k (\underline{f} - \chi a_{k(t+1)})$$
Proof: Write:
$$err^m_{t+1} = \sum_{k=1}^{t} \lambda^{t+1-k} x_k f_{k(t+1)}$$
Using the fact that $f_{k(t+1)} = \rho f_{kt} + \underline{f} - \chi a_{k(t+1)}$ and substituting in for $err^m_t$ in the first term, we get:
$$err^m_{t+1} = \rho\lambda\, err^m_t + \sum_{k=1}^{t} \lambda^{t+1-k} x_k (\underline{f} - \chi a_{k(t+1)})$$
completing the proof.
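The recursion in Lemma 7 is an algebraic identity, so it can be verified on arbitrary numbers. The sketch below makes several assumptions that are not stated in the text: the event at date $k$ has a forgetting term of zero at date $k$ itself, the association terms are random draws in $[0, 1]$, and the function name `check_recursion` and all parameter values are hypothetical. It builds the forgetting terms from the law of motion used in the proof and compares the two sides of the lemma.

```python
import random


def check_recursion(t=6, rho=0.8, lam=0.9, chi=0.3, f_bar=0.1, seed=0):
    """Return (direct err_{t+1}, recursive err_{t+1}) for random inputs."""
    rng = random.Random(seed)
    x = {k: rng.gauss(0.0, 1.0) for k in range(1, t + 1)}
    # a[(k, s)]: association term between event k and date s (random draws).
    a = {(k, s): rng.random()
         for k in range(1, t + 1) for s in range(k + 1, t + 2)}
    # f[(k, s)]: forgetting term of event k at date s, with f[(k, k)] = 0
    # assumed, and f following f_next = rho * f + f_bar - chi * a afterwards.
    f = {}
    for k in range(1, t + 1):
        f[(k, k)] = 0.0
        for s in range(k + 1, t + 2):
            f[(k, s)] = rho * f[(k, s - 1)] + f_bar - chi * a[(k, s)]

    def err(s):
        # err_s sums lam^(s-k) * x_k * f_{k,s} over events k = 1, ..., s - 1.
        return sum(lam ** (s - k) * x[k] * f[(k, s)] for k in range(1, s))

    direct = err(t + 1)
    recursive = rho * lam * err(t) + sum(
        lam ** (t + 1 - k) * x[k] * (f_bar - chi * a[(k, t + 1)])
        for k in range(1, t + 1)
    )
    return direct, recursive


if __name__ == "__main__":
    print(check_recursion())
```

The two numbers agree up to floating point error for any choice of the inputs, since the only step used is the substitution of the law of motion for the forgetting terms.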
Lemma
8 The variance, yar[errj"|ef]
is less
than Var[errl] for small
x and
increases with
Note that Kar[err™|e(] equals:
Proof:
t-i t-i
k=l j=l
When X
to zero.
is
To
close to zero, notice that the non-diagonal terms,
see, this notice
E[{1
The
the
first
Xajt)xkXj\et]
=
p'^E[fk{t-i)fj(t-i)XkXj]+X^E[aktajtXkXj\et]
«
is
are also close
because the second term directly goes to zero as
term goes to zero since
and
fk{t-i)
Lemma
associativeness, as seen in
2.
fj{t-i) only
x
does,
and
depend on each other through
Since the non-diagonal terms are close to zero,
focus on the diagonal terms:
let's
E[il
Again, as
x^.
^j
that these terms equal:
+ pfk(t-\) - XO-kt){f_ + Pfj{t-i) -
approximation
last
where k
x goes
And, since
to zero, this
j^
is
less
+ pfk{t-i) -
Xaktfxl\et]
becomes a constant
than
this
1
whole term
= Y,
Var[errl\et]
(as usual, taking
will
be
less
t
than
—
>•
oo),
(
j^
1
times
-E[x|]. Therefore,
^''~'' E[xl]
k=l
t-l
«
To
these also increase with x-
,
2
E[errr\et]
the yar[errj|e(]
=
intuitive:
increases, the
Lemma
of these terms, therefore, rises with x-
^
j)illustrate
It
an important phenomena. Consider
contains no such cross-terms.
They
exist only
6 tells us that the cross terms will be positive. This
associativeness raises variance by systematically introducing a correlation
in the recalled information.
X
(A;
Yll^Ji A^'~^'^S[x|].
because of associativeness.
x
Similarly, in the derivation of the diagonal elements,
The sum
Finally, the non-diagonal terms
as
r
see the increase with Xi first note that in the derivation above, as
non-diagonal elements increase.
is
/
When
gets large), then Var[err^\et]
these cross terms are sufficiently large (for example,
may
even be larger than Var[errJ|e(]
x