Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries
http://www.archive.org/details/memorybasedmodelOOmull

Massachusetts Institute of Technology
Department of Economics
Working Paper Series

A MEMORY BASED MODEL OF BOUNDED RATIONALITY

Sendhil Mullainathan

Working Paper 01-28
September 2000

Room E52-251
50 Memorial Drive
Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://papers.ssrn.com/abstract=277357

A Memory Based Model of Bounded Rationality

Sendhil Mullainathan*

September 20, 2000

Abstract

How do memory limitations affect economic behavior? I develop a model of memory grounded in psychology and biology research to investigate this question. Using this model, I study the case where people apply Bayes' rule to the history they recall as if it were the true history. The resulting beliefs exhibit over-reaction on average. They also exhibit under-reaction, with the model providing enough structure to allow predictions about which effect dominates when.
I then apply this general framework to an otherwise standard model of consumption. It predicts the broad structure of consumption predictability as well as differences in marginal propensity to consume across different income streams. Most importantly, because it ties the extent of bias to a measurable aspect of the stochastic process being forecasted, the model makes novel, testable empirical predictions.

*Massachusetts Institute of Technology and National Bureau of Economic Research, e-mail: mullain@mit.edu. This paper was written while I was a graduate student at Harvard University. I am indebted to my thesis committee, Drew Fudenberg, Lawrence Katz and Andrei Shleifer for their generous advice and encouragement, and to Marianne Bertrand, Edward Glaeser, David Laibson, and Richard Thaler for many helpful discussions. I have also greatly benefitted from comments by three anonymous referees, George Baker, John Campbell, Caroline Hoxby, Erzo F.P. Luttmer, and participants at numerous seminars. Financial support from the Chiles Foundation is gratefully acknowledged.

1 Introduction

Memory limitations appear to affect the way we form forecasts. Consider purchasing a car. Some information used, such as the prices, may come from outside sources. But much will come from our recollections, everything from newspaper reports about a recall to recollections of particular friendly acts by a boss (perhaps suggesting a promotion and, therefore, the ability to buy a more expensive car). Despite its obvious importance in many facets of economic life, however, memory limitations are largely ignored.^1 In this paper, I attempt to develop a tractable model of human memory, one that has testable predictions about economic behavior.

To build this model, I first cull two stylized facts from the broad research on memory by biologists and psychologists. The first fact, termed rehearsal, states that remembering an event once makes it easier to remember that event again. Most students studying for an exam, by reading their lecture notes and repeatedly attempting to recall the material, take advantage of rehearsal. The second fact, termed associativeness, states that similarity of the memory to current events facilitates recall. Cues in today's events trigger memories that contain similar cues. Hearing your friend lament about how his Fiat has turned out to be a lemon may remind you of other Fiat horror stories.

While these facts describe the technology of memory, they don't tell us how memory is used. People may be naive and treat the histories they recall as true, and apply Bayes' rule to them. Alternatively, they may be sophisticated, possessing complete knowledge of the recall technology and correcting for distortions in recall when forming forecasts. Each of these decision rules (as well as "partial adjustment" rules) has its appeal, and undoubtedly a characterization of both is necessary. This paper takes the first step and draws out the implications of the naive model.^2

^1 See Conlisk (1996) for a general survey of previous work on bounded rationality. Dow (1991) also presents a model of memory limitations, which examines optimal storage of information (in the context of search) given limited capacity.

^2 I chose to examine the naive rule first since experimental evidence suggests that individuals neither have accurate models of memory nor correct for their memory mistakes in laboratory settings, making the naive model a natural first model to study. Of course, in cases with repetition and room for learning, sophistication may come to have more descriptive power. This makes it the next natural model to study, and work in progress is examining it. This first pass also abstracts from recall effort. Individuals may work harder to remember certain events in the past over others.
Such effort may take the form of mental exertion or the use of diaries to keep track of important information. Section 5 discusses these and other extensions to be pursued in future work.

When memory is used in this way, several interesting features result. First, associativeness means that events affect beliefs not just through the information they convey, but also through the memories they evoke. Receiving a devastating referee report likely evokes many bad memories, other instances that erode self confidence. More generally, associativeness generates an over-reaction (on average) to information as each event draws forth similar, supporting, memories. Bad news cultivates pessimism by selectively evoking other, negative information. In a related vein, completely uninformative signals can influence beliefs if they affect what is recalled. Viewing a fictional speech by a laid-off worker on the difficulty of finding a decent job may affect beliefs because it may trigger more informative memories.

Second, because of rehearsal, the memories evoked by an event linger, causing errors in belief to persist. So the over-reaction caused by associativeness dissipates only slowly. More generally, events will continue to have an effect long after the information they contain has been discredited. A smear campaign can have lingering effects even after the "facts" proclaimed have been thoroughly debunked. The unflattering memories brought to mind stay, casting a negative shadow on the target. As another example, consider a judge instructing the jury to disregard the testimony they just heard. Even a well-intentioned jury would find it hard to fully comply with such a request. These results (and others derived through similar reasoning) match many of the experimentally found biases in human inference, such as the greater effect of salient information, the hot hand or belief perseverance.

Most interestingly, though, the model makes a set of out of sample predictions. It relates the extent of such phenomena to the stochastic process that individuals are forecasting. When the stochastic process requires little use of history (such as a random walk), there will be little use of memory and hence little effect of the biases described. This ability to relate the extent of the bias to a measurable aspect of the stochastic process being studied makes the model refutable and is formalized in Section 3.4.

To assess the effectiveness of the general model in economic contexts, I apply it to consumption within a simple Permanent Income Hypothesis framework. Memory distortions here generate violations of the standard orthogonality predictions: consumption changes can be predicted using lagged information. Intuitively, when individuals receive good news about their personal income, such as a glowing compliment from the boss, they are more likely to remember other good news, causing them to over-forecast their income. This generates predictability of future consumption changes. The model also predicts that when there are multiple income sources, the marginal propensity to consume out of permanent income changes will be different for each stream. An income source will show a high marginal propensity to consume when memories play a large role in forecasting it, such as with personal labor income rather than with gains in one's 401(k). Most importantly, as in the general case, the model makes a set of out of sample predictions. It relates the extent of consumption predictability to a parameter of the income process that measures the importance of the history. Moreover, since income streams with a high MPC connote over-reaction, it further predicts that the income streams with the largest MPCs should also show the greatest negative correlation with future consumption. Section 4 discusses these predictions.
These results suggest that bounds on human memory can be fruitfully modeled. The results match psychological findings as well as empirical facts in consumption. The model also generates out of sample predictions that can be tested on standard data sets. Models based on bounded rationality often invoke fears of post hoc rationalization, fears that with a sufficiently flexible set of assumptions almost any behavior can be "explained". The out of sample predictions are a first step in alleviating such fears. As a whole, the findings suggest that models incorporating realistic limitations on recall have strong, testable implications about economic behavior.

2 Setup

The basic framework examines an individual who forms expectations about a state variable. I will take this variable to be synonymous with permanent income in future discussions, but it can be many other things: a firm's earning power, macroeconomic conditions, or an employee's abilities are just a few examples. Forecasts of income clearly influence many decisions (savings, job search, or portfolio choice), and in Section 4 I explicitly study the consumption decision. Labor income moves for a variety of reasons, such as macroeconomic shocks, technological innovations, or changes in expectations about an individual's ability. As these examples indicate, forming forecasts requires combining a diverse set of information. Some of this information is, loosely speaking, "hard" or readily available in records: income in prior months, unemployment or GDP. Other information is "soft" or harder to capture in records: a friend in a similar position being fired or a boss telling you that you are one of the best employees he has seen. This disjunction between hard and soft will be useful for the model that follows. Knowledge of soft information depends on memory, while knowledge of hard information typically does not. The remainder of this section formalizes the setup.
2.1 Environment

Let $y_t$, income, obey the stochastic process:

$$y_t = \sum_{k=1}^{t} \nu_k + \epsilon_t \tag{1}$$

where $\epsilon_t$ is a transitory shock distributed $N(0, \sigma_\epsilon^2)$ and $\nu_k$ is a permanent shock, whose structure I will describe shortly. $y_t$ is observed by the individual and represents the hard information. Assume that individuals have priors about the value of $y$ which are normally distributed with mean $y_0$ and variance $\sigma_0^2$.

Each period with probability $p$ an event $e_t$ occurs, which will provide "soft" information about income. Each event $e_t$ has an informative, $x_t$, and non-informative, $n_t$, component. Conditional on an event occurring, they are distributed:

$$e_t = (x_t, n_t) \sim N(0, \Sigma); \qquad \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xn} \\ \sigma_{xn} & \sigma_n^2 \end{pmatrix}$$

where $E[x_t] = E[n_t] = 0$.^3 When there is no event, I will write $e_t = (0, 0)$. The covariance term, $\sigma_{xn} = cov(x, n)$, measures whether the neutral component typically appears with positive or negative information. A concrete example of an event might be hearing a friend describing his recent unemployment experience. The length of his unemployment spell would be informative ($x_t$), while the fact that he has a pregnant wife with medical bills piling up would be uninformative ($n_t$).^4 The model includes uninformative, or neutral, components because they will affect recall probabilities.

The permanent shock at time $t$ will be defined as:

$$\nu_t = x_t + z_t$$

where $z_t \sim N(0, \sigma_z^2)$.^5 Thus, while the informative part of the event tells her something about the shock that period, its informativeness is incomplete and depends on $\sigma_x^2 / \sigma_z^2$.

At the beginning of each period, she observes the event, $e_t$. She then combines this information with past information to form a forecast, $\hat{y}_t$. At the end of the period the true value of income is observed. The game is then repeated.

2.1.1 Perfect Memory Forecasts

The perfect memory forecast will serve as a useful base case against which one can compare the forgetful forecast. The process in equation (1) generates a signal extraction problem: the individual must separate out the permanent shocks to $y_t$ from the transitory ones. A 5% income drop may represent a negative shock to permanent income or may only affect current income. Knowledge of both past events and $y_k$ helps to solve this inference problem. Events $e_k$ are useful because they allow one to extract a component of the time $k$ innovation ($x_k$) that is for sure permanent. Past income realizations $y_k$ are useful because they allow one to tease out the remainder of the permanent shock ($z_k$), but with less certainty. Repeated observations of high income will suggest a permanent rise.^6

^3 In Mullainathan (1998), this assumption is discussed in greater detail. Normality implies that $x_t$ and $n_t$ are symmetrically distributed. Symmetry implies that neither good nor bad events are more memorable (in the sense of both evocativeness and vividness discussed below).

^4 Of course, as the example also illustrates, every part of an event will have some information content, and the dichotomy between $x_t$ and $n_t$ merely simplifies this spectrum.

^5 A slight awkwardness in this definition should be noted. The variance of $\nu_t$ is higher when there is a shock than when there is none. In this sense, "signal" may not be a completely accurate word. This assumption does not drive the results; I use it so that the residual variance of $\nu_t$ conditional on observing $x_t$ is constant, allowing a more transparent analysis.

^6 Contrast with the case where $y_t$ follows a standard random walk. Then, $y_{t-1}$ is the only information in the past needed to forecast $y_t$. For the model formulated in this paper, $\hat{y}_{t-1}$ is a sufficient statistic for all past information. This is an artifact, however, of the simplicity of the model. If we complicate it by assuming that different events have different levels of mean reversion rather than all being permanent, this will no longer be true. Forecasts must then rely directly on all past $y_k$ and $e_k$.
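The environment of Section 2.1 can be illustrated with a short simulation. The following is a minimal sketch, not part of the original paper: the function name `simulate_income`, its argument names, and the choice to draw $z_t$ with a constant standard deviation every period are all assumptions made for illustration. It draws an event with probability $p$, builds the permanent shock as $x_t + z_t$, and adds a transitory shock to the accumulated permanent component, as in equation (1).

```python
import math
import random


def simulate_income(T, p, sigma_x, sigma_n, sigma_xn, sigma_z, sigma_eps, seed=0):
    """Simulate T periods of income and events (hypothetical helper).

    Returns (incomes, events), where events[t] = (x_t, n_t) and a
    non-event is recorded as (0.0, 0.0).
    """
    rng = random.Random(seed)
    # Factor the 2x2 covariance of (x, n) so that
    # var(x) = sigma_x^2, var(n) = sigma_n^2, cov(x, n) = sigma_xn.
    b = sigma_xn / sigma_x
    resid = math.sqrt(max(sigma_n ** 2 - b ** 2, 0.0))

    incomes, events = [], []
    permanent = 0.0  # running sum of permanent shocks
    for _ in range(T):
        if rng.random() < p:  # an event occurs with probability p
            u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x, n = sigma_x * u1, b * u1 + resid * u2
        else:
            x, n = 0.0, 0.0
        z = rng.gauss(0.0, sigma_z)  # unobserved part of the permanent shock
        permanent += x + z
        incomes.append(permanent + rng.gauss(0.0, sigma_eps))  # add transitory shock
        events.append((x, n))
    return incomes, events


if __name__ == "__main__":
    ys, es = simulate_income(T=10, p=0.5, sigma_x=1.0, sigma_n=1.0,
                             sigma_xn=0.3, sigma_z=0.5, sigma_eps=0.2)
    for t, (y, e) in enumerate(zip(ys, es), start=1):
        print(t, round(y, 3), e)
```

Setting `sigma_z` and `sigma_eps` to zero makes income equal the running sum of the informative components, which is a convenient sanity check on the bookkeeping.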
The optimal forecast can be easily derived using the Kalman Filter (see Harvey 1993). The posterior at time $t$ will be normally distributed with mean $\hat{y}_t$ and variance $\hat{\sigma}_t^2$. In steady state these beliefs will equal (see Lemma 1 in the appendix):

$$\hat{y}_t(h_t, e_t) = y_0 + x_t + \sum_{k=1}^{t-1} \left[ \lambda^{t-k} x_k + (1 - \lambda^{t-k}) \Delta y_k \right] \tag{2}$$

$$\hat{\sigma}_t^2(h_t, e_t) = \sigma_1^2 \tag{3}$$

where $\Delta y_k = y_k - y_{k-1}$ and $\lambda$ = [expression illegible in the source].

Understanding the marginal impact of different variables will improve intuition about the forecast rule. First, $x_k$ influences forecasts one-for-one. Its impact is the sum of two terms. There is a direct effect which contributes $\lambda^{t-k} x_k$ and an indirect effect through $\Delta y_k$ (because $\Delta y_k = x_k + z_k + \epsilon_k - \epsilon_{k-1}$) that contributes $(1 - \lambda^{t-k}) x_k$. Summing shows that the total coefficient is unity. Second, $\Delta y_k$ enters forecasts with weight $(1 - \lambda^{t-k}) < 1$. Third, $y_k$ influences forecasts at $\lambda^{t-k-1}(1 - \lambda) < 1$, as is clear from the formula, because it enters in both $\Delta y_k$ and $\Delta y_{k+1}$. Both $y_k$ and $\Delta y_k$ have a less than one-for-one impact because both are noisy estimates of permanent income (or permanent income changes). That $x_k$ has greater impact reiterates the importance of events in separating signal from noise. They show the individual a portion of the income change that is for sure permanent. Fourth, $n_k$ has zero impact, as expected: neutral components convey no information.

Finally, $\lambda$ measures the importance of history in forecasts. As $\lambda$ increases, older $y_k$ receive greater weight. This is intuitive. When $\lambda$ approaches zero, most of the variance comes from permanent shocks, and hence the process resembles a random walk. In this case, history matters the least and past values should receive the least weight.

2.2 Memory

2.2.1 Formal setup

Memory will be modeled as a stochastic map that transforms true history into perceived history. Let history, $h_t$, be a vector that includes $y_k$ and $e_k$ for $k < t$:

$$h_t = (y_1, \ldots, y_{t-1}, e_1, \ldots, e_{t-1})$$

Memory maps $h_t$ into a random variable $h_t^R$. I begin by making mathematical assumptions about the nature of this map and then use experimental evidence to characterize the remainder.

As discussed, past values of income are hard information, readily available in records. I, therefore, assume that $y_k$ will be recalled perfectly. Events, on the other hand, characterize soft information, and are more prone to be forgotten. Formally, write recalled history as $h_t^R = (e_1^R, e_2^R, \ldots, e_{t-1}^R, y_1, \ldots, y_{t-1})$. Notice that in the recalled history, $y_k$ is unaffected, whereas $e_k$ is transformed into a random variable, $e_k^R$, whose value is governed by:

$$e_k^R = \begin{cases} e_k & \text{with probability } r_{kt} \\ (0, 0) & \text{with probability } 1 - r_{kt} \end{cases}$$

The probability that event $e_k$ is recalled at time $t$ is denoted by $r_{kt}$. When an event is forgotten, it is exactly as if no event occurred that period.^7

A metaphor may help. Picture history as a series of boxes, one for every time period. Each box contains the details of that period's event. An empty box signifies that no event occurred that period. Memory goes to each box and flips a coin with weight $r_{kt}$ to determine if the event in that box will be remembered; if forgotten, the box appears empty to the individual. Notice that $r_{kt}$ is applied independently across events, though algebraically the probabilities may be linked.^8

Notice some of the implicit assumptions made in this specification. Individuals do not remember distorted versions of events: they either remember them or not. They also do not "remember" events that never happened. Finally, a forgotten event matches a non-event, so that there is no feeling of "I think something happened but I'm not sure what". Weakening these assumptions might all be useful tasks for the future. To complete the model, I need to specify $r_{kt}$. I turn to the scientific evidence on memory for motivation on how to specify this.

^7 The recall process readily lends itself to a probabilistic interpretation. Casual conversation consists of phrases such as "more likely to remember" and experimental work supports this. James (1890) seems to present the first probabilistic interpretation of memory, though of course he does not use this terminology.

^8 Formally, conditional on $r_{kt}$ and $r_{j\tau}$, $R_{kt}$ and $R_{j\tau}$ are independent.

2.2.2 Evidence on Memory

Research by biologists and psychologists has generated much knowledge about memory.^9 I will focus on two of these features, which in my opinion are by far the most relevant ones for economists.

The first, rehearsal, states that recalling a memory increases future recall probabilities. Students quickly recognize this property: repetition strengthens memories. Experimental evidence on rehearsal can be found even at the neuronal level. Repeated firings between neurons strengthen their synaptic connections, or the strength of the "memory" stored there.^10 At a more macro level, experiments with humans show similar behavior. Two groups of subjects memorize the same list of words. One group then practices recall of this list periodically, while the other does not (both see it only once). After the same time has elapsed for both groups, the group that has been periodically recalling the list shows higher recall of the list. That these findings should seem so obvious is a testament to the intuitive appeal of the rehearsal assumption.

The second, associativeness, states that events more similar to current events are easier to recall. For example, hearing a friend talk about his vacation will invoke memories of one's own vacations. Associativeness may arise because events serve as cues that help "find" lost memories. The importance of associativeness in everyday recall has been emphasized by Tulving and his colleagues, who study the role of cues in recall. Subjects learn a list of words in which each target word is paired with a cue word. The subjects are then asked to remember the target words, and are either provided with the associated cue or not. A broad set of such experiments finds that recall of the target words is higher when the paired word is present.^11 A related example of this phenomenon is subjects who learn the sentence (Anderson et al., 1976):

The fish attacked the swimmer

They are more likely to remember this sentence if given the cue "shark" than if given no cue at all. Notice, however, that "shark" never appears in the sentence, which illustrates that associativeness likely operates also through conceptual similarity.^12

As these studies demonstrate, both rehearsal and associativeness have a strong experimental basis.^13 In fact, the most popular models of memory (and neural function generally), Parallel Distributed Processing Models, possess both features (Rumelhart and McClelland, 1986). Nevertheless, I do not mean to imply that these are the only "important" facts about memory, merely that these appear to be the most relevant to economists.^14

^9 Schacter (1996) presents an excellent overview of this literature, one that I draw upon.

^10 See Kandel, Schwartz and Jessell (1991) for a discussion. A contrasting effect is habituation, wherein synaptic strength decreases with frequency. This corresponds to the idea that a novel stimulus receives notice which lessens as the novelty wears off. I ignore this property because it is a property of attention rather than of memory.

^11 These paired words sometimes share a natural connection, such as "brain" and "mind" or "brain" and "drain", and sometimes are unrelated, such as "brain" and "doughnut". The findings hold in both cases, though the effect is stronger when the words are connected.

^12 Laibson (1997) derives a theory of consumption based on preferences that exhibit a form of conditioning, which is related to associativeness (MacKintosh, 1983). Our papers differ because I focus on expectations rather than preferences. The similarity is interesting, however, and suggests that a memory model, in which individuals must use past experience to forecast preferences, potentially provides one microfoundation to the preferences used by Laibson (1997).

^13 The evolutionary advantages of these two properties are easy to understand. Frequently encountered phenomena and memories similar to current circumstances are both more relevant. I have not formally pursued such intuitive notions to get at a more evolutionary or optimizing basis of memory. Such a model would require a precise understanding of the constraints on what memory mechanisms are even biologically feasible.

^14 Let me cite the two most interesting omissions. First, researchers now believe that certain memories are episodic (the time you tasted caviar), while others are semantic (you dislike caviar). This distinction is interesting because semantic memories may not possess all the episodes that gave rise to them. Second, memory seems to be reconstructive in nature (Neisser 1967). The process of reconstruction uses a priori theories to put together the pieces, so facts that deviate from these theories will more likely be forgotten. In a seminal experiment, Bartlett (1932) demonstrated how in recalling stories, subjects often edit out inconsistent parts. I ignore these for the time being, however, because they lack the mass of evidence that supports the other two assumptions and because they are analytically more vague.

2.2.3 Formalism

Three parameters appear in the formalism: $m$ (the baseline recall probability), $\rho$ (which quantifies rehearsal), and $\chi$ (which quantifies associativeness). Assume that all are between zero and one and that $m + \rho + \chi < 1$. Let $R_{kt}$ denote the random variable which equals 1 iff event $k$ is recalled at time $t$, with $R_{(t-1)t} = 1$ and $r_{(t-1)t} = 1$. Note that $E[R_{kt}] = r_{kt}$. With this notation, we can write:

$$r_{kt} = m + \rho R_{k(t-1)} + \chi a_{kt} \tag{4}$$

The first term equals the baseline recall probability for all memories, $m$. The second term captures rehearsal. Events recalled in the last period get a "boost" of $\rho$. This formalism of rehearsal may seem awkward. Consider two events: $e_{t-2}$, which occurred two days ago, and $e_{t-20}$, which occurred twenty days ago, and suppose neither is remembered yesterday. Then (holding the third term constant) both have the same recall probability. Recall appears to display sharp, rather than smooth, decay. This awkwardness is superficial. Proposition 6 demonstrates that in expectation, recall probabilities do exhibit exponential decay. Alternatively, I could build exponential decay directly into the dynamics of memorability, so that it occurs not only in expectation but on every realization. The results would not change.

The third term captures associativeness, where $a_{kt}$ measures the similarity of event $e_k$ to $e_t$. The two events $e_k = (x_k, n_k)$ and $e_t = (x_t, n_t)$ are two points on a plane. Similarity can then be defined as a negative function of the distance between the points. Let $c : (-\infty, \infty) \to (0, 1)$ be a closeness function (that is, an inverse distance function). Then similarity $a_{kt}$ is defined as:

$$a_{kt} = \tfrac{1}{2} \left( c(x_t - x_k) + c(n_t - n_k) \right)$$

with the assumption that $a_{kt} = 0$ if either $e_k$ or $e_t$ is a non-event. I will take the specific function $c(x) = e^{-x^2}$, which allows one to write:

$$a_{kt} = \tfrac{1}{2} \left[ e^{-(x_t - x_k)^2} + e^{-(n_t - n_k)^2} \right]$$

Thus $0 \le a_{kt} \le 1$ and $a_{tt} = 1$. Substituting back into the original equation provides:

$$r_{kt} = m + \rho R_{k(t-1)} + \chi \tfrac{1}{2} \left( c(x_t - x_k) + c(n_t - n_k) \right) \tag{5}$$

and recall that I assume that $m + \rho + \chi < 1$. It will also be useful to define the forgetting probability, $f_{kt} = 1 - r_{kt}$, and similarly $F_{kt} = 1 - R_{kt}$. Further define the constant $f = 1 - m - \rho$, so that we can write:

$$f_{kt} = f + \rho F_{k(t-1)} - \chi \tfrac{1}{2} \left( c(x_t - x_k) + c(n_t - n_k) \right)$$

Finally, note that unlike the other basic assumptions of the model, the choice of functional form here is arbitrary. I could well have included an interaction term between associativeness and rehearsal or higher order terms. As another example, I might have allowed for limited capacity so that only a finite set of memories can be recalled at any time, which would generate crowd out. The intuition behind the results that follow does not rely on the functional form, though some of these extensions are clearly worth investigating.

3 Basic Results

3.1 Dynamics of Recall

Before moving to forecasts, it will be useful to see how recall works. Simple recursive substitution yields:

$$E[f_{kt} \mid e_k] = \left( f - \chi E[a_{kt} \mid e_k] \right) \frac{1 - \rho^{t-k-1}}{1 - \rho}, \qquad \lim_{t \to \infty} E[f_{kt} \mid e_k] = \frac{f - \chi E[a_{kt} \mid e_k]}{1 - \rho} \tag{6}$$

for all $k < t$. Recall probabilities decay exponentially over time: memories from further back have higher chances of being forgotten. Experimental evidence on recall probabilities indicates that exponential decay of memories fits the data rather well.^15 Also, $E[a_{kt} \mid e_k]$ increases memorability. This term, which I will define as vividness, $V(e_k)$, measures how strongly associativeness affects a memory. Memories that are very similar to a randomly drawn event will be more vivid. They are more likely to be triggered through associativeness and, therefore, more memorable.^16

While vividness captures the strength of associations that an event possesses, it will also be useful to define the average information of these associated events. Define the evocativeness of event $e_t$ to be:

$$\mathcal{E}(e_t) = E[x_k a_{kt} \mid e_t]$$

If today's event is $e_t$, then $a_{kt}$ measures the strength of its association with event (memory) $e_k$, while $x_k$ measures the information content of that past event. The expectation of this product, therefore, measures the average information content of memories brought forth by associativeness.

To illustrate evocativeness, consider the event $e_t = (x_t, n_t) = (1, 1)$. The evocativeness of this event has two parts. First, $x_t = 1$ implies positive evocativeness. Other $x_k$ close to 1 will be evoked, leading to an oversampling of positive memories and positive evocativeness. Second, $n_t = 1$ can have a positive or negative impact on evocativeness depending on $\sigma_{xn}$. Since $n_t = 1$, other events with positive $n_k$ are triggered, but the information content of these events clearly depends on $\sigma_{xn}$. When $\sigma_{xn}$ is zero, knowing that $n_k > 0$ says nothing about $x_k$, so that the effect on evocativeness is zero. If $\sigma_{xn}$ is positive, knowing $n_k > 0$ tells us that $x_k > 0$: positive events are selectively triggered, causing a positive effect on evocativeness. Finally, when $\sigma_{xn}$ is negative, $n_k > 0$ tells us that $x_k < 0$, generating a negative effect on evocativeness.

^15 See Crovitz and Schiffman (1974). A power function, however, seems to fit better.

^16 This result, however, has the unfortunate property that outliers, very unusual events, have lower recall probability, contrary to one's intuition. One resolution to this problem may be found in allowing for the possibility that unusual events may receive greater attention, and that attention may increase memorability.

3.2 Limited Memory Expectations

We now turn to the forecasts of the forgetful individual. I will make a crucial assumption here: the forgetful individual applies the forecasting rule in equation (2) to the recalled history. In other words, she takes the recalled history as the true history. Let $y_t^F(h_t^R, e_t)$ denote the mean and $(\sigma_t^F)^2(h_t^R, e_t)$ denote the variance of a (naive) forgetful Bayesian's posteriors. This assumption can then be stated formally as: $y_t^F(h_t^R, e_t) = \hat{y}_t(h_t^R, e_t)$ and $(\sigma_t^F)^2(h_t^R, e_t) = \hat{\sigma}_t^2(h_t^R, e_t)$.

I have referred to this as the naive decision maker, in contrast to the sophisticated decision maker, who completely knows the model of memory and corrects his forecast rule accordingly.
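The recall technology of Sections 2.2.3 and 3.1 can be sketched in code. This is an illustrative sketch rather than the paper's own implementation: the function names are hypothetical, the closeness function is taken to be $c(d) = e^{-d^2}$, and, as a simplifying assumption, the similarity term is held fixed at a constant `a_bar` standing in for $E[a_{kt} \mid e_k]$. It also assumes that last period's event is recalled with certainty, so expected forgetting starts at zero at lag 1. Under those assumptions, iterating the forgetting recursion reproduces the geometric build-up toward $(f - \chi \bar{a})/(1 - \rho)$.

```python
import math
import random


def closeness(d):
    """Closeness function c(d) = exp(-d^2) (assumed functional form)."""
    return math.exp(-d * d)


def similarity(e_k, e_t):
    """a_kt: average closeness of the two components; 0 if either is a non-event."""
    if e_k == (0.0, 0.0) or e_t == (0.0, 0.0):
        return 0.0
    return 0.5 * (closeness(e_t[0] - e_k[0]) + closeness(e_t[1] - e_k[1]))


def recall_probability(m, rho, chi, recalled_last_period, a_kt):
    """r_kt = m + rho * R_k(t-1) + chi * a_kt."""
    return m + rho * (1.0 if recalled_last_period else 0.0) + chi * a_kt


def expected_forgetting(m, rho, chi, a_bar, lag):
    """Expected forgetting probability at a given lag, with similarity fixed at a_bar."""
    f = 1.0 - m - rho
    ef = 0.0  # lag 1: last period's event is recalled with certainty
    for _ in range(lag - 1):
        ef = f - chi * a_bar + rho * ef
    return ef


def simulated_forgetting(m, rho, chi, a_bar, lag, trials=20000, seed=0):
    """Monte Carlo share of trials in which the memory is forgotten at the given lag."""
    rng = random.Random(seed)
    forgotten = 0
    for _trial in range(trials):
        recalled = True  # recalled with certainty one period after the event
        for _step in range(lag - 1):
            r = recall_probability(m, rho, chi, recalled, a_bar)
            recalled = rng.random() < r
        if not recalled:
            forgotten += 1
    return forgotten / trials


if __name__ == "__main__":
    m, rho, chi, a_bar = 0.2, 0.5, 0.2, 0.5
    for lag in (1, 2, 3, 5, 10, 30):
        print(lag, round(expected_forgetting(m, rho, chi, a_bar, lag), 4))
    print("simulated at lag 30:", simulated_forgetting(m, rho, chi, a_bar, 30))
```

Raising `a_bar` (a more vivid memory) lowers expected forgetting at every lag beyond the first, while raising `rho` slows the approach to the limit.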
Deciding between these two polar models will be an important task, and one that requires a complete characterization of behavior in both cases. I have chosen to investigate the naive case first because experimental evidence suggests that it describes behavior at least in the laboratory. Studies of individuals' judgements of their own memories reveal inaccuracy in understanding their memory process (see, for example, Reder 1996). Similarly, experiments have manipulated the memorability of information and tested whether individuals' decisions correct for this manipulation. Supportive of the naivete assumption, decisions are insensitive to this manipulation. Of course, repetition and room for learning may dramatically alter these findings. Nonetheless, the findings suggest that characterizing the naive decision maker would be a useful first step.

Footnote: See, for example, Trope (1978). This also resembles findings by Kahneman and Tversky (1982) on the availability heuristic, that individuals take more easily remembered events to also be more probable.

Footnote 18: Preliminary results suggest that even more nuanced results may arise in the sophisticated model. For example, suppose that forecasts are not remembered but that zero-one decisions which condition on forecasts are remembered with certainty. Then, a herding problem akin to Banerjee (1992) arises. Consider an individual who remembers choosing 1 several times but currently faces information that suggests 0 is the best choice. For certain histories and parameter values, the weight of having chosen 1 in the past ("I must have had some reason to do it") will dominate and he will choose 1 again. But this implies that the signal that he received this period will be jammed and he will be stuck in a herding equilibrium.

Simple substitution gives the formula for the forgetful forecast:

$$\hat{y}^F_t(h^R_t, e_t) = x_t + \sum_{k=1}^{t-1}\left[R_{kt}\lambda^{t-k}x_k + (1-\lambda^{t-k})\Delta y_k\right] \tag{8}$$

$$\hat{\sigma}^F_t(h^R_t, e_t) = \hat{\sigma}_t \tag{9}$$

In words, forgetful forecasts look just like perfect recall forecasts except that forgotten events ($R_{kt} = 0$) are excluded. Note that $\hat{y}^F_t$ is a random variable. Taking expectations over this random variable implies that events are weighted by their recall probability.

In order to contrast the perfect recall and forgetful forecasts, it will be useful to define a memory error. First, let $err_t = y_t - \hat{y}_t$ and $err^F_t = y_t - \hat{y}^F_t$ be the forecast errors of the perfect recall and forgetful forecasts respectively. Now define

$$err^m_t = \hat{y}_t - \hat{y}^F_t$$

to be the memory error of the forgetful individual, that is, the difference in forecasts, so that $err^F_t = err_t + err^m_t$. Note that the memory error also measures how the forecast is distorted by memory problems. With these definitions in hand, I now examine the determinants of beliefs.

Proposition 1. The impact of event $e_t$ on time $t$ beliefs does not depend on its vividness, but does depend (positively) on its evocativeness. On the other hand, its impact on time $t+j$ beliefs depends on both vividness and evocativeness.

Proof: From Lemma 3,

$$E[\hat{y}^F_t \mid e_t] = x_t + \chi\,\mathcal{E}(e_t)\,\frac{\lambda(1-\lambda^{t-1})}{1-\lambda},$$

showing the dependence on evocativeness, and the absence of a vividness effect. The intuition here is simple. Evocativeness influences what memories are triggered and, therefore, has a direct effect on beliefs. Vividness only operates through increased memorability, which of course cannot have an impact on contemporaneous beliefs.

For the impact on future beliefs, Lemma 4 in the appendix shows that:

[equation not legible in the source]

where we see as before the dependence on evocativeness. This is because the memories triggered at time $t$ were rehearsed and, therefore, continue to have higher recall probability even at time $t+j$. Consistent with this, note that as $\rho \to 0$, the effect disappears. We also see here that vividness now plays a role. As we saw in Proposition 6, vividness increases memorability. Thus it increases the marginal impact of $x_t$ by making it more likely to be recalled and used in forming beliefs. One implication is that when $x_t = 0$, changes in vividness have no impact: whether or not the event is recalled, it does not influence beliefs.

This proposition and its proof make several points that are worth reiterating. Vividness, how associated a memory is, plays no role in how an event influences beliefs at the time it occurs. It only matters as time passes and a chance to forget the event appears. By increasing memorability, vividness influences whether or not an event is remembered and thereby whether or not the information it conveys is used in the future. Evocativeness, the average information content of memories associated with an event, does influence beliefs contemporaneously. An event with positive evocativeness, for example, disproportionately draws forth positive memories, leading to a more positive forecast. Moreover, since these triggered memories persist (by rehearsal), evocativeness also influences future beliefs, though its effect diminishes over time (the $\rho^j$ exponent). Summarizing, the current model decomposes the intuitive notion of "salience" into two components: vividness, which captures increased memorability, and evocativeness, which captures the ability of events to trigger supporting evidence. Both affect an event's impact on beliefs but do so in different ways. An interesting implication is that even completely uninformative signals can affect beliefs.

Proposition 2. Let $e_t = (0, n_t)$ be an uninformative event but with non-zero neutral component ($n_t \neq 0$). This event influences beliefs if and only if $\sigma_{xn} \neq 0$. The sign of this influence equals $\mathrm{sign}(\sigma_{xn} n_t)$.

Proof: Appealing to Lemma 3, uninformative events can influence beliefs only if their evocativeness is non-zero. The evocativeness of an uninformative event equals

$$E[x_k a_{kt} \mid e_t = (0, n_t)] = \tfrac{1}{2}E[x_k\, c(0 - x_k) \mid e_t] + \tfrac{1}{2}E[x_k\, c(n_k - n_t) \mid e_t].$$

The first term is zero by the symmetry of $c(\cdot)$ and the symmetry of the $x_k$ distribution. To evaluate the second term, apply the law of iterated expectations and condition on $n_k$ and $n_t$:

$$E[x_k\, c(n_k - n_t) \mid e_t] = E\big[E[x_k \mid n_t, n_k]\, c(n_k - n_t) \mid e_t\big] = \sigma_{xn}\,E[n_k\, c(n_k - n_t) \mid e_t].$$

As Lemma 5 shows, this is non-zero whenever $n_t \neq 0$ and $\sigma_{xn} \neq 0$, which establishes the first part, and the sign of this term (and hence of the event's evocativeness) is $\mathrm{sign}(\sigma_{xn} n_t)$.

The logic here is simple. Even though individuals disregard a signal as completely uninformative, their beliefs are shaped by the memories these events trigger. The mediator in this process is $\sigma_{xn}$, which determines whether the neutral cue tends to appear with positive or negative information. For example, a positive neutral cue ($n_t > 0$) selectively evokes other positive neutral cue ($n_k > 0$) memories. If $\sigma_{xn} > 0$, these memories will (on average) have $x_k > 0$ and hence the event is selectively evoking positive information memories.

These results relate to experimental findings that salient information has a greater effect on beliefs. Two experiments highlight the differential effect of evocativeness and vividness. Thompson, Reyes and Bower (1979) place subjects into the role of jurors, who are asked to read defense and prosecution witness testimony about a drunk driving case. One side's case was manipulated to be salient while the other's was manipulated to be pallid. After reading the two sides, subjects rate the guilt of the defendant and are asked to return the next day. When they return, they are asked to perform the rating again (they do not read the testimony again). Thompson et al. find that the salience manipulation has no effect on the first day's ratings. The lack of an immediate impact is comforting since it suggests that the salience manipulation did not also manipulate perceived informativeness. For example, we can rule out the possibility that subjects felt that a witness whose testimony contains more details was more reliable. The salience manipulation did, however, affect the second day's judgements of guilt: when the prosecution's (defense's) case was more salient, judgements of guilt rose (fell). One interpretation of these results is that the presence of additional cues (guacamole on white carpet) facilitates recall by marshaling associativeness. Vividness, as I have defined it, increases because these (irrelevant) cues, for example spilling something on a carpet, are commonly encountered ones. The increased vividness of one side's case means that memories over-represent evidence supporting that side.

Footnote: The salience manipulation was performed through adding inconsequential details to one side's testimony. For example, in describing the defendant about to leave a party and drive home, the pallid version states that he bumped into a table, and knocked a bowl to the floor. The salient version, on the other hand, states that he knocked a bowl of guacamole dip off a table and onto a white carpet.

Footnote: A weakness of the current model of the experiment should be pointed out. The guacamole on white carpet cue is effective not because it associates with current events but because it associates with past events. In other words, the model needs to allow not only for current events to form associations that facilitate recall, but also memories themselves should form associations that further facilitate recall. I expand on this when I discuss future extensions in Section 5.

Hamill, Wilson, and Nisbett (1979) present another experiment, one that resembles evocativeness more than vividness. As Nisbett and Ross (1980, p. 57) summarize, one set of subjects is presented with a description of a welfare recipient:

"The central figure was an obese, friendly, emotional, and irresponsible Puerto Rican woman who had been on welfare for many years. Middle-aged now, she had lived with a succession of 'husbands,' typically also unemployed, and had borne children by each of them. Her home was a nightmare of dirty and dilapidated plastic furniture bought on time at outrageous prices, filthy kitchen appliances, and cockroaches walking about in the daylight. Her children...attended school off and on and had begun to run afoul of the law in their early teens, with the older children now thoroughly enmeshed in a life of heroin, numbers-running and welfare."

Another group was given summary statistics on welfare recipients documenting the short median stay (two years) on welfare rolls and the small proportion that are on welfare for long periods of time (only 10% for longer than four years). These statistics contrasted sharply with the priors of control subjects. When the groups are asked to state their attitudes about welfare recipients, those receiving the story expressed far more unfavorable attitudes than a control group. Those receiving the statistics showed no difference.

Evocativeness provides one interpretation of these findings. The story that subjects read is overflowing with cues commonly found in evidence that paints welfare recipients in a poor light (drug use by children, immigrant, obese), whereas the pallid statistics lack such evocative cues. The story thereby triggers evidence from the past that also contains these cues, evidence that will generally be negative. It, therefore, prompts more negative attitudes towards welfare recipients. In this interpretation, it is not that the single case study is taken as informative.
Queried, subjects should state that of course they recognize that one story (especially a manufactured one) proves nothing, but that it reminds them of other previously encountered evidence. Of course, the pallid statistics do not possess such cues and, therefore, have lower evocativeness.

Footnote: That they have no effect, however, indicates either that individuals do not put much faith in the statistics (numbers can be manipulated) or that other factors are at play there. It is also worth noting that an implication of this interpretation is that the effect of the manipulation (seeing the story) should disappear over time. If subjects were brought in at later dates, the difference between treatment and control should diminish and eventually vanish.

Together, these experiments illustrate the contrast between vividness and evocativeness. The inessential cues in the testimony (e.g. guacamole on the carpet) will not (on average) trigger other memories that condemn or exonerate the defendant. They do, however, make the testimony more vivid, making it more likely to be remembered and influence beliefs in the future. On the other hand, the welfare mother story will selectively trigger memories. Its evocativeness means that it will have greater contemporaneous impact.

3.3 Over-reaction and Under-reaction

The previous propositions illustrate how information content alone does not determine an event's impact; the memories it triggers also matter. But these propositions do not tell us about how forecast errors will be biased. Associativeness implies that events trigger memories that convey similar information. Such an effect causes an over-reaction to news: today's events cause similar evidence to be over-represented in memory. The following proposition formalizes this idea.

Proposition 3. Forecast errors are negatively correlated with the information in the latest event:

$$Cov(y_t - \hat{y}^F_t,\, x_t) < 0.$$

The extent of this over-reaction increases with $\chi$ and $\lambda$:

$$\frac{\partial Cov(err^F_t, x_t)}{\partial \chi} < 0 \quad\text{and}\quad \frac{\partial Cov(err^F_t, x_t)}{\partial \lambda} < 0.$$

Proof: Note that $err^F_t = err_t + err^m_t$ and that $err_t$ is independent of $x_t$. Therefore, $Cov(err^F_t, x_t) = Cov(err^m_t, x_t)$. Calculating this:

$$E[err^m_t x_t] = \sum_{k=1}^{t-1}\lambda^{t-k}E[f_{kt}x_k x_t].$$

Using the fact that $x_k$ and $x_t$ are independent, we can write the expectation in the summand as $-\chi E[x_k x_t a_{kt}]$. Intuitively, $E[x_k x_t a_{kt}]$ is positive because $a_{kt}$ measures similarity. See Lemma 6. This implies that the overall covariance is negative. To get the comparative statics, let's complete the calculation:

$$E[err^m_t x_t] = -\chi E[x_k x_t a_{kt}]\,(\lambda + \lambda^2 + \cdots + \lambda^{t-1}) = -\chi E[x_k x_t a_{kt}]\,\frac{\lambda(1-\lambda^{t-1})}{1-\lambda}.$$

Partial differentiation shows that this decreases with $\chi$ and $\lambda$.

The effect of $\lambda$ is interesting. It happens because when $\lambda$ is large, the selective sampling of past memories becomes more important, since these memories enter with greater weight into the forecast rule. Intuitively, good information may lead to a rosier view of the past, which leads to forecasts that are too large, which leads to a negative forecast error. Over-reaction increases as $\chi$ rises because $\chi$ quantifies the importance of associativeness. Finally, the effect of $\lambda$ arises because it measures the importance of history and thereby the importance of selective recall. This is an extremely important point, which we will return to in Section 3.4.

Footnote: Evidence on over-reaction can be found in studies of the representativeness heuristic by Kahneman and Tversky (1972, 1973), Tversky and Kahneman (1971) and Grether (1980). These studies find that in forming assessments individuals place too little weight on base rate evidence and too much weight on the latest piece of information. A similar phenomenon arises in the form of perceptions of a "hot hand": individuals seeing a streak expect it to continue. While this model does not provide a compelling account of all the actual experimental evidence (in many of these, the relevant information is directly available and memory plays no role), it generates behavior in real settings that resembles the findings.

The previous proposition paints a picture of individuals over-reacting to information. Rehearsal, however, generates under-reaction. To see this, consider an individual who faces an uninformative event $e_t = (0, n_t)$ at time $t$. Suppose that this event evokes positive memories so that $\mathcal{E}(e_t) > 0$. Proposition 2 illustrates how beliefs over-react to this non-information: the positive memories it triggers result in forecasts that are too large. Since these memories are rehearsed, they will experience higher recall probabilities in future periods, meaning that forecasts will continue to be too large. As time goes on, they will decay towards the true value as the effect of the rehearsal on recall probabilities diminishes. To an outsider, the belief changes in later periods will seem as if they were under-reaction. At both times $t+j$ and $t+j+1$, she will see a downward adjustment, as the memories decay in each of those periods. The observer, therefore, sees a negative change followed by another predictable negative change, an apparent under-reaction to the first change. Formally, note from Lemma 4 that:

[equation not legible in the source]

If we difference this over time, we find:

$$E[\Delta\hat{y}^F_{t+j+1} \mid e_t = (0, n_t)] \;\propto\; \mathcal{E}(e_t)\,(\lambda\rho)^j(\lambda\rho - 1),$$

which illustrates the negative "drift" in beliefs that follows an over-reaction. In other words, future belief revisions are negatively proportional to the initial evocativeness. Notice that if $\rho$ were zero, this term would be zero, emphasizing the role of rehearsal. Beliefs will, therefore, appear to drift towards some equilibrium.

The intuition behind this finding is that there is more information in her forecast errors than the individual realizes: $err^F_t = err_t + err^m_t$. As with the perfect recall individual, the forecast error tells the forgetful individual that some permanent change has occurred ($err_t$). But it also tells the individual the way in which her memory is systematically biased ($err^m_t$). If she is positively surprised, the forgetful individual should infer that there probably has been a positive shock and that she is systematically under-sampling positive memories. I discuss this further in Section 3.4.

Slow adjustment arises even more intuitively in a slightly modified version of the model. Suppose there is a period where the individual observes a noisy event $e'_t$ before observing the true event $e_t$ (perhaps a rumor). An example might be the announcement of a government statistic followed by a revision. Abstracting away from $n_t$ for now, suppose that $x'_t$ equals $x_t$ plus noise. In this setup, once $x_t$ is revealed the individual should pay no attention to $x'_t$. But rehearsal combined with associativeness will imply that beliefs will still depend partly on $x'_t$ even after $x_t$ is revealed. Why? Because, even though the individual discards the information contained in $x'_t$, the set of memories it evoked have been rehearsed and they continue to have an impact in later periods.

Footnote: This result bears a little resemblance to the findings on belief perseverance (e.g. Ross, Lepper and Hubbard, 1975), the curse of knowledge (e.g. Camerer, Loewenstein and Weber, 1990) and on hindsight bias (Fischoff, 1982). Experiments in all three of these categories emphasize the inability of subjects to "undo" information. See Mullainathan (1998) for a discussion.

Finally, the following proposition shows that forecast errors can be positively auto-correlated.

Proposition 4. Let $T > t$. When events are very memorable ($f$ low, $\chi$ and $\rho$ large), then

$$Cov[err^F_t,\, err^F_T] > 0.$$

Proof: (Sketch) I will present a proof for the case where $t \to \infty$ to abstract from some details. The proof for the finite $t$ case is exactly the same but with more constants (that tend to zero as $t$ gets large) involved. The general strategy of the proof is as follows: (1) use the fact that $err^m_{t+1}$ can be written as a function of $err^m_t$ plus some terms; (2) substitute into $E[err^m_{t+1}err^m_t]$ to get a $Var(err^m_t)$ term plus some terms that resemble $E[x_k err^m_t]$; (3) these generate opposing signs, so that the variance term tends to dominate whenever the probability of forgetting is small.

For the first step in the proof, see Lemma 7, which shows that:

$$err^m_{t+1} = \rho\lambda\, err^m_t + \sum_{k=1}^{t}\lambda^{t+1-k}x_k\left(f - \chi a_{k(t+1)}\right).$$

Substitution into $E[err^m_{t+1}err^m_t]$ gives (step 2):

$$\rho\lambda\, Var(err^m_t) + \sum_{k=1}^{t}\lambda^{t+1-k}E\left[x_k\left(f - \chi a_{k(t+1)}\right)err^m_t\right].$$

Substitution for the latter gives:

$$\rho\lambda\, Var(err^m_t) + \lambda E\left[\left(f - \chi a_{t(t+1)}\right)x_t\, err^m_t\right] + \sum_{k=1}^{t-1}\sum_{j=1}^{t-1}\lambda^{2t+1-k-j}E\left[x_k x_j\left(f - \chi a_{k(t+1)}\right)f_{jt}\right].$$

The third term can be written as:

$$\sum_{k=1}^{t-1}\lambda^{2(t-k)+1}E\left[x_k^2\left(f - \chi a_{k(t+1)}\right)f_{kt}\right] + \sum_{k=1}^{t-1}\sum_{j \neq k}\lambda^{2t+1-k-j}\rho^{t-k}E\left[x_k x_j\left(f - \chi a_{k(t+1)}\right)\left(f - \chi a_{jk}\right)\right],$$

where, since $x_k$ and $x_j$ are independent, this can be written as:

$$\sum_{k=1}^{t-1}\lambda^{2(t-k)+1}E\left[x_k^2\left(f - \chi a_{k(t+1)}\right)f_{kt}\right] - \sum_{k=1}^{t-1}\sum_{j \neq k}\lambda^{2t+1-k-j}\rho^{t-k}E\left[x_k x_j\left(f - \chi a_{k(t+1)}\right)\chi a_{jk}\right].$$

Putting these terms together gives:

$$\rho\lambda\, Var(err^m_t) + \sum_{k=1}^{t-1}\lambda^{2(t-k)+1}E\left[x_k^2\left(f - \chi a_{k(t+1)}\right)f_{kt}\right] + \lambda E\left[\left(f - \chi a_{t(t+1)}\right)x_t\, err^m_t\right] - \sum_{k=1}^{t-1}\sum_{j \neq k}\lambda^{2t+1-k-j}\rho^{t-k}E\left[x_k x_j\left(f - \chi a_{k(t+1)}\right)\chi a_{jk}\right].$$

Now the first and second terms are clearly positive, whereas the third and fourth terms are clearly negative. The key insight is that the negative terms tend to zero as memorability gets large (as $f \to 0$), since these are predicated on having forgotten $x_t$ or $x_k$. Therefore, when memorability is sufficiently high, the overall expression is positive.

Positive covariance can be understood as overlapping samples. Forgetting is analogous to sampling events from history. Since the samples at times $t$ and $T$ draw from overlapping histories, correlations arise. Moreover, rehearsal implies that memories that were forgotten will be forgotten again, increasing the autocorrelation in forecast errors. The condition that $f$ must be sufficiently low occurs for the following reason. Suppose that $f$ is very large. Then the $x_t$ from the past will likely be forgotten and hence $x_t$ shows up with large weight in $err^m_{t+1}$. We know from Proposition 3 that $x_t$ is negatively correlated with $err^m_t$. This implies a negative auto-correlation.

The results so far illustrate two conflicting forces: over- and under-reaction. One advantage of a model such as this is that it allows us to trade off such effects and figure out when we expect one to arise over the other. The following proposition quantifies when belief changes are negatively correlated (over-reaction) and when they are positively correlated (under-reaction) with lagged information.

Proposition 5. Suppose that forgetting probabilities are small, so that $f$ is low and $\rho$ and $\chi$ are high. Then:

$$Cov[\Delta\hat{y}^F_{t+1},\, \Delta y_{t-1}] < 0 \tag{10}$$

When these probabilities are large, however:

$$Cov[\Delta\hat{y}^F_{t+1},\, \Delta y_{t-1}] > 0 \tag{11}$$

When the covariance is negative, then a change in $\lambda$ makes it more negative:

$$\frac{\partial Cov[\Delta\hat{y}^F_{t+1},\, \Delta y_{t-1}]}{\partial\lambda} < 0 \tag{12}$$

Proof: Now, $\Delta\hat{y}^F_{t+1} = \Delta\hat{y}_{t+1} - err^m_{t+1} + err^m_t$, and $\Delta\hat{y}_{t+1}$ is independent of all lagged information. Therefore, the covariance equals:

$$E[err^m_t\,\Delta y_{t-1}] - E[err^m_{t+1}\,\Delta y_{t-1}].$$

From Lemma 7, we can write $err^m_{t+1}$ in terms of $err^m_t$. Substituting for this gives:

$$(1-\lambda\rho)E[x_{t-1}err^m_t] - \sum_{k=1}^{t}\lambda^{t+1-k}E\left[\left(f - \chi a_{k(t+1)}\right)x_k x_{t-1}\right].$$

Reapplying Lemma 7 to $err^m_t$ gives:

$$(1-\lambda\rho)\lambda\rho E[x_{t-1}err^m_{t-1}] + (1-\lambda\rho)\sum_{k=1}^{t-1}\lambda^{t-k}E\left[\left(f - \chi a_{kt}\right)x_k x_{t-1}\right] - \sum_{k=1}^{t}\lambda^{t+1-k}E\left[\left(f - \chi a_{k(t+1)}\right)x_k x_{t-1}\right].$$

Note that $x_k$ and $x_{t-1}$ are independent in the summations for $k \neq t-1$, leaving:

$$(1-\lambda\rho)\lambda\rho E[x_{t-1}err^m_{t-1}] + (1-\lambda\rho)\lambda E\left[x_{t-1}^2\left(f - \chi a_{(t-1)t}\right)\right] - \lambda^2E\left[x_{t-1}^2\left(f - \chi a_{(t-1)(t+1)}\right)\right].$$

Define $C$ to be $E[x_{t-1}^2(f - \chi a_{(t-1)t})]$, which also equals $E[x_{t-1}^2(f - \chi a_{(t-1)(t+1)})]$. This gives:

$$(1-\lambda\rho)\lambda\rho E[x_{t-1}err^m_{t-1}] + C\lambda\left(1 - \lambda(1+\rho)\right).$$

Substituting for the first part from the proof of Proposition 3 gives:

$$-(1-\lambda\rho)\lambda\rho\,\chi\,\frac{\lambda(1-\lambda^{t-1})}{1-\lambda}\,E[a_{kt}x_k x_t] + C\lambda\left(1 - \lambda(1+\rho)\right) \tag{13}$$

Suppose events are very memorable, so that the forgetting probability $f$ is low and $\chi$ and $\rho$ are high. Then the second term is small or even negative. The first term is already negative, so that in this case the correlation is negative. Suppose, on the other hand, that events are easy to forget, so that $f$ is high and $\chi$ and $\rho$ are low. Then the first term tends to zero, while the second term is positive, implying a positive correlation.

Differentiating with respect to $\lambda$ gives an expression which is the same as equation (13) except that (i) it has been divided through by $\lambda$ and (ii) $1 - \lambda(1+\rho)$ has been replaced by $1 - 2\lambda(1+\rho)$. The first has no effect on the sign and the second only makes it more negative, since $C$ is positive and $1 - \lambda(1+\rho) > 1 - 2\lambda(1+\rho)$. Consequently, if equation (13) is negative the derivative with respect to $\lambda$ is also negative.

The intuition behind this proposition is that there are two effects that govern belief dynamics: forgetting and over-reaction. On the one hand, yesterday's information may have been forgotten, which means that the individual must learn it again. This induces a positive correlation between beliefs and lagged information. On the other, over-reaction means that the individual responded too much to $x_t$ when it occurred, meaning that beliefs must correct for this. This induces a negative correlation. When events are on average quite memorable, the over-reaction effect dominates. When events are readily forgotten, the individual must relearn old information. The dependency on $\lambda$ reflects the discussion in Section 3.4. A greater emphasis on history implies over-reaction is larger and takes a longer time to undo.

3.4 The Role of History

Recall that $\lambda$ measures the weight put on past values in the forecast. In this section I will examine how $\lambda$ mediates over-reaction and slow learning. I will argue that $\lambda$ can be measured easily and therefore the empirical tests involving $\lambda$ can actually be implemented.

Let's begin by considering the impact of forgetting an event.
The memory error equals:

$$err^m_t = \hat{y}_t - \hat{y}^F_t = \sum_{k=1}^{t-1}\lambda^{t-k}(1 - R_{kt})x_k = \sum_{k=1}^{t-1}\lambda^{t-k}F_{kt}x_k.$$

If $F_{kt} = 1$, so that event $e_k$ is forgotten, the memory error would go up by $\lambda^{t-k}x_k$. This shows that the impact of forgetting an event declines as time passes ($t - k$ gets large). Moreover, the rate of decline depends on $\lambda$. The larger $\lambda$, the larger is the effect of forgetting an event in the distant past. Why does this happen? Events provide perfect signals of the permanent shocks, so forgetting them means that this perfect signal is lost. In the absence of this signal, the individual learns about the permanent shock through observations of $y_t$ instead. As time proceeds, the information lost due to forgetting $x_k$ is slowly re-learned through the $y_t$. Since $y_t$ is noisy, however, this learning is slow. The more distant the memory, the more time there has been to learn about the permanent shock, but there will still be slow learning. Learning occurs at rate $\lambda$ because $\lambda$ measures the noise-signal ratio in $y_t$. When it is large, $y_t$ is a very noisy signal of permanent income and forgotten events are learned about very slowly. To summarize, $\lambda$ captures how quickly a forgotten event can be relearned through the $y_t$, and hence how quickly errors in memory are corrected.

Footnote: Given both these over- and under-reaction biases, it is natural to ask whether an individual might not be better off by simply ignoring their memory completely. Mullainathan (1998) shows that ignoring memory may reduce bias but almost surely raises variance (since information is lost).

Let's now return to rederiving how beliefs respond to an event $e_t$:

$$E[\hat{y}^F_t \mid e_t] = x_t - \sum_{k=1}^{t-1}\lambda^{t-k}E[x_k f_{kt} \mid e_t]$$

$$= x_t + \chi\sum_{k=1}^{t-1}\lambda^{t-k}E[x_k a_{kt} \mid e_t]$$

$$= x_t + \chi\,\mathcal{E}(e_t)\,(\lambda + \lambda^2 + \cdots + \lambda^{t-1}).$$

To get the second equation, we exploit the fact that $x_k$ is independent of $x_j$, as is $f_{k(t-1)}$. The third equation comes from the definition of $\mathcal{E}(e_t)$. To interpret this equation, notice that at time $t$ associativeness results in a selective sampling that has effect equal to the evocativeness, $\mathcal{E}(e_t)$. But as we've seen, the impact of recall mistakes on beliefs depends on $\lambda$. In the formula, we see that selectively recalling the events at time $k$ has impact $\lambda^{t-k}\mathcal{E}(e_t)$. Taking $t \to \infty$ for simplicity, the impact of selective recall is:

$$\chi\,\mathcal{E}(e_t)\,(\lambda + \lambda^2 + \lambda^3 + \cdots) = \chi\,\mathcal{E}(e_t)\,\frac{\lambda}{1-\lambda}.$$

Therefore, as $\lambda$ increases, the importance of selective recall increases. Intuitively, when $\lambda$ is large, the triggering of certain types of memories over others has bigger impacts because the past matters more.

We, therefore, see two basic properties of $\lambda$. It measures both the extent of over-reaction and how slowly individuals adjust their memory mistakes. These two observations are especially interesting since $\lambda$ can be measured in standard data sets.

Footnote: Carroll and Samwick (1995) present a technique that allows us to estimate $\lambda$. Notice that $Var(\Delta^d y_t) = d\sigma_x^2 + 2\sigma_\varepsilon^2$, where the two terms are the variances of the permanent and transitory shocks. By computing the variance of $\Delta^d y_t$ in the data for many different $d$ and regressing these variances on $d$, one can back out the variances.

4 Application to Consumption

Having developed the general results, I now apply the model to the consumption decision of individuals. Let $i$ index individuals, let $y_{it}$ represent income and $c_{it}$ denote consumption, and let $u(c)$ be the instantaneous utility function, taken to be the same across individuals. Suppose the individual maximizes discounted (subjective) expected utility, where the discount rate is $\delta$. Assume that she faces no borrowing or savings constraints and can borrow or save risklessly at a rate $r$, and that $\delta = \frac{1}{1+r}$. Further, impose a no-Ponzi game condition so that there is no infinite borrowing. Under these conditions, marginal utility of consumption will be equated: $u'(c_t) = u'(c_T)$ for all $T$. Taking a quadratic or log-utility function implies that consumption will be equalized across time.
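The variance-differencing idea in the footnote on Carroll and Samwick (1995) can be illustrated with a short simulation. This is a minimal sketch, not their procedure: it assumes income is a random walk plus white noise, so that the variance of the d-period difference is linear in d, and every name and parameter value here (`diff_variance`, `back_out_variances`, the shock sizes, the sample length) is chosen only for the example.

```python
import random


def diff_variance(y, d):
    """Sample variance of the d-period difference y[t] - y[t-d]."""
    diffs = [y[t] - y[t - d] for t in range(d, len(y))]
    mean = sum(diffs) / len(diffs)
    return sum((v - mean) ** 2 for v in diffs) / (len(diffs) - 1)


def back_out_variances(y, max_d):
    """OLS of Var(d-period difference) on d.

    Under the assumed process the slope estimates the permanent-shock
    variance and half the intercept estimates the transitory variance.
    """
    ds = list(range(1, max_d + 1))
    vs = [diff_variance(y, d) for d in ds]
    d_bar = sum(ds) / len(ds)
    v_bar = sum(vs) / len(vs)
    sxy = sum((d - d_bar) * (v - v_bar) for d, v in zip(ds, vs))
    sxx = sum((d - d_bar) ** 2 for d in ds)
    slope = sxy / sxx
    intercept = v_bar - slope * d_bar
    return slope, intercept / 2.0


# Simulate the assumed process: permanent random walk plus transitory noise.
rng = random.Random(0)
sigma_x, sigma_eps = 1.0, 2.0  # illustrative values only
permanent, y = 0.0, []
for _ in range(100_000):
    permanent += rng.gauss(0.0, sigma_x)
    y.append(permanent + rng.gauss(0.0, sigma_eps))

var_perm, var_trans = back_out_variances(y, max_d=8)
```

With a long simulated series the two estimates should land near the squared shock sizes used to generate the data. How these variances map into the weight on history depends on the forecasting rule defined earlier in the paper, which this sketch does not reproduce.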
In the current model, this allows us to write consumption as: Cit where ^o = and Aj(f+i) ^cu = = + r){Ait + yu - (1 i^^vu t — 1 This forecast error. determine consumption in this is is = Yqr^^y^^ + ^^^-d proportional to the change in income expectations plus permanent income considerations completely model. individual and an aggregate component. we write influences the individual. the assets. Differencing across time gives: yf(i-i) intuitive since Now, suppose that the income process individual specific one, Cit) is + yr{t-i) - In other words, the change in consumption the time = r"— ^zt + yu 1 + r yit = y^t is the Letting + '^iVt- sum yt of two components: be the aggregate component, and y° be the where a, measures how much the aggregate shock Both the aggregate and individual income components follow processes described so far and the individual income components are for both processes. Let q one specific to the iid across people. ^^ Events are observed be aggregate consumption. ^^Mullainathan (1998) presents an application to asset pricing as well. ^^There is a slight oddness in the results here. Income is normally distributed meaning that it might well be negative. Using a log-normal distribution would generate all the results here but with added technical complications. The goal here is simply to illustrate the kinds of results that arise rather than to flesh out a structural model. 27 In this simple Permanent Income setup, consumption changes should be unpredictable. Since they essentially represent belief changes, one should not be able to predict them on the basis of lagged information available to consumers. In contrast, the errors of the forgetful f6recaster lead to consumption predictability, and the pattern of this predictability can be pinned down under certain conditions. Prediction 1 Suppose: 1. Personal events are highly memorable and aggregate events are not very memorable 2. 
$a_i$ is small, then at the micro level:

$$Cov(\Delta c_{i(t+k)}, \Delta y_{it}) < 0 \qquad \text{and} \qquad \frac{\partial Cov(\Delta c_{i(t+k)}, \Delta y_{it})}{\partial a_i} > 0,$$

together with a negative partial derivative of the same covariance with respect to a second parameter (its symbol is illegible in the source), while at the aggregate level:

$$Cov(\Delta c_{t+k}, \Delta y_t) > 0.$$

To see how this prediction works, note that:

$$Cov(\Delta c_{i(t+k)}, \Delta y_{it}) = E[\Delta \hat{y}^m_{i(t+k)} \Delta y_{it}] + E[err^m_{i(t+k-1)} \Delta y_{it}]$$

Taking the first term, we can break it into the components due to the aggregate shock and the parts due to the idiosyncratic component:

$$E[\Delta \hat{y}^m_{i(t+k)} \Delta y_{it}] = E[\Delta \hat{y}^{I}_{i(t+k)} \Delta y^{I}_{it}] + a_i^2 E[\Delta \hat{y}^{A}_{t+k} \Delta y^{A}_{t}]$$

where, because of independence, I have dropped terms such as $E[\Delta \hat{y}^{I}_{i(t+k)} \Delta y^{A}_{t}]$. Applying Proposition 5, we know that the first term here is negative (we have assumed that personal events are very memorable), and that the second term is positive (we have assumed that aggregate events are forgotten). Therefore, if $a_i$ is small, the whole expression is negative. The second term is:

$$E[err^{I}_{i(t+k-1)} \Delta y^{I}_{it}] + a_i^2 E[err^{A}_{t+k-1} \Delta y^{A}_{t}]$$

where $err^{I}$ is the memory error for the idiosyncratic income component and $err^{A}$ the memory error for the aggregate component. Just as in the proof of Proposition 5, these correlations are negative when events are memorable and positive when events are easy to forget. Therefore, the first term here is negative and the second term positive, with the smallness of $a_i$ generating a negative sign for the sum. Putting this all together gives $Cov(\Delta c_{i(t+k)}, \Delta y_{it}) < 0$. The first partial derivative comes clearly from Proposition 5, whereas the partial with respect to $a_i$ comes from the fact that the aggregate contribution to the covariance is positive.

Suppose now that we aggregate up consumption and income. Since the idiosyncratic components of income and its forecasts are iid across people, aggregation produces zero for these. This gives a covariance that contains only the aggregate terms, scaled by $\bar{a}$, the average of $a_i$. Reapplying Proposition 5 as before tells us that this term is positive. This establishes the aggregate results.

Intuitively, the dominant effect is that individuals over-react to their private information. Their boss calls them in, tells them that they have a bright future, and this causes them to selectively recall other information that makes them think they have high ability, and hence, high permanent income. At the micro level, over-reaction dominates for the idiosyncratic components of income since these are memorable. The smallness of $a_i$ guarantees that the reaction to the aggregate information does not matter. As one aggregates up, the idiosyncratic over-reactions cancel out. Macro-covariances, therefore, depend on recall of the aggregate component. Because aggregate information will be forgotten, there is under-reaction to it. This leads to a positive covariance at the aggregate level.

The first assumption of differential memorability can be justified only by appeal to intuition (or perhaps through surveys): personal events may hold more memorability for consumers because they deal with many more everyday events than aggregate events. The second assumption receives some support in the data, as Pischke (1995) and others have argued that the aggregate component of individual income is small.

At the micro level, the first part of this prediction resembles "rule of thumb" consumers, ones who consume more of their income than permanent income considerations would justify. The prediction has generally (though not always) found support in the literature (Hall and Mishkin 1982, Hayashi 1985a, 1985b, Jappelli and Pagano 1988, and Mariger and Shaw 1990). The second and third predictions, however, have not been tested as far as I know. Finally, the macro prediction has received support, as seen in Campbell and Mankiw (1989). Deaton (1992) summarizes this evidence.

Now, suppose that we go back to a single individual, set $a_i = 0$, and allow for several different income streams. The marginal propensity to consume out of these different income streams will depend on the extent of that stream's evocativeness. Note, from Proposition 1, that the stronger the recruitment effect, the larger the forecast error and hence the stronger the mean reversion. Define $y_{st}$ to be income stream $s$ and $MPC_s$ to be the marginal propensity to consume out of stream $s$. Then:

Prediction 2 In general $MPC_s \neq MPC_{s'}$. Moreover,

$$MPC_s > MPC_{s'} \implies Cov(\Delta c_t, \Delta y_{s(t-1)}) < Cov(\Delta c_t, \Delta y_{s'(t-1)})$$

To see how this works, note that:

$$MPC_s = Cov(\Delta c_t, \Delta y_{st}) = E[\Delta \hat{y}^m_{t} \Delta y_{st}] + E[err^m_{(t-1)s} \Delta y_{st}]$$

Since $err^m_{(t-1)s}$ is independent of $\Delta y_{st}$, we can drop the second term. This leaves us with the first term, which we can write as:

$$E[\Delta \hat{y}_{s} \Delta y_{st}] - E[\Delta err^m_{st} \Delta y_{st}]$$

The first term here is the appropriate MPC in the absence of any memory mistakes. The second term represents the distortion, $E[err^m_{st} \Delta y_{st}]$, which depends on $\chi E[\mathcal{E}(e)x]$. This will in general be different for different income streams, especially since $E[\mathcal{E}(e)x]$ will vary. In other words, streams that have high evocativeness, where information about earnings in that stream relies heavily on soft information that has many cues, will have larger MPCs. The implication for lagged correlation comes directly from the discussion to date. The greater $E[\mathcal{E}(e)x]$, the greater the over-reaction and hence the greater the negative correlation to lagged income changes. Intuitively, the prediction follows because changes in different income streams invoke different "visceral" reactions.

Empirically, differences in MPCs have received some support (Thaler, 1990). Serious empirical difficulties arise, however. Empirical differences in propensities to consume may represent true differences in permanent income. Alternatively, they may represent differences in the informativeness of income changes. Yet another possibility is that they may represent differences in information between the econometrician and the individual due either to measurement error or private information. This makes testing such predictions heavily reliant on structural assumptions about the income process.
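The link in Prediction 2 between a stream's MPC and its lagged covariance can be illustrated with a small simulation. This is only a stylized sketch and not the model developed in this paper: it assumes a reduced form in which an income change in stream $s$ raises consumption by $(1 + \kappa_s)$ times the change today and the excess $\kappa_s$ is reversed one period later. The function name `simulate_stream_moments` and the over-reaction parameters `kappas` are hypothetical, and income changes are assumed to be iid standard normal so that covariances can be read as MPC-like coefficients.

```python
import random


def simulate_stream_moments(kappas, periods=100_000, seed=0):
    """Return, per stream, (Cov(dc_t, dy_st), Cov(dc_t, dy_s(t-1))).

    Stylized assumption (not the paper's model): consumption over-reacts to
    an income change in stream s by kappa_s and reverses that excess one
    period later. Income changes are iid N(0, 1) and independent across
    streams, so all means are zero and sample cross-moments estimate the
    covariances.
    """
    rng = random.Random(seed)
    n = len(kappas)
    prev = [0.0] * n          # last period's income changes
    sum_now = [0.0] * n       # running sum of dc_t * dy_st
    sum_lag = [0.0] * n       # running sum of dc_t * dy_s(t-1)
    for _ in range(periods):
        cur = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Over-reaction today, reversal of last period's over-reaction.
        dc = sum((1.0 + k) * c - k * p for k, c, p in zip(kappas, cur, prev))
        for s in range(n):
            sum_now[s] += dc * cur[s]
            sum_lag[s] += dc * prev[s]
        prev = cur
    return [(sum_now[s] / periods, sum_lag[s] / periods) for s in range(n)]


if __name__ == "__main__":
    # Stream 0 is weakly evocative, stream 1 strongly evocative (assumed values).
    for s, (mpc, lagged) in enumerate(simulate_stream_moments([0.1, 0.6])):
        print(f"stream {s}: contemporaneous cov = {mpc:.3f}, lagged cov = {lagged:.3f}")
```

Under these assumptions the stream with the larger contemporaneous response is also the one with the more negative lagged covariance, which is the qualitative pattern the prediction describes.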
On the other hand, the relationship between excess sensitivity and MPC has not been tested as far as I know, and the empirical difficulties here may be less severe.

5 Conclusion

To summarize, this paper has built a simple model of memory limitations. The model has been based on two basic facts drawn from scientific research on the topic: rehearsal and association. Interestingly, these two facts in combination generate several of the experimentally found biases in decision making under uncertainty. This suggests that memory limitations might be an important component for realistic models attempting a unified treatment of bounded rationality. The model also generates relevant predictions in the economic applications we have examined: consumption and asset pricing. We have also seen how previously untested predictions arise. Testing these predictions will provide a way of refuting this model. Many other applications are possible that have not been pursued here: advertising, subjective performance evaluation (where assessments of an individual may depend on intangible aspects of past performance), and bargaining situations (where opponents may disagree on the past) are a few of the examples. Each of these has its own subtleties.

Let me conclude by outlining two directions of future work. First, this paper has focused on the naive case. What does behavior in the sophisticated case look like? I have already given a flavor of the kinds of results that might arise in footnote 18. As pointed out there, the deviations from full rationality become no less interesting. Another point to be made here is that in the case of outsiders manipulating memory limitations, even if the mean effect is "taken out" due to sophistication, the possibility for manipulation can still have real effects. For example, if firms attempt to use advertising to manipulate memories but individuals attempt to undo it, the Nash Equilibrium can result in positive levels of advertising even though there will be no equilibrium distortion in beliefs. In other words, a standard "signal jamming" argument can be applied when advertising attempts to manipulate sophisticated players.

Second, associativeness as formulated in this paper has a failing. While current events can trigger related memories, the memories that one recalls cannot themselves trigger other memories, an extension I refer to as association chains. Allowing for such chains raises the possibility of multiple steady states in recall. Consider a world in which there are only two types of events, good and bad. For a fixed history, one possible steady state is that good events by chance have had high recall and bad events have had low recall. By rehearsal, good events also have high current recall probabilities $r_{kt}$. Such an individual appears optimistic since he systematically overrecalls good events. Moreover, when he encounters a good event, it will have higher recall in the future. The existing stock of good events has high recall and will, therefore, trigger this new event frequently through association chains, generating a great deal of rehearsal and raising its recall probability. Similarly, a bad event, by virtue of its association chains being with low recall probability bad events, will tend towards a low recall steady state. This optimist, therefore, not only systematically recalls positive information he has already received, he also has a propensity to better recall any good information he receives in the future. In other words, good information "sticks" to him while bad information "slides" off him. Symmetrically, there would be a pessimistic steady state. To understand the local dynamics between these steady states, consider an optimist who encounters a long sequence of negative information.
Their recency makes these bad events very memorable, and they form an association chain that can raise the recall probabilities of all bad events. Thus, a sequence of such events may push the individual to a pessimistic steady state. This sketch illustrates the possibilities of this approach.

References

Akerlof, George (1991). "Procrastination and Obedience," American Economic Review, 81(2), May, pp. 1-19.
Anderson, R. C., Pichert, J. W., Goetz, E. T., Schallert, D. L., Stevens, K. V., and Trollip, S. R. (1976). "Instantiation of General Terms," Journal of Verbal Learning and Verbal Behavior, 15, 667-679.
Banerjee, Abhijit (1992). "A Simple Model of Herd Behavior," Quarterly Journal of Economics, 107(3), 797-817.
Barberis, Nick, Shleifer, Andrei, and Vishny, Robert (1997). "A Model of Investor Sentiment," NBER Working Paper # 5926. Cambridge, MA: NBER.
Bartlett, F. C. (1932). Remembering. Cambridge: Cambridge University Press.
Bernard, Victor (1993). "Stock Price Reactions to Earnings Announcements: A Summary of Recent Anomalous Evidence and Possible Explanations," in Thaler (1993), pp. 303-40.
Camerer, Colin (1995). "Individual Decision Making," in Kagel and Roth (1995).
Camerer, Colin, Loewenstein, George, and Weber, Martin (1989). "The Curse of Knowledge in Economic Settings: An Experimental Analysis," Journal of Political Economy, 97(5), 1232-54.
Campbell, John, and Mankiw, N. Gregory (1989). "Consumption, Income, and Interest Rates," NBER Macroeconomics Annual 1989, ed. by Olivier Blanchard and Stanley Fischer. Cambridge, MA: MIT Press, 185-216.
Campbell, John, and Shiller, Robert (1988a). "The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors," Review of Financial Studies, 1, 195-227.
Campbell, John, and Shiller, Robert (1988b). "Stock Prices, Earnings and Expected Dividends," Journal of Finance, 43, 661-676.
Carroll, Christopher, and Samwick, Andrew (1995). "The Nature of Precautionary Wealth," NBER Working Paper # 5193. Cambridge, MA: National Bureau of Economic Research.
Conlisk, John (1996). "Why Bounded Rationality?" Journal of Economic Literature, 34(2), 669-700.
Crovitz, H. F., and Schiffman, H. (1974). "Frequency of Episodic Memories as a Function of their Age," Bulletin of the Psychonomic Society, 4, pp. 517-518.
Cutler, David, Poterba, James, and Summers, Lawrence (1989). "What Moves Stock Prices?" Journal of Portfolio Management, 15(3), 4-12.
De Long, Bradford, Shleifer, Andrei, Summers, Lawrence, and Waldmann, Michael (1990). "Noise Trader Risk in Financial Markets," Journal of Political Economy, 98(4), 703-38.
Deaton, Angus (1992). Understanding Consumption. Oxford: Oxford University Press.
Dow, James (1991). "Search Decisions with Limited Memory," Review of Economic Studies, 58(1), January, pp. 1-14.
Fischhoff, Baruch (1982). "For Those Condemned to Study the Past: Heuristics and Biases in Hindsight," in Kahneman, Slovic and Tversky (1982).
Fudenberg, Drew and Levine, David (1997). Theory of Learning in Games, mimeo, Harvard University.
Grether, David (1980). "Bayes Rule as a Descriptive Model: The Representativeness Heuristic," Quarterly Journal of Economics, 95(3), November, 537-57.
Hall, Robert and Mishkin, Frederic (1982). "The Sensitivity of Consumption to Transitory Income: Estimates from Panel Data on Households," Econometrica, 50, 461-81.
Hamill, R., Wilson, T. D., and Nisbett, R. E. (1979). "Ignoring Sample Bias: Inferences about Collectivities from Atypical Cases," unpublished manuscript, University of Michigan.
Harvey, Andrew (1993). Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
Hayashi, Fumio (1985a). "The Effect of Liquidity Constraints on Consumption: A Cross-Sectional Analysis," Quarterly Journal of Economics, 100, 183-206.
Hayashi, Fumio (1985b). "The Permanent Income Hypothesis and Consumption Durability," Quarterly Journal of Economics, 100, 1083-1113.
James, William (1890). The Principles of Psychology. Reprinted, Cambridge, MA: Harvard University Press (1983).
Jappelli, Tullio and Pagano, Marco (1988). "Liquidity Constrained Households in an Italian Cross-Section," Centre for Economic Policy Research, Discussion Paper: 257, August.
Kagel, John, and Roth, Alvin, eds. (1995). The Handbook of Experimental Economics. Princeton, NJ: Princeton University Press.
Kahneman, Daniel, and Tversky, Amos (1972). "Subjective Probability: A Judgement of Representativeness," Cognitive Psychology, 3, 430-54. Reprinted in Kahneman, Slovic, and Tversky (1982).
Kahneman, Daniel, and Tversky, Amos (1973). "Availability: A Heuristic for Judging Frequency and Probability," Cognitive Psychology, 4, 207-32. Reprinted in Kahneman, Slovic, and Tversky (1982).
Kahneman, Daniel, Slovic, Paul, and Tversky, Amos (1982). Judgement Under Uncertainty: Heuristics and Biases. New York, NY: Cambridge University Press.
Kandel, Eric, Schwartz, James, and Jessell, Thomas (1991). Principles of Neural Science, 3rd ed. New York, NY: Elsevier Science Publishing.
Laibson, David (1997). "A Cue Theory of Consumption," mimeo, Harvard University.
Lakonishok, Josef, Shleifer, Andrei and Vishny, Robert (1994). "Contrarian Investment, Extrapolation, and Risk," Journal of Finance, 49(5), pp. 1541-78.
Lo, Andrew and MacKinlay, A. Craig (1990). "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 431-468.
Mackintosh, N. J. (1983). Conditioning and Associative Learning. Oxford: Clarendon Press.
Mariger, Randall, and Shaw, Kathryn (1990). "Unanticipated Aggregate Disturbances and Tests of the Life-Cycle Model Using Panel Data," Board of Governors of the Federal Reserve, mimeo.
Muth, John (1960). "Optimal Properties of Exponentially Weighted Forecasts," Journal of the American Statistical Association, 55(290).
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Nisbett, Richard, and Ross, Lee (1980). Human Inference: Strategies and Shortcomings of Social Judgement. Englewood Cliffs, NJ: Prentice-Hall.
Pischke, Jorn-Steffen (1995). "Individual Income, Incomplete Information, and Aggregate Consumption," Econometrica, 63(4), July, pp. 805-840.
Rabin, Matthew (1997). "Psychology and Economics," mimeo, University of California-Berkeley.
Reder, Lynne, editor (1996). Implicit Memory and Metacognition, Carnegie Mellon Symposia on Cognition. Mahwah, N.J.: Lawrence Erlbaum.
Roll, Richard (1984). "Orange Juice and Weather," American Economic Review, 74, 861-80.
Ross, L., Lepper, M. R., and Hubbard, M. (1975). "Perseverance in Self Perception and Social Perception: Biased Attributional Processes in the Debriefing Paradigm," Journal of Personality and Social Psychology, 32, pp. 880-892.
Rumelhart, David, McClelland, James, and the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.
Sargent, Thomas (1993). Bounded Rationality in Macroeconomics. Arne Ryde Memorial Lectures. Oxford and New York: Oxford University Press.
Schacter, Daniel (1996). Searching for Memory: The Brain, the Mind, and the Past. New York, NY: Basic Books.
Shiller, Robert (1989). Market Volatility. Cambridge, MA: MIT Press.
Shleifer, Andrei, and Vishny, Robert (1997). "The Limits of Arbitrage," Journal of Finance, 52(1), 35-55.
Stigler, George, and Becker, Gary (1977). "De Gustibus Non Est Disputandum," American Economic Review, 67(2), 76-90.
Thaler, Richard (1990). "Saving, Fungibility, and Mental Accounts," Journal of Economic Perspectives, 4(1), 193-205.
Thaler, Richard (1993). Advances in Behavioral Finance. New York, NY: Russell Sage.
Thompson, W. C., Reyes, R. M., and Bower, G. H. (1979). "Delayed Effects of Availability on Judgement," manuscript, Stanford University.
Trope, Yaacov (1978). "Inferences of Personal Characteristics on the Basis of Information Retrieved from One's Memory," Journal of Personality and Social Psychology,
36, 93-106. Reprinted in Kahneman, Slovic and Tversky (1982), 378-390.
Tulving, E. and Schacter, D. L. (1990). "Priming and Human Memory Systems," Science, 247, 301-306.
Tulving, E. and Thomson, D. M. (1973). "Encoding Specificity and Retrieval Processes in Episodic Memory," Psychological Review, 80, 352-373.
Tversky, Amos, and Kahneman, Daniel (1971). "Belief in the Law of Small Numbers," Psychological Bulletin, 2, 105-110. Reprinted in Kahneman, Slovic, and Tversky (1982).
Waldfogel, Joel (1993). "The Deadweight Loss of Christmas," American Economic Review, 83(5), December, pp. 1328-36.

Appendix

Lemma 1 The optimal forecast satisfies:

$$\hat{y}_t(h_t, e_t) = x_t + \sum_{k=1}^{t-1} \left[ w_{k,t} x_k + (1 - w_{k,t})(y_k - y_{k-1}) \right]$$

where the weights $w_{k,t}$ are products of the period-by-period Kalman ratios $\pi$, and $\lambda$ denotes the steady-state error to truth ratio (the defining expressions are illegible in the source). In the limit, $w_{k,t} \to \lambda^{t-k}$.

Proof: Computing the optimal forecast is a straightforward application of the Kalman filter; see ch. 4, Harvey (1993). (A derivation for the steady state can be found in Muth (1960).) Given the forecast rule, computing the steady state requires setting $\sigma^2_t = \sigma^2_{t+1}$ and solving the resulting quadratic. As $t \to \infty$, $\pi_{kt}$ converges to its steady-state value, meaning that $w_{kt} \to \lambda^{t-k}$.

Lemma 2 Forgetting probabilities satisfy:

$$E[f_{k(t+j)} x_k | e_t] = \begin{cases} 0 & \text{if } k > t \\ \frac{1 - \rho^j}{1 - \rho} \underline{f} x_t & \text{if } k = t \\ -\rho^j \chi \mathcal{E}(e_t) & \text{if } k < t \end{cases}$$

Proof: When $k > t$, $f_{k(t+j)} x_k$ depends only on events at time greater than $t$. Independence across time, therefore, shows that $E[f_{k(t+j)} x_k | e_t] = 0$ in this case. When $k = t$, $E[f_{k(t+j)} x_k | e_t] = x_t E[f_{t(t+j)} | e_t]$. Breaking this apart:

$$E[f_{t(t+j)} | e_t] = E[\underline{f} - \chi a_{t(t+j)} | e_t] + \rho E[\underline{f} - \chi a_{t(t+j-1)} | e_t] + \cdots + \rho^{j-1} E[\underline{f} - \chi a_{t(t+1)} | e_t]$$

This equals $\frac{1 - \rho^j}{1 - \rho} \underline{f}$. Finally, when $k < t$, note that:

$$E[x_k f_{k(t+j)} | e_t] = E[x_k(\underline{f} - \chi a_{k(t+j)}) | e_t] + \rho E[x_k(\underline{f} - \chi a_{k(t+j-1)}) | e_t] + \cdots + \rho^{t+j-k-1} E[x_k(\underline{f} - \chi a_{k(k+1)}) | e_t]$$

Notice that all terms here are zero except $\rho^j E[x_k(\underline{f} - \chi a_{kt}) | e_t]$. Even here, by independence, $E[x_k \underline{f} | e_t] = 0$. This gives:

$$E[x_k f_{k(t+j)} | e_t] = -\chi \rho^j E[a_{kt} x_k | e_t] = -\chi \rho^j \mathcal{E}(e_t)$$

Lemma 3 Conditioning on $e_t$, time $t$ beliefs satisfy:

$$E[\hat{y}^m_t | e_t] = x_t + \chi \mathcal{E}(e_t) \frac{\lambda}{1 - \lambda}(1 - \lambda^{t-1})$$

Proof: Notice that $\hat{y}^m_t = \hat{y}_t - err^m_t$. This allows writing:

$$E[\hat{y}^m_t | e_t] = E[\hat{y}_t | e_t] - E[err^m_t | e_t]$$

Now, $E[\hat{y}_t | e_t] = x_t$. The second term can be written as:

$$-E[err^m_t | e_t] = -\sum_{k=1}^{t-1} \lambda^{t-k} E[x_k f_{kt} | e_t]$$

By Lemma 2, $E[x_k f_{kt} | e_t] = -\chi \mathcal{E}(e_t)$. Substitution provides that:

$$E[\hat{y}^m_t | e_t] = x_t + \chi \mathcal{E}(e_t)(\lambda + \lambda^2 + \cdots + \lambda^{t-1}) = x_t + \chi \mathcal{E}(e_t) \frac{\lambda}{1 - \lambda}(1 - \lambda^{t-1})$$

Lemma 4 Conditioning on $e_t$, time $t + j$ beliefs satisfy:

$$E[\hat{y}^m_{t+j} | e_t] = x_t \left( 1 - \lambda^j \frac{\underline{f}}{1 - \rho}(1 - \rho^j) \right) + (\rho \lambda)^j \chi \mathcal{E}(e_t) \frac{\lambda}{1 - \lambda}(1 - \lambda^{t-1})$$

Proof: Again, notice that $\hat{y}^m_{t+j} = \hat{y}_{t+j} - err^m_{t+j}$. The conditional expectation of the first term with respect to $e_t$ equals $x_t$. The conditional expectation of the second term equals:

$$-E[err^m_{t+j} | e_t] = -\sum_{k=1}^{t+j-1} \lambda^{t+j-k} E[x_k f_{k(t+j)} | e_t]$$

Applying Lemma 2 tells us that the summands in this summation are zero for $k > t$. This leaves:

$$-\lambda^j x_t E[f_{t(t+j)} | e_t] - \lambda^j \sum_{k=1}^{t-1} \lambda^{t-k} E[x_k f_{k(t+j)} | e_t]$$

Again applying Lemma 2 to $E[x_k f_{k(t+j)} | e_t]$ provides:

$$-\lambda^j x_t E[f_{t(t+j)} | e_t] + \lambda^j \rho^j \chi \mathcal{E}(e_t) \sum_{k=1}^{t-1} \lambda^{t-k} = -\lambda^j x_t E[f_{t(t+j)} | e_t] + (\lambda \rho)^j \chi \mathcal{E}(e_t) \frac{\lambda}{1 - \lambda}(1 - \lambda^{t-1})$$

Putting these together:

$$E[\hat{y}^m_{t+j} | e_t] = x_t \left( 1 - \lambda^j E[f_{t(t+j)} | e_t] \right) + (\lambda \rho)^j \chi \mathcal{E}(e_t) \frac{\lambda}{1 - \lambda}(1 - \lambda^{t-1})$$

Finally, Lemma 2 allows us to write $E[f_{t(t+j)} | e_t] = \frac{\underline{f}}{1 - \rho}(1 - \rho^j)$. Substitution gives the stated formula.

Lemma 5 The following are true:

$$sign(E[x_k c(x - x_k) | x]) = sign(x)$$
$$sign(E[n_k c(n - n_k) | n]) = sign(n)$$

Proof: I will show the first of the two equations; the proof for the second is exactly the same.
The expectation is:

$$E[x_k c(x - x_k) | x] = \int_{-\infty}^{\infty} x_k c(x - x_k) dF_k$$

Breaking the integral at zero and applying symmetry of the $x$ distribution gives:

$$\int_{0}^{\infty} x_k [c(x - x_k) - c(x + x_k)] dF_k$$

Since $x_k > 0$ in this equation, the sign of it equals $sign(c(x - x_k) - c(x + x_k))$. Now, $c(x - x_k) > c(x + x_k)$ if and only if $x$ is closer to $x_k$ than to $-x_k$. Formally, since $c(\cdot)$ measures closeness, $c(x - x_k) - c(x + x_k) > 0$ if and only if $|x + x_k| > |x - x_k|$. Squaring both sides gives $(x + x_k)^2 - (x - x_k)^2 > 0$. This is equivalent to $(2x)(2x_k) > 0$. Since $x_k > 0$, this happens if and only if $x$ is positive. This shows that:

$$sign(E[x_k c(x - x_k)]) = sign(x)$$

Lemma 6 Associativeness implies:

$$E[x_k x_t a_{kt}] > 0$$

Proof: Note that:

$$E[x_k x_t a_{kt}] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_k x_t c(x_k - x_t) dF_k dF_t$$

Breaking apart the integrals allows us to write:

$$\left( \int_{0}^{\infty} \int_{0}^{\infty} + \int_{0}^{\infty} \int_{-\infty}^{0} + \int_{-\infty}^{0} \int_{0}^{\infty} + \int_{-\infty}^{0} \int_{-\infty}^{0} \right) x_k x_t c(x_k - x_t) dF_k dF_t$$

Applying the transformations $x_k \mapsto -x_k$ and $x_t \mapsto -x_t$ to the integrals over negative values, and using the symmetry of the $F$ distribution and of $c(\cdot)$, this becomes:

$$2 \int_{0}^{\infty} \int_{0}^{\infty} x_k [c(x_k - x_t) - c(x_k + x_t)] x_t dF_k dF_t$$

which, as we saw in the previous proof, is positive since for positive $x_k$ and $x_t$, $c(x_k - x_t) > c(x_k + x_t)$.

Lemma 7 Forecast errors satisfy:

$$err^m_{t+1} = \rho \lambda err^m_t + \sum_{k=1}^{t} \lambda^{t+1-k} x_k (\underline{f} - \chi a_{k(t+1)})$$

Proof: Write:

$$err^m_{t+1} = \sum_{k=1}^{t} \lambda^{t+1-k} x_k f_{k(t+1)}$$

Using the fact that $f_{k(t+1)} = \rho f_{kt} + \underline{f} - \chi a_{k(t+1)}$, we get:

$$err^m_{t+1} = \rho \sum_{k=1}^{t} \lambda^{t+1-k} x_k f_{kt} + \sum_{k=1}^{t} \lambda^{t+1-k} x_k (\underline{f} - \chi a_{k(t+1)})$$

Substituting in for $err^m_t$ in the first term gives:

$$err^m_{t+1} = \rho \lambda err^m_t + \sum_{k=1}^{t} \lambda^{t+1-k} x_k (\underline{f} - \chi a_{k(t+1)})$$

completing the proof.

Lemma 8 The variance $Var[err^m_t | e_t]$ is less than $Var[err_t | e_t]$ for small $\chi$ and increases with $\chi$.

Proof: Note that $Var[err^m_t | e_t]$ equals:

$$\sum_{k=1}^{t-1} \sum_{j=1}^{t-1} \lambda^{t-k} \lambda^{t-j} E[f_{kt} f_{jt} x_k x_j | e_t]$$

When $\chi$ is close to zero, notice that the non-diagonal terms, where $k \neq j$, are also close to zero. To see this, notice that these terms equal:

$$E[(\underline{f} + \rho f_{k(t-1)} - \chi a_{kt})(\underline{f} + \rho f_{j(t-1)} - \chi a_{jt}) x_k x_j | e_t] \approx \rho^2 E[f_{k(t-1)} f_{j(t-1)} x_k x_j] + \chi^2 E[a_{kt} a_{jt} x_k x_j | e_t]$$

The last approximation is close to zero because the second term directly goes to zero as $\chi$ does, and the first term goes to zero since $f_{k(t-1)}$ and $f_{j(t-1)}$ only depend on each other through associativeness, as seen in Lemma 2. Since the non-diagonal terms are close to zero, let's focus on the diagonal terms:

$$E[(\underline{f} + \rho f_{k(t-1)} - \chi a_{kt})^2 x_k^2 | e_t]$$

Again, as $\chi$ goes to zero, this becomes a constant (as usual, taking $t \to \infty$), $\left( \frac{\underline{f}}{1 - \rho} \right)^2$, times $E[x_k^2]$. And, since $\frac{\underline{f}}{1 - \rho}$ is less than 1, this whole term will be less than $E[x_k^2]$. Therefore,

$$Var[err^m_t | e_t] \approx \left( \frac{\underline{f}}{1 - \rho} \right)^2 \sum_{k=1}^{t-1} \lambda^{2(t-k)} E[x_k^2] < \sum_{k=1}^{t-1} \lambda^{2(t-k)} E[x_k^2] = Var[err_t | e_t]$$

To see the increase with $\chi$, first note that in the derivation above, as $\chi$ increases, the non-diagonal elements increase: Lemma 6 tells us that the cross terms will be positive. Similarly, in the derivation of the diagonal elements, these also increase with $\chi$. The sum of these terms, therefore, rises with $\chi$.

Finally, the non-diagonal terms illustrate an important phenomenon. Consider $Var[err_t | e_t] = \sum_{k=1}^{t-1} \lambda^{2(t-k)} E[x_k^2]$. It contains no such cross-terms. They exist only because of associativeness. This is intuitive: associativeness raises variance by systematically introducing a correlation in the recalled information. When these cross terms are sufficiently large (for example, as $\chi$ gets large), then $Var[err^m_t | e_t]$ may even be larger than $Var[err_t | e_t]$.
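The sign results in Lemmas 5 and 6, which drive the positive cross terms used in Lemma 8, can be checked numerically. The sketch below is only an illustration under stated assumptions: shocks are taken to be iid standard normal, and the closeness function is assumed to be $c(d) = e^{-d^2}$, which is symmetric and decreasing in $|d|$ but is not specified in the text. The function names are hypothetical.

```python
import math
import random


def closeness(d):
    """Assumed closeness function c(d): symmetric and decreasing in |d|."""
    return math.exp(-d * d)


def conditional_moment(x, draws=100_000, seed=1):
    """Monte Carlo estimate of E[x_k * c(x - x_k) | x] with x_k ~ N(0, 1).

    Lemma 5 says the sign of this quantity matches the sign of x.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        xk = rng.gauss(0.0, 1.0)
        total += xk * closeness(x - xk)
    return total / draws


def cross_moment(draws=100_000, seed=0):
    """Monte Carlo estimate of E[x_k * x_t * c(x_k - x_t)] for iid N(0, 1) draws.

    Lemma 6 says this cross moment is positive under associativeness.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        xk = rng.gauss(0.0, 1.0)
        xt = rng.gauss(0.0, 1.0)
        total += xk * xt * closeness(xk - xt)
    return total / draws


if __name__ == "__main__":
    print("E[x_k c(x - x_k) | x = +1]:", conditional_moment(1.0))
    print("E[x_k c(x - x_k) | x = -1]:", conditional_moment(-1.0))
    print("E[x_k x_t c(x_k - x_t)]:  ", cross_moment())
```

With these assumed choices, the conditional moment takes the sign of the current shock and the cross moment is positive, matching the claim that associativeness introduces a positive correlation in recalled information.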