Journal of Experimental Psychology: Applied 2008, Vol. 14, No. 3, 266 –275 Copyright 2008 by the American Psychological Association 1076-898X/08/$12.00 DOI: 10.1037/1076-898X.14.3.266 Correcting Memory Improves Accuracy of Predicted Task Duration Michael M. Roy Scott T. Mitten and Nicholas J. S. Christenfeld Elizabethtown College University of California, San Diego People are often inaccurate in predicting task duration. The memory bias explanation holds that this error is due to people having incorrect memories of how long previous tasks have taken, and these biased memories cause biased predictions. Therefore, the authors examined the effect on increasing predictive accuracy of correcting memory through supplying feedback for actual task duration. For Experiments 1 (paper-counting task) and 2 (essay-writing task), college students were supplied with duration information about their previous performance on a similar task before predicting task duration. For Experiment 3, participants were recruited at various locations, such as fast food restaurants and video arcades, and supplied with average task duration for others before predicting how long the task would take. In all 3 experiments, supplying feedback increased predictive accuracy. Overall, results indicate that, when predicting duration, people do well when they rely not on memory of past task duration but instead on measures of actual duration, whether their own or that of others. Keywords: memory bias, planning fallacy, time, prediction, feedback papers, performing various school and personal tasks, and completing computer assignments (Buehler, Griffin, & Ross, 1994; Connolly & Dean, 1997; Griffin & Buehler, 1999; Koole & Van’t Spijker, 2000; Newby-Clark, Ross, Buehler, Koehler, & Griffin, 2000; Taylor, Pham, Rivkin, & Armor, 1998). However, not all studies have found underestimation. Participants performing tasks lasting fewer than 5 min, such as solving anagrams (Thomas, Handley, & Newstead, 2004), completing the Tower of Hanoi puzzle (Thomas, Newstead, & Handley, 2003), buying a stamp, looking up a book in the library, filling out a biographical form, and sorting a deck of cards (Burt & Kemp, 1994), tended to overestimate task duration. It tends to be for the longer tasks, many taking hours, days, or even weeks to complete, that underestimation is found. There appears to be a shift in bias at some point, with short tasks overestimated and long tasks underestimated. We have found, for example, that participants completing a short paper-counting task (50 pages) tended to overestimate duration, whereas participants completing a longer version of the same task (more than 100 pages) underestimated duration (Roy & Christenfeld, 2008). Underestimation also occurs more frequently for familiar tasks, whereas novel tasks are often overestimated (Boltz et al., 1998; Hinds, 1999). In a study using an origami task, participants new to the task overestimated how long it would take, whereas those familiar with the task underestimated the completion time (Roy & Christenfeld, 2007). A possible reason for this tendency to make biased predictions of future task durations is that, in making such predictions, people consult memories of past durations, and those memories are systematically biased. That is, memories of previous task duration are incorrect; therefore, predictions of future duration for similar tasks are also incorrect. This view, which we have termed the memory bias account (Roy, Christenfeld, & McKenzie, 2005b), suggests that it is error in memory that causes a corresponding error in prediction. Memories of previous task duration are likely to be based on estimates because it is rare that people precisely monitor duration People are often called on to predict when tasks will be finished. Whether it is in the context of planning the rest of the day or promising when an important work project will be completed, the ability to predict task duration is a critical skill. However, people’s predictions are often biased, specifically in the direction of underestimating when a task will be finished. Too often, deadlines come and go while a task remains incomplete, or perhaps even unstarted, and missed deadlines, even when they are self-imposed, can have numerous negative consequences. Studies have found underestimation for an impressive variety of tasks, including tasks performed in a laboratory setting, such as building computer stands, making origami objects (Byram, 1997), formatting a manuscript, making hors d’oeuvres (Kruger & Evans, 2004), doing spell-check tasks (Francis-Smythe & Robertson, 1999), reading a manuscript (Josephs & Hahn, 1995), counting a large stack of paper (Roy & Christenfeld, 2008), looking up items in a catalogue (König, 2005), and playing very familiar piano pieces (Boltz, Kupperman, & Dunne, 1998); and tasks performed in natural settings, such as waiting in line for gas (Koneçni & Ebbesen, 1976), programming software (Jorgensen & Sjoberg, 2001), completing tax forms (Buehler, Griffin, & MacDonald, 1997), finishing Christmas shopping (Buehler & Griffin, 2003), creating Web-development projects (Molokken-Ostvold & Jorgensen, 2005), performing group projects (Buehler, Messervey, & Griffin, 2005), writing Michael M. Roy, Department of Psychology, Elizabethtown College; Scott T. Mitten and Nicholas J. S. Christenfeld, Department of Psychology, University of California, San Diego. Part of this work was supported by the National Institute of Mental Health, National Research Service Award MH14257 to the University of Illinois. A portion of this work was completed while Michael M. Roy was a postdoctoral trainee in the Quantitative Methods Program of the Department of Psychology, University of Illinois at Urbana–Champaign. Correspondence concerning this article should be addressed to Michael Roy, Department of Psychology, Elizabethtown College, 1 Alpha Drive, Elizabethtown, PA 17022. E-mail: roym@etown.edu 266 IMPROVING PREDICTIVE ACCURACY while they perform a task, especially one completed over multiple sessions. These estimates may not be accurate, and previous research has indicated that people show an overall tendency to underestimate task duration in retrospect, remembering tasks they have completed as having taken less time than they actually did (see Block & Zakay, 1997; Fraisse, 1963; Poynter, 1989; Wallace & Rabin, 1960, for reviews). For example, participants asked to reconstruct their day by supplying all the day’s activities and the duration for each generally produced a schedule less than 24 hours in length (Kahneman, Krueger, Schkade, Schwarz, & Stone, 2004). Consulting biased memories to make future predictions should bias prediction in the same way. In support of the memory bias account, research has indicated that tasks that are likely to be remembered as taking longer than they actually did, such as novel tasks (Roy & Christenfeld, 2007) or short tasks (Roy & Christenfeld, 2008), are also likely to be predicted to take longer than they actually will. Furthermore, research has found that error in memory for previous task duration is predictive of error in estimation for how long it will take to repeat the task (Thomas et al., 2004), and that completing a shorter version of a task before prediction leads to underestimation and completing a longer version before prediction leads to overestimation (Thomas et al., 2003; Thomas, Handley, & Newstead, 2007). It may be that when predicting task duration, people consult a general representation that they have for that task that is based on past experience. This past experience could be performing the task directly or observing others. Prediction then could be made by using this general representation as an anchor and adjusting prediction up or down on the basis of the specific task at hand. In this way, the process of predicting task duration may be similar to that of remembering task duration using reconstructive memory, in which estimation is likely made by calling on a general representation for task duration and adjusting up or down depending on memory of whether or not the task was shorter or longer than average (Burt, 1992; Burt & Kemp, 1991; Burt, Kemp, & Conway, 2001). If the general representation used as an anchor is biased, then after adjustment is made for the specific characteristics of the task at hand, the resulting prediction will likely still be biased. In support, it has been found that people do at times employ an anchoring and adjustment strategy when predicting task duration, with high anchors leading to overestimation and low anchors leading to underestimation (König, 2005; Thomas & Handley, 2008). The majority of previous explanations for bias in estimation of future task duration, in contrast, have been based on the assumption that it is the predictive process that is flawed. Investigators have proposed that, in making predictions, people might be overly optimistic and disregard the possibility that surprises or interruptions could arise during the task (Byram, 1997; Hinds, 1999; Kahneman & Tversky, 1979, 1982). It has also been suggested that people use a top-down approach to planning, predicting for the task as a whole and failing to sufficiently weight the various components of the task (Byram, 1997; Connolly & Dean, 1997; Jorgensen, 2004). On the other hand, underestimation may result from people using a bottom-up process when planning, so that, when listing the individual components of a task, they neglect key subcomponents in the process (Jorgensen, 2004; MolokkenOstvold & Jorgensen, 2005). Another possibility is that people form an overly optimistic scenario of how events will unfold and, 267 in so doing, fail to imagine other potential courses the process could take (Buehler, Griffin, & Ross, 2002; Byram, 1997; Kahneman & Tversky, 1979, 1982; Newby-Clark et al., 2000). People may simply be overly optimistic about their future in general (Armor & Taylor, 1998). Finally, it has been suggested that people may, in making their predictions, disregard their memories of how long similar tasks have taken in the past, and so ignore relevant prognostic information (Buehler et al., 1994, 2002; Hinds, 1999; Kahneman & Tversky, 1979, 1982). Among explanations for errors in prediction, the most developed is the planning fallacy (Kahneman & Tversky, 1979, 1982), which incorporates a number of the specific possible mechanisms discussed above. The planning fallacy attempts to explain the seeming paradox that people continue to underestimate how long it will take them to complete future tasks, even though they are aware that similar tasks have taken longer than planned in the past. For example, a number of experiments have found that people predict that projects such as completing term papers and tax forms will be finished well before their deadlines, even though they realize that in the past similar projects were finished very close to deadline (see Buehler et al., 2002, for review). To explain this phenomenon, it was proposed that people construct an overly optimistic scenario of how a task will be completed and ignore their memories for how long similar tasks have taken in the past. It is the narrow focus on the task at hand, along with a tendency to be overly optimistic about the future, that produces underestimation of future task durations. This narrow focus causes people to disregard their memories of how long similar tasks have taken previously, as well as leading them to discount the possibility of surprises or interruptions that may delay completion. Based on this view, Kahneman and Tversky (1979, 1982) proposed that prediction would be improved if memories of past completion times were fully consulted during the prediction process. Using this framework, a number of experiments have attempted to increase the accuracy of prediction. The majority of these have aimed to correct the flawed prediction process by increasing the salience of similar past experiences or the salience of alternative ways the task could unfold. Participants have been asked to reflect on past completion times (Buehler et al., 1994; Hinds, 1999), to break down tasks into their individual components (Byram, 1997; Connolly & Dean, 1997; Kruger & Evans, 2004), to list possible surprises that could arise during the task (Byram, 1997; Hinds, 1999), to form alternative scenarios of how the task might be completed (Byram, 1997; Connolly & Dean, 1997), and to examine the problem as observers instead of as actors (Buehler et al., 1994; Byram, 1997; Newby-Clark et al., 2000). For the most part, these attempts have failed to improve the overall accuracy of prediction. Although a few of the interventions produced a decrease in the tendency to underestimate (Buehler et al., 1994; Connolly & Dean, 1997; Kruger & Evans, 2004), they did not consistently produce improvements in other measures of accuracy such as overall accuracy (average error regardless of sign) and relative accuracy (correlation between estimated and actual duration). It appears that although these interventions may have at times produced a reduction in error at the level of group means, they did not also produce a reduction in variability of prediction. Although these interventions found some success, they are unlikely to be helpful to the individual trying to improve prediction for a single project. 268 ROY, MITTEN, AND CHRISTENFELD We have focused here on interventions aimed at changing the way that predictions are made, but it should be acknowledged that researchers have found some success using other interventions, such as altering behavior subsequent to prediction using implementation intentions (Koole & Van’t Spijker, 2000; Taylor et al., 1998) and, in the case of novel tasks, supplying experience with the task (Thomas et al., 2007). Taken together, the persistent failure of attempts to improve overall predictive accuracy by increasing the salience of past experience suggests two possible explanations. One, based on the planning fallacy, is that people are extremely resistant to using information about past experience, even when it is made as explicit as experimenters can manage (Griffin & Buehler, 2005). The other, based on a memory bias account, is that the error is due not to failure to use memory appropriately but instead to the inaccuracy of the memories themselves. To explore the second explanation, we examined an alternative method for improving predictive accuracy: explicitly correcting memory. Although no previous experiments have directly, experimentally examined the effect of receiving accurate feedback of past completion times on prediction, some indication of the efficacy of this method comes from a study in which all participants were given feedback about the actual duration of practice trials for an anagram task before predicting how long it would take them to perform the task again (Buehler et al., 1997). Here, on the whole, participants were very accurate in their predictions, showing little bias. Another study found that when software designers were predicting how long a project would take, prediction was more accurate if the designers could identify previous projects that were similar and use information on how long those projects had taken (Jorgensen, 2004). These studies indicate that supplying feedback of actual task duration before making a new prediction may be a viable way of increasing predictive accuracy, and one that warrants further investigation. Although the effect of feedback on prediction of duration is not well understood, it has been shown that receiving performance feedback can increase accuracy when people are asked to reproduce short periods of time by pressing a button to indicate the start and finish of an interval that matches a previously experienced interval (Walters, 1933). Use of feedback has also been found to improve judgment accuracy for a number of other tasks, such as dating pieces of art (Stone & Opel, 2000), predicting future income (Youmans & Stone, 2005), and forecasting outcomes of time series (Remus, O’Connor, & Griggs, 1996). Although supplying feedback has produced increased accuracy in a number of different realms, especially when the feedback has focused the predictor’s attention on motivation or learning for the task, not all studies have found a positive effect of feedback (Kluger & DeNisi, 1996). It may be that effectiveness of feedback is dependent on its content; for example, it may be more useful to direct participants to the cues they should use when making a decision rather than informing them of which cues they have actually used in the past (Balzer, Doherty, & O’Connor, 1989). It is also possible that feedback may improve some forms of predictive accuracy but not others. Previous research on interventions aimed at increasing accuracy for other types of judgments found that increasing accuracy on one measure often led to a decrease in accuracy on another measure (Epley & Dunning, 2006; Stone & Opel, 2000). For example, one set of studies found that interventions that decreased the tendency to under- or overestimate likelihood in judgments about future voting and dating behavior had little, or even a negative, effect on the correlation between predicted and actual behavior and vice versa (Epley & Dunning, 2006). Here we present three experiments examining the effect of feedback on multiple measures of predictive accuracy. For the first experiment, we had participants perform a manual task and then asked them to predict how long it would take to perform that task again. In the second experiment, we examined whether there is an optimal time to provide accurate feedback. In the third experiment, we examined the effectiveness of supplying general averages on increasing predictive accuracy for real-world tasks. Experiment 1 For Experiment 1, participants completed a paper-counting task. Half the participants were asked to estimate how long the task had taken, and half were given feedback on the actual duration. Both groups then predicted how long it would take to repeat the task and then repeated the task. Method Participants. Seventy-eight students (50 females, 28 males, M age ⫽ 19.8 years) participated. Participants received course credit for their psychology classes in exchange for participation. Procedure. Participants arrived singly for the experiment and were asked to remove their watch and any rings or bracelets they were wearing. They were told this was because they would be working with their hands on the task, although it actually served to ensure that they could not consult their watches and calculate how long the task was taking. There was also no clock in the room. Participants were asked to count out a stack of 500 sheets of paper by placing them into perpendicular stacks of 10 while an experimenter surreptitiously timed them. A paper-counting task was chosen because a previous experiment had found underestimation for both remembered and predicted duration using this task (Roy & Christenfeld, 2008). Participants were randomly assigned to either the feedback or the estimation condition. Those in the feedback condition were told how long it had taken them, in minutes and seconds, to complete the task and were asked to record this information on a response sheet. Participants in the estimation condition were asked to remember how long it had taken them to complete the task and record this on their sheet. Participants in both conditions were then asked to estimate how long it would take them to count out 500 sheets again and write this below their previous entry. Finally, participants were asked to count out another 500 sheets while being timed by the experimenter. Dependent measures. Accuracy was measured in three ways: directional error, overall error, and relative error. Directional error is the comparison of mean estimated duration and mean actual duration, and is a measure of whether participants generally underestimate or overestimate duration. Directional error was computed by taking the log of the ratio of estimated and actual duration (log proportional error). This measure was used for two reasons. First, taking the log of the ratio of estimated and actual duration centers the measure on zero so that positive values indicate overestimation and negative values indicate underestimation. Second, IMPROVING PREDICTIVE ACCURACY it removes positive skew in estimated duration found in most studies, including the following three experiments (average skewness in estimated duration for the experiments reported here was ⫹1.02). Using an untransformed ratio of estimated duration to actual duration can also cause skewness, because underestimation would be represented by values between 0 and 1 and overestimation represented by values larger than 1 and unbounded. Taking the log of the ratio removes skew and makes both underestimation and overestimation unbounded (Roy & Christenfeld, 2007, 2008). Overall error, or average error disregarding sign, was measured by taking the absolute value of the difference between the estimated duration and the actual duration expressed in terms of minutes (absolute error). This measures how close people’s estimates come to the actual duration, regardless of direction of deviation. Finally, relative error was measured by finding the correlation between log estimated duration and log actual duration. A high correlation indicates that participants know whether they will be slow or fast at the task compared with others. These three measures are largely independent. It could be, for example, that half the group estimates very high, and half very low, so the directional error is zero, whereas the absolute error remains high. It could also be that estimates are very close to correct, but all participants deviate in the same direction, so the directional error remains significant. It is also possible for both errors to be large, so the group is systematically biased and inaccurate about how long it will take, but with people still aware of how they compare with others. It would be desirable for an intervention to decrease log proportional error and absolute error and to increase the correlation between estimated and actual duration. Results and Discussion Memory. The paper-counting task took 13.4 min on average to complete the first time. Table 1 gives log proportional error, absolute error, and relative error for participants who estimated previous task duration (n ⫽ 39). Participants remembered the task as taking less time than it actually did (M log proportional error significantly different from 0), t(38) ⫽ 2.88, p ⫽ .007, d ⫽ 0.46. Prediction. It took participants an average of 11.4 min to complete the task the second time. Analysis performed on log proportional error indicates a significant difference in bias for participants who estimated previous task duration and those who were given feedback on the actual duration, t(76) ⫽ 2.05, p ⫽ .04, d ⫽ 0.46 (see Table 1 for full results). As with remembered duration, participants in the estimation condition underestimated how long it would take to perform the task the second time, Table 1 Accuracy in Remembered and Predicted Duration for Experiment 1 Prediction Accuracy M (SE) log proportional error M (SE) absolute error Relative accuracy (r) Memory Estimation Feedback ⫺0.10 (0.03) ⫺0.05 (0.03) 0.03 (0.01) 3.86 (0.44) 3.71 (0.46) 2.21 (0.27) .42 .38 .61 269 although this bias was not significant, t(38) ⫽ ⫺1.31, p ⫽ .2, d ⫽ 0.21. In contrast, participants in the feedback condition overestimated how long it would take, t(38) ⫽ 2.18, p ⫽ .04, d ⫽ 0.34. In terms of absolute error, participants in the feedback condition were more accurate than participants in the estimation condition, t(76) ⫽ 2.84, p ⫽ .006 d ⫽ 0.64, with absolute error reduced by 40%. Receiving feedback also showed some sign of decreasing relative error, the correlation between the log of the estimated duration and the log of the actual duration, although this difference was not significant, Z ⫽ 1.29, p ⬍ .2. Although the presence of feedback eliminated the tendency to underestimate and improved overall accuracy, it also caused participants to overestimate task duration. This shift toward overestimation in the feedback condition may be explained by the tendency to underestimate improvement. In both conditions, participants predicted only a moderate improvement (of roughly half a minute) over their perceived previous completion time, whereas their actual improvement was almost 4 times as large (about 2 min). When participants anchored prediction on available information about the previous duration, underestimating improvement led them to overestimate how long the task would take in the feedback condition and to a decrease in underestimation in the estimation condition compared with remembered duration. It appears that participants in both the estimation and feedback conditions used the information at their disposal to form their predictions. For participants in the estimation condition, the correlation between estimation for previous task duration and prediction of future task duration was .85, Z ⫽ 7.51, p ⬍ .001, whereas in the feedback condition, the correlation between feedback and prediction was .71, Z ⫽ 5.35, p ⬍ .001. Relationship between accuracy in memory and prediction. For the paper-counting task, it appears that bias in memory led to bias in prediction. Size and direction of a participant’s bias in remembered duration was prognostic of size and direction of bias in that person’s predicted duration. For participants in the estimation condition, correlation between bias (log proportional error) in memory and bias in prediction was .62, n ⫽ 39, Z ⫽ 4.37, p ⬍ .001. Experiment 2 Overall, it appears that supplying feedback about how long the task took previously improved participants’ ability to predict how long it would take them to perform the task again in comparison to participants relying on memory of previous task duration. Although the effect of accurate feedback was positive, it is possible that, for the control group, their having estimated previous task duration produced some bias in prediction. That is, we do not know how accurate participants would have been if asked simply to predict duration without having made an estimate. To address this, in Experiment 2 we compared how feedback influenced prediction in comparison to either estimating previous task duration or receiving no intervention. Also, in this experiment, we examined whether the timing of the feedback makes a difference in enhancing predictions. Participants performed an essay-writing task and were asked to return 1 week later to repeat the task. Feedback was provided either directly after performing the task, directly before prediction, or not at all. 270 ROY, MITTEN, AND CHRISTENFELD Method Participants. One hundred sixteen University of California, San Diego, students (86 females, 30 males, M age ⫽ 20.0 years) completed the experiment (3 participants were not able to return 1 week later and 4 participants were excluded due to technical problems). All participants received course credit for their psychology classes in exchange for participation. Design. The experiment was a 3 ⫻ 3 between-participants design, with practice trial information (estimate, feedback, or none) provided immediately after the practice trial, and practice trial information (estimate, feedback, or none) provided 1 week later directly before the task was performed again. After participants wrote an essay (practice trial), they were either asked to estimate the duration of that task, given feedback about the duration, or given no information. Participants returned approximately 1 week later (M ⫽ 6.8 days, SD ⫽ 1.1), again were placed in one of the three intervention conditions, fully crossed, then predicted how long it would take to complete a similar essay and completed that task. Thus, for example, some participants would be told how long it had taken them right after completing the task, and then, a week later, would be asked to estimate how long it had taken them last week to do it before predicting how long doing it again would take. Procedure. Participants arrived singly for the experiment and were asked to remove their watch and any rings or bracelets they were wearing. Participants were asked to write a short, twoparagraph essay on either “why it is not good to litter” or “the possible negative consequences of smoking,” randomly determined. Participants were surreptitiously timed while completing the essay. After completion, they either were given no information about their completion time, were asked to estimate completion time, or were given feedback on actual duration to write the essay. Participants in the feedback condition were shown the face of the stopwatch to verify that the time supplied by the experimenter was correct. Next, participants scheduled a time as close to exactly 1 week later as possible when they would return for the second portion of the experiment. On returning, participants again either were given no information about the previous practice trial, were asked to estimate how long it had taken last time, or were told the exact duration from the previous week. All participants were then asked to predict how long it would take them to write a second essay on whichever topic, littering or smoking, they had not written about in the first session, and then wrote the essay. Predicted duration was compared with actual duration for the second essay. Overall, participants were assigned into one of nine conditions, depending on which of the three interventions they received at Time 1 and which at Time 2. Results and Discussion Memory. It took participants 15.2 min on average to complete the task the first time. Participants estimating duration directly after writing the essay (n ⫽ 37) exhibited little bias (M log proportional error not different from 0), t(36) ⫽ ⫺.31, p ⫽ .7, d ⫽ 0.05. See Table 2, immediate recall, for full results. Of greater interest was memory for task duration 1 week later, right before participants predicted duration for a similar task. Participants estimating how long it had taken to write their essay 1 week earlier (n ⫽ 39) overestimated duration, t(38) ⫽ ⫺2.38, p ⫽ .02, d ⫽ 0.38, Table 2 Accuracy in Remembered Duration for Task 1 in Experiment 2 Accuracy Immediate recall Delayed recall M (SE) log proportional error M (SE) absolute error Relative accuracy (r) ⫺0.01 (0.02) 3.24 (0.50) .86 0.06 (0.02) 3.63 (0.58) .87 with no difference in bias due to which condition they were in the previous week, as indicated by an analysis of variance (ANOVA) on log proportional error by condition at Time 1, F(2, 36) ⫽ 2.13, p ⫽ .13, 2 ⫽ .11 (see Table 2, delayed recall). Prediction. Participants were able to finish the essay approximately 1.7 min more quickly the second time, taking an average of 13.5 min to complete the task. Participants again underestimated their improvement, predicting that they would finish, on average, slightly less than 1 min faster than their last performance. Their not appreciating how much they would improve increased their overall tendency to overestimate how long it would take to repeat the task. This result, and the similar finding from the first experiment, suggests that people may actually be pessimistic about their futures, at least when it comes to how much they will improve on various tasks. As with remembered duration, participants tended to overestimate how long it would take them to complete the task the second time, when predicted duration for how long it would to take to write the second essay was compared with the actual duration (see Table 3). When log proportional error was analyzed in a 3 (condition at Time 1) ⫻ 3 (condition at Time 2) ANOVA, there was a significant effect of condition at Time 2, F(2, 107) ⫽ 8.29, p ⬍ .001, 2 ⫽ .13. Post hoc analysis using Fisher’s protected least significant difference indicates that log proportional error was less when participants were supplied with feedback at Time 2 (M ⫽ .002) than when they were given no intervention (M ⫽ .072) or asked to remember previous task duration (M ⫽ .121, ps ⬍ .03). As can be seen in Table 3, this same relation held when accuracy was measured using absolute error: There was an overall effect of condition at Time 2, F(2, 107) ⫽ 5.45, p ⬍ .01, 2 ⫽ .09, with a decrease in error of 0.8 min when comparing the feedback condition with the control condition at Time 2 (ns) and 2.7 min when comparing feedback with estimation ( p ⬍ .01). There was no significant effect of condition at Time 1 and no interaction between conditions at Time 1 and Time 2 for either log proportional error or absolute error (all Fs ⬍ 1.9). Whereas the previous analysis indicates no effectiveness of feedback at Time 1 on prediction at Time 2, inspection of Tables 3 and 4 indicates a possible benefit. In the absence of intervention at Time 2, feedback at Time 1 appears to have reduced error. A one-way ANOVA of log proportional error by Time 1 condition for participants who received no intervention at Time 2 (n ⫽ 39) indicates a significant effect of condition, F(2, 36) ⫽ 6.38, p ⬍ .01, 2 ⫽ .26, such that directional error was less in the feedback condition than in either the control ( p ⬍ .01) or estimation ( p ⫽ .05) condition. There was a similar trend also for absolute error, where receiving feedback reduced error by approximately 2 min overall, although this effect was not significant, F(2, 36) ⫽ 1.66, p ⫽ .2, 2 ⫽ .08. IMPROVING PREDICTIVE ACCURACY Table 3 Log Proportional Error in Prediction for Task 2 in Experiment 2 271 Table 5 Correlation Between Predicted and Actual Duration for Task 2 in Experiment 2 Time 2 Time 2 Time 1 Control Estimate Feedback Time 1 Control Estimate Feedback M (SE) control M (SE) estimate M (SE) feedback 0.14 (0.03) 0.08 (0.03) 0.00 (0.02) 0.11 (0.05) 0.14 (0.04) 0.11 (0.04) ⫺0.02 (0.03) 0.00 (0.03) 0.02 (0.03) Control Estimate Feedback Overall .82 .87 .88 .84 .65 .79 .68 .72 .93 .91 .80 .88 The potential benefit of correcting memory after task performance appears to have been negated when participants were asked to estimate previous task duration 1 week later. This is likely due to participants’ being unable to accurately recall task duration; of the 13 participants who had received precise task-duration feedback at Time 1 and then estimated at Time 2 how long it had taken them, none recalled the correct duration (average absolute error ⫽ 2.14 min). Overall, participants estimating prior task duration were inaccurate in their consequent prediction, no matter what information they had received the previous week. Relative accuracy, the correlation between log estimated and log actual duration at Time 2, was very high in all conditions. Table 5 gives the correlations between predicted duration and actual duration for Task 2 for all nine groups as well as the overall correlations when participants are grouped by condition at Time 2. The large correlations between estimated and actual duration were likely due to the great variability in the time it took participants to complete the task—actual duration ranged from 5 to 50 min for the task. The pattern of correlations between estimated and actual duration was similar to that found for the other measures of accuracy, with relative accuracy highest when feedback was given at Time 2 (r ⫽ .88) and lowest when participants estimated previous task duration at Time 2 (r ⫽ .72); the difference between these two groups is significant, Z ⫽ 2.03, p ⫽ .04. Again, it appears that participants in both the estimation and feedback conditions (Time 2) used the information at their disposal to aid in prediction. For participants who estimated duration at Time 2, the correlation between estimation for previous task duration and prediction of future task duration was .88, n ⫽ 39, Z ⫽ 8.11, p ⬍ .001, and for participants given feedback at Time 2, the correlation between feedback and prediction was .84, n ⫽ 38, Z ⫽ 7.16, p ⬍ .001. Relationship between accuracy in memory and prediction. One week after performing the task, most participants remembered the task as taking longer than it actually had, and, correspondingly, they overestimated how long it would take to perform a similar task. As with the previous experiment, results indicated that a Table 4 Absolute Error in Prediction for Task 2 in Experiment 2 Time 2 Time 1 Control Estimate Feedback M (SE) control M (SE) estimate M (SE) feedback 4.5 (1.3) 3.9 (1.0) 2.1 (0.6) 5.6 (1.4) 5.5 (1.2) 5.0 (1.0) 2.7 (0.7) 2.3 (0.9) 3.0 (0.9) participant’s accuracy in recall was prognostic of their accuracy in prediction. For the 39 participants asked to estimate previous essay duration at Time 2, the correlation in log proportional error between memory and prediction was .49, Z ⫽ 3.25, p ⫽ .001. Even though participants did not originally overestimate when asked to estimate duration directly after completion, there is some evidence that individual bias in memory from a week earlier was prognostic of error in prediction at Time 2. That is, participants who tended to remember the task as taking longer than it had were also likely to predict it to take longer than it would. For the 12 participants who estimated duration at Time 1 and received no intervention at Time 2, the correlation in log proportional error between memory and prediction was .53, Z ⫽ 1.77, p ⫽ .08. However, because the sample size was small, we are unable to determine whether or not this relationship was reliable. Experiment 3 In Experiments 1 and 2, participants were given, and used, individual feedback on their own previous task performance. However, access to a specific person’s history may not be possible in all situations. Therefore, in Experiment 3, we examined whether it is also effective to supply general population averages in increasing predictive accuracy. Participants were approached before performing various tasks in real-world settings and were supplied, or not, with average task duration before estimating how long the task would take. In addition to changing to a nonlaboratory setting, Experiment 3 also tested the feedback intervention using a very different population (older and including more men) than the first two experiments. Method Participants. Participants were 185 southern California residents (65 females, 120 males, M age ⫽ 36 years) approached at a barbershop, car wash, gas station, Subway restaurant, Wendy’s restaurant, and video arcade. Store managers agreed to allow recruitment of their customers. Design and procedure. Participants were approached at the various locations and asked to answer a few questions for a university study. The first half of participants in each location served as the control group. They were asked to estimate how long it would take to perform the selected task and then were timed while performing it. The remaining participants served as the experimental group. They were given the average duration of the control group participants as feedback before estimating how long the task would take and then they completed the task while being timed. Table 6 lists the task locations, number of participants, how ROY, MITTEN, AND CHRISTENFELD 272 Table 6 Task Information for Experiment 3 Participants (n) Timing Task location Control Feedback Start Stop Feedback Gas station Wendy’s Video arcade Subway Barbershop Car wash 16 15 16 15 15 15 15 15 17 15 15 16 Pumping starts Order placed Game starts Order placed Cover clipped on Car approached by attendant Pumping ends Food received Game ends Food received Cover removed Customer returns to car 1 min 27 s 1 min 40 s 2 min 49 s 4 min 23 s 20 min 25 s 34 min 31 s the task was timed, and the average duration for the control group (which was the feedback supplied to participants in the experimental group). Results and Discussion Figure 1 gives the directional error for the tasks, with tasks ordered from shortest to longest in duration. A 6 (task) ⫻ 2 (condition) ANOVA on log proportional error revealed a difference in bias for the various tasks, F(5, 173) ⫽ 23.82, p ⬍ .001, 2 ⫽ .39. As has been found previously (Roy & Christenfeld, 2008), directional error moved from overestimation for the short tasks toward underestimation for the longer tasks. Central to our interest, there was a significant interaction between task type and experimental condition, F(5, 173) ⫽ 3.34, p ⬍ .007, 2 ⫽ .05. Post hoc tests indicate that there was a reduction in overestimation for pumping gas and in underestimation for buying food at Subway for participants in the feedback condition ( ps ⬍ .05). There was not a significant decrease in bias for the other four tasks, although it should be noted that for those four tasks participants in the control group did not exhibit significant bias initially ( ps ⬎ .3). A decrease in bias would not be expected where none initially existed. Across all tasks, average absolute error was 4.70 min in the control condition and 2.24 min in the feedback condition, a reduction in error of 52.3%. A 6 (task) ⫻ 2 (condition) ANOVA on absolute error indicates that the decrease in error was significant, F(1, 173) ⫽ 12.45, p ⬍ .001, 2 ⫽ .04 (see Figure 2). There was also an effect of particular task type on overall absolute error, with long tasks (getting a haircut and getting a car wash) having larger overall error, F(5, 173) ⫽ 18.23, p ⬍ .001, 2 ⫽ .32, but no significant interaction between task type and reduction in error, F(5, 173) ⫽ 1.93, p ⫽ .09, 2 ⫽ .03. Supplying an average duration before prediction also increased relative accuracy: The correlation between the log of the estimated and the log of the actual duration was greater in the feedback condition for all six tasks (see Table 7). When comparing the average correlation for the two conditions, the increase in relative accuracy for the feedback condition was significant, Z ⫽ ⫺2.16, p ⫽ .03. Participants seemed to have some notion about how their performance compared with that of others. This suggests that, when given a value to use as an anchor for their prediction, participants were able to appropriately adjust their predictions upward or downward on the basis of their previous experiences 0.7 Control 0.6 Feedback Log Proportional Error 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 Gas Station Wendy's Video Arcade Subway Barbershop Carwash Task Location Figure 1. Log proportional error (log[estimated duration/actual duration]) for control and feedback conditions for Experiment 3. Tasks ordered from shortest to longest. Error bars represent standard error. IMPROVING PREDICTIVE ACCURACY 273 18 Control 16 Feedback Absolute Error 14 12 10 8 6 4 2 0 Gas Station Wendy's Video Arcade Subway Barbershop Carwash Task Location Figure 2. Absolute error (abs[estimated duration-actual duration]) for control and feedback conditions for Experiment 3. Error bars represent standard error. and, thereby, increase relative accuracy. Although participants may not know the exact duration for a task, they seem to be aware of the relative duration. Supplying feedback for previous task duration led to increased accuracy, even though, in this case, participants were given information not about their own previous performance, but that of others. Related to this, a previous experiment found that participants’ predictions were affected by reported completion times of other participants, except that bias was increased if the reported completion times for others were either very short or very long (Thomas & Handley, 2008). Although our results indicate that supplying general averages can improve prediction, results must be qualified somewhat by the nature of the tasks used. A number of the tasks (receiving a haircut, getting a car wash, and waiting for take-out food to be prepared) are tasks largely beyond participants’ control. It could have been that on tasks within their control, when duration needed to complete a task reflected on the self, people were more resistant to debiasing. However, results suggest that control over the task did not influence willingness to use feedback. Reduction in bias was actually greatest for pumping gas, a task for which the participant was the main actor. Also, absolute error was cut approximately in half for all the tasks regardless of level of involvement. Central to our interest, however, is that people were Table 7 Correlation Between Predicted and Actual Duration for Experiment 3 Task location No feedback Feedback Gas station Wendy’s Video arcade Subway Barbershop Car wash Average .27 ⫺.03 .05 .22 .30 .15 .16 .59 .56 .30 .23 .62 .41 .46 inaccurate in their predictions when not given feedback, but they improved with feedback. General Discussion These three experiments indicate that correcting memory increases accuracy of prediction for how long a task will take. Supplying participants with feedback on previous task duration, whether their own or that of people in general, led to increased accuracy for several different tasks, in diverse environments, with heterogeneous participant populations, with various amounts of delay after experience with a similar task, and for bias in the direction of both underestimation and overestimation. For tasks where prediction was biased, supplying feedback reduced that bias. Across the range of tasks, supplying feedback cut overall error approximately in half. Feedback also helped increase the relative accuracy of participants’ predictions, even when the feedback was not related to their own previous performance. Participants who did not receive feedback were inaccurate in their predictions and, in support of the memory bias account, this appears to be attributable to inaccuracy in remembered duration. A person’s bias in memory for completion time was prognostic of that person’s bias in predicted duration for similar tasks. Previous attempts at increasing predictive accuracy have been largely unsuccessful (see Roy et al., 2005b, for review). A reason that has been given for the lack of success in these studies, based on the planning fallacy, is that people are predisposed to look narrowly at the task when making a prediction, which leads them to disregard information about how long similar tasks have usually taken, even when they are explicitly supplied with such information (Griffin & Buehler, 2005). In three experiments, however, our participants were willing to use feedback when it was provided, and used it to improve predictive accuracy. This was true even when participants were supplied with information that was unrelated to their own previous performances. In light of these findings, people’s previous failures to improve prediction may not 274 ROY, MITTEN, AND CHRISTENFELD have been due to their unwillingness to use information in aiding prediction. Given that the experiments asked participants to recall previous durations, the error is instead likely due to the uncertain accuracy of the information that participants generated. However, it should be pointed out that none of our three experiments is an example of the planning fallacy: Bias was not consistently in the direction of underestimation, and we did not measure participants’ beliefs about whether or not similar tasks had taken longer than planned in the past. It is not clear from our result whether the feedback intervention would have been as successful if the tasks used had met these criteria. In general, the intervention used here may not be as successful for much longer, real-world tasks. When tasks are completed over the span of days and weeks, many more complicating factors, such as interruptions and time spent performing other unrelated tasks, may influence completion time. Further research is necessary to determine whether the beneficial effect found here holds for much longer tasks. The memory bias account attempts to explain bias found in prediction for a wide variety of tasks: short tasks, long tasks, novel tasks, familiar tasks, tasks for which overestimation is likely, and tasks for which underestimation is likely (Roy, Christenfeld, & McKenzie, 2005a). Previous experiments have found bias for a large range of tasks, including shorter experiments that did not allow for the possibility of interruptions or surprises, similar to those employed here (Boltz et al., 1998; Burt & Kemp, 1994; Byram, 1997; Francis-Smythe & Robertson, 1999; Josephs & Hahn, 1995; Koneçni & Ebbesen, 1976; König, 2005; Kruger & Evans, 2004; Roy & Christenfeld, 2007, 2008; Thomas et al., 2003, 2004). At least for short tasks, and in support of the memory bias account, results indicate that correcting memory for previous task duration is an effective intervention. The method of intervention used in these experiments was fairly heavy-handed, with participants given explicit information about previous performance right before prediction. It is thus possible to think of the finding as a form of demand characteristic, with participants using the feedback because the experimenter supplied it. However, such a view does not undermine the importance of the findings. The goal was not to try to trick participants into arriving at better predictions, but to give them information that they would willingly use to their benefit. In fact, any intervention where the experimenter supplies information aimed at helping people improve prediction has, to some extent, a demand characteristic. Previous attempts at increasing predictive accuracy—telling participants what to consider, how to make their predictions, and so on— have had the same sort of demand characteristics, but were unsuccessful. Similarly, we found no benefit from asking participants to remember previous task duration before making a prediction, even though there was likely a demand characteristic in this condition as well. An additional benefit of a conscious strategy such as memory correction is that anyone can easily be taught to apply it in domains where predictive accuracy is known to be suspect. Overall, our results indicate that, when predicting duration, people do well when they rely not on memory of past task duration but instead on measures of actual duration. Information on actual previous duration can come from either their own experience or that of others. Unfortunately, in many situations, getting accurate information on task durations may not be simple, especially for tasks that are performed piecemeal across multiple sessions over a span of weeks, months, or even years. However, there are cases in which data on task duration are available but underutilized. For example, in the area of software design, companies routinely keep detailed records of how long previous projects have taken in the form of billable-hours records. Even though these data are often available, employees responsible for predicting new project durations rarely, if ever, consult these records (Jorgensen, 2004). Simply compiling the data and organizing records by a few different variables, such as relative size of the project (small, medium, or large), and whether or not the project involved development of new technology, could be beneficial in predicting future task duration (Jorgensen, 2007). In support of the memory bias account, one study found that when software designers were predicting how long a project would take, prediction was more accurate if the designers could identify previous projects that were similar and were able to determine, objectively, how long those projects had taken (Jorgensen, 2004). Taken with the current results, this suggests that people are often inaccurate in their memories of how long tasks have taken previously and in their predictions on how long they will take in the future, and that improving the accuracy of memory leads to improved accuracy in prediction. References Armor, D. A., & Taylor, S. E. (1998). Situated optimism: Specific outcome expectancies and self-regulation. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 30, pp. 309 –379). San Diego, CA: Academic Press. Balzer, W. K., Doherty, M. E., & O’Connor, R., Jr. (1989). Effects of cognitive feedback on performance. Psychological Bulletin, 106, 410 – 433. Block, R. A., & Zakay, D. (1997). Prospective and retrospective durations judgments: A meta-analytic review. Psychonomic Bulletin & Review, 4, 184 –197. Boltz, M. G., Kupperman, C., & Dunne, J. (1998). The role of learning in remembered duration. Memory & Cognition, 26, 903–921. Buehler, R., & Griffin, D. (2003). Planning, personality, and prediction: The role of future focus in optimistic time predictions. Organizational Behavior and Human Processes, 92, 80 –90. Buehler, R., Griffin, D., & MacDonald, H. (1997). The role of motivated reasoning in optimistic time predictions. Personality and Social Psychology Bulletin, 23, 238 –247. Buehler, R., Griffin, D., & Ross, M. (1994). Exploring the “planning fallacy”: Why people underestimate their task completion times. Journal of Personality and Social Psychology, 67, 366 –381. Buehler, R., Griffin, D., & Ross, M. (2002). Inside the planning fallacy: The causes and consequences of optimistic time prediction. In T. D. Gilovich, D. W. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 250 –270). New York: Cambridge University Press. Buehler, R., Messervey, D., & Griffin, D. (2005). Collaborative planning and prediction: Does group discussion affect optimistic biases in time estimation? Organizational Behavior and Human Decision Processes, 97, 47– 63. Burt, C. D. B. (1992). Reconstruction of the duration of autobiographical events. Memory & Cognition, 20, 124 –132. Burt, C. D. B., & Kemp, S. (1991). Retrospective duration estimation of public events. Memory & Cognition, 19, 252–262. Burt, C. D. B., & Kemp, S. (1994). Construction of activity duration and time management potential. Applied Cognitive Psychology, 8, 155–168. Burt, C. D. B., Kemp, S., & Conway, M. (2001). What happens if you retest autobiographical memory 10 years on? Memory & Cognition, 29, 127–136. IMPROVING PREDICTIVE ACCURACY Byram, S. J. (1997). Cognitive and motivational factors influencing time prediction. Journal of Experimental Psychology: Applied, 3, 216 –239. Connolly, T., & Dean, D. (1997). Decomposed versus holistic estimates of effort required for software writing tasks. Management Science, 43, 1029 –1045. Epley, N., & Dunning, D. (2006). The mixed blessing of self-knowledge in behavioral prediction: Enhanced discrimination but exacerbated bias. Personality and Social Psychology Bulletin, 32, 641– 655. Fraisse, P. (1963). The psychology of time. New York: Harper & Row. Francis-Smythe, J. A., & Robertson, I. T. (1999). On the relationship between time management and time estimation. British Journal of Psychology, 90, 33–347. Griffin, D., & Buehler, R. (1999). Frequency, probability, and prediction: Easy solutions to cognitive illusions? Cognitive Psychology, 38, 48 –78. Griffin, D., & Buehler, R. (2005). Biases and fallacies, memories and predictions: Comment on Roy, Christenfeld, and McKenzie (2005). Psychological Bulletin, 131, 757–760. Hinds, P. J. (1999). The curse of expertise: The effects of expertise and debiasing methods on predictions of novice performance. Journal of Experimental Psychology: Applied, 5, 205–221. Jorgensen, M. (2004). Top-down and bottom-up expert estimation of software development effort. Information and Software Technology, 46, 3–16. Jorgensen, M. (2007). Forecasting of software development work effort: Evidence on expert judgment and formal models. International Journal of Forecasting, 23, 449 – 462. Jorgensen, M., & Sjoberg, D. I. K. (2001). Impact of effort estimates on software project work. Information and Software Technology, 43, 939 –948. Josephs, R. A., & Hahn, E. D. (1995). Bias and accuracy in estimation of task duration. Organizational Behavior and Human Decision Processes, 61, 202–213. König, C. J. (2005). Anchors distort estimates of expected duration. Psychological Reports, 96, 253–256. Kahneman, D., Krueger, A. B., Schkade, D. A., Schwarz, N., & Stone, A. A. (2004, December 3). A survey method for characterizing daily life experience: The day reconstruction method. Science, 306, 1776 –1780. Kahneman, D., & Tversky, A. (1979). Intuitive prediction: Biases and corrective procedures. TIMS Studies in Management Sciences, 12, 313–327. Kahneman, D., & Tversky, A. (1982). Intuitive prediction: Biases and corrective procedures. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgments under uncertainty: Heuristics and biases (pp. 414 – 421). Cambridge, England: Cambridge University Press. Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119, 254 –284. Koneçni, V. J., & Ebbesen, E. E. (1976). Distortions of estimates of numerousness and waiting time. Journal of Social Psychology, 100, 45–50. Koole, S., & Van’t Spijker, M. (2000). Overcoming the planning fallacy through willpower: Effects of implementation intentions on actual and predicted task-completion times. European Journal of Social Psychology, 30, 873– 888. Kruger, J., & Evans M. (2004). If you don’t want to be late, enumerate: Unpacking reduces the planning fallacy. Journal of Experimental Social Psychology, 40, 586 –598. 275 Molokken-Ostvold, K., & Jorgensen, M. (2005). Expert estimation of Web-development projects: Are software professionals in technical roles more optimistic than those in non-technical roles? Empirical Software Engineering, 10, 7–29. Newby-Clark, I. R., Ross, M., Buehler, R., Koehler, D. J., & Griffin, D. (2000). People focus on optimistic scenarios and disregard pessimistic scenarios while predicting task completion times. Journal of Experimental Psychology: Applied, 6, 171–182. Poynter, D. (1989). Judging the duration of time intervals: A process of remembering segments of experience. In I. Levin & D. Zakay (Eds.), Time and human cognition: A life-span perspective (pp. 305–322). Amsterdam: North-Holland. Remus, W., O’Connor, M., & Griggs, K. (1996). Does feedback improve the accuracy of recurrent judgment forecasts? Organizational Behavior and Human Decision Processes, 66, 22–30. Roy, M. M., & Christenfeld, N. J. S. (2007). Bias in memory predicts bias in estimation of future task duration. Memory & Cognition, 35, 557–564. Roy, M. M., & Christenfeld, N. J. S. (2008). Effect of task length on remembered and predicted duration. Psychonomic Bulletin & Review, 15, 202–207. Roy, M., Christenfeld, N., & McKenzie, C. (2005a). The broad applicability of memory bias and its coexistence with the planning fallacy: Reply to Griffin and Buehler (2005). Psychological Bulletin, 131, 761–762. Roy, M. M., Christenfeld, N. J. S., & McKenzie, C. R. M. (2005b). Underestimating the duration of future events: Memory incorrectly used or memory bias. Psychological Bulletin, 131, 738 –756. Stone, E. R., & Opel, R. B. (2000). Training to improve calibration and discrimination: The effect of performance and environmental feedback. Organizational Behavior and Human Decision Processes, 83, 282–309. Taylor, S. E., Pham, L. B., Rivkin, I. D., & Armor, D. A. (1998). Harnessing the imagination. American Psychologist, 53, 429 – 439. Thomas, K. E., & Handley, S. J. (2008). Anchoring in time estimation. Acta Psycologica, 127, 24 –29. Thomas, K. E., Handley, S. J., & Newstead, S. E. (2004). The effects of prior experience on estimating the duration of simple tasks. Current Psychology of Cognition, 22, 83–100. Thomas, K. E., Handley, S. J., & Newstead, S. E. (2007). The role of prior task experience in temporal misestimation. Quarterly Journal of Experimental Psychology, 60, 230 –240. Thomas, K. E., Newstead, S. E., & Handley, S. J. (2003). Exploring the time prediction process: The effect of task experience and complexity on prediction accuracy. Applied Cognitive Psychology, 17, 655– 673. Wallace, M., & Rabin, A. I. (1960). Temporal experience. Psychological Bulletin, 57, 213–236. Walters, R. H. (1933). The specificity of knowledge of results and improvement. Psychological Bulletin, 30, 673. Youmans, R. J., & Stone, E. R. (2005). To thy own self be true: Finding the utility of cognitive information feedback. Journal of Behavioral Decision Making, 18, 319 –341. Received July 9, 2007 Revision received February 22, 2008 Accepted February 25, 2008 䡲