Testing and Spacing: Keys to Enhancing Learning and Retention
Sean Kang
Department of Psychology, UCSD
TDLC Bootcamp, Aug 10, 2009

Purpose of Tests / Quizzes
• Traditionally, an assessment tool
• But testing does not merely measure the contents of memory
• Taking a test can serve as a learning opportunity, enhancing memory retention to a greater extent than additional studying: the testing effect (also referred to as retrieval practice)

Spitzer (1939)
• 3,605 sixth-graders in Iowa
• Students read ~600-word article on the bamboo plant
• 25-item multiple-choice test (no feedback)
• Varied the retention interval and frequency of testing

Spitzer (1939)
Test scores by group (T1–T3 = first to third test; tests were given 0, 1, 7, 14, 21, 28, or 63 days after studying)

Group   Day of first test   T1     T2     T3
1       0                   13.2   13.1   12.2
2       0                   13.2   11.8   10.7
3       1                   9.6    8.9
4       7                   7.9    8.2
5       14                  7.0    7.1
6       21                  6.5    7.1
7       28                  6.8
8       63                  6.4

The Testing Effect
• Journal of Educational Psychology, 1989:
• Dempster, F. N. (1992). Using tests to promote learning: A neglected classroom resource. Journal of Research & Development in Education, 25, 213–217.
• Resurgence of interest in the testing effect in recent years

Roediger & Karpicke (2006)
• Stimuli: 2 prose passages from TOEFL prep book (~260 words each)
• Learning condition (within-subjects):
  – Restudy (two 7-min periods of study) vs. Test (7-min period of study, followed by 7-min period of test)
• Retention interval (between-subjects):
  – 5 min, 2 days, or 1 week

Does testing benefit memory for non-verbal materials?
• Past research has focused exclusively on verbal materials (or at least required verbal responses at test)

Carpenter & Pashler (2007)

Karpicke & Roediger (2008)
• Stimuli: 40 Swahili-English word pairs
• Subjects studied and were tested on the Swahili words in alternating blocks
• d = 4.03

Testing effect: How does it work?
1. Additional (focused) presentation of material
2. Operations/processes engaged by an initial test are also engaged during the final test, resulting in positive transfer to the same type of test (i.e., practice effect)
3. Retrieval itself is a potent memory modifier, with increasing retrieval demand/effort enhancing later retention

Does test format matter?
• Initial test type: Short Answer (SA), Multiple Choice (MC), Read Fact
• Final, criterial test: SA, MC
• Corrective feedback given after each initial test question
COMPETING PREDICTIONS:
1) Repeated exposure
2) Transfer appropriate processing
3) Retrieval “effort”
[Figure, two panels: Initial MC vs. Initial SA, by final test (MC, SA)]

Procedure (Kang, McDermott, & Roediger, 2007; N = 48)
• ENCODING: Read 4 Current Directions articles, ~15 min each
• INTERVENING EXPERIENCE (within-subjects, after each article; 8 items/condition): Multiple choice, Short answer, Read answer, Control/filler
• 3 days
• FINAL TEST:
  – Mult. choice (16): 4 from each of the 4 prior conditions
  – Short answer (16): 4 from each of the 4 prior conditions
• Feedback provided after each test question

Sample Test Question
(E.g., after reading article on literacy acquisition by Rebecca Treiman)
• Read Fact: Young Joe is more likely to know the name of the letter ‘j’ than Alice or Tom.
• Short Answer: Young Joe is more likely to know the _______ of the letter ‘j’ than Alice or Tom.
• Multiple-choice: a. place of articulation b. phoneme c. name d. sound

Testing enhanced later memory, and the enhancement was greater when the initial test format was short answer
Proportion correct on the final test, by initial test condition

Initial test       Final MC   Final SA
None               .69        .27
Read statements    .83        .46
MC                 .87        .53
SA                 .94        .57

COMPETING PREDICTIONS:
• Transfer appropriate processing
• Retrieval “effort”
[Figure, two panels: Initial MC vs. Initial SA, by final test (MC, SA)]

Does feedback matter?
Corrective feedback important, especially when initial test performance is not high
Proportion correct on the final test, by initial test condition

Initial test       Final MC   Final SA
None               .74        .33
Read statements    .88        .51
MC                 .87        .62
SA                 .80        .48

The Testing Effect
• Taking a test can be a potent learning event, often yielding better long-term retention than additional studying.
• Testing benefits learning of a diverse range of materials, both verbal and nonverbal.
• Repeated retrieval practice augments the benefit.
• The size of the testing effect is modulated by test format & feedback
  – Tests requiring effortful retrieval are more effective at enhancing retention, implicating retrieval as a causal mechanism
  – To maximize the benefit of testing, feedback should be provided when initial test performance is low

The Spacing Effect
• Reviews are more effective when distributed or spaced out, rather than massed (with total time equated)
• One of the most robust phenomena; observed with a diverse range of materials / types of learning
• Ebbinghaus (1885):
  – When learning to recite a list of 12 nonsense syllables: after 68 repetitions in one day, 7 repetitions were required the next day to relearn the list. After 38 repetitions spread across 3 days, however, only 6 repetitions were required the following day to relearn it.
“…with any considerable number of repetitions a suitable distribution of them over a space of time is decidedly more advantageous than the massing of them at a single time.”

The Spacing Effect
[Diagram: study episodes (or practise retrieving) separated by an Inter-Study Interval (ISI)]
• Spacing effect: Spaced > Massed
• Lag effect: Comparison of different levels of spacing

Theoretical accounts
• Deficient processing theory
  – At short ISI, processing of the 2nd presentation is deficient; less attention is paid to an item that is relatively more familiar
• Encoding variability theory
  – Item and its context are stored at encoding;
  – Context is assumed to undergo random drift;
  – Average distance between any prior context and the current context will increase with the passing of time;
  – Likelihood of successful retrieval depends on the distance between context at test and context at encoding;
  – As ISI increases, the probability increases that the test context will be similar to at least one of the study/encoding contexts

The Spacing Effect
• Is there an optimal ISI / gap?
• Does the answer depend on the RI? (Cepeda et al., 2006)

The Spacing Effect
• For RI >= 1 day, is a 1-day ISI/gap sufficient to produce most/all of the benefit of spacing?
• Only a handful of studies provide multi-gap comparisons with RI >= 1 day.

Cepeda et al. (2009), Experiment 1
• N = 182
• Stimuli: 40 Swahili-English word pairs
• ISI / Gap (between-subjects):
  – 0, 1, 2, 4, 7, and 14 days
• RI: 10 days
• Procedure
  – Session 1: All items presented for study once, followed by testing with feedback until all items successfully recalled 2x.
  – Session 2: After appropriate gap, all items tested 2x with feedback.
  – Session 3: After 10-day RI, final test.

Cepeda et al. (2009), Experiment 2
• N = 161
• Stimuli: 2 sets
  – Obscure facts (e.g., Who invented snow golf? Rudyard Kipling)
  – Photographs of not-well-known objects paired with facts (e.g., Name this model, in which Amelia Earhart made her ill-fated last flight. Lockheed Electra.)
• ISI / Gap (between-subjects):
  – 0, 1, 7, 28, 84, and 168 days
• RI: 168 days

Cepeda et al. (2009), Conclusions
• Spacing benefits observed with RIs > 1 week
• Gap/ISI had non-monotonic effects on final test performance; accuracy increased then decreased as gap increased.
• For sufficiently long RIs, optimal gap/ISI > 1 day.

Cepeda et al. (2008)
• Experiment conducted on the internet
• N = 1,354
• 26 different combinations of gaps and RIs
• Stimuli: 32 obscure facts
• Procedure
  – Session 1: Learn 32 facts to criterion of one correct recall of each fact.
  – Session 2: After appropriate gap, subjects tested 2x with feedback.
  – Session 3: After appropriate RI, final test.

Cepeda et al. (2008), Conclusions
• For each RI, final performance initially increased with increasing gap, then fell as gap increased further.
• The effect of gap was very large: the optimal gap provided a 64% increase (averaged across RIs) in final recall, relative to the 0-day gap condition.
• As RI increases, the optimal gap also increases, but the ratio of optimal gap to RI should decline.
• Using a gap that is longer than the optimal value carries smaller costs than using a gap that is shorter.

Expanding vs. Equal Interval Spaced Retrieval
• Landauer & Bjork (1978) demonstrated the advantage of expanding over equal interval retrieval practice.
• But findings since then have been rather inconsistent, with several failures to replicate. E.g., Karpicke & Roediger (2007)

Applications of Testing & Spacing
• Supermemo www.supermemo.com
• Spaced Ed www.spaceded.com
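
Sketch: equal-interval vs. expanding retrieval schedules
The difference between the two kinds of spaced retrieval schedule discussed above can be made concrete with a small Python sketch. Everything in it is an illustrative assumption rather than a procedure taken from any of the cited studies: the function names, the 21-day total span, the three retrieval attempts, and the doubling ratio are all hypothetical. It only shows how the same number of retrievals over the same total span can be placed at equal or at expanding gaps.

```python
from itertools import accumulate


def review_times(gaps):
    """Turn a list of gaps between successive retrievals into cumulative times."""
    return list(accumulate(gaps))


def equal_schedule(total_span, n_reviews):
    """Equal-interval schedule: every gap has the same length."""
    gap = total_span / n_reviews
    return [gap] * n_reviews


def expanding_schedule(total_span, n_reviews, ratio=2.0):
    """Expanding schedule: each gap is `ratio` times the previous one.

    The gaps are rescaled so that they sum to `total_span`, which keeps the
    time of the last retrieval identical to the equal-interval schedule.
    `ratio` is a hypothetical parameter chosen for illustration only.
    """
    weights = [ratio ** i for i in range(n_reviews)]
    scale = total_span / sum(weights)
    return [w * scale for w in weights]


# Hypothetical example: 3 retrieval attempts spread over 21 days.
equal_gaps = equal_schedule(21, 3)          # gaps of 7, 7, 7 days
expanding_gaps = expanding_schedule(21, 3)  # gaps of 3, 6, 12 days

print("Equal interval retrieval days:", review_times(equal_gaps))
print("Expanding retrieval days:     ", review_times(expanding_gaps))
```

Both schedules end on day 21, so they differ only in how early the first retrieval falls and how the remaining gaps are distributed, which is the contrast at issue in the expanding vs. equal interval comparison.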