J Behav Educ (2011) 20:117–137

DOI 10.1007/s10864-011-9124-y

ORIGINAL PAPER

A Comparison of Two Flashcard Drill Methods Targeting Word Recognition

Robert J. Volpe • Christina M. Mulé • Amy M. Briesch • Laurice M. Joseph • Matthew K. Burns

Published online: 8 May 2011

© Springer Science+Business Media, LLC 2011

Abstract Traditional drill and practice (TD) and incremental rehearsal (IR) are two flashcard drill instructional methods previously noted to improve word recognition. The current study sought to compare the effectiveness and efficiency of these two methods, as assessed by next day retention assessments, under 2 conditions (i.e., opportunities to respond held constant across methods, length of instructional session held constant across methods) with 4 first-grade students identified by their teachers as struggling readers. Social validity of the two intervention methods also was evaluated. Results suggested important differences in response to instruction across students. Differences in effectiveness between TD and IR were minimal both when holding opportunities to respond and length of instructional session constant across methods. Minimal differences in efficiency were noted between TD and IR when holding instructional time constant across conditions. However, TD was clearly more efficient than IR for all students when holding opportunities to respond constant across conditions. Social validity data indicated that half of the students preferred the TD method and half preferred the IR method. Limitations and implications of the current study for school-based professionals are discussed.

Keywords Reading interventions • Incremental rehearsal • Traditional drill • Sight word training

R. J. Volpe (✉) • C. M. Mulé • A. M. Briesch
School Psychology Program, Bouvé College of Health Sciences, Northeastern University, 404 International Village, 360 Huntington Ave, Boston, MA 02115-5000, USA
e-mail: r.volpe@neu.edu

L. M. Joseph

Ohio State University, Columbus, OH, USA

M. K. Burns

University of Minnesota, Minneapolis, MN, USA


Important links exist between early literacy skill development, better developed reading skills, and overall academic success (Good et al. 2001). Moreover, students who are competent readers are more likely to succeed in school and become productive members of society (Adams 1990). Sight word recognition, or the ability to read words accurately and automatically, is one such early literacy skill that plays an important role in reading development as it permits children to focus on gaining meaning from text (Samuels 1997). Development of sight word recognition therefore significantly enhances reading fluency as well as reading comprehension (Burns et al. 2004; Tan and Nicholson 1997).

The development of automatic or effortless word reading typically requires repeated practice (Schneider and Shiffrin 1977). Hence, researchers have emphasized the use of flashcard drill methods to promote automaticity and improve word recognition skills for struggling readers who have not adequately responded to interventions focusing upon core areas of reading instruction (e.g., phonemic awareness, phonics, and decoding skills; Browder and Xin 1998; Tan and Nicholson 1997). Flashcard drill techniques have been designed to increase opportunities to practice, or respond to, instructional targets (MacQuarrie et al. 2002; Symonds and Chase 1992). Methods that have received considerable attention include traditional drill and practice, interspersal training, and incremental rehearsal.

Traditional drill and practice (TD) is a flashcard drill method demonstrated to be effective for helping children learn to read whole words that previously were unknown to them (Tan and Nicholson 1997). Specifically, TD is a flashcard technique in which 100% of the words presented to a student are unknown (e.g., 1U-2U-3U-4U-5U-1U-2U-3U-4U-5U, where U is an unknown word). In the initial trial, the instructor presents a word to the student, models reading the word accurately, and then asks the student to read the word aloud. During subsequent trials, the student attempts to read the word independently, and the instructor provides the student with corrective feedback on words read inaccurately. This procedure typically continues until the student masters all of the unknown words targeted in that session. Several studies have demonstrated the effectiveness of TD in helping children learn to read new words, as well as enhancing their reading comprehension (Joseph and Schisler 2007; Tan and Nicholson 1997).

Alternatively, several drill methods employ words that the student already knows, by interspersing known words within a pool of unknown words. Research has found that the inclusion of known words increases motivation, task preference, and task completion rates and enhances learning through differential reinforcement (Burns et al. 2009; Dunlap 1984; Dunlap and Koegel 1980; Skinner 2002). One common interspersal procedure, interspersal training, aims to teach new words by interspersing previously learned words among unknown words, while also allowing the ratio of unknown-to-known words to vary (Burns 2004; Nist and Joseph 2008). In this procedure, the instructor first models the unknown words and then asks the student to read the words aloud herself. The instructor then arranges the unknown words with the known words (i.e., 1K-1U-2U-3U-2K-4U-5U-6U-3K, where K is a known word, and U is an unknown word) and presents them to the student. Students receive corrective feedback for incorrect responses and verbal praise for correct responses.


Another interspersal technique that has gained significant attention is a procedure called incremental rehearsal (IR; Tucker 1988). IR differs from other interspersal techniques in that unknown words are presented incrementally (e.g., 1U, 1K, 1U, 1K, 2K, 1U, 1K, 2K, 3K, etc., where 1U is the first unknown word, and 1K is the first known word). Once the instructor presents the unknown word five to nine times (depending on the age and attention span of the student), it becomes the first known word in the next IR sequence, allowing for multiple presentations and many more opportunities to practice that word (MacQuarrie et al. 2002; Nist and Joseph 2008). Several advantages of IR have been noted, including errorless learning (Browder and Shear 1996), spaced repetition (Dempster 1991), and high opportunities for practice (Greenwood et al. 1984). Moreover, studies have consistently demonstrated a link between IR and high retention and increased generalization rates, as well as enhanced reading fluency and comprehension (Burns 2007; Burns et al. 2004; MacQuarrie et al. 2002). Additionally, IR has been used successfully to teach word recognition to students with developmental disabilities (Browder and Shear 1996) and students with limited English proficiency (Matchett and Burns 2009). Although numerous studies have demonstrated the positive effects of IR, it has been criticized for being less efficient in helping children learn because its procedures are lengthier than others (Skinner 2008). That is, research has found that IR is less efficient in helping children to learn due to the large number of responses dictated by the procedure (Nist and Joseph 2008; Skinner 2008).

Several studies have compared these interventions based on their ability to facilitate students’ acquisition of word recognition skills (effectiveness) as well as the rate at which skill acquisition takes place (efficiency). One such study evaluated the effectiveness and efficiency of teaching intermediate-grade students new words using high probability sequencing (a flashcard drill method utilizing a high ratio of known to unknown words), interspersal training, and TD (Joseph and Nist 2006). Under these three conditions, six words were targeted for each session and the researchers held opportunities to respond (i.e., the number of times students practiced the targeted words) constant and instructional time (i.e., the time taken to complete the intervention) varied across conditions (Joseph and Nist 2006). Although results indicated no significant differences between conditions with regard to effectiveness, TD was identified as the most efficient method (Joseph and Nist 2006). That is, students required less time to learn words in the TD condition. Moreover, a comparison of the efficiency and effectiveness of two word recognition flashcard drill methods (i.e., TD and IR) and a phonics analysis during repeated reading lessons with students in first through third grade found that TD was the more effective and efficient instructional condition (Joseph and Schisler 2007).

MacQuarrie et al. (2002) compared the effectiveness of drill sandwich (a variation of interspersal training), TD, and IR in teaching third- and seventh-grade students to read new words. The intervention adhered to the standard procedures for each instructional condition, and the unknown words received a varying number of presentations. Specifically, the instructor conducted (a) TD until students demonstrated mastery through three or more successful trials, (b) drill sandwich for three trials, and (c) IR until three unknown words were incrementally rehearsed with nine known words. Consequently, the authors did not hold either time or opportunities to respond constant across conditions. Results from this study indicated that the instructional conditions that provided the most opportunities to respond were most effective in helping students to retain new words. Specifically, IR was identified as most effective, followed by TD and drill sandwich, but the authors did not report the efficiency of the intervention techniques utilized within the study.

Nist and Joseph (2008) compared TD, interspersal training, and IR methods to teach word recognition to 6 first-grade students. During each intervention session, the instructor targeted six words within each instructional condition. Results from this study were consistent with previous research by MacQuarrie et al. (2002), in that IR led to higher retention rates than the TD or interspersal training techniques (Nist and Joseph 2008). Also, consistent with previous studies, TD was identified as the most efficient method for teaching sight word recognition (Joseph and Nist 2006; Joseph and Schisler 2007; Nist and Joseph 2008; Schmidgall and Joseph 2007). However, although Nist and Joseph (2008) targeted the same number of unknown words in each condition, the IR condition allowed more opportunities to respond than TD and interspersal training. This was because in IR unknown words, once learned, are typically folded in as known words in subsequent trials. Therefore, although opportunities to respond were held constant across TD and interspersal training, carefully controlled comparisons involving IR were not possible, making implications regarding IR unclear.

Some of the aforementioned studies have attempted to examine learning and learning rates; however, none have been able to examine IR while truly holding opportunities to respond constant. Although IR typically involves folding in of newly learned words, it is not clear whether this is a critical element in the effectiveness of the intervention procedure. Additionally, there is a need to conduct studies experimentally holding time constant to more accurately understand learning and learning rates (e.g., Skinner et al. 1997). That is, as opposed to simply calculating instructional efficiency across conditions, a more practical approach would be to compare methods using the same instructional period and allow opportunities to respond to vary (see Skinner 2008). This is important because the drill procedures being investigated typically require one-to-one instruction, and thus, time is an important consideration in the allocation of limited resources.

Although it is important for children to initially learn and maintain the ability to read new words in isolation, the ultimate goal is for them to be able to read the newly acquired words in unfamiliar contexts. One method of assessing such generalization is to examine whether children can read the newly acquired words when embedded in sentences (e.g., Nist and Joseph 2008).

Accordingly, the current study sought to extend extant findings by examining the instructional effectiveness and efficiency of TD and IR (without the folding in of recently learned words) in teaching word recognition while holding opportunities to respond constant (allowing instructional time to vary) and holding instructional time constant (allowing opportunities to respond to vary). Use of a multielement design allowed for the examination of differential effectiveness across interventions. In addition to word recognition retention on next day assessments, maintenance, generalization, and social validity also were examined. The following research questions guided the study:

1. What are the relative effects of IR and TD on the cumulative number of words read correctly on next day retention probes when holding opportunities to respond constant?

2. What are the relative effects of IR and TD on the cumulative number of words read correctly on next day retention probes when holding time constant?

3. What are the effects of IR and TD on the cumulative rate of growth in correctly read words on next day retention probes when holding opportunities to respond constant?

4. What are the effects of IR and TD on the cumulative rate of growth in correctly read words on next day retention probes when holding time constant?

5. What relative effect did IR and TD have on generalization to a different context (sentence reading) when opportunities to respond and time were held constant across conditions?

Method

Participants and Setting

Participants included 4 African American students, all 6 years old and in the first grade (3 female and 1 male) who were referred to the research team by their classroom teachers because of word reading difficulties. All 4 students attended a public elementary school in an urban school district in the Northeastern United States. The student to teacher ratio in the school was 14:1, and approximately 71% of the students were eligible for free or reduced-price lunch. None were receiving special education services.

Two school psychology graduate students served as the interventionists for the study and were responsible for administering the flashcard techniques under investigation (described in detail below). An additional two graduate students served as the procedural integrity and inter-observer agreement data collectors. These graduate students were not involved in the administration of intervention procedures. The data collectors did interact with the students, although minimally, and the students seemed to be comfortable with them. Training of research assistants consisted of two 1-h training sessions involving didactic instruction and role play.

Intervention sessions occurred twice a week in a semi-private space outside of each student’s classroom in which the interventionist, data collector, and student were the only individuals present. During the intervention, the interventionist and student sat at a right angle from each other at a large desk, while the data collector sat behind the student.

Materials

Prior to intervention, the interventionists conducted a pre-assessment to identify specific targets for instruction. Words used in the assessment consisted of 379 three- to eight-letter words randomly selected from a number of sources of first-grade word lists (e.g., Dolch Sight Vocabulary List, Houghton Mifflin High Frequency Words, Enchanted Learning’s Vocabulary Words for First Grade Readers, US Department of Education website). Words were printed with a landscape orientation in black ink on white 3″ × 4″ index cards, shuffled, and presented to the student one at a time. The interventionist asked the students to read each word aloud. If a student mispronounced a word or did not respond within 3 s, the word was categorized as “unknown” and placed in a corresponding pile. The interventionist did not provide error correction during the assessment. If the student correctly pronounced the word within 3 s, the word was categorized as “known” and placed in a known pile. Students received verbal praise for all known words. The interventionist conducted the full assessment on a second occasion (3 days following the first assessment) and omitted any words that were discrepant between administrations (e.g., read correctly on Trial 1 but incorrectly on Trial 2) from the study.
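The two-administration screening rule described above can be sketched in code. This is only an illustration under stated assumptions: the function names, the data layout (a mapping from word to a correct/latency pair), and the treatment of a response at exactly 3 s as within the limit are hypothetical and are not taken from the study's materials.

```python
from typing import Dict, List, Optional, Tuple

TIME_LIMIT_S = 3.0  # response window stated in the text

# Hypothetical record for one response: (read correctly?, latency in seconds or None if no response)
Response = Tuple[bool, Optional[float]]


def is_known(correct: bool, latency_s: Optional[float]) -> bool:
    """A word counts as known only if it was read correctly within the time limit."""
    return correct and latency_s is not None and latency_s <= TIME_LIMIT_S


def classify_words(
    first: Dict[str, Response], second: Dict[str, Response]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (known, unknown, discarded) word lists from two administrations.

    Words scored differently on the two occasions are discarded, mirroring
    the omission of discrepant words described in the text.
    """
    known, unknown, discarded = [], [], []
    for word in first:
        first_known = is_known(*first[word])
        second_known = is_known(*second[word])
        if first_known and second_known:
            known.append(word)
        elif not first_known and not second_known:
            unknown.append(word)
        else:
            discarded.append(word)
    return known, unknown, discarded
```

Only words landing in the unknown list on both occasions would then be eligible as instructional targets, and only words in the known list would be eligible as known items for IR.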

The interventionist next randomly assigned those words categorized as unknown to one of the four instructional conditions: (a) IR with opportunities to respond held constant (IR-ORC), (b) TD with opportunities to respond held constant (TD-ORC), (c) IR with instructional time held constant (IR-TC), or (d) traditional drill with instructional time held constant (TD-TC). In each condition, the interventionist targeted three unknown words for intervention (i.e., 12 total unknown words targeted for instruction per session), and over the 4 weeks of intervention, a total of 96 words were targeted across the four instructional conditions (24 words targeted in each condition). The interventionist targeted each word in only one intervention session. Because IR involves the use of both known and unknown words, the interventionist randomly selected known words from the pool of known words identified via the pre-assessment for use in the IR conditions.
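The random assignment described above can be illustrated with a short sketch. The text does not report how randomization was carried out, so the shuffling approach, the function name, and the seed parameter below are assumptions made for illustration only.

```python
import random
from typing import Dict, List, Optional

CONDITIONS = ["IR-ORC", "TD-ORC", "IR-TC", "TD-TC"]


def assign_words(
    unknown_words: List[str],
    words_per_condition: int = 24,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Randomly assign unknown words to the four conditions without overlap.

    With the default of 24 words per condition, 96 words are drawn in total,
    matching the counts given in the text. Each word appears in one condition only.
    """
    needed = words_per_condition * len(CONDITIONS)
    if len(unknown_words) < needed:
        raise ValueError(f"need at least {needed} unknown words")
    rng = random.Random(seed)  # seed is a hypothetical convenience for reproducibility
    pool = rng.sample(list(unknown_words), needed)  # sampling without replacement
    return {
        condition: pool[i * words_per_condition:(i + 1) * words_per_condition]
        for i, condition in enumerate(CONDITIONS)
    }
```

Each condition's 24 words could then be split into consecutive groups of three, one group per intervention session.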

All words were printed in black ink on 3″ × 4″ index cards with landscape orientation; however, the IR conditions utilized blue index cards, and the TD conditions utilized yellow index cards. Colored index cards were used so that students could differentiate between intervention procedures (i.e., IR versus TD), thereby helping to enhance the reliability of the social validity data. The interventionists only used the colored index cards for instructional purposes, and all retention and maintenance assessments utilized the white index cards used during the pre-assessment.

Procedure

Students received intervention two times a week for four weeks. Within a given week, each student participated in three sessions. During the first and second sessions of each week, the interventionists delivered the instructional conditions. At the beginning of the second session (prior to intervention), probes to assess next day retention were administered, and this assessment was the only activity during the third session. Retention probes administered during the second and third sessions only contained words targeted on the previous day (the first and second weekly session, respectively). During the fifth week of the study, the interventionists collected maintenance, generalization, and social validity data, for those words that the student read correctly across all next day retention assessments.

The order in which interventions were administered was counterbalanced across sessions and students (e.g., IR-ORC, TD-ORC, IR-TC, TD-TC for session 1; TD-TC, IR-ORC, TD-ORC, IR-TC for session 2). Before beginning each of the interventions described below, the interventionist read each of the targeted words to the child and then asked the child to repeat the word. This part of the intervention was not timed. The interventionist then used a stopwatch to record the length of each session, with timing beginning with the presentation of the first flashcard in the IR or TD procedure following the modeling portion of the intervention. The interventionist discontinued timing after either (a) the student read the last word in the opportunities to respond held constant conditions or (b) 3 min had elapsed in the time held constant conditions. In all conditions, the students received verbal praise (e.g., “good job” or “nice work”) for all words read correctly within 3 s and received corrective feedback for any miscues (i.e., words read incorrectly). The following sections provide a description of each instructional condition.
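The two example orders given for sessions 1 and 2 are consistent with a simple cyclic rotation of the four conditions. The sketch below assumes that rotation continues for later sessions; the text gives only these two examples, so the full counterbalancing scheme (including how orders differed across students) is an assumption here, and the function name is hypothetical.

```python
from typing import List, Optional

CONDITIONS = ["IR-ORC", "TD-ORC", "IR-TC", "TD-TC"]


def session_order(session_number: int, base: Optional[List[str]] = None) -> List[str]:
    """Return the condition order for a 1-indexed session.

    Assumed scheme: the last condition moves to the front on each
    successive session, which reproduces the two orders in the text.
    """
    order = list(base) if base is not None else list(CONDITIONS)
    shift = (session_number - 1) % len(order)
    if shift == 0:
        return order
    return order[-shift:] + order[:-shift]
```

Under this assumption the order repeats every four sessions, so each condition occupies each serial position equally often across the eight intervention sessions.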

Traditional Drill and Practice

In the TD conditions, the interventionist presented all targeted words to the students according to the prescribed TD method as follows: 1U, 2U, 3U, 1U, 2U, 3U, 1U, 2U, 3U, 1U, 2U, 3U, 1U, 2U, 3U. In the TD-ORC condition, the entire TD procedure using three unknown words was implemented five times. However, in the TD-TC condition, the interventionist continued the procedure until 3 min of instructional time had elapsed, with varying practice opportunities depending on how quickly the student responded and how much error correction the student needed.
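The TD-ORC presentation order above can be generated with a few lines of code. This is a minimal sketch; the function name and the labels used for the words are illustrative only.

```python
from typing import List


def td_sequence(unknown_words: List[str], rounds: int = 5) -> List[str]:
    """Present every unknown word once per round, for a fixed number of rounds."""
    return [word for _ in range(rounds) for word in unknown_words]


# Three unknown words cycled five times, as in the TD-ORC condition
sequence = td_sequence(["1U", "2U", "3U"])
```

With three words and five rounds the sequence contains 15 presentations, five per unknown word. In the TD-TC condition the same cycle would simply continue until the 3-min limit was reached, so the number of rounds is not fixed in advance.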

Incremental Rehearsal

In the IR conditions, the interventionist presented the three unknown words targeted for instruction with five known words as follows: 1U, 1K, 1U, 1K, 2K, 1U, 1K, 2K, 3K, 1U, 1K, 2K, 3K, 4K, 1U, 1K, 2K, 3K, 4K, 5K, 2U, 1K, etc. According to standard IR procedures, the first unknown word targeted for instruction becomes the first known word after introducing the second unknown word, therefore affording further practice (particularly for the first several unknown words). This folding in of newly learned words did not occur in the current study so that opportunities to respond could be held constant across the TD-ORC and IR-ORC conditions and across words targeted with IR-ORC. In the IR-ORC condition, the interventionist implemented the entire modified IR procedure one time, such that the student practiced each of the three unknown words five times. However, in the IR-TC condition, the procedure was repeated until 3 min of instructional time had elapsed. Therefore, as in the TD-TC condition, the number of opportunities to respond varied. Although the five known words did not change within each session (they were not replaced with recently learned words), each intervention session involved a new set of known words selected from the list of words that were known in the pre-assessment.
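The modified IR order (without folding in newly learned words) can be sketched as follows. The function name and word labels are illustrative; the logic follows the sequence printed in the text, in which each unknown word is shown before a growing run of known words.

```python
from typing import List


def ir_sequence(unknown_words: List[str], known_words: List[str]) -> List[str]:
    """Build the modified IR order: each unknown word is followed by
    1, then 2, ... up to all of the known words, and the known words
    are never replaced by recently learned words."""
    sequence = []
    for unknown in unknown_words:
        for n_known in range(1, len(known_words) + 1):
            sequence.append(unknown)                # one opportunity to respond
            sequence.extend(known_words[:n_known])  # incrementally longer known run
    return sequence


sequence = ir_sequence(["1U", "2U", "3U"], ["1K", "2K", "3K", "4K", "5K"])
```

With three unknown and five known words, each unknown word is presented five times, matching the five opportunities per word in TD-ORC, but the full sequence contains 60 card presentations rather than 15. That difference in total presentations is consistent with the longer session lengths reported for IR-ORC.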

Measures

Retention

One day after the intervention session, the interventionist administered a retention probe including all of the unknown words instructed in the previous session (i.e., 3 unknown words per instructional condition, totaling 12 unknown words across all conditions), which served as a measure of intervention effectiveness. The interventionist presented words one at a time in random order and prompted the student to read each word aloud. If the student accurately read the word in 3 s or less, the word was considered learned. If the student incorrectly read the word or did not respond in 3 s, the word was considered unknown. The interventionist did not provide any feedback to students for their responses; however, they received verbal praise for “working hard” during the assessment.

Instructional Effectiveness and Efficiency

Instructional effectiveness was defined as the cumulative number of words read accurately on the next day retention probe, whereas instructional efficiency was defined as the cumulative rate of words learned. The rate of words learned or instructional efficiency (Cates et al. 2003) was calculated by multiplying the number of words read accurately (WRA) on the next day retention probe by 60 s and dividing that product by the instructional time in seconds (WRA × 60 s / Instructional Time [s]).
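The efficiency formula can be written as a small function. The function name and the example values below are hypothetical and are not data from the study.

```python
def words_per_minute(words_read_accurately: int, instructional_time_s: float) -> float:
    """Instructional efficiency: WRA x 60 s / instructional time in seconds."""
    if instructional_time_s <= 0:
        raise ValueError("instructional time must be positive")
    return words_read_accurately * 60 / instructional_time_s


# Hypothetical example: 2 words retained after a 180-s (3-min) session
rate = words_per_minute(2, 180)
```

In this hypothetical case the rate is 2 × 60 / 180, or about 0.67 words learned per instructional minute. For a cumulative rate, the same function would presumably be applied to the cumulative word count and the cumulative instructional time.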

Maintenance

Students received a maintenance probe 1 week after the intervention concluded (5th week of study) to determine how many of the learned words were maintained over time. The interventionist presented the student with a randomly ordered deck of flashcards containing only the words read correctly on the retention probes and asked the student to read each word aloud. Words were considered correct if the student pronounced them accurately within 3 s. Again, students did not receive feedback for their responses, but did receive verbal praise for working hard during the assessment.

Generalization

The interventionist also administered a generalization probe 1 week after the intervention concluded (i.e., same day as the maintenance probe administration) to assess whether the students could read the targeted words correctly when presented in sentences. Sentences were developed in a manner consistent with previous research (Nist and Joseph 2008), which included one learned word (based on retention data), as well as four to six monosyllabic words. The interventionist presented sentences one at a time to the students and prompted the students to read each sentence aloud. Responses were considered correct if the student read the target word accurately in the context of the sentence. Conversely, responses were considered incorrect if the student was not able to accurately pronounce the target word or if the student paused for 3 s or longer. As in the retention and maintenance assessments, students only received general verbal praise for working hard and did not receive feedback for correct or incorrect responses.

Social Validity

A structured interview was created to assess student acceptability of each intervention (IR, TD). At the end of the study, the interventionist modeled the instructional conditions for the students and asked them a series of social validity questions. The interview consisted of the following questions: (a) How many words did you learn? (Response set: None, Some, A lot); (b) Which one did you like better (i.e., instructional condition)? Did you like the blue one (i.e., IR) or the yellow one (i.e., TD) better (Response set: Blue or Yellow)? (c) Why do you like the blue or yellow one better (Response set: Open ended)? (d) Why did you not like the blue or yellow one as much (Response set: Open ended)?

Interobserver Agreement

Interobserver agreement (IOA) data were collected for 50% of the next day retention assessments, and 100% of the maintenance and generalization assessments. Agreement was calculated by summing the total number of agreements on student correct responses and errors, dividing by the total number of words assessed, and multiplying by 100%. Overall, agreement was high for retention, maintenance, and generalization probes (93, 94 and 97%, respectively).
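The agreement calculation described above corresponds to a word-by-word percent agreement. The sketch below assumes each observer's record is a list of correct/incorrect scores in the same word order; the function name and the sample scores are hypothetical.

```python
from typing import Sequence


def percent_agreement(observer_a: Sequence[bool], observer_b: Sequence[bool]) -> float:
    """Agreements (on both correct responses and errors) divided by
    the number of words assessed, multiplied by 100."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both observers must score the same words")
    if not observer_a:
        raise ValueError("at least one word must be assessed")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


# Hypothetical probe of four words: the observers disagree on the second word
ioa = percent_agreement([True, True, False, True], [True, False, False, True])
```

Here the observers agree on three of four words, giving 75% agreement. Note that agreement on an error counts the same as agreement on a correct response.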

Procedural Integrity

The data collectors determined procedural integrity using an eight-item procedural integrity checklist. Observers placed a checkmark in the “yes” column if the interventionist implemented a procedural step as indicated on the checklist or placed a checkmark in the “no” column if the interventionist did not implement a procedural step as indicated on the checklist. Data collectors recorded procedural integrity for 50% of the intervention sessions. According to these data, the interventionists followed 100% of procedural steps.

Experimental Design

The research questions were evaluated using a series of multielement designs (one for each student), which is a type of alternating treatment design. We selected the multielement design as it is an appropriate method for identifying differences between two or more conditions over time (Riley-Tillman and Burns 2009; Ulman et al. 1975).

Results

Summaries of individual student performance on next day retention probes across the four intervention conditions (effectiveness) and efficiency scores for those data are provided in Figs. 1 and 2, respectively. Tables 1 and 2 provide summaries of cumulative retention, maintenance and generalization data for each student for each of the opportunities to respond held constant and time held constant conditions, respectively.

Effectiveness of IR and TD: Holding Opportunities to Respond Constant

The effectiveness of the IR-ORC and TD-ORC conditions was not clearly differentiated in any of the multielement designs for individual students. That is, there was not clear separation of data series for each intervention for any student (Fig. 1; Table 1). Three out of the four students retained numerically more words targeted in the TD-ORC condition; however, differences between conditions were minimal (one or two words for each student). It should be noted that when opportunities to respond were held constant, the session length (in minutes) was much longer for IR-ORC (M = 3.00; SD = 0.57) than for TD-ORC (M = 0.85; SD = 0.41).

Effectiveness of IR and TD: Holding Time Constant

When time was held constant across conditions (3 min per session for each condition), the effectiveness of the IR-TC and TD-TC conditions was not clearly differentiated (Table 2). Students read slightly more words correctly that were targeted in the IR-TC (between 5 and 15 words) than the TD-TC condition (between 3 and 13 words). Three out of four students read more words accurately on next day retention probes that were targeted in the IR-TC than the TD-TC condition. Again, differences across conditions were small with the exception of Student 1 who read six more words correctly that were targeted in IR-TC than TD-TC (15 compared to 9). However, Student 3 read more words accurately that were targeted in the TD-TC condition (13 words compared to 7 words in the IR-TC condition). Since IR involves the presentation of known words in addition to the unknown words targeted for instruction, students received many more opportunities to respond to unknown words in the TD-TC (M = 78.53; SD = 14.48) compared to the IR-TC (M = 17.13; SD = 2.22) condition.

Efficiency of IR and TD

An examination of the multielement designs in Fig. 2 clearly demonstrated that TD was more efficient than IR when opportunities to respond were held constant (Table 1). Students learned more words per instructional minute in the TD compared to the IR condition (rates between 0.58 and 2.21 and between 0.13 and 0.42 words learned per minute of instruction, respectively). However, when holding time of intervention session constant (Table 2), the number of words learned per instructional minute was comparable (between 0.13 and 0.54 for TD-TC; between 0.21 and 0.63 for IR-TC), despite there being three to six times as many opportunities to respond in the TD-TC compared to the IR-TC condition.

Fig. 1 Multi-element design of instructional effectiveness across four students

Fig. 2 Multi-element design of instructional efficiency across four students

Maintenance and Generalization

An examination of maintenance data revealed no consistent differences between TD and IR across students. Student 1 maintained more words that were targeted in TD-ORC than IR-ORC (7 words compared to 4 words). However, Student 4 maintained one more word in the IR-ORC compared to TD-ORC condition and the remaining two students maintained the same number of words across the two conditions. Consideration of instructional time indicated that students maintained more words per instructional minute in the TD-ORC condition (between 0.12 and 1.55 words) compared to the IR-ORC condition (between 0.08 and 0.23). A different pattern emerged, however, when holding time constant across conditions. Three of the four students maintained slightly more words that were targeted in the TD-TC condition compared to the IR-TC condition, and since time was held constant across conditions, the rate of learning mirrored these findings. Of interest, results for Student 1 favored IR-TC over TD-TC (10 and 7 words maintained, respectively).

Table 1 Number of words read correctly, mean time of instruction, rate of words read for each student by instructional conditions controlling for opportunities to respond

                  IR-ORC                         TD-ORC
                  W      M Time (SD)   Rate      W      M Time (SD)   Rate
Retention
  Student 1       9      2.96 (0.51)   0.38      10     0.56 (0.13)   2.21
  Student 2       9      2.69 (0.49)   0.42      7      0.53 (0.14)   1.65
  Student 3       8      3.36 (0.55)   0.30      9      1.20 (0.34)   0.93
  Student 4       3      2.98 (0.59)   0.13      5      1.09 (0.42)   0.58
  Mean            7.25   2.99 (0.54)   0.31      7.75   0.85 (0.26)   1.34
Maintenance
  Student 1       4                    0.17      7                    1.55
  Student 2       5                    0.23      5                    1.18
  Student 3       3                    0.11      3                    0.31
  Student 4       2                    0.08      1                    0.12
  Mean            3.50                 0.15      4.00                 0.79
Generalization
  Student 1       5                    0.21      9                    1.99
  Student 2       3                    0.14      4                    0.94
  Student 3       4                    0.15      4                    0.42
  Student 4       2                    0.08      2                    0.23
  Mean            3.50                 0.15      4.75                 0.90

W number of words read correctly, M Time (SD) number of minutes, Rate rate of words read (number of words read accurately/time)

In terms of generalization, no consistent pattern emerged across students. However, as might be expected, Student 1 generalized more words that were targeted in TD than in IR (9 words compared to 5 words) when opportunities to respond were held constant, and more words that were targeted in IR than in TD (11 words compared to 7 words) when time was held constant.


Table 2 Number of words read, mean opportunities to respond, rate of words read for each student by instructional condition controlling instructional time

                  IR-TC                          TD-TC
                  W      M OTR (SD)     Rate     W      M OTR (SD)      Rate
Retention
  Student 1       15     18.50 (1.50)   0.63     9      79.75 (14.18)   0.38
  Student 2       8      19.25 (1.67)   0.33     7      119.5 (13.88)   0.29
  Student 3       7      15.00 (2.83)   0.29     13     49.86 (14.39)   0.54
  Student 4       5      15.75 (2.87)   0.21     3      65.00 (15.48)   0.13
  Mean            8.75   17.13 (2.22)   0.37     8.00   78.53 (14.48)   0.34
Maintenance
  Student 1       10                    0.42     7                      0.29
  Student 2       5                     0.21     6                      0.25
  Student 3       2                     0.08     6                      0.25
  Student 4       0                     0        1                      0.04
  Mean            4.25                  0.18     5.00                   0.21
Generalization
  Student 1       11                    0.46     7                      0.29
  Student 2       6                     0.25     6                      0.25
  Student 3       2                     0.08     4                      0.17
  Student 4       1                     0.04     1                      0.04
  Mean            5.00                  0.21     4.50                   0.19

W number of words read correctly, M OTR (SD) mean opportunities to respond (standard deviation), Rate rate of words read (number of words read accurately/time)

Social Validity

With respect to social validity, all students felt that they learned “a lot” of words and enjoyed working with their interventionist. After a demonstration of the interventions (i.e., traditional drill and incremental rehearsal), the interventionist asked each student to indicate which method they preferred. Half of the students reported preferring the IR method (Students 1 and 4), and half preferred the TD method (Students 2 and 3). The students who preferred the IR method reported favoring it because it included known words and because it did not excessively repeat words, as in the TD condition. However, students who preferred the TD method reported favoring it because they liked the repetition of unknown words.

Discussion

The goal of this study was to extend prior research examining the effectiveness and efficiency of flashcard drill interventions. Typically, studies comparing these interventions have attempted to control the number of times students respond to instructional targets (i.e., words) across intervention conditions. However, because interspersal methods involve the presentation of both known and unknown words, they take longer to implement than TD wherein only unknown words are presented. Because interspersal methods take more time to administer than TD when holding opportunities to respond constant, it is not surprising that numerous studies have found the efficiency of TD to be superior to interspersal methods (e.g., Nist and Joseph 2008). Moreover, comparing rates of learning between interventions when the length of intervention sessions is not consistent has been criticized since students may learn more during initial trials than in later trials within the same intervention session (Skinner 2008). According to Skinner (2008), a more practical approach to comparing interventions is to hold time of intervention session constant and allow opportunities to respond to vary. Because we were interested in determining which intervention strategy made the best use of an equivalent number of opportunities to respond and which intervention was superior when the same amount of time was available, we compared TD to a modified version of IR when holding both opportunities to respond and time constant across intervention methods.

When opportunities to respond were held constant across flashcard drill methods, results were consistent with those of previous studies showing TD to be more efficient than IR (Joseph and Schisler 2007; Nist and Joseph 2008). In fact, the rate of student learning based on next day retention was between 3 and 6 times as large in the TD condition as in the IR condition. This finding was not entirely surprising, given the fact that significantly more time was spent carrying out the IR intervention over the four-week period (M = 23.98 min per student) than the TD intervention (M = 6.77 min per student). Given these differences in dedicated intervention time, however, it was somewhat surprising that the overall effectiveness of TD and IR was comparable for the next day retention (IR-ORC = 3–9 words; TD-ORC = 5–10 words), maintenance (IR-ORC = 2–5 words, TD-ORC = 1–7 words), and generalization (IR-ORC = 2–5 words, TD-ORC = 2–9 words) probes. This stands in contrast to results of extant studies that found IR to be superior to TD when holding opportunities to respond constant and in which the length of time needed to administer the intervention was not considered (e.g., Nist and Joseph 2008). It should be noted, however, that in previous studies, opportunities to respond for IR were underestimated because these authors did not consider the folding in of previously unknown words in IR, whereas the current study eliminated folding in so that opportunities to respond were equivalent across interventions. Hence, our findings suggest that it is possible that the advantages found for IR in previous studies may be at least partially accounted for by the additional opportunities to respond afforded via the folding in procedure. Moreover, it is possible that the folding in component of IR offers benefits beyond simply providing additional opportunities to respond. Indeed, the conversion of unknown words to known words magnifies the incremental presentation of instructional stimuli. In addition, presentations of newly acquired words may offer an additional challenge in recalling subsequent words targeted for instruction in the same session.

An additional unexpected finding was that differences in effectiveness were minimal for next day retention (TD-TC = 3–13 cumulative words, IR-TC = 5–15 cumulative words), maintenance (TD-TC = 1–7 cumulative words, IR-TC = 0–10 cumulative words), and generalization (TD-TC = 1–7 cumulative words, IR-TC = 1–11 cumulative words) probes when holding length of instructional session constant and allowing opportunities to respond to vary across interventions. Given that the length of instructional sessions was held constant across conditions, efficiency rates were also similar for TD and IR. This was particularly surprising because although the interventions were conducted for the same period of time (i.e., 3 min), differences in opportunities to respond across interventions were substantial (between 3 and 6 times as many in the TD-TC as compared to IR-TC). Specifically, students had far fewer opportunities to respond to instructional targets in the IR condition (M = 17.13; SD = 2.90) than the TD condition (M = 78.53; SD = 29.93). It should be noted that each session targeted only three words within each condition. If additional words had been targeted during TD sessions, it is likely that the additional time in the time held constant condition would result in notable improvements commensurate with the increase in instructional time. However, individual differences in the number of words a student can practice and later recall, also known as a student’s acquisition rate (Gickling and Thompson 1985; see also Ceraso 1967), are an important consideration. Targeting too many words in a single session may have had a negative impact on instructional efficiency (see Burns 2001) and has been associated with increases in off-task behavior (Burns and Dean 2005).
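The size of the gap in opportunities to respond (OTR) can be checked per student from the mean values in Table 2. The following is a minimal sketch; the dictionary layout and the helper name `otr_ratio` are illustrative choices, not part of the original analysis.

```python
# Mean opportunities to respond (OTR) per session, copied from Table 2.
mean_otr_ir_tc = {"Student 1": 18.50, "Student 2": 19.25,
                  "Student 3": 15.00, "Student 4": 15.75}
mean_otr_td_tc = {"Student 1": 79.75, "Student 2": 119.5,
                  "Student 3": 49.86, "Student 4": 65.00}


def otr_ratio(td, ir):
    """Return, per student, how many times more OTR TD-TC gave than IR-TC."""
    return {student: td[student] / ir[student] for student in td}


ratios = otr_ratio(mean_otr_td_tc, mean_otr_ir_tc)
for student, ratio in ratios.items():
    print(f"{student}: TD-TC gave {ratio:.1f}x the OTR of IR-TC")
```

The per-student ratios fall roughly between 3 and 6, with Student 2 slightly above 6, which is in line with the range described in the text.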

Interesting comparisons also were noted within intervention type. Within the TD condition, instructional sessions were nearly three times as long when holding time constant as they were when holding opportunities to respond constant. Even so, the overall instructional gains were roughly equivalent (e.g., between 3 and 13 words retained in TD-TC, between 5 and 10 words retained in TD-ORC). One explanation for this finding is that the benefits of TD may be gained in early trials, and repeating learned words provides no additional benefit (cf., Skinner 2008). The effectiveness and efficiency of IR were somewhat better in the time held constant condition than in the opportunities to respond held constant condition for Students 1 and 4, but were slightly worse or comparable for Students 2 and 3. Although the length of sessions was almost identical overall, students had slightly more opportunities to respond in the time held constant versus opportunities to respond constant condition (M = 17.13 and M = 15, respectively). This raises the question as to why students had more opportunities to respond in the time held constant condition. One explanation is that students knew they were being timed and may have worked faster as a result.

An examination of data for individual students revealed that no one intervention was most effective across student participants. Within the opportunities to respond constant condition, for example, maintenance outcomes for TD were either better (Student 1), worse (Student 4), or the same as (Students 2 and 3) outcomes for IR depending on the student examined. Results of generalization probes were similarly inconsistent across students; however, again TD and IR were equally effective for half of the students. Furthermore, it could be argued that neither intervention was effective for Student 4. This suggests that it may be difficult to select a flashcard drill procedure that will consistently outperform another across all students.

Limitations

Although findings suggest interesting implications with regard to use of flashcard drill procedures, limitations of the study must be noted. First, the study involved a very small sample (N = 4), thereby limiting generalizations that can be made from the current dataset. One advantage to examining data at the single case level, however, was that we were able to detect the degree of variability in response to intervention that occurred across students and conditions. That is, in comparison to studies that have suggested the superiority of one intervention over another at the group level (e.g., Joseph and Schisler 2007; MacQuarrie et al. 2002), results of the current study suggest that one intervention may not be more effective or efficient for all students. Clearly, these results must be replicated and extended to other content areas.

Second, the current study modified standard IR procedures in order to make the process of holding opportunities to respond constant feasible. Typically, unknown words are converted to known words as students move through the IR instructional sequence; however, this conversion did not take place in the current study. As a result, students received fewer exposures to a particular unknown word in the modified IR procedure than is typical practice. In addition, practice with initially unknown words was less distributed (i.e., spaced out), despite the fact that some have suggested that the effectiveness of IR may be explained by the distributed practice effects afforded by the procedure (MacQuarrie et al. 2002). Although this modification was necessary in order to hold opportunities to respond constant across interventions, it seems to have had a negative impact on the ability of students to retain, maintain, and generalize the words targeted for intervention. Needed are studies to directly compare IR with and without folding in so that the incremental benefit of folding in can be quantified.

Third, we only measured maintenance and generalization on one occasion (in Week 5 of the study). Therefore, the latency between intervention and these assessments differed substantially depending on which week the assessed words were targeted for intervention, which limits the interpretability of these findings. We examined the impact differential latency may have had on the results by charting each student’s maintenance and generalization scores by week of intervention (Week 1–Week 4). The data series for each participant demonstrated a linear positive trend with students maintaining and generalizing many more words that were targeted in Week 4 as compared to Week 1. For both maintenance and generalization, students, as a group, read twice as many words targeted in Week 4 correctly compared to those targeted in Week 1. These findings, though tentative, highlight the need for more attention to the maintenance and generalization of the effects of academic interventions. Such studies may indicate the need for programming to maximize the number of words children can maintain and generalize. One potential benefit to IR that has not been studied systematically would be to extend the folding in of recently learned words from words learned within the present session to words learned in previous sessions. Although several studies have demonstrated IR to be less efficient than drill techniques involving only unknown words (e.g., TD), such programming for long-term retention could amount to gains in maintenance that would not be reflected in measures of instructional efficiency calculated from next day retention outcomes.

Fourth, the current study only investigated one fixed interval of time (i.e., 3 min) in the time held constant conditions. We selected this relatively short period of time in order to ensure the feasibility of carrying out four interventions within a given instructional block. Although the 3-min period likely provided sufficient practice within the TD condition (given that the entire procedure could be carried out approximately three times), this may not have been true for the IR condition (for which the average implementation time was roughly 3 min). As suggested by Skinner (2008), the effectiveness of a given instructional strategy may fluctuate depending on the length of the instructional session. Therefore, it is important that future research investigate the relative effectiveness and efficiency of these drill methods across instructional periods of varying length.

Finally, although only three words were targeted in each condition, a total of 12 words were targeted in each intervention session across the four conditions. It is likely that targeting so many words in each session contributed to the relatively weak response to the interventions across participants (e.g., Burns 2001). Nist and Joseph (2008) targeted even more words (18) in each session. However, in that study, a word missed on next day retention assessments was targeted again until it was learned. Since we targeted each word only in one session, it is difficult to draw direct comparisons across studies.

Future Directions

First, the current study should be replicated in large diverse samples to determine whether these findings are reliable and generalize to other populations. Although the current study sought to compare IR and TD under tight experimental control, the adaptation of IR to exclude folding in of newly learned words may actually have identified folding in as a critical ingredient of IR. Future studies should examine this issue further by comparing IR procedures with and without folding in. Distributed practice of recently learned words could be extended to class-wide interventions, where practice could be distributed over longer periods of time (e.g., during a class period or even throughout the school day). Finally, this study supports Skinner’s argument regarding the disadvantages of comparing interventions while holding opportunities to respond constant and demonstrates that there are at least two ways to study instructional efficiency. Holding opportunities to respond constant allows for careful examination of the best use of instructional trials, but interpreting results to make instructional decisions (e.g., which intervention would be best for a 10-min intervention session) may be hazardous because efficiency scores may be drawn from interventions of different lengths. However, when instructional time is held constant across interventions, the effectiveness score is the efficiency score, and the practical question regarding an intervention’s efficiency is provided in directly translatable terms. Targeting too many words per session in the current study likely limited the effectiveness of the intervention for participants. Future studies should carefully balance concerns related to ceiling effects and rates of acquisition.
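The statement that the effectiveness score is the efficiency score when time is held constant can be made explicit with a short derivation. As a sketch, let $W_A$ and $W_B$ denote the words learned under two interventions, $T_A$ and $T_B$ the total instructional minutes, and $R_A$ and $R_B$ the resulting rates. These symbols are introduced here for illustration and do not appear in the original analysis.

```latex
\[
  R_A = \frac{W_A}{T_A}, \qquad R_B = \frac{W_B}{T_B}.
\]
% With instructional time held constant, T_A = T_B = T, so
\[
  \frac{R_A}{R_B} = \frac{W_A / T}{W_B / T} = \frac{W_A}{W_B}.
\]
```

Under this assumption the ratio of efficiency scores equals the ratio of effectiveness scores, so ranking interventions by words learned also ranks them by rate. When $T_A \neq T_B$, as in the conditions holding opportunities to respond constant, the two rankings can diverge.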


Implications for Practice

Results of the current study afford practical implications for school-based professionals. When instructional time was held constant in the current study, one method was not significantly better (i.e., more effective or efficient) than the other across all students. Although the results require replication and extension examining different numbers of instructional targets, this tentatively suggests that, given a fixed intervention period, practitioners may best make choices between TD and IR based on brief experimental analysis (e.g., Daly et al. 1999). Results additionally suggest implications for schools operating within a response-to-intervention framework. Traditional drill may be an instructional procedure that requires very little time or resources to implement; however, gains in sight word recognition were consistently noted across student participants given eight biweekly sessions. As noted by Skinner (2008), all too often the price of academic remediation comes at the cost of physical education or art or recess, given that there are only so many minutes in a school day. However, practitioners can feasibly implement drill procedures within the regular classroom with minimal disruption to busy schedules. In this study, we examined interventions lasting 3 min or less, yet found instructional efficiency to be similar to estimates identified by investigators studying much longer intervention sessions (cf., Nist and Joseph 2008). This finding suggests that these flashcard drill procedures need not require removing students from classroom or recreational activities, but instead could be administered in the classroom by an aide or volunteer on a rotating basis to each student in the classroom (cf. Volpe et al. 2011).

References

Adams, M. (1990). Beginning to read. Cambridge, MA: MIT Press.

Browder, D. M., & Shear, S. M. (1996). Interspersal of known items in a treatment package to teach sight words to students with behavior disorders. Journal of Special Education, 29, 400–413.

Browder, D. M., & Xin, Y. P. (1998). A meta-analysis and review of sight word research and its implications for teaching functional reading to individuals with moderate to severe disabilities. Journal of Special Education, 32, 130–153.

Burns, M. K. (2001). Measuring acquisition and retention rates with curriculum-based assessment. Journal of Psychoeducational Assessment, 19, 148–157.

Burns, M. K. (2004). Empirical analysis of drill ratio research: Refining the instructional level for drill tasks. Remedial and Special Education, 25, 167–173.

Burns, M. K. (2007). Reading at the instructional level with children identified as learning disabilities: Potential implications for response-to-intervention. School Psychology Quarterly, 22, 297–313.

Burns, M. K., & Dean, V. J. (2005). Effect of acquisition rates on off-task behavior with children identified as having learning disabilities. Learning Disability Quarterly, 28, 273–281.

Burns, M. K., Dean, V. J., & Foley, S. (2004). Preteaching unknown key words with incremental rehearsal to improve reading fluency and comprehension with children identified as reading disabled. Journal of School Psychology, 42, 303–314.

Burns, M. K., Ardoin, S. P., Parker, D. C., Hodgson, J., Klingbeil, D. A., & Scholin, S. E. (2009). Interspersal technique and behavioral momentum for reading word lists. School Psychology Review, 38, 428–434.

Cates, G. L., Skinner, C. H., Watson, T. S., Meadows, T. J., Weaver, A., & Jackson, B. (2003). Instructional effectiveness and instructional efficiency as considerations for data-based decision making: An evaluation of interspersing procedures. School Psychology Review, 32, 601–616.


Ceraso, J. (1967). The interference theory of forgetting. Scientific American, 217, 117–124.

Daly, E. J., Martens, B. K., Hamler, K. R., Dool, E. J., & Eckert, T. L. (1999). A brief experimental analysis for identifying instructional components needed to improve oral reading fluency. Journal of Applied Behavior Analysis, 32, 83–94.

Dempster, F. N. (1991). Synthesis of research on reviews and tests. Educational Leadership, 48, 71–76.

Dunlap, G. (1984). The influence of task variation and maintenance tasks on the learning and affect of autistic children. Journal of Experimental Child Psychology, 37, 41–64.

Dunlap, G., & Koegel, R. L. (1980). Motivating autistic children through stimulus variation. Journal of Applied Behavior Analysis, 13, 619–627.

Gickling, E., & Thompson, V. (1985). A personal view of curriculum-based assessment. Exceptional Children, 52, 205–218.

Good, R. H., Gruba, J., & Kaminski, R. A. (2001). Best practices in using dynamic indicators of basic early literacy skills (DIBELS) in an outcomes-driven model. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology IV (pp. 679–700). Washington, DC: National Association of School Psychologists.

Greenwood, C. R., Delquadri, J., & Hall, R. V. (1984). Opportunity to respond and student academic performance. In W. Heward, T. Heron, D. Hill, & J. Trap-Porter (Eds.), Focus on behavior analysis in education (pp. 58–88). Columbus, OH: Charles E. Merrill.

Joseph, L. M., & Nist, L. M. (2006). Comparing the effects of unknown-known ratios on word reading learning versus learning rates. Journal of Behavioral Education, 15, 69–79.

Joseph, L. M., & Schisler, R. A. (2007). Getting the “most bang for your buck:” Comparison of the effectiveness and efficiency of phonic and whole word reading techniques during repeated reading lessons. Journal of Applied School Psychology, 24, 69–90.

MacQuarrie, L. L., Tucker, J. A., Burns, M. K., & Hartman, B. (2002). Comparison of retention rates using traditional drill sandwich and incremental rehearsal flashcard methods. School Psychology Review, 31, 584–595.

Matchett, D. L., & Burns, M. K. (2009). Increasing word recognition fluency with an English language learner. Journal of Evidence Based Practices in Schools, 10, 194–209.

Nist, L., & Joseph, L. M. (2008). Effectiveness and efficiency of flashcard drill instructional methods on urban first-graders’ word recognition, acquisition, maintenance, and generalization. School Psychology Review, 37, 294–308.

Riley-Tillman, T. C., & Burns, M. K. (2009). Evaluating educational interventions: Single-case design for measuring response to intervention. New York: The Guilford Press.

Samuels, S. J. (1997). The method of repeated readings. The Reading Teacher, 50, 376–381. (Reprinted from The Reading Teacher, 32, by S. J. Samuels, 1979).

Schmidgall, M., & Joseph, L. M. (2007). Comparison of phonic analysis and whole word-reading on first graders’ cumulative words read and cumulative reading rate: An extension in examining instructional effectiveness and efficiency. Psychology in the Schools, 44, 319–332.

Schneider, R. M., & Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84, 1–66.

Skinner, C. H. (2002). An empirical analysis of interspersal research evidence, implications, and applications of discrete task completion hypothesis. Journal of School Psychology, 40, 347–368.

Skinner, C. H. (2008). Theoretical and applied implication of precisely measuring learning rates. School Psychology Review, 37, 309–314.

Skinner, C. H., Belfiore, P. J., Mace, H. W., Williams-Wilson, S., & Johns, G. A. (1997). Altering response topography to increase response efficiency and learning rates. School Psychology Quarterly, 12, 54–64.

Symonds, P. M., & Chase, D. H. (1992). Practice vs. motivation. Journal of Educational Psychology, 84, 282–289.

Tan, A., & Nicholson, T. (1997). Flashcards revisited: Training poor readers to read words faster improves their comprehension of text. Journal of Educational Psychology, 59, 276–288.

Tucker, J. A. (1988). Basic flashcard technique when vocabulary is the goal. Unpublished teaching material. Chattanooga, TN: University of Tennessee at Chattanooga.

Ulman, J. D., & Sulzer-Azaroff, B. (1975). Multi-element baseline design in educational research. In E. Ramp & G. Semb (Eds.), Behavior analysis: Areas of research and application (pp. 371–391). Upper Saddle River, NJ: Prentice Hall.


Volpe, R. J., Burns, M. K., DuBois, M., & Zaslofsky, A. F. (2011). Computer-assisted tutoring: Teaching letter sounds to kindergarten students using incremental rehearsal. Psychology in the Schools, 48, 332–342.
