Original Article

Developing second language speaking skills: Eliciting repeated speech to increase fluency and accuracy

Colleen K Davy (Department of Psychology), Carnegie Mellon University
Brian MacWhinney (Department of Psychology), Carnegie Mellon University

Abstract

Repeated production of several-minute speeches on a given topic leads to increases in fluency and complexity of those speeches (Bygate et al. 2001) as well as other speeches given weeks later (de Jong & Perfetti, 2011). However, this type of task does not improve accuracy, because the task demands do not allow the speaker to immediately correct errors, which is necessary for greater accuracy. Highly constrained, sentence-level rehearsal exercises (e.g., Yoshimura & MacWhinney, 2007) may be more conducive to acquiring fluent and accurate production of new structures and vocabulary items. This study investigates the use of a repeated sentence imitation exercise designed to improve second language speech fluency in native English speakers learning Spanish as an L2. We find that, though the exercise is not communicative in nature, it requires learners to process the meaning of the sentence and reconstruct it, rather than simply echoing what they hear. Furthermore, this task leads to increases in fluency and accuracy with each production. Finally, we show that manipulating the sentences used in practice can adjust the fluency and accuracy of production of similar sentences. We conclude that imitation can be useful both as a pedagogical tool and as a method for studying processes in second language learning.

Keywords

Repeated imitation, fluency, second language acquisition, speaking

Introduction

Speaking is typically considered the hardest skill to acquire when learning a second language.
Language learners continue to struggle in the development of fluency in speech production, often failing to achieve improvements in speaking fluency even after two years of intensive and immersive language instruction (Derwing et al., 2007). One potential reason for this is that, in situations where the learner does not speak the language much outside the classroom, there are few opportunities for the development of speaking skills. Furthermore, Rossiter, Derwing, Manimtim, and Thomson (2010) have shown that many foreign language textbooks and teacher resource manuals do not provide opportunities for speaking, and those that do often fail to focus on fluency. They note that many of the speaking activities provided are free-production tasks, with little control over the structures and vocabulary produced. Further, these activities tend not to provide opportunities for rehearsal or repetition, as repetition and rehearsal in the language classroom are considered non-realistic. In a recent paper, DeKeyser (2011) makes a case for the importance of rehearsal, suggesting that teachers can develop tasks that provide repetitive speaking activities while still being realistic. However, while naturalistic activities can be more useful, creating appropriate ones requires effort and considerable ingenuity. We suggest that there may be a role in second language acquisition (SLA) for non-naturalistic activities that provide repeated practice of speaking. In this paper we will provide evidence that highly constrained rehearsal activities can be useful for developing speaking skills, even when they are not entirely naturalistic.

Developing speaking skills through rehearsal

Current second language pedagogy tends to emphasize the importance of learning through naturalistic, communicative speaking tasks sometimes complemented with structured input (e.g., Krashen, 1982; Van Patten and Cadierno, 1993). This format provides little room for repetitive practice.
However, research on skill acquisition (e.g., Carlson, Sullivan, and Schneider, 1989; Anderson, 1993) has long suggested that performing complex tasks requires large amounts of repeated practice to achieve fluent execution. Responding to this issue, DeKeyser (2011) has recently called for a return to a focus on rehearsal in the second language classroom, suggesting that, while speaking tasks should be naturalistic (i.e., more realistic than the traditional audio-lingual method of mere listen-and-repeat tasks), a bit of realism can be sacrificed in favor of ensuring sufficient practice on a wider variety of grammar and vocabulary than can typically be covered in truly naturalistic contexts. Bygate (2001) found that having speakers perform the same speaking activity many times over a ten-week period led to increases in fluency and complexity of that speech, even with weeks in between each rehearsal. A common format for such rehearsals is the 4/3/2 task (Nation, 1989), in which speakers give a speech on a topic in four minutes, then three minutes, then two. De Jong and Perfetti (2011) found that use of the 4/3/2 task led to improvements in fluency not only on the rehearsed speeches, but also on other speaking activities completed weeks after training. Moreover, these effects were only found if the practice involved rehearsal on the same speech: the control condition, which practiced three different speeches, did not show these effects, indicating that it is not simple speaking practice that leads to improvements, but the repeated nature of the practice. The authors of these studies suggest two possible explanations for the increases in performance. First, the first repetition activates the lexical and grammatical nodes needed for performance, and these remain partially activated during the second repetition. This makes selection easier in the short term, which lowers cognitive load and frees more resources for producing speech fluently.
Second, as shown by skill acquisition models (e.g., Anderson’s ACT-R model, 1993), multiple repetitions of the same task can lead to proceduralization of grammatical forms or faster retrieval of lexical items, which then leads to lowered cognitive load in producing utterances, allowing for more efficient planning of future speech. However, given the short time period and low number of practice trials, it seems unlikely that enough proceduralization is occurring to lead to the observed effect. We offer a third explanation, which is simply that the repetition provides “fluency facilitation” by allowing the speaker to focus on fluency rather than other aspects of the task. This is similar to Skehan’s (2009) suggestion that when speakers are given multiple opportunities to give a speech, they will refocus their attention each time. Thus, rather than necessarily leading to proceduralization, repeating a task allows the speaker to refocus their attention on a different dimension; in the case of Bygate’s study, the relevant dimensions are complexity and fluency rather than accuracy. This fluency facilitation effect is consistent with the finding that, while fluency and, in some cases, complexity increased, these studies did not find increases in accuracy. If these increases were due to proceduralization, accuracy should increase as well, since proceduralization often begins with the refinement of procedures to produce correct output. Bygate (2001) posits that increases in accuracy cannot occur in longer monologic or dialogic tasks, even those that allow for repeated practice, because the gap between the repetitions is too long. Moreover, increases in accuracy require self-monitoring and immediate correction, which cannot occur in such long tasks. In order to allow for increases in accuracy (and potentially fluency), speakers must repeat the problem structure immediately, or at least very soon, after the error.
So, while naturalistic speaking activities may fulfill the requirement of being more connected to form-meaning mappings, they still have shortcomings, both in their ability to lead to more accurate production and subsequent proceduralization of correct language and their usefulness as a method of exploring the processes and mechanisms behind the development of speaking skills. To fill in these pedagogical and theoretical gaps, we suggest that even more constrained, but less naturalistic tasks, may be useful. For example, Yoshimura & MacWhinney (2007) found that prompting repeated rehearsal of individual sentences through overt reading led to increases in fluency and accuracy of production of sentences containing new vocabulary items. They found these improvements even in as non-naturalistic a context as a read-aloud task, suggesting that highly controlled practice like this can still be useful as a method of practice in L2 learners who are still struggling with particular vocabulary items and constructions. This is important because L2 learners often avoid language that they are not yet comfortable with, which may delay acquisition of new structures and vocabulary. Furthermore, controlling the exact input and output learners use facilitates careful experimental comparisons. Although overt reading does lead to improvements, it is very different from actual speech production. A different task that does not rely on the use of reading and is more reconstructive may provide a more realistic speaking process and thus produce greater effects than a reading aloud task. We propose that sentence imitation represents a natural and effective way to prompt repeated practice of new or problematic constructions. The Repeated Imitation (RI) task Sentence repetition is often used in self-practice materials, such as the Pimsleur method, as well as informally in the classroom environment. 
However, the effects of those methods have seldom been subjected to detailed experimental evaluation of the type we will present here. It is important to remember that sentence imitation is a perfectly natural process. Young children often spontaneously mimic adult speech, especially words and phrases they are not yet ready to use in their own speech (for one such study on child imitation, see Bloom, Hood, & Lightbown, 1974). Some cultures specifically encourage imitative speech; for example, the Kwara-ae, a Melanesian people in the Solomon Islands, raise children with an activity referred to as calling-out, where adults prompt children to relay information or make requests of others by repeating after them. Subsequently, imitating adult speech becomes a significant part of the child’s life during the language development stage (see Watson-Gegeo and Gegeo, 1973). One fundamental concern with using imitation as a training method is that since learners are merely repeating what they hear, they may not be creating form-meaning mappings during this task that can lead to morphosyntactic development and appropriate speech usage. However, Erlam’s (2006) review of the use of the Elicited Imitation task used in L2 assessment has suggested that the ability to repeat sentences correctly is tied closely to the ability of the learner to comprehend the sentence, and that performance on this task is highly correlated with other more widely-used measurements of L2 speaking ability. We propose that imitation is useful for more than assessment. In this paper we explore the use of imitation in the form of an exercise that we refer to as Repeated Imitation, which consists of listening to a native speaker and immediately repeating what was said, then repeating this process multiple times in a row for each sentence. 
We suggest that the native speaker model and iterative nature of the task will allow the students to practice and monitor their own speech, providing practice without requiring a native speaker or teacher for feedback. The two studies in this paper investigate the use of this task as a method of practice for proficient language learners. We predict the task will show the following effects:

1. Through each repetition of the RI practice, participants will improve in accuracy.
2. Through each repetition of the RI practice, participants will improve in fluency.
3. Participants will be able to produce sentences they have rehearsed more quickly and accurately than those they did not practice.
4. Since RI is reconstructive and not merely echoic, the speech participants are able to produce will closely mirror the way in which they translate the sentence.

Study 1

Study 1 tested using RI as a method of rehearsal elicitation, focusing specifically on whether this task would lead to improvements on producing the target sentences, and whether participants processed the sentences they repeated or were merely echoing phonological input.

Methods

Procedure. After obtaining informed consent, participants were seated in front of a computer and were instructed that during this task they would hear a sentence, after which they would repeat the sentence back as quickly and accurately as possible. They initiated a trial by clicking a button, at which point the word “Listen” appeared on the screen, and the sentence was presented through the computer speakers. After the sentence finished playing, the words “Repeat Now” appeared on the screen and they repeated the sentence they just heard, pressing the space bar to stop recording. After speaking, they translated the sentence into English and rated their performance on the sentence on a scale of 1 to 7, with 1 being the lowest and 7 being the highest.
They repeated this process four times for each sentence, translating the sentence and rating their speech in between each trial. *** Did you really have them translating after each repetition? Why so much? Also there seems to be a confusion between “repetition” and “trial”. Why use both terms? This introduces confusion later *** Participants returned one week after the Training session for a post-test, where they heard the sentences and repeated each of them back one time, to measure the long-term effects of the practice.

Stimuli. The stimuli were 40 sentences taken from the Foreign Service Institute’s Basic Spanish program. The sentences had between four and 19 words, with an average of 8.42 words, and between 9 and 31 syllables, with an average of 15.84 syllables. For this first study, we used a wide range of constructions, consisting of both statements and questions, in the indicative and subjunctive mood, and in present, preterite, imperfect, and future tense. These sentences were spoken by a number of different speakers, both male and female.

Participants. The participants in this study were nine students at Carnegie Mellon University currently enrolled in a third-semester Spanish class.

Coding. The data in this study were transcribed using Praat speech analysis software (Boersma & Weenink, 2007) and were coded for temporal data and grammatical errors. As a measure of fluency, we calculated the mean duration of utterance (MDU) by subtracting the onset of speech from the end of production. We then coded the utterances for completeness and accuracy, first marking whether the speaker completely reproduced the sentence, then coding for a number of different errors. We coded for a series of verb-related errors such as tense, subject-verb agreement, and conjugation errors; lexical errors such as choosing the wrong word or having incorrect number or gender; and pronunciation errors.
We do not distinguish, however, between errors of omission (missing a word) or commission (using the wrong word). For the current analysis, we combined all errors into one measurement of Total Errors per sentence. Finally, we coded the translations provided by the participant after each repetition according to whether they matched the speech. If the speech matched the translation exactly, it was coded as a Match (even if the speech and translation were incorrect); otherwise, it was coded as Missing (containing less information than the speech), Extra (containing more information than the speech), or Meaning (containing the same amount of information but having incorrect information; for example, providing the translation in the wrong tense or using the wrong verb translation). This coding system allows us to judge whether participants had a general representation of the meaning of the speech they produced, or whether they were producing speech without being capable of conveying the meaning in their native language. Results Effect of practice on completeness. First, we investigated whether participants’ performance in repeating the sentences they heard improved with each repetition of that specific sentence. A logistic generalized linear regression on the probability of completely repeating the sentence at each repetition (1-5, with 5 being the delayed post-test) showed a significant main effect of Repetition (p<0.001), with each subsequent repetition having fewer incomplete sentences than the first trial. This effect held even at the delayed posttest (p<0.01). Effect of practice on accuracy. Next, we performed a generalized linear regression on the number of errors produced at each repetition and each trial, to see whether participants improved both on producing each individual sentence and whether they performed the task better over time. 
Again, there was a significant effect of Repetition, with participants making significantly fewer errors by the fourth training repetition (p<0.001); however, there was no significant difference between performance at the first repetition during Training and the test trial a week later. So, while participants were more likely to completely repeat the sentence at the delayed test, they produced more errors. Finally, there was a significant effect of Trial (p<0.0001), with participants making fewer mistakes overall as the practice progressed. Effect of practice on fluency. To measure the effects of the training on fluency, we performed a generalized linear regression of the MDU by repetition and trial number. We found a significant effect of Repetition, with participants producing the sentences more quickly by the third and fourth repetitions (p<0.0001); the post-test production was also significantly more fluent than the first and second repetition during training (p<0.01). We also found a significant main effect of Trial, with participants producing sentences more fluently later in the training than in the beginning (p<0.01). *** I have no idea what “trial number” or the “Trial factor” means and how it differs from “repetition” or “session” or what. Is this just a binary factor of Training vs Posttest? *** Translatability and speech production. We next addressed the question of whether RI is based on reconstruction, rather than repetition. If the task is reconstructive, participants’ ability to translate the sentence into English, which also requires a reconstruction of meaning, should match their ability to repeat the sentence. If participants repeat the sentence correctly but cannot give a correct translation, that indicates that they are able to simply memorize sequences of sounds and produce them, without understanding the sentence. For these analyses, we counted only trials where the participant provided a translation. 
Not providing a translation may be indicative of lack of understanding, but it may also simply be due to difficulties with typing or simply boredom. However, if the participant provided even one word in their translation, it was included in the analysis. As a result, of 1716 total trials, we ended up with 1498 trials where the participant provided a translation. *** I can see how they would get bored if you are asking them to repeatedly give the same translation four times. Throwing out data on that basis seems unnecessary. *** The vast majority of translations matched the speech produced; of the 1468 valid trials, 1082 matched, while only 386 did not. A chi-squared test showed that this difference was significant (χ2 = 329.984, p<0.001). Of the 386 non-matching trials, a majority (246) were meaning-related mismatches, with relatively small numbers of translations having missing or extra information (47 and 93, respectively). We then broke down the results according to whether the participant had correctly produced the sentence or not. Table 1 presents counts for each code, for all translations and separately for correctly produced and incorrectly produced sentences. We then ran chi-squared tests to see whether the differences between groups were significant.

Translation Match?          Incorrect   Correct   Total   χ2       Sig.
Matching                    606         476       1082    15.619   <.001
Different: Meaning          173         73        246     40.650   <.001
Different: Missing Trans.   46          1         47      43.085   <.001
Different: Extra Trans.     75          18        93      34.935   <.001
Total                       900         568       1468

Table 1: Descriptive statistics for translation/speech matches. Chi-squared tests all with 1 degree of freedom.

To further address the issue of the mismatched translations, we provide two specific examples.

(1) Usted quiere decir que los profesores son exigentes, verdad?
    you want-3S say-INF that the professor-PL are demanding-PL, correct?
    ‘You mean that the professors are demanding, right?’

The word exigente, or “demanding”, was unknown to many participants. No participant correctly translated the word, though eight out of the nine participants correctly repeated it. In this case, it is clear that the participants are merely imitating. Here, however, imitation may actually be beneficial to them, as it allows them to practice new vocabulary that they otherwise would be unable to produce spontaneously. The second example (2) is a little more problematic:

(2) Por qué me trajeron esta carta?
    why me bring-3P&PRET this letter?
    ‘Why did they bring me this letter?’

Many participants were able to correctly repeat this sentence at least once. However, while five participants were able to repeat this utterance, only three correctly translated it. The other two, rather than using the third person plural, used the second person singular “you” as the subject of the sentence. This illustrates a shortcoming of this type of practice: participants can accurately repeat Spanish sentences without fully understanding what they convey in terms of tense, mood, and person-number agreement. Regardless of whether the sentence is produced correctly in speech, the translation appears to match very closely. However, the fact that there are instances where the speaker produced more speech than they were able to translate suggests that speakers can rely somewhat on echoic memory to reproduce the sentences. As our two examples showed, this can be a positive feature of the task or a negative one, depending on the type of grammatical structure or vocabulary item being targeted by the training.

Discussion

We first asked whether using repetitive imitation could lead to improvements in students’ ability to repeat back aurally presented sentences.
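The chi-squared values reported in Table 1 of the Results, and for the overall match/mismatch comparison, can be checked with a short calculation. The Python sketch below assumes that each test compares two observed counts against an equal split (a goodness-of-fit test with 1 degree of freedom). That reading is an assumption rather than something stated in the text, although it does reproduce the reported statistics. The function name is hypothetical.

```python
def chi_square_equal_split(observed):
    """Chi-squared statistic for observed counts against an equal split.

    Assumption: the expected count in every cell is total / number of cells.
    """
    total = sum(observed)
    expected = total / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


# Overall comparison: 1082 matching vs. 386 non-matching translations.
print(round(chi_square_equal_split((1082, 386)), 3))  # 329.984

# Rows of Table 1: (incorrectly produced, correctly produced) sentences.
table1_rows = {
    "Matching": (606, 476),
    "Different: Meaning": (173, 73),
    "Different: Missing Trans.": (46, 1),
    "Different: Extra Trans.": (75, 18),
}
for label, counts in table1_rows.items():
    print(label, round(chi_square_equal_split(counts), 3))
# Matching 15.619
# Different: Meaning 40.65
# Different: Missing Trans. 43.085
# Different: Extra Trans. 34.935
```

With two cells, each test has 1 degree of freedom, which agrees with the note under Table 1.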
Looking at student performance across multiple repetitions of sentences, we found that with each repetition, students improved significantly, both in their fluency, as measured by initial pause and total duration, and accuracy, as measured by the number of errors produced per utterance. This suggests that, as in the Yoshimura and MacWhinney task, repeated practice can lead to marked improvements in speech production. Next, we asked whether this task prompted students to process the speech they were hearing, which may lead to reconstructing the sentence during production, rather than merely echoing what they heard. The fact that the participants’ ability to repeat the sentences back appears to match quite closely their ability to translate the sentence into English suggests that comprehension plays a strong role in the completion of this task. Moreover, the target sentences are often too long to be stored just as a series of unlinked words in working memory. These two facts suggest that accurate sentence repetition requires a conceptual understanding of the sentence. The next step is to investigate the implications of using this task as a method of improving speaking skills in the long term. Since participants practiced a wide variety of grammar structures and vocabulary through the training, it is impossible to track improvement on any specific feature over time. Study 2 trained language learners on two different structures, to investigate how performance on specific structures changes over time as a result of repeated imitation. Study 2 In Study 2 we adjusted the RI exercise to include the prompting of meaning through pictures, in order to provide more contextual information and create a stronger mapping between form and meaning. Additionally, using pictures also allows for prompting speech without providing an immediate native speaker model: participants can describe the pictures in the way modeled earlier during training and later produce speech on their own. 
This study contained immediate and delayed post-tests consisting of sentences either seen during the practice or of similar constructions to the trained sentences. Additionally, we manipulated the grain size of training, prompting participants to practice the sentences either in short phrases or as full sentences. In doing so, we will be able to further investigate whether this task can adjust speakers’ accuracy and fluency depending on the type of practice they receive. If speakers show different patterns of accuracy and fluency, despite receiving the same amount of practice on the same sentences, that will show that rehearsal affects the development of speaking skills in ways other than simply providing multiple opportunities for proceduralization; different types of training may differentially affect how participants allocate resources to accuracy and fluency. Methods Procedure. The study consisted of three parts: RI training, immediate post-test, and delayed post-test. The task was identical to the one used in Study 1 with three changes: first, participants practiced each sentence three times instead of four; second, the participants saw pictures (as shown in Figure 1), during sentence presentation; and third, participants were not prompted to produce a translation or give a self-rating of their performance. Figure 1: An example picture used in training and test trials. In this case, the sentence elicited would be “El sugiere que el cocine la cena.” After the RI training phase they moved to the immediate post-test. In this phase, they saw the pictures as in the training phase, but did not hear the sentence; instead, they were immediately prompted to produce a sentence to describe the pictures they saw. Half the sentences in this test had been practiced during the training phase; the other half were novel, but containing the same vocabulary and sentence structure as the trained sentences. 
One week later the participants came back for the delayed post-test, which was identical to the immediate post-test but presented in a different order.

Stimuli. The stimuli in this study consisted of two types of sentences. The first sentence type, illustrated in (3), used two coordinated clauses, each with a verb in either the Preterite or the Imperfect. The first clause included a temporal adverb that signaled the correct tense.

(3) Ayer tú limpiaste los platos y yo cociné la cena.
    Yesterday you clean-PRET the dishes and I cook-PRET the dinner.
    ‘Yesterday you cleaned the dishes and I cooked dinner.’

The second sentence type, illustrated in (4), included a main verb in the Present tense and a complement clause with a verb in the Subjunctive.

(4) Yo aconsejo que tú limpies los platos.
    I suggest-PRES that you wash-SBJV the dishes.
    ‘I suggest that you wash the dishes.’

In this second sentence type, the speaker must attend to cues that trigger use of the Subjunctive mood, and manipulate conjugation in two different verb moods. All of the vocabulary and verb forms trained in this study were familiar to the participants, but the use of the verb forms had not yet been mastered.

Conditions. In this study we contrasted two types of practice: practice in phrases versus practice in full sentences. The sentence types used in this study differ in length and complexity, and so may differ in the type of practice they require and in how strongly they push speakers toward accuracy or fluency. The preterite/imperfect sentences, while they are not particularly complex, are very long, and thus put large demands on the speaker’s memory. *** you should give mean lengths of the sentences in the two different conditions *** Practicing in full sentences, as opposed to individual phrases, increases what Robinson (2005) refers to as resource-dispersing demands, or demands that require emphasis on general performance, precluding a focus on grammaticality and accuracy.
Meanwhile, the subjunctive sentences are very complex while still being short in length, and consequently are high in what Robinson refers to as resource-directing demands, which require a focus on the conceptual and grammatical features of the speech, and consequently increase accuracy. All participants in this study received practice in both conditions. One group, the PretPhrase/SubSent condition, practiced preterite/imperfect sentences as phrases and subjunctive sentences as full sentences. The other group, the PretSent/SubPhrase condition, practiced preterite/imperfect sentences as full sentences and subjunctive sentences as phrases.

Training Conditions       Preterite/Imperfect   Subjunctive
PretPhrase/SubSent (SC)   Phrase                Sentence
PretSent/SubPhrase (LS)   Sentence              Phrase

Table 1: Training tasks used for each condition.

If the task’s effect on performance is related solely to the focus of attention on form and potential proceduralization of that item achieved during practice, we would expect to see a clear fluency/accuracy tradeoff by sentence type, with sentences trained in the phrase condition showing greater accuracy and those trained in the sentence condition showing greater fluency. However, if task performance is related to general factors of speaker attention and biases, training in one condition should affect performance on both types of sentences.

Analyses

Analyses were conducted much in the same manner as in Study 1. However, given that we were comparing two constructions that varied greatly in length, using MDU as the fluency measure would make it harder to compare the two sentence types. Instead, we used Duration Ratio, which was calculated by taking the MDU and dividing it by the time it took the native speaker to produce that sentence. This measurement is meant to be indicative of the native-likeness of production, independent of the length.
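As an illustration of the Duration Ratio just described, the following Python sketch computes an utterance's duration (end of production minus onset of speech, as in Study 1) and divides it by the native speaker's duration for the same sentence. The class and function names, the use of seconds, and the example timings are assumptions made for illustration; they are not taken from the study's analysis scripts.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Timing of one produced sentence, in seconds (hypothetical structure)."""
    speech_onset: float   # time at which the speaker starts talking
    speech_offset: float  # time at which production ends

    @property
    def duration(self) -> float:
        # Duration of utterance: end of production minus onset of speech.
        return self.speech_offset - self.speech_onset


def duration_ratio(learner: Utterance, native_duration: float) -> float:
    """Learner duration divided by the native speaker's duration."""
    if native_duration <= 0:
        raise ValueError("native_duration must be positive")
    return learner.duration / native_duration


# Made-up timings: the learner speaks from 0.5 s to 3.5 s,
# and the native model took 2.0 s for the same sentence.
example = Utterance(speech_onset=0.5, speech_offset=3.5)
print(duration_ratio(example, native_duration=2.0))  # 1.5
```

A ratio of 1.0 would mean the learner took exactly as long as the native model, and larger values indicate slower production, whatever the length of the sentence.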
We also measured the initial pause, or the amount of time the participant took before they started speaking, which also indicates the fluency of speech by measuring the amount of time needed to think before speaking. Accuracy was coded using the same hierarchical coding scheme as in Study 1, again combining all errors into one measure of total errors for these analyses. Results Training. We performed two 2 (Condition) by 3 (Repetition - 1, 2, or 3) univariate ANOVAs – one for errors and one for duration. There was a significant difference in duration ratio (F = 15.95, p<0.001) and errors (F=22.75, p<0.001), particularly between the first and second repetitions. There were no significant interactions. Testing: By condition. Next we performed 2 (Condition- LS and SC) by 2 (Test- immediate and delayed) univariate ANOVAs for duration ratio, initial pause, and errors on items on the immediate and delayed post-tests administered after the training. We found that for duration ratio, there was no significant effect of condition at the immediate post-test, but by the delayed post-test the PretSent/SubPhrase condition had significantly shorter duration ratio. For initial pauses, we found a significant main effect of Condition (F= 17.21, p<0.001), with PretSent/SubPhrase having significantly shorter initial pauses at both immediate and delayed test than the PretPhrase/SubSent condition. For errors we found a significant main effect of Condition (F = 12.34, p<0.001), but with the opposite pattern of results, with PretSent/SubPhrase having significantly more errors than the PretPhrase/SubSent condition. There was a significant interaction between Test and Condition, with the PretSent/SubPhrase condition improving significantly between immediate and delayed test; by the delayed post-test, there was no significant difference between conditions. Test: By sentence type. Next, we examined whether performance at test differed by sentence type. 
We performed separate 2 (Condition: LS and SC) by 2 (Test: immediate and delayed) univariate ANOVAs for duration ratio, initial pauses, and errors for both subjunctive and preterite/imperfect sentences. For preterite/imperfect sentences, the longer, simpler sentences, we found a clear fluency/accuracy tradeoff: there was a significant main effect of Condition for both duration ratio and initial pause (F = 6.009, p = 0.014 and F = 17.314, p < 0.01, respectively), with the PretSent/SubPhrase condition (which practiced the preterite/imperfect sentences as full sentences) having shorter duration ratios and initial pauses. For duration ratio, we also found a significant interaction of Condition and Test (F = 5.931, p = 0.015): there was no significant difference at the immediate post-test, but the PretSent/SubPhrase condition had shorter duration ratios at the delayed post-test. Errors showed the opposite pattern: we found a significant effect of Condition (F = 6.620, p = 0.01), with the PretSent/SubPhrase condition making significantly more errors than the PretPhrase/SubSent condition.

For subjunctive sentences we do not see this tradeoff. For duration ratio, we found a significant interaction between Test and Condition (F = 15.324, p < 0.01), with the PretSent/SubPhrase condition initially performing worse at the immediate post-test but better by the delayed post-test (suggesting that learning to produce long full sentences requires an additional learning period but leads overall to better learning of the construction). For initial pauses, we found a similar interaction (F = 4.741, p = 0.03), with the PretSent/SubPhrase condition having shorter initial pauses at the delayed post-test, although there was no difference between conditions at the immediate post-test.
We found a similar interaction for errors, with the PretSent/SubPhrase condition performing much worse than the PretPhrase/SubSent condition at the immediate post-test (F = 16.692, p < 0.01), but making numerically, though not significantly, fewer errors than the PretPhrase/SubSent condition by the delayed post-test.

Novel vs. trained sentences. Finally, we analyzed whether there was a difference between sentences that students had practiced during the training phase and similar but novel sentences. If performance improved significantly on trained sentences only, this would indicate that improvement is due to proceduralization of individual sentences. A one-way ANOVA comparing trained to novel sentences showed a significant difference for duration ratio (p < 0.001), with trained sentences having shorter duration ratios than untrained sentences, but no effect for initial pauses or errors. Furthermore, when the two sentence types were compared separately, this effect appeared only in the preterite/imperfect sentences. Finally, we found no difference between training conditions; neither condition led to more generalizable performance than the other.

Summary

The goal of this second study was to identify (a) whether repeated imitation affects the production of speech outside the performance of the task itself and (b) whether manipulating the task can affect whether it enhances fluency, accuracy, or both. As in Study 1, we found that, during training, participants improved with each repetition in both accuracy and fluency of imitation. There was no effect of training type during this phase. The type of training did have an effect on performance on later sentence production tasks, however. First, the type of sentence and the size of the trained unit affected the fluency and accuracy of production. When sentences were practiced initially as phrases, rather than as full sentences, accuracy increased during later production.
This indicates that part of the driving force of the fluency-accuracy tradeoff is that lowering task constraints allows for greater focus on accuracy, which may lead to more long-term learning than practicing initially in sentences. Conversely, practicing in sentences leads to more fluent speech overall, potentially because longer sentences require effort simply to produce in full, thus taking attention away from accuracy and instead encouraging fluent production. The key finding is that, in addition to improving performance on specific sentence constructions and vocabulary, providing practice that encourages fluency (i.e., practicing long sentences) improves fluency on sentence performance overall. This was clearest in the PretSent/SubPhrase condition, in which participants practiced preterite/imperfect sentences that are long and impose the highest performance-based demands. This condition produced increased fluency on all sentences, even on sentence constructions that were rehearsed in phrases. This suggests that the degree to which repeated practice leads to proceduralization of grammatical structures can be influenced by specific aspects of the training materials and method.

General Discussion

The current paper examines the extent to which rehearsal in general, and the Repeated Imitation task in particular, is a useful method of developing speaking skills in a second language. In Study 1 we tested the use of Repeated Imitation, an exercise that involves listening to a native speaker and repeating what one hears, to improve proficient Spanish speakers’ fluency and accuracy in repeating sentences. Participants did improve in their production, both in grammatical accuracy and in fluency of repetition. We also confirmed previous findings that imitation tasks like this one are reconstructive, not echoic, in nature, by showing that students’ ability to repeat the sentence closely mirrored their ability to translate the sentence into English.
Study 2 replicated the results of Study 1, showing that participants improved throughout the training. It also showed that this training generalizes to other tasks: the post-tests, administered both immediately and a week after training, used a picture description task that did not involve repeating sentences. We also compared the effects of two types of training: practicing in sentences, which places an emphasis on fluency in order to produce the entire sentence, and practicing in phrases, which allows for a greater emphasis on accuracy. We found that, during training, both conditions led to improvements in fluency and accuracy of production, just as in Study 1. It is possible that this is due to proceduralization of the sentence constructions and vocabulary items used in the trained sentences, but it is also possible that this finding is more in line with Skehan’s (2009) interpretation that repeating a task allows the speaker to shift their focus to a different dimension (fluency, accuracy, or complexity).

Furthermore, the type of training affected the fluency and accuracy of sentence production during testing. A simple fluency-accuracy tradeoff might be more in line with the proceduralization view of the training; we would expect practicing in phrases to lead to greater proceduralization, because it allows the speaker to focus on form-meaning mappings, while practicing in sentences would focus the speaker’s attention on performance constraints (see Robinson’s Cognition Hypothesis) and lead to greater fluency. However, while we did find this tradeoff for the PretPhrase/SubSent condition, we saw an overall benefit for fluency in the PretSent/SubPhrase condition, which involved practicing very long sentences. The participants in this condition showed more accurate production on the complex subjunctive sentences, but also greater fluency, compared to the participants who practiced subjunctive sentences as full sentences.
The major concern with this task is, of course, the fact that it is not naturalistic in this particular social context. While we have shown that the training affects performance on other related speaking tasks that do not involve imitation, more work needs to be done to show how much this task leads to improvements in speech in other contexts. We believe that this type of practice should not be used as a replacement for open-ended speaking activities; rather, this task should be a supplementary activity, used to prompt focused practice on problematic forms. However, this task can be used as a viable method of prompting speaking practice, with numerous benefits that open-ended tasks cannot offer. For teachers, this task allows practice that students can perform on an individual basis, either as homework or as an in-class activity. The native-speaker model may allow students to monitor their own speech, and they can also record their speech to receive teacher feedback later. It also provides practice on constructions and vocabulary that students may opt not to use in more open-ended tasks.

In addition to its pedagogic implications, the structured nature of this task makes it useful as a way of investigating the development of speaking skills. The results of Study 2 suggest that students perform this task in much the same way as open-ended tasks, and that the task can be manipulated to change performance, allowing researchers to investigate the development of speech during practice.

Acknowledgements

This work was supported in part by a Graduate Training Grant awarded to Carnegie Mellon University by the Department of Education [grant number R305B040063], and the Pittsburgh Science of Learning Center, which is funded by the National Science Foundation [grant number SBE0354420].
We also thank John Kowalski for his programming assistance, Maria Liliana Mariño for her help in creating and recording stimuli, and David Plaut for reading and editing previous drafts of this paper.

References

Anderson, J. (1993). The Architecture of Cognition. Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.
Bloom, L., Hood, L., & Lightbown, P. (1974). Imitation in language development: If, when, and why. Cognitive Psychology, 6, 380–420.
Boersma, P., & Weenink, D. (2007). PRAAT. Retrieved from http://www.praat.org
Bygate, M., Skehan, P., & Swain, M. (2001). Effects of task repetition on the structure and control of oral language. In Researching pedagogic tasks: Second language learning, teaching and testing (pp. 23–48). Pearson Longman.
de Jong, N., & Perfetti, C. (2011). Fluency training in the ESL classroom: An experimental study of fluency development and proceduralization. Language Learning, 61(2), 533–568.
DeKeyser, R. (2011). Practice for second language learning: Don't throw out the baby with the bathwater. International Journal of English Studies, 10(1), 155–165.
Derwing, T., Munro, M., & Thomson, R. (2007). A longitudinal study of ESL learners' fluency and comprehensibility development. Applied Linguistics, 29(3), 359–380.
Erlam, R. (2006). Elicited imitation as a measure of L2 implicit knowledge: An empirical validation study. Applied Linguistics, 27(3), 464–491.
Krashen, S. D. (1985). The input hypothesis. London: Longman.
Nation, P. (1989). Improving speaking fluency. System, 17(3), 377–384.
Robinson, P. (2005). Cognitive complexity and task sequencing: Studies in a componential framework for second language task design. International Review of Applied Linguistics in Language Teaching, 43(1), 1–32.
Robinson, P., Cadierno, T., & Shirai, Y. (2009). Time and motion: Measuring the effects of the conceptual demands of tasks on second language speech production. Applied Linguistics, 30(4), 533–554.
Rossiter, M. J., Derwing, T. M., Manimtim, L. G., & Thomson, R. I. (2010). Oral fluency: The neglected component in the communicative language classroom. The Canadian Modern Language Review, 66(4), 583–606.
Skehan, P. (2009). Modelling second language performance: Integrating complexity, accuracy, fluency, and lexis. Applied Linguistics, 30(4), 510–532.
VanPatten, B., & Cadierno, T. (1993). Input processing and second language acquisition: A role for instruction. The Modern Language Journal, 77(1), 45–57.
Watson-Gegeo, K. A., & Gegeo, D. W. (1986). Calling-out and repeating routines in Kwara-ae children's language socialization. In B. B. Schieffelin & E. Ochs (Eds.), Language Socialization Across Cultures. Cambridge University Press.
Yoshimura, Y., & MacWhinney, B. (2007). The effect of oral repetition on L2 speech fluency: An experimental tool and language tutor. Presented at the Speech and Language Technology in Education, The Summit Inn, Farmington, PA.