Copyright 1996 by the American Psychological Association, Inc. 0278-7393/96/53.00 Journal of Experimental Psychology: Learning, Memory, and Cognition 1996, Vol. 22, No. 1,216-230 Selection of Procedures in Mental Addition: Reassessing the Problem Size Effect in Adults Gregory S. Sadesky and Jeffrey Bisanz Jo-Anne LeFevre University of Alberta Carleton University Adults' solution times to simple addition problems typically increase with the sum of the problems (the problem size effect). Models of the solution process are based on the assumption that adults always directly retrieve answers to problems from an associative network. Accordingly, attempts to explain the problem size effect have focused either on structural explanations that relate latencies to numerical indices (e.g., the area of a tabular representation) or on explanations that are based on frequency of presentation or amount of practice. In this study, the authors have shown that the problem size effect in simple addition is mainly due to participants' selection of nonretrieval procedures on larger problems (i.e., problems with sums greater than 10). The implications of these results for extant models of addition performance are discussed. Twenty years of research on mental arithmetic has shown that problems involving larger numbers (e.g., 9 + 6) are solved more slowly than problems involving smaller numbers (e.g., 3 + 4). Surprisingly, in spite of the wealth of empirical data and the extensive theoretical development on mental arithmetic, the problem size effect has eluded satisfactory explanation (Ashcraft, 1992; McCloskey, Harley, & Sokol, 1991; Widaman & Little, 1992). The goal of the present research was to test an explanation of the problem size effect in adults that has been used to account for the arithmetic performance of children (Ashcraft, 1992; Siegler, 1987). We hypothesized that variability in the selection of procedures to solve simple addition problems has a major impact on solution latencies and may account for a substantial portion of the problem size effect. Retrieval in Mental Arithmetic Most current models of arithmetic solution share the assumption that adults solve simple arithmetic problems by retrieving an answer from a network of stored associations (Ashcraft, 1992; Campbell & Oliphant, 1992; Widaman, Geary, Cormier, & Little, 1989; Widaman & Little, 1992; reviewed by McCloskey Jo-Anne LeFevre, Department of Psychology, Carleton University, Ottawa, Ontario, Canada; Gregory S. Sadesky and Jeffrey Bisanz, Department of Psychology, University of Alberta, Edmonton, Alberta, Canada. The research reported in this article was supported by grants from the Natural Sciences and Engineering Research Council of Canada. Jo-Anne LeFevre also gratefully acknowledges the support provided by the Medical Research Council Applied Psychology Unit in Cambridge, England during the preparation of this article. Mark Ashcraft, David Geary, Chris Herdman, Michael McCloskey, and Monique S6n6chal provided helpful comments on a version of this article. We are particularly grateful to Alison Kulak for providing insights that led to the initiation of this research. Correspondence concerning this article should be addressed to Jo-Anne LeFevre, Department of Psychology, Carleton University, Ottawa, Ontario, Canada K1S 5B6. Electronic mail may be sent via Internet to jlefevre@ccs.carleton.ca. 216 et al., 1991). In accord with this view, the problem size effect has been attributed to various aspects of the representation of stored arithmetic knowledge. For example, Widaman and his colleagues (Widaman et al., 1989; Widaman & Little, 1992; Widaman, Little, Geary, & Cormier, 1992) have proposed a structural account in which a fact (e.g., 3 + 5) activates the area of a stored table denned by the product of the addends (e.g., 15). This model is based on the finding that, in regression analyses of solution latencies, the product is a good predictor of latencies to addition problems (Miller, Perlmutter, & Keating, 1984; Widaman et al., 1989). Similarly, Campbell and Oliphant proposed that larger numbers are represented less distinctively than smaller numbers so that retrieving the answer for a large problem is more difficult than for a small problem. These structural accounts of the problem size effect can be contrasted with associative accounts in which the form of the representation is primarily based on the acquisition history of arithmetic knowledge (Ashcraft, 1987, 1992; Campbell & Graham, 1985). For example, Ashcraft (1987, 1992) proposed that the strength of the connection between a problem and its sum is stronger for small problems than for large problems because small problems are presented and practiced more frequently (Hamann & Ashcraft, 1986). Campbell and Graham suggested that, for multiplication facts, both frequency of exposure and order of acquisition affect the associative strength between problems and answers. In these models, retrieval time is assumed to be proportional to the strength of the connection, and thus small problems are retrieved more quickly than large problems. An associative model of arithmetic knowledge has also been proposed by Siegler and his colleagues (Siegler & Jenkins, 1989; Siegler & Shipley, 1995; Siegler & Shrager, 1984) to account for the performance of children on simple addition problems. In contrast to the models described above, Siegler's model includes an explicit and important role for nonretrieval procedures such as counting: Siegler and others have convincingly demonstrated that retrieval is only one of a number of possible procedures used by young children to solve arithmetic problems (Geary & Brown, 1991; Geary & Burlingham- PROBLEM SIZE AND PROCEDURE SELECTION Dubree, 1989; Siegler, 1987, 1988b, 1989; Siegler & Jenkins, 1989; Siegler & Shrager, 1984). According to Siegler's distribution-of-associations model (Siegler, 1988b; Siegler & Jenkins, 1989; Siegler & Shrager, 1984), problems are associated with both correct and incorrect answers. As children practice simple arithmetic, their representations become more "peaked" in that the association between a problem and the correct answer becomes stronger than the associations between that problem and incorrect answers. In this model, use of nonretrieval procedures (e.g., counting) is one factor that contributes to the problem size effect. Essentially, nonretrieval procedures operate less accurately on large than on small problems, and thus the distributions of associations for large problems are less peaked than those for small problems. Siegler's model has not explicitly been applied to adult performance: Researchers have assumed that adults retrieve virtually 100% of the time. The assumption that adults always retrieve on simple arithmetic problems has rarely been questioned, even by Siegler, who stated that "adults do not typically need to use any approach other than retrieval . . . to solve simple arithmetic problems" (Siegler, 1989, p. 505). This assumption was consistent with interpretations of latency data for children and adults (e.g., Groen & Parkman, 1972). For children younger than 10 years of age, response times for simple addition problems (e.g., 3 + 4) increase linearly with the size of the augend or the addend, whichever is smaller (the min). These data suggested that children count on from the larger addend to obtain a sum (the min model; Ashcraft & Fierman, 1982; Groen & Parkman, 1972; Hamann & Ashcraft, 1985). Adults also show a significant increase in response latencies as a function of problem size, but this increase is much smaller (e.g., 20 ms per unit of problem size) than the effect found in children (e.g., 400 ms per unit of problem size). Moreover, response times for young children form a linear function of problem size, as indexed by min; whereas response times for adults form curvilinear functions that are fitted best by the square of the sum or by the products of the augends and addends. These differences in the problem size effect between adults and young children are generally interpreted as evidence that, unlike children, adults use fast and efficient retrieval from memory to solve simple addition problems. Other data support the view that adults rely on retrieval to solve simple arithmetic problems. For example, adults and older children show evidence of associative facilitation and interference that is consistent with the assumption that they are retrieving solutions from a network of stored facts (Campbell, 1987b, 1991). For example, cross-operation errors are common among both adults and practiced children (e.g., 3 x 4 = 7; Miller & Paredes, 1990; Miller et al., 1984), suggesting that the two operations are stored in closely linked networks (Geary, Widaman, & Little, 1986). There is also evidence that arithmetic facts are activated without intention in nonarithmetic tasks (Ashcraft & Stazyk, 1981; LeFevre, Bisanz, & Mrkonjic, 1988; LeFevre & Kulak, 1994; LeFevre, Kulak, & Bisanz, 1991) and that multiplication facts are subject to facilitation through priming (Campbell, 1987b, 1991; Koshmider & Ashcraft, 1991). Thesefindingssupport the view 217 that retrieval is the dominant procedure for adults and older children. Retrieval-based accounts of the problem size effect, however, do not account for all of the available data. For example, structural models of arithmetic knowledge, such as the model proposed by Widaman and colleagues (Widaman et al., 1989; Widaman & Little, 1992), do not account for the effects of ties (e.g., 3 + 3; McCloskey et al., 1991). Ties are consistently solved more quickly than nontie problems and show substantially smaller effects of problem size (Miller et al., 1984). To account for the effect of ties, Campbell and Oliphant (1992) proposed that information about ties is represented separately from information about nonties. In contrast, associative models can account for the effect of ties without postulating separate representations: It is assumed that ties are practiced more frequently than nonties (Siegler & Shrager, 1984; cf. Baroody, 1994). Associative models of problem size have also been criticized, however. The problem size effect in these models has been related to the amount of difficulty that children have learning individual problems (Wheeler, 1939), the frequency with which problems are presented in textbooks (Hamann & Ashcraft, 1986), and the order of acquisition of specific facts (Graham & Campbell, 1992). There is no clear empirical support, however, for the assumption that differences in acquisition experience could be preserved in the years between initial learning and adulthood (McCloskey et al., 1991; Widaman & Little, 1992). Further, in a recent compilation of the frequency of addition facts across Grades 1-6, cumulative frequency of occurrence was not strongly related to structural indices of problem size such as product (Ashcraft & Christy, 1995). Thus, neither structural nor associative models provide satisfactory accounts of the problem size effect (Ashcraft, 1992; McCloskey et al., 1991). An Alternative Approach We propose that adults, like children, use both retrieval and nonretrieval procedures to solve simple addition problems (see also Baroody, 1994). Siegler has consistently found that children as young as 4 years of age use a variety of procedures and that this variability obtains at least through Grade 2 (Siegler, 1987) with addition problems and Grade 4 with subtraction problems (Siegler, 1989). Siegler (1987,1989) used the variability in children's selection of procedures to show that averaging response latencies across procedures can result in misleading conclusions about how children solve problems. For example, averaging across trials in which different procedures were used leads to the conclusion that young children use the min procedure (count up from larger addend) on most simple addition trials (e.g., Ashcraft & Fierman, 1982). However, Siegler (1987) found that kindergarten, Grade 1, and Grade 2 children reported only using the min procedure on 36% of the trials. On 35% of the trials they used retrieval, and other procedures accounted for the remaining 29% of the trials. In spite of this variability in procedure selection across trials, the min model accounted for 74% of the variance on averaged trials in Siegler's (1987) study. Because latencies were longer and more variable on min trials than on other 218 L E F E V R E , SADESKY, AND BISANZ trials, averaging across trials resulted in min being the best predictor (see Siegler, 1987, for details). Thesefindingsmake it clear that regression analyses in which latencies are averaged across trials that involve different procedures can produce misleading conclusions about performance. The problems inherent in averaging across trials in which different solution procedures are used might also apply to adult performance if adults, like children, use a variety of solution procedures. If adults do not retrieve on all trials, and if the problem size effect reflects variability in selection of procedures, then models with a strictly representational basis for the problem size effect may need to be modified. Thus, the assumption that adults use only retrieval to solve simple addition deserves thorough evaluation. Interestingly, Groen and Parkman (1972) did suggest that the problem size effect they observed in adults could reflect the combination of a few counting trials with a majority of retrieval trials. Ashcraft and Stazyk (1981) tested Groen and Parkman's claim that adults count on 5% of the trials by removing the slowest 5% and then recalculating the problem size effect. They predicted that the problem size effect would disappear if counting was occurring on that 5%. They found, however, that the problem size effect was still significant. Ashcraft and Stazyk had to remove the slowest 50% of the trials before the latencies were no longer predicted by problem size. Ashcraft and Stazyk concluded that it was unlikely that adults would count on such a high proportion of trials, but they did not consider the possibility that adults might use other nonretrieval procedures to solve simple addition problems. A few researchers have used the more direct approach of asking participants to report their solution procedures in combination with chronometric data to examine simple arithmetic performance. The results of these studies challenged the strong assumption that adults always use retrieval on simple addition problems. For example, Svenson (1985) asked participants to indicate how sure they were that they used retrieval on simple addition problems. Svenson found that participants were sure they had used retrieval on only 78% of trials. Svenson did not ask participants to describe their procedures, however, and the performance of individual participants was not analyzed. Geary and his colleagues (Geary, Frensch, & Wiley, 1993; Geary & Wiley, 1991) have also found that some adults do not always use retrieval to solve simple arithmetic problems. Geary and Wiley found that university students (i.e., younger than 30 years of age) reported using retrieval procedures on 88% of addition problems; older adults (older than 60 years) rarely used procedures other than retrieval. Similarly, Geary et al. found that younger adults used retrieval on only 71% of subtraction trials. Thus, adults may not always use retrieval to solve simple arithmetic problems. Self-reports of procedure use have been criticized (see Russo, Johnson, & Stephens, 1989) on the basis that such reports may be inaccurate and may change the pattern of data. Although Siegler has demonstrated the usefulness of selfreports with children (see Siegler, 1987, 1989; cf. Cooney & Ladd, 1992), it is possible that adults' performance is less amenable to accurate introspection because adults' arithmetic performance is much faster than that of children. Self-reports have successfully been used in a similar domain, however, by Logan and his colleagues (Compton & Logan, 1991; Logan & Klapp, 1991). Compton and Logan used self-reports of procedure use to clarify patterns of latency data in an alphabet arithmetic task. In alphabet arithmetic, participants are given problems such as A + 3 = D and are required to verify them either by counting along the alphabet string or by memory retrieval (once they have learned specific combinations). Self-reports of procedure use (i.e., counting vs. memory retrieval) conformed well to patterns of latency data, even after substantial practice (Logan & Klapp, 1991). The results supported the validity of using self-report data to study memory processes (Compton & Logan, 1991). In the present research, we examined adults' use of retrieval and nonretrieval procedures on simple addition problems by collecting trial-by-trial reports of procedure use (Siegler, 1987, 1988b, 1989; Siegler & Shrager, 1984). In contrast to previous research in which similar methods were used with adults (Geary & Wiley, 1991; Svenson, 1985), we addressed the issue of how selection of procedures varies across participants as well as across problems. Furthermore, few researchers have addressed the issues of validity and reactivity in adults' solution descriptions (cf. Cooney & Ladd, 1992; Russo et al., 1989). We compared our results to those of previous researchers to illuminate these issues. We hypothesized that adults use a variety of procedures to solve simple addition problems. University students were presented with simple addition problems on a computer screen and were asked to generate the answers. Response times were recorded, and the experimenter prompted participants to describe how they solved each problem. We expected to find both individual differences and problem differences in the frequency of use of various procedures. Participants also completed a paper-and-pencil arithmetic test that was designed to provide an independent measure of fluency in solving simple arithmetic problems. Method Participants Sixteen undergraduate students (8 male and 8 female) participated in this experiment to partially fulfill a course requirement. The students' median age was 19:10 (years:months), with a range of 18:3 to 37:11. Materials Simple addition problems. The problem set was composed of the 100 combinations of addends 0-9 and augends 0-9. Each participant saw one of four pseudorandom orders of these 100 problems. Problems were ordered with the constraints that no addend, augend, or sum were repeated on consecutive trials, and no problem and its inverse (e.g., 2 + 3 and 3 + 2) appeared in the same half of the problem set. Arithmetic fluency test. After finishing the computer trials, participants completed the Addition test and the Multiplication test of the Subtraction-Multiplication Test from the French kit (French, Ekstrom, & Price, 1963). The Addition test consists of two pages of three-term addition problems (total 120). The Multiplication test consists of two pages of one- by two-digit multiplication problems 219 PROBLEM SIZE AND PROCEDURE SELECTION (total 60). Arithmetic fluency was defined as the total number of problems solved correctly on both tests. This measure of arithmetic fluency reflects an individual's ability to quickly and accurately execute addition and multiplication procedures (including retrieval) on multidigit problems. two different procedures on the same trial, memory aids (e.g., a poem), and visualization. Examples of these procedures are given in Appendix A. Procedure Accuracy Each participant was individually tested in a single session lasting not more than SO min. Before the session began, the participant was told that the purpose of the study was to determine what sorts of procedures adults use when solving simple arithmetic problems. An example, as well as some possibilities, was offered: Participants rarely made errors. Of the 1,600 trials, 5 responses were invalid because of equipment or experimenter errors, and 17 responses were incorrect. Thus, errors were too infrequent to be analyzed. This 1% error rate is similar to those reported in other production experiments, including the 2% error rate reported by Miller et al. (1984) and the rate of 3% reported by Geary and Wiley (1991) for their young adults. What do people do when asked to add 9 + 6? You could just remember the answer, 15. It just sort of pops into your head. You could figure the answer out by counting. You think 9, and then 10, 11,12,13,14,15. You could figure it out using a special trick. You could remember that 10 + 6 = 16, so 9 + 6 has to be just one less. Or you could solve it in some other way. Participants were told that each trial was composed of two parts. First, they were to say, as quickly and accurately as possible, the answer to an addition problem that appeared on a video monitor. Second, they were to describe how they had solved the problem (i.e., whether they had remembered, counted, or used a special trick or other procedure). Participants were encouraged to report as much as possible about how they solved the problem, and the experimenter noted their responses. Ten practice problems were presented, participants were again reminded to answer as quickly and accurately as possible; the 100 experimental problems were then presented. The practice problems for each list were randomly selected from the problems in the second half of that list. A short break was allowed after 50 problems. Participants initiated each trial by saying "go" after afixationpoint (an asterisk) appeared on the monitor. Following a 100-ms tone and a 700-ms blank interval, the problem appeared and timing was initiated. When presented at a distance of 0.6 m, the problem subtended a visual angle of 0.33° vertically and 1.15° horizontally. The participant's vocal responses activated a voice key, and each distinct latency was counted and recorded. In the few cases in which the participant made more than one response, the experimenter keyed in the count for the intended response and the corresponding latency was recorded. Following the response, the cue "Remember or Strategy?" appeared on the screen to prompt participants to report the procedure that they had used. After the computer trials, the paper-and-pencil arithmetic tests were administered with the standard instructions. Participants were given 2 min per page for each of the tests. Total time to complete these tests was about 10 min. Classification of Self-Reports Participants' self-reports were used to classify procedures. Five categories were used: retrieval, counting, transformation, zero rule, and other. Retrieval was inferred when participants reported remembering or knowing the answer without any additional processing. When participants mentioned any sort of mental incrementation, the selfreport was classified as counting. A procedure in which an individual reportedly changed the presented problem to take advantage of familiar arithmetic facts was classified as a transformation (referred to elsewhere as decomposition [e.g., Siegler, 1987] or as a derived-facts procedure [e.g., Putnam, deBettencourt, & Leinhardt, 1990]). A zero-rule classification was given only if the addend or augend was zero and the participant reported that any number added to zero is that same number. Other procedures included guessing, the reporting of Results Selection of Procedures As shown in Table 1, the four substantive categories accounted for over 98% of participants' reported procedures. For each participant we computed the frequency, median latency, and standard deviation of latencies for each procedure. Means are presented in Table 1 along with information about the prevalence of use among individuals. Retrieval was the fastest and most common procedure, but frequency of use was highly variable across subjects and, overall, well below 100%. In fact, only 2 participants reported using retrieval on all trials. Transformation and counting procedures were somewhat slower than retrieval but were used by most participants and accounted for 25% of all self-reports. Interestingly, the reported 9% use of counting was close to the 5% use of counting estimated by Groen and Parkman (1972). Transformations included both decomposition, by using a known fact (Siegler, 1987), and conversion of addition to multiplication. Examples of the most common transformations are shown in Appendix A. The majority of the transformations reported by our participants (65%) involved the use of facts that summed to 10. For example, 4 + 7 was transformed to 7 + 3 + 1, and 8 + 5 was transformed to 8 + 2 + 3. On problems that included a 9, an alternative procedure involved changing the 9 to 10, adding the remaining addend to 10, then subtracting 1. For example, 9 + 7 was transformed to 10 + 7 — 1. On 19% of transformation trials a tie was used as the known fact, usually when the addend and augend differed by one (e.g., 6 + 7 became 6 + 6 + 1). On 13% of the trials a multiplication Table 1 Prevalence and Latencies of Procedures Used on Addition Problems Procedure Trials (%) No. of people3 Frequency ofuse b (%) Retrieval Transformation Counting Zero rule Other 71.2 16.5 8.6 2.2 1.5 16 33-100 1-37 2-26 1-17 1-8 13 9 4 Latency SD M 749 120 1,094 301 985 378 854 134 1,386 990 "Number of participants who used the procedure at least once. ••Represents a range across those individuals who used the procedure at least once. 220 L E F E V R E , SADESKY, AND BISANZ fact was used (e.g., 6 + 5 became 2 x 5 + 1). Only 4% of trials involved some other fact. Thus, 84% of transformation procedures involved the retrieval of a well-known fact (i.e., ties or sums to 10) in combination with an adjustment that reflected the initial transformation. Although transformation procedures involved direct retrieval of a stored association, this retrieval was only a subcomponent of processing. The frequency with which transformations were used is inconsistent with the assumption that direct retrieval alone is used to solve simple addition problems (Ashcraft, 1987; Campbell & Oliphant, 1992; Widaman & Little, 1992). In the future, therefore, an important distinction needs to be made between direct retrieval of an association from memory and the more complex procedures that include retrieval, or a series of retrievals, as part of processing. The zero rule was reported on 12% of the trials in which one digit was zero. The results also provide evidence pertaining to the issue of the relative latencies of rules versus retrieval (i.e., Ashcraft, 1982; Baroody, 1983): In this case, rule use was reasonably fast and efficient. Other procedures were reported very infrequently. Standard deviations of latencies were relatively low for the retrieval and zero-rule procedures, which are presumed to be fairly uniform in execution. Latencies were more variable for counting, a procedure that, depending on the problem, could involve more or fewer increments. Similarly, the categories labeled transformation and other included procedures that varied considerably in computational complexity (see Appendix A), and, as would be expected, the corresponding standard deviations were high. Selection of procedures was compared across subjects. Siegler (1987) found that 99% of young children used two or more procedures on addition problems. In this study, 81% of the participants used two or more of the counting, retrieval, and transformation procedures, and 63% used three or more procedures. This result extends Siegler's conclusions about the variability of procedure selection to adults. Over half (56%) of the participants used counting at least once, and 81% used a transformation procedure at least once. This variety in procedure selection supports the view that the importance of retrieval has been overemphasized in models of adult performance. Furthermore, there are significant individual differences in the frequency of procedure use (see Appendix B for frequency of use across individuals). This variability across subjects has important implications for models of arithmetic: The modal model (i.e., that adults always retrieve) applied to only a few of our participants. Relative latencies of the various procedures were examined for individual participants. In comparing latencies for retrieval, transformations, and counting, retrieval was the fastest of the three procedures for 15 of the 16 participants. For 3 of the 4 participants who used the zero rule, rule use was faster than retrieval. Finally, for 6 of the 8 participants who used both procedures, counting was faster than transformations. Thus, the pattern of relative speed across procedures for individuals was similar to that in the grouped data (see Table 1). On the basis of the data in Table 1 and Appendix B, we can conclude that adults use a variety of procedures to solve single-digit addition problems. This finding contrasts sharply with the common assumption that adults always, or nearly always, directly retrieve addition facts from memory. Transformation procedures involve retrieval of an intermediate sum but not direct retrieval of an answer to the presented problem. It is also clear from our data that use of procedures varies substantially among individuals: Some participants used nonretrieval procedures on a majority of problems. Analysis of the Problem Size Effect Problem size and averaged solution latencies. One concern with self-reports is that they may change the pattern of performance (i.e., they may show reactivity; Russo et al., 1989). To address this issue, we compared our results to those of other investigators by using the same analyses as have frequently been reported in the literature. Note that none of these other studies included self-reports of procedure use. Six structural variables were used to predict solution latencies on all problems. Regression analyses were calculated by using minimum addend (min), correct sum (sum), sum squared, product (prod), the Wheeler (1939) difficulty variable, and a variable (Greater 10) that coded whether the sum was greater than 10 (cf. Siegler, 1987). The Wheeler difficulty index reflects children's errors in learning simple addition facts and has been used in research with adults as a measure of associative strength (e.g., Ashcraft, 1987). As shown in Table 2, for the complete data set (i.e., including ties and zero-addend problems), each of these structural predictors accounted for a significant percentage of the variance in latencies. The best predictor was Greater 10, followed closely by product (Widaman et al., 1989) and the Wheeler difficulty measure (Ashcraft, 1987). Next, ties and zero-addend problems were removed from the data set, and the regressions were recalculated on all remaining problems. Again, all of the structural variables accounted for a significant percentage of the variance. The two best predictors were product and sum squared (Ashcraft & Stazyk, 1981), but sum and Greater 10 accounted for almost as much variance as the two "best" predictors. Thus, consistent with other investigations of mental arithmetic, latencies increased with problem size as indexed by these structural variables. Table 2 Structural Predictors of Latency Subsets of data All data Predictor r 2 Minimum Sum Sum2 Product Wheeler Greater 10 .46 .49 .53 .54 .55 .57 a Reduced set b Slope r 2 Slope 39.1 23.2 1.3 5.0 3.5 213.6 .49 .55 .59 .58 .48 .56 49.4 30.1 1.5 5.9 4.1 210.2 Zeros 0 2 r Slope .08 .14 -3.97 -0.05 .00 -0.21 Ties" r2 Slope .47 .47 .53 .53 .33 .66 10.7 10.7 0.3 1.2 1.1 73.8 Note. All squared correlations (r2) are significant (ps < .05), except those for the Zeros subset. a Number of problems in the regressions was 100. 'The number of problems in the regressions was 79. T h e number of problems in the regressions was 19. dThe number of problems in the regressions was 10. 221 PROBLEM SIZE AND PROCEDURE SELECTION Comparison of the percentage of variance accounted for by the various structural variables in this experiment with the values reported by Ashcraft and Stazyk (1981)1 and by Miller et al. (1984)2 indicated that there is substantial agreement across studies. For example, min accounted for 60% and 50% of the variance, respectively, in those studies, as compared with 47% in the present study. Similarly, sum squared accounted for 60%, 57%, and 53% of the variance in latencies in Ashcraft and Stazyk, Miller et al., and the present study, respectively. The best predictor identified by Miller et al. was product, which accounted for 57% of the variance in latencies, compared with 54% in the present study. Slope values reported in the literature are also similar to those in the present study. For example, Miller et al. reported a slope of 3.02 for latencies on addition problems when product was used as the predictor. Ashcraft and Stazyk reported a slope of 2.2 for addition latencies with sum squared as the predictor (see also Ashcraft, 1987). Thus, the results of the regression analyses indicated that solution latencies in this experiment showed very similar problem size effects as have been reported elsewhere in the literature. Self-reports apparently did not change the pattern of results. We analyzed zero-addend and tie problems separately because of suggestions that these problems are solved differently than other problems (e.g., Ashcraft & Stazyk, 1981; Miller et al., 1984). On zero-addend problems, none of the structural predictors accounted for a significant percentage of the variance in solution latencies. Participants generally either retrieved the answer or used the zero rule (i.e., n + 0 = n). On ties, however, the structural variables predicted a significant percentage of the variance. The slope of the problem size effect on nontie problems, however, was approximately five times larger than that on tie problems (for the product variable). Thus, although ties did show a significant problem size effect, it was small compared with that found on other (nontie) problems (see Miller et al., 1984 for a similar result). In summary, when averaged across procedures, our data are comparable to those reported by a number of other investigators (e.g., Ashcraft & Stazyk, 1981; Miller et al., 1984; Siegler, 1987; Widaman et al., 1989). We found that latencies generally increased as a function of problem size, that the nonlinear predictors, sum squared and product, accounted for more variance than min or sum (a result typical of adult performance), and that the problem size effects for zero-addend and tie problems were different than for nontie problems. Problem size and selection of procedures. According to extant models, problem size effects in adults are due either to the structure of the mental network (e.g., tabular models; Widaman & Little, 1992) or to a function of network strength as determined by frequency of presentation, amount of practice, and interference from competing associations (Ashcraft, 1987, 1992; Campbell, 1987a; Campbell & Graham, 1985; Graham & Campbell, 1992; Siegler & Shrager, 1984). Alternatively, the problem size effect may be a result of differential selection of procedures across problems. The proportion of use of each of the three most common procedures (retrieval, counting, and transformations) is shown in Figure 1 as a 100 <D p. 70 R 60 T C 50 Retrieval Transformations Counting 40 T-T 30 20 10 10 - OS BH* / ^C^C--'-' T--T-.T^- T ~-T- 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 SUM Figure 1. The percentage of problems solved by each of the retrieval (R), transformation (T), and counting (C) procedures as a function of the sum of the problems. function of the sum.3 It is clear that procedure use varied systematically with problem size. First, retrieval was the most frequently used procedure on problems with sums less than 10. Second, the use of transformation procedures increased dramatically on problems with sums greater than 10. Third, although counting procedures were used across problem size, frequency of use was higher on problems with mins of one, two, or three (16%) than on problems with mins larger than three (7%, excluding ties; see Appendix C). Thus, procedure selection was associated with particular problem characteristics. Further examination of the data in Figure 1 suggests that the nonlinear relation between problem size and problem latency, and hence the enhanced predictiveness of sum squared or product relative to the other structural variables, is closely linked to the pattern of retrieval use. The best predictor of latencies on the complete data set was a variable that simply represented whether the sum was greater than 10 (as shown in Table 2). Because latencies on transformation trials were slower than on retrieval trials (Table 1), it is likely that the problem size effect is strongly influenced by latencies on nonretrieval trials. To explore this hypothesis, latencies for retrieval, transformation, and counting trials were separately analyzed, as described below. When latencies were separately analyzed for retrieval, counting, and transformation trials, the best structural predictors on retrieval trials accounted for substantially less variance than on all trials (see Table 3). Similarly, on transformation trials none of the structural variables were particularly good 1 The data reported by Ashcraft and Stazyk (1981; Experiment 1) were based on the performance of 20 participants in a verification task. The reported squared correlation values include the 90 nontie problems. 2 The data reported by Miller et al. (1984) were based on the performance of 6 participants in a production task. The reported squared correlation values include the 90 nontie problems. 3 Sum was chosen as the variable in the graphs because it has a reasonably straightforward relation to the individual problems (Geary & Wiley, 1991) and, thus, the graphs are easily interpreted. 222 L E F E V R E , SADESKY, AND BISANZ Table 3 Regression Analyses of Latencies for the Different Procedures All trials Data set/ predictor Retrieval trials r2 Slope t2 .49** .47** .54** .57** 23.2 39.1 5.0 213.6 .18 #H * .22 *i * .22 • i * .12 *x .55** * .49** * .58** * .56** 30.1 49.4 5.7 210.2 .08 ** .07 *i .09 .03 Transformation trials Counting trials Slope r2 Slope r2 Slope 7.7 14.6 1.7 52.2 .61 * .72 ** .70 ** * .66 ** * 131.8 263.8 28.7 997.6 .14*** .07 .II*11 .24*' 34.4 41.5 5.2 302.2 602.0 10.0 1.2 26.9 .60 • • * .71 * * .69 * * .65 * * 135.6 264.7 28.7 986.5 .16*' .15*' * .15*' * .23*' * 42.8 67.8 6.7 310.2 11 Complete Sum Min Product Greater 10 Reduced6 Sum Min Product Greater 10 Note. Min = minimum; r2 = squared correlation. a n = 100. bn = 72. **p < .05. ***p < .01. predictors of latencies, although the best predictor was whether the sum was greater than 10. Separation of trials by reported procedure use suggests that averaged latencies provide a misleading picture of performance (Siegler, 1987, 1989). The latency data for all trials and for retrieval trials only are plotted in Figure 2 for the complete data set (100 addition problems). These data are consistent with the hypothesis that use of nonretrieval procedures on larger problems contributes substantially to the problem size effect. Indeed, although latencies on retrieval trials still increased with problem size, the increase was not particularly systematic. Moreover, on the reduced data set (ties and zero addends omitted), the percentage of variance accounted for on retrieval trials by the best structural variable was less than 10% as compared with almost 60% on all trials. Given that many researchers have studied only the nontie, nonzero problems (e.g., Geary & Wiley, 1991), this finding has important consequences for interpreting reported effects of problem size. 1100 o All Trials • Retrieval Trials Only 1000 On counting trials, the best structural predictor of latencies was the minimum addend (Table 3). This result is consistent with the participants' claims that they used counting on these trials. As shown in Figure 3, the increase in latencies with min was dramatic, especially as compared with latencies on retrieval trials. Note that the scale on this figure is different from that of Figure 2 and that the latencies are plotted as a function of min rather than of sum. As shown in Table 3, the coefficient associated with the min regression was consistent with the interpretation that participants counted on these trials. The estimate of 264 ms per count was similar to other estimates of well-practiced internal counting; whereas the coefficients associated with min on retrieval and transformation trials were small and not easily associated with a plausible rate of internal counting. The results for tie problems support the view that ties are generally solved with retrieval. Self-reports indicated that participants either retrieved the solutions to these problems or 2500 o Counting • Retrieval 2000 900 1500 800 700 1000 600 - 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 SUM Figure 2. Median latencies collapsed across subjects for all trials (open circles) and for trials in which retrieval was used (closed squares). Ties were omitted. Also shown are best fitting regression lines in which sum was used as the predictor: The dotted line is for all trials, and the dashed line is for retrieval trials only. 500 1 2 3 4 5 6 7 Minimum Addend 8 9 Figure 3. Median latencies collapsed across subjects on counting trials (open circles), and for comparison, on retrieval trials (closed squares) as a function of the minimum addend. Also shown are best fitting regression lines in which min was used as the predictor. PROBLEM SIZE AND PROCEDURE SELECTION used a multiplication transformation (e.g., 4 + 4 = 2 x 4 ) . The slope of the regression equation for latencies on ties (with use of product as the structural variable) was equivalent to that on retrieval trials alone for nonties (1.2 ms). Removing the nonretrieval trials for tie problems had little effect on the pattern of performance. Thus, ties are solved differently than nontie problems: Ties are primarily solved with retrieval; whereas nontie problems are solved with a variety of procedures. In summary, the present results suggest that averaging across procedures results in misleading conclusions about performance (Siegler, 1987). When retrieval trials alone were considered, the problem size effect was substantially reduced. The small amount of variance accounted for by problem size reduced its status as a major variable to a noticeable but minor factor in retrieval latencies. Of importance was the finding that retrieval was used less than 50% of the time on some problems (e.g., 8 + 9 and 9 + 8; see Appendix C for the percentage of procedure use on individual problems). Thus, the trials in which adults did not retrieve exerted a significant influence on latencies. Veridicality and Reactivity The present results provide some support for the assumption that participants can accurately report their procedure use across trials (cf. Russo et al., 1989). First, the latency data conformed fairly well to expectations that were based on the nature of the self-reported procedures. In general, procedures that should have been faster were faster, arid procedures that should have been more variable were more variable (see Table 1). Second, latencies were very strongly related to structural variables only for counting trials and not for retrieval or transformation trials (see Table 3). Moreover, the coefficient for min on counting trials represented a rate of internal counting that is entirely plausible. If participants merely had reported solution procedures randomly, then patterns of slopes and squared correlations should have been similar for all three types of trials. If participants had only discriminated fast ("retrieval") from slow trials and labeled the latter either counting or transformation on a random basis, then slopes and squared correlations should have been similar for counting and transformation trials. The finding that the coefficients for the counting trials differed from those for the comparably slow transformation trials, as well as those for the faster retrieval trials, is consistent with the view that participants discriminated and reported use of these different procedures accurately. Third, the finding that latencies (averaged across procedures) showed similar patterns as in previous studies supports the conclusion that asking participants to report their procedures did not substantially alter their performance (see also Compton & Logan, 1991; Logan & Klapp, 1991). In summary, the pattern of latencies was such that we feel confident that our participants were reasonably accurate (i.e., veridical) at reporting their procedures and that their performance was not influenced systematically (i.e., was not reactive) by the requirement to provide self-reports. 223 Analyses of Individual Differences Data from the performance of individual participants are rarely analyzed separately. As described above, data are usually averaged across individuals, even when procedures are analyzed separately (Siegler, 1987). The same criticisms apply to averaging across individuals, however, as to averaging across procedures. Averaged data may not describe the performance of any individual person. We have included analyses on averaged data primarily to illustrate the similarities between our data and those reported by other researchers. The procedure of averaging across subjects is, however, statistically unsound (Lorch & Myers, 1990). As illustrated by Lorch and Myers, averaging across subjects has two unfortunate consequences. If there is any interaction between subjects and items (i.e., if participants vary in their latencies across items), then averaging results in inflated alpha levels and inflated estimates of the percentage of variance accounted for, as outlined below. Problem size and individual differences. To examine the problem size effect for individuals, we conducted regression analyses on correct trials separately for each participant by using product as the structural variable.4 The regression coefficients and squared correlation values for individuals are shown in Table 4. Overall, the percentage of variance accounted for by product was lower for individual participants than for averaged data, with a mean of 22% across subjects as compared with 54% in the averaged data (Table 3). The mean slope value across subjects (6.9 ms) was statistically significant, <(15) = 4.94, p < .05, and was similar to that for averaged data (5.0 ms). As noted by Lorch and Myers (1990), estimates of population coefficients with averaged data are not seriously compromised by the averaging procedure, although estimated standard errors would be incorrect. Thus, averaging across subjects in this study resulted in substantially inflated estimates of the relation between problem size and solution latencies. Retrieval latencies and individual differences. We were par- ticularly interested in the influence of nonretrieval processing on patterns of latencies. Recall that, for averaged data, squared correlations and slope values decreased when retrieval trials were separately analyzed. The same result occurred in the data of individuals. Although problem size (as indexed by product) continued to predict latencies, <(15) = 7.37, p < .05, its influence was reduced: Mean percentage of variance accounted for by product decreased to 15% (from 22%), and mean slope was reduced to 3.5 ms (from 6.9 ms). Thus, the change in the pattern of latencies observed in averaged data when retrieval trials were eliminated was not due to some artifact of averaging across individuals. This analysis indicated that, on retrieval trials alone, problem size accounts for a relatively small percentage of the variance in solution times. The finding contrasts sharply with the result that problem size accounts for over half the variance in solution latencies when latencies are averaged across subjects and across procedures. 4 The analyses were very similar when sum was used as the predictor; however, product was chosen because it is typically the best predictor of latencies for adults (Ashcraft, 1987; Miller et al., 1984). 224 L E F E V R E , SADESKY, AND BISANZ Table 4 Individual Regression Results on All Trials and Retrieval-Only Trials Using Product and Percentage Use of Nonretrieval as Predictors All trials3 Retrieval only2 Retrieval onlyb Participant r2 Slope rz Slope r2 Slope 01 13 15 09 02 16 07 14 06 03 10 05 04 11 12 08 .086* .070* .279* .379* .091* • .456* .006 .190* .216* .284* .553* .009 .383* .181* .140* .236* * 0.7 3.3 2.3 7.2 3.4 9.0 0.5 6.8 16.3 10.9 18.1 1.2 7.7 3.4 4.8 14.1 .086* " .070* .265* • .377* .086* ' .214* .053c .051* .149* 1 .168* " .192* k .067* " .141* .248* 1 .062* k .082 0.7 3.3 1.7 7.0 4.0 4.1 0.8 3.2 5.5 4.3 6.9 1.1 3.9 3.6 3.7 2.5 .069* " .040* .176* ' .399* ' .135* " .368* .063* * .001 .147* .175* " .117*" .010 .111* • .115* .024 .118° 0.6 2.6 1.4 7.9 6.5 7.0 0.8 0.8 7.9 4.6 7.5 0.4 2.8 3.5 3.3 4.3 Note. Participants are ordered according to percentage use of retrieval as shown in Appendix B; r 2 = squared correlation. a Predictor variable was product. b Predictor variable was nonretrieval use. 'Marginally significant a t p < .06. "p < .05. trials were removed; whereas for skilled participants, the amount of variance predicted by product was similar for all trials as compared with retrieval-only trials (10% vs. 12%). Similarly, slopes for less skilled participants decreased from 10.5 ms to 4.6 ms; whereas for skilled participants slope estimates were stable (2.1 ms vs. 2.2 ms). Thus, for both skilled and less skilled participants, problem size continued to predict latencies but to a much more modest degree than was indicated by latencies averaged across procedures. Even on retrieval trials alone, however, the difference in slopes between more and less skilled participants was significant, f(14) = 8.82, p < .05. Less skilled participants showed larger effects of problem size than more skilled participants. The results of the analyses of individual participants indicated that averaging across individuals and procedures creates a misleading picture of performance. First, averaging resulted in hugely inflated values of squared correlations for the problem size effect. Second, problem size effects for individual participants were strongly influenced by the presence of nonretrieval trials, even though most individuals used retrieval on the majority of trials. Third, problem size effects varied with arithmetic fluency. More skilled individuals consistently used retrieval and showed relatively small effects of problem size. Less skilled individuals showed larger effects of problem size, but these effects were heavily influenced by the presence of nonretrieval trials. Discussion Individual differences and arithmetic fluency. We also exam- ined the relations between arithmetic fluency and solution latencies. More skilled individuals are assumed to retrieve arithmetic facts quickly and efficiently (Ashcraft, Donley, Halas, & Vakali, 1992; Kaye, 1986; Kaye, deWinstanley, Chen, & Bonnefil, 1989; LeFevre & Kulak, 1994). Consistent with this hypothesis, there was a significant correlation between fluency and percentage use of retrieval: Skilled individuals used retrieval more frequently than did less skilled individuals (r = .52, p < .05). The three individuals who used retrieval most frequently (98%, 100%, and 100% of the time) were in the more skilled group. In fact, their fluency performance ranked first, second, and fourth, respectively (see Appendix B). Problem size effects also varied with arithmeticfluency.For example, the correlation betweenfluencyand squared correlations on all trials approached significance (r = —.47,/? = .064), suggesting that problem size was a less important predictor of latencies for more skilled than for less skilled participants. When only retrieval trials were considered, however, the correlation betweenfluencyand squared correlations dropped to -.04, indicating that skilled and less skilled participants showed similar patterns of performance when only retrieval trials were considered. These results suggest that selection of procedures is a factor in individual differences in addition performance, even among relatively skilled adults (e.g., Siegler, 1988a). To further examine the relation between arithmetic fluency and patterns of performance, we divided participants into more and less skilled groups on the basis offluencyscore.5 For less skilled participants, the amount of variance accounted for by product decreased from 32% to 16% when nonretrieval The most important variable in the area of mental arithmetic is problem size: Problems with larger numbers are solved more slowly than problems with smaller numbers. Given its persistence across a variety of tasks and participant groups, the problem size effect is equivalent in importance to that of the frequency effect in the study of word recognition. Certainly any complete theory of mental arithmetic must include an account of the problem size effect. Our results indicate that averaging across participants and procedures has resulted in inflated estimates of the importance of the problem size effect. For individual participants, selection of procedures appears to account for a substantial proportion of the problem size effect on simple addition problems, afindingthat is inconsistent with theories of mental arithmetic that do not include a role for solution procedures other than direct retrieval of facts. These results, plus the observation that problem size effect is quite modest on problems in which retrieval is used, have important implications for theories of mental arithmetic in adults. The Problem Size Effect on All Problems Our findings were simple. On problems with sums greater than 10, transformation procedures were used almost as frequently as direct retrieval. Retrieval was consistently faster, 5 The mean arithmetic fluency score was 71.2. The less skilled group included 9 participants, with a mean fluency score of 56.6 (SD = 7.0), range of 48-65. The more skilled group included 7 participants, with a mean fluency score of 89.6 (SD = 8.4) and a range of 77-99. PROBLEM SIZE AND PROCEDURE SELECTION however. Hence, averaging retrieval and transformation trials resulted in relatively slow latencies on problems with sums greater than 10. Participants showed fast and accurate retrieval of sums up to 10, and they rarely used transformations on these problems. This finding presumably reflects the centrality of 10 in the base-ten number system. In contrast, counting was used most often on problems with minimum addends or augends of one, two, or three, and latencies on counting trials were predicted best by the number of counts that were required. Thus, our data showed clear relations between problem characteristics and procedure selection that had important implications for latencies. These results might be considered unremarkable if the participants had been children (Bisanz, Morrison, & Dunn, 1995; Geary & Brown, 1991; Geary, Brown, & Samaranayake, 1991; Siegler, 1987; Siegler & Shrager, 1984). Our participants, however, were relatively skilled university students. Moreover, careful examination of the literature revealed other findings consistent with our own. Geary and Wiley (1991) also used a self-report method with students who solved simple addition problems. Their participants reported nonretrieval procedures on 12% of trials and also used nonretrieval procedures increasingly as a function of problem size. Although Geary and Wiley did not explicitly consider differences in the problem size effect on retrieval versus nonretrieval trials, they found a squared correlation of .30 for the relation between problem size (indexed by product) and latencies on retrieval trials.6 Although not as small as in our study, this value is considerably smaller than those reported in the literature for all trials when latencies were averaged across procedures. Similarly, Svenson (1985) also reported an attenuation in the slope of the problem size effect on retrieval trials (from 23 ms per increment to 4 ms per increment). In accord with the present results, the findings reported by Svenson and Geary and Wiley are consistent with the view that problem size and nonretrieval solutions are strongly related. We also found that selection of procedures varies across individuals as well as across problems. Theorists have assumed that retrieval is the dominant, if not the sole, procedure used by adults and, thus, that any nonretrieval trials would not substantially affect the overall pattern of results. This view has led to models that neglect or downplay the importance of the link between selection of procedures and problem solutions (e.g., Ashcraft, 1982,1987; Campbell & Oliphant, 1992; Widaman et al., 1989; Widaman & Little, 1992). The presentfindings,therefore, have important implications for extant models of how arithmetic facts are processed. In most models, the problem size effect serves as an index of how retrieval occurs by means of spreading activation in the mental network (e.g., Ashcraft, 1987; Campbell & Oliphant, 1992; Widaman et al., 1989). We found, however, that direct retrieval of an answer occurs infrequently on some problems (see Appendix C). Given that performance on such problems contributes substantially to the problem size effect, this index is not likely to provide a veridical reflection of retrieval processes. Thus, models of mental arithmetic must incorporate a link between selection of procedures and performance (Bisanz & LeFevre, 1990; Siegler & Shipley, 1995). Our results do not imply that there is no network but rather that indices of problem size on trials averaged across procedures do not directly reflect processing within the network. 225 The need to incorporate processes other than fact retrieval in models of mental arithmetic has previously been identified (e.g., Baroody, 1994), but the relation between fact retrieval and nonretrieval processes is rarely described explicitly. One exception is a model by Logan and his colleagues (Compton & Logan, 1991; Klapp, Bodies, Trabert, & Logan, 1991; Logan & Klapp, 1991), which was designed to account for changes in performance of adults on a novel, alphabetic arithmetic task as a function of extended practice. According to this model, the process of direct retrieval can operate in parallel with nonretrieval procedures. Whether the answer is produced by retrieval or nonretrieval procedures depends on which process is completed more quickly. Another exception is a series of models developed by Siegler and his colleagues (e.g., Siegler, 1988b; Siegler & Jenkins, 1989; Siegler & Shrager, 1984) to account for age-related changes in the diversity of solution methods used by children. In the most recent version of this model (Siegler & Shipley, 1995), associations are assumed to exist between a problem (e.g., 8 + 9) and possible answers (e.g., 16,17, and 18) and also between a problem and various nonretrieval procedures. Whether a child first attempts to solve the problem with fact retrieval or with a nonretrieval procedure depends on the relative strengths of the corresponding problem-answer and problem-procedure associations: Stronger associations are more likely to be activated and used to generate a solution. The models proposed by Logan (Compton & Logan, 1991; Klapp et al., 1991; Logan & Klapp, 1991) and by Siegler (Siegler, 1988b; Siegler & Jenkins, 1989; Siegler & Shrager, 1984) differ in many ways, but both can, in principle, accommodate the finding that adults frequently use nonretrieval procedures. In the case of Logan's model, the failure of most adults to rely entirely on retrieval could result from insufficient practice or from the use of procedures that are extremely efficient. Similarly, in Siegler and Shipley's (1995) model, the associative strength between a problem and a nonretrieval procedure could exceed that between the problem and an answer if experience with the problem is limited or if the nonretrieval procedure is especially efficient and accurate. In a simulation of the effects of extended practice, however, their model eventually used retrieval on 95% of all trials, a level greatly exceeding that of most of our adult participants. Either such extensive practice is not common among adults or the model requires some modification to prevent retrieval from being used so frequently. One such modification would be to ensure that the strength of problem-procedure associations is maintained relative to retrieval when a procedure is successfully executed, thus contributing to persistence in using that procedure on future trials. A second modification would be to include a greater variety of nonretrieval procedures during acquisition. In Siegler and Shipley's simulation of development, only two nonretrieval procedures (sum and min) were available. Notably absent were transformation procedures, which were numerous among our adults and which are not uncommon among children (Putnam et al., 1990; Siegler, 1987). Providing a broader range of fast and accurate proce6 Geary and Wiley (1991) used a subset (n = 40) of the 56 nontie combinations of the digits from 2-9, so their problem set is not directly comparable to that used in our study. 226 LEFEVRE, SADESKY, AND BISANZ dures (and, ultimately, the conceptual capacity to construct these procedures) might contribute toward continued use of these procedures, even as problem-answer associations become increasingly strong. Neither of these changes is particularly dramatic, and so our results seem consistent with the basic architecture of Siegler and Shipley's model. Thus, adults' frequent use of nonretrieval procedures presumably could be accommodated in both models with appropriate assumptions about the learning history of the individual (practice with particular problems and procedures), about the relative efficacy of constructed procedures in the course of their development, and about how these factors influence associative strengths (Siegler) or speed (Logan). To account for the pervasive problem size effect, it must also be assumed that smaller problems are practiced more frequently than larger problems and that nonretrieval procedures are relatively more efficacious for larger than for smaller problems. The plausibility of any such assumptions would have to be evaluated empirically in simulations or in research with children or novices. The Problem Size Effect on Retrieval Trials Although the problem size effect appears to be due primarily to the selective use of nonretrieval procedures and is severely reduced when nonretrieval latencies are omitted, it does not disappear completely (Tables 3 and 4). The issue, then, is how to explain this (reduced) problem size effect on retrieval trials. We do not propose a definitive explanation, but we suggest four possibilities, none of which necessarily excludes the others. First, retrieval latencies may reflect the structural characteristics of a network of problem-answer associations. Existing hypotheses about the structure of this network (e.g., Campbell & Oliphant, 1992; Widaman & Little, 1992) may apply, but they will require considerable reformulation in view of the fact that the problem size effect is so small on retrieval trials. In revising these hypotheses, it will be important to avoid the use of variables that index internal structure, such as product, that have very limited psychological plausibility (Ashcraft, 1992; McCloskey et al., 1991; Siegler & Shipley, 1995). The second possibility is that retrieval latencies may simply reflect the strength of problem-answer associations or the speed of retrieving answers that, in turn, depends primarily on frequency of exposure to specific facts (Ashcraft, 1987; Ashcraft & Christy, 1995; Campbell & Graham, 1985; Hamann & Ashcraft, 1986). Although frequency may be important early in acquisition (see Siegler & Shrager, 1984), in a recent and thorough study by Ashcraft and Christy (1995),7 the frequency with which addition facts are presented in a mathematics textbook series (Grades 1-6) was essentially uncorrelated with problem size (r = .12). The measure of frequency derived from Ashcraft and Christy was moderately correlated with latencies on retrieval trials in the present study (r = .23, p < .05), however, and predicted a significant 3% of variance in retrieval latencies, even after product was partialed out. Thus, frequency of presentation may contribute to a complete model of arithmetical representation, but on its own it does not provide a compelling account of problem size effects in retrieval of addition facts. The third possibility is based on the notion that retrieval latencies may reflect the use of nonretrieval procedures in the course of development. More specifically, a history of frequently using nonretrieval procedures for particular problems may result in longer latencies on the (infrequent) occasions when retrieval is used for those problems. That is, the probability of using nonretrieval procedures may influence latencies on problems solved with retrieval. Empirical support for this somewhat counterintuitive proposition is evident in the data of individual participants. We calculated the probability, across all subjects, of using a nonretrieval procedure for each problem (see the Probability of Retrieval section in Appendix C) and used this variable to account for variability in retrieval latencies across problems. We hypothesized that solution latencies would be slower on problems that were retrieved less often as compared with those that were retrieved more often. This hypothesis was based on the assumption that variability in procedure selection may reduce the peakedness of the retrieval distribution for a given problem (Siegler, 1987; Siegler & Shipley, 1995; Siegler & Shrager, 1984). As compared with the structural predictors, percentage of nonretrieval was not a particularly good predictor for data averaged across subjects, predicting only 1% of the variance for all trials and 2% for the reduced set. The picture is very different, however, when we analyzed the latencies of individual participants by using percentage of nonretrieval as the predictor. We compared these results with those obtained with product as the predictor, as shown in Table 4. For 15 of the 16 participants, product was a significant predictor of latencies on retrieval trials, accounting for a mean of 15% of the variance in latencies. Similarly, percentage of nonretrieval was a significant predictor of retrieval latencies for 13 of the 16 participants, accounting for a mean of 13% of the variance in latencies. Thus, percentage use of nonretrieval, as estimated on the basis of data from all participants, was almost as powerful of a predictor of retrieval latencies for individual participants as the most commonly used structural predictor. In contrast, frequency of presentation from Ashcraft and Christy (1995) was significantly correlated with retrieval latencies for only 4 of the 16 participants. Unlike product and other structural predictors, the probability of using nonretrieval procedures appears to have considerable psychological plausibility as a predictor of retrieval latencies, at least in the sense that the relation can be accommodated without major alterations to some existing models. In Siegler and Shipley's (1995) model, for example, the rinding that the probability of using a nonretrieval procedure is related to latencies on retrieval trials can be understood as a consequence of how individuals attempt to solve problems for which the answer is not well learned. More specifically, when retrieval of a poorly learned problemanswer association is used to generate a response, it is likely that multiple retrieval attempts are involved, rather than a single retrieval attempt. The explanation for this inference requires some elaboration. In Siegler and Shipley's (1995) model, as noted earlier, problems are associated with varying strengths to answers and 7 For multiplication facts, frequency of exposure was correlated with problem size. PROBLEM SIZE AND PROCEDURE SELECTION to nonretrieval procedures, and the probability of retrieving an answer or activating a nonretrieval procedure is proportional to these strengths. If the strength of a problem-answer association is somewhat weaker than the strength of a problemprocedure association, then chances are that the nonretrieval procedure will be used to solve a given problem. This selection is probabilistic, however, and on the assumption that all problem-answer associations are not zero, retrieval will be used on at least some occasions. In this model, one characteristic of problem-answer associations that are not well learned is that the distribution of associative strengths is relatively flat rather than peaked. Consequently, the chance of retrieving an acceptable answer on the first retrieval attempt (i.e., an answer that exceeds an internal confidence criterion) is not high; if the answer is retrieved at all, it is likely that multiple retrieval attempts will be required (see Siegler & Shipley, 1995; Siegler & Shrager, 1984). Thus, if a problem is presented that has weak problemanswer associations and relatively strong problem-procedure associations, then a nonretrieval procedure is likely to be used. On the occasions when retrieval happens to be used to answer such a problem, the latency to produce the answer is likely to involve multiple retrieval attempts and hence is likely to be slower than for problems in which an answer is likely to be retrieved in a single attempt. Consequently, a positive correlation across problems should be observed between (a) the probability of solving a problem with a nonretrieval procedure and (b) latency on a problem when retrieval is used. Precisely this pattern was observed in the data for most individual participants. According to this view, the relation between indices of problem size and latency is likely to be epiphenomenal. In Siegler's (Siegler & Shipley, 1995; Siegler & Shrager, 1984) models, for example, the strength of a problem-answer association depends in part on the accuracy with which answers were produced when a nonretrieval procedure was used. When learning addition facts, children often use counting-based procedures to solve problems, and associations between problems and answers are strengthened when correct answers are generated. Counting errors are assumed to be more likely on problems with larger addends or augends because these problems require more counting, and so the strengthening of problem-answer associations is more likely to be delayed on these problems than on problems with smaller addends and augends. Thus, indices of problem size, such as product, are correlated with latency primarily because of the way in which knowledge of arithmetic facts is acquired and not because these indices reflect structural variables that influence solution directly. Indeed, a simulation of learning in Siegler and Shipley's (1995) model generated substantial problem size effects indexed by product, even though no structural features reflecting product are represented in the model. In summary, our data are consistent with the notion that acquisition history, as preserved both in the distributions of associations for problems and answers and in the relative strength of nonretrieval procedures, accounts for the small problem size effect found in retrieval latencies. The framework described by Siegler and Shipley (1995) provides a psychologically plausible account of the relation between nonretrieval processing and retrieval processing and of the patterns of 227 latencies on problems averaged across all procedures. We also expect that this model will be useful for understanding performance on multiplication problems. Research has shown that latencies on addition and multiplication problems are closely related (Geary et al., 1986; Miller et al., 1984). Thus, multiplication performance may also be influenced by the pattern of procedure selection across problems. Although the overall frequency of retrieval may be greater on multiplication than on addition trials (Hubbard, LeFevre, & Greenham, 1994), the course of acquisition of multiplication facts may be similar to that for addition (Siegler, 1988b). For example, in multiplication, ties and facts involving multiples of five may thoroughly be learned (e.g., Campbell & Graham, 1985) and function as "known" facts in transformation solutions in the way that the sums-to-ten and ties do in addition. Further, there is some evidence that multiple retrievals are common in multiplication performance and influence latencies (Thibodeau, 1993). Thus, we expect that nonretrieval use will be important for understanding performance in multiplication. Finally, the link between probability of nonretrieval procedures being used and latency on retrieval trials may arise from errors of self-report. More specifically, if participants have a tendency to report use of retrieval when in fact they have used a nonretrieval procedure, and if this false-positive rate is constant across problems, then falsely reported retrieval latencies would be more likely to occur for problems on which nonretrieval procedures are used more commonly. Given the correlation between problem size and probability of using nonretrieval procedures (i.e., r = .71 with product as an index of problem size), the resulting problem size effect for retrieval latencies is not surprising. Although our data generally are consistent with the notion that self-reports were veridical, this final possibility cannot be ruled out conclusively. If this alternative is correct, however, the likely conclusion is that the venerable problem size effect for retrieval latencies is even less substantial than we propose. Resolution of this issue will require additional research. In summary, although there are a number of alternative explanations for the problem size effect on retrieval trials, most can be linked to the influence of nonretrieval procedures either concurrently or during acquisition. Thus, in combination with thefindingthat nonretrieval trials have a substantial effect on latencies on trials that are averaged across procedures, the analyses of retrieval-only trials suggest that attention to nonretrieval processing is critical to understanding simple arithmetic processes. These results have implications for theories of simple arithmetic, as discussed above, and for the study of arithmetic performance when methodologies are used in which latencies are averaged across diverse procedures. Concluding Remarks One striking finding in the present research is the richness and variability of the nonretrieval alternatives. Although a variety of studies have shown that adults use many procedures when solving problems (e.g., LeFevre & Bisanz, 1986), finding such variability on a task as basic as simple addition emphasizes that diversity is the rule in cognitive processing (see also 228 L E F E V R E , SADESKY, AND BISANZ Bisanz & LeFevre, 1990; Siegler & Shipley, 1995). Models that do not have a role for such diversity will never be adequate. Ashcraft (1992) reviewed 20 years of research on simple arithmetic and concluded that, in spite of the advances made in the field, the central question had not been answered: Why is there a problem size effect? The results of our research suggest that an important part of the answer is related to the role of nonretrieval processing in simple arithmetic. Implausible though it might seem, adults do continue to use procedures other than direct retrieval to solve simple addition problems. References Ashcraft, M. H. (1982). The development of mental arithmetic: A chronometric approach. Developmental Review, 2, 213-236. Ashcraft, M. H. (1987). Children's knowledge of simple arithmetic: A developmental model and simulation. In J. Bisanz, C. J. Brainerd, & R. Kail (Eds.), Formal methods in developmental psychology: Progress in cognitive development research (pp. 302-338). New York: SpringerVerlag. Ashcraft, M. H. (1992). Cognitive arithmetic: A review of data and theory. Cognition, 44, 75-106. Ashcraft, M. H., & Christy, K. S. (1995). The frequency of arithmetic facts in elementary texts: Addition and multiplication in Grades 1-6. Journal for Research in Mathematics Education, 26, 396-421. Ashcraft, M. H., Donley, R. D., Halas, M. A., & Vakali, M. (1992). Working memory, automaticity, and problem difficulty. In J. I. D. Campbell (Ed.), The nature and origins of mathematical skills (pp. 301-329). Amsterdam: Elsevier Science. Ashcraft, M. H., & Fierman, B. A. (1982). Mental addition in third, fourth, and sixth graders. Journal of Experimental Child Psychology, 33, 216-234. Ashcraft, M. H., & Stazyk, E. H. (1981). Mental addition: A test of three verification models. Memory & Cognition, 9, 185-196. Baroody, A. J. (1983). The development of procedural knowledge: An alternative explanation for chronometric trends of mental arithmetic. Developmental Review, 3, 225-230. Baroody, A. J. (1994). An evaluation of evidence supporting factretrieval models. Learning and Individual Differences, 6, 1-36. Bisanz, J., & LeFevre, J. (1990). Strategic and nonstrategic processing in the development of mathematical cognition. In D. F. Bjorklund (Ed.), Children's strategies: Contemporary views of cognitive development (pp. 213-243). Hillsdale, NJ: Erlbaum. Bisanz, J., Morrison, F. J., & Dunn, M. (1995). Effects of age and schooling on the acquisition of elementary quantitative skills. Developmental Psychology, 31, 221-236. Campbell, J. I. D. (1987a). Network interference and mental multiplication. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 109-123. Campbell, J. I. D. (1987b). Production, verification, and priming of multiplication facts. Memory & Cognition, 15, 349-364. Campbell, J. I. D. (1991). Conditions of error priming in number-fact retrieval. Memory & Cognition, 19, 197-209. Campbell, J. I. D., & Graham, D. J. (1985). Mental multiplication skill: Structure, process, and acquisition. Canadian Journal of Psychology, 39, 338-366. Campbell, J. I. D., & Oliphant, M. (1992). Representation and retrieval of arithmetic facts: A network-interference model and simulation. In J. I. D. Campbell (Ed.), The nature and origin of mathematical skills (pp. 331-364). Amsterdam: Elsevier Science. Compton, B. J., & Logan, G. D. (1991). The transition from algorithm to retrieval in memory-based theories of automaticity. Memory & Cognition, 19, 151-158. Cooney, J. B., & Ladd, S. F. (1992). The influence of verbal protocol methods on children's mental computation. Learning and Individual Differences, 4, 237-257. French, J. W., Ekstrom, R. B., & Price, I. A. (1963). Kit of reference tests for cognitive factors. Princeton, NJ: Educational Testing Service. Geary, D. C , & Brown, S. C. (1991). Cognitive addition: Strategy choice and speed-of-processing differences in gifted, normal, and mathematically disabled children. Developmental Psychology, 27, 398-406. Geary, D. C , Brown, S. C , & Samaranayake, V. A. (1991). Cognitive addition: A short longitudinal study of strategy choice and speed-ofprocessing differences in normal and mathematically disabled children. Developmental Psychology, 27, 787-797. Geary, D. C , & Burlingham-Dubree, M. (1989). External validation of the strategy choice model for addition. Journal of Experimental Child Psychology, 47, 175-192. Geary, D. C , Frensch, P. A., & Wiley, J. G. (1993). Simple and complex mental subtraction: Strategy choice and speed-of-processing differences in younger and older adults. Psychology and Aging, 8, 242-256. Geary, D. C , Widaman, K. R., & Little, T. D. (1986). Cognitive addition and multiplication: Evidence for a single memory network. Memory & Cognition, 14, 478-487. Geary, D. C , & Wiley, J. G. (1991). Cognitive addition: Strategy choice and speed-of-processing differences in young and elderly adults. Psychology and Aging, 6, 474-483. Graham, D. J., & Campbell, J. I. D. (1992). Network interference and number-fact retrieval: Evidence from children's alphaplication. Canadian Journal of Psychology, 46, 65-91. Groen, G. J., & Parkman, J. M. (1972). A chronometric analysis of simple addition. Psychological Review, 79, 329-343. Hamann, M. S., & Ashcraft, M. H. (1985). Simple and complex mental addition across development. Journal of Experimental Child Psychology, 40, 49-72. Hamann, M. S., & Ashcraft, M. H. (1986). Textbook presentations of the basic arithmetic facts. Cognition and Instruction, 3, 173-192. Hubbard, K. E., LeFevre, J., & Greenham, S. L. (1994, June). Procedure use in multiplication by adolescents. Poster presented at the annual meeting of the Canadian Society for Brain, Behaviour, and Cognitive Science, Vancouver, British Columbia, Canada. Kaye, D. B. (1986). The development of mathematical cognition. Cognitive Development, 1, 157-170. Kaye, D. B., deWinstanley, P., Chen, Q., & Bonnefil, V. (1989). Development of efficient arithmetic computation. Joumalof Educational Psychology, 81, 467-480. Klapp, S. T., Boches, C. A., Trabert, M. L., & Logan, G. D. (1991). Automatizing alphabet arithmetic: II. Are there practice effects after automaticity is achieved? Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 196-209. Koshmider, J. W., & Ashcraft, M. H. (1991). The development of children's mental multiplication skills. Journal of Experimental Child Psychology, 51, 53-89. LeFevre, J., & Bisanz, J. (1986). A cognitive analysis of number series problems: Sources of individual differences in performance. Memory & Cognition, 14, 287-298. LeFevre, J., Bisanz, J., & Mrkonjic, L. (1988). Cognitive arithmetic: Evidence for obligatory activation of arithmetic facts. Memory & Cognition, 16, 45-53. LeFevre, J., & Kulak, A. G. (1994). Individual differences in the obligatory activation of addition facts. Memory & Cognition, 22, 188-200. LeFevre, J., Kulak, A. G., & Bisanz, J. (1991). Individual differences and developmental change in the associative relations among numbers. Journal of Experimental Child Psychology, 52, 256-274. Logan, G. D., & Klapp, S. T. (1991). Automatizing alphabet arithmetic: I. Is extended practice necessary to produce automaticity? 229 PROBLEM SIZE AND PROCEDURE SELECTION Siegler, R. S., & Jenkins, E. A. (1989). How children discover new strategies. Hillsdale, NJ: Erlbaum. Siegler, R. S., & Shipley, E. (1995). Variation, selection, and cognitive change. In G. Halford & T. Simon (Eds.), Developing cognitive competence: New approaches to process modelling (pp. 31-76). Hillsdale, NJ: Erlbaum. Siegler, R. S., & Shrager, J. (1984). Strategy choices in addition and subtraction: How do children know what to do? In C. Sophian (Ed.), Origins of cognitive skills (pp. 229-293). Hillsdale, NJ: Erlbaum. Svenson, O. (1985). Memory retrieval of answers of simple additions as reflected in response latencies. Acta Psychologica, 59, 285-304. Thibodeau, M. T. (1993). Automatic activation of multiplication facts and procedure use on multiplication problems: Evidence for ADAMM. Unpublished master's thesis, University of Alberta, Edmonton, Alberta, Canada. Wheeler, L. R. (1939). A comparative study of the difficulty of the 100 addition combinations. Journal of Genetic Psychology, 54, 295-312. Widaman, K. F., Geary, D. C , Cormier, P., & Little, T. D. (1989). A componential model for mental addition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 898-919. Widaman, K. F., & Little, T. D. (1992). The development of skill in mental arithmetic: An individual differences perspective. In J. I. D. Campbell (Ed.), The nature and origins of mathematical skills (pp. 189-253). Amsterdam: Elsevier Science. Widaman, K. F., Little, T. D., Geary, D. C , & Cormier, P. (1992). Individual differences in the development of skill in mental addition: Internal and external validation of chronometric models. Learning and Individual Differences, 4, 167-213. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 179-195. Lorch, R. F., Jr., & Myers, J. L. (1990). Regression analyses of repeated measures data in cognitive research. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 149-157. McCloskey, M., Harley, W., & Sokol, S. M. (1991). Models of arithmetic fact retrieval: An evaluation in light of findings from normal and brain-damaged subjects. Journal ofExperimental Psychology: Learning, Memory, and Cognition, 17, 377-397. Miller, K. F., & Paredes, D. R. (1990). Starting to add worse: Effects of learning to multiply on children's addition. Cognition, 37, 213-242. Miller, K. F., Perlmutter, M., & Keating, D. (1984). Cognitive arithmetic: Comparison of operations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 46-60. Putnam, R. T., deBettencourt, L. U., & Leinhardt, G. (1990). Understanding of derived-facts strategies in addition and subtraction. Cognition and Instruction, 7, 245-285. Russo, J. E., Johnson, E. J., & Stephens, D. L. (1989). The validity of verbal protocols. Memory & Cognition, 17, 759-769. Siegler, R. S. (1987). The perils of averaging over strategies: An example from children's addition. Journal of Experimental Psychology: General, 116, 250-264. Siegler, R. S. (1988a). Individual differences in strategy choices: Good students, not-so-good students, and perfectionists. Child Development, 59, 833-851. Siegler, R. S. (1988b). Strategy choice procedures and the development of multiplication skill. Journal of Experimental Psychology: General, 117, 258-275. Siegler, R. S. (1989). Hazards of mental chronometry. An example from children's subtraction./o(OTUi/o/£(iu<;ariona/ftryc/io/c^v, 81, 497-506. Appendix A Procedures Used on Addition Problems Procedure Retrieval Counting Transformation Examples of self-reports Procedure Examples of self-reports "I remembered it. I knew it." "I counted. I counted in my head." "I used my fingers." "It's just one more than . . . " "5 + 6 is just 5 + 5 + 1" (using a tie). "9 + 6 is 9 + 1 + 5" or "4 + 9 is 10 + 4 - 1" (through 10). "6 + 5 is 2 x 5 - 1" (using multiplication). Zero rule "Anything plus zero is that same number." "Zero again." "I guessed." "I counted, but then I remembered the answer." "2 plus any even number is the next even number." "I visualized the numbers 8 and 7." "I visualized the numbers and then counted." Other Appendix B Arithmetic Fluency Scores and Frequency of Procedure Use (Percentage) for Each Participant Frequency of use Participant Fluency Retrieval Counting Transformation Rule Other 01 13 15 09 02 16 07 14 06 97 90 98 56 77 65 95 54 49 100 100 98 90 84 79 70 70 69 — — — 2 8 _ — 1 2 4 — 5 1 2 18 (Appendixes continue on next page) 4 16 18 15 30 12 1 230 L E F E V R E , SADESKY, AND BISANZ Appendix B (continued) Frequency of use Participant Fluency Retrieval Counting Transformation 03 10 05 04 11 12 08 65 57 80 65 89 48 50 68 68 61 — 23 16 6 26 21 16 32 1 10 37 26 31 30 57 47 46 33 Rule Other 12 17 Note. Participants are ordered according to frequency of use of retrieval. Dashes represent 0% use of a procedure by a participant. Appendix C Probability of Use of Retrieval, Counting, and Transformation Procedures Addend Augend 0 1 2 3 0 1 2 3 4 5 6 7 8 9 .88 .88 .94 .94 .88 .88 .87 .88 .88 .88 .81 .88 .69 .81 .69 .81 .81 .69 .81 .75 .88 .87 .81 .75 .75 .75 .88 .56 .81 .75 .94 .81 .81 .94 .81 1.00 .40 .53 4 5 6 7 8 9 .94 .75 .73 .81 .75 .87 .50 .60 .56 .19 .88 .69 .87 .81 .81 .56 .75 .44 .44 .44 .94 .75 .75 .94 .46 .54 .50 .81 .50 .25 .94 .75 .94 .88 .75 .63 .56 .38 .38 .40 .25 .27 .81 Retrieval .75 .69 .88 .81 .81 1.00 .75 .75 .88 .56 .50 .33 .50 .56 .53 .69 .50 .88 .27 Counting 0 1 2 3 4 5 6 7 8 9 (.12) (.19) (.19) (.19) (.12) (.06) (.12) (.06) (.06) (.12) (.12) .06 .25 .25 .19 .25 .31 .25 .25 .25 (.06) .31 — .06 .06 .20 .07 .19 .06 .31 — — — — (.06) .19 .13 .06 .14 .06 — — — — (.12) .31 .19 .15 .13 .19 — — — — (.06) .13 .25 .15 .07 .06 — — — — (.12) .19 .19 .06 .06 .13 — (.12) .31 .19 .13 .07 .06 — .06 .06 — — (.12) .25 .13 .07 .13 .06 .13 — (.12) .25 .13 .07 .13 .06 .06 .06 — — Transformation 0 1 2 3 4 5 6 7 8 9 — — — — — — — — — — — — — — — .12 — — — — .06 — .31 .19 .13 .06 .13 .40 .40 .06 — .25 .25 .13 .25 .38 .53 — — — — .06 .25 .13 .44 .33 .38 .69 .19 .13 .44 — .38 .23 .44 .19 .38 .69 .36 .31 .40 .25 .38 .12 .67 .25 .50 .44 .50 .06 .38 .44 .56 .47 .69 .73 .19 Note. Numbers in parentheses represent the probability of zero rule. Dashes represent zero probability of use. Received January 19,1994 Revision received January 30,1995 Accepted January 31,1995