CHAPTER 4: EXPERIMENTAL DESIGN AND PRESENTATION OF RESULTS

4.0 Introduction

In this chapter, I present my own experimental evidence for children’s acquisition of the tough construction (TC) and related null operator structures (NOS). These findings, to be discussed in §4.5, were obtained in large part from an experimental study that I conducted with forty-four monolingual child speakers of British English, who ranged in age from 3;4 to 7;5. These forty-four subjects were drawn from an original group of 122 children whom I pre-tested for their knowledge of the meaning of several tough adjectives, as well as for their recall memory for story events.

In keeping with the broad research questions outlined in §1.0 of Chapter 1, my experimental study was designed to test the following three hypotheses:

(1) a. Acquisition of the TC is relatively delayed because the construction is syntactically complex and children initially lack the syntactic ability required to interpret the TC in a target-like manner.
    b. Acquisition of the TC is relatively delayed because children require some time to learn the correct lexical properties of the tough adjective.
    c. Children initially fail to interpret the TC in a target-like manner because they experience a more general difficulty with the interpretation of syntactically displaced object arguments.

Hypothesis (1a) rests on two assumptions. The first is that the structural representation of the TC, like that of other NOS, involves a null operator-gap configuration in an embedded clause, which is referentially coindexed with a matrix antecedent. The second is that the derivation of any NOS can reasonably be considered syntactically complex in comparison with structures in the language that do not involve the interpretation of a displaced syntactic constituent. In essence, hypothesis (1a) predicts that the various NOS should be acquired concurrently and relatively late.
Accordingly, supportive evidence for the validity of this hypothesis would be obtained if even my oldest subjects (i.e. those over 7;0) proved unable to interpret any of the NOS in a target-like manner. Conversely, I reasoned that if my subjects proved able to interpret some NOS in a target-like manner but not others, this pattern of performance could still be taken as informative for syntactic theory. This is because concurrent acquisition of the various NOS is predicted by any theory, such as the null operator analysis reviewed in Chapter 2, that takes NOS to share certain fundamental aspects of syntactic representation and a similar level of syntactic complexity. If it could instead be demonstrated that children acquire the various NOS, including the TC, in a piecemeal fashion, this finding would cast doubt on these fundamental assumptions. At the very least, were my subjects to show early mastery of certain NOS, their non-target-like performance on other NOS would be much less plausibly explained in terms of the syntactic complexity of these structures.

My review of previous experimental studies of the acquisition of NOS in Chapter 3 raised one further possibility with regard to my evaluation of hypothesis (1a). This concerns the widely reported finding that children in the Intermediate stage of acquisition of the TC assign both target-like and non-target-like readings to the construction (cf. Cromer 1970). I regard such a pattern of performance as inconsistent with hypothesis (1a), at least as concerns a child’s ability to interpret the TC. That is, I maintain that if a child genuinely lacks the syntactic ability required to interpret the TC, she should consistently fail to interpret the TC in a target-like manner.

D.L. Anderson, University of Cambridge
Chapter 4: Experimental Design and Presentation of Results
Turning to hypothesis (1b), supportive evidence for this hypothesis would be obtained were my subjects to show a varying ability to assign a target-like interpretation to the TC depending on which particular tough adjective featured in the construction. Conversely, a child’s consistently non-target-like performance on the TC would be compatible with hypothesis (1a), hypothesis (1b), or both. On the basis of such a pattern of performance alone, then, I would be unable to establish a definitive explanation for the child’s non-target-like treatment of the TC. Even so, hypothesis (1b) predicts some degree of inconsistency in the child’s interpretation of individual TC items, and I therefore take consistently non-target-like performance on the TC to cast doubt on (1b).

As regards hypothesis (1c), supportive evidence would be obtained should any of the subject groups demonstrate a concurrent inability to interpret passives and NOS in a target-like manner, given that both constructions involve the interpretation of a displaced object argument. As noted in Chapter 3, Cromer (1970) had previously tested children’s concurrent ability to interpret the TC and the passive and had reported that performance on the passive was better than performance on the TC even for the youngest subjects in his study. This early finding thus undermines hypothesis (1c). Nevertheless, as Cromer’s was the only study to test the two constructions concurrently, I judged that a contemporary re-testing of the two was warranted. Furthermore, Cromer had offered only two tokens of the passive in his study, both of which featured the verb bite. Since children’s competence in interpreting the passive has been claimed to vary according to whether the passive features a verb that typically takes an agentive subject (e.g.
bite) or an experiencer subject (e.g. like) (see, e.g., Maratsos, Fox, Becker, and Chalkley 1985), I decided to expand the range of passives tested in my own study, offering sentences that featured passivized versions of both types of verb. My aim was to determine, first, whether my subjects would show the same dissociation between target-like performance on the TC and on the passive as Cromer’s subjects had and, second, whether my subjects would experience difficulty with the interpretation of psychological (or nonactional) passives as compared with agentive (or actional) passives.

Lastly, as the reader will recall, one of the research questions I posed in §1.0 of Chapter 1 concerns how experimental findings on the acquisition of the TC can inform more general theories of language acquisition and, in particular, generative theories. I address this question in Chapter 5. As a preview of the discussion in that chapter, I will argue that the experimental findings reported in the present chapter have clear implications for generative theories of acquisition and, specifically, that these findings can inform our general understanding of how learnability principles operate in the acquisition of a first language.

4.1 Organization of chapter

In §4.2, I describe a pre-test that I conducted with 122 children drawn from a preschool and a primary school, both located in the village of Willingham, Cambridgeshire, UK. The pre-test consisted of two experimental trials. In the first, I tested subjects for their knowledge of the tough adjectives easy, hard, difficult, and impossible, as well as for their knowledge of the degree construction (DC). In the second, I assessed each child’s ability to retain a short sequence of story events in memory.
The motivation for this particular pre-test was my choice of the truth-value judgment (TVJ) task as the assessment technique for the main experimental study. Because the TVJ task requires a child to retain a short sequence of story events in memory long enough to assign a contextually appropriate interpretation to the sentence under consideration, I reasoned that successful performance on a pre-test of recall memory would serve as an appropriate inclusion criterion for participation in the main study. As I detail in §4.2, my findings from the pre-test included the observation that a sizeable number of the children I tested lacked knowledge of the meaning of the adjective impossible. This finding raises questions about the appropriateness of the design and/or methodology employed in certain previous studies (e.g. Kessel 1970 or McKee 1997a). I also explain how the results of the vocabulary pre-test suggest a predictable order of acquisition of the four tough adjectives, with the acquisition of easy and hard preceding that of either difficult or impossible. Finally, I review the results obtained in the memory pre-test, including the rather disappointing performance of my younger subjects.

In §4.3, I review the design of the main study. In this study I tested forty-four children between the ages of 3;4 and 7;5 for their knowledge of NOS, as well as for their knowledge of passive sentences. The NOS tested were the TC, the object-gap degree construction (ODC), the object-gap purpose construction (OPC), and the infinitival relative construction (IR). Following Crain and Thornton (1998) and Gordon (1998), I review the basic procedures associated with the TVJ task. I then discuss the specific modifications of the task that I adopted in my own study. Lastly, taking each of the above-referenced constructions in turn, I illustrate the design features of each test condition.
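The contrast this chapter draws between selection criteria, applied before testing, and exclusion criteria, applied afterwards, can be made concrete with a small sketch. The Python below is illustrative only. The record fields and function names are hypothetical. This section does not state the pass mark for the memory pre-test, so memory performance is reduced to an assumed pass/fail flag.

- `strict_selection` mirrors the rule set out in §4.2. A child needs target-like responses on all four tough adjectives and on the SDC item, plus a passed memory pre-test, before any main-study item is shown.
- `one_miss_tolerated` mirrors the more lenient, adjective-only treatment attributed to McKee (1997a), under which a child who missed a single adjective was retained.

```python
from dataclasses import dataclass
from typing import Dict

# The four tough adjectives tested in the vocabulary pre-test.
ADJECTIVES = ("easy", "hard", "difficult", "impossible")


@dataclass
class PreTestRecord:
    """Hypothetical per-child record of pre-test outcomes."""
    child_id: str
    adjective_correct: Dict[str, bool]  # adjective -> target-like response?
    sdc_correct: bool                   # target-like on the SDC item?
    memory_passed: bool                 # ASSUMPTION: memory pre-test reduced to pass/fail


def missed_adjectives(rec: PreTestRecord) -> int:
    """Count adjectives without a target-like response (untested counts as missed)."""
    return sum(not rec.adjective_correct.get(a, False) for a in ADJECTIVES)


def strict_selection(rec: PreTestRecord) -> bool:
    """Selection criterion applied before the main study: no misses at all."""
    return missed_adjectives(rec) == 0 and rec.sdc_correct and rec.memory_passed


def one_miss_tolerated(rec: PreTestRecord) -> bool:
    """Lenient adjective-only check: a single missed adjective is allowed."""
    return missed_adjectives(rec) <= 1


if __name__ == "__main__":
    child = PreTestRecord(
        child_id="c01",
        adjective_correct={"easy": True, "hard": True,
                           "difficult": True, "impossible": False},
        sdc_correct=True,
        memory_passed=True,
    )
    print(strict_selection(child), one_miss_tolerated(child))  # False True
```

A child who knows every adjective except impossible passes the lenient check but not the strict one. Under the lenient rule, that child's TC items containing impossible would still enter the analysis, which is the situation objected to in §4.2.0.0.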
Section 4.4 contains a brief review of the findings of a pilot study that I conducted one month prior to the main study. It also discusses how these findings prompted me to alter certain aspects of my original experimental design.

In §4.5, I present the results of the main study, discussing each construction in turn, beginning with the TC. I first analyse my results in terms of group performance, using the following four age groups of eleven children each: Group 1 (ages 3;4 to 4;4), Group 2 (ages 4;6 to 5;5), Group 3 (ages 5;6 to 6;3), and Group 4 (ages 6;5 to 7;5). In §4.5.0.0, I offer a statistical analysis of group performance on the TC, which I found to be non-target-like for all but the oldest group. For Groups 1-3, I also report considerable individual variation in subject performance, consistent with findings earlier reported by McKee (1997a). Pace McKee, however, I point out that I found no evidence that the performance of my subjects varied according to the particular tough adjective or adjectives featured in the TC. My results therefore do not support hypothesis (1b). In the same section, I report that the balanced variation I introduced in the design of my test items had no appreciable effect on subject performance. As I explain, this finding has implications for certain of the design recommendations outlined in Crain and Thornton (1998). I also detail problematic aspects of the design of certain of my own test and control items. In §4.5.0.1, I provide a detailed analysis of the performance of individual subjects on the TC.
At the group level, my subjects in Groups 1-3 (i.e. those aged 6;3 and under) failed to interpret the TC in a target-like manner. Analysis of individual performance, however, revealed that the majority of my subjects offered both target-like and non-target-like interpretations of the TC, consistent with their having entered the Intermediate stage of acquisition (cf. Cromer 1970). In §4.5.0.2 I review production data in which Intermediate subjects provide appropriate explanations of both their target-like and their non-target-like interpretations of the TC. Taken together with these data, my findings present a picture of Intermediate performance in which the child does not resort to guesswork, as has been argued elsewhere in the literature. Instead, I argue, she chooses between two interpretive options made available by her grammar.

In §§4.5.1, 4.5.2, and 4.5.3, I present the results I obtained for the ODC, IR, and OPC, respectively. With regard to group performance on the ODC, I report that I did not find support for hypothesis (1a), given that children in all four age groups demonstrated the ability to assign both subject and object readings to the ambiguous DC. Looking at individual performance, I did find that four subjects, all under the age of 4;4, failed to provide any object readings of the DC. Nevertheless, each of these children provided at least one target-like reading of the IR and the OPC and gave mixed readings of the TC. I therefore argue that their performance on the DC could simply reflect an interpretive bias for the subject reading of this construction. My analysis of group results gives this hypothesis further support. These results reveal that a preference for the object reading of the DC, characteristic of the adult population, emerges in children only from the age of 6;5.
I take this latter finding to indicate that children require some time to recognize the existence of this interpretive bias in the primary linguistic data (PLD) and for their own production of this form to be adjusted probabilistically.

In §4.5.2, I report that subjects in all four age groups demonstrated the ability to assign target-like readings to the IR and thus did not perform in a manner consistent with hypothesis (1a). As I note, however, this target-like performance was largely restricted to one of the two items tested in this condition. I consider how the design of the other item could have negatively influenced subject performance, for example, by admitting a third, unintended reading of that test sentence.

In §4.5.3, I explain that my subjects’ largely successful performance on the OPC fails to support hypothesis (1a). It also conflicts with results earlier reported by H. Goodluck and colleagues. For example, Goodluck and Behne (1992) had claimed that children as old as ten fail to demonstrate target-like ability to interpret the OPC. By contrast, I found that only two of my subjects above the age of 5;6 offered a non-target-like interpretation of the OPC, in each case an error restricted to a single test item.

In §4.5.4, I analyse the results I obtained for actional and nonactional passives, reporting that subjects in all four age groups performed like adults on actional passives. Non-target-like performance on the TC was observed for all groups except Group 4. I therefore found no necessary correlation between non-target-like performance on the TC and non-target-like performance on actional passives. My findings thus do not support hypothesis (1c).
With regard to nonactional passives, however, all four of my child groups performed worse on these items than on actional passives, consistent with findings earlier reported in the literature (e.g. Maratsos et al. 1979, 1985). I examine the performance of individual subjects in this condition to investigate the source of the difficulty that children experience with these structures.

In §4.5.5, I perform a statistical comparison of group performance on the TC and on the DC. I argue that the results of this analysis support my contention that children under the age of 6;5 treat both constructions as ambiguous, with the subject reading of each construction remaining a strong preference prior to this age.

In §4.5.6, I shift the focus to the performance of individual subjects across the full range of constructions tested. I demonstrate that at the individual level, I once again find little support for hypothesis (1a), since all of my subjects displayed target-like ability with regard to one or more NOS. As I point out, this finding holds true even for the three children in the study who could reasonably be classified as P-R Users, having provided one or fewer target-like readings of the TC. The data reported in §4.5.6 thus support neither concurrent acquisition of the NOS nor delayed acquisition of all such structures.

In §4.6, I close my presentation of experimental results with a review of my subjects’ performance on the British Picture Vocabulary Scale (BPVS) (Dunn & Dunn 1997), which I administered after each subject’s participation in the main study. The inclusion of this post-test was inspired by one of Cromer’s (1970) findings. Cromer found that a subject’s verbal mental age (VMA), as determined by the Peabody Picture Vocabulary Test (PPVT), predicted TC performance in his experiment more reliably than chronological age did. I was therefore interested to determine whether the interpretive abilities of my own subjects could be similarly correlated with vocabulary ability. I report that, like Cromer, I found little correlation between TC performance and chronological age. Unlike Cromer, however, I found only a very rough correlation between VMA and TC performance. Moreover, a relatively high VMA score did not necessarily predict successful performance on the TC, and I argue that this finding casts further doubt on hypothesis (1b).

4.2 Pre-Test

4.2.0 Part one: Vocabulary test

4.2.0.0 Design

In Chapter 3, I reviewed a sizeable number of experimental studies of the acquisition of the TC. Only two of these, Macaruso et al. (1993) and McKee (1997a), had tested the meaning of various tough adjectives independently of their occurrence in the TC. In both of these studies, however, this testing followed rather than preceded presentation of the same adjectives in the structural context of the TC. As noted earlier, I consider this problematic in two respects. First, the researcher cannot control for any bias that may be introduced by presenting these adjectives in a suitable structural context before testing their meaning alone.
Second, I believe it is preferable from the standpoint of experimental design for the researcher to adopt selection criteria strict enough to define a homogeneous sample in the first instance. The alternative is what might reasonably be termed ‘exclusion criteria,’ which can identify unsuitable participants only after testing of the main experimental items is complete. As discussed in §3.2.1.1 of Chapter 3, I believe that McKee may have introduced a further complication through her decision not to exclude child participants who missed only one tough adjective (i.e. easy, hard, difficult, or impossible) in the vocabulary post-test that she administered. As a consequence, even if one of McKee’s subjects failed to demonstrate knowledge of the meaning of a particular tough adjective, the same child’s interpretations of TC test items containing this adjective were still included in the overall analysis. In my opinion, this situation is less than desirable, since it calls into question the reliability of some of the data collected in the study.

Accordingly, in my own study, I chose to test my subjects’ knowledge of the meaning of various tough adjectives before presenting these lexical items in the TC. The pre-test I administered consisted of two experimental trials. The first was the vocabulary test referred to above, which also assessed my subjects’ ability to interpret a degree construction. The second was designed to evaluate subjects’ memory for story details, for reasons that will be detailed in §4.3, below. A total of 122 children participated in the pre-test. All were monolingual native speakers of British English, whose parents were primarily of middle- or working-class background.
The 122 subjects, who ranged in age from 3;0 to 7;6, consisted of roughly equal numbers of boys and girls. Those below the age of 4;8 attended the Honeypot Pre-School in Willingham, Cambridgeshire, while those above this age attended Willingham Primary School, which is physically adjacent to the pre-school. All testing was conducted on site at the school the child attended.

For part one of the pre-test, which I will term the vocabulary test, I adopted the same technique used in McKee’s (op. cit.) vocabulary post-test. Child participants were presented with a pair of pictures, only one of which matched the correct interpretation of an expletive-headed tough sentence, such as It is hard for the boy to open the door. Like McKee, I reasoned that in such a sentence the logical subject and logical object of the embedded verb are transparently represented in the surface word order. Children who knew the meaning of the tough adjective featured in the expletive-headed sentence should therefore assign the sentence a target-like interpretation.

The particular lexical items that I chose to test were the adjectives easy, hard, difficult, and impossible. Picture pairs were designed to allow flexibility in the order of presentation of individual adjectives across different test trials. For example, the same pair of pictures could be used to test one child on the adjective easy and another child on the adjective hard. In Figures 4.0 and 4.1, below, I provide examples of specific picture pairs that I used to test easy, hard, and difficult. (All of the drawings used in the pre-test are the work of Mrs. Karen Harris, a teaching assistant at the Honeypot Pre-School, whose contribution to the study was much valued and appreciated.)

Figure 4.0: It is ‘easy/hard/difficult’ for the boy to open the door.

Figure 4.1: It is ‘easy/hard/difficult’ for the dog to get his bone.

Two warm-up items preceded the actual test items to allow subjects to familiarize themselves with the requirements of the picture-selection task. For individual test items, I would first briefly discuss features of both pictures and then present the test sentence, using language such as that exemplified in (2), below:

(2) In this picture there is a dog and a bone, and in this picture there is a dog and a bone. But in one of these pictures, it is easy for the dog to get his bone. Can you show me which one?

The test sentence might be repeated a number of times until the child made a choice. Following this, I sometimes asked the child a follow-up question, typically a ‘why’ question; for example, “Why is it easy for the dog to get his bone?” or “Why is it hard for the boy to open the door?” Notably, the form of the follow-up question remained the same regardless of which picture the child had actually selected. Even in the case of an incorrect choice, the child’s response was taken as an assertion that the chosen picture, for her, best matched the meaning of the test sentence.

For testing the adjective impossible, four different picture pairs were originally designed. In the early stages of the pre-test, two of these proved less suitable than the others (see the discussion in §4.2.0.1, below) and were therefore dropped from further use. The two remaining pairs are presented in Figures 4.2 and 4.3, below:

Figure 4.2: It is ‘impossible’ for the duck to eat.

Figure 4.3: It is ‘impossible’ for the fish to swim.
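The counterbalancing described above, with picture pairs reusable across adjectives and randomized presentation, can be sketched as a per-child session plan. The Python below is a hypothetical illustration rather than a description of any tool used in the study. The steps are:

1. Present the two warm-up pairs first, in a fixed order.
2. Shuffle the test items.
3. Place the target picture on the left or right at random.

The sketch also enforces the one ordering constraint reported for this pre-test: impossible never immediately follows the warm-up pairs. It does so by simple rejection sampling. The item labels, the seeding and the handling of warm-up placement (not specified in the text, so marked `None`) are assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical labels for the two fixed-order warm-up pairs.
WARM_UPS = ("warm-up 1", "warm-up 2")
# Test items; the degree-clause item could be appended to this tuple.
TEST_ITEMS = ("easy", "hard", "difficult", "impossible")


def session_order(rng: random.Random,
                  test_items: Sequence[str] = TEST_ITEMS
                  ) -> List[Tuple[str, Optional[str]]]:
    """Return (item, side of target picture) pairs for one child's session."""
    items = list(test_items)
    if not items or set(items) == {"impossible"}:
        raise ValueError("ordering constraint cannot be satisfied")
    # Rejection sampling: reshuffle until 'impossible' is not the first test
    # item, i.e. does not immediately follow the warm-up pairs.
    while True:
        rng.shuffle(items)
        if items[0] != "impossible":
            break
    # Warm-up placement is not specified in the text, so it is left as None.
    plan: List[Tuple[str, Optional[str]]] = [(w, None) for w in WARM_UPS]
    for item in items:
        # Random left-right placement of the target picture within the pair.
        plan.append((item, rng.choice(("left", "right"))))
    return plan


if __name__ == "__main__":
    for label, side in session_order(random.Random(7)):
        print(label, side)
```

Each shuffle is uniform and disallowed orders are simply discarded, so rejection sampling leaves every permissible order equally likely.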
Finally, following a suggestion made by Ianthi Tsimpli (p.c.), the vocabulary test was extended to include an example of a DC. The suggestion was prompted by my search of the Wells (1973-77) corpus, which had yielded very few examples of either the subject-gap degree clause (SDC) or the object-gap degree clause (ODC) in naturalistic child speech (see §3.2.0, Chapter 3). On the basis of these limited findings, I could not be entirely sure that the DC was a construction known to children as young as three. I therefore followed Tsimpli’s recommendation to test my subjects’ familiarity with this type of construction. In order to avoid prior presentation of any of the structures that would feature in the main study, and thus the introduction of bias, I assessed my potential subjects’ knowledge of the DC through an SDC (e.g. The mouse is too big to go through the hole in the wall) rather than an ODC. As in the case of the tough adjectives presented in this phase of the pre-test, subjects were required to demonstrate target-like knowledge of the SDC in order to be selected for participation in the main study. Two different picture pairs were prepared to test children’s comprehension of the SDC, with the choice of pair randomly determined. In Figure 4.4, below, I provide an example of one of the two picture pairs used in this condition:

Figure 4.4: The girl is ‘too small’ to carry the box.

With only two exceptions, the 122 children who participated in the vocabulary pre-test were given a total of seven picture pairs to evaluate: two warm-up pairs, four tough adjective pairs, and one SDC pair. The two children who proved an exception were aged 3;0 and 3;3 and had failed both the easy and the hard pairs on first presentation.
These children were therefore not tested on difficult and impossible but were instead retested on easy and hard later in the session. For all subjects, the order of presentation of the two warm-up pairs was fixed. The order of presentation of the actual test items was randomly varied, as was the left-to-right arrangement of the two pictures in each pair.¹ The total time required to administer the vocabulary test generally did not exceed five minutes.

¹ One minor exception to variation in the order of presentation was made with respect to testing of impossible. Because I felt that accessing the meaning of this adjective might prove challenging for younger subjects, I decided that testing of impossible should never immediately follow presentation of the warm-up picture pairs.

4.2.0.1 Presentation of results

As anticipated, even the youngest subjects generally experienced little difficulty in complying with the requirements of the picture-selection task. Of the 122 child participants, only two proved unable to perform the task, both below the age of 3;4.² Notably, these were not the same two children referred to in the preceding section, who were tested twice on the easy and hard items. Additionally, I was unable to analyse the results for one child, aged 5;6, whose responses were lost due to a taping error.

² The results for these two subjects were compromised because they at times insisted on choosing both pictures even after having been prompted to make only a single selection, with one child habitually replying, “Me want this one and that one.” Furthermore, when they did select a single picture, their choice appeared to have more to do with some salient activity depicted in the picture than with the test sentence itself (consider one child’s comment, “Mouse eat a cheese”).

For the purpose of analysing the results of the vocabulary test, I divided the remaining 119 subjects into five age groups. Because the number of subjects in each age group is not identical, any between-group comparisons drawn in the discussion that follows are based on percentages rather than raw counts. The overall performance of subjects on the adjective easy is reported in Table 4.0, below:

Age group    | 3;0 to 3;11 | 4;0 to 4;10 | 5;1 to 5;11 | 6;0 to 6;11 | 7;0 to 7;6
# of subjs.  | 29          | 22          | 27          | 27          | 14
Correct      | 26 (89.7%)  | 22 (100%)   | 27 (100%)   | 27 (100%)   | 14 (100%)
Incorrect    | 3 (10.3%)   | 0           | 0           | 0           | 0
Don’t know   | 0           | 0           | 0           | 0           | 0
Not tested   | 0           | 0           | 0           | 0           | 0
Table 4.0: Results for ‘easy’ by age group

As Table 4.0 illustrates, the only errors on easy occurred in the youngest age group, and these represented only approximately 10% of that group’s responses. Subjects in all other age groups demonstrated target-like knowledge of the meaning of easy. The overall performance of subjects on the adjective hard is reported in Table 4.1, below:

Age group    | 3;0 to 3;11 | 4;0 to 4;10 | 5;1 to 5;11 | 6;0 to 6;11 | 7;0 to 7;6
# of subjs.  | 29          | 22          | 27          | 27          | 14
Correct      | 21 (72.4%)  | 22 (100%)   | 25 (92.6%)  | 26 (96.3%)  | 14 (100%)
Incorrect    | 5 (17.2%)   | 0           | 1 (3.7%)    | 1 (3.7%)    | 0
Don’t know   | 3 (10.3%)   | 0           | 1 (3.7%)    | 0           | 0
Not tested   | 0           | 0           | 0           | 0           | 0
Total NTL    | 8 (27.6%)   | 0           | 2 (7.4%)    | 1 (3.7%)    | 0
Table 4.1: Results for ‘hard’ by age group (NB: ‘NTL’ = non-target-like)

As indicated in the table above, most subjects over the age of four demonstrated target-like knowledge of the meaning of hard, and the majority of subjects (i.e. 72.4%) in the youngest age group also performed successfully.
Nevertheless, the number of children between the ages of 3;0 and 3;11 who did not know the meaning of hard was more than double the number who failed easy (eight versus three). Table 4.2, below, compares performance across the five age groups on difficult:

Age group    | 3;0 to 3;11 | 4;0 to 4;10 | 5;1 to 5;11 | 6;0 to 6;11 | 7;0 to 7;6
# of subjs.  | 29          | 22          | 27          | 27          | 14
Not tested   | 3           | 0           | 1           | 0           | 0
Correct      | 11 (42.3%)  | 19 (86.4%)  | 22 (84.6%)  | 27 (100%)   | 14 (100%)
Incorrect    | 13 (50%)    | 3 (13.6%)   | 4 (15.4%)   | 0           | 0
Don’t know   | 2 (7.7%)    | 0           | 0           | 0           | 0
Total NTL    | 15 (57.7%)  | 3 (13.6%)   | 4 (15.4%)   | 0           | 0
Table 4.2: Results for ‘difficult’ by age group

As Table 4.2 illustrates, the majority of errors were once again made by subjects in the youngest age group. Approximately 90% of the children in this age group gave target-like responses on easy, and 72.4% on hard. By contrast, less than half of those tested between 3;0 and 3;11 (i.e. 42.3%) demonstrated target-like knowledge of the meaning of difficult. These results therefore suggest that difficult is less likely than either easy or hard to be part of the vocabulary of children under the age of four. For subjects over the age of four, however, performance was considerably better, with the majority demonstrating a target-like understanding of difficult.

Next, the performance of my subjects on the adjective impossible is reported in Table 4.3, below. For reasons explained in the discussion that follows, the figures reported in the table cover only ninety of the 119 participants in the pre-test:

Age group    | 3;0 to 3;11 | 4;0 to 4;10 | 5;1 to 5;11 | 6;0 to 6;11 | 7;0 to 7;6
# of subjs.  | 14          | 17          | 21          | 24          | 14
Correct      | 6 (42.9%)   | 13 (76.5%)  | 15 (71.4%)  | 21 (87.5%)  | 13 (92.9%)
Incorrect    | 7 (50%)     | 4 (23.5%)   | 6 (28.6%)   | 3 (12.5%)   | 1 (7.1%)
Don’t know   | 1 (7.1%)    | 0           | 0           | 0           | 0
Total NTL    | 8 (57.1%)   | 4 (23.5%)   | 6 (28.6%)   | 3 (12.5%)   | 1 (7.1%)
Table 4.3: Results for ‘impossible’ by age group

I chose not to consider some of the data that I collected for impossible because of concerns about the materials used to collect them. In particular, I was concerned that the first two picture pairs I had used were not equally balanced in terms of subject interest and, therefore, that some degree of experimental control was lost when using these pairs. The first pair contrasted a large goldfish swimming in some water with the same goldfish sitting on the seat of a bicycle. The second contrasted a goldfish swimming in water with one sitting on top of a snow-covered mountain. Unfortunately, pre-testing of these picture pairs had not revealed any problems. Only after sixteen children had been tested on one of these two pairs did a specific pattern of behaviour become apparent: my subjects displayed a disproportionate interest in the incongruous sight of a goldfish either sitting on a bicycle seat or resting on a mountain top. After these sixteen children had been tested, new materials were introduced that were more equally balanced in terms of subject interest. These are the picture pairs illustrated in Figures 4.2 and 4.3. Furthermore, the sixteen children first tested on the problematic pairs were later re-tested on impossible using the more suitable pairs, so that I could gain a more reliable assessment of their understanding of this adjective.
Nevertheless, I still chose to err on the side of caution and exclude the data collected from these sixteen children from the figures reported in Table 4.3, since the conditions under which they were tested deviated from those under which the majority of participants were tested.

Turning now to the figures reported in Table 4.3, it is notable that fewer than half (i.e. 42.9%) of the subjects in the youngest age group demonstrated target-like knowledge of impossible, just as was observed in the case of difficult. In the case of impossible, however, a further fourteen subjects over the age of four made errors, including four children over the age of six. By comparison, only seven children over the age of four missed difficult, and none over the age of six did so. These results therefore suggest that acquisition of the meaning of impossible, at least for some children, is relatively delayed in comparison with acquisition of the meaning of the other tough adjectives. I will return to this issue in §4.2.0.2, below.

Finally, Table 4.4 lists the results obtained for the single SDC item:

Age group     3;0 to 3;11   4;0 to 4;10   5;1 to 5;11   6;0 to 6;11   7;0 to 7;6
# of subjs.   29            22            27            27            14
Not tested    6             2             0             0             0
Correct       14 (60.9%)    19 (95%)      24 (88.9%)    26 (96.3%)    14 (100%)
Incorrect     8 (34.8%)     1 (5%)        3 (11.1%)     1 (3.7%)      0
Don’t know    1 (4.3%)      0             0             0             0
Total NTL     9 (39.1%)     1 (5%)        3 (11.1%)     1 (3.7%)      0
Table 4.4: Results for subject-control degree construction (SDC) by age group (NB: percentages are calculated over the number of children actually tested)

As the figures in Table 4.4 indicate, most subjects over the age of four performed in a target-like manner, but the performance of subjects below this age was more mixed, with approximately 40% of the children in the youngest age group failing the item.
These results suggest, therefore, that a substantial number of 3-year-olds do not have an adult-like command of the SDC and thus, presumably, lack target-like knowledge of the DC in general. Consequently, successful performance on the single SDC item in the pre-test became a particularly important consideration when evaluating the suitability of younger children for participation in the main study.

I noted earlier that, in addition to performing the picture selection task, some subjects were also asked to answer a follow-up question of the type, “Why is it easy/hard/difficult/impossible for X to do Y?” or, in the case of the SDC, “Why is X too big/small to do Y?” In general, subjects proved able to comply with this type of request, and their responses, with very few exceptions, served to corroborate the child’s choice of picture.3 In Tables 4.5 and 4.6, below, I list representative examples of the types of responses that I obtained during follow-up questioning, in support of both correct and incorrect choices of pictures:

Subj. no. & age   Adjective    Picture choice   Explanation
no. 72 (5;9)      easy         TL               Why is it easy for the dog to get his bone? “Because he hasn’t got a lead tied up to a stick.”
no. 12 (3;3)      hard         NTL              Why is it hard for the duck to eat? “Cause he has to eat that (food) because he’s hungry.”
no. 25 (3;10)     difficult    TL               Why is it difficult for the boy to open the door? “Cause he’s not big enough.”
no. 50 (4;10)     difficult    NTL              Why is it difficult (for him) to open the door? “Because it’s got a handle on it.”
no. 8 (3;2)       impossible   TL               Why is it impossible for him (= the fish) to swim? “Cause there’s no water.”
no. 58 (5;4)      impossible   NTL              Why is it impossible for the fish to swim? “Cause he likes water.”
no. 94 (6;7)      impossible   NTL              Why is it impossible for the duck to eat? “Because he hasn’t got a paper round his beak.” And so he can eat? Subject nods. Does that make it impossible? “Yeah.”
Table 4.5: Selected responses to follow-up questions in vocabulary pre-test (NB: ‘TL’ = target-like; ‘NTL’ = non-target-like)

3 Exceptional responses were typically provided by subjects under the age of four, and these tended to be non-explanatory rather than strictly incorrect. For example, subject no. 15, age 3;6, gave a target-like response to the SDC item and was asked, “Why is it hard for her (i.e. to carry the box) in that picture?”, to which he simply replied, “Because it is.” And when subject no. 10, age 3;3, who gave a non-target-like judgment of the adjective hard, was asked, “Why is it difficult for the dog to get his bone?”, he provided a description of the picture of his choice rather than an explanation, viz., “He’s running for his bone.” Again, these types of responses to follow-up questions represented only a very minor portion of the data that I collected.

Subject no. & age   Picture choice   Explanation
no. 49 (4;9)        TL               Why’s he too big to go through that hole? “Cause he’s ate-en (sic) lots of cheese.”
no. 102 (6;9)       TL               Why is the mouse too big to go through the hole in the wall? “Cause it’s tiny. He can only fit a paw or his nose or his tail in.”
no. 12 (3;3)        NTL              Spontaneous comment: “That can’t go through cause that’s a baby hole but that one can.”
no. 21 (3;7)        NTL              Is the mouse too big to go through the hole in the wall? “He can.”
Table 4.6: Selected responses to follow-up questions on SDC item (NB: ‘TL’ = target-like; ‘NTL’ = non-target-like)

4.2.0.2 Discussion
Perhaps the most striking result obtained in the vocabulary pre-test concerns the performance of subjects on the adjective impossible.
Specifically, over half of the subjects between the ages of 3;0 and 3;11 failed this item, as did fourteen subjects over this age, including four over the age of six. With respect to the performance of those over four, this finding clearly contrasts with the results obtained for the other three tough adjectives, since no child over this age missed easy, only three missed hard, and of the seven who missed difficult, none were over six. These results therefore suggest relatively delayed acquisition of impossible in comparison to easy, hard, and difficult. Further support for this hypothesis comes from the performance of the fourteen subjects over the age of four who missed impossible, since twelve of these children performed like adults on each of the other three tough adjectives. Thus, acquisition of the meaning of impossible quite clearly lagged behind that of easy, hard, and difficult, at least for children in this age group. Given, then, that I had reason to believe that even some of my older subjects might lack knowledge of impossible, I chose to exclude this adjective from use in the test/control sentences employed in the main study.

Before leaving this topic, it will be instructive to revisit certain findings reported by Kessel (1970) and McKee (1997a), which were obtained in connection with testing of the adjective impossible and which were previously discussed in Chapter 3. First, Kessel reported that some of the younger subjects in his study, the youngest of whom was nearly six years old, did not know the meaning of this adjective. He did attempt to address this issue but, to my mind, the procedure he chose was not a wholly appropriate one.
Specifically, he reported that children who appeared not to know the meaning of impossible were read the sentence “Linus was very, very, very hard to see,” rather than the sentence “Linus was impossible to see” (ibid.:24). Aside from the obvious criticism that this procedure introduces some measure of inconsistency into the testing situation, I am also concerned that Kessel’s decision to employ a substitution of this type ignores the fact that the two predicates are not strictly synonymous. As regards McKee (op.cit.), subjects in this study were independently tested for their knowledge of tough adjectives, including impossible; however, as noted earlier, this testing followed rather than preceded the presentation of these adjectives in TC items. Additionally, McKee’s subjects were required to demonstrate target-like knowledge of only three of the four adjectives tested. On the basis of the results I have obtained, it would therefore seem likely that some, if not the majority, of McKee’s subjects who met the requirement of passing three out of four vocabulary items failed the adjective impossible. Yet, according to the design of the study, subjects who missed impossible in the vocabulary assessment would still have been tested for their knowledge of TCs that contained the same adjective. In my opinion, this situation provides reason to question the reliability of at least some of the data presented in McKee.4

4 The same possibility must, of course, be entertained in connection with the performance of McKee’s (1997a) subjects on TCs that featured easy, hard, or difficult, if any of these represented a vocabulary item that the subject had failed. Regrettably, since an items analysis is not available for the vocabulary test that McKee conducted (McKee, p.c.), it is impossible to determine the specific nature of the errors made by those subjects who provided three out of four correct responses in this condition. Nevertheless, based on the results of my own vocabulary assessment, it is reasonable to speculate that those of McKee’s subjects who missed only a single vocabulary item were more likely to have missed impossible than any of the other three tough adjectives.

In order to avoid introducing a similar problem into my own study, I chose to tighten McKee’s inclusion criterion by requiring that potential subjects demonstrate knowledge of the meanings of the adjectives easy, hard, and difficult prior to participation in the main experimental study. And, although I continued to pre-test children’s knowledge of the adjective impossible, I eliminated this adjective from use in the main study for the reasons stated earlier.

One remaining issue of interest concerns whether children acquire tough adjectives in any fixed order. Although the vocabulary test was not designed to investigate this specific issue, I believe that my findings suggest that a predictable, if not altogether fixed, order of acquisition of these adjectives does exist. Focusing first on my youngest subjects, that is, those between the ages of 3;0 and 3;11, I have previously noted that nearly 90% of these children gave correct responses for easy, as compared with 72.4% for hard, 42.3% for difficult, and 42.9% for impossible. Therefore, insofar as it is reasonable to generalize these findings to the wider population, it would appear that acquisition of easy and hard typically precedes acquisition of difficult and impossible. This hypothesis is further strengthened by the individual performance of subjects under the age of four. Of the three children who failed easy, two failed all other test items, while the third got hard correct but failed both difficult and impossible.
And of the five who failed hard, two failed all other items and two got only easy correct.5 Furthermore, the generalization noted above holds even more strongly for subjects over the age of four, since only one of the ninety children tested in this age range gave a correct response to the difficult or impossible pairs yet made an error on the easy or hard pairs.6 Thus, I observed that failure on easy and hard served as a fairly reliable predictor of unsuccessful performance on difficult and impossible, whereas the converse was generally not found to hold.

5 The fifth child, age 3;1, was something of an exception, since he scored correct on impossible but failed both hard and difficult. This child was exceptional in another respect as well, since he was able to provide a clear explanation of his correct judgment of the impossible picture pair, an ability that many of his peers clearly lacked; for example, when asked, “Why is it impossible for him (= the fish) to swim?”, he reasonably replied, “Cause he hasn’t got any water.” Since this child was ultimately not selected to participate in the main study, however, I will not investigate his atypical abilities in any further detail here.

4.2.1 Part two: Memory test
4.2.1.0 Design
The main study to be discussed later in this chapter involves the use of a truth-value judgment (TVJ) task, in which a child is asked to join a puppet in watching an experimenter demonstrate a series of actions performed by toy characters. After watching a demonstration of the story, the child is asked to judge whether the puppet’s evaluation of what happened in the story (the test sentence) accurately describes the events depicted.
The task thus indirectly relies on a child’s ability to recall the main events of the story that has been demonstrated for her and to represent these events in memory long enough to evaluate the test sentence against the story context. On the basis of her own experimental findings, Bauer (1997) has argued that this particular ability is not beyond the capabilities of even very young children, since children as young as twenty months of age have demonstrated correct recall of a short sequence of events when asked to reproduce the sequence by acting it out with toys. Moreover, it has also been experimentally demonstrated that by the age of thirty months, children are capable of reproducing a sequence of events (e.g. “building a house”) that involves as many as eight separate steps (Bauer & Fivush 1992, cited in Bauer op.cit.). The test stories designed for use in the main study included both actions and dialogue, with the relative balance of each varying somewhat from item to item. None of the stories, however, included more than seven separate actions performed by the toy characters, and the average was between five and six actions and/or events per story. Consistency was also maintained in the length of time required to demonstrate each story, with no story taking more than one minute to demonstrate.

6 With regard to the acquisition of difficult and impossible, as previously noted, my results suggest a general tendency for children to acquire impossible last. Yet there were three instances in which subjects over the age of four gave correct responses to impossible but failed difficult. These results would therefore seem to suggest that the order of acquisition of these two adjectives is not completely predictable but may instead be subject to some individual variation.
Because, as noted above, success in the TVJ task relies on adequate recall of story events, I decided to incorporate an assessment of this ability into the existing vocabulary pre-test. I chose to use an act-out task for this purpose on the recommendation of Bauer (op.cit.), who advocates use of this technique for children who may lack the linguistic skills to provide an accurate verbal report of events that they have watched. As Bauer points out, it is widely recognized that procedural or non-declarative memory is not robust when tested across different modalities; consequently, successful performance on the act-out task, which is a cross-modality task, would imply the child’s use of recall or declarative memory.7 For the specific design of the task, I followed the basic recommendations contained in Goodluck (1996).

The memory assessment involved the same 119 children who participated in the vocabulary pre-test, although usable data were collected from only 116 of these children.8 With only minor exceptions (see footnote 8), each subject was tested on two stories, one of which included six separate events and the other eight events. At the recommendation of Ianthi Tsimpli (p.c.), the six- and eight-event tasks were written to include at least one event that could be considered implausible and/or unpredictable on the basis of the type of general knowledge that even a very young child might reasonably be expected to possess. This step was taken to ensure that subjects constructed an on-line representation of the story for the purposes of recall, rather than a representation based on an entirely predictable series of events. Several alternative stories were developed for use in the memory pre-test; examples of the two most frequently used are provided in (3) and (4), below:

(3) Six-event story
a. A pig drives his car to a local park. (implausible event)
b. At the park, the pig first plays on a roundabout.
c. The pig uses his snout to push a little boy who is swinging on a swing. (unpredictable event)
d. The boy thanks the pig.
e. The pig announces that the pushing he has done has made him hungry.
f. The pig gets back into his car and drives home.

(4) Eight-event story
a. A little girl on a horse jumps over a barrier.
b. The horse is thirsty and has a drink of water from a watering trough.
c. The little girl’s father tells her that it is time to go home.
d. The girl dismounts.
e. The father offers the horse a bucket of food.
f. A little mouse runs over and tries to eat food from the bucket. (unpredictable event)
g. The father sends the mouse away.
h. The father and daughter walk home.

Before the demonstration of the story, subjects were told that they would not be asked to remember “what the characters say,” but only “what they do.” Additionally, subjects were allowed to watch up to three demonstrations of a story before being asked to retell the story themselves. Successful performance required that the child be able to demonstrate each of the events or steps in the story in the exact sequence in which they were originally presented by the experimenter. In the case of several older subjects who expressed reluctance to manipulate the toy props, a correct oral presentation of the story events was also considered an acceptable test of their recall memory.

7 I adopt Mandler’s (1986; cited in Bauer 1997:85) definition of recall memory, which describes a process in which a “cognitive structure” is retrieved solely on the basis of past experience and in the absence of “on-going perceptual support.” Although children are given some perceptual support in the act-out task in the form of the continued presence of the toy props in the experimental workspace, they receive no such perceptual support with respect to the temporal ordering of events in the story. For this reason, it is generally accepted that it is recall (or declarative) memory that is tested when children are asked to reproduce a specific sequence of events.

8 Three of the subjects under the age of four who had participated in the vocabulary test proved unable to comply with the requirements of the memory test and, consequently, any data collected from these children were excluded from my analysis of the final results. It should also be noted that eight of the 116 participants were tested according to a slightly different procedure. These children, whose abilities were assessed during the first two days of testing, were given three rather than two stories, featuring a series of four, six, and eight events, respectively. It became apparent early in the testing process, however, that even the youngest children experienced no difficulty in acting out the four-event task; I therefore decided to administer only the six- and eight-event tasks to subsequent subjects, since I believed that children’s performance on these two tasks was likely to be the most informative as regards their abilities.

4.2.1.1 Presentation of results
The overall performance of subjects on the six-event memory task is reported in Table 4.7, below, with the 116 participants divided into four age groups for the purpose of comparison:

Age group     # of subjs.   Pass         Fail        N/A9
3;0 to 3;11   26            16 (64%)     9 (36%)     1
4;0 to 4;11   22            21 (95.5%)   1 (4.5%)    0
5;0 to 5;11   27            25 (92.6%)   2 (7.4%)    0
6;0 to 7;6    41            34 (94.4%)   2 (5.6%)    5
Table 4.7: Performance on six-event memory task by age group

As Table 4.7 illustrates, subjects over the age of four experienced little difficulty in successfully performing the six-event task.
The task proved more challenging, though, for subjects under the age of four, over one-third of whom failed to perform the task correctly. The overall performance of subjects on the eight-event task is reported in Table 4.8, below:

Age group     # of subjs.   Pass         Fail         N/A
3;0 to 3;11   26            5 (21.7%)    18 (78.3%)   3
4;0 to 4;11   22            11 (55%)     9 (45%)      2
5;0 to 5;11   27            11 (42.3%)   15 (57.7%)   1
6;0 to 7;6    41            26 (65%)     14 (35%)     1
Table 4.8: Performance on eight-event memory task by age group

9 The figures listed under ‘N/A’ in Tables 4.7 and 4.8 represent instances in which a pass/fail assessment of either a six- or eight-event story could not be made. In two of these cases, an eight-event task was not administered at all because the subjects, both under the age of 3;3, experienced great difficulty in completing the six-event task. The other cases can be attributed to recording equipment failure or to unforeseen interruptions in the task that precluded a normal assessment of the child’s abilities.

The data listed in Table 4.8 indicate that the performance of all age groups on the eight-event task lagged behind their performance on the six-event task. In the case of the youngest age group, the proportions were reversed as compared to the six-event task, with more children failing (78.3%) than passing (21.7%) the eight-event task. For subjects between the ages of 4;0 and 5;11, performance was mixed, with roughly equal numbers passing and failing this task. It is therefore only in the case of the oldest subjects in the study, that is, those over the age of six, that successful performance on the eight-event task can be seen to become more assured, with approximately two-thirds of those tested passing this item. Finally, Table 4.9, below, shows the performance of each age group across both conditions:

Age group     # of subjs.   Failed only 6-event   Failed only 8-event   Failed both   Passed both   N/A
3;0 to 3;11   26            1 (4.5%)              10 (45.5%)            7 (31.8%)     4 (18.2%)     4
4;0 to 4;11   22            0                     8 (40%)               1 (5%)        11 (55%)      2
5;0 to 5;11   27            0                     13 (50%)              2 (7.7%)      11 (42.3%)    1
6;0 to 7;6    41            0                     10 (28.6%)            2 (5.7%)      23 (65.7%)    6
Table 4.9: Performance on six-event and eight-event tasks by age group

As Table 4.9 indicates, the percentage of children who passed both test items remains quite low in the case of the 3-year-olds, and it is only after the age of six that performance begins to approach adult norms. Problems raised by the relatively poor performance of my younger subjects in both conditions of the memory pre-test will be considered in the following section.

4.2.1.2 Discussion
As the results reported in Tables 4.8 and 4.9 indicate, the eight-event task presented a real challenge to a sizeable number of children across all of the age groups included in the study. What the figures in the tables fail to reflect, however, is that the performance of individual subjects was not entirely uniform, with some experiencing a greater degree of difficulty in performing the eight-event task than others. A closer examination of the types of errors made on the horse story (cf. (4)), which was the most frequently used test item in this condition, is informative in this regard. A total of sixty-seven subjects were given this story and, of these, thirty children (or 45%) passed and thirty-seven children (or 55%) failed the task. I found that the most common error was either completely forgetting to include the final event of the story or remembering to include the final event only when prompted by the experimenter (e.g. “And what happens at the end of the story?”). This type of error occurred over 50% of the time, and this rate of error was similar across all of the eight-event stories I used. These findings therefore suggest to me that my subjects did not omit the final event in the horse story because it was not salient or particularly memorable, since this type of error was witnessed regardless of the particular series of events presented in this condition. Because the error of omitting the final step in the eight-event story occurred with such frequency across subjects of all age groups, I felt it was reasonable to relax the criterion of successful performance that I had originally set for this task. Accordingly, I chose to assign a passing score to any child who made only a single error of this type.

The second most frequent error made on the horse story involved children omitting the step in which the little girl dismounts the horse at her father’s request. Sixteen children, or 24% of the total, made this error, with occurrences fairly evenly distributed across subjects in all age groups. In fact, errors of this type far exceeded those in which children omitted (or altered) the unpredictable event of the mouse trying to eat the horse’s food, which suggests that subjects were not simply relying on event predictability as a means of recall. Rather, I believe that these results indicate that event salience serves as a particularly strong cue for recall memory, with the activities of the mouse, for example, generating more interest and therefore a stronger memory trace than the routine event of the girl dismounting the horse.

Returning to Table 4.9, recall that fewer than 20% of the children under the age of four passed both tasks.
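The relaxed scoring rule for the eight-event task described above can be made explicit in a short sketch. The Python below is purely illustrative and is not the procedure actually used to score the children’s act-outs: the function name passes_act_out, the event labels for the horse story in (4), and the final_prompted flag are all hypothetical, and the sketch assumes that each act-out has been transcribed as an ordered list of event labels. Under the strict criterion, every event must be reproduced, unprompted, in the demonstrated order; under the relaxed criterion, a single error consisting of an omitted (or prompted) final event is also accepted, whereas any other omission, such as leaving out the girl’s dismounting, still counts as a fail.

```python
# Hypothetical illustration only: labels standing in for events (4a)-(4h).
HORSE_STORY = [
    "girl and horse jump barrier",
    "horse drinks from trough",
    "father says it is time to go home",
    "girl dismounts",
    "father offers horse food",
    "mouse tries to eat food",
    "father sends mouse away",
    "father and daughter walk home",
]


def passes_act_out(target, recalled, final_prompted=False, relaxed=False):
    """Return True if a child's act-out counts as a pass (illustrative sketch).

    target: events in the order the experimenter demonstrated them.
    recalled: events in the order the child acted them out.
    final_prompted: True if the final event was produced only after a
        prompt such as "And what happens at the end of the story?".
    relaxed: apply the relaxed eight-event criterion.
    """
    target, recalled = list(target), list(recalled)
    # Strict criterion: exact sequence, with no prompt needed for the final event.
    if recalled == target and not final_prompted:
        return True
    if not relaxed:
        return False
    # Relaxed criterion: tolerate one error involving only the final event,
    # i.e. omitting it or producing it only after a prompt.
    return recalled == target[:-1] or (recalled == target and final_prompted)


if __name__ == "__main__":
    omitted_final = HORSE_STORY[:-1]
    no_dismount = [e for e in HORSE_STORY if e != "girl dismounts"]
    print(passes_act_out(HORSE_STORY, omitted_final))                # False
    print(passes_act_out(HORSE_STORY, omitted_final, relaxed=True))  # True
    print(passes_act_out(HORSE_STORY, no_dismount, relaxed=True))    # False
```

Treating the prompted final event as the same kind of tolerated error as an outright omission follows the description of the most common error type given above; everything else about how responses were transcribed is an assumption of the sketch.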
The low pass rate among the youngest children thus presented a problem with regard to the selection criteria that I had initially established, which had made a child’s participation in the main study conditional on her successful performance in both pre-tests. In order to secure a reasonable number of subjects below the age of four, I amended the inclusion criteria by allowing successful performance on the vocabulary test to serve as the primary consideration when selecting subjects below this age. With regard to the memory test, I felt it was reasonable, based on the findings reported above, to relax the criterion for successful performance for children under the age of four. Specifically, I required that children under this age pass the six-event task and additionally make no more than one error on the eight-event task. In fact, as later observation established, those children below the age of four who were selected to participate in the main study on the basis of the revised criterion performed comparably to their peers who had passed both memory tasks with ease. Moreover, certain of these subjects even showed an exceptional ability to retain story events in memory when these events were introduced in the context of the TVJ task. Thus, I observed that less-than-perfect performance on the eight-event task did not necessarily predict a subject’s inability to cope with the requirements of the TVJ task and, interestingly, that in several cases the reverse also held: success on both memory tasks did not guarantee suitability for the TVJ task. In particular, there were four subjects who passed both the six- and eight-event tasks but whom I nevertheless later judged to be unsuited for participation in the main study.

4.3 Design of main experimental study
4.3.0 The truth-value judgment task
4.3.0.0 General features of design
As noted earlier, in both the pilot and main studies I used a TVJ task (cf. Crain and Thornton 1998; Gordon 1998) to test children’s comprehension of TCs and other syntactically related structures. My reasons for adopting this methodology will be reviewed in the following section. First, however, I present in this section a basic description of the features of the TVJ task, which is primarily based on the discussion contained in Chapter 27 of Crain and Thornton (op.cit.:221-37).

As noted in the previous section, the TVJ task involves a child who joins a puppet in watching an experimenter tell a short story with toys. A photograph of the experimental workspace, taken from my own experimental study, is offered in Figure 4.5, below:

Figure 4.5: Set-up of experimental workspace for TVJ task (Anderson 2002a,b)

At the end of the story, the toy props are left in position in the workspace to serve as a visual reminder of the events just described. The puppet is then asked to explain what happened in the story, and it is in this manner that the test sentence is presented to the child. The child is then called upon to evaluate whether the puppet’s assessment of what happened in the story (i.e. the test sentence) is true (“right”) or false (“wrong”) in the context of the story that has just been demonstrated. At this point, an elicitation measure can be added to the task in which the child is asked to explain why the puppet’s statement has been judged incorrect; for example, Crain & Thornton recommend asking the child, “What really happened (in the story)?” As the authors note, this feature can prove useful in establishing that a child has rejected the puppet’s statement for a legitimate reason.
Additionally, the experimenter may allow the child either to reward the puppet for a correct assessment of the story or to give it some type of negative consequence for an incorrect assessment, as a means of enhancing the child’s enjoyment of the task (op.cit.:222).

In my own study, I adopted this last recommendation, and I believe that it served, as predicted, to enhance the cooperation of my subjects. On the basis of remarks made by some of the parents of these children, however, I felt compelled to develop an alternative to the feedback procedures previously discussed in the literature. For example, Crain & Thornton (1998) used a technique in which the child pretends to feed the puppet either a desirable (plastic) food or an undesirable food. Alternatively, Crain and McKee (1985) asked children either to feed the puppet a cookie as a reward for a true statement or a rag as a consequence of making a false statement, activities which they report that their child subjects enjoyed very much. Yet parents of some of the children participating in my study expressed concern that use of either of these techniques might inspire their children to treat real-life pets in a similar fashion or might reinforce their children’s dislike of certain healthy foods. In response to these concerns, I chose two self-inking stamps to serve the same function as the cookie and rag in Crain and McKee’s study. One stamp featured a gold star and the other a smiling face. If the child judged that the puppet had correctly assessed what happened in the story, then she was instructed to stamp the paper with the gold star; for an incorrect assessment, the smiley-face stamp was used.
In order to avoid the possibility that a preference for one stamp over the other would influence the child’s judgments of the puppet’s statements, the stamps themselves were carefully selected so as to be equally desirable to child subjects. I also chose to label the stamp associated with an incorrect assessment of the story (i.e. the smiley face) a try-again stamp, so that children would not be tempted to associate use of this stamp with punishment of the puppet and, accordingly, restrict their negative judgments of the puppet’s statements. As a further precaution against this last possibility, I continually encouraged my subjects to view themselves as teachers who took responsibility for making proper assessments of the puppet’s statements in order to help him learn. In fact, I found that incorporating the self-inking stamps into the experimental task conferred an unexpected benefit: it gave children something to do with their hands. I noted that my younger subjects in particular needed frequent reminders not to touch the toy props during storytelling, even though they were freely allowed to do so before the story began or after offering their judgment of the test sentence. The stamps therefore helped to curb these tendencies, and I observed that my subjects also enjoyed taking responsibility for keeping a record of the puppet’s performance.

4.3.0.1 Methodological advantages

My decision to use the TVJ task was motivated by my belief that this task offers clear methodological advantages over other experimental techniques. Crain and Thornton (1998:210), who advocate exclusive use of the TVJ task for evaluating comprehension, and Gordon (1998) offer a number of reasons why this task can be viewed as superior to the alternatives.
For Crain and Thornton, the first and foremost advantage is that the experimenter is able to control not only the form of the sentence presented to the child but also the context in which the sentence is to be interpreted. As they point out, certain alternative methods, such as the act-out task, suffer by comparison, since the experimenter can control the form of the test sentences but not the interpretive context. The authors argue that if a child can be shown to distinguish accurately between contexts that make a particular sentence true and those that make it false, then it is reasonable to infer that the child shares adult-like knowledge of the sentence/meaning pair under investigation. Furthermore, as Gordon (ibid.:212) notes, evaluating the truth value of an utterance does not require the metalinguistic skills demanded by the grammaticality judgment task; instead, successful performance in the TVJ task requires only that the child have some conception of the truth relations that hold between a sentence and the particular situation to which it refers.

Another advantage of the TVJ task concerns the design of individual test items. Specifically, stories that precede the presentation of test sentences can be written in such a way as to support two potential interpretations of a single sentence. As Crain and Thornton (op.cit.) observe, this feature proves particularly useful in studies such as my own, in which the issue to be investigated is whether children can assign more than one interpretation to a sentence that is unambiguous in the adult grammar. It has also been argued that use of the TVJ task, in comparison with other techniques, helps to minimize the demands placed on children’s memory resources.
Gordon (op.cit.), for example, notes that the TVJ task essentially asks the child to perform normal discourse processing and can therefore be viewed as imposing no extraordinary demands on memory. Furthermore, it can be argued that one design feature of the TVJ task actually facilitates recall: the toy props are left in place at the end of the story, which allows the child to consult a visual record of the story context against which the test sentence is to be evaluated.

A final, and very important, advantage of the TVJ task is a psychological one, since child subjects are never made to feel as if their own knowledge is being tested. Instead, it is the puppet whose responses are under evaluation and who is perceived as being occasionally fallible. The child herself, in contrast, is treated at all times as an authority with respect to her judgments of the puppet’s statements and, for this reason, her own judgments are never subject to any correction or comment. Furthermore, as Crain and Thornton (op.cit.) point out, the inclusion of a puppet in the task confers a psychological advantage in its own right, since interaction with a puppet, rather than with an adult, helps to overcome any reluctance that a child may feel to provide negative judgments of statements made by adult experimenters. My own experience with the TVJ task confirmed the general claim that the task is well suited to the psychological needs of children: subjects of all ages were comfortable with the testing situation, took the role of playing teacher to the puppet quite seriously and expressed great willingness to participate in further sessions.

4.3.0.2 Procedure

With regard to the design of the test items used in the study, a complete list of which is offered in Appendix I, I followed a number of specific recommendations made by Crain and Thornton (1998:222-7). It is perhaps easiest to illustrate these recommendations, as well as the reasons the authors advocate them, by analysing how they were incorporated into the design of a single test item. As a representative example, I have chosen story 10, which preceded presentation of the TC The dog was difficult to teach. For adult speakers of English, there is of course only one possible interpretation of this sentence, in which an unspecified agent experiences difficulty when attempting to teach a dog something. I have previously termed this the object reading of the sentence, in reference to the logical role played by the sentence-initial determiner phrase (DP) with respect to the embedded infinitive verb. As discussed in Chapter 3, however, it has been proposed in the literature that children may initially assign an alternative interpretation to the TC, which I have termed the subject reading. With reference to story 10, this reading would be one in which the dog experiences difficulty when attempting to teach something to an unspecified entity. Both readings of TC10 are listed in (5) below, along with a suggested syntactic representation of each sentence:10

(5) a. Object reading: allowed in the adult grammar
       ‘Someone finds it difficult to teach the dog something.’
       The dogᵢ was difficult [PRO to teach tᵢ].
    b. Subject reading: disallowed (*) in the adult grammar
       ‘The dog finds it difficult to teach someone something.’
       *The dogᵢ was difficult [PROᵢ to teach proₖ].

10 The embedded verb used here, to teach, is recognized as one that licenses unspecified object deletion in English (Rizzi 1986:509). Specifically, the verb may occur with a phonologically null object, which is construed as having a canonical or prototypical interpretation (cf. I’m planning to teach in the autumn). Since the story accompanying the presentation of TC10 provides a specific context for the interpretation of the sentence, however, I made the tentative assumption that children who chose a subject reading would have a specific referent in mind for the pro object of the infinitive verb, which in this case would be the pig. Thus, the null object of teach in this instance would more accurately represent an example of what Rizzi (ibid.) has termed null complement anaphora (see also Hankamer and Sag 1976), since it would be assumed to take definite and anaphoric reference. Notably, however, the adult grammar does not allow this last option for the verb to teach (cf. *I taught, with the interpretation I taught Jane). Nevertheless, I will argue that children who assign subject readings to the TC do allow a definite interpretation of the null embedded object. Supportive evidence for this claim consists in part of the explanations offered by my child subjects for their non-target-like readings of the TC, which will be reviewed in §4.5.0.2. Additionally, other experimental studies, such as Eisenberg and Cairns (1994:722), have produced findings that young children (i.e. those below the age of five) are willing to entertain object drop in certain contexts that adults would reject.

The main issue I sought to investigate in the present study was this: do children at an early stage of grammatical development share the same constraints on the interpretation of TCs that adult speakers of the language do?
In particular, are children restricted, as adults are, to the assignment of an object reading only? As is standard in the literature, I associated the null hypothesis with the assumption that children possess target-like knowledge of the control restrictions associated with the tough adjective; according to this hypothesis, then, the only interpretation available to the child should be the object reading.11 Initially, I took the experimental hypothesis to assert that children lack target-like knowledge of the interpretive constraints associated with the tough adjective. However, a potential problem arose, in that two possibilities exist with respect to the state of the child’s grammar; these are schematically represented in (6) below:

(6) Experimental hypothesis: either (a) or (b)
    a. Object and subject readings will be available.
    b. Only subject readings will be available.

According to the first possibility, the child’s grammar treats the TC as ambiguous, while according to the second, only the subject reading of the TC is allowed. In both cases, the child’s interpretation of the TC is reasonably considered non-target-like. I therefore chose to formulate the experimental hypothesis more narrowly, on the assumption that the child lacks the ability to construct a target-like syntactic representation of any NOS and is consequently restricted to subject readings of the TC. Thus, I resolved that the pattern of performance represented in (6a) would not be taken as providing support for the experimental hypothesis, even though this pattern of performance is also rightly considered non-target-like. Whether a particular child subject performed in a manner consistent with (6a) or (6b) was determined by analysing the performance of each subject across the full set of twelve TC items; this analysis of individual subject performance constitutes the focus of §4.5.0.1.

11 In this respect, I deviate from Crain and Thornton (1998), since the authors, contrary to general practice, associate the null hypothesis with children’s lack of target-like knowledge of some grammatical principle and/or constraint (see ibid.:221-2 for discussion). In my own study, I chose to follow standard practice in the experimental literature (cf. Hsu & Hsu 1998:316) and take the null hypothesis to describe the situation in which the two populations compared (i.e. children and adults) do not differ with respect to their knowledge of the interpretive constraints associated with the TC.

Returning to the issue of the design of TC items, Figure 4.6, below, illustrates the materials used for the story that accompanied TC10, and Figure 4.7 presents the text of the story:

Figure 4.6: Materials used for TC10, ‘The dog was difficult to teach.’

[Toy props: A pig, a dog, a cat, a football, a goal (fence), and a slide.]
Narrator: This is a story about a dog and a pig who are playing in the park.
Dog: I can teach you how to play football, pig. Would you like that?
Pig: But you’re a dog, not a football player. How can you teach me how to play football?
Dog: Watch! I push the football with my nose. Then I run with the ball and push it into the fence. There! I scored a goal. Now you do the same thing, pig.
Pig: OK. Like this? (Pig pushes ball into fence with his nose.) Yea! I scored a goal too. Thanks for teaching me how to play football, dog. Now I’ll teach you how to play on the slide. Just watch. You go up the steps like this and then slide down this end. Whee! Your turn, dog.
Dog: Like this? (The dog tries to go up the slide from the wrong end.)
Pig: No, dog, that’s not right. Go round to the steps and try again.
Dog: (noticing cat) Hey, is that a cat over there? Forget about the slide, I’m going to chase that cat! (Dog runs after cat. Cat makes ‘meow’ noise.)
Puppet: I know what happened in that story. The dog was difficult to teach.

Figure 4.7: Text of story preceding TC10, ‘The dog was difficult to teach.’

As noted earlier in this section, the TVJ story allows presentation of two different interpretive contexts against which the truth value of a particular test sentence can be evaluated. In the case of TC10, the first such context involves a situation in which a pig experiences difficulty in teaching a dog how to go down a playground slide. This is the intended context for the target-like or object reading of the sentence, which should be judged true on this interpretation, since the pig does indeed experience difficulty in teaching the dog. The second context involves a situation in which the dog experiences no difficulty in teaching the pig how to play football. This is the intended context for the non-target-like or subject reading of the sentence, which should be judged false.

There are two respects, however, in which the design of story 10 did not meet the specific recommendations of Crain and Thornton (op.cit.). In both cases, the departure results from my decision to adopt the traditional conception of the null hypothesis (see ftnt. 11), rather than the conception proposed by the authors. The first point of difference concerns Crain and Thornton’s recommendation that the false judgment of a TVJ sentence should always be associated with the target-like reading of the sentence and, conversely, that the non-target-like reading should always correspond to an affirmative response.
With reference to story 10, the reader will observe that the opposite holds true. Since Crain and Thornton associate the experimental hypothesis with the assumption that the child shares target-like knowledge of the structure under investigation, they are concerned to avoid a situation in which a child’s affirmative response may erroneously be taken as providing support for the experimental hypothesis. Such a situation can arise, the authors explain, when a child provides a “yes” response to an experimental question out of confusion or a lack of understanding of the test sentence, rather than because the child possesses target-like knowledge of the structure under consideration.12 I return to this issue shortly.

12 The type of error described here, that is, one in which experimental results are taken to provide support for the experimental hypothesis when, in fact, the null hypothesis is true, is sometimes termed a Type I error (Crain and Thornton 1998:213). I have chosen not to use this term in the discussion above, however, due to the potential for confusion: the consequence of committing such an error will differ according to whether one adopts the formulation of the experimental hypothesis advocated in Crain and Thornton or the formulation that I have proposed here.

The second way in which story 10 departs from the recommendations contained in Crain and Thornton concerns the order of presentation of story events. In story 10, the final sequence of events pertains to the pig’s attempts to teach the dog, which is the intended context for an object or target-like reading of the test sentence (i.e. The dogᵢ was difficult for the pig to teach eᵢ). Crain and Thornton suggest, however, that the final events presented in a TVJ story should favour the non-target-like reading of a test sentence in order to make this reading the most salient, and therefore preferred, interpretation, if allowed by the child’s grammar. The authors argue that if a child overrides a bias in the presentation of story events in order to assign a target-like interpretation to the sentence, then the child’s performance will serve as more robust evidence of target-like syntactic competence than if the child’s correct response corresponds to the most recently mentioned event (op.cit.:224).

It is important to note, however, that the argument referenced above rests on a fundamental assumption made by the authors, namely that a child’s performance should not be affected by any bias in test design if the child possesses target-like knowledge of the grammatical constraint under investigation. That is, Crain and Thornton expect children, like adults, to override any such biases if their knowledge of a particular test item is target-like. Ianthi Tsimpli (p.c.), however, has questioned whether the introduction of biases in the design of test items might adversely affect the performance of children whose grammars are not yet target-like but are instead in a state of development. Since this is a consideration that, as far as I can determine, is not addressed by Crain and Thornton, I chose to exercise caution in adopting their specific recommendations with regard to what I will term the affirmative response and order of presentation biases. In particular, I decided to vary the direction of the two types of bias across individual items as follows. First, I identified test items as those pairing the true response with the target-like or object reading of the TC, and control items as those exhibiting the opposite pairing.
In my study, control items thus served the primary function of providing child subjects with an alternative pairing, that of the affirmative response with the subject reading of the TC. It was hoped that this feature of the design would help guard against a training effect, in which the subject comes to associate the true or false response exclusively with either a target-like or a non-target-like interpretation of experimental items. Additionally, I varied the direction of the order of presentation bias within both the test and the control items. The latter feature of the design is perhaps best illustrated graphically, and so in Table 4.10, below, I have categorized each of the twelve TC items used in the study according to the direction of the two biases:13

Item type | Affirmative response (AR) bias | Order of presentation (OP) bias | Abbr. | Test/control items
Test | object or target-like (TL) reading | subject or non-target-like (NTL) reading | TS-SB | easy 4, hard 6, hard 8, difficult 12
Test | object (TL) reading | object (TL) reading | TS-OB | easy 2, difficult 10
Control | subject (NTL) reading | subject (NTL) reading | CS-SB | easy 3, hard 7, difficult 11
Control | subject (NTL) reading | object (TL) reading | CS-OB | easy 1, hard 5, difficult 9

Table 4.10: Distribution of biases in TC experimental items (NB: ‘TS’ = test sentence; ‘CS’ = control sentence; ‘SB’ = subject reading bias; ‘OB’ = object reading bias)

13 It was not until testing of subjects had commenced that I noticed that the story accompanying hard 8 included an order of presentation bias towards the subject or non-target-like reading of the sentence, even though this item should have featured the reverse bias. As data had already been collected for this item by the time the error was discovered, I felt that revising the story and/or test sentence at this late stage would be ill advised.

The classification scheme adopted in Table 4.10 (see the ‘Abbr.’ column) reflects the fact that test items are distinguished from control items according to whether an affirmative judgment is associated with a target-like or non-target-like interpretation of an item. Items of each type are further distinguished according to whether the final event in the story is linked to a target-like or non-target-like interpretation of the item. The notation TS-OB, for example, indicates first that the pairing of an affirmative (i.e. “true”) response with a target-like reading of the TC identifies this as a test sentence (TS) and, second, that the order of presentation of story events also favours the object or target-like reading of the TC. (The effect of these particular biases on subject performance will be discussed in §4.5.0.0.)

Returning to the original design recommendations advocated by Crain and Thornton, there is an additional pragmatic consideration that the authors discuss, which I incorporated into the design of my own experimental items. This is what the authors term the condition of plausible dissent (or Russell’s maxim) (op.cit.:226). Here, Crain and Thornton adopt a view originally espoused by Russell (1948:138; cited in Crain and Thornton op.cit.), who proposed that a negative judgment of a sentence is felicitously made only when the speaker has already made or considered a positive evaluation of the same sentence. Accordingly, Crain and Thornton argue that it is unreasonable to expect children to judge a test sentence false if the discourse context does not make it clear precisely why the sentence is false. The authors therefore specifically recommend that the design of each TVJ story meet the condition of plausible dissent (hereafter, CPD) (op.cit.:225-6).14

With reference to the sample item previously discussed, TC10, the reader can verify by consulting Figure 4.7 that I included an event in the story that meets the CPD. Recall that a false judgment of TC10 is associated with a non-target-like interpretation of the sentence The dog was difficult to teach; on this interpretation, the sentence is falsified by the success that the dog experiences in teaching the pig how to play football (i.e. *The dog was not difficult to teach the pig). The CPD requires, however, that the listener consider a positive judgment of this reading of the sentence prior to judging it false. This requirement was met in the case of TC10 by adding an event in which the pig expresses some reservation about the dog’s ability to teach football, stating, “But you’re a dog, not a football player. How can you teach me how to play football?” Thus, the possibility is briefly introduced that the dog might prove an inadequate instructor for the pig, even though subsequent events in the story rule out this initial consideration.

14 Gordon (1998:216-18), however, has questioned whether the CPD is a necessary requirement for the design of the TVJ task. He suggests that the same effect, that of providing a reasonable explanation for the negation of a test sentence, might in some circumstances be achieved by highlighting certain information that is already present in the background of the story. Although I do not dismiss the validity of Gordon’s suggestion, I nevertheless chose to adopt Crain and Thornton’s recommendations in order to err on the side of caution when designing my own experimental items.

The last of Crain and Thornton’s recommendations to be discussed in this section pertains to the inclusion of filler sentences in experimental trials. The authors advocate the use of such items as a means of (1) keeping children motivated and/or interested in the experimental task; (2) determining whether a child is paying proper attention; and (3) establishing whether a child is responding appropriately to experimental items or merely providing the same answer in all circumstances. In my own study, I used filler items to serve all of these purposes. Nonetheless, I chose to deviate from one specific recommendation for the design of these items advanced by Crain and Thornton, namely their advice that filler items should ideally be similar in complexity to test and/or control items (op.cit.:134). Instead, I chose to present filler items that were simpler in terms of both length and content than either test or control items. This is because pilot testing of TCs and related NOS had indicated that a relatively high level of concentration was required in order for subjects to properly evaluate these particular sentences against the TVJ story context.
Since this observation held true as much for my adult subjects as for my child subjects, I felt that the use of filler items of lesser complexity would better serve to maintain the interest of the subjects and promote confidence in their overall ability to perform the experimental task.

4.3.0.3 Preparation for use of the task

In this section, I briefly review the steps that I took several months before the commencement of the main study in order to familiarize potential subjects with the requirements of the TVJ task. Prior to conducting both the pilot and main experiments, I worked for several months as a volunteer at both the Honeypot Pre-School and Willingham Primary School, the two facilities from which experimental subjects were exclusively drawn. During this period, I became a familiar presence in various classrooms and was therefore able to develop a comfortable working relationship with subjects prior to their participation in the experiment. It was also during this period that children in the two facilities were first introduced to Fudge, a plush dog puppet I had chosen for use in the study (see Figure 4.5, §4.3.0.0). In order to encourage interest in the puppet, children were asked to help think up suitable names for him, and the name Fudge was selected according to the results of this competition.

Some time later, I introduced a special game involving the puppet, an activity designed to serve as preparation for participation in the TVJ task. Children were asked to join Fudge in listening to a story that was read by their teacher. At the end of the story, Fudge would raise his paw and the teacher would call on him to make some comment about the story. The children were told in advance that sometimes Fudge listens very well and can therefore be expected to say something sensible about the story, but at other times does not pay close enough attention and can therefore be expected to say something wrong or silly. Children were asked to listen to what Fudge said and to decide whether his statement was right or wrong, according to the context of the story that the teacher had just read to them. Children in all of the classrooms enjoyed this game immensely and readily took to the role of playing surrogate teacher to Fudge. Importantly, however, the grammatical structures to be tested in the experimental study were not introduced at any time prior to the study itself, since these classroom sessions were intended solely to introduce children to the procedures associated with the TVJ task, rather than to the test items themselves.

4.3.1 Test/control items

4.3.1.0 Selection of vocabulary

In designing the present study, I was concerned to address what I consider a weakness of certain previous experimental studies of the acquisition of the TC, namely the choice of vocabulary used in TC items. With regard to Cromer (1970), for example, I noted earlier (see Chapter 3, §3.2.1.0) that all of the TCs used in his study featured the single embedded infinitive verb to bite. Yet, as I pointed out, this verb is strongly associated with a transitive rather than an intransitive interpretation in adult English (cf. ?John bit), and I therefore believe it is possible that use of this verb biased his subjects’ interpretation of test items. Similarly, in McKee (1997a), the TCs that were offered to child subjects all contained infinitive verbs (e.g. reach, catch, chase, and kick) that are standardly considered obligatorily transitive (see Levin 1993).
Since both studies included children who provided target-like as well as non-target-like readings of the TC, I acknowledge that the bias considered here cannot be argued to have wholly dictated the subjects' choice of sentence interpretation; nevertheless, I submit that it remains possible that the exclusive use of strongly transitive verbs compromised the reliability of the data collected in the two studies.15 In my own study, therefore, I sought to make the two potential representations of the TC, DC, and other NOS equally accessible to child subjects. Accordingly, I decided to use experimental items featuring only verbs that readily allow intransitive as well as transitive readings. Moreover, I sought to restrict my selection to verbs that are likely to be included in the receptive vocabulary of a 3-year-old. On this basis, I selected the following six verbs for use in all experimental items other than the passive sentences: draw, eat, fight, help, ride, and teach.16 (The design of passive items is discussed in §4.3.1.3.) Although only three of these verbs, draw, eat, and teach, are included in Levin's (op.cit.) list of verbs that allow deletion of an unspecified or indefinite object, all six share certain critical features of syntactic distribution with verbs of this class.17 For example, all six allow object omission when used in the present progressive or imperative (e.g. The girl is drawing; Stop fighting!) and therefore contrast with strongly transitive verbs, which are generally considered unacceptable in the same forms (cf. *?The boy is chasing; *?Stop reaching!). Following Rizzi (1986), I assume that object deletion in English is lexically licensed, rather than syntactically licensed as it is in other languages, such as Italian.18 Yet, if object deletion is specified as an idiosyncratic property of particular lexical items in English, as argued by Rizzi as well as Ingham (1989), then it becomes important to establish the age at which a child acquires this type of knowledge. As previously observed (see ftnt. 15), Ingham (op.cit.) has investigated this question experimentally and claims that 3-year-olds already display sensitivity to lexical restrictions on the licensing of object deletion in English. On this basis, I felt reasonably confident in assuming that my own child subjects, who were all aged 3;4 or older, would recognize the availability of both transitive and intransitive readings of the six common verbs chosen for the study.
15 Ingham (1989:310-1) collected interesting evidence in this regard, arguing on the basis of his own experimental findings that children as young as three are already sensitive to the argument structure restrictions associated with particular verbs in English. He asserted that young children tend to use transitive verbs in intransitive contexts only when the adult grammar licenses this possibility. This claim would therefore seem to predict that Cromer's (1970) and McKee's (1997a) subjects should have been strongly biased towards the transitive interpretation of the embedded verb in TC items, and thus towards a target-like interpretation of the TC. Nevertheless, as discussed in ftnt. 10, the manner in which children use “object drop” in English does not necessarily match adult practice in all respects (cf. Eisenberg and Cairns 1994), and I must therefore leave as an unresolved issue the extent to which Cromer's and McKee's subjects may have been influenced by the use of strongly transitive embedded verbs.
16 In order to meet the second of the two considerations referenced above, I used the Communicative Development Inventory WORD list (Reznick and Goldsmith 1989:94-7) as a primary, although not exclusive, source of words likely to be included in the receptive vocabulary of very young children (i.e. those aged 1;0 to 2;0). Four of the verbs I chose, draw, eat, help, and ride, are included in this list. The remaining two, fight and teach, were judged highly likely to be known to my subjects, all over the age of three, given that all were attending either nursery or primary school at the time the study was conducted.
17 This is not intended to imply that the six verbs I selected share all features of syntactic distribution. The verb fight, for example, takes an understood reciprocal object when it selects a plural NP subject; thus, the sentence The men fought implies that the men fought each other (cf. Levin 1993). Since my test sentences involved only singular DPs as subjects, however, this particular feature of the argument structure of fight was not of immediate concern. There is one distributional difference, however, that I believe may have influenced the findings reported in §4.5.0.0, and this concerns the verb to help. Of the six verbs selected for use in my study, this is the only one for which it has been proposed that a deleted object has a contextually specified, rather than indefinite or generic, referent (cf. The invigilator told the students not to help, where the object of help is most naturally construed as an empty pronoun coindexed with the subject DP) (Ingham 1989:125; see also the discussion in ftnt. 15).
18 Rizzi (1986) proposes that in English, certain lexical items allow a θ-role associated with an object argument to be saturated in the lexicon, thereby bypassing the GB projection principle, which requires that thematic structure be given syntactic representation.
He distinguishes between three types of null objects, all of which are lexically licensed in English: (1) those which receive an arbitrary interpretation (John is always ready to please (people)); (2) those which receive a canonical or prototypical interpretation (John ate); and (3) those which are interpreted as definite and anaphoric to some pragmatically salient element (I know) (see also the discussion in McConnell-Ginet 1982 and Jacobson 1992). For the sake of consistency, and because I was limited to activity verbs likely to be known to children as young as three, I tried to select only verbs belonging to the second of these categories, although, as acknowledged in ftnt. 17 above, the verb to help may be a single exception to this generalization. Nevertheless, while I strove to select verbs whose null object admits a prototypical interpretation, I did not necessarily expect children to assign this type of interpretation to the empty object argument. That is, as discussed earlier in ftnt. 10, I believe it is possible that children who access the subject interpretation of the TC allow the null object to take specific reference.
4.3.1.1 Degree constructions (DCs)
For the four DCs included in the study, I will once again illustrate the basic features of design through a specific example, story 14, which preceded presentation of DC14, The giraffe was too big to ride. Figure 4.8 illustrates the test materials used for the story:
Figure 4.8: Materials used for DC14, ‘The giraffe was too big to ride.’
In this condition, I once again took as the null hypothesis that children share target-like knowledge of the construction under investigation; accordingly, children, like adults, should have two readings of the DC available to them.
The two options are illustrated in (7), along with a proposed syntactic representation of each. As in the previous condition, I distinguish subject and object readings of a DC item according to whether the matrix subject argument plays the logical role of subject or object of the embedded infinitive verb:
(7) DC14, The giraffe was too big to ride
a. Object reading - False
The giraffe was too big (for the pony) to ride.
The giraffe_k was too big [Op_k [PRO_i to ride t_k]].
b. Subject reading - True
The giraffe was too big to ride (the pony).
The giraffe_k was too big [PRO_k to ride pro_i/the pony].
As regards the experimental hypothesis, I adopted the same formulation as in the TC condition; namely, I associated this hypothesis with the assumption that children lack the ability to construct a target-like syntactic representation of an NOS and should therefore be restricted to subject readings only. A complication arises, however, from the fact that an adult subject could show an exclusive preference for the subject reading of the DC, which would not imply his or her inability to assign an object reading to the same structure. By analogy, a similar pattern of performance on the part of a child cannot be taken as necessarily indicative of the child's lack of syntactic competence in interpreting the DC. I therefore decided that a child's exclusive preference for subject readings of DC items could be taken as supportive evidence for the experimental hypothesis only when such a pattern of performance was consistent across all of the NOS tested in the study.
Returning to the example of DC14, the context of the story was intended to elicit a true judgment of the sentence The giraffe was too big to ride on its subject or non-target-like reading, since the giraffe's attempt to ride the pony is thwarted by his large size. Conversely, the story events support a false judgment of the object or target-like interpretation of the sentence, since the pony succeeds in climbing up on the giraffe and is given a nice ride around a field. Recall that the condition of plausible dissent (CPD) requires that, on the false interpretation of the sentence, a corresponding positive judgment should have been under consideration at some earlier point in the story. For story 14, this condition is satisfied when the pony initially expresses concern that he will not be able to climb up on the giraffe's back, thus temporarily introducing the possibility that the pony may not succeed in riding the giraffe. Ultimately, however, the pony succeeds in climbing onto the giraffe's back with the aid of a bale of hay, and the object reading of the DC is therefore correctly judged false. In the DC condition, I once again chose to vary the direction of the affirmative response and order of presentation biases across individual items, contrary to the recommendations of Crain and Thornton (1998). As before, test and control items were distinguished in terms of whether an affirmative response favoured the object or subject reading of the sentence. For DC14, the story was written so that the affirmative response bias favoured the subject reading, consistent with its classification as a control item. The order of presentation also favoured the subject reading of DC14, since the giraffe's attempt to ride the pony was presented as the final event in the story.
As in the case of TC control items, I viewed the main function of DC control items as providing a balanced opportunity for subjects to associate a true judgment of the DC with the subject reading of the sentence. As Crain and Thornton have observed, this consideration is particularly important when testing ambiguous sentences, since both children and adults are well known to display an interpretive bias towards one of the two available readings. As previously noted, however, such a pattern of performance would be less informative than one in which a child demonstrates the ability to access both subject and object readings. Thus, I hoped that mixing the direction of the affirmative response and order of presentation biases across individual items, as I had done in the TC condition, would help to counter any tendency a subject might have to assign exclusively subject readings or exclusively object readings to the DC.
4.3.1.2 Infinitival relatives (IRs)/Object-gap purpose constructions (OPCs)
Since the IR and OPC are both unambiguous in the adult grammar, most of the basic features of design for these items were as described in §4.3.0.2 for the TC.19 In this condition, I once again associated the experimental hypothesis with the assumption that the child does not possess the syntactic ability to interpret an NOS and will therefore be restricted to a non-target-like interpretation of both types of structures. The formulation of this hypothesis differs from that proposed for the TC and DC, however, in that I make no prediction regarding the specific syntactic analysis that the child assigns when accessing a non-target-like interpretation of either the IR or the OPC.
This is because competing claims have been advanced in the literature with respect to this issue, which I review later in this section (see, e.g., Nishigauchi and Roeper 1987, Goodluck and Behne 1992, or Jones 1992). In this condition, therefore, I simply took the assignment of any non-target-like interpretation of either the IR or OPC as support for the experimental hypothesis that children lack the syntactic ability to interpret an NOS. As in the previous two conditions, test items were those in which the affirmative response bias favoured the target-like reading, that is, in which the true response correlated with a target-like reading of the IR/OPC, while the bias was reversed for control items. Since IR and OPC items were limited to two each, I was not able to vary the direction of the order of presentation bias in the same way as for TC and DC items. Instead, the direction of biases across the four combined IR and OPC items was as follows: IR17 – CS/OB, IR18 – CS/SB, OPC19 – TS/OB, and OPC20 – TS/SB.20 (The abbreviations TS and CS distinguish test and control sentences, and SB and OB indicate whether the order of presentation of events in the story favoured a subject or object interpretation of the matrix subject argument.)
19 Recall that the basic syntactic properties of the IR and OPC were outlined in §2.5.0 of Chapter 2. As noted there, the IR can be distinguished from the OPC according to the syntactic position in which the adjunct clause is assumed to attach; for the IR, as in (ia), this is standardly considered the N′ level, while for the OPC, as in (ib), it is VP:
(i) a. IR: The tiger_k found [DP a [NP [N rabbit_i] [IP Op_i [PRO_k to eat t_i]]]].
b. OPC: The clown_k [VP [VP bought a dog_i] [IP Op_i [PRO_k to ride t_i]]].
One important issue raised in §2.5.0 concerns whether it is reasonable to analyse the derivation of the IR as involving null operator movement, given that the validity of this claim has been challenged in the literature (see, e.g., Contreras 1993). I wish to be clear that my decision to treat both the IR and OPC as examples of NOS in my study represented a working assumption only, since one of my investigative goals was to determine whether various NOS are concurrently acquired, a pattern of performance which would support the hypothesis that the IR, OPC, ODC, and TC share a similar structural analysis.
20 I am aware that the ordering of these biases is less than ideal since, for example, in the case of the two IR items, the affirmative response is associated with the non-target-like reading of the sentence in both instances, while the situation is reversed for the OPC items. Thus, according to my definition of the terms, both IR17 and IR18 are control items, whilst both OPC19 and OPC20 are test items. This situation arose because, at the design stage of the study, I had initially classified the two IR items as OPCs. This error was later brought to my attention by Helen Goodluck (p.c.), but as the items in question had already been put into use, I was unable to make the necessary modifications without compromising the reliability of the data already collected (see Anderson 2002a). I return to this issue in §4.5.2 and §4.5.3, where I consider the effect of the direction of these biases on subject performance.
I now consider a specific example of an IR item, IR17, The soldier found a pirate to fight. The materials used for this item are illustrated in Figure 4.9:
Figure 4.9: Materials used for IR17, ‘The soldier found a pirate to fight.’
The story that preceded presentation of this item provided two contexts in which the sentence could be interpreted. In the first, a soldier explains that he wishes only to join a pirate on his ship for a bit of singing, and the story ends accordingly. This was the intended context for the target-like reading of the sentence (i.e. The soldier_k did not find a pirate_i to fight e_i), and IR17 should thus be judged false on this interpretation. In the second context, the pirate identifies the soldier as someone who typically wants to fight (an event included to satisfy the CPD), and the pirate begins to fight the soldier until the soldier expresses his true intention in seeking out the pirate. I speculated that children who lacked target-like knowledge of the sentence might allow a reading supported by the second story context, one in which the embedded subject PRO is interpreted as coreferential with the matrix object DP, and the embedded object as coreferential with the matrix subject DP. That is, I wondered whether some children might erroneously judge IR17 true because they assign it a structural analysis as in (8):
(8) The soldier_i found a pirate_k [PRO_k to fight him_i].
While to my knowledge it has not been established that children allow such a reading of the IR, Goodluck and Behne (1992) have documented a number of cases in which their experimental subjects provided what Jones (1992) terms switched-control readings of the OPC, in which the referential dependencies match those exemplified in (8).
And, as observed by Jones (ibid.:178), the structural configuration of the OPC does in fact provide two possible c-commanding antecedents that can act as controllers in the sentence; I therefore believe it is not inconceivable that a child with a developing grammar might entertain a switched-control reading of the OPC and, by extension, a switched-control reading of the IR.21 Of course, it is also possible that a child who accessed a non-target-like interpretation of IR17 could assign the sentence a syntactic representation as in (9), in which there is object control of the embedded subject, but the reference of the embedded object pro remains free:
(9) The soldier_i found a pirate_k [who_k was willing] to fight pro_{prototypical}
Such an analysis is of course not available in the adult grammar but would nevertheless receive some support from the story context, since the pirate does express a willingness to fight the soldier, which a child could conceivably interpret as a willingness to fight others in general. In fact, this is the type of representation that I personally considered more likely to be assigned to IR18, The tiger found a rabbit to eat, if a child were to entertain a non-target-like reading of that sentence. That is, I speculated that some children might interpret IR18 along the lines of The tiger_k found a rabbit_i who_i was eating (or, possibly, The tiger_k found a rabbit_i and the rabbit_i was eating). As the data I collected cannot speak to the child's choice of a particular syntactic representation of the IR, however, the actual form of the child's non-target-like interpretation of the construction must await future investigation.
21 For simplicity's sake, Jones (1992) adopts a definition of c-command in which it is sufficient to establish that an element is dominated by “some maximal projection” for the control relation described above to obtain. He is thus not concerned with the theoretical consequences of segmental adjunction of the object-gap purpose clause to the matrix VP, under which the object of the matrix verb would more appropriately be viewed as m-commanding, rather than strictly c-commanding, the two empty categories in the adjunct clause (see Jones ibid.:175, ftnt. 2, for further discussion, and also N. Chomsky 1986a:6-8). Like Jones, I prefer not to consider the latter issue, but in my case this decision is motivated by the awareness that control relations cannot be exhaustively defined according to purely syntactic criteria; an investigation of this issue would therefore take us too far afield from my focus here.
Turning now to the design of the two OPC items, I will take OPC20, The clown bought a dog to ride, as a representative example. The test materials used for this item are illustrated in Figure 4.10:
Figure 4.10: Materials used for OPC20, ‘The clown bought a dog to ride.’
The first possible context in which the sentence could be interpreted involved a clown who bought a dog so that he could ride the dog and, in doing so, entertain a little girl who was bedridden in hospital.
These story events were therefore intended to support a true judgment of the sentence, and since this response was associated with a target-like interpretation, OPC20 was classified as a test item. The second context, in contrast, involved the dog expressing an interest in riding in a cart pulled by the clown. It was anticipated that these later story events could support a non-target-like interpretation of the sentence, for example, an interpretation as in (10), in which the matrix object argument, a dog, serves as the antecedent for PRO:
(10) The clown_i bought a dog_k [PRO_k to ride pro_{*j/*prototypical}].
Because the clown did not purchase the dog with the intention of letting the dog ride in the cart, the interpretation represented in (10) is correctly judged false according to the story context. The CPD, which requires prior consideration of a corresponding positive judgment of this interpretation of OPC20, is met by having the dog briefly entertain the notion that the clown will let him ride in the cart, only to have the clown inform the dog that he (i.e. the dog) is in fact expected to pull the cart.
4.3.1.3 Passive sentences
In addition to the NOS items previously discussed, the main study also featured four passive sentences. These sentences were included to serve what can be described as a control function; in particular, I wished to determine whether any difficulty that my child subjects experienced with NOS could be linked to a more general difficulty in interpreting displaced object arguments. As the discussion in Chapter 3 indicated, my decision to test children's comprehension of the TC and the passive concurrently is not without precedent in the literature. Cromer (1970), for example, incorporated such a comparison into the design of his experimental study.
After testing forty-one children between the ages of 5;3 and 7;5, he reported that while only five of these subjects performed in a target-like manner on the TC, thirty-eight (or 93%) did so on the two passive sentences that he presented. There are also a number of studies of the acquisition of the passive alone, which report that children of pre-school age are capable of both comprehending and producing such sentences in a target-like manner (see, inter alios, Maratsos and Abramovitch 1975, Maratsos, Kuczaj, Fox, and Chalkley 1979, Maratsos, Fox, Becker, and Chalkley 1985, Pinker, Lebeaux and Frost 1987, and Fox and Grodzinsky 1998). In this respect, my decision to include such items in my own study may seem superfluous. I would argue that it is not, because there is also a body of evidence indicating that children's performance on the passive is not uniformly target-like during the pre-school years and beyond. In particular, the findings of certain studies point to a specific difficulty that children experience in processing passive sentences that contain non-actional verbs, such as to like (see, inter alios, Maratsos et al. 1979; 1985, de Villiers, Phinney and Avery 1982, and Gordon and Chafetz 1986; see also Pinker et al. 1987:243-4 for similar findings with regard to children's production of non-actional passives). In recognition of these findings, I chose to include passive sentences of both types in my study: two featuring the actional verbs bite and chase, and two featuring the non-actional (or experiencer) verbs hear and watch. I associated the experimental hypothesis with the assumption that children who lack target-like knowledge of passivization will assign what I term an active interpretation to the passive sentence.
For example, according to this hypothesis, I anticipated that children would interpret a sentence such as The boy was chased by the duck as if it involved an active rather than passive form of the verb, with the boy construed as an agent rather than as a patient. Yet I also recognized that a child's non-target-like performance might be restricted to items containing non-actional or experiencer verbs. As I associated the null hypothesis with the assumption that children should perform just like adults on passives of both types, I decided to take either pattern of performance as evidence for the experimental hypothesis. As in previous conditions, the direction of both the affirmative response and order of presentation biases was varied across individual items, and test and control items were distinguished in terms of whether or not the affirmative response bias favoured a target-like interpretation of the sentence. In this condition, however, I distinguished active (i.e. non-target-like) from passive (i.e. target-like) readings of experimental items, rather than subject versus object readings. The four individual items were therefore categorized as follows: AP21 – CS/AC, AP22 – TS/PS, NAP23 – CS/PS, NAP24 – TS/AC. (The abbreviations AP and NAP stand for actional passive and non-actional passive, TS and CS distinguish test and control sentences, and AC and PS indicate whether the order of presentation of events in the story favoured an active (AC) or passive (PS) reading of the sentence.) Taking as an example the non-actional passive NAP23, The snake was watched by the rabbits, the materials used for this item were as illustrated in Figure 4.11:
Figure 4.11: Materials used for NAP23, ‘The snake was watched by the rabbits.’
The story preceding presentation of this sentence provided two interpretive contexts. In one scenario, a snake sat in a tree watching two rabbits have a picnic on the grass below. The sentence could thus be judged true on what I have termed the active or non-target-like reading, and this pairing of an affirmative response with a non-target-like interpretation is consistent with its classification as a control item. In the second scenario, which was presented last in the story, the possibility is first introduced that the rabbits might stand and watch the snake after an alert hedgehog makes them aware of its presence. This consideration was introduced only to satisfy the CPD, however, since it is a false rather than true judgment of the sentence that corresponds with later story events: the rabbits become frightened once they see the snake and decide to flee. As a consequence, the rabbits never watch the snake, and the target-like or passive interpretation of NAP23 is therefore correctly judged false.
4.3.1.4 Filler sentences
I will make only a few brief comments regarding the filler items used in the study, which are listed in Appendix I. As previously noted, the stories written to accompany these items were generally shorter than those that accompanied either NOS or passive sentences. Consequently, filler stories typically took thirty to forty seconds to present, as opposed to an average of sixty seconds for other stories used in the study.
Filler stories were also more straightforward than those accompanying test or control items, since they were designed to include only a single context in which the sentence could be judged true or false. For this reason, I anticipated that both adult and child subjects would find the interpretation of filler sentences less demanding than that of test and control items. Finally, filler sentences were evenly balanced in terms of whether the target-like response was associated with a true or false judgment of the sentence, and I was careful to include only lexical items that I believed were likely to be in the receptive vocabulary of a child as young as three.
4.4 Pilot study
A pilot study was conducted approximately one month prior to the main study. Out of the original group of 122 children who participated in the pre-test described in §4.2, twelve were chosen to take part in the pilot study. These twelve children, who ranged in age from 3;3 to 6;8, were joined by six adult control subjects. Child subjects were tested at the school they attended, either the Honeypot Pre-School or Willingham Primary School, both located in the village of Willingham in Cambridgeshire, U.K. Adult subjects were tested in their own homes. A combined total of twenty-four test and control items were used in the pilot study, consisting of twelve TCs, four DCs, two OPCs, two IRs, and four passives, two of the latter featuring actional verbs and two featuring non-actional verbs. Individual subjects were presented with eight to twelve items in total, including filler items. As a general rule, a filler item was presented after every two test and/or control items. Two filler items were also presented at the beginning of the testing session in order to familiarize subjects with the requirements of the TVJ task.
The main purpose of the pilot study was to evaluate both the efficacy of the TVJ task and the suitability of the various test/control items that I planned to use in the main study. On the basis of the feedback that my subjects provided during this preliminary study, certain items were modified or replaced altogether. One notable finding in this regard concerned performance on the following two filler sentences: (1) The bird scared the dog, and (2) The bird frightened the dog. Both were barred from use in the main study because three of the five subjects tested on these items, all under the age of 3;11, failed to correctly judge one or both of these sentences as false according to the accompanying story. As the events depicted in these stories were quite straightforward, I believe it is unlikely that this non-target-like performance can be attributed to misinterpretation of story details. Instead, I think it more likely indicates that children take some time to acquire the marked argument structure properties of the verbs in question, to scare and to frighten, which, in violation of the thematic hierarchy (cf. Grimshaw 1990), assign the role of experiencer to an object rather than a subject argument.22 One other finding from the pilot study caused me to revise the inclusion criterion that I had originally proposed for participation in the main study. This concerns the performance of two pilot subjects, a girl aged 3;7 and a boy aged 3;11, who consistently judged all experimental items, whether filler, test, or control, as true, leading me to question whether either had properly understood the experimental task. Interestingly, however, each of these children had performed quite well on the vocabulary and memory parts of the pre-test, in contrast to a number of their age-matched peers who had failed one or both of these tests and had therefore been excluded from the pilot study.
Given the questionable performance of these subjects on the TVJ task, I was forced to consider that successful performance on parts one and two of the pre-test might not be a sufficiently stringent criterion for inclusion in the main study. I therefore added the requirement that, in addition to passing both parts of the pre-test, all participants should consistently demonstrate target-like performance on filler items.
22 De Villiers, Phinney, and Avery (1982) report that “non-action verbs” (i.e. those whose argument structure representation involves an experiencer rather than an agent) were poorly understood by their youngest experimental subjects in active as well as passive sentences. It is thus possible that the non-target-like performance I observed in my pilot study reflects a more general problem that children experience in mapping experiencer arguments to various syntactic positions. I return to this issue in Chapter 5.
4.5 Results of main experimental study
A total of forty-four children between the ages of 3;4 and 7;5 participated in the study discussed in this section. All forty-four were drawn from the original 122 participants in the pre-test and did not include any of the children who had participated in the pilot study. As detailed earlier, all of my child subjects were monolingual native speakers of British English whose parents were primarily of middle- or working-class background. Subjects who attended the Honeypot Pre-School were all under the age of 4;8, and I was concerned that testing should not exhaust the limited attention span of these children. I therefore administered the full set of twenty-four test/control items to them over four individual sessions, each approximately twenty minutes in length.
For subjects over the age of 4;8, who attended Willingham Primary School, testing was completed in three sessions, each lasting approximately twenty-five minutes. Subjects at both locations were, as far as possible, seen on a weekly basis, although adjustments were made to this schedule in the event of a child’s absence due to illness or family holidays. Two experimental assistants were employed as puppeteers, one responsible for pre-school-aged subjects and the other for subjects of primary school age. Both women were familiar to the child participants. The primary responsibility of each assistant was to present the test sentence to the subject via the puppet. As previously explained, this was accomplished by having the puppet respond to the experimenter’s prompt, “What happened in that story, Fudge?” While both women were allowed to adopt a distinctive voice for the puppet during periods of free play, they were asked to deliver the test sentence in a normal speaking voice in order to ensure that it would be properly understood by the subjects.23 A total of twenty-two adults served as control subjects. Unlike the child subjects, each adult was tested on only half the number of items presented to the children.
This was because pilot testing had established that adults were, for obvious reasons, less interested in performing the experimental task than child subjects, and therefore less willing to participate in the multiple testing sessions that would have been required to administer a full set of over thirty test, control, and filler items.24 Since I believed that presenting all of these items in a single session would strain even adult capabilities, I adopted the compromise solution of doubling the number of adult subjects from the original eleven to twenty-two, with two adults sharing a complete set of items between them. For adult as well as child subjects, the order of presentation of individual experimental items was randomly varied, subject to the following two constraints: (1) two items testing knowledge of the same type of construction were never presented consecutively, and (2) a filler item followed every two test items for child subjects and every three for adults.

23 The assistants were both speakers of British English, as well as residents of the village in which the testing took place. The test sentences were therefore delivered in a variety of English that was familiar to all participants. However, the TVJ stories were told by the author of this thesis, a speaker of American English. Potential problems introduced by this situation were addressed in a number of ways. First, I served as a volunteer classroom assistant during the months preceding the experimental study and therefore had extensive opportunity to interact with potential subjects prior to their participation in the study. Second, I administered the pre-test and informally used these sessions to assess any potential for miscommunication. Finally, all of the vocabulary used in the experimental items was based on British rather than American English norms.
I believe that the effectiveness of these measures is supported by the fact that, for the duration of the study, I observed no instances in which a child failed to understand my spoken English.

24 Based on her own experience with the TVJ task, Maria Teresa Guasti (p.c.) has suggested that adult interest and/or attention in the TVJ task can be better maintained if stories are presented on videotape. One advantage of this technique that I can envision is that it might lessen the embarrassment that adults may naturally feel when participating in an activity designed to appeal to children. However, as I wished to avoid introducing extraneous (i.e. nuisance) variables that could undermine the validity of my findings, I preferred to maintain, as far as possible, parity in the conditions under which child and adult subjects were tested.

4.5.0 Tough constructions (TCs)

4.5.0.0 Presentation of group findings

Recall that the null and experimental hypotheses for this condition were as in (11a&b), below:

(11) a. Null hypothesis: The child possesses target-like knowledge of the TC and so will allow only object readings of the construction.
b. Experimental hypothesis: The child lacks the ability to construct a target-like syntactic representation of an NOS and is consequently able to access only subject (i.e. non-target-like) readings of the TC.

I also acknowledged the possibility that a child participant could allow both subject and object readings of the TC. I decided that such a pattern of performance, although non-target-like, would not be taken as evidence for the experimental hypothesis, as I think a narrower formulation of the experimental hypothesis is preferable from the standpoint of experimental design.
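The decision rule implied by (11), together with the decision to set aside mixed patterns, can be made explicit as a small classification function. This is a hypothetical sketch rather than a procedure used in the study: the function name `classify_pattern`, the string coding of readings and the `tolerance` parameter are assumptions introduced for illustration. The text sets no numerical criterion for children; `tolerance` merely shows where a margin of error, such as the one-in-twelve margin later allowed for adults in footnote 29, could be expressed.

```python
from collections import Counter

def classify_pattern(readings, tolerance=0):
    """Label one child's TC readings relative to the hypotheses in (11).

    readings:  "object" or "subject" for each TC item answered (missing items omitted).
    tolerance: hypothetical number of off-pattern readings still counted as consistent.
    """
    counts = Counter(readings)  # a Counter returns 0 for a reading that never occurs
    n_object, n_subject = counts["object"], counts["subject"]
    if n_object + n_subject == 0:
        return "none"      # no usable responses
    if n_subject <= tolerance:
        return "object"    # target-like pattern predicted by the null hypothesis (11a)
    if n_object <= tolerance:
        return "subject"   # pattern predicted by the experimental hypothesis (11b)
    return "mixed"         # both readings allowed: non-target-like, but not evidence for (11b)

print(classify_pattern(["object"] * 12))                             # object
print(classify_pattern(["subject"] * 11 + ["object"], tolerance=1))  # subject
print(classify_pattern(["object", "subject"] * 6))                   # mixed
```

The third category is what keeps the experimental hypothesis narrow: a child who alternates between readings is classified separately rather than being counted towards (11b).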
Table 4.11, below, compares the performance of each age group on easy, hard, and difficult items, as well as on all TC items combined. The figures listed represent the actual number of target-like responses provided in each condition. Note that, in order to adjust for missing values, the number of target-like responses obtained is at times expressed as the numerator of a fraction with the total number of responses as its denominator.25

25 Overall, there were six instances in which a score for an individual item could not be obtained, three due to procedural error and three due to a subject’s failure to provide a clear judgment of a test/control sentence.

| Grp | Ages | Easy TCs (4 items) | Hard TCs (4 items) | Difficult TCs (4 items) | All TCs (12 items) |
|---|---|---|---|---|---|
| 1 | 3;4 - 4;4 | 17 (38.6%) | 20 (45.5%) | 15/43 (34.9%) | 39.7% |
| 2 | 4;6 - 5;5 | 16/42 (38.1%) | 24/43 (55.8%) | 18 (40.9%) | 45.0% |
| 3 | 5;6 - 6;3 | 19 (43.2%) | 26 (59.1%) | 27 (61.4%) | 54.6% |
| 4 | 6;5 - 7;5 | 35/43 (81.4%) | 34 (77.3%) | 38/43 (88.4%) | 82.3% |

Table 4.11: Total object readings per TC test condition and overall (NB: Percentages reported in the final column have been adjusted to account for any missing values in the individual conditions.)

In Figure 4.12, below, I provide a graphic representation of the results reported in the final column of Table 4.11:

Figure 4.12: Boxplot graph of mean percentage object readings provided by age group on combined TC items (vertical axis: mean % object readings on TCs, plotted as proportions from 0.0 to 1.2; horizontal axis: age group, 3;4 to 4;4, 4;6 to 5;5, 5;6 to 6;3, and 6;5 to 7;5, with N = 11 in each group; outliers labelled 3 and 13)

In Figure 4.12, the solid black line drawn across each box represents the median percentage of object, or target-like, readings obtained for each group (i.e.
the 50th percentile), while the lower and upper edges of each box represent the 25th and 75th percentiles, respectively. Variability in the performance of individual subjects is expressed in terms of the length of the lines (or “whiskers”) that extend from the upper and lower edges of each box. As can be readily observed, the subjects in the third age group (aged 5;6 to 6;3) exhibited the greatest variation in individual scores, ranging from a low of 17% object readings to a high of 92%. However, as the graph also indicates, considerable individual variation was observed in the first and second age groups as well, and it is therefore only among subjects over the age of 6;5 that individual performance becomes clearly more homogeneous. Two outliers are marked in the boxplot graph with circles. These are subjects no. 3, aged 3;8, and no. 13, aged 4;7, each of whom obtained a score that was more than one standard deviation from the mean for their respective age group.26 In the period preceding the experimental study, these two children had been identified by their teachers as being of average academic ability. Following the study, however, I administered the British Picture Vocabulary Scale (BPVS) to all child participants (see discussion in §4.6), and the results revealed that subjects 3 and 13 had each scored in the 99th percentile or higher for their respective age groups; certainly, then, neither could be considered average in terms of vocabulary ability. Nevertheless, as will be further discussed in §4.6, I did not observe any consistent correlation between exceptional performance on the BPVS and uniformly target-like performance on TC items. For example, subject no. 9, aged 4;2, obtained a BPVS score that was also in the 99th percentile for his age group, and yet he provided only 58% object readings on TC items overall.
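The quantities displayed in a boxplot such as Figure 4.12, and the adjustment for missing values used in Table 4.11, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the software or procedure used to draw the figure. The quartiles use the "exclusive" method of Python's `statistics.quantiles`, and packages differ slightly in how they interpolate quartiles. Outliers are flagged by the common 1.5 × IQR (Tukey) rule, which is assumed here; the text above describes subjects 3 and 13 in terms of their distance from the group mean instead. The function names are hypothetical and the eleven scores are invented for illustration, not taken from the study.

```python
import statistics

def percent_target_like(correct, answered):
    """Percentage of target-like responses out of the responses actually obtained.

    Missing items are excluded from the denominator, as in the fractions of Table 4.11.
    """
    return 100 * correct / answered

def boxplot_summary(scores):
    """Median, quartiles, whisker ends and Tukey outliers for one age group's scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed 1.5 x IQR rule
    inside = [s for s in scores if low_fence <= s <= high_fence]
    outliers = [s for s in scores if s < low_fence or s > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),  # whiskers stop at the last non-outlier
        "outliers": outliers,
    }

# Group 1's pooled counts from Table 4.11: 52 target-like responses out of 131 obtained.
print(round(percent_target_like(17 + 20 + 15, 44 + 44 + 43), 1))  # 39.7

# Hypothetical percentages of object readings for eleven children (not study data).
summary = boxplot_summary([17, 25, 25, 33, 33, 33, 33, 42, 42, 42, 75])
print(summary["median"], summary["outliers"], summary["whiskers"])  # 33.0 [75] (17, 42)
```

In the invented group, a single child answering nine of twelve items with object readings (75%) falls above the upper fence and would be drawn as a circle. This mirrors the situation of subject no. 3, although the criterion used in the study may differ.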
In light of this consideration, and given that considerable variability was attested in the individual performance of subjects in this age group (i.e. group 1), I decided not to exclude the data collected from subjects no. 3 and no. 13 from the statistical analysis of my results.

26 Specifically, subject no. 3 provided 75% object readings compared to an age group mean of 39.4%, and subject no. 13 provided 82% compared to an age group mean of 43.9%.

I now look in more detail at performance in each of the three TC test conditions: easy, hard, and difficult. Beginning with easy TCs, Table 4.12, below, reports the total number of object readings obtained by age group for each of the easy test/control items, as well as for all four items combined:

| Grp | Ages | easy1 (CS-OB) | easy2 (TS-OB) | easy3 (CS-SB) | easy4 (TS-SB) | All items (1 to 4) | Items 1, 3, 4 only |
|---|---|---|---|---|---|---|---|
| 1 | 3;4 - 4;4 | 4 (36.4%) | 4 (36.4%) | 4 (36.4%) | 5 (45.5%) | 38.6% | 39.4% |
| 2 | 4;6 - 5;5 | 5/10 (50%) | 3/10 (30%) | 5 (45.5%) | 3 (27.3%) | 38.1% | 40.6% |
| 3 | 5;6 - 6;3 | 4 (36.4%) | 3 (27.3%) | 6 (54.6%) | 6 (54.6%) | 43.2% | 48.5% |
| 4 | 6;5 - 7;5 | 10/10 (100%) | 5 (45.5%) | 10 (90.9%) | 10 (90.9%) | 81.4% | 93.8% |
| 5 | Adult | 11 (100%) | 8 (72.7%) | 11 (100%) | 11 (100%) | 93.2% | 100% |

Table 4.12: Total object or target-like readings of ‘easy’ TCs

In looking for differences in between-group performance, I first analysed the results reported in the penultimate column of Table 4.12, using actual counts rather than percentages for the purposes of statistical analysis. Because of the relatively small number of subjects per age group, and because the data reported in Table 4.12 do not meet all the criteria for the use of a one-way ANOVA,27 I opted to use a Kruskal-Wallis test to analyse between-group performance.
This is a nonparametric alternative to one-way ANOVA that imposes no special requirements on the distribution of the data.28

27 For example, the distribution of values within each sample of data is not uniformly normal.

For groups 1 to 3, the results of the test revealed no significant difference between the numbers of object readings obtained for each group (χ²(2, N=33) = .138, p = .933), suggesting that subjects below the age of 6;3 performed as a single group with respect to easy TCs. When groups 1 to 5 were compared, however, a significant difference was observed (χ²(4, N=55) = 30.514, p < .001), and the same finding obtained when the adult controls were removed from the analysis and only child groups 1 to 4 were compared (χ²(3, N=44) = 15.873, p = .001). Thus, while subjects in the first three age groups performed as a single population with respect to easy items, these subjects provided significantly fewer target-like readings than either those in the oldest child group or the adult control group. Finally, I used a Mann-Whitney test to compare the performance of age group 4 and adult group 5, but the difference failed to reach significance (U(11,11) = 37.000, p = .133). The performance of subjects above the age of 6;5 was thus consistent with that of adults on easy TCs.

I next compared performance by each age group on individual easy items. Using Cochran’s Q test for the distribution of a dichotomous variable across several related samples, where 0 = subject reading and 1 = object reading, I found no significant difference in the performance of groups 1 to 3 on any particular easy TC (Grp. 1: Q(3) = .333, p = .954; Grp. 2: Q(3) = 2.143, p = .543; Grp. 3: Q(3) = 3.000, p = .392). That is, I found no evidence that subjects in any of the first three age groups experienced relatively greater difficulty with a particular easy item or items.
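The between-group statistics reported above, together with the Levene check on equality of variances mentioned in footnote 28, can be sketched in plain Python. This is an illustration under stated assumptions, not the software actually used for the analysis. The helper names are hypothetical. The Kruskal-Wallis H is corrected for ties, which are unavoidable when each child's score is a count from 0 to 4, and it is referred to a chi-square distribution with k - 1 degrees of freedom. Only the Mann-Whitney U statistic is computed, not its exact or normal-approximation p-value. The Levene statistic is the mean-centred variant, which is an assumption, since the text does not say which variant was used. In practice a statistics package would normally be used. SciPy, for instance, provides `scipy.stats.kruskal`, `scipy.stats.mannwhitneyu` and `scipy.stats.levene`; the last of these defaults to the median-centred (Brown-Forsythe) variant, and `center='mean'` gives the version sketched here. The example values are toy data, not scores from Tables 4.12 to 4.14.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank all values jointly (1-based), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df (closed forms)."""
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total, term = 0.0, math.sqrt(x)
    for i in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H, its degrees of freedom and chi-square p-value."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)  # undefined if every value is identical
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)

def mann_whitney_u(a, b):
    """Smaller of the two Mann-Whitney U statistics for samples a and b."""
    ranks = average_ranks(list(a) + list(b))
    u_a = sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2
    return min(u_a, len(a) * len(b) - u_a)

def levene_w(*groups):
    """Mean-centred Levene W: one-way ANOVA F on absolute deviations from group means."""
    devs = [[abs(v - sum(g) / len(g)) for v in g] for g in groups]
    k, n = len(devs), sum(len(d) for d in devs)
    means = [sum(d) / len(d) for d in devs]
    grand = sum(sum(d) for d in devs) / n
    between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, means))
    within = sum((z - m) ** 2 for d, m in zip(devs, means) for z in d)
    return (between / (k - 1)) / (within / (n - k)), (k - 1, n - k)

# Toy data only (not from Tables 4.12 to 4.14).
print(kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9]))
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
print(levene_w([1, 2, 3], [2, 4, 6]))
```

The tie correction matters for data like these: with only five possible scores per child, many ranks are shared, and the uncorrected H would understate the evidence for a difference.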
Since the distribution of subject and object readings was observed to be statistically similar across all four easy TCs, I submit that these findings suggest that, contrary to the predictions contained in Crain and Thornton (1998) (see discussion in §4.3.0.2), the direction of the affirmative response and/or order-of-presentation bias in individual easy items did not exert any detectable influence on children’s choice of a subject or object reading of the TC.

28 This does not imply, of course, that the use of nonparametric tests is requirement-free. For example, use of the Kruskal-Wallis test requires that the samples of data to be compared have equal variances. With respect to the data reported in the penultimate column of Table 4.12, homogeneity of variance for groups 1 to 4 was established through use of a Levene test. A caveat should also be mentioned with regard to the use of nonparametric statistical tests. While these tests require less stringent assumptions about the data to be analysed, they are also considered less powerful than their parametric counterparts. Consequently, a nonparametric test may fail to reveal a significant difference that does in fact exist between two or more compared samples, and this consideration should be kept in mind when interpreting the results reported above.

In the case of age group 4, however, the distribution of object readings was found to differ significantly across the four easy items (Q(3) = 12.000, p = .007). Subsequent statistical testing established that this difference could be attributed solely to performance on a single item, easy 2, for which object readings represented less than 50% of the total responses obtained, as compared to over 90% of the total responses obtained for each of the other three items.
Moreover, the same consideration was found to underlie the significant difference observed in the performance of adult controls across the four easy items (Q(3) = 9.000, p = .029), since adults made three errors on easy 2 but no errors on the remaining three items. With regard to the non-target-like performance of the three adult subjects on easy 2, it is important to note that I found no evidence that these particular adults, or any of my other control subjects, ever assigned a subject reading to a TC. Rather, I observed throughout the study that adult judgments that deviated from expected norms could most often be explained in one of the following two ways: (1) subject inattention to story details, or (2) the subject’s use of general or world knowledge, as opposed to specific story context, when determining an appropriate interpretation of the test sentence. Based on the explanations offered by the three adults in question, I suggest that it is the latter of these two factors that is implicated in their non-target-like performance on easy 2. Given the design of the story that preceded presentation of the test sentence The boy was easy to help, it was anticipated that a true response would be associated with the object reading of the sentence, since in the story the fireman finds it easy to come to the aid of the boy. Consistent with this prediction, some child and adult subjects did explain their target-like judgment of easy 2 in these terms, as illustrated in (12), below:

(12) E(xperimenter): Why was the boy easy to help?
a. “Cause the fireman get the steps.” (female 4;3)
b. “All he (= fireman) needed was a ladder.” (male 6;0)
c. “Because the fireman could easily get into the cage.” (adult subject #19)

For child subjects who judged the test sentence false, it was anticipated that they would explain their non-target-like response in terms of the fact that, according to the story, the little boy was unable to help the fireman open the cage door because he had hurt his legs when he fell into the cage. I recorded nine such explanations, three of which are illustrated in (13), below. (Note that ‘+’ marks a pause or hesitation.)

(13) E: Why was the boy not easy to help?
a. “He couldn’t help cause he was stuck.” (male 3;8)
b. “Because the boy was in the cage and he couldn’t help ++ the fireman.” (female 4;9)
c. “Because umm he had hurt his legs and + and he couldn’t open the gate.” (female 5;3)

Notably, none of the three adults who erroneously judged the test sentence false gave the type of explanation exemplified in (13); instead, they offered explanations suggesting that their interpretation had been influenced by extra-contextual considerations. For example, although the fireman in the story had openly remarked on how little difficulty he had experienced in rescuing the child, stating, “See, I told you it would be no trouble to help you, little boy,” these three adults nevertheless seemed to view the series of actions that the fireman undertook in his rescue attempt as involving a certain degree of difficulty. As one adult female explained, “That’s not easy to help him (= boy) if he (= fireman) can’t get an obvious doorway to get in.” When asked whether she had taken note of the fireman’s own favourable assessment of his efforts, the subject replied, “I heard what he said but (I) still thought it was hard for him to get in. I thought he was just sort of trying to calm the boy down.” Similarly, another female adult explained, “We’re
brought up to believe that firemen – well, we’ve been through this – they always say it’s no trouble regardless of whether it is.” Regrettably, given the relatively limited number of subjects involved in the pilot study, pilot testing failed to reveal this item as problematic, and it was therefore retained for use in the main study. As the atypical performance of adult subjects on easy 2 raises legitimate questions regarding the suitability of this test item, I carefully reviewed the explanations provided by my child subjects, looking for similar patterns of performance. I found two such subjects in age group 4, whose explanations suggest to me that their interpretation of the test sentence could have been influenced by the same types of extra-contextual considerations discussed above. I therefore felt it prudent to re-run my statistical analysis of subject performance on easy items, excluding all of the data collected for easy 2. This reanalysis, however, produced the same pattern of results as that previously reported for all four easy items; that is, groups 1, 2, and 3 still performed as a single population and differed from groups 4 and 5, which similarly performed as a single group.

I now turn to performance on TC items featuring the adjective hard. Table 4.13 reports the total object readings obtained per age group for each of the hard test/control items, as well as for all four items combined:
| Grp | Ages | hard5 (CS-OB) | hard6 (TS-OB) | hard7 (CS-SB) | hard8 (TS-SB) | All items (5 to 8) | Items 5, 7, 8 only |
|---|---|---|---|---|---|---|---|
| 1 | 3;4 - 4;4 | 4 (36.4%) | 7 (63.6%) | 5 (45.5%) | 4 (36.4%) | 45.5% | 39.4% |
| 2 | 4;6 - 5;5 | 5 (45.5%) | 5 (45.5%) | 9/10 (90%) | 5 (45.5%) | 55.8% | 59.4% |
| 3 | 5;6 - 6;3 | 7 (63.6%) | 4 (36.4%) | 7 (63.6%) | 8 (72.7%) | 59.1% | 66.6% |
| 4 | 6;5 - 7;5 | 8 (72.7%) | 5 (45.5%) | 11 (100%) | 10 (90.9%) | 77.3% | 87.9% |
| 5 | Adult | 11 (100%) | 9 (81.8%) | 10 (90.9%) | 11 (100%) | 93.2% | 97% |

Table 4.13: Total object or target-like readings of hard TCs

A statistical analysis of the results reported in the penultimate column of Table 4.13 revealed that, for groups 1 to 3, there was no significant difference in performance on the four combined hard items (Kruskal-Wallis, χ²(2, N=33) = 3.326, p = .190). A significant difference was obtained, however, when the performance of groups 1 to 5 was compared (χ²(4, N=55) = 27.041, p < .001), and also when groups 1 to 4 were compared (χ²(3, N=44) = 11.606, p = .009). Because a subsequent Mann-Whitney test revealed no significant difference in performance between groups 4 and 5 (U(11,11) = 34.000, p = .088), these results parallel those reported for easy items. That is, groups 1 to 3 performed as a single population with respect to hard items and differed from child subjects in group 4 and adult subjects in group 5, who performed as a single group. For groups 1 to 3, a within-group comparison (i.e. a Cochran’s Q test) of performance on individual hard items revealed no significant difference in the distribution of object readings across the four items for any of these groups. Thus, subjects below the age of 6;3 did not experience relatively greater difficulty with any particular hard item or items. Once again, then, the direction of the affirmative response and/or order of
presentation bias did not have an appreciable effect on subject interpretation of specific hard items, pace the predictions made by Crain and Thornton (1998). For group 4, however, a similar comparison did reveal a significant difference (Q(3) = 10.500, p = .015), which subsequent statistical testing located in a contrast between this group’s performance on hard 6 and its performance on the other hard items. Specifically, fewer than half the subjects in group 4 provided target-like responses for hard 6, while such responses averaged 88% of the total responses collected for the other three hard TCs. Additionally, I noted two adult subjects who failed to give a target-like response for this same item.29 Certain changes had been made to the design of hard 6 after pilot testing, although, as in the case of easy 2, some of the problems associated with this item regrettably did not become evident until it was tested on greater numbers of child and adult subjects in the main study. The problems associated with hard 6 differed somewhat from those associated with easy 2, although I must acknowledge the possibility of related complications, given that these were the only two TCs in the study that featured the embedded verb to help. The story that accompanied hard 6, The rabbit was hard to help, first presented a context in which a girl asks a rabbit for help in finding her lost spoon but is unable in turn to help the rabbit get out of his hutch. Test materials for this item were as illustrated in Figure 4.13, below:

Figure 4.13: Materials used for hard 6, ‘The rabbit was hard to help.’

29 As Table 4.13 indicates, there was also one adult subject who gave a non-target-like response to hard 7, The hedgehog was hard to ride. This error, however, was found to be wholly attributable to the subject’s inattention to story details: she believed that the hedgehog was hard for the frog to ride because it had prickly fur. In explaining her non-target-like response, the subject herself caught her mistake (i.e. that the hedgehog was a baby and so had soft fur) and changed her initial judgment of the sentence without any prompting. As previously noted, inattention was one of the primary factors contributing to non-target-like performance by my adult subjects. However, this factor was only intermittently correlated with problematic performance, as I recorded a number of instances in which adult subjects admitted to momentary lapses in attention while TVJ stories were being told yet went on to assign a correct interpretation to the test/control sentence. Therefore, in considering the range of acceptable adult performance, I chose to treat one non-target-like response out of twelve TC items as a reasonable margin of error. In fact, at the completion of the study, I found that adult subjects had on average provided eleven correct responses out of twelve, rather than twelve out of twelve. Finally, I find it interesting that, in my experience, inattention to story details was not a problem that generally affected my child subjects. Child subjects of all ages typically proved able to recount even the most minor details of the stories and would often volunteer this type of information without prompting. I took this behaviour as a strong indicator of subject interest in the task and in the content of the stories themselves.

Ultimately, the rabbit is able to make his own way out of the hutch, and he succeeds in helping the little girl find her spoon. On the object reading of the sentence (i.e.
The rabbitᵢ was hard to help eᵢ), both child and adult subjects were therefore expected to judge the sentence true and to explain their judgment in terms of the difficulty that the little girl experienced in trying to free the rabbit from his hutch. I recorded a number of explanations of this type, which are exemplified in (14), below:

(14) E: Why was the rabbit hard to help?
a. “Because + because the girl couldn’t get the rabbit out.” (male child 6;5)
b. “Because she wasn’t strong enough to open the hutch.” (adult subject 1)

Conversely, I anticipated that child subjects who accessed the subject reading of the sentence (i.e. The rabbitᵢ found it hard PROᵢ to help the girl) would judge the sentence false and explain their judgment in terms of the ultimate success that the rabbit experienced in helping the little girl find her spoon. Explanations of this type are illustrated in (15), below:

(15) E: Why was the rabbit not hard to help?
a. “Umm, the rabbit could help her.” (female 4;8)
b. “Umm he was easy to help + cause he could find it (= spoon) right under this chair.” (male 6;1)

Turning now to the aberrant responses provided by the two adult subjects referenced above, both offered an explanation of their non-target-like response that differed in kind from the type of explanation offered by the children in (15). Specifically, the two explanations in question were as listed in (16), below:

(16) E: Why was the rabbit not hard to help?
a. “It was the little girl that wanted the help + + as opposed to the rabbit.” (subject A24)
b. “Because the rabbit actually got out of there by itself so although she stood on the basket and tried to help, umm +++ I think probably she gave the rabbit the opportunity to sort of help itself, really.” (subject A2)

In (16a), the subject’s response appears to indicate that the little girl’s request for help was, at least for this subject, a more salient event than the rabbit’s request for help. More problematically, in (16b), the relevant consideration instead pertains to whether attempts to help the rabbit should be attributed to the girl or to the rabbit himself. Clearly, in each case, the rabbit is perceived as the grammatical object of the embedded verb, and therefore in neither case is it appropriate to assume that the adult has assigned a subject reading to the sentence. Yet, at least in the second case, I think it is possible that this adult’s interpretation of the sentence might have been something along the lines of It wasn’t hard for the rabbit to help himself get out of the cage. If so, this would certainly be problematic, since the standard syntactic representation of the TC does not permit such an interpretation (cf. *The rabbit was not hard to help himself).

I also noted six children who offered superficially similar explanations to those illustrated in (16a&b).30 For example, two of these children explained their false or non-target-like interpretation of The rabbit was hard to help as follows:

(17) a. Puppet: Why was I wrong? “Cause umm you said the rabbit + the rabbit was too hard to get out.” (female 4;3)
b. E: What were you thinking about when you said Fudge was wrong?
S: “Umm, well, the rabbit umm was quite clever because he jumped up in the air and got hisself [sic] out.” (female 7;4)

With respect to (17a), I think the child’s explanation of her false judgment of the test sentence, in the context of the story presented, only properly accords with an interpretation of the TC along the lines of It was not hard for the rabbit to help himself get out of the cage (i.e. *The rabbit was not hard to help himself). Of course, given that this child has chosen to explain her interpretation of the test sentence by offering a DC, the possibility remains that the meaning of her explanation was The rabbit was not too hard for the girl to help get out of his cage. But since this meaning of the ODC flatly contradicts the story details, I think it is unlikely to be the one that the child had in mind. (As an aside, I find it interesting that this child would choose a DC to explain her interpretation of a TC, given that I argued in §2.4 of Chapter 2 that the base representation of the latter structure most likely includes a projection for a degree constituent, which may remain optionally unfilled.) As for (17b), I submit that the child’s explanation of her false judgment of the TC, The rabbit was hard to help, suggests even more clearly that she has interpreted the TC to mean The rabbit was not hard to help himself (get out of his cage). Finally, I am aware of one additional subject, aged 6;0, who also seems to have accessed a reflexive interpretation of the verb to help, since, when asked who was helping the rabbit in the story, he replied, “The rabbit was helping hisself [sic].”

30 Notably, four of these six subjects were in the oldest age group (i.e. group 4), and these were the only children in this group to give non-target-like judgments of hard 6. Thus this pattern of performance, while potentially problematic for the reasons discussed above, is nevertheless consistent with my statistical analysis of the data, which revealed that child subjects in age group 4 generally performed like adults with respect to hard items.

Therefore, on the basis of the evidence reviewed above, I thought it reasonable to question whether certain child (and, possibly, adult) subjects in the study may have accessed an unintended interpretation of the test sentence, thus leading to the loss of some measure of contextual control in the use of this particular item. Given this concern, I decided to re-run my statistical analysis of between-group performance on hard TCs with hard 6 removed from the calculations. As with my adjusted analysis of performance on easy items, I found that the statistical results obtained for hard TCs remained the same whether or not hard 6 was included. That is, groups 1 to 3 still performed as a single population and differed from groups 4 and 5, which performed as a single group.

Lastly, I look at between-group performance on difficult TCs, which is reviewed in Table 4.14, below:

| Grp | Ages | diff 9 (CS-OB) | diff 10 (TS-OB) | diff 11 (CS-SB) | diff 12 (TS-SB) | All items (9 to 12) |
|---|---|---|---|---|---|---|
| 1 | 3;4 - 4;4 | 4 (36.4%) | 4 (36.4%) | 3 (27.3%) | 4/10 (40%) | 34.9% |
| 2 | 4;6 - 5;5 | 4 (36.4%) | 4 (36.4%) | 5 (45.5%) | 5 (45.5%) | 40.9% |
| 3 | 5;6 - 6;3 | 8 (72.7%) | 5 (45.5%) | 4 (36.4%) | 10 (90.9%) | 61.4% |
| 4 | 6;5 - 7;5 | 10 (90.9%) | 8/10 (80%) | 10 (90.9%) | 10 (90.9%) | 88.4% |
| 5 | Adult | 9 (81.8%) | 10 (90.9%) | 11 (100%) | 10 (90.9%) | 90.9% |

Table 4.14: Total object or target-like readings of difficult TCs
Anderson, University of Cambridge Chapter 4: Experimental Design and Presentation of Results 313 As was reported in the case of both easy and hard, I found no significant difference in between-group performance on difficult when groups 1 to 3 were compared through application of a Kruskal-Wallis test (χ 2 (2, N=33) = 3.770, p < .152).31 Therefore, regardless of the particular tough adjective tested, subjects in the first three age groups performed as a single population. Application of the same test did, however, reveal a significant difference when the performance of groups 1 to 5 was compared (χ 2 (4, N=55) = 27.679, p < .001), and also when groups 1 to 4 were compared (χ 2 (3, N=44) = 18.470, p < .001). A subsequent comparison of groups 4 and 5 through application of a Mann-Whitney test did not reach significance (U (11, 11) = 52.000, p < .606), however. Thus, paralleling the findings reported for both easy and hard TCs, the findings for difficult indicate that children above the age of 6;5 performed statistically like adults. As on hard TCs, adult performance on difficult items was not uniformly target-like. Nevertheless, I do not view this situation as particularly problematic: This is because with the exception of the two errors reported for difficult 9, which could both be attributed to subject inattention, 32 the single error reported for each of the other three difficult items still fell within what I have defined as the acceptable range for adult performance (see the discussion in ftnt. 29). 31 I earlier observed (see ftnt. 28) that use of the Kruskal-Wallis test is standardly based on the assumption that data samples have equal variances, although it does not require equal distribution. In the case of performance on difficult items, however, the results of a Levene test for equality of variances amongst groups, both with and without controls, was negative. 
Consequently, because one of the basic assumptions for use of the Kruskal-Wallis test was not met in the case of the data obtained for difficult, I wish to make clear that the statistical results reported for this set of data may be less reliable than those reported for either easy or hard items.

32 This determination is based on remarks made by the two adult female subjects during follow-up questioning. In the story accompanying TC9, The king was difficult to draw, one scenario involved a king who experienced difficulty in drawing a picture, while the other involved a princess who initially expressed some reservation about her ability to draw a picture of her father, the king, but in the end found the task quite easy. The first subject admitted that she hadn’t been “concentrating very hard” during the story-telling phase of the task and, consequently, could recall only the part of the story in which the princess had expressed concern regarding her ability to draw her father. The second subject explained her positive judgment of the control sentence as follows: “I suppose + I suppose because looking at it (i.e. picture of the king), I’d have a job to draw it.” Her response was therefore consistent with a misinterpretation of TC9 as meaning something like The king would be difficult for anyone to draw. She, like the first subject, later admitted that inattention had played a role in her (erroneous) positive evaluation of the sentence, remarking, “I think my mind went more on the picture than (on) listening to what you were saying.”

I next analysed within-group performance across each of the four difficult items. For age groups 1 and 2, a Cochran’s Q test revealed no significant difference in group performance, and the same held true for groups 4 and 5.
A significant difference in the percentage of object readings was, however, detected for group 3 (Q(3) = 13.000, p < .005), which was subsequently traced to the contrast between items 11 and 12. These two items differ in the direction of the affirmative response bias, with difficult 11 favouring the subject or non-target-like reading and difficult 12 the object reading, but share a similar order of presentation bias in favour of the subject reading. In the case of item 11, which had a double bias towards the subject reading, subject readings did account for a greater proportion of responses, but only by a relatively narrow margin (i.e. 55% subject readings versus 45% object readings). In contrast, children in the same age group gave object readings of the mixed-bias difficult 12 in 90% of cases. Nevertheless, I am reluctant to assign any great importance to this particular finding, given that this is the only instance I observed in which a between-item comparison reached statistical significance, and given that, once again, I am unable to draw any clear correlation between the direction of the affirmative response and order of presentation biases and the observed pattern of subject response.

In closing this section, I analyse overall TC performance, both within and between age groups. I used a Friedman test to compare within-group performance across combined easy, combined hard, and combined difficult items in order to determine whether subjects experienced any relatively greater difficulty with a particular tough adjective or adjectives. The results failed to reach significance, regardless of whether the potentially problematic items easy 2 and hard 6 were retained or removed from the analysis. Thus, it appears that subjects in the study did not experience any relatively greater difficulty with the interpretation of a particular tough adjective.
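Both the Kruskal-Wallis statistics and the Cochran’s Q statistic reported above are referred to a chi-square distribution, with k − 1 degrees of freedom for k groups or items. The following Python sketch is offered only as an aid to the reader; it is not the software used in the original analysis, and the helper names chi2_sf and cochrans_q are hypothetical. It computes the upper-tail chi-square probability from the closed-form series for integer degrees of freedom, and Cochran’s Q from a subjects-by-items table of 0/1 scores (assumed here to code 1 for an object reading and 0 for a subject reading). Applied to the reported statistics, it gives p of about .152 for χ²(2) = 3.770, values below .001 for the two larger Kruskal-Wallis statistics, and about .0046 for Q(3) = 13.000, in line with the figures given above. The four-child data set is invented purely to show the computation and is not drawn from the study.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with a
    positive integer number of degrees of freedom, via the closed-form series."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #         * sum_{i=1}^{(df-1)/2} x^((2i-1)/2) / (1 * 3 * ... * (2i-1))
    series, term = 0.0, math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        series += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * series


def cochrans_q(rows: Sequence[Sequence[int]]) -> float:
    """Cochran's Q for a subjects (rows) x items (columns) table of 0/1 scores:
    Q = (k - 1) * (k * sum(C_j^2) - N^2) / (k * N - sum(R_i^2)),
    with item totals C_j, subject totals R_i and grand total N."""
    k = len(rows[0])
    if any(len(r) != k for r in rows):
        raise ValueError("every subject needs a score for every item")
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    denom = k * n - sum(r * r for r in row_totals)
    if denom == 0:
        raise ValueError("Q is undefined if every subject scores all 0s or all 1s")
    return (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom


if __name__ == "__main__":
    # Statistics as reported in the text (label, statistic, degrees of freedom).
    reported = [
        ("Kruskal-Wallis, difficult, groups 1-3", 3.770, 2),
        ("Kruskal-Wallis, difficult, groups 1-5", 27.679, 4),
        ("Kruskal-Wallis, difficult, groups 1-4", 18.470, 3),
        ("Cochran's Q, difficult items, group 3", 13.000, 3),
    ]
    for label, stat, df in reported:
        print(f"{label}: chi2({df}) = {stat:.3f}, p = {chi2_sf(stat, df):.4f}")

    # Invented 0/1 scores (1 = object reading) for four children on four items;
    # NOT the study's data, only an illustration of the computation.
    toy = [[0, 0, 0, 1], [0, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 1]]
    q = cochrans_q(toy)
    print(f"toy example: Q({len(toy[0]) - 1}) = {q:.3f}, p = {chi2_sf(q, len(toy[0]) - 1):.4f}")
```

With only two items, Cochran’s Q reduces to McNemar’s statistic, which offers a simple check on the implementation.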
My finding that no tough adjective posed relatively greater difficulty therefore contrasts with the results reported by McKee (1997a), who observed an overall interaction between age group and adjective type. However, as discussed in §4.2.0.0, I believe there is reason to question the reliability of some of the data obtained by McKee, since these included responses collected for TC items containing adjectives whose meaning was unknown to the subject.

With respect to the between-group performance of my subjects, recall that I reported in the previous section that for each of the three TC test conditions (i.e. easy, hard, difficult), there was no significant difference observed when the performance of children in groups 1, 2 and 3 was compared. The performance of children in group 4, however, was observed to differ significantly from that of children in the younger three age groups and to match that of the adult controls. As anticipated, these condition-specific findings were confirmed when I made a final comparison of age groups in terms of overall performance on the twelve TC items. This comparison was performed using the full set of twelve items, as well as a reduced set with easy 2 and hard 6 removed. With only one exception, the results confirmed those reported earlier for the easy, hard, and difficult conditions. The single exception concerns the performance of age groups 4 and 5, for whom the difference in overall TC performance was found to approach significance according to a Mann-Whitney test (U(11, 11) = 31.500, p < .056). However, this result applies only when the full set of TC items is considered; when easy 2 and hard 6 are removed from the comparison set, no significant difference is obtained (U(11, 11) = 41.000, p < .217).
Thus, given the concerns raised earlier about the suitability of these two particular experimental items, I prefer to accept the latter of these two findings as the more reliable.

4.5.0.1 Analysis of individual performance on the TC

In this section, I analyse the performance of individual subjects on TC items according to the three-way classification first proposed by Cromer (1970), which distinguishes Primitive-rule Users, Intermediates, and Passers. As discussed in §3.2.1.0 of Chapter 3, Primitive-rule (P-R) Users are so named because these children are hypothesized to interpret the TC by means of a surface structure heuristic or primitive rule, which treats the matrix subject DP as co-referential with the subject of the embedded infinitive verb. Thus, the criterion for classification as a P-R User is that the child should consistently interpret the TC in a non-target-like manner. Intermediates, in contrast, are defined as those children who provide mixed target-like and non-target-like readings of TCs, and Passers as those who provide target-like readings only.

One of the more notable findings of my investigation of children’s comprehension of the TC was that there were no subjects in my study who could be classified as P-R Users according to Cromer’s original criterion of performance. That is, if I follow Cromer in defining P-R Users as those subjects who gave no target-like readings of TC items, then none of my forty-four subjects can be so classified. When analysing the performance of my adult subjects on TCs, however, I earlier noted that I considered one non-target-like response per twelve items to be a reasonable margin of error for these subjects.
Accordingly, if I revise the criterion for P-R Use to allow my child subjects the same margin of error,33 I still, rather surprisingly, observe only one child in the study, a male aged 3;8, who can be classified as a P-R User. It is perhaps misleading, however, to classify subjects according to performance on all twelve TC items when easy 2 and hard 6 have been identified as problematic. Therefore, I re-analysed the data, excluding responses collected for easy 2 and hard 6. Using Cromer’s criterion of no target-like responses, the results revealed that, once again, only the single male subject referenced above could be classified as a P-R User. And even using the more liberal criterion of allowing one target-like response out of the ten remaining items, I still found only three subjects who met this criterion: subject no. 4 (already referenced), subject no. 22, a male aged 5;4, and subject no. 25, a female aged 5;8. Thus, even according to the most liberal criterion I was willing to accept, P-R Users comprised less than 10% (3/44, or 6.8%) of the forty-four children involved in the study. With respect to Intermediates and Passers, I again first classified subjects according to Cromer’s strict criteria, using the full set of twelve TC items as a basis for analysis. I found only two subjects, no. 36, a male aged 6;8, and no. 38, a female aged 6;10, who could be classified as Passers, having provided only target-like responses.

33 I note that in a later study, Cromer (1983a:313) actually proposed a similar revision of his original criteria for classification of performance. Allowing, as I have here, for the possibility of occasional inattention to the task, he defined P-R Users as those providing one target-like response or less out of ten possible and Passers as those providing no more than one non-target-like response out of ten.
And since, as earlier noted, all subjects in the study provided at least one target-like response, the remaining forty-two subjects (i.e. 96%) would thus obligatorily be classified as Intermediates according to Cromer’s (1970) original criteria. On the other hand, if I adopt the more relaxed criterion of allowing one target-like response per P-R User and one non-target-like response per Passer, and additionally use only the reduced set of ten TCs as the basis for analysis, I find a somewhat more balanced breakdown of subject performance, with three subjects performing as P-R Users, thirty as Intermediates, and eleven as Passers. Notably, under either of the analyses discussed above (i.e. Cromer’s criteria plus full set of items, as opposed to relaxed criteria plus reduced set of items), Intermediates still comprise the majority of my subjects (i.e. 68% according to the relaxed criteria), and thus a sizeable number of my subjects were observed to provide both target-like and non-target-like readings of the TC. Interestingly, this type of performance was not associated with a particular age group, since Intermediates were found in all four of the child groups I tested and ranged in age from 3;4 to 7;3. Passers, too, were represented in all age groups in my study, with the exception of group 1, and ranged in age from 4;7 to 7;4. The age range was not so wide for P-R Users, extending from 3;8 to 5;8; nevertheless, I found it somewhat surprising that the P-R Users were not confined to the youngest group but instead included two children over the age of five. The results thus far reported are comparable with those reported by McKee (1997a), who further observes (p.c.) that there were no subjects out of the sixty-four included in her study who could be classified as P-R Users, according to Cromer’s original criteria. 
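The strict and relaxed criteria discussed above differ only in how many exceptional responses a child is allowed. As a minimal sketch, assuming that each child’s score is simply the number of target-like readings out of the items analysed, the classification can be written as follows in Python; the function classify and its tolerance parameter are illustrative names, not Cromer’s. A tolerance of 0 corresponds to Cromer’s (1970) original criteria and a tolerance of 1 to the relaxed criteria (cf. ftnt. 33).

```python
def classify(target_like: int, n_items: int, tolerance: int = 0) -> str:
    """Cromer-style three-way classification of one child's TC responses.

    target_like: number of target-like (object) readings the child gave.
    n_items:     number of TC items scored for that child (e.g. 12 or 10).
    tolerance:   exceptional responses allowed; 0 = Cromer (1970), 1 = relaxed.
    """
    if not 0 <= target_like <= n_items:
        raise ValueError("target_like must lie between 0 and n_items")
    if target_like <= tolerance:
        return "P-R User"          # (almost) no target-like readings
    if n_items - target_like <= tolerance:
        return "Passer"            # (almost) no non-target-like readings
    return "Intermediate"          # a genuine mix of both readings


if __name__ == "__main__":
    # Relaxed criteria on the reduced set of ten items.
    for score in (1, 5, 9):
        print(score, classify(score, n_items=10, tolerance=1))
```

With very small item sets (n_items no greater than twice the tolerance) the two outer categories would overlap, and the sketch resolves the overlap in favour of P-R User; this situation does not arise with the ten- or twelve-item sets used here.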
McKee’s observation is a striking one, I think, given the sizeable number of subjects involved in her study and the fact that her subjects included children considerably younger (e.g. 1;11) than those involved in any other study reviewed in Chapter 3. Specifically, according to Cromer’s original criteria, McKee reports that fifty-three of her subjects could be classified as Intermediates and eleven as Passers. Even adopting the more relaxed criterion of allowing one exceptional response per either P-R User or Passer, there was still only one child in McKee’s study, aged 2;0, who could be considered a P-R User, with forty-nine of her remaining subjects performing as Intermediates and fourteen as Passers. Therefore, similar to the findings reported in my own study, the majority of McKee’s subjects (i.e. 49/64 or 76.6%) were observed to have provided both target-like and non-target-like interpretations of the TC. Of course, it is not possible to compare McKee’s results directly with my own, since the two studies involved subjects who do not completely overlap in age, as well as different methodologies. Nevertheless, I believe the similarities observed here are still of great interest, particularly when the two sets of findings are compared with those reported in certain earlier studies. For example, as discussed in Chapter 3, Cromer (1970) tested forty-one children between the ages of 5;3 and 7;5 and reported that approximately 41% of these subjects could be classified as P-R Users. Holding the age range constant between 5;3 and 7;5, these results can be directly contrasted with those obtained in both Kessel (1970) and my own study (Anderson 2002a,b), since the latter two studies included children in this same age range.
As Table 4.15 indicates, when individual subject performance in the three studies is compared in this manner, some rather striking differences emerge.34 (Note that in the interest of providing a more accurate comparison, I have used Cromer’s original criteria for classification of subjects and have reported my own findings only with respect to the reduced set of ten TC items.)

34 For Kessel (1970), the figures reported in Table 4.15 have been calculated on the basis of the general findings reported in his study, since he did not provide a specific breakdown of subject performance according to the three-way classification employed here.

Study | Cromer (1970) | Kessel (1970) | Anderson (2002a,b)
P-R Users | 17 (41.5%) | 1 (5%) | 0
Intermediates | 19 (46.3%) | 8 (40%) | 20 (83.3%)
Passers | 5 (12.2%) | 11 (55%) | 4 (16.7%)
No. of subjs. | 41 | 20 | 24
No. of items | 4 | 4 | 10
Lexical items tested | easy, hard, fun, tasty | easy, hard, impossible | easy, hard, difficult
Methodology | act-out task | Piagetian interview | TVJ task
Table 4.15: Comparison of the individual performance of subjects between the ages of 5;3 and 7;5 in Cromer (1970), Kessel (1970), and Anderson (2002a,b)

Most notable amongst the differences observed in Table 4.15 is the considerable discrepancy between the number of P-R Users reported in Cromer, as compared to either Kessel or Anderson. When considered in conjunction with the similarly low number of P-R Users observed in McKee (1997a), I submit that the evidence reviewed in Table 4.15 provides reason to question whether Cromer’s subjects were appropriately classified. In particular, I think it is likely that a number of the subjects that Cromer classified as P-R Users may in fact have been Intermediates, with their linguistic abilities thus underestimated.
One factor that I believe may have contributed to the possible misclassification of subject ability in Cromer concerns his choice of lexical items for use in his test materials. In particular, I noted earlier (see §3.2.1.0 of Chapter 3) that two of the adjectives he classified as tough or O-type, fun and tasty, do not meet my own criteria for inclusion in the tough class. Thus, I submit that not all of the data Cromer collected may be equally reliable as regards the acquisition of the TC.

4.5.0.2 Explanations of judgments of TC items

I have thus far resisted describing the TC results reported here as representing chance, above-chance, or below-chance performance on the part of my subjects, although I recognize that this is standard practice in the description of experimental behaviour. My reluctance to use these descriptive terms is justified, I think, by the evidence to be presented in this section, which was collected during follow-up questioning of child subjects (see also the supplementary evidence cited in Appendix II). The reader will recall that, according to the methodology employed in the study, subjects were typically asked to provide some explanation for their judgment of the truth-value of a particular test/control sentence; for example, after offering her true or false judgment of a sentence, the subject would be asked a question such as “So why was the monkey not easy to teach?” Although follow-up questions were not uniformly administered to all subjects, both the quality and the quantity of the data that I was able to collect lead me to believe that, as a general rule, my subjects did not use guessing strategies in assigning an interpretation to the TC.
Instead, anticipating the discussion in the final chapter of this thesis, I contend that children enter a relatively prolonged developmental period during which they have access to two interpretations of the TC, which I have termed a subject (i.e. non-target-like) reading and an object (i.e. target-like) reading. As noted earlier, follow-up questions were not asked of all subjects, nor did these questions necessarily take the same form in each instance. This is because I was sensitive to the fact that extensive questioning, especially of younger subjects, could prove overly taxing. A further consideration was that I was concerned to limit the length of individual testing sessions to no more than twenty-five minutes for school-age subjects and twenty minutes for those of pre-school age, while still allowing for the administration of a pre-determined and optimum number of TVJ items. Accordingly, I felt that random administration of follow-up questions would be sufficient to ensure subject attention. Because TVJ stories were designed to provide two distinct contexts in which a sentence could be judged either true or false, I anticipated that children would justify their judgment of the test/control sentence by citing specific story events associated with their chosen interpretation. I was pleased, then, to observe that my child subjects generally did provide explanations of just this type, and on occasion even did so without any external prompting. Table 4.16, below, provides an overall summary of subject performance in what I will henceforth term the post-judgment phase of the TVJ task. For each TC item, the table first lists the total number of subjects tested on the item, then the number who were asked (i.e. prompted) to provide an explanation of their judgment of the sentence, next, the number who were not asked to do so, and, lastly, the number who provided a spontaneous or unprompted explanation.
TC | Total subjs(a) | Prompted | Not asked | Unprompted
easy 1 | 42 | 24 (57.1%) | 15 (35.7%) | 3 (7.1%)
easy 2 | 43 | 26 (60.5%) | 13 (30.2%) | 4 (9.3%)
easy 3 | 44 | 35 (79.5%) | 4 (9.1%) | 5 (11.4%)
easy 4 | 44 | 23 (52.3%) | 19 (43.2%) | 2 (4.5%)
hard 5 | 44 | 39 (88.6%) | 1 (2.3%) | 4 (9.1%)
hard 6 | 44 | 33 (75%) | 8 (18.2%) | 3 (6.8%)
hard 7 | 43 | 34 (79.1%) | 5 (11.6%) | 4 (9.3%)
hard 8 | 44 | 36 (81.8%) | 3 (6.8%) | 5 (11.4%)
diff 9 | 44 | 28 (63.6%) | 8 (18.2%) | 8 (18.2%)
diff 10 | 43 | 24 (55.8%) | 10 (23.3%) | 9 (20.9%)
diff 11 | 44 | 33 (75%) | 6 (13.6%) | 5 (11.4%)
diff 12 | 43 | 30 (69.8%) | 11 (25.6%) | 2 (4.7%)
Table 4.16: Number and type of explanations offered for TC items
(a) Note that percentages reported in the table have been adjusted to reflect missing data values.

As the figures in Table 4.16 indicate, the percentage of subjects who were asked to explain their judgment varied from a low of 52.3% in the case of easy 4 to a high of 88.6% in the case of hard 5. Therefore, for each item, more than half of the subjects were asked to explain their true/false judgment of the TC. Table 4.17, below, provides a breakdown of the types of explanations offered by child subjects according to whether or not the child’s response could be viewed as providing support for their chosen interpretation of a particular TC.35 This determination was made according to the following considerations. If the child’s explanation referenced the story context that was intended to support their chosen interpretation of the sentence, then this was taken as an appropriate (i.e. supportive) explanation. If, however, the child provided one interpretation of the test/control sentence but then referenced story events more appropriately associated with the opposite interpretation, then this was classified as a non-supportive explanation.
Additionally, there were instances in which the precise meaning of the child’s explanation could not be determined with any certainty; these responses were accordingly coded as indeterminate. Finally, there were certain cases in which children provided explanations that seemed to support an understanding of story events that differed from the interpretation shared by the majority of adult and child subjects; these responses are listed in the column entitled Alternative explanation, below:

TC item | Total explanations | Supports judgment of sentence | Indeterminate explanation | Alternative explanation | Supports opposite interpretation
easy 1 | 20 | 80% | 10% | 5% | 5%
easy 2 | 28 | 75% | 11% | 7% | 7%
easy 3 | 38 | 97% | 3% | N/A | N/A
easy 4 | 20 | 70% | 15% | 10% | 5%
hard 5 | 41 | 90% | 5% | N/A | 5%
hard 6 | 33 | 52% | 9% | 36% | 3%
hard 7 | 36 | 69% | 11% | N/A | 20%
hard 8 | 39 | 69% | 23% | 8% | N/A
diff 9 | 35 | 88% | 6% | N/A | 6%
diff 10 | 32 | 69% | 16% | 3% | 12%
diff 11 | 35 | 100% | N/A | N/A | N/A
diff 12 | 29 | 79% | 7% | 4% | 10%
Table 4.17: Summary of the types of explanations offered for judgments of individual TC items

The reader will note that three of the items listed in Table 4.17 are associated with a relatively high rate of supportive explanations, namely easy 3, hard 5, and difficult 11, the first two of which merit further discussion here.

35 The total number of explanations provided for each TC item was determined as follows. The number of subjects who failed to respond to the prompt was deducted from the total number of subjects asked. This figure was then increased by the number of subjects who provided an unprompted explanation of their judgment, to arrive at a final total per item. Easy 1 was associated with the highest rate of non-response to the prompt, with seven out of twenty-four children, or 29% of those asked, failing to respond. On average, however, the rate of non-response per item was less than 10%.
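The procedure described in ftnt. 35 amounts to a simple identity: total explanations = subjects asked − subjects who did not respond + unprompted explanations. Rearranging it recovers the number of non-responders for every item from Tables 4.16 and 4.17. The Python sketch below does this, assuming that the Prompted column of Table 4.16 gives the number of subjects asked; apart from the seven non-responses reported for easy 1, the per-item non-response counts it yields are inferred rather than reported, and the function names are illustrative only. It reproduces the range of 52.3% to 88.6% for the proportion of subjects asked, the 29% non-response rate for easy 1, and a mean non-response rate per item of just under 10%.

```python
from typing import Dict, Tuple

# Per TC item: (subjects tested, subjects asked/prompted, unprompted explanations)
# from Table 4.16, followed by total explanations from Table 4.17.
ITEMS: Dict[str, Tuple[int, int, int, int]] = {
    "easy 1": (42, 24, 3, 20), "easy 2": (43, 26, 4, 28),
    "easy 3": (44, 35, 5, 38), "easy 4": (44, 23, 2, 20),
    "hard 5": (44, 39, 4, 41), "hard 6": (44, 33, 3, 33),
    "hard 7": (43, 34, 4, 36), "hard 8": (44, 36, 5, 39),
    "diff 9": (44, 28, 8, 35), "diff 10": (43, 24, 9, 32),
    "diff 11": (44, 33, 5, 35), "diff 12": (43, 30, 2, 29),
}


def total_explanations(asked: int, no_response: int, unprompted: int) -> int:
    """Footnote 35: subjects asked, minus non-responders, plus unprompted explanations."""
    return asked - no_response + unprompted


def implied_non_response(asked: int, unprompted: int, total: int) -> int:
    """The same identity rearranged to recover the number of non-responders."""
    return asked + unprompted - total


prompted_pct = {item: 100 * v[1] / v[0] for item, v in ITEMS.items()}
non_response_rate = {item: implied_non_response(v[1], v[2], v[3]) / v[1]
                     for item, v in ITEMS.items()}

if __name__ == "__main__":
    low = min(prompted_pct, key=prompted_pct.get)
    high = max(prompted_pct, key=prompted_pct.get)
    print(low, round(prompted_pct[low], 1))    # easy 4 52.3
    print(high, round(prompted_pct[high], 1))  # hard 5 88.6
    worst = max(non_response_rate, key=non_response_rate.get)
    mean_rate = sum(non_response_rate.values()) / len(non_response_rate)
    print(worst, round(100 * non_response_rate[worst], 1))  # easy 1 29.2
    print(round(100 * mean_rate, 1))                          # 9.7
```

The mean here is an unweighted average of the twelve per-item rates, which is one reasonable reading of “on average … per item” in ftnt. 35.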
(Samples of post-judgment data collected for difficult 11, as well as for various other test items, can be found in Appendix II.) When analysing the post-judgment data that I collected for easy 3 and hard 5, I noted a similar complication in each case, namely that one particular explanation was offered in support of both subject and object readings of the item. Taking the example of easy 3, The fairy was easy to fight, the story accompanying presentation of this control sentence involved a soldier who challenged a fairy to a fight, believing her to be armed only with a stick. In fact, the stick was a magic wand, and the fairy was able to prevail in the fight with the aid of her magic wand and with the advantage conferred by her ability to fly. I recorded a number of explanations that uniquely supported either a true (i.e. non-target-like) or false (i.e. target-like) judgment of easy 3. For example, some children favouring a non-target-like interpretation of the sentence explained that the sentence was true because the fairy flew up into the air and knocked the soldier to the ground, while, conversely, some explained their target-like judgment of the item with appropriate reference to the soldier’s inability to use his sword in fighting the fairy. I also recorded eleven explanations of easy 3 (or 28.9% of the total) which shared a similar form and which were invoked in support of both true and false judgments of the control sentence. All involved some non-specific reference to the fairy’s magic powers, a story element that does in fact provide credible justification either for the ease with which the fairy fought the soldier or for the difficulty that the soldier experienced in fighting her.
Although this state of affairs is regrettable from a design standpoint, I nevertheless chose to code all eleven of these explanations as supportive, regardless of whether the child’s original judgment of the sentence was target-like or non-target-like. This is, first, because it would be inaccurate to code them as indeterminate, and, second, because I observed that one of my adult subjects similarly referenced the fairy’s magic powers when explaining her judgment of easy 3.36 Thus, even though the children’s responses were not as fully articulated as this adult subject’s response, I felt it was reasonable to infer that these eleven children had in mind an explanation similar to that offered by the adult subject. I furthermore note that even if the eleven explanations referenced above are deducted from the total number offered for easy 3, the rate of supportive responses is still quite close to the overall item average of 70%. For this reason, I am not unduly concerned about the lack of homogeneity in this particular set of responses.

36 Specifically, this response was offered by subject no. A11, a 38-year-old female, who answered the prompt, “Tell me why it’s wrong,” with the explanation, “Umm because she had magic powers and they allowed her to fight abnormally.”

Turning now to hard 5, I similarly observed that one particular explanation was offered in support of both target-like and non-target-like readings of this item. This explanation, which was offered by some adult as well as child subjects, focused on one very salient story event, in which a baby elephant used her trunk to attack a tiger and flip him over. This event does serve as reasonable justification for either a true or a false reading of the test sentence, The tiger was hard to fight. On a non-target-like interpretation, the sentence is true because the tiger had difficulty matching the elephant’s superior fighting skills; on a target-like interpretation, it is false because the elephant experienced little difficulty in fighting the tiger. Regrettably, this particular weakness in the design of the item did not become evident until well into the main study, at which point I decided that it would be preferable to retain the item and report the flaw, rather than replace it.

There were thirteen child subjects who included a demonstration of the elephant flipping the tiger over in their explanation of hard 5, in support of either a true or a false judgment of the sentence. In five of these thirteen cases, subjects had been prompted by the experimenter to demonstrate their explanation physically (e.g. “Show me why the tiger was/was not hard to fight”) when they failed to respond to the first follow-up question, and so it is perhaps not surprising that they chose to respond by re-enacting this particularly salient event. Again, however, I would argue that the post-judgment data collected for this item are not particularly problematic, given that twenty-four subjects still provided an explanation that uniquely supported either a true or a false judgment of this item. In (18), below, I offer a sampling of these sorts of explanations:

(18) True or non-target-like reading of hard 5, ‘The tiger was hard to fight’
a. “He (= tiger) hasn’t got one of these.” Subject points to elephant’s trunk. (male 4;9)
b. “Cause the elephant was + strong to fight.” (male 5;1)
False or target-like reading of hard 5, ‘The tiger was hard to fight’
c. “The tiger didn’t win.” (female 5;10)

Next, I turn to the figures reported in Table 4.17 for performance on hard 6, The rabbit was hard to help.
Recall that hard 6 was identified in §4.5.0.0 as a potentially problematic item from the standpoint of its design. Consistent with the concerns raised in that section, this particular item is associated with the greatest number of explanations coded as alternative rather than supportive. A number of these explanations are of the type discussed earlier, which I have suggested may involve a third, unintended interpretation of hard 6, specifically, one based on a reflexive interpretation of the embedded infinitive verb. The data reported in Table 4.17 thus appear to reflect accurately the problematic nature of this item. Nonetheless, with the single exception of hard 6, the figures in Table 4.17 indicate that the percentage of explanations coded as alternative or idiosyncratic was relatively low (10% or less) for all other items. Since atypical understanding of the TVJ stories can be seen as a rather restricted phenomenon, I think it is reasonable to consider aberrant performance of this type to be within the limit of acceptable experimental “noise.” Accordingly, in the discussion that follows, I will offer examples of alternative explanations only when I deem these to be of particular interest. Finally, before turning to consideration of post-judgment data that are more representative of the typical behaviour of my subjects, it will be helpful by way of comparison to provide some examples of the types of explanations that I coded as ‘indeterminate.’ These are provided in (19) below:

(19) E: What’s wrong with that?
S: “Umm because umm umm that’s not the same and that’s wrong and that doesn’t mean + mean it.” (female 3;11)
E: So why was the monkey easy to teach?
S: “Cause he was magic.” (female 4;0)
P(uppet): You tell me why I’m wrong.
S: “Cause you couldn’t + think right.” (male 4;6)

The examples in (19) illustrate that explanations classified as indeterminate usually involved some reference to the puppet’s inability to evaluate the story events properly or some objection to the form of the puppet’s statement, which served as the test/control sentence. In other cases, such as the second example above, there was simply not enough information to determine whether a particular explanation supported the child’s judgment of the test sentence or not. The examples in (19) also appropriately reflect that subjects in age group 1 (ages 3;4 to 4;4) proved the richest source of explanations that could be classified as indeterminate. As children in this age group also provided the fewest explanations of their judgments overall, I think this state of affairs more likely reflects the fact that the youngest subjects lacked the metalinguistic skills of the older children, rather than the fact that they possessed lesser grammatical competence. This claim is further supported, I submit, by the numerous examples I collected of explanations offered by younger subjects that effectively served as truncated versions of adult explanations. Before leaving the topic of indeterminate responses, I should also note that when a child provided such a response, the normal procedure was for me to prompt the child to clarify her explanation further. However, consistent with the point made above about the lesser metalinguistic abilities of children in age group 1, I observed that, in general, only subjects over the age of 4;5 proved able to comply fully with such a request. In the remainder of this section, I will focus on explanations offered in connection with three specific items, easy 4, hard 7, and difficult 10.
I have chosen these three items to illustrate typical behaviour in the post-judgment phase of the experiment, since together they provide a cross-section of item types in terms of the specific adjective used, as well as in terms of affirmative-response and order-of-presentation biases. In the case of the first item, easy 4, The spaceman was easy to draw, the “true” or affirmative response was associated with a target-like interpretation of the sentence, while the order of presentation of story events favoured the opposite interpretation. For the second item, hard 7, The hedgehog was hard to ride, both the affirmative response and the order of presentation of story events favoured the non-target-like interpretation of the sentence. These biases were reversed in the case of difficult 10, The dog was difficult to teach, with both the affirmative response and the order of presentation of story events favouring the target-like reading of the sentence.

Beginning with easy 4, Figure 4.14 below illustrates the test materials used for this item, and Table 4.18 briefly reviews the contextual information associated with the two interpretations of the sentence:

Figure 4.14: Materials used for easy 4, ‘The spaceman was easy to draw.’

Subject reading - False: It was not easy for the spaceman to draw a picture because he couldn’t see properly through his space helmet and couldn’t remove the helmet for safety reasons.
Object reading - True: It was easy for the little boy to draw a picture of the spaceman.
Table 4.18: Story contexts for easy 4, ‘The spaceman was easy to draw.’

For a true or target-like judgment of easy 4, a typical adult explanation was as in (20), below:

(20) “I was thinking, yes, the little boy found him easy to draw cause he just drew the picture and it’s a good picture of him.” (female subject A18)
However, another explanation offered by one of my adult subjects, illustrated in (21), below, appears problematic in comparison:

(21) “I’m going to say true but the only reason I’m going to say true is because in your story, it didn’t take him (= the little boy) very long.” (female subject A20)

As noted earlier in my discussion of problematic aspects of hard 6, the explanation reported in (21) further confirms that adult subjects are not necessarily consistent in their assessment of the relative ease or difficulty of performing some act. With reference to easy 4, this inconsistency is perhaps not entirely surprising, since the boy’s relative lack of difficulty in drawing the spaceman was inherently a less dramatic state of affairs than the spaceman’s futile efforts to draw the boy. Yet I find it interesting that none of the forty-four children tested on this item articulated the same considerations as those raised by the adult subject in (21). This does not, of course, preclude the possibility that my child subjects entertained similar thoughts, but the post-judgment data that I collected for this item provide no direct support for such a claim. Child subjects, then, like the majority of adult subjects, explained their true judgments of easy 4 in accordance with my predictions; thus, the examples in (22), below, resemble the adult response listed in (20):

(22) E: Why was the spaceman easy to draw? Show me.
S: Subject demonstrates little boy drawing by putting paper in his lap and the crayon in his hand. (female 3;10)
E: Why was the spaceman easy to draw? Why do you think?
S: “Cause the picture’s kind of easy. I could draw it.” (male 5;8)

For non-target-like judgments of easy 4, child subjects offered the following types of explanations in response to the prompt, “Why was the spaceman not easy to draw?”:
(23) a. “He can’t draw.” (female 3;6) (Note that “he” presumably refers to the spaceman in the story, since the little boy was depicted as successfully drawing a picture.)
b. “Cause he had his ah + his helmet on and he couldn’t see the paper.” (male 5;0)
c. “The boy was easy to draw and the spaceman couldn’t because his + his helmet was in the way.” (female 5;11)
d. “Because the boy umm said come have a try at drawing and he (=spaceman) couldn’t draw because his helmet was in the way.” (male 6;2)

As the examples in (23) illustrate, child subjects who judged the sentence false typically explained their objection to the puppet’s statement in terms of the spaceman’s inability to draw a picture. The explanations provided by these children are thus consistent with an interpretation of the TC in which the matrix subject DP, the spaceman, serves as the agent rather than as the object of the embedded verb to draw. I therefore take explanations of this type as support for the experimental hypothesis that children have access to an interpretation of the TC that is unavailable to adults.

I now turn to post-judgment performance on hard 7, The hedgehog was hard to ride. Figure 4.15, below, illustrates the test materials used for this item, and Table 4.19 lists the story contexts supporting the subject and object readings of this sentence:

Figure 4.15: Materials used for hard 7, ‘The hedgehog was hard to ride.’

Subject reading - True: It was hard for the hedgehog to ride the frog because the frog’s back was slippery.
Object reading - False: The hedgehog was not hard for the frog to ride because the hedgehog, being a baby hedgehog, had soft fur the frog could hold onto.
Table 4.19: Story contexts for hard 7, ‘The hedgehog was hard to ride.’

Consistent with the story events reviewed above, a typical adult explanation of a false or target-like judgment of hard 7 was as in (24), below:

(24) a. “The hedgehog wasn’t hard to ride because he had fur to hold onto. The frog was hard to ride because he had a slippery back.” (subject A27)
b. “It’s wrong. The frog was hard to ride because he was slippery.” (subject A12)

Interestingly, both of the adults above refer to the slipperiness of the frog’s back, even though a sufficient explanation for a false or target-like reading of the sentence would have required reference only to the favourable conditions that the frog encountered when attempting to ride on the hedgehog’s back. In total, I noted four adults and seven child subjects who referred to the hedgehog’s failed attempt to ride the frog in connection with a target-like judgment of hard 7. The oldest of these children was a girl of 7;4, who gave the following answer to the question, “What did he (= puppet) say wrong?”:

(25) “It’s because the hedgehog was umm couldn’t get on the frog’s back because it was too slippery and he didn’t have any fur to hold onto.”

I have chosen to highlight this aspect of post-judgment performance on hard 7 because this item produced the highest proportion of explanations (i.e. 20%) classified as “supports (the) opposite interpretation” (see Table 4.17). Yet, in my view, this finding is not particularly problematic, given that adult and child subjects were observed to perform similarly in this respect.
In particular, I think the seemingly contradictory nature of the type of explanation offered in (25) is reasonably explained in terms of a tension that the child (and perhaps even the adult) may have experienced between the desire to provide an appropriate response to the experimenter’s question and the desire to discuss a particularly salient story event.

Apart from the seven child subjects referred to above, a total of twenty-five children gave explanations that supported their target-like or non-target-like judgments of hard 7. For target-like judgments, subjects offered the following types of responses to the question, “Why was the hedgehog not hard to ride?”:

(26) a. Non-verbal response: Subject demonstrates frog riding on hedgehog. (male 4;2)
b. “Because the frog was slippy and the hedgehog wasn’t.” (female 4;8)
c. “Ah + because the frog stayed on.” (male 6;0)

In contrast, for true or non-target-like judgments of hard 7, child subjects offered the following reasons why they believed the hedgehog was hard to ride:

(27) a. S: “Cause it’s slippery.” E: Cause who’s slippery? S: “The frog.” (female 3;5)
b. “Cause the frog jumped on the back and it was nice and furry (but) when the hedgehog jumped onto the frog’s back, he couldn’t hold on cause it was too slippery.” (male 3;8) (Note also that this subject responded to presentation of the test sentence, The hedgehog was hard to ride, by saying, “Yeah, on the frog.”)
c. “Cause he (= frog) had no fur to hold on (sic).” (female 5;10)
d.
“Cause if the hedgehog gets onto the frog’s back, when the frog jumps, the hedgehog would fall off.” (male 6;0)

Notably, most children who explained their true or false judgment of hard 7 offered an explanation of the type exemplified in (26) or (27); the data collected were thus for the most part clearly supportive of one or the other interpretation of the control sentence.

Finally, I examine post-judgment data collected for TC10, The dog was difficult to teach, a test item that I used earlier to illustrate certain basic features of experimental design (see §4.3.0.2). In Table 4.20, below, I first review the story contexts for the two readings of this item:

Subject reading - False: It was not difficult for the dog to teach the pig how to play football.
Object reading - True: The pig found it difficult to teach the dog how to go down a slide because the dog went up the slide backwards and then got distracted and chased a cat.
Table 4.20: Story contexts for TC 10, ‘The dog was difficult to teach.’

For a true or target-like judgment of this item, adult control subjects typically offered the type of explanation listed in (28) below:

(28) a. “True, because he went up the wrong way up (sic) the slide so he obviously wasn’t listening the way the teacher (doesn’t finish phrase) and got distracted by the cat.” (subject A14)
b. “Because it (= dog) didn’t follow what the pig did. It went the opposite way.” (subject A1)

As illustrated in (29), below, child subjects who judged the sentence true offered explanations similar to those listed in (28) when prompted to answer the question, “Why was the dog difficult to teach?”:

(29) a. “Umm cause he went up the slide the wrong way.” (female 3;8)
b. “He goes like this.” Subject demonstrates the dog climbing up the stairs the wrong way. (female 4;1)
c.
“Because he went up there and then he + then he said ‘I’m going to go chase that cat.’” (female 6;0)
d. “Because umm the pig listened to + the dog and he got a goal but umm when the + when the + the umm dog was listening + when the dog saw the cat, he wasn’t listening and he didn’t take any notice of the pig.” (female 7;4)

I anticipated that an appropriate explanation of a false or non-target-like judgment of difficult 10 would involve reference to the dog’s success in teaching the pig, and I did in fact collect a number of responses of this type, as illustrated in (30), below:

(30) a. “The dog ++ took his nose and he flipped it (= ball) into the net.” Subject demonstrates dog showing pig how to push the football with his nose. (male 3;8)
b. “Cause umm + the + the + the dog could teach.” (female 4;8)
c. “Cause the dog was easy to put in (sic) there.” Subject demonstrates dog using his nose to push the football into the goal. (female 4;9) (Note that, here, the form of the child’s utterance corresponds to a subject reading of the TC, since the dog is construed as experiencing ease in putting the ball into the net and not as the object of the verb to put.)
d. Response to test sentence, The dog was difficult to teach: “The pig ++ the dog was teaching the pig to play football.”

Not all of the explanations offered for this item were as consistent as those illustrated in (29) and (30), however. As reported in Table 4.17, 12% of the explanations recorded for difficult 10 could be classified as being of the type typically offered in support of the opposite interpretation of the sentence.
A closer inspection of the performance of the four children who furnished these types of explanations, all of whom gave a false or non-target-like judgment of the test sentence, reveals that three of them referred to the dog chasing the cat, and one to the dog walking up the slide the wrong way. I therefore think it is reasonable to suppose that, as in the case of hard 7, the child’s desire to discuss these particularly salient story events (which were also the final events presented in the story) may have overridden his or her desire to comply with the request to provide an appropriate explanation of his or her judgment. (See also my earlier discussion of hard 5 in this section.)

In the preceding discussion, I have considered evidence suggesting that children have access to a reading of the TC that is not available in the adult grammar, specifically, a reading in which the matrix subject DP is taken to control embedded subject PRO. However, when examined in isolation, the Intermediate child’s post-judgment explanation of a non-target-like interpretation of the TC cannot speak to the larger issue of whether the same child considers the TC ambiguous. To investigate this issue, it is necessary to consider not only the Intermediate child’s ability to offer both target-like and non-target-like judgments of the TC but also her ability to provide supportive explanations of both types of judgment. Since space considerations preclude a full presentation of all of the data that I collected in this regard, I will focus on the performance of three subjects in particular, nos. 29, 32, and 42, who were all over the age of six and therefore able to explain their reasoning clearly.
As the data in Tables 4.21, 4.22, and 4.23 indicate, each of these subjects proved able to explain appropriately both target-like and non-target-like interpretations of the TC, which, I would argue, supports my earlier claim that these subjects did not engage in guesswork but rather made reasoned judgments of test/control sentences:

Test/control sentence | Judg. | Explanation of judgment
The monkey was easy to teach. | False (TL) | Response to control sentence: “The `girl was easy to teach ++ bad luck.” (Second comment directed to puppet.)
The tiger was hard to fight. | True (NTL) | Response to test sentence: “Yes + yes because the elephant flipped the tiger over + twice.”
The hedgehog was hard to ride. | False (TL) | S: “Well the hedgehog ++ the frog did get on the hedgehog but the hedgehog couldn’t get on the frog.”
Table 4.21: Comparison of explanations offered by subject no. 29, a male, aged 6;0, for target-like (TL) and non-target-like (NTL) judgments of TC items

Test/control sentence | Judg. | Explanation of judgment
The fairy was easy to fight. | False (TL) | E: The fairy wasn’t easy to fight? Subject shakes his head ‘no’ and replies, “Cause she had the magic wand.”
The spaceman was easy to draw. | False (NTL) | E: He (= the puppet) said the spaceman was easy to draw. What was wrong with that? S: “No, because the boy umm said come have a try at drawing and he (= spaceman) couldn’t because his helmet was in the way.”
The hedgehog was hard to ride. | True (NTL) | Response to test sentence: “Yes, cause his (= frog’s) back was very slippery.”
The ladybird was difficult to eat. | True (TL) | Response to control sentence: “Yes, cause he (= ladybird) flew high in the tree” (i.e. where the dinosaur couldn’t reach him).
Table 4.22: Comparison of explanations offered by subject no. 32, a male, aged 6;1, for target-like (TL) and non-target-like (NTL) judgments of TC items

Test/control sentence | Judg. | Explanation of judgment
The fairy was easy to fight. | True (NTL) | Response to test sentence: “Umm +++ umm +++ the fairy +++ was easy to fight with the /pr/ + knight.” E: “Why?” S: “Cause she went up into the air and pushed him over twice.”
The monkey was hard to draw. | True (TL) | E: “Why was the monkey hard to draw?” S: “Because umm he kept /er/ + he kept moving in the trees when the boy was trying to draw.”
The king was difficult to draw. | False (TL) | E: “Why is that wrong? Can you tell him (= puppet)?” S: “Umm umm she (= princess) said at the end when she drawed the king that it was quite easy to draw.”
The dog was difficult to teach. | False (NTL) | Response to control sentence: “The pig ++ the dog was teaching the pig to play football.”
Table 4.23: Comparison of explanations offered by subject no. 42, a female, aged 7;3, for target-like (TL) and non-target-like (NTL) judgments of TC items

To conclude this section, I maintain that the post-judgment data reviewed in the three tables above provide strong support for my claim that the Intermediate subject has access to two readings of the TC, and thus that the performance of the Intermediate is inaccurately characterized as random. I wish to be clear, however, that I do not discount the possibility that my subjects, whether Intermediate or not, may have displayed chance performance from time to time. For example, it is reasonable to expect both child and adult subjects to experience occasional lapses of attention in any formal experimental study of linguistic behaviour.
Nevertheless, my contention remains that the evidence reviewed in this section is, on the whole, consistent with my claim that the Intermediate makes grammatically motivated rather than random choices with regard to her interpretation of the TC.

4.5.1 Degree constructions (DCs)

4.5.1.0 Presentation of group and individual findings

As reviewed in §4.3.1.1, the null and experimental hypotheses in this condition differed somewhat from those proposed for the other NOS in the study. With regard to a DC such as The dinosaur was too naughty to teach, I formulated the null hypothesis as the assertion that children, like adults, have access to both SDC and ODC interpretations of the sentence, as illustrated in (31), below:

(31) a. SDC: The dinosaur_i was too naughty PRO_i to teach pro_{*k/prototypical}.
b. ODC: The dinosaur_i was too naughty PRO_k to teach e_i.

Conversely, according to the experimental hypothesis, it was predicted that children would be restricted to subject readings of the DC, that is, to an SDC interpretation, as a consequence of their inability to interpret NOS. In Table 4.24, below, I compare the performance of subjects in all five age groups on individual DC items, as well as on the four DC items combined. Importantly, subjects are not compared in the table according to the percentage of target-like readings they
provided, since both readings of the DC can be so described, but instead according to the total number of object readings they provided:

Grp | Ages | DC13 | DC14 | DC15 | DC16 | All items
1 | 3;4 - 4;4 | 6 (54.5%) | 2 (18.2%) | 4 (36.4%) | 2/10 (20%) | 14 (31.8%)
2 | 4;6 - 5;5 | 4 (36.4%) | 1/10 (10%) | 7/10 (70%) | 4 (36.4%) | 16/42 (38.1%)
3 | 5;6 - 6;3 | 4 (36.4%) | 3 (27.3%) | 8 (72.7%) | 6 (54.5%) | 21 (47.7%)
4 | 6;5 - 7;5 | 6 (54.5%) | 4/10 (40%) | 10 (90.9%) | 10 (90.9%) | 30/43 (69.8%)
5 | Adults | 6 (54.5%) | 8 (72.7%) | 8 (72.7%) | 11 (100%) | 33 (75%)
Table 4.24: Total number of object readings provided by age group on DC items (NB: Where the number of subjects tested on an item was fewer than eleven, the number of object readings obtained appears as a fraction over the total number of children tested.)

Notably, the data reported in Table 4.24 do not support the experimental hypothesis, since subjects in all age groups demonstrated the ability to assign an object reading to the DC. Thus, pace the claim advanced by Solan (1978), discussed in §3.2.1.0, I did not find that the acquisition of the ODC poses any particular difficulty for children over and above their mastery of the TC. Nevertheless, the group findings in Table 4.24 mask the fact that four subjects, all in age group 1, provided only subject readings of the four DC items.37 Although the performance of these four children on the DC is not inconsistent with the experimental hypothesis, their performance on the other NOS is, because all four demonstrated the ability to assign at least some target-like readings to the TC, IR, and OPC. I therefore believe it more plausible that the failure of these four children to provide an object reading of the DC was related not to an inability to interpret NOS but rather to an interpretive preference that they displayed for the subject reading of the DC. I return to the issue of subject preference for a particular reading of the DC later in this section.

37 Conversely, there was only one subject in the study, a female, age 7;3, who provided only object readings of the four DC items. The majority of the child subjects gave mixed readings of these items, as did ten out of the eleven adult subjects.

Returning to my analysis of the findings reported in Table 4.24, a statistical analysis of between-group performance on the four DCs revealed no significant difference in the mean percentage of object readings provided by subjects in groups 1, 2, and 3 (χ² (2, n=33) = 2.268, p = .322); these three groups can therefore be treated as a single population. When groups 1 to 4 and groups 1 to 5 were similarly compared, however, a significant difference was observed (Groups 1-4: χ² (3, n=44) = 12.924, p < .005; Groups 1-5: χ² (4, n=55) = 25.073, p < .001), suggesting that the youngest subjects in the study performed differently from both the oldest children and the adult controls. Finally, a Mann-Whitney test revealed no significant difference in performance between groups 4 and 5 (U (11,11) = 50.500, p = .519); the oldest child subjects and adult controls therefore performed statistically as a single group.

I was also interested to determine whether child subjects in any of the age groups experienced particular difficulty with a specific DC item. A Cochran’s Q test revealed no significant between-item difference in the distribution of subject and object readings for any group except group 4. For these subjects, a significant difference was obtained when between-item performance was analysed (Q(3) = 8.586, p = .035). Nevertheless, pairwise McNemar’s tests failed to locate this difference in any single contrast between two DC items, and I will therefore not pursue an explanation of this particular finding here.
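The statistics above are reported as a test statistic plus degrees of freedom. As a hedged cross-check, the minimal Python sketch below recomputes upper-tail p-values from the reported χ² and Q values. It assumes that each value is referred to a chi-square distribution with the stated degrees of freedom. That is the usual large-sample reference for Cochran’s Q, but the chapter does not name the test or software behind the group χ² comparisons, so treating them the same way is an assumption. The sketch also shows how Cochran’s Q is computed from a subjects-by-items table of 0/1 codes. The functions `chi2_sf` and `cochran_q` and the five-subject table are hypothetical illustrations, not the study’s data or analysis scripts. The Mann-Whitney U result is not covered.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses the standard closed forms: for even df, a truncated Poisson sum;
    for odd df, erfc plus a finite series.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2k + 1:
    # erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=0}^{k-1} x^i / (1*3*...*(2i+1))
    k = (df - 1) // 2
    tail = 0.0
    if k > 0:
        term = total = 1.0
        for i in range(1, k):
            term *= x / (2 * i + 1)
            total += term
        tail = math.sqrt(2 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


def cochran_q(table: list[list[int]]) -> tuple[float, int]:
    """Cochran's Q for a subjects-by-items table of 0/1 codes.

    Here 1 might code an object reading and 0 a subject reading.
    Returns (Q, degrees of freedom).
    """
    k = len(table[0])                                  # number of items
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)                                # grand total of 1s
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1


# Statistics as reported in the text: (label, statistic, df, reported p).
REPORTED = [
    ("Groups 1-3", 2.268, 2, ".322"),
    ("Groups 1-4", 12.924, 3, "< .005"),
    ("Groups 1-5", 25.073, 4, "< .001"),
    ("Group 4, Cochran's Q", 8.586, 3, ".035"),
    ("Adults, Cochran's Q", 4.935, 3, ".177"),
]

for label, stat, df, reported in REPORTED:
    print(f"{label}: stat = {stat}, df = {df}, "
          f"p = {chi2_sf(stat, df):.4f} (reported {reported})")

# Hypothetical 0/1 codes for five subjects on four DC items
# (illustration only; not data from this study).
example = [
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
]
q, q_df = cochran_q(example)
print(f"Q({q_df}) = {q:.3f}, p = {chi2_sf(q, q_df):.3f}")
```

Rows in which a subject gives the same reading on every item contribute nothing to Q. With groups as small as the eleven subjects per age group here, the chi-square approximation to Cochran’s Q can be rough, so an exact or permutation version of the test is a reasonable alternative. Nothing in the chapter indicates which variant was used.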
A similar comparison of the performance of my adult control subjects across individual DC items revealed no significant difference in the distribution of subject versus object readings (Q(3) = 4.935, p = .177). Even so, the figures in Table 4.24 show that while DC13 produced a relatively even balance of subject and object readings, adults demonstrated a clear preference for object readings in the case of the other three DCs.38 In this respect, my findings are consistent with those reported for McKee’s adult subjects, who favoured object readings of DC items approximately 75% of the time (1997a:77).

I now turn to the types of explanations offered by both child and adult subjects for their judgments of DCs. As in the case of TCs, the explanations offered by my child subjects for DCs often closely resembled those provided by adult subjects, although, understandably, the child’s remarks were typically more truncated than the adult’s. Notably, the ambiguity of the DC afforded a unique opportunity to compare adult and child explanations of both subject and object readings of a particular item. In general, however, I solicited fewer explanations of DC items than of TC items. This was primarily because I did not wish to tire subjects with excessive follow-up questioning and so chose to concentrate my efforts on obtaining post-judgment data for the TC, the main construction of the present study. As Table 4.24 indicates, DC13, The dinosaur was too naughty to teach, prompted the most balanced distribution of subject and object readings for both adult and child subjects. I therefore think it is the most appropriate choice for illustrating post-judgment performance on the DC. Figure 4.16, below, illustrates the test materials used for this item.
38 In the case of DC16, I think it is reasonable to suppose that some aspect of the design of this item may have reduced the availability of the subject reading of the sentence, given that all eleven adult subjects accessed an object interpretation. The story preceding DC16 included two scenarios, one in which a snake contemplated eating a lion and another in which a lion decided to eat a snake, despite the snake’s protests that he would not make an adequate meal because he was “too small (for the lion) to eat.” While the presentation of story events favoured the object reading of DC16, I believe that this factor alone is insufficient to explain the performance of my adult subjects, who offered mixed responses for the other similarly biased item, DC15. Alternatively, adults may simply have favoured a transitive reading of the embedded verb to eat over an intransitive reading, but I have no direct evidence for this claim. I submit, then, that resolving this issue would require further testing of similar items and, ideally, testing of DCs that feature a wider range of embedded verbs.

Figure 4.16: Materials used for DC13, ‘The dinosaur was too naughty to teach.’

The design of the story preceding DC13 supported a false judgment of the subject reading of the sentence (i.e. The dinosaur_i was too naughty PRO_i to teach pro_{*k/prototypical}), since the dinosaur proved to be a very good teacher. At the same time, the story context supported a true judgment of the object interpretation of DC13 (i.e. The dinosaur_i was too naughty PRO_k to teach e_i), since the dinosaur was a very troublesome pupil who knocked over his desk while being taught.
Beginning with explanations offered by adult subjects for DC13, I find it interesting that two subjects consciously recognized the ambiguity of the sentence, as indicated by their remarks, reported below:

(32) a. “It wasn’t clear whether you were saying the dinosaur was too naughty to teach, meaning that he couldn’t teach because he was too naughty, or whether the teacher found him too naughty to teach.” (subject A1)
b. “Well it depends what you mean by that, the dinosaur was too naughty to teach - to be taught or to be a teacher?” (subject A8)

Thus, while adults typically access consciously only one preferred reading of an ambiguous sentence (cf. Crain and Thornton 1998), these two subjects quite clearly recognized both available interpretations of the DC.39 Furthermore, this finding was not limited to DC13: for each of the other three DC items, at least one adult offered remarks similar to those listed in (32), above.

For adult subjects who judged DC13 either true or false, the following represent typical explanations offered for each reading of the sentence:

(33) Subject reading (false)
E: Why was he not too naughty to teach?
S: “Because he actually got up in front of the class and taught the class.”
Object reading (true)
E: Why was he too naughty to teach?
S: “Well, he knocked his chair over, did a lot of jumping around and Mrs. Payne (= teacher) gave up teaching entirely and + let him have a go.”

Of the forty-four child subjects, twenty-eight, or 63.6% of the total, gave either prompted or unprompted explanations of their judgment of DC13, and the majority of these explanations (89%) could be considered supportive of the child’s original judgment of the test sentence.
The explanations listed in (34), below, are representative and, as the reader will observe, resemble the adult explanations listed in (33):

(34) Subject reading (false)
a. “Because the dinosaur was good to teach.” (female 5;1)
b. “Yeah, yeah, cause the dinosaur teached them very well.” (male 6;0)
c. “He said that the dinosaur was too naughty + + to be a teacher.” (female 6;10)
Object reading (true)
a. “Cause he ++ knocked over that chair.” (female 3;5)
b. Response to test sentence: “Yep, he did knock over the chair.” (female 5;8)
c. “Because umm he knocked the chair over + and the teacher said umm she wouldn’t be able to teach him if he’s naughty.” (male 6;10)

As in the case of the TC, I was particularly interested to find examples of individual subjects who could capably explain both subject and object interpretations of various DCs, since this would provide supportive evidence that the child’s grammar licenses both options.

39 In addition, it appears that one child, age 6;6, may have consciously accessed both interpretations of DC13, since on presentation of the sentence The dinosaur was too naughty to teach, this child remarked, “in half of it.” I acknowledge that the child’s remark may simply have meant that the truth value of an object interpretation of DC13 could be established only on the basis of events that took place in the first half of the story, when the dinosaur misbehaved as a student. However, the child’s statement is also consistent with a recognition that the subject interpretation of the sentence could not be upheld, given the events that took place in the second half of the story, in which the dinosaur proved to be a good teacher. Thus, I think it is possible that this child recognized not only the truth of the object interpretation of the sentence but also the concurrent falsity of the subject interpretation.
However, because I tested only four DCs, compared with twelve TCs, and consequently posed fewer follow-up questions in this condition, I was able to obtain only a limited set of relevant data, from which I have drawn the examples offered in Tables 4.25 and 4.26, below:

Test/control item | Judg. | Explanation of judgment
DC15 - The giraffe was too big to ride. | True subject reading | E: Why is that true, the giraffe was too big to ride? S: Subject demonstrates by making the (very tall) giraffe stand over the pony’s back.
DC16 - The snake was too small to eat. | False object reading | E: Why was Fudge (= puppet) wrong? S: “Cause umm the lion was going to eat the snake and he ate him.”
Table 4.25: Comparison of explanations provided by subject no. 15, a female, aged 4;8, for subject and object readings of the DC.

Test/control item | Judg. | Explanation of judgment
DC13 - The dinosaur was too naughty to teach. | False subject reading | E: What was wrong with that? S: “It was that the dinosaur was good to teach, not bad.”
DC16 - The snake was too small to eat. | False object reading | Response to test sentence: “Nope, the lion could bend down.” (Presumably, to eat the snake.)
Table 4.26: Comparison of explanations provided by subject no. 28, a male, aged 6;0, for subject and object readings of the DC.

4.5.1.1 Discussion

In this section, I address the larger issue of how the experimental evidence reviewed in the previous section, which pertains to the processing of ambiguous sentences, can be used to inform existing theories of sentence comprehension. As noted earlier, it is generally accepted in the psycholinguistic literature that adults favour one particular reading of an ambiguous sentence when such a sentence is presented to them in the absence of a predisposing context.
There is less agreement, however, regarding how the human parser determines such a preference. Some theorists adopt the view that the parser operates in a strictly autonomous fashion, with a syntactic analysis of the input necessarily preceding semantic and/or pragmatic analysis of the same (see, e.g., Frazier 1978, 1987, Mitchell 1994, and Frazier and Clifton 1996); consequently, when all other factors are held constant, such theories hold that the favoured interpretation of an ambiguous sentence can be determined on structural grounds alone. Other theorists, however, reject an autonomous and/or serial conception of the parser in favour of one in which parsing decisions result from the satisfaction of a number of competing factors or constraints (see, e.g., Taraban and McClelland 1988, Boland, Tanenhaus, and Garnsey 1990, and MacDonald 1994). Implicit in the latter view is the assumption that the initial operations of the parser are not driven by syntactic structure alone but can draw on multiple sources of information, such as pragmatic knowledge or frequency-based considerations.

I prefer to remain theoretically neutral with respect to the relative merits of the two conceptions of the parser outlined above, as the data I collected cannot directly address this fundamental issue. Furthermore, while it is my contention that my findings can be used to inform theories of sentence comprehension, I must nevertheless exercise some caution in applying them, given that the present study used an off-line rather than an on-line measure of comprehension ability. This choice of testing method was motivated both by the relatively young age of my subjects and by the need to test subjects on site at the schools they attended.
Recall that the eleven adult control subjects in the study displayed a clear bias toward object readings in the case of three out of the four ambiguous DC items, with only one, DC13, producing a relatively even distribution of subject and object readings. In this respect, as observed earlier, my findings parallel those reported by McKee (1997a), whose adult subjects also favoured object readings of ambiguous DC items presented with two supportive contexts. According to the Construal Model of parsing (Frazier and Clifton 1996), which proposes a serial and autonomous (i.e. modular) operation of the parser, structural complexity would have to be ruled out as the factor motivating this preference. This is because, according to the criteria outlined by Frazier and Clifton, the subject and object readings of the DC would have roughly equal structural status, both instantiating the same “primary relation” between a licensing matrix predicate (here, the degree phrase) and its complement clause (ibid.:41-2). Under the Construal Model, then, the adult preference for the object reading of the ambiguous DC would instead require some extra-syntactic explanation. To my mind, however, the interpretation of the data imposed by Construal Theory is counterintuitive, since by any reasonable measure of derivational complexity it is the object reading of the DC that would be identified as the more syntactically complex: the ODC is standardly held to involve the interpretation of a syntactically displaced object argument, whereas the SDC is not. Moreover, even in the psycholinguistic literature, it is the ODC that would traditionally be considered the more difficult structure to process, according to the accepted truism that processing the type of filler-gap dependency attested in the ODC places a relatively greater strain on working memory than processing a dependency that does not involve movement or dislocation of a syntactic constituent (e.g. the control relation in the SDC). It seems, then, that Construal Theory cannot offer a satisfactory explanation of the general preference that adults display for the object reading of the DC.

Must we therefore abandon the notion of a modular and/or serial parser in order to account for this particular set of data? Not necessarily. Even if we reject relative structural complexity as the factor that establishes an adult preference for the object reading of the DC, it remains possible that any bias introduced at the initial, purely structural, level of analysis is simply overridden at a later stage of processing. In particular, I would like to suggest that the relevant consideration is a lexical bias introduced by the degree word itself, which, in English, appears more frequently in connection with the ODC than with the SDC.40 Thus, while the initial operation of the parser might treat both representations equally, as Construal Theory would predict, or even favour what some would argue is the structurally simpler subject reading, this does not rule out the possibility that an initial preference is overridden at a later stage of processing by the type of frequency-based consideration that I have suggested here.
Certainly, I would argue that the experimental evidence collected by both Anderson (2002a) and McKee (1997a) is consistent with the existence of such a bias, given that the preference that adults displayed for the object reading of the DC was consistent across individual DC items, regardless of the choice of adjectival and/or clausal complement to the degree word. As to whether this hypothesized bias is more likely to influence an early or a later stage of parsing, I concede that my findings cannot directly address this question, since my data were collected through the use of an off-line method. Nevertheless, I note that there is a growing body of experimental evidence which points to frequency-based considerations as the relevant factor in determining certain parsing preferences (see, e.g., Trueswell et al. 1993, MacDonald 1994, Garnsey et al. 1997). For example, it has been variously argued that an adult preference for one reading of an ambiguous sentence over another can sometimes be traced directly to the fact that the matrix predicate in the sentence is statistically more likely to occur in English with one particular type of complement than with another. Thus, the type of statistical consideration that I have proposed may underlie the adult preference for the ODC is not without precedent in the literature. Since I recognize, however, that establishing the validity of this claim would minimally involve an extensive examination of natural language corpora, I must leave this as an issue for future investigation.

40 I am grateful to Ianthi Tsimpli for originally suggesting this possibility to me.
Turning now to a consideration of the child data, I noted earlier that a reverse bias is attested in the early stages of the acquisition of the DC, with children initially favouring the subject rather than the object reading of the sentence. In particular, I found that only subjects in group 4 (ages 6;5 to 7;5), who statistically performed as adults, demonstrated a clear preference for the object reading of the ambiguous DC. In contrast, younger children favoured subject readings of the DC by a margin of approximately 2 to 1, although this tendency decreased with age. For example, children in age group 3 (ages 5;6 to 6;3) provided a nearly balanced number of the two types of reading: 52.3% subject-type and 47.7% object-type. Clearly, then, children below the age of 5;6 do not share the interpretive preference that adults display for the object reading of the DC, and yet the data also indicate that young children are not limited to the assignment of subject readings alone. I believe this state of affairs is reasonably explained by the assumption that it takes some time for children to appreciate that an object interpretation of the DC is probabilistically favoured in English, and for the operations of the parser to be suitably influenced by this consideration. I also recognize the alternative possibility that the child’s initial preference for the subject reading of the DC may simply reflect the fact that early interpretive preferences are based solely, or at least primarily, on considerations of structural complexity. Such a supposition would be consistent with the claims of theorists who argue that the derivation of the object reading of the DC is more complex than that of the subject reading, since the former involves the interpretation of a displaced syntactic constituent.
Nevertheless, I would argue that if the early preference that children display for the subject reading were strictly based on considerations of syntactic complexity, then this preference should be an exclusive one, particularly in the early stages of acquisition. My results, however, do not provide any evidence for the existence of such a developmental stage. This is because the vast majority of my subjects (specifically, forty out of forty-four children) proved capable of assigning both subject and object readings to the DC. And in the case of the four children who provided only subject readings of the DC, all four concurrently demonstrated the ability to assign one or more object readings to the TC. Therefore, as argued earlier, I believe it is doubtful that a lack of syntactic competence in interpreting NOS is implicated in the exclusive preference that these four children displayed for the subject reading of the DC. Instead, I think it is equally plausible that the preference is associated with the child’s developing processing abilities, rather than imposed by a deficient competence grammar.

4.5.2 Infinitival relatives (IRs)

In this section I analyse subject performance on the two IR constructions tested, IR17, The soldier found a pirate to fight, and IR18, The tiger found a rabbit to eat. (Note that subject performance on the OPC will be reviewed in the following section.) Because, unlike the DC, the IR is associated with only a single interpretation in the adult grammar, I will once again evaluate child performance according to whether it can be considered target-like or non-target-like. Recall that the null and experimental hypotheses for both the IR and OPC test conditions were formulated as illustrated in (35a&b), below:

(35) a. Null hypothesis: The child, like the adult, has the syntactic ability to assign a target-like interpretation to the IR and OPC.
b. Experimental hypothesis: The child does not possess the syntactic ability to interpret an NOS and therefore will be restricted to a non-target-like interpretation of both the IR and OPC.

As noted in §4.3.1.2, I prefer not to label the non-target-like reading of the IR a subject reading. This is because I wish to avoid direct comparison of the non-target-like reading of the TC, which I hypothesize involves subject control, and the non-target-like reading of the IR/OPC, which I think more likely involves control of embedded PRO by the matrix object argument. Table 4.27, below, compares the total number of target-like readings obtained by age group for each of the two IRs, as well as for the two items combined:

Group | Ages | IR17 | IR18 | Both items
1 | 3;4 - 4;4 | 5 (45.5%) | 10 (90.9%) | 15 (68.2%)
2 | 4;6 - 5;5 | 7 (63.6%) | 9 (81.8%) | 16 (72.7%)
3 | 5;6 - 6;3 | 4 (36.4%) | 10/10 (100%) | 14/21 (66.7%)
4 | 6;5 - 7;5 | 9 (81.8%) | 10 (90.9%) | 19 (86.4%)
5 | Adults | 11 (100%) | 10 (90.9%) | 21 (95.5%)
Table 4.27: Total number of target-like responses per age group - IRs
(NB: Where fewer than eleven subjects were tested, the number of responses obtained appears over the total number of subjects tested.)

A review of the figures provided in Table 4.27 suggests that children generally experienced more difficulty with IR17 than with IR18, an observation that is confirmed by a statistical analysis of the results. In particular, while no significant difference was observed between any child group and the adult control group on IR18, the same could not be said for IR17.
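The pooling rule behind the “Both items” column, and the NB note on groups with fewer than eleven subjects, can be made explicit with a small calculation. The Python sketch below assumes that each cell of Table 4.27 records target-like responses over subjects tested and that the combined column pools responses and subjects rather than averaging the two item percentages; the names TABLE_4_27, pct and pooled are illustrative only.

```python
# Hypothetical helper: recompute the "Both items" column of Table 4.27.
# Assumption: each cell is (target-like responses, subjects tested), and the
# combined column pools both counts instead of averaging the two percentages.

TABLE_4_27 = {
    # group label: (IR17 cell, IR18 cell)
    "1 (3;4 - 4;4)": ((5, 11), (10, 11)),
    "2 (4;6 - 5;5)": ((7, 11), (9, 11)),
    "3 (5;6 - 6;3)": ((4, 11), (10, 10)),  # only ten subjects tested on IR18
    "4 (6;5 - 7;5)": ((9, 11), (10, 11)),
    "5 (adults)": ((11, 11), (10, 11)),
}


def pct(hits: int, tested: int) -> float:
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100 * hits / tested, 1)


def pooled(cells):
    """Sum responses and subjects across items, then take the percentage."""
    hits = sum(h for h, _ in cells)
    tested = sum(t for _, t in cells)
    return hits, tested, pct(hits, tested)


for group, cells in TABLE_4_27.items():
    hits, tested, p = pooled(cells)
    print(f"Group {group}: {hits}/{tested} ({p}%)")
```

For group 3 the pooled figure is 14/21 (66.7%), whereas a simple mean of the two item percentages, (36.4 + 100) / 2, would give 68.2%; the difference arises only because fewer subjects in that group were tested on IR18.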
As in the case of the TC and DC, the results of a Kruskal-Wallis test for between-group differences revealed that groups 1 to 3 performed as a single population on IR17 (χ2 (2, N=33), p < .439), but that each of these groups differed from both groups 4 and 5. Finally, again as reported for the TC and DC, a Mann-Whitney test for between-group differences revealed no significant difference in the performance of groups 4 and 5 (U (11, 11) = 49.500, p < .478), indicating that the two groups performed as a single population.

The target-like performance demonstrated by all age groups on IR18 is not consistent with the view that certain subjects experienced an across-the-board impairment of their ability to interpret the IR. Rather, I believe that the comparatively poorer performance of some subjects on IR17 suggests a flaw in the design of this particular item. In support of this supposition, I note that the explanations offered by child and adult subjects for their judgments of IR18 were relatively straightforward compared with certain explanations offered in support of IR17. To illustrate this point, I will first review typical explanations offered for both target-like and non-target-like interpretations of the relatively unproblematic IR18, The tiger found a rabbit to eat. In (36), below, I offer a representative example of an adult explanation of a false (i.e. target-like) judgment of this item:41

(36) “He didn’t want to eat him. He wanted to play with him.” (subject A18)

In the case of child subjects who provided an explanation of their target-like judgment of IR18, their remarks typically took one of two forms.
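Because each subject contributed a single true/false judgment of IR17, the per-subject scores in each group can be reconstructed exactly from the counts in Table 4.27 (1 = target-like, 0 = non-target-like), and rank-based statistics can then be computed from them. The Python sketch below does this with the standard library only; the helper names are hypothetical, and it assumes the tie-corrected form of the Kruskal-Wallis statistic. For two degrees of freedom the upper-tail chi-square probability is exactly exp(-H/2), so no statistics package is needed for that step.

```python
import math


def midranks(values):
    """Return the 1-based average (mid) rank of each value, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def binary_scores(hits, tested):
    """One 0/1 score per subject: 1 = target-like judgment."""
    return [1] * hits + [0] * (tested - hits)


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    return h / correction, len(groups) - 1


def mann_whitney_u(a, b):
    """Smaller of the two Mann-Whitney U statistics."""
    ranks = midranks(a + b)
    u_a = sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2
    return min(u_a, len(a) * len(b) - u_a)


# IR17 counts from Table 4.27 (target-like responses out of 11 per group).
g1, g2, g3 = (binary_scores(hits, 11) for hits in (5, 7, 4))
g4, g5 = binary_scores(9, 11), binary_scores(11, 11)

h, df = kruskal_wallis([g1, g2, g3])
p = math.exp(-h / 2)  # chi-square upper tail; exact only when df == 2
print(f"Groups 1-3: H = {h:.3f}, df = {df}, p = {p:.3f}")
print(f"Groups 4 vs 5: U = {mann_whitney_u(g4, g5)}")
```

Under these assumptions the sketch matches the p-value reported for groups 1 to 3 (.439) and the U of 49.5 reported for groups 4 and 5. It does not attempt the Mann-Whitney p-value, which depends on whether an exact or a tie-corrected asymptotic method is used; a statistics package would be the natural place to compute it.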
Either the child focused on how accurately the sentence described the tiger’s intentions, which could be described as a more “purposive” interpretation of the sentence, or he/she focused on the characterization of the rabbit as the object of eating, which I presume was more in keeping with a reading of the sentence as involving an infinitival relative clause. An example of the first type of explanation is provided in (37a), below, and an example of the second in (37b):

(37) a. “Because + because he /d/ + because he wanted to play with the rabbit instead.” (male 4;8)
b. E: What did he (= puppet) say wrong? S: “Umm the tiger found something to eat.” (male 5;0)

Regrettably, none of the four children who provided a non-target-like judgment of IR18 offered an explanation of their interpretation of the sentence. Nevertheless, all four were asked various follow-up comprehension questions (e.g. “Did the tiger want to eat the rabbit?”), and I noted that all responded appropriately. Two of these subjects, aged 4;9 and 5;5, correctly interpreted IR17 and missed IR18, while the other two, aged 4;4 and 6;10, failed both items. Therefore, it is only with respect to the performance of the latter two subjects that there is a legitimate reason to question whether these children may have lacked the target-like ability to interpret the IR. Even so, I think that this contention is undermined in the case of the second of these two subjects, first because of the relatively advanced age of this child and second because she performed quite competently on the other NOS tested in the study.

41 As indicated in Table 4.27, there was one adult subject who gave a non-target-like (i.e. “true”) reading of IR18, but this is most likely attributable to the fact that she allowed extra-contextual considerations to affect her interpretation of the sentence. Specifically, this subject explained that she judged the sentence true because, “He (= tiger) found a rabbit that he wanted to eat but he didn’t eat it.” When the experimenter pointed out that, in the story, the tiger had expressed no interest in eating the rabbit, the subject responded that the tiger’s instinct, like that of her own pet dogs, would be to consider the rabbit potential prey whether or not he actually ended up eating it. Interestingly, it also appears that one child subject, aged 6;0, may have entertained a similar consideration when providing a target-like (i.e. false) judgment of the same item, since he offered the following explanation of his interpretation of the sentence: “He (= tiger) found one to eat but he didn’t get to eat it.”

Turning now to performance on IR17, The soldier found a pirate to fight, nineteen children (or 43.2% of the total) provided a non-target-like judgment of this item, including two children over the age of 6;5. As detailed in §4.3.1.2, the story preceding presentation of this IR featured a soldier who approaches a pirate. The pirate initially believes that the soldier has come to fight him, but the soldier then explains that he wishes only to join the pirate on his ship for a bit of singing, and the story ends with the two characters singing together. I had speculated that children who lacked target-like knowledge of the sentence might allow what Jones (1992) has termed a switched-control reading of the sentence, in which a referential relationship holds between the matrix object DP and embedded subject PRO, and between the matrix subject DP and the embedded object, as illustrated in (38), below:

(38) The soldier_i found a pirate_k PRO_k to fight (him)_i.
(With the meaning, The soldier_i found a pirate who was willing to fight him_i.)
Of the twenty-five children who correctly judged the sentence false, ten provided an explanation of their judgment, and, as illustrated in (39), below, these were comparable to explanations offered by adults for the same reading:

(39) a. E: And it’s false because? S: “He (= soldier) wasn’t looking for someone to fight. He was looking for someone to sing with.” (adult subject no. 3)
b. “The soldier found a pirate ++ to sing.” (male 6;8)

Non-target-like judgments of IR17 were provided by nineteen children, but, regrettably, I obtained only one explanation for this reading of the sentence, which is listed below:

(40) Response to test sentence: “He did.” E: Did he? S: “Yeah. He found him and then he said, ‘I’m gonna fight you.’ ”

I submit that this child’s explanation is consistent with the alternative interpretation of the sentence proposed in (38), above, in which a soldier finds a pirate who in fact turns out to be interested in fighting the soldier. In support of this contention, I note that the referent of “he” in the child’s sentence cannot be the same in both occurrences if the child’s assessment of the test sentence is to accord with story details. This is because the character who did the finding in the story, the soldier, was not the same as the character who threatened to fight the soldier. In the case of the other eighteen subjects who gave non-target-like judgments of this item, all were asked follow-up comprehension questions such as “What did the soldier want?” or “So did the soldier fight the pirate?” Interestingly, sixteen of these children correctly answered all such questions, thus consistently identifying the pirate as the one doing the fighting and the soldier as the one seeking a singing partner.
Certainly, then, the non-target-like performance of these sixteen children cannot be explained in terms of poor comprehension of story details; instead, I suggest that these children may have had an alternative reading of IR17 in mind.42 The issue of whether this alternative reading may have involved switched control, as illustrated in (38), or some other analysis is complicated, I believe, by the performance of two of my adult controls. Both of these subjects correctly judged IR17 false, but each also expressed some reservation after making this judgment. For example, one female subject judged the sentence false and then remarked, “That’s hard though.” When pressed to explain why she found judging this item difficult, she responded as follows:

(41) “I don’t know why. The soldier found a pirate and the pirate wanted to fight him. That’s why. But in fact he didn’t.” (subject A16)

While I acknowledge that this adult subject’s explanation is not entirely clear, I am nevertheless somewhat concerned that her remarks seem consistent with a construal of the sentence that involves switched control, as in (38), despite the fact that this interpretation should be barred by the adult grammar. Similarly, I believe that the remarks made by the second adult, a male, are also consistent with this possibility.

42 The remaining two subjects were the only ones who gave any clear indication of having misunderstood the story. One boy, aged 4;8, correctly remembered that the soldier had said that he wanted to find a pirate but incorrectly maintained that the soldier had intended to fight the pirate. The second child, a boy aged 6;0, incorrectly claimed that the soldier did not want to find a pirate but instead “just wanted to have a go on the ship.”
In this case, after judging the sentence false, the subject gave the following explanation of why he had felt some uncertainty in making his judgment:

(42) “Well because he did find a pirate and the pirate wanted to fight. Therefore you could say he found a pirate to fight even though he wasn’t interested in fighting, so it’s just how you want <to?> phrase it.” (subject A21)

Despite the curious nature of the above remarks, I will not pursue the argument that these two adult subjects may have accessed, even briefly, an illicit interpretation of the IR. Instead, I think it more plausible that they interpreted the sentence as a subject-gap purpose construction (SPC), a construction exemplified in (43), below, which features an embedded subject gap:

(43) The agency found a volunteer_k [PRO_k to teach pro_prototypical] [in Belize].

In (43), the embedded object position is occupied by pro, which takes generic or unspecified reference; consequently, neither the matrix subject DP nor the matrix object DP can serve as the controller of this argument. This reading of the IR is thus made possible by the subcategorial properties of the embedded verb to teach, which in English is licensed to occur with a generic or unspecified object. And since the verb to fight is similarly licensed to take generic pro as its object argument, I think it is reasonable to speculate that the two adults referenced above may have considered, although ultimately rejected, an SPC interpretation of IR17 (cf. The soldier found a pirate_i [PRO_i to fight pro_arb]). Moreover, I think it is possible that certain of my child subjects may have entertained similar thoughts regarding the appropriate analysis of IR17. However, what would remain unexplained under this line of argument is why so few of my subjects appear to have considered a similar alternative interpretation of IR18.
According to the story accompanying presentation of the sentence The tiger found a rabbit to eat, an interpretation of the sentence such as that depicted in (43) is not ruled out. This is because the tiger does find a rabbit in the story, which begins with the rabbit eating food out of a dish. Furthermore, the embedded verb to eat, like the verb to fight, is licensed to take a null pronominal object with generic or unspecified reference; therefore, it is not inconceivable that the sentence could be interpreted as featuring generic object pro rather than an object gap with specific reference. Since I believe, however, that the limited amount of data I collected precludes a proper investigation of this issue at present, I must leave this as a topic for future investigation.

4.5.3 Object-gap purpose constructions (OPCs)

In this section I review performance on the two OPC items tested, OPC19, The man bought a chicken to eat, and OPC20, The clown bought a dog to ride. As in the previous section, I will not employ the terms subject or object reading to describe the two potential interpretations of the OPC, since I believe this terminology is potentially misleading. Instead, I will distinguish target-like readings, in which the matrix subject is interpreted as performing the action denoted in the embedded clause, from non-target-like readings, in which the matrix object argument is assumed to perform the same action. The reader is referred to example (35) in the previous section for a review of the null and experimental hypotheses associated with these test items. Notably, I do not consider the possibility that the non-target-like reading of the OPC might involve switched control, as I speculated in the case of the IR (cf. The soldier_i found a pirate_k PRO_k to fight (him)_i).
This is because the stories accompanying presentation of OPC19 and OPC20 did not allow for such an interpretation of either item; specifically, the possibility was never entertained that the chicken might eat the man who purchased him, nor that the dog would ever ride the clown. Table 4.28, below, compares the five age groups in terms of the number of target-like readings obtained for each OPC item, as well as for both items combined:

Grp | Ages | OPC19 | OPC20 | Both items
1 | 3;4 to 4;4 | 8 (72.7%) | 9 (81.8%) | 17 (77.3%)
2 | 4;6 to 5;5 | 10 (90.9%) | 7 (63.6%) | 17 (77.3%)
3 | 5;6 to 6;3 | 11 (100%) | 10 (90.9%) | 21 (95.5%)
4 | 6;5 to 7;5 | 11 (100%) | 10 (90.9%) | 21 (95.5%)
5 | Adults | 11 (100%) | 8 (72.7%) | 19 (86.4%)
Table 4.28: Total number of target-like responses per age group – OPCs

According to a statistical analysis of item-based performance, there was no significant difference between the performance of any of the child groups and that of the adult controls (χ2 (4, N=55) = 8.752, p < .068 for OPC19 and χ2 (4, N=55) = 3.793, p < .435 for OPC20). Therefore, even children in the youngest age group performed like adults on the OPC. Taking a closer look at the individual performance of the eighteen children aged 5;0 or under, who would be the most likely to experience difficulty with this construction, I find that no subject missed both OPCs and only six missed one of the two items. Thus, the majority of my subjects aged 5;0 or under (specifically, twelve children) provided target-like responses on both OPCs.
While I must be cautious in generalizing the findings reported here, given the very limited number of test items administered, it is nevertheless quite clear that these findings do not support the claim that children are relatively delayed in their acquisition of the OPC, as argued by Goodluck and Behne (1992), Goodluck (1984), and Goodluck, Finney, and Ling (1995). As discussed in §3.2.1.1 of Chapter 3, the first two of these studies produced experimental evidence suggesting that children as old as ten still lack an adult-like ability to interpret the OPC. However, looking at the performance of my own subjects in the upper two age groups, only one child out of eleven in group 3 (ages 5;6 to 6;3) made an error on the OPC, and, again, only one child out of eleven in group 4 (ages 6;5 to 7;5) made such an error. Therefore, my findings, although limited, do not match those reported in the above-referenced sources. Furthermore, I believe that subject performance on OPC20, the item missed by these two children, would have been improved had certain modifications been made to this item prior to its use in the main study. Specifically, I note that three adults gave non-target-like judgments for this item, which suggests a particular problem with the item itself. The reader will recall that in the case of aberrant adult responses to TC items, I was generally able to account for these errors in terms of subject inattention or the subject’s reliance on extra-contextual considerations when determining an interpretation of the sentence. However, I would argue that the three non-target-like adult responses obtained for OPC20 were more likely prompted by flaws in the design of this item.
As originally discussed in §4.3.1.2, the story accompanying OPC20, The clown bought a dog to ride, first presented a scenario in which a clown went to a pet store and bought a dog. The dog initially assumed that he had been purchased as a pet and asked the clown if he could go to his new home. The clown explained to the dog that he had bought him because he wanted to ride him as a means of cheering up a little girl who was feeling poorly in hospital. This item had been pilot-tested on both child and adult subjects with no difficulties, but in the course of its use in the main study, I became aware that responses provided by several adults raised questions regarding the suitability of this item.43 The remarks made by two adult subjects in particular, who correctly judged the sentence true, are very informative in this regard and are reported in (44a&b), below: 43 As noted in §4.3, time and scheduling constraints prevented me from testing adult subjects prior to child subjects in the main study. Instead, testing proceeded in parallel for both groups. I acknowledge, however, that prior testing of adults is generally to be preferred and, in this situation, would have been especially advantageous, as it would have allowed me to modify or reject OPC20 before administering it to child subjects. D.L. Anderson, University of Cambridge Chapter 4: Experimental Design and Presentation of Results (44) 360 a. “I put ‘true’ but then I thought, did he (= clown) buy the dog to ride, or did he buy the dog to cheer the girl up in the hospital, or did he buy the dog to pull the cart? He could’ve bought the dog to do all sorts of things really.” (subject A1) b. Subject explains reason for hesitating before responding “true”: “Because he bought him for a number of reasons it strikes me: to cheer the girl up, etc., etc. – not essentially just to ride him. 
If you had said the clown bought a dog to make the little girl laugh, I would’ve said it was true without questioning.” (subject A15) In the case of the three adults who gave false or non-target-like judgments of the sentence, each raised similar considerations to those expressed by the subjects in (44a&b), offering alternative reasons for the clown buying the dog, such as to serve as a pet or to cheer the little girl up. I confess to being somewhat surprised by the response of these adult subjects, given that the story had included a specific event in which the clown declared to the dog, “I bought you so I could ride you.” However, as this event occurred early in the story, it is possible that its salience may have been diminished by later events. Because problems associated with OPC20 did not become apparent until well into the testing phase of the main study, I chose not to replace the item at such a late stage and to continue collecting responses from all of the subjects. A reasonable question thus arises as to whether any of my child subjects entertained the same type of considerations with regard to OPC20 as the adult subjects discussed above. In fact, I recorded three such explanations provided by child subjects for false or non-targetlike judgments of this item, which are listed in (45), below: (45) a. Response to test sentence: “Wrong + because + to help.” (female 5;4) b. Puppet: But tell me why I’m wrong. S: “No, they brought her some flowers + in the cart and then the dog after +++ rided him + home.” (male 6;0) c. “Umm the clown bought the dog so he could make the little girl laugh.” (female 7;4) D.L. 
Moreover, I recorded one child, who had also incorrectly judged the sentence false and who offered the following response to the question, “So why do you think that clown bought a dog?”:

(46) “Because he + he wanted to ride on it but he didn’t want to choose to ride on it.” (female 5;1)

Notably, this particular child’s response seems very similar in content, though not in style, to the adult remarks cited in (44), since I interpret her remarks to mean that riding on the dog was not the primary purpose the clown had in mind when buying the dog. On the basis of the evidence reviewed here, I reiterate my view that the number of non-target-like responses obtained on OPC20 may have been artificially inflated by the design problems that I have noted. Consequently, it remains my contention that the performance of my child subjects on the two OPC items provides evidence for early, rather than delayed, acquisition of structures of this type.

4.5.4 Passive sentences

In this section, I analyse the performance of subjects on the four passive sentences included in the study, which consisted of two actional passives (AP) and two nonactional passives (NAP). As discussed in §4.3.1.3, two of these items featured an actional verb, AP21 (The boy was chased by the duck) and AP22 (The monkey was bitten by the swan), and two featured a nonactional verb, NAP23 (The snake was watched by the rabbits) and NAP24 (The elephant was heard by the dog). As was standard in the study, the story context accompanying each passive item provided support for two potential readings of the sentence: an active or non-target-like reading, in which the matrix subject is assumed to play the logical role of subject of the matrix verb, and a passive or target-like reading. The null and experimental hypotheses for this condition were as in (47), below:
(47)
a. Null hypothesis: The child shares the same grammatical knowledge of the passive as the adult and thus will only allow a passive or target-like reading of the sentence.
b. Experimental hypothesis: The child lacks a general ability to interpret a displaced object argument and will therefore be limited to the assignment of active or non-target-like readings of the passive sentence.

Looking first at the performance of child and adult subjects on the two APs, Table 4.29, below, compares each age group in terms of the number of passive or target-like readings provided per item and for the two items combined:

Age group     AP21          AP22          Both items
3;4 to 4;4    9 (81.8%)     9/9 (100%)    18/20 (90%)
4;6 to 5;5    10 (90.9%)    11 (100%)     21 (95.5%)
5;6 to 6;3    8 (72.7%)     11 (100%)     19 (86.4%)
6;5 to 7;5    11 (100%)     11 (100%)     22 (100%)
Adults        11 (100%)     11 (100%)     22 (100%)

Table 4.29: Total target-like responses per age group for actional passives (NB: Where the total number of subjects tested was fewer than eleven, the number of responses obtained appears over the total number of subjects tested.)

According to a statistical analysis of the results, there was no significant difference observed when the performance of any of the child groups was compared with that of the adult controls (Kruskal-Wallis test, χ2 (4, N=55) = 6.175, p < .186). Therefore, it can reasonably be claimed that even the youngest subjects in the study performed like adults with respect to the two APs. Furthermore, according to the results of a McNemar’s test (N=54, p < .063), I found that none of the groups differed in terms of performance on the two individual items. Like Cromer (1970), then, I found no evidence that my subjects experienced an across-the-board impairment of their ability to interpret a displaced object argument.
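Because McNemar’s test compares the same children on two items, only the discordant pairs (children who were right on one item and wrong on the other) carry information. The Python sketch below illustrates the exact binomial form of the test; it is not the software used in the study, and whether the reported value came from the exact or the chi-square form is an assumption. The function name is hypothetical, and so are the counts in the usage line: they are merely one configuration compatible with Table 4.29 (errors on AP21, none on AP22).

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: children target-like on item 1 but not on item 2
    c: children target-like on item 2 but not on item 1
    Children who answered both items alike (concordant pairs)
    do not enter the calculation.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of an item effect
    k = min(b, c)
    # Under the null hypothesis each discordant child is equally likely to
    # fall either way, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts (not reported in the thesis): no child right on AP21
# but wrong on AP22, and five children wrong on AP21 but right on AP22.
print(mcnemar_exact(0, 5))  # 0.0625
```

The same function applies to the NAP23/NAP24 comparison; any split with equal discordant counts in both directions yields p = 1.0.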
Furthermore, these results comport with other findings reported in the literature. For example, Maratsos et al. (1985) demonstrated that 4-year-olds experience no particular difficulty with the interpretation of passives featuring actional, as opposed to nonactional, verbs, while Fox and Grodzinsky (1998) reported that their subjects, aged 3;6 to 5;5, performed well on both actional and nonactional passives, with the exception of those nonactional passives that included a by-phrase. (See also Demuth 1989, 1990, who demonstrates that verbal passives are readily acquired by very young speakers of certain non-Indo-European languages, such as Sesotho.) In the case of AP21, for which I recorded a total of six non-target-like judgments, I regrettably did not obtain any clear explanations of this type of response. However, I note that four of the six subjects who gave incorrect judgments of AP21 demonstrated correct recall of the details of the story when asked comprehension questions in the follow-up phase of the task. Thus, I think it is unlikely that the non-target-like performance of these subjects derived from poor comprehension of the accompanying story; rather, I think that each simply chose to assign the sentence an active interpretation, one that was afforded by their grammar and that accorded with the story details. With respect to the second actional passive, AP22, I obtained even fewer post-judgment data than for AP21, since AP22 was correctly interpreted by even my youngest subjects. Therefore, I turn directly to a review of subject performance on the two passive items that featured nonactional verbs. For NAPs, the performance of subjects in each of the age groups is summarized in Table 4.30, below:
Age group     NAP23         NAP24         Both items
3;4 to 4;4    3/10 (30%)    3 (27.3%)     6/21 (28.6%)
4;6 to 5;5    5 (45.5%)     6 (54.5%)     11 (50%)
5;6 to 6;3    6 (54.5%)     8 (72.7%)     14 (63.6%)
6;5 to 7;5    9 (81.8%)     6 (54.5%)     15 (68.2%)
Adults        11 (100%)     11 (100%)     22 (100%)

Table 4.30: Total target-like responses per age group for nonactional passives (NB: Where the total number of subjects tested was fewer than eleven, the number of passive or target-like responses provided appears over the total number of subjects tested.)

A review of the data contained in Table 4.30 indicates that child subjects in all age groups performed relatively worse on nonactional than on actional passives, a finding that comports with the results obtained in a number of previous studies, including Maratsos et al. 1979, 1985, de Villiers et al. 1982, Gordon and Chafetz 1986, and Pinker et al. 1987. (Alternatively, see Fox and Grodzinsky 1998 for a somewhat more complicated set of findings.) According to a statistical analysis of the results, children in the first four age groups performed as a single group with respect to NAPs (Kruskal-Wallis test, χ2 (3, N=44) = 6.442, p < .092), with none of the four groups demonstrating target-like performance on these items. A significant difference was obtained, however, when the four child groups were compared with the adult control group (χ2 (4, N=55) = 20.316, p < .001) and when only those in the oldest child group (i.e. group 4) were compared with the adults (Mann-Whitney test, U (11,11) = 27.500, p < .028). Thus, even subjects between the ages of 6;5 and 7;5 failed to demonstrate target-like knowledge of NAPs. Looking at subject performance within age groups, the results of a McNemar’s test revealed that there was no significant difference in the number of passive readings
provided for NAP23 or NAP24 (N=54, p < 1.000). Therefore, the difficulty that child subjects experienced was more likely related to the nonactional status of the two items than to the specific form of either item. Figure 4.17, below, offers a graphic comparison of subject performance on APs and NAPs. To review, while target-like performance on APs was fairly consistent across all age groups in the study, performance on NAPs had not reached a target-like standard even for those subjects aged 6;5 and over:

[Figure 4.17: Percentage of target-like responses by age group for passive items. Y-axis: mean target-like responses (0.0 to 2.5); x-axis: age group (3;4 to 4;4, 4;6 to 5;5, 5;6 to 6;3, 6;5 to 7;5, Adult); series: actional passives, nonactional passives.]

Returning to my analysis of subject performance on the NAPs, I was fortunately able to obtain more informative post-judgment data for these two items than for the two APs. Taking NAP23 as a representative example, I first review the basic contextual information associated with this item, which was earlier presented in §4.3.1.3:

Reading                                   Judgment   Story context
Passive or target-like (TL) reading       False      The rabbits don’t see the snake until a hedgehog advises them to look up in the tree.
Active or non-target-like (NTL) reading   True       The snake watches the rabbits from a tree branch above them.

Table 4.31: Story contexts for NAP23, ‘The snake was watched by the rabbits.’

When asked to explain their false, or target-like, judgments of NAP23, adult and child subjects offered the types of standard responses illustrated in (48), below:

(48)
a. “Because the rabbits didn’t look at the tree before they put their picnic down.” (adult subject no. 26)
b. “The rabbits were watched by the snake.” (male 4;2)
c. “He (= puppet) said the rabbit was looking.” (male 5;8)
d. “The rabbits don’t watch the snake.
They didn’t watch. The snake watched the rabbits.” (male 6;0)

Turning now to non-target-like performance on this item, I note that some of the post-judgment data I collected would appear to corroborate a finding reported earlier in the literature, namely that young children sometimes display non-adult-like knowledge of the meaning of perception predicates such as watch (see, e.g., Goodluck and Roeper 1978 and de Villiers et al. 1982; see also the results of my pilot study, discussed in §4.4). Specifically, comments provided by two of my subjects under the age of four, both of whom gave non-target-like judgments of NAP23, seem to suggest that these children interpreted the verb to watch as being synonymous with either the verb to look at or the verb to see. When asked whether the rabbits were watching the snake in the story, both of these subjects answered in the affirmative. They were then asked, “When did the rabbits watch the snake?” The first child, a female aged 3;6, replied, “When the hedgehog came.” She thus appears to be referencing the point in the story where the hedgehog drew the rabbits’ attention to the snake, an event that is consistent with the rabbits having seen the snake but not having watched it. The response of the second child, a male aged 3;8, is even more telling in this respect, since he explained that the rabbits watched the snake “…cause they seen + cause they see what he was doing.” With regard to the child subjects aged 3;10 or older who also gave non-target-like judgments of NAP23, I regrettably failed to obtain any clear explanations of these judgments. However, several of these subjects were asked follow-up comprehension questions, which they consistently answered correctly.
Thus, for example, when asked if the rabbits were watching the snake in the story, the subjects referenced here, unlike the two children discussed above, correctly answered “No.” Furthermore, two of these subjects, of the relatively young ages of 3;10 and 4;1, responded to the follow-up questions by correctly retelling the entire story that accompanied NAP23, including dialogue. Therefore, I submit that their non-target-like performance on this particular item was not linked to poor comprehension of story details. Instead, as I argued earlier in the case of non-target-like performance on AP21, I think these subjects simply chose to assign the NAP an active reading. Since post-judgment data pertaining to NAP23 were so scarce, I reviewed the data collected in connection with NAP24 (The elephant was heard by the dog), looking for evidence that might further support the claim that certain of my subjects assigned an active rather than passive interpretation to the NAP. Notably, I collected no evidence from the post-judgment phase of the task to suggest that subjects experienced difficulties with the verb to hear similar to those that some had experienced with the verb to watch. Instead, even younger subjects who provided non-target-like judgments of NAP24 (i.e. “false”) appear to have simply assigned the sentence an active interpretation, as suggested by the sample explanations illustrated in (49), below:

(49) Why was the elephant not heard by the dog?
a. “Cause + cause they didn’t + they + cause the elephant didn’t hear them (= dog and bird) did he?” (female 3;6)
b. E: Did he (= puppet) say a silly thing? What did he say? Do you remember?
S: “Dog umm the elephant heard the dog.” (male 4;6)
c. Response to test sentence: “The dog heard the elephant.
He (= puppet) said it the wrong way round.” (female 7;3)

Finally, I observed three interesting examples of children apparently using passive morphology to convey an active interpretation of the sentence, which are reminiscent of certain data reported by Whitehurst, Ironsmith and Goldfein (1974) and Horgan (1978). Horgan, for example, reported that her 2- to 4-year-old subjects produced what she termed reversed reversible passives, as when describing a picture of a cat chasing a girl as “The cat was chased by the girl” (ibid.:72). In (50), below, I offer three similar examples obtained from my own study:

(50)
a. Response to test sentence ‘The snake was watched by the rabbits’: “No + cause + the snake was watching + by the rabbits.” (female 5;3)
b. E: But why was that one wrong? He (= puppet) said the elephant was heard by the dog.
S: “Well ++ well the elephant wasn’t heard by the dog.”
E: Did the dog hear the elephant in the story?
S: “Yeah.” (female 5;1)
(Note that the only plausible interpretation of this subject’s “passive” sentence, above, is an active one.)
c. Response to test sentence: “No + + the umm umm + the dog was heard by the elephant.” (female 3;11)
(Note that a passive reading of the child’s sentence does not accord with the story details.)

Since two of the subjects quoted in (50) are over the age of five, I must acknowledge that the findings reported here may not strictly match those reported in the earlier studies, given that those studies involved only children below this age. With regard to the first child quoted in (50), her response may simply represent a speech error, since she gave target-like judgments of all four passive sentences, including the item referenced above, and so in all other respects demonstrated target-like knowledge of this structure.
In the case of the latter two subjects, however, the first gave target-like responses to all passive items with the exception of the referenced item, while the second gave target-like judgments of only the actional passives. These two children thus displayed some knowledge of the grammatical rules for passivization in English but nevertheless appeared willing to extend the range of meaning associated with these structures. In explaining her own subjects’ production of reversed reversible passives, Horgan (ibid.:78) proposed that even young children recognize a distinction between reversible and non-reversible passives, since ungrammatical reversal of agent and object arguments, as in the production of reversed reversible passives, does not occur when the sentence features semantically non-reversible arguments in the first place. In particular, she hypothesized that young children may produce reversed reversible passives because they associate this form with expression of the general notion of “mutual activity” rather than with alternative expression of agent-object relations. Interestingly, however, I recorded no such errors in the post-judgment production data that I collected for the two actional passives, which also involved semantically reversible sentences. Thus I am led to speculate, following a suggestion offered by de Villiers et al. (1982), that the presence of a nonactional or perceptual verb in a passive item may increase the processing demands associated with a sentence of this type. In particular, I believe that, at least in the early stages of linguistic development, it is reasonable to suppose that processing demands would be greater for the interpretation of a passive that, atypically, involves the syntactic promotion of the object of a perception verb (and the demotion of its experiencer subject) than for one that involves the syntactic promotion of a patient.
4.5.5 Comparison of group performance on the TC and DC

In this section, I compare group performance on the TC and on the DC and argue that a meaningful correlation can be drawn between the two. I have chosen not to compare TC performance with performance in the IR, OPC, or passive conditions, however, given the lack of parity in the number of items tested. For example, while I administered twelve TC test and control items, I administered only two IR and only two OPC items. And, with regard to the passive, the four items I administered were divided into two of the actional type and two of the nonactional type. Of course, the difference between the number of items administered in the TC and in the DC conditions is sizeable as well, since I tested only four in the latter case. Nevertheless, by converting the scores obtained into percentages, I believe that a reasonable comparison of performance across these two particular conditions can be made. As previously reported, child subjects in groups 1 to 3 performed statistically as a single group on TC items as well as on DC items. Notably, a similar finding was obtained when performance in the TC and DC conditions was compared using a Wilcoxon signed-ranks test: no significant difference was observed in the mean percentage of object readings provided for either type of construction when the performance of subjects in the first three age groups was analysed (Z = 1.338, p < .181). For groups 4 and 5, however, a significant difference was obtained when performance across the two conditions was compared using the same test (Z = 2.257, p < .024, for group 4 and Z = -2.848, p < .004, for group 5).44 Thus, the oldest children in the study (group 4) behaved like the adult controls (group 5), in that each group provided significantly more object readings of the TC than of the DC.
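The conversion to percentages and the signed-rank comparison described above can be sketched as follows. This is an illustration of the mechanics only, not a reproduction of the group analyses: per-subject scores for groups 1 to 5 are not listed in the text, so the sketch borrows the eleven Passer rows of Table 4.33 (§4.5.6), taking subject 36’s TC score out of the eight items answered. The function names are hypothetical, and the sketch computes the rank sums only, not a Z score or p-value.

```python
def percent(score: int, n_items: int) -> float:
    """Express a raw score as a percentage of the items answered."""
    return 100 * score / n_items


def signed_rank_sums(pairs):
    """Wilcoxon signed-rank sums for paired scores.

    Returns (W+, W-, n): zero differences are dropped, and tied absolute
    differences share the mean of the ranks they occupy.
    """
    diffs = [a - b for a, b in pairs if a != b]
    ordered = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus, len(diffs)


# Passers from Table 4.33: subject -> (TC object readings,
# TC items answered, DC object readings out of 4).
PASSERS = {
    13: (9, 10, 1), 23: (9, 10, 3), 24: (9, 10, 2), 34: (9, 10, 3),
    35: (9, 10, 2), 36: (8, 8, 3), 37: (9, 10, 2), 38: (10, 10, 3),
    41: (10, 10, 2), 43: (10, 10, 4), 44: (9, 10, 3),
}

pairs = [(percent(tc, n), percent(dc, 4)) for tc, n, dc in PASSERS.values()]
print(signed_rank_sums(pairs))  # (55.0, 0, 10)
```

Every Passer except no. 43 (a tie at 100%) gave proportionally more object readings of the TC than of the DC, so all ten non-zero differences are positive, in the same direction as the group 4 and group 5 results reported above.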
Figure 4.18, below, provides a graphic illustration of group performance on the two constructions:

44 According to the results of the Wilcoxon test, the finding reported for group 4 holds only when performance is compared between DCs and the adjusted set of ten TCs, with the potentially problematic items easy 2 and hard 6 removed. That is, when data obtained from the full set of TCs are used, no significant difference is observed (Z = -1.559, p < .119). As I believe results based on the set of ten items provide a more accurate picture of subject ability, I take the statistical analysis of the reduced set of findings to be the more reliable and informative.

[Figure 4.18: Comparison of group performance on TCs and DCs. Y-axis: mean % object readings (0.2 to 1.0); x-axis: age group (3;4 to 4;4, 4;6 to 5;5, 5;6 to 6;3, 6;5 to 7;5, Adult); series: degree NOS (DCs), TCs.]

As Figure 4.18 illustrates, it is only from the age of 6;5 that children begin to treat the TC and DC in a distinct manner, at least as regards the unavailability of a subject reading for the TC. The findings reported above thus support my contention that children initially treat both constructions as ambiguous, with a similar bias towards the subject reading of each. Note, however, that I do not assert that children assign the same structural analysis to the TC and the DC prior to the age of six. Rather, my claim is only that prior to target-like acquisition of the TC, the child assumes that both the TC and the DC are associated with two legitimate interpretive options, one of which involves subject control of embedded PRO and the other a null operator-gap dependency.
4.5.6 Comparison of individual performance across test conditions

In §4.5.0.1, I observed that of the forty-four subjects included in the study, three could be classified as P-R Users, thirty as Intermediates and eleven as Passers on the basis of their performance on the TC. In this section, I compare the performance of individual children not only with respect to the TC but across all of the conditions in the study. Looking first at the three P-R Users, Table 4.32, below, compares the total number of target-like readings provided by each of these subjects in each of the six experimental conditions. (Note that results for DC items are distinguished separately, since these figures represent the number of object readings provided, as opposed to the number of target-like readings provided.)

Subj. no.  Age   TC (n=10)  IR (n=2)  OPC (n=2)  AP (n=2)  NAP (n=2)  DC (n=4)
4          3;8   0          1         2          2         0          1
22         5;4   1          1         0          2         1          1
25         5;8   1          1         2          1         1          2

Table 4.32: Number of target-like (or object) readings per test condition – P-R Users

As Table 4.32 indicates, there is no condition in which the performance of these three subjects was uniformly target-like. Nevertheless, I would argue that the data reported above provide no evidence that any of these three subjects experienced either an across-the-board impairment of their ability to interpret an NOS or any general impairment of their ability to interpret a displaced syntactic constituent. As a case in point, subject no. 4, who by virtue of his age could be considered the most likely of the three to possess limited syntactic ability, performed in a target-like manner on the two OPC items and provided at least one object reading of the DC. Moreover, this subject proved exceptionally competent at explaining his judgments of various test/control sentences. By comparison, subject no.
25, a girl, was shy and therefore offered relatively fewer explanations of her judgments; even so, she, like subject no. 4, performed well on the OPC items and provided two object readings of the DC. Subject no. 22 is thus the only one of the three P-R Users for whom it can be claimed that performance was not target-like in any of the NOS conditions. This subject, a boy, had generally demonstrated good understanding of story details when responding to follow-up comprehension questions but provided few explanations of his judgments, even when requested to do so. It is possible that this particular child did not fully understand the importance of his role as teacher to the puppet, given that there were a number of occasions on which his judgment of the test/control sentence appeared to be offered with little reflection and, in some cases, was subsequently changed without explanation. In this respect, the performance of this subject conforms neither to that of his fellow P-R Users nor to that of the other participants in the study, who, as noted in §4.5.0.2, generally proved able to provide appropriate explanations of both target-like and non-target-like judgments of experimental items. I thus prefer to consider this subject an exceptional case. Overall, then, I contend that the data reported in Table 4.32 do not support the claim that the P-R User lacks the syntactic ability to interpret the TC and is consequently forced to rely on standard word order cues as a determinant of grammatical relations (cf. C. Chomsky 1969, Cromer 1970). The fully target-like performance of two of these subjects on the APs speaks directly against their reliance on such an interpretive strategy.
Furthermore, as noted earlier, two of these subjects were over the age of five and accordingly beyond an age at which reliance on a non-grammatical strategy for sentence interpretation might reasonably be assumed. While it remains possible, as Goodluck (1991:98-9) has asserted, that child learners find the derivation of the TC particularly challenging, the results reported above are equally consistent with the hypothesis that the P-R User displays a strong interpretive preference for the subject reading of the TC but is nevertheless not restricted in her ability to derive the object reading. This proposal may seem controversial, given that two of the three P-R Users in the study offered only one target-like reading of the TC and the third offered none. Yet, as I reported earlier, four of the Intermediate subjects in the study were classified as such strictly on the basis of having provided two target-like judgments of the TC rather than one. Thus, I think it is reasonable to consider that the grammatical ability of two, if not three, of the P-R Users referenced above is perhaps more appropriately characterized as Intermediate, despite the strong preference that each displayed for the subject reading of the TC.

Turning now to the performance of the eleven subjects in the study who were classified as Passers, Table 4.33, below, lists the total number of target-like readings provided by these subjects in each of the five experimental conditions, as well as the number of object readings they provided for DC items:

Subj. no.  Age    TC (n=10)  IR (n=2)  OPC (n=2)  AP (n=2)  NAP (n=2)  DC (n=4)
13         4;7    9          1         2          2         2          1
23         5;6    9          1         2          1         1          3
24         5;7    9          2         2          1         2          2
34         6;5    9          1         2          2         2          3
35         6;5    9          2         2          2         2          2
36         6;8    8 (a)      2         2          2         1          3
37         6;10   9          2         2          2         1          2
38         6;10   10         0         2          2         2          3
41         7;2    10         2         2          2         2          2
43         7;4    10         2         1          2         1          4
44         7;4    9          2         2          2         1          3

Table 4.33: Total number of target-like (or object) readings per test condition – Passers
(a) This child failed to provide responses for two of the ten TC items, and so the score reported here represents eight target-like responses out of eight possible.

According to the results reported in Table 4.33, six of the eleven Passers (i.e. nos. 24, 35, 36, 37, 41, and 44) performed in a target-like manner on both OPCs and IRs, while the remaining five subjects (i.e. nos. 13, 23, 34, 38, and 43) demonstrated target-like competence in only one of these two conditions. Therefore, it is only with respect to the first set of six subjects that it can be claimed that all NOS, including the TC, ODC, OPC, and IR, were consistently interpreted in a target-like manner. However, I note that the three subjects who made a single error on the IR all made this error on IR17, an item I previously identified as being associated with a disproportionate number of errors (see the discussion in §4.5.2). Additionally, for two of these subjects, nos. 13 and 34, the error on IR17 represented the only non-target-like response provided in any of the other experimental conditions. Finally, the poorest performance reported for an individual Passer is that of subject no. 23, who provided non-target-like responses for IR, AP, and NAP items; nevertheless, this particular subject’s performance clearly represents the exception rather than the rule for those in the Passer category.
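The counts in the preceding paragraph can be checked mechanically against Table 4.33. The Python sketch below is a hypothetical helper, not part of the study’s procedure: it stores each row of the table and lists, for each Passer, the non-TC conditions in which fewer than two target-like readings were given. DC is excluded because it records object readings rather than target-like ones.

```python
# Rows of Table 4.33: subject -> (age, TC, IR, OPC, AP, NAP, DC).
TABLE_4_33 = {
    13: ("4;7", 9, 1, 2, 2, 2, 1),
    23: ("5;6", 9, 1, 2, 1, 1, 3),
    24: ("5;7", 9, 2, 2, 1, 2, 2),
    34: ("6;5", 9, 1, 2, 2, 2, 3),
    35: ("6;5", 9, 2, 2, 2, 2, 2),
    36: ("6;8", 8, 2, 2, 2, 1, 3),  # TC out of 8 items answered
    37: ("6;10", 9, 2, 2, 2, 1, 2),
    38: ("6;10", 10, 0, 2, 2, 2, 3),
    41: ("7;2", 10, 2, 2, 2, 2, 2),
    43: ("7;4", 10, 2, 1, 2, 1, 4),
    44: ("7;4", 9, 2, 2, 2, 1, 3),
}
CONDITIONS = ("IR", "OPC", "AP", "NAP")  # each condition had 2 items


def shortfalls(row):
    """Non-TC conditions in which fewer than 2 target-like readings were given."""
    _age, _tc, *scores, _dc = row
    return [cond for cond, score in zip(CONDITIONS, scores) if score < 2]


both_ir_and_opc = sorted(
    s for s, row in TABLE_4_33.items()
    if "IR" not in shortfalls(row) and "OPC" not in shortfalls(row)
)
single_ir_error = sorted(s for s, row in TABLE_4_33.items() if row[2] == 1)

print(both_ir_and_opc)             # [24, 35, 36, 37, 41, 44]
print(single_ir_error)             # [13, 23, 34]
print(shortfalls(TABLE_4_33[23]))  # ['IR', 'AP', 'NAP']
```

The last line confirms, from the table itself, that subject no. 23’s shortfalls fall in the IR, AP, and NAP conditions, while the OPC score is 2 out of 2.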
In general, the data in Table 4.33 indicate that, for Passers, target-like performance on the TC was not perfectly correlated with target-like performance in all of the remaining conditions. However, because I have reason to believe that the error rate reported in Table 4.33 may be artificially inflated by the use of a potentially problematic test item, IR17, I think it prudent to remain cautious in making any negative evaluation of the grammatical capabilities of these subjects. Finally, as regards the performance of the thirty subjects classified as Intermediates, who ranged in age from 3;4 to 7;3, I will not list data for individual subjects, given the sizeable number of children involved. Only three children in this group, all over the age of 6;0, gave target-like readings of all IRs, OPCs, APs, and NAPs. Interestingly, then, the grammatical competence displayed by these three subjects on various NOS did not correlate with target-like performance on the TC. There were additionally eight Intermediate subjects who made only a single error on the IR, OPC, AP, or NAP items: five of these errors were associated with an NAP item and two were made on IR17. These therefore represent fairly predictable types of errors, given the relatively poor performance of all child subjects on the NAPs, and on IR17 in particular. The remaining nineteen Intermediates varied in the extent of their non-target-like performance in each of the four non-TC conditions. The majority of errors reported for these subjects occurred on either one or both NAP items, although there were scattered instances of non-target-like responses in other conditions. The poorest overall performance reported for any individual Intermediate was that of subject no. 5, aged 3;9, who failed all passive items and made a single additional error on OPC18.
Notably, however, this same child performed in a completely target-like manner on IRs and also provided one target-like response for an OPC item; thus, even this subject’s performance does not suggest an across-the-board impairment of her ability to interpret an NOS. In summary, my analysis of individual performance in the main study casts doubt on the validity of two of the hypotheses I posed at the beginning of this chapter. First, I found no necessary correlation between delayed acquisition of the TC and delayed acquisition of other NOS. Second, delayed acquisition of the TC, even in the case of the P-R Users in the study, cannot be explained in terms of a general difficulty that children experience in their ability to interpret a syntactically displaced object argument. Finally, I note that the findings reported in both the present and preceding sections support neither the contention that NOS are syntactically complex and, consequently, relatively late-acquired (cf. Goodluck and Behne 1992) nor the contention that NOS share a similar structural analysis and, consequently, are concurrently acquired. While my data cannot speak to the relative merits of different syntactic analyses of NOS, I do think it is informative that a child demonstrating competence in interpreting one such structure does not necessarily demonstrate a similar competence in her interpretation of other NOS.

4.6 Post-test: BPVS

In Chapter 3, §3.2.1.0, I reviewed Cromer’s (1970) experimental study of TC comprehension, in which he advanced the claim that verbal mental age (VMA), as determined by the results obtained on the Peabody Picture Vocabulary Test (PPVT), served as a more accurate means of predicting a child’s ability to interpret the TC than the child’s chronological age.
In particular, Cromer observed no direct correlation between a subject’s chronological age and the subject’s classification as a P-R User, Intermediate, or Passer. For example, he reported that although all Passers in his study were over the age of 6;7, there were also children above this age who performed as Intermediates and even as P-R Users. In contrast, he reported the following correlations between subject performance on the PPVT and on the TC (ibid.:401, adapted from Cromer’s Table 1):45

Mental age on PPVT (years;months)    P-R Users    Intermediates    Passers
2;11 – 5;7                           17           10               0
5;9 – 6;6                            0            8                0
6;8 – 10;8                           0            1                5

Table 4.34: Performance on the TC correlated with mental age (Cromer 1970)

According to the classification of subjects reported in Table 4.34, no subject with a VMA over 5;7 performed as a P-R User, all subjects with a VMA between 5;9 and 6;6 performed as Intermediates, and all but one subject with a VMA of 6;8 or above performed as a Passer. Cromer reported that these trends were all significant beyond the 0.001 level.

45 Cromer’s subjects were tested on the PPVT two months prior to their participation in his study of TC comprehension. When interpreting the significance of the correlations he reports, it is therefore important to recognize that the two abilities were not tested concurrently.

In considering the classification of my own subjects according to Cromer’s original criteria, I have previously noted that the vast majority of my subjects (i.e. forty, or 91%) would fall into the Intermediate category. However, when these criteria are relaxed to allow a margin of error of one out of twelve items, as first suggested in §4.5.0.0, the breakdown of subject classification is as reported earlier in this chapter: Three P-R Users, thirty Intermediates, and eleven Passers.
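The relaxed classification criteria amount to a simple counting rule. The Python sketch below is illustrative only and rests on hypothetical assumptions: each subject’s twelve TC responses are coded as target-like (True) or non-target-like (False), a Passer is consistently target-like, a P-R User is consistently non-target-like, and the `margin` parameter sets how many deviant responses are tolerated. The names `classify` and `Group` are invented for this illustration and are not part of the original study.

```python
from enum import Enum


class Group(Enum):
    PR_USER = "P-R User"
    INTERMEDIATE = "Intermediate"
    PASSER = "Passer"


def classify(responses: list[bool], margin: int = 1) -> Group:
    """Assign one subject to a TC performance group (hypothetical coding).

    responses: twelve TC items, True = target-like reading.
    margin: deviant responses tolerated; 0 = strict all-or-nothing criteria,
            1 = the relaxed one-out-of-twelve criterion.
    """
    if len(responses) != 12:
        raise ValueError("expected responses to twelve TC items")
    target_like = sum(responses)  # True counts as 1, False as 0
    if target_like >= len(responses) - margin:
        return Group.PASSER  # consistently target-like, up to `margin` errors
    if target_like <= margin:
        return Group.PR_USER  # consistently non-target-like, up to `margin` exceptions
    return Group.INTERMEDIATE  # mixed target-like and non-target-like readings


# A subject with one non-target-like response out of twelve:
one_error = [True] * 11 + [False]
print(classify(one_error, margin=0).value)  # Intermediate (strict criteria)
print(classify(one_error, margin=1).value)  # Passer (relaxed criteria)
```

Comparing `margin=0` with `margin=1` makes explicit that, under this coding, the relaxed criteria differ from the strict ones only for subjects with exactly one deviant response.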
Like Cromer, I find that these three groups of subjects cannot be neatly divided in terms of chronological age, as there is a considerable degree of overlap between them. Intermediates in my study, for example, included subjects between the ages of 3;4 and 7;3, while Passers included those between the ages of 4;7 and 7;4. For this reason, I wished to test with my own subjects the validity of Cromer’s claim that VMA is a better predictor of TC performance than chronological age. Owing to time and scheduling constraints, I was able to administer the more recent equivalent of the PPVT, the British Picture Vocabulary Scale (BPVS) (Dunn and Dunn 1997), only after I had completed data collection in the pilot and main studies.46 For some subjects, this meant that no more than one week elapsed between the completion of experimental testing and the administration of the BPVS; for others, however, nearly two months separated the child’s participation in the experimental study from my administration of the BPVS. This interval must be kept in mind when interpreting the results of the post-test described here.

The results I obtained do show some consistency in the relation between BPVS score and subject performance on the TC. For example, all ten Passers in my study had a VMA over 6;1; moreover, all children with a VMA of 8;0 or over could be classified as Passers. However, it must be emphasized that a relatively high BPVS score was not a perfect predictor of target-like TC performance. For example, of the three subjects who attained a BPVS score in the extremely high range, two children, aged 3;8 and 4;2, could be classified as Intermediates in terms of TC performance and only one, aged 4;7, as a Passer. Similarly, those attaining BPVS scores in the moderately high range included both Intermediates and Passers. Of the three P-R Users in the study, two had VMAs of 4;11 and one had a VMA of 5;10. However, a number of other subjects with similar or lower VMAs could be classified as Intermediates on the basis of their TC performance; I therefore did not find VMA to be a particularly reliable predictor of P-R User status. Finally, VMA for the thirty Intermediate subjects ranged from 3;6 to 7;5, a range roughly comparable to Cromer’s findings in Table 4.34, where the VMA of all but one of his Intermediate subjects falls between 2;11 and 6;6. Given this considerable variation in VMA within the Intermediate group, I am reluctant to claim that any real correlation exists between VMA and the ability to interpret the TC in a target-like manner.

46 I administered the BPVS (1997) without the aid of an experimental assistant, since the task requires only that the child match each spoken vocabulary item with its graphic representation. As I am a speaker of American English, a reasonable question could be raised as to whether my administration of the BPVS met certain stipulated conditions. The first of these was that vocabulary items should be read first using the pronunciation characteristic of the prevailing local dialect and then using the pronunciation of standard British English. In order to comply as closely as possible with these recommendations, I audiotaped both the local and standard pronunciations of the vocabulary items and practiced both before reading the items aloud in the actual trial. As I observed no instance in which a child reported difficulty in understanding the vocabulary items as pronounced, I would contend that the recommended conditions for administration of the test were satisfactorily observed.
Finally, I used BPVS scores to address a particular methodological issue that arose during subject selection for the pre-test. This concerned my desire to select experimental subjects of average academic ability so as to ensure that, as far as possible, I was investigating typical development of the ability to interpret NOS. To this end, I asked teachers to help identify students of exceptionally high or exceptionally low academic ability, so that these children could be excluded from the pre-testing. My decision to administer the BPVS as a post-test, however, also gave me an opportunity to evaluate retrospectively the general verbal ability, or verbal intelligence, of the child participants, which, according to Dunn and Dunn (op.cit.:2), can serve as an indicator of a child’s general “scholastic aptitude.”

Table 4.35 provides a breakdown of the performance of the forty-four subjects on the BPVS. I have used the classifications provided in the test materials, which are based on standardized scores, to rank the vocabulary ability and scholastic aptitude of subjects relative to the performance of their age-matched peers.47

Score range        Percentile rank    No. of subjects
Low average        15 to 49%          8 (18%)
Average            50%                1 (2%)
High average       51 to 85%          26 (59%)
Moderately high    85 to 97%          6 (14%)
Extremely high     over 98%           3 (7%)

Table 4.35: Classification of experimental subjects in terms of BPVS percentile ranking (Anderson 2002a,b)

The most striking finding reported above is that 80% of the participants in my study scored above the 50th percentile on the BPVS. That is, 80% of the subjects could be classified as ranging from high average to extremely high in terms of their vocabulary (and scholastic) ability.
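The percentages in Table 4.35 follow directly from the band counts. As a small arithmetic check, the Python sketch below recomputes each band’s share of the forty-four subjects, together with the combined shares above the 50th percentile, within the low to high average range, and above that range. The counts are copied from Table 4.35. The last lines additionally assume, for illustration only, that standardized scores are normally distributed, in which case roughly 68% of a random sample falls within one standard deviation of the mean.

```python
from statistics import NormalDist

# Number of subjects per BPVS band, copied from Table 4.35
bands = {
    "Low average": 8,
    "Average": 1,
    "High average": 26,
    "Moderately high": 6,
    "Extremely high": 3,
}
total = sum(bands.values())  # 44 subjects


def share(names):
    """Return (count, percentage of all subjects) for the given bands."""
    count = sum(bands[name] for name in names)
    return count, round(100 * count / total, 1)


for name in bands:
    print(name, share([name]))

above_median = share(["High average", "Moderately high", "Extremely high"])
within_average = share(["Low average", "Average", "High average"])
above_average = share(["Moderately high", "Extremely high"])
print(above_median, within_average, above_average)  # (35, 79.5) (35, 79.5) (9, 20.5)

# Assumption: normally distributed standardized scores; proportion within +/-1 SD
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(round(within_one_sd, 2))  # 0.68
```

The two thirty-five-subject totals are different groups that happen to be the same size. The twenty-six high-average subjects are counted in both.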
The finding that 80% of the participants scored above the 50th percentile would therefore appear to suggest that, despite the measures I took to select participants of average academic ability, my study nonetheless included a disproportionate number of subjects with above-average vocabulary skills. However, according to the distribution of scores reported for the sample on which the BPVS was standardized, the majority of scores drawn from any random sample (specifically, 68%) are predicted to fall within the low to high average range. As Table 4.35 shows, thirty-five of my subjects (79.5% of the total) did in fact obtain scores in the low to high average range. Nevertheless, this still leaves just over 20% of the subjects (nine children) with BPVS scores that indicate higher than average vocabulary ability. While I am not unduly concerned by the latter finding, I acknowledge that my subject group was not as homogeneous as I had intended at the outset of the study. In any future investigation of children’s ability to interpret NOS, I would therefore advocate testing potential participants for their vocabulary ability beforehand, rather than post-testing them.

47 The percentile rank reported in Table 4.35 represents the percentage of children of the same age in the standardization sample who scored equal to, or below, the individual subject’s score. It therefore represents a “deviation norm,” since it serves as a measure of how much an individual subject’s performance differs from that of an average group of age-matched peers (Dunn and Dunn 1997:16).

4.7 Conclusion

In summary, my analysis of both group and individual performance in the main study casts doubt on the validity of the first of the hypotheses I presented in §4.0, which attributed children’s delayed acquisition of the TC to the syntactic complexity of such structures.
This is because I found no necessary correlation between delayed acquisition of the TC and delayed acquisition of other NOS, as would be anticipated if this hypothesis were correct. Nor did I find evidence for the concurrent acquisition of NOS that would be predicted if NOS shared a similar degree of syntactic complexity. Furthermore, this latter finding does not support the well-accepted view that NOS share certain fundamental features of their syntactic analysis. I must acknowledge, however, that my data are not conclusive in this regard, since the independent acquisition of NOS is also consistent with a scenario in which the order of their acquisition is determined not by the syntactic structure of the various NOS but by other factors that have yet to be identified. With regard to hypothesis (1b), which attributed children’s delayed acquisition of the TC to deficient lexical knowledge of the tough adjective, I likewise found no supporting evidence. First, my statistical analysis of group performance on the TC revealed no effect of the choice of a particular tough adjective or adjectives on children’s non-target-like interpretation of the TC; rather, non-target-like responses were distributed equally across sentences containing different tough adjectives. Second, a post-test of my subjects’ vocabulary ability produced another finding problematic for hypothesis (1b): A child’s relatively advanced score on this post-test did not necessarily correlate with his or her target-like performance on the TC.
Finally, I pointed out that the target-like performance of even my youngest subjects on the actional passive does not support the validity of the third hypothesis I evaluated, which attributed delayed acquisition of the TC to a general impairment that children experience in their interpretation of a syntactically displaced object argument.

One of the additional goals I pursued in this chapter was to identify areas of interest for future research. Given the limited number of test items I included in the OPC and IR conditions, and the problematic design of certain of these items, I think that further study of children’s ability to interpret these two structures is warranted, particularly as the results I obtained on the OPC conflict with those reported by Goodluck and Behne (1992) and Goodluck et al. (1995). It would also be interesting to investigate what form (or forms) the child’s non-target-like interpretation of the OPC/IR takes, as the data I collected in the present study, while suggestive, are inadequate to address this issue fully. I would also like to extend my assessment of children’s ability to interpret the ambiguous DC. As noted earlier, my adult subjects produced only object readings of one of the test items, DC16, but a balanced number of subject and object readings of DC13. Although I have speculated on the basis of these results that the choice of embedded verb can influence the interpretive preference that adult subjects display for a particular test item, I submit that wider testing of the DC would be required to validate this hypothesis and to establish whether children are subject to the same parsing influences.

In the following chapter, I offer further analysis of the experimental findings reported in the present chapter and, in particular, detail the significance of these findings for generative theories of language acquisition.