CHAPTER 4: EXPERIMENTAL DESIGN AND PRESENTATION OF
RESULTS
4.0 Introduction
In this chapter, I present my own experimental evidence for children’s acquisition of
the tough construction (TC) and related null operator structures (NOS). These
findings, to be discussed in §4.5, were obtained in large part from an experimental
study that I conducted with forty-four monolingual child speakers of British English,
who ranged in age from 3;4 to 7;5. These forty-four subjects were drawn from an
original group of 122 children whom I pre-tested for their knowledge of the meaning
of several tough adjectives, as well as for their recall memory for story events. In
keeping with the broad research questions originally outlined in §1.0 of Chapter 1, my
experimental study was designed to allow me to test the following three hypotheses:
(1) a. Acquisition of the TC is relatively delayed because the construction is
       syntactically complex and children initially lack the requisite syntactic
       ability to interpret the TC in a target-like manner.
    b. Acquisition of the TC is relatively delayed because children require some
       time to learn the correct lexical properties of the tough adjective.
    c. Children initially fail to interpret the TC in a target-like manner because
       they experience a more general difficulty with the interpretation of
       syntactically displaced object arguments.
Hypothesis (1a) is predicated on the following two assumptions. The first is that the
structural representation of the TC, like other NOS, involves a null operator-gap
configuration in an embedded clause, which is referentially coindexed with a matrix
antecedent. The second is that the derivation of any NOS is reasonably considered
syntactically complex in comparison with other structures in the language which do
not involve the interpretation of a displaced syntactic constituent. In essence,
hypothesis (1a) predicts that the ability to interpret various NOS should be acquired
concurrently and also fairly late. Accordingly, supportive evidence for the validity of
D.L. Anderson, University of Cambridge
this hypothesis would be obtained if even my oldest subjects (i.e. those over 7;0) were
to demonstrate an inability to interpret all NOS in a target-like manner.
Conversely, I reasoned that if my subjects proved able to interpret some NOS in a
target-like manner, but not others, this pattern of performance could still be taken as
informative for syntactic theory. This is because concurrent acquisition of various
NOS would be predicted according to a theory, such as the null operator analysis
reviewed in Chapter 2, which takes NOS to share certain fundamental aspects of
syntactic representation and a similar level of syntactic complexity. If instead it could
be demonstrated that children acquire various NOS, including the TC, in a piecemeal
fashion, then this finding would cast doubt on these fundamental assumptions. At the
very least, were my subjects to show an early mastery of certain NOS, their non-target-like performance on other NOS would be much less plausibly explained in
terms of the syntactic complexity of these structures.
My review of previous experimental studies of the acquisition of NOS in Chapter 3
presented yet one more possibility with regard to my evaluation of hypothesis (1a).
This concerns the widely reported finding that children in the Intermediate stage of
the acquisition of the TC assign both target-like and non-target-like readings to the
construction (cf. Cromer 1970). I view such a pattern of performance as being
inconsistent with the validity of hypothesis (1a), at least as concerns a child’s ability
to interpret the TC. That is, I maintain that for the claim to hold that a child lacks the
requisite syntactic ability to interpret the TC, the child should consistently fail to
interpret the TC in a target-like manner.
Turning to hypothesis (1b), supportive evidence for the validity of this hypothesis
would be obtained were my subjects to demonstrate a varying ability to assign a
target-like interpretation to the TC depending on which particular tough adjective
featured in the construction. Conversely, however, a child’s consistent non-target-like
performance on the TC would be compatible with either hypothesis (1a) or (1b) or
with both. Thus, on the basis of such a pattern of performance, I would be unable to
establish a definitive explanation for the child’s non-target-like treatment of the TC.
Even so, hypothesis (1b) predicts some degree of inconsistency in the child’s
interpretation of individual TC items, and therefore I think consistent non-target-like
performance on the TC would cast doubt on the validity of (1b).
As regards hypothesis (1c), supportive evidence would be obtained should any of the
subject groups demonstrate a concurrent inability to interpret passives and NOS in a
target-like manner, given that both constructions involve the interpretation of a
displaced object argument. As noted in Chapter 3, Cromer (1970) had previously
tested children’s concurrent ability to interpret the TC and the passive and had
reported that performance on the passive was comparatively better than on the TC for
even the youngest subjects in his study. This early finding thus undermines the
validity of hypothesis (1c). Nevertheless, as Cromer’s was the only study to offer
concurrent testing of the two constructions, I thought a contemporary re-testing of the
two was warranted. Furthermore, Cromer had offered only two tokens of the passive
in his study, both of which featured the verb to bite. Since I am aware that children’s
competence in interpreting the passive has been claimed to vary according to whether
the passive features a verb that typically takes an agentive subject (e.g. bite) or an
experiencer subject (e.g. like) (see, e.g. Maratsos, Fox, Becker, and Chalkley 1985), I
decided to expand the range of passives tested in my own study, offering sentences
that featured passivized versions of both types of verbs. My aim was to determine,
first, if my subjects would show the same dissociation between target-like
performance on the TC and the passive as had Cromer’s subjects and, second, if my
subjects would experience difficulty with the interpretation of psychological (or
nonactional) passives as compared to agentive (or actional) passives.
Lastly, as the reader will recall, one of the research questions I posed in §1.0 of
Chapter 1 pertains to the issue of how experimental findings obtained in studies of the
acquisition of the TC can be used to inform more general theories of language
acquisition and, in particular, generative theories of the same. I address this particular
question in Chapter 5 of this thesis. As a preview of the discussion to be contained in
that chapter, I will argue that the experimental findings reported in the present chapter
raise clear implications for generative theories of acquisition and, specifically, that
these findings can inform our general understanding of how learnability principles
operate in the acquisition of a first language.
4.1 Organization of chapter
In §4.2, I describe a pre-test that I conducted with 122 children drawn from a pre-school and primary school, both located in the village of Willingham,
Cambridgeshire, UK. The pre-test consisted of two experimental trials. In the first, I
tested subjects for their knowledge of the tough adjectives, easy, hard, difficult, and
impossible, as well as for their knowledge of the degree construction (DC). In the
second, I assessed each child’s ability to retain a short sequence of story events in
memory. The motivation for this particular pre-test was my selection of the truth-value judgment (TVJ) task as the assessment technique to be employed in the main
experimental study. As the TVJ task requires a child to retain a short sequence of
story events in memory long enough to assign a contextually appropriate
interpretation to the sentence under consideration, I believed that successful
performance on a pre-test of recall memory would serve as an appropriate inclusion
criterion for a child’s participation in the main study.
As I detail in §4.2, my findings from the pre-test included the observation that a
sizeable number of the subjects I tested lacked knowledge of the meaning of the
adjective impossible, a finding which raises questions regarding the appropriateness
of the design and/or methodology employed in certain previous studies (e.g. Kessel
1970 or McKee 1997a). I also explain how the results of the vocabulary pre-test
suggest a predictable order of acquisition of the four tough adjectives, with the
acquisition of easy and hard preceding that of either difficult or impossible. Finally, I
review the results obtained in the memory pre-test, including the rather disappointing
performance of my younger subjects.
In §4.3, I review the design of the main study, in which I tested forty-four children
between the ages of 3;4 and 7;5 for their knowledge of NOS, including the TC,
object-gap degree construction (ODC), object-gap purpose construction (OPC), and
infinitival relative construction (IR), as well as for their knowledge of passive
sentences. After Crain and Thornton (1998) and Gordon (1998), I review the basic
procedures associated with use of the TVJ task. I then discuss specific modifications
of the task that I adopted in my own study. Lastly, taking each of the above-referenced constructions in turn, I illustrate the design features of each test condition.
Section 4.4 contains a brief review of the findings I obtained in a pilot study that I
conducted one month prior to the main study and a discussion of how these findings
prompted me to alter certain aspects of my original experimental design. In §4.5, I
present the results of the main study, discussing each construction in turn, beginning
with the TC. I first analyse my results in terms of group performance, with the
following four groups, each consisting of eleven children, organized according to age:
Group 1 (ages 3;4 to 4;4), Group 2 (ages 4;6 to 5;5), Group 3 (ages 5;6 to 6;3), and Group
4 (ages 6;5 to 7;5). In §4.5.0.0, I offer a statistical analysis of group performance on the
TC, which I found to be non-target-like for all but the oldest group. For Groups 1-3, I
also report considerable individual variation in subject performance, consistent with
findings earlier reported by McKee (1997a). Pace McKee, however, I point out that I
did not find any evidence that the performance of my subjects varied according to the
presence of a particular tough adjective or adjectives in the TC, and thus that my
results do not provide support for hypothesis (1b).
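For readers less familiar with the years;months notation used for ages throughout this chapter, the grouping described above can be sketched as follows. This is an illustrative sketch only: the function names and the GROUPS mapping are hypothetical and are not part of the study materials; only the age ranges themselves are taken from the text.

```python
def age_in_months(age: str) -> int:
    """Convert a 'years;months' age such as '3;4' to a number of months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


# Group boundaries as reported for the main study (hypothetical data structure).
GROUPS = {
    1: ("3;4", "4;4"),
    2: ("4;6", "5;5"),
    3: ("5;6", "6;3"),
    4: ("6;5", "7;5"),
}


def assign_group(age: str):
    """Return the group number for an age, or None if it lies outside every range."""
    months = age_in_months(age)
    for group, (youngest, oldest) in GROUPS.items():
        if age_in_months(youngest) <= months <= age_in_months(oldest):
            return group
    return None
```

Under this sketch, an age such as 4;5 or 6;4 is returned as unassigned, since the reported ranges do not cover it.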
In the same section, I report that the balanced variation I introduced in the design of
my test items did not have any appreciable effect on subject performance. As I
explain, this finding raises implications for certain of the design recommendations
outlined in Crain and Thornton (1998). I also detail problematic aspects of the design
of certain of my own test/control items.
In §4.5.0.1, I provide a detailed analysis of the performance of individual subjects on
the TC. While analysis of performance at the group level indicated that my subjects
below the age of 6;3 failed to interpret the TC in a target-like manner, an analysis of
individual performance revealed that the majority of my subjects offered both target-like and non-target-like interpretations of the TC, consistent with their having entered
the Intermediate stage of acquisition (cf. Cromer 1970). When considered in
conjunction with the production data I review in §4.5.0.2, where Intermediate subjects
are observed to provide appropriate explanations of both target-like and non-target-like interpretations of the TC, I argue that my findings present a picture of
Intermediate performance in which the child does not apply guesswork to the task, as
has been elsewhere argued in the literature, but instead chooses between two
interpretive options made available by her grammar.
In §§4.5.1, 4.5.2, and 4.5.3, I present the results I obtained, respectively, for the ODC,
IR, and OPC. With regard to group performance on the ODC, I report that I did not
find support for hypothesis (1a), given that children in all four age groups
demonstrated the ability to assign both subject and object readings to the ambiguous
DC. Looking at individual performance, I did find that four subjects, all under the age
of 4;4, failed to provide any object readings of the DC. Nevertheless, I argue that
since each of these children provided at least one target-like reading of the IR and the
OPC and gave mixed readings of the TC, their performance on the DC could simply
reflect an interpretive bias for the subject reading of this construction. I detail how
this hypothesis is given further support by my analysis of group results, which reveal
that a preference for the object reading of the DC, characteristic of the adult
population, is demonstrated by children only after the age of 6;5. I take this latter
finding as indicative that children require some time to recognize the existence of this
interpretive bias in the primary linguistic data (PLD) and for their production of this
form to be probabilistically adjusted.
In §4.5.2, I report that subjects in all four age groups demonstrated the ability to
assign target-like readings to the IR and thus that they did not perform in a manner
consistent with hypothesis (1a). As I note, however, this target-like performance was
largely restricted to one of the two items tested in this condition. I consider how the
design of this particular item could have negatively influenced subject performance,
for example, by admitting a third, unintended reading of this particular test sentence.
In §4.5.3, I explain that my subjects’ largely successful performance on the OPC not
only fails to provide support for hypothesis (1a) but also conflicts with results earlier
reported by H. Goodluck and colleagues. For example, whereas Goodluck and Behne
(1992) had claimed that children as old as ten fail to demonstrate target-like ability to
interpret the OPC, I found that only two of my subjects above the age of 5;6 offered a
non-target-like interpretation of the OPC, in each case an error restricted to a single
test item.
In §4.5.4, I analyse the results I obtained for actional and nonactional passives,
reporting that subjects in all four age groups performed like adults on actional
passives. I therefore observed no necessary correlation between non-target-like
performance on the TC, which was observed for all groups with the exception of
Group 4, and non-target-like performance on actional passives. As I detail, then, my
findings do not provide support for hypothesis (1c). With regard to nonactional
passives, however, all four of my child groups performed worse on these items than
on actional passives, consistent with findings earlier reported in the literature (e.g.
Maratsos et al. 1979, 1985). I examine the performance of individual subjects in this
condition to investigate the source of the difficulty that children experience with these
structures.
In §4.5.5, I perform a statistical comparison of group performance on the TC and
group performance on the DC. I argue that the results of this statistical analysis
provide support for my contention that children under the age of 6;5 treat both
constructions as ambiguous, with the subject reading of each construction remaining a
strong preference prior to this age. In §4.5.6, I shift the focus to a consideration of the
performance of individual subjects across the full range of constructions tested. I
demonstrate that at the individual level, I once again find little support for the validity
of hypothesis (1a), since all of my subjects displayed target-like ability with regard to
one or more NOS. As I point out, this finding holds true even in the case of the three
children in the study who could reasonably be classified as P-R Users, having
provided one or fewer target-like readings of the TC. The data reported in §4.5.6 thus
do not support concurrent acquisition of NOS nor delayed acquisition of all such
structures.
In §4.6, I close my presentation of experimental results with a review of the
performance of my subjects on the British Picture Vocabulary Scale (BPVS) (Dunn &
Dunn 1997), which I administered after each subject’s participation in the main study.
The inclusion of this post-test was inspired by one of Cromer’s findings (1970), which
was that a subject’s verbal mental age (VMA), as determined by administration of the
Peabody Picture Vocabulary Test (PPVT), proved a more reliable predictor of TC
performance in his experiment than a subject’s chronological age. I was therefore
interested to determine if the interpretive abilities of my own subjects could be
similarly correlated with vocabulary ability. I report that, like Cromer, I found little
correlation between TC performance and chronological age, but that, unlike Cromer, I
found only a very rough correlation between VMA and subject performance on the
TC. Moreover, as I observed that a relatively high VMA score did not necessarily
predict a subject’s successful performance on the TC, I argue that this finding casts
further doubt on the validity of hypothesis (1b).
4.2 Pre-Test
4.2.0 Part one: Vocabulary test
4.2.0.0 Design
In Chapter 3, I reviewed a sizeable number of experimental studies of the acquisition
of the TC and yet noted only two, Macaruso et al. (1993) and McKee (1997a), which had
featured testing of the meaning of various tough adjectives independent of their
occurrence in the TC. In both of these studies, however, this testing followed rather
than preceded presentation of the same adjectives in the structural context of the TC.
As earlier noted, I consider this problematic in two respects. First, it is impossible for
the researcher to control for any effect of bias that may be introduced by presenting
these adjectives in a suitable structural context prior to the testing of their meaning
alone. Second, I believe it is preferable from the standpoint of experimental design
for the researcher to adopt selection criteria that are strict enough to define a
homogeneous sample in the first instance, rather than to adopt what might be
reasonably termed ‘exclusion criteria,’ which can identify unsuitable participants only
after the testing of main experimental items is complete.
As discussed in §3.2.1.1 of Chapter 3, I believe that a further complication may have
been introduced by McKee as a result of her decision not to exclude child participants
who missed only one tough adjective (i.e. easy, hard, difficult, or impossible) in the
vocabulary post-test that she administered. As a consequence, even if one of
McKee’s subjects failed to demonstrate knowledge of the meaning of a particular
tough adjective, the same child’s interpretations of TC test items containing this
adjective were still included in the overall analysis of results. In my opinion, this
situation is less than desirable, given that it calls into question the reliability of certain
of the data collected in the study. Accordingly, in my own study, I chose to test my
subjects’ knowledge of the meaning of various tough adjectives prior to presentation
of these lexical items in the TC.
The pre-test I administered consisted of two different experimental trials. The first of
these was the vocabulary test referenced above, which also included assessment of the
ability of my subjects to interpret a degree construction. The second was designed to
evaluate subject memory for story details, for reasons that will be detailed in §4.3,
below.
A total of 122 children participated in the pre-test. All were monolingual native
speakers of British English, whose parents were primarily of middle or working class
background. The 122 subjects, who ranged in age from 3;0 to 7;6, consisted of
roughly equal numbers of boys and girls. Those below the age of 4;8 attended the
Honeypot Pre-School in Willingham, Cambridgeshire, while those over this age
attended Willingham Primary School, which is physically adjacent to the pre-school.
All testing was conducted on site at the particular school the child attended.
For part one of the pre-test, which I will term the vocabulary test, I adopted the same
technique used in McKee’s (op.cit.) vocabulary post-test. Child participants were
presented with a pair of pictures, only one of which matched the correct interpretation
of an expletive-headed tough sentence, such as It is hard for the boy to open the door.
Like McKee, I reasoned that because the logical subject and logical object of the
embedded verb are transparently represented in the surface word order of such a
sentence, children who knew the meaning of the tough adjective featured in the
expletive-headed sentence would assign the sentence a target-like interpretation.
The particular lexical items that I chose to test were the adjectives easy, hard,
difficult, and impossible. Picture pairs were designed to allow flexibility in the order
of presentation of individual adjectives across different test trials. For example, the
same pair of pictures could be used to test one child on the adjective easy and another
child on the adjective hard. In Figures 4.0 and 4.1, below, I provide examples of
specific picture pairs that I used to test easy, hard, and difficult. (Note that all of the
drawings used in the pre-test are the work of Mrs. Karen Harris, a teaching assistant at
the Honeypot Pre-School, whose contribution to the study was much valued and
appreciated.)
Figure 4.0: It is ‘easy/hard/difficult’ for the boy to open the door.
Figure 4.1: It is ‘easy/hard/difficult’ for the dog to get his bone.
Two warm-up items preceded presentation of the actual test items to allow subjects to
familiarize themselves with the requirements of the picture-selection task. For
individual test items, I would first briefly discuss features of both pictures and then
present the test sentence, using language such as that exemplified in (2), below:
(2) In this picture there is a dog and a bone, and in this picture
    there is a dog and a bone. But in one of these pictures, it is
    easy for the dog to get his bone. Can you show me which
    one?
The test sentence might be repeated a number of times until the child made a choice.
Following this, I sometimes asked the child a follow-up question, which typically
took the form of a ‘why’ question; for example, “Why is it easy for the dog to get his
bone?” or “Why is it hard for the boy to open the door?” Notably, the form of the
follow-up question remained the same regardless of which picture the child had
actually selected since, even in the case of an incorrect choice, the child’s response
was taken as an assertion that the chosen picture, for her, represented the one that best
matched the meaning of the test sentence.
For testing the adjective impossible, four different picture pairs were originally
designed. In the early stages of the pre-test, two of these proved less suitable than the
others (see the discussion in §4.2.0.1, below) and were therefore dropped from further
use. The two remaining pairs are presented in Figures 4.2 and 4.3, below:
Figure 4.2: It is ‘impossible’ for the duck to eat.
Figure 4.3: It is ‘impossible’ for the fish to swim.
Finally, following a suggestion made by Ianthi Tsimpli (p.c.), the vocabulary test was
extended to include an example of a DC. This suggestion was prompted after
my search of the Wells (1973-77) corpus had yielded very few examples of the
use of either the subject-gap degree clause (SDC) or the object-gap degree clause
(ODC) in naturalistic child speech (see §3.2.0, Chapter 3). Since, on the basis of
these limited findings, I could not be entirely sure that the DC represented a
construction known to children as young as three, I followed Tsimpli’s
recommendation to test my subjects’ familiarity with this type of construction. In
order to avoid prior presentation of any of the structures that would feature in the
main study, and thus the introduction of bias, I chose to assess my potential subjects’
knowledge of the DC through presentation of an SDC (e.g.
The mouse is too big to go through the hole in the wall), rather than an ODC. As in
the case of the tough adjectives presented in this phase of the pre-test, subjects were
required to demonstrate target-like knowledge of the SDC in order to be selected for
participation in the main study.
Two different picture pairs were prepared to test children’s comprehension of the
SDC, with the choice of pair randomly determined. In Figure 4.4, below, I provide an
example of one of the two picture pairs used in this condition:
Figure 4.4: The girl is ‘too small’ to carry the box.
With only two exceptions, the 122 children who participated in the vocabulary pre-test were given a total of seven picture pairs to evaluate, consisting of two warm-up
pairs, four tough adjective pairs, and one SDC pair. The two children who proved an
exception were aged 3;0 and 3;3 and had failed both easy and hard pairs on first
presentation. These children were therefore not tested on difficult and impossible but
instead were retested on easy and hard at a later point in the session.
For all subjects, the order of presentation of the two warm-up pairs was fixed, but the
order of presentation of actual test items was randomly varied, as was the left-to-right
order of presentation of individual picture pairs.¹ Finally, the total time required to
administer the vocabulary test generally did not exceed five minutes.
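The ordering procedure just described, together with the restriction that impossible never immediately follows the warm-up pairs, can be sketched as below. This is a minimal illustration under my own assumptions, not a record of how orders were actually generated in the study: the item labels, the function name, and the use of rejection sampling are all hypothetical.

```python
import random

# Hypothetical labels for the seven picture pairs given to each child.
WARM_UPS = ["warm-up 1", "warm-up 2"]
TEST_ITEMS = ["easy", "hard", "difficult", "impossible", "SDC"]


def presentation_order(rng: random.Random):
    """Build one child's trial list as (item, side_of_matching_picture) pairs."""
    items = TEST_ITEMS[:]
    # Reshuffle until 'impossible' is not the first test item, so that it
    # never immediately follows the warm-up pairs.
    while True:
        rng.shuffle(items)
        if items[0] != "impossible":
            break
    # Warm-ups keep a fixed order; the left-to-right placement of the
    # matching picture is drawn at random for every pair.
    return [(item, rng.choice(["left", "right"])) for item in WARM_UPS + items]
```

Reshuffling until the restriction is met keeps every permitted order equally likely, which simply moving impossible to a later position would not.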
4.2.0.1 Presentation of results
As anticipated, I generally found that even the youngest subjects experienced little
difficulty in complying with the requirements of the picture selection task. Out of the
122 child participants, I found only two who proved unable to perform the task,
both below the age of 3;4.² Notably, these were not the same two children referenced
in the preceding section, who were tested twice on easy and hard items. Additionally,
I was unable to analyse the results for one child, aged 5;6, whose responses were lost
due to a taping error.
For the purpose of analysing the results of the vocabulary test, I divided the remaining
119 subjects into five separate age groups. Given that the number of subjects in each
age group is not identical, any between-group comparisons that are drawn in the
discussion to follow will be based on percentages rather than numerical counts.
The overall performance of subjects on the adjective easy is reported in Table 4.0,
below:
¹ One minor exception to variation in the order of presentation was made with respect to testing of
impossible. Because I felt that accessing the meaning of this adjective might prove challenging for
younger subjects, I decided that testing of impossible should never immediately follow presentation of
the warm-up picture pairs.
² The results for these two subjects were compromised because they at times insisted on choosing both
pictures even after having been prompted to make only a single selection, with one child habitually
replying, “Me want this one and that one.” Furthermore, their reason for selecting a single picture,
when they did do so, appeared to have more to do with some salient activity depicted in the picture
(consider one child’s comment, “Mouse eat a cheese”) than with the test sentence itself.
Age group     3;0 to 3;11   4;0 to 4;10   5;1 to 5;11   6;0 to 6;11   7;0 to 7;6
# of subjs.   29            22            27            27            14
Correct       26 (89.7%)    22 (100%)     27 (100%)     27 (100%)     14 (100%)
Incorrect     3 (10.3%)     0             0             0             0
Don’t know    0             0             0             0             0
Not tested    0             0             0             0             0
Table 4.0: Results for ‘easy’ by age group
As Table 4.0 illustrates, the only errors reported for easy occurred in the youngest age
group, and these represented only approximately 10% of that group’s responses. Subjects
in all other age groups demonstrated target-like knowledge of the meaning of easy.
The overall performance of subjects on the adjective hard is next reported in Table
4.1, below:
Age group     3;0 to 3;11   4;0 to 4;10   5;1 to 5;11   6;0 to 6;11   7;0 to 7;6
# of subjs.   29            22            27            27            14
Correct       21 (72.4%)    22 (100%)     25 (92.6%)    26 (96.3%)    14 (100%)
Incorrect     5 (17.3%)     0             1 (3.7%)      1 (3.7%)      0
Don’t know    3 (10.3%)     0             1 (3.7%)      0             0
Not tested    0             0             0             0             0
Total NTL     8 (27.6%)     0             2 (7.4%)      1 (3.7%)      0
Table 4.1: Results for ‘hard’ by age group
(NB: ‘NTL’ = non-target-like)
As indicated in the table above, most subjects over the age of four demonstrated
target-like knowledge of the meaning of hard, and the majority of subjects (i.e.
72.4%) in the youngest age group also performed successfully. Nevertheless, the
number of children between the ages of 3;0 and 3;11 who did not know the meaning
of hard was more than double the number who failed easy.
Table 4.2, below, next compares performance across the five age groups on difficult:
Age group     3;0 to 3;11   4;0 to 4;10   5;1 to 5;11   6;0 to 6;11   7;0 to 7;6
# of subjs.   29            22            27            27            14
Not tested    3             0             1             0             0
Correct       11 (42.3%)    19 (86.4%)    22 (84.6%)    27 (100%)     14 (100%)
Incorrect     13 (50%)      3 (13.6%)     4 (15.4%)     0             0
Don’t know    2 (7.7%)      0             0             0             0
Total NTL     15 (57.7%)    3 (13.6%)     4 (15.4%)     0             0
Table 4.2: Results for ‘difficult’ by age group
As Table 4.2 illustrates, the majority of errors were once again made by subjects in
the youngest age group. Notably, however, whereas approximately 90% of the
children in this age group gave target-like responses on easy, and 72.4% on hard, less
than half of those between 3;0 and 3;11 (i.e. 42.3%) demonstrated target-like
knowledge of the meaning of difficult. These results therefore suggest that difficult is
less likely to be included in the vocabulary of children under the age of four than
either easy or hard. For those subjects over the age of four, however, performance
was considerably better than for children under this age, with the majority
demonstrating a target-like understanding of difficult.
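The percentages in Table 4.2 appear to be calculated over the children actually tested in each group rather than over the full group (for example, 11 of 26 gives 42.3% for the youngest group). This is my inference from the reported figures rather than a stated procedure. The following sketch reproduces the calculation for the first three age groups; the dictionary layout and function name are hypothetical, and the counts are copied from the table.

```python
# Counts for 'difficult' copied from Table 4.2 (first three age groups).
TABLE_4_2 = {
    "3;0 to 3;11": {"n": 29, "not_tested": 3, "correct": 11, "incorrect": 13, "dont_know": 2},
    "4;0 to 4;10": {"n": 22, "not_tested": 0, "correct": 19, "incorrect": 3, "dont_know": 0},
    "5;1 to 5;11": {"n": 27, "not_tested": 1, "correct": 22, "incorrect": 4, "dont_know": 0},
}


def summarise(row):
    """Return tested count, % correct, and non-target-like (NTL) count and %."""
    # Assumption: untested children are excluded from the denominator.
    tested = row["n"] - row["not_tested"]
    # NTL responses combine incorrect picture choices and 'don't know' answers.
    ntl = row["incorrect"] + row["dont_know"]

    def pct(count):
        return round(100 * count / tested, 1)

    return {
        "tested": tested,
        "correct_pct": pct(row["correct"]),
        "ntl": ntl,
        "ntl_pct": pct(ntl),
    }
```

Calling summarise on each row yields the Correct and Total NTL percentages shown in the table, which supports reading the untested children as excluded from the denominators.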
Next, the performance of my subjects on the adjective impossible is reported in Table
4.3, below, although, for reasons that will be explained in the discussion to follow, the
figures reported in the table are for only ninety of the 119 participants in the pre-test:
Age group     3;0 to 3;11   4;0 to 4;10   5;1 to 5;11   6;0 to 6;11   7;0 to 7;6
# of subjs.   14            17            21            24            14
Correct       6 (42.9%)     13 (76.5%)    15 (71.4%)    21 (87.5%)    13 (92.9%)
Incorrect     7 (50%)       4 (23.5%)     6 (28.6%)     3 (12.5%)     1 (7.1%)
Don’t know    1 (7.1%)      0             0             0             0
Total NTL     8 (57.1%)     4 (23.5%)     6 (28.6%)     3 (12.5%)     1 (7.1%)
Table 4.3: Results for ‘impossible’ by age group
I chose not to consider certain of the data that I collected for impossible due to
concerns that I had regarding the materials used to collect these results. In particular,
I was concerned that the first two picture pairs I had initially used were not equally
balanced in terms of subject interest and, therefore, that some degree of experimental
control was lost when using these pairs. The first pair contrasted a large goldfish
swimming in some water with the same goldfish sitting on the seat of a bicycle, while
the second contrasted a goldfish swimming in water with one sitting on top of a
snow-covered mountain. Unfortunately, pre-testing of these picture pairs had not revealed
any problems, and it was only after sixteen children had been tested on one of these
two pairs that I could see a specific pattern of behaviour emerging, with my subjects
displaying a disproportionate interest in the incongruous sight of a goldfish either
sitting on a bicycle seat or resting on a mountain top.
After testing of the sixteen children referenced above, new materials were introduced
that were more equally balanced in terms of subject interest. These are the picture
pairs illustrated in Figures 4.2 and 4.3. Furthermore, the sixteen children first tested
on the problematic pairs were later re-tested for their knowledge of impossible, using
the more suitable pairs, in order for me to gain a more reliable assessment of their
understanding of this adjective. Nevertheless, I still chose to err on the side of caution
and exclude data collected from these sixteen children in the figures reported in Table
4.3, since the conditions under which they were tested deviated from those under
which the majority of participants were tested.
Turning now to the figures reported in Table 4.3, it is notable that well under half (i.e.
42.9%) of the subjects in the youngest age group demonstrated target-like knowledge
of impossible, just as was observed in the case of difficult. In the case of impossible,
however, there were an additional fourteen subjects over the age of four who made
errors, including four children over the age of six. By comparison, there were only
seven children over the age of four who missed difficult, and none over the age of six
who missed this adjective. Thus, these results suggest that acquisition of the meaning
of impossible, at least for some children, is relatively delayed in comparison with
acquisition of the meaning of other tough adjectives. I will return to a consideration
of this issue in §4.2.0.2, below.
Finally, Table 4.4 lists the results obtained for the single SDC item:
Age group     3;0 to 3;11   4;0 to 4;10   5;1 to 5;11   6;0 to 6;11   7;0 to 7;6
# of subjs.   29            22            27            27            14
Not tested    6             2             0             0             0
Correct       14 (60.9%)    19 (95%)      24 (88.9%)    26 (96.3%)    14 (100%)
Incorrect     8 (34.8%)     1 (5%)        3 (11.1%)     1 (3.7%)      0
Don’t know    1 (4.3%)      0             0             0             0
Total NTL     9 (39.1%)     1 (5%)        3 (11.1%)     1 (3.7%)      0
Table 4.4: Results for subject-control degree construction (SDC) by age group
As the figures in Table 4.4 indicate, most subjects over the age of four performed in a
target-like manner, but the performance of subjects below this age was more mixed,
with approximately 40% of the children in the youngest age group failing the item.
These results suggest, therefore, that a significant number of 3-year-olds do not have
an adult-like command of the SDC and thus, presumably, lack target-like knowledge
of the DC in general. Consequently, successful performance on the single SDC item
in the pre-test became a particularly important consideration when evaluating the
suitability of younger children for participation in the main study.
I had earlier noted that, in addition to performing the picture selection task, some
subjects were also asked to answer a follow-up question of the type, “Why is it
easy/hard/difficult/impossible for X to do Y?” or, in the case of the SDC, “Why is X
too big/small to do Y?” In general, all subjects proved able to comply with this type
of request and the responses, with very few exceptions, served to corroborate the
child’s choice of picture.3 In Tables 4.5 and 4.6, below, I list representative examples
of the types of responses that I obtained during follow-up questioning, in support of
both correct as well as incorrect choices of pictures:
Subj. no. & age   Adjective    Picture choice   Explanation
no. 72 (5;9)      easy         TL               Why is it easy for the dog to get his bone? “Because he hasn’t got a lead tied up to a stick.”
no. 12 (3;3)      hard         NTL              Why is it hard for the duck to eat? “Cause he has to eat that (food) because he’s hungry.”
no. 25 (3;10)     difficult    TL               Why is it difficult for the boy to open the door? “Cause he’s not big enough.”
no. 50 (4;10)     difficult    NTL              Why is it difficult (for him) to open the door? “Because it’s got a handle on it.”
no. 8 (3;2)       impossible   TL               Why is it impossible for him (= the fish) to swim? “Cause there’s no water.”
no. 58 (5;4)      impossible   NTL              Why is it impossible for the fish to swim? “Cause he likes water.”
no. 94 (6;7)      impossible   NTL              Why is it impossible for the duck to eat? “Because he hasn’t got a paper round his beak.” And so he can eat? Subject nods. Does that make it impossible? “Yeah.”
Table 4.5: Selected responses to follow-up questions in vocabulary pre-test
(NB: ‘TL’ = target-like; ‘NTL’ = non-target-like)
3
Exceptional responses were typically provided by subjects under the age of four and these tended to
be non-explanatory rather than strictly incorrect. For example, subject no. 15, age 3;6, gave a target-like response to the SDC item and was asked, “Why is it hard for her (i.e. to carry the box) in that
picture?”, to which he simply replied, “Because it is.” And when subject no. 10, age 3;3, who gave a
non-target-like judgment of the adjective hard, was asked, “Why is it difficult for the dog to get his
bone?”, he provided a description of the picture of his choice, rather than an explanation, viz., “He’s
running for his bone.” Again, these types of responses to follow-up questions represented only a very
minor portion of the data that I collected.
Subject no. & age   Picture choice   Explanation
no. 49 (4;9)        TL               Why’s he too big to go through that hole? “Cause he’s ate-en (sic) lots of cheese.”
no. 102 (6;9)       TL               Why is the mouse too big to go through the hole in the wall? “Cause it’s tiny. He can only fit a paw or his nose or his tail in.”
no. 12 (3;3)        NTL              Spontaneous comment: “That can’t go through cause that’s a baby hole but that one can.”
no. 21 (3;7)        NTL              Is the mouse too big to go through the hole in the wall? “He can.”
Table 4.6: Selected responses to follow-up questions on SDC item
(NB: ‘TL’ = target-like; ‘NTL’ = non-target-like)
4.2.0.2
Discussion
Perhaps the most striking result obtained in the vocabulary pre-test concerns the
performance of subjects on the adjective impossible. Specifically, over half of the
subjects between the ages of 3;0 and 3;11 failed this item, in addition to fourteen over
this age, including four over the age of six. With respect to the performance of those
over four, this finding clearly contrasts with the results obtained for the other three
tough adjectives, since no child over this age missed easy, only three over this age
missed hard, and of the seven subjects over this age who missed difficult, none were
over six. These results therefore suggest relatively delayed acquisition of impossible,
in comparison to easy, hard and difficult.
Further support for this hypothesis is provided on examining the performance of the
above-referenced fourteen subjects over the age of four who missed impossible. This
is because twelve of these children performed like adults with respect to each of the
other three tough adjectives. Thus, acquisition of the meaning of impossible quite
clearly lagged behind that of easy, hard, and difficult, at least for children in this age
group. Given, then, that I had reason to believe that even some of my older subjects
could lack knowledge of impossible, I chose to exclude this particular adjective from
use in the test/control sentences employed in the main study.
Before leaving this topic, it will be instructive to revisit certain findings reported by
Kessel (1970) and McKee (1997a), which were obtained in connection with testing of
the adjective impossible and which were previously discussed in Chapter 3. First,
Kessel reported that some of the younger subjects in the study, the youngest of whom
was nearly six years old, did not know the meaning of this adjective. He did attempt to
address this issue but, to my mind, the procedure he chose was not a wholly
appropriate one. Specifically, he reported that children who appeared not to know the
meaning of impossible were read the sentence “Linus was very, very, very hard to
see,” rather than the sentence “Linus was impossible to see” (ibid.:24). Aside from
the obvious criticism that use of this procedure introduces some measure of
inconsistency into the testing situation, I am furthermore concerned that Kessel’s
decision to employ a substitution of this type ignores the fact that the two predicates
are not strictly synonymous.
As regards McKee (op.cit.), subjects in this study were independently tested for their
knowledge of tough adjectives, including impossible; however, as earlier noted, this
testing followed rather than preceded presentation of these adjectives in TC items.
Additionally, McKee’s subjects were required only to demonstrate target-like
knowledge of three out of the four adjectives tested. On the basis of the results I have
obtained, it would therefore seem likely that some, if not the majority, of McKee’s
subjects who met the requirement of passing three out of four vocabulary items failed
the adjective impossible. Yet, according to the design of the study, subjects who
missed impossible in the vocabulary assessment would still have been tested for their
knowledge of TCs that contained the same adjective. In my opinion, this situation
provides reason to question the reliability of at least certain of the data presented in
McKee.4
4
The same possibility is, of course, entertained in connection with the performance of McKee’s
(1997a) subjects on TCs that featured easy, hard, or difficult, if any of these represented a vocabulary
item that the subject had failed. Regrettably, since an items analysis is not available for the vocabulary
test that McKee conducted (McKee, p.c.), it is impossible to determine the specific nature of the errors
made by those subjects who provided three out of four correct responses in this condition.
Nevertheless, based on the results of my own vocabulary assessment, it is reasonable to speculate that
those of McKee’s subjects who missed only a single vocabulary item were more likely to have missed
impossible than any of the other three tough adjectives.
In order to avoid introducing a similar problem into my own study, I chose to tighten
McKee’s inclusion criterion by requiring that potential subjects demonstrate
knowledge of the meanings of the adjectives easy, hard, and difficult, prior to
participation in the main experimental study. And, although I continued to pre-test
children’s knowledge of the adjective impossible, I eliminated this adjective from use
in the main study for the reasons earlier stated.
One remaining issue of interest concerns the question of whether children acquire
tough adjectives in any fixed order. Although the vocabulary test was not designed to
investigate this specific issue, I believe that my findings can be taken as suggestive
that a predictable, if not altogether fixed, order of acquisition of these adjectives does
exist. Focusing first on my youngest subjects, that is, those between the ages of 3;0
and 3;11, I have previously noted that nearly 90% of these children gave correct
responses for easy, as compared with 72.4% for hard, 42.3% for difficult, and 42.9%
for impossible. Therefore, insofar as it is reasonable to generalize these findings to
the wider population, it would appear that acquisition of easy and hard typically
precedes acquisition of the latter two adjectives. This hypothesis is further
strengthened by looking at the individual performance of subjects under the age of
four. With regard to the three children who failed easy, two failed all other test items,
while the third got hard correct but failed both difficult and impossible. And of the
five who failed hard, notably, two failed all other items and two got only easy
correct.5
Furthermore, in looking at the performance of subjects over the age of four, the
generalization noted above holds even more strongly since there was only a single
subject out of the ninety children tested who gave a correct response to difficult or
5
The fifth child, age 3;1, represented a bit of an exception since he scored correct on impossible but
failed both hard and difficult. Furthermore, this child was exceptional in another respect since he was
able to provide a clear explanation of his correct judgment of the impossible picture pair, an ability that
many of his peers clearly lacked; for example, when asked, “Why is it impossible for him (= the fish)
to swim?”, he reasonably replied, “Cause he hasn’t got any water.” Since this child was ultimately not
selected to participate in the main study, however, I will not investigate his atypical abilities in any
further detail here.
impossible pairs yet made an error on either easy or hard pairs.6 Thus I observed that
failure on easy and hard served as a fairly reliable predictor of unsuccessful
performance on difficult and impossible, while the opposite was generally found not
to hold true.
4.2.1
Part two: Memory test
4.2.1.0
Design
The main study to be discussed later in this chapter involves use of a truth-value
judgment (TVJ) task, in which a child is asked to join a puppet in watching an
experimenter demonstrate a series of actions that are performed by toy characters.
After watching a demonstration of the story, children are asked to judge whether the
puppet’s evaluation of what happened in the story, the test sentence, accurately
describes the events depicted. The task thus indirectly relies on a child’s ability to
recall the main events of the story that has been demonstrated for her and represent
these events in memory long enough to allow her to evaluate the test sentence against
the story context. On the basis of her own experimental findings, Bauer (1997) has
argued that this particular ability is not beyond the capabilities of even very young
children, since children as young as twenty months of age have demonstrated correct
recall of a short sequence of events when they are asked to reproduce the sequence by
acting it out with toys. Moreover, it has also been experimentally demonstrated that
by the age of thirty months, children are capable of reproducing a sequence of events
(e.g. “building a house”) that involves as many as eight separate steps (Bauer &
Fivush 1992, cited in Bauer op.cit.).
The test stories designed for use in the main study included both actions and dialogue,
with the relative balance of each varying somewhat from item to item. None of the
stories, however, included more than seven separate actions performed by the toy
characters, with the average falling somewhere in between five and six actions and/or
6
With regard to the acquisition of difficult and impossible, as previously noted, my results suggest a
general tendency for children to acquire impossible last. Yet there were three instances in which
subjects over the age of four gave correct responses to impossible but failed difficult. I submit that
these results would therefore seem to suggest that the order of acquisition of these two adjectives is not
completely predictable but may instead be subject to some individual variation.
events per story. Consistency was maintained in terms of the length of time required
to demonstrate each story, with no story exceeding one minute in the total time
required for demonstration. Because, as noted above, success in the TVJ task relies
on adequate recall of story events, I decided to incorporate an assessment of this
ability into the existing vocabulary pre-test. I chose to use an act-out task for this
purpose on the recommendation of Bauer (op.cit.), who advocates use of this
technique for children who may lack the linguistic skills to provide an accurate verbal
report of events that they have watched. As Bauer points out, it is widely recognized
that procedural or non-declarative memory is not robust when tested across different
modalities; consequently, successful performance on the act-out task, which is a
cross-modality task, would imply the child’s use of recall or declarative memory.7
For the specific design of the task, I followed the basic recommendations contained in
Goodluck (1996).
The memory assessment involved the same 119 children who participated in the
vocabulary pre-test, although useable data was collected from only 116 of these
children.8 With only minor exceptions (as described in ftnt. 8) each subject was tested
on two stories, one of which included six separate events and another of which
included eight events. At the recommendation of Ianthi Tsimpli (p.c.), the six- and
eight-event tasks were written to include at least one event that could be considered
non-plausible and/or non-predictable based on the type of general knowledge that
7
I adopt Mandler’s (1986; cited in Bauer 1997:85) definition of recall memory, which describes a
process in which a “cognitive structure” is retrieved solely on the basis of past experience and in the
absence of “on-going perceptual support.” Although children are given some perceptual support in the
act-out task in the form of the continued presence of the toy props in the experimental workspace, they
receive no such perceptual support with respect to the temporal ordering of events in the story. For this
reason, it is generally accepted that it is recall (or declarative) memory that is tested when children are
asked to reproduce a specific sequence of events.
8
Three of the subjects under the age of four who had participated in the vocabulary test proved unable
to comply with the requirements of the memory test and, consequently, any data collected from these
children was excluded from my analysis of the final results.
It should also be noted that eight of the 116 participants were tested according to a slightly different
procedure. These children, whose abilities were assessed during the first two days of testing, were
given three rather than two stories, featuring a series of four events, six events, and eight events,
respectively. It became apparent early on in the testing process, however, that even the youngest
children experienced no difficulty in acting out the four-event task; therefore, I took the decision to
administer only the six- and eight-event tasks to subsequent subjects, since I believed that children’s
performance on these two tasks was likely to be the most informative as regards their abilities.
even a very young child might reasonably be expected to possess. This step was
taken to ensure that subjects constructed an on-line representation of the story for the
purposes of recall, rather than a representation based on an entirely predictable series
of events. Several alternative stories were developed for use in the memory pre-test;
examples of the two most frequently used are provided in (3) and (4), below:
(3)
Six-event story
a.
A pig drives his car to a local park. (implausible event)
b.
At the park, the pig first plays on a roundabout.
c.
The pig uses his snout to push a little boy who is swinging on a swing.
(unpredictable event)
d.
The boy thanks the pig.
e.
The pig announces that the pushing he has done has made him hungry.
f.
The pig gets back into his car and drives home.
(4)
Eight-event story
a.
A little girl on a horse jumps over a barrier.
b.
The horse is thirsty and has a drink of water from a watering trough.
c.
The little girl’s father tells her that it is time to go home.
d.
The girl dismounts.
e.
The father offers the horse a bucket of food.
f.
A little mouse runs over and tries to eat food from the bucket.
(unpredictable event)
g.
The father sends the mouse away.
h.
The father and daughter walk home.
Before the demonstration of the story, subjects were told that they would not be asked
to remember “what the characters say,” but rather only “what they do.” Additionally,
subjects were allowed to watch up to three demonstrations of a story before being
asked to retell the story themselves. Successful performance required that the child be
able to demonstrate each of the events or steps in the story in the exact sequence in
which they were originally presented by the experimenter. In the case of several older
subjects who expressed reluctance to manipulate the toy props, a correct oral
presentation of the story events was also considered an acceptable test of their recall
memory.
4.2.1.1
Presentation of results
The overall performance of subjects on the six-event memory task is reported in Table
4.7, below, with the 116 participants divided into four age groups for the purpose of
comparison:
Age group     # of subjs.   Pass         Fail        N/A (ftnt. 9)
3;0 to 3;11   26            16 (64%)     9 (36%)     1
4;0 to 4;11   22            21 (95.5%)   1 (4.5%)    0
5;0 to 5;11   27            25 (92.6%)   2 (7.4%)    0
6;0 to 7;6    41            34 (94.4%)   2 (5.6%)    5
Table 4.7: Performance on six-event memory task by age group
As Table 4.7 illustrates, subjects over the age of four experienced little difficulty in
successfully performing the six-event task. The task proved more challenging,
though, for those subjects under the age of four, over one-third of whom failed to
perform the task correctly.
Turning to the overall performance of subjects on the eight-event task, these results
are reported in Table 4.8, below:
9
The figures listed under ‘N/A’ in Tables 4.7 and 4.8 represent instances in which a pass/fail
assessment of either a six- or eight-event story could not be made. In two of these cases, an eight-event
task was not administered at all because the subjects, both under the age of 3;3, experienced great
difficulty in completing the six-event task. The other cases can be attributed to recording equipment
failure or to unforeseen interruptions in the task that precluded a normal assessment of the child’s
abilities.
Age group     # of subjs.   Pass         Fail         N/A
3;0 to 3;11   26            5 (21.7%)    18 (78.3%)   3
4;0 to 4;11   22            11 (55%)     9 (45%)      2
5;0 to 5;11   27            11 (42.3%)   15 (57.7%)   1
6;0 to 7;6    41            26 (65%)     14 (35%)     1
Table 4.8: Performance on eight-event memory task by age group
The data listed in Table 4.8 indicate that the performance of all age groups on the
eight-event task lagged behind general performance on the six-event task. In the case
of the youngest age group, proportions were reversed as compared to the six-event
task, with more children (78.3%) failing rather than passing (21.7%) the eight-event
task. For those subjects between the ages of 4;0 and 5;11, performance was mixed,
with roughly equal numbers passing and failing the latter task. It is therefore only in
the case of the oldest subjects in the study, that is, in the case of those over the age of
six, that successful performance on the eight-event task can be seen to become more
assured, with approximately two-thirds of those tested passing this item.
Finally, in Table 4.9, below, I illustrate the performance of each age group across both
conditions:
Age group     # of subjs.   Failed only 6-event   Failed only 8-event   Failed both   Passed both   N/A
3;0 to 3;11   26            1 (4.5%)              10 (45.5%)            7 (31.8%)     4 (18.2%)     4
4;0 to 4;11   22            0                     8 (40%)               1 (5%)        11 (55%)      2
5;0 to 5;11   27            0                     13 (50%)              2 (7.7%)      11 (42.3%)    1
6;0 to 7;6    41            0                     10 (28.6%)            2 (5.7%)      23 (65.7%)    6
Table 4.9: Performance on six-event and eight-event tasks by age group
As Table 4.9 indicates, the percentage of children who passed both test items remains
quite low in the case of the 3-year-olds, and it is only after the age of six that
performance begins to approach adult norms. Problems raised by the relatively poor
performance of my younger subjects in both conditions of the memory pre-test will be
considered in the following section.
4.2.1.2
Discussion
As the results reported in Tables 4.8 and 4.9 indicate, the eight-event task presented a
real challenge to a sizeable number of children across all of the age groups included in
the study. What the figures in the tables fail to reflect, however, is the fact that the
performance of individual subjects was not entirely uniform, with some observed to
experience a greater degree of difficulty in performing the eight-event task than
others. A closer examination of the types of errors made on the horse story (cf. (4)),
which was the most frequently used test item in this condition, is informative in this
regard. A total of sixty-seven subjects were given this story and, of these, thirty
children (or 45%) passed and thirty-seven children (or 55%) failed the task. I found
that the most common error subjects made was that of either completely forgetting to
include the final event of the story or of only remembering to include the final event
when supplied with a prompt by the experimenter (e.g. “And what happens at the end
of the story?”). This type of error occurred over 50% of the time, and this rate of
error was similar across all of the eight-event stories I used. These findings therefore
suggest to me that my subjects did not omit the final event in the horse story because
it was not salient or particularly memorable, as this type of error was witnessed
regardless of the particular series of events presented in this condition. Because the
error of omitting the final step in the eight-event story was observed to occur with
such frequency across subjects of all age groups, I felt it was reasonable to relax the
criterion of successful performance that I had originally set for this task. Accordingly,
I chose to assign a passing score to any child who made only a single error of this
type.
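The strict and relaxed scoring rules for the act-out task can be sketched as follows. This is an illustration only: the function `score_act_out`, its parameter names, and the short event labels for the horse story in (4) are hypothetical, and the sketch does not model cases in which the final event was supplied only after a prompt from the experimenter.

```python
def score_act_out(demonstrated, reproduced, relaxed=False):
    """Return True if the reproduced sequence counts as a pass.

    Strict criterion: every event, in the exact order demonstrated.
    Relaxed criterion (assumed encoding): additionally accept a sequence
    whose only error is omission of the final event.
    """
    if reproduced == demonstrated:
        return True
    return relaxed and reproduced == demonstrated[:-1]


# Hypothetical labels for the eight events of the horse story in (4).
HORSE_STORY = [
    "jump_barrier", "horse_drinks", "father_calls", "girl_dismounts",
    "father_feeds_horse", "mouse_eats", "mouse_sent_away", "walk_home",
]

# Two error patterns discussed in the text.
no_final = HORSE_STORY[:-1]
no_dismount = [e for e in HORSE_STORY if e != "girl_dismounts"]

print(score_act_out(HORSE_STORY, no_final))                    # strict rule
print(score_act_out(HORSE_STORY, no_final, relaxed=True))      # relaxed rule
print(score_act_out(HORSE_STORY, no_dismount, relaxed=True))   # other omission
```

Under this encoding, omission of any event other than the last still counts as a failure, as does any change in the order of events.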
Turning now to the second most frequent error made on the horse story, this involved
children omitting the step in which the little girl dismounts the horse at her father’s
request. Sixteen children, or 24% of the total, made this error, with occurrences fairly
evenly distributed across subjects in all age groups. In fact, errors of this type far
exceeded those in which children omitted (or altered) the unpredictable event of the
mouse trying to eat the horse’s food, which therefore suggests that subjects were not
simply relying on event predictability as a means of recall. Rather, I believe that
these results indicate that event salience serves as a particularly strong cue for recall
memory, with the activities of the mouse, for example, generating more interest and
therefore a stronger memory trace than the routine event of the girl dismounting the
horse.
Returning to Table 4.9, recall that the percentage of children under the age of four
who passed both tasks was less than 20% of the total. This finding thus presented a
problem with regard to the selection criteria that I had initially established, which had
conditioned a child’s participation in the main study on her successful performance in
both pre-tests. In order to secure a reasonable number of subjects below the age of
four, I chose to amend the inclusion criteria by allowing successful performance on
the vocabulary test to serve as the primary consideration when selecting subjects
below this age. With regard to the memory test, I felt it was reasonable, based on the
findings reported above, to relax the criterion for successful performance for the
under 4-year-olds. Specifically, I required that children under this age pass the six-event task and additionally make no more than one error on the eight-event task.
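As a rough illustration, the age-dependent memory criterion might be encoded as below. The sketch covers the memory pre-test only; it does not represent the vocabulary requirement or any later judgment of a child's suitability for the main study. The function names and parameters are hypothetical, and ages are assumed to be written in the 'years;months' notation used throughout this chapter.

```python
def age_in_months(age):
    """Convert an age in 'years;months' notation (e.g. '3;10') to months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def meets_memory_criterion(age, passed_six, passed_eight, eight_event_errors):
    """Apply the memory-test criterion (hypothetical encoding).

    Under 4;0: pass the six-event task and make no more than one error
    on the eight-event task. From 4;0 upwards: pass both tasks.
    """
    if age_in_months(age) < 48:
        return passed_six and eight_event_errors <= 1
    return passed_six and passed_eight


# A child aged 3;10 who passed the six-event task and made one error
# on the eight-event task satisfies the revised criterion.
print(meets_memory_criterion("3;10", True, False, 1))
```

Keeping the pass/fail outcome and the error count as separate arguments allows the same function to express both the original criterion and the revised one for the youngest children.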
In fact, as later observation established, those children below the age of four who
were selected to participate in the main study on the basis of the revised criterion
performed comparably to their peers who had passed both memory tasks with ease.
Moreover, certain of these subjects even showed exceptional ability to retain story
events in memory when these events were introduced in the context of the TVJ task.
Thus, I observed that less-than-perfect performance on the eight-event task did not
necessarily predict a subject’s inability to cope with the requirements of the TVJ task,
and that, interestingly, the opposite appeared to hold true in several cases. In
particular, there were four subjects who passed both the six- and eight-event tasks, but
whom I nevertheless later judged to be unsuited for participation in the main study.
4.3
Design of main experimental study
4.3.0
The truth-value judgment task
4.3.0.0
General features of design
As earlier noted, in both the pilot and main studies I used a TVJ task (cf. Crain and
Thornton 1998; Gordon 1998) to test children’s comprehension of TCs and other
syntactically related structures. My reasons for adopting this methodology will be
reviewed in the following section. First, however, I present a basic description of the
features of the TVJ task in this section, which is primarily based on the discussion
contained in Chapter 27 of Crain and Thornton (op.cit.:221-37). As noted in the
previous section, the TVJ task involves a child who joins a puppet in watching an
experimenter tell a short story with toys. A photograph of the experimental
workspace, taken from my own experimental study, is offered in Figure 4.5, below:
Figure 4.5: Set-up of experimental workspace for TVJ task (Anderson 2002a,b)
At the end of the story, the toy props are left in position in the workspace to serve as a
visual reminder of the events just described. The puppet is then asked to explain what
happened in the story and it is in this manner that the test sentence is presented to the
child. After presentation of the test sentence, the child is called upon to evaluate
whether the puppet’s assessment of what happened in the story (i.e. the test sentence)
is true (“right”) or false (“wrong”) in the context of the story that has just been
demonstrated.
At this point, an elicitation measure can be added to the task in which the child is
asked to explain why the puppet’s response has been judged as being incorrect; for
example, Crain & Thornton recommend asking the child, “What really happened (in
the story)?” As the authors note, this feature can prove useful in establishing that a
child has rejected the puppet’s statement for a legitimate reason. Additionally, the
experimenter may allow the child to either reward the puppet for a correct assessment
of the story or offer it some type of negative consequence for an incorrect assessment
as a means of enhancing the child’s enjoyment of the task (op.cit.:222).
In my own study, I adopted this last recommendation and I believe that it served, as
predicted, to enhance the cooperation of my subjects. On the basis of remarks made
by some of the parents of these children, however, I felt compelled to develop a
procedure for providing feedback to the puppet that differed from those previously
discussed in the literature. For example, Crain & Thornton (1998) used a technique in
which the child pretends to feed the puppet either a desirable (plastic) food or an
undesirable food. Alternatively, Crain and McKee (1985) asked children to either
feed the puppet a cookie as a reward for a true statement or a rag as a consequence of
making a false statement, activities which they report that their child subjects enjoyed
very much. Yet, parents of some of the children participating in my study expressed
concern that use of either of these techniques might inspire their children to treat real-life pets in a similar fashion or might serve to reinforce their children’s dislike of
certain healthy foods.
Therefore, in response to the concerns noted above, I chose two self-inking stamps to
serve the same function as the cookie and rag in Crain and McKee’s study. One
stamp featured a gold star and the other a smiling face. If the child judged that the
puppet had correctly assessed what happened in the story, then they were instructed to
stamp the paper with a gold star; for an incorrect assessment, the smiley face stamp
was used. In order to avoid the possibility that preference for the use of one stamp
over the other would influence the child’s judgments of the puppet’s statements, the
stamps themselves were carefully selected so as to be equally desirable to child
subjects. I also chose to label the stamp associated with an incorrect assessment of
the story (i.e. the smiley face) a try-again stamp so that children would not be tempted
to associate use of this stamp with punishment of the puppet and, accordingly, restrict
their negative judgments of the puppet’s statements. As a further precaution against
this last possibility, I continually encouraged my subjects to view themselves as
teachers who took responsibility for making proper assessments of the puppet’s
statements in order to help him learn.
In fact, I found that my incorporation of the self-inking stamps into the experimental
task conveyed an unexpected benefit, which was that children were given something
to do with their hands. I noted that my younger subjects in particular needed frequent
reminders not to touch the toy props during storytelling, even though they were freely
allowed to do so before the story began or after offering their judgment of the test
sentence. The stamps therefore helped to curb these tendencies and I observed that
my subjects also enjoyed taking responsibility for keeping a record of the puppet’s
performance.
4.3.0.1
Methodological advantages
My decision to use the TVJ task was motivated by my belief that use of this task
offers clear methodological advantages over the use of other experimental techniques.
Crain and Thornton (1998:210), who advocate exclusive use of the TVJ task for
evaluating comprehension, and Gordon (1998) offer a number of reasons why this
task can be viewed as superior to other alternatives. For Crain and Thornton, the first
and foremost advantage offered is that the experimenter is not only allowed to control
the form of the sentence presented to the child but also the context in which the
sentence is to be interpreted. As they point out, certain alternative methods, such as
the act-out task, suffer by comparison, since the experimenter can control the form of
the test sentences but not the interpretive context. The authors argue that if it can be
demonstrated that a child is able to accurately distinguish contexts that make a
particular sentence true from those that make it false, then it is reasonable to infer that
the child shares adult-like knowledge of the sentence/meaning pair under
investigation. Furthermore, as Gordon (ibid.:212) notes, evaluation of the truth value
of an utterance does not require the use of metalinguistic skills, as required in the
performance of the grammaticality judgment task; instead, successful performance in
the TVJ task requires only that the child have some conception of the truth relations
that hold between a sentence and the particular situation to which it refers.
Another advantage of the TVJ task concerns the design of individual test items.
Specifically, stories that precede the presentation of test sentences can be written in
such a way as to support two potential interpretations of a single sentence. As Crain
and Thornton (op.cit.) observe, this feature thus proves particularly useful in studies
such as my own, in which the issue to be investigated is whether children can assign
more than one interpretation to a sentence that is unambiguous in the adult grammar.
It has also been argued that use of the TVJ task, in comparison with the use of other
techniques, helps minimize demands placed on children’s memory resources. Gordon
(op.cit.), for example, notes that the TVJ task essentially asks the child to perform
normal discourse processing and so can therefore be viewed as imposing no
extraordinary demands on memory. Furthermore, it can be argued that one design
feature of the TVJ task actually facilitates recall. This concerns the step in which the
toy props are left in place at the end of the story, which allows the child to consult a
visual record of the story context against which the test sentence is to be evaluated.
A final very important advantage of the TVJ task is a psychological one, since child
subjects are never made to feel as if their own knowledge is being tested. Instead, it is
the puppet whose responses are under evaluation and who is perceived as being
occasionally fallible. In contrast, the child herself is treated at all times as an
authority with respect to her judgments of the puppet’s statements and, for this reason,
her own judgments are never subject to any correction or comment. Furthermore, the
inclusion of a puppet in the task confers a psychological advantage in its own right, as
Crain and Thornton (op.cit.) point out, since interaction with a puppet, rather than
with an adult, helps address any reluctance that a child may feel to provide negative
judgments of statements that are made by adult experimenters.
In my own experience with use of the TVJ task I found confirmation of the general
claim that the task is well suited to the psychological needs of children. This is
because I observed that subjects of all ages were comfortable with the testing
situation, took the role of playing teacher to the puppet quite seriously and expressed
great willingness to participate in further sessions.
4.3.0.2
Procedure
With regard to the design of the test items used in the study, a complete list of which
is offered in Appendix I, I followed a number of specific recommendations made by
Crain and Thornton (1998:222-7). It is perhaps easiest to illustrate these
recommendations, as well as the reasons the authors advocate them, by analysing how
they were incorporated into the design of a single test item. As a representative
example, I have chosen story 10, which preceded presentation of the TC, The dog was
difficult to teach. For adult speakers of English, there is of course only one possible
interpretation of this sentence, in which an unspecified agent experiences difficulty
when attempting to teach a dog something. I have previously termed this the object
reading of the sentence in reference to the logical role that is played by the sentence-initial determiner phrase (DP) with respect to the embedded infinitive verb. As
discussed in Chapter 3, however, it has been proposed in the literature that children
may initially assign an alternative interpretation to the TC, which I have termed the
subject reading. With reference to story 10, this reading would be one in which the
dog experiences difficulty when attempting to teach something to an unspecified
entity. Both readings of TC10 are listed in (5) below, along with a suggested
syntactic representation of each sentence:10
10
The embedded verb used here, to teach, is recognized as one that licenses unspecified object
deletion in English (Rizzi 1986:509). Specifically, the verb may occur with a phonologically null
object, which is construed as having a canonical or prototypical interpretation (cf. I’m planning to teach
in the autumn). Since the story accompanying the presentation of TC10 provides a specific context for
the interpretation of the sentence, however, I made the tentative assumption that children who chose a
subject reading would have a specific referent in mind for the pro object of the infinitive verb, which in
this case would be the pig. Thus, the null object of teach in this instance would more accurately
represent an example of what Rizzi (ibid.) has termed null complement anaphora (see also Hankamer
and Sag 1976), since it would be assumed to take definite and anaphoric reference.
Notably, however, the adult grammar does not recognize this last option for the verb to teach (cf. *I
taught, with the interpretation I taught Jane). Nevertheless, I will argue that children who assign
subject readings to the TC do allow a definite interpretation of the null embedded object. Supportive
evidence for this claim consists in part of explanations offered by my child subjects for their non-target-like readings of the TC, which will be reviewed in §4.5.0.2. Additionally, I am aware of other
experimental studies, such as Eisenberg and Cairns (1994:722), which have produced findings that
young children (i.e. those below the age of five) are willing to entertain object drop in certain contexts
that adults would reject.
(5)
a. Object reading - Allowed in the adult grammar
‘Someone finds it difficult to teach the dog something.’
The dog_i was difficult [PRO to teach t_i].
b. Subject reading - Disallowed (*) in the adult grammar
‘The dog finds it difficult to teach someone something.’
*The dog_i was difficult [PRO_i to teach pro_k].
The main issue I sought to investigate in the present study was this: Do children at an
early stage of grammatical development share the same constraints on interpretation
of TCs that adult speakers of the language do? In particular, are children restricted, as
adults are, to the assignment of an object reading only? As is standard in the
literature, I associated the null hypothesis with the assumption that children possess
target-like knowledge of the control restrictions associated with the tough adjective;
according to this hypothesis, then, the only interpretation available to the child should
be the object reading.11
Initially, I took the experimental hypothesis to assert that children lack target-like
knowledge of the interpretive constraints associated with the tough adjective.
However, a potential problem arose in that there are actually two possibilities that
exist with respect to the state of the child’s grammar; these are schematically
represented in (6) below:
(6)
Experimental hypothesis - Either (a) or (b):
a. Object and subject readings will be available.
b. Only subject readings will be available.
11
In this respect, I deviate from Crain and Thornton (1998) since the authors, contrary to general
practice, associate the null hypothesis with children’s lack of target-like knowledge of some
grammatical principle and/or constraint (see ibid.:221-2 for discussion). In my own study, I chose to
follow standard practice in the experimental literature (cf. Hsu & Hsu 1998:316) and take the null
hypothesis to describe the situation in which the two populations compared (i.e. children and adults)
do not differ with respect to their knowledge of the interpretive constraints associated with the TC.
According to the first possibility, the child’s grammar treats the TC as ambiguous,
while according to the second, only the subject reading of the TC is allowed. In both
cases, the child’s interpretation of the TC is reasonably considered non-target-like. I
therefore chose to formulate the experimental hypothesis in more narrow terms,
according to the assumption that the child lacks the ability to construct a target-like
syntactic representation of any NOS and is consequently restricted to only subject
readings of the TC. Thus, I resolved that the pattern of performance represented in
(6a) would not be taken as providing support for the experimental hypothesis, even
though this pattern of performance is also rightly considered non-target-like.
The determination as to whether a particular child subject performed in a manner
consistent with (6a) or (6b) was made by analysing the performance of each subject
across the full set of twelve TC items. Note that this analysis of individual subject
performance constitutes the focus of §4.5.0.1.
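The three-way distinction drawn above (object readings only, pattern (6a), and pattern (6b)) can be sketched in code. The following Python fragment is an illustration only and is not part of the original study: the function name, the reading labels, and in particular the `tolerance` parameter are hypothetical, and the criteria actually applied to individual subjects are those reported in §4.5.0.1, which are not reproduced here.

```python
from collections import Counter


def classify_subject(readings, tolerance=0):
    """Classify one subject's pattern of readings across the TC items.

    readings  -- list of "object" / "subject" labels, one per TC item
    tolerance -- hypothetical allowance for stray responses (an assumption,
                 not a criterion taken from the study)
    """
    if not readings:
        raise ValueError("at least one reading is required")
    counts = Counter(readings)
    unknown = set(counts) - {"object", "subject"}
    if unknown:
        raise ValueError(f"unexpected reading labels: {sorted(unknown)}")

    n_object = counts["object"]
    n_subject = counts["subject"]

    if n_subject <= tolerance:
        # Consistent with the null hypothesis: only the object reading.
        return "object readings only (target-like)"
    if n_object <= tolerance:
        # Consistent with the narrowed experimental hypothesis.
        return "(6b) subject readings only"
    # Non-target-like, but not counted as support for the
    # experimental hypothesis as formulated above.
    return "(6a) object and subject readings"
```

With `tolerance=0`, a single deviant response is enough to move a subject into pattern (6a); a stricter or looser cut-off is a design decision that this sketch leaves open.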
Returning to the issue of the design of TC items, Figure 4.6, below, illustrates the
materials used for the story that accompanied TC10, and Figure 4.7 presents the text
of the story:
Figure 4.6: Materials used for TC10, ‘The dog was difficult to teach.’
[Toy props: A pig, a dog, a cat, a football, a goal (fence), and a slide.]
Narrator: This is a story about a dog and a pig who are playing in the park.
Dog: I can teach you how to play football, pig. Would you like that?
Pig: But you’re a dog, not a football player. How can you teach me how to play
football?
Dog: Watch! I push the football with my nose. Then I run with the ball and push it
into the fence. There! I scored a goal. Now you do the same thing, pig.
Pig: OK. Like this? (Pig pushes ball into fence with his nose.) Yea! I scored a goal
too. Thanks for teaching me how to play football, dog. Now I’ll teach you how to
play on the slide. Just watch. You go up the steps like this and then slide down this
end. Whee! Your turn, dog.
Dog: Like this? (The dog tries to go up the slide from the wrong end.)
Pig: No, dog, that’s not right. Go round to the steps and try again.
Dog: (noticing cat) Hey, is that a cat over there? Forget about the slide, I’m going to
chase that cat! (Dog runs after cat. Cat makes ‘meow’ noise.)
Puppet: I know what happened in that story. The dog was difficult to teach.
Figure 4.7: Text of story preceding TC10, ‘The dog was difficult to teach.’
As was noted earlier in this section, the TVJ story allows presentation of two different
interpretive contexts against which the truth-value of a particular test sentence can be
evaluated. In the case of TC10, the first such context involves a situation in which a
pig experiences difficulty in teaching a dog how to slide down a playground slide.
This is the intended context for the target-like or object reading of the sentence, which
should be judged true on this interpretation since the pig does indeed experience
difficulty in teaching the dog. The second context involves a situation in which the
dog does not experience any difficulty in teaching the pig how to play football. This
is the intended context for the non-target-like or subject reading of the sentence,
which should be judged false.
There are two respects, however, in which the design of story 10 did not meet the
specific recommendations of Crain and Thornton (op.cit.). In both cases, this is the
result of my decision to adopt the traditional conception of the null hypothesis (see
ftnt. 11), rather than the conception proposed by the authors. The first point of
difference concerns Crain and Thornton’s recommendation that the false judgment of
a TVJ sentence should always be associated with the target-like reading of the
sentence and, conversely, that the non-target-like reading should always correspond
with an affirmative response. With reference to story 10, the reader will observe that
the opposite holds true. Since Crain and Thornton associate the experimental
hypothesis with the assumption that the child shares target-like knowledge of the
structure under investigation, they are concerned to avoid a situation in which a
child’s affirmative response may erroneously be taken as providing support for the
experimental hypothesis. Such a situation can arise, the authors explain, when a child
provides a “yes” response to an experimental question due to confusion or a lack of
understanding of the test sentence, rather than because the child possesses target-like
knowledge of the structure under consideration.12 I return to this issue shortly.
The second way in which story 10 fails to meet the recommendations contained in
Crain and Thornton is with respect to the order of presentation of story events. In
story 10, the final sequence of events pertains to the pig’s attempts to teach the dog,
which is the intended context for an object or target-like reading of the test sentence
(i.e. The dog_i was difficult for the pig to teach e_i). Crain and Thornton suggest,
however, that the final events presented in a TVJ story should favour the non-target-like reading of a test sentence in order to make this reading the most salient and
therefore preferred interpretation, if allowed by the child’s grammar. The authors
argue that if a child overrides a bias in the presentation of story events in order to
assign a target-like interpretation to the sentence, then the child’s performance will
serve as more robust evidence of target-like syntactic competence than if the child’s
correct response corresponds to the most recently mentioned event (op.cit.:224).
12
The type of error described here – that is, one in which experimental results are taken to provide
support for the experimental hypothesis when, in fact, the null hypothesis is true – is sometimes termed
a Type I error (Crain and Thornton 1998:213). I have chosen not to use this term in the discussion
above, however, due to the potential for confusion; this is because the consequence of committing such
an error will be seen to differ according to whether one adopts the formulation of the experimental
hypothesis advocated in Crain and Thornton or the formulation that I have proposed here.
It is important to note, however, that the argument referenced above is predicated on
one fundamental assumption that the authors make, which is that a child’s
performance should not be affected by any bias in test design if the child possesses
target-like knowledge of the grammatical constraint under investigation. That is,
Crain and Thornton expect children, like adults, to override any such biases if their
knowledge of a particular test item is target-like. Ianthi Tsimpli (p.c.), however, has
questioned whether the introduction of biases in the design of test items might
adversely affect the performance of children whose grammars are not yet target-like
but are instead in a state of development. Since this is a consideration that, as far as I
can determine, is not addressed by Crain and Thornton, I chose to exercise caution in
adopting their specific recommendations with regard to what I will term the
affirmative response and order of presentation biases. In particular, I decided to vary
the direction of the two types of biases across individual items as follows. First, I
identified test items as those pairing the true response with the target-like or object
reading of the TC and control items as those exhibiting the opposite pairing. In my
study, control items thus served the primary function of providing child subjects with
an alternative pairing of the affirmative response with the subject reading of the TC.
It was hoped that this feature of design would help guard against the possible
occurrence of a training effect, in which the subject comes to associate the true or
false response exclusively with either a target-like or non-target-like interpretation of
experimental items.
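The pairing of responses with readings described above can be made explicit in a short sketch. The Python fragment below is illustrative only and is not part of the original study; the function and label names are hypothetical, and it rests on the simplifying assumption that each judgment reflects exactly one of the two readings.

```python
def inferred_reading(item_type, judged_true):
    """Return the reading of the TC implied by a child's judgment.

    item_type   -- "test" (true response paired with the object reading)
                   or "control" (true response paired with the subject reading)
    judged_true -- True if the child judged the puppet's statement "right"
    """
    if item_type not in ("test", "control"):
        raise ValueError(f"unknown item type: {item_type!r}")

    # The reading on which the sentence comes out true in the story.
    affirmative = "object" if item_type == "test" else "subject"
    other = "subject" if affirmative == "object" else "object"

    return affirmative if judged_true else other
```

For a test item such as TC10, a "right" judgment therefore maps to the object reading and a "wrong" judgment to the subject reading; for control items the mapping is reversed, which is what prevents a uniform "yes" strategy from looking target-like across the full set of items.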
Additionally, I varied the direction of the order of presentation bias in pairs of both
test and control items. The latter feature of design is perhaps best illustrated
graphically, and so in Table 4.10, below, I have categorized each of the twelve TC
items used in the study according to the direction of the two biases:13
13
It was not until testing of subjects had commenced that I noticed that the story accompanying hard 8
included an order of presentation bias towards the subject or non-target-like reading of the sentence,
even though this item should have featured the reverse bias. As data had already been collected for this
item at the point at which the error was discovered, I felt that revision of the story and/or test sentence
at this late stage would be ill advised.
Item type   Affirmative response (AR) bias        Order of presentation (OP) bias            Abbr.   Test/control items
Test        object or target-like (TL) reading    subject or non-target-like (NTL) reading   TS-SB   easy 4, hard 6, hard 8, difficult 12
Test        object (TL) reading                   object (TL) reading                        TS-OB   easy 2, difficult 10
Control     subject (NTL) reading                 subject (NTL) reading                      CS-SB   easy 3, hard 7, difficult 11
Control     subject (NTL) reading                 object (TL) reading                        CS-OB   easy 1, hard 5, difficult 9
Table 4.10: Distribution of biases in TC experimental items
(NB: ‘TS’ = test sentence; ‘CS’ = control sentence;
‘SB’ = subject reading bias; ‘OB’ = object reading bias)
The means of classification that I have adopted in Table 4.10 (see ‘Abbr.’ column)
reflects the fact that test items are distinguished from control items according to
whether an affirmative judgement is associated with a target-like or non-target-like
interpretation of an item. Furthermore, like items are distinguished in terms of
whether the final event in the story is linked to a target-like or non-target-like
interpretation of the item. The notation TS-OB, for example, indicates first that the
pairing of an affirmative (i.e. “true”) response with a target-like reading of the TC
distinguishes this as a test sentence (TS) and, second, that the order of presentation of
story events also favours the object or target-like reading of the TC. (Note that the
effect of these particular biases on subject performance will be discussed in §4.5.0.0.)
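The notation explained above can be derived mechanically from the two bias directions. The following Python sketch is illustrative only and is not part of the original study: the item lists are transcribed from Table 4.10, while the dictionary layout and the function name are hypothetical.

```python
# Items from Table 4.10, keyed by (item type, direction of the OP bias).
TC_ITEMS = {
    ("test", "subject"): ["easy 4", "hard 6", "hard 8", "difficult 12"],
    ("test", "object"): ["easy 2", "difficult 10"],
    ("control", "subject"): ["easy 3", "hard 7", "difficult 11"],
    ("control", "object"): ["easy 1", "hard 5", "difficult 9"],
}


def abbreviation(item_type, op_bias):
    """Build the Table 4.10 abbreviation, e.g. ("test", "object") -> "TS-OB"."""
    prefixes = {"test": "TS", "control": "CS"}
    suffixes = {"subject": "SB", "object": "OB"}
    if item_type not in prefixes or op_bias not in suffixes:
        raise ValueError(f"unknown category: {(item_type, op_bias)!r}")
    return f"{prefixes[item_type]}-{suffixes[op_bias]}"


# Print each category with its items.
for (item_type, op_bias), items in TC_ITEMS.items():
    print(abbreviation(item_type, op_bias), ", ".join(items))
```

Keying the items by the pair of properties, rather than by the abbreviation itself, keeps the label a derived value, so that an item reassigned to another category (as happened with hard 8) needs to be changed in only one place.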
Returning to the original design recommendations advocated by Crain and Thornton,
there is an additional pragmatic consideration that the authors discuss, which I
incorporated into the design of my own experimental items. This is what the authors
term the condition of plausible dissent (or Russell’s maxim) (op.cit.:226). Here, Crain
and Thornton adopt a view originally espoused by Russell (1948:138; cited in Crain
and Thornton op.cit.), who proposed that a possible negative judgment of a sentence
is felicitously made only when the speaker has already made or considered a positive
evaluation of the same. Accordingly, Crain and Thornton argue that it is
unreasonable to expect children to judge a test sentence false if the discourse context
does not make it clear precisely why the sentence is false. The authors therefore
specifically recommend that the design of each TVJ story meet the condition of
plausible dissent (hereafter, CPD) (op.cit.:225-6).14
With reference to the sample item previously discussed, TC10, the reader can verify
by consulting Figure 4.7 that I included an event in the story that meets the CPD.
Recall that a false judgement of TC10 is associated with a non-target-like
interpretation of the sentence The dog was difficult to teach, an interpretation which is
supported by the success that a dog experiences in teaching a pig how to play football
(i.e. *The dog was not difficult to teach the pig). The CPD requires, however, that the
listener consider a positive judgment of this reading of the sentence prior to judging it
false. This requirement was met in the case of TC10 by adding an event in which the
pig expresses some reservation about the dog’s ability to teach football, stating, “But
you’re a dog, not a football player. How can you teach me how to play football?”
Thus, the possibility is briefly introduced that the dog might prove an inadequate
instructor for the pig, even though subsequent events in the story rule out this initial
consideration.
14
Gordon (1998:216-18), however, has questioned whether the CPD is a necessary requirement for the
design of the TVJ task. He suggests the possibility that the same effect – that is, one of providing a
reasonable explanation for the negation of a test sentence – might in some circumstances be achievable
through highlighting certain information that is already present in the background of the story.
Although I do not dismiss the validity of Gordon’s suggestion, I nevertheless chose to adopt Crain and
Thornton’s recommendations in order to err on the side of caution when designing my own
experimental items.
The last of Crain and Thornton’s recommendations to be discussed in this section
pertains to the inclusion of filler sentences in experimental trials. The authors
advocate the use of such items as a means of: (1) keeping children motivated and/or
interested in the experimental task; (2) determining whether a child is paying proper
attention; and (3) establishing whether a child is responding appropriately to
experimental items or merely providing the same answer in all circumstances. In my
own study, I used filler items to serve all of these purposes. Nonetheless, I chose to
deviate from one specific recommendation for the design of these items that is
advanced by Crain and Thornton; this concerns their advice that filler items should
ideally be similar in complexity to test and/or control items (op.cit.:134). Instead, I
chose to present filler items that were simpler in terms of both length and content than
either test or control items. This is because pilot testing of TCs and related NOS had
indicated that a relatively high level of concentration was required in order for
subjects to properly evaluate these particular sentences against the TVJ story context.
And since this observation held true as much for my adult subjects as for my child
subjects, I felt that the use of filler items of lesser complexity would better serve to
maintain the interest of the subjects and promote confidence in their overall ability to
perform the experimental task.
4.3.0.3
Preparation for use of the task
In this section, I briefly review steps that I took several months before the
commencement of the main study in order to familiarize potential subjects with the
requirements of the TVJ task. Prior to conducting both the pilot and main
experiments, I worked for several months as a volunteer at both the Honeypot Pre-School and Willingham Primary School, the two facilities from which experimental
subjects were exclusively drawn. During this period of time, I became a familiar
presence in various classrooms and was therefore able to develop a comfortable
working relationship with subjects prior to their participation in the experiment. It
was also during this period that children in the two facilities were first introduced to
Fudge, a plush dog puppet I had chosen for use in the study (see Figure 4.5, §4.3.0.0).
In order to encourage interest in the puppet, children were asked to help think up
suitable names for him and the name Fudge was selected according to the results of
this competition. Some time later, I introduced a special game that involved the
puppet, an activity specially designed to serve as preparation for participation in the
TVJ task. Children were asked to join Fudge in listening to a story that was read by
their teacher. At the end of the story, Fudge would raise his paw and the teacher
would call on him to make some comment about the story. The children were told in
advance that sometimes Fudge listens very well and can therefore be expected to say
something sensible about the story, but other times does not pay close enough
attention and can therefore be expected to say something wrong or silly. Children
were asked to listen to what Fudge said and to decide whether his statement was right
or wrong, according to the context of the story that the teacher had just read to them.
Children in all of the classrooms enjoyed this game immensely and readily took to the
role of playing surrogate teacher to Fudge.
Importantly, however, the grammatical structures which were to be tested in the
experimental study were not introduced at any time prior to the study itself. This is
because these classroom sessions were solely intended to serve as a means of
introducing children to the procedures associated with the TVJ task, rather than to the
test items themselves.
4.3.1
Test/control items
4.3.1.0
Selection of vocabulary
In designing the present study, I was concerned to address what I consider a weakness
of certain previous experimental studies of the acquisition of the TC. This pertains to
the choice of vocabulary to be used in TC items. For example, I noted earlier (see
Chapter 3, §3.2.1.0) that all of the TCs used in Cromer (1970) featured the
single embedded infinitive verb to bite. Yet, as I pointed out, this verb is strongly
associated with a transitive rather than intransitive interpretation in adult English (cf.
?John bit), and therefore I believe it is possible that use of this verb could have biased
his subjects’ interpretation of test items. Similarly, in McKee (1997a), the TCs that
were offered to child subjects all contained infinitive verbs (e.g. reach, catch, chase,
and kick) that are standardly considered obligatorily transitive (see Levin 1993).
Since both studies included children who provided target as well as non-target-like
readings of the TC, I acknowledge that the bias considered here cannot be argued to
have wholly dictated the subjects’ choice of sentence interpretation; nevertheless, I
submit that it remains a possibility that exclusive use of strongly transitive verbs
could have compromised the reliability of the data collected in the two studies.15
In my own study, therefore, I was interested to make the two potential representations
of the TC, DC, and other NOS equally accessible to child subjects. Accordingly, I
decided to use experimental items that featured only verbs that readily allow
intransitive as well as transitive readings. Moreover, I sought to restrict my selection
to only those verbs that are likely to be included in the receptive vocabulary of a 3-year-old. On this basis, I selected the following six verbs for use in my experimental
items, with the exception of passive sentences: draw, eat, fight, help, ride, and
teach.16 (Note that the design of passive items will be discussed in §4.5.4.) Although
only three of these verbs, draw, eat, and teach, are included in Levin’s (op.cit.) list of
verbs that allow deletion of an unspecified or indefinite object, all six share certain
critical features of syntactic distribution with verbs of this class.17 For example, all
allow object omission when used in the present progressive or imperative (e.g. The
girl is drawing; Stop fighting!) and therefore contrast with strongly transitive verbs,
which are generally considered unacceptable in the same forms (cf. *?The boy is
chasing; *?Stop reaching!).
15 Ingham (1989:310-1) collected interesting evidence in this regard, arguing on the basis of his own
experimental findings that children as young as three are already sensitive to the argument structure
restrictions associated with particular verbs in English. He asserted that young children tend to use
transitive verbs in intransitive contexts only when the adult grammar licenses this possibility. This
claim would therefore seem to predict that Cromer’s (1970) and McKee’s (1997a) subjects should have
been strongly biased towards the transitive interpretation of the embedded verb in TC items and thus to
a target-like interpretation of the TC. Nevertheless, as discussed in ftnt. 10, the manner in which
children utilize “object drop” in English does not necessarily match adult practice in all respects (cf.
Eisenberg and Cairns 1994) and therefore I must leave as an unresolved issue the extent to which
Cromer’s and McKee’s subjects may have been influenced by their use of strongly transitive embedded
verbs.
16 In order to meet the latter of the two considerations referenced above, I used the (Communicative
Development Inventory) WORD list (Reznick and Goldsmith 1989:94-7) as a primary, although not
exclusive, source of words that are likely to be included in the receptive vocabulary of very young
children (i.e. those aged 1;0 to 2;0). Four of the verbs I chose, draw, eat, help, and ride, are included in
this list. The last two, teach and fight, were selected as being highly likely to be known to my subjects,
all over the age of three, on the basis that all were students attending either nursery or primary school at
the time that the study was conducted.
17 This is not intended to imply that the six verbs I selected share all features of syntactic distribution.
The verb fight, for example, takes an understood reciprocal object when it selects for a plural NP
subject; for example, the sentence The men fought implies that the men fought each other (cf. Levin
1993). Since my test sentences involved only single DPs as subjects, however, this particular feature of
the argument structure of fight was not of immediate concern.
There is one distributional difference, however, that I believe may have influenced the findings
reported in §4.5.0.0 and this concerns the verb to help. Of the six verbs selected for use in my study,
this is the only one for which it has been proposed that a deleted object has a contextually specified,
rather than indefinite or generic referent (cf. The invigilator told the students not to help, where the
object of help is most naturally construed as an empty pronoun coindexed with the subject DP) (Ingham
1989:125; see also the discussion in ftnt. 15.)
Following Rizzi (1986), I assume that object deletion in English is lexically licensed
rather than syntactically licensed as in other languages, such as Italian.18 Yet, if
object deletion is specified as an idiosyncratic property of particular lexical items in
English, as argued by Rizzi, as well as Ingham (1989), then it becomes important to
establish the age at which a child will have acquired this type of knowledge. As
previously observed (see ftnt. 15), Ingham (op.cit.) has experimentally investigated
this question and has claimed that 3-year-olds already display sensitivity to lexical
restrictions on the licensing of object deletion in English. On this basis, then, I felt
reasonably confident in assuming that my own child subjects, who were all over the
age of 3;4, would recognize the availability of both transitive and intransitive readings
of the six common verbs chosen for use in the study.
18 Rizzi (1986) proposes that in English, certain lexical items allow a θ-role associated with an object
argument to be saturated in the lexicon, thereby bypassing the GB projection principle, which requires
that thematic structure is necessarily given syntactic representation. He distinguishes between three
different types of null objects, all of which are lexically licensed in English: (1) those which receive an
arbitrary interpretation (John is always ready to please (people)); (2) those which receive a canonical
or prototypical interpretation (John ate); and (3) those which are interpreted as being definite and
anaphoric to some pragmatically salient element (I know) (see also the discussion in McConnell-Ginet
1982 and Jacobson 1992). For the sake of consistency, and because I was limited to using activity
verbs which are likely to be known to children as young as three, I tried to select only those verbs that
belong to the second of these categories. (Although, as acknowledged in ftnt. 17, above, the verb to
help may represent a single exception to this generalization.)
Nevertheless, while I strove to select verbs for use in the study whose null object admits a prototypical
interpretation, I did not necessarily expect children to assign this type of interpretation to the empty
object argument. That is, as discussed earlier in ftnt. 10, I believe it is possible that children who
access the subject interpretation of the TC allow the null object to take specific reference.
4.3.1.1 Degree constructions (DCs)
For the four DCs included in the study, I will once again illustrate the basic features
of design through use of a specific example, story 14, which preceded presentation of
DC14, The giraffe was too big to ride. Figure 4.8, below, illustrates the test materials
used for the story:
Figure 4.8: Materials used for DC14, ‘The giraffe was too big to ride.’
In this condition, I once again took as the null hypothesis that children share target-like knowledge of the construction under investigation; accordingly, children, like
adults, should have two readings of the DC available to them. The two options are
illustrated in (7), below, along with a proposed syntactic representation of each. The
reader will note that, as in the previous condition, I have chosen to distinguish subject
and object readings of a DC item according to whether the matrix subject argument
plays the logical role of subject or object with reference to the embedded infinitive
verb:
(7) DC14, The giraffe was too big to ride
a. Object reading - False
The giraffe was too big (for the pony) to ride.
The giraffe_k was too big [Op_k [PRO_i to ride t_k]].
b. Subject reading - True
The giraffe was too big to ride (the pony).
The giraffe_k was too big [PRO_k to ride pro_i/the pony].
As regards the experimental hypothesis, I adopted the same formulation as in the TC
condition; namely, I associated this hypothesis with the assumption that children lack
the ability to construct a target-like syntactic representation of an NOS and should
therefore be restricted to the assignment of subject readings only. A complication
arises, however, when it is considered that an adult subject could demonstrate an
exclusive preference for the subject reading of the DC, which would not imply his or
her inability to assign an object reading to the same structure. Thus, by analogy, a
similar pattern of performance on the part of a child cannot be taken as necessarily
indicative of the child’s lack of syntactic competence in interpreting the DC. Instead,
I decided that a child’s exclusive preference for subject readings of DC items could be
taken as supportive evidence for the experimental hypothesis only when such a pattern
of performance was observed to be consistent across all of the NOS tested in the
study.
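By way of illustration only, the decision rule just described can be sketched in Python. The function name and the data layout (a mapping from construction label to a list of booleans recording whether an object-gap reading was accessed on each item) are hypothetical choices made for this sketch and formed no part of the study.

```python
from typing import Dict, List


def supports_experimental_hypothesis(object_readings: Dict[str, List[bool]]) -> bool:
    """Return True only if no object-gap reading was accessed on any NOS tested.

    object_readings maps a construction label (e.g. "TC", "DC") to one boolean
    per item: True if the child accessed the object-gap reading on that item.
    """
    # With no data for a construction, the pattern cannot be called consistent.
    if not object_readings or any(len(v) == 0 for v in object_readings.values()):
        return False
    # A single object-gap reading anywhere shows that the child can build the structure.
    return not any(any(items) for items in object_readings.values())


# Hypothetical children, for illustration only.
child_a = {"TC": [False, False], "DC": [False, False], "IR": [False], "OPC": [False]}
child_b = {"TC": [True, False], "DC": [False, False], "IR": [False], "OPC": [False]}

print(supports_experimental_hypothesis(child_a))
print(supports_experimental_hypothesis(child_b))
```

Under this sketch, child_b's exclusive subject readings on DC items would not count as support, because an object-gap reading was accessed on a TC item.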
Returning to the example of DC14, the context of the story was intended to elicit a
true judgment of the sentence, The giraffe was too big to ride, according to a subject
or non-target-like reading of the same. This is because the giraffe’s attempt to ride
the pony is thwarted by his big size. Conversely, according to the story events, a false
judgment would be associated with an object or target-like interpretation of the
sentence, since the pony succeeds in climbing up on the giraffe and is given a nice
ride around a field.
Recall that the condition of plausible dissent requires that on the false interpretation of
the sentence, a corresponding positive judgment of the sentence should have been
under consideration at some previous point in time. For story 14, this condition is
satisfied when the pony initially expresses concern that he will not be able to climb up
on the giraffe’s back, thus temporarily introducing the possibility that the pony may
not succeed in riding the giraffe. Ultimately, however, the pony succeeds in doing so
after climbing onto the giraffe’s back with the aid of a bale of hay and the object
reading of the DC is correctly judged false.
In the DC condition, I once again chose to vary the direction of the affirmative
response and order of presentation biases across individual items, contrary to the
recommendations contained in Crain and Thornton (1998). As before, test and
control items were distinguished in terms of whether an affirmative response favoured
the object or subject reading of the sentence. For DC14, the story was written so that
the affirmative response bias favoured the subject reading, consistent with its
classification as a control item. The order of presentation also favoured the subject
reading of DC14, since the giraffe’s attempt to ride the pony was presented as the
final event in the story.
As in the case of TC control items, I viewed the main function of DC control items as
providing a balanced opportunity for subjects to associate a true judgment of the DC
with the subject reading of the sentence. As Crain and Thornton have observed, this
consideration is of particular importance when testing ambiguous sentences, since it is
well recognized that both children and adults typically display an interpretive bias
towards one of the two available readings. As previously noted, however, such a
pattern of performance would be relatively less informative than a situation in which a
child demonstrates the ability to access both subject and object readings. Thus, I
hoped that mixing the direction of the affirmative response and order of presentation
biases in individual items, as I had done in the TC condition, would help to counter
any tendency that a subject might feel to assign exclusive subject readings or
exclusive object readings to the DC.
4.3.1.2 Infinitival relatives (IRs)/Object-gap purpose constructions (OPCs)
Since the IR and OPC are both unambiguous in the adult grammar, most of the basic
features of design for these items were as described in §4.3.0.2 for the TC.19 In this
condition, I once again associated the experimental hypothesis with the assumption
that the child does not possess the syntactic ability to interpret an NOS and will
therefore be restricted to a non-target-like interpretation of both types of structures.
The formulation of this hypothesis differs from that proposed for the TC and DC,
however, in that I prefer not to make any prediction with regard to the specific
syntactic analysis that the child assigns when a non-target-like interpretation of either
the IR or OPC is accessed. This is because I am aware that competing claims have
been advanced in the literature with respect to this issue, which I review later in this
section (see, e.g., Nishigauchi and Roeper 1987, Goodluck and Behne 1992, or Jones
1992). Thus, in this condition, I resolved simply to take the assignment of any non-target-like interpretation of either the IR or OPC as providing support for the
experimental hypothesis that children lack the syntactic ability to interpret an NOS.
As in the previous two conditions, test items were associated with the affirmative
response bias, where the true response correlated with a target-like reading of the
IR/OPC, while the bias was reversed in the case of control items. Since IR and OPC
items were limited to a total of two each, I was not able to vary the direction of the
order of presentation bias in the same way that I did for TC and DC items. Instead,
the direction of biases across the four combined IR and OPC items was as follows:
19 Recall that the basic syntactic properties of the IR and OPC were outlined in §2.5.0 of Chapter 2.
As noted there, the IR can be distinguished from the OPC according to the syntactic position in which
the adjunct clause is assumed to attach; for the IR, as in (ia) below, this is standardly considered the N′
level, while for the OPC, as in (ib), it is VP:
(i) a. IR: The tiger_k found [DP a [NP [N rabbit_i] [IP Op_i [PRO_k to eat t_i]]]].
b. OPC: The clown_k [VP [VP bought a dog_i] [IP Op_i [PRO_k to ride t_i]]].
One important issue that was raised in §2.5.0 concerns the question of whether it is reasonable to
analyse the derivation of the IR as involving null operator movement, given that the validity of this
claim has been challenged in the literature (see, e.g., Contreras 1993). I wish to be clear that my
decision to treat both the IR and OPC as examples of NOS in my study represented a working
assumption only; this is because one of my investigative goals was to determine whether various NOS
are concurrently acquired, a pattern of performance which would provide support for the hypothesis
that the IR, OPC, ODC, and TC share a similar structural analysis.
IR17 – CS/OB, IR18 – CS/SB, OPC19 – TS/OB, and OPC20 – TS/SB.20 (Note that
the abbreviations TS and CS distinguish test and control sentences, and SB and OB
indicate whether the order of presentation of events in the story favoured a subject or
object interpretation of the matrix subject argument.)
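The coding of the four items can be laid out as a small data structure and tabulated. The Python below is a sketch for illustration; the class and variable names are hypothetical, and only the item labels and bias codes are taken from the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    label: str         # item label as given in the text, e.g. "IR17"
    construction: str  # "IR" or "OPC"
    status: str        # "TS" (test sentence) or "CS" (control sentence)
    order_bias: str    # "SB" or "OB": reading favoured by the order of presentation


ITEMS = [
    Item("IR17", "IR", "CS", "OB"),
    Item("IR18", "IR", "CS", "SB"),
    Item("OPC19", "OPC", "TS", "OB"),
    Item("OPC20", "OPC", "TS", "SB"),
]

# Cross-tabulate each bias against construction type.
status_by_construction = Counter((i.construction, i.status) for i in ITEMS)
order_by_construction = Counter((i.construction, i.order_bias) for i in ITEMS)

print(status_by_construction)
print(order_by_construction)
```

Tabulated in this way, the order of presentation bias is balanced within each construction, whereas test/control status coincides with construction type: both IR items are control sentences and both OPC items are test sentences.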
I now consider a specific example of an IR item, this being IR17, The soldier found a
pirate to fight. The materials used for this item were as illustrated in Figure 4.9
below:
Figure 4.9: Materials used for IR17, ‘The soldier found a pirate to fight.’
20 I am aware that the ordering of these biases is less than ideal since, for example, in the case of the
two IR items, the affirmative response is associated with the non-target-like reading of the sentence in
both instances, while the situation is reversed in the case of the OPC items. Thus, according to my
definition of the terms, both IR17 and IR18 can be considered control items, whilst both OPC19 and
OPC20 can be considered test items. This situation arose as a consequence of the fact that in the
design stage of the study, I had initially classified the two IR items as OPCs. This error was later
brought to my attention by Helen Goodluck (p.c.), but as the items referenced here had already been
put into use, I was unable to effect the necessary modifications without compromising the reliability of
the data that I had already collected (see Anderson 2002a). I return to this issue in §4.5.2 and §4.5.3,
where I consider the effect of the direction of these biases on subject performance.
The story that preceded presentation of this item provided the following two contexts
in which the sentence could be interpreted. In the first, a soldier explains that he
wishes only to join a pirate on his ship for a bit of singing and the story ends
accordingly. This was the intended context for the target-like reading of the sentence
(i.e. The soldier_k did not find a pirate_i to fight e_i), and thus IR17 should be judged
false on this interpretation. In the second context, the pirate identifies the soldier as
someone who typically wants to fight (an event included to satisfy the CPD) and the
pirate begins to fight the soldier until the soldier expresses his true intention in
seeking out the pirate. I speculated that children who lacked target-like knowledge of
the sentence might allow a reading which is supported by the second story context;
this is one in which the embedded subject PRO is interpreted as being co-referential
with the matrix object DP, and the embedded object as being co-referential with the
matrix subject DP. That is, I wondered if some children might erroneously judge
IR17 true because they assign it a structural analysis as in (8), below:
(8) The soldier_i found a pirate_k [PRO_k to fight him_i].
While to my knowledge it has not been established that children necessarily allow
such a reading of the IR, Goodluck and Behne (1992) have documented a number of
cases in which their experimental subjects provided what Jones (1992) terms
switched-control readings of the OPC, in which the referential dependencies attested
in the sentence match those exemplified in (8), above. And, as observed by Jones
(ibid.:178), the structural configuration of the OPC does in fact provide two possible
c-commanding antecedents that can act as controllers in the sentence; thus, I believe it
is not inconceivable that a child with a developing grammar might entertain a
switched control reading of the OPC and by extension, a switched control reading of
the IR.21
21 For simplicity’s sake, Jones (1992) adopts a definition of c-command in which it is sufficient to
establish that an element is dominated by “some maximal projection” for the control relation that is
described above to obtain. Thus, he is not concerned with the theoretical consequences of segmental
adjunction of the object-gap purpose clause to the matrix VP, in which the object of the matrix verb
would more appropriately be viewed as m-commanding, rather than as strictly c-commanding, the two
empty categories in the adjunct clause (see Jones ibid.:175, ftnt. 2, for further discussion and also N.
Chomsky 1986a:6-8). Like Jones, I prefer not to consider the latter issue, but, in my case, this decision
is motivated by the awareness that control relations cannot be exhaustively defined according to purely
syntactic criteria; accordingly, I believe that an investigation of this issue would take us too far afield
from the matters that constitute my focus here.
Of course, it is also possible that a child who accessed a non-target-like interpretation
of IR17 could assign the sentence a syntactic representation as in (9), below, in which
there is object control of the embedded subject, but the reference of embedded object
pro remains free:
(9) The soldier_i found a pirate_k [who_k was willing] to fight pro_prototypical
Such an analysis is of course not available in the adult case but would nevertheless
receive some support from the story context, since the pirate does express a
willingness to fight the soldier, which a child could possibly interpret to include a
willingness to fight others in general. In fact, this is the type of representation that I
personally considered more likely to be assigned in the case of IR18, The tiger found
a rabbit to eat, if a child were to entertain a non-target-like reading of the sentence.
That is, I speculated that some children might interpret IR18 along the lines of, The
tiger_k found a rabbit_i who_i was eating (or, possibly, The tiger_k found a rabbit_i and the
rabbit_i was eating). As the data I collected in the experiment cannot speak to the
child’s choice of a particular syntactic representation of the IR, however, the issue of
the actual form of the child’s non-target-like interpretation of the construction is one
that must await future investigation.
Turning now to the design of the two OPC items, I will take OPC20, The clown
bought a dog to ride, as a representative example. The test materials used for this
item were as illustrated in Figure 4.10 below:
Figure 4.10: Materials used for OPC20, ‘The clown bought a dog to ride.’
The first possible context in which the sentence could be interpreted involved a clown
who bought a dog so that he could ride the dog and, in doing so, entertain a little girl
who was bedridden in hospital. These story events were therefore intended to support
a true judgment of the sentence, and since this response was associated with a target-like interpretation of the same, OPC20 was classified as a test item. The second
context, in contrast, involved the dog expressing an interest in riding in a cart pulled
by the clown. Thus it was anticipated that these latter story events could be used to
support a non-target-like interpretation of the sentence, for example, an interpretation
as in (10) below, in which the matrix object argument, a dog, serves as the antecedent
for PRO:
(10) The clown_i bought a dog_k [PRO_k to ride pro_*j/*prototypical].
Because the clown did not purchase the dog with the intention of letting the dog ride
in the cart, the interpretation of the sentence represented in (10) is correctly judged
false according to the story context. The CPD, which requires prior consideration of a
corresponding positive judgment of this interpretation of OPC20, is met by having the
dog briefly entertain the notion that the clown will let him ride in the cart, only to
have the clown inform the dog that he (i.e. the dog) is in fact expected to pull the cart.
4.3.1.3 Passive sentences
In addition to the NOS items previously discussed, the main study also featured four
passive sentences. These sentences were included to serve what can be described as a
control function; in particular, I was interested to determine whether any difficulty
that my child subjects experienced on NOS could be linked to a more general
difficulty that these subjects experienced in interpreting displaced object arguments.
As the discussion in Chapter 3 indicated, my decision to concurrently test children’s
comprehension of the TC and the passive is not without precedent in the literature.
Cromer (1970), for example, incorporated such a comparison into the design of his
experimental study. After testing forty-one children between the ages of 5;3 and 7;5,
he reported that while only five of these subjects performed in a target-like manner on
the TC, thirty-eight (or 93%) did so on the two passive sentences that he presented.
There are also a number of studies of the acquisition of the passive alone, in which it
has been reported that children of pre-school age are capable of both comprehending
and producing such sentences in a target-like manner (see, inter alios, Maratsos and
Abramovitch 1975, Maratsos, Kuczaj, Fox, and Chalkley 1979, Maratsos, Fox,
Becker, and Chalkley 1985, Pinker, Lebeaux and Frost 1987, and Fox and Grodzinsky
1998). Thus, my decision to include such items in my own study may seem, in this
respect, superfluous. I would argue that it is not, however, because there is
also a body of evidence which indicates that children’s performance on the passive is
not uniformly target-like during the pre-school years and beyond. In particular, the
findings of certain studies point to a particular difficulty that children experience in
processing passive sentences that contain non-actional verbs, such as to like (see, inter
alios, Maratsos et al. 1979; 1985, de Villiers, Phinney and Avery 1982, and Gordon
and Chafetz 1986; see also Pinker et al. 1987:243-4 for similar findings with regard to
children’s production of nonactional passives).
In recognition of the latter findings, I therefore chose to include passive sentences of
both types in my study, specifically, two that featured the actional verbs bite and
chase, and two that featured the nonactional (or experiencer) verbs, hear and watch. I
associated the experimental hypothesis with the assumption that children who lack
target-like knowledge of passivization will assign what I term an active interpretation
to the passive sentence. For example, according to this hypothesis, I anticipated that
children would interpret a sentence such as The boy was chased by the duck as if it
involved an active rather than passive form of the verb, with the boy therefore
construed as an agent rather than as a patient. Yet I also recognized that a child’s
non-target-like performance might be restricted to only those items containing nonactional or experiencer verbs. As I associated the null hypothesis with the assumption
that children should perform just like adults on passives of both types, I decided to
take either pattern of performance as evidence for the experimental hypothesis.
As in previous conditions, the direction of both the affirmative response and order of
presentation biases was varied across individual items, and test and control items were
distinguished in terms of whether the affirmative response bias favoured or did not
favour a target-like interpretation of the sentence. In this condition, however, I
distinguished active (i.e. non-target-like) readings from passive (i.e. target-like)
readings of experimental items, rather than subject versus object readings. The four
individual items were therefore categorized as follows: AP21 – CS/AC, AP22 – TS/PS,
NAP23 – CS/PS, NAP24 – TS/AC. (Note that the abbreviations AP and NAP stand
for actional passive and non-actional passive, TS and CS distinguish test and control
sentences, and AC and PS indicate whether the order of presentation of events in the
story favoured an active (AC) or passive (PS) reading of the sentence.)
Taking the single example of the nonactional passive NAP23, The snake was watched
by the rabbits, the materials used for this item were as illustrated in Figure 4.11,
below:
Figure 4.11: Materials used for NAP23, ‘The snake was watched by the rabbits.’
The story preceding presentation of this sentence provided the following two
interpretive contexts. According to one scenario, a snake sat in a tree watching two
rabbits have a picnic on the grass below. Thus, the sentence could be interpreted as
being true according to what I have termed the active or non-target-like reading of the
sentence, and this pairing of affirmative response with non-target-like interpretation is
consistent with its classification as a control item. According to the second scenario,
which was presented last in the story, the possibility is first introduced that the rabbits
might stand and watch the snake after they are made aware of its presence by an alert
hedgehog. This consideration was introduced only to satisfy the CPD, however, since
it is a false rather than true judgment of the sentence that corresponds with later story
events. This is because the rabbits become frightened once they see the snake and
decide to flee. As a consequence, the rabbits do not ever watch the snake, and,
accordingly, the target-like or passive interpretation of NAP23 is correctly judged
false.
4.3.1.4 Filler sentences
I will make only a few brief comments regarding the filler items used in the study,
which are listed in Appendix I. As previously noted, the stories written to accompany
these items were generally shorter in length than those that accompanied either NOS
or passive sentences. Consequently, filler stories typically took thirty to forty seconds
to present, as opposed to an average of sixty seconds for other stories used in the
study. Filler stories were also relatively more straightforward than test or control
items since they were designed to include only a single context in which the sentence
could be judged true or false. For this reason, I anticipated that both adult and child
subjects would find the interpretation of filler sentences less demanding than the
interpretation of test and control items.
Finally, filler sentences were evenly balanced in terms of whether the target-like
response was associated with a true or false judgment of the sentence. I was also
careful to include only lexical items that I believed were likely to be attested in the
receptive vocabulary of a child as young as three.
4.4 Pilot study
A pilot study was conducted approximately one month prior to the main study. Out of
the original group of 122 children who participated in the pre-test described in §4.2,
twelve were chosen for participation in the pilot study. These twelve children, who
ranged in age from 3;3 to 6;8, were also joined by six adult control subjects. Child
subjects were tested at the school they attended, either the Honeypot Pre-School or
Willingham Primary School, both located in the village of Willingham in
Cambridgeshire, U.K. Adult subjects were tested in their own homes.
A combined total of twenty-four test and control items were used in the pilot study,
consisting of twelve TCs, four DCs, two OPCs, two IRs, and four passives, with two
of the latter featuring actional verbs and two featuring nonactional verbs. Individual
subjects were presented with eight to twelve items in total, including filler items. As
a general rule, a filler item was presented after presentation of every two test and/or
control items. Two filler items were also presented at the beginning of the testing
session in order to familiarize subjects with the requirements of the TVJ task.
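The presentation scheme just described can be sketched procedurally. The Python below is an illustration under stated assumptions: the function name and the item labels are hypothetical, the rule is applied strictly (the text describes it only as a general rule), and a filler is assumed to follow the final pair of items if one remains.

```python
from typing import List


def build_session(experimental: List[str], fillers: List[str]) -> List[str]:
    """Open with two fillers, then add one filler after every two test/control items."""
    if len(fillers) < 2:
        raise ValueError("at least two fillers are needed to open the session")
    session = list(fillers[:2])        # two warm-up fillers
    remaining = iter(fillers[2:])      # fillers left for interleaving
    for position, item in enumerate(experimental, start=1):
        session.append(item)
        if position % 2 == 0:          # every second test/control item
            filler = next(remaining, None)
            if filler is not None:
                session.append(filler)
    return session


# Hypothetical labels: E = test or control item, F = filler item.
order = build_session(["E1", "E2", "E3", "E4"], ["F1", "F2", "F3", "F4"])
print(order)
```

With four test/control items and four fillers, this yields a session of eight items, the lower end of the range reported for individual subjects.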
The main purpose of the pilot study was to provide a means of evaluating both the
efficacy of the TVJ task and the suitability of various test/control items that I
planned to use in the main study. On the basis of the feedback that my subjects
provided during this preliminary study, certain items were modified or even replaced
altogether. One notable finding in this regard concerned performance on the
following two filler sentences: (1) The bird scared the dog, and (2) The bird
frightened the dog. Both were barred from use in the main study because three out of
the five subjects tested on these items, all under the age of 3;11, failed to correctly
judge one or both of these sentences as being false according to the accompanying
story. As the events depicted in these stories were quite straightforward, I believe it is
unlikely that this non-target-like performance could be attributed to misinterpretation
of story details. Instead, I think it more likely indicates that children take some time
to acquire the marked argument structure properties of the verbs in question, to scare
and to frighten, which, in violation of the thematic hierarchy (cf. Grimshaw 1990),
assign the role of experiencer to an object rather than subject argument.22
One other finding from the pilot study caused me to revise the inclusion criterion that
I had originally proposed for subject participation in the main study. This concerns
the performance of two pilot subjects, one, a female, aged 3;7, and the other, a male,
aged 3;11, who consistently judged all experiment items, whether filler, test, or
control, as true, leading me to question whether either properly understood the
experimental task. Interestingly, however, each of these children had performed quite
well on the vocabulary and memory parts of the pre-test, in contrast to a number of
their age-matched peers who had failed either one or both of these tests and had
therefore been excluded from participation in the pilot study. Given the questionable
22
De Villiers, Phinney, and Avery (1982) report that “non-action verbs” (i.e. those whose argument
structure representation involves an experiencer rather than agent) were poorly understood by their
youngest experimental subjects in active as well as passive sentences. Thus, it is possible that the non-target-like performance I observed in my pilot study reflects a more general problem that children
experience in mapping experiencer arguments to various syntactic positions. I return to this issue in
Chapter 5.
performance of these subjects on the TVJ task, I was forced to consider that
successful performance on parts one and two of the pre-test might not be a suitably
stringent inclusion criterion for participation in the main study. I therefore added the
requirement that in addition to passing both parts of the pre-test, all participants
should consistently demonstrate target-like performance on filler items.
4.5
Results of main experimental study
A total of forty-four children between the ages of 3;4 and 7;5 participated in the study
to be discussed in this section. All forty-four were drawn from the original 122
participants in the pre-test and did not include any of the children who had
participated in the pilot study. As earlier detailed, all of my child subjects were
monolingual native speakers of British English, whose parents were primarily of
middle or working class background. Subjects who attended the Honeypot Pre-School were all under the age of 4;8, and so I was concerned that testing should not
exhaust the limited attention span of these children. Accordingly, I decided to
administer the full set of twenty-four test/control items over four individual sessions,
each approximately twenty minutes in length. For subjects over the age of 4;8, who
attended Willingham Primary School, testing was completed in three sessions, each
lasting approximately twenty-five minutes. Subjects at both locations were, as far as
possible, seen on a weekly basis, although adjustments were made to this schedule in
the event of a child’s absence due to illness or family holidays.
Two experimental assistants were employed as puppeteers, one assistant having
responsibility for pre-school-aged subjects and another for subjects of primary school
age. Both women were familiar to the child participants. The primary responsibility
of the assistant was to present the test sentence, via the puppet, to the subject. As
previously explained, this was accomplished by having the puppet respond to the
experimenter’s prompt, “What happened in that story, Fudge?” While both women
were allowed to adopt a distinctive voice for the puppet during periods of free play,
they were asked to deliver the test sentence in a normal speaking voice in order to
ensure that it would be properly understood by the subjects.23
A total of twenty-two adults served as control subjects. Unlike child
subjects, each adult was tested on a set of items that comprised only half the number
of items presented to the children. This was because pilot testing had established that
adults were, for obvious reasons, less interested in performing the experimental task
than child subjects and therefore less willing to participate in the multiple testing
sessions which would have been required to administer a full set of over thirty test,
control, and filler items.24 Since I believed that presentation of all of these items in a
single session would strain even adult capabilities, I adopted the compromise solution
of increasing the number of adult subjects from the original eleven to twenty-two,
with two adults sharing a complete set of items between them.
For adult as well as child subjects, the order of presentation of individual
experimental items was randomly varied, subject to the following two exceptions: (1)
Two items testing knowledge of the same type of construction were never presented
consecutively, and (2) A filler item followed the presentation of every two test items
for child subjects and every three items for adults.
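The two ordering constraints can be made concrete with a short sketch. The Python below is illustrative only and is not the procedure actually used to prepare the testing sessions: the item labels, the pool of six construction types with four items each, the filler names, and the function name `constrained_order` are all hypothetical. It reshuffles the items until no two neighbours share a construction type, then inserts a filler after every second item (the child schedule) or every third item (the adult schedule).

```python
import random

# HYPOTHETICAL item pool: six construction types, four items each.
TYPES = ["easy TC", "hard TC", "difficult TC", "DC", "passive", "other"]
ITEMS = [(f"{t} #{i}", t) for t in TYPES for i in range(1, 5)]
FILLERS = [f"filler #{i}" for i in range(1, 13)]


def constrained_order(items, fillers, filler_every, rng, max_tries=10_000):
    """Return a randomised session order that obeys both constraints."""
    for _ in range(max_tries):
        order = list(items)
        rng.shuffle(order)
        # Constraint 1: neighbouring items never share a construction type.
        if all(a[1] != b[1] for a, b in zip(order, order[1:])):
            break
    else:
        raise RuntimeError("no admissible order found; relax the constraints")

    # Constraint 2: one filler after every `filler_every` test/control items.
    session, remaining = [], iter(fillers)
    for position, item in enumerate(order, start=1):
        session.append(item)
        if position % filler_every == 0:
            filler = next(remaining, None)
            if filler is not None:
                session.append((filler, "filler"))
    return session


# Same hypothetical pool, scheduled once per regime for comparison.
child_session = constrained_order(ITEMS, FILLERS, 2, random.Random(1))
adult_session = constrained_order(ITEMS, FILLERS, 3, random.Random(2))
```

Repeated shuffling is the simplest way to honour the first constraint; with a pool dominated by a single construction type it would become slow, and a constructive method would be preferable.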
23
The assistants were both speakers of British English, as well as residents of the village in which the
testing took place. The test sentences were therefore delivered in a variety of English that was familiar
to all participants. However, the author of this thesis, a speaker of American English, told the TVJ
stories. Potential problems introduced by this situation were addressed in a number of ways. First, I
served as a volunteer classroom assistant during the months preceding the experimental study and
therefore had extensive opportunity to interact with potential subjects prior to their participation in the
study. Second, I administered the pre-test and informally used these sessions to assess any potential for
miscommunication. Finally, all of the vocabulary used in the experimental items was based on norms
of British and not American English. I believe that the effectiveness of these measures is confirmed by
the fact that for the duration of the study, I observed no instances in which a child failed to understand
my use of spoken English.
24
Based on her own experience with use of the TVJ task, Maria Teresa Guasti (p.c.) has suggested that
adult interest and/or attention in the TVJ task can be better maintained if stories are presented on
videotape. One advantage of this technique that I can envision involves some lessening of the
embarrassment that adults may naturally feel when participating in an activity that has been designed to
appeal to children. However, as I wished to avoid introduction of extraneous (i.e. nuisance) variables
that could undermine the validity of the findings I obtained, I preferred to maintain, as far as possible,
parity in the conditions under which child and adult subjects were tested.
4.5.0
Tough constructions (TCs)
4.5.0.0
Presentation of group findings
Recall that the null and experimental hypotheses for this condition were as in
(11a&b), below:
(11)
a. Null hypothesis: The child possesses target-like
knowledge of the TC and so will allow only object
readings of the construction.
b. Experimental hypothesis: The child lacks the ability to
construct a target-like syntactic representation of an
NOS and is consequently able to access only subject
(i.e. non-target-like) readings of the TC.
I also acknowledged the possibility that a child participant could allow both subject
and object readings of the TC. I decided that such a pattern of performance, although
non-target-like, would not be taken as evidence for the experimental hypothesis, as I
think a narrower formulation of the experimental hypothesis is preferable from the
standpoint of experimental design.
Table 4.11, below, compares the performance of each age group on easy, hard, and
difficult items, as well as on all TC items combined. The figures listed represent the
actual number of target-like responses provided in each condition. Note that in order
to adjust for missing values, the number of target-like responses obtained is at times
expressed as the numerator of a fraction with the total number of responses as its
denominator.25
25
Overall, there were six instances in which a score for an individual item could not be obtained, three
due to procedural error and three due to a subject’s failure to provide a clear judgment of a test/control
sentence.
Grp  Ages       easy TCs       hard TCs       difficult TCs  All TCs
                (items = 4)    (items = 4)    (items = 4)    (items = 12)
1    3;4 - 4;4  17 (38.6%)     20 (45.5%)     15/43 (34.9%)  39.7%
2    4;6 - 5;5  16/42 (38.1%)  24/43 (55.8%)  18 (40.9%)     45.0%
3    5;6 - 6;3  19 (43.2%)     26 (59.1%)     27 (61.4%)     54.6%
4    6;5 - 7;5  35/43 (81.4%)  34 (77.3%)     38/43 (88.4%)  82.3%
Table 4.11: Total object readings per TC test condition and overall
(NB: Percentages reported in the final column have been adjusted to
account for any missing values in the individual conditions.)
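The adjustment for missing values in the final column can be reproduced by pooling numerators and denominators across the three conditions. The sketch below is a check on the arithmetic, not part of the original analysis. It assumes that any cell printed without a denominator rests on the full 44 responses (11 subjects by 4 items); the name `pooled_percentage` is illustrative. On that assumption the pooled figures agree with the table to one decimal place, except that group 3 comes out at 54.55% against the 54.6% printed.

```python
# (target-like responses, responses obtained) per condition, from Table 4.11.
# ASSUMPTION: a cell printed without a denominator has the full 44 responses
# (11 subjects x 4 items).
TABLE_4_11 = {
    1: {"easy": (17, 44), "hard": (20, 44), "difficult": (15, 43)},
    2: {"easy": (16, 42), "hard": (24, 43), "difficult": (18, 44)},
    3: {"easy": (19, 44), "hard": (26, 44), "difficult": (27, 44)},
    4: {"easy": (35, 43), "hard": (34, 44), "difficult": (38, 43)},
}


def pooled_percentage(cells):
    """Percentage of target-like responses over all responses obtained."""
    hits = sum(h for h, _ in cells.values())
    total = sum(n for _, n in cells.values())
    return 100 * hits / total


overall = {group: pooled_percentage(cells) for group, cells in TABLE_4_11.items()}
for group, pct in overall.items():
    print(f"Group {group}: {pct:.2f}%")
```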
In Figure 4.12 below, I provide a graphic representation of the results that are reported
in the final column of Table 4.11:
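[Figure 4.12 appears here: boxplot. Vertical axis: Mean % Object Rdgs - TCs (scale 0.0 to 1.2). Horizontal axis: Age group (3:4 to 4:4, 4:6 to 5:5, 5:6 to 6:3, 6:5 to 7:5), N = 11 per group. Subjects 3 and 13 are marked as outliers.]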
Figure 4.12: Boxplot graph of mean percentage object readings
provided by age group on combined TC items
In Figure 4.12, the solid black line drawn across each box represents the median
percentage of object, or target-like, readings obtained for each group (i.e. the 50th
percentile), while the lower and upper edges represent the 25th and 75th percentiles,
respectively. Variability in the performance of individual subjects is expressed in
terms of the length of the lines (or “whiskers”) that trail from the upper and lower
edges of each box. As can be readily observed, the subjects in the third age group
(aged 5;6 to 6;3) exhibited the greatest variation in individual scores, ranging from a
low of 17% object readings to a high of 92%. However, as the graph also indicates,
considerable individual variation was observed in the first and second age groups as
well, and therefore it is only with respect to those subjects over the age of 6;5 that
individual performance becomes more clearly homogeneous.
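For readers unfamiliar with boxplots, the quantities that define each box can be computed as in the following sketch. The individual scores are hypothetical, since per-subject data are not listed here: only the minimum (2 of 12 items, or 17%), the maximum (11 of 12, or 92%), and the total of 72 object readings for the third age group are taken from the text and Table 4.11. Note also that statistical packages differ slightly in how they interpolate quartiles, so the values need not match those underlying Figure 4.12.

```python
import statistics

# HYPOTHETICAL object-reading counts out of 12 TC items for eleven subjects.
# Only the minimum (2/12), maximum (11/12) and total (72) follow the text and
# Table 4.11; the individual values are invented for illustration.
counts = [2, 3, 4, 5, 6, 7, 8, 8, 9, 9, 11]
percent = [100 * c / 12 for c in counts]

# Quartiles: the box spans `lower` to `upper`, with a line at `median`.
lower, median, upper = statistics.quantiles(percent, n=4, method="inclusive")

print(f"range: {min(percent):.0f}% to {max(percent):.0f}%")
print(f"25th percentile: {lower:.1f}%")
print(f"median:          {median:.1f}%")
print(f"75th percentile: {upper:.1f}%")
```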
There are two outliers identified in the boxplot graph by means of circles (O). These
are subjects no. 3, aged 3;8, and no. 13, aged 4;7, who each obtained a score that was
more than one standard deviation from the mean for their respective age group.26 In
the period preceding the experimental study, these two children had been identified by
their teachers as being of average academic ability. Following the study, however, I
administered the British Picture Vocabulary Scale (BPVS) to all child participants (see
discussion in §4.6) and the results revealed that subjects 3 and 13 had each scored in
the 99+ percentile for their respective age group; certainly, then, neither could be
considered average in terms of their vocabulary ability.
Nevertheless, as will be further discussed in §4.6, I did not in fact observe any
necessary correlation between exceptional performance on the BPVS and uniformly
target-like performance on TC items. For example, subject no. 9, aged 4;2, obtained a
BPVS score that was also in the 99th percentile for his age group, and yet he provided
only 58% object readings on TC items overall. According to this consideration, then,
and given the fact that considerable variability was attested in the individual
performance of subjects in this age group (i.e. group 1), I decided not to exclude the
26
Specifically, subject no. 3 provided 75% object readings compared to an age group mean of 39.4%
and subject no. 13 provided 82% compared to an age group mean of 43.9%.
data collected from subjects no. 3 and no. 13 from the statistical analysis of my
results.
I now look at performance in each of the three TC test conditions, easy, hard, and
difficult, in more detail. Beginning with easy TCs, Table 4.12, below, reports the total
number of object readings obtained by age group for each of the easy test/control
items, as well as for all four items combined:
Grp  Ages       easy1         easy2       easy3       easy4       All items  Items 1, 3, 4
                (CS-OB)       (TS-OB)     (CS-SB)     (TS-SB)     (1 to 4)   only
1    3;4 - 4;4  4 (36.4%)     4 (36.4%)   4 (36.4%)   5 (45.5%)   38.6%      39.4%
2    4;6 - 5;5  5/10 (50%)    3/10 (30%)  5 (45.5%)   3 (27.3%)   38.1%      40.6%
3    5;6 - 6;3  4 (36.4%)     3 (27.3%)   6 (54.6%)   6 (54.6%)   43.2%      48.5%
4    6;5 - 7;5  10/10 (100%)  5 (45.5%)   10 (90.9%)  10 (90.9%)  81.4%      93.8%
5    Adult      11 (100%)     8 (72.7%)   11 (100%)   11 (100%)   93.2%      100%
Table 4.12: Total object or target-like readings of ‘easy’ TCs
In looking for differences in between-group performance, I first analysed the results
reported in the penultimate column of Table 4.12, using actual counts rather than
percentages for the purposes of statistical analysis. Due to the relatively small
number of subjects per age group, and due to the fact that the data reported in Table
4.12 do not meet all the criteria for the use of a one-way ANOVA27, I opted to use a
Kruskal-Wallis test to analyse between-group performance. This is a nonparametric
alternative to one-way ANOVA that imposes no special requirements on the
27
For example, distribution of values within each sample of data is not uniformly normal.
distribution of the data.28 For groups 1 to 3, the results of the test revealed no
significant difference in the number of object readings obtained for each
group (χ² (2, N=33) = .138, p = .933), indicating that subjects below the age of 6;3
performed as a single group with respect to easy TCs.
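To show what the Kruskal-Wallis statistic measures, the following sketch computes it from first principles: all scores are pooled and ranked, tied scores share the mean of the ranks they occupy, and H compares the rank sums of the groups. The per-subject counts are hypothetical. They were chosen only so that each group's total matches Table 4.12 (17, 16, and 19 object readings), so the resulting H is not expected to equal the value reported above. The function names are illustrative; in practice a library routine such as `scipy.stats.kruskal` would normally be used.

```python
from collections import Counter


def average_ranks(values):
    """Map each distinct value to the mean of the ranks it occupies."""
    positions = {}
    for rank, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(rank)
    return {value: sum(r) / len(r) for value, r in positions.items()}


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H (undefined if all scores are equal)."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    rank_of = average_ranks(pooled)
    rank_term = sum(
        sum(rank_of[x] for x in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Correction for ties: t is the number of scores sharing a given value.
    ties = Counter(pooled).values()
    correction = 1 - sum(t**3 - t for t in ties) / (n**3 - n)
    return h / correction


# HYPOTHETICAL object readings (0 to 4) on the four easy items per subject.
group_1 = [0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3]
group_2 = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
group_3 = [0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3]

h_statistic = kruskal_wallis_h([group_1, group_2, group_3])
print(f"H = {h_statistic:.3f} on {3 - 1} degrees of freedom")
```

With k groups, H is referred to a chi-square distribution on k - 1 degrees of freedom, which is why the results in this section are reported as χ² values.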
When groups 1 to 5 were compared, however, a significant difference was observed
(χ² (4, N=55) = 30.514, p < .001), and the same finding obtained when the adult
controls were removed from the analysis and only child groups 1 to 4 were compared
(χ² (3, N=44) = 15.873, p < .001). Thus, while subjects in the first three age groups
performed as a single population with respect to easy items, these subjects provided
significantly fewer target-like readings than either those in the oldest child group or
the adult control group. Finally, I used a Mann-Whitney test to compare the
performance of age group 4 with that of adult group 5, but the difference failed to reach
significance (U (11,11) = 37.000, p = .133). The performance of subjects above the
age of 6;5 was thus consistent with that of adults on easy TCs.
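The probabilities attached to two of the Kruskal-Wallis statistics above can be checked by hand, because the upper tail of the chi-square distribution has a simple closed form when the degrees of freedom are even. The sketch below covers only 2 and 4 degrees of freedom, and the function name `chi2_upper_tail` is illustrative. Applied to the values reported for easy TCs, it returns approximately .933 for χ² = .138 on 2 degrees of freedom and a value far below .001 for χ² = 30.514 on 4 degrees of freedom.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variate; closed forms for df = 2 and 4 only."""
    if df == 2:
        return math.exp(-x / 2)
    if df == 4:
        return math.exp(-x / 2) * (1 + x / 2)
    raise ValueError("this sketch covers df = 2 and df = 4 only")


p_groups_1_to_3 = chi2_upper_tail(0.138, 2)   # groups 1 to 3, easy TCs
p_groups_1_to_5 = chi2_upper_tail(30.514, 4)  # groups 1 to 5, easy TCs
print(f"{p_groups_1_to_3:.3f}")
print(f"{p_groups_1_to_5:.6f}")
```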
I next compared performance by each age group on individual easy items. Using a
Cochran’s Q test for distribution of a dichotomous variable across several related
samples, where 0 = subject reading and 1 = object reading, I found no significant
difference in the performance of groups 1 to 3 on any particular easy TC (Grp. 1: Q(3)
= .333, p = .954; Grp. 2: Q(3) = 2.143, p = .543; Grp. 3: Q(3) = 3.000, p = .392). That is,
I found no evidence that subjects in any of the first three age groups experienced
relatively greater difficulty with a particular easy item or items. Since the distribution
of subject and object readings was observed to be statistically similar across all four
easy TCs, I submit that these findings suggest that, contrary to the
predictions contained in Crain and Thornton (1998) (see discussion in §4.3.0.2), the
28
This does not imply, of course, that the use of nonparametric tests is requirement-free. For example,
use of the Kruskal-Wallis test requires that the samples of data to be compared have equal variances.
With respect to the data reported in the penultimate column of Table 4.12, homogeneity of variance for
groups 1 to 4 was determined through use of a Levene test.
A caveat should also be mentioned with regard to the use of nonparametric statistical tests. While
these tests require less stringent assumptions about the data to be analysed, they are also considered
less powerful than their parametric counterparts. Consequently, it is possible that use of a
nonparametric test may fail to reveal a significant difference that does in fact exist between two or
more compared samples. Thus, this consideration should be kept in mind when interpreting the results
reported above.
direction of the affirmative response and/or order of presentation bias in individual
easy items did not exert any detectable influence on children’s choice of a subject or
object reading of the TC.
In the case of age group 4, however, the distribution of object readings was found to
differ significantly across the four easy items (Q(3) = 12.000, p = .007). Subsequent
statistical testing established that this difference could be attributed solely to
performance on a single item, easy 2, for which object readings represented less than
50% of the total responses obtained, as compared to 90% of the total responses
obtained in the case of the other three items. Moreover, the same consideration was
found to underlie the significant difference observed in the performance of adult
controls when compared across the four easy TC items (Q(3) = 9.000, p = .029), since adults
made three errors on easy 2 but no errors on the remaining three items.
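The adult result is one case in which Cochran's Q can be recomputed directly from the published figures, since the pattern of responses is fully determined: three adult records show a non-target-like response on easy 2 and every other response is target-like. The sketch below is a check on the arithmetic under one assumption, namely that each pair of adults sharing an item set counts as a single record, giving eleven records in all. The function names are illustrative. The calculation returns Q = 9.0, and the chi-square upper tail on 3 degrees of freedom gives a probability of approximately .029, in line with the values reported.

```python
import math


def cochrans_q(rows):
    """Cochran's Q for a subjects-by-items table of 0/1 responses."""
    k = len(rows[0])
    column_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in column_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator


def chi2_upper_tail_df3(x):
    """P(X > x) for a chi-square variate on 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Columns: easy1, easy2, easy3, easy4 (1 = object reading, 0 = otherwise).
# ASSUMPTION: eleven adult records, one per pair of adults sharing an item set.
adult_rows = [[1, 1, 1, 1]] * 8 + [[1, 0, 1, 1]] * 3

q_adults = cochrans_q(adult_rows)
p_adults = chi2_upper_tail_df3(q_adults)
print(f"Q(3) = {q_adults:.3f}, p = {p_adults:.3f}")
```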
With regard to the non-target-like performance of the three adult subjects on easy 2, it
is important to note that I found no evidence that these particular adults, or any other
of my control subjects, ever assigned a subject reading to a TC. Rather, I observed
throughout the study that adult judgments that deviated from expected norms could
most often be explained in one of the following two ways: (1) subject inattention to
story details, or (2) the subject’s use of general or world knowledge, as opposed to
specific story context, when determining an appropriate interpretation of the test
sentence.
Based on the types of explanations offered by the three adults in question, I suggest
that it is the latter of the two factors that is implicated in their non-target-like
performance on easy 2. According to the design of the story that preceded
presentation of the test sentence The boy was easy to help, it was anticipated that a
true response would be associated with the object reading of the sentence, since a
fireman finds it easy to come to the aid of a boy. Consistent with this prediction, I
observed that some child and adult subjects did provide such an explanation of their
target-like judgment of easy 2, as illustrated in (12), below:
(12)
E(xperimenter): Why was the boy easy to help?
a. “Cause the fireman get the steps.” (female 4;3)
b. “All he (= fireman) needed was a ladder.” (male 6;0)
c. “Because the fireman could easily get into the cage.”
(adult subject #19)
For child subjects who judged the test sentence false, it was anticipated that they
would explain their non-target-like response in terms of the fact that, according to the
context of the story, the little boy was unable to help the fireman open the cage door
because the little boy had hurt his legs when he fell into the cage. I recorded nine
such explanations, three of which are illustrated in (13), below. (Note that ‘+’ marks
a pause or hesitation.)
(13)
E: Why was the boy not easy to help?
a. “He couldn’t help cause he was stuck.” (male 3;8)
b. “Because the boy was in the cage and he couldn’t help
++ the fireman.” (female 4;9)
c. “Because umm he had hurt his legs and + and he
couldn’t open the gate.” (female 5;3)
Notably, none of the three adults who erroneously judged the test sentence false gave
the type of explanation exemplified in (13); instead, they offered explanations which
suggested that their interpretation had been influenced by extra-contextual
considerations. For example, although the fireman in the story had openly remarked
about the lack of difficulty he had experienced in rescuing the child, stating, “See, I
told you it would be no trouble to help you, little boy,” it seemed that these three
adults had nevertheless viewed the series of actions that the fireman undertook in his
rescue attempt to involve a certain degree of difficulty. As one adult female
explained, “That’s not easy to help him (= boy) if he (= fireman) can’t get an obvious
doorway to get in.” When questioned as to whether she had taken note of the
fireman’s own favourable assessment of his efforts, the subject replied, “I heard what
he said but (I) still thought it was hard for him to get in. I thought he was just sort of
trying to calm the boy down.” And, similarly, another female adult explained, “We’re
brought up to believe that firemen – well, we’ve been through this – they always say
it’s no trouble regardless of whether it is.”
Regrettably, pilot testing of this particular item failed to reveal it as problematic,
given the relatively limited number of subjects involved in the pilot study, and the
item was therefore retained for use in the main study. As atypical performance by
adult subjects on easy 2 raises legitimate questions regarding the suitability of this test
item, I carefully reviewed the explanations provided by my child subjects, looking for
similar patterns of performance. I found two such subjects in age group 4, whose
explanations suggest to me that their interpretation of the test sentence could have
been influenced by the same types of extra-contextual considerations discussed above.
Therefore, I felt it was prudent to re-run my statistical analysis of subject performance
on easy items, excluding all of the data collected for easy 2.
This reanalysis of the data, however, produced results no different from those
previously reported in connection with all four easy items; that is, groups 1, 2, and 3
were still observed to perform as a single population and to differ from groups 4 and
5, which similarly performed as a single group.
I now turn to performance on TC items featuring the adjective hard. Table 4.13
reports the total object readings obtained per age group for each of the hard
test/control items, as well as for all four items combined:
Grp  Age        hard5       hard6       hard7       hard8       All items  Items 5, 7, 8
                (CS-OB)     (TS-OB)     (CS-SB)     (TS-SB)     (5 to 8)   only
1    3;4 - 4;4  4 (36.4%)   7 (63.6%)   5 (45.5%)   4 (36.4%)   45.5%      39.4%
2    4;6 - 5;5  5 (45.5%)   5 (45.5%)   9/10 (90%)  5 (45.5%)   55.8%      59.4%
3    5;6 - 6;3  7 (63.6%)   4 (36.4%)   7 (63.6%)   8 (72.7%)   59.1%      66.6%
4    6;5 - 7;5  8 (72.7%)   5 (45.5%)   11 (100%)   10 (90.9%)  77.3%      87.9%
5    Adult      11 (100%)   9 (81.8%)   10 (90.9%)  11 (100%)   93.2%      97%
Table 4.13: Total object or target-like readings of hard TCs
A statistical analysis of the results reported in the penultimate column of Table 4.13
revealed that for groups 1 to 3 there was no significant difference in performance on
the four combined hard items (Kruskal-Wallis, χ² (2, N=33) = 3.326, p = .190). A
significant difference was obtained, however, when the performance of groups 1 to 5
was compared (χ² (4, N=55) = 27.041, p < .001), and also when groups 1 to 4 were
compared (χ² (3, N=44) = 11.606, p = .009). Because the results of a subsequent
Mann-Whitney test revealed no significant difference in performance (U (11,11) =
34.000, p = .088) between groups 4 and 5, these results parallel those reported for
easy items. That is, groups 1 to 3 performed as a single population with respect to
hard items and differed from child subjects in group 4 and adult subjects in group 5,
who performed as a single group.
For groups 1 to 3, a within-groups comparison (i.e. a Cochran’s Q-Test) of
performance on individual hard items revealed no significant difference in the
distribution of object readings for any of these items. Thus, subjects below the age of
6;3 did not experience any relatively greater difficulty with a particular hard item or
items. Once again, then, the direction of the affirmative response and/or order of
presentation bias did not have an appreciable effect on subject interpretation of
specific hard items, pace the predictions made by Crain and Thornton (1998).
For group 4, however, a similar comparison did reveal a significant difference (Q(3) =
10.500, p = .015), which subsequent statistical testing traced to a contrast in this
group’s performance on hard 6, as compared to other hard items. Specifically, fewer
than half the subjects in group 4 provided target-like responses for hard 6, while such
responses averaged 83% of the total responses collected in the case of the other three
hard TCs. Additionally, I noted two adult subjects who failed to give a target-like
response for this same item.29
Certain changes had been made to the design of hard 6 after pilot testing, although, as
in the case of easy 2, some of the problems associated with this item regrettably did
not become evident until it was tested on greater numbers of child and adult subjects
in the main study. Problems associated with hard 6 differed somewhat from those
associated with easy 2, although I must acknowledge the possibility of related
complications given that these were the only two TCs in the study that featured the
embedded verb to help. The story that accompanied hard 6, The rabbit was hard to
help, first presented a context in which a girl asks a rabbit for help in finding her lost
29
As Table 4.13 indicates, there was also one adult subject who gave a non-target-like response to
hard 7, The hedgehog was hard to ride. This error, however, was found to be wholly attributable to the
subject’s inattention to story details, with the subject believing that the hedgehog was hard for the frog
to ride because it had prickly fur. In explaining her non-target-like response, the subject herself caught
her mistake (i.e. that the hedgehog was a baby and so had soft fur) and subsequently changed her initial
judgment of the sentence without any prompting.
As previously noted, inattention was one of the primary factors contributing to non-target-like
performance by my adult subjects. However, this factor was only intermittently correlated with
problematic performance, as I recorded a number of instances in which adult subjects admitted to
momentary lapses in attention while TVJ stories were being told yet proceeded to assign a correct
interpretation to the test/control sentence. Therefore, in considering the range of acceptable adult
performance, I chose to consider one non-target-like response out of twelve TC items as a reasonable
margin of error. In fact, at the completion of the study, I found that on average adult subjects had
provided eleven correct responses out of twelve, rather than twelve out of twelve.
Finally, I find it interesting that, in my experience, inattention to story details was not a problem that
generally affected my child subjects. This is because child subjects of all ages typically proved able to
recount even the most minor details of stories and would often provide this type of information without
prompting. I took this behaviour as a strong indicator of subject interest in the task and in the content
of the stories themselves.
spoon but is unable in turn to help the rabbit get out of his hutch. Test materials for
this item were as illustrated in Figure 4.13, below:
Figure 4.13: Materials used for hard 6, ‘The rabbit was hard to help.’
Ultimately, the rabbit is able to make his own way out of the hutch and he succeeds in
helping the little girl find her spoon. On the object reading of the sentence (i.e. The
rabbit_i was hard to help e_i), both child and adult subjects were therefore expected to
judge the sentence true and to explain their judgment in terms of the difficulty that the
little girl experienced in trying to free the rabbit from his hutch. I recorded a number
of explanations of this type, which are exemplified in (14), below:
(14)
E: Why was the rabbit hard to help?
a. “Because + because the girl couldn’t get the rabbit out.”
(male child 6;5)
b. “Because she wasn’t strong enough to open the hutch.”
(adult subject 1)
Conversely, I anticipated that child subjects who accessed the subject reading of the
sentence (i.e. The rabbit_i found it hard PRO_i to help the girl) would judge the
sentence false and explain their judgment in terms of the ultimate success that the
rabbit experienced in helping the little girl find her spoon. Explanations of this type
are illustrated in (15), below:
(15)
E: Why was the rabbit not hard to help?
a. “Umm, the rabbit could help her.” (female 4;8)
b. “Umm he was easy to help + cause he could find it
(= spoon) right under this chair.” (male 6;1)
Turning now to the aberrant responses that were provided by the two adult subjects
referenced above, both provided an explanation of their non-target-like response that
differed in kind from the type of explanation offered by the children in (15).
Specifically, the two explanations in question were as listed in (16), below:
(16)
E: Why was the rabbit not hard to help?
a. “It was the little girl that wanted the help + + as
opposed to the rabbit.” (subject A24)
b. “Because the rabbit actually got out of there by itself so
although she stood on the basket and tried to help, umm
+++ I think probably she gave the rabbit the opportunity
to sort of help itself, really.” (subject A2)
In (16a), the subject’s response appears to indicate that the little girl’s request for help
was, at least for this subject, a more salient event than the rabbit’s request for help.
Rather more problematically, in the example listed in (16b), the relevant consideration
instead pertains to the issue of whether attempts to help the rabbit should be attributed
to the girl or to the rabbit himself. Clearly, in each case, the rabbit is perceived as the
grammatical object of the embedded verb, and therefore in neither case is it
appropriate to assume that the adult has assigned a subject reading to the sentence.
Yet, at least in the second case, I think it is possible that this adult’s interpretation of
the sentence might have been something along the lines of It wasn’t hard for the
rabbit to help himself get out of the cage. If so, this would certainly be problematic
since the standard syntactic representation of the TC does not permit such an
interpretation (cf. *The rabbit was not hard to help himself).
I also noted six children who offered superficially similar explanations to those
illustrated in (16a&b).30 For example, two of these children explained their false or
non-target-like interpretation of The rabbit was hard to help as follows:
(17)
a. Puppet: Why was I wrong?
“Cause umm you said the rabbit + the rabbit was too
hard to get out.” (female 4;3)
b. E: What were you thinking about when you said
Fudge was wrong?
S: “Umm, well, the rabbit umm was quite clever
because he jumped up in the air and got hisself [sic]
out.” (female 7;4)
With respect to (17a), I think the child’s explanation of her false judgment of the test
sentence, in the context of the story presented, only properly accords with an
interpretation of the TC along the lines of, It was not hard for the rabbit to help
himself get out of the cage (i.e. *The rabbit was not hard to help himself). Of course,
given that this child has chosen to explain her interpretation of the test sentence by
offering a DC, the possibility remains that the meaning of her explanation was The
rabbit was not too hard for the girl to help get out of his cage. But since this meaning
of the ODC flatly contradicts the story details, I think it is unlikely to be the one that
the child had in mind. (As an aside, I find it interesting that this child would choose a
DC to explain her interpretation of a TC, given that I argued in §2.4 of Chapter 2 that
the base representation of the latter structure most likely includes a projection for a
degree constituent, which may remain optionally unfilled.)
As for (17b), I submit that the child’s explanation of her false judgment of the TC,
The rabbit was hard to help, suggests even more clearly that she has interpreted
the TC to mean, The rabbit was not too hard to help himself (get out of his cage).
Finally, I am aware of one additional subject, aged 6;0, who seems to have also
accessed a reflexive interpretation of the verb to help, since when questioned as to
who was helping the rabbit in the story, he replied, “The rabbit was helping hisself
[sic].”
30
Notably, four of these six subjects were in the oldest age group (i.e. group 4) and these were the
only children in this group to give non-target-like judgments of hard 6. Thus this pattern of
performance, while potentially problematic for the reasons discussed above, nevertheless is consistent
with the statistical analysis of the data that I performed, which revealed that child subjects in age group
4 generally performed like adults with respect to hard items.
Therefore, on the basis of the evidence reviewed above, I thought it reasonable to
question whether certain child (and, possibly, adult) subjects in the study may have
accessed an unintended interpretation of the test sentence, thus leading to the loss of
some measure of contextual control in the use of this particular item. Given this
concern, I decided to re-run my statistical analysis of between-group performance on
individual hard TCs, with hard 6 removed from the calculations. However, as was
previously reported in connection with my adjusted analysis of performance on easy
items, I found that the statistical results obtained for hard TCs remained the same
whether or not hard 6 was included in the calculations. That is, groups 1 to 3 still
performed as a single population and differed from groups 4 and 5, which performed
as a single group.
Lastly, I look at between-group performance on difficult TCs, which is reviewed in
Table 4.14 below:
Grp  Ages       diff 9      diff 10      diff 11     diff 12      All
                (CS-OB)     (TS-OB)      (CS-SB)     (TS-SB)      items
1    3;4 - 4;4  4 (36.4%)   4 (36.4%)    3 (27.3%)   4/10 (40%)   34.9%
2    4;6 - 5;5  4 (36.4%)   4 (36.4%)    5 (45.5%)   5 (45.5%)    40.9%
3    5;6 - 6;3  8 (72.7%)   5 (45.5%)    4 (36.4%)   10 (90.9%)   61.4%
4    6;5 - 7;5  10 (90.9%)  8/10 (80%)   10 (90.9%)  10 (90.9%)   88.4%
5    Adult      9 (81.8%)   10 (90.9%)   11 (100%)   10 (90.9%)   90.9%
Table 4.14: Total object or target-like readings of difficult TCs
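The final column of Table 4.14 can be recovered by pooling the object readings for each group across the four difficult items. The following Python sketch illustrates the arithmetic. It assumes eleven responses per cell, a figure inferred from the reported percentages (e.g. 4/11 = 36.4%) rather than stated in the table, except for the two cells reported out of ten; the names DIFFICULT and pooled_percentage are purely illustrative.

```python
# Object (target-like) readings for diff 9-12, as (count, responses) per group.
# Denominators of 11 are inferred from the reported percentages; the two
# cells given as x/10 in Table 4.14 are entered with 10 responses.
DIFFICULT = {
    1: [(4, 11), (4, 11), (3, 11), (4, 10)],
    2: [(4, 11), (4, 11), (5, 11), (5, 11)],
    3: [(8, 11), (5, 11), (4, 11), (10, 11)],
    4: [(10, 11), (8, 10), (10, 11), (10, 11)],
    5: [(9, 11), (10, 11), (11, 11), (10, 11)],
}


def pooled_percentage(cells):
    """Pool counts over all items and return a percentage to one decimal."""
    total_hits = sum(hits for hits, _ in cells)
    total_responses = sum(n for _, n in cells)
    return round(100 * total_hits / total_responses, 1)


pooled = {grp: pooled_percentage(cells) for grp, cells in DIFFICULT.items()}
for grp, pct in pooled.items():
    print(f"Group {grp}: {pct}%")
```

Under these assumptions the pooled values agree with the "All items" column, which suggests that the column reports pooled counts rather than a mean of the four item percentages.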
As was reported in the case of both easy and hard, I found no significant difference in
between-group performance on difficult when groups 1 to 3 were compared through
application of a Kruskal-Wallis test (χ²(2, N = 33) = 3.770, p = .152).31 Therefore,
regardless of the particular tough adjective tested, subjects in the first three age groups
performed as a single population. Application of the same test did, however, reveal a
significant difference when the performance of groups 1 to 5 was compared (χ²(4,
N = 55) = 27.679, p < .001), and also when groups 1 to 4 were compared (χ²(3,
N = 44) = 18.470, p < .001). A subsequent comparison of groups 4 and 5 through
application of a Mann-Whitney test did not reach significance (U(11, 11) = 52.000,
p = .606), however. Thus, paralleling the findings reported for both easy and hard TCs,
the findings for difficult indicate that children above the age of 6;5 performed
statistically like adults.
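The Kruskal-Wallis statistics reported above can be related to their p-values through the upper tail of the chi-square distribution. The sketch below is offered only as an illustrative check: it assumes that the reported p-values are the usual asymptotic chi-square approximations, and the function name chi2_sf is my own label rather than part of any statistical package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df.

    Uses the closed-form expansion of the regularised upper incomplete
    gamma function, which is exact for integer degrees of freedom.
    """
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: erfc(sqrt(x/2))
    #         + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    terms = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


# Statistics as reported in the text: (comparison, H, degrees of freedom).
reported = [
    ("groups 1 to 3", 3.770, 2),
    ("groups 1 to 5", 27.679, 4),
    ("groups 1 to 4", 18.470, 3),
]
for label, h, df in reported:
    print(f"{label}: H = {h}, df = {df}, p = {chi2_sf(h, df):.6f}")
```

On this approximation the first comparison yields a tail probability that rounds to .152, while the other two fall below .001, in line with the values reported in the text.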
As on hard TCs, adult performance on difficult items was not uniformly target-like.
Nevertheless, I do not view this situation as particularly problematic. This is because,
with the exception of the two errors reported for difficult 9, which could both be
attributed to subject inattention, 32 the single error reported for each of the other three
difficult items still fell within what I have defined as the acceptable range for adult
performance (see the discussion in ftnt. 29).
31
I earlier observed (see ftnt. 28) that use of the Kruskal-Wallis test is standardly based on the
assumption that data samples have equal variances, although it does not require equal distribution. In
the case of performance on difficult items, however, the results of a Levene test for equality of
variances amongst groups, both with and without controls, were negative. Consequently, because one
of the basic assumptions for use of the Kruskal-Wallis test was not met in the case of the data obtained
for difficult, I wish to make clear that statistical results reported for this set of data may be of lesser
reliability than those reported in the case of either easy or hard items.
32
This determination is based on remarks made by the two adult female subjects during follow-up
questioning. In the story accompanying TC9, The king was difficult to draw, one scenario involved a
king who experienced difficulty in drawing a picture, while the other involved a princess who initially
expressed some reservation about her ability to draw a picture of her father, the king, but in the end
found the task quite easy. The first subject admitted that she hadn’t been “concentrating very hard”
during the story-telling phase of the task and, consequently, could recall only the part of the story in
which the princess had expressed concern regarding her ability to draw her father. The second subject
explained her positive judgment of the control sentence as follows: “I suppose + I suppose because
looking at it (i.e. picture of the king), I’d have a job to draw it.” Her response therefore was consistent
with a misinterpretation of TC9 as meaning something like The king would be difficult for anyone to
draw. She, like the first subject, later admitted that inattention had played a role in her (erroneous)
positive evaluation of the sentence, remarking that, “I think my mind went more on the picture than
(on) listening to what you were saying.”
I next analysed within-group performance across each of the four difficult items. For
age groups 1 and 2, a Cochran’s Q Test revealed no significant difference in group
performance, and the same observation held true for groups 4 and 5. A significant
difference in the percentage of object readings obtained was detected for group 3,
however, (Q(3) = 13.000, p < .005), which was subsequently located to the contrast
between items 11 and 12. These two items differ in the direction of the affirmative
response bias, with difficult 11 favouring the subject or non-target-like reading and
difficult 12 the object reading, but share a similar order of presentation bias in favour
of the subject reading. In the case of item 11, which had a double bias towards the
subject reading, these readings did account for a greater proportion of responses but
only by a relatively narrow margin (i.e. 55% subject readings versus 45% object
readings). In contrast, in the case of the mixed-bias difficult 12, children in the same
age group favoured object readings of this item by 90%. Nevertheless, I am reluctant
to assign any great importance to this particular finding, given that this is the only
instance I observed in which a between-item comparison reached statistical
significance and given that once again I am unable to draw any clear correlation
between the direction of the affirmative response and order of presentation biases and
the observed pattern of subject response.
In closing this section, I analyse overall TC performance, both within and between
age groups. I used a Friedman test to compare within-group performance across
combined easy, combined hard, and combined difficult items in order to determine
whether subjects experienced any relatively greater difficulty with a particular tough
adjective or adjectives. The results failed to reach significance, however, regardless
of whether potentially problematic items easy 2 and hard 6 were retained or removed
from the analysis. Thus, it appears that subjects in the study did not experience any
relatively greater difficulty with the interpretation of a particular tough adjective. My
findings therefore contrast with those reported by McKee (1997a), since McKee
observed an overall interaction between age group and adjective type. However, as
discussed in §4.2.0.0, I believe there is reason to question the reliability of certain of
the data obtained by McKee, since this included responses collected for TC items that
contained adjectives whose meaning was unknown to the subject.
With respect to the between-group performance of my subjects, recall that I reported
in the previous section that for each of the three TC test conditions (i.e. easy, hard,
difficult), there was no significant difference observed when the performance of
children in groups 1, 2 and 3 was compared. The performance of children in group 4,
however, was observed to differ significantly from that of children in the younger
three age groups and to match that of the adult controls. As anticipated, these
condition-specific findings were confirmed when I made a final comparison of age
groups in terms of overall performance on the twelve TC items. This comparison was
performed using the full set of twelve items, as well as a reduced set of these items
with easy 2 and hard 6 removed. With only one exception, the results confirmed
those earlier reported for easy, hard, and difficult conditions. The single exception
concerns the performance of age groups 4 and 5, for whom the difference in overall
TC performance was found to approach significance according to the results of a
Mann-Whitney test (U(11, 11) = 31.500, p = .056). However, this result applies only
when the full set of TC items is considered, since when easy 2 and hard 6 are
removed from the comparison set, no significant difference is obtained (U(11, 11) =
41.000, p = .217). Thus, given the concerns raised earlier about the suitability of
these two particular experimental items, I prefer to accept the latter of these two
findings as the more reliable.
4.5.0.1
Analysis of individual performance on the TC
In this section, I analyse the performance of individual subjects on TC items
according to the three-way classification first proposed by Cromer (1970), which
distinguishes Primitive-rule Users, Intermediates, and Passers. As discussed in
§3.2.1.0 of Chapter 3, Primitive-rule (P-R) Users are so-named because these children
are hypothesized to interpret the TC according to the use of a surface structure
heuristic or primitive rule, which treats the matrix subject DP as being co-referential
with the subject of the embedded infinitive verb. Thus, the criterion for classification
as a P-R User is that the child should consistently interpret the TC in a non-target-like
manner. Intermediates, in contrast, are defined as those children who provide mixed
target-like and non-target-like readings of TCs and Passers as those who provide
target-like readings only.
One of the more notable findings of my investigation of children’s comprehension of
the TC was that there were no subjects in my study who could be classified as P-R
Users, according to Cromer’s original criterion of performance. That is, if I follow
Cromer in defining P-R Users as those subjects who gave no target-like readings of
TC items, then none of my forty-four subjects can be so classified. When analysing
the performance of my adult subjects on TCs, however, I earlier noted that I
considered one non-target-like response per twelve items to be a reasonable margin of
error for these subjects. Accordingly, if I revise the criterion for P-R Use to allow my
child subjects the same margin of error, 33 I still, rather surprisingly, observe only one
child in the study, a male, aged 3;8, who can be classified as a P-R User.
It is perhaps misleading, however, to classify subjects according to performance on all
twelve TC items when easy 2 and hard 6 have been identified as being problematic.
Therefore, I re-analysed the data, excluding responses collected for easy 2 and hard 6.
Using Cromer’s criterion of no target-like responses, the results revealed that, once
again, only the single male subject referenced above could be classified as a P-R User.
And even using the more liberal criterion of allowing one target-like response out of
the now ten total items, I still found only three subjects who met this criterion: subject
no. 4 (already referenced), subject no. 22, a male, aged 5;4, and subject no. 25, a
female, aged 5;8. Thus, according to the most liberal criterion I was willing to accept,
P-R Users still comprised less than 10% of the forty-four children involved in the
study.
33
I note that in a later study, Cromer (1983a:313) actually proposed a similar revision of his original
criteria for classification of performance. Allowing for the possibility of occasional inattention to the
task, as I have here, he defined P-R Users as those providing one target-like response or less out of ten
possible and Passers as those providing no more than one non-target-like response out of ten.
With respect to Intermediates and Passers, I again first classified subjects according to
Cromer’s strict criteria, using the full set of twelve TC items as a basis for analysis. I
found only two subjects, no. 36, a male, aged 6;8, and no. 38, a female, aged 6;10,
who could be classified as Passers, having provided only target-like responses. And
since, as earlier noted, all subjects in the study provided at least one target-like
response, the remaining forty-two subjects (i.e. 96%) would thus obligatorily be
classified as Intermediates according to Cromer’s (1970) original criteria. On the
other hand, if I adopt the more relaxed criterion of allowing one target-like response
per P-R User and one non-target-like response per Passer, and additionally use only
the reduced set of ten TCs as the basis for analysis, I find a somewhat more balanced
breakdown of subject performance, with three subjects performing as P-R Users,
thirty as Intermediates, and eleven as Passers.
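The two classification schemes discussed above differ only in the number of exceptional responses tolerated and in the size of the item set. They can be summarised in the following Python sketch, in which the function name classify and the parameter margin are illustrative labels of my own: a margin of 0 corresponds to Cromer’s (1970) original criteria, and a margin of 1 to the relaxed criteria.

```python
def classify(target_like, n_items, margin=0):
    """Assign a subject to one of Cromer's three performance categories.

    target_like -- number of target-like (object) readings given
    n_items     -- number of TC items in the analysis (12 full, 10 reduced)
    margin      -- exceptional responses tolerated (0 = strict, 1 = relaxed)
    """
    non_target_like = n_items - target_like
    if target_like <= margin:
        return "P-R User"
    if non_target_like <= margin:
        return "Passer"
    return "Intermediate"


# Strict criteria, full set of twelve items.
print(classify(0, 12))             # P-R User
print(classify(1, 12))             # Intermediate
print(classify(12, 12))            # Passer

# Relaxed criteria, reduced set of ten items.
print(classify(1, 10, margin=1))   # P-R User
print(classify(9, 10, margin=1))   # Passer

# Breakdown reported under the relaxed criteria (44 child subjects).
counts = {"P-R User": 3, "Intermediate": 30, "Passer": 11}
intermediate_share = round(100 * counts["Intermediate"] / sum(counts.values()))
print(intermediate_share)          # 68
```

Note that under the strict criteria a single target-like response is sufficient to move a subject out of the P-R User category, which is why that category is so sparsely populated in the full analysis.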
Notably, under either of the analyses discussed above (i.e. Cromer’s criteria plus full
set of items, as opposed to relaxed criteria plus reduced set of items), Intermediates
still comprise the majority of my subjects (i.e. 68% according to the relaxed criteria),
and thus a sizeable number of my subjects were observed to provide both target-like
and non-target-like readings of the TC. Interestingly, this type of performance was
not associated with a particular age group, since Intermediates were found in all four
of the child groups I tested and ranged in age from 3;4 to 7;3. Passers, too, were
represented in all age groups in my study, with the exception of group 1, and ranged
in age from 4;7 to 7;4. The age range was not so wide for P-R Users, extending from
3;8 to 5;8; nevertheless, I found it somewhat surprising that the P-R Users were not
confined to the youngest group but instead included two children over the age of five.
The results thus far reported are comparable with those reported by McKee (1997a),
who further observes (p.c.) that there were no subjects out of the sixty-four included
in her study who could be classified as P-R Users, according to Cromer’s original
criteria. I think this is a striking observation given the sizeable number of subjects
involved in McKee’s study and the fact that her subjects included children
considerably younger (e.g. 1;11) than those involved in any other study reviewed in
Chapter 3. Specifically, according to Cromer’s original criteria, McKee reports that
fifty-three of her subjects could be classified as Intermediates and eleven as Passers.
Even adopting the more relaxed criterion of allowing one exceptional response per
either P-R User or Passer, there was still only one child in McKee’s study, aged 2;0,
who could be considered a P-R User, with forty-nine of her remaining subjects
performing as Intermediates and fourteen as Passers. Therefore, similar to the
findings reported in my own study, the majority of McKee’s subjects (i.e. 49/64 or
76.6%) were observed to have provided both target-like and non-target-like
interpretations of the TC.
Of course, it is not possible to directly compare McKee’s results with my own since
the two studies involved subjects who do not completely overlap in age and employed
different methodologies. Nevertheless, I believe the similarities observed here are
still of great interest, particularly when the two sets of findings are compared with
those reported in certain earlier studies. For example, as discussed in Chapter 3,
Cromer (1970) tested forty-one children between the ages of 5;3 and 7;5 and reported
that approximately 41% of these subjects could be classified as P-R Users. Holding
the age range constant between 5;3 and 7;5, these results can be directly contrasted
with those obtained in both Kessel (1970) and my own study (Anderson 2002a,b),
since the latter two studies included children in this same age range. As Table 4.15
indicates, when individual subject performance in the three studies is compared in this
manner, some rather striking differences emerge.34 (Note that in the interest of
providing a more accurate comparison, I have used Cromer’s original criteria for
classification of subjects and have reported my own findings only with respect to the
reduced set of ten TC items.)
34
For Kessel (1970), the figures reported in Table 4.15 have been calculated on the basis of the
general findings reported in his study, since he did not provide a specific breakdown of subject
performance according to the three-way classification employed here.
                 Cromer (1970)      Kessel (1970)        Anderson (2002a,b)
P-R Users        17 (41.5%)         1 (5%)               0
Intermediates    19 (46.3%)         8 (40%)              20 (83.3%)
Passers          5 (12.2%)          11 (55%)             4 (16.6%)
No. of subjs.    41                 20                   24
No. of items     4                  4                    10
Lexical items    easy, hard, fun,   easy, hard,          easy, hard, difficult
tested           tasty              impossible
Methodology      act-out task       Piagetian interview  TVJ task
Table 4.15: Comparison of the individual performance of subjects between the ages
of 5;3 and 7;5 in Cromer (1970), Kessel (1970), and Anderson (2002a,b)
Most notable amongst the differences observed in Table 4.15 is the considerable
discrepancy that exists between the number of P-R Users reported in Cromer, as
compared to either Kessel or Anderson. When considered in conjunction with the
similarly low number of P-R Users observed in McKee (1997a), I submit that the
evidence reviewed in Table 4.15 provides reason to question whether Cromer’s
subjects were appropriately classified. In particular, I think it is likely that a number
of the subjects that Cromer classified as P-R Users may in fact have been
Intermediates, with their linguistic abilities thus underestimated.
One factor that I believe may have contributed to the possible misclassification of
subject ability in Cromer concerns his choice of lexical items for use in his test
materials. For example, I earlier noted (see §3.2.1.0 of Chapter 3) that two of the
adjectives he classified as tough or O-type, fun and tasty, do not meet my own criteria
for inclusion in the tough class. Thus, I submit that not all of the data Cromer
collected may be equally reliable as regards the acquisition of the TC.
4.5.0.2
Explanations of judgments of TC items
I have thus far resisted describing the TC results reported here as representing chance,
above-chance, or below-chance performance on the part of my subjects, although I
recognize that this is standard practice in the description of experimental behaviour.
My reluctance to use these descriptive terms is justified, I think, by the evidence to be
presented in this section, which was collected during follow-up questioning of child
subjects (see also the supplementary evidence cited in Appendix II). The reader will
recall that, according to the methodology employed in the study, subjects were
typically asked to provide some explanation for their judgment of the truth-value of a
particular test/control sentence; for example, after offering her true or false judgment
of a sentence, the subject would be asked a question such as “So why was the monkey
not easy to teach?” Although follow-up questions were not uniformly administered to
all subjects, both the quality and quantity of the data that I was able to collect lead
me to believe that, as a general rule, my subjects did not utilize guessing strategies in
assigning an interpretation to the TC. Instead, anticipating the discussion to be
contained in the final chapter of this thesis, I contend that children enter a relatively
prolonged developmental period during which they have access to two interpretations
of the TC, which I have termed a subject (i.e. non-target-like) reading and object (i.e.
target-like) reading.
As earlier noted, follow-up questions were not asked of all subjects, nor did these
questions necessarily take the same form in each instance. This is because I was
sensitive to the fact that extensive questioning, especially of younger subjects, could
prove overly taxing. A further consideration was that I was concerned to limit the
length of individual testing sessions to no more than twenty-five minutes for
school-age subjects and twenty minutes for those of pre-school-age, while still allowing for
the administration of a pre-determined and optimum number of TVJ items.
Accordingly, I felt that random administration of follow-up questions would be
sufficient to ensure subject attention.
Because TVJ stories were designed to provide two distinct contexts in which a
sentence could be judged either true or false, I anticipated that children would justify
their judgment of the test/control sentence by citing specific story events associated
with their chosen interpretation. I was pleased, then, to observe that my child subjects
generally did provide explanations of just this type and even did so on occasion
without any external prompting.
Table 4.16, below, provides an overall summary of subject performance in what I will
henceforth term the post-judgment phase of the TVJ task. For each TC item, the table
first lists the total number of subjects who were asked to provide an explanation of
their judgment of the sentence, next, the number who were not asked to do so, and,
lastly, the number who provided a spontaneous or unprompted explanation.
TC       Total      Prompted     Not asked    Unprompted
         subjs(a)
easy 1   42         24 (57.1%)   15 (35.7%)   3 (7.1%)
easy 2   43         26 (60.5%)   13 (30.2%)   4 (9.3%)
easy 3   44         35 (79.5%)   4 (9.1%)     5 (11.4%)
easy 4   44         23 (52.3%)   19 (43.2%)   2 (4.5%)
hard 5   44         39 (88.6%)   1 (2.3%)     4 (9.1%)
hard 6   44         33 (75%)     8 (18.2%)    3 (6.8%)
hard 7   43         34 (79.1%)   5 (11.6%)    4 (9.3%)
hard 8   44         36 (81.8%)   3 (6.8%)     5 (11.4%)
diff 9   44         28 (63.6%)   8 (18.2%)    8 (18.2%)
diff 10  43         24 (55.8%)   10 (23.3%)   9 (20.9%)
diff 11  44         33 (75%)     6 (13.6%)    5 (11.4%)
diff 12  43         30 (69.8%)   11 (25.6%)   2 (4.6%)
Table 4.16: Number and type of explanations offered for TC items
((a) Note that percentages reported in the table have been adjusted to reflect missing data values.)
As the figures in Table 4.16 indicate, the percentage of subjects who were asked to
explain their judgment varied from a low of 52.3% in the case of easy 4, to a high of
88.6% in the case of hard 5. Therefore, for each item, at least 50% of the subjects
were asked to explain their true/false judgment of the TC.
Table 4.17, below, provides a breakdown of the types of explanations offered by child
subjects according to whether or not the child’s response could be viewed as
providing support for their chosen interpretation of a particular TC.35 This
determination was made according to the following considerations. If the child’s
explanation referenced the story context that was intended to support their chosen
interpretation of the sentence, then this was taken as an appropriate (i.e. supportive)
explanation. If, however, the child provided one interpretation of the test/control
sentence but then referenced story events more appropriately associated with the
opposite interpretation, then this was classified as a non-supportive explanation.
Additionally, there were instances in which the precise meaning of the child’s
explanation could not be determined with any certainty; these responses were
accordingly coded as indeterminate. Finally, there were certain cases in which
children provided explanations that seemed to support an understanding of story
events that differed from the interpretation shared by the majority of adult and child
subjects; these responses are listed in the column entitled alternative explanation,
below:
35
The means for determining the total number of explanations provided for each TC item was as
follows. The number of subjects who failed to respond to the prompt was deducted from the total
number of subjects asked. This figure was then increased by the number of subjects who provided an
unprompted explanation of their judgment, to arrive at a final total per item. Easy 1 was associated
with the highest rate of non-response to the prompt, with seven out of twenty-four children, or 29% of
those asked, failing to respond. On average, however, the rate of non-response per item was less than
10%.
TC       Total         Supports     Indeterminate  Alternative  Supports
item     explanations  judgment     explanation    explanation  opposite
                       of sentence                              interpretation
easy 1   20            80%          10%            5%           5%
easy 2   28            75%          11%            7%           7%
easy 3   38            97%          3%             N/A          N/A
easy 4   20            70%          15%            10%          5%
hard 5   41            90%          5%             N/A          5%
hard 6   33            52%          9%             36%          3%
hard 7   36            69%          11%            N/A          20%
hard 8   39            69%          23%            8%           N/A
diff 9   35            88%          6%             N/A          6%
diff 10  32            69%          16%            3%           12%
diff 11  35            100%         N/A            N/A          N/A
diff 12  29            79%          7%             4%           10%
Table 4.17: Summary of the types of explanations offered for judgments of individual
TC items
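The totals in the second column of Table 4.17 follow the counting rule set out in footnote 35: subjects who failed to respond to the prompt are deducted from the number asked, and unprompted explanations are then added. As a minimal illustration, the Python sketch below applies this rule to easy 1, the only item for which the number of non-responders is stated in the text; the function name total_explanations is an illustrative label only.

```python
def total_explanations(prompted, non_responders, unprompted):
    """Explanations collected for one item, per the rule in footnote 35."""
    return prompted - non_responders + unprompted


# Figures for easy 1: 24 subjects prompted (Table 4.16), 7 of whom gave no
# response (footnote 35), plus 3 unprompted explanations (Table 4.16).
easy1_total = total_explanations(prompted=24, non_responders=7, unprompted=3)
easy1_nonresponse_rate = round(100 * 7 / 24)

print(easy1_total)             # 20
print(easy1_nonresponse_rate)  # 29
```

The result matches the total of twenty explanations listed for easy 1 in Table 4.17, and the non-response rate matches the 29% cited in footnote 35.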
The reader will note that three of the items listed in Table 4.17 are associated with a
relatively high rate of supportive explanations, easy 3, hard 5, and difficult 11, the
first two of which merit further discussion here. (Samples of post-judgment data
collected for difficult 11, as well as for various other test items, can be found in
Appendix II.) When analysing the post-judgment data that I collected for easy 3 and
hard 5, I noted a similar complication in each case, which is that there was one
particular explanation that was offered in support of both subject and object readings
of the item. Taking the example of easy 3, The fairy was easy to fight, the story
accompanying presentation of this control sentence involved a soldier who challenged
a fairy to a fight, believing her to be armed only with a stick. In fact, the stick was a
magic wand and the fairy was able to prevail in the fight with the aid of her magic
wand and with the advantage conferred by her ability to fly. I recorded a number of
explanations that uniquely supported either a true (i.e. non-target-like) or false (i.e.
target-like) judgment of easy 3. For example, some children favouring a
non-target-like interpretation of the sentence explained that the sentence was true because the
fairy flew up into the air and knocked the soldier to the ground, while, conversely,
some explained their target-like judgment of the item with appropriate reference to the
soldier’s inability to use his sword in fighting the fairy.
I also recorded eleven explanations of easy 3 (or 28.9% of the total) which shared a
similar form and which were invoked in support of both true and false judgments of
the control sentence. All involved some non-specific reference to the fairy’s magic
powers, a story element that does in fact provide credible justification for either the
ease with which the fairy fought the soldier or for the difficulty that the soldier
experienced in fighting her. Although this state of affairs is regrettable from a design
standpoint, I nevertheless chose to code all of these eleven explanations as being
supportive, regardless of whether the child’s original judgment of the sentence was
target-like or non-target-like. This is first because it would be inaccurate to code them
as indeterminate, and second because I observed that one of my adult subjects
similarly referenced the fairy’s magic powers when explaining her judgment of easy
3.36 Thus, even though the children’s responses were not as fully articulated as this
adult subject’s response, I felt it was reasonable to infer that these eleven children had
a similar explanation in mind to that offered by the adult subject.
36
Specifically, this response was offered by subject no. A11, a 38-year-old female, who answered the
prompt, “Tell me why it’s wrong,” with the explanation, “Umm because she had magic powers and
they allowed her to fight abnormally.”
I furthermore note that even if the eleven explanations referenced above are deducted
from the total number offered for easy 3, the rate of supportive responses is still quite
close to the overall item average of 70%. For this reason, I am not unduly concerned
about the lack of homogeneity in this particular set of responses. Turning now to
hard 5, I similarly observed that one particular explanation was offered in support of
both target-like as well as non-target-like readings of this item. This explanation,
which was offered by some adult as well as child subjects, focused on one very salient
story event, in which a baby elephant used her trunk to attack a tiger and flip him
over. This event does serve as reasonable justification for either a true or false
reading of the test sentence, The tiger was hard to fight. For example, according to a
non-target-like interpretation of the sentence, it is true that the tiger had difficulty
matching the elephant’s superior fighting skills. And according to a target-like
interpretation, the sentence is false, since the elephant experienced little difficulty in
fighting the tiger. Regrettably, this particular weakness in the design of this item did not become
evident until well into the main study, at which point I decided that it would be
preferable to retain the item and report the noted flaw, rather than replace it.
There were thirteen child subjects who included a demonstration of the elephant
flipping the tiger over in their explanation of hard 5, in support of either a true or false
judgment of the sentence. In five of the thirteen cases, subjects had been prompted by
the experimenter to physically demonstrate an explanation of their judgment (e.g.
“Show me why the tiger was/was not hard to fight”) when they failed to respond to
the first follow-up question, and so it is perhaps not surprising that they chose to
respond by re-enacting this particularly salient event. Again, however, I would argue
that the post-judgment data collected for this item are not particularly problematic
when it is considered that there were still twenty-four subjects who provided an
explanation that uniquely supported either a true or false judgment of this item. In
(18), below, I offer a sampling of these sorts of explanations:
(18)
True or non-target-like reading of hard 5, ‘The tiger was
hard to fight’
a. “He (= tiger) hasn’t got one of these.” Subject points to
elephant’s trunk. (male 4;9)
b. “Cause the elephant was + strong to fight.” (male 5;1)
False or target-like reading of hard 5, ‘The tiger was hard
to fight’
“The tiger didn’t win.” (female 5;10)
Next, I turn the focus to the figures reported in Table 4.17 for performance on hard 6,
The rabbit is hard to find. Recall that hard 6 was identified in §4.5.0.0 as a
potentially problematic item from the standpoint of its design. Consistent with the
concerns raised in that section, this particular item is associated with the greatest
number of explanations coded as alternative rather than supportive. A number of
these explanations are of the type earlier discussed, which I have suggested may
involve a third, unintended, interpretation of hard 6, specifically, one based on a
reflexive interpretation of the embedded infinitive verb. The data reported in Table
4.17 would thus appear to accurately reflect the problematic nature of this item.
Nonetheless, with the single exception of hard 6, the figures in Table 4.17 indicate
that the percentage of explanations coded as alternative or idiosyncratic was relatively
low for all other items. Since atypical understanding of the TVJ stories can be seen as
a rather restricted phenomenon, I think it is reasonable to consider aberrant
performance of this type to be within the limit of acceptable experimental “noise.”
Accordingly, in the discussion that follows, I will offer examples of alternative
explanations only when I deem these to be of particular interest.
Finally, before turning to consideration of post-judgment data that are more
representative of the typical behaviour of my subjects, it will be helpful by way of
comparison to first provide some examples of the types of explanations that I coded as
‘indeterminate.’ These are provided in (19) below:
(19)
E: What’s wrong with that?
S: “Umm because umm umm that’s not the same and
that’s wrong and that doesn’t mean + mean it.”
(female 3;11)
E: So why was the monkey easy to teach?
S: “Cause he was magic.” (female 4;0)
P(uppet): You tell me why I’m wrong.
S: “Cause you couldn’t + think right.” (male 4;6)
The examples in (19) illustrate that explanations classified as indeterminate usually
involved some reference to the puppet’s inability to evaluate the story events properly
or to some objection to the form of the puppet’s statement, which served as the
test/control sentence. In other cases, such as in the second example above, there was
simply not enough information to determine whether a particular explanation
supported the child’s judgment of the test sentence or not. The examples in (19) also
appropriately reflect that subjects in age group 1 (ages 3;4 to 4;4) proved the richest
source of explanations that could be classified as indeterminate. As children in this
age group also provided the fewest explanations of their judgments overall, I think
this state of affairs more likely reflects the fact that the youngest subjects lacked the
metalinguistic skills of the older children, rather than the fact that they possessed
lesser grammatical competence. This claim is further supported, I submit, by the
numerous examples I collected of explanations offered by younger subjects that
effectively served as truncated versions of adult explanations.
Before leaving the topic of indeterminate responses, I should also note that when a
child provided such a response, the normal procedure was for me to prompt the child
to further clarify her explanation. However, consistent with the point made above
about the lesser metalinguistic abilities of children in age group 1, I observed that, in
general, only subjects over the age of 4;5 proved able to fully comply with such a
request.
In the remainder of this section, I will focus on explanations offered in connection
with three specific items, easy 4, hard 7, and difficult 10. I have chosen these three
items to illustrate typical behaviour in the post-judgment phase of the experiment,
since together they provide a cross-section of item types in terms of the specific
adjective used, as well as in terms of the order of affirmative response and order of
presentation biases. In the case of the first item, easy 4, The spaceman was easy to
draw, the “true” or affirmative response was associated with a target-like interpretation of the
sentence and the order of presentation of story events favoured the opposite
interpretation. For the second item, hard 7, The hedgehog was hard to ride, both the
affirmative response and order of presentation of story events favoured the non-target-like interpretation of the sentence. These biases were reversed in the case of difficult
10, The dog was difficult to teach, with both the affirmative response and order of
presentation of story events favouring the target-like reading of the sentence.
Beginning with easy 4, Figure 4.14, below, illustrates the test materials used for this
item and Table 4.18 provides a short review of the contextual information associated
with the two interpretations of the sentence:
Figure 4.14: Materials used for easy 4, ‘The spaceman was easy to draw’
| Subject reading - False | Object reading - True |
|---|---|
| It was not easy for the spaceman to draw a picture because he couldn’t see properly through his space helmet and couldn’t remove the helmet for safety reasons. | It was easy for the little boy to draw a picture of the spaceman. |

Table 4.18: Story contexts for easy 4, ‘The spaceman was easy to draw.’
For a true or target-like judgment of easy 4, a typical adult explanation was as in (20),
below:
(20)
“I was thinking, yes, the little boy found him easy to draw
cause he just drew the picture and it’s a good picture of
him.” (female subject A18)
However, another explanation offered by one of my adult subjects, illustrated in (21),
below, appears problematic in comparison:
(21)
“I’m going to say true but the only reason I’m going to say
true is because in your story, it didn’t take him (= the little
boy) very long.” (female subject A20)
As was earlier noted in my discussion of problematic aspects of hard 6, the
explanation reported in (21) provides further confirmation that adult subjects are not
necessarily consistent in their assessment of the relative ease or difficulty of
performing some act. With reference to easy 4, this inconsistency is perhaps not
entirely surprising, since the boy’s relative lack of difficulty in drawing the spaceman
was inherently a less dramatic state of affairs than the spaceman’s futile efforts to
draw the boy. Yet, I find it interesting that none of the forty-four children tested on
this item articulated the same considerations as those raised by the adult subject in
(21). This, of course, does not preclude the possibility that my child subjects may
have entertained similar thoughts, but the post-judgment data that I collected for this
item provide no direct support for such a claim.
Child subjects, then, like the majority of adult subjects, explained their true judgment
of easy 4 in accordance with my predictions; thus, the examples illustrated in (22),
below, resemble the adult response listed in (20):
(22)
E: Why was the spaceman easy to draw? Show me.
S: Subject demonstrates little boy drawing by putting
paper in his lap and the crayon in his hand.
(female 3;10)
E: Why was the spaceman easy to draw? Why do you
think?
S: “Cause the picture’s kind of easy. I could draw it.”
(male 5;8)
For non-target-like judgments of easy 4, child subjects offered the following types of
explanations in response to the prompt, “Why was the spaceman not easy to draw?”:
(23)
a. “He can’t draw.” (female 3;6)
(Note that “he” presumably refers to the spaceman in
the story since the little boy was depicted as
successfully drawing a picture.)
b. “Cause he had his ah + his helmet on and he couldn’t
see the paper.” (male 5;0)
c. “The boy was easy to draw and the spaceman couldn’t
because his + his helmet was in the way.” (female 5;11)
d. “Because the boy umm said come have a try at drawing
and he (=spaceman) couldn’t draw because his helmet
was in the way.” (male 6;2)
As the examples in (23) illustrate, child subjects who interpreted the sentence as false
typically explained their objection to the puppet’s statement in terms of the
spaceman’s inability to draw a picture; thus, the explanations provided by these
children are consistent with an interpretation of the TC in which the matrix subject
DP, the spaceman, serves as the agent rather than as the object of the embedded verb
to draw. I therefore take explanations of this type as providing support for the
experimental hypothesis that children have access to an interpretation of the TC that is
unavailable to adults.
I now turn to post-judgment performance on hard 7, The hedgehog was hard to ride.
Figure 4.15, below, illustrates the test materials used for this item, followed by Table
4.19, which lists the story contexts supporting the subject and object readings of this
sentence:
Figure 4.15: Materials used for hard 7, ‘The hedgehog was hard to ride.’
| Subject reading - True | Object reading - False |
|---|---|
| It was hard for the hedgehog to ride the frog because the frog’s back was slippery. | The hedgehog was not hard for the frog to ride because the hedgehog, being a baby hedgehog, had soft fur the frog could hold onto. |

Table 4.19: Story contexts for hard 7, ‘The hedgehog was hard to ride’
Consistent with the story events reviewed above, a typical adult explanation of a false
or target-like judgment of hard 7 was as in (24), below:
(24)
a. “The hedgehog wasn’t hard to ride because he had fur
to hold onto. The frog was hard to ride because he had
a slippery back.” (subject A27)
b. “It’s wrong. The frog was hard to ride because he was
slippery.” (subject A12)
Interestingly, both of the adults above reference the slipperiness of the frog’s back,
even though a sufficient explanation for a false or target-like reading of the sentence
would have required reference only to the favourable conditions that the frog
encountered when attempting to ride on the hedgehog’s back. In total, I noted four
adults who referenced the hedgehog’s failed attempt to ride the frog in connection
with a target-like judgment of hard 7 and seven child subjects who did so. The oldest
of the latter subjects was a girl of 7;4, who gave the following answer to the question,
“What did he (= puppet) say wrong?”:
(25)
“It’s because the hedgehog was umm couldn’t get on the
frog’s back because it was too slippery and he didn’t have
any fur to hold onto.”
I have chosen to highlight this aspect of post-judgment performance on hard 7
because this item produced the highest number of explanations (i.e. 20%) classified as
“supports (the) opposite interpretation” (see Table 4.17). Yet, it is my opinion that
this finding is not a particularly problematic one, given that adult and child subjects
were observed to perform similarly in this respect. In particular, I think the seemingly
contradictory nature of the type of explanation offered in (25) is reasonably explained
in terms of a tension that the child (and perhaps even the adult) may have experienced
between the desire to provide an appropriate response to the experimenter’s question
and the desire to discuss a particularly salient story event.
Other than the seven child subjects already referenced above, there were a total of
twenty-five children who gave supportive explanations of both target-like and non-target-like judgments of hard 7. For target-like judgments, subjects offered the
following types of responses to the question, “Why was the hedgehog not hard to
ride?”:
(26)
a. Non-verbal response: Subject demonstrates frog riding
on hedgehog. (male 4;2)
b. “Because the frog was slippy and the hedgehog wasn’t.”
(female 4;8)
c. “Ah + because the frog stayed on.” (male 6;0)
In contrast, for true or non-target-like judgments of hard 7, child subjects offered the
following reasons why they believed the hedgehog was hard to ride:
(27)
a. S: “Cause it’s slippery.” E: Cause who’s slippery?
S: “The frog.” (female 3;5)
b. “Cause the frog jumped on the back and it was nice and
furry (but) when the hedgehog jumped onto the frog’s
back, he couldn’t hold on cause it was too slippery.”
(male 3;8) (Note also that this subject responded to
presentation of the test sentence, The hedgehog was
hard to ride, by saying, “Yeah, on the frog.”)
c. “Cause he (= frog) had no fur to hold on (sic).” (female
5;10)
d. “Cause if the hedgehog gets onto the frog’s back, when
the frog jumps, the hedgehog would fall off.”
(male 6;0)
Notably, the majority of children who offered explanations of their true or false
judgment of hard 7 offered an explanation of the type exemplified in (26) or (27), and
thus the data collected were for the most part clearly supportive of one or the other
interpretation of the control sentence.
Finally, I examine post-judgment data collected for difficult 10 (TC10), The dog was difficult to
teach, a test item which I earlier used to illustrate certain basic features of
experimental design (see §4.3.0.2). In Table 4.20, below, I first review the story
contexts for the two readings of this particular item:
| Subject reading - False | Object reading - True |
|---|---|
| It was not difficult for the dog to teach the pig how to play football. | The pig found it difficult to teach the dog how to go down a slide because the dog went up the slide backwards and then got distracted and chased a cat. |

Table 4.20: Story contexts for TC 10, ‘The dog was difficult to teach.’
For a true or target-like judgment of this item, adult control subjects typically offered
the type of explanation listed in (28) below:
(28)
a. “True, because he went up the wrong way up (sic) the
slide so he obviously wasn’t listening the way the
teacher (doesn’t finish phrase) and got distracted by the
cat.” (subject A14)
b. “Because it (= dog) didn’t follow what the pig did. It
went the opposite way.” (subject A1)
As illustrated in (29), below, child subjects who judged the sentence true offered
similar explanations to those listed in (28) when prompted to answer the question,
“Why was the dog difficult to teach?”:
(29)
a. “Umm cause he went up the slide the wrong way.”
(female 3;8)
b. “He goes like this.” Subject demonstrates the dog
climbing up the stairs the wrong way. (female 4;1)
c. “Because he went up there and then he + then he said
‘I’m going to go chase that cat.’” (female 6;0)
d. “Because umm the pig listened to + the dog and he got a
goal but umm when the + when the + the umm dog was
listening + when the dog saw the cat, he wasn’t
listening and he didn’t take any notice of the pig.”
(female 7;4)
I anticipated that an appropriate explanation of a false or non-target-like judgment of
difficult 10 would involve reference to the dog’s successful attempts to teach the pig,
and I did in fact collect a number of responses of this type, as illustrated in (30),
below:
(30)
a. “The dog ++ took his nose and he flipped it (= ball) into
the net.” Subject demonstrates dog showing pig how to
push the football with his nose. (male 3;8)
b. “Cause umm + the + the + the dog could teach.”
(female 4;8)
c. “Cause the dog was easy to put in (sic) there.” Subject
demonstrates dog using his nose to push the football
into the goal. (female 4;9)
(Note that, here, the form of the child’s utterance
corresponds to a subject reading of the TC, since the
dog is construed as experiencing ease in putting the ball
into the net and not as the object of the verb to put.)
d. Response to test sentence, The dog was difficult to
teach: “The pig ++ the dog was teaching the pig to play
football.”
Not all of the explanations offered for this item were as consistent as those illustrated
in (29) and (30), however. As reported in Table 4.17, 12% of the explanations
recorded for difficult 10 could be classified as of the type typically offered in support
of the opposite interpretation of the sentence. A closer inspection of the performance
of the four children who furnished these types of explanations, all of whom gave a
false or non-target-like judgment of the test sentence, reveals that three of them
referenced the dog chasing the cat, and one referenced the dog walking up the slide
the wrong way. Therefore, I think it is reasonable to consider that, as in the case of
hard 7, the child’s desire to discuss these particularly salient story events - which also
represented the final events presented in the story - may have overridden his or her
desire to comply with the request to provide an appropriate explanation of his or her
judgment. (See also my earlier discussion of hard 5 in this section.)
In the preceding discussion, I have considered certain evidence which suggests that
children have access to a reading of the TC that is not available in the adult grammar,
specifically, a reading in which the matrix subject DP is taken to control embedded
subject PRO. However, when examined in isolation, the Intermediate child’s post-judgment explanation of a non-target-like interpretation of the TC still cannot speak to
the larger issue of whether the same child considers the TC ambiguous. In order to
investigate this issue, it is necessary to compare not only the ability of the
Intermediate to offer both target-like and non-target-like judgments of the TC but also
the Intermediate’s ability to provide supportive explanations of both types of
judgments. Since space considerations preclude a full presentation of all of the data
that I collected in this regard, I will focus on the performance of three subjects in
particular, nos. 29, 32, and 42, who were all over the age of six and therefore able to
clearly explain their reasoning. As the data in Tables 4.21, 4.22, and 4.23 indicate,
each of these subjects proved able to appropriately explain both target-like and non-target-like interpretations of the TC, which, I would argue, provides support for my
earlier claim that these subjects did not engage in guesswork but rather made reasoned
judgments of test/control sentences:
| Test/control sentence | Judg. | Explanation of judgment |
|---|---|---|
| The monkey was easy to teach. | False (TL) | Response to control sentence: “The `girl was easy to teach ++ bad luck.” (Second comment directed to puppet.) |
| The tiger was hard to fight. | True (NTL) | Response to test sentence: “Yes + yes because the elephant flipped the tiger over + twice.” |
| The hedgehog was hard to ride. | False (TL) | S: “Well the hedgehog ++ the frog did get on the hedgehog but the hedgehog couldn’t get on the frog.” |

Table 4.21: Comparison of explanations offered by subject no. 29, a male, aged 6;0,
for target-like (TL) and non-target-like (NTL) judgments of TC items
| Test/control sentence | Judg. | Explanation of judgment |
|---|---|---|
| The fairy was easy to fight. | False (TL) | E: The fairy wasn’t easy to fight? Subject shakes his head ‘no’ and replies, “Cause she had the magic wand.” |
| The spaceman was easy to draw. | False (NTL) | E: He (= the puppet) said the spaceman was easy to draw. What was wrong with that? S: “No, because the boy umm said come have a try at drawing and he (= spaceman) couldn’t because his helmet was in the way.” |
| The hedgehog was hard to ride. | True (NTL) | Response to test sentence: “Yes, cause his (= frog’s) back was very slippery.” |
| The ladybird was difficult to eat. | True (TL) | Response to control sentence: “Yes, cause he (= ladybird) flew high in the tree” (i.e. where the dinosaur couldn’t reach him). |

Table 4.22: Comparison of explanations offered by subject no. 32, a male, aged 6;1,
for target-like (TL) and non-target-like (NTL) judgments of TC items
| Test/control sentence | Judg. | Explanation of judgment |
|---|---|---|
| The fairy was easy to fight. | True (NTL) | Response to test sentence: “Umm +++ umm +++ the fairy +++ was easy to fight with the /pr/ + knight.” E: “Why?” S: “Cause she went up into the air and pushed him over twice.” |
| The monkey was hard to draw. | True (TL) | E: “Why was the monkey hard to draw?” S: “Because umm he kept /er/ + he kept moving in the trees when the boy was trying to draw.” |
| The king was difficult to draw. | False (TL) | E: “Why is that wrong? Can you tell him (= puppet)?” S: “Umm umm she (= princess) said at the end when she drawed the king that it was quite easy to draw.” |
| The dog was difficult to teach. | False (NTL) | Response to control sentence: “The pig ++ the dog was teaching the pig to play football.” |

Table 4.23: Comparison of explanations offered by subject no. 42, a female, aged 7;3,
for target-like (TL) and non-target-like (NTL) judgments of TC items
In concluding the discussion in this section, I maintain that the post-judgment data
reviewed in the three tables above provide strong support for my claim that the
Intermediate subject has access to two readings of the TC, and thus that the
performance of the Intermediate is inaccurately characterized as being random. I
wish to be clear, however, that I do not discount the possibility that my subjects,
whether Intermediate or not, may have displayed chance performance from time to
time. For example, it is reasonable to expect that both child and adult subjects will
experience occasional lapses of attention in any formal experimental study of
linguistic behaviour. Nevertheless, my contention remains that the evidence reviewed
in this section, on the whole, is consistent with my claim that the Intermediate makes
grammatically motivated rather than random choices with regard to her interpretation
of the TC.
4.5.1 Degree constructions (DCs)
4.5.1.0 Presentation of group and individual findings
As reviewed in §4.3.1.1, the null and experimental hypotheses in this condition
differed somewhat from those proposed for the other NOS in the study. With regard
to a DC such as, The dinosaur was too naughty to teach, I formulated the null
hypothesis according to the assertion that children, like adults, have access to both
SDC and ODC interpretations of the sentence, as illustrated in (31), below:
(31)
a. SDC: The dinosaur_i was too naughty PRO_i to teach pro_{*k/prototypical}.
b. ODC: The dinosaur_i was too naughty PRO_k to teach e_i.
Conversely, according to the experimental hypothesis, it was predicted that children
would be restricted to subject readings of the DC, that is, to an SDC interpretation of
the DC as a consequence of their inability to interpret an NOS.
In Table 4.24, below, I contrast the performance of subjects in all five age groups on
individual DC items, as well as on the four DC items combined. Importantly, subjects
are not compared in the table according to the percentage of target-like readings they
provided, since both readings of the DC can be so described, but instead according to
the total number of object readings they provided:
| Grp | Ages | DC13 | DC14 | DC15 | DC16 | All items |
|---|---|---|---|---|---|---|
| 1 | 3;4 - 4;4 | 6 (54.6%) | 2 (18.2%) | 4 (36.4%) | 2/10 (20%) | 14 (31.8%) |
| 2 | 4;6 - 5;5 | 4 (36.4%) | 1/10 (10%) | 7/10 (70%) | 4 (36.4%) | 16/42 (38.1%) |
| 3 | 5;6 - 6;3 | 4 (36.4%) | 3 (27.3%) | 8 (72.7%) | 6 (54.6%) | 21 (47.7%) |
| 4 | 6;5 - 7;5 | 6 (54.5%) | 4/10 (40%) | 10 (90.9%) | 10 (90.9%) | 30/43 (69.8%) |
| 5 | Adults | 6 (54.5%) | 8 (72.7%) | 8 (72.7%) | 11 (100%) | 33 (75%) |

Table 4.24: Total number of object readings provided by age group on DC items
(NB: Where the number of subjects tested on an item was less than eleven, the number of object
readings obtained appears as a fraction over the total number of children tested.)
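The percentages in Table 4.24 follow directly from the counts. The short Python sketch below is offered only as an illustration of that arithmetic for the group 4 and adult rows; the names `counts`, `percent` and `summarise` are hypothetical, and the denominators rest on the assumption that eleven subjects were tested on an item unless the table gives a fraction.

```python
# Object-reading counts from Table 4.24 as (count, number tested).
# Assumption: eleven subjects per cell unless the table gives a fraction.
counts = {
    "Group 4": {"DC13": (6, 11), "DC14": (4, 10), "DC15": (10, 11), "DC16": (10, 11)},
    "Adults": {"DC13": (6, 11), "DC14": (8, 11), "DC15": (8, 11), "DC16": (11, 11)},
}


def percent(count, tested):
    """Return count/tested as a percentage rounded to one decimal place."""
    return round(100 * count / tested, 1)


def summarise(group):
    """Per-item percentages plus the pooled (count, tested, percentage) for one group."""
    cells = counts[group]
    per_item = {item: percent(c, n) for item, (c, n) in cells.items()}
    total = sum(c for c, _ in cells.values())
    tested = sum(n for _, n in cells.values())
    return per_item, (total, tested, percent(total, tested))


for group in counts:
    print(group, summarise(group))
```

Pooling counts and denominators across the four items, rather than averaging the four item percentages, is what yields the "All items" figures of 30/43 and 33/44.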
Notably, the data reported in Table 4.24 do not provide support for the experimental
hypothesis, since subjects in all age groups demonstrated the ability to assign an
object reading to the DC. Thus, pace the claim advanced by Solan (1978), which was
discussed in §3.2.1.0, I did not find that the acquisition of the ODC poses any
particular difficulty for children over and above their mastery of the TC.
Nevertheless, the group findings reviewed in Table 4.24 do mask the fact that there
were four subjects, all in age group 1, who provided only subject readings of the four
DC items.³⁷ Although the performance of these four children is not inconsistent with
the experimental hypothesis, their performance on the other NOS is. This is because
all demonstrated the ability to assign at least some target-like readings to the TC, IR,
and OPC. Thus, I believe it more plausible that the failure of these four children to
provide an object reading of the DC was not related to their inability to interpret NOS
but rather to an interpretive preference that they displayed for the subject reading of
the DC. I return to the issue of subject preference for a particular reading of the DC
later in this section.
³⁷ Conversely, there was only one subject in the study, a female, age 7;3, who provided only object
readings of the four DC items. The majority of the child subjects gave mixed readings of these items,
as did ten out of the eleven adult subjects.
Returning to my analysis of the findings reported in Table 4.24, a statistical analysis
of between-group performance on the four DCs revealed that there was no significant
difference in the mean percentage of object readings provided by subjects in groups 1,
2, or 3 (χ²(2, n = 33) = 2.268, p < .322); therefore, these three groups performed as a
single population. When groups 1 to 5 and groups 1 to 4 were similarly compared,
however, a significant difference was observed (Groups 1-4: χ²(3, n = 44) = 12.924, p
< .005; Groups 1-5: χ²(4, n = 55) = 25.073, p < .001), indicating that the youngest
subjects in the study performed differently from both the oldest children and the adult
controls. Finally, a Mann-Whitney test revealed no significant difference in
performance between groups 4 and 5 (U (11,11) = 50.500, p < .519), and therefore the
oldest child subjects and adult controls performed statistically as a single group.
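As a check on the between-group figures above, the following Python sketch recovers a p value from a reported statistic and its degrees of freedom. It is a minimal illustration only, assuming that each statistic was referred to a chi-square distribution with the stated degrees of freedom; the function name `chi2_sf` and the list `reported` are hypothetical, and the code uses only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    y = x / 2.0
    # Closed-form base cases: df = 1 (odd) and df = 2 (even).
    if df % 2 == 1:
        sf, k = math.erfc(math.sqrt(y)), 1
    else:
        sf, k = math.exp(-y), 2
    # Recurrence on the regularised upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1), with a = k / 2.
    while k < df:
        a = k / 2.0
        sf += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        k += 2
    return sf


# Statistics and degrees of freedom as reported in the text.
reported = [("Groups 1-3", 2.268, 2), ("Groups 1-4", 12.924, 3), ("Groups 1-5", 25.073, 4)]
for label, statistic, df in reported:
    print(label, round(chi2_sf(statistic, df), 4))
```

Under that assumption, the values returned agree with the reported thresholds: about .322 for groups 1 to 3, just under .005 for groups 1 to 4, and well below .001 for groups 1 to 5.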
I was also interested to determine whether child subjects in any of the age groups
experienced particular difficulty with a specific DC item. A Cochran’s Q test
revealed that the distribution of subject and object readings was statistically similar
for all groups with the exception of group 4. For these subjects, a significant
difference was obtained when between-item performance was analysed (Q(3) = 8.586,
p <.035). Nevertheless, the results of a McNemar’s test for pairwise comparisons
failed to locate this difference to any single contrast between two DC items, and
therefore I will not pursue an explanation of this particular finding here.
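For readers unfamiliar with Cochran’s Q, the sketch below shows how the statistic is computed from a subjects-by-items table of binary responses. The data are hypothetical and are not drawn from the study (1 stands for an object reading, 0 for a subject reading); the names `cochrans_q` and `hypothetical_rows` are likewise illustrative. With four items, Q is evaluated against a chi-square distribution with three degrees of freedom, which corresponds to the Q(3) notation used above.

```python
def cochrans_q(rows):
    """Cochran's Q for a subjects x items table of 0/1 responses."""
    k = len(rows[0])                                   # number of items
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)                                # grand total of 1s
    denominator = k * n - sum(t * t for t in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every subject answers uniformly")
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    return numerator / denominator


# Hypothetical data: six subjects (rows) judged on four DC items (columns).
hypothetical_rows = [
    [1, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]

print(cochrans_q(hypothetical_rows))
```

Subjects who give the same reading on every item add nothing to either the numerator or the denominator, so only subjects with mixed readings influence Q.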
A similar comparison of the performance of my adult control subjects on individual
DC items revealed no significant difference in the distribution of subject versus object
readings for any given item (Q(3) = 4.935, p <.177). Even so, an inspection of the
figures provided in Table 4.24 reveals that while DC13 produced a relatively evenly
balanced number of subject as opposed to object readings, adults demonstrated a clear
preference for object readings in the case of the other three DCs.³⁸ In this respect, I
note that my findings are consistent with those reported for McKee’s adult subjects,
who favoured object readings of DC items approximately 75% of the time (1997a:77).
I now turn to an examination of the types of explanations offered by both child and
adult subjects for their judgments of DCs. As in the case of TCs, the explanations
offered by my child subjects for DCs often closely resembled those provided by adult
subjects, although, understandably, the child’s remarks were typically more truncated
than the adult’s. Notably, the ambiguity of the DC afforded a unique opportunity to
compare adult and child explanations of both subject as well as object readings of a
particular item. In general, however, I solicited fewer explanations of DC items than
of TC items. This was primarily because I did not wish to tire subjects with excessive
follow-up questioning and so chose to concentrate my efforts on obtaining post-judgment data for the TC, which represents the main construction of the present
study.
As Table 4.24 indicates, DC13, The dinosaur was too naughty to teach, prompted the
most balanced number of subject and object readings in the case of both adult and
child subjects. Thus I think it is the most appropriate choice for illustrating post-judgment performance on the DC. Figure 4.16, below, illustrates the test materials
used for this item.
³⁸ In the case of DC16, I think it is reasonable to consider that some aspect of the design of this item
may have reduced the availability of the subject reading of the sentence, given that all eleven adult
subjects accessed an object interpretation. The story preceding DC16 had included two scenarios, one
in which a snake contemplated eating a lion and another in which a lion decided to eat a snake, despite
the snake’s protests that he would not make an adequate meal because he was “too small (for the lion)
to eat.” While the presentation of story events favoured the object reading of DC16, I believe that this
factor alone is insufficient to explain the performance of my adult subjects, who offered mixed
responses for the other similarly biased item, DC15. And while I think it is plausible that adults may
have simply favoured a transitive reading of the embedded verb to eat over an intransitive reading of
the same, I have no direct evidence for the validity of this claim. I submit, then, that resolution of this
issue would require further testing of similar items and, ideally, testing of DCs that feature a wider
range of embedded verbs.
Figure 4.16: Materials used for DC13, ‘The dinosaur was too naughty to teach.’
The design of the story preceding DC13 supported a false judgment of the subject
reading of the sentence (i.e. The dinosaur_i was too naughty PRO_i to teach
pro_{*k/prototypical}) since the dinosaur proved to be a very good teacher. Alternatively, the
story context also supported a true judgment of the object interpretation of DC13 (i.e.
The dinosaur_i was too naughty PRO_k to teach e_i), since the dinosaur was a very
troublesome pupil who knocked over his desk while being taught.
Beginning with explanations offered by adult subjects for DC13, I find it interesting
that there were two subjects who consciously recognized the ambiguity of the
sentence, as indicated by their remarks, reported below:
(32)
a. “It wasn’t clear whether you were saying the dinosaur
was too naughty to teach, meaning that he couldn’t
teach because he was too naughty, or whether the
teacher found him too naughty to teach.” (subject A1)
b. “Well it depends what you mean by that, the dinosaur
was too naughty to teach - to be taught or to be a
teacher?” (subject A8)
Thus, while it is typical for adults to consciously access only one preferred reading of
an ambiguous sentence (cf. Crain and Thornton 1998), these two subjects quite clearly
recognized both available interpretations of the DC.39 Furthermore, I observed that
this particular finding was not limited to DC13, since in the case of each of the other
three DC items, there was at least one adult who offered remarks similar to those
listed in (32), above.
For adult subjects who judged DC13 either true or false, the following represent
typical explanations offered for each reading of the sentence:
(33)
Subject reading (false)
E: Why was he not too naughty to teach?
S: “Because he actually got up in front of the class and
taught the class.”
Object reading (true)
E: Why was he too naughty to teach?
S: “Well, he knocked his chair over, did a lot of jumping
around and Mrs. Payne (= teacher) gave up teaching
entirely and + let him have a go.”
In the case of the forty-four child subjects, twenty-eight, or 63.6% of the total, gave
either prompted or unprompted explanations of their judgment of DC13, and the
majority of these explanations (i.e. 89%) could be considered supportive of the child’s
original judgment of the test sentence. The explanations listed in (34), below, are
representative and as the reader will observe, resemble the adult explanations listed in
(33):
39
In addition, it appears that one child, age 6;6, may have consciously accessed both interpretations of
DC13, since on presentation of the sentence The dinosaur was too naughty to teach, this child
remarked, “in half of it.” I acknowledge that the child’s remark may have simply meant that the truth value of an object interpretation of DC13 could be established only on the basis of events that took
place in the first half of the story, when the dinosaur misbehaved as a student. However, the child’s
statement is also consistent with her recognition that the truth value of the subject interpretation of the
sentence could not be upheld according to the events that took place in the last half of the story, in
which the dinosaur proved to be a good teacher. Thus, I think it is possible that this child recognized
not only the truth of the object interpretation of the sentence but also the concurrent falsity of the
subject interpretation of the same.
(34)
Subject reading (false)
a. “Because the dinosaur was good to teach.” (female 5;1)
b. “Yeah, yeah, cause the dinosaur teached them very well.”
(male 6;0)
c. “He said that the dinosaur was too naughty + + to be a
teacher.” (female 6;10)
Object reading (true)
a. “Cause he ++ knocked over that chair.” (female 3;5)
b. Response to test sentence: “Yep, he did knock over the
chair.” (female 5;8)
c. “Because umm he knocked the chair over + and the
teacher said umm she wouldn’t be able to teach him if
he’s naughty.” (male 6;10)
As in the case of the TC, I was particularly interested to find examples of single
subjects who could capably explain both subject and object interpretations of various
DCs, since this would provide supportive evidence that the child’s grammar licenses
both options. However, because I tested only four DCs, as compared to a total of
twelve TCs, and consequently posed fewer follow-up questions in this condition, I
was able to obtain only a relatively limited set of relevant data, from which I have
drawn the examples offered in Tables 4.25 and 4.26, below:
| Test/control item | Judg. | Explanation of judgment |
|---|---|---|
| DC15 - The giraffe was too big to ride. | True (subject reading) | E: Why is that true, the giraffe was too big to ride? S: Subject demonstrates by making the (very tall) giraffe stand over the pony’s back. |
| DC16 - The snake was too small to eat. | False (object reading) | E: Why was Fudge (= puppet) wrong? S: “Cause umm the lion was going to eat the snake and he ate him.” |

Table 4.25: Comparison of explanations provided by subject no. 15, a female, aged
4;8, for subject and object readings of the DC.
| Test/control item | Judg. | Explanation of judgment |
|---|---|---|
| DC13 - The dinosaur was too naughty to teach. | False (subject reading) | E: What was wrong with that? S: “It was that the dinosaur was good to teach, not bad.” |
| DC16 - The snake was too small to eat. | False (object reading) | Response to test sentence: “Nope, the lion could bend down.” (Presumably, to eat the snake.) |

Table 4.26: Comparison of explanations provided by subject no. 28, a male, aged 6;0,
for subject and object readings of the DC.
4.5.1.1
Discussion
In this section, I address the larger issue of how the experimental evidence reviewed
in the previous section, which pertains to the processing of ambiguous sentences, can
be used to inform existing theories of sentence comprehension. As earlier noted, it is
generally accepted in the psycholinguistic literature that adults will favour one
particular reading of an ambiguous sentence when such a sentence is presented to
them in the absence of a predisposing context. There is less agreement, however,
regarding the issue of how the human parser determines such a preference. Some
theorists adopt the view that the parser operates in a strictly autonomous fashion, with
a syntactic analysis of the input necessarily preceding semantic and/or pragmatic
analysis of the same (see, e.g., Frazier 1978, 1987, Mitchell 1994, and Frazier and
Clifton 1996); consequently, when all other factors are held constant, such theories
hold that the favoured interpretation of an ambiguous sentence can be distinguished in
terms of structural considerations alone. For other theorists, however, an autonomous
and/or serial conception of the operation of the parser is rejected in favour of one that
takes parsing determinations to result from satisfaction of a number of competing
factors or constraints (see, e.g., Taraban and McClelland 1988, Boland, Tanenhaus,
and Garnsey 1990, and MacDonald 1994). It is thus implicit in the latter view that the
initial operations of the parser are not purely data-driven but instead can involve the
use of multiple sources of information, such as pragmatic knowledge or frequency-based considerations.
I prefer to remain theoretically neutral with respect to the relative merits of the two
conceptions of the parser outlined above, as the data I collected cannot directly
address this fundamental issue. Furthermore, while it is my contention that my
findings can be used to inform theories of sentence comprehension, it is nevertheless
appropriate that I exercise some caution in any attempted application of these
findings, given that the present study involved use of an off-line rather than on-line
measure of comprehension ability. This choice of testing method was motivated both
by the relatively young age of my subjects and by the need to test subjects on site at
the schools they attended.
Recall that the eleven adult control subjects in the study displayed a clear bias toward
object readings in the case of three out of the four ambiguous DC items, with only
one, DC13, producing a relatively even distribution of subject and object readings. In
this respect, as earlier observed, my findings parallel those reported by McKee
(1997a), whose adult subjects also favoured object readings of ambiguous DC items
presented with two supportive contexts. According to the Construal Model of parsing
(Frazier and Clifton 1996), which proposes a serial and autonomous (i.e. modular)
operation of the parser, structural complexity would have to be ruled out as the factor
motivating the preference discussed here. This is because, according to the criteria
outlined by Frazier and Clifton, both the subject and object reading of the DC would
have roughly equal structural status, with both instantiating the same “primary
relation” between a licensing matrix predicate (here, the degree phrase) and its
complement clause (ibid.:41-2). Thus, it seems that an explanation of the adult
preference for the object reading of the ambiguous DC would require instead some
extra-syntactic account according to the Construal Model.
However, to my mind, the interpretation of the data that is imposed by Construal
Theory is counterintuitive on some level, since I would argue that according to any
reasonable measure of derivational complexity, it is the object reading of the DC that
would be identified as the more syntactically complex. This is because the ODC is
standardly held to involve interpretation of a syntactically displaced object argument,
whereas the SDC is not. Moreover, even in the psycholinguistic literature, it is the
ODC that would traditionally be considered the more difficult structure to process,
according to the accepted truism that the processing of the type of filler-gap
dependency that is attested in the ODC places a relatively greater strain on working
memory than the processing of a dependency (e.g. a control relation) which does not
involve movement or dislocation of a syntactic constituent, as in the SDC.
It seems, then, that Construal Theory cannot offer a satisfactory explanation of the
general preference that adults display for the object reading of the DC. Must we
therefore abandon the notion of a modular and/or serial conception of the parser in
order to account for this particular set of data? Not necessarily. For example, even if
we reject relative structural complexity as the determining factor in establishing an
adult preference for the object reading of the DC, it is still possible to consider that
any bias introduced at the initial, purely structural, level of analysis may simply be
overridden at a later stage of processing. In particular, I would like to suggest that the
relevant consideration is a lexical bias that is introduced by the presence of the degree
word itself, which, in English, appears more frequently in connection with the ODC
than with the SDC.40 Thus, while the initial operation of the parser might treat both
representations equally, as Construal Theory would predict, or even favour what some
would argue is the structurally simpler subject reading, this state of affairs does not
rule out the possibility that an initial preference is overridden at a later stage of
processing by the type of frequency-based consideration that I have suggested here.
Certainly, I would argue that the experimental evidence collected by both Anderson
(2002a) and McKee (1997a) is consistent with the existence of such a bias, given that
the preference that adults displayed for the object reading of the DC was observed to
be consistent across individual DC items, regardless of the choice of adjectival and/or
clausal complement to the degree word.
With regard to the issue of whether this hypothesized bias is more likely to influence
an early rather than later stage of parsing, I concede that my findings cannot be used
to directly address this particular concern since my data were collected through the
use of an off-line method. Nevertheless, I note that there is a growing body of
40
I am grateful to Ianthi Tsimpli for originally suggesting this possibility to me.
experimental evidence which points to frequency-based considerations as the relevant
factor in determining certain parsing preferences (see, e.g., Trueswell et al. 1993,
MacDonald 1994, Garnsey et al. 1997). For example, it has been variously argued
that an adult preference for one reading of an ambiguous sentence over another can
sometimes be traced directly to the fact that the matrix predicate in the sentence is
statistically more likely to occur in English with one particular type of complement
than with another. Thus, the types of statistical considerations that I have proposed
may underlie the adult preference for the ODC are not without precedent in the
literature. Since I recognize, however, that establishing the validity of this claim
would minimally involve an extensive examination of natural language corpora, I
must leave this as an issue for future investigation.
Turning now to a consideration of the child data, I earlier noted that a reverse bias is
attested in the early stages of the acquisition of the DC, with children initially
favouring the subject rather than object reading of the sentence. In particular, I found
that only subjects in group 4 (ages 6;5 to 7;5), who statistically performed as adults,
demonstrated a clear preference for the object reading of the ambiguous DC. In
contrast, younger children favoured subject readings of the DC by a margin of
approximately 2 to 1, although this tendency was observed to decrease with age. For
example, children in age group 3 (ages 5;6 to 6;3) provided a nearly balanced number
of both types of readings, specifically, 52.3% subject-type and 47.7% object-type.
Clearly, then, children below the age of 5;6 do not share the interpretive preference
that adults display for the object reading of the DC, and yet the data also indicate that
young children are not limited to the assignment of subject readings alone. I believe
this state of affairs is reasonably explained according to the assumption that it takes
some time for children to appreciate that assignment of an object interpretation to the
DC is probabilistically favoured in English, and for the operations of the parser to be
suitably influenced by this consideration.
I also recognize the alternative possibility that the child’s initial preference for the
subject reading of the DC may simply reflect the fact that early interpretive
preferences are solely, or at least primarily, based on considerations of structural
complexity. Such a supposition would be consistent with the claims of theorists who
argue that the derivation of the object reading of the DC is more complex than that of
the subject reading since the former involves the interpretation of a displaced
syntactic constituent. Nevertheless, I would argue that if the early preference that
children display for the subject reading is one that is strictly based on considerations
of syntactic complexity, then this preference should be an exclusive one, particularly
in the early stages of acquisition.
My results, however, do not provide any evidence for the existence of such a
developmental stage. This is because the vast majority of my subjects – specifically,
forty out of forty-four children - proved capable of assigning both subject and object
readings to the DC. And in the case of the four children who provided only subject
readings of the DC, all four concurrently demonstrated the ability to assign one or
more object readings to the TC. Therefore, as earlier argued, I believe it is doubtful
that lack of syntactic competence in interpreting NOS is implicated in the exclusive
preference that these four children displayed for the subject reading of the DC.
Instead, I think it is equally plausible that the preference is one associated with the
child’s developing processing abilities, rather than one imposed by a deficient
competence grammar.
4.5.2
Infinitival relatives (IRs)
In this section I analyse subject performance on the IR constructions tested, IR17, The
soldier found a pirate to fight, and IR18, The tiger found a rabbit to eat. (Note that
subject performance on the OPC will be reviewed in the following section.) Because,
unlike the DC, the IR is associated with only a single interpretation in the adult
grammar, I will once again evaluate child performance according to whether it can be
considered target-like or non-target-like. Recall that the null and experimental
hypotheses for both the IR and OPC test conditions were formulated as illustrated in
(35a&b), below:
(35)
a. Null hypothesis: The child, like the adult, has the
syntactic ability to assign a target-like interpretation to
the IR and OPC.
b. Experimental hypothesis: The child does not possess
the syntactic ability to interpret an NOS and therefore
will be restricted to a non-target-like interpretation of
both the IR and OPC.
As noted in §4.3.1.2, I prefer not to label the non-target-like reading of the IR a
subject reading. This is because I wish to avoid direct comparison of the non-target-like reading of the TC, which I hypothesize involves subject control, and the non-target-like reading of the IR/OPC, which I think more likely involves control of
embedded PRO by the matrix object argument.
Table 4.27, below, compares the total number of target-like readings obtained by age
group for each of the two IRs, as well as for the two items combined:
| Group | Ages | IR17 | IR18 | Both items |
|---|---|---|---|---|
| 1 | 3;4 - 4;4 | 5 (45.5%) | 10 (90.9%) | 15 (68.2%) |
| 2 | 4;6 - 5;5 | 7 (63.6%) | 9 (81.8%) | 16 (72.7%) |
| 3 | 5;6 - 6;3 | 4 (36.4%) | 10/10 (100%) | 14/21 (66.7%) |
| 4 | 6;5 - 7;5 | 9 (81.8%) | 10 (90.9%) | 19 (86.4%) |
| 5 | Adults | 11 (100%) | 10 (90.9%) | 21 (95.5%) |

Table 4.27: Total number of target-like responses per age group - IRs
(NB: Where the total number of subjects tested was less than eleven, the number of responses
obtained appears over the total number of subjects tested.)
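
The percentages in Table 4.27 can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic, not part of the original analysis. The dictionary name TABLE_4_27 and the helper functions are hypothetical, and a denominator of eleven is assumed wherever the table gives a bare count, in line with the note above.

```python
# (target-like responses, subjects tested) per item, transcribed from Table 4.27.
# Assumption: a denominator of 11 wherever the table gives only a bare count.
TABLE_4_27 = {
    1: {"IR17": (5, 11), "IR18": (10, 11)},
    2: {"IR17": (7, 11), "IR18": (9, 11)},
    3: {"IR17": (4, 11), "IR18": (10, 10)},
    4: {"IR17": (9, 11), "IR18": (10, 11)},
    5: {"IR17": (11, 11), "IR18": (10, 11)},  # adult controls
}


def percent(count, total):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * count / total, 1)


def combined(row):
    """Pool both IR items for one group: (target-like responses, responses obtained)."""
    count = sum(c for c, _ in row.values())
    total = sum(t for _, t in row.values())
    return count, total


for group, row in TABLE_4_27.items():
    both = combined(row)
    cells = [f"{item}: {c}/{t} ({percent(c, t)}%)" for item, (c, t) in row.items()]
    print(group, *cells, f"both: {both[0]}/{both[1]} ({percent(*both)}%)")

# Child totals for IR17 (groups 1 to 4 only).
child_ir17 = [TABLE_4_27[g]["IR17"] for g in (1, 2, 3, 4)]
child_target_like = sum(c for c, _ in child_ir17)
child_tested = sum(t for _, t in child_ir17)
child_non_target_like = child_tested - child_target_like
print(child_target_like, child_non_target_like, percent(child_non_target_like, child_tested))
```

Pooling the counts in this way gives 14/21 for group 3, and the child totals for IR17 come to twenty-five target-like and nineteen non-target-like judgments out of forty-four.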
A review of the figures provided in Table 4.27 suggests that children generally
experienced more difficulty with IR17 than with IR18, an observation that is
confirmed by a statistical analysis of the results. In particular, while there was no
significant difference observed between any child group and the adult control group
on IR18, the same could not be said for IR17. As in the case of the TC and DC, the
results of a Kruskal-Wallis test for between-group differences revealed that groups 1
to 3 performed as a single population (χ2 (2, N=33), p < .439) on IR17, but that each
of these groups differed from both groups 4 and 5. Finally, again as reported for the
TC and DC, a Mann-Whitney test for between-group differences revealed no
significant difference in the performance of groups 4 and 5 (U (11, 11) = 49.500, p <
.478), indicating that the two groups performed as a single population.
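
The two rank-based tests reported above can be illustrated with a short sketch. It rests on assumptions that are not stated in the text: that each subject's response to IR17 was scored 1 (target-like) or 0 (non-target-like), and that the per-group counts are those given in Table 4.27. All function and variable names are hypothetical. The p-value line uses the closed form of the chi-square survival function that holds for two degrees of freedom only, and no p-value is computed for the Mann-Whitney test.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes = [pooled.count(v) for v in set(pooled)]
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction, len(groups) - 1


def mann_whitney_u(a, b):
    """Smaller of the two Mann-Whitney U statistics, using average ranks for ties."""
    ranks = average_ranks(a + b)
    u_a = sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2
    return min(u_a, len(a) * len(b) - u_a)


def binary_scores(target_like, n=11):
    """Assumed coding: 1 = target-like judgment, 0 = non-target-like."""
    return [1] * target_like + [0] * (n - target_like)


# IR17 counts per group, taken from Table 4.27.
g1, g2, g3, g4, g5 = (binary_scores(k) for k in (5, 7, 4, 9, 11))

h, df = kruskal_wallis([g1, g2, g3])
p_kw = math.exp(-h / 2)  # chi-square survival function, valid for df == 2 only
u = mann_whitney_u(g4, g5)
print(f"Kruskal-Wallis, groups 1-3: H = {h:.3f}, df = {df}, p = {p_kw:.3f}")
print(f"Mann-Whitney, groups 4 and 5: U = {u}")
```

Under these assumptions the sketch yields p = .439 for groups 1 to 3 and U = 49.5 for groups 4 and 5, which agrees with the figures reported above. The H value it prints is a product of the sketch and should not be read as the statistic obtained in the original analysis.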
The target-like performance demonstrated by all age groups on IR18 is not consistent
with the view that certain subjects experienced an across-the-board impairment of
their ability to interpret the IR. Rather, I believe that the comparatively poorer
performance of some subjects on IR17 suggests the existence of some flaw in the
design of this particular item. In support of this supposition, I note that the
explanations offered by child and adult subjects for their judgments of IR18 were
relatively straightforward as compared to certain explanations offered in support of
IR17. In order to illustrate this point, I will first review typical explanations offered
for both target-like and non-target-like interpretations of the relatively unproblematic
IR18, The tiger found a rabbit to eat. In (36), below, I offer a representative example
of an adult explanation of a false (i.e. target-like) judgment of this item:41
(36) “He didn’t want to eat him. He wanted to play with him.”
(subject A18)
In the case of child subjects who provided an explanation of their target-like judgment
of IR18, their remarks typically took one of two forms. Either the child focused on
how accurately the sentence described the tiger’s intentions - which could be
described as a more “purposive” interpretation of the sentence - or he/she focused on
the characterization of the rabbit as being the object of eating, which I presume was
more in keeping with a reading of the sentence as involving an infinitival relative
clause. An example of the first type of explanation is provided in (37a) below and an
example of the second in (37b):
(37)
a. “Because + because he /d/ + because he wanted to play
with the rabbit instead.” (male 4;8)
b. E: What did he (= puppet) say wrong?
S: “Umm the tiger found something to eat.” (male 5;0)
Regrettably, none of the four children who provided a non-target-like judgment of
IR18 offered an explanation of their interpretation of the sentence. Nevertheless, all
four were asked various follow-up comprehension questions (e.g. “Did the tiger want
41
As indicated in Table 4.27, there was one adult subject who gave a non-target-like (i.e. “true”)
reading of IR18, but this is most likely attributed to the fact that she allowed extra-contextual
considerations to affect her interpretation of the sentence. Specifically, this subject explained that she
judged the sentence true because, “He (= tiger) found a rabbit that he wanted to eat but he didn’t eat it.”
When the experimenter pointed out that, in the story, the tiger had expressed no interest in eating the
rabbit, the subject responded that the tiger’s instinct, like that of her own pet dogs, would be to consider
the rabbit potential prey whether or not he actually ended up eating the rabbit.
Interestingly, it also appears that one child subject, age 6;0, may have entertained a similar
consideration when providing a target-like (i.e. false) judgment of the same item, since he offered the
following explanation of his interpretation of the sentence: “He (=tiger) found one to eat but he didn’t
get to eat it.”
to eat the rabbit?”), and I noted that all responded appropriately. Two of these
subjects, ages 4;9 and 5;5, correctly interpreted IR17 and missed IR18, while the other
two, aged 4;4 and 6;10, failed both items. Therefore, it is only with respect to the
performance of the latter two subjects that there is a legitimate reason to question
whether these children may have lacked target-like ability to interpret the IR.
Nonetheless, I think that the validity of this contention is undermined in the case of
the second of these two subjects, first because of the relatively advanced age of this
child and second because she performed quite competently on the other NOS tested in
the study.
Turning now to performance on IR17, The soldier found a pirate to fight, nineteen
children (or 43.2% of the total) provided a non-target-like judgment of this item,
including two children over the age of 6;5. As detailed in §4.3.1.2, the story
preceding presentation of this IR featured a soldier who approaches a pirate. The
pirate initially believes that the soldier has come to fight him, but the soldier then
explains that he wishes only to join the pirate on his ship for a bit of singing and the
story ends with the two characters singing together. I had speculated that children
who lacked target-like knowledge of the sentence might allow what Jones (1992) has
termed a switched-control reading of the sentence, in which a referential relationship
holds between the matrix object DP and embedded subject PRO, and between the
matrix subject DP and embedded object, as illustrated in (38), below:
(38)
The soldier_i found a pirate_k PRO_k to fight (him)_i.
(With the meaning, The soldier_i found a pirate who was
willing to fight him_i.)
Of the twenty-five children who correctly judged the sentence false, ten provided an
explanation of their judgment, and, as illustrated in (39), below, these were
comparable to explanations offered by adults for the same reading:
(39)
a. E: And it’s false because?
S: “He (= soldier) wasn’t looking for someone to fight.
He was looking for someone to sing with.”
(adult subject no. 3)
b. “The soldier found a pirate ++ to sing.” (male 6;8)
Non-target-like judgments of IR17 were provided by nineteen children, but,
regrettably, I obtained only one explanation for this reading of the sentence, which is
listed below:
(40)
Response to test sentence: “He did.” E: Did he?
S: “Yeah. He found him and then he said, ‘I’m gonna fight
you.’ ”
I submit that this child’s explanation is consistent with the alternative interpretation of
the sentence that was proposed in (38), above, in which a soldier finds a pirate who in
fact turns out to be interested in fighting the soldier. In support of this contention, I
note that the referent of “he” in the child’s sentence cannot be the same in both
occurrences if the child’s assessment of the test sentence is to accord with story
details. This is because the character who did the finding in the story, the soldier, was
not the same as the character who threatened to fight the soldier.
In the case of the other eighteen subjects who gave non-target-like judgments of this
item, all were asked follow-up comprehension questions such as “What did the soldier
want?” or “So did the soldier fight the pirate?” Interestingly, sixteen of these children
correctly answered all such questions, thus consistently identifying the pirate as the
one doing the fighting and the soldier as the one seeking a singing partner. Certainly,
then, the non-target-like performance of these sixteen children cannot be explained in
terms of their poor comprehension of story details; instead, I suggest that these
children may have had an alternative reading of IR17 in mind.42 The issue of whether
42
The remaining two subjects were the only ones who in fact gave any clear indication of having
misunderstood the story. One boy, age 4;8, correctly remembered that the soldier had said that he
wanted to find a pirate but incorrectly maintained that the soldier had intended to fight the pirate. And
the second child, a boy, age 6;0, incorrectly claimed that the soldier did not want to find a pirate but
instead “just wanted to have a go on the ship.”
this alternative reading may have involved switched control, as illustrated in (38), or
some other analysis is complicated, I believe, by the performance of two of my adult
controls. Both of these subjects correctly judged IR17 false, but each also expressed
some reservation after having made this judgment. For example, one female subject
judged the sentence false and then remarked, “That’s hard though.” When pressed to
explain why she found judging this item difficult, she responded as follows:
(41)
“I don’t know why. The soldier found a pirate and the
pirate wanted to fight him. That’s why. But in fact he
didn’t.” (subject A16)
While I acknowledge that this adult subject’s explanation is not entirely clear, I am
nevertheless somewhat concerned that her remarks seem consistent with a construal
of the sentence that involves switched control, as in (38), despite the fact that this
interpretation of the sentence should be barred by the adult grammar. And, similarly,
I believe that the remarks made by the second adult, a male, are also consistent with
this same possibility. In this case, after judging the sentence false, the subject gave
the following explanation of why he had felt some uncertainty in making his
judgment:
(42)
“Well because he did find a pirate and the pirate wanted to
fight. Therefore you could say he found a pirate to fight
even though he wasn’t interested in fighting, so it’s just
how you want <to?> phrase it.” (subject A21)
Despite the curious nature of the above remarks, I will not pursue the argument that
these two adult subjects may have accessed, even briefly, an illicit interpretation of
the IR. Instead, I think it is more plausible that they interpreted the sentence as a
subject-gap purpose construction (SPC), a construction exemplified in (43), below,
which features an embedded subject gap:
(43)
The agency found a volunteer_k [PRO_k to teach
pro_{prototypical}] [in Belize].
In (43), the embedded object position is occupied by pro, which takes generic or
unspecified reference; consequently, neither the matrix subject DP nor matrix object
DP can serve as the controller of this argument. This particular reading of the IR is
thus made possible by the subcategorial properties of the embedded verb to teach,
which in English is licensed to occur with a generic or unspecified object. And since
the verb to fight is similarly licensed to take generic pro as its object argument, I think
it is reasonable to speculate that the two adults referenced above may have
considered, although ultimately rejected, an SPC interpretation of IR17 (cf. The
soldier found a pirate_i [PRO_i to fight pro_{arb}]). Moreover, I think it is possible that
certain of my child subjects may have entertained similar thoughts regarding the
appropriate analysis of IR17.
However, what would remain unexplained according to the above line of argument is
why so few of my subjects appear to have considered a similar alternative
interpretation of IR18. According to the story accompanying presentation of the
sentence The tiger found a rabbit to eat, an interpretation of the sentence such as that
depicted in (43) is not ruled out. This is because the tiger does find a rabbit in the
story, which begins with the rabbit eating food out of a dish. Furthermore, the
embedded verb to eat, like the verb to fight, is licensed to take a null pronominal
object with generic or unspecified reference; therefore, it is not inconceivable that the
sentence could be interpreted as featuring generic object pro rather than an object gap
with specific reference. Since I believe, however, that the limited amount of data I
collected precludes a proper investigation of this issue at the present time, I must
leave this as a topic for future investigation.
4.5.3
Object-gap purpose constructions (OPCs)
Object-gap purpose constructions (OPCs)
In this section I review performance on the two OPC items tested, OPC19, The man
bought a chicken to eat, and OPC20, The clown bought a dog to ride. As in the
previous section, I will not employ the terms subject or object reading to describe the
two potential interpretations of the OPC, since I believe this terminology is potentially
misleading. Instead, I will distinguish target-like readings, in which the matrix
subject is interpreted as performing the action denoted in the embedded clause, and
non-target-like readings, in which the matrix object argument is assumed to perform
the same action. The reader is referred to example (35) in the previous section for a
review of the null and experimental hypotheses associated with these test items.
Notably, I do not consider the possibility that the non-target-like reading of the OPC
might involve switched reference, as I speculated in the case of the IR (cf. The soldier_i
found a pirate_k PRO_k to fight (him)_i). This is because the context of the stories
accompanying presentation of OPC19 and OPC20 did not allow for such an
interpretation of either item; specifically, the possibility was never entertained that the
chicken might eat the man who purchased him, nor that the dog would ever ride the
clown.
Table 4.28, below, compares the five age groups in terms of the number of target-like
readings obtained for each OPC item, as well as for both items combined:
Grp  Ages         OPC19        OPC20        Both items
1    3;4 to 4;4    8 (72.7%)    9 (81.8%)   17 (77.3%)
2    4;6 to 5;5   10 (90.9%)    7 (63.6%)   17 (77.3%)
3    5;6 to 6;3   11 (100%)    10 (90.9%)   21 (95.5%)
4    6;5 to 7;5   11 (100%)    10 (90.9%)   21 (95.5%)
5    Adults       11 (100%)     8 (72.7%)   19 (86.4%)
Table 4.28: Total number of target-like responses per age group – OPCs
According to a statistical analysis of item-based performance, there was no significant
difference between the performance of any of the child groups and that of the adult
controls (χ2 (4, 55) = 8.752, p< .068 for OPC19 and χ2 (4, 55) = 3.793, p< .435 for
OPC20). Therefore, even children in the youngest age group performed like adults on
the OPC. Taking a closer look at the individual performance of the eighteen children
aged 5;0 or under, who would be the most likely to have experienced difficulty with
this construction, I find that there were no subjects who missed both OPCs and only
six who missed one of the two items. Thus, the majority of my subjects below the age
of five - specifically, twelve children - provided target-like responses on both OPCs.
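As an illustrative aside, the item-based statistic reported above for OPC20 can be approximated from the group counts in Table 4.28. The sketch below rests on an assumption that the text does not confirm: that the statistic is a tie-corrected Kruskal-Wallis H computed over binary scores (1 = target-like, 0 = non-target-like), with eleven subjects per group. The function name kruskal_wallis_h and the variable names are hypothetical and chosen only for this example.

```python
from itertools import chain


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of score lists (hypothetical helper)."""
    pooled = sorted(chain.from_iterable(groups))
    n_total = len(pooled)

    # Assign each distinct score its midrank: the mean of the rank positions it occupies.
    midrank = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    mean_rank = (n_total + 1) / 2
    # Between-group and total sums of squares, both computed on the ranks.
    between = sum(
        len(g) * (sum(midrank[x] for x in g) / len(g) - mean_rank) ** 2
        for g in groups
    )
    total = sum((midrank[x] - mean_rank) ** 2 for x in pooled)
    return (n_total - 1) * between / total


# Assumption: 11 subjects per group; counts of target-like readings of OPC20
# are taken from Table 4.28 (groups 1 to 5).
opc20_target_like = (9, 7, 10, 10, 8)
opc20_scores = [[1] * k + [0] * (11 - k) for k in opc20_target_like]

h_opc20 = kruskal_wallis_h(opc20_scores)
print(round(h_opc20, 3))
```

Under these assumptions the result agrees with the value reported for OPC20 to three decimal places. The sketch is offered only as a way of seeing how the group counts feed into a test of this kind, not as a record of the analysis actually run.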
While I must be cautious in generalizing the findings reported here, given the very
limited number of test items administered, it is nevertheless quite clear that these
findings do not provide support for the claim that children are relatively delayed in
their acquisition of the OPC, as argued by Goodluck and Behne (1992), Goodluck
(1984), and Goodluck, Finney, and Ling (1995). As discussed in §3.2.1.1 of Chapter
3, the first two of the referenced studies have produced experimental evidence which
suggests that children as old as ten still lack adult-like capability to interpret the OPC.
However, looking at the performance of my own subjects in the upper two age
groups, only one child out of eleven in group 3 (ages 5;6 to 6;3) made an error on the
OPC, and, again, only one child out of eleven in group 4 (ages 6;5 to 7;5) made such
an error. Therefore, my findings, although limited, do not match those reported in the
above-referenced sources.
Furthermore, I believe that subject performance on OPC20, the item missed by the
two children referenced earlier, would have been improved had certain modifications
been made to this item prior to its use in the main study. Specifically, I note that there
were three adults who gave non-target-like judgments for this item, which suggests
some particular problem with the item itself. The reader will recall that in the case of
aberrant adult responses to TC items, I was generally able to account for these errors
in terms of subject inattention or in terms of the subject’s reliance on extra-contextual
considerations when determining an interpretation of the sentence. However, I would
argue that the three adult non-target-like responses obtained for OPC20 were more
likely prompted by flaws in the design of this item.
As originally discussed in §4.3.1.2, the story accompanying OPC20, The clown
bought a dog to ride, first presented a scenario in which a clown went to a pet store
and bought a dog. The dog initially assumed that he had been purchased as a pet and
asked the clown if he could go to his new home. The clown explained to the dog that
he had bought him because he wanted to ride him as a means of cheering up a little
girl who was feeling poorly in hospital. This item had been pilot-tested on both child
and adult subjects with no difficulties, but in the course of its use in the main study, I
became aware that responses provided by several adults raised questions regarding the
suitability of this item.43 The remarks made by two adult subjects in particular, who
correctly judged the sentence true, are very informative in this regard and are reported
in (44a&b), below:
43
As noted in §4.3, time and scheduling constraints prevented me from testing adult subjects prior to
child subjects in the main study. Instead, testing proceeded in parallel for both groups. I acknowledge,
however, that prior testing of adults is generally to be preferred and, in this situation, would have been
especially advantageous, as it would have allowed me to modify or reject OPC20 before administering
it to child subjects.
(44)
a. “I put ‘true’ but then I thought, did he (= clown) buy the
dog to ride, or did he buy the dog to cheer the girl up in
the hospital, or did he buy the dog to pull the cart? He
could’ve bought the dog to do all sorts of things really.”
(subject A1)
b. Subject explains reason for hesitating before
responding “true”: “Because he bought him for a
number of reasons it strikes me: to cheer the girl up,
etc., etc. – not essentially just to ride him. If you had
said the clown bought a dog to make the little girl laugh,
I would’ve said it was true without questioning.”
(subject A15)
In the case of the three adults who gave false or non-target-like judgments of the
sentence, each raised similar considerations to those expressed by the subjects in
(44a&b), offering alternative reasons for the clown buying the dog, such as to serve as
a pet or to cheer the little girl up. I confess to being somewhat surprised by the
response of these adult subjects, given that the story had included a specific event in
which the clown declared to the dog, “I bought you so I could ride you.” However, as
this event occurred early in the story, it is possible that its salience may have been
diminished by later events.
Because problems associated with OPC20 did not become apparent until well into the
testing phase of the main study, I chose not to replace the item at such a late stage and
to continue collecting responses from all of the subjects. A reasonable question thus
arises as to whether any of my child subjects entertained the same type of
considerations with regard to OPC20 as the adult subjects discussed above. In fact, I
recorded three such explanations provided by child subjects for false or
non-target-like judgments of this item, which are listed in (45), below:
(45)
a. Response to test sentence: “Wrong + because + to
help.” (female 5;4)
b. Puppet: But tell me why I’m wrong. S: “No, they
brought her some flowers + in the cart and then the dog
after +++ rided him + home.” (male 6;0)
c. “Umm the clown bought the dog so he could make the
little girl laugh.” (female 7;4)
Moreover, I recorded one child, who had also incorrectly judged the sentence false
and who offered the following response to the question, “So why do you think that
clown bought a dog?”:
(46)
“Because he + he wanted to ride on it but he didn’t want to
choose to ride on it.” (female 5;1)
Notably, this particular child’s response seems to be very similar in content, though
not in style, to the adult remarks cited in (44), since I interpret her remarks to mean
that riding on the dog was not the primary purpose the clown had in mind when
buying the dog. On the basis of the evidence reviewed here, I reiterate that I think
that the number of non-target-like responses obtained on OPC20 may have been
artificially inflated by the problems of design that I have noted. Consequently, it
remains my contention that the performance of my child subjects on the two OPC
items provides evidence for early, rather than delayed acquisition of structures of this
type.
4.5.4
Passive sentences
In this section, I analyse the performance of subjects on the four passive sentences
included in the study, which consisted of two actional passives (AP) and two
nonactional passives (NAP). As discussed in §4.3.1.3, two of these items featured an
actional verb, AP21 (The boy was chased by the duck) and AP22 (The monkey was
bitten by the swan), and two featured a nonactional verb, NAP23 (The snake was
watched by the rabbits) and NAP24 (The elephant was heard by the dog). As
standard in the study, the story context accompanying each passive item provided
support for two potential readings of the sentence: an active or non-target-like
reading, in which the matrix subject is assumed to play the logical role of subject of
the matrix verb, and a passive or target-like reading.
The null and experimental hypotheses for this condition were as in (47), below:
(47)
a. Null hypothesis: The child shares the same grammatical
knowledge of the passive as the adult and thus will only
allow a passive or target-like reading of the sentence.
b. Experimental hypothesis: The child lacks a general
ability to interpret a displaced object argument and will
therefore be limited to the assignment of active or
non-target-like readings of the passive sentence.
Looking first at the performance of child and adult subjects on the two APs, Table
4.29, below, compares each age group in terms of the number of passive or target-like
readings provided per item and for the two items combined:
Age group     AP21         AP22          Both items
3;4 to 4;4     9 (81.8%)    9/9 (100%)   18/20 (90%)
4;6 to 5;5    10 (90.9%)   11 (100%)     21 (95.5%)
5;6 to 6;3     8 (72.7%)   11 (100%)     19 (86.4%)
6;5 to 7;5    11 (100%)    11 (100%)     22 (100%)
Adults        11 (100%)    11 (100%)     22 (100%)
Table 4.29: Total target-like responses per age group for actional passives
(NB: Where the total number of subjects tested was less than eleven, the number of
responses obtained appears over the total number of subjects tested.)
According to a statistical analysis of the results, there was no significant difference
observed when the performance of any of the child groups was compared with that of
the adult controls (Kruskal-Wallis test, χ2 (4, N=55) = 6.175, p < .186). Therefore, it
can reasonably be claimed that even the youngest subjects in the study performed like
adults with respect to the two APs. Furthermore, according to the results of a
McNemar’s test (N=54, p< .063), I found that none of the groups differed in terms of
performance on the two individual items. Like Cromer (1970), then, I found no
evidence that my subjects experienced an across-the-board impairment of their ability
to interpret a displaced object argument. Furthermore, these results comport with
other findings reported in the literature. For example, Maratsos et al. (1985)
demonstrated that 4-year-olds experience no particular difficulty with the
interpretation of passives featuring actional, as opposed to nonactional verbs, while
Fox and Grodzinsky (1998) reported that their subjects, aged 3;6 to 5;5, performed
well on both actional and nonactional passives, with the exception of those
nonactional passives that included a by-phrase. (See also DeMuth 1989, 1990, who
demonstrates that verbal passives are readily acquired by very young speakers of
certain non-Indo-European languages, such as Sesotho.)
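For readers unfamiliar with McNemar's test, which is used above to compare performance on the two AP items, the following sketch shows how a two-sided exact p-value is obtained from the discordant pairs alone, that is, from subjects who gave a target-like reading of one item but not of the other. Two assumptions should be flagged: the text does not say whether the exact (binomial) form of the test was used, and the discordant counts passed in below are illustrative values, not figures reported in the study. The function name mcnemar_exact_p is hypothetical.

```python
from math import comb


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: pairs target-like on the first item only
    c: pairs target-like on the second item only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Lower tail of a Binomial(n, 0.5) distribution, doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative (assumed) counts: five discordant pairs, all in one direction.
p_one_sided_pattern = mcnemar_exact_p(5, 0)

# Illustrative (assumed) counts: discordant pairs evenly split between items.
p_balanced_pattern = mcnemar_exact_p(3, 3)

print(p_one_sided_pattern, p_balanced_pattern)
```

The design point is that concordant pairs (subjects who responded identically to both items) play no role in the test, so that a comparison of two items on which most subjects performed at ceiling is decided by a small handful of subjects.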
In the case of AP21, for which I recorded a total of six non-target-like judgments, I
regrettably did not obtain any clear explanations of this type of response. However, I
note that four of the six subjects who gave incorrect judgments of AP21 demonstrated
correct recall of the details of the story when asked comprehension questions in the
follow-up phase of the task. Thus, I think it is unlikely that the non-target-like
performance of these subjects derived from poor comprehension of the accompanying
story; rather, I think that each simply chose to assign the sentence an active
interpretation, it being one afforded by their grammar and one which accorded with
the story details.
With respect to the second actional passive, AP22, I obtained even less data in the
post-judgment phase of the task than for AP21, since AP22 was correctly interpreted
by even my youngest subjects. Therefore, I turn directly to a review of subject
performance on the two passive items that featured nonactional verbs.
For NAPs, the performance of subjects in each of the age groups is summarized in
Table 4.30, below:
Age group     NAP23        NAP24        Both items
3;4 to 4;4     3/10 (30%)   3 (27.3%)    6/21 (28.6%)
4;6 to 5;5     5 (45.5%)    6 (54.5%)   11 (50%)
5;6 to 6;3     6 (54.5%)    8 (72.7%)   14 (63.6%)
6;5 to 7;5     9 (81.8%)    6 (54.5%)   15 (68.2%)
Adults        11 (100%)    11 (100%)    22 (100%)
Table 4.30: Total target-like responses per age group for nonactional passives
(NB: Where the total number of subjects tested was less than eleven, the number of passive or
target-like responses provided appears over the total number of subjects tested.)
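The percentages in the table above are simple proportions of target-like responses over the number of subjects (or, for both items combined, responses) tested. A minimal sketch of that arithmetic follows, assuming eleven subjects per group unless a smaller denominator is given, as in the youngest group; the function name percent_target_like is hypothetical.

```python
def percent_target_like(target_like, tested=11):
    """Percentage of target-like responses, rounded to one decimal place."""
    return round(100 * target_like / tested, 1)


# Youngest group, NAP23: three target-like responses from ten subjects tested.
youngest_nap23 = percent_target_like(3, tested=10)

# Youngest group, both items: six target-like responses out of 21 obtained.
youngest_both = percent_target_like(6, tested=21)

# Oldest child group, NAP23: nine target-like responses from eleven subjects.
oldest_nap23 = percent_target_like(9)

print(youngest_nap23, youngest_both, oldest_nap23)
```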
A review of the data contained in Table 4.30 indicates that child subjects in all age
groups performed relatively worse on nonactional than actional passives, a finding
that comports with the results obtained in a number of previous studies, including
Maratsos et al. 1979, 1985, de Villiers et al. 1982, Gordon and Chafetz 1986, and
Pinker et al. 1987. (Alternatively, see Fox and Grodzinsky 1998 for a somewhat
more complicated set of findings.) According to a statistical analysis of the results,
children in the first four age groups performed as a single group with respect to NAPs
(Kruskal-Wallis test, χ2 (3, N=44) = 6.442, p < .092), with none of the four groups
demonstrating target-like performance on these items. A significant difference was
obtained, however, when the four child groups were compared with the adult control
group (χ2 (4, N=55) = 20.316, p < .001) and when only those in the oldest child group
(i.e. group 4) were compared with the adults (Mann-Whitney test, U (11,11) = 27.500,
p < .028). Thus, even subjects between the ages of 6;5 and 7;5 failed to demonstrate
target-like knowledge of NAPs.
Looking at subject performance within age groups, the results of a McNemar’s test
revealed that there was no significant difference in the number of passive readings
provided for NAP23 or NAP24 (N=54, p < 1.000). Therefore, the difficulty that
child subjects experienced was more likely related to the nonactional status of the two
items, rather than to their specific form.
Figure 4.17, below, offers a graphic comparison of subject performance on APs and
NAPs. To review, while target-like performance on APs was observed to be fairly
consistent across all age groups in the study, performance on NAPs had not reached a
target-like standard even for those subjects over the age of 6;5:
[Figure 4.17: line graph. Y-axis: mean target-like responses (0.0 to 2.5). X-axis: age group (3;4 to 4;4, 4;6 to 5;5, 5;6 to 6;3, 6;5 to 7;5, Adult). Sentence types plotted: actional passives, non-actional passives.]
Figure 4.17: Percentage of target-like responses by age group for passive items
Returning to my analysis of subject performance on the NAP, I was fortunately able
to obtain more informative post-judgment data for these two items than for the two
APs. Taking NAP23 as a representative example, I first review the basic contextual
information associated with this item, which was earlier presented in §4.3.1.3:
Passive or TL reading - False: The rabbits don’t see the snake until a hedgehog advises them to look up in the tree.
Active or NTL reading - True: The snake watches the rabbits from a tree branch above them.
Table 4.31: Story contexts for NAP23, ‘The snake was watched by the rabbits.’
When asked to explain their false, or target-like, judgment of NAP23, adult and child
subjects offered the types of standard responses illustrated in (48) below:
(48)
a. “Because the rabbits didn’t look at the tree before they
put their picnic down.” (adult subject no. 26)
b. “The rabbits were watched by the snake.” (male 4;2)
c. “He (= puppet) said the rabbit was looking.” (male 5;8)
d. “The rabbits don’t watch the snake. They didn’t watch.
The snake watched the rabbits.” (male 6;0)
Turning now to non-target-like performance on this item, I note that some of the
post-judgment data I collected would appear to corroborate a finding earlier reported in the
literature, which is that young children sometimes display non-adult-like knowledge
of the meaning of perception predicates such as watch (see, e.g., Goodluck and
Roeper 1978 and de Villiers et al. 1982; see also the results of my pilot study,
discussed in §4.4). Specifically, comments provided by two of my subjects under the
age of four, both of whom gave non-target-like judgments of NAP23, would seem to
suggest that these children interpreted the verb to watch as being synonymous with
either the verb to look at or the verb to see. When asked whether the rabbits were
watching the snake in the story, both of these subjects answered in the affirmative.
They were then asked, “When did the rabbits watch the snake?” The first child, a
female aged 3;6, replied, “When the hedgehog came.” She thus appears to be
referencing the point in the story where the hedgehog drew the rabbits’ attention to
the snake, an event that is consistent with the rabbits having seen the snake but not
having watched it. And the response of the second child, a male aged 3;8, is even
more telling in this respect, since he explained that the rabbits watched the snake:
“…cause they seen + cause they see what he was doing.”
With regard to the child subjects over the age of 3;10 who also gave non-target-like
judgments of NAP23, I regrettably failed to obtain any clear explanations of these
judgments. However, several of these subjects were asked to answer follow-up
comprehension questions, which I noted were consistently answered correctly. Thus,
for example, when asked if the rabbits were watching the snake in the story, the
subjects referenced here, unlike the two children discussed above, correctly answered
“No.” Furthermore, two of these subjects, of the relatively young ages of 3;10 and
4;1, responded to the follow-up questions they were asked by correctly retelling the
entire story that accompanied NAP23, including dialogue. Therefore, I submit that
their non-target-like performance on this particular item was not linked to poor
comprehension of story details. Instead, as I earlier argued in the case of
non-target-like performance on the AP, I think these subjects simply chose to assign the NAP an
active reading.
Since post-judgment data pertaining to NAP23 were so scarce, I reviewed the data
collected in connection with NAP24 (The elephant was heard by the dog) looking for
evidence that might further support the claim that certain of my subjects assigned an
active rather than passive interpretation to the NAP. Notably, I collected no evidence
from the post-judgment phase of the task that would suggest that subjects experienced
similar difficulties with the interpretation of the verb to hear that some had
experienced with the verb to watch. Instead, even younger subjects who provided
non-target-like judgments of NAP24 (i.e. “false”) appear to have simply assigned the
sentence an active interpretation, as suggested by the sample explanations illustrated
in (49), below:
(49)
Why was the elephant not heard by the dog?
a. “Cause + cause they didn’t + they + cause the elephant
didn’t hear them (= dog and bird) did he?” (female 3;6)
b. E: Did he (= puppet) say a silly thing? What did he
say? Do you remember?
S: “Dog umm the elephant heard the dog.” (male 4;6)
c. Response to test sentence: “The dog heard the
elephant. He (= puppet) said it the wrong way round.”
(female 7;3)
Finally, I observed three interesting examples of children apparently using passive
morphology to convey an active interpretation of the sentence, which are reminiscent
of certain data reported by Whitehurst, Ironsmith and Goldfein (1974) and Horgan
(1978). Horgan, for example, reported that her 2- to 4-year-old subjects produced
what she termed reversed reversible passives, as when a picture of a cat chasing a
girl was described as “The cat was chased by the girl” (ibid.:72). In (50), below, I offer
three similar examples obtained from my own study:
(50)
a. Response to test sentence ‘The snake was watched
by the rabbits’: “No + cause + the snake was watching
+ by the rabbits.” (female 5;3)
b. E: But why was that one wrong? He (= puppet) said
the elephant was heard by the dog.
S: “Well ++ well the elephant wasn’t heard by the dog.”
E: Did the dog hear the elephant in the story?
S: “Yeah.” (female 5;1)
(Note that the only plausible interpretation of this
subject’s “passive” sentence, above, is an active one.)
c. Response to test sentence: “No + + the umm umm + the
dog was heard by the elephant.” (female 3;11)
(Note that a passive reading of the child’s sentence
does not accord with the story details.)
Since two of the subjects quoted in (50) are over the age of five, it is important for me
to acknowledge that the findings reported here may not strictly match those reported
in the earlier studies, given that those studies involved only children below this age.
With regard to the first child quoted in (50) above, her response may simply represent
a speech error, since she gave target-like judgments of all four passive sentences,
including the item referenced above, and so in all other respects demonstrated target-like
knowledge of this structure. In the case of the latter two subjects, however, the
first gave target-like responses to all passive items with the exception of the
referenced item, while the second gave target-like judgments of only actional
passives. The latter two children thus displayed some knowledge of the grammatical
rules for passivization in English but nevertheless appeared willing to extend the
range of meaning associated with these structures.
In explaining her own subjects’ production of reversed reversible passives, Horgan
(ibid.:78) proposed that even young children recognize a distinction between
reversible and non-reversible passives, since ungrammatical reversal of agent and
object arguments, as in the production of reversed reversible passives, does not occur
when the sentence features semantically non-reversible arguments in the first place.
In particular, she hypothesizes that young children may produce reversed reversible
passives because they associate this form with expression of the general notion of
“mutual activity” rather than with alternative expression of agent-object relations.
Interestingly, however, I recorded no such errors in the post-judgment production data
that I collected for the two actional passives, which also involved semantically
reversible sentences. Thus I am led to speculate, after a suggestion offered by de
Villiers et al. (1982), that the presence of a nonactional or perceptual verb in a
passive item may increase the processing demands associated with a sentence of this
type. In particular, I believe that at least in the early stages of linguistic development,
it is reasonable to consider that processing demands would be greater for the
interpretation of a passive that, atypically, involves the syntactic promotion of an
experiencer object, rather than for one that involves the syntactic promotion of a
patient.
4.5.5
Comparison of group performance on the TC and DC
In this section, I compare group performance on the TC and on the DC and argue that
a meaningful correlation can be drawn between the two. I have chosen not to
compare TC performance with performance in the IR, OPC, or passive conditions,
however, given the lack of parity in the number of items tested. For example, while I
administered twelve TC test and control items, I administered only two IR and only
two OPC items. And, with regard to the passive, the four items I administered were
divided into two of the actional type and two of the non-actional type. Of course, the
difference between the number of items administered in the TC and in the DC
conditions is sizeable as well, since I tested only four in the latter case. Nevertheless,
by converting the scores obtained into percentages, I believe that a reasonable
comparison of performance across the two particular conditions can be made.
As previously reported, child subjects in groups 1 to 3 statistically performed as a
single group on TC items as well as on DC items. Notably, a similar finding was
obtained when performance in the TC and DC conditions was compared according to
a Wilcoxon signed ranks test, with no significant difference observed in the mean
percentage of object readings provided for either type of construction, when the
performance of subjects in the first three age groups was statistically analysed
(Z = -1.338, p < .181). For groups 4 and 5, however, a significant difference was obtained
when performance across the two conditions was compared, using the same test
(Z = -2.257, p < .024, for group 4 and Z = -2.848, p < .004, for group 5)44. Thus, the oldest
children in the study behaved like the adult controls, in that each group provided
significantly more object readings of the TC than of the DC. Figure 4.18, below,
provides a graphic illustration of group performance on the two constructions:
44
According to the results of the Wilcoxon test, the finding reported for group 4 holds only when
performance is compared between DCs and the adjusted set of ten TCs, with potentially problematic
items easy 2 and hard 6 removed. That is, when data obtained from the full set of TCs is used, no
significant difference is observed (Z = -1.559, p < .119). As I believe results based on the
administration of the set of ten items provide a more accurate picture of subject ability, I take the
statistical analysis of the reduced set of findings to be the more reliable and informative.
[Figure 4.18: line graph. Y-axis: mean % object readings (.2 to 1.0). X-axis: age group (3;4 to 4;4, 4;6 to 5;5, 5;6 to 6;3, 6;5 to 7;5, Adult). Constructions plotted: Degree NOS, TCs.]
Figure 4.18: Comparison of group performance on TCs and DCs
As Figure 4.18 illustrates, it is only after the age of 6;5 that children begin to treat the
TC and DC in a distinct manner, at least as regards the lack of availability of a subject
reading in the case of the TC. The findings reported above thus support my
contention that children initially treat both constructions as ambiguous, with a similar
bias towards the subject reading of each. Note that I do not assert, however, that
children assign the same structural analysis to the TC and the DC prior to the age of
six. Rather, my claim is only that prior to target-like acquisition of the TC, the child
assumes that both the TC and the DC are associated with two legitimate interpretive
options, one of which involves subject control of embedded PRO and the other a null
operator-gap dependency.
4.5.6
Comparison of individual performance across test conditions
In §4.5.0.1, I observed that of the forty-four subjects included in the study, three could
be classified as P-R Users, thirty as Intermediates and eleven as Passers on the basis
of their performance on the TC. In this section, I compare the performance of
individual children not only with respect to the TC but across all of the conditions in
the study. Looking first at the three P-R Users, Table 4.32, below, compares the total
number of target-like readings provided by each of these subjects in each of the six
experimental conditions. (Note that results for DC items are separately distinguished,
since these figures represent the number of object readings provided, as opposed to
the number of target-like readings provided.)
Subj. no.  Age   TC (n=10)  IR (n=2)  OPC (n=2)  AP (n=2)  NAP (n=2)  DC (n=4)
4          3;8   0          1         2          2         0          1
22         5;4   1          1         0          2         1          1
25         5;8   1          1         2          1         1          2
Table 4.32: Number of target-like (or object) readings per test condition – P-R Users
As Table 4.32 indicates, there is no condition in which the performance of these three
subjects was observed to be uniformly target-like. Nevertheless, I would argue that
the data reported above still provide no evidence that any of these three subjects
experienced an across-the-board impairment of their ability to interpret an NOS nor
any general impairment of their ability to interpret a displaced syntactic constituent.
As a case in point, subject no. 4, who by virtue of his age could be considered the most
likely of the three to possess limited syntactic ability, performed in a target-like
manner on the two PC items and provided at least one object reading of the DC.
Moreover, this subject proved exceptionally competent at explaining his judgments of
various test/control sentences. By comparison, subject no. 25, a girl, was shy and
therefore offered relatively fewer explanations of her judgments; even so, she, like
subject no. 4, performed well on the PC and provided two object readings of the DC.
Thus, the second subject is the only one of the three P-R Users for whom it can be
claimed that performance was not target-like in any of the NOS conditions. This
subject, a boy, had generally demonstrated good understanding of story details when
responding to follow-up comprehension questions but provided few explanations of
his judgments, even when requested to do so. It is possible that this particular child
did not fully understand the importance of his role as teacher to the puppet, given that
there were a number of occasions in which his judgment of the test/control sentence
appeared to be offered with little reflection and, in some cases, was subsequently
changed without explanation. In this respect, the performance of this subject does not
conform with that of his fellow P-R Users nor with that of the other participants in the
study, who, as noted in §4.5.0.2, generally proved able to provide appropriate
explanations of both target-like and non-target-like judgments of experimental items.
I thus prefer to consider this subject an exceptional case.
Overall, then, I contend that the data reported in Table 4.32 do not provide support for
the claim that the P-R User lacks the syntactic ability to interpret the TC and is
consequently forced to rely on standard word order cues as a determinant of
grammatical relations (cf. C. Chomsky 1969, Cromer 1970). The fully target-like
performance of two of these subjects on the AP speaks directly against their reliance
on the use of such an interpretive strategy. Furthermore, as earlier noted, two of these
subjects were over the age of five and accordingly beyond an age at which reliance on
the use of a non-grammatical strategy for sentence interpretation might be reasonably
assumed.
While it remains possible, as Goodluck (1991:98-9) has asserted, that child learners
find the derivation of the TC particularly challenging, the results reported above are
equally consistent with the hypothesis that the P-R User displays a strong interpretive
preference for the subject reading of the TC but nevertheless is not restricted in her
ability to derive the object reading. This proposal may seem controversial, given that
two of the three P-R Users in the study offered no more than one target-like reading of
the TC. Yet, as I earlier reported, four of the Intermediate subjects in the study were
classified as such strictly on the basis of having provided two target-like judgments of
the TC rather than one. Thus, I think it is reasonable to consider that the grammatical
ability of two, if not three, of the P-R Users referenced above is perhaps more
appropriately characterized as Intermediate, despite the strong preference that each
displayed for the subject reading of the TC.
Turning now to the performance of the eleven subjects in the study who were
classified as Passers, Table 4.33, below, lists the total number of target-like readings
provided by these subjects in each of the five experimental conditions, as well as the
number of object readings they provided for DC items:
Subj.  Age of    TC      IR     OPC    AP     NAP    DC
no.    subject   (n=10)  (n=2)  (n=2)  (n=2)  (n=2)  (n=4)
13     4;7       9       1      2      2      2      1
23     5;6       9       1      2      1      1      3
24     5;7       9       2      2      1      2      2
34     6;5       9       1      2      2      2      3
35     6;5       9       2      2      2      2      2
36     6;8       8(a)    2      2      2      1      3
37     6;10      9       2      2      2      1      2
38     6;10      10      0      2      2      2      3
41     7;2       10      2      2      2      2      2
43     7;4       10      2      1      2      1      4
44     7;4       9       2      2      2      1      3

Table 4.33: Total number of target-like (or object) readings per test
condition - Passers
(a) This child failed to provide responses for two of the ten TC items, and so the
score reported here represents eight target-like responses out of eight possible.
According to the results reported in Table 4.33, six of the eleven Passers (i.e. nos. 24,
35, 36, 37, 41, and 44) performed in a target-like manner on both PCs and IRs, while
the remaining five subjects (i.e. nos. 13, 23, 34, 38, and 43) demonstrated target-like
competence in only one of these two conditions. Therefore, it is only with respect to
the first set of six subjects that it can be claimed that all NOS, including the TC, ODC,
OPC, and IR, were consistently interpreted in a target-like manner. However, I note
that for the three subjects who made a single error on the IR, all made this error on
IR17, an item I previously identified as being associated with a disproportionate
number of errors (see the discussion in §4.5.2). Additionally, for two of these
subjects, nos. 13 and 34, the error on IR17 represented the only non-target-like
response provided in any of the experimental conditions. Finally, the poorest
performance reported for an individual Passer is that reported for subject no. 23, who
provided non-target-like responses for OPC, AP, and NAP items; nevertheless, this
particular subject’s performance clearly represents the exception rather than the rule
for those in the Passer category.
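
The split between the six and the five Passers described above can be re-derived mechanically from the IR and OPC columns of Table 4.33. The following Python sketch is offered purely as an illustration and was not part of the study; the names PASSERS and split_by_ir_and_opc are hypothetical, and only the IR and OPC scores are transcribed.

```python
# IR and OPC scores (each out of 2) transcribed from Table 4.33.
# The dictionary name and layout are illustrative assumptions.
PASSERS = {
    13: {"IR": 1, "OPC": 2},
    23: {"IR": 1, "OPC": 2},
    24: {"IR": 2, "OPC": 2},
    34: {"IR": 1, "OPC": 2},
    35: {"IR": 2, "OPC": 2},
    36: {"IR": 2, "OPC": 2},
    37: {"IR": 2, "OPC": 2},
    38: {"IR": 0, "OPC": 2},
    41: {"IR": 2, "OPC": 2},
    43: {"IR": 2, "OPC": 1},
    44: {"IR": 2, "OPC": 2},
}

CEILING = 2  # both items in a condition answered in a target-like manner


def split_by_ir_and_opc(rows):
    """Return (subjects at ceiling on both IR and OPC,
    subjects at ceiling on exactly one of the two)."""
    both = sorted(
        s for s, r in rows.items()
        if r["IR"] == CEILING and r["OPC"] == CEILING
    )
    only_one = sorted(
        s for s, r in rows.items()
        if (r["IR"] == CEILING) != (r["OPC"] == CEILING)
    )
    return both, only_one


both, only_one = split_by_ir_and_opc(PASSERS)
# Subjects who made exactly one error on the two IR items.
single_ir_error = sorted(s for s, r in PASSERS.items() if r["IR"] == 1)

print("Target-like on both IR and OPC:", both)
print("Target-like on only one of the two:", only_one)
print("Single IR error:", single_ir_error)
```

Encoding the table in this way makes it straightforward to check other per-subject tallies against the prose, should the remaining columns be added.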
In general, the data in Table 4.33 indicate that, for Passers, target-like performance on
the TC was not perfectly correlated with target-like performance in all of the
remaining conditions. However, because I have reason to believe that the error rate
reported in Table 4.33 may be artificially inflated by the use of a potentially
problematic test item, IR17, I think it is prudent that I remain cautious in making any
negative evaluation of the grammatical capabilities of these subjects.
Finally, as regards the performance of the thirty subjects classified as Intermediates,
who ranged in age from 3;4 to 7;3, I will not list data for individual subjects given the
sizeable number of children involved. There were only three in this group, all over
the age of 6;0, who gave all target-like readings of IRs, OPCs, APs, and NAPs.
Interestingly, then, the grammatical competence displayed by these three subjects on
various NOS did not correlate with target-like performance on the TC. There were
additionally eight Intermediate subjects who made only a single error on the IR, OPC,
AP, or NAP: Five of these errors were associated with an NAP item and two were
made on IR17. These are therefore fairly predictable types of errors, given the
relatively poor performance of all subjects on the NAP, and on IR17 in particular.
The remaining nineteen Intermediates were mixed in terms of the extent of their
non-target-like performance in each of the four non-TC conditions. The majority of errors
reported for these subjects occurred on either one or both NAP items, although there
were scattered instances of non-target-like responses in other conditions. The poorest
overall performance reported for any one individual Intermediate was for subject no.
5, aged 3;9, who failed all passive items and made a single additional error on OPC18.
Notably, however, this same child performed in a completely target-like manner on
IRs and also provided one target-like response for an OPC item; thus, even this
subject’s performance does not suggest an across-the-board impairment of her ability
to interpret an NOS.
In summary, my analysis of individual performance in the main study casts doubt on
the validity of two of the hypotheses I posed at the beginning of this chapter.
First, I found no necessary correlation between delayed acquisition of the TC and
delayed acquisition of other NOS. Second, delayed acquisition of the TC, even in the
case of the P-R Users in the study, cannot be explained in terms of a general difficulty
that children experience in their ability to interpret a syntactically displaced object
argument. Finally, I note that the findings reported in both the present and preceding
sections do not support the contention that NOS are syntactically complex and,
consequently, relatively late-acquired (cf. Goodluck and Behne 1992) nor the
contention that NOS share a similar structural analysis and, consequently, are
concurrently acquired. While my data cannot speak to the relative merits of different
syntactic analyses of NOS, I do think it is informative that a child demonstrating
competence in interpreting one such structure does not necessarily demonstrate a
similar competence in her interpretation of other NOS.
4.6 Post-test: BPVS
In Chapter 3, §3.2.1.0, I reviewed Cromer’s (1970) experimental study of TC
comprehension, in which he advanced the claim that verbal mental age (VMA), as
determined by the results obtained on the Peabody Picture Vocabulary Test (PPVT),
served as a more accurate means of predicting a child’s ability to interpret the TC than
the child’s chronological age. In particular, Cromer observed no direct correlation
between a subject’s chronological age and the subject’s classification as either a P-R
User, Intermediate, or Passer. For example, he reported that although all Passers in
his study were over the age of 6;7, there were also children above this age who
performed as Intermediates and even as P-R Users. In contrast, he reported the
following correlations between subject performance on the PPVT and on the TC
(ibid.:401, adaptation of Cromer’s Table 1)45:
Mental age on PPVT   P-R Users   Intermediates   Passers
(years:months)
2;11 – 5;7           17          10              0
5;9 – 6;6            0           8               0
6;8 – 10;8           0           1               5

Table 4.34: Performance on the TC correlated with mental age (Cromer 1970)
According to the classification of subjects reported in Table 4.34, there was no subject
with a VMA over 5;7 who performed as a P-R User, all subjects with a VMA between
5;9 and 6;6 performed as Intermediates, and all but one subject above the VMA of 6;8
performed as a Passer. Cromer reported that these trends were all significant beyond
the 0.001 level.
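
For readers who wish to see how an association of this kind might be checked against the counts in Table 4.34, the sketch below computes a Pearson chi-square statistic in plain Python. This is an assumption made for illustration only: it is not necessarily the test that Cromer used, and the names CROMER_COUNTS and chi_square_statistic are hypothetical. Several expected cell counts in this table fall below five, so the chi-square approximation is rough at best.

```python
# Counts transcribed from Table 4.34.
# Column order: P-R Users, Intermediates, Passers.
CROMER_COUNTS = {
    "2;11-5;7": (17, 10, 0),
    "5;9-6;6": (0, 8, 0),
    "6;8-10;8": (0, 1, 5),
}


def chi_square_statistic(rows):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            # Expected count under independence of VMA band and category.
            expected = row_totals[r] * col_totals[c] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


rows = list(CROMER_COUNTS.values())
statistic = chi_square_statistic(rows)
degrees_of_freedom = (len(rows) - 1) * (len(rows[0]) - 1)

# Critical chi-square value for 4 degrees of freedom at alpha = 0.001.
CRITICAL_VALUE_DF4_P001 = 18.467

print("chi-square statistic:", round(statistic, 2))
print("degrees of freedom:", degrees_of_freedom)
print("exceeds 0.001 critical value:", statistic > CRITICAL_VALUE_DF4_P001)
```

An exact test would be a more defensible choice for a table this sparse; the chi-square statistic is used here only because it can be computed without any external library.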
45 Cromer’s subjects were tested on the PPVT two months prior to their participation in his study of
TC comprehension. Therefore, when interpreting the significance of the correlations he reports, it is
important to recognize that the two abilities were not tested concurrently.
In considering the classification of my own subjects according to Cromer’s original
criteria, I have previously noted that the vast majority of my subjects (i.e. forty or
91%) would fall into the Intermediate category. However, when these criteria are
relaxed to allow a margin of error of one out of twelve items, as first suggested in
§4.5.0.0, then the breakdown of subject classification is as reported at the beginning
of this section: Three P-R Users, thirty Intermediates, and eleven Passers. Like
Cromer, I find that these three groups of subjects cannot be neatly divided in terms of
chronological age, as there is a considerable degree of overlap between the three
classes. Intermediates in my study, for example, included subjects between the ages
of 3;4 and 7;3, while Passers included those between the ages of 4;7 and 7;4. For this
reason, I was interested to test with my own subjects the validity of Cromer’s claim
that VMA serves as a better predictor of TC performance than chronological age.
Given time and scheduling constraints, I was able to administer the more recent
equivalent of the PPVT, the British Picture Vocabulary Scale (BPVS) (Dunn and
Dunn 1997), only after I had completed data collection in the pilot and main studies.46
For some subjects, this meant that no more than one week elapsed between the
completion of experimental testing and the administration of the BPVS. For some
others, however, a period of nearly two months separated the child’s participation in
the experimental study and my administration of the BPVS. Therefore, this
consideration must be kept in mind when one interprets the results obtained in the
post-test described here.
46 I administered the BPVS (1997) without the aid of an experimental assistant since the task requires
only that the child match spoken vocabulary items with graphic representations of a spoken word. As I
am a speaker of American English, a reasonable question could be raised as to whether administration
of the BPVS met certain stipulated conditions. The first of these was that vocabulary items should be
read in the first instance using the pronunciation characteristic of the prevailing local dialect and then,
in the second instance, using the pronunciation of standard British English. In order to comply as
closely as possible with these recommendations, I audiotaped both the local and standard pronunciation
of the vocabulary items and practiced both types of pronunciation before reading the items aloud in the
actual trial. Since I observed no instances in which a child reported difficulty in understanding the
vocabulary items as pronounced, I would contend that the recommended conditions for
administration of the test were satisfactorily observed.

According to the results I obtained, I do see some consistency in the relation between
BPVS score and subject performance on the TC. For example, all ten Passers in my
study had a VMA over 6;1; moreover, all children with a VMA of 8;0 or over could
be classified as Passers. However, it must be emphasized that I did not find that a
relatively high BPVS score served as a perfect predictor of target-like TC
performance. For example, of the three subjects who attained a BPVS score in the
extremely high range, two children, aged 3;8 and 4;2, could be classified as
Intermediates in terms of TC performance and only one, aged 4;7, a Passer.
Similarly, those attaining BPVS scores in the moderately high range included both
Intermediates and Passers.
In the case of the three P-R Users in the study, two had VMAs of 4;11 and one had a
VMA of 5;10. However, there were a number of other subjects with similar or lower
VMAs who could be classified as Intermediates on the basis of their TC performance
and, therefore, I did not find VMA to be a particularly reliable predictor of P-R User
status. Finally, as regards the thirty Intermediate subjects, VMA for these subjects
ranged from 3;6 to 7;5, which I note is roughly comparable to the findings reported by
Cromer in Table 4.34, where the VMA of all but one of his Intermediate subjects can
be seen to range between 2;11 and 6;6. Since I thus observed considerable variation
in VMA in the Intermediate group, I am reluctant to claim that any real correlation
exists between VMA and target-like ability to interpret the TC.
Finally, I used BPVS scores to address one particular methodological issue which
arose during subject selection for the pre-test. This concerned my desire to select
experimental subjects who were of average academic ability so as to ensure that, as
far as possible, I was investigating typical development of the ability to interpret
NOS. To this end, I solicited the assistance of teachers in identifying students of
exceptionally high or exceptionally low academic ability, who could thus be excluded
from participation in the pre-testing. However, my decision to administer the BPVS
as a post-test presented me with an opportunity to retrospectively evaluate the general
verbal ability, or verbal intelligence, of the child participants, abilities which can
serve to indicate a child’s general “scholastic aptitude,” according to Dunn and Dunn
(op.cit.:2).
Table 4.35, below, provides a breakdown of the performance of the forty-four
subjects on the BPVS. I have used the classifications provided in the test materials,
which are based on the use of standardized scores, to rank the vocabulary ability and
scholastic aptitude of subjects according to the performance of their age-matched
peers.47
Score range       Percentile Rank   No. of Subjs.
Low average       15 to 49%         8 (18%)
Average           50%               1 (2%)
High average      51 to 85%         26 (59%)
Moderately high   85 to 97%         6 (14%)
Extremely high    over 98%          3 (7%)

Table 4.35: Classification of experimental subjects in terms of
BPVS percentile ranking (Anderson 2002a,b)
The most striking finding reported above is that 80% of the participants in my study
scored above the 50th percentile on the BPVS. That is, 80% of the subjects could be
classified as ranging from high average to extremely high in terms of their vocabulary
(and scholastic) ability. This finding would therefore appear to suggest that, despite
the measures I took to select participants of average academic ability, my study
nonetheless included a disproportionate number of subjects with above-average
vocabulary skills.
47 The percentile rank reported in Table 4.35 represents the percentage of children of the same age in a
standardized sample who scored equal to, or below, the individual subject’s score. It therefore
represents a “deviation norm” since it serves as a measure of how much an individual subject’s
performance differs from that of an average group of age-matched peers (Dunn and Dunn 1997:16).

However, according to the distribution of scores reported for the sample of subjects
on which the BPVS was standardized, it is predicted that the majority of scores drawn
from any random sample (specifically, 68%) will fall within the low to high average
range. And as can be readily seen from the data reported in the table above, thirty-five
of my subjects, or 79.5% of the total, did in fact obtain scores in the low to high
average range. Nevertheless, this still leaves roughly 20% of the subjects with BPVS
scores that indicate higher than average vocabulary ability. While I am not unduly
concerned by the latter finding, I acknowledge that my subject group was not as
homogeneous as I had sought at the outset of the study. Therefore, I would advocate
prior testing of potential participants for their vocabulary ability, rather than
post-testing, in any future investigation of a subject’s ability to interpret NOS.
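
The percentages cited in connection with Table 4.35 follow directly from the subject counts in that table. As a purely illustrative check (not part of the original analysis), the arithmetic can be sketched in Python; the names BPVS_COUNTS and share are hypothetical.

```python
# Number of subjects per BPVS score range, transcribed from Table 4.35.
BPVS_COUNTS = {
    "Low average": 8,
    "Average": 1,
    "High average": 26,
    "Moderately high": 6,
    "Extremely high": 3,
}
TOTAL = sum(BPVS_COUNTS.values())  # all subjects in the main study


def share(*labels):
    """Percentage of all subjects falling in the given score ranges."""
    return 100 * sum(BPVS_COUNTS[label] for label in labels) / TOTAL


# Subjects ranked above the 50th percentile.
above_50th = share("High average", "Moderately high", "Extremely high")
# Subjects in the low to high average range.
low_to_high_average = share("Low average", "Average", "High average")
# Subjects ranked above the high average range.
above_average_range = share("Moderately high", "Extremely high")

print(TOTAL)
print(round(above_50th, 1))
print(round(low_to_high_average, 1))
print(round(above_average_range, 1))
```

Note that two different groupings of the table rows each contain thirty-five subjects, which is why the same proportion appears both for scores above the 50th percentile and for scores in the low to high average range.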
4.7 Conclusion
In summary, my analysis of both group and individual performance in the main study
casts doubt on the validity of the first of the hypotheses I presented in §4.0, which
attributed children’s delayed acquisition of the TC to the syntactic complexity of such
structures. This is because I found no necessary correlation between delayed
acquisition of the TC and delayed acquisition of other NOS, as would be anticipated if
this hypothesis were correct. I also found no evidence for concurrent acquisition of
NOS, again as would be predicted if NOS share a similar degree of syntactic
complexity. Furthermore, this latter finding does not provide support for the
well-accepted view that NOS share certain fundamental features of their syntactic analysis.
However, it is important for me to acknowledge that my data are not conclusive in
this regard, since the independent acquisition of NOS is also consistent with a
scenario in which it is not the syntactic structure of the various NOS that determines
the order of their acquisition but instead other factors which have yet to be
determined.
With regard to hypothesis (1b), which attributed children’s delayed acquisition of the
TC to deficient lexical knowledge of the tough adjective, I also found no evidence in
support of this hypothesis. First, my statistical analysis of group performance on the
TC revealed no effect of the choice of a particular tough adjective or adjectives on
children’s non-target-like interpretation of the TC. Rather, non-target-like responses
were distributed equally across sentences containing different tough adjectives.
Second, a post-test of my subjects’ vocabulary ability produced another finding
problematic for hypothesis (1b), which is that a child’s relatively advanced score on
this post-test did not necessarily correlate with his or her target-like performance on
the TC.
Finally, I pointed out that the target-like performance of even my youngest subjects
on the actional passive does not support the validity of the third of the hypotheses I
evaluated, which attributed delayed acquisition of the TC to a general impairment that
children experience in their interpretation of a syntactically displaced object
argument.
One of the additional goals I pursued in this chapter was to identify areas of interest
for future research. Given the limitations in both the number of test items I offered in
the OPC and IR conditions and the problematic aspects of the design of certain of
these items, I think that further study of children’s ability to interpret these two
structures is warranted, particularly as the results I obtained on the OPC stand in
conflict with the results reported by Goodluck and Behne (1992) and Goodluck et al.
(1995). I think it would also be interesting to investigate what form (or forms) the
child assigns to her non-target-like interpretation of the OPC/IR, as the data I
collected in the present study, while suggestive, are inadequate to fully address this
issue.
Ideally, I would also like to extend my assessment of children’s ability to interpret the
ambiguous DC. As earlier noted, my adult subjects produced only object readings of
one of the test items, DC16, but produced a balanced number of subject and object
readings of DC13. Although I have speculated on the basis of these results that the
choice of embedded verb can influence the interpretive preference that adult subjects
display for a particular test item, I submit that wider testing of the DC would be
required to validate this hypothesis and to establish whether children are subject to the
same parsing influences.
In the following chapter, I offer further analysis of the experimental findings reported
in the present chapter, detailing in particular the significance of these findings for
generative theories of language acquisition.