BMN ANGD A2 Linguistic Theory
Lecture 13: Language Acquisition 2
Theories of Language Acquisition

1 Introduction

There are very few ancient theories of language acquisition per se. Probably it is such a natural fact about humans that it did not come across as anything worth questioning. The Greeks did debate whether the words of a language were a natural ‘god-given’ aspect of the bits of the universe they refer to or whether they were invented by man, a debate which ultimately gave rise to the rationalist/empiricist divide on the question of the origin of knowledge. One might speculate as to whether these points of view led ancient philosophers to the claim that language was given to children by the gods, or invented by them as they went through the procedure. But to my knowledge, the question was never really put in these terms. Certainly, with ancient grammar mainly focussing on word meanings and the morphological properties of words, the problem of learning complex syntactic systems was not considered. We had to wait until the early 20th century before syntax came to be seen as a central part of grammatical investigation and it was recognised that there was a complex system that would have to be acquired.

Even then, however, theories of language acquisition tended to downplay the problem and so were entirely inadequate in accounting for the fact that children can and do solve it. For example, Bloomfield (1933), influenced as he was by Behaviourist Psychology, put forward the following proposal as to how language acquisition works. Children make noises from birth. Some of these noises approximate the language sounds of the child’s parents. Whenever the child produces such noises he or she is rewarded by parental attention (aaah! she just said mama!), which reinforces the behaviour of producing that sound.
In this way, appropriate language sounds become more likely to be part of a child’s output and the production of inappropriate sounds decreases, until such a time that the only sounds that a child makes are those of the parents’ language.

While this theory does make use of some real aspects of language acquisition, such as the fact that children do babble, it is clearly totally inadequate to account for even the simplest observations concerning what children actually do. We saw last week that children acquire an abstract grammar, not a set of sentences (nor indeed reinforced linguistic behaviour), and it is clear that such a system could not possibly be learned by the proposed method. The method requires the child to come out with all the relevant constructions in the first place, so that these can be reinforced by the parents, and the chances that this could happen accidentally are so close to zero as to be not worth considering. Thus the child would already have to possess the linguistic system that they were supposed to be learning in order to be able to learn it!

Nearly 25 years later, the Behaviourist B. F. Skinner published another account of language acquisition (Skinner 1957), making use of a more sophisticated model of language, the details of which we will not go into. More important is the fact that two years later Chomsky published a highly critical review of Skinner’s work (Chomsky 1959), pointing out that it still suffered from the same problems facing Bloomfield’s simplistic theory, though heavily disguised in what Chomsky saw as pseudo-scientific notions and terminology. The subsequent fall of Behaviourism and the popularity of Chomsky’s Syntactic Structures (1957) left Chomsky virtually unchallenged in his rationalist approach to language and language acquisition.

Mark Newson

2 The Innateness Hypothesis – early years

Of course Chomsky’s approach to language acquisition is the rationalist one: the child is born with something which specifically functions to acquire language.
It seems hard to avoid this conclusion given the facts of language acquisition: the difficulty of the task and the ease with which it is carried out. However, although Chomsky had defeated the strong empiricist ‘tabula rasa’ approach, and had argued convincingly in favour of an innatist position, he did not at the time put forward any particular theory. His view was that the theory of language acquisition that was most likely to succeed would be based on a successful theory of language, which was naturally the primary objective of linguistics. Thus while Chomsky argued that no account of language acquisition could succeed without reference to some inbuilt mechanism, which he strongly believed would have to be directly related to language and not just some general ability to learn, he was not really in a position to say just what that mechanism was.

Thus, the first attempts to provide a theory of language acquisition along Chomskyan lines were based on a rather vague notion of some kind of Language Acquisition Device (LAD) that the child came equipped with. As transformational grammar developed, and especially as mathematical linguistics became better understood, it was suggested that what the LAD did was to order the possible grammars, providing the learner not only with a set of hypotheses to be tested against the linguistic data presented, but also with a set path through these hypotheses, so that once a grammar was rejected, the learner would adopt the next hypothesis determined by the LAD. The model of acquisition that followed from these ideas was as in (1):

(1) data → LAD → learned grammar

In this way, we can see how this theory at least attempts to address the fundamental problem of language acquisition: how is it that the child is able to construct an abstract grammar when what they receive is only representative sentences from the language generated by the target grammar?
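The hypothesis-testing picture behind (1) can be made concrete with a toy sketch. The Python below is an illustration only and not a reconstruction of any published model: the ‘grammars’ are hypothetical sets of sentences, the function name learn is invented for the example, and the fixed order of the list simply stands in for the path through the hypotheses that the LAD was supposed to supply.

```python
from typing import Iterable, Sequence, Set

# Hypothetical toy "grammars": each is simply the set of sentences it generates.
# The order of the list plays the role of the LAD's fixed path through hypotheses.
GRAMMARS: Sequence[Set[str]] = [
    {"daddy see"},
    {"daddy see", "see baby"},
    {"daddy see", "see baby", "daddy baby"},
]


def learn(data: Iterable[str], grammars: Sequence[Set[str]] = GRAMMARS) -> int:
    """Return the index of the hypothesis held once all the data has been heard.

    The learner starts with the first grammar and moves on to the next one
    whenever the current grammar fails to generate a sentence heard so far.
    """
    current = 0
    heard = []
    for sentence in data:
        heard.append(sentence)
        # Reject hypotheses until one is consistent with everything heard so far.
        while not all(s in grammars[current] for s in heard):
            current += 1
            if current == len(grammars):
                raise ValueError("no grammar in the ordering fits the data")
    return current
```

With this ordering, hearing only daddy see leaves the learner at the first hypothesis, while hearing daddy baby forces it through to the third. Everything of interest is hidden in how the list of grammars is chosen and ordered.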
However, lacking much of the detail, there was at the time virtually no way to make this theory fit with observations about what children actually did. The LAD was some sort of ‘black box’: if one fed linguistic data into it, out popped a hypothesised grammar, but just what went on inside the box was little understood. Thus the theoretical approach to the study of language acquisition and the experimental approach, which investigated children during the process of acquisition, seemed miles apart with little chance of meeting up.

3 Chomskyan Inspired Theories of Acquisition

This is not to say that there were no attempts during this period to try to account for child language acquisition patterns using the kind of linguistics that Chomsky introduced. In 1963 Martin Braine proposed to account for the early two-word stage in terms of a simplified phrase structure grammar, which he called a Pivot Grammar. The idea is that children operate with a simplified set of categories, split into two, pivots (P) and open categories (O), and that sentences are constructed by combining these according to a simple phrase structure grammar:

(2) S → O P
    S → P O
    S → O O

So, for example, if the child had categorised the word see as a pivot and the words daddy and baby as open categories, the following sentences would be possible:

(3) a daddy see
    b see baby
    c daddy baby

The idea was that from this simple grammar the child was able to work up to a more complicated one in which there were more categories and more complex rules, including transformations.

This idea, however, although initially popular, suffered from a number of problems. First, it is not clear how entertaining a grammar which is clearly of a different nature to adult grammars is supposed to help the child eventually to acquire an adult system.
One would have thought that if human language is something which comes from having a human mind, children’s grammars should be of the same nature as adult grammars, and that switching radically from something which is nothing like an adult grammar would hinder rather than help the process. Moreover, it was also pointed out that even at this stage, the similarities between child and adult language are greater than one would expect if children’s grammars were so radically different. For example, the word orders that children use tend on the whole to be like those of the adult language. This is obviously difficult to account for if children and adults make use of very different grammatical systems. Indeed, the evidence, such as it is, seems to indicate that even at this early stage, children operate with similar categories to adults, at least in terms of nouns, verbs and adjectives. As we know, the functional categories seem to come ‘on line’ along with the syntax spurt at about two and a half years. This is again something that the adoption of a pivot grammar would fail to account for: why would there be a slow development of grammar from the two-word stage until about the age of two and a half and then a rapid expansion of grammar after this, coinciding with the development of functional categories, if children start with a pivot grammar and then (suddenly) abandon it in favour of something far more complicated?

4 The Effect of Constraints on Language Acquisition Theory

As generative theory underwent development during the 1970s, moving from construction-specific rules to a constrained system using more general principles, some notion of Universal Grammar became, for the first time, a tenable consideration. If rules are not directed at specific constructions in specific languages, then it can be proposed that the same rules are applicable to more languages.
For example, many languages seem to have a wh-movement process, which moves interrogative elements to the front of the clause, though there are some differences as to what moves and where.

A viable theory of Universal Grammar would of course make an innatist theory of language acquisition more specific. It could be claimed that children are born with the principles of Universal Grammar as part of their innate language faculty and it is this that aids them in their acquisition process. For one thing, an innate Universal Grammar imposes restrictions on what the child can hypothesise about the language they are attempting to learn. This in itself would greatly help in the learning process, as searching a restricted space of possible solutions for the right one is clearly an easier task than searching an unrestricted space. Universal Grammar defines what is a possible human grammar and therefore determines what is not possible and not to be hypothesised by the learner.

This also answers the learning problem raised by Gold’s work in formal learnability theory: Gold (1967) showed that the context-free and context-sensitive languages are unlearnable from positive data only (which seems to be what is available to children), given fairly generous assumptions about the learning situation and the conditions placed on the process. But the problem can be solved if human languages are not equated with the set of context-sensitive or context-free languages, but rather constitute a more limited set. Obviously, this is potentially what Universal Grammar provides. It remains to be seen whether the set of languages defined by Universal Grammar is learnable in Gold’s model, but clearly that is a difficult question to ask until we know more about what Universal Grammar is.

In a real sense, the notion of Universal Grammar also fleshes out what was vaguely reported as the LAD.
The general learning situation assuming an innate Universal Grammar looks as follows:

(4) data → Universal Grammar → particular grammar

The linguistic data provide the child with evidence about which of the options made available by Universal Grammar are made use of in the target language, and so the child is able to home in on the correct grammar.

Although the notion of Universal Grammar would be a step in the right direction for a viable innatist theory of language acquisition, it still remains to be shown how it could work. Obviously, Universal Grammar can only be seen as the basis of all human languages, not as what generates human language, as otherwise there would only be one language and this would not have to undergo any learning process at all. The fact that there is more than one possible human language and that children do undergo some period of language learning demonstrates that Universal Grammar must allow a certain degree of variation and that this variation is what makes learning necessary. We will discuss how this is done in the next section.

5 Principles and Parameters Theory

By the 1980s, grammatical theory had reached a stage at which grammatical rules were as generalised as possible: the X-bar principles of phrase structure and the movement rule Move α could hardly be any more general. Such is the nature of these general rules that we can take them to be a part of all languages: all languages have phrases which have heads, and these heads select complements, etc. in accordance with X-bar theory; all languages show some evidence of things moving from one place to another. Chomsky (1981) argued that these ‘rules’ are so general, and therefore so different from the kind of grammatical rules previously utilised in syntactic theories, that they should not be considered rules at all. Instead, he proposed that we term them ‘principles’. Principles are then what Universal Grammar is constructed from.
However, principles cannot account for language variation. It may be true that all languages make use of the notions ‘head’ and ‘complement’, but languages differ in terms of how these elements are related to each other syntactically. For example, in English heads always precede their complements, unless the complement is moved. This is true no matter what the head. So, nouns, verbs, adjectives and adpositions all have following complements (which is why adpositions are called prepositions in English):

(5) a student [of linguistics]
    b write [a letter]
    c fond [of chocolate]
    d under [the bridge]

In a language like Japanese, for example, the head always follows its complement (Japanese has postpositions):

(6) a [gengogakuno] gakusei
      linguistics(of) student
    b [tegami o] kaku
      letter-acc write
    c [hashi] shita
      bridge under

Note that English and Japanese are similar in some respects, as both have phrases headed by nouns, verbs and adpositions (and adjectives; I just couldn’t find an example for Japanese). Only the order differs:

(7) [NP [N' [N student] [PP of linguistics]]]
    [NP [N' [PP gengogakuno] [N gakusei]]]
    linguistics(of) student

Both of these structures conform to the same basic rules:

(8) X' → X YP or X' → YP X

Thus we might claim that while the X-bar principles are universal, languages differ in how these principles are realised and in particular in whether the head precedes or follows its complement. We might therefore suppose a general X-bar rule which makes no reference to order, saying that an X' contains the head and its complement, and then have a variant rule which says that heads come either first or last:

(9) X' → X, YP
    a) head is first
    b) head is last

The choice given in this rule is known as a Parameter: a variable part of a principle which may be set to one of a number of possible values. It is the existence of parameters, then, that gives rise to language variation.
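The division of labour in (9) between an order-free rule and a two-valued parameter can be illustrated with a small sketch. The Python below is only an illustration: the names Phrase, linearise and head_first are invented for the example, and the complement is treated as a flat list of words rather than as a full YP.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phrase:
    """An X' constituent: a head word plus the words of its complement."""
    head: str
    complement: List[str]


def linearise(phrase: Phrase, head_first: bool) -> str:
    """Spell out rule (9), X' -> X, YP, with the order fixed by the parameter."""
    if head_first:
        words = [phrase.head] + phrase.complement
    else:
        words = phrase.complement + [phrase.head]
    return " ".join(words)


# Examples taken from (5) and (6) in the text.
print(linearise(Phrase("write", ["a", "letter"]), head_first=True))   # write a letter
print(linearise(Phrase("kaku", ["tegami", "o"]), head_first=False))   # tegami o kaku
```

The structure itself (a head and its complement) is the same in both calls. Only the single Boolean value differs, which is the sense in which the parameter, and not the principle, carries the variation.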
The theory that follows from these assumptions is that Universal Grammar consists of a set of invariable principles which may come with a number of parameters. Each parameter has two or more possible values that it can be set to. Universal Grammar, then, is similar to any possible human grammar, without its parameters set. Any specific language is the result of setting the parameters in a particular way. Thus we have the following model:

(10) Universal Grammar + parameter settings = a specific language

It also follows that it is the parameter settings that require learning. In learning English, the child does not have to discover that phrases have heads and complements, as this is true for all languages and is a principle of Universal Grammar and therefore given innately, but they do have to find out whether English is a head-initial language or a head-final one. Thus they must learn how to set the ‘head parameter’. Presumably this is not particularly difficult: all it requires is hearing one instance of a head-initial phrase, such as see the ball, and the parameter can be set. Recall that this is the kind of thing that children show mastery of from a very early age.

6 Problems for Parameter Setting

One fairly obvious problem with the parameter setting theory of language acquisition is that it appears to make the problem of language acquisition so easily solved that it is hard to see why it takes children as long as five years to complete. If it is just a matter of hearing one relevant piece of data for each parameter to be set, then surely language acquisition should be almost instantaneous! Of course, on top of parameter setting, children have to learn lexical facts and this, to a large extent, must be rote learning, as the lexicon consists of entirely unpredictable information. But still, the evidence is that it is not just words that take time to learn: syntax also develops over time.
Recall, however, that there are two spurts in the acquisition process, one where vocabulary increases and basic word combinations start to take place and a second where functional categories are learned and more complex syntactic processes start to appear. It has been suggested that these spurts indicate that language acquisition is to some extent biologically triggered, similar to the way teeth develop or the way the body develops at puberty. Clearly these bodily changes are biologically determined, set to go off at some predestined time. The idea is that the development of language may also be subject to these kinds of maturational time lags and that the onset of certain linguistic notions is biologically determined rather than subject to a gradual learning process. If this is so, it would account for the sudden spurts that we see in child linguistic behaviour and it would further account for why parameter setting is generally spread out rather than being instantaneous.

What is needed is some detail about exactly what matures and so what is available (and unavailable) to the child at any particular moment in the acquisition process. It seems clear that what develops during the vocabulary spurt is some of the basic concepts which allow combinations of words, particularly the kind of semantic relationships needed for forming basic propositions. For example, the idea of predicate and argument seems to be missing at the one-word stage, but is present once two or more words start to be combined. The thematic words which take part in these combinations are also what are evident at this stage, and it is verbs, nouns and adjectives which predominate. During the syntax spurt it is the functional categories which seem to mature.
Current syntactic theory holds that it is the functional words which play a central role in the more complex aspects of syntax, and so it is not surprising that these syntactic processes are absent before functional words are present and that they should rapidly develop after the onset of these elements.

A more worrying criticism of parameter setting is that it virtually reduces an otherwise highly explanatory theory of language to mere description. This argument is based on the question of what counts as a parameter. Obviously, as parameters are the mechanisms for language variation, it is parameters which determine the possible range of linguistic variation. But in principle anything may be parameterised and any parameter can have any conceivable value. The head parameter we saw above is a simple case, as there are only two possible ways to order the head and the complement and both of these are realised in languages. But if there is no limit on what we can consider to be parameterised, we are free to assume that any difference between any two languages is to be captured by a specific parameter setting. The absurd limit of this would be for every language to be the result of unique parameter settings which are not found in any other language. Thus there would be a +Spanish parameter setting and a +Swahili setting, etc. This would be perfectly consistent with the notion of Universal Grammar under the view of principles and parameters theory, but it would leave the whole theory without explanatory force.

The response to this problem has tended to be to build into the notion of Universal Grammar some restrictive theory of parameters, though it has to be said that no such proposal has been particularly persuasive.
For example, one of the earliest ideas about parameters rested on the fact that they were embedded in a theory in which linguistic complexity arises from the complex interaction of simple and general grammatical modules. The idea was that parameters, though simple in themselves, could affect grammars in extensive and yet subtle ways by changing the conditions under which the modules interact. In effect then, a small change in a parameter setting could be enough to have wide and fairly drastic effects on the resulting language. This would account for why languages can seem to be so different from each other and yet be related to the same Universal Grammar. Note that this would also aid the acquisition process, as it means that the range of data that can be used to trigger a parameter setting is much wider.

Probably the best known parameter of this kind was the pro-drop parameter, which, amongst other things, determines whether or not a language can have an empty pronoun subject of a finite clause. English is a non-pro-drop language as its finite clause subjects have to be overt:

(11) a he sleeps
     b * sleeps

Hungarian is a pro-drop language as it can have a covert subject in a similar case:

(12) a ő alszik
     b alszik

Thus, there are two settings for this parameter: pro-drop and non-pro-drop. It was in fact more technical than this, resting on a certain property of the tense and agreement inflection of the language, but we need not go into these details. However, it was also claimed that the pro-drop parameter was responsible for other phenomena too (again, the technicalities will be avoided, but it should be noted that the facts did follow from the same properties of the inflection, and this wasn’t just a stipulated association as it might seem from the present presentation).
For example, the presence of pleonastic (meaningless) subjects in a language correlates with the language lacking null subjects in finite clauses:

(13) a it seems that John is ill
     b sembra che Gianni sia ammalato
       seems that John is ill

In English the it subject is necessary for a grammatical sentence, but in Italian, a pro-drop language, there is no such element. Another distinction between pro-drop and non-pro-drop languages is the ability to invert the subject with the VP. This is possible in Italian, for instance, but not in English:

(14) a ha telefonato Gianni
     b * has phoned John

Finally, certain long distance movements are possible in pro-drop languages, seemingly violating certain Island conditions, which are not possible in non-pro-drop languages:

(15) a chi credi che verrà
     b * who do you believe that will come

Thus, if we set the pro-drop parameter one way, we get a language which can drop pronominal subjects, which doesn’t have pleonastic subjects, which can invert subjects with VPs and which can move wh-elements over long distances. If we set the pro-drop parameter the other way, we get a language which has overt subjects in finite clauses, which has pleonastic subjects, which cannot invert subjects and VPs and which cannot move wh-elements over long distances. Note that the last fact is quite complex and the data is not likely to be the sort of thing that children will be exposed to. But all they need to know is whether the language has null subjects or not, and that will tell them whether wh-elements can move long distances or not; they will not have to hear sentences such as those in (15) to know this.

For us, the important point is that this idea may serve to limit the kind of thing that we can propose as a possible parameter. If all parameters are like the pro-drop parameter, affecting a wide range of linguistic phenomena, then we are not entitled to come up with a parameter to account for a single difference between two languages.
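The clustering claim can be summarised in a short sketch, in which one observed property fixes the parameter and the other three properties are then read off from it. The Python below merely restates the correlations claimed in the text for (11) to (15). The function and key names are invented for the illustration, and the technical basis of the parameter in the inflectional system is deliberately left out.

```python
from typing import Dict


def predicted_properties(pro_drop: bool) -> Dict[str, bool]:
    """Properties the text associates with each setting of the pro-drop parameter."""
    return {
        "null_subjects": pro_drop,            # (11)-(12)
        "pleonastic_subjects": not pro_drop,  # (13)
        "subject_vp_inversion": pro_drop,     # (14)
        "long_wh_movement": pro_drop,         # (15)
    }


def set_from_null_subject_evidence(heard_null_subject: bool) -> Dict[str, bool]:
    """A learner who only notices whether null subjects occur gets the rest for free."""
    return predicted_properties(pro_drop=heard_null_subject)
```

On this picture a child who hears a subjectless finite clause such as alszik needs no further evidence about (13) to (15). Equally, the whole cluster stands or falls with the claim that these four properties really do vary together across languages.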
Unfortunately, this programme did not really succeed, in that the pro-drop parameter, although a spectacular example of what parameters could do, seemed to be almost unique in this respect. Most parameters that were proposed were like the head parameter, which just deals with the order of heads and complements and plays no role in other linguistic phenomena. Moreover, as further languages were investigated, even the pro-drop parameter was eroded, as it seems that there are pro-drop languages which do have expletive subjects or which cannot invert subjects and VPs, and there are non-pro-drop languages which can move wh-elements over long distances. Thus, the only thing the pro-drop parameter really determines is whether the language allows null subjects or not. Hence the parameter is only associated with one observable linguistic property.

A more successful restriction follows work by Borer (1984), which showed that many linguistic differences between languages can be put down to properties of their functional categories. Just as the difference between pro-drop and non-pro-drop languages seems to be correlated with how rich the inflections of the language are (pro-drop languages have rich inflectional systems and non-pro-drop languages have poor inflectional systems: compare English and Hungarian in this respect), Borer demonstrated that other differences between languages could be put down to properties of other functional elements. Hence, if parameters are what deal with language variation, it seems that parameters are limited to properties of functional categories such as inflections, determiners and complementisers. From this perspective, apart from superficial differences, all languages are the same with respect to their argument-predicate structures: the equivalent of the verb smile in any language will take just one argument, whereas the equivalent of give will take three.
The Functional Parameterisation Hypothesis works well for many linguistic differences, but it is difficult to see how differences such as the head parameter can be reduced to the properties of functional elements, as it obviously affects the order of non-functional elements such as nouns and verbs. Therefore even if some parameters are limited to the properties of functional elements, it seems that we also need other parameters which are not, and hence this is not a fully restrictive theory.

7 A brief word about acquisition theories from other grammatical perspectives

In Chomsky’s latest thinking about language, the Minimalist Programme, linguistic differences have been reduced to whether elements move overtly or not and thus parameterisation is restricted to this distinction alone. It has to be said, however, that the application of this theory is not as wide as that of the theories of the 1980s (e.g. Government and Binding theory) and so it is not at all clear whether this assumption can be made compatible with all observable linguistic differences between languages.

Optimality Theory (OT), on the other hand, has come up with an alternative to Parameter Setting as a theory of linguistic variation and acquisition. In OT, constraints conflict with each other, in that they require opposite things of a grammatical structure. The conflict is settled by ranking the constraints, so that in cases of conflict the higher ranked constraint will be conformed to and the lower ranked one violated. The point is, then, that it is the ranking of the constraints that determines what is grammatical in a language and therefore if the ranking differs, different things will be grammatical. Ranking therefore accounts for linguistic variation. It follows that it is constraint ranking that must be learned by a child acquiring the system: the constraints themselves can be assumed to be given by Universal Grammar.
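How ranking alone can produce variation may be easier to see in a small sketch. The Python below is a toy illustration under stated assumptions, not an analysis from the OT literature: the two constraints head_left and head_right are hypothetical, candidates are two-word strings with the head written in capitals, and the function optimal simply compares violation counts in ranking order.

```python
from typing import Callable, Sequence

Constraint = Callable[[str], int]  # maps a candidate to its number of violations


# Two hypothetical, deliberately simplified constraints over two-word phrases
# in which the head is written in capitals, e.g. "WRITE letter".
def head_left(candidate: str) -> int:
    """Violated once if the (capitalised) head is not the first word."""
    return 0 if candidate.split()[0].isupper() else 1


def head_right(candidate: str) -> int:
    """Violated once if the (capitalised) head is not the last word."""
    return 0 if candidate.split()[-1].isupper() else 1


def optimal(candidates: Sequence[str], ranking: Sequence[Constraint]) -> str:
    """Pick the candidate whose violation profile is best under the ranking.

    Profiles are compared position by position in ranking order, so a single
    violation of a higher ranked constraint outweighs any number of
    violations of lower ranked ones.
    """
    return min(candidates, key=lambda c: tuple(con(c) for con in ranking))
```

Given the candidates WRITE letter and letter WRITE, ranking head_left above head_right selects the first and the opposite ranking selects the second. The constraints are identical in both cases, so what distinguishes the two ‘languages’, and what a learner would have to discover, is the ranking alone.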
Formally this is a very different kind of acquisition process to Parameter Setting and it can be investigated in different ways. There are interesting results in this area, but it would take us too far from our current concerns to go into any details.

8 Conclusion

It is very difficult not to believe in an innate ability in children to learn language. Chomsky has argued forcefully that this innate ability must be specific to language and not a kind of general ability to learn well (how could infants learn whether or not long distance wh-movement is possible in their language when the data needed to figure this out is far more complicated than could reasonably be expected to be available to a child?). However, it was only once we had a reasonable idea of what Universal Grammar was that a realistic innatist theory could be presented. While this is still far from perfect, and obviously the theory of Universal Grammar and the dependent theory of language acquisition change as linguists’ views on language develop, the notion that came about in the 1980s was enough for the theory to make predictions about what actually goes on in the learning procedure and as such, for the first time, there was real contact between theoretical linguists and child language acquisition specialists. The parameter setting theory spurred a great many investigations into the details of how children learn specific linguistic phenomena and in turn this research informed linguistic theory about what was a viable theory and what was not. Although of late, with the Minimalist Programme retreating into more theoretical questions, the contact between linguists and acquisitionists has not been quite so productive, principles and parameters theory has shown what is possible when linguistic theory is able to develop in this direction.

References

Bloomfield, Leonard 1933 Language, Holt, Rinehart and Winston, New York.
Borer, Hagit 1984 Parametric Syntax, Foris, Dordrecht.
Braine, Martin D. S. 1963 ‘The Ontogeny of English Phrase Structure: the First Phase’, Language, 39, 1–13.
Chomsky, Noam 1957 Syntactic Structures, Mouton, The Hague.
Chomsky, Noam 1959 ‘Review of B. F. Skinner’s Verbal Behavior’, Language, 35, 26–58.
Chomsky, Noam 1981 Lectures on Government and Binding, Foris, Dordrecht.
Gold, E. M. 1967 ‘Language identification in the limit’, Information and Control, 10, 447–474.
Rizzi, Luigi 1982 Issues in Italian Syntax, Foris, Dordrecht.
Skinner, B. F. 1957 Verbal Behavior, Appleton-Century-Crofts, New York.