BMN ANGD A2 Linguistic Theory
Lecture 6: Generalisation
Mark Newson

1 Generality and Explanation

To set the scene for this lecture I would like to compare the first transformational theory of Chomsky (1957) with the Extended Standard Theory of the 1970s. Recall that in the 1957 theory there was a set of phrase structure rules which generated a core set of ‘kernel’ sentences and then a set of transformations which were responsible for generating all other sentence types. This essentially put the rules of the grammatical system in a one-to-one correspondence with the linguistic phenomena that the grammar was attempting to account for: a core set of rules corresponded to a core set of sentences and extra rules corresponded to extra sentences:

(1) Grammar      Language
    PS rules     kernel sentences
    T1           sentence type 1
    T2           sentence type 2
    T3           sentence type 3
    T4           sentence type 4
    …            …

In such a system the grammar merely reflects the linguistic phenomena and is therefore just as complex as the object it addresses. It is clear that in such a situation the grammar does no more than describe the data. Moreover, the situation above describes just a single language, and we would need a similar model for every existing and indeed every possible human language. Very little understanding is therefore gained concerning the nature of possible human languages, or concerning how linguistic systems arise in the individual (i.e. how language acquisition is possible). Essentially the only limits imposed on the system are empirical in nature: the set of observed sentence types limits the grammatical rules needed to account for them. In principle, then, the grammar could contain any conceivable rules and a child faced with the task of learning a grammatical system could entertain any conceivable system. In other words, this view of the linguistic system does not help us to understand how language acquisition is possible: if a learner can hypothesise any conceivable system, how are they able to narrow their hypotheses down to the correct one?

Now compare this to the Extended Standard Theory, in which the phrase structure rules are not restricted to forming a subset of sentences, but are relevant to all sentences of the language, albeit at a deeper level of analysis, and there is a restricted set of transformations which are not associated with just one sentence type, but with a whole set of sentence types:

(2) [Diagram with two sides, Grammar and Language; the Grammar side contains PS rules, constraints, and the transformations T1, T2, T3]

Note that the structure of the grammar does not simply reflect that of the language and, moreover, while the language may involve fairly complex phenomena, the grammar remains relatively simple. In this way the grammar is more than just a translation of the linguistic phenomena into descriptive rules: it goes some way towards improving our understanding of the phenomena themselves. The other advantage of this kind of grammatical system is that, as the rules are not specifically related to language-particular structures but to something far more general, they may have greater application to other languages, and as such they represent something far more indicative of the basic nature of human languages as a whole. Obviously languages differ from one another, but from this perspective, given that each rule is associated with a wide range of phenomena, it might be that small alterations in the grammar result in wide-ranging differences in observable languages. If what has to be learned is limited to such small differences in the grammatical system, the whole question of how language can be acquired is more easily answered.
The grammatical system itself imposes huge restrictions on the notion of a possible human grammar and as such the task of language learning is much simplified.

What the above discussion illustrates is the relationship between the generalisation of linguistic rules and the explanatory content of the theory within which those rules are stated. In a nutshell, the more general the grammatical system, the more explanatory the theory. In this lecture, we will discuss how the developments that took place in the 1970s along the lines sketched above culminated at the beginning of the 1980s in a theory which attained an unprecedented level of explanation. We will concentrate on two grammatical areas, the phrase structure rules and the transformational component.

2 Phrase Structure and X-bar theory

Phrase structure rules were Chomsky’s formalisation of the Structuralist idea of Immediate Constituent Analysis. From the start they were fairly descriptive devices capable of modelling any possibility for constituent structure. So if an NP contained a determiner followed by a noun, this was just as easy to model as if an NP were to contain a verb followed by a preposition:

(3) a  NP → Det N
    b  NP → V P

The fact that only one of these possibilities is actual in any language is then, as far as the basic theory of phrase structure based on such notions is concerned, pure accident. Given that this observation is clearly no accident, the theory cannot explain the facts.

The structuralists had noted that some phrases are replaceable by one of their constituents, or in other words that a part of the phrase can function as the whole. So, to take a famous example from Bloomfield (1933):

(4) a  poor John ran away
    b  John ran away

Here the phrase poor John is replaced by John and hence the single noun can function as the whole phrase, indicating the noun’s centrality within the phrase.
The term head was given to such central elements, and phrases which had heads were called endocentric. Not all linguistic constituents were endocentric, however. The most obvious unit that cannot be replaced by one of its constituents is the sentence:

(5) a  poor John ran away
    b  poor John
    c  ran away

While (5a) has the status of a sentence, neither (5b) nor (5c) does. Nor does any other constituent of (5a) for that matter. Hence it was proposed that sentences are exocentric, lacking a head.

It is clear, however, that a phrase structure rule of the form X → Y Z cannot capture the notion of a head, and hence endocentric and exocentric structures are given the same treatment:

(6) NP → Adj N
    S → NP VP

While we can say that the noun is the head of the noun phrase in the first rule, there is nothing in this rule that informs us of this apart from the apparently descriptive accident that the phrase is labelled with the same category symbol as the noun. Given that in principle (3b) is a possible phrase structure rule and there is no head of this NP, we can see that phrase structure rules are not capable of capturing the notion of endocentricity nor the connected notion of a head.

After more than ten years of working with phrase structure rules, Chomsky (1970) proposed a revision to the phrase structure component of the grammar, which has since become known as X-bar theory. Chomsky’s proposal was not particularly detailed and was virtually tacked on to the end of a paper about the difference between derived nominals and gerunds, which need not concern us here. The main point of the proposal was to capture certain similarities between certain elements and their dependants.
Starting with verbs, which had traditionally been subcategorised in terms of their following dependants (transitive verbs having objects, prepositional verbs having prepositional complements, etc.), Chomsky observed that nouns and adjectives, even those not derived from verbs, can also be subcategorised along similar lines:

(7) a  treat [with penicillin]
    b  treatment [with penicillin]      book [about linguistics]
    c  treatable [with penicillin]      fond [of chocolate]

The phrase structure grammar would therefore seem to contain the following rules:

(8) VP → V Comp
    NP → N Comp
    AP → A Comp

where ‘Comp’ stands for the dependent phrase or ‘complement’. Clearly there is a generalisation to be had here which is entirely missed by stating these three separate rules. In each case the phrase is endocentric and the head precedes the complement. Chomsky’s proposal was to capture this generalisation by using a category variable X to represent the head and the category of the phrase it heads:

(9) X' → X YP

The symbol X' (pronounced X bar) represents a phrase whose category is determined by the head and thus this system is able to capture the notion of an endocentric constituent. Chomsky also saw the need to include material which preceded the head, such as determiners, auxiliary verbs and degree adverbs:

(10) a  the book
     b  have gone
     c  so tall

He termed these elements specifiers and proposed the following rule:

(11) X" → Spec X'

The category X" (pronounced X double bar) represents the full phrase and hence phrases generated by these rules have the following structure, given here as a labelled bracketing:

(12) [X" Spec [X' X YP]]

Although Chomsky did not mention prepositions in the (1970) paper, it is clear that these can also be included in the set of things that X in the rules (9) and (11) can range over. Soon after, then, X was taken to range over N, V, A and P.
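The way a single category variable stands in for a family of category-specific rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of Chomsky’s proposal: the helper names `bar`, `instantiate` and `format_rule` are hypothetical, and the category list simply follows the remark that X ranges over N, V, A and P.

```python
# Illustrative sketch only: the names below are hypothetical helpers,
# not part of any published formalism.

CATEGORIES = ["N", "V", "A", "P"]  # assumed range of the variable X


def bar(category, level):
    """Return the category symbol with `level` bars, written as primes."""
    return category + "'" * level


def instantiate(category):
    """Fix X to `category` in the two schemata, rules (11) and (9)."""
    return [
        (bar(category, 2), ["Spec", bar(category, 1)]),  # (11): X'' -> Spec X'
        (bar(category, 1), [category, "YP"]),            # (9):  X'  -> X YP
    ]


def format_rule(rule):
    """Render a (left-hand side, right-hand side) pair as 'LHS -> RHS'."""
    lhs, rhs = rule
    return f"{lhs} -> {' '.join(rhs)}"


rules = [format_rule(r) for c in CATEGORIES for r in instantiate(c)]

for line in rules:
    print(line)
```

Two schemata yield eight category-specific rules here, and in each of them the label of the phrase is built from the category of its head, which is exactly what a rule like (3b) fails to guarantee. The double bar is written as two primes in the code only to keep the string handling simple.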
In the 1970 paper, it was never really stated what the status of the rules in (9) and (11) was supposed to be, but in the period that followed they were generally assumed to be a kind of template for possible phrase structure rules. For example, this was how they were taken in Jackendoff (1977), which was one of the major works of the time on the subject of X-bar syntax. The reason for this was that there still seemed to be phenomena which were idiosyncratic to certain phrases and, as X-bar rules do not distinguish between phrases of different types, specific rules for NPs, VPs, etc. were still needed. For example, it is well known that while verbs and prepositions can take NP complements, nouns and adjectives cannot:

(13) a  tell him
     b  to him
     c  * picture him      picture of him
     d  * capable it       capable of it

This cannot be lexically determined, as it is not specific to particular lexical items but is true of a whole category. The only place that such generalisations can be captured is therefore in the grammar, and the phrase structure component seems to be the best place for them. Hence there was a need for rules such as the following:

(14) V' → V NP
     P' → P NP
     N' → N PP
     A' → A PP

Given that these rules conform to the pattern set down by the X-bar rule in (9), they were considered valid under the view that the role played by (9) was to license possible phrase structure rules of a more specific nature. In a sense, the relationship between the X-bar rules and phrase structure rules is similar to the relationship between transformations and constraints on transformations.

Although this position is more restricted and general than that which held prior to 1970, it still isn’t entirely satisfactory, as there are still construction-specific rules, which are therefore of a descriptive nature.
There are no explanations in the rules in (14), for example, for why nouns and adjectives in English cannot have NP complements. However, there were a number of developments which enabled steps to be made towards a greater generalisation of the system, ultimately allowing rules such as (14) to be eliminated from the grammar.

Recall the Case Filter from last week. This is a filter that controls the surface distribution of NPs, forcing them to occupy Case positions. The data in (13) indicate that the complement position of nouns and adjectives is not a Case position. Under this assumption, the fact that nouns and adjectives cannot take NPs in their surface complement positions is accounted for independently and does not have to be stipulated in terms of specific phrase structure rules. Stowell (1981) proposed that all phenomena that necessitate category-specific phrase structure rules can be accounted for by independent principles of the grammar, and hence that the phrase structure part of the grammar can be eliminated entirely, leaving only the general X-bar rules to deal with the basic structural properties of a language. The theory which emerges looks like the following:

(15) X-bar Theory, Lexicon
              ↓
         D-structure
              ↓   Transformations, Constraints
         S-structure   ←   Case Theory

However, at this stage of the theory there still remain at least two structures that are exocentric and hence stand outside X-bar theory altogether, S and S', and a large number of elements, such as determiners, auxiliaries and complementisers, which are not associated with phrases and so are also not part of the X-bar system.
It is interesting to note that at this point X-bar theory was seen as a theory of the structure of certain linguistic elements, namely nouns, verbs, adjectives and prepositions. These were traditionally referred to as the major categories, but are also called the lexical or thematic categories, whereas the non-X-bar elements were what are traditionally termed minor categories, though these days they are more often called functional categories. Clearly the traditional names ‘major’ and ‘minor’ reflect the attitude that the thematic elements are somehow more important, and this attitude seemed to prevail in X-bar theory too. It is quite obvious, however, that this view is guided by the fact that the thematic elements carry the larger part of the semantics whereas the functional elements have only a secondary role to play in terms of meaning. While this was the kind of thing that tended to influence traditional grammars, which tended to be meaning-centred, it should not have been a factor in modern grammar, which has, since the structuralists, maintained that syntax and semantics are separate (if related).

It wasn’t until the early part of the 1980s that theoretical interest turned to the functional categories and more importance began to be placed on these elements. For example, Borer (1984) proposed that the most important linguistic differences can be traced to differences in the properties of functional categories, and hence it is these that define a particular language and these properties that children have to learn when acquiring language.

The first changes reflecting this newly acquired interest in functional elements came with the analysis of the clause.
Recall that this was analysed as an exocentric structure, as in the following labelled bracketing:

(16) [S' COMP [S NP INFL VP]]

COMP represents the position of the complementiser (not to be confused with Comp, the complement position in an X-bar structure) and INFL (for ‘inflection’) represents the position of the tense element (including modal auxiliaries and the infinitive marker to). Also recall that S' (pronounced ‘S bar’) was an independent notion from X-bar, indicating that this element is clausal in nature, but different from S.

The possibility that the clause was not exocentric was considered in the 1970s and there seemed to be two possibilities for the choice of the head. Jackendoff (1977) championed the idea that the verb was the head of the clause, making S categorially a VP. The other view was that the inflection was the head of the clause, though it was not until the 1980s that this was taken seriously and the following representation appeared (in Stowell (1981), for example), shown here beside the earlier S structure:

(17) [S NP I VP]        [IP NP [I' I VP]]

This suggestion actually kills two birds with one stone, as not only does it provide an X-bar analysis for the S node, but it also treats the inflection as a category capable of being a head. It also makes the head of the clause a functional element, which fits nicely with the growing interest in the syntactic role of functional categories.

From this move it seemed to follow naturally that S' should also fall under an X-bar type analysis, and the obvious move was to consider the complementiser as its head, again bringing another functional element into the realm of X-bar theory:

(18) [S' C S]        [CP (wh) [C' C IP]]

Recall that the wh-element, fronted in interrogatives, was not assumed to occupy the COMP position, but to be adjoined to it. The CP analysis confirms the independence of the complementiser and the wh-position, but here the wh-element is moved to the specifier position.

The final functional element to fall to an X-bar analysis was the determiner.
Fukui (1986) and Abney (1987) both provided an analysis of what had traditionally been taken to be a phrase headed by the noun and argued that the real head was the determiner:

(19) [NP [Det the] [N' [N picture] [PP of Mary]]]
     [DP [D' [D the] [NP [N' [N picture] [PP of Mary]]]]]

Although this is not the place to try to justify these analyses, it is worth pointing out that even by structuralist criteria taking the determiner as the head of the ‘NP’ has its justification. Recall Bloomfield’s notion of a head being a part of a phrase that can function as the whole. In Bloomfield’s example poor John, John was identified as the head, but suppose we consider another example: that dog. Clearly dog cannot function as the whole phrase here, but that can:

(20) a  he patted that dog
     b  * he patted dog
     c  he patted that

From a structuralist point of view, this provides contradictory evidence for the head of this kind of phrase, and hence more sophisticated argumentation is necessary to decide whether the head should be taken to be the noun or the determiner. Abney provides such argumentation and concludes in favour of the determiner.

To conclude this section, we have seen how a process of generalisation has directed the development of the part of the grammar which attends to basic structural issues. Starting with phrase structure rules, which were construction-specific and rather descriptive in nature, a series of developments has led to the position in which there are just two phrase structure rules:

(21) XP → Spec X'
     X' → X YP

These are clearly not construction-specific and, to the extent that they account for all aspects of phrase structure, can be seen to attain high levels of explanatory adequacy. For example, these rules can be assumed to underlie the phrase structures of all human languages and hence pose no particular problem for learning: they can be assumed to be universal and hence part of the innate linguistic system.
Languages may differ, for example in terms of whether the head precedes or follows its complement, but the general rule that the head and the complement form a constituent (X') is universal. Thus the amount of learning involved in this aspect of language is minimal and can be done on the basis of exposure to quite simple data.

3 Transformations

We have seen how transformations started off as structure-specific rules in the 1960s and as a result of the addition of constraints became far more general. Indeed, by the end of the 1970s it was generally accepted that there were two main transformations, one for moving NPs into subject positions and one for moving wh-elements into COMP. Besides these there were a number of other transformations which seemed to be of a stylistic nature and were generally optional, so perhaps belonging to an entirely different part of the grammar. The two major movements, NP-movement and Wh-movement, could not be further reduced as it appeared they were subject to different conditions: NP-movement being restricted by the Tensed S Condition and the Specified Subject Condition and Wh-movement being restricted by the Crossover Conditions (see lecture 4). Perhaps this was not an impossible situation, but it still raised questions that could only be solved in a stipulatory manner, such as why there are these two transformations and why they have the particular properties that they do.

Once again, developments in other parts of the grammar proved helpful in overcoming these problems. Specifically it was the development of trace theory that allowed the final step in the generalisation of the transformational component. We will go into the details more fully next week, but the observation was that the traces involved in structures formed by various movement phenomena had different properties.
Consider a straightforward case of NP-movement, for example:

(22) a  John₁ seemed [ t₁ to be rich]
     b  * John₁ seemed [ t₁ is rich]
     c  * John₁ seemed [ Mary to like t₁]

We see that an NP is allowed to move out of the subject position of a non-finite clause, but not out of a finite clause (violating the Tensed S Condition) or out of an object position (violating the Specified Subject Condition). This pattern is repeated in phenomena concerning the referential properties of certain pronouns:

(23) a  John₁ believes [himself₁ to be smart]
     b  * John₁ believes [himself₁ is smart]
     c  * John₁ believes [Mary to like himself₁]

In (23), we use indices to indicate the referent of the pronoun. What we see is that a reflexive pronoun in the subject position of a non-finite clause can refer to an element in the dominating clause, but not if it is in a finite clause or in object position. These data can be handled if we assume that the traces in (22) are subject to the same grammatical principles as the pronouns in (23), or in other words, that these traces and pronouns form a coherent class of elements, known as anaphors. With this assumption, these restrictions can be factored out of the movement process altogether: whatever the more general conditions are that determine the properties of anaphors, and so account for the Tensed S Condition and the Specified Subject Condition, they are not to be taken as constraints on particular movements at all.

A similar move can account for Crossover phenomena too.
Recall that a wh-element is not allowed to move over the top of a coreferential element:

(24) a  who₁ t₁ said [he₁ likes Mary]
     b  * who₁ did he₁ say [t₁ likes Mary]

In this case we cannot liken the behaviour of the trace to that of a reflexive pronoun, as replacing the trace with such a pronoun gives an ungrammatical sentence in both cases:

(25) a  * himself₁ said [he₁ likes Mary]
     b  * he₁ said [himself₁ likes Mary]

However, if we replace the trace with a full referential NP we get the required result, so in this case it seems that the trace behaves like a referential expression:

(26) a  John₁ said [he₁ likes Mary]
     b  * he₁ said [John₁ likes Mary]

Again, by factoring these conditions out of the movement process altogether and imposing them as restrictions on types of traces, we no longer need to claim that there is a type of movement which is restricted by a specific constraint.

The part of the grammar which was developed to account for the referential properties of elements such as reflexive and personal pronouns and referential expressions such as names was called Binding Theory, and its conditions act like filters applying at S-structure, defining possible referential interpretations for the relevant elements. Seeing traces as having referential properties and hence being subject to the principles of Binding Theory was one of the major developments which allowed apparently structure-specific restrictions to be factored out of transformations entirely.

Because of these developments, the transformational component was ultimately reduced to a single transformation. This transformation did not have to stipulate what element had to move, as this was determined by independent considerations such as the Case Filter, which forces NPs to move out of Caseless positions, for example. Nor did it have to stipulate where the element had to move to, as the Case Filter required Caseless NPs to move to Case positions.
In fact all that was required of the transformational component was the statement that things can move, and all the specific details of particular movements, which had previously been encoded in the transformations themselves, were factored out to other independently motivated parts of the grammar. The transformation required was then:

(27) Move α
     Move anything anywhere

Obviously this is the most general transformation there could possibly be, something that might easily be part of an innate system. Given that in most languages there is some indication that some elements undergo movement processes, it is a universal aspect of languages too. Of course, there are differences in what moves where in languages: not all languages move wh-elements to the front of the clause in interrogatives, for example. But as these facts are not specific to the transformational component, but to other aspects of the grammar, they do not need to be encoded in the transformation rule itself. The grammar we end up with is as follows:

(28) X-bar Theory, Lexicon
              ↓
         D-structure
              ↓   Move α, Constraints
         S-structure   ←   Case Theory, Binding Theory

4 Conclusion

In the grammatical system represented in (28) we can start to see the beginnings of the theory that developed in the 1980s, known as Government and Binding theory. One of the most obvious features of this system is its modular nature: there are independent grammatical modules, each of which addresses specific grammatical phenomena, such as X-bar theory addressing basic structural issues and Binding Theory addressing referential phenomena, and each of which contains a small number of general grammatical principles. The complexity of the linguistic system does not stem from the complexity of the grammatical rules themselves, but from the complex way these rules interact with each other.
For example, Case theory imposes a simple restriction on S-structure, that all NPs/DPs must be in Case positions, and this in turn places requirements on movement, forcing elements to move out of Caseless positions into positions where they can get Case. The movement component itself, however, simply permits movements of any kind, but only those which serve to satisfy the Case Filter will give rise to grammatical structures.

Given that the structure of the grammar can be taken to be universal, and the simple principles of the grammatical modules are also general enough to be considered universal, this model goes a long way towards answering the question of how language acquisition is possible. What has to be learned are the lexical elements and their properties, which, given that this amounts to a finite body of knowledge, poses no particular logical problem, and also some fairly superficial differences in how the principles themselves are to be applied, such as whether heads precede complements. Because of the complex interaction between the simple and general modules of the grammar, however, simple changes in one module might result in dramatic differences between languages in terms of what structures are grammatical. So while languages may appear to differ from one another in vast and complex ways, the actual difference between the grammatical systems, which is after all what has to be acquired, may be relatively minor.

Such was the optimistic hope of 1980s grammar, at least. We shall see that while some of this optimism was justified, ultimately Government and Binding theory raised as many questions as it solved.

References

Abney, Stephen Paul 1987 ‘The English Noun Phrase in its Sentential Aspect’, Ph.D. dissertation, MIT, Cambridge, Mass.
Bloomfield, Leonard 1933 Language, Rinehart and Winston, New York.
Borer, Hagit 1984 Parametric Syntax, Foris, Dordrecht.
Chomsky, Noam 1970 ‘Remarks on Nominalisation’, in Jacobs, R. and P.S. Rosenbaum (eds.) Readings in English Transformational Grammar, Ginn and Co., Waltham, Mass.
Fukui, N 1986 ‘A Theory of Category Projection and its Applications’, Ph.D. dissertation, MIT.
Jackendoff, Raymond 1977 X-bar Syntax, MIT Press, Cambridge, Mass.