BMN ANGD A2 Linguistic Theory
Lecture 2: Phrase Structure

1 Before Phrase Structure

Phrase structure is a central notion of most current approaches to syntax, but the notion has only really taken off within the last 100 years, which is surprising given that language is known to have been studied for over two and a half thousand years. There are probably several reasons why phrase structure was a relative latecomer in grammatical study. Early grammatical investigations, being of a generally philosophical nature, mainly concentrated on words and their meanings. The Greeks, for example, worried about knowledge and how we know about things in the world, and knowing things and having names for them were related. Later grammarians fixated on Latin and even more so on the morphology of that language. The relatively free word order of Latin meant that syntax was not so obvious to study, especially when studied as a dead language with little in the way of native speaker intuitions to inform investigations.

When linguistics eventually broke out of its ‘classical prison’, the focus of study was once again on phonology and morphology, as it is far easier to reconstruct forms of words in proto-languages than it is sentence types. Hence the Indo-European studies, starting towards the end of the 1700s, which were the main focus of scientific linguistics in Europe for more than 100 years afterwards, did little to encourage the study of syntax.

Certain grammatical notions, however, did develop which have obvious connections with structure, though for some reason they did not seem to direct grammatical thought in a structural direction. For example, grammatical functions, such as subject and object, have been part of grammatical descriptions for centuries. From the contemporary perspective, the subject is the phrase associated with certain syntactic properties and, in English at least, with a certain syntactic position.
Thus it is hard to separate such notions from that of structure. Again, however, due to the Latin tradition, grammatical functions were more associated with morphological form and meaning than with structure in most studies prior to the 1900s. Nominal Case and verbal agreement were obvious indicators of grammatical functions and these are items of morphology.

Of course, some notion of subordination of clauses was conceptualised, as a way of forming complex clauses contrasting with coordination. This must have entailed the rudiments of structure, as it envisaged one clause as part of another. However, this was not taken to be a general relation extending beyond clauses, and hence no coherent notion of a phrase followed from it.

There were other, more dubious lines of reasoning that also led to some idea of a phrase. For example, the idea that all languages were in some way a deviation from Latin, considered by some to be the pinnacle of human linguistic achievement, was instrumental in forming the idea that English verbs and auxiliary verbs function as some kind of unit, called by some a ‘verb phrase’. The reasoning goes that, as the auxiliaries in English express things which are typically expressed by verbal morphology in Latin, and as the verb is clearly a single unit in Latin, the collection of verbal elements of English should also be a single unit. As might be expected, a conclusion based on such assumptions has very little empirical support, but the idea has been a popular one and appears in many descriptive works, even to the present day.

Mark Newson

2 Structuralism

As was discussed in the last lecture, the idea that all sentences break down into smaller and smaller constituents, i.e. Constituent Structure Analysis, came directly from the Structuralists’ belief that all linguistic constructs must be based on the observable.
Therefore every aspect of the description of the linguistic system had ultimately to rest on observable sounds, and the constructs of the higher levels, such as phonemes, morphemes and words, had to be constructed from these in a hierarchical way. As far as syntax is concerned, we have some notion of sentences which must be built up out of things ultimately constructed of units of sound. While it would be logically possible to jump straight from words to sentences, such a proposal would make it difficult to accommodate a number of well established syntactic notions. Besides, there was ample empirical evidence to suggest that sentences were not directly constructed from words, but that there was an intermediate level to be passed through on the road from words to sentences.

Empirical evidence for the existence of phrases came mainly from the application of the ‘discovery procedures’ that the Structuralists developed. The notion of distribution played an important role in the ‘discovery’ of a language’s set of phonemes, mainly in the form of minimal pair tests and observations concerning complementary distribution. Similar distributional tests were used to determine the word categories of a language, and so distributional tests were seen to be a fairly universal discovery procedure which could yield information about many aspects of language. The fact that groups of words, and not just the words themselves, have distributions was a major factor in the development of the idea that sentences should be analysed as constituted of phrases.

Just as distributional patterns could be used to show that words fell into different categories, so the evidence showed that phrases too fell into different categories. While there was no logical necessity to think that word categories and phrase categories should have anything to do with each other, existing notions such as ‘verb phrase’ perhaps suggested that they should.
It should be pointed out, though, that on distributional evidence, what the structuralists and what traditional grammar would have termed a ‘verb phrase’ would be very different parts of the sentence.

It was also perhaps traditional grammar that influenced a developing notion of ‘noun phrase’. From a traditional point of view the NP was not necessary, as grammatical functions such as subject and object were seen to be properties of nouns. The fact that there could be a number of other words which served to modify the noun was uninteresting, as structural issues were typically not considered. It is clear that a concentration on meaning is what led the traditional grammarians to this conclusion: the subject was defined as the one that the sentence is about, or the one who performs the action, etc., and the noun associated with these functions is the most salient element from a semantic point of view. Thus in a sentence such as:

(1) this tall man attended every lecture

given that the one who attends the lecture is most saliently described as being a man, it was concluded that the word man must be associated with the function subject.

Even though such reasoning would not have carried any weight in the structuralist framework, especially as the structuralists eschewed any reference to semantic facts, it still seems that the primacy of the noun was taken for granted, as the phrase identified as the subject was referred to as a ‘noun phrase’.

There was very little from a restrictive theoretical position to be said about phrases. A phrase could be any group of words which displays distributional properties, and its category would be given on the basis of a mixture of considerations: what it contained, how it behaved, etc. There was no uniform way of doing this. For example, despite the noun being the most salient element in the phrase of the man, this was not to be called a noun phrase as it did not distribute like other noun phrases.
Instead, this was a preposition phrase for the simple reason that the main difference between this and a noun phrase is the presence of the preposition. Furthermore, two phrases that distributed in a similar way would be considered of the same category regardless of how they were constituted. Thus a gerund, which has a distribution identical to a noun phrase, can be considered to be a noun phrase, despite not containing a word that has clear nominal properties:

(2) a. I considered [his convincing lie]/[him convincingly lying]
    b. [his convincing lie]/[him convincingly lying] persuaded the jury

In these cases the gerund lying has obvious verbal properties, being modified by an adverb rather than an adjective, and yet the construction distributes like a normal noun phrase. This is not particularly problematic, as there is nothing in the concept of a phrase from this perspective that says that it must contain a word of the relevant category. The fact that most phrases do contain such a word is therefore rather difficult to account for from this perspective.

3 Phrase Structure Grammar

In his book Syntactic Structures (1957) Chomsky spent some time formalising essentially what the structuralists had been doing under the title of Immediate Constituent Analysis. The main aim of this was not so much to present a convincing theory based on structuralist principles (indeed, in the rest of the book Chomsky pointed out a number of weaknesses of a grammar based on such assumptions) as to exemplify how Chomsky considered linguistic investigation ought to be carried out. The method was to develop a formal grammar rigorously and explicitly and then to compare it to another such grammar to see which accounted for the data best. By this process of elimination we could then work towards better and better theories.
In a sense therefore, Chomsky’s development of a Phrase Structure Grammar was a little like building a straw man, to be knocked down later, and it is clear that even by that time he considered Immediate Constituent Analysis to be an inadequate method of treating human languages.

Chomsky introduced two important things to make structuralist notions more formal. The first is the representation of a constituent structure analysis, which he called a phrase marker, but which is more commonly called a tree diagram. Quite simply, this represents the analysis of higher level structures into lower ones, down as far as the words. Thus a representation of a simple sentence with a subject-predicate structure could look like the following, given here as a labelled bracketing:

(3) [S [NP [Det the] [N man]]
       [VP [V saw]
           [NP [Det the] [N boy]]
           [PP [P in] [NP [Det the] [N park]]]]]

The second thing Chomsky introduced was a kind of rule that could be used to produce such structures. These he called Phrase Structure Rules and they had the following form:

(4) X → Y Z

This rule tells us what the immediate constituents of an element are. In particular, a constituent of category X has immediate constituents of categories Y and Z. In terms of a phrase marker, this rule would produce a section of a tree involving X, Y and Z:

(5)   X
     / \
    Y   Z

To build a complete phrase marker, what is needed is a whole series of Phrase Structure Rules, and the complete set of such rules that could generate all of the grammatical structures for a language is called a Phrase Structure Grammar. The set of rules needed to generate the structure in (3), for instance, would be as follows:

(6) S → NP VP
    VP → V NP PP
    PP → P NP
    NP → Det N

Clearly many more rules would be needed to generate all of the possible structures of English. It is important to note that at this point Chomsky has done nothing more than to formalise what the structuralists had been doing in a less formal way.
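To make the idea of ‘generating’ structures concrete, the rules in (6) can be run mechanically. The following Python fragment is only a rough sketch and not part of Chomsky’s presentation: the names RULES, LEXICON, generate and words are invented for illustration, and the lexicon is an assumption, since the lecture shows the words only as the leaves of the tree in (3).

```python
from itertools import product

# The phrase structure rules in (6): each category lists its possible expansions.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "NP": [["Det", "N"]],
}

# Hypothetical lexicon (an assumption): the words found in the tree in (3).
LEXICON = {
    "Det": ["the"],
    "N": ["man", "boy", "park"],
    "V": ["saw"],
    "P": ["in"],
}

def generate(category):
    """Return every labelled bracketing the grammar assigns to `category`."""
    if category in LEXICON:
        return [f"[{category} {word}]" for word in LEXICON[category]]
    trees = []
    for expansion in RULES[category]:
        # Combine every possible choice for each daughter, in order.
        for daughters in product(*(generate(d) for d in expansion)):
            trees.append(f"[{category} " + " ".join(daughters) + "]")
    return trees

def words(tree):
    """Strip the labelled brackets from a tree, leaving only the word string."""
    return " ".join(tok.rstrip("]") for tok in tree.split() if not tok.startswith("["))

trees = generate("S")
sentences = {words(t): t for t in trees}
print(len(trees))  # number of structures this small fragment generates
print(sentences["the man saw the boy in the park"])
```

With this lexicon the fragment generates a finite set of sentences, one of which has the structure in (3). Nothing in the format constrains what may appear to the right of the arrow: adding one more entry to RULES is all it takes to license a new kind of constituent.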
In particular, Phrase Structure Rules of this kind allow us to define and analyse any collection of words and phrases as a phrase of any kind. Thus it is quite easy to produce a rule which would generate the gerund structure presented in (2) above:

(7) NP → NP Adv V    (him convincingly lying)

This said, however, there are two important consequences that follow from Chomsky’s formalisation of the constituent structure analysis. The first is that, as I’ve already mentioned, Chomsky’s aim was to demonstrate what is wrong with the constituent structure approach, and this formalisation made this easier to carry out. Chomsky’s contention was that we can much better find defects in linguistic theories if they are made explicit. Although Chomsky did little more than formalise the structuralist position, this allowed him to point out problems which had been successfully ignored until then.

The second important consequence of Chomsky’s formalisation of constituent structure analysis was that it enabled the development of these ideas along paths that would not otherwise have been taken. For example, once a rule of the form in (4) has been proposed we can start to investigate its properties, how these might be changed and what the consequences of these changes might be. Indeed, it was these possibilities that allowed a whole new subject area of mathematical linguistics to develop.

To give you some idea of how this works, consider the rule in (4) in more depth. As it stands it involves one symbol to the left of the arrow and two to the right. We might ask what the consequences might be of allowing other possibilities. For example, what if there could be more than one symbol on the left:

(8) X Y → A B

It is fairly clear that such a rule would not produce something representable as a tree diagram, as tree diagrams essentially encompass the idea that single constituents break up into smaller constituents.
But the rule in (8) allows for something which is not even defined as a constituent to be rewritten as something else which is also not defined as a constituent. Thus whatever this rule is, it is not part of a constituent structure analysis. Yet it is clear that the rule in (4) is a restricted version of the kind of rule in (8): the former represents the same kind of rule with the number of symbols to the left of the arrow limited to one. Thus, mathematically, the grammars which restrict themselves to the kind of rule in (4) form a subset of the grammars which also allow the kind of rule in (8):

(9) Phrase Structure Grammar ⊂ Unrestricted Rewrite System

A further variation on these kinds of rules arises if we allow more than one symbol to the left of the arrow, but only one of these to be rewritten on the right:

(10) A X B → A Y Z B

The restriction that only one symbol can be affected means that this is a phrase structure rule, generating structures capable of being represented by tree diagrams. The difference between this rule and the type we started with is that with this kind of rule we can take more aspects of the structure into consideration when the rule applies. Specifically, the rule in (10) says that X rewrites as Y Z when it is preceded by A and followed by B. We say that this rule is therefore context sensitive. The rule in (4) is a context free rule, by contrast, as it says that X rewrites as Y Z no matter what else is present. Again we can see there is a mathematical relationship between these kinds of rules: a context free rule is a kind of context sensitive rule lacking the contextual parts, and a context sensitive rule is an unrestricted rewrite rule restricted to rewriting one symbol.
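The three rule types can be told apart purely by their shape. The Python fragment below is a minimal sketch that follows the informal definitions given here rather than the full definitions of formal language theory. The function names classify and families are invented for illustration.

```python
# Rule types ordered from most to least restricted.
HIERARCHY = ["context free", "context sensitive", "unrestricted"]

def classify(lhs, rhs):
    """Return the most restricted type a rewrite rule belongs to.

    `lhs` and `rhs` are lists of symbols, following the lecture's definitions:
      - context free:      one symbol on the left, e.g. X -> Y Z
      - context sensitive: exactly one left-hand symbol is rewritten and the
                           symbols around it are copied unchanged,
                           e.g. A X B -> A Y Z B
      - unrestricted:      anything else, e.g. X Y -> A B
    """
    if len(lhs) == 1:
        return "context free"
    # Try each left-hand symbol in turn as the one being rewritten.
    for i in range(len(lhs)):
        left, right = lhs[:i], lhs[i + 1:]
        rewritten_len = len(rhs) - len(left) - len(right)
        if rewritten_len < 1:
            continue
        if rhs[:len(left)] == left and rhs[len(rhs) - len(right):] == right:
            return "context sensitive"
    return "unrestricted"

def families(lhs, rhs):
    """Every rule type the rule falls under: each type includes the narrower ones."""
    return HIERARCHY[HIERARCHY.index(classify(lhs, rhs)):]

print(classify(["X"], ["Y", "Z"]))                      # the rule in (4)
print(classify(["X", "Y"], ["A", "B"]))                 # the rule in (8)
print(classify(["A", "X", "B"], ["A", "Y", "Z", "B"]))  # the rule in (10)
print(families(["X"], ["Y", "Z"]))
```

The families function makes the inclusion explicit: a context free rule such as (4) counts as a member of all three types, whereas the rule in (8) belongs only to the widest one.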
Hence there is once more a set relationship between the kinds of grammars constructed of such rules:

(11) Context Free Phrase Structure Grammar ⊂ Context Sensitive Phrase Structure Grammar ⊂ Unrestricted Rewrite System

The interesting thing about this is that set relationships are a well understood aspect of mathematics, and this enables us to investigate the relationships between different grammatical systems in a rigorous mathematical way. This is not to say that this is the way that all linguistic investigation has to go, but merely that a new door is opened for investigation which was previously unknown. Indeed, this led to some very interesting work starting in the 1960s, which asked questions about what kinds of formal systems such as these can be defined, what their properties are, what types of languages they might generate and, most importantly, what language type the set of human languages falls into. Mathematical work was even done on the question of what kind of mathematical rule systems are ‘learnable’, given certain assumptions about the learning situation. Obviously this was an issue of some importance to Chomsky, as he had made the ability of a theory to account for the fact that human languages are ‘learned’, and therefore learnable, the condition for elevating it to the status of ‘explanatory adequacy’.

4 What Phrase Structure Grammars Can Do

Before we move on to look at Chomsky’s criticism of Phrase Structure Grammars, it is worth going over some of their obvious positive aspects first. The first is that they can account for the fact of distribution. By this I do not mean that they can account for all observations about the distribution of elements in all languages, but rather that they can account for the fact that linguistic elements have distributions. In other words, that distribution is a property of human languages.
Consider what might have been the case if human grammars worked along the lines of unrestricted rewrite systems. Presumably there would be very little to observe that would indicate that groups of words hold together as phrases, and indeed words themselves would have quite complex patterns of distribution. With Phrase Structure Grammars, however, both words and the phrases that the grammar defines will have strict and observable distributions, as the rules are capable of generating structures in which words and phrases can appear only in limited positions. Looking back at the grammar fragment in (6), we can see that although there are only a small number of rules here, the grammar defines the distribution of NPs quite comprehensively as sister of the VP directly inside the clause (i.e. subject), sister of the verb inside the VP (i.e. object) and sister of the preposition inside the PP (i.e. prepositional object). Furthermore, it is predicted that we will not find phrases other than NPs in these positions, and the grammar could not generate structures such as the following:

(12) [S [VP saw the boy in the park] [VP saw the boy in the park]]
     [S [NP the man] [VP [V saw] [VP saw the boy in the park] [PP in the park]]]

The second positive consequence of a Phrase Structure Grammar is that it can predict facts about language that would not be possible unless there is some notion of structure built into the system.
For example, there are cases of ambiguity which are explicable in terms of word meanings: certain words are ambiguous and in certain contexts both meanings are possible, and so one could interpret the sentence in either way:

(13) he wasn’t very smart
     a. he wasn’t very intelligent
     b. he wasn’t dressed well

However, there are cases of ambiguity which cannot be accounted for in terms of the ambiguities of the words involved:

(14) he wrote the note on the table
     a. he wrote the note which is on the table
     b. he wrote the note whilst he was at the table

This ambiguity has nothing to do with the possible meanings of the individual words, but with what the PP on the table is interpreted as modifying: the note or the writing of the note. Clearly the ambiguity relies on the structure of the expression: if the PP is part of the NP as a modifier one interpretation results, and if it is part of the VP the other interpretation is achieved. We need only add the following rules to the grammar in (6) to achieve this:

(15) VP → V NP
     NP → Det N PP

In other words, the PP is an optional part of the NP or VP structure. This produces, among others, the following structures:

(16) [S [NP he] [VP [V wrote] [NP the note] [PP on the table]]]
     [S [NP he] [VP [V wrote] [NP [Det the] [N note] [PP on the table]]]]

5 The Inadequacy of Phrase Structure Grammars

In 1957 Chomsky presented a number of phenomena from English which he claimed to be difficult at best for a Phrase Structure Grammar to cope with. Possibly the easiest to discuss here is the passive. The issue is not that passive sentences cannot be given a Constituent Structure analysis and therefore cannot be represented by tree structures and generated by rules. This is easy enough to do for any passive sentence:

(17) a. the man was seen
     [S [NP the man] [VP [Verb [Aux was] [V seen]]]]
     VP → Verb
     Verb → Aux V

Here I try to be true to Chomsky’s 1957 presentation of the analysis, though it is clear that this would not be considered adequate by today’s standards.
For example, Chomsky follows traditional grammar in analysing the auxiliary and the verb as a constituent, for which there is little empirical justification. However, as this does not make any difference to the arguments he presents, I will ignore the issue. The important issue is that Chomsky observes that the application of these rules cannot be free, as it would produce a number of ungrammatical sentences. For example, the passive typically involves the use of the auxiliary be in conjunction with the passive morpheme -en. But this can only happen if the verb is transitive and moreover appears without its object:

(18) a. * he was smiled
     b. * he was seen Bill

But building these restrictions into a phrase structure grammar is very difficult. To start with, a context free system would not be able to handle the data at all, as it is clear that the passive auxiliary and the passive form of the verb can only be inserted into structures bearing in mind the overall context of the passive construction. But even a context sensitive grammar would have to get very complicated in order to handle simple passives. For example, transitive verbs are typically only used in the presence of an object and hence are contextually restricted themselves. The opposite is true for intransitives, as they can only be used in the context of a missing object. These contextual restrictions are then reversed in the case of the passive, as transitive verbs can only be inserted in the absence of an object and the intransitive verb cannot be used at all, even when the object is absent.

A second problem caused by the passive construction can be seen by considering the following data:

(19) a. sincerity frightens John        * John frightens sincerity
     b. John fears sincerity            * sincerity fears John

Clearly verbs place restrictions on what kind of elements can appear in their subject and object positions, and this seems to be determined for each verb.
However, these restrictions are reversed in the passive:

(20) a. John was frightened (by sincerity)      * sincerity was frightened (by John)
     b. sincerity was feared (by John)          * John was feared (by sincerity)

However these restrictions are to be encoded in the grammar (and it is not clear how to do this successfully with phrase structure rules), the point is that the restrictions would have to be restated in mirror image for each construction and its passive counterpart.

Worse still than the obvious complexities involved in getting phrase structure grammars to handle what appears to be rather simple data is the fact that in all the complexities a very simple generalisation is completely lost. This is that the subject of the passive is, to all intents and purposes, identified with the object of the active. This generalisation cannot be captured in a phrase structure grammar for the simple reason that it has no way to connect two different structures. In order to do this, Chomsky argues, a different type of rule is needed, which we will be looking at in next week’s lecture.

6 More Advanced Phrase Structure Rules

6.1 The development of X-bar

Although Chomsky dismissed pure phrase structure grammars as inadequate models for human languages, he always maintained that some phrase structure component, essentially based on rewrite rules, played a role in the human linguistic system. However, some of the intrinsic problems of Constituent Structure Analysis were therefore carried over to the initial generative grammars. For example, constituents were unrestricted and motivated purely on empirical grounds, and hence the difficulty of accounting for certain properties, such as that noun phrases contain nouns and verb phrases contain verbs, also carried over to the earlier generative systems. It wasn’t until 1970 that the issue was addressed by the introduction of the X-bar system, specifically the ‘X’ part of this.
The mechanism is quite simple and counts as a further restriction on the kinds of phrase structure rules allowed. It involves imposing the notion of a head onto the system by restricting rules to the following form:

(21) Xᵐ → … Xⁿ …

The superscripts, while part of the X-bar system, are irrelevant to the point at hand. This is that there must be a symbol of the same category to the left and the right of the arrow. Thus an NP will always contain at least one element of category N, a VP a V, etc. The X-bar system, then, turns out to be a further restriction of phrase structure grammars, though not one that falls into a simple subset relationship to the others we have reviewed, as in principle one can have both context free and context sensitive X-bar rules. By the time X-bar theory was proposed, however, the use of context sensitive rules had long been considered unnecessary outside of the treatment of lexical distribution.

By the 1980s the introduction of the X-bar system had led to a situation in which the phrase structure component was all but eliminated from the grammar, leaving a small number of rules of the type in (21) in its place. This is another topic to which we will return in a future lecture.

6.2 Slash categories

Some have claimed that Chomsky’s original criticism of phrase structure rules was too harsh and that, if we were to allow ourselves certain extensions to these rules, the kind of data that Chomsky claimed to be difficult to account for could be handled.
For example, another construction that seems to require added complexity is the wh-question, in which an element missing from a given position in a sentence is linked to the appearance of a related element at the beginning of the sentence:

(22) I asked who he met __

In Generalised Phrase Structure Grammar it was proposed to handle such effects by the introduction of a ‘slash category’, which is essentially a category with something missing. It can either combine with another element of the relevant type to make a whole category, or be part of a larger constituent which inherits the property of the missing element. To give some idea of how this works, consider the sentence above. This involves a VP with a missing object and so its category is VP/NP. This combines with a subject to form a sentence with a missing object, of the category S/NP. This combines with the wh-NP to form a complete sentence:

(23) [S [NP I]
        [VP [V asked]
            [S [NP who]
               [S/NP [NP he]
                     [VP/NP [V met] [NP/NP ]]]]]]

We cannot go into detail about how this system was formalised in a phrase structure grammar, as it would involve the introduction of too many new and rather technical mechanisms. However, some idea of how it might be instantiated, if somewhat simplistically, can be gained by considering the following rewrite rules:

(24) XP/XP →
     XP/YP → … ZP/YP
     XP → YP XP/YP

The first of these rules simply says that a constituent that lacks itself is empty. This applies to the missing object in (23). The second states that a constituent that lacks some constituent contains some other category that lacks the same constituent. In (23) we can see that the VP/NP conforms to this rule, as it contains the NP/NP constituent representing the missing object. In turn the S/NP then contains the VP/NP and so this is another example of the application of this rule. Finally, the third says that a constituent can be made up of something that lacks a certain constituent combined with a constituent of the missing type.
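The way the missing element is passed up the tree and then cancelled can be mimicked in a few lines of Python. This is only a rough sketch of the three simplified rules in (24), not of the actual formalisation used in Generalised Phrase Structure Grammar. The names Cat, is_empty, inherit and fill are invented for illustration.

```python
from typing import NamedTuple, Optional

class Cat(NamedTuple):
    name: str                  # the category itself, e.g. "VP"
    gap: Optional[str] = None  # the missing category, e.g. "NP" in VP/NP

    def __str__(self):
        return f"{self.name}/{self.gap}" if self.gap else self.name

def is_empty(cat):
    """First rule in (24): a category that lacks itself has no content."""
    return cat.gap == cat.name

def inherit(mother, daughters):
    """Second rule in (24): the mother inherits the gap of one of its daughters."""
    gaps = [d.gap for d in daughters if d.gap is not None]
    if len(gaps) != 1:
        raise ValueError("expected exactly one daughter with a missing element")
    return Cat(mother, gaps[0])

def fill(filler, slashed):
    """Third rule in (24): a filler of the missing type cancels the gap."""
    if filler.gap is not None or slashed.gap != filler.name:
        raise ValueError(f"{filler} cannot fill {slashed}")
    return Cat(slashed.name)

# The embedded clause of (23): who he met __
trace = Cat("NP", "NP")                  # the missing object
vp = inherit("VP", [Cat("V"), trace])    # met __
s_gap = inherit("S", [Cat("NP"), vp])    # he met __
clause = fill(Cat("NP"), s_gap)          # who he met __
print(is_empty(trace), vp, s_gap, clause)
```

Building the clause from the bottom up yields the categories NP/NP, VP/NP, S/NP and finally S, as in (23). The last step, fill, corresponds to the third rule in (24).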
The embedded S is an instance of this rule, as it contains the S/NP category combined with an NP: in a sense the NP and the /NP cancel each other out and what we are left with is the S.

7 Conclusion

In this lecture we have concentrated on the treatment of phrase structure in linguistics over the past 100 or so years. It is somewhat intriguing that the very notion of phrase structure has maintained its prominence, even in Chomskyan generative grammar, when its initial development was based on structuralist/empiricist ideas that Chomsky vehemently attacked at the end of the 1950s. Thus this is perhaps the only notion to survive in current generative linguistics from the structuralist tradition. Moreover, despite Chomsky’s withering scepticism about the structuralist idea of discovery procedures, the kinds of methods that the structuralists developed are still very much part of linguistic practice and are to be found described in detail in almost all introductory linguistics textbooks, though clearly no one these days believes them to be infallible ways of extracting facts from observations. While there have been one or two attempts to question the validity of the structuralist remnant in generative grammar, these are very much in the minority, and on the whole the idea of constituent structure is now so embedded in the linguistic consciousness that it is virtually unquestionable. This is surprising given that the spirit of generative grammar, in accordance with rationalist investigation, purports to hold that every theoretical construct should be held up to scrutiny and that no assumption should be sacrosanct.