Lecture 2

BMN ANGD A2 Linguistic Theory
Lecture 2: Phrase Structure
1 Before Phrase Structure
Phrase structure is a central notion of most current approaches to syntax, but the notion has
only really taken off within the last 100 years, which is surprising given that language is
known to have been studied for over two and a half thousand years. There are probably
several reasons why phrase structure was a relative latecomer in grammatical study.
Early grammatical investigations, influenced by their generally philosophical nature, mainly
concentrated on words and their meanings. The Greeks, for example, worried about
knowledge and how we know about things in the world, and for them knowing things and
having names for them were related. Later grammarians fixated on Latin and even more so on the
morphology of that language. The relatively free word order of Latin meant that syntax was not
so obvious to study, especially when studied as a dead language with little in the way of
native speaker intuitions to inform investigations. When linguistics eventually broke out of its
‘classical prison’, it seemed that once again the focus of study was on phonology and
morphology, as it is far easier to reconstruct the forms of words in proto-languages than it is
to reconstruct sentence types. Hence Indo-European studies, starting towards the end of the 1700s, which
were the main focus of scientific linguistics in Europe for more than 100 years afterwards, did
little to encourage the study of syntax.
Certain grammatical notions, however, did develop which have obvious connections with
structure, though for some reason they did not seem to direct grammatical thought in a structural
direction. For example, grammatical functions such as subject and object have been part of
grammatical descriptions for centuries. From the contemporary perspective, the subject is the
phrase associated with certain syntactic properties and in English, at least, with a certain
syntactic position. Thus it is hard to separate such notions from that of structure. Again,
however, due to the Latin tradition, grammatical functions were more associated with
morphological form and meaning than with structure in most studies prior to the 1900s.
Nominal Case and verbal agreement were obvious indicators of grammatical functions and
these are items of morphology.
Of course, some notion of subordination of clauses was conceptualised, as a way of forming
complex clauses contrasting with coordination. This must have entailed the rudiments of
structure, as it envisaged one clause as part of another. However, this was not taken to be a
general relation, extending beyond clauses and hence no coherent notion of a phrase followed
from it.
There were other more dubious lines of reasoning that also led to some idea of a phrase. For
example, the idea that all languages were in some way a deviation from Latin, considered by
some to be the pinnacle of human linguistic achievement, was instrumental in forming the
idea that the English verbs and auxiliary verbs function as some kind of unit, called by some
a ‘verb phrase’. The reasoning goes that as the auxiliaries in English express things which are
typically expressed by verbal morphology in Latin and as the verb is clearly a single unit in
Latin, then the collection of verbal elements of English should also be a single unit. As might
be expected, a conclusion based on such assumptions has very little empirical support, but the
idea has been a popular one and appears in many descriptive works, even to the present day.
Mark Newson
2 Structuralism
As was discussed in the last lecture, the idea that all sentences break down into smaller and smaller
constituents, i.e. Constituent Structure Analysis, came directly from the Structuralists’ belief
that all linguistic constructs must be based on the observable. Therefore every aspect of the
description of the linguistic system had to ultimately rest on observable sounds and the
constructs of the higher levels, such as phonemes, morphemes and words, had to be
constructed from these in a hierarchical way.
As far as syntax is concerned, we have some notion of sentences which must be built up out
of things ultimately constructed of units of sound. While it would be logically possible to
jump straight from words to sentences, such a proposal would make accommodating a
number of well established syntactic notions a little difficult. Besides, there was ample
empirical evidence to suggest that sentences were not directly constructed from words, but
that there was an intermediate level to be passed through on the road from words to sentences.
Empirical evidence for the existence of phrases came mainly from the application of the
‘discovery procedures’ that the Structuralists developed. The notion of distribution played an
important role in the ‘discovery’ of a language’s set of phonemes, mainly in the form of
minimal pair tests and observations concerning complementary distribution. Similar
distributional tests were used to determine the word categories of a
language, and so distributional tests were seen to be a fairly universal discovery procedure
which could yield information about many aspects of language. The fact that groups of
words, and not just the words themselves, have distributions was a major factor in
the development of the idea that sentences should be analysed as constituted of phrases.
Just as distributional patterns could be used to show that words fell into different categories,
so the evidence indicated that phrases too fell into different categories. While there was no
logical necessity to think that word categories and phrase categories should have anything to
do with each other, existing notions such as ‘verb phrase’ perhaps suggested that they should.
It should be pointed out, though, that on distributional evidence what the structuralists
termed a ‘verb phrase’ and what traditional grammar termed a ‘verb phrase’ would be very
different parts of the sentence.
It was also perhaps traditional grammar that influenced a developing notion of ‘noun phrase’.
From a traditional point of view the NP was not necessary as grammatical functions such as
subject and object were seen to be properties of nouns. The fact that there could be a whole
bunch of other words which served to modify the noun was uninteresting as structural issues
were typically not considered. It is clear that a concentration on meaning is what led the
traditional grammarians to this conclusion: the subject was defined as the one that the
sentence is about, or the one who performs the action, etc. and the noun associated with these
functions is the most salient element from a semantic point of view. Thus in a sentence such
as:
(1)  this tall man attended every lecture
given that the one who attends the lecture is most saliently described as being a man, then it
was concluded that the word man must be associated with the function subject.
Even though such reasoning would not have carried any weight in the structuralist
framework, especially as they eschewed any reference to semantic facts, it still seems that the
primacy of the noun was taken for granted, as the phrase identified as the subject was referred
to as a ‘noun phrase’.
There was very little from a restrictive theoretical position to be said about phrases. A phrase
could be any group of words which displays distributional properties and its category would
be given on the basis of a mixture of considerations: what it contained and how it behaved,
etc. There was no uniform way of doing this. For example, despite the noun being the most
salient element in the phrase ‘of the man’, this was not to be called a noun phrase as it did not
distribute like other noun phrases. Instead, this was a preposition phrase for the simple reason
that the main difference between this and a noun phrase is the presence of the preposition.
Furthermore, two phrases that distributed in a similar way would be considered of the same
category regardless of how they were constituted. Thus a gerund, which has a distribution
identical to that of a noun phrase, can be considered to be a noun phrase, despite not
containing a word that has clear nominal properties:
(2)  a  I considered [his convincing lie]/[him convincingly lying]
     b  [his convincing lie]/[him convincingly lying] persuaded the jury
In these cases the gerund ‘lying’ has obvious verbal properties, being modified by an adverb
rather than an adjective, and yet the construction distributes like a normal noun phrase. This
is not particularly problematic as there is nothing in the concept of a phrase from this
perspective that says that it must contain a word of the relevant category. The fact that most
phrases do contain such a word is therefore rather difficult to account for from this
perspective.
3 Phrase Structure Grammar
In his book Syntactic Structures (1957) Chomsky spent some time formalising essentially
what the structuralists had been doing under the title of Immediate Constituent Analysis. The
main aim of this was not so much to present a convincing theory based on structuralist
principles, and indeed in the rest of the book Chomsky pointed out a number of weaknesses
of a grammar based on such assumptions, but to exemplify how Chomsky considered
linguistic investigation ought to be carried out. The method was to develop a formal grammar
rigorously and explicitly and then to compare it to another such grammar to see which
accounted for the data best. By this process of elimination we could then work towards better
and better theories. In a sense therefore, Chomsky’s development of a Phrase Structure
Grammar was a little like building a straw man, to be knocked down later and it is clear that
even by that time he considered Immediate Constituent Analysis to be an inadequate method
of treating human languages.
Chomsky introduced two important devices in formalising structuralist notions. The first is
the representation of a constituent structure analysis, which he called
a phrase marker, but which is more commonly called a tree diagram. Quite simply this
represents the analysis of higher level structures into lower ones, down as far as the words.
Thus a representation of a simple sentence with a subject-predicate structure could look like
the following:
(3)  [S [NP [Det the] [N man]]
        [VP [V saw]
            [NP [Det the] [N boy]]
            [PP [P in] [NP [Det the] [N park]]]]]
The second thing Chomsky introduced was a kind of rule that could be used to produce such
structures. These he called Phrase Structure Rules and they had the following form:
(4)  X → Y Z
This rule tells us what the immediate constituents of an element are. In particular, a
constituent of category X has immediate constituents of categories Y and Z. In terms of a
phrase marker, this rule would produce a section of a tree involving X, Y and Z:
(5)     X
       / \
      Y   Z
To build a complete phrase marker, what is needed is a whole series of Phrase Structure
Rules, and the complete set of such rules that could generate all of the grammatical structures
for a language is called a Phrase Structure Grammar.
The set of rules needed to generate the structure in (3), for instance, would be as follows:
(6)  S → NP VP
     VP → V NP PP
     PP → P NP
     NP → Det N
Clearly many more rules would be needed to generate all of the possible structures of
English.
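To see how mechanically such rules work, the fragment in (6) can be run as a small program. What follows is only a sketch in Python, not anything taken from Chomsky's presentation: the lexicon, the function names `build` and `parse`, and the use of labelled bracketing in place of a tree diagram are all assumptions made for illustration.

```python
# Rules copied from (6); the lexicon is an assumption added for illustration.
RULES = {
    "S": ["NP", "VP"],
    "VP": ["V", "NP", "PP"],
    "PP": ["P", "NP"],
    "NP": ["Det", "N"],
}
LEXICON = {"the": "Det", "man": "N", "boy": "N", "park": "N",
           "saw": "V", "in": "P"}


def build(category, words):
    """Expand `category` top-down, consuming words from the iterator `words`."""
    if category in RULES:
        # Rewrite the category as its immediate constituents, left to right.
        daughters = [build(d, words) for d in RULES[category]]
        return "[" + category + " " + " ".join(daughters) + "]"
    # A word-level category: the next word must belong to it.
    word = next(words, None)
    if word is None or LEXICON.get(word) != category:
        raise ValueError(f"expected a {category}, found {word!r}")
    return f"[{category} {word}]"


def parse(sentence):
    """Return the labelled bracketing that the rules assign to `sentence`."""
    words = iter(sentence.split())
    tree = build("S", words)
    if next(words, None) is not None:
        raise ValueError("unexpected words after the end of the sentence")
    return tree


print(parse("the man saw the boy in the park"))
# [S [NP [Det the] [N man]] [VP [V saw] [NP [Det the] [N boy]] [PP [P in] [NP [Det the] [N park]]]]]
```

Because each category in (6) has exactly one rule, no choices ever arise here. A fuller grammar would need some way of choosing between alternative expansions of the same category.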
It is important to note that at this point Chomsky has done nothing more than to formalise
what the structuralists had been doing in a less formal way. In particular, Phrase Structure
Rules of this kind allow us to define and analyse any collection of words and phrases as a
phrase of any kind. Thus it is quite easy to produce a rule which would generate the gerund
structure presented in (2) above:
(7)  NP → NP Adv V     (him convincingly lying)
This said, however, there are two important consequences that follow from Chomsky’s
formalisation of the constituent structure analysis. The first follows from Chomsky’s aim,
which, as I have already mentioned, was to demonstrate what is wrong with the constituent
structure approach: the formalisation made this easier to carry out. Chomsky’s contention was that we can
much better find defects in linguistic theories if they are made explicit. Although Chomsky
did little more than formalise the structuralist position, this allowed him to point out problems
which had been successfully ignored until then.
The second important consequence of Chomsky’s formalisation of constituent structure
analysis was that it enabled the development of these ideas along paths that would not
otherwise have been taken. For example, once a rule of the form in (4) has been proposed we
can start to investigate its properties, how these might be changed and what the consequences
of these changes might be. Indeed, it was these possibilities that allowed a whole new subject
area of mathematical linguistics to develop. To give you some idea of how this works,
consider the rule in (4) in more depth. As it stands it involves one symbol to the left of the
arrow and two to the right. We might ask what the consequences might be of allowing other
possibilities. For example, what if there could be more than one symbol on the left:
(8)  X Y → A B
It is fairly clear that such a rule would not produce something representable as a tree diagram,
as tree diagrams essentially embody the idea that single constituents break up into smaller
constituents. But the rule in (8) allows for something which is not even defined as a
constituent to be rewritten as something else which is also not defined as a constituent. Thus
whatever this rule is, it is not part of a constituent structure analysis. Yet it is clear that the
rule in (4) is a restricted version of the kind of rule in (8): the former represents the same kind
of rule with the number of symbols to the left of the arrow limited to one. Thus
mathematically the grammars which restrict themselves to the kind of rule in (4) form a subset of the
grammars which also allow the kind of rule in (8):
(9)  Phrase Structure Grammar ⊂ Unrestricted Rewrite System
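The difference between (4) and (8) can be made concrete by treating a rule as an operation on a string of symbols. The sketch below is an illustration only: the function name `rewrite` and the extra symbol W are assumptions, not part of the original formalism.

```python
def rewrite(symbols, lhs, rhs):
    """Replace the first occurrence of the sequence `lhs` in `symbols` by `rhs`.

    Returns the new list of symbols, or None if the rule does not apply.
    """
    symbols, lhs, rhs = list(symbols), list(lhs), list(rhs)
    for i in range(len(symbols) - len(lhs) + 1):
        if symbols[i:i + len(lhs)] == lhs:
            return symbols[:i] + rhs + symbols[i + len(lhs):]
    return None


# Rule (4): one symbol is rewritten, so Y and Z can be drawn as daughters of X.
print(rewrite(["W", "X"], ["X"], ["Y", "Z"]))            # ['W', 'Y', 'Z']

# Rule (8): the pair X Y is rewritten as a whole, so there is no single
# symbol that could serve as the mother of A or of B in a tree.
print(rewrite(["W", "X", "Y"], ["X", "Y"], ["A", "B"]))  # ['W', 'A', 'B']
```

Both rules are handled by the same operation. The only thing that separates them is the length of the left-hand side, which is the restriction the subset relation in (9) records.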
A further variation on these kinds of rules we can think of is if we allow more than one
symbol to the left of the arrow, but only one of these to be rewritten on the right:
(10)  A X B → A Y Z B
The restriction that only one symbol can be affected means that this is a phrase structure rule,
generating structures capable of being represented by tree diagrams. The difference between
this rule and the type we started with is that with this kind of rule we can take more aspects of
the structure into consideration when the rule applies. Specifically the rule in (10) says that X
rewrites as Y Z when it is preceded by A and followed by B. We say that this rule is therefore
context sensitive. The rule in (4) is a context free rule, by contrast, as it says that X rewrites
as Y Z no matter what else is present. Again we can see there is a mathematical relationship
between these kinds of rules: a context free rule is a kind of context sensitive rule lacking the
contextual parts and a context sensitive rule is an unrestricted rewrite rule restricted to
rewriting one symbol. Hence there is once more a set relationship between the kinds of
grammars constructed of such rules:
(11)  Context Free Phrase Structure Grammar ⊂ Context Sensitive Phrase Structure Grammar ⊂ Unrestricted Rewrite System
The interesting thing about this is that set relationships are a well understood aspect of
mathematics and this enables us to investigate the relationships between different
grammatical systems in a rigorous mathematical way. This is not to say that this is the way
that all linguistic investigation has to go, but merely that a new door is opened for
investigation which was previously unknown. Indeed this led to some very interesting work
starting in the 1960s, which asked questions about what kinds of formal systems such as these
can be defined, what their properties are, what types of languages they might generate and,
most importantly, what language type the set of human languages falls into.
Mathematical work was even done on the question of what kind of mathematical rule systems
are ‘learnable’, given certain assumptions about the learning situation. Obviously this was an
issue of some importance to Chomsky as he had placed emphasis on the ability of a theory to
account for the fact that human languages are ‘learned’ and therefore learnable in order to
elevate it to a status of ‘explanatory adequacy’.
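The three kinds of rule discussed in this section, (4), (8) and (10), differ only in the shape of the two sides of the arrow, so the classification can be stated as a simple check. The following is a minimal sketch under the definitions given in this lecture; the function name `classify` and the representation of a rule as two lists of symbols are assumptions.

```python
def classify(lhs, rhs):
    """Return the most restrictive rule type that the rule lhs → rhs belongs to."""
    lhs, rhs = tuple(lhs), tuple(rhs)
    if len(lhs) == 1:
        # One symbol on the left and no context, as in rule (4).
        return "context free"
    # Try each left-hand symbol as the one being rewritten: everything before
    # and after it must reappear unchanged on the right, as in rule (10).
    for i in range(len(lhs)):
        left, right = lhs[:i], lhs[i + 1:]
        middle_length = len(rhs) - len(left) - len(right)
        if middle_length < 1:
            continue
        if rhs[:len(left)] == left and rhs[len(rhs) - len(right):] == right:
            return "context sensitive"
    # Anything else rewrites more than one symbol at once, as in rule (8).
    return "unrestricted"


print(classify(["X"], ["Y", "Z"]))                      # context free
print(classify(["A", "X", "B"], ["A", "Y", "Z", "B"]))  # context sensitive
print(classify(["X", "Y"], ["A", "B"]))                 # unrestricted
```

The function reports only the narrowest class. Given the set relationships described above, a rule reported as context free also counts as a context sensitive rule with empty context, and every rule counts as an unrestricted rewrite rule.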
4 What Phrase Structure Grammars Can Do
Before we move on to look at Chomsky’s criticism of Phrase Structure Grammars, it is worth
going over some of their obvious positive aspects first.
The first is that they can account for the fact of distribution. By this I do not mean that they
can account for all observations about the distribution of elements in all languages, but rather
that they can account for the fact that linguistic elements have distributions. In other words,
that distribution is a property of human languages. Consider what might have been the case if
human grammars worked along the lines of unrestricted rewrite systems. Presumably there
would be very little to observe that would indicate that groups of words hold together as
phrases and indeed words themselves would have quite complex patterns of distribution.
With Phrase Structure Grammars, however, both words and the phrases that the grammar
defines will have strict and observable distributions as the rules are capable of generating
structures in which words and phrases can appear in limited positions. Looking back at the
grammar fragment in (6) we can see that although there are only a small number of rules
here, the grammar defines the distribution of NPs quite comprehensively as sister of the VP
directly inside the clause (i.e. subject), sister of the verb inside the VP (i.e. object) and sister
of the preposition inside the PP (prepositional object). Furthermore it is predicted that we will
not find phrases other than NPs in these positions and the grammar could not generate
structures such as the following:
(12)  [S [VP saw the boy in the park] [VP saw the boy in the park]]
      [S [NP the man] [VP [V saw] [VP saw the boy in the park] [PP in the park]]]
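The claim that the grammar in (6) cannot generate the structures in (12) can be checked mechanically: every mother and its daughters must match some rule. The sketch below assumes a representation of trees as nested tuples and a function name `unlicensed`; neither comes from the original presentation.

```python
# The rules in (6), each as (mother, daughters).
RULES = {
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP", "PP")),
    ("PP", ("P", "NP")),
    ("NP", ("Det", "N")),
}


def unlicensed(tree):
    """List the (mother, daughters) configurations in `tree` that no rule generates.

    A tree is a tuple (category, child, ...). A child that is a plain string
    is a word, or a string of words left unanalysed, and is not checked.
    """
    category, *children = tree
    subtrees = [child for child in children if not isinstance(child, str)]
    bad = []
    if subtrees:
        daughters = tuple(child[0] for child in subtrees)
        if (category, daughters) not in RULES:
            bad.append((category, daughters))
    for child in subtrees:
        bad.extend(unlicensed(child))
    return bad


tree_3 = ("S",
          ("NP", ("Det", "the"), ("N", "man")),
          ("VP", ("V", "saw"),
                 ("NP", ("Det", "the"), ("N", "boy")),
                 ("PP", ("P", "in"), ("NP", ("Det", "the"), ("N", "park")))))
tree_12_first = ("S", ("VP", "saw the boy in the park"),
                      ("VP", "saw the boy in the park"))
tree_12_second = ("S", ("NP", "the man"),
                       ("VP", ("V", "saw"),
                              ("VP", "saw the boy in the park"),
                              ("PP", "in the park")))

print(unlicensed(tree_3))          # []
print(unlicensed(tree_12_first))   # [('S', ('VP', 'VP'))]
print(unlicensed(tree_12_second))  # [('VP', ('V', 'VP', 'PP'))]
```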
The second positive consequence of a Phrase Structure Grammar is that it can predict facts
about language that would not be possible unless there is some notion of structure built into
the system. For example, there are cases of ambiguity which are explicable in terms of word
meanings: certain words are ambiguous and in certain contexts both meanings are possible
and so one could interpret the sentence in either way:
(13)  he wasn’t very smart
      a  he wasn’t very intelligent
      b  he wasn’t dressed well
However, there are cases of ambiguity which cannot be accounted for in terms of the
ambiguities of the words involved:
(14)  he wrote the note on the table
      a  he wrote the note which is on the table
      b  he wrote the note whilst he was at the table
This ambiguity has nothing to do with the possible meanings of the individual words, but has to
do with what the PP ‘on the table’ is interpreted as modifying: the note or the writing of the
note. Clearly the ambiguity relies on the structure of the expression: if the PP is part of the
NP as a modifier, one interpretation results, and if it is part of the VP, the other interpretation is
achieved. We need only to add the following rules to the grammar in (6) to achieve this:
(15)  VP → V NP
      NP → Det N PP
In other words, the PP is an optional part of the NP or VP structure. This produces, among
others, the following structures:
(16)  [S [NP he] [VP [V wrote] [NP the note] [PP on the table]]]
      [S [NP he] [VP [V wrote] [NP [Det the] [N note] [PP on the table]]]]
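That the rules in (6) and (15) together assign exactly two structures to the sentence in (14) can be verified by enumerating every analysis. This is a rough sketch rather than a serious parser: the lexicon, the treatment of ‘he’ as a one-word NP, and the function names are assumptions introduced for the example.

```python
from functools import lru_cache

# Rules from (6) and (15), each as (mother, daughters).
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP", "PP")),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N", "PP")),
    ("PP", ("P", "NP")),
]
# Assumed lexicon; 'he' is simply listed as an NP.
LEXICON = {"he": "NP", "wrote": "V", "the": "Det",
           "note": "N", "table": "N", "on": "P"}


def all_parses(sentence):
    """Return every labelled bracketing of `sentence` as an S."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def span(category, i, j):
        """All analyses of words[i:j] as `category`."""
        trees = []
        if j - i == 1 and LEXICON.get(words[i]) == category:
            trees.append(f"[{category} {words[i]}]")
        for mother, daughters in RULES:
            if mother == category:
                for parts in split(daughters, i, j):
                    trees.append(f"[{category} " + " ".join(parts) + "]")
        return tuple(trees)

    def split(daughters, i, j):
        """Every way of dividing words[i:j] among the sequence `daughters`."""
        if not daughters:
            if i == j:
                yield ()
            return
        first, rest = daughters[0], daughters[1:]
        # Leave at least one word for each remaining daughter.
        for k in range(i + 1, j - len(rest) + 1):
            for left in span(first, i, k):
                for right in split(rest, k, j):
                    yield (left,) + right

    return list(span("S", 0, len(words)))


for tree in all_parses("he wrote the note on the table"):
    print(tree)
```

The two bracketings printed correspond to the two structures in (16): in one the PP is a daughter of VP, in the other a daughter of NP.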
5 The Inadequacy of Phrase Structure Grammars
In 1957 Chomsky presented a number of phenomena from English which he claimed to be
difficult at best for a Phrase Structure Grammar to cope with. Possibly the easiest to discuss
here is the passive.
The issue is not that passive sentences cannot be given a Constituent Structure analysis and
therefore cannot be represented by tree structures and generated by rules. This is easy enough
to do for any passive sentence:
(17)  a  the man was seen
         [S [NP the man] [VP [Verb [Aux was] [V seen]]]]
         VP → Verb
         Verb → Aux V
Here I try to be true to Chomsky’s 1957 presentation of the analysis, though it is clear that
this would not be considered adequate by today’s standards. For example, Chomsky follows
traditional grammar in analysing the auxiliary and the verb as a constituent, which there is
little empirical justification for. However, as this does not make any difference to the
arguments he presents I will ignore the issue. The important issue is that Chomsky observes
that the application of these rules cannot be free as it will produce a number of
ungrammatical sentences. For example, the passive typically involves the use of the auxiliary
‘be’ in conjunction with the passive morpheme ‘-en’. But this can only happen if the verb is
transitive and moreover appears without its object:
(18)  a  * he was smiled
      b  * he was seen Bill
But building these restrictions into a phrase structure grammar is very difficult. To start with,
a context free system would not be able to handle the data at all as it is clear that the passive
auxiliary and passive form of the verb can only be inserted into structures bearing in mind the
overall context of the passive construction. But even a context sensitive grammar would have
to get very complicated in order to handle simple passives. For example transitive verbs are
typically only used in the presence of an object and hence are contextually restricted
themselves. The opposite is true for intransitives as they can only be used in the context of a
missing object. These contextual restrictions are then reversed in the case of the passive, as
transitive verbs can only be inserted in the absence of an object and the intransitive verb
cannot be used at all, even when the object is absent.
A second problem caused by the passive construction can be seen by considering the
following data:
(19)  a  sincerity frightens John        * John frightens sincerity
      b  John fears sincerity            * sincerity fears John
Clearly verbs place restrictions on what kind of elements can appear in their subject and
object positions and this seems to be determined for each verb. However these restrictions are
over-ridden in the passive:
(20)  a  John was frightened (by sincerity)     * sincerity was frightened (by John)
      b  sincerity was feared (by John)         * John was feared (by sincerity)
However these restrictions are to be encoded in the grammar (and it is not clear how to do
this successfully with phrase structure rules), the point is that the restrictions would have to
be restated in mirror image for each construction and its passive counterpart.
Worse still than the obvious complexities involved in getting phrase structure grammars to
handle what appears to be rather simple data, is the fact that in all the complexities a very
simple generalisation is completely lost. This is that the subject of the passive to all intents
and purposes is identified with the object of the active. This generalisation cannot be captured
in a phrase structure grammar for the simple reason that it has no way to connect two
different structures. In order to do this, Chomsky argues, a different type of rule is needed,
which we will be looking at in next week’s lecture.
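The ‘mirror image’ problem can be illustrated with the data in (19) and (20). The sketch below is neither a phrase structure grammar nor Chomsky's analysis. It assumes a single feature, ‘animate’ (a label chosen here purely for illustration), records which argument of each verb must carry it, and shows that a separate check for the passive has to state the same restriction over again with the roles swapped.

```python
# Assumed for illustration: which nouns count as animate, and which
# argument of each verb must be animate.
ANIMATE = {"John"}
RESTRICTED_ROLE = {"frighten": "object", "fear": "subject"}


def active_ok(subject, verb, obj):
    """Check the verb's restriction in an active clause."""
    filler = subject if RESTRICTED_ROLE[verb] == "subject" else obj
    return filler in ANIMATE


def passive_ok(subject, verb, by_phrase=None):
    """The same restriction restated for the passive, in mirror image:
    the passive subject is checked as if it were the active object, and the
    optional by-phrase as if it were the active subject."""
    filler = by_phrase if RESTRICTED_ROLE[verb] == "subject" else subject
    return filler is None or filler in ANIMATE


print(active_ok("sincerity", "frighten", "John"))   # True,  as in (19a)
print(active_ok("John", "frighten", "sincerity"))   # False, as in (19a)
print(passive_ok("John", "frighten", "sincerity"))  # True,  as in (20a)
print(passive_ok("sincerity", "frighten", "John"))  # False, as in (20a)
```

Nothing in the two functions says that they encode one and the same fact about each verb. The duplication has to be maintained by hand, which is the lost generalisation described above.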
6 More Advanced Phrase Structure Rules
6.1 The development of X-bar
Although Chomsky dismissed pure phrase structure grammars as inadequate models for human
languages, he always maintained that some phrase structure component, essentially based on
rewrite rules, played a role in the human linguistic system. As a result, some of the intrinsic
problems of Constituent Structure Analysis were carried over to the initial
generative grammars. For example, constituents remained unrestricted and motivated
purely on empirical grounds, and hence certain properties, such as the fact that noun phrases contain
nouns and verb phrases contain verbs, remained difficult to account for in the
earlier generative systems.
It wasn’t until 1970 that the issue was addressed by the introduction of the X-bar system,
specifically the ‘X’ part of this. The mechanism is quite simple and counts as a further
restriction on the kinds of phrase structure rules allowed. It involves imposing the notion of a
head onto the system by restricting rules to the following form:
(21)  X^m → … X^n …
The superscripts, while part of the X-bar system, are irrelevant to the point at hand, which is
that there must be a symbol of the same category to the left and the right of the arrow. Thus
an NP will always contain at least one element of category N and a VP a V, etc. The X-bar
system, then, turns out to be a further restriction of phrase structure grammars, though not one
that falls into a simple subset relationship with the others we have reviewed, as in principle one
can have context free and context sensitive X-bar rules. By the time X-bar theory was
proposed, however, the use of context sensitive rules had long been considered unnecessary outside of the
treatment of lexical distribution.
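The head requirement in (21) amounts to a simple condition on rules, which can be sketched as follows. The encoding of a symbol as a pair of category and bar level, the constant names, and the decision to treat S and Det as categories of their own are all assumptions made for this illustration.

```python
# A symbol is (category, bar level): ("N", 2) stands for NP, ("N", 0) for N.
NP, N = ("N", 2), ("N", 0)
VP, V = ("V", 2), ("V", 0)
PP, P = ("P", 2), ("P", 0)
S, DET = ("S", 2), ("Det", 0)


def is_headed(lhs, rhs):
    """True if some symbol on the right has the same category as the symbol
    on the left, which is what the schema in (21) requires."""
    category, _level = lhs
    return any(cat == category for cat, _ in rhs)


# The rules of (6) in this encoding.
RULES_6 = {
    "S → NP VP": (S, [NP, VP]),
    "VP → V NP PP": (VP, [V, NP, PP]),
    "PP → P NP": (PP, [P, NP]),
    "NP → Det N": (NP, [DET, N]),
}

for name, (lhs, rhs) in RULES_6.items():
    print(name, is_headed(lhs, rhs))
```

Under this encoding, S → NP VP is the one rule of (6) that does not fit the schema, since S is not treated here as a projection of N or of V.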
By the 1980s, however, the introduction of the X-bar system had led to a situation in which
the phrase structure component was all but eliminated from the grammar,
leaving a small number of rules of the type in (21) in its place. This is another topic to which
we will return in a future lecture.
6.2 Slash categories
Some have claimed that Chomsky’s original criticism of phrase structure rules was too harsh
and that if we were to allow ourselves certain extensions to these rules, the kind of data that
Chomsky claimed to be difficult to account for could be handled. For example, another
construction that seems to require added complexities is the wh-question, in which an element
missing from a given position in a sentence is linked to the appearance of a related element at
the beginning of the sentence:
(22)  I asked who he met __
In Generalised Phrase Structure Grammar it was proposed to handle such effects by the
introduction of a ‘slash category’, which is essentially a category with something missing and
which can combine with another element of the relevant type to make a whole category or be
part of a larger constituent which inherits the property of the missing element. To give some
idea of how this works consider the sentence above. This involves a VP with a missing object
and so its category is VP/NP. This combines with a subject to form a sentence with a missing
object, of the category S/NP. This combines with the wh-NP to form a complete sentence:
(23)  [S [NP I]
         [VP [V asked]
             [S [NP who]
                [S/NP [NP he]
                      [VP/NP [V met] [NP/NP ∅]]]]]]
We cannot go into detail about how this system was formalised in a phrase structure grammar
as it would involve the introduction of too many new and rather technical mechanisms.
However, some idea of how it might be instantiated, if somewhat simplistically, can be
gained by considering the following rewrite rules:
(24)  XP/XP → ∅
      XP/YP → … ZP/YP
      XP → YP XP/YP
The first of these rules simply says that a constituent that lacks itself is empty. This applies to
the missing object in (23). The second states that a constituent that lacks some constituent
contains some other category that lacks the same constituent. In (23) we can see that the
VP/NP conforms to this rule as it contains the NP/NP constituent representing the missing
object. In turn the S/NP then contains the VP/NP and so this is another example of the
application of this rule. Finally the third says that a constituent can be made up of something
that lacks a certain constituent combined with a constituent of the missing type. The
embedded S is an instance of this rule as it contains the S/NP category combined with an NP:
in a sense the NP and the /NP cancel each other out and what we are left with is the S.
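The way the missing element is passed up the structure in (23) and finally cancelled can be traced step by step. The sketch below is a simplification of the simplified rules in (24), not the Generalised Phrase Structure Grammar formalism itself; the function names `split_slash`, `pass_up` and `discharge` and the use of strings such as "S/NP" for categories are assumptions.

```python
def split_slash(category):
    """Split 'XP/YP' into ('XP', 'YP'); a plain category gives ('XP', None)."""
    base, _, gap = category.partition("/")
    return base, (gap or None)


def pass_up(mother, daughter):
    """Second rule of (24): a mother inherits the gap of a slashed daughter."""
    _, gap = split_slash(daughter)
    if gap is None:
        raise ValueError(f"{daughter} has nothing missing to pass up")
    return f"{mother}/{gap}"


def discharge(filler, slashed):
    """Third rule of (24): a filler YP combines with XP/YP to give XP."""
    base, gap = split_slash(slashed)
    if gap != filler:
        raise ValueError(f"{filler} cannot fill the gap in {slashed}")
    return base


gap = "NP/NP"                    # first rule: the empty object position
vp = pass_up("VP", gap)          # 'met' plus the gap
s_slash = pass_up("S", vp)       # 'he' plus the VP/NP
s = discharge("NP", s_slash)     # 'who' plus the S/NP
print(vp, s_slash, s)            # VP/NP S/NP S
```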
7 Conclusion
In this lecture we have concentrated on the treatment of phrase structure in linguistics over
the past 100 or so years. It is somewhat intriguing that the very notion of phrase structure has
maintained its prominence, even in Chomskyan generative grammar, when its initial
development was based on structuralist/empiricist ideas that Chomsky vehemently attacked at
the end of the 1950s. Thus this is perhaps the only notion to survive in current generative
linguistics from the structuralist tradition. Moreover, despite Chomsky’s withering scepticism
of the structuralist idea of discovery procedures, the kinds of methods that they developed are
still very much part of linguistic practice and are to be found described in detail in almost all
introductory linguistics textbooks, though clearly no one these days believes them to be
infallible ways of extracting facts from observations. While there have been one or two
attempts to question the validity of the structuralist remnant in generative grammar, these are
very much in the minority and on the whole the idea of constituent structure is now so
embedded in the linguistic consciousness that it is virtually unquestionable. This is surprising
given that the spirit of generative grammar, in accordance with rationalist investigation,
purports to hold that every theoretical construct should be held up to scrutiny and that no
assumption should be sacrosanct.