CS4025: Grammars
Grammars for English
Features
Definite Clause Grammars
See J&M Chapter 9 in 1st ed, 12 in 2nd, Prolog textbooks (for DCGs)
Computing Science, University of Aberdeen
Grammar: Definition
The surface structure (syntax) of sentences is usually
described by some kind of grammar.
A grammar defines syntactically legal sentences.
» John ate an apple (syn legal)
» John ate apple (not syn legal)
» John ate a building (syn legal)
More importantly, a grammar provides a description
of the structure of a legal sentence
Facts about Grammars
There is no single “correct” or “complete” grammar
for any natural language (though people have tried to
make them and certain terminology has emerged that
is generally useful).
Linguists dispute exactly what formalism should be
used for stating a grammar.
Linguists dispute exactly what data the grammar
should apply to.
Grammars describe subsets of NLs. They all have
their strengths and weaknesses.
Sentence Constituents
Grammar rules mention constituents (e.g. NP – noun
phrase) as well as single word types (noun)
A constituent can occur in many rules
» NP also after a preposition (near the dog), in possessives
(the cat’s paws),...
These capture to some extent ideas such as:
» Where NP occurs in a rule, any phrase of this class can
appear
» NP’s have something in common in terms of their meaning
» Two phrases of the same kind, e.g. NP, can be joined
together by the word “and” to yield a larger phrase of the
same kind. Phrase structure tests.
Common Constituents
Some common constituents
» Noun phrase: the dog, an apple, the man I saw yesterday
» Adjective phrase: happy, very happy
» Verb group: eat, ate, am eating, have been eating.
» Verb phrase: eating an apple, ran into the store
» Prepositional phrase: in the car
» Sentence: A woman ate an apple.
Constituents named “X phrase” in general have a
most important word, the head, of category X
(e.g. the noun “dog” in the noun phrase “the dog”).
The head determines properties of the rest of the phrase.
Phrase-Structure Grammar
PS grammars use context-free rewrite rules to define
constituents and sentences
» S => NP Verb NP
Similar to rules used to define syntax of programming
languages.
» if-statement => if condition then statement
Also called context-free grammar (CFG) because the
context does not restrict the application of the rules.
Adequate for (almost all of?) English
» see J&M, chap 13
A simple grammar
– S => NP VP
– NP => Det Adj* Noun
– NP => ProperNoun
– VP => IntranVerb
– VP => TranVerb NP
– Det: a, the
– Adj: big, black
– Noun: dog, cat
– ProperNoun: Fido, Misty, Sam
– IntranVerb: runs, sleeps
– TranVerb: chases, sees, saw
How does this assign structure to simple sentences?
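One way to see how the rules assign structure is to run them. The following is a minimal sketch in Python, not part of the course software: it hand-codes one function per rule and returns the parse as nested tuples. The function names and the tuple representation are assumptions made for illustration.

```python
# Lexicon from the slide: word -> category (keys lower-cased for lookup).
LEXICON = {
    "a": "Det", "the": "Det",
    "big": "Adj", "black": "Adj",
    "dog": "Noun", "cat": "Noun",
    "fido": "ProperNoun", "misty": "ProperNoun", "sam": "ProperNoun",
    "runs": "IntranVerb", "sleeps": "IntranVerb",
    "chases": "TranVerb", "sees": "TranVerb", "saw": "TranVerb",
}

def word(cat, words, i):
    """Match one word of category `cat` at position i.
    Returns (tree, next_position) or None."""
    if i < len(words) and LEXICON.get(words[i].lower()) == cat:
        return (cat, words[i]), i + 1
    return None

def parse_np(words, i):
    # NP => ProperNoun
    hit = word("ProperNoun", words, i)
    if hit:
        tree, j = hit
        return ("NP", tree), j
    # NP => Det Adj* Noun
    hit = word("Det", words, i)
    if not hit:
        return None
    det, j = hit
    children = [det]
    while (adj := word("Adj", words, j)):   # Adj*: zero or more adjectives
        children.append(adj[0])
        j = adj[1]
    noun = word("Noun", words, j)
    if not noun:
        return None
    children.append(noun[0])
    return ("NP", *children), noun[1]

def parse_vp(words, i):
    # VP => TranVerb NP
    hit = word("TranVerb", words, i)
    if hit:
        verb, j = hit
        obj = parse_np(words, j)
        if obj:
            return ("VP", verb, obj[0]), obj[1]
    # VP => IntranVerb
    hit = word("IntranVerb", words, i)
    if hit:
        return ("VP", hit[0]), hit[1]
    return None

def parse_sentence(text):
    """S => NP VP. Returns a nested-tuple tree, or None if the
    sentence is not syntactically legal under this grammar."""
    words = text.split()
    np = parse_np(words, 0)
    if not np:
        return None
    vp = parse_vp(words, np[1])
    if not vp or vp[1] != len(words):
        return None
    return ("S", np[0], vp[0])

print(parse_sentence("Sam sleeps"))
```

Calling `parse_sentence("Fido chases cat")` returns `None`, because no NP rule allows a bare common noun.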
Example: Sam sleeps
Sentence
  NP (NounPhrase)
    ProperNoun: Sam
  VP (VerbPhrase)
    IntranVerb: sleeps
Rewrite/expansion rules.
Ex: The cat sees the dog
Sentence
  NP
    Det: The
    Noun: cat
  VP
    TranVerb: sees
    NP
      Det: the
      Noun: dog
Fido chases the big black cat
Sentence
  NP
    ProperNoun: Fido
  VP
    TranVerb: chases
    NP
      Det: the
      Adj: big
      Adj: black
      Noun: cat
Recursion
NL Grammars allow recursion
NP => Det Adj* Noun PP*
PP => Prep NP
For example:
The big cat...
The big black wild cat....
The man in the store saw John.
I fixed the intake of the engine of BA’s first jet.
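The mutual recursion between NP and PP can be sketched as two Python functions that call each other. This is an illustrative recognizer only: the word sets are a hypothetical mini-lexicon, and the greedy "longest match" strategy is an assumption of the sketch, not something the slides prescribe.

```python
# Hypothetical mini-lexicon; only the rule shapes come from the slide.
DET = {"the"}
ADJ = {"big", "black", "wild"}
NOUN = {"cat", "man", "store", "intake", "engine"}
PREP = {"in", "of"}

def np_end(words, i=0):
    """NP => Det Adj* Noun PP*
    Returns the index just past the longest NP starting at i, or None."""
    if i >= len(words) or words[i] not in DET:
        return None
    i += 1
    while i < len(words) and words[i] in ADJ:      # Adj*
        i += 1
    if i >= len(words) or words[i] not in NOUN:
        return None
    i += 1
    while (j := pp_end(words, i)) is not None:     # PP*
        i = j
    return i

def pp_end(words, i):
    """PP => Prep NP. Calls np_end, which in turn calls pp_end."""
    if i < len(words) and words[i] in PREP:
        return np_end(words, i + 1)
    return None

def is_np(text):
    """True if the whole text is a noun phrase under the two rules."""
    words = text.lower().split()
    return np_end(words) == len(words)

print(is_np("The big black wild cat"))
print(is_np("the intake of the engine of the store"))
```

Because a PP contains an NP, and an NP may contain further PPs, phrases can nest to any depth without adding rules.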
The man in the store saw John
Sentence
  NP
    Det: The
    Noun: man
    PP
      Prep: in
      NP
        Det: the
        Noun: store
  VP: [saw John]
Grammar is not everything
Sentences may be grammatically OK but not
acceptable (acceptability?)
» The sleepy table eats the green idea.
» Fido chases the black big cat.
Sentences may be understandable even if
grammatically flawed.
» Fido chases a cat small.
BUT:
Grammar terminology may be useful for describing
sentences even if they are flawed (e.g. in tutoring
systems)
Grammars may be useful for analysing fragments
within unrestricted and badly-formed text.
Features
Features are annotations on words and constituents
Grammar rules use unification (as in Prolog) to
manipulate features. (See references on unification;
informally, it matches two feature sets and merges them
into the most general set consistent with both.)
Features allow grammars to be smaller and simpler. If we
encoded every such distinction as a separate category in
the above grammar, it would be very complex.
See J&M, chap 11 (1st edition), chap 15 (2nd edition)
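As a rough illustration of feature matching, the sketch below unifies two flat feature sets represented as Python dictionaries. It is deliberately simplified: real unification (as in Prolog or in the J&M chapters) also handles nested structures and shared variables, which this sketch does not attempt. The function name `unify` and the dictionary representation are assumptions.

```python
def unify(a, b):
    """Unify two flat feature sets (dicts).
    A feature missing from one side is treated as unconstrained.
    Returns the merged feature set, or None if any value conflicts."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None          # conflicting values: unification fails
        result[key] = value
    return result

print(unify({"num": "sing"}, {"num": "sing", "person": 3}))
print(unify({"num": "sing"}, {"num": "plur"}))
```

The first call succeeds and carries both features forward; the second fails because the two sides disagree on `num`.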
Ex: Number
One feature is [number:N], where N is singular or
plural.
» A noun’s number comes from morphology (and semantics)
– cat - singular noun
– cats - plural noun
» An NP's number comes from its head noun
» A VP’s number comes from its verb (or auxiliary verb)
– is happy - singular VP
– are happy - plural VP
» NP and VP must agree in number:
– The rat has an injury
– *The rat have an injury
Ex: Number
Similar issue between determiners and the head
common noun:
» This tall man...
» *This tall men...
» *These tall man...
» These tall men...
Languages can be richer (Slavic) or poorer (English,
Chinese) morphologically (analysis of word forms).
Many different sorts of features in the world's
languages (animacy, shape, gender,...).
Modified Grammar
» S => NP[num:N] VP[num:N]
» NP[num:N] => Det[num:N] Adj* Noun[num:N] PP*
» NP[num:sing] => ProperNoun
» VP[num:N] => IntranVerb[num:N]
» VP[num:N] => TranVerb[num:N] NP
» PP => Prep NP
» Noun[num:sing]: dog
» Noun[num:plur]: dogs
» etc.
??Sentence features?
Example rule
NP[num:N] => Det[num:N] Adj* Noun[num:N]
An NP’s determiner and noun must agree in number.
Its adjectives need not (in English, unlike Russian).
– this dog is OK
– these dogs is OK
– this dogs is not OK (det must agree)
– these dog is not OK (det must agree)
– the big red dog is OK
– the big red dogs is OK (adjs don’t agree)
[number of NP] is [number of Noun]
Might also need some 'feature passing' in complex
structures.
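The agreement check in the NP rule can be sketched in Python as follows. The lexicon is hypothetical; a number of `None` is used here to mean "no constraint", which is how the sketch models a determiner such as "the" and adjectives that do not inflect for number.

```python
# Hypothetical lexicon: word -> (category, number).
# None as the number means the word agrees with either value.
LEXICON = {
    "this": ("Det", "sing"), "these": ("Det", "plur"), "the": ("Det", None),
    "big": ("Adj", None), "red": ("Adj", None),
    "dog": ("Noun", "sing"), "dogs": ("Noun", "plur"),
}

def np_number(text):
    """NP[num:N] => Det[num:N] Adj* Noun[num:N]
    Returns 'sing' or 'plur' for a well-formed NP, else None."""
    entries = [LEXICON.get(w) for w in text.lower().split()]
    if len(entries) < 2 or None in entries:
        return None                      # too short, or an unknown word
    (det_cat, det_num), *adjs, (noun_cat, noun_num) = entries
    if det_cat != "Det" or noun_cat != "Noun":
        return None
    if any(cat != "Adj" for cat, _ in adjs):
        return None
    # Det and Noun share the variable N, so their values must not conflict.
    if det_num is not None and det_num != noun_num:
        return None
    return noun_num                      # [number of NP] is [number of Noun]

for phrase in ["this dog", "these dogs", "this dogs", "the big red dogs"]:
    print(phrase, "->", np_number(phrase))
```

Returning the noun's number is the simplest form of feature passing: the value found on the head becomes the value of the whole phrase.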
Ex: These cats see this dog
Sentence[plur]
  NP[plur]
    Det[plur]: These
    Noun[plur]: cats
  VP[plur]
    TranVerb[plur]: see
    NP[sing]
      Det[sing]: this
      Noun[sing]: dog
Ex: *This cats see this dog
*Sentence[]
  NP[??]
    Det[sing]: This
    Noun[plur]: cats
  VP[plur]
    TranVerb[plur]: see
    NP[sing]
      Det[sing]: this
      Noun[sing]: dog
Other features
Person (I, you, him)
Verb form (past, present, base, past participle, pres
participle)
Gender (masc, fem, neuter)
Polarity (positive, negative)
etc, etc, etc
Definite Clause Grammars
A definite clause grammar (DCG) is a CFG with
added feature information
The syntax is also Prolog-like, and DCGs are usually
implemented in Prolog
You don’t need to know Prolog to use DCGs
The formalism was (partially) invented in Scotland
DCG Notation (1)
Use ---> between rule LHS and RHS.
Non-terminal symbols (e.g., np) written as Prolog
constants (start with lower case letters)
Features (eg, Num) are arguments in ()
» Features are shown by position, not name
» Prolog variables (start with upper case letters) indicate
unknown values
Actual words are lists, in []
Prolog tests can be attached in {}
_ is the “wild card” variable
(Optional) use distinguished predicate to define the
goal symbol (eg, s).
DCG Notation (2)
Rule notation                      DCG notation
S => NP VP                         s ---> np, vp.
S[num:N] => NP[num:N] VP[num:N]    s(Num) ---> np(Num), vp(Num).
N => dog                           n ---> [dog].
N => New York                      n ---> [new, york].
NP => <name>                       np ---> [X], {isname(X)}.
NB the number of “-” in the “--->” varies according to
the implementation
Variables have to match, e.g. N, Num, X....
An example DCG
distinguished(s).
s ---> np(Num), vp(Num).
np(Num) ---> det(Num), noun(Num).
np(sing) ---> name.
vp(Num) ---> intranverb(Num).
vp(Num) ---> tranverb(Num), np(_).   % _ means no constraint on Num agreement between verb and direct object
det(_) ---> [the].
noun(sing) ---> [dog].
name ---> [sam].
intranverb(sing) ---> [sleeps].
intranverb(plur) ---> [sleep].
tranverb(sing) ---> [chases].
tranverb(plur) ---> [chase].
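To make the behaviour of this grammar concrete without a Prolog system, here is a rough Python rendering of the same rules. It is a sketch under several assumptions: each non-terminal becomes a generator that yields `(Num, remaining_words)` pairs, `None` stands in for an unbound Prolog variable, and the helper names (`lexical`, `unify_num`, `FAIL`) are invented for illustration. It only recognises sentences; it does not build parse trees as the course package does.

```python
FAIL = object()   # sentinel for a failed unification

def unify_num(a, b):
    """Unify two Num values; None plays the role of an unbound variable."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    return FAIL

def lexical(entries):
    """Build a parser for word rules such as noun(sing) ---> [dog]."""
    def parse(words):
        for form, num in entries:
            if words[:1] == [form]:
                yield num, words[1:]
    return parse

det = lexical([("the", None)])                       # det(_) ---> [the].
noun = lexical([("dog", "sing")])
name = lexical([("sam", None)])                      # name has no feature
intranverb = lexical([("sleeps", "sing"), ("sleep", "plur")])
tranverb = lexical([("chases", "sing"), ("chase", "plur")])

def np(words):
    # np(Num) ---> det(Num), noun(Num).
    for d_num, rest in det(words):
        for n_num, rest2 in noun(rest):
            num = unify_num(d_num, n_num)
            if num is not FAIL:
                yield num, rest2
    # np(sing) ---> name.
    for _, rest in name(words):
        yield "sing", rest

def vp(words):
    # vp(Num) ---> intranverb(Num).
    yield from intranverb(words)
    # vp(Num) ---> tranverb(Num), np(_).
    for num, rest in tranverb(words):
        for _, rest2 in np(rest):        # object number is ignored
            yield num, rest2

def s(words):
    """s ---> np(Num), vp(Num). True if some parse uses every word."""
    for np_num, rest in np(words):
        for vp_num, rest2 in vp(rest):
            if unify_num(np_num, vp_num) is not FAIL and rest2 == []:
                return True
    return False

print(s(["sam", "chases", "the", "dog"]))
print(s(["sam", "chase", "the", "dog"]))
```

Threading the remaining word list from one call to the next mirrors, loosely, how Prolog systems translate DCG rules into ordinary clauses with extra list arguments.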
DCG’s and Prolog
DCG’s are translated into Prolog in a fairly
straightforward fashion
» See Sterling and Shapiro, The Art of Prolog
In fact, Prolog was originally designed as a language
for NLP, as part of a machine-translation project.
Prolog has evolved since, but is still well-suited to
many NLP operations.
Edinburgh Package
We will use a parsing system for DCGs developed
(for teaching purposes) by Holt and Ritchie from
Edinburgh University.
Use SWI Prolog
Using the package
Statements always end with “.”
Prompt is ?-
Type code into a file, not directly into the interpreter.
Load files with [file].
» .pl = prolog source
By default, type “n.” in answer to a backtrack request
Using the DCG package
?- prp([sam, chases, the, dog]).         parse and pretty print
?- prp(np, [the, dog]).                  parse and print an np
?- parse([sam, chases, the, dog], X).    returns parse in X
?- parse(np, [the, dog], X).             parse a constituent (np)
For all the above, after each parse is printed, you will be asked if
you want to look for additional parses
NB for other DCG implementations, the grammar writer explicitly
builds parse structures in the grammar rules.
Other Grammar Formalisms
Many other grammar formalisms
» GPSG, HPSG, LFG, TAG, ATN, GB,...
No consensus on which is best, although many
strong opinions (like C vs Pascal vs Lisp)
As with programming languages, the formalism used is
often determined by the experience of the developers
and the available support tools