CS4025: Grammars l l l Grammars for English Features Definite Clause Grammars See J&M Chapter 9 in 1st ed, 12 in 2nd, Prolog textbooks (for DCGs) Computing Science, University of Aberdeen 1 Grammar: Definition The surface structure (syntax) of sentences is usually described by some kind of grammar. l A grammar defines syntactically legal sentences. » John ate an apple » John ate apple » John ate a building l (syn legal) (not syn legal) (syn legal) More importantly, a grammar provides a description of the structure of a legal sentence Computing Science, University of Aberdeen 2 Facts about Grammars There is no single “correct”or “complete” grammar for any natural language (though people have tried to make them and certain terminology has emerged that is generally useful). Linguists dispute exactly what formalism should be used for stating a grammar. Linguists dispute exactly what data the grammar should apply to. Grammars describe subsets of NLs. They all have their strengths and weaknesses. Computing Science, University of Aberdeen 3 Sentence Constituents Grammar rules mention constituents (e.g. NP – noun phrase) as well as single word types (noun) A constituent can occur in many rules » NP also after a preposition (near the dog), in possessives (the cat’s paws),... These capture to some extent ideas such as: » Where NP occurs in a rule, any phrase of this class can appear » NP’s have something in common in terms of their meaning » Two phrases of the same kind, e.g. NP, can be joined together by the word “and” to yield a larger phrase of the same kind. Phrase structure tests. Computing Science, University of Aberdeen 4 Common Constituents Some common constituents » » » » » » Noun phrase: the dog, an apple, the man I saw yesterday Adjective phrase: happy, very happy Verb group: eat, ate, am eating, have been eating. Verb phrase: eating an apple, ran into the store Prepositional phrase: in the car Sentence: A woman ate an apple. Constituents named “X phrase” in general have a most important word, the head, of category X (underlined above). The head determines properties of the rest of the phrase. Computing Science, University of Aberdeen 5 Phrase-Structure Grammar PS grammars use context-free rewrite rules to define constituents and sentences » S => NP Verb NP Similar to rules used to define syntax of programming languages. » if-statement => if condition then statement Also called context-free grammar (CFG) because the context does not restrict the application of the rules. Adequate for (almost all of?) English » see J&M, chap 13 Computing Science, University of Aberdeen 6 A simple grammar – – – – – – – – – – – S => NP VP NP => Det Adj* Noun NP => ProperNoun VP => IntranVerb VP => TranVerb NP Det: a , the Adj: big , black CommonNoun: dog , cat ProperNoun: Fido , Misty, Sam IntranVerb: runs, sleeps TranVerb: chases, sees, saw How does this assign structure to simple sentences? Computing Science, University of Aberdeen 7 Example: Sam sleeps Sentence NP (NounPhrase) ProperNoun VP (VerbPhrase) IntranVerb Rewrite/expansion rules. Sam Computing Science, University of Aberdeen sleeps 8 Ex: The cat sees the dog Sentence VP NP Det Noun TranVerb NP Det The cat Computing Science, University of Aberdeen sees the Noun dog 9 Fido chases the big black cat Sentence NP ProperNoun VP TranVerb NP Det Fido chases Computing Science, University of Aberdeen the Adj Adj Noun big black cat 10 Recursion NL Grammars allow recursion NP = Det Adj* CommonNoun PP* PP = Prep NP For example: The big cat... The big black wild cat.... The man in the store saw John. I fixed the intake of the engine of BA’s first jet. Computing Science, University of Aberdeen 11 The man in the store saw John Sentence NP Det Noun VP PP Prep The man Computing Science, University of Aberdeen in NP Det Noun the store [saw John] 12 Grammar is not everything Sentences may be grammatically OK but not acceptable (acceptability?) » The sleepy table eats the green idea. » Fido chases the black big cat. Sentences may be understandable even if grammatically flawed. » Fido chases a cat small. BUT: Grammar terminology may be useful for describing sentences even if they are flawed (e.g. in tutoring systems) Grammars may be useful for analysing fragments within unrestricted and badly-formed text. Computing Science, University of Aberdeen 13 Features Features are annotations on words and consituents Grammar rules use unification (as in Prolog) to manipulate features. (See references on unification, which is informally feature matching to the most common feature set.) Allow grammars to be smaller and simpler. If we accommodated all features in the above grammar, it would be very complex. See J&M, chap 11 (1st edition), chap 15 (2nd edition) Computing Science, University of Aberdeen 14 Ex: Number One feature is [number:N], where N is singular or plural. » A noun’s number comes from morphology (and semantics) – cat – cats - singular noun - plural noun » An NP's number comes from its head noun » A VP’s number comes from its verb (or auxiliary verb) – is happy – are happy - singular VP - plural VP » NP and VP must agree in number: – The rat has an injury – *The rat have an injury Computing Science, University of Aberdeen 15 Ex: Number Similar issue between determiners and the head common noun: » » » » This tall man... *This tall men... *These tall man... These tall men... Languages can be richer (Slavic) or poorer (English, Chinese) morphologically (analysis of word forms). Many different sorts of features in the world's languages (animacy, shape, gender,...). Computing Science, University of Aberdeen 16 Modified Grammar » » » » » » » » » S => NP[num:N] VP[num:N] NP[num:N] => Det[num:N] Adj* Noun[num:N] PP* NP[num:singular] => ProperNoun VP[num:N] => IntranVerb[num:N] VP[num:N] => TranVerb[num:N] NP PP => Prep NP Noun[num:sing]: dog Noun[num:plur]: dogs etc. ??Sentence features? Computing Science, University of Aberdeen 17 Example rule NP[num:N] = Det[num:N] Adj* Noun[num:N] An NP’s determiner and noun (but not its adjectives in English v Russian) must agree in number. – – – – – – this dog is OK these dogs is OK this dogs is not OK (det must agree) these dog is not OK (det must agree) the big red dog is OK the big red dogs is OK (adjs don’t agree) [number of NP] is [number of Noun] Might also need some 'feature passing' in complex structures. Computing Science, University of Aberdeen 18 Ex: These cats see this dog Sentence[plur] NP[plur] Det[plural] Noun[plur] VP[plur] TranVerb[plur] NP[sing] Det[sing] These cats Computing Science, University of Aberdeen see this Noun[sing] dog 19 Ex: *This cats see this dog *Sentence[] NP[??] Det[sing] Noun[plur] VP[plur] TranVerb[plur] NP[sing] Det[sing] This cats Computing Science, University of Aberdeen see this Noun[sing] dog 20 Other features Person (I, you, him) Verb form (past, present, base, past participle, pres participle) Gender (masc, fem, neuter) Polarity (positive, negative) etc, etc, etc Computing Science, University of Aberdeen 21 Definite Clause Grammars A definite clause grammar (DCG) is a CFG with added feature information The syntax is also Prolog-like, and DCGs are usually implemented in Prolog You don’t need to know Prolog to use DCGs The formalism was (partially) invented in Scotland Computing Science, University of Aberdeen 22 DCG Notation (1) Use ---> between rule LHS and RHS. Non-terminal symbols (e.g., np) written as Prolog constants (start with lower case letters) Features (eg, Num) are arguments in () » Features are shown by position, not name » Prolog variables (start with upper case letters) indicate unknown values Actual words are lists, in [] Prolog tests can be attached in {} _ is the ``wild card'' variable (Optional) use distinguished predicate to define the goal symbol (eg, s). Computing Science, University of Aberdeen 23 DCG Notation (2) Rule notation S => NP VP S[num:N] => NP[num:N] VP[num:N] DCG notation s ---> np, vp. N => dog N => New York NP => <name> n ---> [dog]. n ---> [new, york]. np ---> [X], {isname(X)}. s(Num) ---> np(Num), vp(Num). NB the number of “-” in the “--->” varies according to the implementation Variables have to match, e.g. N, Num, X.... Computing Science, University of Aberdeen 24 An example DCG distinguished(s). s ---> np(Num), vp(Num). np(Num) ---> det(Num), noun(Num). np(sing) ---> name. vp(Num) ---> intranverb(Num). vp(Num) ---> tranverb(Num), np(_). _ means no constraint on Num agreement between verb and direct object. det(_) ---> [the]. noun(sing) ---> [dog]. name ---> [sam]. intranverb(sing) ---> [sleeps]. intranverb(plur) ---> [sleep]. tranverb(sing) ---> [chases]. tranverb(plur) ---> [chase]. Computing Science, University of Aberdeen 25 DCG’s and Prolog DCG’s are translated into Prolog in a fairly straightforward fashion » See Shapiro, The Art of Prolog In fact, Prolog was originally designed as a language for NLP, as part of a machine-translation project. Prolog has evolved since, but is still well-suited to many NLP operations. Computing Science, University of Aberdeen 26 Edinburgh Package We will use a parsing system for DCGs developed (for teaching purposes) by Holt and Ritchie from Edinburgh University. Use SWI Prolog Computing Science, University of Aberdeen 27 Using the package Statements always end with “.” Prompt is ?Type code into a file, not directly into the interpreter. Load files with [file]. » .pl = prolog source By default, type “n.” to backtrack request Computing Science, University of Aberdeen 28 Using the DCG package ?- prp([sam, chases, the, dog]). parse and pretty print ?- prp(np, [the, dog]). parse and print an np ?- parse([sam, chases, the, dog], X). returns parse in X ?- parse(np, [the, dog], X). parse a constituent (np) For all the above, after each parse is printed, you will be asked if you want to look for additional parses NB for other DCG implementations, the grammar writer explicitly builds parse structures in the grammar rules. Computing Science, University of Aberdeen 29 Other Grammar Formalisms Many other grammar formalisms » GPSG, HPSG, LFG, TAG, ATN, GB,... No consensus on which is best, although many strong opinions (like C vs Pascal vs Lisp) Like programming languages, formalism used often determined by experience of developers, and available support tools Computing Science, University of Aberdeen 30