Generalizing Context-Free Grammars
Carl Pollard
Linguistics 602.02
Jan. 16, 2007

(1) A Toy English GCFG (Syntax Only)
Types = {S, NP, VP, TV, DTV, SV, Det, N, Adv}
Consts_NP = {Fido, Felix, Mary}
Consts_VP = {barked}
Consts_TV = {bit}
Consts_DTV = {gave}
Consts_SV = {believed, heard}
Consts_Det = {the}
Consts_N = {cat, dog}
Consts_Adv = {yesterday}
Ops_{NP VP S} = {p}
Ops_{TV NP VP} = {q}
Ops_{DTV NP NP VP} = {r}
Ops_{SV S VP} = {s}
Ops_{VP Adv VP} = {t}
Ops_{Det N NP} = {u}

(2) Simpler Notation for GCFG
Fido, Felix, Mary : NP
barked : VP
bit : TV
gave : DTV
believed, heard : SV
the : Det
cat, dog : N
yesterday : Adv
p : NP, VP; S
q : TV, NP; VP
r : DTV, NP, NP; VP
s : SV, S; VP
t : VP, Adv; VP
u : Det, N; NP

(3) In an Interpretation I of the GCFG:
• NP denotes the syntactic category of NPs, I(NP).
• Fido denotes the word I(Fido) ∈ I(NP).
• p denotes a function I(p) : (I(NP) × I(VP)) → I(S).

(4) Notational Simplification
To prevent notational clutter, we use the object-language symbols (types, constants, and operation symbols) as metalanguage names of the things they denote. E.g., we’ll speak of:
• ‘the category NP’, not ‘the category I(NP)’
• ‘the NP Fido’, not ‘the word I(Fido) of category I(NP)’
• ‘the construction p : (NP × VP) → S’, not ‘the construction I(p) : (I(NP) × I(VP)) → I(S)’

(5) GCFG Phrase Notation
a. Mary believed Fido bit the cat.
b. S
     NP  Mary
     VP
       SV  believed
       S
         NP  Fido
         VP
           TV  bit
           NP
             Det  the
             N    cat
c. [S [NP Mary][VP [SV believed][S [NP Fido][VP [TV bit][NP [Det the][N cat]]]]]]
d. p(Mary, s(believed, p(Fido, q(bit, u(the, cat)))))

(6) Some Differences between CFG and GCFG
a. 
The traditional labelled bracketing
   [S [NP Mary][VP [SV believed][S [NP Fido][VP [TV bit][NP [Det the][N cat]]]]]]
is just an alternative notation for the labelled ordered tree in (5b), whereas
   p(Mary, s(believed, p(Fido, q(bit, u(the, cat)))))
is a term in a formal language (the set of terms of the algebraic signature (1)), and in any algebra I that interprets that signature, this term denotes not a tree but rather a member of the set I(S) (the syntactic category of sentences).
b. So in GCFG, there aren’t any phrase structure trees.
c. And so the work done in other frameworks by concepts and operations that are defined in terms of trees (e.g. nonbranching vs. branching nodes, binary branching vs. flat structure, domination, government, c-command, m-command, movement, etc.) will have to be done in other ways.
d. The GCFG term doesn’t explicitly display the types of the subterms because these are inferrable from the GCFG itself.
e. The GCFG term does explicitly show which constructions are involved (the operation symbols used to form the phrasal subterms are names of constructions).
f. Except in languages where the syntax-phonology interface is purely concatenative, in general the phonology of a phrase cannot be read directly off the GCFG term denoting that phrase (as it can be from a traditional labelled bracketing).

(7) If the Syntax-Phonology Interface is Purely Concatenative:
We can assume that the set of possible phonologies of linguistic expressions is a monoid. We say this in the grammar by adding
a. a type Phon of phonologies
b. a constant e : Phon (‘phonological zero’)
c. constants of type Phon for lexical phonologies, e.g. /fajdo/, /bIt/, etc.
d. a binary operation symbol _ for concatenation, subject to the equations¹
   ∀x ∀y ∀z ((x _ y) _ z = x _ (y _ z))
   ∀x (x _ e = x)
   ∀x (x = e _ x)
e. equations that specify the phonologies of words, e.g.
   phon(Fido) = /fajdo/
   phon(bit) = /bIt/
   phon(Felix) = /filIks/
f. 
for each construction, an equation that tells in what order the phonologies of the immediate constituents (ICs) are to be concatenated, e.g.²
   ∀n ∀v (phon(p(n, v)) = phon(n) _ phon(v))

(8) Abbreviatory Conventions for Phonologies
• We write lexical phonologies between slashes, e.g. /fajdo/.
• Because of associativity, we can unambiguously write /fajdo/ _ /bIt/ _ /filIks/ rather than (/fajdo/ _ /bIt/) _ /filIks/ or /fajdo/ _ (/bIt/ _ /filIks/).
• As a further simplification, we omit all but the outermost slashes, thus: /fajdo bIt filIks/.

¹ Here the variables are of type Phon.
² Here n and v are variables of type NP and VP respectively.
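
The signature in (2), the type inference mentioned in (6d), and the concatenative interface in (7) can be illustrated with a minimal Python sketch. Everything about the encoding is an assumption made for illustration and is not part of the handout: terms are nested tuples, phonologies are tuples of strings (so that tuple concatenation and the empty tuple form a monoid), and the names CONSTS, OPS, LEX_PHON, type_of, phon, and show are hypothetical. The handout gives the phon equation only for p; the sketch assumes that every construction concatenates its arguments in the order listed, which may not be what is intended for the other constructions. Only the three lexical phonologies given in (7e) are included.

```python
# Lexical constants and their types, as listed in (2).
CONSTS = {
    "Fido": "NP", "Felix": "NP", "Mary": "NP",
    "barked": "VP", "bit": "TV", "gave": "DTV",
    "believed": "SV", "heard": "SV",
    "the": "Det", "cat": "N", "dog": "N", "yesterday": "Adv",
}

# Operation symbols: name -> (argument types, result type), as listed in (2).
OPS = {
    "p": (("NP", "VP"), "S"),
    "q": (("TV", "NP"), "VP"),
    "r": (("DTV", "NP", "NP"), "VP"),
    "s": (("SV", "S"), "VP"),
    "t": (("VP", "Adv"), "VP"),
    "u": (("Det", "N"), "NP"),
}

# Lexical phonologies: only the three given in (7e).
LEX_PHON = {"Fido": ("fajdo",), "bit": ("bIt",), "Felix": ("filIks",)}

E = ()  # phonological zero: the identity element for tuple concatenation


def type_of(term):
    """Infer the type of a term: a constant name or a tuple (op, arg1, ...)."""
    if isinstance(term, str):
        return CONSTS[term]
    op, *args = term
    arg_types, result_type = OPS[op]
    actual = tuple(type_of(a) for a in args)
    if actual != arg_types:
        raise TypeError(f"{op} expects {arg_types}, got {actual}")
    return result_type


def phon(term):
    """Phonology of a well-typed term.

    ASSUMPTION: every construction concatenates its arguments in the
    order listed; the handout states this only for p.
    """
    if isinstance(term, str):
        return LEX_PHON[term]
    type_of(term)  # reject ill-typed terms before computing a phonology
    result = E
    for arg in term[1:]:
        result = result + phon(arg)
    return result


def show(ph):
    """Render a phonology with only the outermost slashes, as in (8)."""
    return "/" + " ".join(ph) + "/"


sentence = ("p", "Fido", ("q", "bit", "Felix"))
print(type_of(sentence))     # S
print(show(phon(sentence)))  # /fajdo bIt filIks/
```

The term in (5d) type-checks the same way, although phon would fail on it here because the handout gives no lexical phonology for Mary, believed, the, or cat. An ill-typed term such as ("p", "bit", "Fido") is rejected with a TypeError.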