Natural Language Processing
DCG and Syntax

• NLP
• DCG
• A “translation” example: special case
• A DCG recogniser

Natural Language Processing

• NLP is the art and science of getting computers to understand natural language.
• NLP draws on material from other disciplines: computer science, formal philosophy and formal linguistics.
• NLP is an “AI-complete” task: the activities that turn up elsewhere in AI, such as knowledge representation, planning and inference, all turn up in one form or another in NLP.

DCG (Definite Clause Grammar)

• An example: prices such as $120pw, $200perweek, $150pweek
• The first price as a list of characters: [$, 1, 2, 0, p, w]

    price --> dollar, number, unit.
    dollar --> [$].
    number --> digit, number.
    number --> digit.
    unit --> [p, w].
    unit --> [p, e, r, w, e, e, k].
    unit --> [p, w, e, e, k].
    digit --> [1].
    digit --> [2].
    digit --> [3].
    ...

• ?- price([$, 2, 3, 4, p, w], []).
• Yes
• ?- price([8, 0, 0, p, w, e, e, k], []).
• No

Expand DCG to standard predicates

    price --> dollar, number, unit.
    price(List1, List2) :-
        dollar(List1, List11),
        number(List11, List12),
        unit(List12, List2).

    dollar --> [$].
    dollar([$|List], List).

    digit --> [1].
    digit([1|List], List).

    number --> digit, number.
    number --> digit.

Extending DCG

1. Add variables
2. Add normal predicates in { }

• ?- price(X, [$, 1, 2, 3, p, w], []).
• X = [1,2,3]

    price(X) --> dollar, number(X), unit.
    number([D|T]) --> digit(D), number(T).
    number([D]) --> digit(D).
    digit(1) --> [1].
    digit(2) --> [2].
    ...

A normal Prolog goal in { } can restrict the rule, here to numbers of fewer than three digits:

    price(X) --> dollar, number(X), unit, {length(X, N), N < 3}.

Expand extended DCG to standard predicates

    price(X) --> dollar, number(X), unit, {length(X, N), N < 3}.
    price(X, List1, List2) :-
        dollar(List1, List11),
        number(X, List11, List12),
        unit(List12, List2),
        length(X, N),
        N < 3.

A “machine translation” example

• three hundred and thirty four: 334
• twenty one: 21
• fourteen: 14
• five: 5

• ?- to_number(N, [three, hundred, and, thirty, four], []).
• N = 334.

A “translation” example

• Vocabulary, lexicon

    digit(1) --> [one].
    digit(2) --> [two].
    ...
    digit(9) --> [nine].
    teen(10) --> [ten].
    teen(11) --> [eleven].
    ...
    teen(19) --> [nineteen].
    tens(20) --> [twenty].
    tens(30) --> [thirty].
    ...
    tens(90) --> [ninety].

A “translation” example (ctd)

• Numbers with one or two digits.

    xx(N) --> digit(N).
    xx(N) --> teen(N).
    xx(N) --> tens(T), rest_xx(N1), {N is T+N1}.
    rest_xx(N) --> digit(N).
    rest_xx(0) --> [].

A “translation” example (ctd)

    % numbers with 3 or fewer digits
    xxx(N) --> digit(D), [hundred], rest_xxx(N1), {N is D*100+N1}.
    xxx(N) --> xx(N).
    rest_xxx(N) --> [and], xx(N).
    rest_xxx(0) --> [].

    % top level
    to_number(0) --> [zero].
    to_number(N) --> xxx(N).

Query

• ?- to_number(N, [two, hundred, and, twenty, one], []).
• N = 221

Representing Syntactic Knowledge

Syntactic knowledge:
– Syntactic categories: e.g. Noun, Sentence.
– Grammatical features: e.g. Singular, Plural.
– Grammar rules.

• Why bother?

Parts of language

• Regard sentences as being built out of constituents.
• Two types of constituents:
  – words (simple constituents), which have lexical categories like noun, verb, etc.
  – phrases (compound constituents), like noun phrases, verb phrases, etc.
• How to store syntactic knowledge?
  – lexicon
  – grammar rules

Words: Lexical Categories (Parts of Speech)

• Noun (N): Jack, tree, house, cannon
• Verb (V): build, walk, kill
• Adjective (Adj): big, red, unpleasant
• Determiner (Det): the, a, which, that
  – Jack built {the, a, that} big, red house.
  – Which house did Jack build?
• Preposition (Prep): with, for, in, from, to, through, via, under

Words: Lexical Categories (ctd)

• Pronoun (Pro): her, him, she, itself, that, it
  – I saw the man in the park with the telescope.
  – Don't do that to him.
• Conjunction (Conj): and, or, but

Two kinds of lexical categories:
1. Open categories (“content words”): N, V, Adj
2. Closed categories (“function words”): Det, Prep, Pro, Conj

Compound Constituents

Some compound constituents:
Sentence (S): Jack built the house.
Noun Phrase (NP): John; the big, red house; the house that Jack built; the destruction of the city.
Verb Phrase (VP): built the house quickly; saw the man in the park.
Prepositional Phrase (PP): with the telescope; on the table.

A Simple Grammar

    S → NP VP
    VP → V NP
    NP → Proper_N
    NP → det N
    Proper_N → John
    Proper_N → Mary
    N → cake
    V → loves
    V → ate
    det → the

Sentences in this language:
“John loves Mary”
“John ate the cake”
“John loves the cake”

Definite Clause Grammars (DCGs)

The above grammar can be implemented directly in DCG notation as follows:

    s --> np, vp.
    vp --> v, np.
    np --> proper_n.
    np --> det, n.
    proper_n --> [john].
    proper_n --> [mary].
    n --> [cake].
    v --> [loves].
    v --> [ate].
    det --> [the].

Translating DCG

Consider the rule

    s --> np, vp.

Prolog translates this as:

    s(Ws1, Ws2) :- np(Ws1, Ws), vp(Ws, Ws2).

This says that Ws2 is what remains after an s has been taken off the front of Ws1.

The rule

    proper_n --> [john].

is translated as

    proper_n([john|Ws], Ws).

Query

• ?- s([john, ate, the, cake], []).
• Yes
• ?- s([ate, john, cake, the], []).
• No
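
The translation scheme shown under “Translating DCG” can be applied by hand to every rule of the simple grammar. The sketch below is one such hand expansion, written as ordinary Prolog clauses with no DCG notation; the variable names Ws1, Ws and Ws2 follow the slides, and a real Prolog system may produce clauses that differ in detail. Load it in place of the DCG version, not alongside it, since both define the same predicates.

```prolog
% Hand-expanded recogniser for the simple grammar.
% Each predicate takes a word list and returns the words left over
% once the constituent has been removed from the front.

s(Ws1, Ws2)  :- np(Ws1, Ws), vp(Ws, Ws2).    % s  --> np, vp.
vp(Ws1, Ws2) :- v(Ws1, Ws), np(Ws, Ws2).     % vp --> v, np.
np(Ws1, Ws2) :- proper_n(Ws1, Ws2).          % np --> proper_n.
np(Ws1, Ws2) :- det(Ws1, Ws), n(Ws, Ws2).    % np --> det, n.

% Lexicon: each fact consumes exactly one word.
proper_n([john|Ws], Ws).                     % proper_n --> [john].
proper_n([mary|Ws], Ws).                     % proper_n --> [mary].
n([cake|Ws], Ws).                            % n --> [cake].
v([loves|Ws], Ws).                           % v --> [loves].
v([ate|Ws], Ws).                             % v --> [ate].
det([the|Ws], Ws).                           % det --> [the].
```

With these clauses the query s([john, ate, the, cake], []) succeeds and s([ate, john, cake, the], []) fails, as on the Query slide. The second argument need not be the empty list: np([the, cake, is, gone], Rest) binds Rest to [is, gone], which shows directly how each constituent hands the unconsumed words on to the next.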