Natural Language Processing
DCG and Syntax
• NLP
• DCG
• A “translation” example: special case
• A DCG recogniser
Natural Language Processing
NLP is the art and science of getting computers
to understand natural language.
• NLP draws on material from other
disciplines: computer science, formal
philosophy and formal linguistics.
• NLP is an “AI-complete” task: the
activities that turn up elsewhere in AI,
such as knowledge representation, planning
and inference, all turn up in one form or
another in NLP.
DCG (Definite Clause Grammar)
• An example: recognising price strings such as
$120pw, $200perweek, $150pweek
• Each string is represented as a list of tokens,
e.g. [$, 1, 2, 0, p, w]
price-->dollar, number, unit.
dollar-->[$].
number-->digit, number.
number-->digit.
unit-->[p,w].
unit-->[p, e, r, w, e, e, k].
unit-->[p, w, e, e, k].
digit-->[1].
digit-->[2].
digit-->[3].
...
• ?- price([$, 2, 3, 4, p, w], []).
• Yes
• ?- price([8, 0, 0, p, w, e, e, k], []).
• No
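The grammar above can be tried out directly. The following is a minimal sketch, assuming SWI-Prolog or another Prolog that translates DCG rules and provides phrase/2. The digit rules for 0 and 4 to 9 are an assumption, filled in because the slide elides them, and the dollar sign is written as the quoted atom '$' as a precaution, since some systems treat $ specially.

```prolog
% Recogniser for price strings such as $120pw, given as token lists.
price  --> dollar, number, unit.

dollar --> ['$'].

% One or more digits.
number --> digit, number.
number --> digit.

unit --> [p, w].
unit --> [p, e, r, w, e, e, k].
unit --> [p, w, e, e, k].

% Assumed full digit set (the slide shows only 1, 2, 3, ...).
digit --> [0].  digit --> [1].  digit --> [2].  digit --> [3].  digit --> [4].
digit --> [5].  digit --> [6].  digit --> [7].  digit --> [8].  digit --> [9].
```

phrase(price, List) is equivalent to calling price(List, []). For example, ?- phrase(price, ['$', 1, 2, 0, p, w]). succeeds, while ?- phrase(price, [8, 0, 0, p, w, e, e, k]). fails because the list does not start with a dollar sign.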
Expand DCG to standard predicates
price --> dollar, number, unit.
price(List1, List2) :- dollar(List1, List11),
                       number(List11, List12),
                       unit(List12, List2).
dollar --> [$].
dollar([$|List], List).
digit --> [1].
digit([1|List], List).
number-->digit, number.
number-->digit.
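The two number rules expand in the same way. The clauses below are a hand-written sketch of one plausible expansion, not necessarily the exact clauses a given Prolog system generates. The variable names L0, L1, L and Rest are illustrative, and '$' is quoted as a precaution.

```prolog
% dollar --> ['$'].   digit --> [1].   digit --> [2].
dollar(['$'|Rest], Rest).
digit([1|Rest], Rest).
digit([2|Rest], Rest).

% number --> digit, number.
% Take one digit off the front of L0, then a number off what is left.
number(L0, L) :- digit(L0, L1), number(L1, L).

% number --> digit.
% A single digit is also a number.
number(L0, L) :- digit(L0, L).
```

Each predicate takes a list and returns what is left after its phrase has been removed from the front. For example, ?- dollar(['$', 1, 2, p, w], R1), number(R1, R2). first gives R1 = [1, 2, p, w] and R2 = [p, w]; on backtracking, number can also stop after one digit, giving R2 = [2, p, w].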
Extending DCG
1. Add variables
price(X) --> dollar, number(X), unit.
number([D|T]) --> digit(D), number(T).
number([D]) --> digit(D).
digit(1) --> [1].
digit(2) --> [2].
...
• ?- price(X, [$, 1, 2, 3, p, w], []).
• X = [1,2,3]
2. Add normal predicates in { }
(here: accept only prices with fewer than three digits)
price(X) --> dollar, number(X), unit,
             {length(X, N), N < 3}.
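Both extensions can be seen side by side in a small runnable sketch, again assuming SWI-Prolog. The name short_price is hypothetical, introduced only so that the unconstrained and the constrained rule can live in one file, and the single digit rule with a range check is an assumed shorthand for the ten digit facts.

```prolog
% Variables: X collects the digits of the price as a list.
price(X) --> dollar, number(X), unit.

% Normal Prolog goals in { }: accept only prices with fewer than 3 digits.
short_price(X) --> dollar, number(X), unit, { length(X, N), N < 3 }.

dollar --> ['$'].

number([D|T]) --> digit(D), number(T).
number([D])   --> digit(D).

% Assumed shorthand: any integer token from 0 to 9 is a digit.
digit(D) --> [D], { integer(D), D >= 0, D =< 9 }.

unit --> [p, w].
unit --> [p, e, r, w, e, e, k].
unit --> [p, w, e, e, k].
```

With these rules, ?- phrase(price(X), ['$', 1, 2, 3, p, w]). gives X = [1, 2, 3], whereas ?- phrase(short_price(X), ['$', 1, 2, 3, p, w]). fails because the digit list has length 3. A two-digit price such as ['$', 1, 2, p, w] is accepted by both.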
Expand extended DCG to standard predicates
price(X) --> dollar, number(X), unit,
             {length(X, N), N < 3}.
price(X, List1, List2) :- dollar(List1, List11),
                          number(X, List11, List12),
                          unit(List12, List2),
                          length(X, N),
                          N < 3.
A “machine translation” example
• three hundred and thirty four: 334
• twenty one: 21
• fourteen: 14
• five: 5
• ?- to_number(N, [three, hundred, and, thirty, four], []).
• N = 334.
A “translation” example
• Vocabulary, lexicon
digit(1) --> [one].
digit(2) --> [two].
…..
digit(9) --> [nine].
teen(10) --> [ten].
teen(11) --> [eleven].
…..
teen(19) --> [nineteen].
tens(20) --> [twenty].
tens(30) --> [thirty].
…..
tens(90) --> [ninety].
A “translation” example
• Numbers with one or two digits.
xx(N) --> digit(N).
xx(N) --> teen(N).
xx(N) --> tens(T), rest_xx(N1), {N is T+N1}.
rest_xx(N) --> digit(N).
rest_xx(0) --> [].
A “translation” example
% numbers with 3 or fewer digits
xxx(N) --> digit(D), [hundred], rest_xxx(N1),
{N is D*100+N1}.
xxx(N) --> xx(N).
rest_xxx(N) --> [and], xx(N).
rest_xxx(0) --> [].
%top level
to_number(0) --> [zero].
to_number(N) --> xxx(N).
Query
?- to_number(N, [two, hundred, and, twenty, one], []).
N = 221
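Putting the three translation slides together gives a program that can be loaded and queried. This is a sketch assuming SWI-Prolog. The lexicon is deliberately partial: it contains the entries shown on the slides plus the few extra words needed for the example phrases (three, four, five, fourteen), which are assumptions about the elided entries.

```prolog
% Partial lexicon (assumed entries: three, four, five, fourteen).
digit(1) --> [one].    digit(2) --> [two].    digit(3) --> [three].
digit(4) --> [four].   digit(5) --> [five].   digit(9) --> [nine].
teen(10) --> [ten].    teen(11) --> [eleven].
teen(14) --> [fourteen].  teen(19) --> [nineteen].
tens(20) --> [twenty]. tens(30) --> [thirty]. tens(90) --> [ninety].

% Numbers with one or two digits.
xx(N) --> digit(N).
xx(N) --> teen(N).
xx(N) --> tens(T), rest_xx(N1), { N is T + N1 }.
rest_xx(N) --> digit(N).
rest_xx(0) --> [].

% Numbers with three or fewer digits.
xxx(N) --> digit(D), [hundred], rest_xxx(N1), { N is D*100 + N1 }.
xxx(N) --> xx(N).
rest_xxx(N) --> [and], xx(N).
rest_xxx(0) --> [].

% Top level.
to_number(0) --> [zero].
to_number(N) --> xxx(N).
```

For example, ?- phrase(to_number(N), [three, hundred, and, thirty, four]). gives N = 334, and the phrases [twenty, one], [fourteen] and [five] give 21, 14 and 5. The arithmetic in { } runs only after the sub-phrases have been parsed, so T, D and N1 are bound by the time is/2 is called.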
Representing Syntactic Knowledge
Syntactic knowledge:
– Syntactic categories: e.g. Noun, Sentence.
– Grammatical features: e.g. Singular, Plural.
– Grammar rules.
• Why bother?
Parts of language
• Regard sentences as being built out of
constituents
• Two types of constituents:
– words (simple constituents), which have
lexical categories like noun, verb, etc.
– phrases (compound constituents), like noun
phrases, verb phrases, etc.
• How to store syntactic knowledge?
– lexicon
– grammar rules
Words: Lexical Categories (Parts of Speech)
• Noun (N): Jack, tree, house, cannon
• Verb (V): build, walk, kill
• Adjective (Adj): big, red, unpleasant
• Determiner (Det): the, a, which, that
– Jack built {the, a, that} big, red house;
– Which house did Jack build?
• Preposition (Prep): with, for, in, from, to,
through, via, under
Words: Lexical Categories (ctd)
• Pronoun (Pro): her, him, she, itself, that, it
– I saw the man in the park with the telescope
– Don't do that to him
• Conjunction (Conj): and, or, but.
Two kinds of lexical categories:
1. Open categories (“content words”): N, V, Adj
2. Closed categories (“function words”): Det,
Prep, Pro, Conj
Compound Constituents
Some compound constituents:
Sentence (S): Jack built the house.
Noun Phrase (NP):
John;
the big, red house;
the house that Jack built;
the destruction of the city.
Verb Phrase (VP):
built the house quickly;
saw the man in the park.
Prepositional Phrase (PP):
with the telescope;
on the table
A Simple Grammar
S → NP VP
VP → V NP
NP → Proper_N
NP → Det N
Proper_N → John
Proper_N → Mary
N → cake
V → loves
V → ate
Det → the
Sentences in this language:
“John loves Mary”
“John ate the cake”
“John loves the cake”
Definite Clause Grammars (DCGs)
The above grammar can be simply
implemented in DCG notation as follows:
s --> np, vp.
vp --> v, np.
np --> proper_n.
np --> det, n.
proper_n --> [john].
proper_n --> [mary].
n --> [cake].
v --> [loves].
v --> [ate].
det --> [the].
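Because this grammar has no recursion, it can be used to generate as well as to recognise. The sketch below, assuming SWI-Prolog, loads the rules and collects every sentence with findall/3. The predicate name all_sentences is hypothetical.

```prolog
s  --> np, vp.
vp --> v, np.
np --> proper_n.
np --> det, n.

proper_n --> [john].
proper_n --> [mary].
n   --> [cake].
v   --> [loves].
v   --> [ate].
det --> [the].

% Collect every word list the grammar accepts.
% This terminates only because the grammar is finite (no recursive rules).
all_sentences(Sentences) :-
    findall(S, phrase(s, S), Sentences).
```

An np can be john, mary or the cake, and a v can be loves or ate, so all_sentences/1 returns 3 × 2 × 3 = 18 word lists. These include odd ones such as [the, cake, loves, john], which shows that a purely syntactic grammar of this size accepts more than the three sentences one might have in mind.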
Translating DCG
Consider the rule
s --> np, vp.
Prolog translates this as:
s(Ws1,Ws2) :- np(Ws1,Ws),vp(Ws,Ws2).
This says that Ws2 is what remains after
taking an s off the start of Ws1.
The rule
proper_n --> [john].
is translated as
proper_n([john|Ws],Ws).
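Translating every rule by hand in the same way gives an ordinary Prolog program with no DCG notation at all. The clauses below are a sketch of that translation; the exact variable names and clause shapes a real Prolog system produces may differ.

```prolog
% s --> np, vp.
s(Ws1, Ws2)  :- np(Ws1, Ws), vp(Ws, Ws2).
% vp --> v, np.
vp(Ws1, Ws2) :- v(Ws1, Ws), np(Ws, Ws2).
% np --> proper_n.
np(Ws1, Ws2) :- proper_n(Ws1, Ws2).
% np --> det, n.
np(Ws1, Ws2) :- det(Ws1, Ws), n(Ws, Ws2).

% Lexical rules: each one takes a single word off the front of the list.
proper_n([john|Ws], Ws).
proper_n([mary|Ws], Ws).
n([cake|Ws], Ws).
v([loves|Ws], Ws).
v([ate|Ws], Ws).
det([the|Ws], Ws).
```

The second argument need not be the empty list. For example, ?- s([john, ate, the, cake, today], Rest). succeeds with Rest = [today]: a sentence was found at the start of the list and one word is left over. Passing [] as the second argument insists that the whole list is a sentence.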
Query
• ?- s([john, ate, the, cake], []).
• Yes
• ?- s([ate, john, cake, the], []).
• No