Class 5 notes: Feb. 8, 2006

More work on the lexicon:
Notice that grammars are not written with 40+ parts of speech – not practical!
Lexical categories used in grammars: (roughly)
Pronoun
ProperNoun
Noun
Aux
Modal
Verb
Adjective
Adverb
Preposition
Determiner
Conjunction
’s or ’ (possessive ending)
Everything else is represented using features.
Here is a better format for your lexicons:
A lexical entry:
CAT – the part of speech
BASE – base or “root” form of the word
Features: an attribute value list
LEX – the word itself
For nouns: NUMBER = MASS, SG, PL
For verbs:
FORM = BASE PRES PAST PPRT ING 3PS
COMP = DOBJ IOBJ TO-VP THAT-S FOR-ING FOR-TO
For pronouns, determiners and adverbs: WH
For pronouns: REFLexive, SUBjective, ACCusative, POSSessive
For adjectives and adverbs: BASE, COMParative, SUPerlative
Re-cast your lexicon as shown above, and encode a set of words to be provided on
the class web site very soon. (All of these words will appear in the 2000-word
sample of the Brown corpus, where you can find at least some of their Brown tags,
which can help you.)
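One possible encoding of the lexical entry format above, as a rough Python sketch. The class name LexicalEntry, the sample words, and the attribute names CASE and DEGREE are assumptions made for illustration (the notes list the values for pronouns and adjectives but do not name those attributes).

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    lex: str    # LEX: the word itself
    cat: str    # CAT: the part of speech
    base: str   # BASE: base or "root" form of the word
    features: dict = field(default_factory=dict)  # attribute-value list


# Hypothetical sample entries; a word with two readings gets two entries.
LEXICON = [
    LexicalEntry("students", "Noun", "student", {"NUMBER": "PL"}),
    LexicalEntry("disappeared", "Verb", "disappear", {"FORM": "PAST"}),
    LexicalEntry("disappeared", "Verb", "disappear", {"FORM": "PPRT"}),
    LexicalEntry("who", "Pronoun", "who", {"WH": True, "CASE": "SUB"}),
    LexicalEntry("best", "Adjective", "good", {"DEGREE": "SUP"}),
]


def lookup(word):
    """Return every entry whose LEX matches the word (one per reading)."""
    return [entry for entry in LEXICON if entry.lex == word.lower()]
```

Keeping FORM, NUMBER and the like in a feature dictionary, rather than in the CAT value, is what lets the grammar get by with the short list of lexical categories.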
Continue discussion of parsing:
Function TOP-DOWN-PARSE (list of words, grammar) // returns a parse tree
// only returns the first parse found
Push([Initial S tree, list of words], agenda) // a state = [tree, unconsumed-words]
While agenda not empty
loop
    current-state = Pop(agenda)
    if Successful-parse?(current-state) // no non-terminal leaves and no more input
        return the tree of current-state
    if Node-to-expand(current-state) is a terminal (a POS or base form of a word)
        if Instance-of(Next-word(current-state), CAT(Node-to-expand(current-state)))
            // Next-word must return empty, so that Instance-of fails, if there is no more input
            Push([Attach-word(Node-to-expand(current-state)), Rest-of-input], agenda)
        else continue loop // dead end: this state is simply dropped
    else
        Push-all(Apply-rules(current-state, grammar), agenda)
end loop
return reject
S → VP | Aux NP VP | NP VP
NP → Pronoun | Proper-Noun | Det Nominal ADJP | Det Nominal
Nominal → Noun | Noun Nominal
VP → Verb | Verb PP | Verb NP | Verb NP PP
PP → Preposition NP
ADJP → ADJ | VP
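The pseudocode and grammar above can be sketched in Python roughly as follows. This is only one way to do it: the tree encoding (a node is a pair of label and children, where children is None while the node is unexpanded, a list after a rule is applied, or a string after a word is attached), the helper names, and the two-word LEXICON are all assumptions made for illustration.

```python
GRAMMAR = {
    "S": [["VP"], ["Aux", "NP", "VP"], ["NP", "VP"]],
    "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal", "ADJP"],
           ["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "PP"], ["Verb", "NP"], ["Verb", "NP", "PP"]],
    "PP": [["Preposition", "NP"]],
    "ADJP": [["ADJ"], ["VP"]],
}
# Hypothetical mini-lexicon: word -> set of possible POS.
LEXICON = {"harry": {"Proper-Noun"}, "disappeared": {"Verb"}}


def leftmost_open(tree, path=()):
    """Path to the leftmost unexpanded leaf (Node-to-expand), or None."""
    label, kids = tree
    if kids is None:
        return path
    if isinstance(kids, str):      # a word is already attached here
        return None
    for i, kid in enumerate(kids):
        found = leftmost_open(kid, path + (i,))
        if found is not None:
            return found
    return None


def label_at(tree, path):
    """Label of the node reached by following path from the root."""
    for i in path:
        tree = tree[1][i]
    return tree[0]


def replace(tree, path, new_kids):
    """Copy of tree in which the node at path receives new_kids."""
    label, kids = tree
    if not path:
        return (label, new_kids)
    i = path[0]
    return (label, kids[:i] + [replace(kids[i], path[1:], new_kids)] + kids[i + 1:])


def top_down_parse(words, grammar=GRAMMAR, lexicon=LEXICON):
    agenda = [(("S", None), list(words))]    # a stack: the last item is the top
    while agenda:
        tree, rest = agenda.pop()
        path = leftmost_open(tree)
        if path is None:
            if not rest:
                return tree                  # successful parse
            continue                         # tree complete but input remains
        label = label_at(tree, path)
        if label not in grammar:             # a POS: try to attach the next word
            if rest and label in lexicon.get(rest[0].lower(), ()):
                agenda.append((replace(tree, path, rest[0]), rest[1:]))
        else:                                # Apply-rules, first rule ends up on top
            for rhs in reversed(grammar[label]):
                kids = [(symbol, None) for symbol in rhs]
                agenda.append((replace(tree, path, kids), rest))
    return None                              # reject
```

Calling top_down_parse(["Harry", "disappeared"]) explores the S → VP and S → Aux NP VP alternatives first, then succeeds with S → NP VP. Like the pseudocode, this sketch would loop forever on a left-recursive grammar.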
Example 1: Harry disappeared.
(Each step lists the agenda with the top state first; the action is applied to the top state.)
1. [S, Harry disappeared]
   Action: Apply-rules
2. [S → VP, Harry disappeared]
   [S → aux NP VP, Harry disappeared]
   [S → NP VP, Harry disappeared]
   Action: Apply-rules
3. [S → <VP <verb>>, Harry disappeared]
   [S → <VP <verb PP>>, Harry disappeared]
   [S → <VP <verb NP>>, Harry disappeared]
   [S → <VP <verb NP PP>>, Harry disappeared]
   [S → aux NP VP, Harry disappeared]
   [S → NP VP, Harry disappeared]
   Action: Continue
4. [S → <VP <verb PP>>, Harry disappeared]
   [S → <VP <verb NP>>, Harry disappeared]
   [S → <VP <verb NP PP>>, Harry disappeared]
   [S → aux NP VP, Harry disappeared]
   [S → NP VP, Harry disappeared]
   Action: Continue
5. [S → <VP <verb NP>>, Harry disappeared]
   [S → <VP <verb NP PP>>, Harry disappeared]
   [S → aux NP VP, Harry disappeared]
   [S → NP VP, Harry disappeared]
   Action: Continue
6. [S → <VP <verb NP PP>>, Harry disappeared]
   [S → aux NP VP, Harry disappeared]
   [S → NP VP, Harry disappeared]
   Action: Continue
7. [S → aux NP VP, Harry disappeared]
   [S → NP VP, Harry disappeared]
   Action: Continue
8. [S → NP VP, Harry disappeared]
   Action: Apply-rules
9. [S → <NP <Pronoun>> VP, Harry disappeared]
   [S → <NP <Proper-noun>> VP, Harry disappeared]
   [S → <NP <Det Nominal ADJP>> VP, Harry disappeared]
   [S → <NP <Det Nominal>> VP, Harry disappeared]
   Action: Continue
10. [S → <NP <Proper-noun>> VP, Harry disappeared]
    [S → <NP <Det Nominal ADJP>> VP, Harry disappeared]
    [S → <NP <Det Nominal>> VP, Harry disappeared]
    Action: Attach-word
11. [S → <NP <Proper-noun Harry>> VP, disappeared]
    [S → <NP <Det Nominal ADJP>> VP, Harry disappeared]
    [S → <NP <Det Nominal>> VP, Harry disappeared]
    Action: Apply-rules
12. [S → <NP <Proper-noun Harry>> <VP <verb>>, disappeared]
    [S → <NP <Proper-noun Harry>> <VP <verb PP>>, disappeared]
    [S → <NP <Proper-noun Harry>> <VP <verb NP>>, disappeared]
    [S → <NP <Proper-noun Harry>> <VP <verb NP PP>>, disappeared]
    [S → <NP <Det Nominal ADJP>> VP, Harry disappeared]
    [S → <NP <Det Nominal>> VP, Harry disappeared]
    Action: Attach-word
13. [S → <NP <Proper-noun Harry>> <VP <verb disappeared>>, ]
    [S → <NP <Proper-noun Harry>> <VP <verb PP>>, disappeared]
    [S → <NP <Proper-noun Harry>> <VP <verb NP>>, disappeared]
    [S → <NP <Proper-noun Harry>> <VP <verb NP PP>>, disappeared]
    [S → <NP <Det Nominal ADJP>> VP, Harry disappeared]
    [S → <NP <Det Nominal>> VP, Harry disappeared]
    SUCCESS!!
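Example 1 illustrates the benefit of bottom-up filtering: steps 2 to 7 are spent on expansions of S that begin with a verb or an aux, although the first word “Harry” can be neither.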
Top down parsing with bottom up filtering
A table LC mapping each non-terminal into its possible “left corner” terminals
(POS) that can begin a constituent of that type.
Apply-rules filters the new search-states it creates to exclude rules where the
LC of the first symbol of the RHS is incompatible with the current input.
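A rough sketch of how the LC table and the filter could be computed, using the grammar from these notes. The function names left_corners and filter_rules are made up for illustration, and the sketch assumes no rule has an empty right-hand side.

```python
GRAMMAR = {
    "S": [["VP"], ["Aux", "NP", "VP"], ["NP", "VP"]],
    "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal", "ADJP"],
           ["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "PP"], ["Verb", "NP"], ["Verb", "NP", "PP"]],
    "PP": [["Preposition", "NP"]],
    "ADJP": [["ADJ"], ["VP"]],
}


def left_corners(grammar):
    """Map each non-terminal to the set of POS that can begin it."""
    lc = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # repeat until nothing new is added
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                first = rhs[0]
                new = lc[first] if first in grammar else {first}
                if not new <= lc[nt]:
                    lc[nt] |= new
                    changed = True
    return lc


def filter_rules(nt, next_pos, grammar, lc):
    """Rules for nt whose first RHS symbol is compatible with the next word.

    next_pos is the set of POS the next input word can have.
    """
    keep = []
    for rhs in grammar[nt]:
        first = rhs[0]
        starts = lc[first] if first in grammar else {first}
        if starts & next_pos:
            keep.append(rhs)
    return keep


LC = left_corners(GRAMMAR)
```

With this table, filter_rules("S", {"Proper-Noun"}, GRAMMAR, LC) keeps only the rule S → NP VP, so the wasted steps of Example 1 never reach the agenda.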
Example 2: Are many students studying at Northeastern (abbreviated NU below)
(assume bottom up filtering is being used)
1. [S, Are many students studying at NU]
   Action: Apply-rules S
2. [S → aux NP VP, Are many students studying at NU]
   Action: Attach-word aux
3. [S → <aux Are> NP VP, many students studying at NU]
   Action: Apply-rules NP
4* [S → <aux Are> <NP <Det Nominal ADJP>> VP, many students studying at NU]
   [S → <aux Are> <NP <Det Nominal>> VP, many students studying at NU]
   Action: Attach-word Det (ignore for the moment the fact that “many” can be a pronoun)
5. [S → <aux Are> <NP <<Det many> Nominal ADJP>> VP, students studying at NU]
   [S → <aux Are> <NP <Det Nominal>> VP, many students studying at NU]
   Action: Apply-rules Nominal
6. [S → <aux Are> <NP <<Det many> <Nominal <Noun>> ADJP>> VP, students studying at NU]
   *[S → <aux Are> <NP <<Det many> <Nominal <Noun Nominal>> ADJP>> VP, students studying at NU]
   [S → <aux Are> <NP <Det Nominal>> VP, many students studying at NU]
   Action: Attach-word Noun
7. [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> ADJP>> VP, studying at NU]
   *
   Action: Apply-rules ADJP
8. [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP>>> VP, studying at NU]
   *
   Action: Apply-rules VP
9. [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb>>>> VP, studying at NU]
   [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb PP>>>> VP, studying at NU]
   [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb NP>>>> VP, studying at NU]
   [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb NP PP>>>> VP, studying at NU]
   *
   Action: Attach-word verb
10. [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb studying>>>> VP, at NU]
    [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb PP>>>> VP, studying at NU]
    [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb NP>>>> VP, studying at NU]
    [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb NP PP>>>> VP, studying at NU]
    *
    Action: Apply-rules VP (none pass filter)
11. [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb PP>>>> VP, studying at NU]
    [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb NP>>>> VP, studying at NU]
    [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb NP PP>>>> VP, studying at NU]
    *
    Action: Attach-word verb
12. [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <<verb studying> PP>>>> VP, at NU]
    [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb NP>>>> VP, studying at NU]
    [S → <aux Are> <NP <<Det many> <Nominal <Noun students>> <ADJP <VP <verb NP PP>>>> VP, studying at NU]
    *
    Action: Apply-rules PP
“at NU” parses as a PP, thus satisfying the expansion of the subject NP as
NP → Det Nominal ADJP. However, there is still a VP in the tree, with no more
input. Therefore BACKTRACKING will occur, to Step 4, where the rule NP → Det
Nominal will be selected. All the work we did since then is discarded. Parsing of
“students” as a Nominal, and of “studying at NU” as a VP must be done all over
again (exactly the same way). Only the higher level NP and S structures are
different.
Remaining problems with top-down parsing:
Building and discarding the same structure many times over (when it is
parsed in the wrong attachment context)
Infinite loop if the grammar contains left-recursive rules
The Earley Algorithm (also known as Chart Parsing)
Instead of building a parse tree, create a chart: Each NODE represents a place
between words. Each EDGE represents a word or constituent.
N0 fish N1 head N2 up N3 the N4 river N5 in N6 spring N7
We add edges to the chart representing possible constituents. Every edge has a
LABEL, with a CAT component, whose value is a terminal or non-terminal symbol
from the grammar. The goal is to get one or more S edges going from N0 to N7.
This method naturally produces all possible parses (unfortunately, there are frequently
several hundred of them).
The shortest edges are “lexical edges” connecting adjacent nodes, whose labels
contain the information from the lexicon for a word. If a word has more than one
POS, there will be several lexical edges representing that word, one for each POS.
Example: see what edges can be built.
Edges can be complete (producers) or incomplete (consumers).
There is a queue of complete edges waiting to be added to the chart. Initially it
contains all the lexical edges with the leftmost word first. There is a queue of
incomplete edges waiting to be added to the chart. 0-length incomplete edges (loops)
are initially created from the grammar rules, and then longer incomplete edges are
generated as matching RHS constituents are consumed.
Whenever an edge is added to the chart, it triggers a procedure. When a new
incomplete edge is added, it represents a partially fulfilled rule. The parser must look
to the right for the next constituent from the RHS of the rule. When a new complete
edge is added, it represents a found constituent. The parser must look to the left to see
if anyone is waiting for this constituent. These two processes are instantiations of the
“fundamental rule” of chart parsing.
The control schemes for a chart parser can be top-down, bottom up, depth first or
breadth first with only slight changes in the algorithm. It is easy to parameterize this.
(Also: right to left can easily replace left to right, but we will not consider that.)
Just as in tree search, depth first always adds new edges to the front of the queue,
breadth first adds them to the end. We will look at top down and bottom up chart
parsing using the tutorial from the NLTK.
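Before turning to the NLTK tutorial, here is a very small top-down chart recognizer in plain Python, as a sketch of the ideas above rather than of the NLTK code. The toy grammar (which includes the left-recursive rule VP → VP PP), the lexicon for the sample sentence, the Edge tuple and the function names are all assumptions for illustration; a single agenda is used instead of two queues.

```python
from collections import namedtuple

# An edge runs from node `start` to node `end`; `dot` counts the RHS
# symbols found so far. The edge is complete when dot == len(rhs).
Edge = namedtuple("Edge", "start end lhs rhs dot")

# Hypothetical toy grammar and lexicon, not the grammar used earlier.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Noun"], ["Det", "Noun"]],
    "VP": [["Verb"], ["VP", "PP"]],
    "PP": [["Preposition", "NP"]],
}
LEXICON = {
    "fish": {"Noun", "Verb"}, "head": {"Noun", "Verb"},
    "up": {"Preposition"}, "the": {"Det"}, "river": {"Noun"},
    "in": {"Preposition"}, "spring": {"Noun", "Verb"},
}


def is_complete(edge):
    return edge.dot == len(edge.rhs)


def chart_parse(words, grammar=GRAMMAR, lexicon=LEXICON, goal="S"):
    chart, agenda = set(), []
    for i, word in enumerate(words):         # lexical edges, one per POS
        for pos in sorted(lexicon.get(word.lower(), ())):
            agenda.append(Edge(i, i + 1, pos, (word,), 1))
    for rhs in grammar[goal]:                # 0-length incomplete edges at N0
        agenda.append(Edge(0, 0, goal, tuple(rhs), 0))
    while agenda:
        edge = agenda.pop()                  # stack order: depth first
        if edge in chart:
            continue
        chart.add(edge)
        if is_complete(edge):                # look left for edges waiting for it
            for e in list(chart):
                if (not is_complete(e) and e.end == edge.start
                        and e.rhs[e.dot] == edge.lhs):
                    agenda.append(Edge(e.start, edge.end, e.lhs, e.rhs, e.dot + 1))
        else:
            need = edge.rhs[edge.dot]
            for e in list(chart):            # look right for a found constituent
                if is_complete(e) and e.start == edge.end and e.lhs == need:
                    agenda.append(Edge(edge.start, e.end, edge.lhs, edge.rhs,
                                       edge.dot + 1))
            for rhs in grammar.get(need, []):    # top-down prediction
                agenda.append(Edge(edge.end, edge.end, need, tuple(rhs), 0))
    return chart


def recognized(words, goal="S"):
    """True if a complete goal edge spans the whole input."""
    return any(is_complete(e) and e.lhs == goal and e.start == 0
               and e.end == len(words) for e in chart_parse(words, goal=goal))
```

Because an edge is never added to the chart twice, the left-recursive rule causes no infinite loop, and each constituent is built only once no matter how many larger edges consume it.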
Natural Language Semantics
What could it mean to say a computer “understands” NL?
FOPC as a representation language:
truth-functional semantics with sound and complete inference algorithm
canonical form relative to synonyms and simple syntactic variations
ambiguity is exposed
Limitations of expressiveness:
vagueness is ignored
statements about mental states create major problems
Relation to syntax of simple declaratives:
• main verb defines a “predicate” with some expected arguments
• related NPs provide values for the arguments
• attempt to define a set of standard semantic roles (thematic roles, case roles) such as agent, patient, instrument, direction
• the verb’s subcategorization frame(s) map syntactic roles such as subject, direct object, with-object into semantic case roles
• selectional restrictions define what types of objects can fill the semantic roles, and therefore define a “semantic grammar” based on a world-knowledge taxonomy
John broke the window.
The window broke.
John broke the window with a hammer.
The window was broken by John (passive transformation)
Selectional restrictions:
A hammer broke the window. (?)
Set-theoretic semantics of FOPC
Use of FOPC for modeling NL meanings:
1. Individuals and categories
2. Events
3. Ambiguity
4. Propositional attitudes (mental states)
1. Individuals and categories
Old approach (from philosophy):
constants & variables refer to individuals; categories and properties represented
as predicates
All men are mortal; Socrates is a man; therefore Socrates is mortal (modus ponens)
(Notation: AX = for all X, EX = there exists X, ^ = and, || = or, → = implies.)
AX [man(X) → mortal(X)]
man(socrates)
mortal(socrates)
student(jun), wizard(harry), green(ball1) . . .
Lack of expressive power: given ball1 and ball2, we can’t express that they are the
same color unless we enumerate every color:
(green(ball1) ^ green(ball2)) || (red(ball1) ^ red(ball2)) || ...
Answer: reification – making a concept into an object about which things can be
asserted (in FOPC, a constant instead of a predicate).
Now let all categories be constants. Individuals by convention are represented with
numeric suffixes. (not part of logical formalism but for understandability).
isa(socrates1, man)
isa(sappho2, woman)
ako(man, human)
ako(woman, human)
AX [isa(X, human) → mortal(X)]
“Inheritance rule”: AXYZ [(ako(X, Y) ^ isa(Z, X)) → isa(Z, Y)]
Property values also become objects; can express many more things.
color(ball1, green)
color(ball2, green)
AX [color(ball1, X) → color(ball2, X)]
AXY [color(X, Y) → =(Y, green) || =(Y, red) || ...]
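Because categories and property values are now ordinary constants, the facts above can be stored as plain data and the inheritance rule applied mechanically. A minimal sketch, assuming a representation as Python sets of pairs (the function names are invented for illustration):

```python
ISA = {("socrates1", "man"), ("sappho2", "woman")}
AKO = {("man", "human"), ("woman", "human")}
COLOR = {"ball1": "green", "ball2": "green"}


def close_isa(isa, ako):
    """Apply the inheritance rule until no new isa facts appear."""
    facts = set(isa)
    changed = True
    while changed:
        changed = False
        for (z, x) in list(facts):
            for (x2, y) in ako:
                if x == x2 and (z, y) not in facts:
                    facts.add((z, y))     # isa(Z, X) ^ ako(X, Y) gives isa(Z, Y)
                    changed = True
    return facts


def mortal(individual, facts):
    """True when mortal(individual) follows from the isa-human rule.

    False only means the rule does not apply, not that the individual
    is known to be immortal.
    """
    return (individual, "human") in facts


def same_color(a, b):
    """The colors are objects, so they can simply be compared."""
    return COLOR[a] == COLOR[b]


FACTS = close_isa(ISA, AKO)
```

Here mortal("socrates1", FACTS) holds without any fact that mentions Socrates and human together, and same_color("ball1", "ball2") needs no enumeration of colors.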
2. Events
A similar evolution occurred:
Old way: John ate a pizza: EX[isa(X, pizza) ^ eat(John1, X)]
can’t say he ate it quickly, or that he ate it after noon, etc. Need an object to represent
the event of his eating. Reification.
New way:
EXW [isa(W, eating-event) ^ eater(W, John1) ^ eaten(W, X) ^ isa(X, pizza)]
This representation derives from the use of “frames” or “schemas” in AI, which is a
precursor of object oriented programming. eater, eaten, etc are the arguments of the
eat predicate, but the predicate has been reified.
Benefits: after(W, 12:00-noon) or on(W, Wednesday)
Maybe we don’t know what John ate: John got sick from eating.
EXWV [isa(W, eating-event) ^ eater(W, John1) ^ isa(V, getsick-event) ^
patient(V, John1) ^ cause(W, V)]
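A reified event is just another object, so new information about it is added as further facts, and missing roles are simply absent. A small sketch, assuming facts are stored as (predicate, argument, argument) triples; the constants w1, v1 and x1 are invented names standing in for the existentially quantified variables.

```python
FACTS = {
    # John ate a pizza, after noon, on Wednesday.
    ("isa", "w1", "eating-event"),
    ("eater", "w1", "John1"),
    ("eaten", "w1", "x1"),
    ("isa", "x1", "pizza"),
    ("after", "w1", "12:00-noon"),
    ("on", "w1", "Wednesday"),
    # John got sick from eating (another eating event, food unknown).
    ("isa", "w2", "eating-event"),
    ("eater", "w2", "John1"),
    ("isa", "v1", "getsick-event"),
    ("patient", "v1", "John1"),
    ("cause", "w2", "v1"),
}


def events_of(kind):
    """All objects asserted to be events of the given kind."""
    return sorted(e for (pred, e, k) in FACTS if pred == "isa" and k == kind)


def role(event, name):
    """Fillers of a role of an event; empty if the role is unknown."""
    return sorted(v for (pred, e, v) in FACTS if pred == name and e == event)
```

The second eating event has no eaten role at all, which the old eat(John1, X) predicate could not express without naming something eaten.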
3. Ambiguity
Famous example: Every man loves a woman - has two interpretations
AX [isa(X, man) → EY [isa(Y, woman) ^ loves(X, Y)]]
(for every man, there is some woman whom he loves)
EY [isa(Y, woman) ^ AX [isa(X, man) → loves(X, Y)]]
(there is a woman whom every man loves)
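The two readings really are different statements: they can disagree on the same set of facts. A small check over a made-up model (the individuals m1, m2, w1, w2 and the loves relation are invented for illustration):

```python
MEN = {"m1", "m2"}
WOMEN = {"w1", "w2"}
LOVES = {("m1", "w1"), ("m2", "w2")}      # each man loves a different woman


def every_man_some_woman(men, women, loves):
    # AX [isa(X, man) -> EY [isa(Y, woman) ^ loves(X, Y)]]
    return all(any((x, y) in loves for y in women) for x in men)


def some_woman_every_man(men, women, loves):
    # EY [isa(Y, woman) ^ AX [isa(X, man) -> loves(X, Y)]]
    return any(all((x, y) in loves for x in men) for y in women)
```

In this model the first reading is true and the second is false. The second reading implies the first whenever at least one man... more precisely, any model that makes the second true also makes the first true, but not the other way around.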
4. Propositional attitudes
believe/know, want, fear, doubt, hope (BDI semantics: belief/desire/intention)
Mary believes that Sam arrived.
There is no way to express this in the event notation without also asserting that Sam arrived:
EVW[isa(V, belief-event) ^ believer(V, Mary1) ^ believed (V, W)
^ isa(W, arrival-event) ^ arriver(W, Sam2)]
We want to re-ify the arrival event:
EV [isa(V, belief-event) ^ believer(V, Mary1) ^ believed(V, EW [isa(W, arrival-event) ^ arriver(W, Sam2)])]
Problem: Re-ification won’t work -- we can’t make a complete proposition into an
object and stay within FOPC.
One answer: modal logic. Believe becomes an operator.
Problem: We lose the clear truth-functional semantics and sound & complete
inference algorithms.
Specifically: Interaction of modal operators with quantifiers, negation and inference
rules such as substitution of equals is unclear.
Another famous example:
John believes the evening star is visible.
The evening star is the planet Venus
?????????????????????????????????????
John believes the planet Venus is visible
Referentially opaque meaning is true; Referentially transparent meaning is false.
Another example:
Oedipus wanted to marry Jocasta
Jocasta was his mother
Oedipus wanted to marry his mother
Why this only works in restricted domains and environments.