Meeting 22
Natural Language Processing
Syntactic Processing

Course: T0264/Intelijensia Semu (Artificial Intelligence)
Year: July 2006
Version: 2/2
Learning Outcomes
By the end of this meeting, students are expected to be able to:
• << TIK-99 >>
• << TIK-99 >>
Outline of Material
• Material 1
• Material 2
• Material 3
• Material 4
• Material 5
15.2. Syntactic Processing
• Syntactic processing is the stage that converts a sentence into a hierarchical structure corresponding to the units of meaning in the sentence.
• This process is called parsing.
• Syntactic processing is important for two reasons:
1. Semantic processing must operate on sentence constituents.
2. It is not always possible to work out the meaning of a sentence without using its grammar.
Roles
• Constrain the number of constituents that semantics can consider. Since syntactic processing is cheaper than semantic processing, this is cost-effective.
• Force syntactically required
interpretations, for example in
distinguishing the meanings of:
– The satellite orbited Mars.
– Mars orbited the satellite.
Two Main Components
• Grammar
A declarative representation, called a grammar, of the syntactic facts about the language.
• Parser
A procedure, called a parser, that compares
the grammar against input sentences to
produce parsed structures.
15.2.1. Grammar and Parser
A Simple Grammar for a Fragment of English
S → NP VP
NP → the NP1
NP → PRO
NP → PN
NP → NP1
NP1 → ADJS N
ADJS → ε | ADJ ADJS
VP → V
VP → V NP
N → file | printer
PN → Bill
PRO → I
ADJ → short | long | fast
V → printed | created | want
A Parse Tree for a Sentence
“Bill printed the file”
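A parse tree like this one can be produced by a small top-down parser over the grammar above. This is a minimal sketch, assuming the grammar is stored as a Python dict (the names `GRAMMAR`, `expand`, `parse`, and `parse_all`, and the nested-tuple tree format, are hypothetical) and that ADJS may be empty (ADJS → ε). It starts from S and expands rules left to right; because each expansion is a generator, it follows one path at a time and, when a path fails, resumes at the most recent choice point. Collecting every complete result instead of the first one gives the all-paths behaviour.

```python
# Hypothetical encoding of the slide grammar: nonterminal -> list of right-hand sides.
# Symbols that are not keys (e.g. "the", "file") are terminals, i.e. words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "NP1"], ["PRO"], ["PN"], ["NP1"]],
    "NP1": [["ADJS", "N"]],
    "ADJS": [[], ["ADJ", "ADJS"]],  # assumed: ADJS -> epsilon | ADJ ADJS
    "VP": [["V"], ["V", "NP"]],
    "N": [["file"], ["printer"]],
    "PN": [["Bill"]],
    "PRO": [["I"]],
    "ADJ": [["short"], ["long"], ["fast"]],
    "V": [["printed"], ["created"], ["want"]],
}


def expand(symbol, words, i):
    """Yield (tree, next_index) for each way `symbol` derives words[i:next_index]."""
    if symbol not in GRAMMAR:  # terminal: must match the next input word
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:  # each alternative is a choice point
        for children, j in expand_seq(rhs, words, i):
            yield (symbol, *children), j


def expand_seq(symbols, words, i):
    """Yield (list_of_trees, next_index) for a sequence of symbols, left to right."""
    if not symbols:
        yield [], i
        return
    for first, j in expand(symbols[0], words, i):
        for rest, k in expand_seq(symbols[1:], words, j):
            yield [first, *rest], k


def parse(sentence):
    """Return the first parse that covers the whole sentence, or None."""
    words = sentence.split()
    return next((t for t, j in expand("S", words, 0) if j == len(words)), None)


def parse_all(sentence):
    """Return every parse that covers the whole sentence."""
    words = sentence.split()
    return [t for t, j in expand("S", words, 0) if j == len(words)]


print(parse("Bill printed the file"))
# ('S', ('NP', ('PN', 'Bill')), ('VP', ('V', 'printed'), ('NP', 'the', ('NP1', ('ADJS',), ('N', 'file')))))
```

The first complete S found uses VP → V and covers only "Bill printed", so it is rejected and the generator backtracks into VP → V NP, which covers the whole sentence.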
Ambiguity
Examples :
“Have the students who missed the exam take it today.”
“Have the students who missed the exam taken it today?”
“The horse raced past the barn fell down.”
Parsing Strategies
• Top-Down
• Bottom-Up
• All Paths
• Best Path with Backtracking
• Best Path with Patchup
• Wait and See
Parsing Strategies
• Top-Down Parsing – Begin with the start symbol and apply the grammar rules forward until the symbols at the terminals of the tree correspond to the components of the sentence being parsed.
• Bottom-Up Parsing – Begin with the sentence to be parsed and apply the grammar rules backward until a single tree has been produced whose terminals are the words of the sentence and whose top node is the start symbol.
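The bottom-up strategy can be sketched as a shift-reduce search: words are shifted onto a stack, and whenever the top of the stack matches the right-hand side of a rule it may be replaced by the rule's left-hand side. This is a minimal sketch with backtracking, not the slides' own procedure. It assumes an ε-free variant of the fragment grammar (NP1 → N | ADJS N, ADJS → ADJ | ADJ ADJS), since an empty rule could be applied backward forever; the names `RULES`, `label`, and `parse_bottom_up` are hypothetical.

```python
# Hypothetical epsilon-free variant of the fragment grammar, as (lhs, rhs) pairs.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["the", "NP1"]), ("NP", ["PRO"]), ("NP", ["PN"]), ("NP", ["NP1"]),
    ("NP1", ["N"]), ("NP1", ["ADJS", "N"]),
    ("ADJS", ["ADJ"]), ("ADJS", ["ADJ", "ADJS"]),
    ("VP", ["V"]), ("VP", ["V", "NP"]),
    ("N", ["file"]), ("N", ["printer"]),
    ("PN", ["Bill"]), ("PRO", ["I"]),
    ("ADJ", ["short"]), ("ADJ", ["long"]), ("ADJ", ["fast"]),
    ("V", ["printed"]), ("V", ["created"]), ("V", ["want"]),
]


def label(node):
    """A word is its own label; a subtree (tuple) is labelled by its root."""
    return node[0] if isinstance(node, tuple) else node


def parse_bottom_up(sentence):
    words = sentence.split()

    def search(stack, i):
        # Success: every word consumed and the stack reduced to a single S.
        if i == len(words) and len(stack) == 1 and label(stack[0]) == "S":
            return stack[0]
        # Apply a rule backward: replace a matching stack suffix by its lhs.
        for lhs, rhs in RULES:
            k = len(rhs)
            if k <= len(stack) and [label(n) for n in stack[-k:]] == rhs:
                tree = search(stack[:-k] + [(lhs, *stack[-k:])], i)
                if tree is not None:
                    return tree
        # No reduction led to a parse: shift the next word instead.
        if i < len(words):
            return search(stack + [words[i]], i + 1)
        return None  # dead end: the caller backtracks

    return search([], 0)


print(parse_bottom_up("Bill printed the file"))
```

No rule in this variant can be applied backward in a cycle, so the search always terminates; it is exhaustive, so it returns None only when no parse exists.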
Parsing Strategies
• All Paths – Follow all possible paths and build all the possible intermediate components.
• Best Path with Backtracking – Follow only one
path at a time, but record, at every choice point,
the information that is necessary to make another
choice if the chosen path fails to lead to a
complete interpretation of the sentence.
Parsing Strategies
• Best Path with Patchup – Follow only one
path at a time, but when an error is
detected, explicitly shuffle around the
components that have already been formed.
• Wait and See – Follow only one path, but rather than making decisions about the function of each component as it is encountered, postpone the decision until enough information is available to make it correctly.
15.2.2. Augmented Transition Network
• An augmented transition network (ATN) is a top-down parsing procedure that allows various kinds of knowledge to be incorporated into the parsing system so that it can operate efficiently.
• ATN in graphical notation, applied to the sentence “The long file has printed”.
• This execution proceeds as follows:
1. Begin in state S.
2. Push to NP.
3. Do a category test to see if “the” is a determiner.
4. This test succeeds, so set the DETERMINER register to DEFINITE and go to state Q6.
Augmented Transition Network
5. Do a category test to see if “long” is an adjective.
6. This test succeeds, so append “long” to the list contained in
the ADJS register. (This list was previously empty). Stay in
state Q6.
7. Do a category test to see if “file” is an adjective. This test
fails.
8. Do a category test to see if “file” is a noun. This test
succeeds, so set the NOUN register to “file” and go to state
Q7.
9. Push to PP.
10. Do a category test to see if “has” is a preposition. This test
fails, so pop and signal failure.
Augmented Transition Network
11. There is nothing else that can be done from state Q7, so
pop and return the structure ( NP ( FILE ( LONG )
DEFINITE ))
The return causes the machine to be in state Q1, with the
SUBJ register set to the structure just returned and the type
register set to DCL.
12. Do a category test to see if “has” is a verb. This test
succeeds, so set the AUX register to NIL and set the V
register to “has”. Go to state Q4.
13. Push to NP. Since the next word, “printed”, is not a determiner or a proper noun, NP will pop and return failure.
14. The only other thing to do in state Q4 is to halt. But more
input remains, so a complete parse has not been found.
Backtracking is now required.
Augmented Transition Network
15. The last choice point was at state Q1, so return there. The AUX and V registers must be unset.
16. Do a category test to see if “has” is an auxiliary. This test
succeeds, so set the AUX register to “has” and go to state
Q3.
17. Do a category test to see if “printed” is a verb. This test
succeeds, so set the V register to “printed”. Go to state Q4.
18. Now, since the input is exhausted, Q4 is an acceptable final state. Pop and return the structure
(S DCL (NP (FILE (LONG) DEFINITE)) HAS (VP PRINTED))
This structure is the output of the parse.
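The trace above can be replayed with a small program. This is a minimal sketch, assuming a simplified network that keeps only the arcs the trace uses (S to Q1, Q1 to Q4 on a verb or to Q3 on an auxiliary, the NP states Q6 and Q7, and a PP subnetwork). The lexicon, the function names such as `np_net` and `atn_parse`, and the use of Python dicts as register sets are hypothetical. Each state is a generator, so when Q4 halts with input left over (step 14), Python resumes the most recent choice point at Q1, which is the backtracking of steps 15 and 16.

```python
# Hypothetical lexicon: word -> categories ("has" is both a verb and an auxiliary).
LEXICON = {
    "the": {"DET"}, "a": {"DET"}, "long": {"ADJ"}, "short": {"ADJ"},
    "file": {"N"}, "printer": {"N"}, "bill": {"NPR"}, "on": {"PREP"},
    "has": {"V", "AUX"}, "printed": {"V"},
}


def cat(words, i, category):
    """Category test on word i (fails at end of input)."""
    return i < len(words) and category in LEXICON.get(words[i], set())


def sexpr(x):
    """Render nested lists as an S-expression like the slide's output."""
    if isinstance(x, list):
        return "(" + " ".join(sexpr(e) for e in x) + ")"
    return str(x)


# NP network: DET -> Q6, ADJ loops on Q6, N -> Q7, PUSH PP loops on Q7, then POP.
def np_net(words, i):
    if cat(words, i, "DET"):
        det = "DEFINITE" if words[i] == "the" else "INDEFINITE"
        yield from q6(words, i + 1, {"DETERMINER": det, "ADJS": [], "MODS": []})
    if cat(words, i, "NPR"):  # proper noun: pop immediately
        yield ["NP", [words[i].upper()]], i + 1


def q6(words, i, regs):
    if cat(words, i, "ADJ"):  # append to the ADJS register, stay in Q6
        yield from q6(words, i + 1, {**regs, "ADJS": regs["ADJS"] + [words[i].upper()]})
    if cat(words, i, "N"):
        yield from q7(words, i + 1, {**regs, "NOUN": words[i].upper()})


def q7(words, i, regs):
    for pp, j in pp_net(words, i):  # PUSH PP
        yield from q7(words, j, {**regs, "MODS": regs["MODS"] + [pp]})
    # POP: build the NP structure from the registers.
    yield ["NP", [regs["NOUN"], regs["ADJS"], regs["DETERMINER"], *regs["MODS"]]], i


def pp_net(words, i):
    if cat(words, i, "PREP"):
        for np, j in np_net(words, i + 1):
            yield ["PP", words[i].upper(), np], j


# S network: PUSH NP -> Q1; Q1 --V--> Q4 or Q1 --AUX--> Q3 --V--> Q4.
def s_net(words, i):
    for subj, j in np_net(words, i):
        regs = {"SUBJ": subj, "TYPE": "DCL"}
        if cat(words, j, "V"):  # first choice at Q1: main verb, AUX = NIL
            yield from q4(words, j + 1, {**regs, "AUX": "NIL", "V": words[j].upper()})
        if cat(words, j, "AUX") and cat(words, j + 1, "V"):  # via Q3, tried on backtracking
            yield from q4(words, j + 2,
                          {**regs, "AUX": words[j].upper(), "V": words[j + 1].upper()})


def q4(words, i, regs):
    vp = ["VP", regs["V"]]
    for obj, j in np_net(words, i):  # PUSH NP for a direct object
        yield ["S", regs["TYPE"], regs["SUBJ"], regs["AUX"], vp + [obj]], j
    yield ["S", regs["TYPE"], regs["SUBJ"], regs["AUX"], vp], i  # halt in Q4


def atn_parse(sentence):
    words = sentence.lower().split()
    # Accept only if Q4 halts with the input exhausted; otherwise keep backtracking.
    for tree, j in s_net(words, 0):
        if j == len(words):
            return sexpr(tree)
    return None


print(atn_parse("The long file has printed"))
# (S DCL (NP (FILE (LONG) DEFINITE)) HAS (VP PRINTED))
```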
An ATN Network for a Fragment of English
15.2.3. Unification Grammars
• Purely declarative representations
• Unification simultaneously performs two operations:
– Matching
– Structure building, by combining constituents
• Think of graphs as sets, not lists, i.e., order doesn't matter.
Unification Grammars contd’
• Lexical items as graphs:
[CAT: DET
 LEX: the]
[CAT: N
 LEX: file
 NUMBER: SING]
• Nonterminal constituents as graphs:
[NP: [DET: the
      HEAD: file
      NUMBER: SING]]
Unification Grammars contd’
• Grammar rules (e.g., NP → DET N) as graphs:
[CONSTITUENT1: [CAT: DET
                LEX: {1}]
 CONSTITUENT2: [CAT: N
                LEX: {2}
                NUMBER: {3}]
 BUILD: [NP: [DET: {1}
              HEAD: {2}
              NUMBER: {3}]]]
Algorithm: Unification Grammars
Graph-Unify(G1, G2):
1. If either G1 or G2 is an attribute that is not itself an attribute-value pair, then:
a. If the attributes conflict (their values are incompatible), then fail.
b. If either is a variable, then bind it to the value of the other and return that value.
c. Otherwise, return the most general value that is consistent with both of the original values. Specifically, if disjunction is allowed, then return the intersection of the values.
2. Otherwise, do:
a. Set variable NEW to empty.
b. For each attribute A that is present (at the top level) in either G1 or G2, do:
Algorithm: Unification Grammars
(i) If A is not present at the top level in the other input, then add A and its value to NEW.
(ii) If it is, then call Graph-Unify with the two values for A. If that fails, then fail. Otherwise, take the new value of A to be the result of that unification and add A with its value to NEW.
c. If there are any labels attached to G1 or G2, then bind them to NEW and return NEW.
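The Graph-Unify steps can be sketched in Python under a few assumptions: a graph is a dict, an atomic value is a string, a disjunction of atomic values is a frozenset, and a label such as {1} is a `Var` object whose value is kept in a binding environment, so binding a label (step 2c) becomes binding that variable. The names `Var`, `UnifyFail`, `graph_unify`, and `resolve` are hypothetical, and failure is signalled by raising an exception. The usage lines apply the NP → DET N rule graph to the lexical graphs for "the" and "file".

```python
class Var:
    """A label such as {1}; its value is stored in the binding environment."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "{" + self.name + "}"


class UnifyFail(Exception):
    """Raised when two values conflict."""


def deref(x, env):
    # Follow variable bindings until reaching a value or an unbound variable.
    while isinstance(x, Var) and x in env:
        x = env[x]
    return x


def unify_atoms(a, b):
    # Steps 1a and 1c: atomic values; a frozenset stands for a disjunction.
    if isinstance(a, dict) or isinstance(b, dict):
        raise UnifyFail(f"{a!r} conflicts with {b!r}")
    sa = a if isinstance(a, frozenset) else frozenset([a])
    sb = b if isinstance(b, frozenset) else frozenset([b])
    common = sa & sb  # the most general value consistent with both
    if not common:
        raise UnifyFail(f"{a!r} conflicts with {b!r}")
    return next(iter(common)) if len(common) == 1 else common


def graph_unify(g1, g2, env):
    """Return (unified graph, extended env); raise UnifyFail on conflict."""
    g1, g2 = deref(g1, env), deref(g2, env)
    if isinstance(g1, Var):  # step 1b: bind the variable to the other value
        return g2, (env if g1 is g2 else {**env, g1: g2})
    if isinstance(g2, Var):
        return g1, {**env, g2: g1}
    if not (isinstance(g1, dict) and isinstance(g2, dict)):
        return unify_atoms(g1, g2), env
    new = {}  # step 2a
    for attr in list(g1) + [a for a in g2 if a not in g1]:  # step 2b
        if attr not in g2:
            new[attr] = g1[attr]  # (i) present only in G1
        elif attr not in g1:
            new[attr] = g2[attr]  # (i) present only in G2
        else:
            new[attr], env = graph_unify(g1[attr], g2[attr], env)  # (ii)
    return new, env


def resolve(x, env):
    """Replace bound variables by their values, recursively."""
    x = deref(x, env)
    if isinstance(x, dict):
        return {k: resolve(v, env) for k, v in x.items()}
    return x


# The NP -> DET N rule graph, with labels {1}, {2}, {3} as shared variables.
V1, V2, V3 = Var("1"), Var("2"), Var("3")
rule = {
    "CONSTITUENT1": {"CAT": "DET", "LEX": V1},
    "CONSTITUENT2": {"CAT": "N", "LEX": V2, "NUMBER": V3},
    "BUILD": {"NP": {"DET": V1, "HEAD": V2, "NUMBER": V3}},
}
# The lexical graphs for "the" and "file" supplied as the two constituents.
words = {
    "CONSTITUENT1": {"CAT": "DET", "LEX": "the"},
    "CONSTITUENT2": {"CAT": "N", "LEX": "file", "NUMBER": "SING"},
}
result, env = graph_unify(rule, words, {})
print(resolve(result["BUILD"], env))
# {'NP': {'DET': 'the', 'HEAD': 'file', 'NUMBER': 'SING'}}
```

Matching and structure building happen in the same pass: unifying the constituents binds {1}, {2}, and {3}, and those bindings fill in the BUILD graph.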
<< Closing >>
End of Meeting 22
Good Luck