Course: T0264/Intelijensia Semu (Artificial Intelligence)
Year: July 2006
Version: 2/2

Session 22
Natural Language Processing: Syntactic Processing

15.2. Syntactic Processing
• Syntactic processing is the step that converts a flat input sentence into a hierarchical structure corresponding to the units of meaning in the sentence.
• This process is called parsing.
• It matters for two reasons:
1. Semantic processing must operate on sentence constituents.
2. The meaning of a sentence can often, but not always, be extracted without using grammatical facts.

Roles
• Constrain the number of constituents that semantics must consider. Since syntactic processing is cheaper than semantic processing, this is cost-effective.
• Force syntactically required interpretations, for example in distinguishing the meanings of:
– The satellite orbited Mars.
– Mars orbited the satellite.

Two Main Components
• Grammar: a declarative representation of the syntactic facts about the language.
• Parser: a procedure that compares the grammar against input sentences to produce parsed structures.

15.2.1. Grammar and Parser
A Simple Grammar for a Fragment of English
S → NP VP
NP → the NP1
NP → PRO
NP → PN
NP → NP1
NP1 → ADJS N
ADJS → ε | ADJ ADJS
VP → V
VP → V NP
N → file | printer
PN → Bill
PRO → I
ADJ → short | long | fast
V → printed | created | want

[Figure: A Parse Tree for the Sentence “Bill printed the file”]

Ambiguity
Examples:
• “Have the students who missed the exam take it today.”
• “Have the students who missed the exam taken it today?”
• “The horse raced past the barn fell down.”

Parsing Strategies
• Top-Down Parsing
– Begin with the start symbol and apply the grammar rules forward until the symbols at the terminals of the tree correspond to the components of the sentence being parsed.
• Bottom-Up Parsing
– Begin with the sentence to be parsed and apply the grammar rules backward until a single tree has been produced whose terminals are the words of the sentence and whose top node is the start symbol.
• All Paths
– Follow all possible paths and build all the possible intermediate components.
• Best Path with Backtracking
– Follow only one path at a time, but record, at every choice point, the information needed to make another choice if the chosen path fails to lead to a complete interpretation of the sentence.
• Best Path with Patchup
– Follow only one path at a time, but when an error is detected, explicitly shuffle around the components that have already been formed.
• Wait and See
– Follow only one path, but rather than deciding the function of each component as it is encountered, postpone the decision until enough information is available to make it correctly.

15.2.2. Augmented Transition Network
• An Augmented Transition Network (ATN) is a top-down parsing procedure that allows various kinds of knowledge to be incorporated into the parsing system so that it can operate efficiently.
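As a point of comparison for the top-down strategy that an ATN follows, the simple grammar of Section 15.2.1 can be run through a plain top-down parser. The sketch below is illustrative only and is not taken from the course material. The names GRAMMAR, expand, expand_seq and parse are hypothetical. It also assumes an empty alternative for ADJS (needed so that “the file” can parse) and case-insensitive word matching. Alternatives are tried depth-first, so failed choices are abandoned as in backtracking, and collecting every result corresponds to the all-paths strategy.

```python
from typing import Iterator, List, Tuple

# Grammar from "A Simple Grammar for a Fragment of English".
# Each nonterminal maps to its alternative right-hand sides.
# An empty right-hand side [] stands for the empty string (assumed for ADJS).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "NP1"], ["PRO"], ["PN"], ["NP1"]],
    "NP1": [["ADJS", "N"]],
    "ADJS": [[], ["ADJ", "ADJS"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["file"], ["printer"]],
    "PN": [["Bill"]],
    "PRO": [["I"]],
    "ADJ": [["short"], ["long"], ["fast"]],
    "V": [["printed"], ["created"], ["want"]],
}


def expand(symbol: str, words: List[str], pos: int) -> Iterator[Tuple[object, int]]:
    """Yield (tree, next_pos) for every way `symbol` can cover words[pos:next_pos]."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must match the next input word.
        if pos < len(words) and words[pos].lower() == symbol.lower():
            yield words[pos], pos + 1
        return
    # Nonterminal: apply each grammar rule forward (top-down).
    for rhs in GRAMMAR[symbol]:
        for children, end in expand_seq(rhs, words, pos):
            yield (symbol, *children), end


def expand_seq(symbols: List[str], words: List[str], pos: int) -> Iterator[Tuple[list, int]]:
    """Yield (subtrees, next_pos) for every way the symbol sequence can match."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in expand(symbols[0], words, pos):
        for rest, end in expand_seq(symbols[1:], words, mid):
            yield [tree] + rest, end


def parse(sentence: str) -> list:
    """Return all parse trees rooted at S that consume the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


trees = parse("Bill printed the file")
print(trees[0])
```

For “Bill printed the file” this yields a single tree whose subject is the proper noun Bill and whose verb phrase contains the noun phrase “the file”. The grammar has no left-recursive rule, so the recursion terminates. A left-recursive grammar would need a different technique.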
• Example: an ATN in graphical notation, applied to the sentence “The long file has printed”.
• The execution proceeds as follows:
1. Begin in state S.
2. Push to NP.
3. Do a category test to see if “the” is a determiner.
4. This test succeeds, so set the DETERMINER register to DEFINITE and go to state Q6.
5. Do a category test to see if “long” is an adjective.
6. This test succeeds, so append “long” to the list contained in the ADJS register. (This list was previously empty.) Stay in state Q6.
7. Do a category test to see if “file” is an adjective. This test fails.
8. Do a category test to see if “file” is a noun. This test succeeds, so set the NOUN register to “file” and go to state Q7.
9. Push to PP.
10. Do a category test to see if “has” is a preposition. This test fails, so pop and signal failure.
11. There is nothing else that can be done from state Q7, so pop and return the structure
(NP (FILE (LONG) DEFINITE))
The return causes the machine to be in state Q1, with the SUBJ register set to the structure just returned and the TYPE register set to DCL.
12. Do a category test to see if “has” is a verb. This test succeeds, so set the AUX register to NIL and set the V register to “has”. Go to state Q4.
13. Push to state NP. Since the next word, “printed”, is not a determiner or a proper noun, NP will pop and return failure.
14. The only other thing to do in state Q4 is to halt. But more input remains, so a complete parse has not been found. Backtracking is now required.
15. The last choice point was at state Q1, so return there. The AUX and V registers must be unset.
16. Do a category test to see if “has” is an auxiliary. This test succeeds, so set the AUX register to “has” and go to state Q3.
17. Do a category test to see if “printed” is a verb. This test succeeds, so set the V register to “printed”. Go to state Q4.
18. Now, since the input is exhausted, Q4 is an acceptable final state. Pop and return the structure
(S DCL (NP (FILE (LONG) DEFINITE)) HAS (VP PRINTED))
This structure is the output of the parse.

[Figure: An ATN Network for a Fragment of English]

15.2.3. Unification Grammars
• Purely declarative representations.
• Unification simultaneously performs two operations:
– Matching
– Structure building, by combining constituents
• Think of graphs as sets, not lists, i.e., order doesn’t matter.
• Lexical items as graphs:
[CAT: DET
 LEX: the]
[CAT: N
 LEX: file
 NUMBER: SING]
• Nonterminal constituents as graphs:
[NP: [DET: the
      HEAD: file
      NUMBER: SING]]
• Grammar rules (e.g., NP → DET N) as graphs:
[CONSTITUENT1: [CAT: DET
                LEX: {1}]
 CONSTITUENT2: [CAT: N
                LEX: {2}
                NUMBER: {3}]
 BUILD: [NP: [DET: {1}
              HEAD: {2}
              NUMBER: {3}]]]

Algorithm: Unification Grammars (Graph-Unify)
1. If either G1 or G2 is an attribute that is not itself an attribute-value pair, then:
a. If the attributes conflict, then fail.
b. If either is a variable, then bind it to the value of the other and return that value.
c. Otherwise, return the most general value that is consistent with both of the original values. Specifically, if disjunction is allowed, then return the intersection of the values.
2. Otherwise, do:
a. Set variable NEW to empty.
b. For each attribute A that is present (at the top level) in either G1 or G2, do:
(i) If A is not present at the top level in the other input, then add A and its value to NEW.
(ii) If it is, then call Graph-Unify with the two values for A. If that fails, then fail. Otherwise, take the new value of A to be the result of that unification and add A with its value to NEW.
c. If there are any labels attached to G1 or G2, then bind them to NEW and return NEW.
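The Graph-Unify algorithm of Section 15.2.3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course's implementation. Graphs are modelled as Python dictionaries and atomic values as strings. The variables {1}, {2}, {3} are modelled with a hypothetical Var class. Disjunction (step 1c) and labels on graphs (step 2c) are left out, so two atomic values unify only if they are equal. The names Var, resolve, graph_unify and substitute are illustrative.

```python
from typing import Dict, Optional, Union


class Var:
    """A variable such as {1} in a rule graph (hypothetical helper class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return "{" + self.name + "}"


Graph = Union[str, Var, Dict[str, "Graph"]]


def resolve(g: Graph, bindings: dict) -> Graph:
    """Follow variable bindings until a value or an unbound variable is reached."""
    while isinstance(g, Var) and g in bindings:
        g = bindings[g]
    return g


def graph_unify(g1: Graph, g2: Graph, bindings: dict) -> Optional[Graph]:
    """Return the unified graph, or None on failure.

    `bindings` is updated in place and may keep partial bindings after a failure.
    """
    g1, g2 = resolve(g1, bindings), resolve(g2, bindings)
    # Step 1b: a variable is bound to the other value.
    if isinstance(g1, Var):
        if g1 is not g2:
            bindings[g1] = g2
        return g2
    if isinstance(g2, Var):
        bindings[g2] = g1
        return g1
    # Steps 1a and 1c without disjunction: atomic values must be equal.
    if not isinstance(g1, dict) or not isinstance(g2, dict):
        return g1 if g1 == g2 else None
    # Step 2: build NEW from the attributes of both graphs.
    new: Dict[str, Graph] = {}
    for attr in g1.keys() | g2.keys():
        if attr not in g2:
            new[attr] = g1[attr]
        elif attr not in g1:
            new[attr] = g2[attr]
        else:
            value = graph_unify(g1[attr], g2[attr], bindings)
            if value is None:
                return None
            new[attr] = value
    return new


def substitute(g: Graph, bindings: dict) -> Graph:
    """Replace bound variables in a graph by their values."""
    g = resolve(g, bindings)
    if isinstance(g, dict):
        return {k: substitute(v, bindings) for k, v in g.items()}
    return g


# The rule NP -> DET N and the lexical items "the" and "file" from the slides.
v1, v2, v3 = Var("1"), Var("2"), Var("3")
rule = {
    "CONSTITUENT1": {"CAT": "DET", "LEX": v1},
    "CONSTITUENT2": {"CAT": "N", "LEX": v2, "NUMBER": v3},
    "BUILD": {"NP": {"DET": v1, "HEAD": v2, "NUMBER": v3}},
}
the = {"CAT": "DET", "LEX": "the"}
file_ = {"CAT": "N", "LEX": "file", "NUMBER": "SING"}

bindings: dict = {}
matched = graph_unify(rule, {"CONSTITUENT1": the, "CONSTITUENT2": file_}, bindings)
np_graph = substitute(rule["BUILD"], bindings) if matched is not None else None
print(np_graph)  # {'NP': {'DET': 'the', 'HEAD': 'file', 'NUMBER': 'SING'}}
```

Matching and structure building happen in the same call. Unifying the rule with the two lexical graphs checks the categories and also binds {1}, {2} and {3}, so the BUILD part can then be read off as the NP graph. Because attributes are dictionary keys, their order does not matter, in line with treating graphs as sets.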