UNIVERSITI MALAYA
ORIGINAL LITERARY WORK DECLARATION
Name of Candidate: NOOR HAFHIZAH BINTI ABD RAHIM (I.C/Passport No: 830127-08-5412)
Registration/Matric No: WGA070052
Name of Degree: Master of Computer Science
Title of Project Paper/Research Report/Dissertation/Thesis (“this Work”):
A Statistical Parser To Reduce Structural Ambiguity In Malay Grammar Rules
Field of Study: Artificial Intelligence (Natural Language Processing)
I do solemnly and sincerely declare that:
(1)
(2)
(3)
(4)
(5)
(6)
I am the sole author/writer of this Work;
This Work is original;
Any use of any work in which copyright exists was done by way of fair dealing and for
permitted purposes and any excerpt or extract from, or reference to or reproduction of any
copyright work has been disclosed expressly and sufficiently and the title of the Work and its
authorship have been acknowledged in this Work;
I do not have any actual knowledge nor do I ought reasonably to know that the making of this
work constitutes an infringement of any copyright work;
I hereby assign all and every rights in the copyright to this Work to the University of Malaya
(“UM”), who henceforth shall be owner of the copyright in this Work and that any
reproduction or use in any form or by any means whatsoever is prohibited without the written
consent of UM having been first had and obtained;
I am fully aware that if in the course of making this Work I have infringed any copyright
whether intentionally or otherwise, I may be subject to legal action or any other action as
may be determined by UM.
Candidate’s Signature
Date 16 February 2011
Subscribed and solemnly declared before,
Witness’s Signature
Name:
Designation:
Date
ABSTRACT
The goal of this research is to develop a statistical parser that helps reduce structural
ambiguity in the Malay language. Parsing is an important phase in understanding natural
language; however, parsing a sentence is a difficult task because of the various ambiguity
problems in natural language. The parsing technique is the most important component to
consider when developing any parser. The technique used in this research is top-down
parsing, and the grammar chosen is a context-free grammar (CFG) for the Malay language.
The CFG contains the rules for forming basic Malay sentences. The proposed Malay
Statistical Parser uses probability values, computed for one hundred and forty-seven (147)
grammar rules, as the guideline for selecting the best parse tree. Since no probabilities
existed for the Malay CFG rules, one thousand (1000) training sentences were collected
from primary-school textbooks and various Malay grammar books. The probability values
were calculated, and the result is known as a Probabilistic Context-free Grammar (PCFG).
The parser was then evaluated using one hundred (100) test sentences approved by two
Malay linguists known as Munsyi Dewan. The Malay statistical parser computes the
probability of each parsed sentence and selects the parse with the highest value. The results
show that the parser achieved 100% recall, 93.25% precision and 96.75% f-score,
demonstrating that the parser is able to reduce ambiguity for basic Malay sentences.
ABSTRAK
Tujuan penyelidikan ini ialah membangunkan sebuah pengurai berstatistik yang dapat
membantu mengurangkan ketaksaan berstruktur dalam Bahasa Melayu. Penguraian
merupakan satu fasa penting dalam memahami bahasa tabii. Walau bagaimanapun, untuk
mengurai sesuatu ayat, ia merupakan satu tugas yang sukar memandangkan terdapat banyak
masalah dalam ketaksaan bahasa tabii. Teknik penguraian merupakan komponen yang
paling penting yang perlu dipertimbangkan dalam membangunkan sebarang pengurai.
Teknik yang digunakan dalam penyelidikan ini ialah teknik penguraian atas-bawah dan
tatabahasa yang dipilih ialah nahu bebas-konteks untuk Bahasa Melayu. Nahu bebas-konteks
tersebut mengandungi petua-petua bagi membentuk ayat mudah Bahasa Melayu. Pengurai
Berstatistik Bahasa Melayu menggunakan nilai-nilai kebarangkalian yang dikira untuk
seratus empat puluh tujuh (147) petua-petua nahu yang digunakan sebagai panduan dalam
memperoleh rajah pepohon yang terbaik. Memandangkan belum ada nilai kebarangkalian
bagi petua nahu bebas-konteks untuk Bahasa Melayu, seribu (1000) data latihan diperoleh
daripada buku-buku teks sekolah rendah dan tatabahasa Bahasa Melayu. Nilai-nilai
kebarangkalian yang dikira itu dikenali sebagai Nahu Bebas-konteks Berkebarangkalian.
Pengurai itu dinilai menggunakan seratus (100) data ujian yang dipersetujui oleh dua orang
pakar dalam Bahasa Melayu yang dikenali sebagai Munsyi Dewan. Seterusnya, Pengurai
Berstatistik Bahasa Melayu tersebut dapat mengira nilai kebarangkalian yang tertinggi bagi
setiap ayat yang diurai. Hasil keputusan menunjukkan pengurai itu mencapai 100% recall,
93.25% precision dan 96.75% f-score, yang menunjukkan pengurai tersebut berjaya
mengurangkan ketaksaan berstruktur bagi ayat mudah Bahasa Melayu.
ACKNOWLEDGEMENT
To Allah, who gives me the strength and good health to finish my MSc degree.
To my parents and siblings, who give me their full support.
To Dr Rohana Mahmud, my supervisor, who encouraged and guided me in this research.
To my friends.
Thanks a lot.
TABLE OF CONTENTS
Abstract ......................................................... ii - iii
Acknowledgement .................................................. iv
Table of Contents ................................................ v - viii
List of Figures .................................................. ix - x
List of Tables ................................................... xi

1 Introduction ................................................... 1
1.1 Introduction ................................................. 1
1.2 Problem Statement ............................................ 5
1.3 Research Objectives .......................................... 6
1.4 Research Approach ............................................ 6
1.5 Expected Results ............................................. 7
1.6 Research Scopes and Limitations .............................. 7
1.7 Dissertation Organization .................................... 8

2 Literature Review .............................................. 10
2.1 Natural Language Processing Phases ........................... 10
2.2 Syntax Analysis .............................................. 11
2.2.1 Category of Syntax Analysis ................................ 13
2.2.2 Syntactic Analysis Technique ............................... 14
2.3 Ambiguity .................................................... 16
2.3.1 Part-of-Speech (POS) Ambiguity ............................. 16
2.3.2 Semantic Ambiguity ......................................... 17
2.3.3 Syntactic or Structural Ambiguity .......................... 17
2.3.4 Verbal Ambiguity ........................................... 17
2.4 Statistical Parsing .......................................... 18
2.4.1 Context-free Grammar in English Language ................... 19
2.4.2 Probabilistic Context-free Grammar (PCFG) .................. 21
2.5 Charniak's Parser ............................................ 23
2.6 Collins's Parser ............................................. 27
2.7 Statistical Parser for Malay Language ........................ 28
2.8 Ahmad's Malay Parser ......................................... 28
2.9 Juzaiddin's Malay Parser ..................................... 31
2.10 Summary of the Chapter ...................................... 34

3 Probabilistic Malay Grammar .................................... 36
3.1 Introduction of Grammar ...................................... 36
3.2 Types of Malay Language Grammar .............................. 36
3.2.1 Sentence Grammar ........................................... 37
3.2.1.1 CFG in Malay Language .................................... 37
3.2.2 Partial Discourse Grammar .................................. 40
3.2.3 'Pola' (Pattern) Grammar ................................... 40
3.3 Rules for Basic Malay Sentence ............................... 42
3.3.1 Rules for FN (Noun Phrase, NP) ............................. 42
3.3.2 Rules for FK (Verb Phrase, VP) ............................. 44
3.3.3 Rules for FA (Adjective Phrase, AP) ........................ 45
3.3.4 Rules for FS (Prepositional Phrase, PP) .................... 45
3.4 Probabilistic Context-free Grammar (PCFG) for Malay Language . 46
3.4.1 Training Data .............................................. 46
3.4.2 Analysis of Training Data .................................. 48
3.5 Summary of the Chapter ....................................... 61

4 Development of Malay Statistical Parser ........................ 62
4.1 Requirement Specification .................................... 62
4.1.1 Functional Requirement ..................................... 62
4.1.2 Non-functional Requirement ................................. 63
4.2 System Design ................................................ 64
4.2.1 System Architecture ........................................ 64
4.2.1.1 Input Component .......................................... 65
4.2.1.2 Part-of-Speech (POS) Tagger .............................. 68
4.2.1.3 Malay Lexicon (MaLEX) .................................... 70
4.2.1.4 Parsing Engine ........................................... 72
4.2.1.5 Output Component ......................................... 75
4.2.2 User Interface Design ...................................... 75
4.3 Summary of the Chapter ....................................... 78

5 Experiments and Results ........................................ 79
5.1 Test Datasets ................................................ 79
5.2 Results ...................................................... 90
5.3 Summary of the Chapter ....................................... 95

6 Conclusion ..................................................... 96
6.1 Fulfillment of Research Objectives ........................... 96
6.2 Malay Statistical Parser ..................................... 97
6.3 Limitations .................................................. 98
6.4 Future Enhancement ........................................... 98
6.5 Summary of the Chapter ....................................... 99

References ....................................................... 100
Appendix A ....................................................... 103
Appendix B ....................................................... 120
Appendix C ....................................................... 122
Appendix D ....................................................... 191
Appendix E ....................................................... 193
LIST OF FIGURES
Figure 1.1: First tree of "He saw the boy with a telescope" (Meyer et al., 2002) ..... 2
Figure 1.2: Second tree of "He saw the boy with a telescope" (Meyer et al., 2002) ..... 3
Figure 1.3: First tree of "Kami adang air itu" ..... 4
Figure 1.4: Second tree of "Kami adang air itu" ..... 4
Figure 2.1: Structure of Syntax Analysis (Parsing) ..... 12
Figure 2.2: Some Examples of CFG for English (Jurafsky et al., 2000) ..... 20
Figure 2.3: Tree for Sentence "John loves Mary" ..... 20
Figure 2.4: The Grammar with PCFG Form ..... 25
Figure 2.5: The First Possible Parse Tree for Sentence "salespeople sold the dog biscuits" ..... 25
Figure 2.6: The Second Possible Parse Tree for Sentence "salespeople sold the dog biscuits" ..... 26
Figure 2.7: The Third Possible Parse Tree for Sentence "salespeople sold the dog biscuits" ..... 26
Figure 2.8: System Architecture for Ahmad's Malay Parser ..... 30
Figure 2.9: The System Architecture Based on 'Pola' (Pattern) Sentence ..... 33
Figure 3.1: Context-Free Grammar for Malay Language, after Nik Safiah Karim (1995) ..... 38
Figure 3.2: Basic Rule for FN ..... 43
Figure 3.3: Basic Rule for FN in Detail ..... 44
Figure 3.4: Basic Rules for FK ..... 44
Figure 3.5: Basic Rules for FA ..... 45
Figure 3.6: Basic Rules for FS ..... 45
Figure 3.7: Analysis Pattern Sentence of Training Data ..... 49
Figure 4.1: System Architecture of a Statistical Parser for Malay Language ..... 64
Figure 4.2: A Parse Tree of Sentence "Dia gemar melancong" (Mohd Juzaiddin Ab Aziz et al., 2006) ..... 69
Figure 4.3: A Parse Tree of Sentence "Dia gemar menulis aturcara" (Mohd Juzaiddin Ab Aziz et al., 2006) ..... 69
Figure 4.4: The Pseudocode for POS Tagger ..... 70
Figure 4.5: The Process in Parsing Engine ..... 73
Figure 4.6: Another Process in Parsing Engine ..... 74
Figure 4.7: Example of Parse Trees and Their Probability Values ..... 75
Figure 4.8: Main Interface for Malay Statistical Parser ..... 76
Figure 4.9: Output for the Parser ..... 76
Figure 4.10: Output Component for Sentence "bapa saya pemandu teksi" ..... 77
Figure 4.11: One of the Probability Values for the Parsed Sentence ..... 77
Figure 4.12: The Main Interface with the Error Message "Tiada dalam database" ..... 78
Figure 4.13: Error Message for Unsuccessful Parsed Sentence ..... 78
Figure 5.1: Recall, Precision and F-score for Statistical Parser for Malay Language ..... 94
LIST OF TABLES
Table 2.1: Examples of Terminal Symbols, Non-terminals and Rewrite Rules ..... 19
Table 2.2: Comparison between Charniak's Parser, Collins's Parser, Ahmad's Malay Parser and Juzaiddin's Malay Parser ..... 34
Table 3.1: Description of Elements Used in Malay Grammar Rules, after Ahmad I. Z. Abidin et al. (2007) ..... 39
Table 3.2: Analysis of the Sentences According to Their Pattern ..... 48
Table 3.3: Rules for A (sentence), S (subject), and P (predicate) ..... 50
Table 3.4: Rules for the FN (noun phrase) ..... 50
Table 3.5: Rules for the FK (verb phrase) ..... 55
Table 3.6: Rules for the FA (adjective phrase) ..... 59
Table 3.7: Rules for the FS (prepositional phrase) ..... 61
Table 4.1: A Few Examples of Words, Lexical Classes and Their Synonyms from MaLEX ..... 71
Table 4.2: A Few Examples of Rules and Probability Values from MaLEX ..... 72
Table 5.1: The Number of Test Sentences ..... 92
Table 5.2: Result of the Experiments ..... 93
CHAPTER 1:
INTRODUCTION
1.1 Introduction
The purpose of Natural Language Processing (NLP) is to enable machines to understand
human language. Mohanty and Balabantaray (2003) define parsing as the process of
assigning a structural description to a sequence of words in natural language. The parsing
process produces parse trees, which are useful in grammar-checking applications such as
those found in word-processing systems. In those applications, parsing plays an important
role throughout the inspection process. For example, a sentence like "likes she reading"
cannot be parsed because it is ungrammatical. Parse trees are also very important in
semantic analysis and in understanding the deep meaning of a sentence. Question answering
and information extraction are examples of analytical applications. For example, the
information needed to answer the question "What books were written by male Malay
authors after 2000?" can be found in the subject of the sentence, "what books", and the
by-adjunct, "male Malay authors". The question can be solved by knowing the list of books
instead of the list of authors (Manning & Schutze, 1999).
Parsing a sentence, though a difficult task, is an initial step in understanding natural
language, and ambiguity is a serious problem that linguists face in natural language parsing.
An ambiguity problem occurs when more than one parse tree can be constructed for a
sentence. For example, the sentence "He saw the boy with a telescope" allows two
interpretations. First reading: "He used the telescope to see the boy". Second reading: "He
saw the boy who had a telescope". Both interpretations can be represented using tree
structures. The first tree is shown in Figure 1.1.
Figure 1.1: First tree of "He saw the boy with a telescope" (Meyer et al., 2002)

The second tree, which has a dissimilar structure, is shown in Figure 1.2.

Figure 1.2: Second tree of "He saw the boy with a telescope" (Meyer et al., 2002)

The two figures have dissimilar structures because they use different grammar rules. Thus,
the sentence "He saw the boy with a telescope" is an ambiguous sentence.
There are also ambiguous sentences in the Malay language. As mentioned by Mohd
Juzaiddin et al. (2006), a Malay sentence is ambiguous when more than one parse tree can
be constructed for it. This can happen when the sentence contains word ambiguity, where a
word holds more than one part-of-speech (POS). For example, the word "adang" (deter) has
two POS tags, namely KN (noun) and KKTr (transitive verb). The sentence "kami adang
air itu" (we deter the water) can therefore have two different parse trees. The first tree is
shown in Figure 1.3.
Figure 1.3: First tree of “Kami adang air itu”
The second tree, which has a dissimilar structure, is shown in Figure 1.4.
Figure 1.4: Second tree of “Kami adang air itu”
Other than structural ambiguity, there are three (3) other types of ambiguity that usually
occur in natural languages such as English and Malay, namely part-of-speech ambiguity,
semantic ambiguity and verbal ambiguity (Jurafsky et al., 2000).

To minimize structural ambiguity, a statistical parser is one of the solutions to the problem.
A statistical parser is a parser that assigns a probability value to each of the grammar rules.
Many researchers have succeeded in developing this type of parser, and such parsers have
been built for many languages such as English, Chinese, French and German. Charniak's
and Collins's parsers are statistical parsers developed for the English language. Charniak's
parser was developed by Eugene Charniak (Charniak, 1997), while Collins's parser was
developed by Michael Collins for his PhD study (Collins, 2003).

In this research, the focus is on implementing a statistical parser for the Malay language, as
other researchers have already developed Malay parsers without using statistical elements.
Ahmad's (Ahmad I. Z. Abidin et al., 2007) and Juzaiddin's (Mohd Juzaiddin Ab Aziz et
al., 2006) parsers are examples of Malay parsers.
1.2 Problem Statement
Parsing a sentence helps reveal the way in which the sentence is structured, using linguistic
knowledge. A sentence is successfully parsed when a parse tree can represent it. However,
if the sentence is ambiguous, parsing may produce more than one parse tree. To handle this
situation, a statistical parser that minimizes ambiguity should be developed. So far, however,
no statistical parser has been developed for the Malay language; the goal of this project is
therefore to build a statistical parser that can minimize structural ambiguity for Malay
language sentences.
1.3 Research Objectives
There are two main objectives to this project:
(1) To provide the probability values for the 147 Malay grammar rules.
(2) To develop a prototype of a statistical parser for the Malay language.
The statistical parser will then calculate the probability of each parsed sentence and choose
the parse tree with the highest probability.
1.4 Research Approach
There are two main things that need to be considered when developing a statistical parser
for the Malay language. The first is the calculation of the probabilistic grammar. Probability
values are assigned as a statistical element to each of the grammar rules used in parsing.
Because no Malay grammar rules with probability values exist, the values are calculated
from one thousand (1000) Malay sentences that follow the language rules introduced by
Prof Emeritus Datuk Dr Nik Safiah Karim (Nik Safiah Karim, 1995). The training data are
derived from various sources containing basic Malay sentences. Once the probability values
have been computed, they are assigned to the 147 Malay grammar rules. The statistical
parser then calculates the probability of each parsed sentence and chooses the parse tree
with the highest probability.
The second thing to consider when developing a statistical parser is the Malay Lexicon
(MaLEX), which contains Malay words and their lexical classes. There are ninety words
that have two lexical classes. These words are very important because they lead to
ambiguous sentences, which produce two parse trees and two probability values. MaLEX is
provided by undergraduate students from Universiti Kebangsaan Malaysia (UKM). In this
research project, an enhancement has been made: a table containing the grammar rules and
their probability values has been added.
1.5 Expected Results
There are two expected results in this study:
(1) Probability values for the 147 Malay grammar rules
A list of Malay grammar rules, each assigned a probability value, known as a
Probabilistic Context-Free Grammar (PCFG). So far, these probabilities are derived
only from the one thousand training sentences.
(2) A prototype of a Malay statistical parser
A prototype that can minimize the structural ambiguity of basic Malay sentences.
1.6 Research Scopes and Limitations
The scope of this study is limited to the development of a statistical parser for the Malay
language. Malay sentences are categorized as either basic sentences or complex sentences.
This study is limited to basic sentences, as these follow the context-free grammar rules
developed by the Malay linguist Nik Safiah Karim (Nik Safiah Karim, 1995).
1.7 Dissertation Organization
This dissertation is organized as follows:

• Chapter One
This chapter provides an overview of the study. An introduction is provided first,
followed by the problem statement, the objectives, a synopsis of the study's
methodology, the scope and limitations, and finally the dissertation's structure.
• Chapter Two
Chapter Two provides a background study of Natural Language Processing (NLP)
phases, followed by syntax analysis. It also discusses types of ambiguity and
reviews previous related work reported by other researchers.
• Chapter Three
The third chapter describes Malay grammar in detail. It explains the different types
of Malay language grammar, namely the sentence grammar, the partial discourse
grammar, and the pattern ('pola') grammar. Next, the chapter explains CFG in the
Malay language. It also discusses the training data and how the probability values
are computed for each of the CFG rules, producing the PCFG.
• Chapter Four
Chapter Four describes the system architecture of a statistical parser for the Malay
language. The system comprises five components: the parsing engine, the
part-of-speech (POS) tagger, the Malay lexicon, and the input and output
components.
• Chapter Five
This chapter discusses in detail the experiments carried out for each basic Malay
sentence pattern. The chapter also presents the results and analyses the findings.
• Chapter Six
The last chapter assesses the fulfillment of the study's objectives, the strengths of
the Malay Statistical Parser, its limitations and future enhancements.
CHAPTER 2:
LITERATURE REVIEW
This chapter briefly describes the phases in Natural Language Processing. It then explains
syntactic analysis, also known as parsing, followed by further explanation of the categories
and techniques of syntactic analysis. Next come the discussion of ambiguity, statistical
parsing in English, and the underlying grammar. The chapter also gives examples of
available parsers for English, namely Charniak's and Collins's parsers, and describes two
parsers for the Malay language, namely Ahmad's and Juzaiddin's Malay parsers.
2.1 Natural Language Processing (NLP) Phases
Natural language processing is the task of enabling computer systems to understand natural
language. Humans use natural languages in daily conversation; examples of natural
languages are English, French, Turkish, Chinese and Malay. Six different types of
knowledge are involved in understanding a natural language: phonology, morphology,
syntax, semantics, pragmatics and discourse.

Phonology concerns how words are related to sounds. For example, the word 'book' is
pronounced as 'buk'. The next phase is morphology, the area concerned with the structure
of words. Words are constructed from basic meaning units called morphemes. For example,
the word 'recently' comes from the root word 'recent' (an adjective) coupled with the suffix
-ly; joined together, they form the adverb 'recently'. The third phase is syntax. This phase
concerns how words can be put together to form a correct sentence; it deals with the
structure of sentences. For example, the structure of the sentence 'she reads a book' is
(S (NP (N she)) (VP (V reads) (NP (ART a) (N book)))). The fourth phase is semantic
knowledge, which concerns the meanings of words and how these meanings are combined
in sentences to form sentence meanings. For instance, the sentence "Colourless green ideas
sleep furiously" (Chomsky, 1957) would be rejected as semantically anomalous. Next is the
pragmatic phase, which deals with the use of a sentence in different situations. For instance,
the sentence 'you have a green light' can give two interpretations: 'you are holding a green
light bulb' or 'you have a green light to drive your car'. The last phase is the discourse
phase, which concerns how a previous sentence affects the next sentence. For example, in
"Ahmad wanted it", the word "it" refers to something in a previous sentence.
In this research project, the major emphasis is on the syntax phase. As mentioned,
ambiguity is a main problem in natural language parsing; since we want to disambiguate
the structure of an ambiguous sentence, syntax is the relevant field.
2.2 Syntax Analysis
Syntax analysis, also known as syntactic parsing, is the task of recognising a sentence and
assigning a syntactic structure to it (Jurafsky et al., 2000). The word syntax comes from the
Greek sŷntaxis, meaning "setting out together or arrangement", and refers to the way words
are arranged together (Jurafsky et al., 2000). Syntax provides rules for putting words
together in a sentence in the correct order. The objective of syntax analysis is to produce a
parse tree, or some equivalent representation, as the output.

Syntactic structure in a sentence means that each word in the sentence is associated with a
lexical value (word category). Another definition of the analysis is the process of analysing
a text, made of a sequence of tokens (i.e. words), to determine its grammatical structure
with respect to a given formal grammar.
Figure 2.1 shows the structure of syntax analysis (parsing): a test sentence and the grammar
rules are fed into the parser, which produces parse trees.

Figure 2.1: Structure of Syntax Analysis (Parsing)
2.2.1 Category of Syntax Analysis
Syntax analysis can be categorized into two parts, namely recognising a phrase and
generating a series of strings.
The first part is recognising a phrase. In recognising phrases in a sentence, most syntactic
representations of language are based on the notion of context-free grammars (CFG). A
CFG consists of a set of rules or productions. CFG is used to group and order symbols
together, and it also defines a lexicon of words and symbols (Allen, 1995).

An example of a CFG rule from the Malay language, representing the Noun Phrase (NP) or
Frasa Nama (FN), as suggested by Nik Safiah Karim (1975), is:

FN → (Bil) (Penj Bil) (Gelaran) Kata Nama Int <Kata Nama Int> (Penentu) (Pent)

An instance of FN:

Dua buah buku
Bil (Penj Bil) (Kata Nama)

Two classes of symbols are used in a CFG: terminal symbols and non-terminal symbols.
Symbols that cannot be further decomposed by the grammar are called terminal symbols,
such as Verb (V) and Noun (N). Symbols that express clusters of terminal symbols are
called non-terminals, such as NP, VP (verb phrase) and S (sentence) (Jurafsky et al., 2000).
The second part is generating a series of strings. A sequence of rule expansions is called a
derivation. There are two important processes based on derivations. The first is sentence
generation, which uses derivations to construct legal sentences; a simple generator could be
implemented by randomly choosing rewrite rules, starting from the S symbol, until a
sequence of words is reached. The second process based on derivations is parsing, which
identifies the structure of a sentence given a grammar (Allen, 1995).
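A random generator of the kind just described can be sketched as follows. The toy grammar is an invented fragment using the Malay category names of this dissertation; it is not the actual rule set:

```python
import random

# Illustrative toy grammar (invented fragment, not the dissertation's rules).
GRAMMAR = {
    "A":  [["FN", "FK"]],        # Ayat -> Frasa Nama, Frasa Kerja
    "FN": [["KN"]],              # noun phrase -> noun
    "FK": [["KK"]],              # verb phrase -> verb
    "KN": [["Ali"], ["Abu"]],    # nouns (terminal words)
    "KK": [["belajar"], ["tidur"]],  # verbs (terminal words)
}

def generate(symbol="A"):
    """Expand a symbol by randomly chosen rewrite rules until only words remain."""
    if symbol not in GRAMMAR:            # a terminal word: emit it
        return [symbol]
    expansion = random.choice(GRAMMAR[symbol])
    words = []
    for s in expansion:
        words.extend(generate(s))
    return words

print(" ".join(generate()))  # e.g. "Ali belajar"
```

Every string this generator produces is, by construction, a legal sentence of the grammar.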
2.2.2 Syntactic Analysis Technique
The syntax analysis technique, commonly known as the syntactic parsing technique, is a
method of analysing a sentence to determine its structure according to the grammar (Allen,
1995). There are two basic approaches to parsing, namely top-down parsing and bottom-up
parsing (Grune & Jacobs, 1990). Top-down parsing starts with the S symbol (equivalent to
A in the Malay context, where A stands for Ayat, 'sentence') and then searches through
different ways of rewriting the symbols until the input sentence is generated.
For example, consider the sentence: Ali belajar. (Ali studies)

A → Subjek Prediket              (Sentence → Subject Predicate)
  → (Frasa Nama) (Frasa Kerja)   ((Noun Phrase) (Verb Phrase))
  → (Kata Nama) (Frasa Kerja)    ((Noun) (Verb Phrase))
  → Ali (Frasa Kerja)            (Ali (Verb Phrase))
  → Ali (Kata Kerja)             (Ali (Verb))
  → Ali belajar                  (Ali studies)
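The top-down search behind this derivation can be sketched as a small depth-first rewriter. This is a minimal illustration, not the dissertation's parser, and the grammar fragment covers only this one sentence:

```python
# Toy grammar fragment for "Ali belajar" (illustrative only).
GRAMMAR = {
    "A": [["Subjek", "Prediket"]],
    "Subjek": [["FN"]],
    "Prediket": [["FK"]],
    "FN": [["KN"]],
    "FK": [["KK"]],
    "KN": [["Ali"]],
    "KK": [["belajar"]],
}

def parse(symbols, words):
    """Try to rewrite `symbols` (top-down, depth-first) so they derive exactly `words`."""
    if not symbols:
        return not words                  # success only if all input words are consumed
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:              # a terminal: it must match the next input word
        return bool(words) and words[0] == first and parse(rest, words[1:])
    # a non-terminal: try every expansion in turn
    return any(parse(expansion + rest, words) for expansion in GRAMMAR[first])

print(parse(["A"], ["Ali", "belajar"]))  # True
```

The search mirrors the derivation above: starting from A, each non-terminal at the front of the symbol list is expanded until terminals are matched against the input.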
Bottom-up parsing starts with the words in the sentence and uses the rewrite rules
backwards to reduce the sequence of symbols until only A remains.

Example:

Ali belajar                      (Ali studies)
→ (Kata Nama) belajar            ((Noun) studies)
→ (Frasa Nama) belajar           ((Noun Phrase) studies)
→ (Frasa Nama) (Kata Kerja)      ((Noun Phrase) (Verb))
→ (Frasa Nama) (Frasa Kerja)     ((Noun Phrase) (Verb Phrase))
→ Subjek Prediket                (Subject Predicate)
→ A                              (Sentence)
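The bottom-up reductions can likewise be sketched in code. This greedy reducer is an illustration only; for brevity it collapses the Subjek/Prediket step into a single FN FK → A rule:

```python
# Reduction rules as (right-hand side, left-hand side) pairs (illustrative only).
RULES = [
    (("Ali",), "KN"), (("belajar",), "KK"),
    (("KN",), "FN"), (("KK",), "FK"),
    (("FN", "FK"), "A"),   # Subjek Prediket collapsed into one step for brevity
]

def reduce_sentence(words):
    """Repeatedly replace a matching right-hand side with its left-hand side."""
    symbols = list(words)
    changed = True
    while changed:
        changed = False
        for rhs, lhs in RULES:
            n = len(rhs)
            for i in range(len(symbols) - n + 1):
                if tuple(symbols[i:i + n]) == rhs:
                    symbols[i:i + n] = [lhs]   # the reduction rhs -> lhs
                    changed = True
                    break
            if changed:
                break
    return symbols

print(reduce_sentence(["Ali", "belajar"]))  # ['A']
```

A real bottom-up parser must also handle choices between competing reductions; this sketch has none because the toy rule set is unambiguous.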
2.3 Ambiguity
Ambiguity is perhaps the most serious problem faced by parsers, and it cannot be solved by
the plain top-down and bottom-up approaches: parsers that use those techniques may
produce more than one parse tree when they analyse an ambiguous sentence. Statistical
parsing, on the other hand, is a better approach for tackling the ambiguity problem
(Charniak, 2000; Jurafsky et al., 2000).

There are four different types of ambiguity, namely part-of-speech (POS) ambiguity,
semantic ambiguity, syntactic or structural ambiguity, and verbal ambiguity (Jurafsky et
al., 2000).
2.3.1 Part-of-Speech (POS) Ambiguity
POS tagging is an activity that selects the appropriate syntactic category for each word in a
sentence. For example, for the sentence "Ali belajar" (Ali studies), the POS tags are:
Ali → kata nama (noun), belajar → kata kerja (verb).

However, ambiguity can also be a problem in POS tagging. For instance, the word mereka
has two different categories: it can be a noun (mereka meaning 'they') or a verb (mereka
meaning 'design').
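A tagger that consults a lexicon exposes this ambiguity directly, returning every candidate tag for each word. The tiny lexicon below is invented for illustration; it is not MaLEX:

```python
# Hypothetical lexicon entries: word -> possible POS tags.
LEXICON = {
    "Ali": ["KN"],               # noun
    "belajar": ["KK"],           # verb
    "mereka": ["KN", "KK"],      # ambiguous: 'they' (noun) or 'design' (verb)
}

def pos_candidates(sentence):
    """Return, for each word, its list of candidate tags from the lexicon."""
    return {word: LEXICON.get(word, []) for word in sentence.split()}

print(pos_candidates("mereka belajar"))
```

Words with more than one candidate tag are exactly the ones that can lead to multiple parse trees downstream.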
2.3.2 Semantic Ambiguity
Ambiguity is also a serious problem during semantic interpretation. A word is semantically
ambiguous if it maps to more than one sense.

For example: Palestinian head seeks arms.

The word "head" can be interpreted as a noun meaning either a chief or the anatomical
head of the body, while the word "arms" can be interpreted as a plural noun meaning either
weapons or body parts (Hoenisch, 2004).
2.3.3 Syntactic or Structural Ambiguity
Syntactic or structural ambiguity occurs when a sentence can be interpreted in more than
one way. This ambiguity arises from the relationships between the words and clauses of a
sentence, and from the structure of the sentence.

Example: The children ate the cake with the spoon.

The sentence can be interpreted as the children using the spoon to eat the cake, or as the
children eating the cake and the spoon together (Manning & Schutze, 1999).
2.3.4 Verbal ambiguity
Verbal ambiguity is a deeper kind of ambiguity. It happens during the speech.
Example: “John loves his mother and so does Bill.”
It can be used to say either that John loves John's mother and Bill loves Bill's mother, or
that John loves John's mother and Bill loves John's mother (Bach, 1998).
In developing a statistical parser for the Malay language, only structural ambiguity will be
considered. The reason is that structural ambiguity arises when we parse a sentence using a
syntactic parser (Jurafsky et al., 2000). This ambiguity occurs when the grammar assigns
more than one possible parse tree to a sentence. The statistical parser offers a solution by
using probability as the approach.
2.4 Statistical Parsing
The aim of the statistical parser is to offer a solution to the problem that arises from
syntactic parsing, in which the parser cannot handle an ambiguous sentence. A statistical
parser uses a probabilistic approach to the grammar that is used in parsing.
Statistical parsers have been successfully developed for a variety of languages, like English,
German, French and Chinese. For English, a few statistical parsers have been developed,
like Charniak's Parser and Collins's Parser.
To build a statistical parser, there are two main things that must be considered: first, the type
of grammar, and second, the method of parsing. The most basic type of grammar
used in English statistical parsers is context-free grammar (CFG).
2.4.1 Context-free Grammar in English Language
The most common type of English grammar is CFG. A CFG consists of four (4) constituents,
namely a set of terminal symbols, a set of non-terminal symbols, a specific non-terminal
symbol and a set of rewrite rules.
The set of terminal symbols contains the symbols that appear in the final strings, while the
set of non-terminals contains the symbols that are expanded into other symbols. One
specific non-terminal symbol is designated as the starting symbol; typically the symbol S,
which stands for sentence, serves as the starting symbol. The set of rewrite rules contains
rules with a single non-terminal on the left-hand side and one or more terminal or
non-terminal symbols on the right (Charniak, 1993).
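These four constituents can be made concrete with a small sketch in Python; the grammar and symbol names below are illustrative only and are not taken from any of the cited works:

```python
# A CFG as a 4-tuple: terminals, non-terminals, a start symbol, and rewrite rules.
cfg = {
    "terminals": {"the", "dog", "barks"},            # symbols appearing in final strings
    "non_terminals": {"S", "NP", "VP", "Det", "N", "V"},
    "start": "S",                                    # the designated starting symbol
    "rules": [                                       # single non-terminal LHS -> RHS symbols
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V",)),
        ("Det", ("the",)),
        ("N", ("dog",)),
        ("V", ("barks",)),
    ],
}

def is_well_formed(g):
    """Check the CFG definition: each rule has one non-terminal on the left
    and only known terminal/non-terminal symbols on the right."""
    symbols = g["terminals"] | g["non_terminals"]
    return all(
        lhs in g["non_terminals"] and all(s in symbols for s in rhs)
        for lhs, rhs in g["rules"]
    )

print(is_well_formed(cfg))  # True
```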
Table 2.1: Examples of terminal symbols, non-terminals and rewrite rules

Non-terminal Symbols          Terminal Symbols     Rewrite Rules
Sentence (S)                  Verb (V)             S → NP VP
Verb Phrase (VP)              Noun (N)             NP → Det N N
Noun Phrase (NP)              Determiner (Det)     NP → N
Prepositional Phrase (PP)     Adjective (Adj)      NP → Det N
Figure 2.2 shows some examples of CFG rules for English, taken from Jurafsky et al.
(2000):

Sentence → Auxiliary NounPhrase VerbPhrase
Sentence → NounPhrase VerbPhrase
NounPhrase → ProperName
NounPhrase → NounPhrase + PrepositionalPhrase
VerbPhrase → Verb NounPhrase VerbPhrase
NounPhrase → Nominal
NounPhrase → Pronoun
Nominal → Noun

Figure 2.2: Some Examples of CFG for English (Jurafsky et al., 2000)
In developing a statistical parser, the second thing that needs to be considered is the
methodology. For statistical work, we need a corpus of hand-parsed text. Fortunately, for
the English language such a corpus has been developed, called the Penn Treebank
(Marcus et al., 1993). The Penn Treebank project annotates naturally occurring text for
linguistic structure, showing syntactic and semantic information. The treebank contains a
huge number of linguistic trees and also annotates text with part-of-speech tags.
One of the examples in the Penn Treebank is shown below:
Figure 2.3: Tree for Sentence “John loves Mary”
Statistical parsers work by assigning probabilities to the possible parses of a sentence,
locating the most probable parse tree and presenting that tree as the answer. For that
purpose, we need a probabilistic grammar. The simplest mechanism for this is the
probabilistic context-free grammar (PCFG).
2.4.2 Probabilistic Context-free Grammar (PCFG)
The idea of a PCFG is to combine a probability with each grammatical rule. Lakeland &
Knott (2004) describe the statistical parser as the idea of combining a probability with each
of the grammatical rules. PCFG is very useful in disambiguation (Jurafsky & Martin, 2000).
PCFG is also known as Stochastic Context-free Grammar (SCFG), first proposed by
Booth (1969).
A PCFG is defined by five parameters (N, Σ, P, S, D):
(1) A set of non-terminal symbols N
(2) A set of terminal symbols Σ
(3) A set of productions P, each of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (Σ ∪ N)*
(4) A designated start symbol S
(5) A function D that assigns probabilities to each rule
We can assign probability values to those CFG rules. Allen (1995) defined the PCFG
probabilities by counting the number of times each rule is used in a corpus of parsed
sentences; these counts are then used to estimate the probability of each rule being used.
To calculate the PCFG probabilities, Allen (1995) produced a formula. For example, consider a
category C for which the grammar contains m rules, R1, ..., Rm, with left-hand side C. The
formula to compute the probability of using rule Rj to derive C is

PROB(Rj | C) = Count(# times Rj used) / Σ i=1..m Count(# times Ri used)
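Allen's formula is a relative-frequency estimate. A brief sketch in Python, where the rule counts are hypothetical and serve only to illustrate the computation:

```python
from collections import Counter

def rule_probabilities(rule_uses):
    """PROB(Rj | C) = count(Rj) / sum of counts of all rules with left-hand side C."""
    counts = Counter(rule_uses)
    lhs_totals = Counter()
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical occurrences of rules in a parsed corpus (100 NP expansions in total)
uses = ([("NP", "ART N")] * 55 + [("NP", "N")] * 14 +
        [("NP", "N N")] * 9 + [("NP", "N PP")] * 22)
probs = rule_probabilities(uses)
print(probs[("NP", "ART N")])  # 0.55
```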
For instance, taking an example from Allen (1995), if we calculate the PCFG for the sentence "A
flower wilted", the rules are given with the following probabilities:

S → NP VP     (1.000)
VP → V        (0.386)
VP → V NP     (0.393)
NP → N N      (0.090)
NP → N        (0.140)
NP → ART N    (0.550)
There are three possible ways to generate "A flower wilted". Given the probabilities of the
individual rules, we can calculate the probability of an entire parse by taking the product of
the probabilities of the rules it uses.
(i) (S (NP (ART a) (N flower)) (VP (V wilted)))
Probability = 1.000 × 0.550 × 0.386 = 0.212

(ii) (S (NP (N a) (N flower)) (VP (V wilted)))
Probability = 1.000 × 0.090 × 0.386 = 0.035

(iii) (S (NP (N a)) (VP (V flower) (NP (N wilted))))
Probability = 1.000 × 0.140 × 0.393 × 0.140 ≈ 7.7 × 10⁻³

As 0.212 is the highest probability value among them, the first parse is the most probable
representation of the sentence.
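The computation above is just a product over the rules used in each parse; a minimal illustrative sketch with the rule probabilities of the example:

```python
from math import prod

# Rule probabilities from the example (Allen, 1995)
P = {("S", "NP VP"): 1.000, ("VP", "V"): 0.386, ("VP", "V NP"): 0.393,
     ("NP", "N N"): 0.090, ("NP", "N"): 0.140, ("NP", "ART N"): 0.550}

def parse_probability(rules_used):
    """Probability of an entire parse = product of its rule probabilities."""
    return prod(P[r] for r in rules_used)

# Parse (i): (S (NP (ART a) (N flower)) (VP (V wilted)))
p1 = parse_probability([("S", "NP VP"), ("NP", "ART N"), ("VP", "V")])
# Parse (ii): (S (NP (N a) (N flower)) (VP (V wilted)))
p2 = parse_probability([("S", "NP VP"), ("NP", "N N"), ("VP", "V")])
print(round(p1, 3), round(p2, 3))  # 0.212 0.035
```

Choosing the most probable parse is then a simple argmax over these products.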
Charniak's parser (Charniak, 1997) and Collins's parser (Collins, 2003) are two examples
that use a probabilistic grammar as the technique for developing a statistical parser.
2.5 Charniak's Parser
Charniak's parser was developed by Eugene Charniak (Charniak, 1997) and is trained on
the Penn Treebank built at the University of Pennsylvania (Marcus et al., 1993). The
technique of parsing applied in this parser is top-down parsing.
There are two basic approaches to parsing, namely top-down parsing and
bottom-up parsing. Top-down parsing starts with an S symbol (S stands for sentence) and
then searches through different ways to rewrite the symbols until the input sentence is
generated (Grune & Jacobs, 1990).
For example, we have the sentence: She eats.
S → (Noun Phrase) ( Verb Phrase)
→ (Noun) (Verb Phrase)
→ She (Verb Phrase)
→ She (Verb)
→ She eats.
Bottom-up parsing starts with the words in the sentence and uses the rewrite rules backwards
to reduce the sequence of symbols until only S remains.
Example:
→ She eats
→ (Noun) eats
→ (Noun Phrase) eats
→ (Noun Phrase) (Verb)
→ (Noun Phrase) (Verb Phrase)
→S
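The top-down derivation above can be sketched as a tiny recogniser in Python; the toy grammar is just the one used in the example, and none of this code comes from the cited parsers:

```python
# Toy grammar for the example sentence "she eats"
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Noun"]],
    "VP": [["Verb"]],
    "Noun": [["she"]],
    "Verb": [["eats"]],
}

def derive(symbols, words):
    """Top-down: rewrite the leftmost non-terminal until the input is generated."""
    if not symbols:
        return not words                     # success only if every word is consumed
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:                  # terminal symbol: must match the next word
        return bool(words) and words[0] == head and derive(rest, words[1:])
    return any(derive(rhs + rest, words) for rhs in GRAMMAR[head])

print(derive(["S"], ["she", "eats"]))  # True
```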
The corpus used by this parser is based on the Penn Treebank. Charniak's parser is a
statistical parser: it relies on probabilities to parse. To assign a probability to each
constituent, Charniak used a method called a Markov grammar. A Markov grammar assigns
probabilities to any possible expansion using statistics gathered from a training corpus
(Charniak, 1997; Collins, 2003; Magerman, 1995).
For the parser, Charniak suggested three steps to parse a new sentence. First, we need an
actual grammar in PCFG form. Second, we parse the sentence using a parser that applies
the PCFG and finds all the possible parses of the sentence. Lastly, we find the
highest-probability parse among the parse trees.
For example, a test sentence: “Salespeople sold the dog biscuits.”
Step 1: We need an actual grammar in PCFG form. See Figure 2.4 below.

s  → np vp           (1.00)
vp → verb np         (0.8)
vp → verb np np      (0.2)
np → det noun        (0.5)
np → noun            (0.3)
np → det noun noun   (0.15)
np → np np           (0.05)

Figure 2.4: The Grammar in PCFG Form
Step 2: We give the test sentence to the parser and find all the possible parses. There are
three possible parses, as shown in the three figures below: Figure 2.5, Figure 2.6 and Figure
2.7.
(s (np (noun salespeople))
   (vp (verb sold)
       (np (det the) (noun dog) (noun biscuits))))

Figure 2.5: The First Possible Parse Tree for the Sentence "salespeople sold the dog biscuits"

(s (np (noun salespeople))
   (vp (verb sold)
       (np (det the) (noun dog))
       (np (noun biscuits))))

Figure 2.6: The Second Possible Parse Tree for the Sentence "salespeople sold the dog biscuits"

(s (np (noun salespeople))
   (vp (verb sold)
       (np (np (det the) (noun dog))
           (np (noun biscuits)))))

Figure 2.7: The Third Possible Parse Tree for the Sentence "salespeople sold the dog biscuits"
Step 3: We need to find the highest probability among the parse trees.
Before we can identify which parse tree has the highest value, we first need to calculate the
probability of each parse tree.
For Figure 2.5, the probability is 1.0 × 0.3 × 0.8 × 0.15 = 0.036
For Figure 2.6, the probability is 1.0 × 0.3 × 0.2 × 0.5 × 0.3 = 0.009
For Figure 2.7, the probability is 1.0 × 0.3 × 0.8 × 0.05 × 0.5 × 0.3 = 0.0018
From the result above, we can see that the parse tree in Figure 2.5 has the highest
probability value.
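The three products above can be checked mechanically by scoring a candidate tree recursively: a node's probability is its rule's probability times its children's probabilities. A minimal illustrative sketch (not Charniak's actual implementation), using the grammar of Figure 2.4:

```python
from math import prod

# The PCFG of Figure 2.4: (LHS, RHS labels) -> probability
PCFG = {("s", ("np", "vp")): 1.00, ("vp", ("verb", "np")): 0.8,
        ("vp", ("verb", "np", "np")): 0.2, ("np", ("det", "noun")): 0.5,
        ("np", ("noun",)): 0.3, ("np", ("det", "noun", "noun")): 0.15,
        ("np", ("np", "np")): 0.05}

def tree_probability(tree):
    """tree = (label, child, ...); a leaf is a plain word with probability 1."""
    if isinstance(tree, str):
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = PCFG.get((label, rhs), 1.0)          # POS -> word expansions count as 1 here
    return p * prod(tree_probability(c) for c in children)

# The tree of Figure 2.5
t1 = ("s", ("np", ("noun", "salespeople")),
      ("vp", ("verb", "sold"),
       ("np", ("det", "the"), ("noun", "dog"), ("noun", "biscuits"))))
print(round(tree_probability(t1), 4))  # 0.036
```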
In conclusion, the chosen parse tree for the sentence "salespeople sold the dog biscuits" is
the one in Figure 2.5. This parser achieved 83% in both recall and precision measurements.
2.6 Collin’s Parser
Collins's parser is another example of a statistical parser. It was developed by
Michael Collins (Collins, 2003). Collins applied a statistical approach in building his
parser. A key idea in the statistical approach is to associate a probability with each grammar
rule. Such a grammar is called a Probabilistic Context-free Grammar (PCFG). A PCFG is a
simple modification of a context-free grammar in which each rule in the grammar has an
associated probability P(α → β | α). This is the conditional probability of the non-terminal α
being expanded using the rule α → β, as opposed to one of the other possibilities for
expanding α listed in the grammar. The probability of a tree–sentence pair (T, S) derived by
n applications of context-free rules LHSi → RHSi, 1 ≤ i ≤ n (LHS stands for Left-Hand
Side, RHS for Right-Hand Side), under a PCFG is

P(T, S) = Π i=1..n P(RHSi | LHSi)
However, Collins discovered that the probability of a test sentence computed this way is
often close to zero. He suggested two solutions to overcome the problem: each rule should
be broken into smaller steps, and the number of non-terminals should be increased.
This parser is also considered a head-driven statistical model because the parse tree is
represented as the sequence of decisions corresponding to a head-centred, top-down
derivation of the tree.
2.7 Statistical Parser for Malay Language
In developing a statistical parser for the Malay language, a few things need to be
considered. First, there is no lexicon like WordNet. WordNet is a huge lexical resource
for the English language that is used in most natural language applications (Fellbaum, 1999).
Second, there is no probabilistic grammar. Thus, we need to compute a
probability value for each context-free grammar rule of the Malay language.
Previous studies show that no statistical parser has been developed yet for the
Malay language. However, to date two Malay syntactic parsers have already
been built. The first is Ahmad's Malay Parser.
2.8 Ahmad’s Malay Parser
The parser was developed by a group of researchers from Universiti Teknologi Petronas
led by Ahmad Izuddin Zainal Abidin (Ahmad I. Z. Abidin et al., 2007). This parser
performs syntactic parsing using a top-down parsing approach. The target of the parser is
to complement an existing word-processing system by checking the grammar of a test
sentence. Another function of this parser is that it is able to illustrate a parse tree if the
sentence is grammatically correct.
The research domain of the parser is the Malay language, focusing on basic Malay
sentences. This parser also covers a semantic part, performing a basic level of semantic
parsing on top of the syntactic parsing. In the semantic parsing, Malay words are divided into
two categories: humans and animals. Some examples of words for humans are
'mengandung' (pregnant), 'memasak' (cooking) and 'berfikir' (thinking), while examples
for animals are 'meragut' (grazing), 'mengawan' (mating) and 'bunting' (pregnant, for
animals). The advantage of this parser is that it can handle semantic ambiguity. Case in
point, the sentence 'bapa meragut rumput' (a father is grazing grasses) fails in parsing, as
the word 'meragut' is categorised under animals and not humans. This parser was evaluated
by experts in the Malay language, specifically school teachers.
Figure 2.8 is the architecture of the parsing for Ahmad‟s Malay Parser.
[Figure: the user interface passes the input sentence to a checking engine containing a text
parser, which imports data and checks grammar correctness against the technical structures
of the parser (grammar rules and lexicon) before producing the output]

Figure 2.8: System Architecture for Ahmad's Malay Parser
Figure 2.8 illustrates the system architecture of Ahmad's Malay Parser. When the user
inputs a sentence, the checking engine parses the test sentence using the text parser
component. The text parser component has two important parts, which form the technical
structure of the parser: the grammar rules and the Malay lexicon. The grammar rules are
derived from Nik Safiah Karim (1995), while the Malay lexicon contains three thousand
(3000) words arranged according to word categories. The words were collected from the
Kamus Dwibahasa Oxford Fajar, 2nd Edition (Hawkins, 2001).
This parser achieved 81.3% in recall measurement.
2.9 Juzaiddin’s Malay Parser
This parser was developed by Mohd Juzaiddin Ab Aziz for his PhD study (Mohd
Juzaiddin Ab Aziz et al., 2006). The parser introduces a pola-grammar technique that does
not require the lexical process of retrieving the part-of-speech for each word. The technique
uses automata and finite states. This parser also analyses basic Malay sentences, which are
combinations of NP+NP, NP+VP, NP+PP or NP+AP. A basic sentence is grouped into
five categories, namely adjunct, subject, post-subject, conjunction and predicate. An
adjunct is a type of adverbial illustrating the circumstances of the action, for example dua
(two), pada (to), di (at), orang (people), beberapa (a few), lampau (past), silam (past),
kerana (because), agar (so that) and sekiranya (if). A subject tells whom or what the
sentence is about, for example dia (he/she), mereka (they) and pengaturcara (programmer).
The post-subject, which is 'yang', is normally used in the Malay language as the language is
a terse language. A conjunction is a word used as a discourse marker (kata penghubung),
for example tetapi (but), kerana (because) and dan (and). A predicate tells something about
the subject, such as 'makan nasi' (eat rice), 'menterjemah aturcara' (translate a code) and
'bermain permainan komputer' (play computer games).
There are three steps to identify a pattern (pola) for each of the sentence.
Step 1: Identify the basic sentence
Step 2: Identify the category of the sentence
Step 3: Produce the sentence pattern.
For example:

Pengkompil menukar bahasa paras tinggi kepada bahasa mesin.
(The compiler changes the high-level language to the machine language)

Step 1: Identify the basic sentence
It is a basic sentence; refer to rule (NP+VP)

Step 2: Identify the category of the sentence
Subject (Pengkompil) Predicate [menukar bahasa paras tinggi kepada bahasa mesin]
Predicate: Verb (menukar) Object (bahasa paras tinggi) Conjunction (kepada) Adverb (bahasa mesin)
Adverb: Object (bahasa mesin)

Step 3: Produce the sentence pattern (pola).
menukar(bahasa paras tinggi, bahasa mesin)
The architecture of the system is shown in Figure 2.9.
[Figure: a sentence is segmented into Adjunct, Subject (singular/plural), Post-Subject
(yang/ini), Conjunction and Predicate (verb, object, adverb); the parser then identifies
(1) passive/active sentence, (2) basic or compound sentence, (3) the order of the subjects
and the objects, and (4) negative sentence, producing the form
Verb [[(subject, object)], Verb (subject, object)]]

Figure 2.9: The System Architecture Based on 'Pola' (Pattern) Sentence
This parser was tested using 19 thesis abstracts consisting of 3604 words and 173
sentences. As the parser introduced the pola-grammar technique, it does not involve
ambiguity problems the way other syntactic parsers do. This parser has also been
compared to Ahmad's Malay Parser. Based on the comparison, Ahmad's Malay Parser
does not solve the ambiguity problem because it does not have a probabilistic model for
tree structures.
Juzaiddin's Malay Parser achieves an f-score in the range of 73% to 93%. The f-score is a
parser evaluation measure that combines precision and recall (Jurafsky et al., 2000).
Based on the reviews, we can compare the parsers, as presented in Table 2.2.
Table 2.2: Comparison between Charniak's parser, Collins's parser, Ahmad's Malay
parser and Juzaiddin's Malay parser

                            Charniak's parser     Collins's parser    Ahmad's Malay parser   Juzaiddin's Malay Parser
Approach                    Top-down and          Statistical         Top-down parser        Not applicable
                            statistical parser    parser
Type of grammar             Markov grammar        Probabilistic       Context-free           Pola-grammar
                                                  Context-free        grammar                technique
                                                  Grammar (PCFG)
Representation of output    Parse tree            Parse tree          Parse tree             Verb [[(subject, object)],
                                                                                             Verb (subject, object)]
2.10 Summary of the Chapter
In this chapter, we have defined syntactic analysis. Then, we defined and described
ambiguity in detail, as well as the statistical parsers available for English. This chapter also
discussed the available parsers for the Malay language.
Based on the review, as suggested by Charniak (1997), the simplest method to develop a
statistical parser is to use a Probabilistic Context-free Grammar (PCFG). To our knowledge,
no attempt has been made to provide probability values for a Malay context-free
grammar. Mohd Juzaiddin Ab Aziz et al. (2006) introduced the pola-grammar technique,
which does not involve ambiguity problems; however, that parser is only suitable for
annotating thematic roles in semantic analysis. To overcome the ambiguity problem in
syntactic analysis, probability values should be provided for the Malay context-free
grammar rules.
In conclusion, this chapter helps to form a strong base for understanding the concept of
parsing and the ambiguity problem.
CHAPTER 3:
PROBABILISTIC MALAY
GRAMMAR
Chapter 3 introduces grammar, the types of grammar in the Malay language, the details of
the context-free grammar for the Malay language and the Probabilistic Malay Grammar.
The section on the Probabilistic Malay Grammar explains in detail how the training data
are gathered and the steps to calculate the probabilities.
3.1 Introduction of Grammar
Grammar is an inner regularity and a simple knowledge representation of language
(Chomsky 1966, 1971, 1975, 1980). It emerges from language and plays the most important
role in implementing the fundamental aims of linguistic analysis. Grammar plays
two roles:
(i) to separate the grammatical sentences from the ungrammatical sequences; and
(ii) to study the structure of grammatical sentences.
The grammar of a language is also a device that generates all the grammatical sentences of
the language and none of the ungrammatical ones (Yuan 1997).
3.2 Types of Malay Language Grammar
There are three types of grammar in the Malay language: the first is sentence grammar, the
second is partial discourse grammar and the third is the 'pola' (pattern) sentence grammar.
3.2.1 Sentence Grammar
This type of grammar uses personal, idiolectal (the total amount of a language that any one
person knows and uses), artificial sounding and independent sentences as a guide in making
syntactic Malay sentences.
'Ayat' (sentence) grammar has two models, namely the transformational-generative
grammar (Nik Safiah Karim 1975) and the relational grammar (Yeoh 1979). The
transformational-generative grammar consists of a series of phrase-structure rewrite rules:
for example, a series of rules that generates the underlying phrase structure of a sentence,
and a series of rules that act upon the phrase structure to form more complex sentences.
The relational grammar is a theory of descriptive grammar that states syntactic relations
such as the relationship between subject and object. These two models are inherited from
context-free grammar (CFG).
3.2.1.1 CFG in Malay Language
The CFG for the Malay language was formulated by Nik Safiah Karim (1995). It became
the basis for developing the probabilistic grammar for the Malay language. The CFG for
forming a basic sentence in the Malay language is pictured in Figure 3.1.
A  → S + P
S  → FN
P  → FN
P  → FK
P  → FA
P  → FS
FN → (Bil) + (PenjBil) + (Gel) + KN + [KN] + (Pen) + (Pent)
FK → (KBantu) + KKtr + Obj + (Ket)
FK → (KBantu) + KKtr + AKomp + (Ket)
FK → (KBantu) + KKttr + Pel + (Ket)
FK → (KBantu) + KKttr + AKomp + (Ket)
FA → (KBantu) + (KPeng) + Adj + [Adj] + (Ket) + (AKomp)
FS → (KBantu) + KSN + (KNA) + FN + (AKomp)
FS → (KBantu) + KSN + (KNA) + FN + (Ket)

Figure 3.1: Context-Free Grammar for Malay Language, after Nik Safiah Karim (1995)
Table 3.1: Description of Elements Used in Malay Grammar Rules, after Ahmad I. Z.
Abidin et al. (2007)

Element    Description in Malay language (English)
A          Ayat (Sentence)
S          Subjek (Subject)
P          Prediket (Predicate)
Adj        Adjektif (Adjective)
AKomp      Ayat Komplemen (Complementary Sentence)
Bil        Bilangan (Numeric)
FA         Frasa Adjektif (Adjective Phrase)
FK         Frasa Kerja (Verb Phrase)
FN         Frasa Nama (Noun Phrase)
FS         Frasa Sendi (Prepositional Phrase)
Gel        Gelaran (Title)
KBantu     Kata Bantu (Auxiliary)
Ket        Keterangan (Explanation)
KKtr       Kata Kerja Transitif (Transitive Verb)
KKttr      Kata Kerja Tak Transitif (Intransitive Verb)
KNA        Kata Nama Arah (Direction)
KN         Kata Nama (Noun)
KPeng      Kata Penguat (Intensifier)
Obj        Objek (Object)
Pel        Pelengkap (Complement)
Pen        Penerang (Description)
PenjBil    Penjodoh Bilangan (Numerical Coefficient or Classifier)
Pent       Penentu (Determiner)
KSN        Kata Sendi Nama (Preposition)
3.2.2 Partial Discourse Grammar
A discourse grammar is the grammar of sentences as they are used in discourse. Discourse
concerns how the immediately preceding sentences affect the interpretation of the next
sentence.
A partial discourse grammar picks out sentences from discourse in order to make
linguistic statements about them. This type of grammar differs from 'ayat' grammar
because it uses a 'language-first' approach to the writing of syntax, while 'ayat' grammar
uses a 'theory-first' approach. According to Azhar Simin (1988), the 'language-first'
approach gives the Malay reader a chance to read, in his own language, the latest ideas
about the genius of his language, while the 'theory-first' approach is more likely to use the
sentence to make the chosen theory appear workable.
Example of partial discourse grammar:
Aminah membaca buku. Dia juga mendengar radio.
(Aminah is reading a book. She is also listening to the radio.)
Dia (She) is referring to Aminah
3.2.3 ‘Pola’ (Pattern) Grammar
'Pola' grammar is the grammar of patterns in sentences. This type of grammar was used by
Azhar Simin (1988). Each 'pola' is linked to a class name that forms or helps to make a
basic sentence; each 'pola' is a formula for making a basic sentence.
Example:
”Pola”: Pelaku + perbuatan
Pattern: Actor + verb
Sentence: Saya makan.
(I eat).
Asmah Hj Omar (1968) represents the most "theoretical" work on "pola" grammar. It
provides a methodology for "pola" grammar writing. Below are the "pola" grammar
patterns for the Malay language:

(i)   Pelaku + Perbuatan (Actor + Verb)
(ii)  Pelaku + Perbuatan + Pelengkap (Actor + Verb + Complement)
(iii) Perbuatan + Pelengkap (Verb + Complement)
(iv)  Diterangkan + Menerangkan (Signified + Signifier)
(v)   Digolong + Penggolong (Classified + Classifier)
(vi)  Pelengkap + Perbuatan + Pelaku (Complement + Verb + Actor)
(vii) Pelengkap + Perbuatan (Complement + Verb)
In forming a basic sentence in the Malay language, the suitable type of grammar to use is
the sentence grammar, which provides rules. The rules are derived from the CFG
mentioned in Section 3.2.1.1.
3.3 Rules for Basic Malay Sentence
Basically, to create rules for a sentence in the Malay language, we should follow the CFG
for the Malay language by Nik Safiah Karim (1995) as shown in Figure 3.1. A basic
sentence in the Malay language can be derived from these four (4) basic patterns of rules:

A → FN + FN
A → FN + FK
A → FN + FA
A → FN + FS

where
A = Ayat (Sentence), FN = Frasa Nama (Noun Phrase), FK = Frasa Kerja
(Verb Phrase), FA = Frasa Adjektif (Adjective Phrase), FS = Frasa
Sendi (Prepositional Phrase)
Each of FN, FK, FA and FS is described in detail in the following sections.
3.3.1 Rules for FN (Noun Phrase, NP)
There is a basic rule for FN. The rule is in Figure 3.2.
FN → {Bil + PenjBil / Bil / Gel} + KN + {Pen} + {Pent}

(the first element is one of Bil + PenjBil, Bil or Gel; elements in braces are optional)

Figure 3.2: Basic Rule for FN
The elements used in the FN basic rule are described below:

- Bil = Bilangan (Numeral)
- PenjBil = Penjodoh Bilangan (Numerical Coefficient or Classifier)
- Gel = Gelaran (Title)
- KN = Kata Nama (Noun)
- Pen = Penerang (Description)
- Pent = Penentu (Determiner)

For example: dua orang pelajar itu (the two students), mereka (they), and
murid-murid (pupils)
All these elements can be considered as POS except for Pen or Penerang (Description).
Penerang can be categorised into two groups, namely 'Penerang 1' and 'Penerang 2'.
'Penerang 1' contains KN or Kata Nama (Noun), while 'Penerang 2' contains
KK or Kata Kerja (Verb), KSN or Kata Sendi Nama (Preposition) and KA or
Kata Adjektif (Adjective).
The basic rule of FN in Figure 3.2 is detailed in Figure 3.3.

FN → {Bil + PenjBil / Bil / Gel} + KN + {KN / KKtr + KN / KKttr / KA / KSN + KN} + {Pent}

Figure 3.3: Basic Rule for FN in Detail

Examples: tiga orang pelajar sekolah itu (the three school students),
Datuk Ahmad di dewan parlimen (Datuk Ahmad at the Parliament House)
From the above, we can develop many rules for FN as the elements inside the parentheses
are optional.
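This expansion of optional elements can be made explicit with a short enumeration sketch in Python; the slot list mirrors the FN rule of Figure 3.1, and the code itself is only illustrative:

```python
from itertools import product

# FN -> (Bil) + (PenjBil) + (Gel) + KN + [KN] + (Pen) + (Pent)  (Figure 3.1)
# The head noun KN is compulsory; every bracketed element may be present or absent.
slots = ["Bil", "PenjBil", "Gel", "KN", "KN", "Pen", "Pent"]
optional = [True, True, True, False, True, True, True]

variants = set()
for choice in product([False, True], repeat=len(slots)):
    # A choice is valid if every compulsory slot is kept.
    if all(keep or opt for keep, opt in zip(choice, optional)):
        variants.add(tuple(s for s, keep in zip(slots, choice) if keep))

print(len(variants))  # 64 concrete FN rules from one schematic rule
```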
3.3.2 Rules for FK (Verb Phrase, VP)
There are two basic rules for FK. Kata Kerja (verb) consists of two dissimilar types,
namely Kata Kerja Transitif, KKtr (transitive verb) and Kata Kerja Tak
Transitif, KKttr (intransitive verb).
Basic Rules:
(i)  for KKtr:  FK → (KBantu) + KKtr + KN + (Ket)
(ii) for KKttr: FK → KKttr + (Pel)

Figure 3.4: Basic Rules for FK

For example: KKtr: melamar (propose), menggunakan (use), menerima (receive);
KKttr: tidur (sleep), makan (eat), mandi (bathe)
3.3.3 Rules for FA (Adjective Phrase, AP)
Frasa Adjektif is a phrase (a group of words) that describes a noun. There is one basic
rule for FA.
Basic rule:
FA → (KBantu) + (KPeng) + KA + (KA) + (Ket) + (Akomp)
Figure 3.5: Basic rules for FA
For example: amat berani (very brave), cantik (beautiful), renik (very small)
3.3.4 Rules for FS
Frasa sendi (prepositional phrase) is a phrase that is used to show the relationship of a
noun or a pronoun to some other word. There is a basic rule for FS.
Basic rule:
FS → (KBantu) + KSN + (KArah) + FN + (Ket)
Figure 3.6: Basic rules for FS
For example: di dalam pinggan (in the plate), di tepi tangga (beside the
ladder), dalam perahu (in the boat), di Kuala Lumpur (at Kuala Lumpur)
As mentioned, context-free grammar is the most basic grammar for the Malay language.
Many syntactic parsers use CFG as their grammar. Yet syntactic parsing has a main
difficulty: it cannot handle syntactic ambiguity. To resolve the problem, we attach a
statistical element, the probability itself, to each of the Malay grammar rules. We compute
probabilities for the CFG of the Malay language and label the result a Probabilistic
Context-free Grammar (PCFG).
3.4 Probabilistic Context-Free Grammar (PCFG) for Malay Language
For the English language, there are a few corpora that contain rules with probabilities, for
example the Penn Wall Street Journal corpus (Marcus et al., 1993) and the Brown Corpus
(Kucera & Francis, 1979).
However, for the Malay language, no such corpus containing rules and probabilities exists.
We need to calculate the probabilities for the Malay grammar rules ourselves. To calculate
the probability of each rule in the Malay CFG, there are two steps to follow. First, we need
training data containing a collection of basic Malay sentences; in this research project, we
collected one thousand (1000) sentences from various sources. Second, we compute the
probabilities based on the training data.
3.4.1 Training Data
To obtain the one thousand (1000) basic sentences, various sources were used. The sources
are listed below:
(1) Malay text books for primary schools (Zainal Arifin Yusof, Kamarudin Jeon &
Mohd Nasar Sukor, 2008), (Zainal Arifin Yusof, Kamarudin Jeon & Mohd Nasar
Sukor, 2005).
(2) Malay Grammar books (Abdullah Hassan, 1993), (Abdullah Hassan & Ainon
Mohd., 1994a), (Abdullah Hassan & Ainon Mohd., 1994b), (Abu Naim Kassan,
2001), (Nik Safiah Karim et. al, 2009)
The training data analysis process was evaluated by Malay language experts known as
Munsyi Dewan. Munsyi Dewan is a group of human language experts in the Malay
language certified by Dewan Bahasa dan Pustaka (DBP). This group's main
responsibilities are giving consultation and lectures about the Malay language to the public
and private sectors (Dewan Bahasa dan Pustaka Malaysia, 2010).
There are two steps in analysing the training data. First, the sentences are tagged according
to the Malay lexicon. Second, the Malay rules are derived; this process matches the tagged
sentences against the Malay language rules.
For example:

Sentence: Umur saya tujuh tahun
(My age is seven years)

Step 1: Tagging Process
Umur/KN saya/KN tujuh/Bil tahun/KN

Step 2: Deriving Rules Process
A → S + P
S → FN
P → FN
FN → KN + KN (subject: Umur saya)
FN → Bil + KN (predicate: tujuh tahun)
The processes are repeated for all the basic sentences in the training data. The results are
shown in Appendix A. The training data are manually tagged by Munsyi Dewan.
3.4.2 Analysis Pattern of Training Data
Based on their structure, the sentences can be categorised into four patterns: FN + FN,
FN + FK, FN + FA and FN + FS. The breakdown of the sentences is displayed in Table 3.2.
Table 3.2: Analysis of the sentences according to their pattern

PATTERN    NUMBER OF SENTENCES
FN + FN    80
FN + FK    739
FN + FA    141
FN + FS    40
TOTAL      1000
Table 3.2 can also be represented as a pie chart.

[Figure: pie chart of the sentence patterns — FN+FK: 73.9% (739 sentences); FN+FA:
14.1% (141 sentences); FN+FN: 8.0% (80 sentences); FN+FS: 4.0% (40 sentences)]

Figure 3.7: Analysis of the Sentence Patterns in the Training Data
The pie chart shows the sentence patterns of the data collection. There are clear differences
among the patterns. FN+FK has the highest percentage, 73.9%, which shows that many
sentences contain a verb. The next pattern is FN+FA, whose percentage is 6.1 percentage
points higher than that of FN+FN. The lowest percentage, 4%, belongs to the FN+FS
pattern.
The analysis of the basic sentences is shown in several tables. Each table displays the
left-hand-side (LHS) rule count, the right-hand-side (RHS) rule count and the probability
values. Table 3.3 displays the results for the A (sentence), S (subject) and P (predicate)
rules. Table 3.4 presents the FN (noun phrase) rules, while Table 3.5 shows the FK (verb
phrase) rules. Tables 3.6 and 3.7 show the FA (adjective phrase) and FS (prepositional
phrase) rules respectively. The totals of the rules for each segment are presented in detail
in Appendix B.
Table 3.3: Rules for A (sentence), S (subject), and P (predicate)
Num  RULE       LHS Rule Count  RHS Rule Count  RHS/LHS    Probability
1.   A → S + P  1000            1000            1000/1000  1.0000
2.   S → FN     1000            1000            1000/1000  1.0000
3.   P → FN     1000              80              80/1000  0.0800
4.   P → FK     1000             739             739/1000  0.7390
5.   P → FA     1000             141             141/1000  0.1410
6.   P → FS     1000              40              40/1000  0.0400
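As a sanity check on Table 3.3, the four P expansions are relative frequencies over the same 1000 training sentences, so their probabilities must sum to one. A quick verification:

```python
# Counts of each P expansion from Table 3.3 (out of 1000 training sentences)
p_rule_counts = {"P -> FN": 80, "P -> FK": 739, "P -> FA": 141, "P -> FS": 40}

lhs_count = sum(p_rule_counts.values())  # 1000, the count of P on the LHS
p_probs = {rule: n / lhs_count for rule, n in p_rule_counts.items()}

# Reproduces the table's probability column and sums to 1 over all expansions
assert abs(p_probs["P -> FK"] - 0.7390) < 1e-9
assert abs(sum(p_probs.values()) - 1.0) < 1e-9
```

The same property holds, up to rounding, for the FN, FK, FA, and FS tables that follow.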
Table 3.4: Rules for the FN (noun phrase)
(The LHS rule count is 1304 for every FN rule.)

Num  Rule FN                                             RHS Count  RHS/LHS   Probability
1.   FN → KN                                             517        517/1304  0.3965
2.   FN → KN + Pent                                      162        162/1304  0.1242
3.   FN → KN + Pent + KA + KN                            1          1/1304    0.0008
4.   FN → KN + Pent + KN + Bil + KN                      1          1/1304    0.0008
5.   FN → KN + Pent + KN + KN                            1          1/1304    0.0008
6.   FN → KN + Pent + KSN + KN + KN                      2          2/1304    0.0015
7.   FN → KN + KKTr + KN + KN                            1          1/1304    0.0008
8.   FN → KN + KKTTr                                     3          3/1304    0.0023
9.   FN → KN + KSN + KN + KN + KN                        2          2/1304    0.0015
10.  FN → KN + KSN + KN + Pent                           3          3/1304    0.0023
11.  FN → KN + KN                                        263        263/1304  0.2017
12.  FN → KN + KN + KSN + KN + KN                        5          5/1304    0.0038
13.  FN → KN + KN + KSN + KN + KN + Pent                 1          1/1304    0.0008
14.  FN → KN + KN + Bil + PenjBil + KN + KN + KN         1          1/1304    0.0008
15.  FN → KN + KN + Pent                                 97         97/1304   0.0744
16.  FN → KN + KN + KBantu + KA                          3          3/1304    0.0023
17.  FN → KN + KN + KA + KN + KN                         1          1/1304    0.0008
18.  FN → KN + KN + KN                                   63         63/1304   0.0483
19.  FN → KN + KN + KN + Pent                            12         12/1304   0.0092
20.  FN → KN + KN + KN + Bil + KN + Pent                 1          1/1304    0.0008
21.  FN → KN + KN + KN + Pent + KN                       1          1/1304    0.0008
22.  FN → KN + KN + KN + KSN + KN                        4          4/1304    0.0031
23.  FN → KN + KN + KN + KN                              15         15/1304   0.0115
24.  FN → KN + KN + KN + KN + KN                         3          3/1304    0.0023
25.  FN → KPeng + KA + KN + KN                           4          4/1304    0.0031
26.  FN → Bil + KN + KN                                  3          3/1304    0.0023
27.  FN → KN + KN + KSN + KN                             2          2/1304    0.0015
28.  FN → KN + KBantu + KN + KBantu                      1          1/1304    0.0008
29.  FN → KN + KKTTr + KA                                1          1/1304    0.0008
30.  FN → KN + KSN + KN                                  19         19/1304   0.0146
31.  FN → KN + PenjBil + KN + Pent                       1          1/1304    0.0008
32.  FN → Bil + KN                                       22         22/1304   0.0169
33.  FN → PenjBil + KN + KN                              1          1/1304    0.0008
34.  FN → Bil + PenjBil + KN                             23         23/1304   0.0176
35.  FN → KN + KBantu + KA + Pent                        1          1/1304    0.0008
36.  FN → Gel + KN                                       2          2/1304    0.0015
37.  FN → KBantu + KSN + KN + KN                         1          1/1304    0.0008
38.  FN → Bil + KN + KN + Pent                           1          1/1304    0.0008
39.  FN → KN + KPeng + KA                                1          1/1304    0.0008
40.  FN → KBantu + KKTr + KN + KSN + KN + Pent           1          1/1304    0.0008
41.  FN → KN + KN + KBantu + KKTTr                       1          1/1304    0.0008
42.  FN → KN + KSN + KN + KN                             5          5/1304    0.0038
43.  FN → KN + KSN + KKTTr                               1          1/1304    0.0008
44.  FN → Bil + PenjBil + KSN + KN                       1          1/1304    0.0008
45.  FN → Bil + PenjBil + KN + KN                        6          6/1304    0.0046
46.  FN → KN + KN + KBantu                               1          1/1304    0.0008
47.  FN → KN + KSN + KA                                  2          2/1304    0.0015
48.  FN → Bil + KN + KSN + KN                            1          1/1304    0.0008
49.  FN → KN + KSN + KN + KBantu                         1          1/1304    0.0008
50.  FN → Bil + PenjBil                                  1          1/1304    0.0008
51.  FN → KN + KN + KSN + KN + Pent                      1          1/1304    0.0008
52.  FN → KN + KSN + KN + KA                             1          1/1304    0.0008
53.  FN → KN + KSN + KN + KSN + KN                       2          2/1304    0.0015
54.  FN → KN + KN + KSN + KN + KN + KN + KN              1          1/1304    0.0008
55.  FN → KN + KN + Pent + KSN + KN                      1          1/1304    0.0008
56.  FN → KN + KKTTr + KKTTr                             1          1/1304    0.0008
57.  FN → Bil + PenjBil + KN + KA                        1          1/1304    0.0008
58.  FN → KN + Pent + KSN + KN                           2          2/1304    0.0015
59.  FN → KN + Bil                                       3          3/1304    0.0023
60.  FN → KN + PenjBil + KN                              1          1/1304    0.0008
61.  FN → KN + Pent + KN                                 2          2/1304    0.0015
62.  FN → KN + Pent + KN + KN                            2          2/1304    0.0015
63.  FN → KN + KBantu + KBantu + KKTTr + Pent            1          1/1304    0.0008
64.  FN → KN + KN + KSN + KN                             1          1/1304    0.0008
65.  FN → KN + KSN + KN + Bil + KN                       2          2/1304    0.0015
66.  FN → Bil + KN + KBantu + KA                         1          1/1304    0.0008
67.  FN → KN + Pent + KA                                 1          1/1304    0.0008
68.  FN → KN + KBantu + KPeng + KA                       1          1/1304    0.0008
69.  FN → KN + KN + Pent + KN                            1          1/1304    0.0008
70.  FN → KN + Bil + KN                                  1          1/1304    0.0008
71.  FN → KN + KN + Bil + KN                             1          1/1304    0.0008
72.  FN → KN + Bil + PenjBil + KN                        6          6/1304    0.0046
73.  FN → KN + KN + Bil + PenjBil + KN                   1          1/1304    0.0008
74.  FN → KN + Pent + KSN + KN + KBantu + KSN + KN       1          1/1304    0.0008
75.  FN → KN + KBantu + KN + Pent                        1          1/1304    0.0008
Table 3.5: Rules for the FK (verb phrase)
(The LHS rule count is 704 for every FK rule.)

Num  Rule FK                                             RHS Count  RHS/LHS  Probability
1.   FK → KKTr + FN                                      339        339/704  0.4815
2.   FK → KBantu + KKTr + FN                             61         61/704   0.0866
3.   FK → KKTTr                                          67         67/704   0.0952
4.   FK → KKTTr + KA                                     26         26/704   0.0369
5.   FK → KKTTr + KSN + KN                               50         50/704   0.0710
6.   FK → KKTTr + KSN + KN + KN                          36         36/704   0.0511
7.   FK → KKTTr + KSN + KN + KN + KN                     5          5/704    0.0071
8.   FK → KBantu + KKTTr                                 55         55/704   0.0781
9.   FK → KBantu + KKTTr + KSN + KN                      12         12/704   0.0170
10.  FK → KBantu + KKTTr + KSN + KN + KN                 3          3/704    0.0043
11.  FK → KNafi + KKTTr + KSN + KN + KN                  1          1/704    0.0014
12.  FK → KKTTr + KSN + Bil + KN + KN                    1          1/704    0.0014
13.  FK → KKTTr + KN + KN                                2          2/704    0.0028
14.  FK → KBantu + KKTTr + KA                            2          2/704    0.0028
15.  FK → KKTTr + KN + Bil + KN                          1          1/704    0.0014
16.  FK → KNafi + KBantu + KKTTr + KN + KN               1          1/704    0.0014
17.  FK → KBantu + KKTTr + KBantu + KN                   1          1/704    0.0014
18.  FK → KKTTr + KSN + Pent                             1          1/704    0.0014
19.  FK → KKTTr + KN + KN + KN                           2          2/704    0.0028
20.  FK → KKTTr + KBantu + Bil + PenjBil                 1          1/704    0.0014
21.  FK → KKTTr + KSN + KN + KBantu + Bil + KN           1          1/704    0.0014
22.  FK → KKTTr + KSN + KN + KN + KSN + KN + KN          1          1/704    0.0014
23.  FK → KKTTr + KSN + KN + Pent                        1          1/704    0.0014
24.  FK → KKTTr + KSN + KN + KN + Pent                   2          2/704    0.0028
25.  FK → KKTTr + KPeng + KA                             2          2/704    0.0028
26.  FK → KKTTr + KSN + KBantu + KKTr + KN               1          1/704    0.0014
27.  FK → KKTTr + KSN + KN + KSN + KN + KN               2          2/704    0.0028
28.  FK → KKTTr + KSN + KN + KSN + KKTr + KN             1          1/704    0.0014
29.  FK → KKTTr + KN + KSN + KN + KN + KKTTr             1          1/704    0.0014
30.  FK → KKTTr + KSN + KN + KN + Pent                   2          2/704    0.0028
31.  FK → KBantu + KKTTr + KN + KSN + Bil + KN + KN      1          1/704    0.0014
32.  FK → KKTTr + KBantu + KKTTr + KN                    1          1/704    0.0014
33.  FK → KNafi + KKTTr + KBantu                         1          1/704    0.0014
34.  FK → KKTTr + KA + KSN + KN + KN                     1          1/704    0.0014
35.  FK → KKTTr + KN                                     2          2/704    0.0028
36.  FK → KKTTr + Bil                                    1          1/704    0.0014
37.  FK → KKTTr + KKTr + KN + KN                         1          1/704    0.0014
38.  FK → KKTTr + KN + KN + KN + KN                      1          1/704    0.0014
39.  FK → KBantu + KKTTr + KSN + KN + KN                 1          1/704    0.0014
40.  FK → KBantu + KKTr + KKTTr                          1          1/704    0.0014
41.  FK → KBantu + KKTTr + KA                            4          4/704    0.0057
42.  FK → KPeng + KKTTr                                  2          2/704    0.0028
43.  FK → KBantu + KKTTr + KPeng                         1          1/704    0.0014
44.  FK → KBantu + KKTTr + KBantu                        2          2/704    0.0028
45.  FK → KKTTr + KSN + KN + KBantu + KA                 1          1/704    0.0014
46.  FK → KKTTr + KSN + KN + Bil + KN                    1          1/704    0.0014
47.  FK → KKTTr + KKTTr                                  1          1/704    0.0014
Table 3.6: Rules for the FA (adjective phrase)
(The LHS rule count is 141 for every FA rule.)

Num  Rule FA                                             RHS Count  RHS/LHS  Probability
1.   FA → KA + KSN + KN + KN + KBantu + KA               1          1/141    0.0071
2.   FA → KA + KPeng                                     15         15/141   0.1064
3.   FA → KA + KN                                        3          3/141    0.0213
4.   FA → KA + KN + KN + KN + KN                         1          1/141    0.0071
5.   FA → KPeng + KA                                     39         39/141   0.2766
6.   FA → KPeng + KA + KKTTr                             3          3/141    0.0213
7.   FA → KBantu + KN + KA                               1          1/141    0.0071
8.   FA → KBantu + KPeng + KA                            1          1/141    0.0071
9.   FA → KBantu + KA + KPeng                            1          1/141    0.0071
10.  FA → KPeng + KA + KBantu + KN + KN + Pent           1          1/141    0.0071
11.  FA → KBantu + KA                                    11         11/141   0.0780
12.  FA → KBantu + KA + KSN + KN + Pent                  2          2/141    0.0142
13.  FA → KBantu + KA + KPeng + KSN + KN + KN + KN + KN  1          1/141    0.0071
14.  FA → KA + KSN + KN                                  1          1/141    0.0071
15.  FA → KPeng + KA + KKTTr                             2          2/141    0.0142
16.  FA → KPeng + KA + KSN + KN + KN                     1          1/141    0.0071
17.  FA → KA                                             44         44/141   0.3121
18.  FA → KA + KSN + KN + KN                             1          1/141    0.0071
19.  FA → KA + KA                                        8          8/141    0.0567
20.  FA → KA + KA + KN                                   1          1/141    0.0071
21.  FA → KA + KKTr + KN                                 1          1/141    0.0071
22.  FA → KA + KN + KKTTr                                1          1/141    0.0071
23.  FA → KBantu + KA + KBantu + KKTTr + KSN + KN        1          1/141    0.0071
60
Table 3.7: Rules for the FS (prepositional phrase)
Num  Rule FS                     LHS Count  RHS Count  RHS/LHS  Probability
1.   FS → KSN + FN               40         39         39/40    0.975
2.   FS → KBantu + KSN + FN      40         1          1/40     0.025
3.5 Summary of the Chapter
In this chapter, Malay grammar was reviewed in detail. The grammar rules that form a basic sentence in the Malay language were also described. This chapter also showed how to calculate probability values from the training data. The thousand sentences of training data were gathered from various sources. The probability values are counted and stored in the database together with the rules. The following chapter shows how the probability values are used in the parsing engine.
CHAPTER 4: DEVELOPMENT OF MALAY STATISTICAL PARSER
The system development method used for the Malay Statistical Parser is the prototyping model. This method is used to test or illustrate an idea and build a system in an explorative way (Hawryszkiewycz, 1998). The development process using this method covers four main stages: requirement specification, system design, implementation, and evaluation.
4.1 Requirement Specification
The requirement for the Malay Statistical Parser is to develop a parser that can parse an ambiguous sentence and present the parse tree with the highest probability value.
4.1.1 Functional Requirement
There is one functional requirement for this prototype: to parse a sentence. The sentence input by a user must be a basic, declarative sentence. The prototype cannot accept a text file as input.

The length of the sentence should not exceed ten (10) words, because the sentences in the training data have the same maximum length.

Once the user inputs a sentence, the user clicks the 'Parse' button. If the system successfully parses the sentence, it automatically calculates the probability. If the sentence is ambiguous, the parser shows several parse trees with their probability values. Finally, it selects the highest value in order to show the best parse tree for the ambiguous sentence.
4.1.2 Non-functional Requirement
A non-functional requirement or constraint describes a restriction on the system that limits our choices for constructing a solution to a problem (Pfleeger, 1998). The non-functional requirements for the Malay Statistical Parser are stated below.

(i) Usability
The parser is useful for those who want to check the best parse-tree representation of an ambiguous sentence. It is also helpful for those who want to check the grammar of a basic sentence, which is limited to ten words.

(ii) Platform constraint
The prototype shall run under the Windows platform only.

(iii) Response time
On average, a result shall be returned within 3 seconds.

(iv) Reliability
The input to the parser shall be a sequence of words. The parser cannot accept numbers unless they are spelled out.
4.2 System Design
The system is described in terms of architecture and the user interface design.
4.2.1 System Architecture
The system architecture of the Malay Statistical Parser has five components, namely the Parsing Engine, the Part-Of-Speech (POS) Tagger, the Malay Lexicon (MaLEX), the Input Component, and the Output Component. The system architecture is shown in Figure 4.1.
[Block diagram: the user's sentence enters the Input Component, is tagged by the POS Tagger using MaLEX, is parsed by the Parsing Engine (CFG with probability, PCFG), and the Output Component displays the parse tree.]
Figure 4.1: System Architecture of a Statistical Parser for Malay Language
The statistical parser aims to reduce structural ambiguity in Malay sentences. The parser produces several parse trees for an ambiguous sentence and computes the probability of each tree to determine the most probable tree, i.e. the one with the highest probability value.

The process of parsing a sentence starts when a user inputs a sentence. The sentence is tagged by the POS tagger, which chooses a single tag for each word in the sentence; a tag is the part-of-speech (POS) of a word. The Malay lexicon (MaLEX) provides the words and tags to the parser.

After tagging, the next step is parsing, which involves the parsing engine. The engine parses the test sentence and checks which grammar rules apply to it. If the test sentence is ambiguous, the engine calculates the probability of each possible parse tree. The output of the parsing process is the parse tree with the highest probability value.
Each of the components in the system is described in detail below.
4.2.1.1 Input Component
The input component accepts a Malay sentence. In the Malay language, there are four types of sentences based on their purpose, namely:

(i) Declarative sentence (Ayat penyata)
A declarative sentence is a sentence in which the predicate explains something about the subject. For example:
Ini rumah saya. (This is my house.)
(ii) Interrogative sentence (Ayat tanya)
An interrogative sentence is used to pose a question or to seek information. For example:
Berapa umur anda? (How old are you?)
(iii) Imperative sentence (Ayat perintah)
An imperative sentence gives an order or makes a request. For example:
Menangislah sepuas-puas hati. (Cry your heart out.)
(iv) Exclamatory sentence (Ayat seru)
An exclamatory sentence uses intonation to express a response to an emotion such as joy, surprise, disbelief, fear, anger, sorrow, or pain. For example:
Amboi, cantiknya kereta ni! (What a beautiful car!)
In this system, the user needs to enter a declarative sentence, as it follows the context-free grammar (CFG) rules that were derived from Nik Safiah Karim (1995).
The user also needs to enter a basic sentence rather than a complex sentence. A basic sentence is a sentence which becomes the base or source for forming other sentences. It has only a subject and a predicate. A basic sentence is also known as a kernel sentence or simple sentence (Nik Safiah Karim et al., 2009). For example:

Rumah itu        terbakar
Subjek (Subject) Prediket (Predicate)
(The house is burnt.)
A complex sentence is a sentence formed from a combination of two or more subjects or predicates. For example:

Ahmad seorang pelajar yang pintar tetapi dia berasal dari keluarga yang susah.
(Ahmad is a clever boy but he is from a poor family.)

The sentence consists of two basic sentences:
(i) Ahmad seorang pelajar yang pintar (Ahmad is a clever boy)
(ii) Dia berasal dari keluarga yang susah. (He is from a poor family)
Both sentences (sentence (i) and sentence (ii)) are joined using the conjunction tetapi (but). However, only simple sentences are allowed to be entered into the system.
4.2.1.2 Part-of-Speech (POS) Tagger
POS tagging is the process of assigning a POS or other syntactic class marker to each word in a corpus. The POS tagger involves two main components: Tokenization and Ambiguity Look Up.

Tokenization is the process of breaking a stream of text into meaningful elements called tokens. For example, the sentence "Rumah itu terbakar" (That house burnt) is divided into three tokens: rumah (house), itu (that), and terbakar (burnt).
Ambiguity Look Up involves the use of the Malay lexicon. In the lexicon, there are ninety (90) words which have more than one part-of-speech. These words can lead to ambiguous sentences. For example, the word 'gemar' (favour) has two parts-of-speech: KKTr (transitive verb) and KBantu (auxiliary).
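The Ambiguity Look Up step can be pictured as a dictionary that maps each word to all of its possible tags; a word with more than one tag is a potential source of structural ambiguity. The miniature lexicon below is a hypothetical stand-in for MaLEX, not its actual contents:

```python
# Toy stand-in for MaLEX: each word maps to its list of POS tags
lexicon = {
    "dia": ["KN"],
    "gemar": ["KKTr", "KBantu"],   # ambiguous: transitive verb or auxiliary
    "melancong": ["KKTTr"],
}

def lookup(word):
    """Return every POS tag recorded for a word."""
    return lexicon.get(word.lower(), [])

def is_ambiguous(word):
    """A word with more than one tag can trigger structural ambiguity."""
    return len(lookup(word)) > 1
```

For the sentence "Dia gemar melancong", only gemar is flagged, which is what gives rise to the two alternative parse trees in Figures 4.2 and 4.3.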
[Parse tree: A → S(FN(KN Dia)) + P(FK(KKTr gemar, Objek melancong))]
Figure 4.2: A parse tree of sentence "Dia gemar melancong" (Mohd Juzaiddin Ab Aziz et al., 2006)

[Parse tree: A → S(FN(KN Dia)) + P(FK(KBantu gemar, KKTr menulis, Objek aturcara))]
Figure 4.3: A parse tree of sentence "Dia gemar menulis aturcara" (Mohd Juzaiddin Ab Aziz et al., 2006)
The tagging process can be summarized by the tagging algorithm below. The input to the POS tagger is a string of words (a sentence); the output is a single tag for each word. The pseudocode of the algorithm is shown below:
Start
    Read a string of words
    while (the string does not reach the end)
    {
        if (the word is in the lexicon)
            return the POS
        else
            unrecognized POS
    }
End

Figure 4.4: The Pseudocode for POS Tagger
For example, if the sentence "Rumah itu terbakar" is entered as the input, the output from the POS tagger is:

Before tagging:  Rumah   itu           terbakar   (The house is burnt)
After tagging:   KN      Pent          KKttr
                 (Noun)  (Determiner)  (Intransitive Verb)
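The algorithm in Figure 4.4 translates directly into a few lines of Python; the toy lexicon here is a hypothetical stand-in for MaLEX:

```python
def pos_tag(sentence, lexicon):
    """Tag each word, as in Figure 4.4: tokenize the input, look each
    token up in the lexicon, and mark unknown words as unrecognized."""
    tags = []
    for word in sentence.split():                      # tokenization
        tags.append(lexicon.get(word.lower(), "unrecognized"))
    return tags

toy_lexicon = {"rumah": "KN", "itu": "Pent", "terbakar": "KKttr"}
tags = pos_tag("Rumah itu terbakar", toy_lexicon)
# tags == ['KN', 'Pent', 'KKttr']
```

A word absent from the lexicon is simply tagged "unrecognized", which is the failure case reported later by the error message in Figure 4.12.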
4.2.1.3 Malay Lexicon (MaLEX)
MaLEX is a computerized lexical database of the Malay language, designed as a tool for this research project in natural language parsing. MaLEX contains two tables. The first table, Lexicon, stores words together with their lexical class and meaning; the second table stores rules and their probability values. MaLEX contains 39,190 words based on Kamus Oxford Fajar, 2nd Edition (Hawkins, 1997). The lexical classes provided in the lexicon include KN (noun), KA (adjective), KSN (preposition), KKTr (transitive verb), KKttr (intransitive verb), Bil (count), and Gel (title). The words are arranged in alphabetical order. MaLEX also contains various types of words, such as root words and derivative words like lari (run) and berlari (running).

The current version of the lexicon is MaLEX 1.0, stored in Access format (Microsoft Corporation, 2002). It was built by a group of undergraduate students of Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia.
Some examples of words from MaLEX are shown in Table 4.1:
Table 4.1: A Few Examples of Words, Lexical Classes and their Meanings from MaLEX

Words     Lexical Class  Meaning
itu       Pent           Kata petunjuk kepada satu benda
rumah     KN             Binaan untuk tempat tinggal
terbakar  KKttr          Sedang atau sudah menyala kerana terbakar
In this research project, however, an enhancement has been made to MaLEX: a second table is added to the lexicon, containing the Malay grammar rules and their probability values. The Malay grammar rules are derived from the training data, and each rule is associated with a probability value. Some examples of rules and their probability values are shown in Table 4.2.
Table 4.2: A Few Examples of Rules and Probability Values from MaLEX
Rules              Probability Values
A → S + P          1.0000
S → FN             1.0000
P → FN             0.0800
FN → KN + Pent     0.1242
FN → KN + KN       0.2017
FN → Bil + KN      0.0169
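The second MaLEX table behaves like a rule-to-probability mapping. A sketch of how the parsing engine might consult it (the function name and in-memory dictionary are illustrative, not the prototype's actual schema):

```python
# In-memory stand-in for MaLEX's second table, using the Table 4.2 entries
rule_table = {
    "A -> S + P": 1.0000,
    "S -> FN": 1.0000,
    "P -> FN": 0.0800,
    "FN -> KN + Pent": 0.1242,
    "FN -> KN + KN": 0.2017,
    "FN -> Bil + KN": 0.0169,
}

def rule_probability(rule):
    """Look up a rule's stored probability; unseen rules get 0.0,
    so any parse that uses them is scored as impossible."""
    return rule_table.get(rule, 0.0)
```

Returning 0.0 for an unknown rule mirrors the prototype's behaviour of rejecting a sentence whose structure does not match any embedded rule.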
4.2.1.4 Parsing Engine
The most important part of the Malay Statistical Parser is the parsing engine. The parsing engine parses a tagged sentence and assigns a probability to each rule involved in the parsed sentence. The engine matches the tagged sentence against the rules embedded inside the engine. The process is illustrated in Figure 4.5.
[Figure layout, condensed: the sentence "Bapa saya pemandu teksi" passes through the POS tagger (tokenization and POS labelling via MaLEX, yielding KN KN KN KN) and then the parsing engine, which retrieves the applicable grammar rules and their PCFG values: A → S + P (1.0000), S → FN (1.0000), P → FN (0.0800), FN → KN (0.3965), FN → KN + KN (0.2017), FN → KN + KN + KN (0.0483). Two parse trees result. Parse Tree (1), using FN → KN and FN → KN + KN + KN, has probability value 1.0000 × 1.0000 × 0.0800 × 0.3965 × 0.0483 = 1.532 × 10⁻³. Parse Tree (2), using FN → KN + KN twice, has probability value 1.0000 × 1.0000 × 0.0800 × 0.2017 × 0.2017 = 3.255 × 10⁻³.]
Figure 4.5: The Process in Parsing Engine
Based on Figure 4.5, the parser chooses the second parse tree, which has the highest probability value, for the sentence "Bapa saya pemandu teksi".
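The selection step can be reproduced numerically: each tree's probability is the product of the probabilities of the rules it uses, and the tree with the larger product wins. A check of the two parses of "Bapa saya pemandu teksi", using the values from Tables 3.3 and 3.4:

```python
from math import prod

# Rule probabilities used by each candidate parse of "Bapa saya pemandu teksi"
parse_1 = [1.0000, 1.0000, 0.0800, 0.3965, 0.0483]  # FN -> KN ; FN -> KN+KN+KN
parse_2 = [1.0000, 1.0000, 0.0800, 0.2017, 0.2017]  # FN -> KN+KN ; FN -> KN+KN

p1, p2 = prod(parse_1), prod(parse_2)
best = 2 if p2 > p1 else 1
# p1 ≈ 1.532e-3, p2 ≈ 3.255e-3, so the parser selects the second tree
```

The same product-of-rules scoring is applied to every ambiguous test sentence in Chapter 5.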
Another example of parsing another sentence is shown in Figure 4.6. The sentence is
“Pemandu itu sangat letih.”
[Figure layout, condensed: the sentence "Pemandu itu sangat letih" is tokenized and tagged via MaLEX (pemandu/KN, itu/Pent, sangat/KPeng, letih/KA), and the parsing engine applies the rules A → S + P (1.0000), S → FN (1.0000), P → FA (0.1410), FN → KN + Pent (0.1242), FA → KPeng + KA (0.2017). One parse tree results, with PCFG value 1.0000 × 1.0000 × 0.1410 × 0.1242 × 0.2017 = 3.532 × 10⁻³.]
Figure 4.6: Another Process in Parsing Engine
As the sentence "Pemandu itu sangat letih" is unambiguous, only one parse tree is presented, with its probability value.
4.2.1.5 Output Component
The output of this statistical parser is a parse tree together with the probability value of the parsed sentence. For an ambiguous sentence, the output component shows several parse trees and their probability values. For example, the output of the previous example is shown in Figure 4.7.
[Parse Tree (1): S → FN(KN Bapa), P → FN(KN saya, KN pemandu, KN teksi); probability value = 1.532 × 10⁻³.
Parse Tree (2): S → FN(KN Bapa, KN saya), P → FN(KN pemandu, KN teksi); probability value = 3.255 × 10⁻³.]
Figure 4.7: Example of Parse Trees and Their Probability Values
4.2.2 User Interface Design
There is one main interface for the Malay Statistical Parser, shown in Figure 4.8.
Figure 4.8: Main Interface for Malay Statistical Parser
Referring to Figure 4.8, the left-hand side is the input component. The user enters "Bapa saya pemandu teksi" as the input to the parser. The right-hand side shows the output of the tokenization process; this happens only after the user clicks the 'Proses' button.

When the sentence is successfully parsed, the message "Ayat ini sah" (the sentence is syntactically correct) is shown, as in Figure 4.9.
Figure 4.9: Output for the Parser
Then, the parser shows the parse tree with the highest probability value. For example, the output for the sentence is shown in Figure 4.10.
Figure 4.10: Output Component for sentence “bapa saya pemandu teksi”
The probability is calculated and the result is shown in a message box, as in Figure 4.11.
Figure 4.11: One of the Probability Values for the Parsed Sentence
However, if the system is unable to parse the sentence, an error message is presented. There are two reasons a sentence may fail to parse. The first reason is that some words in the sentence cannot be found in the Malay Lexicon (MaLEX); this is flagged at the right-hand side of the main interface window, as demonstrated in Figure 4.12.
Figure 4.12: The main interface with the error message "Tiada dalam database"
The second reason is that the parsed sentence does not match the rules embedded in the parser. The error message is shown in Figure 4.13.
Figure 4.13: Error Message for Unsuccessful Parsed Sentence
4.3 Summary of the Chapter
This chapter discussed the design and development process of the Malay Statistical Parser prototype. It started with the requirements of the parser, comprising functional and non-functional requirements, followed by the system architecture and user interface design. The evaluation of the parser will be discussed in Chapter 5.
CHAPTER 5: EXPERIMENTS AND RESULTS
This chapter explains the test dataset used in the experiments, which contains four (4) different patterns of Malay basic sentences. It then presents the results of the prototype using three measurements, namely precision, recall, and F-score.
5.1 Test Dataset
The test dataset is a set of data used to test the parser. For the Malay Statistical Parser, the test data include four patterns of basic sentences containing syntactic ambiguity. All the sentences were acquired from two experts in Malay grammar, Tuan Hj Sufian b Hj Afandi and Tuan Hj Nawi b Ismail. Both are munsyi dewan appointed by Dewan Bahasa dan Pustaka (Dewan Bahasa dan Pustaka Malaysia, 2010). All of the sentences are correct according to Malay grammar. There are one hundred (100) test sentences in total, each of which was tested with the prototype. The length of each test sentence is not more than ten words.
Below is a list of the sentences that are used in the experiments.
Sentences for FN + FN (Noun Phrase + Noun Phrase)
T2: Baju kakak batik Terengganu. (My sister’s attire is Terengganu batik)
T5: Pegawai itu pengurus syarikat (The officer is a company manager)
T15: Coklat koko makanan kegemaran adik (Chocolate is my sister's favourite)
T23: Kakak saya guru sekolah (My sister is a teacher)
T24: Sepupu saya jurutera binaan (My nephew is a civil engineer)
T30: Adik saya murid sekolah rendah. (My brother is a pupil)
T33: Makcik saya kerani sekolah (My auntie is a school clerk)
T34: Kawan abang tentera laut (My brother’s friend is a navy)
T39: Gadis itu model sambilan. (The girl is a part-time model)
T48: Pulau peranginan milik kerajaan negeri (The island resort is owned by state
government)
T49: Kakak jurusolek butik pengantin (My sister is a bridal boutique beautician)
T50: Beliau atlet negara Malaysia. (He is a Malaysian athlete)
T52: rumah saya rumah kayu (My house is a wooden house).
T55: Pelajar itu pelajar cemerlang (The student is an excellent student).
T65: kereta idaman saya kereta mewah (My dream car is a luxury car)
T73: sekolah kami sekolah harapan (our schools are school expectations)
T74: guru saya guru besar (Our teacher is a headmaster)
T80: pakcik saya rakan kongsi ayah (My uncle is my father’s share partner)
T83: datuk saya pesara polis (My grandfather was a policeman)
T84: rakan saya pegawai bomba (My friend is a fireman officer)
T89: lelaki itu pengacara televisyen (The man is a television host)
T96: beliau pemimpin besar negara (he is a national leader)
T97: dia pelajar kolej swasta (She is a private college student)
T98: rakan kami pengusaha kedai perabot (Our friend is a furniture businessman).
T99: ibu pengusaha butik pengantin (My mother is a bridal boutique businesswoman)
T100: beliau pemain bola sepak (He is a footballer)
Sentences for FN + FK (Noun Phrase + Verb Phrase)
T1: Kami adang air itu. (We deter the water)
T4: Saya bagi fakir itu wang. (I give money to the poor)
T7: Beg adik berwarna coklat (The colour of my younger brother’s bag is brown)
T10: Saya eskot pengetua ke pentas (I escort the principal to the stage)
T11: Kami garuk tanah. (We scratch the soil)
T12: Saya gali sendiri telaga itu. (I dig the well by myself)
T20: Baju kakak berwarna putih (The colour of my sister’s cloth is white)
T21: Kapal milik keluarga saya labuh di pelabuhan klang (My family’s ship wagons at Port
Klang)
T22: Aku lambung duit syiling (I lob the coins)
T25: saya kopek buah kelapa itu. (I peel the coconut fruit)
T29: Buah gajus rasa kelat. (The cashew fruit taste bitter)
T32: Aku cas bateri itu. (I charge the battery)
T37: Ibu saya bagi kucing itu makanan (My mother gives the food to the cat)
T38: guru kami bagi markah sangat rendah (Our teacher gives us too low marks)
T31: saya pagar reban itu (I enclose the hens’ shed)
T46: Kami bentuk adunan biskut itu (We shape the dough of biscuit)
T47: Pelajar malas itu benak dalam semua subjek (The lazy student slows in learning all
the subjects)
T51: Saya kepit suratkhabar itu. (I clamp the newspaper)
T54: orang kaya itu bagi sedekah (the rich man gives donation).
T57: Gigi adik berwarna putih (My brother’s teeth is white)
T60: Saya daftar subjek baru semester hadapan (I register the new subject for coming
semester)
T61: Kami garuk sungai yang cetek (We dig the shallow river)
T62: Saya gali lubang sampai dalam (I dig the hole deeply)
T70: kasut ayah berwarna coklat (My father’s shoes color is brown)
T72: kami lambung pengantin lelaki itu (We lobbed the bridegroom.)
T75: aku kopek buah limau itu. (I peel the orange fruit)
T79: buah strawberi rasa masam (The strawberry fruits taste sour)
T81: kami pagar kandang itu. (We gate the cage)
T82: kami cas generator ini (We charge this generator)
T87: abang saya bagi budak itu duit (My brother gives the boy money)
T88: pengadil bagi mata sangat tinggi (the referee gives the high points)
Sentences for FN + FA (Noun Phrase + Adjective Phrase)
T3: Seterika antik sangat mahal. (The antique iron is very expensive)
T6: Atap rumah saya bocor. (My house’s roof is leaking)
T8: Pipi mukanya bengkak (Her face is swollen)
T9: Kata-kata wanita itu bisa. (Her words are really hurt)
T13: Nasi godak kenduri sangat sedap. (The special feast rice is very delicious)
T14: Puteri ketujuh paling gombang (The seventh princess is very pretty)
T16: Suara penyanyi wanita agak garuk (The singer’s voice is quite husky)
T17: Sakit hati saya makin buku (My heart feels so hurt)
T18: Buah beri itu manis. (The berry fruit is sweet)
T19: Ucapan guru kaunseling itu cacat. (The counseling teacher’s speech is flawed)
T26: Jawapan murid itu konkrit (The student’s answer is concrete)
T27: bunyi derai kaca amat ngilu (The sound of broken glass is very unpleasant)
T28: pelajar cemerlang sungguh dinamik (The excellent student is a very dynamic person)
T35: kawan kakak sangat cantik (My sister’s friend is very beautiful)
T36: kulit bayi sangat halus (The baby skin is so gentle)
T40: kereta kepunyaan ayah baru (My father’s car is new)
T41: Pembetulan tesis saya minor (My thesis’s corrections are minor)
T44: baju adik baru (my sister’s cloth is new)
T45: pembedahan ibu minor (my mother’s operation is minor)
T53: baju ibu sangat murah (My mom’s attire is so cheap).
T56: belon kepunyaan adik bocor (My younger sister balloons’ leak)
T58: Perut adik buntal (My brother’s stomach distended)
T59: sengat tebuan itu bisa (the sting hornet is hurt)
T63: sayur hijau sangat segar (the green vegetables is very fresh).
T64: kereta kepunyaan rakan amat besar (my friend’s car is very big).
T66: badan pesakit diabetis makin kurus (The diabetes patients become slimmer)
T67: kereta buatan tempatan makin mahal (The local cars become more expensive)
T68: badan pelakon itu langsing (The actress body is slim)
T69: bangunan milik kerajaan itu runtuh (the government’s building is collapsed.)
T71: kain sekolah pelajar itu labuh (the student’s skirt is trailing)
T76: binaan bangunan itu konkrit (The structure of the building is concrete)
T77: jiran rumah kami amat baik (our neighbor is so kind)
T78: adik saya sangat nakal (my brother is very naughty)
T85: bunga mawar sangat wangi (the rose is so sweet-smelling).
T86: anak kakak sangat comel (my sister’s daughter is so cute)
T90: kereta kepunyaan ayah baru (My father has a new car)
T91: Pembetulan tesis saya minor (My thesis correction is minor)
T94: perkakasan sekolah adik baru (My brother’s stationary is new)
T95: murid sekolah itu rajin (the school pupil is hardworking)
Sentences for FN + FS (Noun Phrase + Prepositional Phrase)
T42: Bapa saya ke pejabat (My father goes to the office)
T43: Asal benang daripada kapas (Origin thread is from the cotton)
T92: rumah kami di bandar (My house is in the city)
T93: penduduk kampung ke sawah padi (villagers go to the paddy field)
Each sentence goes through three steps:
(1) Find the possible parses
(2) Assign probabilities to each rule involved in the parse
(3) Determine the most probable parse (the parse tree with the highest probability
value)
Experiments were run on every sentence, but only a few have been selected as examples
below; the rest can be found in Appendix C. Each example follows the three
steps.
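As a rough sketch, the three steps can be expressed in a few lines of Python. This is an illustration only, not the actual prototype: parses are represented as flat lists of the rules they use, and the probability table contains just the rules needed for Example 1 below.

```python
from functools import reduce

# Rule probabilities taken from the worked examples in this chapter;
# the full table lives in the MaLEX Rule Table.
RULE_PROB = {
    "A -> S + P": 1.0000,
    "S -> FN": 1.0000,
    "P -> FN": 0.0800,
    "FN -> KN": 0.3965,
    "FN -> KN + KN": 0.2017,
    "FN -> KN + KN + KN": 0.0483,
}

def tree_probability(rules):
    """Step (2): multiply the probabilities of every rule used in a parse."""
    return reduce(lambda p, r: p * RULE_PROB[r], rules, 1.0)

def most_probable_parse(parses):
    """Step (3): choose the parse tree whose rule product is highest."""
    return max(parses, key=tree_probability)

# Step (1): the two candidate parses for "Baju kakak batik Terengganu"
# (Example 1 below), represented simply as lists of the rules they use.
parse1 = ["A -> S + P", "S -> FN", "P -> FN",
          "FN -> KN + KN", "FN -> KN + KN"]
parse2 = ["A -> S + P", "S -> FN", "P -> FN",
          "FN -> KN", "FN -> KN + KN + KN"]

best = most_probable_parse([parse1, parse2])  # parse1: 3.255e-3 > 1.532e-3
```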
Example 1: T2: Baju kakak batik Terengganu. (My sister’s attire is Terengganu batik)
STEP (1): Find the possible parses
Parse tree 1:
Parse tree 2:
STEP (2): Assign probabilities to each rule involved in the parse

Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P → FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)

Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.2017
= 3.255 x 10⁻³

Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P → FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN (0.0483)

Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
STEP (3): Determine the most probable parse (the parse tree with the highest
probability value)
The parser chooses the first parse tree (Parse tree 1) as the structure for the sentence
"baju kakak batik Terengganu" because it has the higher probability value
(3.255 x 10⁻³ > 1.532 x 10⁻³).
Example 2: T1: Kami adang air itu. (We block the water)
STEP (1): Find the possible parses
Parse tree 1:
Parse tree 2:
STEP (2): Assign probabilities to each rule involved in the parse

Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P → FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)

Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10⁻²

Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P → FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)

Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.1242
= 2.004 x 10⁻³
STEP (3): Determine the most probable parse (the parse tree with the highest
probability value)
The parser chooses the first parse tree (Parse tree 1) as the structure for the sentence
"kami adang air itu" because it has the higher probability value
(1.752 x 10⁻² > 2.004 x 10⁻³).
Example 3: T13: Nasi godak kenduri sangat sedap. (The special feast rice is very
delicious)
STEP (1): Find the possible parses
Parse tree 1:
Parse tree 2:
STEP (2): Assign probabilities to each rule involved in the parse

Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P → FA (0.1410)
FN → KN + KN + KN (0.0483)
FA → KPeng + KA (0.2766)

Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.2766
= 1.884 x 10⁻³

Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P → FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KPeng + KA (0.0008)

Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0008
= 1.291 x 10⁻⁵
STEP (3): Determine the most probable parse (the parse tree with the highest
probability value)
The parser chooses the first parse tree (Parse tree 1) as the structure for the sentence
"nasi godak kenduri sangat sedap" because it has the higher probability value
(1.884 x 10⁻³ > 1.291 x 10⁻⁵).
Example 4: T92: Rumah kami di bandar (Our house is in the city)
STEP (1): Find the possible parses
Parse tree 1:
Parse tree 2:
STEP (2): Assign probabilities to each rule involved in the parse

Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P → FS (0.0400)
FN → KN + KN (0.2017)
FS → KSN + FN (0.9750)
FN → KN (0.3965)

Probability value
= 1.0000 x 1.0000 x 0.0400 x 0.2017 x 0.9750 x 0.3965
= 3.119 x 10⁻³

Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P → FN (0.0800)
FN → KN (0.3965)
FN → KN + KSN + KN (0.0146)

Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0146
= 4.631 x 10⁻⁴
STEP (3): Determine the most probable parse (the parse tree with the highest
probability value)
The parser chooses the first parse tree (Parse tree 1) as the structure for the sentence
"rumah kami di bandar" because it has the higher probability value
(3.119 x 10⁻³ > 4.631 x 10⁻⁴).
From these examples, we can conclude that the Malay Statistical Parser is quite successful
in reducing structural ambiguity in Malay grammar rules. However, for seven (7) of the
test sentences the parser selected the wrong parse tree. These sentences are listed
below:
(1) T49: Kakak jurusolek butik pengantin
(2) T50: Beliau atlet negara Malaysia
(3) T94: perkakasan sekolah adik baru
(4) T96: beliau pemimpin besar negara
(5) T97: dia pelajar kolej swasta
(6) T98: rakan kami pengusaha kedai perabot
(7) T99: ibu pengusaha butik pengantin
One of these sentences is worked through in Example 5.
Example 5: T49: Kakak jurusolek butik pengantin (My sister is a bridal boutique
beautician)
STEP (1): Find the possible parses
Parse tree 1:
Parse tree 2:
STEP (2): Assign probabilities to each rule involved in the parse

Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P → FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)

Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.2017
= 3.255 x 10⁻³

Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P → FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN (0.0483)

Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
STEP (3): Determine the most probable parse (the parse tree with the highest
probability value)
The parser chooses the first parse tree (Parse tree 1) as the structure for the sentence
"kakak jurusolek butik pengantin" because it has the higher probability value
(3.255 x 10⁻³ > 1.532 x 10⁻³).
However, the parse tree chosen by the parser does not represent the best possible
parse tree. According to the Munsyi Dewan, the best possible parse tree is Parse
tree 2: the subject chosen by the parser is "kakak jurusolek", whereas the subject
should be "kakak" only.
5.2 Results
To evaluate the performance and accuracy of the parser, three different metrics
were used: recall, precision, and f-score (Carroll et al., 1998).
Recall is defined as the ratio of the number of grammatical relations (GRs) returned by
the parser that match GRs in the corresponding annotated sentence, divided by the total
number of GRs in the annotated sentences. GR-based evaluation means the
parser produces output that abstracts away the details of the actual sentence but retains
the structure important for semantics (Carroll & Charniak, 1992).
In this work, recall is computed by dividing the number of correctly parsed sentences
by the number of intended correct parses:

Recall = correct parsed sentences / intended correct parsed sentences
The second metric, precision, is defined as the ratio of the number of GRs returned by the
parser that match, divided by the total number of GRs returned by the parser for that
sentence. Precision is used to determine the accuracy of the parser; here it is calculated
by dividing the number of correctly parsed sentences by the number of all parsed sentences:

Precision = correct parsed sentences / all parsed sentences
The last metric is the f-score, computed here as the arithmetic mean of precision and
recall:

f-score = (precision + recall) / 2
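The three formulas can be wrapped in one small helper. Note that the f-score as defined here is the arithmetic mean of precision and recall, not the harmonic mean used in some other parser evaluations; the function and argument names below are illustrative only.

```python
def evaluate(correct, intended_correct, all_parsed):
    """Compute recall, precision and f-score as defined in Section 5.2.

    recall    = correctly parsed sentences / intended correct parses
    precision = correctly parsed sentences / all parsed sentences
    f-score   = arithmetic mean of precision and recall
    """
    recall = correct / intended_correct
    precision = correct / all_parsed
    f_score = (precision + recall) / 2
    return recall, precision, f_score

# FN + FN pattern: 19 correct parses, 19 intended correct, 26 test sentences
recall, precision, f_score = evaluate(19, 19, 26)
# recall = 1.00 (100%), precision ~ 0.73 (73%), f-score ~ 0.87 (87%)
```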
The overall number of test sentences is presented in Table 5.1:
Table 5.1: The number of test sentences

Pattern                  Number of Test Sentences
FN + FN                  26
FN + FK                  31
FN + FA                  39
FN + FS                  4
TOTAL TEST SENTENCES     100
One hundred (100) test sentences are used to evaluate the prototype. The
sentences were obtained from the Munsyi Dewan, an expert in the Malay language
(Dewan Bahasa dan Pustaka, 2010). The largest group of test sentences, 39, belongs to
the FN + FA pattern.
The results of the parser evaluation are categorized according to the type of sentence
pattern. The steps of the calculation are shown in the following examples.
For pattern FN + FN:

Recall = correct parsed sentences / intended correct parsed sentences
       = 19/19 = 100%

Precision = correct parsed sentences / all parsed sentences
          = 19/26 = 73%

F-score = (precision + recall) / 2
        = (100% + 73%) / 2 = 87%
For pattern FN + FK:

Recall = 31/31 = 100%
Precision = 31/31 = 100%
F-score = (100% + 100%) / 2 = 100%

For pattern FN + FA:

Recall = 39/39 = 100%
Precision = 39/39 = 100%
F-score = (100% + 100%) / 2 = 100%

For pattern FN + FS:

Recall = 4/4 = 100%
Precision = 4/4 = 100%
F-score = (100% + 100%) / 2 = 100%
All the results are summarized in Table 5.2.
Table 5.2: Result of the experiments

Pattern     Recall   Precision   F-score
FN + FN     100%     73%         87%
FN + FK     100%     100%        100%
FN + FA     100%     100%        100%
FN + FS     100%     100%        100%
Average     100%     93.25%      96.75%
Table 5.2 can also be represented as a bar chart, shown in Figure 5.1.
Figure 5.1: Recall, Precision and F-score for Statistical Parser for Malay Language
Figure 5.1 presents the experimental results for the statistical parser for the Malay
language, evaluated using three metrics: recall, precision, and f-score. Every basic
sentence pattern scores 100% on nearly every metric; the exception is the precision of the
FN + FN pattern, which reached only 73%. In other words, most of the test sentences were
parsed correctly. All the test sentences were verified by the Munsyi Dewan, and the
averages show that the parser performs well.
5.3 Summary of the Chapter
In this chapter, the experiments and results have been discussed. The parser's
efficiency, limitations, and possible future enhancements are described in the final
chapter.
CHAPTER 6:
CONCLUSION
This chapter sums up this dissertation and provides some conclusions from the results of
this research. It also highlights the strengths of Malay Statistical Parser, the prototype
developed for this study. Some limitations that exist in this prototype are also listed and can
be improved through suggestions given in the Future Enhancements section.
6.1 Fulfillment of Research Objectives
The objectives of this research were defined in Section 1.3 of Chapter One. Here, we will
review the objectives and see if they were fulfilled as expected.
(1) To calculate the probability of Malay grammar rules
This objective was achieved through the Probabilistic Malay Grammar (Chapter
Three), in which probability values are computed for Malay grammar rules.
One thousand basic sentences were collected from various sources, such as primary
school textbooks and Malay grammar books. The probability values were then
calculated based on how many times each rule occurs in the training data. These
processes are demonstrated in detail in Appendix A and Appendix B. One hundred
and forty-seven (147) rules were derived from the training data.
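The calculation just described is, in effect, maximum-likelihood estimation: each rule's probability is its count divided by the total count of rules sharing the same left-hand side. The sketch below illustrates this; the counts are hypothetical, chosen only so that they reproduce the P-rule probabilities used in Chapter Five (assuming each of the 1,000 training sentences contributes exactly one P rule).

```python
from collections import defaultdict

def estimate_rule_probabilities(rule_counts):
    """P(LHS -> RHS) = count(LHS -> RHS) / total count of rules with that LHS."""
    totals = defaultdict(int)
    for (lhs, rhs), n in rule_counts.items():
        totals[lhs] += n
    return {(lhs, rhs): n / totals[lhs] for (lhs, rhs), n in rule_counts.items()}

# Hypothetical counts over the 1,000 training sentences (illustration only).
counts = {("P", "FK"): 739, ("P", "FN"): 80, ("P", "FA"): 141, ("P", "FS"): 40}
probs = estimate_rule_probabilities(counts)
# probs[("P", "FK")] == 0.739, and the four P-rule probabilities sum to 1.0
```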
(2) To develop a prototype of a statistical parser for the Malay language
This objective was achieved in Chapter Four, which describes the process of
designing and implementing the Malay Statistical Parser, the prototype for this
dissertation.
6.2 Malay Statistical Parser
The Malay Statistical Parser is an initial step toward the development of a statistical
parser for the Malay language. Since no probabilities for Malay grammar rules had
previously been computed, a thousand sentences were gathered to calculate the values.
One hundred and forty-seven (147) Malay grammar rules are now associated with
probability values.

There is also no established corpus for the Malay language. In this research
project, MaLEX, a Malay lexicon, is used. Initially, MaLEX contained only one table,
the Lexical Table, which has two attributes: words and lexical classes. MaLEX has been
enhanced with a new table, the Rule Table, which stores the Malay grammar rules and
their probability values.
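The two MaLEX tables can be pictured with a small in-memory sketch. The schema below is hypothetical; the column names and sample rows are invented for illustration and are not the actual MaLEX implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Original MaLEX table: a word and its lexical class
    CREATE TABLE Lexical (word TEXT, lexical_class TEXT);
    -- Table added in this research: a grammar rule and its probability
    CREATE TABLE Rule (rule TEXT, probability REAL);
""")
conn.executemany("INSERT INTO Lexical VALUES (?, ?)",
                 [("baju", "KN"), ("kakak", "KN"), ("sangat", "KPeng")])
conn.executemany("INSERT INTO Rule VALUES (?, ?)",
                 [("FN -> KN + KN", 0.2017), ("P -> FN", 0.0800)])

# The parser would look up a rule's probability when scoring a parse tree:
p, = conn.execute("SELECT probability FROM Rule WHERE rule = ?",
                  ("FN -> KN + KN",)).fetchone()
# p == 0.2017
```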
The parser can reduce structural ambiguity in the Malay language. This reduction of
ambiguity can be seen in all patterns of sentences, as shown in Appendix C. All
the test sentences were evaluated by the Munsyi Dewan.

The evaluation of the parser is based on three measurements: precision,
recall, and f-score. The results are good: the parser achieved 100% recall, 93.25%
precision, and 96.75% f-score.
6.3 Limitations
As the Malay Statistical Parser is still at the prototype level, some features or
aspects are not yet completely developed; these are the limitations of the parser.

(1) Only a limited number of words are used in the test dataset
The training data listed in Appendix A comprises one thousand (1000) sentences,
each no more than ten (10) words long. Thus, the parser should also be tested on
sentences that are more than ten words in length.

(2) The test dataset contains only basic, declarative sentences
The grammar used in the Malay Statistical Parser is a Context-Free Grammar
(CFG) whose rules form basic sentences only: each sentence contains exactly one
subject and one predicate, and is declarative.
6.4 Future Enhancement
Further enhancement is necessary, as the current parser involves only one hundred and
forty-seven (147) rules with associated probability values. It is recommended that further
research be undertaken in the following areas:

(1) After analysing the results of the experiments, there is a need to develop a more
detailed online corpus of Malay words. As no standard online Malay corpus is
available (Ahmad Izuddin et al., 2007), we suggest that a large online tagged
dataset be developed, so that it can provide better results to the
parser.
(2) The POS tagger should tag each word accurately according to its usage. Probabilistic
POS tagging should be implemented to solve this problem. For example, the word
'semak' should have two probability values because it has two lexical classes,
KN and KKTr.
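In its simplest form, the suggested probabilistic tagging would attach a probability to each lexical class of an ambiguous word and pick the most likely one. The sketch below is hypothetical; the probability values for 'semak' are invented, not estimated from any corpus.

```python
# Hypothetical lexicon entries: "semak" can be a noun (KN) or a transitive
# verb (KKTr); the probability values are invented for illustration.
LEXICON = {
    "semak": {"KN": 0.6, "KKTr": 0.4},
    "kami": {"KN": 1.0},
}

def most_likely_tag(word):
    """Pick the lexical class with the highest probability for a word."""
    tags = LEXICON[word]
    return max(tags, key=tags.get)

tag = most_likely_tag("semak")  # "KN" under these invented probabilities
```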
(3) The set of grammar rules should be expanded. In this study, only one hundred and
forty-seven (147) rules have probability values. More rules could be derived if
the size of the training data were extended.
6.5 Summary of the Chapter
This chapter marks the end of this dissertation. It has summarized the essential
elements that form the basis of this research, and it shows that the objectives
of this dissertation have been achieved. The research has also produced a corpus of Malay
sentences with probabilistic scores, namely the PCFG of Malay Sentences. Like other
good parsers developed for the English language, such as the Stanford Parser (Klein &
Manning, 2003), our proposed PCFG for the Malay language can also be applied in many
applications. For instance, the parser can be used as an engine for a Malay grammar
checker.
REFERENCES
Abdullah Hassan, 1993, Tatabahasa Pedagogi Bahasa Melayu, Utusan Publications &
Distributors Sdn Bhd.
Abdullah Hassan & Ainon Mohd, 1994a, Bahasa Melayu untuk Maktab Perguruan,
Penerbit Fajar Bakti Sdn Bhd Kuala Lumpur.
Abdullah Hassan & Ainon Mohd, 1994b, Tatabahasa Dinamika Berdasarkan Tatabahasa
Dewan, Penerbit Fajar Bakti Sdn Bhd Kuala Lumpur.
Abu Naim Kassan, 2001, Wawasan PMR Tatabahasa, Pustaka Delta Pelajaran Sdn Bhd,
Petaling Jaya.
Ahmad I. Z. Abidin, Yong, S. P., Rozana Kasbon & Hazreen Azman, 2007, Utilizing Top-Down Parsing Technique In The Development of a Malay Language Sentence
Parser, Proceedings of the 2nd International Conference of Informatics, Universiti
Malaya, Kuala Lumpur.
Allen, J, 1995, Natural Language Understanding, The Benjamin/Cumming Publishing
Company, Inc, Redwood City, California.
Azhar Simin, 1988, Discourse-Syntax of “YANG” in Malay (Bahasa Malaysia), Dewan
Bahasa dan Pustaka, Kuala Lumpur.
Bach, K., 1998, Ambiguity, Routledge Encyclopedia of Philosophy, Routledge, London
Booth, T. L., 1969, Probabilistic Representation of Formal Languages, Tenth Annual
IEEE Symposium on Switching and Automata Theory.
Carroll, J., Briscoe, T., & Sanfilippo, A., 1998, Parser Evaluation: A Survey and a New
Proposal, Proceedings of the First International Conference on Language
Resources and Evaluation, pp. 447-454.
Carroll, G., & Charniak, E., 1992, Two Experiments on Learning Probabilistic Dependency
Grammar from Corpora, Workshop Notes, Statistically-Based NLP Techniques,
pp. 1-13.
Charniak, E., 1993, Statistical Language Learning, MIT Press, Cambridge Massachusetts,
London, UK.
Charniak, E., 1997, Statistical Parsing with a Context-free Grammar and Word Statistics,
Proceedings of the National Conference on Artificial Intelligence, John Wiley
& Son Ltd, USA.
Charniak, E., 2000, A Maximum-Entropy-Inspired Parser, ACM International Conference
Proceeding Series, The MIT Press.
Chomsky, N., 1957, Syntactic Structures, The Hague, The Netherlands: Mouton.
Chomsky, N., 1966, Syntactic Structures, The Sixth Printing, Mouton & Co.
Chomsky, N., 1971, Problems of Knowledge Freedom – The Russell Lectures, Pantheon
Books, A division of Random House, New York.
Chomsky, N., 1975, Reflections of Language, Pantheon, New York.
Chomsky, N., 1980, Rules and Representations, Columbia University Press, New York.
Collins, M. J., 2003, Head-Driven Statistical Models for Natural Language Parsing,
Computational Linguistics, Vol. 29, No. 4, pp. 589-637, MIT Press.
Dewan Bahasa dan Pustaka Malaysia, 2010, Risalah Munsyi Dewan, online, retrieved 23
December 2010 from
http://appw05.dbp.gov.my/dokumen/risalah_munsyi_dewan.pdf
Fellbaum, C., 1999, WordNet: An Electronic Lexical Database, MIT Press.
Francis, W. N., & Kucera, H., 1979, Brown Corpus Manual,
http://www.hit.uib.no/icame/brown/bcm.html
Grune, D., & Jacobs, C., 1990, Parsing Techniques A Practical Guide, Ellis Horwood
Limited, West Sussex, England.
Hawkins, J. M., 2001, Kamus Dwibahasa Oxford Fajar, Oxford Fajar SDN. BHD.,
Selangor.
Hawryszkiewycz, I. T., 1998, Introduction to System Analysis and Design, Fourth Edition,
Australia: Prentice Hall Australia Pty. Ltd.
Hoenisch, S., 2004, Identifying and Resolving Ambiguity, online, retrieved 21 January
2009, from http://www.criticism.com/linguistics/types-of-ambiguity.php
Jurafsky, D., & Martin, J. H., 2000, Speech and Language Processing: An
Introduction to Natural Language Processing, Computational Linguistics, and
Speech Recognition, Prentice-Hall, Upper Saddle River, New Jersey.
Klein, D., & Manning, C., 2003, Accurate Unlexicalized Parsing, Proceedings of the
Association for Computational Linguistics (ACL).
Lakeland, C., & Knott, A., 2004, Implementing Lexicalised Parser, University of Otago,
New Zealand.
Magerman, D. M., 1995, Statistical Decision-tree Models for Parsing, Proceedings of the
33rd Annual Meeting of the Association for Computational Linguistics, pp.
276-283.
Manning, C. D., & Schutze, H., 1999, Foundations of Statistical Natural Language
Processing, The MIT Press, Cambridge, Massachusetts.
Marcus, M., Santorini, B., & Marcinkiewicz, M. A., 1993, Building a Large Annotated
Corpus of English: the Penn Treebank, Computational Linguistics, Volume 19,
Issue 2, The MIT Press, pp. 313-330.
Meyer, P. G., et al., 2002, Synchronic English Linguistics An Introduction, Gunter Narr
Verlag Tubingen, Germany.
Microsoft Corporation, 2002, Copyright, online, retrieved 1 December 2008, from
http://www.microsoft.com/about/legal/en/us/Copyright/Default.aspx
Mohanty, S., & Balabantaray R. C., 2003, Intelligent Parsing In Natural Language
Processing, 8th International Workshop of Parsing Technologies.
Mohd Juzaiddin Ab Aziz, et al., 2006, Pola Grammar Technique For Grammatical
Relation Extraction In Malay Language, Malaysian Journal of Computer
Science, Vol. 19, No. 1, pp. 59-72, University of Malaya.
Nik Safiah Karim, 1975, The Major Syntactic Structures of Bahasa Malaysia and their
Implication of the standardization of the Language, PhD Dissertation, Ohio
University.
Nik Safiah Karim, 1995, Malay Grammar for Academics and Professionals, Dewan Bahasa
dan Pustaka, Kuala Lumpur.
Nik Safiah Karim, Farid M. Onn, Hashim Haji Musa & Abdul Hamid Mahmood, 2009,
Tatabahasa Dewan Edisi Ketiga, Dewan Bahasa dan Pustaka, Kuala
Lumpur.
Pfleeger, S. L., 1998, Software Engineering Theory and Practice, International Edition,
USA: Prentice Hall.
Yeoh, C.K., 1979, Interaction of Rules in Bahasa Malaysia, PhD Dissertation, University of
Illinois at Urbana Champaign.
Yuan, Y., 1997, Statistics Based Approaches Towards Chinese Language Processing, PhD
Dissertation, National University of Singapore.
Zainal Arifin Yusof, Kamarudin Jeon & Mohd Nasar Sukor, 2008, Buku Teks Bahasa
Melayu Sekolah Kebangsaan Tahun 1 Kurikulum Bersepadu Sekolah
Rendah (KBSR), Dewan Bahasa dan Pustaka Kuala Lumpur.
Zainal Arifin Yusof, Kamarudin Jeon & Mohd Nasar Sukor, 2005, Buku Latihan dan
Aktiviti Bahasa Melayu Sekolah Kebangsaan Tahun 1 KBSR, Dewan
Bahasa dan Pustaka Kuala Lumpur.
APPENDIX A:
EXAMPLES OF TRAINING DATA
Some examples of the training data, tagged manually by the Munsyi Dewan. Each entry
gives the index (sentence number), the sentence with its tagging, and the grammar rules
that the text matches.
(1)
Air banjir itu kami adang
Kami adang air banjir itu.
KN KKTr KN KN Pent
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KN + Pent
(2)
Barang permainan murah buatan negeri
China tentu ada aibnya.
Barang permainan murah buatan negeri China
tentu ada aibnya.
KN
KN
KA KN
KN
KN
KBantu KKTr KN
A→S+P
S → FN
P → FK
FN → KN + KN + KA + KN + KN
FK → KBantu + KKTr + FN
FN → KN
(3)
Ali berasa sungguh aib atas pelakuannya .
Ali berasa sungguh aib atas pelakuannya .
KN KKTr KPeng KA KSN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KPeng + KA + KSN + KN
(4)
Bau minyak wangi alkoholik tahan lama.
Bau minyak wangi alkoholik tahan lama.
KN KN KN
KN
KKTTr KA
A→S+P
S → FN
P → FK
FN → KN + KN + KN + KN
FK → KKTTr + KA
(5)
Pemain-pemain bola bawah 18 tahun itu
masih amatur.
Pemain-pemain bola bawah 18 tahun itu masih
amatur.
KN
KN KN Bil KN Pent
KBantu KA
(6)
Amatur sandiwara menjadi kegemarannya pada masa cuti semester.
Amatur sandiwara menjadi kegemarannya pada
masa cuti semester.
KN
KN
KKTr KN
KSN KN
KN KN
(7)
Kami adang air yang mengalir itu.
Kami adang air yang mengalir itu.
KN KKTr KN KBantu KKTr Pent
(8)
Kebanyakan hotel di tepi pantai membina
limbung berduri.
Kebanyakan hotel di tepi pantai membina limbung berduri.
KN
KN KSN KN KN KKTr KN
KKTTr
(9)
Sikap Hasim yang limbung menjelikkan
orang.
Sikap Hasim yang limbung menjelikkan orang.
KN KN
KBantu KA KKTr
KN
(10)
Orang membenci sikap Hasim yang limbung.
Orang membenci sikap Hasim yang limbung.
KN
KKTr
KN KN KBantu KA
A→S+P
S → FN
P → FK
FN → KN +
FK → KKTTr + KA
A→S+P
S → FN
P → FK
FN → KN + KN
FK → KKTr + FN
FN → KN + KSN + KN + KN + KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KBantu + KKTr + Pent
A→S+P
S → FN
P → FK
FN → KN + KN + KSN + KN + KN
FK → KKTr + FN
FN → KN + KKTTr
A→S+P
S → FN
P → FK
FN → KN + KN + KSN + KN + KN
FK → KKTr + FN
FN → KN + KKTTr
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KN + KBantu + KA
A→S+P
S → FN
P → FK
FN → KN + KN + KN + Pent
FK → KKTr + FN
FN → KN + KN + KN
A→S+P
S → FN
P → FA
FN → KN + KN
FA → KA + KSN + KN + KN + KBantu + KA
(11)
Lonjong bentuk songkok itu mendapat
permintaan orang ramai.
Lonjong bentuk songkok itu mendapat permintaan orang ramai.
KN
KN
KN
Pent KKTr KN
KN KN
(12)
Kakak saya lonjong daripada kakak yang
lain.
Kakak saya lonjong daripada kakak yang lain.
KN KN KA KSN KN KN KBantu KA
(13)
Hantu bersifat mistik mengikut kepercayaan masyarakat.
Hantu bersifat mistik mengikut kepercayaan
masyarakat.
KN KKTr KN KKTr KN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KKTr + KN + KN
(14)
Pak Li mengamalkan ilmu mistik.
Pak Li mengamalkan ilmu mistik.
KN KN KKTr
KN KN
A→S+P
S → FN
P → FK
FN → KN + KN
FK → KKTr + FN
(15)
Ubat batuk itu amat mustajab.
Ubat batuk itu amat mustajab.
KN KN Pent KPeng KA
A→S+P
S → FN
P → FA
FN → KN + KN + Pent
FA → KPeng + KA
(16)
Mustajabnya doa seorang ibu bapa adalah
mengikut amalan mereka.
Mustajabnya doa seorang ibu bapa adalah mengikut amalan mereka.
KN
KN Bil PenjBil KN KN KN
KKTr KN KN
A→S+P
S → FN
P → FK
FN → KN + KN + Bil + PenjBil + KN + KN
+ KN
FK → KKTr + FN
FN → KN + KN
(17)
Nenek suka makan pinang
Nenek suka makan pinang
KN KBantu KKTr KN
A→S+P
S → FN
P → FK
FN → KN
FK → KBantu + KKTr + FN
(18)
Saya pinang gadis kampung itu.
Saya pinang gadis kampung itu.
KN KKTr KN KN
Pent
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KN + Pent
(19)
Kereta itu kereta prebet.
Kereta itu kereta prebet.
KN Pent KN KN
A→S+P
S → FN
P → FN
FN → KN + Pent
FN → KN + KN
(20)
Raga itu besar sangat.
Raga itu besar sangat.
KN Pent KA KPeng
A→S+P
S → FN
P → FA
FN → KN + Pent
FA → KA + KPeng
(21)
Pakaian seragamnya agak rambu pada
hari ini.
Pakaian seragamnya agak rambu pada hari ini.
KN
KN
KBantu KA KSN KN Pent
A→S+P
S → FN
P → FA
FN → KN + KN
FA → KBantu + KA + KSN + KN + Pent
(22)
Tali khemah itu diikat pada rambu.
Tali khemah itu diikat pada rambu.
KN KN Pent KKTTr KSN KN
A→S+P
S → FN
P → FK
FN → KN + KN + Pent
FK → KKTTr + KSN + KN
(23)
Para menteri membincangkan rang undang-undang jalan raya.
Para menteri membincangkan rang undangundang jalan raya.
KN KN KKTr KN KN KN KN
(24)
Padi itu ditanam rapat.
Padi itu ditanam rapat.
KN Pent KKTTr KA
(25)
Rapat umum itu sangat aman.
Rapat umum itu sangat aman.
KN KN Pent
KPeng KA
A→S+P
S → FN
P → FK
FN → KN + KN + Pent
FK → KKTTr + KSN + KN
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KKTTr + KA
A→S+P
S → FN
P → FA
FN → KN + KN + Pent
FA → KPeng + KA
(26)
Cuti raya tahun ini dua hari sahaja.
Cuti raya tahun ini dua hari sahaja.
KN KN KN Pent Bil KN KN
A→S+P
S → FN
P → FN
FN → KN + KN + KN + Pent
FN → Bil + KN + KN
(27)
Ali sedang releks di kerusi malasnya.
Ali sedang releks di kerusi malasnya.
KN KBantu KKTTr KSN KN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KBantu + KKTTr + KSN + KN + KN
(28)
Dia menjawab soalan guru dengan releks
sahaja.
Dia menjawab soalan guru dengan releks sahaja.
KN KKTr KN KN KSN KN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KN + KSN + KN + KN
(29)
Resmi padi sangat bagus dicontohi.
Resmi padi sangat bagus dicontohi.
KN KN KPeng KA KKTTr
A→S+P
S → FN
P → FA
FN → KN + KN
FA → KPeng + KA + KKTTr
(30)
Ali begitu ribut sekali di tempat kenduri
kahwin adiknya.
Ali begitu ribut sekali di tempat kenduri kahwin
adiknya.
KN
KBantu KA KPeng KSN KN KN KN
KN`
A→S+P
S → FN
P → FA
FN → KN
FA → KBantu+KA
+KPeng+KSN+KN+KN+KN+KN`
(31)
Hamidah saing baik saya di sekolah.
Hamidah saing baik saya di sekolah.
KN
KN
KN KN KSN KN
A→S+P
S → FN
P → FN
FN → KN
FN → KN + KN + KN + KSN + KN
A→S+P
S → FN
P → FK
FN → KN
FK → KNafi + KKTTr + KSN + KN + KN
A→S+P
S → FN
P → FN
FN → KN
FN → KN + KN + KN + Pent + KN
(32)
Kecantikannya tiada saing di kelas kami.
Kecantikannya tiada saing di kelas kami.
KN
KNafi KKTTr KSN KN KN
(33)
Saya saksi pertandingan bola sepak itu
semalam.
Saya saksi pertandingan bola sepak itu semalam.
KN
KN KN
KN KN Pent KN
(34)
Saksi kejadian itu telah meninggal dunia.
Saksi kejadian itu telah meninggal dunia.
KN KN Pent KBantu KKTr KN
A→S+P
S → FN
P → FK
FN → KN + KN + Pent
FK → KBantu + KKTr + FN
FN → KN
(35)
Kubu pertahanan musuh sapih akibat
serangan tentera bersekutu.
Kubu pertahanan musuh sapih akibat serangan
tentera bersekutu.
KN
KN
KN KA KN KN KN KN
(36)
Kami telah sapih anak lelaki kami minggu lepas
Kami telah sapih anak lelaki kami minggu lepas.
KN KBantu KKTtr KN KN KN KN KN
A→S+P
S → FN
P → FA
FN → KN + KN + KN
FA → KA + KN + KN + KN + KN
A→S+P
S → FN
P → FK
FN → KN
FK → KBantu + KKTr + FN
FN → KN + KN + KN + KN + KN
(37)
Saya sebak rambut saya berkali-kali.
Saya sebak rambut saya berkali-kali.
KN KKTr KN KN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KN + KN
(38)
Dia sedang membasuh.
Dia sedang membasuh.
KN KBantu KKTTr
A→S+P
S → FN
P → FK
FN → KN
FK → KBantu + KKTTr
(39)
Ali berkeluarga sedang.
Ali berkeluarga sedang.
KN KKTr KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
(40)
Seksi penerbitan Syarikat Tunas sangat
aktif.
Seksi penerbitan Syarikat Tunas sangat aktif.
KN KN KN KN KPeng KA
(41)
Pakaian itu sungguh seksi
Pakaian itu sungguh seksi.
KN Pent KPeng KA
(42)
Pintu itu mempunyai selak aluminium.
Pintu itu mempunyai selak aluminium.
KN Pent KKTr KN KN
A→S+P
S → FN
P → FA
FN → KN + KN + KN + KN
FA → KPeng + KA
A→S+P
S → FN
P → FA
FN → KN + Pent
FA → KPeng + KA
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KKTr + FN
FN → KN + KN
(43)
Siling rumah itu kami seling dengan kayu
jati.
Siling rumah itu kami seling dengan kayu jati.
KN KN Pent KN KKTTr KSN KN KN
A→S+P
S → FN
P → FK
FN → KN + KN + Pent
FK → KKTTr + KSN + KN + KN
(44)
Seling besar itu telah pecah.
Seling besar itu telah pecah.
KN KN Pent KBantu KA
(45)
Kami selusur papan gelongsor itu.
Kami selusur papan gelongsor itu.
KN KKTr KN
KN Pent
A→S+P
S → FN
P → FA
FN → KN + KN + Pent
FA → KBantu + KA
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KN + Pent
(46)
Selusur jambatan kayu itu telah patah.
Selusur jambatan kayu itu telah patah.
KN KN KN Pent KBantu KKTTr
A→S+P
S → FN
P → FK
FN → KN + KN + KN + Pent
FK → KBantu + KKTTr
(47)
Kawasan rumah Ali semak.
Kawasan rumah Ali semak.
KN
KN KN KN
A→S+P
S → FN
P → FN
FN → KN
FN → KN + KN + KN
(48)
Saya semak semula kertas soalan itu.
Guru itu terpaksa semak semula kertas ujian
Hassan.
KN Pent KBantu KKTTr
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KBantu + KKTTr
(49)
Pintu rumah saya sempal.
Pintu rumah saya sempal.
KN KN
KN KN
A→S+P
S → FN
P → FN
FN → KN
FN → KN + KN + KN
(50)
Kami sempal lubang itu dengan kain basahan.
Kami sempal lubang itu dengan kain basahan.
KN KKTr KN Pent KSN KN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + Pent + KSN + KN + KN
(51)
Sengkek dari Bangladesh itu baru tiba ke
Malaysia.
Sengkek dari Bangladesh itu baru tiba ke Malaysia.
KN
KSN KN
Pent KBantu KKTTr
KSN KN
A→S+P
S → FN
P → FK
FN → KN + KSN + KN + Pent
FK → KBantu + KKTTr + KSN + KN
(52)
Dia sudah jatuh sengkek.
Dia sudah jatuh sengkek.
KN KBantu KKTTr KA
A→S+P
S → FN
P → FK
FN → KN + KSN + KN + Pent
FK → KBantu + KKTTr + KSN + KN
(53)
Ali menjala ikan sepat di sungai itu.
Ali menjala ikan sepat di sungai itu.
KN KKTr KN KN KSN KN Pent
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KN + KSN + KN + Pent
(54)
Serang hendap tentera itu berjaya.
Serang hendap tentera itu berjaya.
KN
KN
KN Pent KKTTr
A→S+P
S → FN
P → FK
FN → KN + KN + KN + Pent
FK → KKTTr
(55)
Budak itu berani saya serang pada bilabila masa sahaja.
Budak itu berani saya serang pada bila-bila
masa sahaja.
KN Pent KA KN KKTTr KSN KN KN KN
A→S+P
S → FN
P → FK
FN → KN + Pent + KA + KN
FK → KKTTr + KSN + KN + KN + KN
(56)
Bulu surai di tengkuk kuda itu berwarna
perang.
Bulu surai di tengkuk kuda itu berwarna perang.
KN KN KSN KN KN Pent KKTr KN
A→S+P
S → FN
P → FK
FN → KN + KN +KSN + KN + KN + Pent
FK → KKTr + FN
FN → KN
(57)
Pihak kami surai majlis itu jam 4.30 petang.
Pihak kami surai majlis itu jam 4.30 petang.
KN KN KKTr KN Pent KN Bil KN
A→S+P
S → FN
P → FK
FN → KN + KN
FK → KKTr + FN
FN → KN + Pent + KN + Bil + KN
(58)
Orang ramai bersorak seperti bunyi tagar.
Orang ramai bersorak seperti bunyi tagar.
KN KN
KKTTr KSN KN
KN
A→S+P
S → FN
P → FK
FN → KN + KN
FK → KKTTr + KSN + KN + KN
(59)
Besi murah itu mudah tagar .
Besi murah itu mudah tagar .
KN KN Pent KA KN
A→S+P
S → FN
P → FA
FN → KN + KN + Pent
FA → KA + KN
(60)
Dia telah makan nasi.
KN KBantu KKTr KN
A→S+P
S → FN
P → FK
FN → KN
FK → KBantu + KKTr + FN
(61)
Aku telah Halim akan memperoleh 8A dalam peperiksaannya.
KN KBantu KN KBantu KKTr KN KSN KN
A→S+P
S → FN
P → FK
FN → KN + KBantu + KN + KBantu
FK → KKTr + FN
FN → KN + KSN + KN
(62)
Terawang songket itu amat kemas.
KN KN Pent KPeng KA
A→S+P
S → FN
P → FA
FN → KN + KN + Pent
FA → KPeng + KA
(63)
Dia terawang jauh mengenangkan masa mudanya.
KN KKTr KA KKTr KN KN
A→S+P
S → FN
P → FK
FN → KN + KKTr + KA
FK → KKTr + FN
FN → KN + KN
(64)
Dua ulas durian itu busuk.
KN PenjBil KN Pent KA
A→S+P
S → FN
P → FA
FN → KN + PenjBil + KN + Pent
FA → KA
(65)
Kain ulas itu diperbuat daripada kapas.
KN KN Pent KKTTr KSN KN
A→S+P
S → FN
P → FK
FN → KN + KN + Pent
FK → KKTTr + KSN + KN
(66)
Adik Fatimah bermain di padang.
KN KN KKTTr KSN KN
A→S+P
S → FN
P → FK
FN → KN + KN
FK → KKTTr + KSN + KN
(67)
Pemandu itu membelok ke kanan.
KN Pent KKTTr KSN KN KN
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KKTTr + KSN + KN + KN
(68)
Peristiwa itu disaksikan oleh dua ratus orang.
KN Pent KKTTr KSN Bil KN KN
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KKTTr + KSN + Bil + KN + KN
(69)
Padi sedang menguning.
KN KBantu KKTTr
A→S+P
S → FN
P → FK
FN → KN
FK → KBantu + KKTTr
(70)
Pemuda itu tersenyum.
KN Pent KKTTr
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KKTTr
(71)
Padi sedang menguning di sawah.
KN KBantu KKTTr KSN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KBantu + KKTTr + KSN + KN
(72)
Pemuda itu tersenyum seorang diri.
KN Pent KKTTr KN KN
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KKTTr + KN + KN
(73)
Mereka asyik berbual di kedai kopi.
KN KBantu KKTTr KSN KN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KBantu + KKTTr + KSN + KN + KN
(74)
Dia menjadi guru.
KN KKTr KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
(75)
Lukanya beransur baik.
KN KKTTr KA
A→S+P
S → FN
P → FK
FN → KN
FK → KKTTr + KA
(220)
Saya berumur lapan tahun.
KN KKTr Bil KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → Bil + KN
(221)
Tiga orang budak lemas.
Bil PenjBil KN KKTTr
A→S+P
S → FN
P → FK
FK → KKTTr
FN → Bil +PenjBil+ KN
(222)
Saya hendak tahu tentang bumi.
KN KBantu KKTTr KBantu KN
A→S+P
S → FN
P → FK
FN → KN
FK → KBantu + KKTTr + KBantu + KN
(394)
Ali pergi ke London.
KN KKTTr KSN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTTr + KSN + KN
(397)
Dua ekor belalang di atas daun.
Bil PenjBil KN KSN KN KN
A→S+P
S → FN
P → FS
FN → Bil + PenjBil+KN
FS → KSN + FN
FN → KN + KN
(411)
Pelajar itu belajar dengan bersungguh-sungguh.
KN Pent KKTTr KSN KKTTr
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KKTTr + KSN + KKTTr
(412)
Mereka itu berjalan pada waktu pagi.
KN Pent KKTTr KSN KN KN
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KKTTr + KSN + KN + KN
(696)
Peniaga itu memperdaya pelanggannya dengan janji-janji manis.
KN Pent KKTr KN KSN KN KA
A→S+P
S → FN
P → FK
FN → KN + Pent
FK → KKTr + FN
FN → KN + KSN + KN + KA
(697)
Dia memperkacang harta orang tuanya sehingga habis.
KN KKTr KN KN KN KSN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KN + KN + KSN + KN
(701)
Saya menduduki kerusi yang kosong itu.
KN KKTr KN KBantu KN Pent
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + KBantu + KN + Pent
(702)
Saya mendudukkan anak itu pada kerusi yang di hadapan.
KN KKTr KN Pent KSN KN KBantu KSN KN
A→S+P
S → FN
P → FK
FN → KN
FK → KKTr + FN
FN → KN + Pent + KSN + KN + KBantu + KSN + KN
(904)
Kumpulan pelajar Malaysia itu bertolak esok pagi ke Jepun.
KN KN KN Pent KKTr KN KN KSN KN
A→S+P
S → FN
P → FK
FN → KN + KN + KN + Pent
FK → KKTr + FN
FN → KN + KN + KSN + KN
(933)
Sahabat penanya menulis surat yang sangat panjang.
KN KN KKTr KN KBantu KPeng KA
A→S+P
S → FN
P → FK
FN → KN + KN
FK → KKTr + FN
FN → KN + KBantu + KPeng + KA
(999)
Rasa buah itu enak sekali.
KN KN Pent KA KPeng
A→S+P
S → FN
P → FA
FN → KN + KN + Pent
FA → KA + KPeng
(1000)
Nasibnya sungguh malang.
KN KPeng KA
A→S+P
S → FN
P → FA
FN → KN
FA → KPeng + KA
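The examples above all follow the same pattern: a POS-tagged sentence is decomposed top-down into phrase-structure rules. A minimal sketch (the nested-tuple representation is an assumption for illustration, not the thesis's own code) of how one bracketed parse yields its rule list:

```python
# A minimal sketch (assumed representation, not the thesis's own code) of
# how each tagged example above yields its rule list: walk a bracketed
# parse tree and emit one LHS -> RHS rule per internal node.
def extract_rules(tree):
    """tree: nested (label, child, ...) tuples; leaves are POS-tag strings."""
    label, children = tree[0], tree[1:]
    rhs = " + ".join(c if isinstance(c, str) else c[0] for c in children)
    rules = [f"{label} -> {rhs}"]
    for child in children:
        if not isinstance(child, str):
            rules.extend(extract_rules(child))
    return rules

# Example (74), "Dia menjadi guru" (KN KKTr KN):
tree = ("A", ("S", ("FN", "KN")), ("P", ("FK", "KKTr", ("FN", "KN"))))
for rule in extract_rules(tree):
    print(rule)
```

For example (74) this prints the same six rules listed above: A → S + P, S → FN, FN → KN, P → FK, FK → KKTr + FN, FN → KN.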
APPENDIX B
RULES AND RESPECTIVE INDEX SENTENCES
Num | Rule | LHS Rule Count | RHS Rule Count | RHS/LHS Probability | Index sentences from training data | Comment

1. A → S + P
LHS count: 1000; RHS count: 1000; Probability: 1000/1000 = 1.0000
Index sentences: (1) – (1000)
Comment: All sentences have this rule.

2. S → FN
LHS count: 1000; RHS count: 1000; Probability: 1000/1000 = 1.0000
Index sentences: (1) – (1000)
Comment: All sentences have this rule.

3. P → FN
LHS count: 1000; RHS count: 80; Probability: 80/1000 = 0.0800
Index sentences: (21), (30), (38), (40), (56), (58), (106), (107), (119), (128), (149), (165), (167), (169), (170), (171), (172), (186), (194), (201), (202), (203), (204), (205), (206), (207), (209), (210), (211), (212), (213), (214), (215), (216), (217), (243), (297), (300), (303), (304), (307), (308), (316), (317), (401), (441), (446), (468), (470), (471), (478), (479), (480), (501), (502), (503), (505), (506), (507), (508), (509), (510), (512), (519), (546), (571), (572), (573), (576), (578), (788), (789), (794), (803), (896), (962), (971), (972), (981), (982)
Comment: There are 80 sentences that have FN as the predicate.

4. P → FS
LHS count: 1000; RHS count: 40; Probability: 40/1000 = 0.0400
Index sentences: (129), (163), (173), (195), (289), (299), (327), (389), (395), (397), (400), (434), (445), (498), (499), (500), (531), (532), (533), (534), (564), (807), (906), (909), (954), (955), (956), (957), (958), (959), (960), (961), (964), (990), (991), (992), (993), (994), (995), (996)
Comment: There are 40 sentences that have FS as the predicate.

5. FN → KN + Pent
LHS count: 1304; RHS count: 162; Probability: 162/1304 = 0.1242
Index sentences: (20), (21), (22), (49), (50), (57), (79), (80), (84), (89), (94), (103), (107), (109), (110), (111), (113), (139), (156), (174), (175), (183), (184), (188), (192), (195), (235), (292), (295), (296), (297), (304), (308), (318), (376), (410), (412), (430), (431), (427), (433), (434), (441), (444), (462), (475), (480), (485), (493), (494), (497), (499), (501), (502), (520), (521), (523), (524), (534), (536), (545), (546), (548), (549), (550), (553), (554), (559), (560), (568), (571), (572), (573), (574), (575), (576), (577), (579), (581), (602), (604), (610), (611), (615), (624), (626), (627), (628), (642), (644), (645), (648), (650), (651), (656), (663), (677), (680), (686), (690), (691), (692), (694), (695), (696), (705), (712), (719), (727), (725), (732), (733), (739), (735), (741), (751), (758), (761), (763), (779), (782), (799), (824), (834), (850), (854), (855), (858), (859), (860), (861), (862), (863), (864), (867), (874), (879), (883), (888), (891), (894), (896), (910), (911), (912), (916), (917), (929), (930), (938), (945), (954), (955), (956), (957), (959), (963), (964), (977), (993), (994), (996)
Comment: There are 162 rule occurrences of FN → KN + Pent.

6. FA → KPeng + KA
LHS count: 141; RHS count: 39; Probability: 39/141 = 0.2766
Index sentences: (16), (29), (48), (49), (72), (103), (137), (145), (161), (175), (176), (183), (281), (298), (310), (314), (376), (378), (409), (433), (524), (542), (544), (545), (548), (549), (581), (770), (775), (780), (784), (785), (923), (936), (941), (946), (980), (998), (1000)
Comment: 39 of the 141 FA rule occurrences are FA → KPeng + KA.

7. FK → KKTTr + KN + KN + KN
LHS count: 704; RHS count: 2; Probability: 2/704 = 0.0028
Index sentences: (226), (227)
Comment: There are 2 sentences that involve the rule FK → KKTTr + KN + KN + KN.

8. FS → KBantu + KSN + FN
LHS count: 40; RHS count: 1; Probability: 1/40 = 0.0250
Index sentences: (129)
Comment: There is only one occurrence of FS → KBantu + KSN + FN in the training data, at index (129).
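Each probability in the table above is the rule's RHS count divided by its LHS count, i.e. the maximum-likelihood estimate of P(RHS | LHS). A hedged sketch of that computation (the representation and the toy counts are assumptions mirroring rows 1 and 3, not the thesis's actual treebank):

```python
# A hedged sketch of how the RHS/LHS probabilities above are obtained:
# each rule's probability is its occurrence count divided by the
# occurrence count of its left-hand side (a maximum-likelihood estimate).
from collections import Counter

def rule_probabilities(rule_occurrences):
    """rule_occurrences: list of (lhs, rhs) pairs, one per occurrence.
    Returns {(lhs, rhs): P(rhs | lhs)}."""
    lhs_counts = Counter(lhs for lhs, _ in rule_occurrences)
    pair_counts = Counter(rule_occurrences)
    return {pair: n / lhs_counts[pair[0]] for pair, n in pair_counts.items()}

# Toy counts: every sentence contributes A -> S + P; of 1000 predicates,
# 80 are FN and (for illustration) the remaining 920 are FK.
occurrences = ([("A", "S + P")] * 1000 +
               [("P", "FN")] * 80 +
               [("P", "FK")] * 920)
probs = rule_probabilities(occurrences)
print(probs[("P", "FN")])   # 0.08, matching row 3 (80/1000)
```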
APPENDIX C
RESULTS OF THE TEST DATA

Index Sentence | Parse Trees | Probability values

T1
Sentence: Kami adang air itu.
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10-2
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.1242
= 2.004 x 10-3
The best parse tree that represents “Kami adang
air itu” is Parse Tree 1.
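The disambiguation shown for T1 (and for the test sentences that follow) multiplies the rule probabilities of each candidate tree and reports the tree with the larger product as the best parse. A hedged sketch, using the rule probabilities listed above for "Kami adang air itu":

```python
# A hedged sketch of the disambiguation step shown for T1: each candidate
# tree's probability is the product of its rule probabilities, and the
# tree with the larger product is reported as the best parse.
from math import prod

tree1 = [1.0000, 1.0000, 0.7390, 0.3965, 0.4815, 0.1242]  # P -> FK reading
tree2 = [1.0000, 1.0000, 0.0800, 0.2017, 0.1242]          # P -> FN reading

p1, p2 = prod(tree1), prod(tree2)
best = "Parse tree 1" if p1 > p2 else "Parse tree 2"
print(best)  # Parse tree 1 (about 1.752e-2 vs 2.004e-3)
```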
T2
Sentence: Baju kakak batik Terengganu.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.2017
= 3.255 x 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10-3
The best parse tree that represents “Baju kakak batik Terengganu” is Parse Tree 1.
T3
Sentence: Seterika antik sangat mahal.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x
0.2017 x 0.2766
= 7.886x10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.3965 x 0.0008
= 2.538x10-5
The best parse tree that represents “Seterika
antik sangat mahal” is Parse Tree 1.
T4
Sentence: Saya bagi fakir itu wang.
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent + KN
(0.0015)
Probability value
= 1.0000 x 1.0000 x 0.7390 x
0.3965 x 0.4815 x 0.0015
= 2.116x10-4
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN (0.3965)
FS → KSN+ FN (0.975)
FN → KN + Pent + KN (0.0015)
Probability value
= 1.0000 x 1.0000 x 0.0400 x 0.3965 x 0.975 x 0.0015
= 2.320x10-5
The best parse tree that represents “saya bagi fakir itu wang” is Parse Tree 1.
T5
Sentence: Pegawai itu pengurus syarikat
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + Pent (0.1242)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.1242 x 0.2017
= 2.004 x 10-3
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + Pent + KN
(0.0015)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.0015x 0.3965
= 4.758x 10-5
The best parse tree that represents “Pegawai itu
pengurus syarikat” is Parse Tree 1.
T6
Sentence: Atap rumah saya bocor.
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.3121
= 2.125 x 10-3
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.3965 x 0.0483
= 1.532 x 10-3
The best parse tree that represents “atap rumah
saya bocor” is Parse Tree 1.
T7
Sentence: Beg adik berwarna coklat
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTTr + KA (0.0483)
Probability value
= 1.0000 x 1.0000 x 0.7390 x
0.2017 x 0.0483
= 7.199 x 10-3
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTTr + KN (0.0028)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.2017 x 0.0028
= 4.174 x 10-4
The best parse tree that represents “beg adik berwarna coklat” is Parse Tree 1.
T8
Sentence: Pipi mukanya bengkak
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.3121
= 8.876 x 10-3
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.3965
= 6.398 x 10-3
The best parse tree that represents “pipi
mukanya bengkak” is Parse Tree 1.
T9
Sentence: Kata-kata wanita itu bisa.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN+ Pent
(0.0744)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410
X 0.0744 X 0.3121
= 3.274 X 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN+ Pent
(0.0744)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800
X 0.0744 X 0.3965
= 2.360 X 10-3
The best parse tree that represents “kata-kata
wanita itu bisa” is Parse Tree 1.
T10
Sentence: Saya eskot pengetua ke pentas
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KSN + KN
(0.0146)
Probability value
= 1.0000 x 1.0000 x 0.7390x
0.3965 x 0.4815 x 0.0146
= 2.206x10-3
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.0483)
FN → KN + KSN + KN
(0.0146)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.0483 x 0.0146
= 5.641 x 10-5
The best parse tree that represents “saya eskot pengetua ke pentas” is Parse Tree 1.
T11
Sentence: Kami garuk tanah.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.7390x
0.3965 x 0.4815 x 0.3965
= 5.5946x10-2
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN (0.3965)
FA → KA + KN (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.3121
= 2.125 x 10-3
The best parse tree that represents “kami garuk tanah” is Parse Tree 1.
T12
Sentence: saya gali sendiri telaga itu.
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KN + Pent
(0.0744)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.0744
= 1.050 x 10-2
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN +
Pent (0.0092)
Probability value
= 1.0000 x 1.0000 x 0.0800
X 0.3965 X 0.0092
= 2.918 X 10-4
The best parse tree that represents “saya gali
sendiri telaga itu” is Parse Tree 1.
T13
Sentence: Nasi godak kenduri sangat sedap
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410
X 0.0483 X 0.2766
= 1.884 X 10-3
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN(0.2017)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800
X 0.2017 X 0.0008
= 1.291 X 10-5
The best parse tree that represents “nasi godak
kenduri sangat sedap” is Parse Tree 1.
T14
Sentence: Puteri ketujuh paling gombang
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x
0.2017 x 0.2766
= 7.886x10-3
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.3965 x 0.0008
= 2.538x10-5
The best parse tree that represents “puteri
ketujuh paling gombang” is Parse Tree 1.
T15
Sentence: Coklat koko makanan kegemaran adik
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.0483
= 7.794 x 10-4
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN + KN
(0.0115)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.3965 x 0.0115
= 3.645x 10-4
The best parse tree that represents “coklat koko
makanan kegemaran adik” is Parse Tree 1.
T16
Sentence: Suara penyanyi wanita agak garuk
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KBantu + KA
(0.0780)
Probability value
= 1.0000 x 1.0000 x 0.1410 x
0.0483x0.0780
= 5.279x10-4
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KBantu + KA (0.0023)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0023
= 7.296x10-5
The best parse tree that represents “suara
penyanyi wanita agak garuk” is Parse Tree 1.
T17
Sentence: Sakit hati saya makin buku
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN (0.0483)
FA → KBantu + KA (0.0780)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.0780
= 5.279x10-4
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KBantu + KA (0.0023)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0023
= 7.296x10-5
The best parse tree that represents “sakit hati saya makin buku” is Parse Tree 1.
T18
Sentence: Buah beri itu manis.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + Pent
(0.0744)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x
0.0744x0.3121
= 3.274x10-3
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + Pent + KA
(0.0015)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0015
= 4.758x10-5
The best parse tree that represents “buah beri itu
manis” is Parse Tree 1.
T19
Sentence: Ucapan guru kaunseling itu cacat.
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN +KN +
Pent (0.0092)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x
0.0092x0.3121
= 4.049x10-4
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN(0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0008
= 1.291x10-5
The best parse tree that represents “ucapan guru kaunseling itu cacat” is Parse Tree 1.
T20
Sentence: Baju kakak berwarna putih
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTTr + KA (0.0483)
Probability value
= 1.0000 x 1.0000 x 0.7390 x
0.2017 x 0.0483
= 7.199 x 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTTr + KN (0.0028)
Probability value
= 1.0000 x 1.0000 x 0.7390 x
0.2017 x 0.0028
= 4.174 x 10-4
The best parse tree that represents “baju kakak
berwarna putih” is Parse Tree 1.
T21
Sentence: Kapal milik keluarga saya labuh di pelabuhan klang
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN + KN +KN
(0.0115)
FK → KKTTr + KSN + KN + KN (0.0511)
Probability value
= 1.0000 x 1.0000 x 0.7390 x
0.0115 x 0.0511
= 4.343 x 10-4
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN + KN (0.0115)
FA → KA + KSN + KN + KN (0.0071)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0115 x 0.0071
= 1.151 x 10-5
The best parse tree that represents “kapal milik keluarga saya labuh di pelabuhan klang” is Parse Tree 1.
T22
Sentence: Aku lambung duit syiling
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.7390x
0.3965 x 0.4815 x 0.2017
= 2.846x10-2
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.2017
= 3.255 x 10-3
The best parse tree that represents “aku
lambung duit syiling” is Parse Tree 1.
T23
Sentence: kakak saya guru sekolah
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.2017
= 3.255 x 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.3965 x 0.0483
= 1.532 x 10-3
The best parse tree that represents “kakak saya
guru sekolah” is Parse Tree 1.
T24
Sentence: sepupu saya jurutera binaan
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.2017
= 3.255 x 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10-3
The best parse tree that represents “sepupu saya jurutera binaan” is Parse Tree 1.
T25
Sentence: saya kopek buah kelapa itu.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KN + Pent
(0.0744)
Probability value
= 1.0000 x 1.0000 x 0.7390 x
0.3965 x 0.4815 x 0.0744
= 1.050 x 10-2
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN + Pent (0.0744)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0744
= 1.2001 x 10-3
The best parse tree that represents “saya kopek buah kelapa itu” is Parse Tree 1.
T26
Sentence: Jawapan murid itu konkrit
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + Pent (0.0744)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0744 x 0.3121
= 3.274x10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN + Pent
(0.0744)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.0744x0.3965
= 2.360x10-3
The best parse tree that represents “jawapan murid itu konkrit” is Parse Tree 1.
T27
Sentence: bunyi derai kaca amat ngilu
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN +
KN(0.0483)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x
0.0483 x 0.2766
= 1.884x10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0008
= 1.291x10-5
The best parse tree that represents “bunyi derai kaca amat ngilu” is Parse Tree 1.
T28
Sentence: pelajar cemerlang sungguh dinamik
Parse tree 1:
A→ S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x
0.2017 x 0.2766
= 7.886x10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0008
= 2.538x10-5
The best parse tree that represents “pelajar cemerlang sungguh dinamik” is Parse Tree 1.
T29
Sentence: Buah gajus rasa kelat.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTr + FN (0.4815)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.7390x
0.2017x 0.4815 x 0.3965
= 2.846x10-2
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.2017
= 3.255 x 10-3
Parse tree 3:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN (0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10-3
The best parse tree that represents “buah gajus rasa kelat” is Parse Tree 1.
T30
Sentence: Adik saya murid sekolah rendah.
Parse tree1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.0483
= 7.794 x 10-4
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN + KN
(0.0115)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.3965 x 0.0115
= 3.645x 10-4
The best parse tree that represents “adik saya
murid sekolah rendah” is Parse Tree 1.
T31
Sentence: saya pagar reban itu
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10-2
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.1242
= 2.004 x 10-3
The best parse tree that represents “saya pagar
reban itu” is Parse Tree 1.
T32
Sentence: aku cas bateri itu.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10-2
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.1242
= 2.004 x 10-3
The best parse tree that represents “aku cas
bateri itu” is Parse Tree 1.
T33
Sentence: makcik saya kerani sekolah
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10-2
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.2017 x 0.1242
= 2.004 x 10-3
The best parse tree that represents “makcik saya
kerani sekolah” is Parse Tree 1.
T34
Sentence: kawan abang tentera laut
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10-2
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.1242
= 2.004 x 10-3
The best parse tree that represents “kawan abang tentera laut” is Parse Tree 1.
T35
Sentence: kawan kakak sangat cantik
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x
0.2017 x 0.2766
= 7.886x10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0008
= 2.538x10-5
The best parse tree that represents “kawan kakak sangat cantik” is Parse Tree 1.
T36
Sentence: kulit bayi sangat halus
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x
0.2017 x 0.2766
= 7.886x10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.3965 x 0.0008
= 2.538x10-5
The best parse tree that represents “kulit bayi
sangat halus” is Parse Tree 1.
T37
Sentence: ibu saya bagi kucing itu makanan
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTr + FN (0.4815)
FN → KN + Pent + KN
(0.0015)
Probability value
= 1.0000 x 1.0000 x 0.7390
X 0.2017 x 0.4815 x 0.0015
= 1.077x10-4
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN+KN (0.2017)
FS → KSN+ FN (0.975)
FN → KN + Pent + KN
(0.0015)
Probability value
= 1.0000 x 1.0000 x 0.0400 x 0.2017 x 0.975 x 0.0015
= 1.180x10-5
The best parse tree that represents “ibu saya bagi kucing itu makanan” is Parse Tree 1.
T38
Sentence: guru kami bagi markah sangat rendah
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTr + FN (0.4815)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.2017 x 0.4815 x 0.0008
= 5.742x10-5
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN+KN (0.2017)
FS → KSN+ FN (0.975)
FN → KN + KPeng + KA (0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0400 x 0.2017 x 0.975 x 0.0008
= 6.293x10-6
The best parse tree that represents “guru kami bagi markah sangat rendah” is Parse Tree 1.
T39
Sentence: gadis itu model sambilan.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + Pent (0.1242)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.1242 x 0.2017
= 2.004 x 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + Pent + KN
(0.0015)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.0015x 0.3965
= 4.758x 10-5
The best parse tree that represents “gadis itu
model sambilan” is Parse Tree 1.
T40
Sentence: kereta kepunyaan ayah baru
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.3121
= 2.125 x 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10-3
The best parse tree that represents “kereta kepunyaan ayah baru” is Parse Tree 1.
T41
Sentence: Pembetulan tesis saya minor
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.3121
= 2.125 x 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10-3
The best parse tree that represents “pembetulan tesis saya minor” is Parse Tree 1.
T42
Sentence: bapa saya ke pejabat
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN + KN (0.2017)
FS → KSN+ FN (0.975)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0400 x
0.2017 x 0.975 x 0.3965
= 3.119 x 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KSN + KN
(0.0146)
Probability value
= 1.0000 x 1.0000 x 0.0800 x
0.3965 x 0.0146
= 4.631 x 10-4
The best parse tree that represents “bapa saya
ke pejabat” is Parse Tree 1.
T43
Sentence: asal benang daripada kapas
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN + KN (0.2017)
FS → KSN+ FN (0.975)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0400 x
0.2017 x 0.975 x 0.3965
= 3.119 x 10-3
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KSN + KN
(0.0146)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0146
= 4.631 x 10⁻⁴
The best parse tree that represents “asal benang daripada kapas” is Parse Tree 1.
T44
Sentence: baju adik baru
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.3121
= 8.876 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.3965
= 6.398 x 10⁻³
The best parse tree that represents “baju adik baru” is Parse Tree 1.
T45
Sentence: pembedahan ibu minor
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.3121
= 8.876 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.3965
= 6.398 x 10⁻³
The best parse tree that represents “pembedahan
ibu minor” is Parse Tree 1.
T46
Sentence: Kami bentuk adunan biskut itu
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KN + Pent
(0.0744)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.0744
= 1.050 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN +
Pent (0.0092)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0092
= 2.918 x 10⁻⁴
The best parse tree that represents “kami bentuk adunan biskut itu” is Parse Tree 1.
T47
Sentence: pelajar malas itu benak dalam semua subjek
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + Pent (0.0744)
FA → KA + KSN + KN + KN (0.0071)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0744 x 0.0071
= 7.448 x 10⁻⁵
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN + Pent (0.0744)
FN → KN + KSN + KN + KN (0.0038)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.0744 x 0.0038
= 2.262 x 10⁻⁵
The best parse tree that represents “pelajar malas itu benak dalam semua subjek” is Parse Tree 1.
T48
Sentence: pulau peranginan milik kerajaan negeri
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0483
= 7.794 x 10⁻⁴
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN + KN
(0.0115)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0115
= 3.645 x 10⁻⁴
The best parse tree that represents “pulau peranginan milik kerajaan negeri” is Parse Tree 1.
T49
Sentence: kakak jurusolek butik pengantin
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.2017
= 3.255 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “kakak jurusolek butik pengantin” is Parse Tree 1.
T50
Sentence: beliau atlet negara Malaysia
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.1242
= 2.004 x 10⁻³
The best parse tree that represents “beliau atlet negara Malaysia” is Parse Tree 1.
T51
Sentence: saya kepit suratkhabar itu.
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10⁻²
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.1242
= 2.004 x 10⁻³
The best parse tree that represents “saya kepit suratkhabar itu” is Parse Tree 1.
T52
Sentence: rumah saya rumah kayu
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.2017
= 3.255 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “rumah saya rumah kayu” is Parse Tree 1.
T53
Sentence: baju ibu sangat murah
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.2766
= 7.886 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0008
= 2.538 x 10⁻⁵
The best parse tree that represents “baju ibu
sangat murah” is Parse Tree 1.
T54
Sentence: orang kaya itu bagi sedekah
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN + Pent
(0.0744)
FK → KKTr + FN (0.4815)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.0744 x 0.4815 x 0.3965
= 1.050 x 10⁻²
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN + KN + Pent (0.0744)
FS → KSN + FN (0.975)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0400 x 0.0744 x 0.975 x 0.3965
= 1.150 x 10⁻³
The best parse tree that represents “orang kaya itu bagi sedekah” is Parse Tree 1.
T55
Sentence: Pelajar itu pelajar cemerlang
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + Pent (0.1242)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.1242 x 0.2017
= 2.004 x 10⁻³
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + Pent + KN
(0.0015)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.0015 x 0.3965
= 4.758 x 10⁻⁵
The best parse tree that represents “Pelajar itu
pelajar cemerlang” is Parse Tree 1.
T56
Sentence: belon kepunyaan adik bocor.
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.3121
= 2.125 x 10⁻³
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “belon kepunyaan adik bocor” is Parse Tree 1.
T57
Sentence: Gigi adik berwarna putih
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTTr + KA (0.0483)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.2017 x 0.0483
= 7.199 x 10⁻³
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTTr + KN (0.0028)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.2017 x 0.0028
= 4.174 x 10⁻⁴
The best parse tree that represents “gigi adik berwarna putih” is Parse Tree 1.
T58
Sentence: Perut adik buntal
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.3121
= 8.876 x 10⁻³
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.3965
= 6.398 x 10⁻³
The best parse tree that represents “perut adik buntal” is Parse Tree 1.
T59
Sentence: sengat tebuan itu bisa
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN+ Pent
(0.0744)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0744 x 0.3121
= 3.274 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN+ Pent
(0.0744)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.0744 x 0.3965
= 2.360 x 10⁻³
The best parse tree that represents “sengat
tebuan itu bisa” is Parse Tree 1.
T60
Sentence: Saya daftar subjek baru semester hadapan
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KN + KN + KN
(0.0115)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.0115
= 2.206 x 10⁻³
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.0483)
FN → KN + KN + KN + KN
+ KN (0.0023)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.0483 x 0.0023
= 5.641 x 10⁻⁵
The best parse tree that represents “Saya daftar subjek baru semester hadapan” is Parse Tree 1.
T61
Sentence: Kami garuk sungai yang cetek
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KBantu + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.0008
= 1.129 x 10⁻⁴
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KN + KBantu + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.0008
= 2.275 x 10⁻⁵
The best parse tree that represents “kami garuk sungai yang cetek” is Parse Tree 1.
T62
Sentence: saya gali lubang sampai dalam
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KKTr + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.0008
= 1.129 x 10⁻⁴
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN+ KN (0.2017)
FN → KN + KKTr + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0008
= 1.291 x 10⁻⁵
The best parse tree that represents “saya gali lubang sampai dalam” is Parse Tree 1.
T63
Sentence: sayur hijau sangat segar
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.2766
= 7.886 x 10⁻³
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0008
= 2.538 x 10⁻⁵
The best parse tree that represents “sayur hijau
sangat segar” is Parse Tree 1.
T64
Sentence: kereta kepunyaan rakan amat besar
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.2766
= 1.884 x 10⁻³
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN(0.2017)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0008
= 1.291 x 10⁻⁵
The best parse tree that represents “kereta kepunyaan rakan amat besar” is Parse Tree 1.
T65
Sentence: kereta idaman saya kereta mewah
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0483
= 7.794 x 10⁻⁴
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN + KN
(0.0115)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0115
= 3.645 x 10⁻⁴
The best parse tree that represents “kereta idaman saya kereta mewah” is Parse Tree 1.
T66
Sentence: badan pesakit diabetis makin kurus
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KBantu + KA
(0.0780)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.0780
= 5.279 x 10⁻⁴
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KBantu + KA (0.0023)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0023
= 7.296 x 10⁻⁵
The best parse tree that represents “badan
pesakit diabetis makin kurus” is Parse Tree 1.
T67
Sentence: kereta buatan tempatan makin mahal
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KBantu + KA
(0.0780)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.0780
= 5.279 x 10⁻⁴
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KBantu +
KA (0.0023)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0023
= 7.296 x 10⁻⁵
The best parse tree that represents “kereta buatan tempatan makin mahal” is Parse Tree 1.
T68
Sentence: badan pelakon itu langsing
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + Pent
(0.0744)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0744 x 0.3121
= 3.274 x 10⁻³
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + Pent + KA
(0.0015)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0015
= 4.758 x 10⁻⁵
The best parse tree that represents “badan pelakon itu langsing” is Parse Tree 1.
T69
Sentence: bangunan milik kerajaan itu runtuh
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN +KN +
Pent (0.0092)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0092 x 0.3121
= 4.049 x 10⁻⁴
Parse tree 2
A → S + P (1.0000)
S → FN (1.0000)
P→ FN(0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0008
= 1.291 x 10⁻⁵
The best parse tree that represents “bangunan milik kerajaan itu runtuh” is Parse Tree 1.
T70
Sentence: kasut ayah berwarna coklat
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTTr + KA (0.0483)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.2017 x 0.0483
= 7.199 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTTr + KN (0.0028)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.2017 x 0.0028
= 4.174 x 10⁻⁴
The best parse tree that represents “kasut ayah
berwarna coklat” is Parse Tree 1.
T71
Sentence: kain sekolah pelajar itu labuh
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.141)
FN → KN + KN +KN +
Pent (0.0092)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0092 x 0.3121
= 4.049 x 10⁻⁴
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN + KN + KN
(0.0115)
FN → KA + KSN + KN +
KN (0.0071)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.0115 x 0.0071
= 8.133 x 10⁻⁵
The best parse tree that represents “kain sekolah pelajar itu labuh” is Parse Tree 1.
T72
Sentence: kami lambung pengantin lelaki itu
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KN + Pent
(0.0744)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.0744
= 1.050 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN +
Pent (0.0092)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0092
= 2.918 x 10⁻⁴
The best parse tree that represents “kami lambung pengantin lelaki itu” is Parse Tree 1.
T73
Sentence: sekolah kami sekolah harapan
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.2017
= 3.255 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “sekolah kami sekolah harapan” is Parse Tree 1.
T74
Sentence: guru saya guru besar
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.2017
= 3.255 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “guru saya
guru besar” is Parse Tree 1.
T75
Sentence: aku kopek buah limau itu.
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + KN + Pent
(0.0744)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.0744
= 1.050 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.2017)
FN → KN + KN + Pent
(0.0744)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0744
= 1.2001 x 10⁻³
The best parse tree that represents “aku kopek
buah limau itu” is Parse Tree 1.
T76
Sentence: binaan bangunan itu konkrit
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + Pent
(0.0744)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0744 x 0.3121
= 3.274 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN + Pent
(0.0744)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.0744 x 0.3965
= 2.360 x 10⁻³
The best parse tree that represents “binaan
bangunan itu konkrit” is Parse Tree 1.
T77
Sentence: jiran rumah kami amat baik
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN (0.0483)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.2766
= 1.884 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0008
= 1.291 x 10⁻⁵
The best parse tree that represents “jiran rumah
kami amat baik” is Parse Tree 1.
T78
Sentence: adik saya sangat nakal
Parse tree 1:
A→ S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.2766
= 7.886 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0008
= 2.538 x 10⁻⁵
The best parse tree that represents “adik saya
sangat nakal” is Parse Tree 1.
T79
Sentence: buah strawberi rasa masam
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTr + FN (0.4815)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.2017 x 0.4815 x 0.3965
= 2.846 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.2017
= 3.255 x 10⁻³
Parse tree 3:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “buah
strawberi rasa masam” is Parse Tree 1.
T80
Sentence: pakcik saya rakan kongsi ayah
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0483
= 7.794 x 10⁻⁴
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN + KN
(0.0115)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0115
= 3.645 x 10⁻⁴
The best parse tree that represents “pakcik saya rakan kongsi ayah” is Parse Tree 1.
T81
Sentence: kami pagar kandang itu.
Parse tree 1
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.1242
= 2.004 x 10⁻³
The best parse tree that represents “kami pagar
kandang itu” is Parse Tree 1.
T82
Sentence: kami cas generator ini
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.1242
= 2.004 x 10⁻³
The best parse tree that represents “kami cas
generator ini” is Parse Tree 1.
T83
Sentence: datuk saya pesara polis
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.1242
= 2.004 x 10⁻³
The best parse tree that represents “datuk saya pesara polis” is Parse Tree 1.
T84
Sentence: rakan saya pegawai bomba
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.1242
= 2.004 x 10⁻³
The best parse tree that represents “rakan saya pegawai bomba” is Parse Tree 1.
T85
Sentence: bunga mawar sangat wangi
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.2766
= 7.886 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0008
= 2.538 x 10⁻⁵
The best parse tree that represents “bunga
mawar sangat wangi” is Parse Tree 1.
T86
Sentence: anak kakak sangat comel
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN (0.2017)
FA → KPeng + KA (0.2766)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.2017 x 0.2766
= 7.886 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0008
= 2.538 x 10⁻⁵
The best parse tree that represents “anak kakak sangat comel” is Parse Tree 1.
T87
Sentence: abang saya bagi budak itu duit
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTr + FN (0.4815)
FN → KN + Pent + KN
(0.0015)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.2017 x 0.4815 x 0.0015
= 1.077 x 10⁻⁴
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN+KN (0.2017)
FS → KSN + FN (0.975)
FN → KN + Pent + KN
(0.0015)
Probability value
= 1.0000 x 1.0000 x 0.0400 x 0.2017 x 0.975 x 0.0015
= 1.180 x 10⁻⁵
The best parse tree that represents “abang saya bagi budak itu duit” is Parse Tree 1.
T88
Sentence: pengadil bagi mata sangat tinggi
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN + KN (0.2017)
FK → KKTr + FN (0.4815)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.2017 x 0.4815 x 0.0008
= 5.742 x 10⁻⁵
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN+KN (0.2017)
FS → KSN + FN (0.975)
FN → KN + KPeng + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0400 x 0.2017 x 0.975 x 0.0008
= 6.293 x 10⁻⁶
The best parse tree that represents “pengadil bagi mata sangat tinggi” is Parse Tree 1.
T89
Sentence: lelaki itu pengacara televisyen
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + Pent (0.1242)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.1242 x 0.2017
= 2.004 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + Pent + KN
(0.0015)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.0015 x 0.3965
= 4.758 x 10⁻⁵
The best parse tree that represents “lelaki itu
pengacara televisyen” is Parse Tree 1.
T90
Sentence: rumah milik saya baru
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.3121
= 2.125 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “rumah milik saya baru” is Parse Tree 1.
T91
Sentence: Pembetulan tesis saya minor
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + Pent
(0.0744)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0744 x 0.3121
= 3.274 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + Pent + KA
(0.0008)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0008
= 2.538 x 10⁻⁵
The best parse tree that represents “pembetulan tesis saya minor” is Parse Tree 1.
T92
Sentence: rumah kami di bandar
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN + KN (0.2017)
FS → KSN+ FN (0.975)
FN → KN (0.3965)
Probability value
= 1.0000 x 1.0000 x 0.0400 x 0.2017 x 0.975 x 0.3965
= 3.119 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KSN + KN
(0.0146)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0146
= 4.631 x 10⁻⁴
The best parse tree that represents “rumah kami di bandar” is Parse Tree 1.
T93
Sentence: penduduk kampung ke sawah padi
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FS (0.0400)
FN → KN + KN (0.2017)
FS → KSN+ FN (0.975)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0400 x 0.2017 x 0.975 x 0.2017
= 1.587 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KSN + KN +
KN (0.0038)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0038
= 1.205 x 10⁻⁴
The best parse tree that represents “penduduk kampung ke sawah padi” is Parse Tree 1.
T94
Sentence: perkakasan sekolah adik baru
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.3121
= 2.125 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10⁻²
The best parse tree that represents “perkakasan
sekolah adik baru” is Parse Tree 2.
T95
Sentence: murid sekolah itu rajin
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + Pent
(0.0744)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0744 x 0.3121
= 3.274 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + Pent + KA
(0.0015)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0015
= 4.758 x 10⁻⁵
The best parse tree that represents “murid sekolah itu rajin” is Parse Tree 1.
T96
Sentence: beliau pemimpin besar negara
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.3121
= 2.125 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “beliau pemimpin besar negara” is Parse Tree 1.
T97
Sentence: dia pelajar kolej swasta
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FA (0.1410)
FN → KN + KN + KN
(0.0483)
FA → KA (0.3121)
Probability value
= 1.0000 x 1.0000 x 0.1410 x 0.0483 x 0.3121
= 2.125 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “dia pelajar kolej swasta” is Parse Tree 1.
T98
Sentence: rakan kami pengusaha kedai perabot
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.0483
= 7.794 x 10⁻⁴
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN + KN
(0.0115)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0115
= 3.645 x 10⁻⁴
The best parse tree that represents “rakan kami pengusaha kedai perabot” is Parse Tree 1.
T99
Sentence: ibu pengusaha butik pengantin
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + KN (0.2017)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.2017
= 3.255 x 10⁻³
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN (0.3965)
FN → KN + KN + KN
(0.0483)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.3965 x 0.0483
= 1.532 x 10⁻³
The best parse tree that represents “ibu pengusaha butik pengantin” is Parse Tree 1.
T100
Sentence: beliau atlet negara Malaysia
Parse tree 1:
A → S + P (1.0000)
S → FN (1.0000)
P→ FK (0.7390)
FN → KN (0.3965)
FK → KKTr + FN (0.4815)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.7390 x 0.3965 x 0.4815 x 0.1242
= 1.752 x 10⁻²
Parse tree 2:
A → S + P (1.0000)
S → FN (1.0000)
P→ FN (0.0800)
FN → KN + KN (0.2017)
FN → KN + Pent (0.1242)
Probability value
= 1.0000 x 1.0000 x 0.0800 x 0.2017 x 0.1242
= 2.004 x 10⁻³
The best parse tree that represents “beliau atlet negara Malaysia” is Parse Tree 1.
APPENDIX D: USER MANUAL
The main interface of the Statistical Parser for Malay Language prototype is shown below. The interface is divided into two panes: the left-hand side is where the user enters a sentence, while the right-hand side displays the tagged sentence.
The main interface with the example test sentence “bapa pemandu teksi”
The tagged sentence is shown after the “Proses” button is clicked. The message below is shown if the sentence is grammatically correct.
Otherwise, the message shown below is displayed.
If the sentence is grammatically correct, the highest probability value is shown, along with the possible parse trees.
APPENDIX E: LETTERS OF APPROVAL