Fuzzy Correlation and Support Vector Learning Approach to
Multi-Categorization of Documents

Student: Tsui-Feng Hu
Advisor: Dr. Jiann-Horng Lin

A Thesis
Submitted to the Institute of Information Management
I-Shou University
in Partial Fulfillment of the Requirements
for the Master's Degree
in
Information Management

July, 2004
Kaohsiung, Taiwan
Fuzzy Correlation and Support Vector Learning Approach to
Multi-Categorization of Documents

Student: Tsui-Feng Hu
Advisor: Dr. Jiann-Horng Lin
Institute of Information Management, I-Shou University

ABSTRACT (IN CHINESE)

In this thesis, we propose a new document categorization method based on support vector learning and fuzzy correlation for solving the multi-class and multi-label categorization problems of electronic documents. Support vector machines (SVMs) are linear learning systems in a high-dimensional feature space, whose learning algorithm is derived from optimization theory and statistical learning theory. SVMs provide powerful and effective categorization algorithms that handle classification problems efficiently in high-dimensional input spaces. In addition to SVMs, we use the concept of fuzzy correlation, which measures the degree of correlation between two variables or two attributes. We employ fuzzy correlation to measure the correlation between unclassified documents and predefined categories, and to assign the unclassified documents to multiple different categories. This method not only solves multi-class classification but also handles the multi-label categorization problem.

Keywords: fuzzy correlation, support vector machines, multi-categorization of documents
Fuzzy Correlation and Support Vector
Learning Approach to
Multi-Categorization of Documents
Student: Tsui-Feng Hu
Advisor: Dr. Jiann-Horng Lin
Institute of Information Management
I-Shou University
ABSTRACT
In this thesis, we propose a new text categorization method for the multi-class and multi-label
problems based on support vector machines in conjunction with fuzzy correlation. Support vector
machines (SVMs) are learning systems that use a hypothesis space of linear functions in a high
dimensional feature space, trained with a learning algorithm from optimization theory that
implements a learning bias derived from statistical learning theory. SVMs provide efficient and
powerful categorization algorithms that are capable of dealing with high dimensional input spaces.
In addition to SVMs, we use the concept of fuzzy correlation, which can measure the degree of
correlation between two variables or two attributes. We employ fuzzy correlation to measure the
correlation between unclassified documents and predefined categories. This approach solves not
only the multi-class classification problem but also the multi-label categorization problem.
Keywords: Fuzzy Correlation, Support Vector Machines (SVMs), Multi-Categorization of
Documents
Acknowledgements
That my master's thesis could be completed smoothly I owe first and foremost to my advisor, Dr. Jiann-Horng Lin. Throughout this period he tirelessly corrected my attitude toward study, taught me to read with care and patience, guided my research direction and the writing of this thesis, and continually discussed and revised the thesis with me, so that in just two years I truly learned a great deal, was able to publish several papers, and obtained my master's degree. I offer him my sincere respect and gratitude.

I would also like to thank the members of my oral examination committee, Professor Tzung-Pei Hong of National University of Kaohsiung and Professor Wen-Yang Lin of the Institute of Information Management at our university, for taking the time to attend the oral defense and for the constructive and forward-looking suggestions they offered during it, which made my thesis more complete.

The two years of graduate life were short and demanding, but they were also a period from which I benefited greatly. I thank the professors who guided my research during these two years, including Professor Wen-Yang Lin, Professor Been-Chian Chien, and the other teachers who took part in our seminars, as well as the senior and junior students who participated in those discussions. Thank you all; it was your support and help that allowed me to complete my studies smoothly.

Finally, I thank my parents for their care and support throughout my studies, which allowed me to pursue my master's degree without worries; I express my heartfelt gratitude here. I have learned a great deal along the way, and to everyone who has helped me and quietly supported me, I offer my deepest thanks.

Tsui-Feng Hu
July 2004
Contents
ABSTRACT (IN CHINESE) ........................................................................ II
ABSTRACT ..................................................................................... III
ACKNOWLEDGEMENTS ............................................................................. IV
CONTENTS ...................................................................................... V
LIST OF FIGURES ............................................................................. VII
LIST OF TABLES ............................................................................... IX
CHAPTER 1 INTRODUCTION ........................................................................ 1
  1.1 RESEARCH BACKGROUND AND MOTIVATION ...................................................... 1
  1.2 CONTRIBUTIONS OF THE THESIS ............................................................. 4
  1.3 ORGANIZATION OF THE THESIS .............................................................. 4
CHAPTER 2 REVIEW OF RELATED WORKS ............................................................. 5
  2.1 DOCUMENT CATEGORIZATION ................................................................. 5
  2.2 REVIEW OF DOCUMENT CATEGORIZATION ...................................................... 6
  2.3 WHY USE SUPPORT VECTOR MACHINES ........................................................ 7
  2.4 WHY USE FUZZY CORRELATION ............................................................. 11
CHAPTER 3 SUPPORT VECTOR MACHINES (SVMS) .................................................... 12
  3.1 THE MAXIMAL MARGIN CLASSIFIER ......................................................... 14
  3.2 KERNEL-INDUCED FEATURE SPACES ......................................................... 15
  3.3 KERNEL FUNCTIONS ...................................................................... 16
  3.4 SOFT MARGIN OPTIMIZATION .............................................................. 18
  3.5 SUPPORT VECTOR CLASSIFICATION, CLUSTERING, REGRESSION AND FUZZY SUPPORT VECTOR MACHINES ... 18
  3.6 SUPPORT VECTOR MACHINES (SVMS) APPLICATIONS .......................................... 20
CHAPTER 4 MULTI-CLASS SUPPORT VECTOR LEARNING ............................................... 21
  4.1 ONE-AGAINST-ONE CLASSIFIERS STRATEGY ................................................. 21
  4.2 ONE-AGAINST-REST CLASSIFIERS STRATEGY ................................................ 23
  4.3 HIERARCHIES OR TREES OF BINARY SVM CLASSIFIERS STRATEGY .............................. 25
  4.4 DECISION DIRECTED ACYCLIC GRAPH (DDAG) ............................................... 25
  4.5 MULTI-CLASS CLASSIFIER COMPARISON .................................................... 27
CHAPTER 5 FUZZY CORRELATION .................................................................. 28
  5.1 FUZZY CORRELATION ..................................................................... 29
  5.2 FUZZY CORRELATION FOR MULTI-CATEGORIZATION OF DOCUMENTS .............................. 31
CHAPTER 6 FUZZY CORRELATION AND SUPPORT VECTOR LEARNING ..................................... 35
  6.1 FRAMEWORK OF THE PROPOSED APPROACH ................................................... 35
  6.2 PRE-PROCESSING ........................................................................ 38
  6.3 FUZZY CORRELATION AND OAO-SVMS FOR MULTI-CATEGORIZATION OF DOCUMENTS ................. 42
CHAPTER 7 EXPERIMENTAL RESULTS ............................................................... 50
  7.1 EXPERIMENTAL DATA SOURCE .............................................................. 50
  7.2 EXPERIMENTAL RESULTS .................................................................. 53
CHAPTER 8 CONCLUSION AND FUTURE WORK ........................................................ 79
  8.1 CONCLUDING REMARKS .................................................................... 79
  8.2 DIRECTIONS FOR FUTURE RESEARCH ....................................................... 80
BIBLIOGRAPHY ................................................................................. 81
List of Figures
FIGURE 2.1: WORD FREQUENCY IN DOCUMENTS AND ITS RELATION TO RESOLVING POWER ..................... 6
FIGURE 2.2: AUTOMATIC DOCUMENT CATEGORIZATION FLOW CHART ........................................ 7
FIGURE 3.1: BINARY CATEGORIZATION OF A SUPPORT VECTOR MACHINE .................................. 13
FIGURE 3.2: FRAMEWORK OF SUPPORT VECTOR MACHINES (SVMS) ........................................ 13
FIGURE 3.3: SVMS MAPPING ....................................................................... 17
FIGURE 3.4: SUPPORT VECTOR CLUSTERING (SVC) .................................................... 19
FIGURE 3.5: SUPPORT VECTOR REGRESSION (SVR) .................................................... 19
FIGURE 3.6: FUZZY SUPPORT VECTOR MACHINES (FSVMS) .............................................. 20
FIGURE 4.1: ONE-AGAINST-ONE CLASSIFIERS STRATEGY I, II (MAJORITY VOTING SCHEME) ............... 23
FIGURE 4.2: ONE-AGAINST-REST CLASSIFIERS STRATEGY .............................................. 24
FIGURE 4.3: HIERARCHIES OR TREES OF BINARY SVM CLASSIFIERS STRATEGY ........................... 25
FIGURE 4.4: DECISION DIRECTED ACYCLIC GRAPH (DDAG) ............................................. 26
FIGURE 6.1: FRAMEWORK OF DOCUMENT CATEGORIZATION ............................................... 35
FIGURE 6.2: FRAMEWORK OF THE PROPOSED APPROACH ................................................. 36
FIGURE 6.3: WEB FREQUENCY INDEXER WEBPAGE ...................................................... 39
FIGURE 6.4: WEB FREQUENCY INDEXER COMPUTES WORD FREQUENCY ..................................... 39
FIGURE 6.5: PICKING OUT STOP WORDS IN BAGS OF WORDS ............................................ 40
FIGURE 6.6: MULTI-CLASS SVMS (OAO-SVMS) ARCHITECTURE I ........................................ 43
FIGURE 6.7: MULTI-CLASS SVMS (OAO-SVMS) ARCHITECTURE II ....................................... 44
FIGURE 6.8: IMPROVING PRE-SET STATES IN TRAINING DATA .......................................... 48
FIGURE 6.9: IMPROVING TWO-CLASS SVMS IN THE GRAY REGION ....................................... 49
FIGURE 7.1: REUTERS-21578 DATASET FORM ......................................................... 52
FIGURE 7.2: WEB FREQUENCY INDEXER COMPUTES THE FREQUENCY OF EVERY WORD IN EVERY CHAPTER ...... 54
FIGURE 7.3: COMPUTING EVERY WORD'S FREQUENCY AND THE DOCUMENT IT BELONGS TO ................... 54
FIGURE 7.4: DATA FORM BEFORE INPUT ............................................................. 55
FIGURE 7.5: TRAINING DATA AND TEST DATA BEFORE CLASSIFICATION ................................. 62
FIGURE 7.6: DECISION FUNCTION PARAMETER VALUES ................................................. 68
FIGURE 7.7: THE CATEGORY EACH DOCUMENT BELONGS TO IN THE MULTI-CLASS SVMS .................... 69
FIGURE 7.8: ACCURACY WITH TEN CATEGORIES IN THE 50,300-DIMENSION .............................. 71
FIGURE 7.9: AVERAGE ACCURACY OF THE DIFFERENT LEARNING MACHINES ............................... 72
FIGURE 7.10: COMPUTING KEYWORDS IN THE DIFFERENT CATEGORIES ................................... 73
FIGURE 7.11: KEYWORD FREQUENCY IN THE TEN CATEGORIES .......................................... 74
FIGURE 7.12: CORRELATION MEMBERSHIP IN THE TEN CATEGORIES ..................................... 78
List of Tables
TABLE 2.1: CLASSIFICATION ACCURACY, THORSTEN JOACHIMS, 1997, REUTERS-21578 DATASET .............. 8
TABLE 2.2: CLASSIFICATION ACCURACY, THORSTEN JOACHIMS, WEBKB DATASET ............................ 8
TABLE 2.3: CLASSIFICATION ACCURACY, THORSTEN JOACHIMS, OHSUMED COLLECTION DATASET ............... 9
TABLE 2.4: CLASSIFICATION ERROR, JASON AND RYAN, 2001 ........................................... 9
TABLE 2.5: CLASSIFICATION ERROR, JASON AND RYAN, 2001 ........................................... 9
TABLE 2.6: CLASSIFICATION ERROR, FRIEDHELM SCHWENKER, 2000 ..................................... 10
TABLE 2.7: CLASSIFICATION ERROR, JOHN AND NELLO, 2000 .......................................... 10
TABLE 2.8: CORRELATION BETWEEN TWO UNCLASSIFIED DOCUMENTS AND FOUR PREDEFINED CATEGORIES ...... 11
TABLE 4.1: MULTI-CLASS CLASSIFIER COMPARISON ................................................... 27
TABLE 7.1: DOCUMENTS OWNED BY EACH CATEGORY .................................................... 51
TABLE 7.2: DIFFERENT CATEGORIES WITH TRAINING NUMBERS .......................................... 51
TABLE 7.3: ACCURACY COMPARISON WITH DIFFERENT METHODS .......................................... 70
TABLE 7.4: ACCURACY IN THE DIFFERENT DIMENSIONS ................................................ 71
TABLE 7.5: FREQUENCY WITH WHICH EVERY DOCUMENT APPEARS IN THE DIFFERENT CATEGORIES ............ 74
TABLE 7.6: CORRELATION BETWEEN EVERY DOCUMENT AND THE TEN CATEGORIES .......................... 75
TABLE 7.7: MULTI-CATEGORIES OF DOCUMENTS ....................................................... 77
CHAPTER 1
INTRODUCTION
1.1 Research Background and Motivation
There are billions of text documents available in electronic form. These collections
represent a massive amount of easily accessible information. However, seeking relevant
information in such a huge collection requires organization. With the rapid growth of online
information, document categorization has become one of the key techniques for handling and
organizing text data, and it can be greatly aided by automated classifier systems, whose accuracy
determines their usefulness. Text categorization is the task of assigning a text document to the
appropriate category or categories from a predefined set of categories. Originally, research
in text categorization addressed the binary problem, where a document is either relevant or not
with respect to a given category. In real-world situations, however, the great variety of different
sources, and hence categories, usually poses a multi-class classification problem, where a
document belongs to exactly one category selected from a predefined set [33][84][85][86].
Even more general is the multi-label problem, where a document can be classified
into more than one category. While binary and multi-class problems (single-categorization of
documents) have been investigated extensively [52], multi-label problems (multi-categorization of
documents) have received very little attention [87].
In our thesis, we propose a new text categorization method for the multi-class and
multi-label problems based on support vector machines in conjunction with fuzzy correlation.
The concept of support vector machines was proposed by Vapnik in 1995 on the foundation of
statistical learning theory. Support vector learning resembles the perceptron in neural networks in
that both are classifier models. Because support vector machines achieve high accuracy and offer
related models such as support vector regression and support vector clustering, they are well
suited to document categorization.
Support vector machines (SVMs) are learning systems that use a hypothesis space of
linear functions in a high dimensional feature space, trained with a learning algorithm from
optimization theory that implements a learning bias derived from statistical learning theory. To
find out what methods are promising for learning document classifiers, we should look at the
properties of text: (1) high dimensional input space; (2) few irrelevant features; (3)
document vectors are sparse; (4) most document categorization problems are linearly separable.
In [33], Joachims published results on a set of binary text classification experiments using the
SVM. The SVM yields lower error than many other classification techniques. Yang and Liu [78]
followed later with experiments of their own on the same data set. They used improved versions of
naïve Bayes (NB) and k-nearest neighbors (kNN) but still found that the SVM performed at
least as well as all the other classifiers they tried. Both papers used the SVM for binary text
classification, leaving the multi-class problem (assigning a single label to each example) open
for future research. The multi-class classification problem refers to assigning each
observation to one of k classes. As two-class problems are much easier to solve, many
authors propose to use two-class classifiers for multi-class classification. Berger and Ghani
individually chose to attack the multi-class text classification problem using error-correcting
output codes (ECOC) [76][79]. They both chose to use naïve Bayes as the binary classifier.
ECOC combines the outputs of many individual binary classifiers in an additive fashion to
produce a single multi-class output. It works in two stages: first, independently construct many
subordinate classifiers, each responsible for removing some uncertainty about the correct class
of the input; second, apply a voting scheme to decide upon the correct class, given the output
of each weak learner. In our thesis, we focus on techniques that provide a multi-class
classification solution by combining all pairwise comparisons. A common way to combine
pairwise comparisons is by voting [80]: a rule is constructed for discriminating between every pair
of classes, and the class with the most winning two-class decisions is selected. Though the
voting procedure requires just pairwise decisions, it only predicts a class label. In many
scenarios, however, probability estimates are desired. As numerous (pairwise) classifiers do
provide class probabilities, several authors [81][82] have proposed probability estimates obtained by
combining the pairwise class probabilities. A parametric approach was proposed by Platt [83],
which consists of finding the parameters of a sigmoid function mapping the scores into
probability estimates. SVMs learn a decision boundary between two classes by mapping the
training examples onto a higher dimensional space and then determining the optimal separating
hyperplane in that space. Given a test example, the SVM outputs a score that gives the
distance of the test example from the separating hyperplane. The sign of the score indicates to
which class the test example belongs. In our approach, we want to have a measure of confidence
(belief) in the classification. The final decision is based on the classifier with maximum
confidence. To provide an accurate measure of confidence, we adopt the concept of fuzzy
correlation to determine the relation of an unclassified document to the various categories. Fuzzy
correlation can not only measure the degree of relation between text documents and the predefined
categories but also reveal whether the correlation between them is positive, negative, or
irrelevant.
1.2 Contributions of The Thesis
The main contributions of this thesis are:

1. We present an efficient method for producing class membership estimates for the multi-class
text categorization problem.

2. Based on SVM binary classifiers in conjunction with the class membership, we propose a
measure of confidence given by the fuzzy correlation. An acknowledged deficiency of
SVMs is that their uncalibrated outputs do not provide estimates of the posterior probability of
class membership. Our approach solves not only the multi-class classification problem but also
the multi-label categorization problem.
1.3 Organization of The Thesis
The remainder of this thesis is organized as follows. Related work on document categorization,
the reasons for using support vector learning technology, and fuzzy correlation are briefly
reviewed in Chapter 2. Support vector machines (SVMs) are described in Chapter 3. Multi-class
SVM classifier strategies are discussed in Chapter 4. Fuzzy correlation is presented in Chapter 5.
The fuzzy correlation and support vector learning approach to multi-categorization of documents
is developed in Chapter 6. Experimental results for this new approach are given in Chapter 7.
Finally, conclusions and future work are given in Chapter 8.
CHAPTER 2
REVIEW OF RELATED WORKS
In this chapter, we review research related to this thesis, including document categorization,
fuzzy correlation, and machine learning technology.
2.1 Document Categorization
In the past, documents were read, analyzed, and classified by groups of specialists. This manual
process is slow and labor-intensive, so comprehensive categorization could not be constructed
quickly and effectively. Recently, the amount of information on networks has grown enormously
thanks to advances in information technology, and most data are gradually being converted into
electronic formats. Consequently, obtaining significant information rapidly and accurately has
become an essential issue in this period of information explosion.

Document categorization is a sub-domain of text mining, which, like data mining, is a fairly
mature technology for analyzing data. Text mining combines data mining, information extraction,
information retrieval, document classification, probabilistic modeling, linear algebra, machine
learning, and computational linguistics to discover structure, patterns, and knowledge in large
textual corpora. Text mining is therefore a broad domain that draws on many kinds of research
methods, for example computer science, mathematics and statistics, information retrieval, and
artificial intelligence. Applying these methods appropriately to document categorization is not
easy, and up to the present it is still not clear which one is the most suitable. Because most of the
research is theoretical, applying it in the real network world will still require considerable effort.
2.2 Review of Document Categorization
Document categorization [6][32][54][76] would be extremely time-consuming and wasteful if it
were done manually. For this reason, automatic document categorization techniques have been
developed. The earliest automatic technique applied rule-based expert systems to document
categorization. This method requires constructing rules and membership functions, and its
behavior is difficult to correct. Since then, several categorization methods have successively
appeared. These methods have four advantages: 1) they are easy to construct and update; 2) they
easily provide users with relevant data from the source material; 3) they can be built around the
categories that interest users; 4) they provide accurate decisions to users. These advantages remedy
the shortcomings of the earliest expert systems.
Figure 2.1 Word frequency in documents and its relation to resolving power (words are ordered by decreasing frequency; nonsignificant high-frequency terms and nonsignificant low-frequency terms bracket the band of significant words with the presumed resolving power)
Figure 2.2 Automatic document categorization flow chart (training documents are classified manually into categories, the machine then classifies and files the extracted documents, and the filed documents are confirmed manually)
Statistical and machine learning techniques for document categorization
[2][4][5][19][31][35][36][42][43][52][53][60][67][76] have gradually matured. Examples include
classification and regression trees, multivariate regression models, nearest-neighbor classifiers,
decision trees, neural networks, genetic algorithms, symbolic rule learning, naïve Bayes rules,
Bayesian networks, and support vector machines for document categorization.
2.3 Why Use Support Vector Machines
Accuracy is a key criterion for evaluating document categorization methods. After reviewing a
large body of research reports and literature, we find that support vector machines (SVMs)
[1][2][3][7][8][9][11][12][14][16][20][21][23][24][25][26][28][29][37][38][40][41][44][45][46][50][55][56][57][65][66][72][73]
applied to document categorization generally achieve higher accuracy than the other common
machine learning categorization methods. This motivates our use of support vector machines
(SVMs) for document categorization.

The experimental results reported in the literature, summarized in the following tables, show that
support vector machines (SVMs) indeed achieve superior classification accuracy.
Table 2.1 Classification accuracy from [33], Thorsten Joachims, 1997, Reuters-21578 dataset

Category        Naïve Bayes  Rocchio  C4.5   k-NN   SVM (RBF r=0.5)  SVM (RBF r=1.0)  SVM (RBF r=1.2)
Earn            95.9         96.1     96.1   97.3   98.5             98.4             98.3
Acq             91.5         92.1     85.3   92.0   95.0             95.3             95.4
Money-fx        62.9         67.6     69.4   78.2   74.0             76.3             75.9
Grain           72.5         79.5     89.1   82.2   93.1             91.9             90.6
Crude           81.0         81.5     75.5   85.7   88.9             88.9             88.2
Trade           50.0         77.4     59.2   77.4   76.9             77.8             76.8
Interest        58.0         72.5     49.1   74.0   74.4             76.2             76.1
Ship            78.7         83.1     80.9   79.2   85.4             87.6             87.1
Wheat           60.6         79.4     85.5   76.6   85.2             85.9             85.9
Corn            47.3         62.2     87.7   77.9   85.1             85.7             84.5
Micro-average   72.0         79.9     79.4   82.3   86.4             86.3             86.2
Table 2.2 Classification accuracy from [34], Thorsten Joachims, WebKB dataset

              Course  Faculty  Project  Student  Average
Naïve Bayes   57.2    42.4     21.4     63.5     46.1
SVM           68.7    52.5     37.5     70.0     57.2
Table 2.3 Classification accuracy from [34], Thorsten Joachims, Ohsumed collection dataset

              Pathology  Cardiovascular  Neoplasms  Nervous system  Immunologic  Average
Naïve Bayes   39.6       49.0            53.1       28.1            28.3         39.6
SVM           41.8       58.0            65.1       35.5            42.8         48.6
Table 2.4 Classification error from [48], Jason and Ryan, 2001, 20 Newsgroups dataset

Training examples  Classifier  Ova    Dense 15  BCH 15  Dense 31  BCH 31  Dense 63  BCH 63
800                SVM         0.131  0.142     0.145   0.135     0.131   0.129     0.125
800                NB          0.146  0.176     0.169   0.168     0.153   0.154     0.145
250                SVM         0.167  0.193     0.196   0.180     0.173   0.171     0.164
250                NB          0.199  0.222     0.225   0.214     0.198   0.198     0.188
100                SVM         0.214  0.251     0.262   0.233     0.224   0.222     0.213
100                NB          0.277  0.282     0.311   0.267     0.259   0.256     0.245
30                 SVM         0.311  0.366     0.415   0.348     0.333   0.326     0.312
30                 NB          0.455  0.431     0.520   0.428     0.438   0.407     0.390
Table 2.5 Classification error from [48], Jason and Ryan, 2001, Industry Sector dataset

Training examples  Classifier  Ova    Dense 15  BCH 15  Dense 31  BCH 31  Dense 63  BCH 63
800                SVM         0.072  0.119     0.106   0.083     0.076   0.072     0.067
800                NB          0.357  0.191     0.182   0.145     0.140   0.135     0.128
250                SVM         0.176  0.283     0.261   0.216     0.198   0.189     0.176
250                NB          0.568  0.363     0.352   0.301     0.292   0.279     0.272
100                SVM         0.341  0.461     0.438   0.394     0.371   0.363     0.343
100                NB          0.725  0.542     0.518   0.482     0.462   0.453     0.443
30                 SVM         0.650  0.738     0.717   0.701     0.676   0.674     0.653
30                 NB          0.885  0.805     0.771   0.769     0.743   0.745     0.734
Table 2.6 Classification error from [51], Friedhelm Schwenker, 2000

Classifier   MLP    5-NN   LVQ    RBF    SVM-1-R  SVM-1-1  SVM-TR
Error (%)    2.41   2.34   3.01   1.51   1.40     1.37     1.39

MLP: multi-layer perceptron
5-NN: 5-nearest-neighbor classifier
LVQ: learning vector quantization, trained with Kohonen's software package (OLVQ1 & OLVQ3)
SVM-1-R: one-against-rest strategy
SVM-1-1: one-against-one strategy
SVM-TR: hierarchies or trees of binary SVM classifiers strategy
Table 2.7 Classification error from [47], John and Nello, 2000

                        1-v-r   Max Wins   DDAG
C                       100     100        100
σ (kernel parameter)    3.58    5.06       5.06
Error Rate (%)          4.7     4.5        4.4

1-v-r: one-against-rest strategy; Max Wins: Max Wins algorithm; DDAG: decision directed acyclic graph
From the tables above, and from the statistical significance tests reported in the literature by
Yang and Liu, {SVM, kNN} > LLSF > multilayer perceptrons >> multinomial naïve Bayes (among
these five classifiers, the SVM has the best categorization performance). Likewise, by the
micro-averaged performance on Reuters reported by Joachims, SVM (0.864) > kNN (0.823) >
{Rocchio (0.799), C4.5 (0.794)} > naïve Bayes (0.72); the SVM again has the best categorization
performance. The clear message is that support vector machines (SVMs) are more suitable for
document categorization than the other methods. With regard to the multi-class categorization
decision, the one-against-one classifiers strategy is superior to the others, so we adopt the
one-against-one SVM classifiers strategy (OAO-SVMs) [3][9][25][36][37][38][45][47][48][65],
which is similar in structure to the decision directed acyclic graph (DDAG) classifiers strategy
but simpler and more appropriate as a multi-class classification decision structure than the other
classifier strategies.
2.4 Why Use Fuzzy Correlation
A fuzzy set is a mathematical model for expressing linguistic information; it is a tool for
quantifying ambiguous meaning. We employ fuzzy correlation to measure the correlation between
two variables or two attributes. By using fuzzy correlation [10][17][22][70], we avoid setting a
threshold subjectively, which would otherwise call the threshold into question.
Table 2.8 Correlation between two unclassified documents and four predefined categories

             Categorization A  Categorization B  Categorization C  Categorization D
Document 1   0.90              0.65              0.68              0.92
Document 2   0.41              0.63              0.44              0.62

             Categorization A  Categorization B  Categorization C  Categorization D
Document 1   0.12              0.68              0.69              0.71
Document 2   0.91              0.93              0.23              0.69
As the table above illustrates, we can employ fuzzy correlation to compute the degree of
correlation between unclassified documents and predefined categories, and combine it with
learning machines for classification. Fuzzy correlation measures the correlation degree between
unclassified documents and predefined categories, so with the fuzzy correlation measure the SVMs
can assign unclassified documents to multiple categories, whereas on their own SVMs can only
assign an unclassified document to a single, specific predefined category.
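As a concrete illustration, the short Python sketch below (our own illustration, not part of the thesis experiments) turns correlation scores like those in Table 2.8 into category assignments: a single-label decision keeps only the category with the maximum score, while a multi-label decision keeps every category whose correlation exceeds a threshold. The scores and the 0.80 threshold are hypothetical.

# Hypothetical correlation scores between two unclassified documents and four
# predefined categories, in the spirit of Table 2.8.
correlations = {
    "Document 1": {"A": 0.90, "B": 0.65, "C": 0.68, "D": 0.92},
    "Document 2": {"A": 0.41, "B": 0.63, "C": 0.44, "D": 0.62},
}

THRESHOLD = 0.80  # illustrative cut-off, not the thesis' alpha-cut value

for doc, scores in correlations.items():
    # Single-label decision: keep only the category with the maximum correlation.
    single = max(scores, key=scores.get)
    # Multi-label decision: keep every category whose correlation reaches the threshold.
    multi = [c for c, r in scores.items() if r >= THRESHOLD]
    print(doc, "single-label:", single, "multi-label:", multi)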
CHAPTER 3
SUPPORT VECTOR MACHINES
(SVMS)
Support vector machines (SVMs) are a system for efficiently training linear learning
machines in kernel-induced feature spaces, and they can be used for pattern categorization and
nonlinear regression. In categorization, the main idea of a support vector machine is to construct
a hyperplane as the decision surface in such a way that the margin of separation between
positive and negative examples is maximized. The machine achieves this desirable property by
following a principled approach rooted in statistical learning theory. From the perspective of
statistical learning theory, the motivation for considering binary-classifier SVMs comes from
theoretical bounds on the generalization error (the theoretical generalization performance on
new data). These generalization bounds have two important features. First, the upper bound on
the generalization error does not depend on the dimension of the space. Second, the error
bound is minimized by maximizing the margin $\gamma$, i.e. the minimal distance between the
hyperplane separating the two classes and the data points closest to the hyperplane, as shown in
Figure 3.1. Accordingly, support vector machines (SVMs) provide good generalization
performance and avoid the overfitting problem in pattern categorization. This attribute is
unique to support vector machines (SVMs).
Figure 3.1 Binary categorization of a support vector machine (the optimal hyperplane maximizes the margin between the positive and negative examples)
A support vector machine is a linear binary classifier: the two data sets are labeled with the
predefined class values +1 and -1, and the classifier separates them with a decision function. The
labeled data are fed into a linear function that is trained iteratively until the optimal decision
function for the two data sets is obtained; the optimal decision function gives the hyperplane with
the maximum margin that separates the two data sets. SVM training is therefore essentially an
optimization problem.

Figure 3.2 Framework of support vector machines (SVMs): the separating hyperplane $w^T x + b = 0$ between the regions labeled $+1$ and $-1$
3.1 The Maximal Margin Classifier
Let us consider a binary categorization task with data points $x_i$, $i = 1, \ldots, m$, having
corresponding labels $y_i = \pm 1$, and let the decision function [14][50][64][73] be:

$$f(x) = \operatorname{sign}(w \cdot x + b) \qquad (3.1)$$

The hyperplane parameters of the decision function are $(w, b)$. If the data set is separable, then
the data will be correctly classified if $y_i (w \cdot x_i + b) > 0 \;\; \forall i$. We implicitly define a scale for
$(w, b)$ to give the canonical hyperplane, such that $w \cdot x + b = +1$ for the closest points on one side
and $w \cdot x + b = -1$ for the closest points on the other side. For the separating hyperplane
$w \cdot x + b = 0$ the normal vector is clearly $w / \|w\|_2$. Since $w \cdot x + b = +1$ and $w \cdot x + b = -1$ for
the closest points, the margin is $\gamma = 1 / \|w\|_2$. To maximize the margin, the task is therefore:

Minimize

$$\frac{1}{2} \|w\|_2^2 \qquad (3.2)$$

subject to the constraints:

$$y_i (w \cdot x_i + b) \ge 1 \quad \forall i \qquad (3.3)$$

and the learning task can be reduced to minimization of the primal Lagrangian:

$$L = \frac{1}{2} (w \cdot w) - \sum_{i=1}^{N} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] \qquad (3.4)$$

where the $\alpha_i$ are the Lagrange multipliers (hence $\alpha_i \ge 0$). From Wolfe's theorem, we can take
the derivatives with respect to $b$ and $w$ to obtain

$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0, \qquad \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i \qquad (3.5)$$

and re-substitute back into the primal (3.4) to give the Wolfe dual Lagrangian:

$$W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \qquad (3.6)$$

which must be maximized with respect to the $\alpha_i$ subject to the constraints:

$$\alpha_i \ge 0 \qquad (3.7)$$

$$\sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (3.8)$$

Solving Equation (3.6) with the constraints (3.7) and (3.8) determines the Lagrange
multipliers, and the optimal separating hyperplane is given by

$$\hat{w} = \sum_{i \in SVs} \hat{\alpha}_i y_i x_i, \qquad \hat{b} = -\frac{1}{2} \, \hat{w} \cdot (x_r + x_s)$$

where $SVs$ denotes the set of support vectors, whose Lagrange multipliers are positive.
The bias term $\hat{b}$ is computed here using two support vectors, $x_r$ and $x_s$, which are any
support vectors taken from each class satisfying

$$\alpha_r, \alpha_s > 0, \qquad y_r = -1, \; y_s = 1 \qquad (3.9)$$

but it can be computed using all the support vectors on the margin for stability.
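The following minimal sketch, assuming scikit-learn and a made-up two-dimensional toy data set, illustrates the maximal margin classifier of this section: the weight vector of Equation (3.5) recovered from the dual coefficients, the support vectors, the margin $1/\|w\|_2$, and the decision rule of Equation (3.1). It is an illustration only, not the implementation used in this thesis.

# A minimal sketch (not from the thesis) of the maximal margin classifier of
# Section 3.1, using scikit-learn; the toy data points are made up.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5],   # class -1
              [4.0, 4.5], [5.0, 5.0], [4.5, 3.5]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A linear kernel with a very large C approximates the hard-margin classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# w = sum_i alpha_i * y_i * x_i over the support vectors (Equation 3.5);
# dual_coef_ already stores alpha_i * y_i.
w = clf.dual_coef_ @ clf.support_vectors_
b = clf.intercept_

print("support vectors:", clf.support_vectors_)
print("w:", w, "b:", b)
print("margin = 1 / ||w||:", 1.0 / np.linalg.norm(w))
# f(x) = sign(w . x + b), Equation (3.1)
print("prediction for [3, 3]:", clf.predict([[3.0, 3.0]]))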
3.2 Kernel-Induced Feature Spaces
For the dual Lagrangian (3.6) we notice that the data points $x_i$ only appear inside an
inner product. To get a better representation of the data we can therefore map the data points
into an alternative, higher dimensional space, called feature space, through the replacement:

$$x_i \cdot x_j \;\rightarrow\; \phi(x_i) \cdot \phi(x_j) \qquad (3.10)$$

The functional form of the mapping $\phi(x_i)$ does not need to be known, since it is implicitly
defined by the choice of kernel [12][14]:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \qquad (3.11)$$

After substituting the data inner product $x_i \cdot x_j$ with $K(x_i, x_j)$, the dual Lagrangian (3.6)
becomes

$$W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (3.12)$$

where $K(x, y)$ is the kernel function performing the non-linear mapping into feature space,
and the constraints are unchanged:

$$\alpha_i \ge 0 \qquad (3.13)$$

$$\sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (3.14)$$

Solving Equation (3.12) with the constraints (3.13) and (3.14) determines the Lagrange
multipliers, and a hard margin classifier in the feature space is given by

$$f(x) = \operatorname{sign}\!\left( \sum_{i \in SVs} \hat{\alpha}_i y_i K(x_i, x) + \hat{b} \right) \qquad (3.15)$$

where

$$\hat{b} = -\frac{1}{2} \sum_{i \in SVs} \hat{\alpha}_i y_i \left[ K(x_r, x_i) + K(x_s, x_i) \right] \qquad (3.16)$$

The bias is computed here using two support vectors, but it can be computed using all the
support vectors on the margin for stability.
3.3 Kernel Functions
The idea of a kernel function [14][44][73] is to enable operations to be performed in the
input space rather than the potentially high dimensional feature space. Hence the inner product
does not need to be evaluated in the feature space. This provides a way of addressing the curse
of dimensionality. In the following subsections, we introduce common choices of kernel function.

A. Polynomial

A polynomial mapping is a popular method for non-linear modeling:

$$K(x, y) = (x \cdot y)^d, \qquad K(x, y) = (x \cdot y + 1)^d \qquad (3.17)$$

where $d = 1, 2, \ldots$. The second kernel is usually preferable as it avoids problems with the
Hessian matrix becoming zero.

B. Gaussian Radial Basis Function

Radial basis functions have received significant attention, most commonly with
a Gaussian of the form:

$$K(x, y) = \exp\!\left( -\frac{\|x - y\|^2}{2\sigma^2} \right) \qquad (3.18)$$

An attractive feature [15][30][49][68] of the SVM is that this selection is implicit,
with each support vector contributing one local Gaussian function, centered at that data point.

Figure 3.3 SVMs mapping
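As a small illustration (ours, with arbitrary example vectors), the polynomial kernel of Equation (3.17) and the Gaussian RBF kernel of Equation (3.18) can be evaluated directly from their definitions:

# A small sketch (ours) of the kernel functions of Section 3.3, evaluated
# directly from their definitions; the vectors x and y are arbitrary examples.
import numpy as np

def polynomial_kernel(x, y, d=2):
    # K(x, y) = (x . y + 1)^d, the second form of Equation (3.17)
    return (np.dot(x, y) + 1.0) ** d

def gaussian_rbf_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), Equation (3.18)
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0, 0.5])
y = np.array([0.0, 1.5, 1.0])
print("polynomial:", polynomial_kernel(x, y))
print("gaussian rbf:", gaussian_rbf_kernel(x, y))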
3.4 Soft Margin Optimization
Most real-life data sets contain noise, and SVMs [39][59][71] can fit this noise, leading to
poor generalization. The effect of outliers and noise can be reduced by introducing a
soft margin, and two schemes, the $l_1$ error norm and the $l_2$ error norm, are currently used. The
justification for these soft margin techniques comes from statistical learning theory, but they can
readily be viewed as relaxations of the hard margin constraint (3.3). Thus for the $l_1$ error norm we
introduce a positive slack variable $\xi_i$ into (3.3):

$$y_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \xi_i \quad \forall i \qquad (3.19)$$

and the task is now to minimize the sum of errors $\sum_{i=1}^{N} \xi_i$ in addition to $\|w\|^2$:

$$\min \; \frac{1}{2} (w \cdot w) + C \sum_{i=1}^{N} \xi_i \quad \text{subject to } y_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \xi_i \;\; \forall i \qquad (3.20)$$

For the $l_2$ error norm, the task is to minimize the sum of squared errors $\sum_{i=1}^{N} \xi_i^2$ in addition to
$\|w\|^2$:

$$\min \; \frac{1}{2} (w \cdot w) + C \sum_{i=1}^{N} \xi_i^2 \quad \text{subject to } y_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \xi_i \;\; \forall i \qquad (3.21)$$
3.5 Support Vector Classification, Clustering, Regression
and Fuzzy Support Vector Machines
Support vector learning [6][11][12][13][14][77] can be applied to support vector
classification [1][2][3][7][8][9][16][19][23][24][25][26][28][29][31][33][34], support vector
clustering (SVC) [27][49][77], and support vector regression (SVR) [4][20][21][57]. Support
vector learning carries the support vector machine (SVM) concept over into support vector
clustering (SVC) and support vector regression (SVR), letting SVMs apply to a much wider
domain. In addition to SVR and SVC, fuzzy theory has recently been added to SVMs, yielding the
so-called fuzzy support vector machines (FSVMs). In FSVMs, the categorization boundary is
divided into different degrees according to the fuzzy membership values, which makes
categorization more flexible and improves categorization accuracy.

Figure 3.4 Support vector clustering (SVC)
Figure 3.5 Support vector regression (SVR)
Figure 3.6 Fuzzy support vector machines (FSVMs): class boundary with membership function
3.6 Support Vector Machines (SVMs) Applications

Support vector machines (SVMs) can be applied in the following domains:

A. Document categorization [30][58][61][62][63][69]
   - A kernel from IR applied to information filtering
B. Image recognition
   - Aspect-independent classification
   - Colour-based classification
C. Hand-written digit recognition
D. Bioinformatics
   - Protein homology detection
   - Gene expression
E. Commerce and finance
CHAPTER 4
MULTI-CLASS SUPPORT VECTOR
LEARNING
Support vector machines (SVMs) are binary classifiers, and a single classifier alone cannot
separate multiple categories. Therefore, we need other methods, or many binary classifiers
combined into a classifier strategy model. At present, there are four common classifier models:
the one-against-one classifiers strategy, the one-against-rest classifiers strategy, the hierarchies or
trees of binary SVM classifiers strategy, and the decision directed acyclic graph (DDAG). Each of
these classifier strategy models [3][9][25][36][37][43][45][47][48][51][65][66] has its own
advantages and disadvantages; we describe the four models below.
4.1 One-against-One Classifiers Strategy (OAO)
The one-against-one classifiers strategy structure has $N(N-1)/2$ classifiers, and it employs a
majority voting scheme to combine the classifiers and evaluate the final classification result. The
maximum number of classifiers never exceeds $N(N-1)/2$. For each pair of classes $(i, j)$, a binary
SVM is trained by solving:

Minimize:

$$\frac{1}{2} (w^{ij})^T w^{ij} + C \sum_t \xi_t^{ij}$$

Subject to:

$$(w^{ij})^T \phi(x_t) + b^{ij} \ge 1 - \xi_t^{ij}, \quad \text{if } y_t = i$$

$$(w^{ij})^T \phi(x_t) + b^{ij} \le -1 + \xi_t^{ij}, \quad \text{if } y_t = j$$

$$\xi_t^{ij} \ge 0$$
For example, with four classes (superior scholars, ordinary scholars, maimed scholars, and aboriginal scholars), the one-against-one strategy builds one binary classifier for each pair of classes, i.e. six classifiers A-F in total (superior vs. ordinary, superior vs. maimed, superior vs. aboriginal, ordinary vs. maimed, ordinary vs. aboriginal, and maimed vs. aboriginal), and the final category is chosen by majority voting over the six pairwise decisions.

Figure 4.1 One-against-one classifiers strategy I, II (majority voting scheme)
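A minimal Python sketch of the one-against-one strategy with majority voting is given below; it is our own illustration on synthetic four-class data using scikit-learn binary SVMs, not the thesis implementation.

# A minimal sketch (ours) of the one-against-one strategy of Section 4.1:
# one binary SVM per pair of classes and a majority voting scheme over the
# N(N-1)/2 pairwise decisions. The toy data are synthetic.
from itertools import combinations
from collections import Counter
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
centers = {0: (0, 0), 1: (4, 0), 2: (0, 4), 3: (4, 4)}          # four classes
X = np.vstack([rng.normal(c, 0.8, (30, 2)) for c in centers.values()])
y = np.repeat(list(centers.keys()), 30)

# Train one binary classifier for every pair of classes.
pairwise = {}
for i, j in combinations(centers.keys(), 2):
    mask = (y == i) | (y == j)
    pairwise[(i, j)] = SVC(kernel="linear").fit(X[mask], y[mask])

def predict_oao(x):
    votes = Counter(clf.predict([x])[0] for clf in pairwise.values())
    return votes.most_common(1)[0][0]        # class with the most winning decisions

print("OAO prediction for (3.8, 3.9):", predict_oao(np.array([3.8, 3.9])))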
4.2 One-against-Rest Classifiers Strategy (OAR)
The one-against-rest classifiers strategy structure has $N-1$ classifiers if there are $N$
categories. It employs one classifier to separate one category, and the remaining categories are
separated one by one by the other classifiers. For each class $i$, a binary SVM is trained by solving:

Minimize:

$$\frac{1}{2} (w^i)^T w^i + C \sum_{j=1}^{l} \xi_j^i$$

Subject to:

$$(w^i)^T \phi(x_j) + b^i \ge 1 - \xi_j^i, \quad \text{if } y_j = i$$

$$(w^i)^T \phi(x_j) + b^i \le -1 + \xi_j^i, \quad \text{if } y_j \ne i$$

$$\xi_j^i \ge 0, \quad j = 1, 2, \ldots, l$$

and the class of $x$ is determined by

$$\text{class of } x = \arg\max_{i = 1, 2, \ldots, k} \left( (w^i)^T \phi(x) + b^i \right)$$
For example, with the four classes superior scholars, ordinary scholars, maimed scholars, and aboriginal scholars, the first classifier separates superior scholars (+1) from the remaining classes (-1), the second separates ordinary scholars from the remaining classes, and the last separates aboriginal scholars from maimed scholars.

Figure 4.2 One-against-rest classifiers strategy
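Analogously, the following sketch (ours, on synthetic data) implements the one-against-rest decision rule shown above; it follows the common variant with one binary SVM per class and an argmax over the per-class decision values $(w^i)^T x + b^i$.

# A minimal sketch (ours) of the one-against-rest decision rule of Section 4.2.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(2)
centers = [(0, 0), (4, 0), (0, 4), (4, 4)]
X = np.vstack([rng.normal(c, 0.8, (30, 2)) for c in centers])
y = np.repeat(np.arange(len(centers)), 30)

# One classifier per class: that class is +1, everything else is -1.
classifiers = []
for i in range(len(centers)):
    target = np.where(y == i, 1, -1)
    classifiers.append(SVC(kernel="linear").fit(X, target))

def predict_oar(x):
    scores = [clf.decision_function([x])[0] for clf in classifiers]
    return int(np.argmax(scores))            # class with the largest decision value

print("OAR prediction for (0.2, 3.7):", predict_oar(np.array([0.2, 3.7])))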
4.3 Hierarchies or Trees of Binary SVM Classifiers
Strategy
Binary classifiers are arranged in a hierarchical tree structure, and the number of classifiers is not
fixed; it depends on how the data are distributed. In the final output, the individual categories are
not related to one another.
For example, with SS: superior scholars, OS: ordinary scholars, MS: maimed scholars, AS: aboriginal scholars, the root classifier splits {SS, OS, MS, AS} into {SS, OS} and {MS, AS}, and two further classifiers then split {SS, OS} into SS and OS and {MS, AS} into MS and AS.

Figure 4.3 Hierarchies or trees of binary SVM classifiers strategy
4.4 Decision Directed Acyclic Graph (DDAG)
The decision directed acyclic graph (DDAG) method was proposed by Platt, Cristianini, and
Shawe-Taylor. It combines the one-against-one classifiers strategy with the hierarchies or trees of
binary SVM classifiers strategy. DDAG and the one-against-one classifier are the same in the
training phase. The number of classifiers is fixed at $N(N-1)/2$ at most, and the hierarchical tree
structure improves the situation in which the output classifiers of the individual categories are
unrelated. DDAG is well suited to the multi-class classifier because it makes both the number of
classifiers and the output structure more complete.
For example, with SS: superior scholars, OS: ordinary scholars, MS: maimed scholars, AS: aboriginal scholars, the root node evaluates SS vs AS on {SS, OS, MS, AS}; the losing class is eliminated at each node, and the remaining pairwise comparisons are evaluated along the path until a single class (SS, OS, MS, or AS) is left.

Figure 4.4 Decision directed acyclic graph (DDAG)
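The DDAG evaluation can be sketched as follows (our own illustration on synthetic data, not the thesis code): the pairwise one-against-one classifiers are reused, and at each node the losing class is removed from the list of remaining classes until only one class is left after $N-1$ comparisons.

# A rough sketch (ours) of DDAG evaluation as described in Section 4.4.
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(3)
centers = [(0, 0), (4, 0), (0, 4), (4, 4)]
X = np.vstack([rng.normal(c, 0.8, (30, 2)) for c in centers])
y = np.repeat(np.arange(len(centers)), 30)

pairwise = {}
for i, j in combinations(range(len(centers)), 2):
    mask = (y == i) | (y == j)
    pairwise[(i, j)] = SVC(kernel="linear").fit(X[mask], y[mask])

def predict_ddag(x):
    remaining = list(range(len(centers)))
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]
        winner = pairwise[(min(i, j), max(i, j))].predict([x])[0]
        # The losing class of this i-vs-j node is eliminated from the list.
        remaining.remove(j if winner == i else i)
    return remaining[0]

print("DDAG prediction for (3.9, 0.1):", predict_ddag(np.array([3.9, 0.1])))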
4.5 Multi-Class Classifier Comparison
Table 4.1 Multi-class classifier comparison

One-against-One classifiers strategy
  Advantages: 1. Maximum and steady quantity of classification implements; 2. High accuracy
  Disadvantages: 1. An evaluative measure and strategy with the majority voting scheme is needed; 2. Classified data points are all refreshed to class {-1, +1} every time

One-against-Rest classifiers strategy
  Advantages: 1. Easily understood and constructed; 2. Fewer classification implements
  Disadvantages: 1. Less accurate classification than the other classifiers; 2. Classified data points are all refreshed to class {-1, +1} every time

Hierarchies or Trees of Binary SVM classifiers strategy
  Advantages: 1. The relational source is understood in the hierarchy structure
  Disadvantages: 1. Uncertain quantity of classification implements; 2. Incomplete independent structure of the input classes

Decision Directed Acyclic Graph (DDAG)
  Advantages: 1. Maximum and steady quantity of classification implements; 2. The relational source is understood in the hierarchy structure; 3. The incomplete input classification structure is improved; 4. High accuracy
  Disadvantages: 1. Classified data points are all refreshed to class {-1, +1} every time

※ "Classified data points are all refreshed to class {-1, +1} every time" means the classification does not follow the data points' own properties and relies on an unduly subjective judgment.
CHAPTER 5
FUZZY CORRELATION
Correlation is frequently used to find the relation between two variables or two attributes in data.
If the variances of two random variables $X$ and $Y$ exist and are greater than zero, the correlation
of $X$ and $Y$, denoted $\rho(X, Y)$, is defined as

$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)} \cdot \sqrt{\operatorname{Var}(Y)}}$$

By this definition, a significant property of the random variables $X$ and $Y$ is
$-1 \le \rho(X, Y) \le 1$.

Correlation estimates the degree of linear correlation between the variables $X$ and $Y$. When the
value of $\rho(X, Y)$ is close to $+1$ or $-1$, it represents a highly linear correlation between $X$ and
$Y$; when the value of $\rho(X, Y)$ is zero, it represents no correlation. A positive value of
$\rho(X, Y)$ represents positive correlation: when the value of $X$ increases, the value of $Y$ also
tends to increase. A negative value of $\rho(X, Y)$ represents negative correlation: when the value of
$X$ increases, the value of $Y$ tends to decrease. If $\rho(X, Y) = 0$, $X$ and $Y$ are uncorrelated.

By way of correlation analysis, we can easily measure what two attributes or two variables have
in common. But how can we measure the degree of correlation between two attributes or variables
that are not crisply defined? Fuzzy correlation is a correlation degree measurement for fuzzy sets
[10][74][75]. For example, suppose there are $N$ such attributes $A_1, A_2, \ldots, A_N$ and $n$ elements
$x_1, x_2, \ldots, x_n$; we do not know the real connotation of every $A$ and $x$, and we only know the
membership degree of each $x$ in each $A$. How, then, can we measure the degree of correlation
between two such attributes $A_i$ and $A_j$, $i \ne j$?
5.1 Fuzzy Correlation

Evaluating a measurement is not an absolute formula; it depends on the viewpoint of the
measurer. Likewise, evaluating the correlation degree of two such attributes has no single absolute
formula, and a correlation degree may be accepted as long as it does not violate human intuition.
Therefore, the correlation of fuzzy sets, commonly called fuzzy correlation [10][17][22][70], is an
accepted evaluation instrument.

In 1999, Gerstenkorn and Manko proposed a method for the correlation between two fuzzy
sets $A$ and $B$, $k(A, B)$, defined as

$$k(A, B) = \frac{C(A, B)}{\sqrt{T(A) \cdot T(B)}}$$

where

$$C(A, B) = \sum_{i=1}^{n} \left[ \mu_A(x_i)\,\mu_B(x_i) + \nu_A(x_i)\,\nu_B(x_i) \right]$$

$$T(A) = \sum_{i=1}^{n} \left[ \mu_A^2(x_i) + \nu_A^2(x_i) \right], \qquad T(B) = \sum_{i=1}^{n} \left[ \mu_B^2(x_i) + \nu_B^2(x_i) \right]$$

with $\mu_A : X \to [0, 1]$, $\nu_A : X \to [0, 1]$, $0 \le \mu_A(x) + \nu_A(x) \le 1$;
$\mu_B : X \to [0, 1]$, $\nu_B : X \to [0, 1]$, $0 \le \mu_B(x) + \nu_B(x) \le 1$, for all $x \in X$.

Here $\mu_A(x)$ is the membership degree of $x$ in $A$ and $\nu_A(x)$ is the non-membership degree of
$x$ in $A$; likewise, $\mu_B(x)$ is the membership degree of $x$ in $B$ and $\nu_B(x)$ is the
non-membership degree of $x$ in $B$. According to this definition, the value of $k(A, B)$ lies in
$[0, 1]$, and Gerstenkorn and Manko show that if $A = B$ then $k(A, B) = 1$. However, the above
definition only represents a correlation degree, in the sense of proximity, between two fuzzy sets,
and it is still inconvenient to apply in real applications.
In 1999, Ding-An Chiang and Nancy P. Lin proposed a method for the correlation between
two fuzzy sets that combines the concept of correlation in traditional statistics with fuzzy
theory [74][75]. Assume that there is a random sample $\{x_1, x_2, \ldots, x_n\}$ in a sample space $X$,
$\{x_1, x_2, \ldots, x_n\} \subseteq X$, along with a sequence of paired data
$\{(\mu_A(x_1), \mu_B(x_1)), \ldots, (\mu_A(x_n), \mu_B(x_n))\}$ which correspond to the grades of the
membership functions of fuzzy sets $A$ and $B$ defined on $X$. Let us define the correlation
coefficient $\rho_{A,B}$ between the fuzzy sets $A$ and $B$:

$$\rho_{A,B} = \frac{\sum_{i=1}^{n} \left( \mu_A(x_i) - \bar{\mu}_A \right)\left( \mu_B(x_i) - \bar{\mu}_B \right) / (n-1)}{S_A \cdot S_B}$$

where

$$\bar{\mu}_A = \frac{\sum_{i=1}^{n} \mu_A(x_i)}{n}, \qquad \bar{\mu}_B = \frac{\sum_{i=1}^{n} \mu_B(x_i)}{n}$$

$$S_A^2 = \frac{\sum_{i=1}^{n} \left( \mu_A(x_i) - \bar{\mu}_A \right)^2}{n-1}, \quad S_A = \sqrt{S_A^2}; \qquad
S_B^2 = \frac{\sum_{i=1}^{n} \left( \mu_B(x_i) - \bar{\mu}_B \right)^2}{n-1}, \quad S_B = \sqrt{S_B^2}$$

In particular, $|\rho_{A,B}| = 1$ when $\mu_B(x_i) = a \cdot \mu_A(x_i) + b$ for some constants $a, b$.
According to the above definition, the correlation $\rho_{A,B}$ of fuzzy sets $A$ and $B$ lies in $[-1, 1]$.
Moreover, we can obtain the degree of correlation between fuzzy sets $A$ and $B$ from $\rho_{A,B}$,
and even determine whether the correlation between $A$ and $B$ is positive, negative, or irrelevant
(positive correlation: $\rho_{A,B} > 0$; negative correlation: $\rho_{A,B} < 0$; irrelevant correlation:
$\rho_{A,B} = 0$).
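For illustration, the correlation coefficient defined above can be computed directly; the sketch below (ours) uses two made-up membership sequences over the same sample.

# A small sketch (ours) of the Chiang-Lin fuzzy correlation coefficient defined
# above, computed for two made-up membership sequences over the same sample.
import numpy as np

def fuzzy_correlation(mu_a, mu_b):
    # rho_{A,B} = sum((mu_A - mean_A)(mu_B - mean_B)) / ((n-1) * S_A * S_B)
    mu_a, mu_b = np.asarray(mu_a, dtype=float), np.asarray(mu_b, dtype=float)
    n = len(mu_a)
    cov = np.sum((mu_a - mu_a.mean()) * (mu_b - mu_b.mean())) / (n - 1)
    s_a = np.sqrt(np.sum((mu_a - mu_a.mean()) ** 2) / (n - 1))
    s_b = np.sqrt(np.sum((mu_b - mu_b.mean()) ** 2) / (n - 1))
    return cov / (s_a * s_b)

# Hypothetical membership grades of elements x1..x5 in fuzzy sets A and B.
mu_A = [0.9, 0.7, 0.4, 0.2, 0.1]
mu_B = [0.8, 0.6, 0.5, 0.3, 0.2]
print("rho_{A,B} =", fuzzy_correlation(mu_A, mu_B))   # the value lies in [-1, 1]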
5.2 Fuzzy Correlation for Multi-Categorization of Documents

Traditionally, the process of document categorization assigns each unclassified document,
according to its own content, to a single predefined class. This form of document categorization is
called single-categorization of documents. Since documents may involve several different subjects
of discussion, and the correlations among the predefined categories are not completely
independent, it is questionable whether it is appropriate for each document to belong to only one
specific category.

For this reason, it is necessary for unclassified documents to belong to different categories under
certain conditions. This form of document categorization is called multi-categorization of
documents.

Up to the present, many methods have been proposed to deal with the single-categorization
problem, but they are not suitable for solving multi-categorization of documents.

In order to solve the multi-categorization of documents, we employ fuzzy correlation. Fuzzy
correlation can not only discriminate the degree of correlation but also reveal whether the
correlation between documents and the predefined categories is positive, negative, or irrelevant.
With this correlation, we can extend the traditional single-categorization method to the
multi-categorization of documents problem.

In the fuzzy correlation [10] application, we need to define a keyword set and employ this
keyword set to obtain the correlation degree between an unclassified document and each category.
First of all we must define the set of keywords that can estimate the correlation degree between
unclassified documents and each predefined category.
We assume a defined keyword set $X = \{wk_1, wk_2, \ldots, wk_n\}$, and for a certain document $T$
and a predefined category $C_i$ we observe two sets of membership values,
$\{\mu_T(wk_1), \mu_T(wk_2), \ldots, \mu_T(wk_n)\}$ and $\{\mu_{C_i}(wk_1), \mu_{C_i}(wk_2), \ldots, \mu_{C_i}(wk_n)\}$,
where $\mu_T(wk_i)$ and $\mu_{C_i}(wk_i)$, $i = 1, \ldots, n$, individually represent how significant each
keyword is for the unclassified document $T$ and for the predefined category $C_i$. We can then
employ the fuzzy correlation $\rho_{\mu_T, \mu_{C_i}}$ as the correlation degree between the unclassified
document $T$ and each predefined category $C_i$:

$$\rho_{\mu_T, \mu_{C_i}} = \frac{\sum_{i=1}^{n} \left( \mu_T(wk_i) - \bar{\mu}_T \right)\left( \mu_{C_i}(wk_i) - \bar{\mu}_{C_i} \right) / (n-1)}{S_T \cdot S_{C_i}}$$

where

$$\bar{\mu}_T = \frac{\sum_{i=1}^{n} \mu_T(wk_i)}{n}, \qquad \bar{\mu}_{C_i} = \frac{\sum_{i=1}^{n} \mu_{C_i}(wk_i)}{n}$$

$$S_T^2 = \frac{\sum_{i=1}^{n} \left( \mu_T(wk_i) - \bar{\mu}_T \right)^2}{n-1}, \quad S_T = \sqrt{S_T^2}; \qquad
S_{C_i}^2 = \frac{\sum_{i=1}^{n} \left( \mu_{C_i}(wk_i) - \bar{\mu}_{C_i} \right)^2}{n-1}, \quad S_{C_i} = \sqrt{S_{C_i}^2}$$

$$\mu_T(wk_i) = \frac{n_T(wk_i)}{n_T(wk_i) + n_F(wk_i)}, \qquad \mu_{C_i}(wk_i) = \frac{n_{C_i}(wk_i)}{n_{C_i}(wk_i) + n_F(wk_i)}$$

Here $n_T(wk_i)$ is the frequency with which keyword $wk_i$ appears in the document $T$;
$n_{C_i}(wk_i)$ is the frequency with which keyword $wk_i$ appears in category $C_i$; and $n_F(wk_i)$ is
the frequency with which keyword $wk_i$ appears in the Frequency List (FL) table.

When the fuzzy correlation $\rho_{\mu_T, \mu_{C_i}} > 0$, the unclassified document $T$ and the predefined
category $C_i$ are positively correlated; on the contrary, when $\rho_{\mu_T, \mu_{C_i}} < 0$, the unclassified
document $T$ and the predefined category $C_i$ are negatively correlated; and when
$\rho_{\mu_T, \mu_{C_i}} = 0$, the unclassified document $T$ and the predefined category $C_i$ are
uncorrelated.
Alternatively, we can employ another way to compute the fuzzy correlation [10] between the
document $T$ and each predefined category $C_i$. We employ the membership functions
$\mu_{T_m}(wk_i)$ and $\mu_{C_l}(wk_i)$ of fuzzy theory to measure the significance of every keyword
$wk_i$ for a document $T_m$ and for every predefined category $C_l$, and we adopt a fuzzy keyword
set to present the significance of the keywords in the documents:

$$FX(wk_i) = \left\{ \left( wk_1, \mu_{T_m}(wk_1) \right), \left( wk_2, \mu_{T_m}(wk_2) \right), \ldots, \left( wk_n, \mu_{T_m}(wk_n) \right) \right\}$$

$$\mu_{T_m}(wk_i) = \frac{n_{T_m}(wk_i)}{n_{T_m}(wk_i) + n_F(wk_i)}, \qquad m = 1, 2, \ldots, p; \; i = 1, 2, \ldots, n$$

where $n_{T_m}(wk_i)$ is the frequency with which keyword $wk_i$ appears in the document $T_m$, and
$n_F(wk_i)$ is the frequency with which keyword $wk_i$ appears in the frequency list FL.

In the same way, we apply a membership function to present the significance of these keywords
$wk_i$ for every predefined category $C_l$:

$$FX(wk_i) = \left\{ \left( wk_1, \mu_{C_l}(wk_1) \right), \left( wk_2, \mu_{C_l}(wk_2) \right), \ldots, \left( wk_n, \mu_{C_l}(wk_n) \right) \right\}$$

$$\mu_{C_l}(wk_i) = \frac{n_{C_l}(wk_i)}{n_{C_l}(wk_i) + n_F(wk_i)}, \qquad l = 1, 2, \ldots, k; \; i = 1, 2, \ldots, n$$

where $n_{C_l}(wk_i)$ is the frequency with which keyword $wk_i$ appears in the predefined category
$C_l$, and $n_F(wk_i)$ is the frequency with which keyword $wk_i$ appears in the frequency list FL.

Then we can compute the membership of the unclassified document $T_m$ in the predefined
category $C_l$:

$$\mu_{C_l}(T_m) = \frac{\sum_{i=1}^{n} \mu_{T_m}(wk_i) \cdot \mu_{C_l}(wk_i)}{\sum_{i=1}^{n} \mu_{T_m}(wk_i)}, \qquad m = 1, 2, \ldots, p; \; l = 1, 2, \ldots, k$$
When every unclassified document and every predefined category are represented by fuzzy
membership values, we can compute the membership degree of every unclassified document in
multiple categories, which is exactly what multi-categorization of documents seeks. We apply the
intersection of fuzzy theory to compute this membership degree over $C_l \cap C_j$,
$l, j = 1, 2, \ldots, k$. However, we compute the intersection only after the multi-class SVM
classifier has assigned each document to a particular category. Given that particular category, we
intersect it (as determined by the multi-class SVMs) with the other predefined categories to
compute the membership degree:

$$\mu_{C_l \cap C_j}(T_m) = \min\!\left( \mu_{C_l}(T_m), \mu_{C_j}(T_m) \right)$$

We then apply an $\alpha$-cut threshold to evaluate the minimum that satisfies the restrictive
condition, while still providing appropriate flexibility in the multi-categorization status:

$$\alpha = \min_{l,j} \max \left( \mu_{C_l \cap C_j}(T_n) \right)$$

In this way an unclassified document may belong to multiple other categories in addition to the
particular category output by the multi-class SVMs:

$$S_{C_l} = \left\{ T \mid \mu_{C_l}(T) \ge \alpha \right\}, \qquad l = 1, 2, \ldots, k$$

We can adopt either measure of fuzzy correlation described in this section, together with the
outputs of all pairwise-coupled SVM binary classifiers, for the multi-class classification and
multi-label categorization problems. In both cases, a document is assigned to the classes with the
larger confidences for the multi-label categorization problem, and to the class with the maximum
confidence for the multi-class classification problem. For multi-label categorization, this
post-processing thresholding step is independent of the learning step. The critical step in
thresholding is to determine the value, known as the threshold, above which the confidence
measures (fuzzy correlations, in our approach) are considered large. We adopt the positive fuzzy
correlation coefficients as the larger confidence measures and the $\alpha$-cut as the threshold,
respectively.
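To make the procedure concrete, the following sketch (our own, with invented keyword frequencies and an illustrative alpha value rather than the thesis' computed threshold) computes the keyword memberships, the document-to-category membership mu_Cl(Tm), and a multi-label assignment by alpha-cut.

# A simplified sketch (ours, made-up frequencies) of the membership-based
# measure of Section 5.2 and the alpha-cut multi-label assignment.
import numpy as np

keywords = ["trade", "grain", "price", "export"]
n_F = np.array([120.0, 80.0, 150.0, 60.0])      # keyword frequencies in the frequency list (hypothetical)
n_T = np.array([6.0, 1.0, 4.0, 3.0])            # keyword frequencies in an unclassified document T
n_C = {                                          # keyword frequencies in each predefined category
    "trade":  np.array([90.0, 10.0, 40.0, 55.0]),
    "grain":  np.array([15.0, 70.0, 30.0, 20.0]),
}

mu_T = n_T / (n_T + n_F)                         # mu_T(wk_i) = n_T / (n_T + n_F)

def membership_in_category(mu_doc, n_cat):
    mu_C = n_cat / (n_cat + n_F)                 # mu_Cl(wk_i)
    return np.sum(mu_doc * mu_C) / np.sum(mu_doc)  # mu_Cl(Tm)

memberships = {cat: membership_in_category(mu_T, freqs) for cat, freqs in n_C.items()}
print("memberships:", memberships)

alpha = 0.2                                      # illustrative alpha-cut threshold
multi_labels = [cat for cat, mu in memberships.items() if mu >= alpha]
print("document T assigned to:", multi_labels)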
CHAPTER 6
FUZZY CORRELATION AND
SUPPORT VECTOR LEARNING
6.1 Framework of The Proposed Approach
Figure 6.1 Framework of document categorization ($N$ documents are processed by machine learning and fuzzy correlation)
The graph above gives a clear picture of document categorization. In the document
categorization model, webpage data are the input documents, and the output is the predefined
category to which each webpage belongs, for example the four categories earn, acquisitions,
money-fx, and grain. Document categorization is handled by machine learning methods, for
example fuzzy rules, neural networks, data warehousing, naïve Bayes, Bayesian networks, and
rough sets, as webpage categorization methods.
Figure 6.2 Framework of the proposed approach (input: $N$ documents; pre-process: TFIDF and feature selection; process: SVMs and fuzzy correlation; output: classes E, S, M, and F)
We input the word frequencies of the webpages and transform them into weights by TFIDF (term frequency-inverse document frequency) in a pre-processing step. The selected feature vectors become the input vectors of the multi-class processing model, support vector learning, which is trained and then applied to the test documents; finally, categories are judged from the output values in {−1, +1}.
In this thesis, we use support vector machines (SVMs) for document categorization. The input data come from the Reuters data set. There are three pre-processing phases: 1) computing word frequencies, 2) removing stop words, and 3) feature selection. We employ the OAO-SVMs (one-against-one SVMs) strategy to improve on the common shortcomings of multi-class categorization methods and to make the whole training architecture more rational. Finally, the output values in {−1, +1} determine the assigned categories.
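As a purely illustrative sketch of this binary training-and-prediction step (the experiments in the thesis use Matlab programs, not this library), a single linear SVM of the one-against-one architecture could be trained as follows; the tiny TFIDF-like vectors and the scikit-learn SVC class are assumptions made only for the example.

```python
from sklearn.svm import SVC  # illustrative only; the thesis experiments use Matlab

# Tiny dummy TFIDF-like vectors for one pair of categories, labels in {+1, -1}.
X_train = [[3.1, 1.4, 4.0], [0.0, 1.2, 2.4], [2.4, 0.0, 5.1], [0.0, 1.4, 0.0]]
y_train = [+1, +1, -1, -1]

clf = SVC(kernel="linear")   # one binary classifier of the OAO architecture
clf.fit(X_train, y_train)
print(clf.predict([[2.8, 1.0, 3.9]]))   # output in {+1, -1} decides between the two categories
```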
Input data & approaches: Reuters-21578 collection of documents.
Pre-process: 1. Web Frequency Indexer, computing word frequencies; 2. Removing stop words by hand; 3. Using TFIDF to execute feature selection.
Process: 1. One-against-one (OAO) SVMs (multi-class SVMs); 2. Fuzzy correlation.
Output: output class.
In our thesis, we apply the Reuters-21578 collection as the benchmark data set for document categorization. We collect part of the articles, about 2000 documents, to test and verify our approach. The input form is a multi-dimensional vector; after the pre-processing stages, the input vectors are as follows,
Input Vector (documents T1 ~ Tm as rows, feature words as columns):

         CTS    CORP   SHR    ...    MTHS
T1       3.1    1.4    4.0    ...    3.4
T2       0      1.2    2.4    ...    0
...      ...    ...    ...    ...    ...
...      2.4    0      5.1    ...    ...
Tm       0      1.4    0      ...    4.21
As shown above, the input vector is a multi-dimensional matrix. The horizontal dimension lists the feature words of all documents; on average every document contains at least 50-70 features, so the dimensionality is quite large, and hundreds of features remain even after feature selection. The vertical dimension corresponds to the number of documents. Each entry of the matrix is the weight of a feature in a document: if the keyword appears in the document, the entry holds a numeric weight such as 1.4 or 4.21; otherwise the entry is zero (0).
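A minimal sketch of how such a document-by-feature matrix can be assembled from per-document word counts is given below; raw frequencies are used here as weights, whereas the thesis applies TFIDF weights and feature selection first, and the function name is hypothetical.

```python
def build_input_matrix(word_freqs):
    """word_freqs: list of {word: frequency} dictionaries, one per document T1..Tm.
    Returns the shared vocabulary (columns) and the document-by-feature weight matrix."""
    vocab = sorted({word for doc in word_freqs for word in doc})
    matrix = [[float(doc.get(word, 0)) for word in vocab] for doc in word_freqs]
    return vocab, matrix

# Example: two tiny documents; absent words get weight 0.
vocab, matrix = build_input_matrix([{"cts": 3, "corp": 1}, {"shr": 2, "corp": 4}])
print(vocab)   # ['corp', 'cts', 'shr']
print(matrix)  # [[1.0, 3.0, 0.0], [4.0, 0.0, 2.0]]
```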
6.2 Pre-processing
Before the input vectors can be assembled, every document has to be transformed; only after pre-processing can the SVMs operate on the data. Normalizing and formatting the data, computing word frequencies, removing stop words, and feature selection are all our pre-processing phases. These phases are an indispensable transformation in document categorization.
A. Computing word frequencies
We pick out the significant words in every document with a computer program and transform these words into values. Through the words' frequencies and weights we can clearly see their importance in every document. Therefore, the first stage of document categorization is computing the word frequencies in every document.
Choosing a method for computing word frequencies is difficult because there are so many options. Therefore, for operating convenience and to save time, we chose a webpage that computes word frequencies over the network. It is not only convenient to access but also transforms text quickly, needs no extra plug-in programs, and works on any system.
We chose the Web Frequency Indexer of Georgetown University as the word frequency counting tool. We do not need to supply a net address; we simply copy the text of the documents, paste it, and transform it, and the webpage computes the word frequencies as follows:
Web frequency indexer of Georgetown University
( http://www.georgetown.edu/faculty/ballc/webtools/web_freqs.html )
Figure 6.3 Web frequency indexer webpage
Figure 6.4 Web frequency indexer computes word frequency
B. Processing stop words
Stop words are prepositions (of, in, on, about, for, …), conjunctions (when, while, how, and, but, …), articles (an, a, the, this, that, these, those, another, others, …), numerals (#12, 11, 4, …), auxiliary verbs (will, may, should, can, …), expletives (oh, wow, …), symbols (%, $, #, …) and so on. If too many stop words remain, they are meaningless for document categorization and interfere with classification accuracy. Therefore, we have to remove these stop words from the input vectors to prevent unnecessary interference and to reduce the dimension of the input vectors. However, up to the present there is no fully effective automatic way to remove stop words; in practice they are still removed manually, so handling them is quite time-consuming.
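A minimal sketch of such a manual stop-word filter is shown below; the stop-word list is only illustrative, and the rule of keeping purely alphabetic tokens (which also discards numerals and symbols such as #12 or $) is an assumption made for the example.

```python
STOP_WORDS = {"of", "in", "on", "about", "for", "when", "while", "how", "and", "but",
              "an", "a", "the", "this", "that", "these", "those",
              "will", "may", "should", "can", "oh", "wow"}

def remove_stop_words(word_freqs):
    """Drop stop words, numerals, and symbol tokens from a {word: frequency} dictionary."""
    return {word: freq for word, freq in word_freqs.items()
            if word.lower() not in STOP_WORDS and word.isalpha()}

# Example from Figure 6.5: only the content words survive.
print(remove_stop_words({"shr": 2, "at": 1, "$": 1, "#12": 1, "the": 3, "corp": 2,
                         "mths": 1, "cts": 1, "a": 2}))
```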
Figure 6.5 Picking out stop words in a bag of words (e.g., {shr, at, $, #12, the, corp, mths, cts, a} → {shr, corp, mths, cts} for Class M)
C. Feature selection
In the pattern recognition domain there are two ways to reduce dimensionality: feature selection and feature extraction. In the document categorization domain, dimensionality reduction concentrates on feature selection; feature extraction is the other way, but it is not the mainstream in general.
There are two kinds of feature selection methods: threshold methods and information-theoretic methods. In general, the threshold methods are the ones most commonly used for feature selection.
C.1 Threshold methods:
1. Document frequency thresholding (DF): DF thresholding is a simple way to reduce the vocabulary. Terms whose document frequency in the training set falls below a preset threshold are removed.
2. Information gain (IG): IG is commonly employed in machine learning to find the best vocabulary terms or strings. From the presence or absence of a term we can obtain its information gain for every category.
3. Mutual information (MI): MI is applied in statistical language modeling and is a common criterion; it measures the relation between one vocabulary term and another.
4. χ² statistic (CHI): the χ² statistic is used in statistical analysis to test the independence of events, by constructing a contingency table between vocabulary terms and categories.
5. Term frequency-inverse document frequency (TFIDF): TFIDF combines term frequency with inverse document frequency. It is the most widely used weighting in document categorization. TFIDF not only computes word weights but also normalizes the length of the document vectors, so it handles both the representation of documents and the function of feature selection. (A small computational sketch follows this list.)
6. Others: odds ratio, weirdness, term strength (TS), …
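The TFIDF weighting mentioned in item 5 can be sketched as follows; the weighting tf × log(N/df) with length normalization is one common variant, and the exact formula used in the thesis may differ in detail.

```python
import math

def tfidf_vectors(docs):
    """docs: list of {word: term frequency} dictionaries, one per document.
    Returns length-normalized TFIDF vectors using the weighting tf * log(N / df)."""
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for word in doc:
            df[word] = df.get(word, 0) + 1
    vectors = []
    for doc in docs:
        vec = {w: tf * math.log(n_docs / df[w]) for w, tf in doc.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0   # normalize document length
        vectors.append({w: v / norm for w, v in vec.items()})
    return vectors

# Example: "corp" occurs in both documents, so its IDF (and hence its weight) is lower.
print(tfidf_vectors([{"corp": 3, "shr": 1}, {"corp": 2, "mths": 4}]))
```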
C.2 Information theory methods:
1. Signal-to-noise ratio: this method applies to the words of documents with a particular structure.
2. Feature clustering: the goal is to find similar features and gather them into clusters, and every new cluster becomes a new feature that can be regarded as a concept.
6.3 Fuzzy Correlation and One-against-One (OAO) SVMs
for Multi-Categorization of Documents
We adopt fuzzy correlation together with OAO SVMs (one-against-one SVMs); that is our categorization architecture. The reasons for adopting SVMs were already introduced in Chapter 1: given the properties of documents, support vector learning is an appropriate approach. For multi-class categorization we adopt one-against-one (OAO) SVMs, because the one-against-one classifier strategy is superior to the other multi-class SVM strategies (one-against-rest (OAR) SVMs, hierarchies or trees of binary SVMs, and decision directed acyclic graph (DDAG) SVMs). Besides OAO, in order to solve the multi-categorization of documents problem, we also apply fuzzy correlation to compute the degree of correlation between every document and every category. Through fuzzy correlation we can see to what degree a document is related to each category, and we can then classify these documents more appropriately with OAO. An example illustrating the inner framework is given below.
We assume four categories, music (M), food (F), sport (S), and education (E), with the webpages of each category being three music documents (M1, M2, M3), four food documents (F1, F2, F3, F4), five sport documents (S1, S2, S3, S4, S5), and six education documents (E1, E2, E3, E4, E5, E6). The classification architecture uses multiple SVM classifiers, because one SVM can only separate two categories: one side is labeled +1 and the other −1. We can train these data through SVMs.
Figure 6.6 Multi-class SVMs (OAO-SVMs) architecture I: general OAO-SVMs with one binary SVM per category pair, trained on (M, F), (M, S), (M, E), (F, S), (F, E), and (S, E)
Figure 6.7 Multi-class SVMs (OAO-SVMs) architecture II: a tree of binary SVMs with a majority voting scheme, each SVM outputting +1 or −1 to split the document sets {M1..M3}, {F1..F4}, {S1..S5}, {E1..E6} (with example labels such as "Dancing Course" and "Cooking Course")
In terms of Figure 6.7, we can clearly set the music category to +1 and the education category to −1 during training. However, to which side should the remaining categories (for example sport and food) be set, +1 or −1? If we cannot decide immediately, support vector learning cannot classify. A researcher has to decide in advance which side each category takes; in other words, the food and sport categories are preassigned to +1 or −1 in the first classifier. At that point we still do not know whether food and sport are closer to music or to education. Whatever classifiers are used, the one-against-one strategy, hierarchies or trees of binary SVM classifiers, and the decision directed acyclic graph all share the same problem: to which category should the remaining webpages (or documents) be assigned in the next-level classifier when we classify webpages on the spot?
Is there a method to solve the above-mentioned problem? The idea resembles clustering: do the remaining webpages (or documents) in the next-level classifier lean more toward the music category (+1) or the education category (−1)? Consider the example of movie rating categorization. Movies can be classified into five categories: general audiences (G), parental guidance suggested (PG), parents strongly cautioned (PG-13), restricted (R), and no one 17 and under admitted (NC-17). If we only classify two categories, we can immediately separate G and NC-17. For the other categories (PG, PG-13, and R), however, we cannot clearly define which side they belong to (G (+1) or NC-17 (−1)). Therefore, we need a method to evaluate the remaining types, for example G, PG, PG-13 on the +1 side and R, NC-17 on the −1 side; or G, PG on the +1 side and PG-13, R, NC-17 on the −1 side; and so on.
Once the problem is understood, how can we choose an appropriate way to classify the fuzzy region again? In order to solve this problem, we can use fuzzy membership for every webpage (or document) or modify the training state (we still use the SVM classifier, because the advantages of SVMs were presented in Chapter 2).
In the multi-class SVM classifier strategy, we employ pairwise categorization (OAO-SVMs, OAR-SVMs classifiers) to address the problem of traditional multi-class SVM categorization in which unclear data must be preassigned to a category (+1 class or −1 class). OAO-SVMs with a majority voting scheme and OAR-SVMs with maximum distance solve this multi-class SVM problem of preassigning the +1/−1 classes. Earlier work used pairwise categorization [76][77] in ECOC (error-correcting output coding) with maximum distance (e.g., Hamming distance/Hamming codes) for classification tasks (text, pattern, image, …) to improve categorization accuracy; another way is to compute, by probability, a likelihood value of every document belonging to every category.
We apply OAO-SVMs with fuzzy correlation to solve the multi-categorization of documents. OAO-SVMs is one kind of multi-class SVM classifier strategy and is also a pairwise categorization classifier. OAO-SVMs achieves higher accuracy than OAR-SVMs, and at the same accuracy it is easier to understand than DDAG-SVMs. However, OAO-SVMs has the problem of preassigning the {−1, +1} category to every classified data point.
In order to improve this situation, we use fuzzy correlation with the OAO-SVMs classifier to resolve the remaining category at each step and to obtain the multi-categorization of documents. Our method is as follows: the OAO-SVMs architecture has N(N−1)/2 classifiers, to which we add a voting scheme (fuzzy correlation) to evaluate which category (music, food, sport, or education) the test data belong to and to compute the correlation between each document and each category.
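The voting part of this scheme can be sketched as follows; the dictionary of trained pairwise classifiers is a hypothetical stand-in for the N(N−1)/2 binary SVMs, and in the proposed approach the plain vote counts would be complemented by the fuzzy correlations between the document and each category.

```python
from itertools import combinations

def oao_vote(pairwise_classifiers, x, classes):
    """One-against-one voting over N(N-1)/2 binary classifiers.

    pairwise_classifiers: dict (class_i, class_j) -> function f(x) returning +1 for
    class_i or -1 for class_j. Each pairwise decision casts one vote."""
    votes = {c: 0 for c in classes}
    for ci, cj in combinations(classes, 2):
        winner = ci if pairwise_classifiers[(ci, cj)](x) > 0 else cj
        votes[winner] += 1
    return max(votes, key=votes.get), votes

# Example with dummy classifiers that always prefer the first class of each pair.
classes = ["E", "F", "M", "S"]
dummy = {pair: (lambda x: +1) for pair in combinations(classes, 2)}
print(oao_vote(dummy, x=None, classes=classes))   # ('E', {'E': 3, 'F': 2, 'M': 1, 'S': 0})
```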
[Diagram: voting strategy over the training set {M1, M2, M3, F1, F2, F3, F4, S1, S2, S3, S4, S5, E1, E2, E3, E4, E5, E6} using six pairwise SVMs (M/F, F/S, S/E, E/M, M/S, F/E); the voting strategy (fuzzy correlation) outputs, for the test data, the probability of belonging to each class (Music, Food, Sport, Education) and the pairwise fuzzy correlations M&F, F&S, S&E, E&M, M&S, F&E.]
Figure 6.8 Improving pre-set states in training data (optimal hyperplane and margin over the document sets {M1, M2, M3}, {F1, F3}, {F2, F4, S1, S4, S5}, {S2, S3}, {E1..E6})
In the classifier phase we still adopt SVMs, because SVMs give the best categorization accuracy on webpages (or documents) compared with other classifiers such as naïve Bayes, K-nearest neighbors, Rocchio, C4.5, LLSF, NNet, DNF, and decision trees (SVM, KNN > LLSF > Multilayer Perceptrons >> Multinomial Naive Bayes [31][32]; SVM 0.864 > KNN 0.823 > Rocchio 0.799, C4.5 0.794 > Naive Bayes 0.72 [33][34]; SVM 0.66 > KNN 0.591 > Naive Bayes 0.57 > Rocchio 0.566 > C4.5 0.50 [69]), as described in Chapter 2. In the other phase, however, we can use the fuzzy memberships of the data (webpages or documents), and data carrying fuzzy values can be classified according to those values.
Figure 6.9 Improving two-class SVMs in the gray region (a fuzzy membership function re-partitions the training sets {M1, M2, M3, F1, F2, F3, F4, S1, S2, S3, S4, S5, E1, E2, E3, E4, E5, E6}, e.g., {M1, M2, M3, F1} versus {E1..E6, S4, S5} and {F2, F3, F4, S1, S2, S3}, before the SVMs)
CHAPTER 7
EXPERIMENTAL RESULTS
7.1 Experimental Data Source
A. Experimental Data Source
Our experimental data come from the Reuters-21578 data set, provided by David D. Lewis of AT&T Labs in 1997. The data set is still freely available from the webpage http://www.research.att.com/~lewis. It is the most commonly used standard benchmark in document categorization. It contains about twenty thousand documents of differing lengths, in over one hundred eighteen categories, and every document has about two hundred words on average. Almost all of the literature uses ten categories as the research data. Therefore, in our thesis we also select ten categories (earnings, corporate acquisition, money market, grain, crude, trade, interest, wheat, ship, corn) from the Reuters-21578 classified categories. From these ten categories we randomly select fifteen hundred documents as the training set and fifty documents as the test set.
B. Experimental Data Form
The Reuters-21578 data set is in SGML form. Tags of the SGML language serve as labels that match the document type definition (DTD) of the SGML document form, giving explicit boundaries for the title, categories, and content of each document, which are its significant portions.
1. All documents in the database are classified by five categorization methods; all categories have an explicit, detailed description under each method, and every category is given an appropriate name as far as possible.
2. Every document is given a new identification number (NEWID) in time order, and every one thousand documents are combined into one file.
There are five categorization methods for documents in Reuters-21578: Topics, Places, People, Orgs, and Exchanges. The Topics categorization is the method most commonly used in document categorization research. The Topics in Reuters-21578 are classified into five macro-categories and one hundred thirty-five micro-categories.
Table 7.1 Documents in one category owned

Owning N documents      Categories
N ≥ 1000                2
1000 > N ≥ 100          21
100 > N > 0             97
N = 0                   15
Table 7.2 Different categories with training numbers

Dumais, S. et al.                       Our thesis
Category name        Num train          Category name    Num train
Acquisition (acq)    1650               Acq              150
Earn (earn)          2877               Earn             150
Grain (grain)        433                Grain            150
Money-fx (money)     538                Money            150
Crude (crude)        389                Crude            150
Trade (trade)        369                Trade            150
Interest (interest)  347                Interest         150
Wheat (wheat)        212                Wheat            150
Ship (ship)          197                Ship             150
Corn (corn)          182                Corn             150
Figure 7.1 Reuters-21578 dataset form (training data and test data)
7.2 Experimental Results
Because the quantity of experimental data is too large to describe in detail, we give a simple example to describe the experimental process. In our thesis we use the engineering programming software Matlab 6.5, with which we can carry out complex, large-scale matrix operations.
In the program we employ part of the Matlab toolbox routines together with programs designed by ourselves to transform the data between training and test. The experimental process of a simple example is as follows:
Step 1. Randomly select the training and test documents from the Reuters-21578 collection, and employ the Web Frequency Indexer to compute the frequency of every word in every document.
Figure 7.2 Web Frequency Indexer computes the word frequencies of every document
Step 2. Use Excel to separate vocabulary and frequency and to annotate each document with a serial number.
Figure 7.3 Word frequencies and the documents they belong to
Step 3. Copy the vocabulary, frequencies, and document serial numbers into two TXT files, which are the input data read by the Matlab program.
Figure 7.4 Data form before input
Step 4. A Matlab program designed by ourselves reads in the TXT files of training and test data and transforms them into the input vector form required by the SVMs.
Figure 7.5 Training data and test data before classifying
Step 5. The Matlab toolbox training program reads in the input vectors of the training data and carries out the multi-class categorization tree process, training the decision function parameters (training data, pre-classified categories, and the α and b values).
Figure 7.6 Decision function parameter values
Step 6. The trained parameters and the test data are input into the Matlab toolbox classification routine, which computes which category every document belongs to.
Figure 7.7 Category assignment by the multi-class SVMs
We use 1500 training documents from the ten categories (acq, earn, money, grain, crude, trade, interest, wheat, ship, corn) and 50 test documents as the experimental data. We employ two different dimensions (50 and 300) for each of the 150 training documents per category. The test data use the same two dimensions as the training data.
Table 7.3 Accuracy comparison with different methods

            Findsim  NBayes   BayesNets  OAO      Trees    Linear SVM  OAO
K           50       50       50         50       300      300         300
Acq         64.70%   87.80%   88.30%     56.70%   89.70%   93.70%      75.50%
Earn        92.90%   95.90%   95.80%     79.50%   97.80%   98.00%      90.00%
Grain       67.50%   78.80%   81.40%     85.40%   85.00%   94.60%      96.50%
Money       46.70%   56.60%   58.80%     84.50%   66.20%   74.50%      96.50%
Crude       70.10%   79.50%   79.60%     81.70%   85.00%   88.90%      85.50%
Trade       65.10%   63.90%   69.00%     72.00%   72.50%   75.90%      78.50%
Interest    63.40%   64.90%   71.30%     70.20%   67.10%   77.70%      83.50%
Wheat       68.90%   69.70%   82.70%     86.50%   92.50%   91.80%      91.00%
Ship        49.20%   85.40%   84.40%     85.00%   74.20%   85.60%      88.60%
Corn        48.20%   48.20%   76.40%     78.50%   91.80%   90.30%      92.50%
Avg Top 10  63.67%   73.07%   78.77%     78.00%   82.18%   87.09%      87.81%

OAO: using the one-against-one SVM learning method
K: dimension length

Figure 7.8 Accuracy with ten categories in the 50- and 300-dimension settings (two bar charts of accuracy (%) over the ten categories: Findsim 50, NBayes 50, BayesNets 50, OAO 50; and Trees 300, Linear SVM 300, OAO 300)
Table 7.4 Accuracy in the different dimensions

K           OAO 50   OAO 300
Acq         56.70%   75.50%
Earn        79.50%   90.00%
Grain       85.40%   96.50%
Money       84.50%   96.50%
Crude       81.70%   85.50%
Trade       72.00%   78.50%
Interest    70.20%   83.50%
Wheat       86.50%   91.00%
Ship        85.00%   88.60%
Corn        78.50%   92.50%
Avg Top 10  78.00%   87.81%
Figure 7.9 Average accuracy (Avg Top 10) of the different learning machines (Findsim, NBayes, BayesNets, OAO in the 50-dimension setting; Trees, Linear SVM, OAO in the 300-dimension setting)
Different dimensions yield different experimental results, and the higher the dimension, the higher the accuracy we obtain. When the dimension is lower (50), the number of keywords is smaller than in the higher-dimensional case (300); therefore the accuracy at 50 dimensions is not as good as at the higher dimension (300).
Step 7. Fuzzy correlation is applied to the classified categories, giving different weights to different input data points and adjusting the elasticity of classification; fuzzy correlation can decrease the overfitting effect and improve performance.
Figure 7.10 Computing keywords in the different categories
Table 7.5 Keyword frequencies of every document in the different categories

              Acq   Earn   Grain  Money  Crude  Trade  Interest  Wheat  Ship  Corn
Document 1    886   1327   460    949    253    138    92        21     45    14
Document 2    743   893    620    1617   557    296    177       45     75    24
Document 3    943   621    324    2128   251    220    393       47     67    7
Document 4    283   114    200    293    130    167    22        17     38    6
Document 5    262   772    1498   752    160    85     10        60     33    56
Figure 7.11 Keyword frequencies of the five documents in the ten categories (bar chart; y-axis: keyword frequencies in the different categories, 0-2500; x-axis: Documents 1-5; series: Acq, Earn, Grain, Money, Crude, Trade, Interest, Wheat, Ship, Corn)
Table 7.6 Correlation between every document and the ten categories (each entry gives the fuzzy correlation value, with one category of the pair shown in parentheses)

Category pair   Document 1          Document 2           Document 3           Document 4           Document 5
Acq/Earn        0.120426 (Earn)     0.188062 (Acq)       0.097418 (Acq)       0.187947 (Acq)       0.750464 (Earn)
Acq/Grain       0.084163 (Grain)    0.188062 (Acq)       0.097418 (Acq)       0.187947 (Acq)       0.744488 (Acq)
Acq/Money       0.069005 (Acq)      0.188062 (Acq)       0.097418 (Acq)       0.187947 (Acq)       0.746495 (Money)
Acq/Crude       0.119112 (Crude)    0.226254 (Crude)     0.097418 (Acq)       0.187947 (Acq)       0.744488 (Acq)
Acq/Trade       0.069005 (Acq)      0.188062 (Acq)       0.097418 (Acq)       0.072241 (Trade)     0.744488 (Acq)
Acq/Interest    0.069005 (Acq)      0.188062 (Acq)       0.097418 (Acq)       0.086886 (Interest)  0.744488 (Acq)
Acq/Wheat       0.092289 (Wheat)    0.188062 (Acq)       0.097418 (Acq)       0.009547 (Wheat)     0.744488 (Acq)
Acq/Ship        0.069005 (Acq)      0.188062 (Acq)       0.097418 (Acq)       0.187947 (Acq)       0.744488 (Acq)
Acq/Corn        0.069005 (Acq)      0.188062 (Acq)       0.158219 (Corn)      0.075291 (Corn)      0.744488 (Acq)
Earn/Grain      0.120426 (Earn)     0.044604 (Earn)      0.050268 (Grain)     0.216264 (Earn)      0.750464 (Earn)
Earn/Money      0.120456 (Earn)     0.174060 (Money)     0.147439 (Earn)      0.216264 (Earn)      0.750464 (Earn)
Earn/Crude      0.120456 (Earn)     0.226254 (Crude)     0.105767 (Crude)     0.216264 (Earn)      0.750464 (Earn)
Earn/Trade      0.120456 (Earn)     0.044604 (Earn)      0.147439 (Earn)      0.072241 (Trade)     0.750464 (Earn)
Earn/Interest   0.120456 (Earn)     0.150447 (Interest)  0.058989 (Interest)  0.086886 (Interest)  0.750464 (Earn)
Earn/Wheat      0.120456 (Earn)     0.044604 (Earn)      0.075739 (Wheat)     0.009547 (Wheat)     0.750464 (Earn)
Earn/Ship       0.120456 (Earn)     0.109310 (Ship)      0.021144 (Ship)      0.216264 (Earn)      0.750464 (Earn)
Earn/Corn       0.120456 (Earn)     0.044604 (Earn)      0.158219 (Corn)      0.075291 (Corn)      0.750464 (Earn)
Grain/Money     0.084163 (Grain)    0.174060 (Money)     0.050268 (Grain)     0.289866 (Money)     0.746495 (Money)
Grain/Crude     0.119113 (Crude)    0.226254 (Crude)     0.050268 (Grain)     0.271311 (Crude)     0.728464 (Crude)
Grain/Trade     0.084163 (Grain)    0.022044 (Grain)     0.050268 (Grain)     0.072241 (Trade)     0.553465 (Grain)
Grain/Interest  0.084163 (Grain)    0.150447 (Interest)  0.058989 (Interest)  0.086886 (Interest)  0.553465 (Grain)
Grain/Wheat     0.092287 (Wheat)    0.022044 (Grain)     0.050268 (Grain)     0.009547 (Wheat)     0.553465 (Grain)
Grain/Ship      0.084163 (Grain)    0.109310 (Ship)      0.021144 (Ship)      0.257325 (Ship)      0.677528 (Ship)
Grain/Corn      0.084163 (Grain)    0.030244 (Corn)      0.158219 (Corn)      0.075291 (Corn)      0.553465 (Grain)
Money/Crude     0.119113 (Crude)    0.226254 (Crude)     0.105767 (Crude)     0.271311 (Crude)     0.746495 (Money)
Money/Trade     0.057845 (Trade)    0.174060 (Money)     0.151960 (Money)     0.072241 (Trade)     0.746495 (Money)
Money/Interest  0.044430 (Money)    0.174060 (Money)     0.058989 (Interest)  0.086886 (Interest)  0.746495 (Money)
Money/Wheat     0.092289 (Wheat)    0.174060 (Money)     0.075739 (Wheat)     0.009547 (Wheat)     0.746495 (Money)
Money/Ship      0.044430 (Money)    0.174060 (Money)     0.021144 (Ship)      0.257325 (Ship)      0.746495 (Money)
Money/Corn      0.060037 (Corn)     0.174060 (Money)     0.158219 (Corn)      0.075291 (Corn)      0.746495 (Money)
Crude/Trade     0.119113 (Crude)    0.226254 (Crude)     0.105767 (Crude)     0.072241 (Trade)     0.728464 (Crude)
Crude/Interest  0.119113 (Crude)    0.226254 (Crude)     0.058989 (Interest)  0.086886 (Interest)  0.728464 (Crude)
Crude/Wheat     0.119113 (Crude)    0.226254 (Crude)     0.075739 (Wheat)     0.009547 (Wheat)     0.728464 (Crude)
Crude/Ship      0.119113 (Crude)    0.226154 (Crude)     0.021144 (Ship)      0.257325 (Ship)      0.728464 (Crude)
Crude/Corn      0.119113 (Crude)    0.226154 (Crude)     0.158219 (Corn)      0.075291 (Corn)      0.728464 (Crude)
Trade/Interest  0.057845 (Trade)    0.150447 (Interest)  0.058989 (Interest)  0.072241 (Trade)     0.108223 (Trade)
Trade/Wheat     0.092289 (Wheat)    0.077211 (Trade)     0.075739 (Wheat)     0.009547 (Wheat)     0.411780 (Wheat)
Trade/Ship      0.057845 (Trade)    0.109310 (Ship)      0.021144 (Ship)      0.072241 (Trade)     0.677529 (Ship)
Trade/Corn      0.060037 (Corn)     0.030244 (Corn)      0.158219 (Corn)      0.072241 (Trade)     0.179348 (Corn)
Interest/Wheat  0.092289 (Wheat)    0.150447 (Interest)  0.058989 (Interest)  0.009547 (Wheat)     0.503183 (Interest)
Interest/Ship   0.031982 (Ship)     0.150447 (Interest)  0.058989 (Interest)  0.086886 (Interest)  0.677529 (Ship)
Interest/Corn   0.060037 (Corn)     0.150447 (Interest)  0.158219 (Corn)      0.075291 (Corn)      0.503183 (Interest)
We employ fuzzy correlation to measure the relation between every document and the categories. With fuzzy correlation we do not need a fixed criterion (e.g., an α-cut satisfying the lowest degree of the restricting conditions); instead we can evaluate the degree of relation (positive, negative, or irrelevant) between every document and every category. A correlation coefficient is more objective than a single threshold decision (α-cut). The fuzzy correlation coefficient therefore provides an elastic estimate for the multi-categorization of documents.
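One common way to compute such a coefficient, following the correlation of fuzzy sets in [10], is the sample correlation of two membership sequences; the sketch below assumes this definition, and the exact formula used in the thesis may differ in detail.

```python
import math

def fuzzy_correlation(mu_a, mu_b):
    """Fuzzy correlation of two membership sequences mu_A(x_i), mu_B(x_i),
    computed as their sample correlation coefficient (in [-1, 1])."""
    n = len(mu_a)
    mean_a, mean_b = sum(mu_a) / n, sum(mu_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(mu_a, mu_b))
    var_a = sum((a - mean_a) ** 2 for a in mu_a)
    var_b = sum((b - mean_b) ** 2 for b in mu_b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0

# A document is then assigned to every category with which its correlation is positive.
print(fuzzy_correlation([0.2, 0.5, 0.9, 0.4], [0.1, 0.6, 0.8, 0.3]))
```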
The fuzzy correlation coefficients used for combining the SVM binary classifiers are shown in Table 7.6; they serve as confidence measures between the test documents and the categories. In conjunction with the forty-five pairwise coupling SVM binary classifiers, a document is assigned, for the multi-label categorization problem, to the classes with positive fuzzy correlation coefficients, as shown in Table 7.7.
Table 7.7 Multi-categories of documents

             Acq  Earn  Grain  Money  Crude  Trade  Interest  Wheat  Ship  Corn
Document 1   ˇ    ˇ     ˇ      ˇ      ˇ      ˇ                ˇ      ˇ     ˇ
Document 2   ˇ    ˇ     ˇ      ˇ      ˇ      ˇ      ˇ                ˇ     ˇ
Document 3   ˇ    ˇ     ˇ      ˇ      ˇ             ˇ         ˇ      ˇ     ˇ
Document 4   ˇ    ˇ            ˇ      ˇ      ˇ      ˇ         ˇ      ˇ     ˇ
Document 5   ˇ    ˇ     ˇ      ˇ      ˇ      ˇ      ˇ         ˇ      ˇ     ˇ
C_acq = {Document 1, Document 2, Document 3, Document 4, Document 5}
C_earn = {Document 1, Document 2, Document 3, Document 4, Document 5}
C_grain = {Document 1, Document 2, Document 3, Document 5}
C_money = {Document 1, Document 2, Document 3, Document 4, Document 5}
C_crude = {Document 1, Document 2, Document 3, Document 4, Document 5}
C_trade = {Document 1, Document 2, Document 4, Document 5}
C_interest = {Document 2, Document 3, Document 4, Document 5}
C_wheat = {Document 1, Document 3, Document 4, Document 5}
C_ship = {Document 1, Document 2, Document 3, Document 4, Document 5}
C_corn = {Document 1, Document 2, Document 3, Document 4, Document 5}
Figure 7.12 Correlation membership in the ten categories (correlation membership, 0-1, of Documents 1-5 across the ten categories Acq, Earn, Grain, Money, Crude, Trade, Interest, Wheat, Ship, Corn)
In Table 7.3 and Table 7.4, we obtain accuracy comparable to the other methods with only a small amount of training data, and the accuracy improves as the dimension increases. After running the multi-class SVMs, we use fuzzy correlation to compute the correlation between every document and every category. That is, machine learning is employed to classify the nonlinear documents, and fuzzy correlation is then used to evaluate the degree of correlation. The machine learning step helps us classify documents, improving accuracy and cutting down the running time of the program. Beyond the machine learning step, fuzzy correlation finds the correlation between every document and every category, so that documents can be classified into multiple categories.
CHAPTER 8
CONCLUSION AND FUTURE WORK
8. 1 Concluding Remarks
In the document categorization domain, owing to the integration and diversification of science and technology, more and more documents simultaneously cover knowledge and techniques from several different domains. Therefore, the problem of multi-categorization of documents is becoming more and more important.
In many supervised learning tasks, a learned classifier automatically induces a ranking of
test examples, making it possible to determine which test examples are more likely to belong to
a certain class when compared to other test examples. However, for many applications this
ranking is not sufficient, particularly when the classification decision is cost-sensitive. In this
case, it is necessary to convert the output of the classifier into well-calibrated posterior
probabilities. An acknowledged deficiency of SVMs is that the uncalibrated outputs do not
provide estimates of posterior probability of class membership.
Much research on automatic document categorization has been put forward. However, most of these methods only decide which single predefined category an unclassified document should be assigned to, and thus only accomplish single-categorization of documents. Even SVMs, which are among the methods most suited to document categorization, do not by themselves solve the multi-categorization of documents problem, and for multi-categorization the performance of all such categorization methods is poor.
In our thesis, we have presented an efficient method for producing class membership estimates for the multi-class text categorization problem. Based on SVM binary classifiers in conjunction with class membership, our method relies on a measure of confidence given by the fuzzy correlation. This approach solves not only multi-class classification but also multi-label categorization problems.
8.2 Directions for Future Research
However, our thesis still leaves many problems to solve. To move toward an optimal learning model, there is much room for research and application in future work:
1. How to process stop words efficiently.
2. How to select an appropriate feature selection method that reduces the dimensionality of the input vectors and improves accuracy.
3. How to further improve the accuracy of multi-categorization of documents beyond the method provided in this thesis.
4. How to establish a standard system model that appropriately categorizes webpages and e-mail in the real world.
5. How to classify efficiently for data mining, knowledge extraction, and integration, helping people analyze useful information in every domain.
6. In addition to document categorization, applying the approach to DNA sequence recognition, image recognition, medical disease recognition, and so on.
Bibliography
[1] S.Abe and T.Inoue, “Fuzzy Support Vector Machines for Multiclass Problems,” in
Proceedings of the European Symposium on Artificial Neural Networks, Bruges (Belgium),
pp.113-118, April 2002.
[2] A.Basu, C.Watters and M.Shepherd, “Support Vector Machines for Text Categorization,” in
Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS-36),
2003.
[3] K.P.Bennett and E.J.Bredensteiner, “Multi-Category Classification by Support Vector Machines,” Computational Optimizations and Applications, Vol. 12, pp. 53-79, 1999.
[4] R.Bekkerman, R.El.Yaniv, Y.Winter, and N.Tishby, “On Feature Distributional Clustering
for Text Categorization,” in Proceedings of the 24th Annual International ACM-SIGIR
Conference on Research and Development in Information Retrieval, pp. 146-153, 2001.
[5] Benkhalifa, M, A. Bensaid, and A. Mouradi, “Text Categorization Using the
Semi-supervised Fuzzy C-algorithm,” in Proceedings of the Fuzzy Information conference
in NAFIPS, pp. 561-565, 1999.
[6] Blosseville and M.J. et al., “Automatic Document Classification: Natural Language
Processing, Statistical Analysis, and Expert System Techniques Used Together,” in
Proceedings of ACM Transaction on Information System, pp. 51-58, July 1992.
[7] C.-C.Chang and C.-J.Lin, “The Analysis of Decomposition Methods for Support Vector
Machines,” in Proceedings of IEEE Transactions on Neural Networks, Vol. 11, No.4, pp.
1003-1008, 2000.
[8] C.-C.Chang and C.-J.Lin, “LIBSVM: A Library for Support Vector Machines,” Software
available a http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
[9] W.T.Chen, “Multiclass Support Vector Learning with Application to Document
Classification,” Master Degree Thesis, Department of Information Engineering, I-Shou
University, June 2002.
[10] D.A.Chiang and N.P.Lin, “Correlation of Fuzzy Sets,” Fuzzy Sets and Systems, Vol. 102,
pp.221-226,1999.
[11] J.H.Chiang and P.Y.Hao, “Support Vector Learning Mechanism for Fuzzy Rule-Based
Modeling: A New Approach,” in Proceedings of IEEE Transactions on Fuzzy Systems, Vol.
12, No. 1, February 2004.
[12] C.Cortes and V.N.Vapnik, Support vector networks. Machine Learning, 20:1-25,1995.
[13] K.Crammer and Y.Singer, “On the Learnability and Design of Output Codes for
Multiclass Problem,” in Proceedings of Computer Learning Theory, pp. 35-46, 2000.
[14] N.Cristianini, “Support Vector and Kernel Machines,” ICML, 2001.
http://www.support-vector.net/icml-tutorial.pdf
[15] Fuka, K.Hanka, R., “Feature Set Reduction for Document Classification Problems,” in
Proceedings of IJCAI-01 Workshop: Text Learning: Beyond Supervision. Seattle, August
2001.
[16] F.Fung and O.L.Mangasarian, “Data Selection for Support Vector Machine Classifiers,” in
Proceedings of the ACM, 2000.
[17] T.Gerstenkorn and J.Manko, “Correlation of Intuitionistic Fuzzy Sets,” Fuzzy Sets and
Systems, Vol. 44, pp. 39-43, 1991.
[18] Gudivala et al., “Information Retrieval on the World Wide Web,” in Proceedings of IEEE
Internet Computing, Vol. 1, No. 5, pp. 58-68, September 1997.
[19] H.Han, C.L.Giles, E.Manavoglu and H.Zha, “Automatic Document Metadata Extraction
Using Support Vector Machines,” in Proceedings of the Joint Conference on Digital
Libraries (JCDL), 2003.
[20] P.Y.Hao and J.H.Chiang, “A Fuzzy Model of Support Vector Machine Regression,” in
Proceedings of IEEE International Conference on Fuzzy Systems, pp.738-742, 2003.
[21] D.H.Hong and C.Hwang, “Support Vector Fuzzy Regression Machines,” Fuzzy Sets and
Systems, Vol. 138, pp. 271-281, 2003.
[22] D.H.Hong and S.Y.Hwang, “Correlation of Intuitionistic Fuzzy Sets in Probability
Spaces,” Fuzzy Sets and Systems, Vol.75, pp. 77-81,1995.
[23] J.M.Hsu, “Learning from Incomplete Data Using Support Vector Machines,” Master
Degree Thesis, Institute of Information and Computer Science, National Chiao Tung
University, June 2002.
[24] H.P.Huang and Y.H.Liu, “Fuzzy Support Vector Machines for Pattern Recognition and
Data Mining,” in Proceedings of International Journal of Fuzzy Systems, Vol.4, No. 3,
September 2002.
[25] C.W.Hsu and C.J.Lin, “A Comparison of Methods for Multi-Class Support Vector
Machines, ” IEEE Transactions on Neural Networks, Vol. 13, pp. 415-425, 2002.
[26] J.M.Hsu, “Learning from Incomplete Data Using Support Vector Machines,” Master
Degree Thesis, Institute of Information and Computer Science, National Chiao Tung
University, June 2002.
[27] A.B.Hur, D.Horn, H.T.Siegelmann, and V.Vapnik, “Support Vector Clustering,” Journal of
Machine Learning Research, Vol. 2, pp. 125-137, 2001.
[28] T.Inoue and S.Abe, “Fuzzy Support Vector Machines for Pattern Classification,” in
Proceedings of International Joint Conference on Neural Networks, Vol. 2, pp.1449-1454,
July 2001.
[29] J.T.Jeng and T.T.Lee, “Support Vector Machines for the Fuzzy Neural Networks,” in
Proceedings of IEEE SMC’99 Conference. IEEE International Conference on Systems, Man,
and Cybernetics, Vol. 6, pp. 115-120, 1999.
[30] T.C.Jo, “Text Categorization with the Concept of Fuzzy Set of Informative Keywords,” , in
Proceedings of IEEE International Fuzzy Systems Conference, Vol. 2, pp. 609-614, August
22-25, 1999.
[31] T.Joachims, “A Statistical Learning Model of Text Classification for Support Vector
Machines,” GMD Forschungszentrum IT, AIS.KD Schloss Birlinghoven, 53754 Sankt
Augustin, Germany Thorsten.Joachims@gmd.de.
[32] T.Joachims, “A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text
Categorization,” in Proceedings of International Conference on Machine Learning, pp.
143-151, 1997.
[33] T.Joachims, “Text Categorization with Support Vector Machines: Learning with Many
Relevant Features,” in Proceedings of ECML-98, 10th European Conference on Machine
Learning, 1998.
[34] T.Joachims, “Transductive Inference for Text Categorization Using Support Vector
Machines,” in Proceedings of ICML-99, 16th International Conference on Machine
Learning, Morgan Kaufmann, San Francisco, CA, pp.200-209, 1999.
[35] S.J.Ker and J.N.Chen, “A Text Categorization Based on Summarization Technique,” in
Proceedings of NLPIR Workshop of ACL, pp. 79-83, 2000.
[36] A.K.Khalid, A.Tyrrell, A.Vachher, T.Travers, and P.Jackson, “Combining Multiple
Classifiers for Text Categorization,” in Proceedings of the Tenth International Conference
on Information and Knowledge Management, pp. 97-104, 2001.
[37] B.Kijsirikul and N.Ussivakul, “Multiclass Support Vector Machines Using Adaptive
Directed Acyclic Graph,” in Proceedings of the 2002 International Joint Conference on
Neural Networks (IJCNN’02), Vol. 1, pp. 980-985, May 2002.
[38] U.Krebel, “Pairwise Classification and Support Vector Machines,” In B.Scholkopf,
C.Burges, and A.Smola, edits, Advances in Kernel Methods, chapter 15, pp. 255-268. the
MIT Press, 1999.
[39] S.Lawrence and C.L.Giles, “Searching the World Wide Web: General Scientific
Information Access,” in Proceedings of IEEE Communication, 1999.
[40] C.F.Lin and S.D.Wang, “Fuzzy Support Vector Machines,” in Proceedings of IEEE
Transaction on Neural Networks, Vol. 13, No. 2, pp. 464-471, March 2002.
[41] J.H.Lin and T.F.Hu, “Information Theoretic Support Vector Learning for Fuzzy Inference
Systems,” in Proceedings of International Conference on Informatics, Cybernetics and
Systems(ICICS), December 2003.
[42] H.R.Lin, “Combining Classifiers for Chinese Text Categorization,” Master Degree Thesis,
Department of Computer Science and Information Engineering, National Chung Cheng
University, August 2000.
[43] L.M.Manevitz and M.Yousef, “One-Class SVMs for Document Classification,” Journal of
Machine Learning Research 2, Submitted 3/01; Published 12/01, pp. 139-154, 2001.
[44] B.Michael, “Support Vector Machines,”, 1999.
http://www.cse.ucsc.edu/research/compbio/genex/genexTR2html/node3.html
[45] E.Mayoraz and E.Alpaydin, “Support Vector Machines for Multi-Class Classification,”
IWANN’99, Alicante, Spain, June 1999.
[46] D.McKay and C.Fyfe, “Probability Prediction Using Support Vector Machines,” in
Proceedings of the Fourth International Conference on Knowledge-Based Intelligent
Engineering Systems & Allied Technologies, 30th August-1st ,September 2000.
[47] J.C.Platt, et al., “Large Margin DAGs for Multiclass Classification,” appear in Advances in
Neural Information Proceedings Systems 12 S.A. Solla, T.K. Leen and K.-R. Muller (eds.),
pp. 547-553, MIT Press, 2000.
[48] J.Rennie, “Improving Multi-Class Text Classification with the Support Vector Machines,”
Master’s thesis, Massachusetts Institute of Technology, 2001.
[49] S.M.Ruger and S.E. Grauch, “Feature Reduction for Document Clustering and
Classification,” DTR 200/8, Department of Computing, Imperial College London, pp. 1-9,
September 2000.
[50] B.Scholkopf, C.J.C.Burges, and A.J.Smola, “Introduction to Support Vector Learning,,” in
Proceedings of Advances in Kernel Methods- Support Vector Learning, MIT Press, pp. 1-15,
Cambridge, MA, 1999.
[51] F.Schwenker, “Hierarchical Support Vector Machines for Multi-Class Pattern Recognition,” in Proceedings of IEEE 4th International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies, 30th Aug-1st, Brighton, UK, pp. 561-565, September 2000.
[52] F.Sebastiani, “Machine Learning in Automated Text Categorization,” in Proceedings of the
ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47, March 2002.
[53] S.Scott and S.Matwin, “Feature Engineering for Text Classification,” in Proceedings of
ICML-99, 16th International Conference on Machine Learning, pp. 1-13, 1999.
[54] T.Scheffer and S.Wrobel, “Text Classification Beyond the Bag-of-Words Representation,”
in Proceedings of the nineteenth International Conference on Machines Learning (ICML),
2002.
[55] J.G.Shanahan and N.Roma, “Improving SVM Text Classification Performance through Threshold Adjustment,” in Proceedings of the European Conference on Machine Learning (ECML), pp. 361-372, September 2003.
[56] P.Sollich, “Probabilistic Methods for Support Vector Machines,” in Proceedings Advances
in Neural Information System, The MIT Press, 2000.
[57] Z.Sun and Y.Sun, “Fuzzy Support Vector Machine for Regression Estimation,” in
Proceedings of IEEE International Conference on Systems, Man and Cybernetics, Vol. 4,
October 2003.
[58] Tahami and V., “A Fuzzy Model of Document Retrieval Systems,” in Proceedings of
Information and Management, Vol. 12, pp. 177-187, 1976.
[59] Y.Tan, Y.Xia, and J.Wang, “Neural Network Realization of Support Vector Methods for
Pattern Classification,” in Proceedings IEEE-INNS-ENNS International Joint Conference on
Neural Networks (IJCNN’00)24-27, pp. 411-416, July 2000 Como, Italy.
[60] N.L.Taso, “The Investigation of Fuzzy Document Classification on Internet,” Master
Degree Thesis, Department of Information Engineering, Tamkang University, Taipei, 2000.
[61] J.J.Tasy and J.D.Wang, “Improving Automatic Chinese Text Categorization by Error
Correction,” in Proceedings of the 5th International Workshop on Information Retrieval with
Asian Language, pp. 1-8, 2000.
[62] Y.H.Tseng, “Automatic Information Organization and Subject Analysis for Digital
Documents,” Journal of Taipei Public Library, vol. 20, No. 2, pp. 23-35, December 2002.
[63] Y.H.Tseng, “Effectiveness Issues in Automatic Text Categorization,” Journal of the library
association of China, No. 68, pp. 62-83, June 2002.
[64] V.N.Vapnik, The Nature of Statistical Learning Theory. John Wiley and Sons, New York,
1995.
[65] J.N.Wang, “A Study of Multiclass Support Vector Machines,” Master Degree Thesis,
Department of Information Management, Yuan-Ze University, June 2003.
[66] J.Weston and C.Watkins, “Multi-Class Support Vector Machines,” Royal Holloway
Technical report CSD-TR-98-04, 1998.
[67] K.J.Wu, “Learning Between Class Hierarchies for Text Categorization,” Master Degree
Thesis, Department of Computer Science and Information Engineering, National Chung
Cheng University, August 2001.
[68] Y.Yang and J.O.Pedersen, “A Comparative Study on Feature Selection in Text
Categorization,” in Proceedings of International Conference on Machine Learning,
pp.412-420, 1997.
[69] Y.Yang, “A Study on Thresholding Strategies for Text Categorization,” in Proceedings of
the 23rd Annual International ACM-SIGIR Conference on Research and Development in
Information Retrieval, pp. 137-145, 2001.
[70] C.Yu, “Correlation of Fuzzy Numbers,” Fuzzy Sets and Systems, Vol. 55, pp. 303-307,
1993.
[71] H.Yu, J.Han and K.Chang, “PEBL: Web Pages Classification without Negative Examples,”
in Proceedings of IEEE Transaction on the Knowledge and Data Engineering, Vol. 16, pp.
70-81, January 2004.
[72] C.H.Zheng, G.W.Zheng, L.G.Jiao and A.L.Ding, “Multi-targets Recognition for
High-Resolution Range Profile of Radar based on Fuzzy Support Vector Machine,” in
Proceedings of the Fifth International Conference on Computational Intelligence and
Multimedia Applications(ICCIMA’03), pp.407-412, September 2003.
[73] ISIS Research Group, “Support Vector Machines”, 2001.
http://www.isis.ecs.soton.ac.uk/resources/svminfo/
[74] Zadeh and L.A., “Fuzzy Sets,” Information and Control, Vol. 8, pp. 338-353, 1965.
[75] Zimmermann and H.-J, “Fuzzy Set Theory- and Its Applications,” 2nd revised edition,
Kluwer Academic Publishers, 1991.
[76] R.Ghani, “Using Error-Correcting Codes For Text Classification,” in Proceedings of the
17th International Conference on Machine Learning (ICML 2000).
[77] H.P.Kriegel, R.Kroger, A.Pryakhin, and M.Schubert, “Using Support Vector Machines for
Classifying Large Sets of Multi-Represented Objects,” in Proceedings of the International
Conference on Data Mining (SDM’04), pp.102-114, 2004.
[78] Y.Yang and X.Liu, “A re-examination of text categorization methods,” in Proceedings of
the ACM SIGIR Conference on Research and Development in Information Retrieval, 1999.
[79] Adam Berger, “Error-correcting output coding for text classification,” in Proceedings of
IJCAI-99 Workshop on Machine Learning for Information Filtering, 1999.
[80] J.Friedman, “Another Approach to Polychotomous,” Technical Report, Dept. of Statistics,
Stanford University, 1996.
[81] T.Hastie and R.Tibshirani, “Classification by Pairwise Coupling,” The Annals of Statistics,
26 (1): 451-471,1998.
[82] D.Price, S.Knerr, L.Personnaz, and G.Dreyfus, “Pairwise Neural Network Classifiers with
Probabilistic Outputs,” Neural Information Processing Systems, 7:1109-1116, 1995.
[83] J.Platt, “Probabilistic output for Support Vector Machines and Comparison to Regularized
Likelihood methods,” in Advances in Large Margin Classifiers, 2000.
[84] K.D.Baker and A.K.McCallum, “Distributional clustering of words for text classification,”
in Proceedings of the 21th Annual Int. ACM SIGIR conference on Research and
Development in Information Retrieval, p. 96-103, 1998.
[85] Y.Yang, “An Evaluation of Statistical Approaches to Text Categorization,” Information
Retrieval, 1(1).
[86] D.D.Lewis and M. Ringuette, “A Comparison of Two Learning Algorithms for Text
Classification,” in Proceedings of the Third Annual Symposium on Document Analysis and
Information Retrieval, p. 81-93, 1994.
[87] L.Aas and L.Eikvil, “Text Categorization: A Survey,”. Norwegian Computing Center,
Report NR 941, 1999.