Consumers’ Preferences Modeling With Multiclass Fuzzy Support Vector
Machines
Chih-Chieh Yang
Department of Multimedia and Entertainment Science, Southern Taiwan University
No. 1, Nantai Street, Yongkang City, Tainan County, Taiwan 71005
Meng-Dar Shieh
Department of Industrial Design, National Cheng Kung University, Tainan, Taiwan
70101
Abstract
Consumers’ preferences toward product design are often affected by a large
number of form features. It is therefore important for product designers to understand the
relationship between consumers’ preferences and product form features. In this paper,
an approach based on multiclass fuzzy support vector machines (multiclass fuzzy
SVM) is proposed to construct a prediction model of consumers’ preferences.
Product samples are collected and their form features are systematically examined.
Each product sample is assigned a class label and a fuzzy membership expressing the
degree of agreement with this label, so as to formulate a multiclass classification problem.
A one-versus-one multiclass fuzzy SVM model is constructed using the collected product
samples, and the optimal set of training parameters is determined by a two-step
cross-validation. A case study of mobile phone design is given to demonstrate the
effectiveness of the proposed methodology. Two standard kernel functions, the
polynomial kernel and the Gaussian kernel, are used and their performance compared.
The experimental results show that the Gaussian kernel model outperforms the
polynomial kernel model: it performed very well and is capable of preventing
overfitting.
Keywords: Consumers’ preferences; Multiclass fuzzy support vector machines;
Mobile phone design
1. Introduction
The appearance of a product is one of the most important factors affecting
consumers’ purchase decisions. Traditionally, the quality of product form design
depends heavily on designers’ intuition, which does not guarantee success in the
marketplace. In order to understand consumers’ preferences and develop appealing
products in a more effective manner, much research has studied product
form design with systematic approaches. The most notable is Kansei
Engineering, proposed by (Jindo, Hirasago et al. 1995). The main issue is how to deal
with the correlations between attributes and the nonlinear
properties of attributes (Shimizu and Jindo 1995; Park and Han 2004). The techniques
most commonly adopted in the product design field, such as multiple regression analysis (Park
and Han 2004) or multivariate analysis (Shimizu and Jindo 1995), depend heavily
on assumptions of independence and linearity and hence cannot deal effectively with the
nonlinearity of the relationship. In addition, prior to establishing a mathematical
model, data simplification and variable screening are often needed to obtain better
results (Hsu, Chuang et al. 2000). Fuzzy regression analysis (Shimizu and Jindo 1995)
and other methods suffer from the same shortcomings (Park and Han 2004).
(Vapnik 1995) developed a new kind of learning algorithm called the support vector machine
(SVM). SVM has been shown to provide higher performance than traditional learning
techniques (Burges 1998). Its remarkable and robust performance on
sparse and noisy data makes it a first choice in a number of applications such as
pattern recognition (Burges 1998) and bioinformatics (Scholkopf, Guyon et al. 2001).
SVM is known for its elegance in solving nonlinear problems with the technique of
“kernels,” which implicitly perform a nonlinear mapping to a feature space. As a
consequence, the nonlinear relationship between product form features and consumers’
preferences can be processed effectively by introducing a suitable kernel function.
This study proposes an approach based on multiclass fuzzy SVM for modeling consumers’
preferences. The approach begins by processing product forms with
discrete and continuous attributes and can also deal with sparse feature vectors. Each
product sample is assigned a class label and a fuzzy membership, derived from the
semantic differential score, that describes the degree of agreement with this label. A
one-versus-one multiclass fuzzy SVM model is constructed using the collected product
samples, and the optimal set of training parameters is determined by a two-step
cross-validation. The remainder of the paper is organized as follows. Section 2 gives an
introduction to multiclass fuzzy SVM. Section 3 presents the proposed prediction model
of consumers’ preferences. Section 4 demonstrates the experimental results of the
proposed model using mobile phone design as an example. Finally, Section 5 presents
some brief conclusions and suggestions for future work.
2. Multiclass fuzzy support vector machines
2.1. Fuzzy support vector machines for binary classification
An SVM maps the input points into a high-dimensional feature space and finds a
separating hyperplane that maximizes the margin between two classes in that space.
Maximizing the margin is a quadratic programming (QP) problem that can be solved from
its dual by introducing Lagrange multipliers. Without any explicit knowledge of the
mapping, the SVM finds the optimal hyperplane by using dot products in
feature space with the aid of kernels. The solution of the optimal hyperplane can be
written as a combination of a few input points, called support vectors.
In many real-world applications, input samples may not be exactly assigned to
one class, and the effects of the training samples may differ. Some samples are more
important and should be fully assigned to one class so that the SVM separates them
correctly; others may be noisy, less meaningful, and better discarded. Treating every
training sample equally may therefore cause overfitting. The original SVM lacks this
kind of ability. (Huang and Liu 2002; Lin and Wang 2002) proposed fuzzy SVM, which
combines fuzzy logic and SVM so that different training samples contribute differently
to their own class. The core of the concept is to fuzzify the training set and assign each
data sample a membership value according to its relative importance within the class. A
description of fuzzy SVM is given in the Appendix.
Figure 1 illustrates a simplified binary classification problem with only two
attributes, trained by fuzzy SVM with a linear kernel. Since all data samples have only
two attributes, the data points can be plotted in the 2D plane and the training results
explained in a more intuitive manner. Red and blue disks are the two classes of training
samples. Grey values indicate the value of the argument $\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b$
of Eq. (14) in the Appendix. A new data sample without a given class label can then be
discriminated according to Eq. (14) in the Appendix. In Figure 1 the middle solid line is
the decision surface; data points lying on this surface satisfy $D(x) = 0$. The outer dashed
lines precisely meet the constraint in Eq. (3) in the Appendix, and data points lying on
this margin satisfy $D(x) = 1$ or $D(x) = -1$.
In addition, support vectors are very useful in data analysis and interpretation. In
the original definition of SVM, the data points satisfying the condition $\alpha_i > 0$ are
called support vectors. In fuzzy SVM, the same value of $\alpha_i$ may indicate a different
type of support vector due to the membership factor $s_i$ (Lin and Wang 2002). A point with
$\alpha_i = s_i C$ is misclassified, while a point with $0 < \alpha_i < s_i C$
lies on the margin of the hyperplane (marked by extra circles in Figure 1).
Different fuzzy memberships were applied to the two classes to demonstrate the
training effect of $s_i$. In Figure 1(a) both classes are assigned a membership equal to 1,
and the training result is exactly the same as for the standard SVM. In Figure 1(b) and
Figure 1(c) the blue and red data points, respectively, are assigned a membership equal
to 0.1. Since $s_i$ expresses the attitude of the corresponding data sample toward a certain
class label, the training results are sensitive to the memberships of the training samples.
Tuning the parameter C balances the minimization of the error
function against the maximization of the margin of the optimal hyperplane. Take the same
two-dimensional dataset of Figure 1(a) for example. Figure 2 shows the influence of
the parameter C, varied from 10000 to 0.1, for a polynomial kernel with degree 2. Increasing
C makes the trained SVM commit fewer misclassifications with a narrower margin, while
decreasing C makes the SVM ignore more training points and obtain a wider margin.
Figure 1. The relationship between membership value and training margin of fuzzy
SVM (linear kernel): (a) all data membership = 1.0; (b) all blue data membership = 0.1;
(c) all red data membership = 0.1.
Figure 2. The relationship between parameter C and the training margin
(polynomial kernel with degree 2), for C = 10000, 1000, 100, 10, 1, and 0.1.
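The membership and regularization effects described above can be reproduced with off-the-shelf software. The following minimal sketch (not part of the original study) uses scikit-learn's SVC, whose sample_weight argument rescales the per-sample penalty to $s_i C$ and therefore plays the role of the fuzzy membership; the toy two-dimensional data are hypothetical.

import numpy as np
from sklearn.svm import SVC

# Hypothetical toy 2-D, two-class data for illustration only.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.hstack([np.ones(20), -np.ones(20)])

# Fuzzy memberships s_i in (0, 1]; here the +1 class is down-weighted to 0.1,
# mimicking Figure 1(b)/(c): its samples contribute less to the error term.
s = np.where(y == 1, 0.1, 1.0)

# sample_weight rescales each sample's penalty to s_i * C, which corresponds to
# the fuzzy-SVM constraint 0 <= alpha_i <= s_i * C in Eq. (6) of the Appendix.
for C in [10000, 1, 0.1]:
    clf = SVC(kernel="linear", C=C)
    clf.fit(X, y, sample_weight=s)
    print(f"C={C}: {int(clf.n_support_.sum())} support vectors")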
2.2. One-versus-one multiclass support vector machines
The previous section described the concept of fuzzy SVM. However,
fuzzy SVM is still limited to binary classification, and how to effectively extend SVM to
multiclass classification is still an ongoing research issue. (Hsu and Lin 2001; Duan
and Keerthi 2005) compared the performance of several multiclass SVM methods
based on binary SVM, including “one-versus-rest” (OVR) (Bottou, Cortes et al. 1994),
“one-versus-one” (OVO) (Krebel 1999), and the directed acyclic graph SVM (DAGSVM)
(Platt, Cristianini et al. 2000). The binary SVM is used as a component in these
multiclass classification algorithms. Since the OVO method and DAGSVM were shown
to have higher accuracy in practice (Hsu and Lin 2001), the OVO method was
adopted in this study. The OVO method constructs $\binom{n}{2} = n(n-1)/2$ binary SVMs
for an $n$-class problem, where each of the $n(n-1)/2$ SVMs is trained on the data samples
from two classes.
Data samples are partitioned by a series of optimal hyperplanes. An
optimal hyperplane is one from which the training data are maximally distant, so that the
lowest classification error rate is achieved when this hyperplane is used to classify the
current training set. These hyperplanes can be written, following Eq. (2) in the Appendix, as

$w_{st} \cdot z + b_{st} = 0$    (1)

and the decision functions are defined as $D_{st}(x_i) = w_{st} \cdot z_i + b_{st}$, where $s$ and $t$
denote two arbitrary classes separated by an optimal hyperplane among the $n$ classes,
$w_{st}$ is the weight vector, and $b_{st}$ is the bias term. After all $n(n-1)/2$ classifiers are
constructed, a max-win voting strategy is used to classify data samples (Krebel 1999). Each of
the $n(n-1)/2$ OVO SVMs casts one vote: if $D_{st}(x_i)$ says $x_i$ is in the $s$-th class,
the vote of $x_i$ for the $s$-th class is increased by one; otherwise the vote for the $t$-th class
is increased by one. Then $x_i$ is predicted to be in the class with the largest vote. Since
fuzzy SVM is a natural extension of the traditional SVM, the same OVO scheme can be used
to deal with the multiclass problem without any difficulty.
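To make the voting procedure concrete, the sketch below implements the max-win voting rule in plain Python; the pairwise decision functions are assumed to be trained fuzzy SVM classifiers and are passed in as callables (a hypothetical interface, not code from the original study).

from itertools import combinations

def ovo_predict(x, classes, pairwise_decision):
    # Max-win voting over all n(n-1)/2 pairwise classifiers.
    # pairwise_decision[(s, t)](x) > 0 is read as "x belongs to class s",
    # otherwise the vote goes to class t.
    votes = {c: 0 for c in classes}
    for s, t in combinations(classes, 2):
        if pairwise_decision[(s, t)](x) > 0:
            votes[s] += 1
        else:
            votes[t] += 1
    # The sample is predicted to be in the class with the largest vote.
    return max(votes, key=votes.get)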
3. Prediction model of consumers’ preferences
This study aims to develop a prediction model based on consumers’ preferences.
The product samples are analyzed by considering the sparse and mixed properties of
form features. Class labels are used to describe the consumers’ preferences toward
product samples. The product samples are collected and their form features are
systematically examined. Consumers are asked to assign one most suitable class
labels to each product sample. After analyzing form feature of product samples and
collecting consumers’ evaluation data, an OVO multiclass fuzzy SVM model is
constructed. In order to obtain optimal training model, a two-step cross-validation is
used to search best combination of parameters.
3.1. Processing sparse and mixed product form features
Two characteristics of product form features are considered in this study. Firstly,
the form feature vector is often sparse. This is mainly because a large number of features
is typically needed to represent product form designs, and an individual product sample
does not necessarily possess all form features; the number of active or non-zero features
in a feature vector is lower than the total number of features. This situation is very
common in product form feature representation (Kwahk and Han 2002). Secondly,
product form features are often a mixture of two kinds of attributes, denoted as the
“discrete” and “continuous” types. Discrete attributes denote categorical choices among a
fixed number of options, such as the type of texture or the material used in parts.
Continuous attributes such as length and proportion have some kind of scale or
can be measured, and the domain of the variable is continuous without interruption. SVM
can deal with a mix of discrete and continuous attributes at the same time. Since
SVM requires that each data sample be represented as a vector of real numbers,
discrete attributes are coded as integer numbers; for example, a three-category
attribute “circle, rectangle, triangle” can be coded as {1, 2, 3}. As for
continuous attributes, because kernel values usually depend on the inner products of
feature vectors (e.g., the linear kernel and the polynomial kernel), large attribute values might
cause numerical problems (Hsu, Chang et al. 2003). Continuous attributes are therefore linearly
scaled to the range [0, 1] to avoid numerical difficulties during calculation.
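As an illustration of this preprocessing step, the following sketch (with hypothetical attribute names and values) codes discrete attributes as integers and linearly scales continuous attributes to [0, 1]:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw samples: [body length (mm), body width (mm), body type, button shape].
raw = [
    [98.0, 45.0, "flip", "round"],
    [105.0, 48.0, "block", "square"],
    [92.0, 44.0, "slide", "round"],
]

# Discrete attributes: each category is coded as an integer,
# e.g. a three-category attribute {circle, rectangle, triangle} -> {1, 2, 3}.
body_type_code = {"block": 1, "flip": 2, "slide": 3}
button_code = {"round": 1, "square": 2}
discrete = np.array([[body_type_code[r[2]], button_code[r[3]]] for r in raw], dtype=float)

# Continuous attributes: linearly scaled to [0, 1] to avoid numerical problems
# with inner-product-based kernels.
continuous = MinMaxScaler(feature_range=(0, 1)).fit_transform(
    np.array([r[:2] for r in raw], dtype=float))

X = np.hstack([continuous, discrete])
print(X)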
3.2. Describing consumers’ preferences using class labels
The concept of product positioning is borrowed to describe consumers’
preferences toward product form design. (Kaul and Rao 1995) suggested that a
company should provide an array of products to the marketplace in order to meet the
needs of each homogeneous consumer segment. Conversely, consumers often make
choices in the marketplace according to the perceived product attributes. Based on this
idea, product samples are assumed to be distinguished by consumers and classified
into different groups, and managerial decisions can be made more effectively by
identifying the relative importance attached to various product attributes. Taking
mobile phone design as an example, class labels such as sports, simplicity, female, plain,
and business are used to describe the different product divisions offered in the
marketplace. Although other product characteristics (brand, price, etc.) may affect
consumers’ subjective perceptions, the authors mainly emphasize the factors of product
form design; other marketing factors that may influence consumers’ decisions are beyond
the scope of this study.
3.3. Collecting product samples
A total of 69 mobile phones were collected from the Taiwan market in 2006.
Three product designers, each with at least five years of experience, conducted the product
form feature analysis. They first examined the main component structure using the
method proposed in (Kwahk and Han 2002) and then used this structure to analyze
all product samples. The form features of each product sample were discussed by all
designers until one unified representation was determined. Continuous attributes were
recorded directly, while discrete attributes were processed by the method described in
Section 3.1. The color and texture information of the product samples was ignored;
only the form features were considered. All entries in the resulting feature matrix were
prepared for training the multiclass fuzzy SVM. Five class labels, namely sports,
simplicity, female, plain, and business, were chosen for the semantic evaluations. In order to
collect consumers’ perception data for mobile phone design, 30 subjects, including 15
males and 15 females, were asked to evaluate all product samples using the selected
five class labels. Each subject was asked to choose the most suitable class label for
representing each product sample and to evaluate each sample on a semantic differential
scale from 0 (very low) to 1 (very high). Since each product sample had only a single
instance when training the multiclass fuzzy SVM model, the most frequently assigned
label was used to represent each product sample; training with multiple instances per
sample is another interesting issue worthy of further research. The selected class label is
assigned +1, and the rest of the labels are assigned –1. The semantic differential score is
stored directly as the membership value for fuzzy SVM training.
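The way one label and one membership per sample could be derived from the subjects' evaluations is sketched below; the vote counts are hypothetical, and taking the mean score of the subjects who chose the winning label is one plausible reading of the procedure described above, not necessarily the exact rule used in the study.

import numpy as np
from collections import Counter

LABELS = ["sports", "simplicity", "female", "plain", "business"]

# Hypothetical evaluations of one product sample: each subject picks the most
# suitable label and rates the sample on a 0-1 semantic differential scale.
evaluations = [("plain", 0.5), ("plain", 0.6), ("sports", 0.4), ("plain", 0.5)]

# The most frequently assigned label represents the sample ...
chosen = Counter(lbl for lbl, _ in evaluations).most_common(1)[0][0]
# ... and is coded +1, while all other labels are coded -1.
y = {lbl: (+1 if lbl == chosen else -1) for lbl in LABELS}
# Membership for fuzzy SVM training: mean score of the subjects who chose the winning label.
s = float(np.mean([score for lbl, score in evaluations if lbl == chosen]))
print(chosen, y, s)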
3.4. Constructing multiclass fuzzy SVM model
In this study, each product sample is assigned a class label to formulate a
multiclass classification problem. This problem is then divided into a series of OVO
SVM sub-problems. The objective of multiclass classification is to correctly
discriminate these classes from one another, and each OVO sub-problem addresses two
different class labels (e.g., sports versus simplicity). Each classifier uses the fuzzy
SVM to define a hyperplane that best separates the product samples into two classes.
Each test sample is presented sequentially to each of the $5 \times (5-1)/2 = 10$ OVO
classifiers, and its label is predicted as the class receiving the largest vote.
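A minimal sketch of this construction with scikit-learn is given below: one membership-weighted binary SVC per pair of class labels, with the Gaussian kernel expressed through gamma = 1/(2*sigma^2). The default parameter values correspond to the optimal set (C, sigma^2) = (40, 4) reported in Section 4.4; the rest is an assumption of this sketch rather than the study's original implementation.

import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def train_ovo_fuzzy_svm(X, labels, s, C=40.0, gamma=0.125):
    """Train one membership-weighted binary SVC per pair of class labels
    (five labels -> 5*(5-1)/2 = 10 classifiers)."""
    classifiers = {}
    for a, b in combinations(sorted(set(labels)), 2):
        mask = (labels == a) | (labels == b)
        y = np.where(labels[mask] == a, 1, -1)      # class a -> +1, class b -> -1
        clf = SVC(kernel="rbf", C=C, gamma=gamma)   # gamma = 1 / (2 * sigma^2)
        clf.fit(X[mask], y, sample_weight=s[mask])  # sample_weight plays the role of membership s_i
        classifiers[(a, b)] = clf
    return classifiers

# A new sample is then classified by max-win voting over the ten pairwise
# decision functions, as sketched in Section 2.2.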
3.5. Choosing optimal parameters using cross-validation
Since the number of product samples is limited, it is important to obtain the best
generalization performance and reduce overfitting. A practical
implementation is to partition the data samples into training data and testing data.
Various partition strategies have been proposed, including leave-one-out
cross-validation (LOOCV), k-fold cross-validation, repeated random subsampling,
and bootstrapping (Berrar, Bradbury et al. 2006). In this study, 5-fold cross-validation
is used to choose the optimal parameters. The whole set of training samples is randomly
divided into five subsets of approximately equal size. Each multiclass model is trained
using 5 − 1 = 4 subsets and tested using the remaining subset. Training is repeated
five times and the average testing error rate over the five held-out subsets is calculated.
The performance of an SVM model depends heavily on the regularization
parameter C and the parameter of the chosen kernel function. Taking the Gaussian kernel
for example, each binary classifier requires the selection of two parameters, the
regularization parameter $C$ and the kernel parameter $\sigma^2$. The $C$ and $\sigma^2$ of each
classifier within the multiclass model are set to be the same for computational efficiency.
Since cross-validation may be very time-consuming, a two-step grid search is
conducted to find the optimal hyperparameter pair (Hsu, Chang et al. 2003). In the
first step, a coarse grid search is performed using the sets
$C \in \{10^{-3}, \ldots, 10^{3}\}$ and $\sigma^2 \in \{10^{-3}, \ldots, 10^{3}\}$, so 49 combinations of $C$ and $\sigma^2$ are
tried in this step. An optimal pair $(C_0, \sigma_0^2)$ is selected from the coarse grid search. In
the second step, a fine grid search is conducted around $(C_0, \sigma_0^2)$, where
$C \in \{0.2C_0, 0.4C_0, \ldots, 0.8C_0, C_0, 2C_0, 4C_0, \ldots, 8C_0\}$ and
$\sigma^2 \in \{0.2\sigma_0^2, 0.4\sigma_0^2, \ldots, 0.8\sigma_0^2, \sigma_0^2, 2\sigma_0^2, 4\sigma_0^2, \ldots, 8\sigma_0^2\}$.
Altogether, 81 combinations of $C$ and $\sigma^2$ are tried in this step, and the optimal
hyperparameter pair is selected from this fine search. Likewise, the same two-step
grid search is repeated for the polynomial kernel: the coarse grid is
$C \in \{10^{-3}, \ldots, 10^{3}\}$ and $p \in \{1, 2, \ldots, 5\}$, and once $(C_0, p_0)$ is
determined, the fine grid search uses
$C \in \{0.2C_0, 0.4C_0, \ldots, 0.8C_0, C_0, 2C_0, 4C_0, \ldots, 8C_0\}$ and
$p \in \{0.2p_0, 0.4p_0, \ldots, 0.8p_0, p_0, 1.2p_0, 1.4p_0, \ldots, 1.8p_0\}$.
After comparing the performance of all training models with different kernel
functions and parameters, the best combination of parameters obtained by
cross-validation is used to build the final multiclass fuzzy SVM model.
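The two-step search can be sketched as follows (an illustrative implementation, not the original code): the coarse and fine grids follow the sets given above, the 5-fold split is stratified by class label, and scikit-learn's SVC handles the OVO decomposition internally with the same (C, sigma^2) for every pairwise classifier.

import numpy as np
from itertools import product
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def cv_error(X, y, s, C, sigma2, n_splits=5):
    # Average 5-fold cross-validation error of a membership-weighted Gaussian-kernel SVM.
    errors = []
    for tr, te in StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0).split(X, y):
        clf = SVC(kernel="rbf", C=C, gamma=1.0 / (2.0 * sigma2))  # gamma = 1 / (2 * sigma^2)
        clf.fit(X[tr], y[tr], sample_weight=s[tr])
        errors.append(1.0 - clf.score(X[te], y[te]))
    return float(np.mean(errors))

def two_step_search(X, y, s):
    # Step 1: coarse grid, C and sigma^2 in {1e-3, ..., 1e3} (49 combinations).
    coarse = [10.0 ** k for k in range(-3, 4)]
    C0, s0 = min(product(coarse, coarse), key=lambda p: cv_error(X, y, s, *p))
    # Step 2: fine grid around (C0, sigma0^2), factors {0.2, 0.4, 0.6, 0.8, 1, 2, 4, 6, 8}.
    factors = [0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0]
    return min(product([f * C0 for f in factors], [f * s0 for f in factors]),
               key=lambda p: cv_error(X, y, s, *p))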
4. Experimental results
4.1. Data set
Mobile phone design was selected to demonstrate the proposed
methodology. Table 1 shows a part of the product samples used in this study. The set
$S_i\ (i = 1, 2, \ldots, 10)$ represents a part of the product samples to be analyzed; the set
$X_i\ (i = 1, 2, \ldots, 6)$ denotes the product form feature attributes; the set
$Y_i\ (i = 1, 2, \ldots, 5)$ represents the class labels; and $s$ is the membership value of the +1 class
label of each product sample $S_i$. For the sake of simplicity only six product form
features are listed in the example of Table 1:

$X = \{X_1, X_2, X_3, X_4, X_5, X_6\}$ = {body length, body width, body thickness, body volume, body type, function button type}

Five class labels are used to describe consumers’ subjective perceptions of mobile
phone design. These class labels are listed as follows:

$Y = \{Y_1, Y_2, Y_3, Y_4, Y_5\}$ = {sports, simplicity, female, plain, business}

Taking product sample $S_1$ as an example, the consumers chose the label $Y_4$ “plain” and
the attitude toward $Y_4$ is $s = 0.5$. A complete list of all product form features is shown in
Table 2.
Sample   Product form features (X)                 Class labels and membership values (Y, s)
         X1     X2     X3     X4     X5   X6       Y1   Y2   Y3   Y4   Y5    s
S1       0.75   0.45   0.72   0.62   2    3        -1   -1   -1   +1   -1    0.5
S2       0.67   0.43   0.64   0.47   3    3        -1   -1   +1   -1   -1    0.8
S3       0.79   0.42   0.57   0.48   1    3        +1   -1   -1   -1   -1    0.5
S4       0.75   0.44   0.6    0.5    3    3        -1   -1   -1   +1   -1    0.6
S5       0.67   0.42   0.77   0.54   2    2        -1   -1   +1   -1   -1    0.6
S6       0.72   0.48   0.53   0.47   2    3        -1   -1   -1   +1   -1    0.9
S7       1      0.44   0.56   0.63   1    1        -1   -1   +1   -1   -1    0.7
S8       0.77   0.45   0.81   0.71   2    1        -1   -1   +1   -1   -1    1
S9       0.75   0.45   0.72   0.62   2    3        -1   -1   +1   -1   -1    0.6
S10      0.67   0.43   0.64   0.47   3    3        -1   -1   +1   -1   -1    0.8

Table 1. Part of the training product samples for mobile phone design.
Form features                        Type         Attributes
Body
  Length (X1)                        Continuous   None
  Width (X2)                         Continuous   None
  Thickness (X3)                     Continuous   None
  Volume (X4)                        Continuous   None
  Type (X5)                          Discrete     Block body (X51), Flip body (X52), Slide body (X53)
Function button
  Type (X6)                          Discrete     (X61), (X62), (X63)
  Style (X7)                         Discrete     Round (X71), Square (X72), (X73)
Number button
  Shape (X8)                         Discrete     Circular (X81), Regular (X82), Asymmetric (X83)
  Arrangement (X9)                   Discrete     Square (X91), Vertical (X92), Horizontal (X93)
Detail treatment (X10)               Discrete     (X101), (X102), (X103), (X104)
Panel
  Position (X11)                     Discrete     Middle (X111), Upper (X112), Lower (X113), Full (X114)
  Shape (X12)                        Discrete     Square (X121), Fillet (X122), Shield (X123), Round (X124)

Table 2. Complete list of product form features used in this study.
4.2. Training effect of different kernel functions
The training effects of the polynomial kernel and the Gaussian kernel were investigated
using all product samples. The average training accuracies of the kernel functions and
the corresponding parameters are shown in Figure 3.
For the polynomial kernel in Figure 3(a), the average error rates of the linear kernel
($p = 1$) were larger than 40% for all values of $C$. When $p = 2$, the regulating effect of the
parameter $C$ was most obvious: as $C$ decreased from 1000 to
0.001, the average error rate increased from 0% to 34.8%. This is because the parameter
$C$ adjusts the margin of the optimal hyperplane; training with a smaller $C$
results in a larger margin, so the training error can also increase. The parameter $C$ had a
similar regulating effect when $p = 3$, but the training error rate increased more
drastically than for $p = 2$. Although the training accuracies of the polynomial kernel with
$p > 1$ were all superior to those of the linear kernel, such models might suffer from
overfitting and have poor generalization ability.
For the Gaussian kernel in Figure 3(b), the regulating effect of the parameter $C$ was
less pronounced than for the polynomial kernel over all kernel parameters $\sigma$. It has been
reported in (Wang, Xu et al. 2003) that both too large and too small values of $\sigma$ lead
to poor generalization performance, and our results exhibited similar effects. For
large values of $\sigma$, such as $\sigma^2 = 10$, all training data were in effect regarded as a single
point; as a consequence, the training model cannot recognize new data and the training error rate
is very high. On the other hand, for small values of $\sigma$, all training
data were regarded as support vectors and could be separated correctly, so the
training error rate dropped dramatically. However, for untrained data the training model
may not give good results due to overfitting.
In general, the linear kernel performed worse than the nonlinear kernels. Unlike the
linear kernel, the polynomial kernel and the Gaussian kernel are capable of nonlinearly
mapping the training samples into a higher-dimensional space, so they can handle the
case where the relation between product form features and class labels is nonlinear.
Since every kernel function has different properties and generalization
performance, the advantages of different kernel functions can be combined by using
mixtures of kernels (Smits and Jordaan 2002). In addition, there exist theorems that
can help build kernel functions taking domain knowledge into consideration
(Barzilay and Brailovsky 1999); these issues are, however, beyond the scope of this paper.
Figure 3. Average training accuracies using (a) the polynomial kernel and (b) the Gaussian
kernel.
4.3. Analysis of cross-validation process
In order to obtain the best performance and reduce overfitting of the training
model, a two-step cross-validation process was used to determine the optimal parameters.
Figure 4 shows the cross-validation results for the polynomial kernel. The best parameter
set $(C_0, p_0)$ obtained from the first-step coarse grid search was $(100, 1)$, with
the lowest error rate of 71%. The optimal pair $(C, p)$ obtained in the
second-step fine grid search was $(800, 1)$, and the average error rate of the second step
improved slightly to 68.1%. The cross-validation results for the Gaussian kernel are
shown in Figure 5. The best parameter set $(C_0, \sigma_0^2)$ obtained from the coarse grid search was
$(10, 10)$, and the optimal parameter set $(C, \sigma^2)$ obtained in the fine grid search was $(40, 4)$. The
training error rate also improved slightly, from 73.9% in the first step to 72.4% in the second
step.
As shown in the previous section, if the training model is built with all data
samples and one of the parameter sets is selected from the region with very low average
error rates (below 10%) in Figure 3, the training model can hardly avoid overfitting.
An interesting result is that the best parameter sets obtained by cross-validation
for both kernel functions lie on the boundary of the region with
very low average error rates. This indicates that the cross-validation process is
capable of balancing the trade-off between improving training accuracy and preventing
overfitting. Since the purpose of cross-validation is to search for the best combination of
parameters, the accuracy of the individual training models in this process is not our
concern, despite their high training error rates (all larger than 65%). The
optimal parameters of the polynomial kernel and the Gaussian kernel obtained from
cross-validation were then used to build the final training models.
Figure 4. Average training accuracy of cross-validation in (a) the coarse grid and (b) the fine
grid using the polynomial kernel.
Figure 5. Average training accuracy of cross-validation in (a) the coarse grid and (b) the fine
grid using the Gaussian kernel.
4.4. Performance of the optimal training model
The best parameter sets of the polynomial kernel and the Gaussian kernel obtained from
the cross-validation process were both used to build multiclass fuzzy SVM training
models. The average accuracy rate of the polynomial kernel model with
$(C, p) = (800, 1)$ was 66.3%, while the average accuracy rate of the Gaussian kernel model
with $(C, \sigma^2) = (40, 4)$ was 98.6%. Confusion matrices are used for further
analysis, as shown in Table 3; diagonal elements are the numbers of correctly
classified samples, while off-diagonal elements indicate the numbers of misclassified
samples. For the polynomial kernel model in Table 3(a), the most confusing class was
“female”: more than half of the “female” samples were misclassified as “plain,”
“sports,” or “simplicity,” and the accuracy dropped to 20%. According to our
observation, two characteristics of “female” product samples are the area of decoration
and the color of the body. Since the color and texture of the product samples were ignored,
these samples may not provide enough information for the polynomial kernel to classify
them correctly. The Gaussian kernel model in Table 3(b) performed very
well, with only one misclassified sample; its performance with the parameter set
$(C, \sigma^2) = (40, 4)$ was better than that of the polynomial kernel model.
(a) Polynomial kernel model

Actual class    Predicted class                                         Accuracy rate (%)
                plain   sports   female   simplicity   business
plain             10       3        0          1            0                71.4
sports             1      15        0          1            0                88.2
female             1       5        2          2            0                20.0
simplicity         1       0        0         15            1                88.2
business           1       1        0          2            7                63.6
Average accuracy rate                                                        66.3

(b) Gaussian kernel model

Actual class    Predicted class                                         Accuracy rate (%)
                plain   sports   female   simplicity   business
plain             13       0        1          0            0                92.9
sports             0      17        0          0            0               100.0
female             0       0       10          0            0               100.0
simplicity         0       0        0         17            0               100.0
business           0       0        0          0           11               100.0
Average accuracy rate                                                        98.6

Table 3. Confusion matrices and accuracy rates of the optimal training models obtained
from (a) the polynomial kernel and (b) the Gaussian kernel.
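The per-class figures in Table 3 can be reproduced from vectors of actual and predicted labels as in the sketch below; the short label vectors are placeholders for illustration, not the study's data.

import numpy as np
from sklearn.metrics import confusion_matrix

LABELS = ["plain", "sports", "female", "simplicity", "business"]

# y_true / y_pred would come from the trained multiclass fuzzy SVM model;
# the short vectors below are placeholders only.
y_true = ["plain", "plain", "sports", "female", "simplicity", "business"]
y_pred = ["plain", "sports", "sports", "female", "simplicity", "business"]

cm = confusion_matrix(y_true, y_pred, labels=LABELS)  # rows: actual class, columns: predicted class
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)   # diagonal entries are correctly classified samples
print(cm)
print({lbl: f"{acc:.1%}" for lbl, acc in zip(LABELS, per_class_accuracy)})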
5. Conclusion and future work
In this paper, an approach based on multiclass fuzzy SVM is proposed to develop
a prediction model of consumers’ preferences. The OVO multiclass fuzzy SVM
model can deal with the nonlinear relationship between product form features and
consumers’ preferences by introducing a kernel function, and the optimal training
parameters are determined by a two-step cross-validation process. According to the
experimental results for mobile phone design, the optimal training model was obtained
by choosing the Gaussian kernel model with the lowest average cross-validation error
rate of 72.4%; the parameter set $(C, \sigma^2)$ of the optimal training model was $(40, 4)$. The
optimal Gaussian kernel model trained with all product samples also had a very high
accuracy of 98.6%. Consequently, the Gaussian kernel model is superior to the polynomial
model. This result is consistent with the fact that the Gaussian kernel is popular and
commonly used in many applications thanks to its good properties; further discussions
of the properties of the Gaussian kernel can be found in (Sathiya and Lin 2003; Wang, Xu
et al. 2003).
Since our case study was based on mobile phone design and used a relatively
small number of product form features, the form features of other product categories
such as consumer electronics, furniture, or car design may have different
characteristics to consider. A more comprehensive collection of different product
samples is needed to study the effectiveness of the proposed multiclass fuzzy SVM
model. Extending standard kernel functions such as the polynomial and Gaussian
kernels by considering the characteristics of product form features is also a very
interesting issue that requires further study.
Appendix: Fuzzy support vector machines
For a binary classification problem, a set $S$ of $l$ training samples is given, each
represented as $(x_i, y_i, s_i)$, where $x_i$ is the feature vector, $y_i$ is the class
label, and $s_i$ is the fuzzy membership. Each training sample belongs to one of two
classes: each sample is given a label $y_i \in \{+1, -1\}$ and a fuzzy membership
$\sigma \le s_i \le 1$ for $i = 1, \ldots, l$, with a sufficiently small $\sigma > 0$. A data sample with
$s_i = 0$ contributes nothing and could simply be removed from the training set without
affecting the result. These training samples can be used to build a decision function (or
discriminant function) $D(x)$, which is a scalar function of an input sample. Decision
functions that are simple weighted sums of the components of the input sample plus a bias
are called linear discriminant functions (Duda and Hart 1973), denoted as

$D(x) = w \cdot x + b$    (1)
where $w$ is the weight vector and $b$ is a bias value. $D(x) = 0$ can be seen as a
hyperplane: $w$ is the normal vector of the separating plane, and the bias term $b$ is the
offset of the hyperplane along its normal vector. A data set is said to be linearly
separable if a linear discriminant function can separate it without error. In most cases,
requiring a suitable linear discriminant function is too restrictive to be of practical use.
A solution is to map the original input space into a higher-dimensional
feature space and search for the optimal hyperplane in this feature space. Let
$z_i = \phi(x_i)$ denote the corresponding feature space vector under a mapping function $\phi$
from $\mathbb{R}^N$ to a feature space $Z$. The hyperplane can be defined as

$w \cdot z + b = 0$    (2)
The set $S$ is said to be linearly separable if there exists $(w, b)$ such that the
inequalities

$w \cdot z_i + b \ge 1$ if $y_i = +1$,
$w \cdot z_i + b \le -1$ if $y_i = -1$    (3)
are valid for all data samples of the set $S$. For a linearly separable set $S$, a
unique optimal hyperplane can be found for which the margin between the projections
of the training points of the two classes is maximized. To deal with data that are not
linearly separable, the previous analysis can be generalized by introducing
non-negative slack variables $\xi_i \ge 0$ such that Eq. (3) is modified to

$y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad i = 1, \ldots, l$    (4)
The non-zero $\xi_i$ in Eq. (4) are those for which the data sample $x_i$ does not satisfy
Eq. (3). Thus the term $\sum_{i=1}^{l} \xi_i$ can be thought of as a measure of the amount of
misclassification. Since the fuzzy membership $s_i$ is the attitude of the
corresponding sample $x_i$ toward one class and the variable $\xi_i$ is the measure of
error in the SVM, the term $s_i \xi_i$ is a measure of error with different weighting. The
optimal hyperplane problem is then regarded as the solution to

minimize $\frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} s_i \xi_i$
subject to $y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad i = 1, \ldots, l$,
$\xi_i \ge 0, \quad i = 1, \ldots, l$    (5)
where $C$ is a constant. The parameter $C$ can be regarded as a regularization parameter;
tuning it balances the minimization of the error function against the maximization of
the margin of the optimal hyperplane. A larger $C$ makes the trained SVM commit fewer
misclassifications with a narrower margin, while decreasing $C$ makes the SVM ignore more
training points and obtain a wider margin. Note that a smaller $s_i$ reduces the effect of the
slack variable $\xi_i$, so that the corresponding point $x_i$ is treated as less important. The
optimization problem (5) can be solved by introducing Lagrange multipliers $\alpha$ and
transformed into:
minimize $W(\alpha) = \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (z_i \cdot z_j) - \sum_{i=1}^{l} \alpha_i$
subject to $\sum_{i=1}^{l} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le s_i C, \quad i = 1, \ldots, l$    (6)
and the Kuhn-Tucker conditions are defined as

$\alpha_i \left( y_i (w \cdot z_i + b) - 1 + \xi_i \right) = 0, \quad i = 1, \ldots, l$    (7)

$(s_i C - \alpha_i)\, \xi_i = 0, \quad i = 1, \ldots, l$    (8)
A data sample $x_i$ with corresponding $\alpha_i > 0$ is called a support vector.
There are two types of support vector: one with $0 < \alpha_i < s_i C$ lies
on the margin of the hyperplane, and one with $\alpha_i = s_i C$ is
misclassified. An important difference between SVM and fuzzy SVM is that a point
with the same value of $\alpha_i$ may indicate a different type of support vector in fuzzy
SVM due to the factor $s_i$ (Lin and Wang 2002). The mapping $\phi$ is usually
nonlinear and unknown. Instead of calculating $\phi$ explicitly, a kernel function $K$ is used to
compute the inner product of two vectors in the feature space $Z$ and thus implicitly
defines the mapping function:

$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = z_i \cdot z_j$    (9)
The kernel is one of the core concepts in SVMs and plays a very important role. The
following are three types of commonly used kernel functions:

linear kernel: $K(x_i, x_j) = x_i \cdot x_j$    (10)

polynomial kernel: $K(x_i, x_j) = (1 + x_i \cdot x_j)^p$    (11)

Gaussian kernel: $K(x_i, x_j) = \exp\left( - \|x_i - x_j\|^2 / 2\sigma^2 \right)$    (12)

where the order $p$ of the polynomial kernel in Eq. (11) and the spread width $\sigma$ of the
Gaussian kernel in Eq. (12) are adjustable kernel parameters.
The weight vector $w$ and the decision function can be expressed using the Lagrange
multipliers $\alpha_i$:

$w = \sum_{i=1}^{l} \alpha_i y_i z_i$    (13)

$D(x) = \operatorname{sign}(w \cdot z + b) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right)$    (14)
References
Abe, S. and T. Inoue (2002). Fuzzy support vector machines for multiclass problems.
European Symposium on Artificial Neural Networks Bruges, Belgium.
Barzilay, O. and V. L. Brailovsky (1999). "On domain knowledge and feature
selection using a support vector machine." Pattern Recognition Letters 20:
475-484.
Berrar, D., I. Bradbury, et al. (2006). "Avoiding model selection bias in small-sample
genomic datasets." Bioinformatics 22(10): 1245-1250.
Bottou, L., C. Cortes, et al. (1994). Comparison of classifier methods: a case study in
handwriting digit recognition. International Conference on Pattern
Recognition, IEEE Computer Society Press.
Burges, C. (1998). "A tutorial on support vector machines for pattern recognition."
Data Mining and Knowledge Discovery 2(2).
Duan, K.-B. and S. S. Keerthi (2005). "Which is the best multiclass SVM method? An
empirical study." Multiple Classifier Systems: 278-285.
Duda, R. O. and P. E. Hart (1973). Pattern classification and scene analysis, Wiley.
Hsu, C.-W., C.-C. Chang, et al. (2003). A practical guide to support vector
classification.
Hsu, C.-W. and C.-J. Lin (2001). "A comparison of methods for multi-class support
vector machines." IEEE Transactions on Neural Networks 13: 415-425.
Hsu, S. H., M. C. Chuang, et al. (2000). "A semantic differential study of designers'
and users' product form perception." International Journal of Industrial
Ergonomics 25: 375-391.
Huang, H.-P. and Y.-H. Liu (2002). "Fuzzy support vector machines for pattern
recognition and data mining." International Journal of Fuzzy Systems 4(3):
826-835.
Inoue, T. and S. Abe (2001). Fuzzy support vector machines for pattern classification.
Jindo, T., K. Hirasago, et al. (1995). "Development of a design support system for
office chairs using 3-D graphics." International Journal of Industrial
Ergonomics 15: 49-62.
Kaul, A. and V. R. Rao (1995). "Research for product positioning and design
decisions." International Journal of Research in Marketing 12: 293-320.
Krebel, U. (1999). Pairwise classification and support vector machines. Advances in
Kernel Methods - Support Vector Learning. B. Scholkopf, J. C. Burges and A.
J. Smola. Cambridge, MA, MIT Press: 255-268.
Kwahk, J. and S. H. Han (2002). "A methodology for evaluating the usability of
audiovisual consumer electronic products." Applied Ergonomics 33: 419-431.
Lin, C.-F. and S.-D. Wang (2002). "Fuzzy support vector machines." IEEE
Transactions on Neural Networks 13(2): 464-471.
Park, J. and S. H. Han (2004). "A fuzzy rule-based approach to modeling affective
user satisfaction towards office chair design." International Journal of
Industrial Ergonomics 34: 31-47.
Platt, J. C., N. Cristianini, et al. (2000). Large margin DAGs for multiclass
classification. Advances in Neural Information Processing Systems, MIT
Press.
Sathiya, S. and C.-J. Lin (2003). "Asymptotic behaviors of support vector machines
with Gaussian kernel." Neural Computation 15(7): 1667-1689.
Scholkopf, B., I. Guyon, et al. (2001). Statistical learning and kernel methods in
bioinformatics, San Miniato.
Shimizu, Y. and T. Jindo (1995). "A fuzzy logic analysis method for evaluating human
sensitivities." International Journal of Industrial Ergonomics 15: 39-47.
Smits, G. F. and E. M. Jordaan (2002). Improved SVM regression using mixtures of
kernels. Proceedings of IJCNN'02 on Neural Networks.
Wang, W., Z. Xu, et al. (2003). "Determination of the spread parameter in the
Gaussian kernel for classification and regression." Neurocomputing 55:
643-663.