Product Form Feature Selection for Mobile Phone Design using LS-SVR and
ARD
Chih-Chieh Yang
Department of Multimedia and Entertainment Science, Southern Taiwan University
No. 1, Nantai Street, Yongkang City, Tainan County, Taiwan 71005
Meng-Dar Shieh
Department of Industrial Design, National Cheng Kung University, Tainan, Taiwan
70101
Abstract
In the product design field, it is important to pinpoint the critical product form
features (PFFs) that influence consumers' affective responses (CARs) toward a product
design. Manual inspection of critical form features based on expert opinions (such as
those of product designers) has often failed to meet with the acceptance of consumers.
In this paper, an approach based on least squares support vector regression (LS-SVR)
and automatic relevance determination (ARD) is proposed to streamline the task of
product form feature selection (PFFS) according to the CAR data. The representation
of PFFs is determined by morphological analysis and pairwise adjectives are used to
express CARs. In order to gather the CAR data, an experiment of semantic
differential (SD) evaluation on collected product samples was conducted. The
LS-SVR prediction model can be constructed using the PFFs as input data and the
evaluated SD scores as output values. The optimal parameters of the LS-SVR model
are tuned by using Bayesian inference. Finally, an ARD selection process is used to
analyze the relative relevance of PFFs to obtain feature ranking. A case study of
mobile phone design is also given to demonstrate the proposed method. In order to
examine the effectiveness of the proposed method, the predictive performance is
compared with a typical LS-SVR model using a grid search with leave-one-out cross
validation (LOOCV) and a multiple linear regression (MLR) model. The proposed model
based on LS-SVR and ARD outperforms the other two models, benefiting from its
soft feature selection mechanism. Furthermore, the resulting feature ranking is also
compared with that of MLR with backward model selection (BMS). The results of
these two methods using the CAR data of three adjectives exhibit similar feature
rankings. Since only one small dataset of mobile phone designs is investigated, more
comprehensive investigations based on different kinds of products will be needed to
verify the proposed method in our future studies.
Keywords: Feature selection; Least squares support vector regression; Automatic
relevance determination; Bayesian inference
1. Introduction
The way a product looks is one of the most important factors affecting a
consumer’s purchasing decision. Traditionally, the success of a product’s design
depended on the designers’ artistic sensibilities, which quite often did not meet with
great acceptance in the marketplace. Many systematic product design studies have
been carried out to get a better insight into consumers’ subjective perceptions. The
most notable research is Kansei engineering (KE) (Jindo et al., 1995). The basic
assumption of KE studies is that there exists a cause-and-effect relationship between
product form features (PFFs) and consumers’ affective responses (CARs) (Han and
Hong, 2003). Therefore, a prediction model can be constructed from collected product
samples and the CAR data. Using the prediction model, the relationship between PFFs
and CAR can be analyzed and a specially designed product form for specific target
consumer groups can be produced more objectively and efficiently. The construction
of the prediction model can be regarded as a function estimation problem (regression)
taking PFFs of collected product samples as input data and the CAR data as output
values. Various methods can be used to construct the prediction model including
multiple linear regression (MLR) (Han et al., 2004), quantification theory type I (QT1)
(Lai et al., 2006), partial least squares regression (PLSR), neural networks (NN)
(Hsiao and Huang, 2002) and support vector regression (SVR). Of these methods,
SVR’s remarkable performance makes it the first choice in a number of real-world
applications (Vapnik et al., 1997), but to our knowledge it has not yet been adopted in
the product design field.
Although it is an important issue in KE (Han and Hong, 2003), the problem of product
form feature selection (PFFS) according to consumers' perceptions has not been
intensively investigated. The subjective perceptions of consumers are often influenced
by a wide variety of form features. The number of form features can be large, and the
features may be highly correlated with each other. Therefore, manually inspecting the
relative importance of PFFs and finding the most critical features that please consumers is
a difficult task. In the product design field, critical design features are often arrived at
based on the opinions of experts or focus groups. However, the selection of features
based on expert opinion often lacks objectivity. Only a few attempts have been made
to overcome these shortcomings in the PFFS process. For example, Han and Hong
(2003) used several traditional statistical methods for screening critical design
features including principal component regression (PCR), cluster analysis, and partial
least squares (PLS). In the study of Han and Yang (2004), a genetic algorithm-based
PLS method is applied to screen design variables.
According to Blum and Langley (1997), feature selection methods can be divided
into three major categories, the filter, wrapper, and embedded methods,
based on the relationship between the selection scheme and the learning algorithm
used in the model construction. The embedded method directly incorporates the
feature selection process within the learning algorithm. Examples include CART
(Breiman et al., 1983), LASSO (Tibshirani, 1996), support vector machine recursive
feature elimination (SVM-RFE) (Guyon et al., 2002) and automatic relevance
determination (ARD) (MacKay, 1994). The optimal feature subset can be determined
simultaneously during the training process. The embedded method usually involves
evaluating the changes in an objective function of the applied learning algorithm. The
feature subset can be selected by optimizing the objective function that jointly
rewards the predictive performance and penalizes for selecting too many features.
Compared to the wrapper method, the embedded method can reach a solution faster
by avoiding retraining the learning algorithm from scratch for every possible feature
subset.
Depending on whether the features will be eliminated or not, the feature
selection process can be viewed in another manner and divided into ‘hard’ feature
selection and ‘soft’ feature selection (Viaene et al., 2005). Most existing methods
belong to the hard feature selection category, which aims to prune irrelevant inputs by
eliminating features with relatively low importance. The purpose of the hard feature
selection is often to deal with the well-known ‘curse of dimensionality’ problem.
When there are a great number of features and there is not enough training data, the
irrelevant inputs may reduce the performance of the prediction model and therefore
need to be eliminated. In contrast, rather than seeking to eliminate irrelevant features,
the soft feature selection keeps all the original inputs and adjusts the influence of the
features to the learning algorithm (Grandvalet and Canu, 2003). The underlying
concept of the soft feature selection mechanism is that every feature should more or
less have a certain influence. Situations where all features are either highly relevant or
not relevant at all are uncommon. This description is especially pertinent in the case
of product form design. Since the perceived feelings toward a product form are based
on one's observation of the product's appearance, every PFF should have some
degree of influence on the consumer’s affective response. However, studies of the soft
selection method in the literature are relatively sparse. Of the above-mentioned
methods, the ARD method has been successfully applied in a number of real-world
applications (Lopez et al., 2005; Wang and Lu, 2006; Behrens et al., 2007). Although
most applications of ARD are proposed in combination with Bayesian neural
networks (MacKay, 1994), ARD can also be integrated into other learning algorithms
such as Bayesian SVM (Chu et al., 2006) or other Bayesian based models (Behrens et
al., 2007).
In our previous study (Shieh and Yang, 2007), a PFFS method for SVM-RFE
based on a multiclass classification model of product form design was proposed.
Unlike classification-based feature selection methods such as SVM-RFE, which
rely on consumers' judgments to discriminate product form designs from each
other, regression-based feature selection methods, such as the ARD used in this study,
treat the PFFS problem in a different manner. This study proposes an approach based
on least squares support vector regression (LS-SVR), a variant of the SVR
algorithm, combined with ARD to deal with the PFFS problem. A backward elimination
process similar to that of SVM-RFE is used to gradually screen out less important PFFs based on the
soft feature selection mechanism of ARD. PFFs used as input data and the CAR
scores gathered from the questionnaire as output values are used to construct the
LS-SVR prediction model. The optimal values of the parameters of the LS-SVR
model are determined by Bayesian inference. ARD as a soft-embedded feature
selection method is used to determine the relevance of PFFs based on the constructed
LS-SVR model. The remainder of the paper is organized as follows: Section 2
presents the outline of the proposed PFFS method. Section 3 introduces the theoretical
backgrounds of the LS-SVR algorithm and the ARD method for feature selection. A
detailed implementation procedure of the proposed method is given in Section 4.
Section 5 demonstrates the results of the proposed method using mobile phone
designs as an example. Section 6 presents some conclusions and discussions. Finally,
several possible future studies to extend the limitations of this work are described in
Section 7.
2. Outline of the proposed method for product form feature selection
The flow chart of the proposed method for PFFS is presented in Fig. 1. The
procedure comprises the following steps:
(1) Determine the product form representation using morphological analysis.
(2) Conduct the questionnaire investigation for semantic differential (SD) evaluation
to gather the CAR data on product samples.
(3) Construct the LS-SVR prediction model based on PFFs and the pairwise
adjectives.
(4) Analyze the relative relevance of PFFs and obtain feature ranking using the ARD
selection process.
< Insert Fig. 1 about here >
3. Theoretical backgrounds
3.1. Least squares support vector regression
This section briefly introduces the LS-SVR algorithm. A training data set $D$ of $l$ data points is given,

$D = \{(x_1, y_1), \ldots, (x_l, y_l)\}$  (1)

where $x_i \in R^n$ is the input data and $y_i \in R$ is the desired output value. LS-SVR begins
by approximating an unknown function in the primal weight space using the
following equation:

$f(x) = w \cdot \phi(x) + b$  (2)

where $w \in R^k$ is the weight vector and $\phi(\cdot): R^n \rightarrow R^k$ is a nonlinear function that
maps the input space into a higher-dimensional feature space. By introducing the
quadratic loss function into LS-SVR, one can formulate the following optimization
problem in the weight space:

minimize  $\frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{l}\xi_i^2$  (3)

subject to the equality constraints

$y_i = w \cdot \phi(x_i) + b + \xi_i, \quad i = 1, \ldots, l$  (4)

where $C > 0$ is the regularization parameter, $b$ is a bias term, and $\xi_i$ is the
difference between the desired output and the actual output. When the dimension of
$w$ becomes infinite, one cannot solve the primal problem in Eq. (3). The dual
problem can be derived using the Lagrange multiplier method. The mapping $\phi$ is
usually nonlinear and unknown. Instead of calculating $\phi$, the kernel function $K$ is
used to compute the inner product of two vectors in the feature space and thus
implicitly defines the mapping function:

$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$  (5)

The following are three commonly used kernel functions (Vapnik, 1995):

linear: $K(x_i, x_j) = x_i \cdot x_j$  (6)

polynomial: $K(x_i, x_j) = (1 + x_i \cdot x_j)^p$  (7)

radial basis function (RBF): $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma^2)$  (8)

where the indices $i$ and $j$ correspond to different input vectors. $p$ in Eq. (7) and
$\sigma$ in Eq. (8) are adjustable kernel parameters. In an LS-SVR algorithm, the kernel
parameter and the regularization parameter $C$ are the only two parameters that need
to be tuned, which is fewer than for the standard SVR algorithm. In the case of the
linear kernel, $C$ is the only parameter that needs to be tuned. The resulting LS-SVR
prediction function is as follows:

$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i) + b$  (9)
For detailed derivations of the LS-SVR algorithm, the reader is referred to the
textbook of Suykens et al. (2002).
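To make the training step concrete, the following is a minimal NumPy sketch (ours, not code from the cited references) of how the LS-SVR solution can be computed: the equality constraints of Eq. (4) reduce the dual problem to a single linear system in the Lagrange multipliers $\alpha_i$ and the bias $b$ (Suykens et al., 2002), and Eq. (9) then gives predictions. The function names and the choice of the RBF kernel of Eq. (8) are illustrative.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2):
    # RBF kernel of Eq. (8): exp(-||xi - xj||^2 / sigma^2)
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma2)

def lssvr_fit(X, y, C, sigma2):
    """Solve the LS-SVR dual system [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]."""
    l = len(y)
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = 1.0                    # constraint row: sum of alphas is zero
    A[1:, 0] = 1.0                    # bias column
    A[1:, 1:] = K + np.eye(l) / C     # regularized kernel matrix
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]            # (alpha, b)

def lssvr_predict(X_train, alpha, b, sigma2, X_new):
    # Eq. (9): f(x) = sum_i alpha_i K(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```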
3.2. Feature selection based on automatic relevance determination
In this study, ARD is used to perform the soft feature selection, which is
equivalent to determining the relevant dimensions of the $n$-dimensional input space
using Bayesian inference. The simplest form of ARD can be carried out by
introducing the standard Mahalanobis kernel (Herbrich, 2002) as follows:

$K(x_i, x_j) = \exp\left(-\sum_{d=1}^{n} \frac{(x_i^d - x_j^d)^2}{\sigma_d^2}\right)$  (10)

where $x_i^d$ denotes the $d$th element of input vector $x_i$, $d = 1, \ldots, n$. The Mahalanobis
kernel can be regarded as a generalization of the RBF kernel in Eq. (8) that has a separate
length scale parameter $\sigma_d$ for each input dimension. The influence of the $d$th feature in
the input space can be adjusted by setting larger or smaller values of $\sigma_d$. The
Bayesian inference of ARD is then implemented by minimizing an objective function,
and the relevance of each input can be determined automatically according to the
optimized $\sigma_d$ values. However, the objective function often converges very slowly
and results in very large values of the optimized $\sigma_d$ when using the standard
Mahalanobis kernel. As a consequence, an augmented Mahalanobis kernel with
two additional parameters is adopted as follows (Chu et al., 2006):

$K(x_i, x_j) = \kappa_a \exp\left(-\sum_{d=1}^{n} \frac{(x_i^d - x_j^d)^2}{\sigma_d^2}\right) + \kappa_b$  (11)

where $\kappa_a > 0$ denotes the amplitude parameter and $\kappa_b > 0$ denotes the offset
parameter. The parameters $\{\kappa_a, \sigma_1, \ldots, \sigma_n, \kappa_b\}$ can be determined simultaneously
during the optimization process. The ARD process for PFFS combined with Bayesian
inference for parameter tuning is described in Section 4.4. For detailed derivations of
Bayesian inference for ARD, the reader is referred to the textbook of Suykens et al.
(2002).
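As a concrete illustration, Eq. (11) can be written in a few lines of NumPy. This is a sketch under the notation above, not code from Chu et al. (2006); setting $\kappa_a = 1$ and $\kappa_b = 0$ recovers the standard Mahalanobis kernel of Eq. (10).

```python
import numpy as np

def ard_kernel(X1, X2, sigma_d, kappa_a=1.0, kappa_b=0.0):
    """Augmented Mahalanobis kernel of Eq. (11).

    sigma_d holds one length scale per input dimension: a large
    sigma_d[d] suppresses the influence of feature d, a small one
    strengthens it.
    """
    diff = X1[:, None, :] - X2[None, :, :]              # pairwise differences
    scaled = (diff ** 2) / (np.asarray(sigma_d) ** 2)   # per-dimension scaling
    return kappa_a * np.exp(-scaled.sum(axis=2)) + kappa_b
```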
4. Implementation procedures
This study implements a method for PFFS by considering the case study of
mobile phone design. The detailed procedures are described in the following
paragraphs.
4.1. Determination of product form representation
For the form representation of product design, this study adopted
morphological analysis (Jones, 1992), the most widely used technique in KE studies
due to its simple and intuitive way of defining PFFs. A mixture of continuous and
discrete attributes is also allowed. Practically,
the product is decomposed into several main components and every possible attribute
for each component is examined. For this study, a mobile phone was decomposed into
body, function button, number button and panel. Continuous attributes such as length
and volume were recorded directly. Discrete attributes such as type of body, style of
button, etc. were represented as categorical choices. Twelve form features of the
mobile phone designs, including four continuous attributes and eight discrete
attributes, were used. Continuous attributes were normalized to a zero mean and unit
variance. The discrete attributes, which are in fact nominal variables, were
pre-processed using one-of-n encoding, instead of encoding as integer numbers.
Taking a three-category attribute “red, green, blue” for example, it can be represented
as (0,0,1), (0,1,0), and (1,0,0). A complete list of all PFFs is shown in Table 1. Notice
that the color and texture information of the product samples were ignored and
emphasis was placed on the form features only. The images of product samples used
in the experiments described in Section 4.2 are converted to gray image using
image-processing software.
< Insert Table 1 about here >
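For illustration, the pre-processing described above might look as follows in NumPy; the attribute values here are hypothetical, not the measurements behind Table 1.

```python
import numpy as np

# Continuous attribute (e.g. body length X1): normalize to zero mean, unit variance.
lengths = np.array([105.0, 98.0, 110.0])                  # hypothetical values, in mm
lengths_std = (lengths - lengths.mean()) / lengths.std()

# Discrete attribute (e.g. body type X5): one-of-n encoding, not integer codes.
categories = ["block", "flip", "slide"]
samples = ["flip", "block", "flip"]
one_of_n = np.array([[1.0 if c == s else 0.0 for c in categories] for s in samples])
# one_of_n -> [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
```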
4.2. Experiment and questionnaire design
A total of 69 mobile phones of different designs were collected from the Taiwan
marketplace. Three pairwise adjectives, traditional-modern, rational-emotional and
heavy-handy, were adopted for SD evaluations (Osgood et al., 1957). In order to
collect the CAR data for mobile phone design, 30 subjects (15 of each sex) were
asked to evaluate each product sample using a score from -1 to +1 in intervals of 0.1.
In this manner, consumers can express their affective response toward every product
sample by leaning toward one of the paired adjectives. Take the paired adjectives
"traditional-modern" for example. If a consumer evaluates a certain product sample
with a score of +1, it means this product suggests a strong 'modern' feeling to the
consumer; conversely, a score of -1 means this product projects a strong 'traditional'
feeling. A zero value indicates that the consumer perceived a neutral feeling between
modern and traditional. It should be mentioned that if consumers are asked to
evaluate the whole set of product samples at the same time, the experimental results
will inevitably contain more errors. As a consequence, each consumer was asked to
evaluate only 23 product samples (one-third of the total) randomly picked from all
the samples. The product samples were presented to consumers in questionnaire
format, asking them for the SD scores of the adjectives. In order to collect the
evaluation data more effectively, a user-friendly interface for the questionnaire was
also provided. The images of the product samples are displayed in the interface.
After each session of the experiment, the evaluation data of each subject can be
recorded directly, thus reducing data post-processing time. The presentation order of
the product samples was randomized to avoid any systematic effects. All the subjects'
evaluation scores for each product sample were averaged to reach a final utility score.
These scores were applied as the output values in the LS-SVR model.
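A minimal sketch of this aggregation step, with simulated scores standing in for the real questionnaire data: each subject rates a random third of the 69 samples on the -1 to +1 scale in steps of 0.1, and the final utility score of each sample is the mean over the ratings it actually received.

```python
import numpy as np

rng = np.random.default_rng(0)
scale = np.linspace(-1.0, 1.0, 21)            # -1.0, -0.9, ..., +1.0
ratings = np.full((30, 69), np.nan)           # NaN marks unevaluated samples
for subject in range(30):
    picked = rng.choice(69, size=23, replace=False)   # one-third of the samples
    ratings[subject, picked] = rng.choice(scale, size=23)
utility = np.nanmean(ratings, axis=0)         # averaged CAR score per sample
```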
4.3. Construction of the LS-SVR prediction model
LS-SVR was used to construct the prediction model based on the collected
product samples. The form features of these product samples were treated as input
data while the average utility scores obtained from all the consumers were used as
output values. Since LS-SVR can only deal with one output value at a time, three
prediction models were constructed according to the selected adjectives. Since the
relationship between the input PFFs and the output CAR score is often nonlinear, the
frequently used methods such as MLR (Han et al., 2004) and QT1 (Lai et al., 2006),
which depend on the assumption of linearity, cannot deal with the nonlinear
relationship effectively. SVR has proved to be very powerful when dealing with
nonlinear problems. Compared to the standard SVR algorithm, LS-SVR has higher
computing efficiency and fewer parameters that need to be determined.
The performance of the LS-SVR model is heavily dependent on the
regularization parameter $C$ and the kernel parameters. In order to balance the
trade-off between improving training accuracy and preventing overfitting,
conducting a grid search with cross validation (CV) is the most frequently adopted
strategy in the literature. In this study, the parameters of the LS-SVR model are
tuned using Bayesian inference, which avoids the time-consuming grid search with CV.
4.4. The ARD process for PFFS
By using the ARD technique, the relevance of PFFs can be determined in a ‘soft’
way by introducing the augmented Mahalanobis kernel in Eq. (11). The selection
mechanism of features is embedded in the Bayesian inference. A small value of $\sigma_d$
indicates high relevance of the $d$th input, while a large value indicates that
this input is less relevant than the other inputs. As a consequence, a backward
elimination process similar to that of SVM-RFE can be used to prune the least
relevant feature, i.e. the one with the largest value of $\sigma_d$. This process is repeated
until all features are ranked. The complete feature selection procedure based on LS-SVR and
ARD is described as follows:
(1) Normalize the continuous attributes to a zero mean and unit variance and encode
the discrete attributes using one-of-n encoding;
(2) Optimize the regularization parameter $C$ and the parameter $\sigma$ of the RBF kernel
using Bayesian inference;
(3) Use the optimized value of $\sigma$ obtained in Step (2) as the initial values
$\sigma_1 = \sigma_2 = \cdots = \sigma_n$ of the augmented Mahalanobis kernel, with initial values of
$\kappa_a = 1$ and $\kappa_b = 0$;
(4) Start with an empty ranked feature list $R = [\,]$ and the selected feature list
$F = [1, \ldots, n]$ containing all features;
(5) Repeat until all features are ranked:
(a) Find the feature $e$ with the largest value of $\sigma_d$ in the feature list $F$;
(b) Store the optimized parameters $\sigma_d$ for the remaining features;
(c) Remove the feature $e$ from the feature list $F$;
(d) Insert the feature $e$ into the ranked feature list $R$: $R = [e, R]$;
(e) Re-calculate the parameter $C$ for the training data using the feature list $F$;
(f) Re-calculate the initial value of the parameter $\sigma$ for the training data using
the feature list $F$.
(6) Output: the ranked feature list $R$.
The overall ranking of PFFs can be determined from the resulting feature list $R$.
The relative relevance of each feature in each elimination step can be analyzed by
examining the optimized values of $\sigma_d$. Note that in each step of the selection
procedure, the regularization parameter $C$ and the initial value of the parameter $\sigma$
need to be re-calculated after eliminating the least important feature in order to
obtain optimal prediction performance.
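The elimination loop of Step (5) can be sketched as follows. The helper `optimize_ard` is a hypothetical stand-in for the Bayesian inference of Steps (2), (3), (5e) and (5f): given the training data restricted to the current feature list, it returns the optimized length scales $\sigma_d$ of the augmented Mahalanobis kernel.

```python
import numpy as np

def ard_feature_ranking(X, y, optimize_ard):
    """Backward elimination by ARD length scales (a sketch of Steps (4)-(6)).

    optimize_ard(X_sub, y) -> optimized sigma_d array, one value per
    remaining feature (hypothetical helper for the Bayesian inference).
    """
    F = list(range(X.shape[1]))     # selected feature list, initially all features
    R = []                          # ranked feature list
    while F:
        sigma = optimize_ard(X[:, F], y)   # re-tune on the reduced feature set
        worst = int(np.argmax(sigma))      # largest sigma_d = least relevant
        R.insert(0, F[worst])              # step (5d): R = [e, R]
        del F[worst]                       # step (5c): remove e from F
    return R                        # most relevant feature first
```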
5. Experimental results and analyses
5.1. Effects of optimization algorithms for ARD
Since the Bayesian inference of ARD involves minimizing an objective function,
the choice of the optimization algorithm has a great influence on the obtained results.
In this study, two algorithms, the steepest descent method and the BFGS method, were
investigated. In order to examine the effects of the algorithms, the parameters $\kappa_a = 1$
and $\kappa_b = 0$ were fixed, reducing Eq. (11) to the standard Mahalanobis kernel in Eq. (10).
The pairwise adjectives "traditional-modern" were taken as an example, and the results
of the first ARD step with the original twelve input features are shown in Fig. 2. Notice
that the value of the objective function declines quickly in the first few iterations for
both algorithms. However, the BFGS method takes far fewer (about 1/10 as many)
iterations than the steepest descent method to reach the minimum. In addition, the
optimized $\sigma_d$ values obtained by the BFGS method were much larger (about 100
times) than those of the steepest descent method. These observations indicate that the
BFGS method converges much more dramatically than the steepest descent method.
More specifically, the values of $\sigma_d$ in steepest descent change very steadily after the
100th iteration, whereas the values of $\sigma_d$ in the BFGS method increase rapidly over
only 10 iterations (roughly the 35th to the 45th), and the maximum $\sigma_d$ (least
important feature, X11) was about 250 times larger than the minimum $\sigma_d$ (most
important feature, X3). For steepest descent, the maximum $\sigma_d$ (X11) was only 25
times larger than the minimum $\sigma_d$ (X3).
After examining the previous analysis using a standard Mahalanobis kernel, the
steepest descent method was favored for obtaining the feature ranking due to its
steady convergence properties. However, compared to the BFGS method it may fail
to truly reach the minimum of the objective function due to its slow convergence.
This problem can be easily overcome by introducing the augmented Mahalanobis
kernel with additional parameters in Eq. (11) to accelerate the convergence while still
maintaining a steady variation of $\sigma_d$. Another issue is when to stop the iteration
process in each elimination step. Since most applications of ARD emphasize
improving the generalization performance of the prediction model, the iteration of
ARD is often set to stop when the training error, e.g. the root mean squared error
(RMSE), becomes smaller than a pre-defined threshold. However, in the proposed
method, the initial values of $\sigma_d$ were first optimized using the RBF kernel, and the
initial state of the LS-SVR model had already achieved a relatively low training error.
As a consequence, the training error of the LS-SVR model did not improve
significantly during the ARD iteration process. In this study, another stopping
criterion was therefore used. Since the differences between the form features as
perceived by consumers should be kept to a reasonable proportion, the stopping
criterion of each elimination step is triggered when the ratio of the maximum $\sigma_d$ to
the minimum $\sigma_d$ exceeds a pre-defined threshold. The purpose of this study is to
analyze the relative relevance of the form features and to obtain the feature ranking,
so providing a meaningful comparison between the features is our main concern. In
this manner, when the relevance of one feature becomes small enough, that is, when
its $\sigma_d$ is sufficiently larger than those of the other features, it can be eliminated. In
each elimination step, the feature with the largest value of $\sigma_d$ is pruned off and the
feature ranking is thereby obtained.
< Insert Fig. 2 about here >
5.2. Predictive performance of the LS-SVR model with ARD
In order to examine the effectiveness of Bayesian inference applied on the
LS-SVR model with ARD, the predictive performance is compared with a typical
LS-SVR model (RBF kernel) using a grid search with LOOCV and an MLR model.
The predictive performance was measured using RMSE. A grid search with
LOOCV was conducted over the following sets of values: $C \in \{10^{-3}, 10^{-2.9}, \ldots, 10^{2.9}, 10^{3}\}$
and $\sigma^2 \in \{10^{-3}, 10^{-2.9}, \ldots, 10^{2.9}, 10^{3}\}$. The pair of parameters with the
lowest RMSE value from the grid search was taken as optimal.
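For reference, this grid search could be sketched as below, reusing the `lssvr_fit` and `lssvr_predict` helpers from the sketch in Section 3.1; `X` and `y` stand for the encoded PFFs and the averaged SD scores. With 61 values per parameter, this is exactly the brute-force cost that Bayesian inference avoids.

```python
import numpy as np

def grid_search_loocv(X, y):
    """Grid search over (C, sigma^2) with leave-one-out cross validation.

    A sketch: reuses lssvr_fit/lssvr_predict from the LS-SVR sketch in
    Section 3.1; X (samples x features) and y (SD scores) are training data.
    """
    grid = 10.0 ** np.arange(-3.0, 3.01, 0.1)   # 10^-3, 10^-2.9, ..., 10^3
    best = (np.inf, None, None)
    for C in grid:
        for s2 in grid:
            errs = []
            for i in range(len(y)):              # leave one sample out
                mask = np.arange(len(y)) != i
                alpha, b = lssvr_fit(X[mask], y[mask], C, s2)
                pred = lssvr_predict(X[mask], alpha, b, s2, X[i:i + 1])[0]
                errs.append(pred - y[i])
            rmse = float(np.sqrt(np.mean(np.square(errs))))
            if rmse < best[0]:
                best = (rmse, C, s2)
    return best  # (lowest RMSE, optimal C, optimal sigma^2)
```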
Fig. 3 shows the predictive performance of the prediction models for the
"traditional-modern" pairwise adjective. The blue solid and red dashed lines are the
original and predicted adjective scores of all the training product samples,
respectively. For the result shown in Fig. 3(a), the parameters $C$ and $\sigma$ of the RBF
kernel, obtained by Bayesian inference before applying ARD, were
$(C, \sigma^2) = (2.4156, 911.09)$. The model achieved an RMSE of 0.147. The training
result after applying ARD is shown in Fig. 3(b). The optimal parameter set
$(C, \sigma^2) = (2.4156, 911.09)$ was used as the initial values for ARD. The performance
of the ARD model using the augmented Mahalanobis kernel was more accurate,
improving to an RMSE of 0.036. The improved performance of the ARD model
benefits from the soft feature selection mechanism, which gradually adjusts the
influence of each feature in the Bayesian inference process. As for the result of
LS-SVR with LOOCV shown in Fig. 3(c), the optimal parameter set $(C, \sigma^2)$ was
$(25.119, 251.19)$, producing an RMSE of 0.174. The MLR result shown in Fig. 3(d)
produced an RMSE of 0.197. Among these four constructed models, LS-SVR with
ARD gives the best predictive performance. In this study, the posterior distribution
of the model parameters can be approximated by the most probable value using
Laplace's method. The inverse covariance matrix can be used to place confidence
intervals (error bars) on the most probable parameters. As shown in Figs. 3(a) and
3(b), the black dotted lines denote the upper and lower bounds of the error bars. It
can be observed that the error bars became narrower after applying ARD than before
applying ARD.
< Insert Fig. 3 about here >
5.3. Analysis of the feature ranking
The feature ranking of the three pairwise adjectives obtained from the proposed
ARD selection process is compared with that of an MLR model with backward model
selection (BMS). In each elimination step of BMS for MLR, the feature with the
smallest partial correlation with the output CAR score is considered first for removal.
The resulting feature rankings are shown in Table 2. The tendency of the feature
ranking for ARD and MLR is similar. For the adjective "traditional-modern", the
difference between the ARD ranking and the MLR ranking is small (all absolute
ranking differences are 3 or less). As for the results of the other two adjectives, if the
features whose absolute ranking difference is 5 or more (X2 and X3 for
"rational-emotional"; X3, X7 and X1 for "heavy-handy") are not taken into account,
the feature rankings of ARD and MLR are also quite similar.
The deleted line of feature X9 selected as the last rank tells us that it was not
considered in the final ranking. It is very interesting to find that although the most
important form feature obtained from the three adjectives had the same result: X9, the
value of  d for X9 did not change from the beginning and remained at its initial
value to the end in all the elimination steps. For example, the vertical line of  d for
X9 in the first step indicates that it remained unchanged as shown in Fig. 2(a). This is
due to the attribute of X9 (arrangement of number buttons) where all product samples
had the identical value of X9-1 (square). During the iteration of optimization steps,
the optimization algorithm such as steepest descent or the BFGS method need to
compute the gradients. The identical value of X9 for all training data samples enforces
zero gradients in each step and causes the ARD process to eliminate other feature with
larger value of  d . As a consequence, the relevance of X9 compared with the other
features cannot be processed using the ARD selection process.
The feature ranking obtained using ARD is very helpful to product designers
when determining the importance of form features while designing a mobile phone.
For example, according to the obtained results, the designer should decrease the
thickness (X3) and volume (X4) of the body, since these two features had high
rankings for different adjectives. For the adjective "rational-emotional", the length
(X1) and width (X2) of the body also had very high rankings. As for feature X5
(body type), with its three attributes of block, flip, and slide type, it had a different
ranking for each of the three adjectives: the highest ranking (3rd) for the adjective
"traditional-modern", a medium ranking (7th) for "heavy-handy", and the lowest
ranking (10th) for "rational-emotional". The least important features were largely the
same for all three adjectives, including X6 (type of function button), X10 (detail
treatment of number button) and X11 (position of panel).
< Insert Table 2 about here >
5.4. Relative relevance analysis of features
In each elimination step of ARD, the value of $\sigma_d$ was used to determine the
relative relevance of all form features. A normalized relevance value for each feature
was calculated by dividing the inverse $\sigma_d$ by the maximum inverse $\sigma_d$ in that
step. Hence, the most relevant feature in each step has a value of 1.0 and the
remaining features have relevance values between 0.0 and 1.0. Fig. 4 shows the
distribution of the normalized relevance values using the adjective
"traditional-modern" as an illustrative example. Fig. 4(a) shows the relevance
distribution of Step 1, before removing any feature. The two most relevant features
(X3 and X5) had considerably higher relevance values than the other features. The
less relevant features were eliminated, and the distribution of the relevance values
remained similar during the following steps (Steps 2-4). Fig. 4(b) shows the relevance
distribution of Step 5, after eliminating four features. The relevance values of X2, X7,
X8 and X12 increased slightly in this step compared to Step 1. Notice that the
relevance of feature X4 increased gradually over the first five steps and it became the
most relevant feature in Step 6, as shown in Fig. 4(c). After Step 7, shown in Fig. 4(d),
the relevance values of the remaining features are much smaller than that of the most
relevant feature X4.
< Insert Fig. 4 about here >
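The normalization described above is a one-liner; the sketch below uses hypothetical $\sigma_d$ values.

```python
import numpy as np

sigma_d = np.array([3.2, 0.9, 55.0, 1.4])        # hypothetical optimized length scales
relevance = (1.0 / sigma_d) / (1.0 / sigma_d).max()
# -> [0.28, 1.0, 0.016, 0.64]; the smallest sigma_d gives relevance 1.0
```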
6. Conclusions and discussions
In this study, an approach based on LS-SVR and ARD is proposed to deal with
the PFFS problem. PFFs are examined by morphological analysis. Pairwise adjectives
are used to express CARs. The LS-SVR prediction model is formulated as a function
estimation problem taking the form features as the input data and the averaged CAR
data as the output values. A backward elimination process based on ARD, a
soft-embedded feature selection method, is used to analyze the relative relevance of
the form features and obtain the feature ranking. In each elimination step of ARD, the
influence of each form feature is adjusted gradually by tuning the length scales of an
augmented Mahalanobis kernel with Bayesian inference. Depending on the designated
adjective to be analyzed, the proposed PFFS method provides product designers
with a potential tool for systematically determining the relevance of PFFs.
Although the resulting feature ranking obtained by the proposed method based
on LS-SVR and ARD is similar to that of MLR with BMS, the proposed method
achieves higher predictive performance, benefiting from the soft feature selection
mechanism. However, our case study was based on the design of mobile phones and
used a relatively small number of PFFs. The form features of other products, such as
consumer electronics, furniture, automobiles, etc., may require different
characteristics to be considered. A more comprehensive study of different products is
needed to verify the effectiveness of the proposed method. Also, introducing more
complex product attributes beyond form features, such as the color, texture and
material information of the product samples, is the main research direction being
taken by the authors.
7. Suggestions for future study
In reviews of an earlier version of this manuscript, the reviewers pointed out
several directions worth pursuing in future work. The biggest challenge posed by the
reviewers is probably the usefulness of the proposed method for PFFS, since the CAR
data involve subjective human evaluations. Meanwhile, a more complete empirical
comparison with other feature selection algorithms of the filter, wrapper and
embedded approaches is still needed. Since kernel-based feature selection methods,
such as ARD, are mainly applied in fields with high-dimensional data, such as
bioinformatics, one may argue that there is no need to apply such a sophisticated
method to the problem addressed in this study. Although a relatively small dataset of
mobile phones and only 12 features were used, the authors regard this work as a
fundamental study toward a more comprehensive application for product design.
Several possible future works to extend the limits of this study are described here.
Considerations for the diversity of consumers
When linking the CAR data to PFFs in KE studies, CAR is frequently treated as
data averaged across all subjects. Since the subjective perceptions of consumers are
usually diverse and heterogeneous in nature, averaged consumer data may fail to
represent any individual opinion. A remedy for this problem is to construct an
individual prediction model for each consumer and aggregate the individual output
CARs by a fuzzy integral, e.g. the Choquet integral, which is often used in domains
such as multicriteria decision making (MCDM).
Integration with CAD models for product form representation
The simple and intuitive representation scheme of PFFs obtained from
morphological analysis could be replaced by CAD models. The crux of adopting more
complex CAD models lies in making proper use of the kernel function, such as
building a kernel function based on a similarity measure for CAD models, or
integrating the use of model trees and an edit distance-based kernel to reflect the
model construction process.
Web-based platform for collecting consumers’ subjective evaluation data
The collection of CAR data in KE studies is often accomplished through tedious
SD experiments. A web-based platform providing a natural browsing environment
that mimics an e-commerce website would be capable of automatically gathering
more evaluated responses for large numbers of product samples.
Acknowledgements
The authors would like to thank three anonymous reviewers for their valuable
comments that have led to a substantial improvement in this study. This work was
supported by the National Science Council of the Republic of China under grant
NSC96-2221-E-006-126.
References
Behrens, T. E. J., Berg, H. J., Jbabdi, S., Rushworth, M. F. S., Woolrich, M. W., 2007.
Probabilistic diffusion tractography with multiple fibre orientations: What can
we gain? NeuroImage 34, 144-155.
Blum, A. L., Langley, P., 1997. Selection of relevant features and examples in
machine learning. Artificial Intelligence 97, 245-271.
Breiman, L., Friedman, J., Olshen, R., Stone, C., 1983. CART: Classification and
Regression Trees. Wadsworth, Belmont, California.
Chu, W., Keerthi, S. S., Ong, C. J., Ghahramani, Z., 2006. Bayesian support vector
machines for feature ranking and selection. In: Guyon, I., Gunn, S., Nikravesh,
M., Zadeh, L. (Eds.), Feature Extraction, Foundations and Applications.
Springer Verlag.
Grandvalet, Y., Canu, S., 2003. Adaptive scaling for feature selection in SVMs. In:
Becker, S., Thrun, S., Obermayer, K. (Eds.), Advances in Neural Information
Processing Systems 15. MIT Press, 569-576.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer
classification using support vector machines. Machine Learning 46(1-3),
389-422.
Han, S. H., Hong, S. W., 2003. A systematic approach for coupling user satisfaction
with product design. Ergonomics 46(13/14), 1441-1461.
Han, S. H., Kim, K. J., Yun, M. H., Hong, S. W., Kim, J., 2004. Identifying mobile
phone design features critical to user satisfaction. Human Factors and
Ergonomics in Manufacturing 14(1), 15-29.
Han, S. H., Yang, H., 2004. Screening important design variables for building a
usability model. International Journal of Industrial Ergonomics 33, 159-171.
Herbrich, R., 2002. Learning Kernel Classifier: Theory and Algorithms. MIT Press,
Cambridge.
Hsiao, S. W., Huang, H. C., 2002. A neural network based approach for product form
design. Design Studies 23, 67-84.
Hsu, S. H., Chuang, M. C., Chang, C. C., 2000. A semantic differential study of
designers' and users' product form perception. International Journal of
Industrial Ergonomics 25, 375-391.
Jindo, T., Hirasago, K., Nagamachi, M., 1995. Development of a design support
system for office chairs using 3-D graphics. International Journal of Industrial
Ergonomics 15, 49-62.
Jones, J. C., 1992. Design Methods. Van Nostrand Reinhold, New York.
Lai, H. H., Lin, Y. C., Yeh, C. H., Wei, C. H., 2006. User-oriented design for the
optimal combination on product design. International Journal of Production
Economics 100, 253-267.
Lopez, G., Batlles, F. J., Tovar-Pescador, J., 2005. Selection of input parameters to
model direct solar irradiance by using artificial neural networks. Energy 30,
1675-1684.
MacKay, D., 1994. Bayesian non-linear modelling for the prediction competition.
ASHRAE Transactions 100, 1053-1062.
Miller, G. A., 1956. The magical number seven, plus or minus two: some limits on our
capacity for processing information. Psychological Review 63, 81-97.
Osgood, C. E., Suci, C. J., Tannenbaum, P. H., 1957. The Measurement of Meaning.
University of Illinois Press, Champaign, IL.
Shieh, M. D., Yang, C. C., 2007. Multiclass SVM-RFE for product form feature
selection. Expert Systems with Applications, doi:10.1016/j.eswa.2007.07.043.
Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J., 2002.
Least Squares Support Vector Machines. World Scientific, Singapore.
Tibshirani, R., 1996. Regression shrinkage and selection via the Lasso. Journal of
Royal Statistical Society Series B 58(1), 267-288.
Vapnik, V. N., 1995. The Nature of Statistical Learning Theory. Springer, New York.
Vapnik, V. N., Golowich, S., Smola, A., 1997. Support vector method for function
approximation, regression estimation and signal processing. In: Mozer, M.,
Jordan, M., Petsche, T. (Eds.), Advances in Neural Information Processing
Systems 9. MIT Press, Cambridge, MA, pp. 281-287.
Viaene, S., Dedene, G., Derrig, R. A., 2005. Auto claim fraud detection using
Bayesian learning neural networks. Expert Systems with Applications 29,
653-666.
Wang, D., Lu, W. Z., 2006. Interval estimation of urban ozone level and selection of
influential factors by employing automatic relevance determination model.
Chemosphere 62, 1600-1611.
Fig. 1. The flow chart of the proposed method for PFFS. [Flow chart steps: (1)
Determine the product form representation using morphological analysis; (2) Conduct
the questionnaire investigation to gather consumers' affective responses; (3) Construct
the LS-SVR model using the form features and the representative adjectives; (4)
Analyze the relative relevance of form features and obtain feature ranking using the
ARD selection process.]
Fig. 2. Step 1 of Bayesian inference on length scales (traditional-modern) using (a)
the steepest descent method and (b) the BFGS method with fixed $\kappa_a = 1$ and
$\kappa_b = 0$. [Panels (a) and (b) not shown.]
Fig. 3. Performance for the traditional-modern pairwise adjective of (a) LS-SVR with
Bayesian inference before ARD, (b) LS-SVR with Bayesian inference after ARD, (c)
LS-SVR with LOOCV and (d) MLR. [Panels (a)-(d) not shown.]
Fig. 4. Relevance distribution of (a) Step 1, (b) Step 5, (c) Step 6 and (d) Step 7 for
the traditional-modern pairwise adjective. [Panels (a)-(d) not shown.]
Table 1. Complete list of PFFs of mobile phone design.

Component       | Form feature           | Type       | Attributes
----------------|------------------------|------------|------------------------------------------------------------
Body            | Length (X1)            | Continuous | None
Body            | Width (X2)             | Continuous | None
Body            | Thickness (X3)         | Continuous | None
Body            | Volume (X4)            | Continuous | None
Body            | Type (X5)              | Discrete   | Block body (X5-1), Flip body (X5-2), Slide body (X5-3)
Function button | Type (X6)              | Discrete   | Full-separated (X6-1), Partial-separated (X6-2), Regular-separated (X6-3)
Function button | Style (X7)             | Discrete   | Round (X7-1), Square (X7-2), Bar (X7-3)
Number button   | Shape (X8)             | Discrete   | Circular (X8-1), Regular (X8-2), Asymmetric (X8-3)
Number button   | Arrangement (X9)       | Discrete   | Square (X9-1), Vertical (X9-2), Horizontal (X9-3)
Number button   | Detail treatment (X10) | Discrete   | Regular seam (X10-1), Seamless (X10-2), Vertical seam (X10-3), Horizontal seam (X10-4)
Panel           | Position (X11)         | Discrete   | Middle (X11-1), Upper (X11-2), Lower (X11-3), Full (X11-4)
Panel           | Shape (X12)            | Discrete   | Square (X12-1), Fillet (X12-2), Shield (X12-3), Round (X12-4)
Table 2. Feature ranking of (a) traditional-modern, (b) rational-emotional and (c)
heavy-handy pairwise adjectives using ARD and MLR.

(a) Traditional-modern
Rank | ARD | Dif* | MLR
1    | X4  | +2   | X3
2    | X3  | -1   | X5
3    | X5  | -1   | X4
4    | X8  | +3   | X12
5    | X7  | 0    | X7
6    | X12 | -2   | X1
7    | X2  | +2   | X8
8    | X1  | -2   | X10
9    | X10 | -1   | X2
10   | X6  | +1   | X11
11   | X11 | -1   | X6
12   | X9  | -    | X9

(b) Rational-emotional
Rank | ARD | Dif* | MLR
1    | X2  | +9   | X1
2    | X3  | +7   | X4
3    | X1  | -2   | X11
4    | X4  | -2   | X7
5    | X11 | -2   | X10
6    | X7  | -2   | X12
7    | X8  | +4   | X6
8    | X12 | -2   | X5
9    | X10 | -4   | X3
10   | X5  | -2   | X2
11   | X6  | -4   | X8
12   | X9  | -    | X9

(c) Heavy-handy
Rank | ARD | Dif* | MLR
1    | X3  | +5   | X4
2    | X8  | +3   | X2
3    | X12 | +1   | X1
4    | X4  | -3   | X12
5    | X2  | -3   | X8
6    | X7  | +5   | X3
7    | X5  | 0    | X5
8    | X1  | -5   | X6
9    | X11 | 0    | X11
10   | X6  | -2   | X10
11   | X10 | -1   | X7
12   | X9  | -    | X9

*Dif denotes the ranking difference of the feature using ARD compared to MLR with
BMS (+: advance in rank, -: recede in rank). The features whose absolute ranking
difference is 5 or more (underlined in the original) are X2 and X3 in (b) and X3, X7
and X1 in (c).