Product Form Feature Selection for Mobile Phone Design using LS-SVR and ARD

Chih-Chieh Yang
Department of Multimedia and Entertainment Science, Southern Taiwan University, No. 1, Nantai Street, Yongkang City, Tainan County, Taiwan 71005

Meng-Dar Shieh
Department of Industrial Design, National Cheng Kung University, Tainan, Taiwan 70101

Abstract

In the product design field, it is important to pinpoint the critical product form features (PFFs) that influence consumers' affective responses (CARs) to a product design. Manual inspection of critical form features based on expert opinion (such as that of product designers) has often failed to win consumer acceptance. In this paper, an approach based on least squares support vector regression (LS-SVR) and automatic relevance determination (ARD) is proposed to streamline the task of product form feature selection (PFFS) according to the CAR data. The representation of PFFs is determined by morphological analysis, and pairwise adjectives are used to express CARs. To gather the CAR data, a semantic differential (SD) evaluation experiment was conducted on the collected product samples. The LS-SVR prediction model is constructed using the PFFs as input data and the evaluated SD scores as output values. The optimal parameters of the LS-SVR model are tuned using Bayesian inference. Finally, an ARD selection process is used to analyze the relative relevance of the PFFs and obtain a feature ranking. A case study of mobile phone design is given to demonstrate the proposed method. To examine the effectiveness of the proposed method, its predictive performance is compared with that of a typical LS-SVR model tuned by a grid search with leave-one-out cross validation (LOOCV) and with that of a multiple linear regression (MLR) model. The proposed LS-SVR and ARD model outperforms the other two models, a benefit of its soft feature selection mechanism. Furthermore, the resulting feature ranking is compared with that of MLR with backward model selection (BMS); the two methods exhibit similar feature rankings on the CAR data of the three adjectives. Since only one small dataset of mobile phone designs is investigated, a more comprehensive investigation based on different kinds of products will be needed to verify the proposed method in our future study.

Keywords: Feature selection; Least squares support vector regression; Automatic relevance determination; Bayesian inference

1. Introduction

The way a product looks is one of the most important factors affecting a consumer's purchasing decision. Traditionally, the success of a product's design depended on the designers' artistic sensibilities, which quite often did not meet with great acceptance in the marketplace. Many systematic product design studies have been carried out to gain better insight into consumers' subjective perceptions. The most notable research is Kansei engineering (KE) (Jindo et al., 1995). The basic assumption of KE studies is that there exists a cause-and-effect relationship between product form features (PFFs) and consumers' affective responses (CARs) (Han and Hong, 2003). A prediction model can therefore be constructed from collected product samples and the CAR data. Using the prediction model, the relationship between PFFs and CARs can be analyzed, and a product form specially designed for a specific target consumer group can be produced more objectively and efficiently.
The construction of the prediction model can be regarded as a function estimation (regression) problem, taking the PFFs of the collected product samples as input data and the CAR data as output values. Various methods can be used to construct the prediction model, including multiple linear regression (MLR) (Han et al., 2004), quantification theory type I (QT1) (Lai et al., 2006), partial least squares regression (PLSR), neural networks (NN) (Hsiao and Huang, 2002), and support vector regression (SVR). Of these methods, SVR's remarkable performance makes it the first choice in a number of real-world applications (Vapnik et al., 1997), but to our knowledge it has not yet been applied to the product design field.

Although it is an important issue in KE (Han and Hong, 2003), the problem of product form feature selection (PFFS) according to consumers' perceptions has not been intensively investigated. The subjective perceptions of consumers are often influenced by a wide variety of form features. The number of form features can be large, and the features may be highly correlated with each other. Manually inspecting the relative importance of PFFs and finding the critical features that please consumers is therefore a difficult task. In the product design field, critical design features are often identified based on the opinions of experts or focus groups. However, feature selection based on expert opinion often lacks objectivity. Only a few attempts have been made to overcome these shortcomings in the PFFS process. For example, Han and Hong (2003) used several traditional statistical methods for screening critical design features, including principal component regression (PCR), cluster analysis, and partial least squares (PLS). In the study of Han and Yang (2004), a genetic algorithm-based PLS method was applied to screen design variables.

According to Blum and Langley (1997), feature selection can be divided into three major categories (the filter, wrapper, and embedded methods) based on the relationship between the selection scheme and the learning algorithm used in the model construction. The embedded method directly incorporates the feature selection process within the learning algorithm. Examples include CART (Breiman et al., 1983), LASSO (Tibshirani, 1996), support vector machine recursive feature elimination (SVM-RFE) (Guyon et al., 2002), and automatic relevance determination (ARD) (MacKay, 1994). The optimal feature subset can be determined simultaneously during the training process. The embedded method usually involves evaluating changes in an objective function of the applied learning algorithm. The feature subset can be selected by optimizing an objective function that jointly rewards predictive performance and penalizes the selection of too many features. Compared to the wrapper method, the embedded method can reach a solution faster because it avoids retraining the learning algorithm from scratch for every possible feature subset.

Depending on whether features are eliminated or not, the feature selection process can also be divided into 'hard' feature selection and 'soft' feature selection (Viaene et al., 2005). Most existing methods belong to the hard feature selection category, which aims to prune irrelevant inputs by eliminating features with relatively low importance. The purpose of hard feature selection is often to deal with the well-known 'curse of dimensionality' problem.
When there are a great number of features and not enough training data, the irrelevant inputs may reduce the performance of the prediction model and therefore need to be eliminated. In contrast, rather than seeking to eliminate irrelevant features, soft feature selection keeps all the original inputs and adjusts the influence of each feature on the learning algorithm (Grandvalet and Canu, 2003). The underlying concept of the soft feature selection mechanism is that every feature should have some degree of influence; situations where all features are either highly relevant or not relevant at all are uncommon. This description is especially pertinent in the case of product form design. Since the perceived feelings toward a product form are based on one's observation of the product's appearance, every PFF should have some degree of influence on the consumer's affective response. However, studies of the soft selection method in the literature are relatively sparse.

Of the above-mentioned methods, the ARD method has been successfully applied in a number of real-world applications (Lopez et al., 2005; Wang and Lu, 2006; Behrens et al., 2007). Although most applications of ARD are proposed in combination with Bayesian neural networks (MacKay, 1994), ARD can also be integrated into other learning algorithms, such as Bayesian SVMs (Chu et al., 2006) or other Bayesian-based models (Behrens et al., 2007). In our previous study (Shieh and Yang, 2007), a PFFS method based on SVM-RFE and a multiclass classification model of product form design was proposed. Unlike a classification-based feature selection method such as SVM-RFE, which relies on consumers' judgments to discriminate product form designs from one another, a regression-based feature selection method, such as the ARD used in this study, treats the PFFS problem in a different manner.

This study proposes an approach based on least squares support vector regression (LS-SVR), a variant of the SVR algorithm, together with ARD to deal with the PFFS problem. A backward elimination process similar to that of SVM-RFE is used to gradually screen out less important PFFs based on the soft feature selection mechanism of ARD. The PFFs are used as input data and the CAR scores gathered from the questionnaire as output values to construct the LS-SVR prediction model. The optimal values of the parameters of the LS-SVR model are determined by Bayesian inference. ARD, as a soft embedded feature selection method, is then used to determine the relevance of the PFFs based on the constructed LS-SVR model.

The remainder of the paper is organized as follows: Section 2 presents an outline of the proposed PFFS method. Section 3 introduces the theoretical background of the LS-SVR algorithm and the ARD method for feature selection. A detailed implementation procedure of the proposed method is given in Section 4. Section 5 demonstrates the results of the proposed method using mobile phone designs as an example. Section 6 presents conclusions and discussion. Finally, Section 7 describes several possible future studies to address the limitations of this work.

2. Outline of the proposed method for product form feature selection

The flow chart of the proposed method for PFFS is presented in Fig. 1. The procedure comprises the following steps:

(1) Determine the product form representation using morphological analysis.
(2) Conduct a questionnaire investigation with semantic differential (SD) evaluation to gather the CAR data on the product samples.
(3) Construct the LS-SVR prediction model based on the PFFs and the pairwise adjectives.
(4) Analyze the relative relevance of the PFFs and obtain a feature ranking using the ARD selection process.

< Insert Fig. 1 about here >

3. Theoretical background

3.1. Least squares support vector regression

This section briefly introduces the LS-SVR algorithm. A training data set D of l data points is given,

D = {(x_1, y_1), ..., (x_l, y_l)}    (1)

where x_i ∈ R^n is the input data and y_i ∈ R is the desired output value. LS-SVR approximates an unknown function in the primal weight space using the following equation:

f(x) = w^T φ(x) + b    (2)

where w ∈ R^k is the weight vector and φ(·): R^n → R^k is a nonlinear function that maps the input space into a higher-dimensional feature space. By introducing a quadratic loss function, LS-SVR formulates the following optimization problem in the weight space:

minimize (1/2)||w||^2 + (C/2) Σ_{i=1}^{l} e_i^2    (3)

subject to the equality constraints

y_i = w^T φ(x_i) + b + e_i,  i = 1, ..., l    (4)

where C > 0 is the regularization parameter, b is a bias term, and e_i is the difference between the desired output and the actual output. When the dimension of w becomes infinite, the primal problem in Eq. (3) cannot be solved directly; the dual problem is derived using the method of Lagrange multipliers. The mapping φ is usually nonlinear and unknown. Instead of calculating φ explicitly, a kernel function K is used to compute the inner product of two vectors in the feature space, thereby implicitly defining the mapping function:

K(x_i, x_j) = φ(x_i)^T φ(x_j)    (5)

The following are three commonly used kernel functions (Vapnik, 1995):

linear: K(x_i, x_j) = x_i^T x_j    (6)
polynomial: K(x_i, x_j) = (1 + x_i^T x_j)^p    (7)
radial basis function (RBF): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))    (8)

where the indices i and j correspond to different input vectors, and p in Eq. (7) and σ in Eq. (8) are adjustable kernel parameters. In the LS-SVR algorithm, the kernel parameter and the regularization parameter C are the only two parameters that need to be tuned, fewer than in the standard SVR algorithm. In the case of the linear kernel, C is the only parameter that needs to be tuned. The resulting LS-SVR prediction function is

f(x) = Σ_{i=1}^{l} α_i K(x, x_i) + b.    (9)

For detailed derivations of the LS-SVR algorithm, the reader is referred to the textbook of Suykens et al. (2002).

3.2. Feature selection based on automatic relevance determination

In this study, ARD is used to perform soft feature selection, which is equivalent to determining the relevant dimensions of the n-dimensional input space using Bayesian inference. The simplest form of ARD can be carried out by introducing the standard Mahalanobis kernel (Herbrich, 2002) as follows:

K(x_i, x_j) = exp(-Σ_{d=1}^{n} (x_i^d - x_j^d)^2 / (2σ_d^2))    (10)

where x_i^d denotes the dth element of the input vector x_i, d = 1, ..., n. The Mahalanobis kernel can be regarded as a generalization of the RBF kernel in Eq. (8) that has a separate length scale parameter σ_d for each input dimension. The influence of the dth feature in the input space can be adjusted by setting larger or smaller values of σ_d. The Bayesian inference of ARD is then implemented by minimizing an objective function, and the relevance of each input can be determined automatically according to the optimized σ_d values.
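To make the training step concrete, the following is a minimal sketch (in Python, not the authors' implementation) of fitting an LS-SVR model with the RBF kernel of Eq. (8); the linear system below is the standard LS-SVR dual that follows from the Lagrangian of Eqs. (3)-(4), and all function names are illustrative:

    import numpy as np

    def rbf_kernel(X1, X2, sigma2):
        """RBF kernel of Eq. (8): exp(-||x_i - x_j||^2 / (2 * sigma2))."""
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma2))

    def lssvr_fit(X, y, C, sigma2):
        """Solve the LS-SVR dual for (alpha, b) via one linear system:
            [ 0   1^T       ] [ b     ]   [ 0 ]
            [ 1   K + I / C ] [ alpha ] = [ y ]
        """
        l = len(y)
        A = np.zeros((l + 1, l + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(l) / C
        solution = np.linalg.solve(A, np.concatenate(([0.0], y)))
        return solution[1:], solution[0]    # alpha, b

    def lssvr_predict(X_new, X, alpha, b, sigma2):
        """Prediction function of Eq. (9): f(x) = sum_i alpha_i K(x, x_i) + b."""
        return rbf_kernel(X_new, X, sigma2) @ alpha + b

Swapping rbf_kernel for the Mahalanobis kernel of Eq. (10), with a separate sigma_d per input dimension, is the only change needed once ARD is introduced.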
However, the objective function often converges very slowly and results in very large optimized values of σ_d when the standard Mahalanobis kernel is used. As a consequence, an augmented Mahalanobis kernel with two additional parameters is adopted (Chu et al., 2006):

K(x_i, x_j) = a exp(-Σ_{d=1}^{n} (x_i^d - x_j^d)^2 / (2σ_d^2)) + b    (11)

where a > 0 denotes the amplitude parameter and b ≥ 0 denotes the offset parameter. The parameters {a, σ_1, ..., σ_n, b} can be determined simultaneously during the optimization process. The ARD process for PFFS, combined with Bayesian inference for parameter tuning, is described in Section 4.4. For detailed derivations of Bayesian inference for ARD, the reader is referred to the textbook of Suykens et al. (2002).

4. Implementation procedures

This study implements the method for PFFS through a case study of mobile phone design. The detailed procedures are described in the following subsections.

4.1. Determination of product form representation

For the form representation of product designs, this study adopted morphological analysis (Jones, 1992), the most widely used technique in KE studies owing to its simple and intuitive way of defining PFFs. A mixture of continuous and discrete attributes is also allowed. In practice, the product is decomposed into several main components, and every possible attribute of each component is examined. For this study, a mobile phone was decomposed into the body, function buttons, number buttons, and panel. Continuous attributes such as length and volume were recorded directly. Discrete attributes such as the type of body and the style of the buttons were represented as categorical choices. Twelve form features of the mobile phone designs, comprising four continuous attributes and eight discrete attributes, were used. The continuous attributes were normalized to zero mean and unit variance. The discrete attributes, which are in fact nominal variables, were pre-processed using one-of-n encoding instead of being encoded as integers. Taking a three-category attribute "red, green, blue" as an example, the categories can be represented as (0,0,1), (0,1,0), and (1,0,0). A complete list of all the PFFs is shown in Table 1. Note that the color and texture information of the product samples was ignored; emphasis was placed on the form features only. The images of the product samples used in the experiments described in Section 4.2 were converted to grayscale using image-processing software.

< Insert Table 1 about here >

4.2. Experiment and questionnaire design

A total of 69 mobile phones of different designs were collected from the Taiwan marketplace. Three pairwise adjectives, traditional-modern, rational-emotional, and heavy-handy, were adopted for the SD evaluations (Osgood et al., 1957). In order to collect the CAR data for mobile phone design, 30 subjects (15 of each sex) were asked to evaluate each product sample using a score from -1 to +1 in intervals of 0.1. In this manner, consumers could express their affective response toward every product sample by leaning toward one of the pairwise adjectives. Take the paired adjectives "traditional-modern" for example: if a consumer evaluates a certain product sample with a score of +1, the product suggests a strong 'modern' feeling to the consumer; conversely, a score of -1 means the product projects a strong 'traditional' feeling. A zero value indicates that the consumer perceived a neutral feeling between modern and traditional.
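Before these SD scores are paired with the form features, the inputs themselves are preprocessed as described in Section 4.1. A minimal sketch with hypothetical attribute values (not the measured data of the 69 samples):

    import numpy as np

    # Hypothetical raw attributes for three product samples (Section 4.1).
    lengths = np.array([105.0, 88.0, 96.0])    # X1: body length in mm (illustrative)
    body_types = ["block", "flip", "slide"]    # X5: body type, one sample of each

    # Continuous attributes: normalize to zero mean and unit variance.
    z_lengths = (lengths - lengths.mean()) / lengths.std()   # about [1.25, -1.20, -0.05]

    # Discrete attributes: one-of-n encoding instead of integer codes,
    # e.g. block -> (1,0,0), flip -> (0,1,0), slide -> (0,0,1).
    categories = ["block", "flip", "slide"]
    one_hot = np.array([[1.0 if c == v else 0.0 for c in categories]
                        for v in body_types])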
It should be mentioned that if consumers were asked to evaluate the whole set of product samples in one session, the experimental results would inevitably contain more errors. As a consequence, each consumer was asked to evaluate only 23 product samples (one-third of the total), randomly picked from all the samples. The product samples were presented to the consumers in questionnaire format, asking for SD scores on the pairwise adjectives. In order to collect the evaluation data more effectively, a user-friendly interface for the questionnaire was provided, displaying the images of the product samples. After each session of the experiment, the evaluation data of each subject were recorded directly, which reduced the data post-processing time. The presentation order of the product samples was randomized to avoid any systematic effects. All the subjects' evaluation scores for each product sample were averaged to give a final utility score. These scores were used as the output values of the LS-SVR model.

4.3. Construction of the LS-SVR prediction model

LS-SVR was used to construct the prediction model from the collected product samples. The form features of the product samples were treated as input data, while the average utility scores obtained from all the consumers were used as output values. Since LS-SVR can only deal with one output value at a time, three prediction models were constructed, one for each of the selected adjective pairs. Because the relationship between the input PFFs and the output CAR score is often nonlinear, frequently used methods such as MLR (Han et al., 2004) and QT1 (Lai et al., 2006), which depend on the assumption of linearity, cannot deal with the nonlinear relationship effectively. SVR has proved to be very powerful for nonlinear problems. Compared to the standard SVR algorithm, LS-SVR has higher computational efficiency and fewer parameters to determine. The performance of the LS-SVR model depends heavily on the regularization parameter C and the kernel parameters. To balance the trade-off between training accuracy and the risk of overfitting, a grid search with cross validation (CV) is the most frequently adopted strategy in the literature. In this study, the parameters of the LS-SVR model are tuned by Bayesian inference, which avoids the time-consuming grid search with CV.

4.4. The ARD process for PFFS

Using the ARD technique, the relevance of the PFFs can be determined in a 'soft' way by introducing the augmented Mahalanobis kernel of Eq. (11). The feature selection mechanism is embedded in the Bayesian inference. A small value of σ_d indicates that the dth input is highly relevant, while a large value indicates that it is less relevant than the other inputs. As a consequence, a backward elimination process similar to that of SVM-RFE can be used to prune the least relevant feature, namely the one with the largest value of σ_d. This process is repeated until all features are removed.
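As a preview of the step-by-step procedure listed next, this backward loop can be organized as in the following sketch, where optimize_ard is a hypothetical stand-in for the Bayesian-inference step that re-tunes the kernel on the surviving features and returns their optimized length scales:

    def ard_feature_ranking(X, y, optimize_ard):
        """Backward elimination driven by the ARD length scales.

        optimize_ard(X_sub, y) is a placeholder: it runs the Bayesian
        inference on the surviving features and returns one optimized
        sigma_d per feature.
        """
        remaining = list(range(X.shape[1]))    # selected feature list F
        ranked = []                            # ranked feature list R
        while remaining:
            sigma = optimize_ard(X[:, remaining], y)
            # The largest sigma_d marks the least relevant surviving feature.
            worst = max(range(len(remaining)), key=lambda d: sigma[d])
            ranked.insert(0, remaining.pop(worst))    # R := [e, R]
        return ranked    # most relevant feature first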
The complete feature selection procedure based on LS-SVR and ARD is as follows:

(1) Normalize the continuous attributes to zero mean and unit variance, and encode the discrete attributes using one-of-n encoding;
(2) Optimize the regularization parameter C and the parameter σ of the RBF kernel using Bayesian inference;
(3) Use the optimized value of σ obtained in Step (2) as the initial values σ_1 = σ_2 = ... = σ_n of the augmented Mahalanobis kernel, with initial values a = 1 and b = 0;
(4) Start with an empty ranked feature list R = [] and a selected feature list F = [1, ..., n] containing all features;
(5) Repeat until all features are ranked:
(a) Find the feature e with the largest value of σ_d in the feature list F;
(b) Store the optimized parameters σ_d of the remaining features;
(c) Remove the feature e from the feature list F;
(d) Insert the feature e into the feature list R: R := [e, R];
(e) Re-calculate the parameter C for the training data restricted to the feature list F;
(f) Re-calculate the initial value of the parameter σ for the training data restricted to the feature list F.
(6) Output: the ranked feature list R.

The overall ranking of the PFFs is given by the resulting feature list R. The relative relevance of each feature at each elimination step can be analyzed by examining the optimized values of σ_d. Note that in each step of the selection procedure, the regularization parameter C and the initial value of the parameter σ are re-calculated after eliminating the least important feature, in order to maintain optimal prediction performance.

5. Experimental results and analyses

5.1. Effects of optimization algorithms for ARD

Since the Bayesian inference of ARD involves minimizing an objective function, the choice of optimization algorithm has a great influence on the obtained results. In this study, two algorithms were investigated: the steepest descent method and the BFGS method. To examine the effects of the algorithms, the parameters were fixed at a = 1 and b = 0, reducing Eq. (11) to the standard Mahalanobis kernel of Eq. (10). The pairwise adjectives "traditional-modern" were taken as an example, and the results of the first ARD step with the original twelve input features are shown in Fig. 2. Notice that the value of the objective function declines quickly in the first few iterations under both algorithms. However, the BFGS method takes far fewer iterations (about one-tenth as many) than steepest descent to reach the minimum. In addition, the optimized σ_d values obtained by the BFGS method were much larger (about 100 times) than those of steepest descent. These observations indicate that the BFGS method converges much more abruptly than steepest descent. More specifically, the values of σ_d under steepest descent change very little after the 100th iteration, whereas under the BFGS method they increase rapidly within only about 10 iterations (roughly the 35th to the 45th), and the maximum σ_d (least important feature, X11) was about 250 times larger than the minimum σ_d (most important feature, X3). Under steepest descent, the maximum σ_d (X11) was only 25 times larger than the minimum σ_d (X3).

Based on this analysis using the standard Mahalanobis kernel, the steepest descent method was favored for obtaining the feature ranking because of its steady convergence behavior. However, owing to its slow convergence, it may fail to actually reach the minimum of the objective function that the BFGS method attains. This problem can easily be overcome by introducing the augmented Mahalanobis kernel with the additional parameters of Eq. (11), which accelerates the convergence while still maintaining a steady variation of σ_d.
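The contrast between the two optimizers can be reproduced on any smooth objective; the sketch below is a generic illustration, where objective and grad are hypothetical stand-ins for the ARD objective function (e.g. a negative log evidence) and its gradient, not the authors' implementation:

    import numpy as np
    from scipy.optimize import minimize

    def compare_optimizers(objective, grad, theta0, lr=1e-3, n_steps=2000):
        """Minimize the same objective with steepest descent and with BFGS."""
        # Steepest descent: fixed-step gradient updates, slow but steady.
        theta = np.asarray(theta0, dtype=float).copy()
        for _ in range(n_steps):
            theta -= lr * grad(theta)

        # BFGS: quasi-Newton updates, typically far fewer iterations.
        result = minimize(objective, theta0, jac=grad, method="BFGS")
        return theta, result.x    # the two solutions found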
Another issue is when to stop the iteration process in each elimination step. Since most applications of ARD emphasize improving the generalization performance of the prediction model, the ARD iteration is usually stopped when the training error, e.g. the root mean squared error (RMSE), falls below a pre-defined threshold. In the proposed method, however, the initial values of σ_d are first optimized using the RBF kernel, so the initial state of the LS-SVR model has already achieved a relatively low training error; consequently, the training error did not improve significantly during the ARD iterations. In this study, a different stopping criterion was therefore used. Because the differences between the form features as perceived by consumers should be kept within a reasonable proportion, each elimination step is stopped when the ratio of the maximum σ_d to the minimum σ_d exceeds a pre-defined threshold. The purpose of this study is to analyze the relative relevance of the form features and to obtain the feature ranking, so providing a meaningful comparison between the features is our main concern. In this manner, when the relevance of a feature is small enough, that is, when its σ_d is larger than those of the other features, it can be eliminated. In each elimination step, the feature with the largest value of σ_d is pruned off, and the feature ranking is thereby obtained.

< Insert Fig. 2 about here >

5.2. Predictive performance of the LS-SVR model with ARD

To examine the effectiveness of Bayesian inference applied to the LS-SVR model with ARD, its predictive performance is compared with that of a typical LS-SVR model (RBF kernel) tuned by a grid search with LOOCV and with that of an MLR model. Predictive performance was measured by RMSE. The grid search with LOOCV used the following sets of values: C ∈ {10^-3, 10^-2.9, ..., 10^2.9, 10^3} and σ^2 ∈ {10^-3, 10^-2.9, ..., 10^2.9, 10^3}. The pair of parameters with the lowest RMSE in the grid search was taken as optimal.

Fig. 3 shows the predictive performance of the prediction models for the "traditional-modern" pairwise adjective. The blue solid and red dashed lines are, respectively, the original and predicted adjective scores of all the training product samples. For the result shown in Fig. 3(a), the parameters obtained by Bayesian inference before applying ARD were (C, σ^2) = (2.4156, 911.09), and the model achieved an RMSE of 0.147. The training result after applying ARD is shown in Fig. 3(b); the parameter set (C, σ^2) = (2.4156, 911.09) was used as the initial values for ARD. The ARD model with the augmented Mahalanobis kernel was more accurate, improving the RMSE to 0.036. This improvement comes from the soft feature selection mechanism, which gradually adjusts the influence of each feature during the Bayesian inference process. For LS-SVR with LOOCV, shown in Fig. 3(c), the optimal parameter set was (C, σ^2) = (25.119, 251.19), giving an RMSE of 0.174. The MLR model, shown in Fig. 3(d), gave an RMSE of 0.197. Among the four constructed models, LS-SVR with ARD gives the best predictive performance.
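For reference, the baseline grid search can be sketched as below, reusing the hypothetical lssvr_fit and lssvr_predict helpers from the Section 3.1 sketch; both parameter grids follow the ranges quoted above:

    import numpy as np

    def grid_search_loocv(X, y):
        """Grid search over (C, sigma^2) scored by leave-one-out RMSE."""
        grid = 10.0 ** np.arange(-3.0, 3.0 + 1e-9, 0.1)    # 10^-3 ... 10^3
        best_rmse, best_C, best_s2 = np.inf, None, None
        l = len(y)
        for C in grid:
            for sigma2 in grid:
                sq_errors = []
                for i in range(l):                 # leave one sample out
                    mask = np.arange(l) != i
                    alpha, b = lssvr_fit(X[mask], y[mask], C, sigma2)
                    pred = lssvr_predict(X[i:i + 1], X[mask], alpha, b, sigma2)
                    sq_errors.append((pred[0] - y[i]) ** 2)
                rmse = float(np.sqrt(np.mean(sq_errors)))
                if rmse < best_rmse:
                    best_rmse, best_C, best_s2 = rmse, C, sigma2
        return best_C, best_s2, best_rmse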
In this study, the posterior distribution of the model parameters can be approximated around the most probable values using Laplace's method. The inverse covariance matrix can then be used to place confidence intervals (error bars) on the most probable parameters. As shown in Figs. 3(a) and 3(b), the black dotted lines denote the upper and lower bounds of the error bars. It can be observed that the error bars became noticeably narrower after applying ARD.

< Insert Fig. 3 about here >

5.3. Analysis of the feature ranking

The feature rankings of the three pairwise adjectives obtained from the proposed ARD selection process were compared with those of an MLR model with backward model selection (BMS). In each elimination step of BMS for MLR, the feature with the smallest partial correlation with the output CAR score is considered first for removal. The resulting feature rankings are shown in Table 2. The overall tendencies of the ARD and MLR rankings are similar. For the adjective "traditional-modern", the difference between the ARD and MLR rankings is small (all absolute ranking differences are 3 or less). As for the other two adjectives, if the features whose absolute ranking difference is 5 or more (X2 and X3 for "rational-emotional"; X3, X7, and X1 for "heavy-handy") are set aside, the ARD and MLR rankings are also quite similar. The struck-through entry for feature X9 at the last rank indicates that it was not considered in the final ranking.

It is very interesting that although the procedure nominally returned the same most important form feature, X9, for all three adjectives, the value of σ_d for X9 never moved from its initial value through all the elimination steps; the line of σ_d for X9 in Fig. 2(a), for example, shows that it remained unchanged during the first step. This is due to the attribute values of X9 (arrangement of the number buttons): all product samples had the identical value X9-1 (square). During the optimization steps, algorithms such as steepest descent and BFGS must compute gradients, and the identical value of X9 across all training samples forces its gradient to zero at every step, causing the ARD process to eliminate other features whose σ_d values grow larger. As a consequence, the relevance of X9 relative to the other features cannot be assessed by the ARD selection process, and it was therefore struck from the final ranking.

The feature ranking obtained using ARD is very helpful to product designers when they are weighing the importance of form features while designing a mobile phone. For example, according to the obtained results, the designer should decrease the thickness (X3) and volume (X4) of the body, since these two features ranked highly for different adjectives. For the adjective "rational-emotional", the length (X1) and width (X2) of the body also ranked very highly. Feature X5 (body type), with its three attributes of block, flip, and slide, ranked differently for each of the three adjectives: highest (3rd) for "traditional-modern", medium (7th) for "heavy-handy", and lowest (10th) for "rational-emotional". The least important features were largely the same across the three adjectives, including X6 (type of function button), X10 (detail treatment of the number buttons), and X11 (position of the panel).

< Insert Table 2 about here >
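The X9 observation above suggests a simple safeguard that is not part of the paper's procedure but follows directly from it: screening out zero-variance columns before running ARD. A minimal sketch:

    import numpy as np

    def screen_constant_features(X, names):
        """Flag features that take an identical value for every sample.

        A constant column (such as X9-1 here, which is 1 for all 69
        samples) yields zero gradients for its length scale, so ARD can
        never update its sigma_d and its relevance cannot be assessed.
        """
        keep, dropped = [], []
        for d in range(X.shape[1]):
            target = dropped if np.all(X[:, d] == X[0, d]) else keep
            target.append(names[d])
        return keep, dropped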
5.4. Relative relevance analysis of features

In each elimination step of ARD, the values of σ_d were used to determine the relative relevance of all the form features. A normalized relevance value for each feature was calculated by dividing the inverse of its σ_d by the maximum inverse σ_d in that step. Hence, the most relevant feature in each step has a relevance value of 1.0, and the remaining features have relevance values between 0.0 and 1.0. Fig. 4 shows the distribution of the normalized relevance values, using the adjective "traditional-modern" as an illustrative example. Fig. 4(a) shows the relevance distribution at Step 1, before any feature has been removed. The two most relevant features (X3 and X5) had markedly higher relevance values than the other features. The less relevant features were eliminated, and the distribution of the relevance values remained similar during the following steps (Steps 2-4). Fig. 4(b) shows the relevance distribution at Step 5, after four features had been eliminated; the relevance values of X2, X7, X8, and X12 increased slightly compared to Step 1. Notice that the relevance of feature X4 increased gradually over the first five steps, and X4 became the most relevant feature at Step 6, as shown in Fig. 4(c). After Step 7, shown in Fig. 4(d), the relevance values of the remaining features were much smaller than that of the most relevant feature, X4.

< Insert Fig. 4 about here >

6. Conclusions and discussion

In this study, an approach based on LS-SVR and ARD was proposed to deal with the PFFS problem. The PFFs are defined by morphological analysis, and pairwise adjectives are used to express the CARs. The LS-SVR prediction model is formulated as a function estimation problem, taking the form features as input data and the averaged CAR data as output values. A backward elimination process based on ARD, a soft embedded feature selection method, is used to analyze the relative relevance of the form features and obtain the feature ranking. In each elimination step of ARD, the influence of each form feature is gradually adjusted by tuning the length scales of an augmented Mahalanobis kernel with Bayesian inference. Depending on the adjective designated for analysis, the proposed PFFS method provides product designers with a potential tool for systematically determining the relevance of PFFs. Although the feature ranking obtained by the proposed method is similar to that of MLR with BMS, the proposed method achieves higher predictive performance, a benefit of the soft feature selection mechanism. However, our case study was based on mobile phone designs and used a relatively small number of PFFs. The form features of other products, such as consumer electronics, furniture, and automobiles, may involve different characteristics, so a more comprehensive study of different products is needed to verify the effectiveness of the proposed method. Introducing more complex product attributes beyond form features, such as the color, texture, and material information of the product samples, is also a main research direction being pursued by the authors.

7. Suggestions for future study

During the review of an earlier version of this manuscript, the reviewers offered several suggestions worth pursuing as future work. The biggest challenge they posed is probably the usefulness of the proposed method for PFFS, given that the CAR data involve subjective human evaluations.
Meanwhile, a more complete empirical comparison with other feature selection algorithms of the filter, wrapper, and embedded types is still needed. Since kernel-based feature selection methods such as ARD are mainly applied in fields with high-dimensional data, such as bioinformatics, one may argue that there is no need to apply such a sophisticated method to the problem addressed in this study. Although only a relatively small dataset of mobile phones and only 12 features were used, the authors regard this work as a foundation for developing a more comprehensive application for product design. Several possible extensions of this study are described below.

Considerations for the diversity of consumers

When linking the CAR data to PFFs in KE studies, the CAR data are frequently averaged across all subjects. Since the subjective perceptions of consumers are usually diverse and heterogeneous in nature, averaged consumer data may fail to represent any individual opinion. A remedy for this problem is to construct an individual prediction model for each consumer and to aggregate the individual CAR outputs with a fuzzy integral, e.g. the Choquet integral, which is often used in domains such as multicriteria decision making (MCDM).

Integration with CAD models for product form representation

The simple and intuitive representation scheme of PFFs obtained from morphological analysis could be replaced by CAD models. The crux of adopting more complex CAD models lies in making proper use of the kernel function, for example by building a kernel based on a similarity measure for CAD models, or by combining a model tree with an edit-distance-based kernel to reflect the model construction process.

Web-based platform for collecting consumers' subjective evaluation data

The collection of CAR data in KE studies is often accomplished through tedious SD experiments. A web-based platform providing a natural browsing environment that mimics an e-commerce website would be capable of automatically gathering far more evaluated responses for large numbers of product samples.

Acknowledgements

The authors would like to thank the three anonymous reviewers for their valuable comments, which have led to a substantial improvement of this study. This work was supported by the National Science Council of the Republic of China under grant NSC96-2221-E-006-126.

References

Behrens, T. E. J., Berg, H. J., Jbabdi, S., Rushworth, M. F. S., Woolrich, M. W., 2007. Probabilistic diffusion tractography with multiple fibre orientations: What can we gain? NeuroImage 34, 144-155.
Blum, A. L., Langley, P., 1997. Selection of relevant features and examples in machine learning. Artificial Intelligence 97, 245-271.
Breiman, L., Friedman, J., Olshen, R., Stone, C., 1983. CART: Classification and Regression Trees. Wadsworth, Belmont, California.
Chu, W., Keerthi, S. S., Ong, C. J., Ghahramani, Z., 2006. Bayesian support vector machines for feature ranking and selection. In: Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. (Eds.), Feature Extraction, Foundations and Applications. Springer-Verlag.
Grandvalet, Y., Canu, S., 2003. Adaptive scaling for feature selection in SVMs. In: Becker, S., Thrun, S., Obermayer, K. (Eds.), Advances in Neural Information Processing Systems 15. MIT Press, 569-576.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classification using support vector machines. Machine Learning 46(1-3), 389-422.
Han, S. H., Hong, S. W., 2003.
A systematic approach for coupling user satisfaction with product design. Ergonomics 46(13/14), 1441-1461.
Han, S. H., Kim, K. J., Yun, M. H., Hong, S. W., Kim, J., 2004. Identifying mobile phone design features critical to user satisfaction. Human Factors and Ergonomics in Manufacturing 14(1), 15-29.
Han, S. H., Yang, H., 2004. Screening important design variables for building a usability model. International Journal of Industrial Ergonomics 33, 159-171.
Herbrich, R., 2002. Learning Kernel Classifiers: Theory and Algorithms. MIT Press, Cambridge.
Hsiao, S. W., Huang, H. C., 2002. A neural network based approach for product form design. Design Studies 23, 67-84.
Hsu, S. H., Chuang, M. C., Chang, C. C., 2000. A semantic differential study of designers' and users' product form perception. International Journal of Industrial Ergonomics 25, 375-391.
Jindo, T., Hirasago, K., Nagamachi, M., 1995. Development of a design support system for office chairs using 3-D graphics. International Journal of Industrial Ergonomics 15, 49-62.
Jones, J. C., 1992. Design Methods. Van Nostrand Reinhold, New York.
Lai, H. H., Lin, Y. C., Yeh, C. H., Wei, C. H., 2006. User-oriented design for the optimal combination on product design. International Journal of Production Economics 100, 253-267.
Lopez, G., Batlles, F. J., Tovar-Pescador, J., 2005. Selection of input parameters to model direct solar irradiance by using artificial neural networks. Energy 30, 1675-1684.
MacKay, D., 1994. Bayesian non-linear modelling for the prediction competition. ASHRAE Transactions 100, 1053-1062.
Miller, G. A., 1956. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review 63, 81-97.
Osgood, C. E., Suci, G. J., Tannenbaum, P. H., 1957. The Measurement of Meaning. University of Illinois Press, Champaign, IL.
Shieh, M. D., Yang, C. C., 2007. Multiclass SVM-RFE for product form feature selection. Expert Systems with Applications, doi:10.1016/j.eswa.2007.07.043.
Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J., 2002. Least Squares Support Vector Machines. World Scientific, Singapore.
Tibshirani, R., 1996. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58(1), 267-288.
Vapnik, V. N., 1995. The Nature of Statistical Learning Theory. Springer, New York.
Vapnik, V. N., Golowich, S., Smola, A., 1997. Support vector method for function approximation, regression estimation and signal processing. In: Mozer, M., Jordan, M., Petsche, T. (Eds.), Advances in Neural Information Processing Systems 9. MIT Press, Cambridge, MA, pp. 281-287.
Viaene, S., Dedene, G., Derrig, R. A., 2005. Auto claim fraud detection using Bayesian learning neural networks. Expert Systems with Applications 29, 653-666.
Wang, D., Lu, W. Z., 2006. Interval estimation of urban ozone level and selection of influential factors by employing automatic relevance determination model. Chemosphere 62, 1600-1611.

Fig. 1. The flow chart of the proposed method for PFFS. [Flow chart boxes: Determine the product form representation using morphological analysis; Conduct the questionnaire investigation to gather consumers' affective responses; Construct the LS-SVR model using the form features and the representative adjectives; Analyze the relative relevance of form features and obtain the feature ranking using the ARD selection process.]
Fig. 2. Step 1 of the Bayesian inference on the length scales (traditional-modern) using (a) the steepest descent method and (b) the BFGS method, with fixed a = 1 and b = 0.

Fig. 3. Performance for the traditional-modern pairwise adjective of (a) LS-SVR with Bayesian inference before ARD, (b) LS-SVR with Bayesian inference after ARD, (c) LS-SVR with LOOCV, and (d) MLR.

Fig. 4. Relevance distributions at (a) Step 1, (b) Step 5, (c) Step 6, and (d) Step 7 for the traditional-modern pairwise adjective.

Table 1. Complete list of PFFs of mobile phone design.

Component        | Form feature           | Type       | Attributes
Body             | Length (X1)            | Continuous | None
                 | Width (X2)             | Continuous | None
                 | Thickness (X3)         | Continuous | None
                 | Volume (X4)            | Continuous | None
                 | Type (X5)              | Discrete   | Block body (X5-1), Flip body (X5-2), Slide body (X5-3)
Function button  | Type (X6)              | Discrete   | Full-separated (X6-1), Partial-separated (X6-2), Regular-separated (X6-3)
                 | Style (X7)             | Discrete   | Round (X7-1), Square (X7-2), Bar (X7-3)
Number button    | Shape (X8)             | Discrete   | Circular (X8-1), Regular (X8-2), Asymmetric (X8-3)
                 | Arrangement (X9)       | Discrete   | Square (X9-1), Vertical (X9-2), Horizontal (X9-3)
                 | Detail treatment (X10) | Discrete   | Regular seam (X10-1), Seamless (X10-2), Vertical seam (X10-3), Horizontal seam (X10-4)
Panel            | Position (X11)         | Discrete   | Middle (X11-1), Upper (X11-2), Lower (X11-3), Full (X11-4)
                 | Shape (X12)            | Discrete   | Square (X12-1), Fillet (X12-2), Shield (X12-3), Round (X12-4)

Table 2. Feature rankings of the (a) traditional-modern, (b) rational-emotional, and (c) heavy-handy pairwise adjectives using ARD and MLR.

(a) Traditional-modern

Rank | ARD | *Dif | MLR
1    | X4  | +2   | X3
2    | X3  | -1   | X5
3    | X5  | -1   | X4
4    | X8  | +3   | X12
5    | X7  | 0    | X7
6    | X12 | -2   | X1
7    | X2  | +2   | X8
8    | X1  | -2   | X10
9    | X10 | -1   | X2
10   | X6  | +1   | X11
11   | X11 | -1   | X6
12   | X9  | -    | X9

(b) Rational-emotional

Rank | ARD | *Dif | MLR
1    | X2  | +9   | X1
2    | X3  | +5   | X4
3    | X1  | -2   | X11
4    | X4  | -2   | X7
5    | X11 | -2   | X10
6    | X7  | -2   | X12
7    | X8  | +4   | X6
8    | X12 | -2   | X5
9    | X10 | -4   | X3
10   | X5  | -2   | X2
11   | X6  | -4   | X8
12   | X9  | -    | X9

(c) Heavy-handy

Rank | ARD | *Dif | MLR
1    | X3  | +5   | X4
2    | X8  | +3   | X2
3    | X12 | +1   | X1
4    | X4  | -3   | X12
5    | X2  | -3   | X8
6    | X7  | +5   | X3
7    | X5  | 0    | X5
8    | X1  | -5   | X6
9    | X11 | 0    | X11
10   | X6  | -2   | X10
11   | X10 | -1   | X7
12   | X9  | -    | X9

*Dif denotes the ranking difference of the feature under ARD compared to MLR with BMS (+: higher in rank under ARD; -: lower in rank under ARD). Underlined features in the ARD columns have an absolute ranking difference of 5 or more; the struck-through entries for X9 indicate that it was excluded from the final ranking.