Classification and variable selection algorithms using signomial

Classification and variable selection algorithms using signomial function = Signomial 함수를 이용한 분류와 변수 선택 해법
Data mining techniques extract useful information from large databases. The techniques can be categorized as
being either descriptive or predictive. In this thesis, we focus on classification, the predictive data mining task for
discrete target variables, and on variable selection for classification. We propose classification algorithms for multi-class classification problems, and variable selection algorithms for binary and multi-class
classification, using signomial functions. Specifically, this research contributes to the field of classification and
variable selection by: 1. Constructing a multi-class classifier directly by solving a single optimization problem, so that
it can capture the correlations among classes; 2. Obtaining classifiers that are sparse and can be
explicitly described in the original input space, which facilitates interpretation; 3. Determining a subset of variables that is
desirable for predicting the output, considering nonlinear interactions of variables; 4. Performing variable
selection for multi-class classification by treating multiple classes jointly to select a small common subset of
variables. First, we propose two multi-class classification methods using signomial functions. Each of them directly
constructs a multi-class classifier by solving a single optimization problem. Since the number of possible
signomial terms is huge, we propose a column generation method that iteratively generates good signomial terms.
Both methods obtain better or comparable classification accuracies and give sparser classifiers than the
existing methods. Next, we propose two embedded variable selection methods using signomial functions. We
attempt to select, among a set of the input variables, those that lead to the best performance of the classifier.
One method repeatedly removes variables based on backward selection, and the other method directly selects a
set of variables by solving an optimization problem. The proposed methods conduct variable selection
considering nonlinear interactions of variables, and additionally obtain a signomial classifier with the selected
variables. The proposed methods select more desirable variables for predicting the output and give classifiers
with better or comparable test error rates than the existing methods. Lastly, we develop
embedded variable selection methods for multi-class classification using signomial functions. We introduce a
sparsity function that measures the number of selected variables, and add it to the
objective function. In addition, we propose imposing different sparsity parameters on different variables
according to their relative importance. The proposed methods treat multiple classes jointly in multi-class
classification problems, and select variables that are desirable for predicting the output. In addition, the
proposed methods automatically determine the number of variables to be selected, and obtain classifiers
without the additional training process.
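
For readers unfamiliar with the term, a signomial function is a sum of terms, each a real coefficient times a product of positive variables raised to real exponents. The sketch below only illustrates that definition and why such a classifier can be read off in the original variables. The abstract does not state the thesis's actual decision rule or optimization model, so the per-class scoring with an argmax rule, the function names, and the example numbers are all assumptions made for illustration.

```python
from math import prod


def signomial(x, coeffs, exponents):
    """Evaluate sum_k coeffs[k] * prod_j x[j] ** exponents[k][j].

    Inputs must be strictly positive so that real exponents are well defined.
    """
    if any(v <= 0 for v in x):
        raise ValueError("signomial inputs must be strictly positive")
    return sum(
        c * prod(v ** a for v, a in zip(x, row))
        for c, row in zip(coeffs, exponents)
    )


def predict_multiclass(x, class_models):
    """Assumed decision rule: one signomial score per class, pick the largest."""
    scores = [signomial(x, c, e) for c, e in class_models]
    return max(range(len(scores)), key=scores.__getitem__)


def selected_variables(class_models):
    """Variables with a nonzero exponent in some nonzero-coefficient term of any class."""
    used = set()
    for coeffs, exponents in class_models:
        for c, row in zip(coeffs, exponents):
            if c != 0:
                used.update(j for j, a in enumerate(row) if a != 0)
    return sorted(used)


# Hypothetical two-class model over three input variables.
class_models = [
    ([1.0, -0.5], [[1, 0, 0], [0, 0.5, 0]]),  # f_0(x) = x0 - 0.5 * x1**0.5
    ([0.25], [[1, 1, 0]]),                    # f_1(x) = 0.25 * x0 * x1
]
x = [2.0, 4.0, 3.0]
label = predict_multiclass(x, class_models)
common_subset = selected_variables(class_models)
```

In this toy model the third variable never appears with a nonzero exponent, so it is not among the selected variables. This mirrors, in a very simplified way, how a sparse signomial classifier exposes a small common subset of variables across classes.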