Feature selection

Feature selection is the selection of the most appropriate subset of features that adequately describes a given classification task.

Key feature selection methods:
- Open-loop (filter / front-end / preset bias)
- Closed-loop (wrapper / performance bias)

1- Open-loop methods (filter, preset bias, front end): Select the features for which the reduced data set maximizes between-class separability, evaluated from the within-class and between-class covariance matrices. There is no feedback mechanism from the processing algorithm.

2- Closed-loop methods (wrapper, performance bias, classifier feedback): Select features based on the performance of the processing algorithm. That performance acts as the feedback mechanism and serves as the criterion for feature subset selection.

Feature selection has four different approaches:
- Filter approach
- Wrapper approach
- Embedded approach
- Hybrid approach

Filter approach

This approach selects a subset of features without using any learning algorithm. It is typically used for high-dimensional datasets and is relatively faster than wrapper-based approaches.
- Independent of the classification model
- Uses only the dataset of annotated examples
- A relevance measure is calculated for each feature, e.g.:
  - Feature-class entropy
  - Kullback-Leibler divergence (relative entropy, sometimes called cross-entropy)
  - Information gain, gain ratio
- Normalize the relevance scores to obtain feature weights
- Fast, but discards feature dependencies

Wrapper approach

This approach has high computational complexity. It uses a learning algorithm to evaluate the classification accuracy obtained with the selected features. Wrapper methods can give high classification accuracy for particular classifiers.
- Specific to a classification algorithm
- The search for a good feature subset is guided by a search algorithm (e.g. greedy forward or backward)
- The algorithm uses the evaluation of the classifier as a guide to find good feature subsets
- Examples: sequential forward or backward search, simulated annealing, stochastic iterative sampling (e.g. genetic algorithms (GA), estimation of distribution algorithms (EDA))
- Computationally intensive, but able to take into account feature dependencies
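
The contrast between the filter and wrapper approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy dataset is invented, the function names (`information_gain`, `filter_select`, `loo_accuracy`, `forward_select`) are hypothetical, and a leave-one-out majority-vote lookup classifier stands in for whatever learning algorithm a real wrapper would use. Feature 1 is an exact copy of feature 0, so a filter that scores features one at a time keeps both, while greedy forward search notices that the copy adds nothing.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Filter relevance score: H(class) - H(class | feature), discrete values."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def filter_select(X, y, k):
    """Filter: score every feature independently and keep the k best."""
    scores = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return ranked[:k], scores

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a majority-vote lookup classifier.

    Stand-in for the learning algorithm (an assumption of this sketch).
    A sample with no matching neighbours counts as a miss.
    """
    hits = 0
    for i, row in enumerate(X):
        key = tuple(row[j] for j in features)
        votes = Counter(
            y[m] for m, other in enumerate(X)
            if m != i and tuple(other[j] for j in features) == key
        )
        if votes and votes.most_common(1)[0][0] == y[i]:
            hits += 1
    return hits / len(X)

def forward_select(X, y, k):
    """Wrapper: greedy sequential forward search guided by classifier accuracy."""
    selected, best = [], 0.0
    while len(selected) < k:
        candidates = [j for j in range(len(X[0])) if j not in selected]
        scored = [(loo_accuracy(X, y, selected + [j]), j) for j in candidates]
        acc, j = max(scored, key=lambda t: t[0])
        if acc <= best:  # stop when no candidate improves accuracy
            break
        selected.append(j)
        best = acc
    return selected, best

# Invented toy data: columns are (f0, f1, f2), with f1 a copy of f0
# and the class label equal to (f0 OR f2).
X = [(0, 0, 0)] * 4 + [(0, 0, 1)] * 2 + [(1, 1, 0)] * 4 + [(1, 1, 1)] * 2
y = [0] * 4 + [1] * 2 + [1] * 4 + [1] * 2

filter_top, gains = filter_select(X, y, 2)
wrapper_top, wrapper_acc = forward_select(X, y, 2)
print("filter keeps:", filter_top)
print("wrapper keeps:", wrapper_top, "accuracy:", wrapper_acc)
```

On this data the filter keeps features 0 and 1 (the redundant pair), whereas the wrapper keeps features 0 and 2, at the cost of many more classifier evaluations. Greedy forward search is itself a heuristic and can miss subsets whose features are only useful jointly, which is one reason the stochastic search methods listed above exist.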