Extending Propositional Satisfiability to Determine Minimal Fuzzy-Rough Reducts Richard Jensen Aberystwyth University, UK Andrew Tuson City University, UK Qiang Shen Aberystwyth University, UK Richard Jensen, Andrew Tuson and Qiang Shen Outline • The importance of feature selection • Rough set theory • Fuzzy-rough feature selection (FRFS) • FRFS-SAT • Experimentation • Conclusion Richard Jensen, Andrew Tuson and Qiang Shen Feature selection • Why dimensionality reduction/feature selection? High dimensional data Dimensionality Reduction Intractable Low dimensional data Processing System • Growth of information - need to manage this effectively • Curse of dimensionality - a problem for machine learning Richard Jensen, Andrew Tuson and Qiang Shen Rough set theory Upper Approximation Set A Lower Approximation Equivalence class Rx Rx is the set of all points that are indiscernible with point x in terms of feature subset B Richard Jensen, Andrew Tuson and Qiang Shen Discernibility approach • Decision-relative discernibility matrix • Compare objects • Examine attribute values • For attributes that differ: cij {a C | a( xi ) a( x j ), d ( xi ) d ( x j )} • If decision values differ, include attributes in matrix • Else leave slot blank • Construct discernibility function: fC(a1,...,am) {(cij ) :1 j i | U |, cij } Richard Jensen, Andrew Tuson and Qiang Shen Example • Remove duplicates fC(a,b,c,d) = {a ⋁ b ⋁ c ⋁ d} ⋀ {a ⋁ c ⋁ d} ⋀ {b ⋁ c} ⋀ {d} ⋀ {a ⋁ b ⋁ c} ⋀ {a ⋁ b ⋁ d} ⋀ {b ⋁ c ⋁ d} ⋀ {a ⋁ d} • Remove supersets fC(a,b,c,d) = {b ⋁ c} ⋀ {d} Richard Jensen, Andrew Tuson and Qiang Shen Finding reducts • Usually too expensive to search exhaustively for reducts with minimal cardinality • Reducts found through: • Converting from CNF to DNF (expensive) • Hill-climbing search using clauses (non-optimal) • Other search methods - GAs etc (non-optimal) • RSAR-SAT • Solve directly in SAT formulation. • DPLL approach is both fast and ensures optimal reducts Richard Jensen, Andrew Tuson and Qiang Shen Fuzzy discernibility matrices • Extension of crisp approach • Previously, attributes had {0,1} membership to clauses • Now have membership in [0,1] • Allows real-coded data as well as nominal. • Fuzzy DMs can be used to find fuzzy-rough reducts Richard Jensen, Andrew Tuson and Qiang Shen Formulation • Fuzzy satisfiability • In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true • For the fuzzy case, clauses may be satisfied to a certain degree depending on which variables have been assigned the value true Richard Jensen, Andrew Tuson and Qiang Shen Experimentation: setup • 9 benchmark datasets • Features – 10 to 39 • Objects – 120 to 690 • Methods used: • FRFS-SAT • Greedy hill-climbing: fuzzy dependency, fuzzy boundary region and fuzzy discernibility. • Evolutionary algorithms: genetic algorithms (GA) and particle swarm optimization (PSO) using fuzzy dependency • 10x10-fold cross validation • FS performed on the training folds, test folds reduced using discovered reducts Richard Jensen, Andrew Tuson and Qiang Shen Experimentation: results Richard Jensen, Andrew Tuson and Qiang Shen Conclusion • Extended propositional satisfiability to enable search for fuzzy-rough reducts • New framework for fuzzy satisfiability • New DPLL algorithm • Fuzzy clause simplification • Future work: • • • • Non-chronological backtracking Better heuristics Unsupervised FS Other extensions in propositional satisfiability Richard Jensen, Andrew Tuson and Qiang Shen • WEKA implementations of all fuzzy-rough feature selectors and classifiers can be downloaded from: Richard Jensen, Andrew Tuson and Qiang Shen Feature selection • Feature selection (FS) is a DR technique that preserves data semantics (meaning of data) Generation Evaluation Stopping Criterion Validation • Subset generation: forwards, backwards, random… • Evaluation function: determines ‘goodness’ of subsets • Stopping criterion: decide when to stop subset search Richard Jensen, Andrew Tuson and Qiang Shen Algorithm Richard Jensen, Andrew Tuson and Qiang Shen Example Richard Jensen, Andrew Tuson and Qiang Shen