Hybrid Fuzzy-Rough Rule Induction and Feature Selection

Dr. Richard Jensen, Aberystwyth University, UK (rkj@aber.ac.uk)
Dr. Chris Cornelis, Ghent University, Belgium (Chris.Cornelis@UGent.be)
Prof Qiang Shen, Aberystwyth University, UK (qqs@aber.ac.uk)

FUZZ-IEEE 2009
Richard Jensen, Chris Cornelis and Qiang Shen

Outline
• Introduction
• Rough set theory (RST)
• Fuzzy-rough set theory
• Proposed method: QuickRules
• Experimentation
• Conclusion

Introduction
• Rule induction has many advantages: e.g. understandability, accuracy, adding prior knowledge
• ... but also limitations: scaling, dealing with noise, uncertainty...
• Pre-processing often used

Rough set theory
Figure labels: Upper Approximation, Set A, Lower Approximation, Equivalence class Rx
Rx is the set of all points that are indiscernible with point x in terms of feature subset B

Discovering rules via RST
• Equivalence classes
  • Form the antecedent part of a rule
  • The lower approximation tells us if this is predictive of a given concept (certain rules)
• Typically done in one of two ways:
  • Overlaying reducts
  • Building rules by considering individual equivalence classes (e.g. LEM2)
• These require a discretization procedure

Fuzzy rough sets
Figure labels: Rough set, Fuzzy rough set, t-norm, implicator

Fuzzy-rough sets
• Fuzzy-rough feature selection
  • Evaluation: function based on fuzzy-rough lower approximation
  • Generation: greedy hill-climbing
  • Stopping criterion: when maximal ‘goodness’ is reached (or to degree α)
• The fuzzy tolerance classes used during this process can be used to create fuzzy rules

QuickRules

Check

Experimentation
• 10-fold cross validation
• 6 fuzzy/rough set classifiers
• 5 non fuzzy/rough set classifiers

Conclusion
• Proposed a rule induction method based on fuzzy-rough sets
  • Based on fuzzy-rough feature selection, using fuzzy tolerance classes
• Future work
  • Post-processing
  • Other search mechanisms (from FS literature)
  • Other measures, e.g. VQRS positive region and dependency

• WEKA implementations of all fuzzy-rough classifiers and feature selectors can be downloaded from:
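
The fuzzy-rough feature selection slide names three ingredients: an evaluation function based on the fuzzy-rough lower approximation, greedy hill-climbing, and a stopping criterion at maximal 'goodness'. The sketch below illustrates how these could fit together. It is not the QuickRules algorithm itself and not the authors' WEKA code. Several choices are assumptions, because the slide formulas were not recoverable: the tolerance relation (1 minus the normalised distance, combined across features with min), the Łukasiewicz implicator, the min t-norm, and the standard definitions of lower approximation (infimum of implications) and upper approximation (supremum of t-norms). All function names are hypothetical.

```python
from typing import List, Sequence

Data = Sequence[Sequence[float]]


def implicator(a: float, b: float) -> float:
    """Lukasiewicz implicator (an assumed choice)."""
    return min(1.0, 1.0 - a + b)


def tnorm(a: float, b: float) -> float:
    """Minimum t-norm (an assumed choice)."""
    return min(a, b)


def tolerance(data: Data, features: Sequence[int], ranges: Sequence[float],
              x: int, y: int) -> float:
    """Fuzzy tolerance of objects x and y over a non-empty feature subset."""
    return min(
        max(0.0, 1.0 - abs(data[x][a] - data[y][a]) / ranges[a])
        for a in features
    )


def lower(data, features, ranges, concept: Sequence[float], x: int) -> float:
    """Degree to which x certainly belongs to the fuzzy set `concept`."""
    return min(
        implicator(tolerance(data, features, ranges, x, y), concept[y])
        for y in range(len(data))
    )


def upper(data, features, ranges, concept: Sequence[float], x: int) -> float:
    """Degree to which x possibly belongs to the fuzzy set `concept`."""
    return max(
        tnorm(tolerance(data, features, ranges, x, y), concept[y])
        for y in range(len(data))
    )


def dependency(data, features, ranges, labels: Sequence[int]) -> float:
    """Mean positive-region membership: the 'goodness' of a feature subset."""
    classes = sorted(set(labels))
    total = 0.0
    for x in range(len(data)):
        total += max(
            lower(data, features, ranges,
                  [1.0 if label == c else 0.0 for label in labels], x)
            for c in classes
        )
    return total / len(data)


def greedy_select(data, ranges, labels, eps: float = 1e-9) -> List[int]:
    """Greedy hill-climbing: add the best feature until goodness is maximal."""
    all_features = list(range(len(data[0])))
    target = dependency(data, all_features, ranges, labels)
    selected: List[int] = []
    best = 0.0
    while best < target - eps:
        candidates = [a for a in all_features if a not in selected]
        scored = [(dependency(data, selected + [a], ranges, labels), a)
                  for a in candidates]
        best, chosen = max(scored, key=lambda s: s[0])
        selected.append(chosen)
    return selected


if __name__ == "__main__":
    # Toy data (made up for illustration): feature 0 separates the classes,
    # feature 1 does not.
    toy = [[0.0, 0.5], [0.0, 0.4], [1.0, 0.5], [1.0, 0.4]]
    toy_labels = [0, 0, 1, 1]
    toy_ranges = [1.0, 0.1]  # max minus min of each feature
    print(greedy_select(toy, toy_ranges, toy_labels))
```

The tolerance classes computed inside `lower` are the same objects the slide says can be reused to build fuzzy rules. A rule inducer along these lines would keep, for each covered object, its tolerance class as the rule antecedent and its decision class as the consequent, but that step is not shown here.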