Refining Rules Incorporated into Knowledge-Based Support Vector Learners via Successive Linear Programming
Richard Maclin, University of Minnesota - Duluth
Edward Wild, Jude Shavlik, Lisa Torrey, Trevor Walker, University of Wisconsin - Madison

The Setting
Given
• Examples for a classification/regression task
• Advice from an expert about the task
Do
• Learn an accurate model
• Refine the advice (if needed)
[Figure: Knowledge-Based Support Vector Classification/Regression]

Motivation
• Advice-taking methods incorporate a human user's knowledge
• But users may not be able to define their advice precisely
• Idea: let users specify advice, then refine that advice using the data

An Example of Advice
True concept: IF (3x1 − 4x2) > −1 THEN class = + ELSE class = −
Examples:
• (0.8, 0.7, 0.3, 0.2), class +
• (0.2, 0.6, 0.8, 0.1), class −
Advice: IF (3x1 − 4x2) > 0 THEN class = + ELSE class = −
(wrong: the threshold should be −1; for the first example 3(0.8) − 4(0.7) = −0.4, so the advice labels it − although its true class is +)

[Figure: Knowledge-Based Classification]
[Figure: Knowledge Refinement]

SVM Formulation
min (model complexity) + C (penalties for error)
such that model fits data (with slack variables for error)

Knowledge-Based SVMs [Fung et al., 2002, 2003 (KBSVM); Mangasarian et al., 2004 (KBKR)]
min (model complexity) + C (penalties for error) + (μ1, μ2) (penalties for not following advice)
such that model fits data (with slack variables for error)
+ model fits advice (also with slacks)

Refining Advice
min (model complexity) + C (penalties for error) + (μ1, μ2) (penalties for not following advice) + ρ (penalties for changing advice)
such that model fits data (with slack variables for error)
+ model fits advice (also with slacks)
+ variables to refine advice

Incorporating Advice in KBKR
Advice format: Bx ≤ d ⇒ f(x) ≥ β
Example: IF (3x1 − 4x2) > 0 THEN class = + (i.e., f(x) ≥ 1) is written as
$$\begin{bmatrix} -3 & 4 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} \le 0 \;\Rightarrow\; f(x) \ge 1$$

Linear Programming with Advice
Advice: Bx ≤ d ⇒ f(x) ≥ β (e.g., IF (3x1 − 4x2) > 0 THEN class = +)
KBSVMs:
$$
\begin{aligned}
\min\quad & \|w\|_1 + |b| + C\|s\|_1 + \sum_k \left(\mu_1 \|z_k\|_1 + \mu_2 \zeta_k\right) \\
\text{such that}\quad & Y(w^\top x + b) + s \ge 1 \\
& \text{for each advice } k: \\
& \quad w_k + B_k^\top u_k = z_k \\
& \quad -d_k^\top u_k + \zeta_k \ge \beta_k - b_k \\
& \quad (s, u_k, \zeta_k) \ge 0
\end{aligned}
$$

Refining Advice
Advice: Bx ≤ (d − δ) ⇒ f(x) ≥ β
KBSVMs with refinement:
$$
\begin{aligned}
\min\quad & \|w\|_1 + |b| + C\|s\|_1 + \sum_k \left(\mu_1 \|z_k\|_1 + \mu_2 \zeta_k + \rho \|\delta_k\|_1\right) \\
\text{such that}\quad & Y(w^\top x + b) + s \ge 1 \\
& \text{for each advice } k: \\
& \quad w_k + B_k^\top u_k = z_k \\
& \quad (\delta_k - d_k)^\top u_k + \zeta_k \ge \beta_k - b_k \\
& \quad (s, u_k, \zeta_k) \ge 0
\end{aligned}
$$
We would like to simply add δ to the linear programming formulation, but we cannot solve for δ and u simultaneously: the term $\delta_k^\top u_k$ is a product of two unknowns, so the problem is no longer linear.

Solution: Successive Linear Programming
Rule-Refining Support Vector Machines (RRSVM) algorithm:
Set δ = 0
Repeat
  Fix the value of δ and solve the LP for u
  Fix the value of u and solve the LP for δ
Until δ no longer changes or the maximum number of repeats is reached

Experiments
Artificial data sets
• Concept: IF (3x1 − 4x2) > −1 THEN class = + ELSE class = −
• Data randomly generated (with and without noise)
• Advice made by adding errors to the concept (e.g., dropping the −1)
Promoter data set
• Data: Towell et al. (1990)
• Domain theory: Ortega (1995)

Methodology
• Experiments repeated twenty times
• Artificial data results: training and test sets randomly generated (separately)
• Promoter data: ten-fold cross-validation
• Parameters chosen using cross-validation (ten folds) on the training data
  – Standard SVMs: C
  – KBSVMs: C, μ1, μ2
  – RRSVMs: C, μ1, μ2, ρ

Artificial Data Results
[Figure: error (0.00 to 0.30) vs. training set size (0 to 200) for SVM, SVM - Only Relevant Features, KBSVM - Good Advice, RRSVM - Good Advice]

Artificial Data Results
[Figure: error (0.00 to 0.30) vs. training set size (0 to 200) for SVM, SVM - Only Relevant Features, KBSVM - Good Advice, RRSVM - Good Advice, KBSVM - Bad Advice, RRSVM - Bad Advice]

Promoter Results
[Figure: average error (0.00 to 0.15) for SVM, KBSVM - Original Advice, RRSVM - Original Advice, KBSVM - Poor Advice, RRSVM - Poor Advice]

Related Work
• Knowledge-Based Kernel Methods
  – Fung et al., NIPS 2002, COLT 2003
  – Mangasarian et al., JMLR 2005
  – Maclin et al., AAAI 2005, 2006
  – Le et al., ICML 2006
  – Mangasarian and Wild, IEEE Trans Neural Nets 2006
• Knowledge Refinement
  – Towell et al., AAAI 1990
  – Pazzani and Kibler, MLJ 1992
  – Ourston and Mooney, AIJ 1994
• Extracting Learned Knowledge from Networks
  – Fu, AAAI 1991
  – Towell and Shavlik, MLJ 1993
  – Thrun, 1995
  – Fung et al., KDD 2005

Future Work
• Test on other domains
• Address limitations (speed, number of parameters)
• Refine multipliers of antecedents
• Add additional terms to rules
• Investigate rule extraction methods

Conclusions: RRSVM
• Key idea: refine advice by adjusting the thresholds of rules
• Can produce more accurate models
• Able to produce changes to advice
• We have shown that RRSVM converges

Acknowledgements
• US Naval Research Laboratory grant N00173-06-1-G002 (to RM)
• DARPA grant HR0011-04-1-0007 (to JS)

Questions?
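The arithmetic behind the "An Example of Advice" slide can be checked with a short Python sketch. The names `score`, `true_concept`, `advice`, and `disagreements` are illustrative only. The sketch evaluates both rules on the two listed examples and shows that moving the advice threshold from 0 to −1 removes the error; under the encoding B = [−3 4 0 … 0], d = 0 (an assumption here), that change corresponds to δ = −1 in the refined advice Bx ≤ (d − δ).

```python
def score(x):
    # Linear combination shared by the true concept and the advice.
    return 3 * x[0] - 4 * x[1]

def true_concept(x):
    # True concept: IF (3x1 - 4x2) > -1 THEN + ELSE -
    return "+" if score(x) > -1 else "-"

def advice(x, threshold=0.0):
    # Expert advice: same rule, but with threshold 0 (it should be -1).
    return "+" if score(x) > threshold else "-"

# The two labeled examples from the slide: four features, then the class.
examples = [((0.8, 0.7, 0.3, 0.2), "+"), ((0.2, 0.6, 0.8, 0.1), "-")]

def disagreements(threshold):
    # Examples whose label the advice (at this threshold) gets wrong.
    return [x for x, label in examples if advice(x, threshold) != label]

print(disagreements(0.0))   # [(0.8, 0.7, 0.3, 0.2)]: first example mislabeled
print(disagreements(-1.0))  # []: the corrected threshold fits both examples
```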
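For the "Linear Programming with Advice" and "Refining Advice" slides, the following Python sketch computes, for fixed candidate values of w, b, and u_k, the smallest slacks z_k and ζ_k that satisfy one advice item's constraints, and hence that item's penalty μ1‖z_k‖₁ + μ2ζ_k. It is not an LP solver, and the function `advice_penalty` and its argument order are hypothetical. With w = (3, −4, 0, 0), b = 1, and u = (1), the example advice holds exactly; with δ = −1 the refined rule covers more of the input space, and b must rise to 2 before the penalty returns to zero.

```python
from typing import Optional, Sequence

def advice_penalty(
    w: Sequence[float],
    b: float,
    B: Sequence[Sequence[float]],   # m x n advice matrix, one row per antecedent
    d: Sequence[float],             # m antecedent thresholds
    beta: float,                    # consequent: f(x) >= beta
    u: Sequence[float],             # m nonnegative multipliers
    mu1: float = 1.0,
    mu2: float = 1.0,
    delta: Optional[Sequence[float]] = None,  # threshold refinement (None = 0)
) -> float:
    """Return mu1*||z||_1 + mu2*zeta at the smallest slacks the constraints allow."""
    if any(ui < 0 for ui in u):
        raise ValueError("advice multipliers u must be nonnegative")
    if delta is None:
        delta = [0.0] * len(d)
    # w + B^T u = z: z is zero when the advice is followed exactly.
    z = [w[j] + sum(B[i][j] * u[i] for i in range(len(u))) for j in range(len(w))]
    # (delta - d)^T u + zeta >= beta - b  =>  zeta >= beta - b + (d - delta)^T u
    shifted = sum((d[i] - delta[i]) * u[i] for i in range(len(u)))
    zeta = max(0.0, beta - b + shifted)
    return mu1 * sum(abs(zj) for zj in z) + mu2 * zeta

# Example advice IF (3x1 - 4x2) > 0 THEN f(x) >= 1, written as B x <= d.
B = [[-3.0, 4.0, 0.0, 0.0]]
d = [0.0]
w = [3.0, -4.0, 0.0, 0.0]
print(advice_penalty(w, 1.0, B, d, 1.0, [1.0]))                # 0.0
print(advice_penalty(w, 1.0, B, d, 1.0, [1.0], delta=[-1.0]))  # 1.0
print(advice_penalty(w, 2.0, B, d, 1.0, [1.0], delta=[-1.0]))  # 0.0
```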
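The RRSVM loop on the "Solution: Successive Linear Programming" slide alternates between two subproblems. Below is a minimal Python skeleton; `successive_lp`, `solve_for_u`, and `solve_for_delta` are hypothetical names, and in RRSVM each callback would solve one of the two LPs (presumably with the u-step also re-optimizing the other model variables). Because the standard library has no LP solver, the usage substitutes closed-form minimizers of a simple biconvex toy function, f(u, δ) = (u − 2)² + (δ − 1)² + (u − δ)², purely to exercise the loop.

```python
from typing import Callable, Tuple

Vec = Tuple[float, ...]

def successive_lp(
    solve_for_u: Callable[[Vec], Vec],
    solve_for_delta: Callable[[Vec], Vec],
    n_delta: int,
    max_repeats: int = 50,
    tol: float = 1e-8,
) -> Tuple[Vec, Vec, int]:
    """Alternate the two subproblems until delta stops changing."""
    delta: Vec = tuple(0.0 for _ in range(n_delta))  # Set delta = 0
    u: Vec = ()
    repeat = 0
    for repeat in range(1, max_repeats + 1):
        u = solve_for_u(delta)          # fix delta, solve for u
        new_delta = solve_for_delta(u)  # fix u, solve for delta
        change = max((abs(a - c) for a, c in zip(new_delta, delta)), default=0.0)
        delta = new_delta
        if change <= tol:               # no change to delta: stop
            break
    return u, delta, repeat

# Toy stand-ins (NOT LPs): exact minimizers of
# f(u, delta) = (u - 2)^2 + (delta - 1)^2 + (u - delta)^2, one block at a time.
def toy_u(delta: Vec) -> Vec:
    return ((2.0 + delta[0]) / 2.0,)

def toy_delta(u: Vec) -> Vec:
    return ((1.0 + u[0]) / 2.0,)

u, delta, repeats = successive_lp(toy_u, toy_delta, n_delta=1)
print(u, delta, repeats)  # approaches u = 5/3, delta = 4/3
```

In the toy, each step exactly minimizes over one block while the other is held fixed, so the objective never increases from one step to the next.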
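The "Experiments" slide describes artificial data labeled by IF (3x1 − 4x2) > −1 THEN + ELSE −, with bad advice made by introducing errors such as dropping the −1. The Python sketch below generates such data under assumptions the slides do not state: four features drawn uniformly from [0, 1) (matching the example points) and noise modeled as flipping each label with a given probability. `concept` and `make_dataset` are hypothetical names.

```python
import random
from typing import List, Optional, Tuple

Example = Tuple[Tuple[float, ...], int]

def concept(x: Tuple[float, ...], threshold: float = -1.0) -> int:
    # +1 if (3*x1 - 4*x2) > threshold else -1; threshold -1 is the true concept.
    return 1 if 3 * x[0] - 4 * x[1] > threshold else -1

def make_dataset(n: int, n_features: int = 4, noise: float = 0.0,
                 seed: Optional[int] = None) -> List[Example]:
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = tuple(rng.random() for _ in range(n_features))  # assumed: uniform [0, 1)
        label = concept(x)
        if rng.random() < noise:  # assumed noise model: flip the label
            label = -label
        data.append((x, label))
    return data

train = make_dataset(200, noise=0.0, seed=0)
# "Bad advice": the same rule with the -1 dropped (threshold 0).
bad_advice_errors = sum(concept(x, threshold=0.0) != y for x, y in train)
print(len(train), bad_advice_errors)
```

Under these assumptions, the dropped −1 makes the bad advice mislabel exactly the points with −1 < 3x1 − 4x2 ≤ 0, a band covering one quarter of the unit square in (x1, x2).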
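The "Methodology" slide selects parameters by ten-fold cross-validation on the training data, with one parameter for standard SVMs (C), three for KBSVMs (C, μ1, μ2), and four for RRSVMs (C, μ1, μ2, ρ). The Python sketch below shows the fold splitting and a full grid over each method's parameters; the candidate values and the names `k_fold_splits`, `grid`, `CANDIDATES`, and `PARAMS` are hypothetical, and model fitting is left out.

```python
import itertools
import random
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

def k_fold_splits(n: int, k: int = 10,
                  seed: Optional[int] = None) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test

# Hypothetical candidate values; the slides do not list the grids used.
CANDIDATES = [0.1, 1.0, 10.0]
PARAMS = {
    "SVM": ["C"],
    "KBSVM": ["C", "mu1", "mu2"],
    "RRSVM": ["C", "mu1", "mu2", "rho"],
}

def grid(names: Sequence[str]) -> List[Dict[str, float]]:
    # Every combination of candidate values for the named parameters.
    return [dict(zip(names, values))
            for values in itertools.product(CANDIDATES, repeat=len(names))]

for method, names in PARAMS.items():
    print(method, len(grid(names)))  # SVM 3, KBSVM 27, RRSVM 81
```

With three candidate values per parameter, the number of settings to cross-validate grows from 3 to 81, one concrete view of the "number of parameters" limitation listed under Future Work.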