Lecture Slides for INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 12: Local Models

Introduction
- Divide the input space into local regions and learn simple (constant/linear) models in each patch
- Unsupervised: competitive, online clustering
- Supervised: radial-basis functions, mixture of experts

Competitive Learning
E(\{m_i\}_{i=1}^k \mid X) = \sum_t \sum_i b_i^t \, \|x^t - m_i\|
b_i^t = 1 if \|x^t - m_i\| = \min_l \|x^t - m_l\|, and 0 otherwise
Batch k-means: m_i = \frac{\sum_t b_i^t x^t}{\sum_t b_i^t}
Online k-means: \Delta m_{ij} = -\eta \frac{\partial E^t}{\partial m_{ij}} = \eta \, b_i^t (x_j^t - m_{ij})

Winner-take-all network
(figure)

Adaptive Resonance Theory
- Incremental; add a new cluster if the input is not covered; coverage is defined by the vigilance, ρ (Carpenter and Grossberg, 1988)
b_i^t = \|x^t - m_i\| = \min_{l=1}^{k} \|x^t - m_l\|
If b_i^t > \rho: m_{k+1} \leftarrow x^t (add a new cluster); otherwise: \Delta m_i = \eta (x^t - m_i)

Self-Organizing Maps
- Units have a defined neighborhood; m_i is "between" m_{i-1} and m_{i+1}, and all are updated together (Kohonen, 1990)
- One-dimensional map:
\Delta m_l = \eta \, e(l, i) \, (x^t - m_l)
e(l, i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(l - i)^2}{2\sigma^2}\right]

Radial-Basis Functions
Locally-tuned units:
p_h^t = \exp\!\left[-\frac{\|x^t - m_h\|^2}{2 s_h^2}\right]
y^t = \sum_{h=1}^{H} w_h p_h^t + w_0

Local vs Distributed Representation
(figure)

Training RBF
- Hybrid learning:
  - First layer (centers and spreads): unsupervised k-means
  - Second layer (weights): supervised gradient descent
- Fully supervised
(Broomhead and Lowe, 1988; Moody and Darken, 1989)

Regression
E(\{m_h, s_h, w_{ih}\}_{i,h} \mid X) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2
y_i^t = \sum_{h=1}^{H} w_{ih} p_h^t + w_{i0}
\Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t) \, p_h^t
\Delta m_{hj} = \eta \sum_t \left[\sum_i (r_i^t - y_i^t) w_{ih}\right] p_h^t \, \frac{x_j^t - m_{hj}}{s_h^2}
\Delta s_h = \eta \sum_t \left[\sum_i (r_i^t - y_i^t) w_{ih}\right] p_h^t \, \frac{\|x^t - m_h\|^2}{s_h^3}

Classification
E(\{m_h, s_h, w_{ih}\}_{i,h} \mid X) = -\sum_t \sum_i r_i^t \log y_i^t
y_i^t = \frac{\exp\!\left[\sum_h w_{ih} p_h^t + w_{i0}\right]}{\sum_k \exp\!\left[\sum_h w_{kh} p_h^t + w_{k0}\right]}

Rules and Exceptions
y^t = \sum_{h=1}^{H} w_h p_h^t + v^T x^t + v_0
where the RBF units model the exceptions and the linear term v^T x^t + v_0 is the default rule.

Rule-Based Knowledge
IF (x_1 \approx a AND x_2 \approx b) OR x_3 \approx c THEN y = 0.1
p_1 = \exp\!\left[-\frac{(x_1 - a)^2}{2 s_1^2}\right] \exp\!\left[-\frac{(x_2 - b)^2}{2 s_2^2}\right] with w_1 = 0.1
p_2 = \exp\!\left[-\frac{(x_3 - c)^2}{2 s_3^2}\right] with w_2 = 0.1
- Incorporation of prior knowledge (before training)
- Rule extraction (after training) (Tresp et al., 1997)
- Fuzzy membership functions and fuzzy rules

Normalized Basis Functions
g_h^t = \frac{p_h^t}{\sum_l p_l^t} = \frac{\exp\!\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_l \exp\!\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}
y_i^t = \sum_{h=1}^{H} w_{ih} g_h^t
\Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t) \, g_h^t
\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(w_{ih} - y_i^t) \, g_h^t \, \frac{x_j^t - m_{hj}}{s_h^2}

Competitive Basis Functions
Mixture model:
p(r^t \mid x^t) = \sum_{h=1}^{H} p(h \mid x^t) \, p(r^t \mid h, x^t)
p(h \mid x^t) = \frac{p(x^t \mid h) \, p(h)}{\sum_l p(x^t \mid l) \, p(l)}
g_h^t = \frac{a_h \exp\!\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_l a_l \exp\!\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}
with priors a_h \equiv p(h).
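The hybrid recipe from the earlier "Training RBF" and "Regression" slides (unsupervised k-means for the first-layer centers and spreads, then supervised fitting of the second-layer weights) can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the book; the function names (rbf_design, fit_rbf, predict_rbf), setting each spread to the mean distance of a cluster's points to its center, and solving the second layer by least squares rather than gradient descent are all assumptions of the sketch.

```python
import numpy as np

def rbf_design(X, centers, spreads):
    # p_h(x) = exp(-||x - m_h||^2 / (2 s_h^2)) for every (sample, unit) pair
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * spreads ** 2))

def fit_rbf(X, r, H, n_iter=50, rng=np.random.default_rng(0)):
    # First layer (unsupervised): k-means for the centers m_h
    centers = X[rng.choice(len(X), H, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for h in range(H):
            if np.any(labels == h):
                centers[h] = X[labels == h].mean(axis=0)
    # Spreads s_h: mean distance of a cluster's points to its center (assumed heuristic)
    spreads = np.array([
        np.sqrt(((X[labels == h] - centers[h]) ** 2).sum(axis=1)).mean()
        if np.any(labels == h) else 1.0
        for h in range(H)
    ])
    # Second layer (supervised): weights w_h and bias w_0 by linear least squares
    P1 = np.hstack([rbf_design(X, centers, spreads), np.ones((len(X), 1))])
    w = np.linalg.lstsq(P1, r, rcond=None)[0]
    return centers, spreads, w

def predict_rbf(X, centers, spreads, w):
    P1 = np.hstack([rbf_design(X, centers, spreads), np.ones((len(X), 1))])
    return P1 @ w  # y = sum_h w_h p_h + w_0
```

With the same design matrix, the gradient update \Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t) p_h^t from the Regression slide could be used in place of the least-squares solve.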
Regression (competitive basis functions)
L(\{m_h, s_h, w_{ih}\}_{i,h} \mid X) = \sum_t \log \sum_h g_h^t \exp\!\left[-\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2\right]
where y_{ih}^t = w_{ih} is the constant fit of unit h.
Posterior responsibility of unit h:
f_h^t = p(h \mid r^t, x^t) = \frac{p(h \mid x^t) \, p(r^t \mid h, x^t)}{\sum_l p(l \mid x^t) \, p(r^t \mid l, x^t)} = \frac{g_h^t \exp\!\left[-\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2\right]}{\sum_l g_l^t \exp\!\left[-\frac{1}{2} \sum_i (r_i^t - y_{il}^t)^2\right]}
\Delta w_{ih} = \eta \sum_t (r_i^t - y_{ih}^t) \, f_h^t
\Delta m_{hj} = \eta \sum_t (f_h^t - g_h^t) \, \frac{x_j^t - m_{hj}}{s_h^2}

Classification (competitive basis functions)
L(\{m_h, s_h, w_{ih}\}_{i,h} \mid X) = \sum_t \log \sum_h g_h^t \prod_i (y_{ih}^t)^{r_i^t} = \sum_t \log \sum_h g_h^t \exp\!\left[\sum_i r_i^t \log y_{ih}^t\right]
y_{ih}^t = \frac{\exp w_{ih}}{\sum_k \exp w_{kh}}
f_h^t = \frac{g_h^t \exp\!\left[\sum_i r_i^t \log y_{ih}^t\right]}{\sum_l g_l^t \exp\!\left[\sum_i r_i^t \log y_{il}^t\right]}

EM for RBF (Supervised EM)
E-step: f_h^t = p(h \mid r^t, x^t)
M-step:
m_h = \frac{\sum_t f_h^t x^t}{\sum_t f_h^t}
S_h = \frac{\sum_t f_h^t (x^t - m_h)(x^t - m_h)^T}{\sum_t f_h^t}
w_{ih} = \frac{\sum_t f_h^t r_i^t}{\sum_t f_h^t}

Learning Vector Quantization
- H units per class, prelabeled (Kohonen, 1990)
- Given x^t, m_i is the closest:
\Delta m_i = \eta (x^t - m_i) if x^t and m_i have the same class label
\Delta m_i = -\eta (x^t - m_i) otherwise

Mixture of Experts
- In RBF, each local fit is a constant, w_{ih}, the second-layer weight
- In MoE, each local fit is a linear function of x, a local expert:
w_{ih}^t = v_{ih}^T x^t
(Jacobs et al., 1991)

MoE as Models Combined
Radial gating:
g_h^t = \frac{\exp\!\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_l \exp\!\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}
Softmax gating:
g_h^t = \frac{\exp\!\left[m_h^T x^t\right]}{\sum_l \exp\!\left[m_l^T x^t\right]}

Cooperative MoE
Regression:
E(\{m_h, s_h, w_{ih}\}_{i,h} \mid X) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2
\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) \, g_h^t \, x^t
\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(y_{ih}^t - y_i^t) \, g_h^t \, x_j^t

Competitive MoE: Regression
L(\{m_h, s_h, w_{ih}\}_{i,h} \mid X) = \sum_t \log \sum_h g_h^t \exp\!\left[-\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2\right]
y_{ih}^t = w_{ih}^t = v_{ih}^T x^t
\Delta v_{ih} = \eta \sum_t (r_i^t - y_{ih}^t) \, f_h^t \, x^t
\Delta m_h = \eta \sum_t (f_h^t - g_h^t) \, x^t

Competitive MoE: Classification
L(\{m_h, s_h, w_{ih}\}_{i,h} \mid X) = \sum_t \log \sum_h g_h^t \prod_i (y_{ih}^t)^{r_i^t} = \sum_t \log \sum_h g_h^t \exp\!\left[\sum_i r_i^t \log y_{ih}^t\right]
y_{ih}^t = \frac{\exp w_{ih}^t}{\sum_k \exp w_{kh}^t} = \frac{\exp\!\left[v_{ih}^T x^t\right]}{\sum_k \exp\!\left[v_{kh}^T x^t\right]}

Hierarchical Mixture of Experts
- Tree of MoEs where each MoE is an expert in a higher-level MoE
- Soft decision tree: takes a weighted (gating) average of all leaves (experts), as opposed to using a single path and a single leaf
- Can be trained with EM (Jordan and Jacobs, 1994)
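To make the mixture-of-experts equations concrete, here is a small NumPy sketch of a competitive MoE for single-output regression with linear experts and softmax gating, updated by per-sample gradient steps in the spirit of the "Competitive MoE: Regression" slide. It is a hedged sketch under those assumptions, not a reference implementation; the class name MoE, the learning rate, and the initialization are illustrative choices that do not appear on the slides.

```python
import numpy as np

class MoE:
    """Minimal mixture of experts: linear experts, softmax gating, one output."""

    def __init__(self, n_inputs, n_experts, eta=0.05, rng=np.random.default_rng(0)):
        self.v = rng.normal(scale=0.1, size=(n_experts, n_inputs + 1))  # expert weights v_h
        self.m = rng.normal(scale=0.1, size=(n_experts, n_inputs + 1))  # gating weights m_h
        self.eta = eta

    def _forward(self, x):
        x1 = np.append(x, 1.0)                     # augment with a bias term
        w = self.v @ x1                            # expert outputs w_h = v_h^T x
        a = self.m @ x1
        g = np.exp(a - a.max()); g /= g.sum()      # softmax gating g_h
        return x1, w, g

    def predict(self, x):
        _, w, g = self._forward(x)
        return g @ w                               # y = sum_h g_h w_h

    def partial_fit(self, x, r):
        # Competitive update: f_h proportional to g_h exp(-(r - w_h)^2 / 2),
        # then Delta v_h = eta (r - w_h) f_h x and Delta m_h = eta (f_h - g_h) x.
        x1, w, g = self._forward(x)
        e2 = (r - w) ** 2
        f = g * np.exp(-0.5 * (e2 - e2.min()))     # shift by the min for stability;
        f /= f.sum()                               # normalization is unchanged
        self.v += self.eta * ((r - w) * f)[:, None] * x1
        self.m += self.eta * (f - g)[:, None] * x1
```

Training would loop over the data several times, calling partial_fit on each (x^t, r^t) pair; swapping softmax gating for radial gating would change only _forward and the update of the gating parameters.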