Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received. Bayes Net Structure Learning Andrew W. Moore Associate Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 412-268-7599 Copyright © 2001, Andrew W. Moore Oct 29th, 2001 Reminder: A Bayes Net Copyright © 2001, Andrew W. Moore Bayes Net Structure: Slide 2 1 Estimating Probability Tables Copyright © 2001, Andrew W. Moore Bayes Net Structure: Slide 3 Estimating Probability Tables Copyright © 2001, Andrew W. Moore Bayes Net Structure: Slide 4 2 Scoring a structure (Which of these fits the data best?) N. Friedman and Z. Yakhini, On the sample complexity of learning Bayesian networks, Proceedings of the 12th conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1996 Score = N − params log R 2 num combinations m of parent values (arityof X j ) + R∑ j =1 ∑ ∑ P(V ) P( X k =1 k v =1 j = v | Vk ) log P ( X j = v | Vk ) Copyright © 2001, Andrew W. Moore Bayes Net Structure: Slide 5 Scoring a structure Number of nonredundant parameters defining the net #Attributes Sums over all the rows in the probability table for X j #Records Score = N − params log R 2 The parent values in the k’th row of X j ’s probability table num combinations m of parent values (arityof X j ) + R∑ j =1 ∑ k =1 ∑ P(V ) P( X v =1 k j = v | Vk ) log P ( X j = v | Vk ) All these values estimated from data Copyright © 2001, Andrew W. Moore Bayes Net Structure: Slide 6 3 Scoring a structure This is called a BIC (Bayes Information Criterion) estimate This part is a penalty for too many parameters This part is the training set loglikelihood Score = N − params log R 2 BIC asymptotically tries to get the structure right. (There’s a lot of heavy emotional debate about whether this is the best scoring criterion) num combinations m of parent values (arityof X j ) + R∑ j =1 ∑ k =1 ∑ P(V ) P( X v =1 k j = v | Vk ) log P ( X j = v | Vk ) All these values estimated from data Copyright © 2001, Andrew W. Moore Bayes Net Structure: Slide 7 Searching for structure with best score Copyright © 2001, Andrew W. Moore Bayes Net Structure: Slide 8 4 Inputs Inputs Inputs Learning Methods until today Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Classifier Predict Gauss/Joint BC, Gauss Naïve BC, N.Neigh category Density Estimator Probability Regressor Predict real no. Copyright © 2001, Andrew W. Moore Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE Linear Regression, Quadratic Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR Bayes Net Structure: Slide 9 Inputs Inputs Inputs Learning Methods added today Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Classifier Predict Gauss/Joint BC, Gauss Naïve BC, N.Neigh category Density Estimator Probability Regressor Predict real no. Copyright © 2001, Andrew W. Moore Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE, Bayes Net Structure Learning (Note, can be extended to permit mixed categorical/real values) Linear Regression, Quadratic Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR Bayes Net Structure: Slide 10 5 Inputs Inputs Inputs But also, for free… Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Classifier Predict Gauss/Joint BC, Gauss Naïve BC, N.Neigh, Bayes category Net Based BC Density Estimator Probability Regressor Predict real no. Copyright © 2001, Andrew W. Moore Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE, Bayes Net Structure Learning Linear Regression, Quadratic Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR Bayes Net Structure: Slide 11 Inputs Inputs Inputs Inputs And a new operation… Inference P(E1|E2) Joint DE, Bayes Net Structure Learning Engine Learn Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Classifier Predict Gauss/Joint BC, Gauss Naïve BC, N.Neigh, Bayes category Net Based BC Density Estimator Probability Regressor Predict real no. Copyright © 2001, Andrew W. Moore Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE, Bayes Net Structure Learning Linear Regression, Quadratic Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR Bayes Net Structure: Slide 12 6