Artificial Intelligence Term Project #3

Kyu-Baek Hwang
Biointelligence Lab
School of Computer Science and Engineering
Seoul National University
kbhwang@bi.snu.ac.kr

Outline
- Bayesian network – revisit I
  - Properties of Bayesian network
  - Structural learning of Bayesian network
- Project 3-1
  - Analysis of structural learning algorithms
  - ALARM dataset
- Bayesian network – revisit II
  - Bayesian network classifiers (probabilistic inference)
- Project 3-2
  - Classification of microarray gene expression data using Bayesian networks

Copyright (c) 2004 by SNU CSE Biointelligence Lab

Bayesian Network
- The joint probability distribution over all the variables in the Bayesian network:
  \[ P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_i) \]
- $P(X_i \mid Pa_i)$: local probability distribution for $X_i$
- $Pa_i$: the set of parents of $X_i$
- $\theta_i = (\theta_{i1}, \ldots, \theta_{iq_i})$: parameter for $P(X_i \mid Pa_i)$
- $q_i$: # of configurations of $Pa_i$
- $r_i$: # of states of $X_i$
- Example with five nodes A, B, C, D, E. The first equality is the chain rule; the second uses the conditional independencies encoded by the network structure:
  \[ P(A, B, C, D, E) = P(A)\,P(B \mid A)\,P(C \mid A, B)\,P(D \mid A, B, C)\,P(E \mid A, B, C, D) = P(A)\,P(B)\,P(C \mid A, B)\,P(D \mid B)\,P(E \mid C) \]

Generative Model
[Figure: Bayesian network over Gene A to Gene H and Class]
- From the underlying distribution, a set of data examples can be generated.
- The conditional probability of interest can be calculated from the joint probability distribution (jpd).
- This Bayesian network can classify examples by calculating the appropriate conditional probability, P(Class | other variables).

Classification by Bayesian Networks I
- Calculate the conditional probability of the 'Class' variable given the values of the other variables.
- Infer the conditional probability from the joint probability distribution. For example,
  \[ P(Class \mid Gene A, Gene B, \ldots, Gene H) = \frac{P(Class, Gene A, Gene B, \ldots, Gene H)}{\sum_{Class} P(Class, Gene A, Gene B, \ldots, Gene H)}, \]
  where the summation is taken over all the possible class values.
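The normalization above can be sketched in a few lines of Python. This is only an illustration: it reuses the five-node example network (A and B are parents of C, B is the parent of D, C is the parent of E), treats C as the class variable, and all probability values and the helper names `bern`, `joint`, and `posterior_c` are made up for the example.

```python
from itertools import product

# Hypothetical CPTs for binary variables (values 0/1); the numbers are illustrative only.
P_A = {0: 0.6, 1: 0.4}                                   # P(A)
P_B = {0: 0.7, 1: 0.3}                                   # P(B)
P_C1_GIVEN_AB = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.6, (1, 1): 0.9}               # P(C=1 | A, B)
P_D1_GIVEN_B = {0: 0.2, 1: 0.8}                          # P(D=1 | B)
P_E1_GIVEN_C = {0: 0.3, 1: 0.7}                          # P(E=1 | C)


def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(a, b, c, d, e):
    """P(A, B, C, D, E) = P(A) P(B) P(C|A,B) P(D|B) P(E|C)."""
    return (P_A[a] * P_B[b]
            * bern(P_C1_GIVEN_AB[(a, b)], c)
            * bern(P_D1_GIVEN_B[b], d)
            * bern(P_E1_GIVEN_C[c], e))


def posterior_c(a, b, d, e):
    """P(C | A, B, D, E): joint probability normalized over the values of C."""
    scores = {c: joint(a, b, c, d, e) for c in (0, 1)}
    total = sum(scores.values())
    return {c: score / total for c, score in scores.items()}


if __name__ == "__main__":
    # Sanity check: the joint distribution sums to one over all 32 configurations.
    print(sum(joint(*cfg) for cfg in product((0, 1), repeat=5)))
    print(posterior_c(a=1, b=1, d=1, e=1))
```

Brute-force normalization like this only needs one joint evaluation per class value, which is why classification is cheap even though general inference is not.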

Knowing the Causal Structure
[Figure: Bayesian network over Gene A to Gene H and Class]
- A set of comprehensible rules (or knowledge):
  - Gene C regulates Gene E and F.
  - Gene D regulates Gene G and H.
  - Class has an effect on Gene F and G.

Learning Bayesian Networks
- Metric approach
  - Use a scoring metric to measure how well a particular structure fits an observed set of cases.
  - A search algorithm is used.
  - Find a canonical form of an equivalence class.
- Independence approach
  - An independence oracle (approximated by some statistical test) is queried to identify the equivalence class that captures the independencies in the distribution from which the data was generated.
  - Search for a PDAG.

Scoring Metrics for Bayesian Networks
- Likelihood: $L(G, \Theta_G, C) = P(C \mid G^h, \Theta_G)$
  - $G^h$: the hypothesis that the given data (C) was generated by a distribution that can be factored according to G.
- The maximum likelihood metric of G (entropy metric with opposite sign):
  \[ \log M_{ML}(G, C) = \max_{\Theta_G} \log L(G, \Theta_G, C) = \sum_{j=1}^{N} \log P_G(\mathbf{x}_j) = N \sum_{\mathbf{x}} \hat{P}(\mathbf{x}) \sum_{i} \log P(x_i \mid pa_i) = \sum_{i} \sum_{x_i, pa_i} N(x_i, pa_i) \log P(x_i \mid pa_i) \]
  - N: data size
  - $\mathbf{x}_j$: the jth example
- This metric prefers the complete graph structure.

Information Criterion Scoring Metrics
- The Akaike information criterion (AIC) metric:
  \[ M_{AIC}(G, C) = \log M_{ML}(G, C) - Dim(G), \qquad Dim(G) = \sum_{i} (r_i - 1)\, q_i \]
- The Bayesian information criterion (BIC) metric:
  \[ M_{BIC}(G, C) = \log M_{ML}(G, C) - \frac{1}{2}\, Dim(G) \log N \]

MDL Scoring Metrics
- The minimum description length (MDL) metric 1:
  \[ M_{MDL1}(G, C) = \log P(G) + M_{BIC}(G, C) \]
- The minimum description length (MDL) metric 2, where $|E_G|$ is the number of edges in G:
  \[ M_{MDL2}(G, C) = \log M_{ML}(G, C) - |E_G| \log N - c \cdot Dim(G) \]

Bayesian Scoring Metrics
- A Bayesian metric:
  \[ M(G, C, \xi) = \log P(G^h \mid \xi) + \log P(C \mid G^h, \xi) + c \]
- The BDe (Bayesian Dirichlet & likelihood equivalence) metric, where $N_{ijk}$ counts the cases with $X_i$ in its kth state and $Pa_i$ in its jth configuration, $N'_{ijk}$ is the corresponding prior count, $N_{ij} = \sum_k N_{ijk}$, and $N'_{ij} = \sum_k N'_{ijk}$:
  \[ P(C \mid G^h, \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(N'_{ij})}{\Gamma(N'_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(N'_{ijk} + N_{ijk})}{\Gamma(N'_{ijk})} \]
- Prior on the network structure:
  \[ P(G) \propto 2^{-\sum_{i} \left( \log n + \log \binom{n}{|Pa_i|} \right)} \]

Greedy Search Algorithm
- Generate an initial Bayesian network structure $G_0$.
- For m = 1, 2, 3, ..., until convergence:
  - Among the possible local changes in $G_{m-1}$ (insertion, reversal, or deletion of an edge), perform the one that leads to the largest improvement in the score. The resulting graph is $G_m$.
- Stopping criterion: Score($G_{m-1}$) == Score($G_m$).
- At each iteration, when learning a Bayesian network of n variables, $O(n^2)$ local changes must be evaluated to select the best one.
- Random restarts are usually adopted to escape local maxima.

Project 3-1
- Analysis of structural learning algorithms
- Data generation from the ALARM network
  - Various data set sizes, e.g., 1000, 3000, 5000, 10000.
- Structural learning of Bayesian networks by greedy search (hill climbing) with several kinds of scoring metrics
- Compare the results w.r.t. edge errors across the various sample sizes and learning methods.

ALARM Network
- # of nodes: 37
- # of edges: 46
- # of possible values per variable: 2 ~ 4

Data Generation
- Using Netica (http://www.norsys.com)

Structural Learning
- WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
- http://www.cs.waikato.ac.nz/~remco/weka_bn/

Probabilistic Inference
- Calculate the conditional probability given the values of the observed variables.
  - Junction tree algorithm
  - Sampling methods
- General probabilistic inference is intractable. (It is known to be NP-hard.)
- However, calculating the conditional probability for classification is rather straightforward because of a property of the Bayesian network structure (d-separation).

Markov Blanket
- Variables of interest: $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}$
- For a variable $X_i$, its Markov blanket $MB(X_i)$ is a subset of $\mathbf{X} \setminus \{X_i\}$ that satisfies
  \[ P(X_i \mid \mathbf{X} \setminus \{X_i\}) = P(X_i \mid MB(X_i)). \]
- Markov boundary: a minimal Markov blanket

Markov Blanket in Bayesian Network
[Figure: Bayesian network over Gene A to Gene H and Class]
- Given the Bayesian network structure, determining the Markov blanket of a variable is straightforward; it follows from the conditional independence assertions.
- The Markov blanket of a node in the Bayesian network consists of all of its parents, spouses, and children.
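The parents, children, and spouses rule can be sketched as a small Python function. This is a minimal illustration, not part of the project tools: the helper name `markov_blanket` and the dictionary layout are assumptions, and the edge list is read off the factorization used on the "Classification by Bayesian Networks II" slide (Class has parents A and B; F has parents C and Class; G has parents Class and D; E has parent C; H has parent D).

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node name to the list of its parents.
    The blanket is the union of the node's parents, its children,
    and its spouses (the other parents of its children).
    """
    children = {v for v, ps in parents.items() if node in ps}
    spouses = {p for child in children for p in parents[child]} - {node}
    return set(parents[node]) | children | spouses


# Structure of the gene network, written as node -> parents.
PARENTS = {
    "A": [], "B": [], "C": [], "D": [],
    "Class": ["A", "B"],
    "E": ["C"],
    "F": ["C", "Class"],
    "G": ["Class", "D"],
    "H": ["D"],
}

if __name__ == "__main__":
    print(sorted(markov_blanket(PARENTS, "Class")))
```

For the Class node the result contains A and B (parents), F and G (children), and C and D (spouses); E and H are outside the blanket, which is why they drop out of the classification formula.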

Classification by Bayesian Networks II
- Writing A, ..., H for Gene A, ..., Gene H, the factors that do not involve Class cancel between the numerator and the denominator:
  \[ \begin{aligned}
  P(Class \mid A, B, C, D, E, F, G, H)
  &= \frac{P(Class, A, B, C, D, E, F, G, H)}{\sum_{Class} P(Class, A, B, C, D, E, F, G, H)} \\
  &= \frac{P(A) P(B) P(C) P(Class \mid A, B) P(D) P(E \mid C) P(F \mid C, Class) P(G \mid Class, D) P(H \mid D)}{\sum_{Class} P(A) P(B) P(C) P(Class \mid A, B) P(D) P(E \mid C) P(F \mid C, Class) P(G \mid Class, D) P(H \mid D)} \\
  &= \frac{P(A) P(B) P(C) P(Class \mid A, B) P(D) P(F \mid C, Class) P(G \mid Class, D)}{\sum_{Class} P(A) P(B) P(C) P(Class \mid A, B) P(D) P(F \mid C, Class) P(G \mid Class, D)} \\
  &\propto P(Class \mid A, B)\, P(F \mid C, Class)\, P(G \mid Class, D)
  \end{aligned} \]

Project 3-2
- Classification using Bayesian networks
- Evaluate the performance of the Bayesian network classifier (classification accuracy).
  - Various parameter settings, e.g., scoring metrics and learning methods
  - If possible, compare with other learning methods such as neural networks and decision trees.
- Leave-one-out cross validation
- Using WEKA

Molecular Biology: Central Dogma
[Figure: central dogma of molecular biology and the DNA microarray]

DNA Microarrays
- Monitor thousands of gene expression levels simultaneously, in contrast to traditional one-gene experiments.
- Fabricated by high-speed robotics.
- Known probes

Types of DNA Microarrays
- Oligonucleotide chips
  - An array of oligonucleotide (20 ~ 80-mer oligos) probes is synthesized.
- cDNA microarrays
  - Probe cDNA (500 ~ 5,000 bases long) is immobilized on a solid surface.

Study
- Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35, 2003.
- 60 leukemia patients
- Bone marrow samples
- Affymetrix GeneChip arrays
- Gene expression data

Gene Expression Data
- # of data examples: 120 (60 before treatment, 60 after treatment)
- # of genes measured: 12600 (Affymetrix HG-U95A array)
- Task: classification between "before treatment" and "after treatment" based on the gene expression pattern

Affymetrix GeneChip Arrays
- Use short oligos to detect the gene expression level.
- Each gene is probed by a set of short oligos.
- Each gene expression level is summarized by
  - Signal: a numerical value describing the abundance of mRNA
  - A/P call: denotes the statistical significance of the signal

Preprocessing
- Remove the genes having more than 60 'A' calls.
  - # of genes: 12600 → 3190
- Discretization of the gene expression level
  - Criterion: median gene expression value of each sample
  - 0 (low) and 1 (high)

Gene Filtering
- Using mutual information:
  \[ I(G; C) = \sum_{G, C} P(G, C) \log \frac{P(G, C)}{P(G)\, P(C)} \]
- Estimated probabilities were used.
- # of genes: 3190 → 50
- Final dataset
  - # of attributes: 51 (one for the class)
  - Class: 0 (after treatment), 1 (before treatment)
  - # of data examples: 120

Final Dataset
[Figure: data matrix with 51 attributes and 120 examples]

Submission
- Deadline: 2004. 12. 2
- Location: 301-419
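
The mutual-information filter from the Gene Filtering slide can be sketched as follows. This is a rough illustration rather than the procedure actually used to build the dataset: it assumes the expression values are already discretized to 0/1, estimates the probabilities by relative frequencies, uses the natural logarithm (the slides do not state the base), and the names `mutual_information` and `top_k_genes` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(gene, labels):
    """Estimate I(G; C) from paired discrete observations using relative frequencies."""
    n = len(gene)
    joint_counts = Counter(zip(gene, labels))
    gene_counts = Counter(gene)
    label_counts = Counter(labels)
    mi = 0.0
    for (g, c), count in joint_counts.items():
        p_gc = count / n
        p_g = gene_counts[g] / n
        p_c = label_counts[c] / n
        # Only observed (g, c) pairs are visited, so p_gc is never zero here.
        mi += p_gc * log(p_gc / (p_g * p_c))
    return mi


def top_k_genes(expr, labels, k):
    """Return the names of the k genes with the highest estimated I(G; C).

    `expr` maps a gene name to its list of discretized values, one per example.
    """
    ranked = sorted(expr,
                    key=lambda name: mutual_information(expr[name], labels),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: four examples, class labels 0/1, three made-up genes.
    toy_labels = [0, 0, 1, 1]
    toy_expr = {"g1": [0, 0, 1, 1], "g2": [0, 1, 0, 1], "g3": [0, 0, 1, 0]}
    print(top_k_genes(toy_expr, toy_labels, k=2))
```

With the real data, `expr` would hold the 3190 genes that survive preprocessing and `k` would be 50.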