Project#3

Artificial Intelligence
Term Project #3
Kyu-Baek Hwang
Biointelligence Lab
School of Computer Science and Engineering
Seoul National University
kbhwang@bi.snu.ac.kr
Outline

Bayesian network – revisit I
 Properties of Bayesian network
 Structural learning of Bayesian network

Project 3-1
 Analysis of structural learning algorithms
 ALARM dataset

Bayesian network – revisit II
 Bayesian network classifiers (probabilistic inference)

Project 3-2
 Classification of microarray gene expression data using
Bayesian networks
Copyright (c) 2004 by SNU CSE Biointelligence Lab
Bayesian Network

The joint probability distribution over all the variables in
the Bayesian network.
\[ P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathbf{Pa}_i) \]
P(X_i | Pa_i): local probability distribution for X_i
Pa_i: the set of parents of X_i
\(\theta_i = (\theta_{i1}, \ldots, \theta_{iq_i})\): parameter for P(X_i | Pa_i)
q_i: # of configurations of Pa_i
r_i: # of states of X_i
[Figure: example network over the nodes A, B, C, D, E]
\[
\begin{aligned}
P(A, B, C, D, E) &= P(A) P(B \mid A) P(C \mid A, B) P(D \mid A, B, C) P(E \mid A, B, C, D) \\
&= P(A) P(B) P(C \mid A, B) P(D \mid B) P(E \mid C)
\end{aligned}
\]
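The factorization of the five-node example can be checked numerically. The following is a minimal sketch, assuming binary variables and made-up conditional probability tables (all numbers and the names `P_A1`, `P_C1`, `bern`, `joint` are illustrative, not taken from the slides).

```python
from itertools import product

# Hypothetical CPTs for binary variables in the example network:
# A and B have no parents, C depends on (A, B), D on B, E on C.
P_A1 = 0.3                                                   # P(A=1)
P_B1 = 0.6                                                   # P(B=1)
P_C1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(C=1 | A, B)
P_D1 = {0: 0.2, 1: 0.7}                                      # P(D=1 | B)
P_E1 = {0: 0.05, 1: 0.8}                                     # P(E=1 | C)

def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(a, b, c, d, e):
    """P(A, B, C, D, E) = P(A) P(B) P(C | A, B) P(D | B) P(E | C)."""
    return (bern(P_A1, a) * bern(P_B1, b) * bern(P_C1[(a, b)], c)
            * bern(P_D1[b], d) * bern(P_E1[c], e))

# The factorized joint must sum to 1 over all 2^5 configurations.
total = sum(joint(*cfg) for cfg in product((0, 1), repeat=5))
```

With binary variables, the factorized form needs 1 + 1 + 4 + 2 + 2 = 10 free parameters, whereas an unrestricted joint table over five binary variables needs 2^5 - 1 = 31.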
Generative Model


From the underlying distribution, a set of data examples
can be generated.
Conditional probabilities of interest can be calculated from
the joint probability distribution.
This Bayesian network can classify the examples by calculating
the appropriate conditional probability, P(Class | other variables).
[Figure: Bayesian network over Gene A to Gene H and Class]
Classification by Bayesian Networks I

Calculate the conditional probability of ‘Class’ variable
given the value of the other variables.
 Infer conditional probability from joint probability distribution.
 For example,
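\[
P(\text{Class} \mid \text{Gene A}, \text{Gene B}, \text{Gene C}, \text{Gene D}, \text{Gene E}, \text{Gene F}, \text{Gene G}, \text{Gene H})
= \frac{P(\text{Class}, \text{Gene A}, \text{Gene B}, \text{Gene C}, \text{Gene D}, \text{Gene E}, \text{Gene F}, \text{Gene G}, \text{Gene H})}{\sum_{\text{Class}} P(\text{Class}, \text{Gene A}, \text{Gene B}, \text{Gene C}, \text{Gene D}, \text{Gene E}, \text{Gene F}, \text{Gene G}, \text{Gene H})},
\]
where the summation is taken over all the possible class values.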
Knowing the Causal Structure
A set of comprehensible rules (or knowledge):
 Gene C regulates Gene E and F.
 Gene D regulates Gene G and H.
 Class has an effect on Gene F and G.
[Figure: network over Gene A to Gene H and Class]
Learning Bayesian Networks

Metric approach
 Use a scoring metric to measure how well a particular structure
fits an observed set of cases.
 A search algorithm is used to find a canonical form of an
equivalence class.

Independence approach
 An independence oracle (approximated by some statistical test)
is queried to identify the equivalence class that captures the
independencies in the distribution from which the data were
generated, i.e., a search for a PDAG.
Scoring Metrics for Bayesian Networks

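Likelihood: \( L(G, \Theta_G, C) = P(C \mid G^h, \Theta_G) \)
 \(G^h\): hypothesis that the given data (C) was generated by a
distribution that can be factored according to G.

The maximum likelihood metric of G (entropy metric with opposite
sign), where N is the data size and \(\mathbf{x}_j\) is the j-th example:
\[
\begin{aligned}
\log M_{ML}(G, C) &= \max_{\Theta_G} \log L(G, \Theta_G, C) \\
&= \sum_{j=1}^{N} \log P_G(\mathbf{x}_j) \\
&= N \sum_{\mathbf{x}} \hat{P}(\mathbf{x}) \log \prod_{i} P(x_i \mid \mathbf{pa}_i) \\
&= \sum_{i} \sum_{x_i, \mathbf{pa}_i} N(x_i, \mathbf{pa}_i) \log P(x_i \mid \mathbf{pa}_i)
\end{aligned}
\]
 → This metric prefers the complete graph structure.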
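The count-based form of the maximum likelihood metric can be sketched as follows. This is an illustrative implementation, assuming the data are given as a list of dictionaries and the structure as a mapping from each variable to a tuple of its parents; the function name `log_ml_score` and the toy data are assumptions, and the conditional probabilities are the relative-frequency estimates N(x_i, pa_i) / N(pa_i).

```python
import math
from collections import Counter

def log_ml_score(data, parents):
    """Sum over variables of N(x_i, pa_i) * log( N(x_i, pa_i) / N(pa_i) )."""
    score = 0.0
    for var, pa in parents.items():
        # Counts of (parent configuration, value) pairs and of parent configurations.
        n_joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        n_pa = Counter(tuple(row[p] for p in pa) for row in data)
        for (pa_cfg, _value), n in n_joint.items():
            score += n * math.log(n / n_pa[pa_cfg])
    return score

# Toy data set with two binary variables (illustrative only).
data = [{"X": 0, "Y": 0}, {"X": 0, "Y": 0}, {"X": 1, "Y": 1}, {"X": 1, "Y": 0}]
empty = log_ml_score(data, {"X": (), "Y": ()})       # no edges
dense = log_ml_score(data, {"X": (), "Y": ("X",)})   # edge X -> Y
```

On this toy data the graph with the extra edge scores at least as high as the empty graph, which illustrates why the maximum likelihood metric alone favors denser structures.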
Information Criterion Scoring Metrics

The Akaike information criterion (AIC) metric
\[ M_{AIC}(G, C) = \log M_{ML}(G, C) - \mathrm{Dim}(G) \]
\[ \mathrm{Dim}(G) = \sum_{i} (r_i - 1) \, q_i \]

Bayesian information criterion (BIC) metric
\[ M_{BIC}(G, C) = \log M_{ML}(G, C) - \frac{1}{2} \mathrm{Dim}(G) \log N \]
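The penalty terms are easy to compute once Dim(G) is known. The sketch below assumes a structure given as a mapping from each variable to its parents and a table of state counts r_i; the helper names `dimension`, `aic`, and `bic` and the example numbers are illustrative.

```python
import math

def dimension(cardinality, parents):
    """Dim(G) = sum_i (r_i - 1) * q_i, where q_i is the number of parent configurations."""
    dim = 0
    for var, pa in parents.items():
        q = math.prod(cardinality[p] for p in pa)  # equals 1 when there are no parents
        dim += (cardinality[var] - 1) * q
    return dim

def aic(log_ml, dim):
    """AIC metric: log-likelihood penalized by the number of free parameters."""
    return log_ml - dim

def bic(log_ml, dim, n):
    """BIC metric: the penalty grows with the logarithm of the data size n."""
    return log_ml - 0.5 * dim * math.log(n)

# Five binary variables with edges A -> C, B -> C, B -> D, C -> E.
cardinality = {v: 2 for v in "ABCDE"}
parents = {"A": (), "B": (), "C": ("A", "B"), "D": ("B",), "E": ("C",)}
dim = dimension(cardinality, parents)
```

For data sizes above e^2 (roughly 7.4 examples) the BIC penalty per parameter exceeds the AIC penalty, so BIC tends to select sparser graphs than AIC.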
MDL Scoring Metrics

The minimum description length (MDL) metric 1
\[ M_{MDL1}(G, C) = \log P(G) + M_{BIC}(G, C) \]

The minimum description length (MDL) metric 2
\[ M_{MDL2}(G, C) = \log M_{ML}(G, C) - |E_G| \log N - c \cdot \mathrm{Dim}(G) \]
Bayesian Scoring Metrics

A Bayesian metric
\[ M(G, C, \xi) = \log P(G^h \mid \xi) + \log P(C \mid G^h, \xi) + c \]

The BDe (Bayesian Dirichlet & likelihood equivalence)
metric
\[
P(C \mid G^h, \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i}
\frac{\Gamma(N'_{ij})}{\Gamma(N'_{ij} + N_{ij})}
\prod_{k=1}^{r_i} \frac{\Gamma(N'_{ijk} + N_{ijk})}{\Gamma(N'_{ijk})}
\]
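The Bayesian Dirichlet marginal likelihood is usually evaluated in log space with the log-gamma function. The sketch below computes the contribution of a single variable i. It assumes a uniform choice of prior counts, N'_ijk = ess / (r_i q_i), which is one common way to satisfy likelihood equivalence; the slides do not specify the prior counts, and the names `log_bd_family` and `ess` are illustrative.

```python
import math
from collections import Counter

def log_bd_family(data, var, pa, r, q, ess=1.0):
    """Log of the factor for one variable in the BDe product.

    r: number of states of var, q: number of parent configurations,
    ess: equivalent sample size (assumed prior strength).
    """
    prior_ijk = ess / (r * q)   # N'_ijk (assumption: uniform prior counts)
    prior_ij = ess / q          # N'_ij = sum over k of N'_ijk
    n_ijk = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
    n_ij = Counter(tuple(row[p] for p in pa) for row in data)
    score = 0.0
    # Unobserved configurations have zero counts and contribute log(1) = 0.
    for n in n_ij.values():
        score += math.lgamma(prior_ij) - math.lgamma(prior_ij + n)
    for n in n_ijk.values():
        score += math.lgamma(prior_ijk + n) - math.lgamma(prior_ijk)
    return score

# One binary variable without parents, observed as 0 and then 1.
toy = [{"X": 0}, {"X": 1}]
log_p = log_bd_family(toy, "X", (), r=2, q=1, ess=2.0)
```

With `ess=2.0` the prior counts are N'_ijk = 1, i.e., a uniform Beta(1, 1) prior, and the marginal probability of observing one 0 and one 1 in a fixed order is 1/6. Summing `log_bd_family` over all variables gives log P(C | G^h, ξ).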

Prior on the network structure
\[ P(G) \propto 2^{-\sum_{i} \left( \log n + \log \binom{n}{|\mathbf{Pa}_i|} \right)} \]
Greedy Search Algorithm

Generate an initial Bayesian network structure G0.
 For m = 1, 2, 3, …, until convergence:
 Among the possible local changes in Gm–1 (insertion of an edge, reversal of an
edge, and deletion of an edge), the one that leads to the largest
improvement in the score is performed. The resulting graph is Gm.

Stopping criterion
 Score(Gm–1) == Score(Gm), i.e., no local change improves the score.

At each iteration (learning a Bayesian network consisting of n
variables)
 O(n^2) local changes should be evaluated to select the best one.

Random restarts are usually adopted to escape local maxima.
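The greedy search loop can be sketched as follows. This is a minimal illustration, not the WEKA implementation: graphs are represented as frozensets of directed edges, the scoring function is passed in as a parameter, and the helper names `is_acyclic`, `neighbors`, and `greedy_search` are assumptions. Random restarts are omitted.

```python
from itertools import permutations

def is_acyclic(edges, nodes):
    """Repeatedly strip nodes with no incoming edge; a leftover means a cycle."""
    remaining = set(nodes)
    while remaining:
        roots = [v for v in remaining
                 if not any(t == v and s in remaining for s, t in edges)]
        if not roots:
            return False
        remaining -= set(roots)
    return True

def neighbors(edges, nodes):
    """All graphs reachable by one edge insertion, deletion, or reversal."""
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                # deletion
            yield (edges - {(u, v)}) | {(v, u)}   # reversal
        elif (v, u) not in edges:
            yield edges | {(u, v)}                # insertion

def greedy_search(nodes, score, start=frozenset()):
    """Hill-climbing: apply the best local change until nothing improves."""
    current = frozenset(start)
    current_score = score(current)
    while True:
        candidates = [g for g in neighbors(current, nodes) if is_acyclic(g, nodes)]
        best = max(candidates, key=score, default=current)
        if score(best) <= current_score:
            return current, current_score
        current, current_score = best, score(best)

# Toy score (illustrative): negative number of edge differences from a target graph.
target = frozenset({("A", "C"), ("B", "C")})
found, found_score = greedy_search(["A", "B", "C"],
                                   lambda g: -len(g ^ target))
```

Each pass over `neighbors` enumerates the O(n^2) ordered node pairs. In practice the score would be one of the decomposable metrics above, so that only the families touched by a local change need to be re-scored.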
Project 3-1

Analysis of structural learning algorithms
 Data generation from the ALARM network
 Various data set sizes, e.g., 1000, 3000, 5000, 10000.
 Structural learning of Bayesian networks by greedy search (hill-climbing) with several kinds of scoring metrics
 Compare the results w.r.t. edge errors across the various
sample sizes and learning methods
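One possible way to count edge errors is sketched below. The slides do not define the error categories, so the split into missing, extra, and reversed edges is an assumption, as are the function name `edge_errors` and the example edges.

```python
def edge_errors(true_edges, learned_edges):
    """Compare a learned structure with the true one, edge by edge."""
    true_edges, learned_edges = set(true_edges), set(learned_edges)
    # Learned edge whose opposite direction is in the true graph.
    reversed_edges = {(u, v) for (u, v) in learned_edges
                      if (v, u) in true_edges and (u, v) not in true_edges}
    # Learned edge that connects nodes not adjacent in the true graph.
    extra = (learned_edges - true_edges) - reversed_edges
    # True edge absent from the learned graph in both directions.
    missing = {(u, v) for (u, v) in true_edges
               if (u, v) not in learned_edges and (v, u) not in learned_edges}
    return {"missing": len(missing), "extra": len(extra),
            "reversed": len(reversed_edges)}

errors = edge_errors(
    true_edges=[("A", "B"), ("B", "C"), ("C", "D")],
    learned_edges=[("A", "B"), ("C", "B"), ("A", "D")],
)
```

Tabulating these counts for each data set size and scoring metric gives the comparison asked for in the project.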
ALARM Network
# of nodes: 37
# of edges: 46
# of possible values per variable: 2 to 4
Data Generation

Using Netica (http://www.norsys.com)
Structural Learning

WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
 http://www.cs.waikato.ac.nz/~remco/weka_bn/
Probabilistic Inference

Calculate the conditional probability given values of
observed variables.
 Junction tree algorithm
 Sampling methods
 General probabilistic inference is intractable. (It is known to be
NP-hard.)
 However, calculating the conditional probability for
classification is rather straightforward because of a property of
the Bayesian network structure (d-separation).
Markov Blanket

Variables of interest
 X = {X1, X2, …, Xn}

For a variable Xi, its Markov blanket MB(Xi) is a
subset of X – {Xi} that satisfies the following:
\[ P(X_i \mid \mathbf{X} - \{X_i\}) = P(X_i \mid MB(X_i)). \]

Markov boundary
 Minimal Markov blanket
Markov Blanket in Bayesian Network

Given the Bayesian network structure, determining the
Markov blanket of a variable is straightforward.
 It follows from the conditional independence assertions.
The Markov blanket of a node in
the Bayesian network consists of
all of its parents, spouses, and
children.
[Figure: network over Gene A to Gene H and Class]
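Reading the Markov blanket off a structure takes only a few lines. The sketch below assumes the structure is stored as a mapping from each node to a tuple of its parents; the edges follow the rules stated earlier in these slides (Gene C regulates E and F, Gene D regulates G and H, Class affects F and G) with Gene A and Gene B as parents of Class, and the function name `markov_blanket` is illustrative.

```python
def markov_blanket(parents, node):
    """Parents, children, and spouses (other parents of the children) of a node."""
    children = {v for v, pa in parents.items() if node in pa}
    spouses = {p for child in children for p in parents[child]} - {node}
    return set(parents[node]) | children | spouses

# Structure of the gene example, written as node -> parents.
parents = {
    "A": (), "B": (), "C": (), "D": (),
    "Class": ("A", "B"),
    "E": ("C",),
    "F": ("C", "Class"),
    "G": ("Class", "D"),
    "H": ("D",),
}
mb_class = markov_blanket(parents, "Class")
```

For this structure the Markov blanket of Class is {A, B, C, D, F, G}; Gene E and Gene H are outside it and therefore carry no extra information about Class once the blanket is observed.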
Classification by Bayesian Networks II
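\[
\begin{aligned}
& P(\text{Class} \mid \text{Gene A}, \text{Gene B}, \text{Gene C}, \text{Gene D}, \text{Gene E}, \text{Gene F}, \text{Gene G}, \text{Gene H}) \\
&= \frac{P(\text{Class}, \text{Gene A}, \text{Gene B}, \text{Gene C}, \text{Gene D}, \text{Gene E}, \text{Gene F}, \text{Gene G}, \text{Gene H})}{\sum_{\text{Class}} P(\text{Class}, \text{Gene A}, \text{Gene B}, \text{Gene C}, \text{Gene D}, \text{Gene E}, \text{Gene F}, \text{Gene G}, \text{Gene H})} \\
&= \frac{P(A) P(B) P(C) P(\text{Class} \mid A, B) P(D) P(E \mid C) P(F \mid C, \text{Class}) P(G \mid \text{Class}, D) P(H \mid D)}{\sum_{\text{Class}} P(A) P(B) P(C) P(\text{Class} \mid A, B) P(D) P(E \mid C) P(F \mid C, \text{Class}) P(G \mid \text{Class}, D) P(H \mid D)} \\
&= \frac{P(A) P(B) P(C) P(\text{Class} \mid A, B) P(D) P(F \mid C, \text{Class}) P(G \mid \text{Class}, D)}{\sum_{\text{Class}} P(A) P(B) P(C) P(\text{Class} \mid A, B) P(D) P(F \mid C, \text{Class}) P(G \mid \text{Class}, D)} \\
&\propto P(\text{Class} \mid A, B) \, P(F \mid C, \text{Class}) \, P(G \mid \text{Class}, D)
\end{aligned}
\]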
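The final proportionality means that only three local distributions are needed to classify an example. The sketch below assumes binary variables and made-up conditional probability tables (all numbers and the names `P_CLASS1`, `P_F1`, `P_G1`, `bern`, `posterior` are illustrative).

```python
# Hypothetical CPTs for binary variables; each table stores P(child = 1 | parents).
P_CLASS1 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9}  # key: (A, B)
P_F1 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.2, (1, 1): 0.8}      # key: (C, Class)
P_G1 = {(0, 0): 0.3, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.7}      # key: (Class, D)

def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def posterior(a, b, c, d, f, g):
    """P(Class | A, B, C, D, F, G) from the Markov-blanket factors only."""
    unnormalized = {
        cls: bern(P_CLASS1[(a, b)], cls)
             * bern(P_F1[(c, cls)], f)
             * bern(P_G1[(cls, d)], g)
        for cls in (0, 1)
    }
    z = sum(unnormalized.values())  # normalization over the class values
    return {cls: w / z for cls, w in unnormalized.items()}

post = posterior(a=1, b=0, c=1, d=0, f=1, g=1)
predicted = max(post, key=post.get)
```

Gene E and Gene H do not appear in `posterior` at all, because their factors do not depend on Class and cancel between numerator and denominator.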
Project 3-2

Classification using Bayesian network
 Evaluate performance of Bayesian network classifier
(classification accuracy)
 Various parameter settings, e.g., scoring metrics and learning
methods
 If possible, compare with other learning methods such as neural
networks and decision trees.
 Leave-one-out cross validation

Using WEKA
Molecular Biology: Central Dogma
[Figure: central dogma of molecular biology; label: DNA microarray]
DNA Microarrays


Monitor thousands of gene expression levels
simultaneously, in contrast to traditional one-gene experiments.
Fabricated by high-speed robotics.
[Figure: microarray with known probes]
Types of DNA Microarrays

Oligonucleotide chips
 An array of oligonucleotide (20 ~ 80-mer oligos) probes is
synthesized.

cDNA microarrays
 Probe cDNA (500 ~ 5,000 bases long) is immobilized to a solid
surface.
Study

Treatment-specific changes in gene expression
discriminate in vivo drug response in human leukemia
cells, MH Cheok et al., Nature Genetics 35, 2003.
60 leukemia patients → bone marrow samples → Affymetrix GeneChip arrays → gene expression data
Gene Expression Data

# of data examples
 120 (60: before treatment, 60: after treatment)

# of genes measured
 12600 (Affymetrix HG-U95A array)

Task
 Classification between “before treatment” and “after treatment”
based on gene expression pattern
Affymetrix GeneChip Arrays



Use short oligos to detect gene expression level.
Each gene is probed by a set of short oligos.
Each gene expression level is summarized by
 Signal: numerical value describing the abundance of mRNA
 A/P call: denotes the statistical significance of signal
Preprocessing

Remove the genes having more than 60 ‘A’ calls
 # of genes: 12600 → 3190

Discretization of gene expression level
 Criterion: median gene expression value of each sample
 0 (low) and 1 (high)
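The discretization step can be sketched as follows, assuming each sample is a list of expression values over the retained genes. Mapping values equal to the median to 0 (low) is an assumption, since the slides do not say how ties are handled; the function name `discretize` is illustrative.

```python
import statistics

def discretize(samples):
    """Binarize each sample against its own median expression value."""
    result = []
    for values in samples:
        median = statistics.median(values)
        # 1 (high) if strictly above the sample median, else 0 (low).
        result.append([1 if v > median else 0 for v in values])
    return result

binary = discretize([[1.0, 5.0, 3.0, 9.0],
                     [20.0, 10.0, 40.0, 30.0]])
```

Using a per-sample threshold makes the result insensitive to overall intensity differences between arrays, which is presumably why the sample median was chosen as the criterion.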
Gene Filtering

Using mutual information
\[ I(G; C) = \sum_{G, C} P(G, C) \log \frac{P(G, C)}{P(G) P(C)} \]
 Estimated probabilities were used.
 # of genes: 3190 → 50
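The filtering step can be sketched as follows, with probabilities estimated by relative frequencies. The function names `mutual_information` and `top_k_genes`, the natural logarithm, and the toy columns are assumptions; the base of the logarithm does not change the ranking.

```python
import math
from collections import Counter

def mutual_information(gene, labels):
    """I(G; C) estimated from paired discrete observations."""
    n = len(gene)
    n_joint = Counter(zip(gene, labels))
    n_gene = Counter(gene)
    n_label = Counter(labels)
    mi = 0.0
    for (g, c), count in n_joint.items():
        p_gc = count / n
        mi += p_gc * math.log(p_gc / ((n_gene[g] / n) * (n_label[c] / n)))
    return mi

def top_k_genes(columns, labels, k):
    """Names of the k genes with the highest mutual information with the class."""
    ranked = sorted(columns,
                    key=lambda name: mutual_information(columns[name], labels),
                    reverse=True)
    return ranked[:k]

labels = [0, 0, 1, 1]
columns = {"g1": [0, 0, 1, 1],   # perfectly informative about the class
           "g2": [0, 1, 0, 1]}   # independent of the class
selected = top_k_genes(columns, labels, k=1)
```

In the project setting, `columns` would hold the 3190 discretized genes and `k` would be 50.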

Final dataset
 # of attributes: 51 (one for the class)
 Class: 0 (after treatment), 1 (before treatment)
 # of data examples: 120
Final Dataset
[Figure: final data matrix with 51 attributes and 120 data examples]
Submission


Deadline: 2004. 12. 2
Location: 301-419