Learning with Hypergraphs: Discovery of Higher-Order Interaction Patterns from High-Dimensional Data

Moscow State University, Faculty of Computational Mathematics and Cybernetics, Feb. 22, 2007, Moscow, Russia

Byoung-Tak Zhang
Biointelligence Laboratory
School of Computer Science and Engineering
Brain Science, Cognitive Science, Bioinformatics Programs
Seoul National University, Seoul 151-742, Korea
btzhang@cse.snu.ac.kr
http://bi.snu.ac.kr/


Probabilistic Graphical Models (PGMs)

PGMs represent the joint probability distribution of a set of random variables in graphical form. They are generative: the probability distribution of some variables given the values of other variables can be obtained (probabilistic inference).

[Figure: an undirected PGM and a directed PGM over the variables A, B, C, D, E. In the example: C and D are independent given B; C asserts a dependency between A and B; B and E are independent given C.]

By the chain rule, the joint distribution factorizes as

P(A,B,C,D,E) = P(A) P(B|A) P(C|A,B) P(D|A,B,C) P(E|A,B,C,D)

and, using the conditional independencies encoded in the graph,

P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|B) P(E|C).


Kinds of Graphical Models

Undirected graphical models
- Boltzmann Machines
- Markov Random Fields

Directed graphical models
- Bayesian Networks
- Latent Variable Models
- Hidden Markov Models
- Generative Topographic Mapping
- Non-negative Matrix Factorization


Bayesian Networks

A Bayesian network BN = (S, P) consists of a network structure S and a set of local probability distributions P:

$$p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid pa_i)$$

[Figure: a Bayesian network for detecting credit card fraud.]

- The structure can be found by relying on prior knowledge of causal relationships.


From Bayes Nets to High-Order PGMs

(1) Naïve Bayes

P(F | J,G,S,A) ∝ P(J,G,S,A | F) P(F), where P(J,G,S,A | F) = P(J|F) P(G|F) P(S|F) P(A|F) = ∏_{x ∈ {J,G,S,A}} P(x | F)

(2) Bayesian network

P(F,J,G,S,A) = ∏_{x ∈ {F,J,G,S,A}} P(x | pa(x))

[Figure: a Bayesian network over F, J, G, S, A with terms such as P(G|F) and P(J | F, A, S).]

(3) High-order PGM

P(F,J,G,S,A) ∝ P(J,G|F) P(J,S|F) P(J,A|F) P(G,S|F) P(G,A|F) P(S,A|F) = ∏_{he(x,y): x,y ∈ {J,G,S,A}, x ≠ y} P(he(x,y) | F)


The Hypernetworks

Hypergraphs

A hypergraph is an (undirected) graph G whose edges connect non-empty sets of vertices, i.e. G = (V, E), where V = {v1, v2, ..., vn}, E = {E1, E2, ..., Em}, and Ei = {vi1, vi2, ..., vik}.

An m-hypergraph consists of a set V of vertices and a subset E of V[m], i.e. G = (V, V[m]), where V[m] is a set of subsets of V whose elements have precisely m members.

A hypergraph G is said to be k-uniform if every edge Ei in E has cardinality k. A hypergraph G is k-regular if every vertex has degree k.

Remark: An ordinary graph is a 2-uniform hypergraph.


An Example Hypergraph

G = (V, E)
V = {v1, v2, v3, ..., v7}
E = {E1, E2, E3, E4, E5}

E1 = {v1, v3, v4}
E2 = {v1, v4}
E3 = {v2, v3, v6}
E4 = {v3, v4, v6, v7}
E5 = {v4, v5, v7}
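The example hypergraph above is easy to hold in a plain data structure. A minimal sketch (not from the talk): vertices plus a dict of hyperedges, with the degree and k-uniformity checks from the definitions.

```python
# The example hypergraph G = (V, E) as vertices plus a dict of hyperedges.
V = {"v1", "v2", "v3", "v4", "v5", "v6", "v7"}
E = {"E1": frozenset({"v1", "v3", "v4"}),
     "E2": frozenset({"v1", "v4"}),
     "E3": frozenset({"v2", "v3", "v6"}),
     "E4": frozenset({"v3", "v4", "v6", "v7"}),
     "E5": frozenset({"v4", "v5", "v7"})}

# Degree of a vertex = number of hyperedges containing it.
degree = {v: sum(v in e for e in E.values()) for v in V}
print(degree["v4"])   # v4 appears in E1, E2, E4, E5 -> degree 4

# k-uniformity check: every hyperedge has cardinality k.
def is_uniform(edges, k):
    return all(len(e) == k for e in edges.values())

print(is_uniform(E, 3))   # False: the edge sizes here are 3, 2, 3, 4, 3
```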
Hypernetworks [Zhang, DNA-2006]

A hypernetwork is a hypergraph of weighted edges. It is defined as a triple H = (V, E, W), where V = {v1, v2, ..., vn}, E = {E1, E2, ..., Em}, and W = {w1, w2, ..., wm}.

An m-hypernetwork consists of a set V of vertices and a subset E of V[m], i.e. H = (V, V[m], W), where V[m] is a set of subsets of V whose elements have precisely m members and W is the set of weights associated with the hyperedges.

A hypernetwork H is said to be k-uniform if every edge Ei in E has cardinality k. A hypernetwork H is k-regular if every vertex has degree k.

Remark: An ordinary graph is a 2-uniform hypergraph with w_i = 1.


A Hypernetwork

[Figure: a hypernetwork over the vertices x1, x2, ..., x15.]


Learning with Hypernetworks

The Hypernetwork Model of Learning

The hypernetwork is defined as H = (X, S, W), where X = (x_1, x_2, ..., x_I) are the variables, S = {S_i} with S_i ⊆ X and k = |S_i| is the set of hyperedges, and W = (W^{(2)}, W^{(3)}, ..., W^{(K)}) are the weights of each order. Given a training set D = {x^{(n)}}_{n=1}^{N}, the energy of the hypernetwork is

$$E(\mathbf{x}^{(n)}; W) = -\frac{1}{2} \sum_{i_1, i_2} w^{(2)}_{i_1 i_2} x^{(n)}_{i_1} x^{(n)}_{i_2} - \frac{1}{6} \sum_{i_1, i_2, i_3} w^{(3)}_{i_1 i_2 i_3} x^{(n)}_{i_1} x^{(n)}_{i_2} x^{(n)}_{i_3} - \cdots$$

and the probability distribution is

$$P(\mathbf{x}^{(n)} \mid W) = \frac{1}{Z(W)} \exp\left[ -E(\mathbf{x}^{(n)}; W) \right] = \frac{1}{Z(W)} \exp\left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, \dots, i_k} w^{(k)}_{i_1 \dots i_k} x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_k} \right],$$

where c(k) is an order-dependent constant (c(2) = 2, c(3) = 6, ...) and the partition function is

$$Z(W) = \sum_{\mathbf{x}^{(m)}} \exp\left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, \dots, i_k} w^{(k)}_{i_1 \dots i_k} x^{(m)}_{i_1} x^{(m)}_{i_2} \cdots x^{(m)}_{i_k} \right].$$

[Zhang, 2006]
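For a small number of variables, the energy and probability above can be evaluated directly. A minimal sketch (not the authors' code), assuming c(k) = k!, which matches c(2) = 2 and c(3) = 6; weights are stored per hyperedge as {(i1, ..., ik): w}:

```python
# Energy and probability of a binary vector under a tiny hypernetwork.
from itertools import product
from math import exp, factorial

def energy(x, weights):
    """E(x; W) = -sum_k (1/c(k)) * sum w^(k) * x_i1 ... x_ik, with c(k) = k!."""
    e = 0.0
    for idxs, w in weights.items():
        term = w
        for i in idxs:
            term *= x[i]          # product x_i1 * ... * x_ik
        e -= term / factorial(len(idxs))
    return e

def probability(x, weights, n_vars):
    """P(x | W) = exp(-E(x; W)) / Z(W), with Z computed by brute force."""
    z = sum(exp(-energy(v, weights)) for v in product((0, 1), repeat=n_vars))
    return exp(-energy(x, weights)) / z

# Toy hypernetwork on 4 binary variables: one order-2 and one order-3
# hyperedge (illustrative weights).
W = {(0, 1): 1.5, (1, 2, 3): 2.0}
print(probability((1, 1, 1, 0), W, n_vars=4))
```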
Deriving the Learning Rule

The likelihood of the training set factorizes over the examples:

$$P(\{\mathbf{x}^{(n)}\}_{n=1}^{N} \mid W) = \prod_{n=1}^{N} P(\mathbf{x}^{(n)} \mid W),$$

so the log-likelihood is

$$\ln P(\{\mathbf{x}^{(n)}\}_{n=1}^{N} \mid W) = \sum_{n=1}^{N} \ln P(\mathbf{x}^{(n)} \mid W^{(2)}, \dots, W^{(K)}) = \sum_{n=1}^{N} \left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, \dots, i_k} w^{(k)}_{i_1 \dots i_k} x^{(n)}_{i_1} \cdots x^{(n)}_{i_k} - \ln Z(W) \right].$$

Differentiating with respect to an order-s weight gives the learning rule

$$\frac{\partial}{\partial w^{(s)}_{i_1 \dots i_s}} \ln P(\{\mathbf{x}^{(n)}\}_{n=1}^{N} \mid W) = N \left[ \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{\text{Data}} - \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{P(\mathbf{x} \mid W)} \right],$$

where

$$\langle x_{i_1} \cdots x_{i_s} \rangle_{\text{Data}} = \frac{1}{N} \sum_{n=1}^{N} x^{(n)}_{i_1} \cdots x^{(n)}_{i_s}, \qquad \langle x_{i_1} \cdots x_{i_s} \rangle_{P(\mathbf{x} \mid W)} = \sum_{\mathbf{x}} x_{i_1} \cdots x_{i_s} \, P(\mathbf{x} \mid W).$$


A Learning Example

Four training examples over x1, ..., x15 with label y:

n   variables set to 1      y
1   x1, x4, x10, x12        1
2   x2, x3, x9, x14         0
3   x3, x6, x8, x13         1
4   x8, x11, x15            1

Order-3 hyperedges sampled from the examples (with labels):
- from example 1 (y=1): {x1, x4, x10}, {x1, x4, x12}, {x4, x10, x12}
- from example 2 (y=0): {x2, x3, x9}, {x2, x3, x14}, {x3, x9, x14}
- from example 3 (y=1): {x3, x6, x8}, {x3, x6, x13}, {x6, x8, x13}
- from example 4 (y=1): {x8, x11, x15}


Molecular Self-Assembly of Hypernetworks

[Figure: labeled hyperedges (xi, xj, class) encoded as molecules self-assemble into the hypernetwork representation.]


Encoding a Hypernetwork with DNA

a) Collection of (labeled) hyperedges:
z1: (x1=0, x2=1, x3=0, y=1)
z2: (x1=0, x2=0, x3=1, x4=0, x5=0, y=0)
z3: (x2=1, x4=1, y=1)
z4: (x2=1, x3=0, x4=1, y=0)

b) Library of DNA molecules corresponding to (a):
z1: AAAACCAATTGGAAGGCCATGCGG
z2: AAAACCAATTCCAAGGGGCCTTCCCCAACCATGCCC
z3: AATTGGCCTTGGATGCGG
z4: AATTGGAAGGCCCCTTGGATGCCC

where AAAA = x1, AATT = x2, AAGG = x3, CCTT = x4, CCAA = x5, ATGC = y, and CC = 0, GG = 1.
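This encoding is purely positional, so it is straightforward to reproduce in software. A minimal sketch (an assumed helper, not from the talk): each variable maps to its 4-mer codon and each binary value to a 2-mer, exactly as in the table above.

```python
# DNA encoding of labeled hyperedges: 4-mer codon per variable,
# 2-mer per value (CC = 0, GG = 1), concatenated left to right.
CODON = {"x1": "AAAA", "x2": "AATT", "x3": "AAGG",
         "x4": "CCTT", "x5": "CCAA", "y": "ATGC"}
VALUE = {0: "CC", 1: "GG"}

def encode(hyperedge):
    """Encode a labeled hyperedge, e.g. [("x2", 1), ("x4", 1), ("y", 1)]."""
    return "".join(CODON[var] + VALUE[val] for var, val in hyperedge)

# z3 : (x2=1, x4=1, y=1)  ->  AATTGGCCTTGGATGCGG, as in the library above.
assert encode([("x2", 1), ("x4", 1), ("y", 1)]) == "AATTGGCCTTGGATGCGG"
```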
DNA Molecular Computing

[Figure: DNA computing primitives: nanostructures, molecular recognition, self-replication, and self-assembly of polymers through repeated heat-cool cycles.]


Learning the Hypernetwork (by Molecular Evolution)

Starting from a library of combinatorial molecules, each generation proceeds as follows [Zhang, DNA11]:
1. Hybridize the library with a training example.
2. Select the library elements matching the example.
3. Amplify the matched library elements by PCR to form the next-generation library.


Molecular Information Processing

[Video: MP4.avi]


The Theory of Bayesian Evolution

Evolution can be viewed as a Bayesian inference process: evolutionary computation (EC) is an iterative process of generating individuals of ever higher posterior probability from the priors and the observed data, moving from the prior P_0(A_i) at generation 0 to the posterior P_g(A_i | D) ∝ P(D | A_i) P_g(A_i) at generation g. [Zhang, CEC-99]


Evolutionary Learning Algorithm for Hypernetwork Classifiers

1. Let the hypernetwork H represent the current distribution P(X, Y).
2. Get a training example (x, y).
3. Classify x using H as follows:
   3.1 Extract all molecules matching x into M.
   3.2 From M, separate the molecules by class: molecules with label Y=0 into M0 and molecules with label Y=1 into M1.
   3.3 Compute y* = argmax_{Y ∈ {0,1}} |M_Y| / |M|.
4. Update H:
   - If y* = y, then H_n ← H_{n-1} + {c(u, v)}, i.e. add copies of the matching elements (u, v) ∈ H_{n-1} with u ⊆ x and v = y.
   - If y* ≠ y, then H_n ← H_{n-1} − {c(u, v)}, i.e. remove copies of the matching elements (u, v) ∈ H_{n-1} with u ⊆ x and v ≠ y.
5. Go to step 2 if not terminated.


Learning with Hypergraphs: Application Results

Biological Applications
- DNA-Based Molecular Diagnosis
- MicroRNA-Based Diagnosis
- Aptamer-Based Diagnosis


DNA-Based Diagnosis

- 120 samples from 60 leukemia patients; gene expression data [Cheok et al., Nature Genetics, 2003]
- Class: ALL/AML diagnosis
- Training hypernetworks with 6-fold validation


Learning Curve

[Figure: fitness evolution of the population of hyperedges.]


Order Effects on Learning

[Figure: fitness curves for runs with fixed-cardinality hyperedges (card = 1, 4, 7, 10).]


Aptamer-Based Cardiovascular Disease Diagnosis

Training data
▷ Disease: cardiovascular disease (CVD)
▷ Classes: 4 [normal / 1st / 2nd / 3rd stage]
▷ Number of samples: 135 [normal: 40 / 1st: 38 / 2nd: 19 / 3rd: 18]
▷ Preprocessing: 3K aptamer array → conversion to real values (3K real-valued features) → feature selection using gain ratio (150 real-valued features) → binarization using MDL (150 Boolean features)
▷ Simulation parameters: (1) order: 2~70; (2) sampling rate: 50; (3) each setting repeated 10 times and averaged
▷ Classification: majority voting with the sum of library element weights
▷ Training / test size: training 108 (80%) / test 27 (20%)


Learning & Classification by Hypernetworks

[Figure: source data are binarized into training and test sets; a library of labeled hyperedges (e.g. X0=1 X1=1 X2=0 X3=0, C=1) is sampled with initial weight W = 1000; the learning loop (evolution stage) adjusts the weights and tracks training and test accuracy.]

Weight update rule (learning) by error correction: when all index-value pairs of a library element match the example, the element's weight is updated as w = w * 1.0001 if its class is correct, and w = w * 0.95 otherwise.
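The classification step (steps 3.1-3.3 of the algorithm above) together with this multiplicative weight update can be sketched in ordinary code. A minimal sketch, assuming an in-memory library stands in for the molecules; the field names are illustrative:

```python
# Weighted majority-vote classification and error-correction weight update.
def matches(element, x):
    """A library element matches x when all its index-value pairs agree."""
    return all(x[i] == v for i, v in element["pattern"].items())

def classify(library, x):
    """Majority vote weighted by the summed weights of matching elements."""
    votes = {0: 0.0, 1: 0.0}
    for e in library:
        if matches(e, x):
            votes[e["label"]] += e["weight"]
    return max(votes, key=votes.get)

def update(library, x, y):
    """w *= 1.0001 for matched elements of the correct class, w *= 0.95 else."""
    y_star = classify(library, x)
    for e in library:
        if matches(e, x):
            e["weight"] *= 1.0001 if e["label"] == y else 0.95
    return y_star

# Tiny example: two order-2 hyperedges over 4 binary variables.
lib = [{"pattern": {0: 1, 3: 1}, "label": 1, "weight": 1000.0},
       {"pattern": {1: 0, 2: 1}, "label": 0, "weight": 1000.0}]
print(update(lib, x=(1, 0, 0, 1), y=1))
```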
Simulation Result (1/3)

▷ Training and test accuracy as learning proceeds (order k = 12).

[Figure: training and test accuracy over 500 epochs.]


Simulation Result (2/3)

▷ Accuracy on test data as learning proceeds, for orders k = 2, 4, 8, 12, 16, 20, 30, 40, 50, 60, 70.

[Figure: test accuracy over 200 epochs per order.]


Simulation Result (3/3)

▷ The effect of learning.

[Figure: accuracy vs. order (2 to 70), learning vs. sampling only.]


Mining Cancer-Related MicroRNA Modules from miRNA Expression Profiles

Gene Regulation by MicroRNAs

MicroRNAs (miRNAs) are endogenous RNAs of about 22 nt that can play important regulatory roles in animals, plants, and viruses. They act in post-transcriptional gene regulation, binding target genes for degradation or translational repression. Recently, miRNAs have been reported to be related to cancer development and progression.


Dataset

The miRNA expression microarray data: expression profiles of human miRNAs across 11 tumor types: bladder, breast, colon, kidney, lung, pancreas, prostate, uterus, melanoma, mesothelioma, and ovary tissue (Lu et al., 2005). The dataset consists of an expression matrix of 151 miRNAs (rows) and 89 samples (columns).

Tissue type     Cancer  Normal
Bladder              1       6
Breast               3       6
Colon                4       7
Kidney               3       4
Lung                 2       5
Pancreas             1       8
Prostate             6       6
Uterus               1      10
Melanoma             0       3
Mesothelioma         0       8
Ovary                0       5
All tissues         21      68


Representing a Hypernetwork from miRNA Expression Data

Each data item is a binarized profile over the 151 miRNAs with a class label, for each of the 89 samples, e.g.

sample 1:  X1=1 X2=0 X3=1 X4=1 X5=0 X6=1 ... X151=0, class = cancer
sample 2:  X1=0 X2=0 X3=0 X4=1 X5=0 X6=0 ... X151=1, class = normal
...
sample 89: X1=1 X2=0 X3=0 X4=1 X5=0 X6=1 ... X151=1, class = cancer

The library of normal/cancer classification rules is built from hyperedges sampled from these profiles, e.g. (X1, X2 → cancer), (X10, X20 → normal), (X1, X45 → cancer), (X10, X31 → cancer), (X1, X80 → normal), yielding a hypernetwork H = (X, E, W) of DNA molecules.


Performance

Leave-one-out cross-validation:

Algorithm                                 Correct classification rate
Bayesian Network                          79.77 %
Naïve Bayes                               83.15 %
ID3                                       88.76 %
Hypernetworks                             90.00 %
Sequential Minimal Optimization (SMO)     91.01 %
Multi-layer perceptron (MLP)              92.13 %
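The library construction used throughout these experiments, random sampling of order-k hyperedges from binarized profiles, can be sketched as follows. A minimal sketch (illustrative, not the authors' code); the function and field names are assumptions:

```python
# Build a hyperedge library by random sampling from binarized examples.
import random

def sample_library(data, order, per_example):
    """data: list of (x, label) with x a tuple of 0/1 values.
    Draws `per_example` random order-k hyperedges from each example."""
    library = []
    for x, label in data:
        for _ in range(per_example):
            idxs = random.sample(range(len(x)), order)
            pattern = {i: x[i] for i in idxs}   # index-value pairs
            library.append({"pattern": pattern, "label": label,
                            "weight": 1.0})
    return library

# Toy data: two binarized profiles with class labels (1 = cancer).
data = [((1, 0, 1, 1, 0, 1), 1), ((0, 0, 0, 1, 0, 0), 0)]
lib = sample_library(data, order=2, per_example=3)
print(len(lib), lib[0])
```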
Accuracy vs. Order for Test Data (Sampling Only)

[Figure: classification ratio (0.2 to 1.0) vs. order (20 to 140) on test data.]


Learning Curves for Training Data

[Figure: classification ratio on training data over 60 epochs for orders 2 to 7.]


miRNA Data Mining

[Table: (a) cancer-related miRNA modules discovered by the hypernetwork, ranked by module weight (7919.25, 6787.93, 6787.93, 6084.60, 5656.61, ..., 5324.03), with member miRNAs such as hsa-miR-215, hsa-miR-194, hsa-miR-214, hsa-miR-21, hsa-miR-142-3p, and hsa-miR-126; (b) individual miRNAs related to cancer, ranked by weight: hsa-miR-155 (295972.7), hsa-miR-105 (283034.8), hsa-miR-223 (280371.4), hsa-miR-21 (277609.9), hsa-let-7c (270764.7), hsa-miR-142-3p (266700.1), hsa-miR-29b (263159.0), hsa-miR-224 (260877.3), hsa-miR-183 (260877.3), hsa-miR-184 (260116.7), hsa-let-7a (256313.8).]


Non-Biological Applications

- Digit Recognition
- Face Classification
- Text Classification
- Movie Title Prediction


Digit Recognition: Dataset

Original data: handwritten digits (0~9).
- Training data: 2,630 (263 examples per class)
- Test data: 1,130 (113 examples per class)
Preprocessing: each example is an 8x8 binary matrix; each pixel is 0 or 1.


Pattern Classification

[Figure: a "layered" hypernetwork with an input layer, a hidden layer of hyperedges, and an output layer. The probabilistic library (DNA representation) holds W = {w_i | 1 ≤ i ≤ Σ_{k=1}^{m} C(n,k)}, where w_i is the number of copies of hyperedge i.]


Simulation Results – without Error Correction

|Train set| = 3,760, |Test set| = 1,797.

[Figure: classification results without error correction.]


Performance Comparison

Method                            Accuracy
MLP with 37 hidden nodes          0.941
MLP with no hidden nodes          0.901
SVM with polynomial kernel        0.926
SVM with RBF kernel               0.934
Decision Tree                     0.859
Naïve Bayes                       0.885
kNN (k=1)                         0.936
kNN (k=3)                         0.951
Hypernet with learning (k = 10)   0.923
Hypernet with sampling (k = 33)   0.949


Error Correction Algorithm

(A code sketch of the delete-and-resample step follows this algorithm.)

1. Initialize the library as before.
2. maxChangeCnt := librarySize.
3. For i := 0 to iteration_limit:
   3.1 trainCorrectCnt := 0.
   3.2 Run classification for all training patterns. For each correctly classified pattern, increase trainCorrectCnt.
   3.3 For each library element:
       - Initialize its fitness value to 0.
       - For each misclassified training pattern, if the library element matches that pattern: if the element classified the pattern correctly, its fitness gains 2 points; else it loses 1 point.
   3.4 changeCnt := max{ librarySize * (1.5 * (trainSetSize - trainCorrectCnt) / trainSetSize + 0.01), maxChangeCnt * 0.9 }.
   3.5 maxChangeCnt := changeCnt.
   3.6 Delete the changeCnt library elements of lowest fitness and resample library elements whose classes are those of the deleted ones.
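A minimal sketch of one iteration of the algorithm above (illustrative names; `classify` and `sample_element` are assumed helpers, not from the talk):

```python
# One error-correction iteration: fitness scoring on misclassified
# patterns, then delete-and-resample of the weakest library elements.
def error_correction_step(library, train, classify, sample_element,
                          max_change_cnt):
    missed = [(x, y) for x, y in train if classify(library, x) != y]

    # Fitness: +2 when a matching element votes the true class, -1 otherwise.
    for e in library:
        e["fitness"] = 0
        for x, y in missed:
            if all(x[i] == v for i, v in e["pattern"].items()):
                e["fitness"] += 2 if e["label"] == y else -1

    err = len(missed) / len(train)
    change_cnt = int(max(len(library) * (1.5 * err + 0.01),
                         max_change_cnt * 0.9))

    # Drop the lowest-fitness elements; resample within the same classes.
    library.sort(key=lambda e: e["fitness"])
    dropped, kept = library[:change_cnt], library[change_cnt:]
    kept += [sample_element(e["label"]) for e in dropped]
    return kept, change_cnt
```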
Simulation Results – with Error Correction

iterationLimit = 37, librarySize = 382,300.

[Figure: training and test classification ratios over the 37 iterations for orders 6 to 27.]


Performance Comparison

Algorithm                        Correct classification rate
Random Forest (f=10, t=50)       94.10 %
kNN (k=4)                        93.49 %
Hypernetwork (order=26)          92.99 %
AdaBoost (weak learner: J48)     91.93 %
SVM (Gaussian kernel, SMO)       91.37 %
MLP                              90.53 %
Naïve Bayes                      87.26 %
J48                              84.86 %


Face Classification Experiments

Face data set: Yale dataset; 15 people, 11 images per person, 165 images in total.

For each person, 10 images are used for training and the remaining 1 for test.

[Figure: bitmaps for training data (dimensionality = 480).]

[Figure: classification rate by leave-one-out.]

[Figure: classification rate with dimensionality reduced to 64 by PCA.]


Text Classification Experiments

Text classification pipeline (a sketch of steps 2-4 follows this section):
1. Documents
2. Bag-of-words representation
3. Term vectors
4. Binary term-document matrix
5. DNA-encoded kernel functions

[Figure: term vectors over a vocabulary (baseball, specs, graphics, hockey, unix, space, ...) are binarized into a term-document matrix, from which labeled hyperedges such as (x1=0, x2=1, x3=1, y=1) are sampled into the library.]


Text Classification Results

Data from Reuters-21578 ('ACQ' and 'EARN').

[Figure: learning curves, averaged over 10 runs.]


Performance Comparison

[Figure: classification performance on the 'ACQ' data (4,724 documents) and on the 'EARN' data (7,888 documents).]

Higher-dimensional kernel functions can improve the performance further.
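Steps 2-4 of the pipeline above reduce to a simple transformation. A minimal sketch (illustrative, not the talk's code; vocabulary and documents are toy examples):

```python
# Bag-of-words term vectors binarized into a term-document matrix.
def binary_term_doc_matrix(docs, vocab):
    """Rows are documents; entry j is 1 iff vocabulary term j occurs."""
    return [tuple(int(term in doc.lower().split()) for term in vocab)
            for doc in docs]

vocab = ["baseball", "graphics", "hockey", "unix", "space"]
docs = ["unix graphics specs", "baseball and hockey scores"]
for row in binary_term_doc_matrix(docs, vocab):
    print(row)   # first row: (0, 1, 0, 1, 0)
```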
Learning from Movie Captions

Learning Hypernets from Movie Captions

- Order: sequential, range 2~3 (consecutive word pairs and triples)
- Corpus: Friends, Prison Break, 24


Classification

Query generation for "I intend to marry her":
  I ? to marry her / I intend ? marry her / I intend to ? her / I intend to marry ?

Matching for "I ? to marry her":
  order 2: I intend, I am, intend to, ...
  order 3: I intend to, intend to marry, ...

Count the number of max-perfect-matching hyperedges.


Completion & Classification Examples

Query: who are you (corpus: Friends, 24, Prison Break)
  ? are you    ->  what are you   (Friends)
  who ? you    ->  who are you    (Friends)
  who are ?    ->  who are you    (Friends)

Query: you need to wear it (corpus: 24, Prison Break, House)
  ? need to wear it    ->  i need to wear it     (24)
  you ? to wear it     ->  you want to wear it   (24)
  you need ? wear it   ->  you need to wear it   (24)
  you need to ? it     ->  you need to do it     (House)
  you need to wear ?   ->  you need to wear a    (24)


Conclusion

Hypernetworks are a graphical model that employs higher-order nodes (hyperedges) explicitly, allowing a more natural representation for learning higher-order graphical models.

We introduced an evolutionary learning algorithm that exploits the high information density and massive parallelism of molecular computing to cope with the combinatorial explosion problem. Applied to pattern recognition (and completion) problems in IT and BT, it obtained performance competitive with conventional ML classifiers.

Why does this work? It exploits the huge population sizes available in DNA computing to build an ensemble machine, i.e. a hypernetwork, of simple random hyperedges. It is a new kind of evolutionary algorithm in which very simple "molecular" operators are applied to a "huge" population of individuals in a "massively parallel" way.

Another potential of hypernetworks is their application to biological problems where the data are given as "wet" DNA or RNA molecules.


Acknowledgements

Simulation experiments: Joo-Kyoung Kim, Sun Kim, Soo-Jin Kim, Jung-Woo Ha, Chan-Hoon Park, Ha-Young Jang

Collaborating labs:
- Biointelligence Laboratory, Seoul National University
- RNomics Lab, Seoul National University
- DigitalGenomics, Inc.
- GenoProt, Inc.

Supported by:
- National Research Lab Program of Min. of Sci. & Tech. (2002-2007)
- Next Generation Tech. Program of Min. of Ind. & Comm. (2000-2010)

More information:
- http://bi.snu.ac.kr/MEC/
- http://cbit.snu.ac.kr/