Hypernetwork Models of Memory

Byoung-Tak Zhang
Biointelligence Laboratory, School of Computer Science and Engineering
Brain Science, Cognitive Science, Bioinformatics Programs
Seoul National University, Seoul 151-742, Korea
btzhang@cse.snu.ac.kr
http://bi.snu.ac.kr/

Signaling Networks

[Figure: a cellular signaling network.]

Protein-Protein Interaction Networks

[Figure: a protein-protein interaction network; Jeong et al., Nature 2001.]

Regulatory Networks

[Figure: a transcription-factor regulatory network; Katia Basso et al., Nature Genetics 37, 382-390, 2005.]

Molecular Networks

[Figure: a pairwise molecular interaction network on vertices x1, ..., x15.]

The "Hyperinteraction" Model of Biomolecules
[Zhang, DNA-2006] [Zhang, FOCI-2007]

[Figure: a hyperinteraction network on vertices x1, ..., x15, with hyperedges joining more than two vertices at a time.]

• Vertex:
  • Genes
  • Proteins
  • Neurotransmitters
  • Neuromodulators
  • Hormones
• Edge (interactions):
  • Genetic
  • Signaling
  • Metabolic
  • Synaptic

The Hypernetworks

Hypergraphs

A hypergraph is an (undirected) graph G whose edges may each connect any non-empty set of vertices, i.e. G = (V, E), where V = {v1, v2, ..., vn}, E = {E1, E2, ..., Em}, and each Ei = {vi1, vi2, ..., vim} is a subset of V.

An m-hypergraph consists of a set V of vertices and a subset E of V[m], i.e. G = (V, V[m]), where V[m] is a set of subsets of V whose elements have precisely m members.

A hypergraph G is said to be k-uniform if every edge Ei in E has cardinality k. A hypergraph G is k-regular if every vertex has degree k.

Remark: An ordinary graph is a 2-uniform hypergraph.

An Example Hypergraph

G = (V, E)
V = {v1, v2, v3, ..., v7}
E = {E1, E2, E3, E4, E5}, with
E1 = {v1, v3, v4}
E2 = {v1, v4}
E3 = {v2, v3, v6}
E4 = {v3, v4, v6, v7}
E5 = {v4, v5, v7}

Hypernetworks
[Zhang, DNA-2006]

A hypernetwork is a hypergraph with weighted edges. It is defined as a triple H = (V, E, W), where V = {v1, v2, ..., vn}, E = {E1, E2, ..., Em}, and W = {w1, w2, ..., wm} is the set of weights associated with the hyperedges.

An m-hypernetwork consists of a set V of vertices and a subset E of V[m], i.e. H = (V, V[m], W), where V[m] is a set of subsets of V whose elements have precisely m members and W is the set of weights associated with the hyperedges.

A hypernetwork H is said to be k-uniform if every hyperedge Ei in E has cardinality k. A hypernetwork H is k-regular if every vertex has degree k.

Remark: An ordinary graph is a 2-uniform hypernetwork with all wi = 1.
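To make the definitions concrete, here is a minimal Python sketch of a hypernetwork as a store of weighted hyperedges. This is an illustration only; the class and method names are not from the slides.

```python
from collections import defaultdict

class Hypernetwork:
    """A hypernetwork H = (V, E, W): a hypergraph whose hyperedges carry weights."""
    def __init__(self):
        # Map each hyperedge (a frozenset of vertices) to its weight.
        self.weights = defaultdict(float)

    def add_hyperedge(self, vertices, weight=1.0):
        self.weights[frozenset(vertices)] += weight

    def is_uniform(self, k):
        """k-uniform: every hyperedge has cardinality k."""
        return all(len(e) == k for e in self.weights)

    def degree(self, v):
        """Degree of vertex v: the number of hyperedges containing it."""
        return sum(1 for e in self.weights if v in e)

# The example hypergraph above, viewed as a hypernetwork with unit weights:
H = Hypernetwork()
for edge in [("v1", "v3", "v4"), ("v1", "v4"), ("v2", "v3", "v6"),
             ("v3", "v4", "v6", "v7"), ("v4", "v5", "v7")]:
    H.add_hyperedge(edge)
print(H.degree("v4"))    # 4: v4 appears in E1, E2, E4, and E5
print(H.is_uniform(3))   # False: hyperedges of orders 2, 3, and 4 coexist
```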
The Hypernetwork Memory
[Zhang, DNA12-2006]

The hypernetwork is defined as H = (X, S, W), where

X = (x_1, x_2, \ldots, x_I),
S = \{S_i\}, \; S_i \subseteq X, \; k_i = |S_i|,
W = (W^{(2)}, W^{(3)}, \ldots, W^{(K)}),

and the training set is D = \{\mathbf{x}^{(n)}\}_{n=1}^{N}. The energy of the hypernetwork is

E(\mathbf{x}^{(n)}; W) = -\frac{1}{2}\sum_{i_1,i_2} w^{(2)}_{i_1 i_2} x^{(n)}_{i_1} x^{(n)}_{i_2} - \frac{1}{6}\sum_{i_1,i_2,i_3} w^{(3)}_{i_1 i_2 i_3} x^{(n)}_{i_1} x^{(n)}_{i_2} x^{(n)}_{i_3} - \cdots

The probability distribution is

P(\mathbf{x}^{(n)} \mid W) = \frac{1}{Z(W)} \exp[-E(\mathbf{x}^{(n)}; W)]
= \frac{1}{Z(W)} \exp\left[ \frac{1}{2}\sum_{i_1,i_2} w^{(2)}_{i_1 i_2} x^{(n)}_{i_1} x^{(n)}_{i_2} + \frac{1}{6}\sum_{i_1,i_2,i_3} w^{(3)}_{i_1 i_2 i_3} x^{(n)}_{i_1} x^{(n)}_{i_2} x^{(n)}_{i_3} + \cdots \right]
= \frac{1}{Z(W)} \exp\left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, i_2, \ldots, i_k} w^{(k)}_{i_1 i_2 \cdots i_k} x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_k} \right],

where the partition function is

Z(W) = \sum_{\mathbf{x}^{(m)}} \exp\left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, i_2, \ldots, i_k} w^{(k)}_{i_1 i_2 \cdots i_k} x^{(m)}_{i_1} x^{(m)}_{i_2} \cdots x^{(m)}_{i_k} \right]

and c(k) is the order-dependent constant (c(2) = 2, c(3) = 6, i.e. c(k) = k!).

4 training examples (with labels):

1: x1=1 x2=0 x3=0 x4=1 x5=0 x6=0 x7=0 x8=0 x9=0 x10=1 x11=0 x12=1 x13=0 x14=0 x15=0, y=1
2: x1=0 x2=1 x3=1 x4=0 x5=0 x6=0 x7=0 x8=0 x9=1 x10=0 x11=0 x12=0 x13=0 x14=1 x15=0, y=0
3: x1=0 x2=0 x3=1 x4=0 x5=0 x6=1 x7=0 x8=1 x9=0 x10=0 x11=0 x12=0 x13=1 x14=0 x15=0, y=1
4: x1=0 x2=0 x3=0 x4=0 x5=0 x6=0 x7=0 x8=1 x9=0 x10=0 x11=1 x12=0 x13=0 x14=0 x15=1, y=1

Order-3 hyperedges sampled from the active variables of each example (Round 3):

From example 1: {x1, x4, x10} y=1, {x1, x4, x12} y=1, {x4, x10, x12} y=1
From example 2: {x2, x3, x9} y=0, {x2, x3, x14} y=0, {x3, x9, x14} y=0
From example 3: {x3, x6, x8} y=1, {x3, x6, x13} y=1, {x6, x8, x13} y=1
From example 4: {x8, x11, x15} y=1

[Figure: the resulting hypernetwork on vertices x1, ..., x15.]

An Example: Hypernetwork Model of Linguistic Memory

The Language Game Platform

[Figure: the language game platform.]

A Language Game

We ? ? a lot ? gifts.  →  We don't have a lot of gifts.
? still ? believe ? did this.  →  I still can't believe you did this.

Text Corpus: TV Drama Series

Friends, 24, House, Grey's Anatomy, Gilmore Girls, Sex and the City
Training data: 289,468 sentences (e.g. "I don't know what happened.", "Take a look at this.")
Test data: 700 sentences with blanks (e.g. "What ? ? ? here.", "? have ? visit the ? room.")

Step 1: Learning a Linguistic Memory

Example corpus:
  Computer network is rapidly increased.
  The price of computer network installing is very cheap.
  The price of monitor display is on decreasing.
  Nowadays, color monitor display is so common, the price is not so high.
  This is a system adopting text mode color display.
  This is an animation news networks.
  ...

[Figure: from these sentences, hyperedges of order k = 2, 3, 4 are extracted, e.g. "computer network", "monitor display", "color display" (k=2); "computer network price", "computer monitor display" (k=3); and so on. The collection of weighted hyperedges forms the hypernetwork memory.]

Step 2: Recalling from the Memory

Stored sentences:
  He is my best friend
  He is a strong boy
  Strong friend likes pretty girl

Storage (order-2 hyperedges): "he is", "is my", "my best", "best friend", "is a", "a strong", "strong boy", "strong friend", "friend likes", "pretty girl", ...

Recall:
  Query: "He is a ? friend"
  Matching hyperedges self-assemble into candidate completions:
  "He is a strong friend", "He is a best friend"
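A minimal sketch of these two steps in Python, assuming sliding-window extraction of order-k word hyperedges (Step 1) and a greedy fill-in-the-blank recall (Step 2). The real system recalls by probabilistic self-assembly of hyperedges; scoring each candidate word by its weighted hyperedge support is a simplification for illustration.

```python
from collections import defaultdict

def learn(sentences, k):
    """Step 1: slide a window of width k over each sentence and store
    every word k-gram as a hyperedge with a count (its weight)."""
    memory = defaultdict(int)
    for s in sentences:
        words = s.lower().split()
        for i in range(len(words) - k + 1):
            memory[tuple(words[i:i + k])] += 1
    return memory

def complete(query, memory, k, vocab):
    """Step 2: fill each '?' with the word whose candidate sentence is
    supported by the most (weighted) matching hyperedges."""
    words = query.lower().split()
    for pos, w in enumerate(words):
        if w != '?':
            continue
        def score(cand):
            trial = words[:pos] + [cand] + words[pos + 1:]
            # Only fully specified windows (no remaining '?') can match.
            return sum(memory[tuple(trial[i:i + k])]
                       for i in range(len(trial) - k + 1)
                       if '?' not in trial[i:i + k])
        words[pos] = max(vocab, key=score)
    return ' '.join(words)

corpus = ["he is my best friend", "he is a strong boy",
          "strong friend likes pretty girl"]
mem = learn(corpus, k=2)
vocab = {w for s in corpus for w in s.split()}
print(complete("he is a ? friend", mem, 2, vocab))  # "he is a strong friend"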
The Language Game: Results

? gonna ? upstairs ? ? a shower  →  I'm gonna go upstairs and take a shower
We ? ? a lot ? gifts  →  We don't have a lot of gifts
? have ? visit the ? room  →  I have to visit the ladies' room
? ? don't need your ?  →  If I don't need your help
? still ? believe ? did this  →  I still can't believe you did this
? ? a dream about ? In ?  →  I had a dream about you in Copenhagen
? ? ? decision  →  to make a decision
What ? ? ? here  →  What are you doing here
? appreciate it if ? call her by ? ?  →  I appreciate it if you call her by the way
? you ? first ? of medical school  →  Are you go first day of medical school
Would you ? to meet ? ? Tuesday ?  →  Would you nice to meet you in Tuesday and
I'm standing ? the ? ? ? cafeteria  →  I'm standing in the one of the cafeteria
Why ? you ? come ? down ?  →  Why are you go come on down here
? think ? I ? met ? somewhere before  →  I think but I am met him somewhere before

(The completions are the memory's raw outputs; the last several rows show recall errors.)

Experimental Setup

The order (k) of a hyperedge
  Range: 2~4
  Fixed order for each experiment
The method of creating hyperedges from training data
  Sliding-window method
  Sequential sampling from the first word
The number of blanks (question marks) in the test data
  Range: 1~4
  Maximum: k - 1

Learning Behavior Analysis (1/3)

[Figure: completion count for sentences with one missing word vs. corpus size (40K-290K sentences), for orders 2, 3, and 4.]

The performance increases monotonically as the learning corpus grows. The low-order (k=2) memory performs best on the one-missing-word problem.

Learning Behavior Analysis (2/3)

[Figure: completion count for sentences with two missing words vs. corpus size, for orders 2, 3, and 4.]

The medium-order (k=3) memory performs best on the two-missing-words problem.

Learning Behavior Analysis (3/3)

[Figure: completion count for sentences with three missing words vs. corpus size, for orders 2, 3, and 4.]

The high-order (k=4) memory performs best on the three-missing-words problem.

Learning with Hypernetworks

[This slide repeats the 15-variable worked example above: four labeled training examples and the order-3 hyperedges sampled from them.]

Random Graph Process (RGP)

[Figure: snapshots 1-5 of a random graph process.]

Collectively Autocatalytic Sets

"A contemporary cell is a collectively autocatalytic whole in which DNA, RNA, the code, proteins, and metabolism linking the synthesis of some molecular species to the breakdown of other 'high-energy' molecular species all weave together and conspire to catalyze the entire set of reactions required for the whole cell to reproduce." (Kauffman)

Reaction Graph

[Figure: a reaction graph.]

The Hypernetwork Model of Memory
[Zhang, 2006]

As before, the hypernetwork is defined as H = (X, S, W), with X = (x_1, \ldots, x_I), S = \{S_i\}, S_i \subseteq X, k_i = |S_i|, W = (W^{(1)}, W^{(2)}, \ldots, W^{(K)}), and training set D = \{\mathbf{x}^{(n)}\}_{n=1}^{N}. The energy, now including a first-order term, is

E(\mathbf{x}^{(n)}; W) = -\sum_{i_1} w^{(1)}_{i_1} x^{(n)}_{i_1} - \frac{1}{2}\sum_{i_1,i_2} w^{(2)}_{i_1 i_2} x^{(n)}_{i_1} x^{(n)}_{i_2} - \frac{1}{6}\sum_{i_1,i_2,i_3} w^{(3)}_{i_1 i_2 i_3} x^{(n)}_{i_1} x^{(n)}_{i_2} x^{(n)}_{i_3} - \cdots

The probability distribution is

P(\mathbf{x}^{(n)} \mid W) = \frac{1}{Z(W)} \exp[-E(\mathbf{x}^{(n)}; W)] = \frac{1}{Z(W)} \exp\left[ \sum_{k=1}^{K} \frac{1}{c(k)} \sum_{i_1, \ldots, i_k} w^{(k)}_{i_1 \cdots i_k} x^{(n)}_{i_1} \cdots x^{(n)}_{i_k} \right],

where the partition function is

Z(W) = \sum_{\mathbf{x}^{(m)}} \exp\left[ \sum_{k=1}^{K} \frac{1}{c(k)} \sum_{i_1, \ldots, i_k} w^{(k)}_{i_1 \cdots i_k} x^{(m)}_{i_1} \cdots x^{(m)}_{i_k} \right].
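Since Z(W) sums over all 2^I binary vectors, it can be evaluated exactly only for small I. A minimal sketch under the unordered-hyperedge form of the energy (one weight per hyperedge, so the 1/c(k) permutation factors are already absorbed); the names are illustrative:

```python
from itertools import product
from math import exp

def energy(x, weights):
    """E(x; W): minus the total weight of the hyperedges whose
    vertices are all active in the binary vector x."""
    return -sum(w for edge, w in weights.items() if all(x[i] for i in edge))

def probability(x, weights, num_vars):
    """P(x | W) = exp(-E(x; W)) / Z(W), with Z(W) summed by brute
    force over all 2^num_vars binary vectors."""
    Z = sum(exp(-energy(v, weights)) for v in product((0, 1), repeat=num_vars))
    return exp(-energy(x, weights)) / Z

# A toy hypernetwork on 4 variables: one pairwise and one third-order hyperedge.
W = {(0, 1): 1.0, (1, 2, 3): 2.0}
print(probability((1, 1, 1, 1), W, num_vars=4))  # all-ones is the most probable state
```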
Deriving the Learning Rule

The likelihood of the training set factorizes as

P(\{\mathbf{x}^{(n)}\}_{n=1}^{N} \mid W) = \prod_{n=1}^{N} P(\mathbf{x}^{(n)} \mid W),

so the log-likelihood is

\ln P(\{\mathbf{x}^{(n)}\}_{n=1}^{N} \mid W) = \sum_{n=1}^{N} \ln P(\mathbf{x}^{(n)} \mid W^{(2)}, W^{(3)}, \ldots, W^{(K)})
= \sum_{n=1}^{N} \left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, \ldots, i_k} w^{(k)}_{i_1 \cdots i_k} x^{(n)}_{i_1} \cdots x^{(n)}_{i_k} - \ln Z(W) \right].

We need the gradient \partial \ln P(\{\mathbf{x}^{(n)}\}_{n=1}^{N} \mid W) / \partial w^{(s)}_{i_1 i_2 \cdots i_s}.

Derivation of the Learning Rule

Differentiating with respect to a weight of order s gives (dropping the constant factor 1/c(s), which can be absorbed into the learning rate)

\frac{\partial \ln P(\{\mathbf{x}^{(n)}\}_{n=1}^{N} \mid W)}{\partial w^{(s)}_{i_1 i_2 \cdots i_s}}
= \frac{\partial}{\partial w^{(s)}_{i_1 \cdots i_s}} \sum_{n=1}^{N} \left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{j_1, \ldots, j_k} w^{(k)}_{j_1 \cdots j_k} x^{(n)}_{j_1} \cdots x^{(n)}_{j_k} - \ln Z(W) \right]
= \sum_{n=1}^{N} x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_s} - N \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{P(\mathbf{x} \mid W)}
= N \left[ \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{\text{Data}} - \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{P(\mathbf{x} \mid W)} \right],

where

\langle x_{i_1} \cdots x_{i_s} \rangle_{\text{Data}} = \frac{1}{N} \sum_{n=1}^{N} x^{(n)}_{i_1} \cdots x^{(n)}_{i_s}, \qquad
\langle x_{i_1} \cdots x_{i_s} \rangle_{P(\mathbf{x} \mid W)} = \sum_{\mathbf{x}} x_{i_1} \cdots x_{i_s} P(\mathbf{x} \mid W).

This is the familiar maximum-likelihood rule for exponential-family models: each weight moves until the data average of its hyperedge matches the model average.

Comparison to Other Machine Learning Methods

Probabilistic Graphical Models (PGMs) represent the joint probability distribution over a set of random variables in graphical form.

Undirected PGMs
• C and D are independent given B.

Directed PGMs
• C asserts dependency between A and B.
• B and E are independent given C.

[Figure: a directed graphical model on variables A, B, C, D, E.]

By the chain rule,
P(A, B, C, D, E) = P(A) P(B|A) P(C|A, B) P(D|A, B, C) P(E|A, B, C, D),
which the graph structure reduces to
P(A, B, C, D, E) = P(A) P(B) P(C|A, B) P(D|B) P(E|C).

Generative: the probability distribution for some variables given values of other variables can be obtained (probabilistic inference).

Kinds of Graphical Models

Undirected
  - Boltzmann Machines
  - Markov Random Fields
Directed
  - Bayesian Networks
  - Latent Variable Models
  - Hidden Markov Models
  - Generative Topographic Mapping
  - Non-negative Matrix Factorization

Bayesian Networks

A Bayesian network BN = (S, P) consists of a network structure S and a set of local probability distributions P:

p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid pa_i)

[Figure: a Bayesian network for detecting credit card fraud.]

• The structure can be found by relying on prior knowledge of causal relationships.

From Bayes Nets to High-Order PGMs

(1) Naïve Bayes

P(F \mid J, G, S, A) = \frac{P(J, G, S, A \mid F) P(F)}{P(J, G, S, A)}, \qquad
P(J, G, S, A \mid F) = P(J|F) P(G|F) P(S|F) P(A|F) = \prod_{x \in \{J, G, S, A\}} P(x \mid F)

(2) Bayesian Net

P(F, J, G, S, A) = \prod_{x \in \{F, J, G, S, A\}} P(x \mid pa(x))

(3) High-Order PGM

P(F, J, G, S, A) = P(J, G|F) P(J, S|F) P(J, A|F) P(G, S|F) P(G, A|F) P(S, A|F)
= \prod_{(x, y):\, x, y \in \{J, G, S, A\},\, x \neq y} P(he(x, y) \mid F),

where he(x, y) denotes the hyperedge joining x and y.

Visual Memories

• Digit Recognition
• Face Classification
• Text Classification
• Movie Title Prediction

Digit Recognition: Dataset

Original data
  Handwritten digits (0 ~ 9)
  Training data: 2,630 (263 examples per class)
  Test data: 1,130 (113 examples per class)
Preprocessing
  Each example is an 8x8 binary matrix; each pixel is 0 or 1.
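The next slides classify these digits with a "layered" hypernetwork whose library is built by randomly sampling order-k pixel subsets from the training images. A minimal sketch of that sample-and-vote scheme; the function names and the simple majority vote are illustrative assumptions, not the exact algorithm of the slides:

```python
import random
from collections import Counter

def sample_hyperedges(image, label, k, num_samples):
    """Sample order-k hyperedges from one image: each hyperedge records
    k (pixel index, pixel value) pairs together with the class label."""
    pixels = list(enumerate(image))  # image: flat list of 64 binary pixels
    return [(frozenset(random.sample(pixels, k)), label)
            for _ in range(num_samples)]

def build_library(train_set, k, samples_per_image):
    library = []
    for image, label in train_set:
        library.extend(sample_hyperedges(image, label, k, samples_per_image))
    return library

def classify(image, library):
    """Each stored hyperedge that matches the query image casts one vote
    for its class; the majority class wins."""
    votes = Counter(label for edge, label in library
                    if all(image[i] == v for i, v in edge))
    return votes.most_common(1)[0][0] if votes else None

# Usage (images as flat 64-element 0/1 lists):
# library = build_library(train_set, k=10, samples_per_image=100)
# prediction = classify(test_image, library)
```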
Pattern Classification

[Figure: a "layered" hypernetwork for classification. The input layer holds the variables x1, ..., xn; the hidden layer holds hyperedges, each combining a few input variables with a class label and carrying a weight w1, ..., wm; the output layer holds the class nodes. The hyperedge store is a probabilistic library (DNA representation), W = {wi}, where each weight wi is realized as the number of copies of hyperedge i.]

Simulation Results – without Error Correction

|Train set| = 3,760, |Test set| = 1,797.

[Figure: classification performance without error correction.]

Performance Comparison

Method                           Accuracy
MLP with 37 hidden nodes         0.941
MLP with no hidden nodes         0.901
SVM with polynomial kernel       0.926
SVM with RBF kernel              0.934
Decision Tree                    0.859
Naïve Bayes                      0.885
kNN (k=1)                        0.936
kNN (k=3)                        0.951
Hypernet with learning (k=10)    0.923
Hypernet with sampling (k=33)    0.949

Error Correction Algorithm

1. Initialize the library as before. maxChangeCnt := librarySize.
2. For i := 0 to iteration_limit:
   a. trainCorrectCnt := 0.
   b. Run classification on all training patterns; for each correctly classified pattern, increment trainCorrectCnt.
   c. For each library element:
      - Initialize its fitness to 0.
      - For each misclassified training pattern that the element matches: if the element's class agrees with the pattern's true class, its fitness gains 2 points; otherwise it loses 1 point.
   d. changeCnt := max{ librarySize * (1.5 * (trainSetSize - trainCorrectCnt) / trainSetSize + 0.01), maxChangeCnt * 0.9 }.
   e. maxChangeCnt := changeCnt.
   f. Delete the changeCnt library elements of lowest fitness and resample library elements whose classes are those of the deleted ones.
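A runnable sketch of this fitness-based resampling loop. It assumes the (hyperedge, label) library and classify() interfaces from the earlier digit sketch, plus a user-supplied resample(label) that draws a fresh hyperedge of the given class; these interfaces, and the clamp keeping changeCnt within the library size, are my assumptions rather than slide-specified details:

```python
def error_correct(library, train_set, classify, resample, iteration_limit=30):
    """Fitness-based library refinement following the steps above: reward
    hyperedges that vote correctly on misclassified patterns, penalize
    those that vote wrongly, then replace the worst ones."""
    max_change = len(library)
    for _ in range(iteration_limit):
        results = [(x, y, classify(x, library)) for x, y in train_set]
        correct = sum(1 for _, y, pred in results if pred == y)
        fitness = [0] * len(library)
        for x, y, pred in results:
            if pred == y:
                continue  # fitness is updated only on misclassified patterns
            for j, (edge, label) in enumerate(library):
                if all(x[i] == v for i, v in edge):  # element matches pattern
                    fitness[j] += 2 if label == y else -1
        error = (len(train_set) - correct) / len(train_set)
        change = int(max(len(library) * (1.5 * error + 0.01), max_change * 0.9))
        change = min(change, len(library))  # guard: never exceed the library size
        max_change = change
        # Drop the lowest-fitness elements and resample replacements per class.
        order = sorted(range(len(library)), key=lambda j: fitness[j])
        doomed = set(order[:change])
        replacements = [resample(library[j][1]) for j in doomed]
        library = [e for j, e in enumerate(library) if j not in doomed] + replacements
    return library
```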
Simulation Results – with Error Correction

iterationLimit = 37, librarySize = 382,300.

[Figure: training (left) and test (right) classification ratio over iterations 0-35 for hyperedge orders 6, 10, 14, 18, 22, and 26.]

Performance Comparison

Algorithm                        Correct classification rate
Random Forest (f=10, t=50)       94.10 %
kNN (k=4)                        93.49 %
Hypernetwork (order=26)          92.99 %
AdaBoost (weak learner: J48)     91.93 %
SVM (Gaussian kernel, SMO)       91.37 %
MLP                              90.53 %
Naïve Bayes                      87.26 %
J48                              84.86 %

Face Classification Experiments

Face Data Set
  Yale dataset
  15 people, 11 images per person, 165 images in total

Training Images of a Person
  10 images for training, the remaining 1 for test

Bitmaps for Training Data (Dimensionality = 480)

[Figure: binarized face bitmaps used as training data.]

Classification Rate by Leave-One-Out

[Figure: leave-one-out classification rate.]

Classification Rate (Dimensionality = 64 by PCA)

[Figure: classification rate after PCA reduction to 64 dimensions.]

Learning Hypernets from Movie Captions

Order
  Sequential
  Range: 2~3
Corpus
  Friends
  Prison Break
  24

[Figures: hypernetworks learned from the movie captions.]

Learning Hypernets from Movie Captions: Classification

Query generation, e.g. for "I intend to marry her":
  I ? to marry her
  I intend ? marry her
  I intend to ? her
  I intend to marry ?

Matching, e.g. for "I ? to marry her":
  order 2: "I intend", "I am", "intend to", ...
  order 3: "I intend to", "intend to marry", ...

Classify by counting the number of max-perfect-matching hyperedges.

Completion & Classification Examples

Query: who are you (corpus: Friends, 24, Prison Break)
  ? are you   →  what are you   [Friends]
  who ? you   →  who are you    [Friends]
  who are ?   →  who are you    [Friends]

Query: you need to wear it (corpus: 24, Prison Break, House)
  ? need to wear it   →  i need to wear it    [24]
  you ? to wear it    →  you want to wear it  [24]
  you need ? wear it  →  you need to wear it  [24]
  you need to ? it    →  you need to do it    [House]
  you need to wear ?  →  you need to wear a   [24]
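A minimal sketch of this query-generation and corpus-classification scheme, reusing the learn() memory from the sentence-completion sketch above. Scoring each corpus by the total count of matching k-grams is a simplification of the slides' max-perfect-matching count:

```python
def generate_queries(sentence):
    """Blank out each word position in turn:
    'I intend to marry her' -> 'I ? to marry her', 'I intend ? marry her', ..."""
    words = sentence.split()
    return [' '.join(words[:i] + ['?'] + words[i + 1:]) for i in range(len(words))]

def classify_query(query, memories, k=2):
    """Assign a blanked sentence to the corpus whose memory holds the most
    hyperedges matching the query's fully specified k-grams."""
    words = query.lower().split()
    def matches(memory):
        return sum(memory.get(tuple(words[i:i + k]), 0)
                   for i in range(len(words) - k + 1)
                   if '?' not in words[i:i + k])
    return max(memories, key=lambda name: matches(memories[name]))

# Usage, assuming one memory per corpus built with learn() from the earlier sketch:
# memories = {"Friends": learn(friends_captions, k=2),
#             "24": learn(captions_24, k=2),
#             "Prison Break": learn(prison_break_captions, k=2)}
# for q in generate_queries("who are you"):
#     print(q, "->", classify_query(q, memories))
```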