Weighting training sequences

Why do we want to weight training sequences? There have been many different proposals:
– Based on trees
– Based on the position of the sequences in 'sequence space'
– Based on classifying family membership, when that is all we are interested in
– Based on maximizing entropy

Why do we want to weight training sequences? Some training sequences can be closely related to each other, and such sequences do not deserve the same influence in the estimation process as a sequence that is highly diverged.
– Phylogenetic trees make these relationships explicit. Throughout, we use the running example of three sequences AGAA, CCTC and AGTC related by a small phylogenetic tree.

Weighting schemes based on trees
– Thompson, Higgins & Gibson (1994): weights represent electric currents, as calculated by Kirchhoff's laws
– Gerstein, Sonnhammer & Chothia (1994): works up the tree, incrementing the weights
– Root weights from Gaussian parameters (Altschul-Carroll-Lipman weights for a three-leaf tree, 1989)

Thompson, Higgins & Gibson
The tree is read as an electric network of voltages, currents and resistances: edge lengths act as resistances, a voltage is applied at the root, and the weight of sequence k is the current I_k flowing out through its leaf. In the example tree, leaves 1 and 2 hang below node 4 on edges of resistance 2, node 4 hangs below the root (node 5) on an edge of resistance 3, and leaf 3 hangs directly below the root on an edge of resistance 4. Kirchhoff's laws give

$$ V_4 = 2I_1 = 2I_2, \qquad V_5 = 2I_1 + 3(I_1 + I_2) = 4I_3, $$

so $I_1 = I_2$ and $8I_1 = 4I_3$, hence

$$ I_1 : I_2 : I_3 = 1 : 1 : 2. $$

Gerstein, Sonnhammer & Chothia
Works up the tree, incrementing the weights.
– Initially the weights are set to the leaf edge lengths $t_n$ (the resistances in the previous example).
– Each internal edge length $t_n$ is then divided over the leaves below node n, in proportion to their current weights:

$$ w_i \leftarrow w_i + \frac{t_n\, w_i}{\sum_{\text{leaves } k \text{ below } n} w_k} $$

For the example tree: initially $w_1 = 2$, $w_2 = 2$, $w_3 = 4$; the edge of length 3 above node 4 adds $3 \cdot 2/(2+2) = 1.5$ to each of $w_1$ and $w_2$, so

$$ w_1 : w_2 : w_3 = 3.5 : 3.5 : 4 = 7 : 7 : 8. $$

Only a small difference with Thompson, Higgins & Gibson? Consider two leaves joined directly at the root by edges of lengths 1 and 2:
– T, H & G: $I_1 : I_2 = 2 : 1$ (currents are inversely proportional to resistance)
– G, S & C: $w_1 : w_2 = 1 : 2$ (weights are proportional to edge lengths)
So the two schemes can order the same sequences in opposite ways.

Root weights from Gaussian parameters
– Continuous instead of discrete members of an alphabet
– A probability density instead of a substitution matrix
– Example, a Gaussian: the density of value x after drifting along an edge of length t from value y is

$$ P(x \mid y, t) \propto \exp\!\left( -\frac{(x-y)^2}{2t} \right) $$

Root weights from Gaussian parameters
Combining the two leaves below node 4 (leaf data $L_1, L_2$ with values $x_1, x_2$ and edge lengths $t_1, t_2$):

$$ P(x \text{ at node } 4 \mid L_1, L_2) = K \exp\!\left( -\frac{(x-x_1)^2}{2t_1} \right) \exp\!\left( -\frac{(x-x_2)^2}{2t_2} \right) = K_1 \exp\!\left( -\frac{(x - v_1 x_1 - v_2 x_2)^2}{2 t_{12}} \right) $$

with $t_{12} = t_1 t_2/(t_1 + t_2)$ and

$$ v_1 = \frac{t_2}{t_1 + t_2}, \qquad v_2 = \frac{t_1}{t_1 + t_2}. $$

Root weights from Gaussian parameters
Altschul-Carroll-Lipman weights for a tree with three leaves: repeating the combination at node 5 (the root) gives

$$ P(x \text{ at node } 5 \mid L_1, L_2, L_3) = K_2 \exp\!\left( -\frac{(x - w_1 x_1 - w_2 x_2 - w_3 x_3)^2}{2 t_{123}} \right) $$

The coefficients $w_i$ of the leaf values are the weights. Writing $t_4$ for the edge between node 4 and the root, and $S = t_1 t_2 + (t_3 + t_4)(t_1 + t_2)$:

$$ w_1 = \frac{t_2 t_3}{S}, \qquad w_2 = \frac{t_1 t_3}{S}, \qquad w_3 = \frac{t_1 t_2 + t_4 (t_1 + t_2)}{S}. $$

For the example tree ($t_1 = t_2 = 2$, $t_4 = 3$, $t_3 = 4$) this gives

$$ w_1 : w_2 : w_3 = 1 : 1 : 2. $$

Weighting schemes based on trees: results for the example tree
– Thompson, Higgins & Gibson (electric currents): 1 : 1 : 2
– Gerstein, Sonnhammer & Chothia: 7 : 7 : 8
– Altschul-Carroll-Lipman weights for a tree with three leaves: 1 : 1 : 2
The first two schemes are illustrated by the code sketches below.
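To make the electric-network reading concrete, here is a minimal sketch (mine, not from the slides) of the Thompson-Higgins-Gibson computation. It assumes a nested-tuple tree encoding of the form (name, edge length, children); each subtree is collapsed to an effective resistance (its own edge in series with the parallel combination of its children), and the current entering a node is split over its children in inverse proportion to their branch resistances, as Kirchhoff's laws dictate.

```python
def effective_resistance(node):
    """Resistance of a subtree: own edge in series with children in parallel."""
    name, edge, children = node
    if not children:                                   # a bare leaf edge
        return edge
    inv = sum(1.0 / effective_resistance(c) for c in children)
    return edge + 1.0 / inv

def leaf_currents(node, current=1.0, weights=None):
    """Thompson-Higgins-Gibson weights: the current flowing out of each
    leaf when a unit current is pushed in at the root."""
    if weights is None:
        weights = {}
    name, edge, children = node
    if not children:
        weights[name] = current                        # all current exits here
        return weights
    res = [effective_resistance(c) for c in children]
    inv_total = sum(1.0 / r for r in res)
    for child, r in zip(children, res):
        # parallel branches share one voltage, so each branch current ~ 1/R
        leaf_currents(child, current * (1.0 / r) / inv_total, weights)
    return weights

# The example tree: leaf 3 below the root on an edge of length 4,
# node 4 below the root on an edge of length 3, and leaves 1 and 2
# below node 4 on edges of length 2.
root = ("node5", 0.0,
        [("leaf3", 4.0, []),
         ("node4", 3.0, [("leaf1", 2.0, []), ("leaf2", 2.0, [])])])

print(leaf_currents(root))  # {'leaf3': 0.5, 'leaf1': 0.25, 'leaf2': 0.25} -> 1:1:2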
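The Gerstein-Sonnhammer-Chothia rule fits the same representation. Again a sketch of my own, under the same assumed tree encoding: weights start out as the leaf edge lengths, and on the way up the tree each internal edge length is spread over the leaves below it in proportion to their current weights.

```python
def gsc_weights(node, weights=None):
    """Gerstein-Sonnhammer-Chothia weights, computed bottom-up.
    Returns (weights, names of the leaves below this node)."""
    if weights is None:
        weights = {}
    name, edge, children = node
    if not children:
        weights[name] = edge                 # initial weight = leaf edge length
        return weights, [name]
    below = []
    for child in children:
        _, leaves = gsc_weights(child, weights)
        below.extend(leaves)
    total = sum(weights[k] for k in below)
    for k in below:                          # spread this edge over the leaves below
        weights[k] += edge * weights[k] / total
    return weights, below

# Same example tree as before.
root = ("node5", 0.0,
        [("leaf3", 4.0, []),
         ("node4", 3.0, [("leaf1", 2.0, []), ("leaf2", 2.0, [])])])

print(gsc_weights(root)[0])  # {'leaf3': 4.0, 'leaf1': 3.5, 'leaf2': 3.5} -> 7:7:8
```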
Weighting scheme using 'sequence space': Voronoi weights
Sample sequences at random from the region of sequence space occupied by the family; if $n_i$ of the samples fall closest to training sequence i (i.e. in its Voronoi cell), then

$$ w_i = \frac{n_i}{\sum_k n_k}. $$

More weighting schemes
– Maximum discrimination weights
– Maximum entropy weights
  – Based on averaging
  – Based on maximum 'uniformity' (entropy)

Maximum discrimination weights
– Does not try to maximize the likelihood or the posterior probability
– Instead it addresses directly what the model has to do: decide whether a sequence is a member of the family

With M the family model and R the random model, the posterior probability of membership is

$$ P(M \mid x) = \frac{P(x \mid M)\,P(M)}{P(x \mid M)\,P(M) + P(x \mid R)\,P(R)} $$

and the discrimination over the training sequences $x_k$ is

$$ D = \prod_k P(M \mid x_k). $$

Maximizing D puts the emphasis on the distant or difficult members.

Differences with the previous schemes:
– It is an iterative method: initial weights give rise to a model; the newly calculated posterior probabilities P(M | x_k) give rise to new weights, and hence to a new model, until convergence is reached.
– It optimizes performance for exactly what the model is designed for: classifying whether a sequence is a member of the family.

Maximum entropy weights
Entropy is a measure of the average uncertainty of an outcome; it is maximal when we are maximally uncertain about the outcome. It can be used for weighting in two ways: averaging and maximum 'uniformity'.

Averaging: with $m_i$ the number of different residue types in column i, and $k_{ia}$ the number of residues of type a in column i, set

$$ w_k \propto \sum_i \frac{1}{m_i\, k_{i x_{ik}}}, $$

normalized so that the weights sum to one.

Example with the sequences AGAA, CCTC and AGTC. Column 1 contains A, C, A, so $m_1 = 2$ (A and C), $k_{1A} = 2$, $k_{1C} = 1$. The per-column contributions $1/(m_i k_{i x_{ik}})$ are:

  Sequence   col 1   col 2   col 3   col 4   sum
  AGAA       1/4     1/4     1/2     1/2     3/2
  CCTC       1/2     1/2     1/4     1/4     3/2
  AGTC       1/4     1/4     1/4     1/4     1

Normalizing the row sums gives $w_1 = 3/8$, $w_2 = 3/8$, $w_3 = 2/8$.

'Uniformity': Shannon's entropy of an outcome is $H(X) = -\sum_i P(x_i) \log P(x_i)$. Choose the weights that maximize the summed column entropies

$$ H(w) = \sum_i H_i(w), \qquad H_i(w) = -\sum_a p_{ia} \log p_{ia}, $$

where $p_{ia} = \sum_{k : x_{ik} = a} w_k$ is the weighted frequency of residue a in column i, and the weights sum to one.

Example with AGAA, CCTC and AGTC:

$$ H_1(w) = H_2(w) = -(w_1 + w_3) \log(w_1 + w_3) - w_2 \log w_2 $$
$$ H_3(w) = H_4(w) = -w_1 \log w_1 - (w_2 + w_3) \log(w_2 + w_3) $$

Setting the partial derivatives of H(w) equal to each other under the constraint $\sum_k w_k = 1$ gives

$$ (w_1 + w_3)\,w_1 = (w_2 + w_3)\,w_2, \qquad w_1 = w_2 + w_3, \qquad w_2 = w_1 + w_3. $$

Solving the equations leads to

$$ w_1 = \tfrac{1}{2}, \qquad w_2 = \tfrac{1}{2}, \qquad w_3 = 0. $$

Summary of the entropy methods
– Maximum entropy weights (averaging): $w_1 = 3/8$, $w_2 = 3/8$, $w_3 = 2/8$
– Maximum entropy weights ('uniformity'): $w_1 = 1/2$, $w_2 = 1/2$, $w_3 = 0$
Both results are reproduced by the code sketches after the conclusion.

Conclusion
– There are many different weighting methods.
– Which one to use depends on the problem.
Questions?
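To close, two small sketches of my own (not from the slides) that reproduce the entropy examples above. First the 'averaging' weights: score each sequence k by $\sum_i 1/(m_i k_{i x_{ik}})$ over the alignment columns and normalize the scores to sum to one.

```python
from collections import Counter
from fractions import Fraction

def averaging_weights(seqs):
    """'Averaging' maximum entropy weights: w_k ~ sum_i 1/(m_i * k_{i,x_ik}),
    with m_i = number of residue types in column i and k_{ia} = count of
    residue a in column i, normalized to sum to one."""
    cols = list(zip(*seqs))                 # columns of the alignment
    scores = []
    for seq in seqs:
        s = Fraction(0)
        for i, col in enumerate(cols):
            counts = Counter(col)           # residue counts in column i
            s += Fraction(1, len(counts) * counts[seq[i]])
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]

print(averaging_weights(["AGAA", "CCTC", "AGTC"]))
# [Fraction(3, 8), Fraction(3, 8), Fraction(1, 4)]   i.e. 3/8, 3/8, 2/8
```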
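Second, a numerical check of the 'uniformity' solution: evaluating $H(w) = \sum_i H_i(w)$ on the example alignment shows that $w = (1/2, 1/2, 0)$ attains a higher total column entropy than, for instance, uniform weights or the averaging weights.

```python
from math import log

def total_column_entropy(seqs, w):
    """H(w) = sum over columns i of -sum_a p_ia log p_ia, where p_ia is the
    summed weight of the sequences carrying residue a in column i."""
    H = 0.0
    for col in zip(*seqs):
        p = {}
        for wk, a in zip(w, col):
            p[a] = p.get(a, 0.0) + wk       # weighted residue frequencies
        H -= sum(pa * log(pa) for pa in p.values() if pa > 0.0)
    return H

seqs = ["AGAA", "CCTC", "AGTC"]
for w in [(1/3, 1/3, 1/3), (3/8, 3/8, 2/8), (0.5, 0.5, 0.0)]:
    print(w, total_column_entropy(seqs, w))
# the last, (0.5, 0.5, 0.0), attains the maximum, 4*log(2) ~ 2.77
```

Note the design point this makes explicit: the 'uniformity' criterion is happy to give a training sequence (here AGTC) zero weight, whereas the averaging scheme never does.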