Weighting training sequences

Why do we want to weight training sequences?
Many different proposals:
– Based on trees
– Based on the position of the sequences in ‘sequence space’
– Interested only in classifying family membership
– Maximizing entropy
Why do we want to weight training sequences?
Some training sequences can be closely related to each other and do not deserve the same influence in the estimation process as a sequence that is highly diverged.
– Phylogenetic trees
– Sequences AGAA, CCTC, AGTC
[Figure: a small phylogenetic tree relating the sequences AGAA, CCTC and AGTC.]
Weighting schemes based on trees
Thompson, Higgins & Gibson (1994)
(represents the tree as an electric network, with currents calculated by Kirchhoff's laws)
Gerstein, Sonnhammer & Chothia (1994)
Root weights from Gaussian parameters
(Altschul-Carroll-Lipman weights for a three-leaf tree, 1989)
Thompson, Higgins & Gibson
Electric network of voltages, currents and resistances
[Figure: the tree drawn as an electrical network, with currents I1, I2, I3 flowing through edge resistances R1, R2, R3, R4 toward leaves 1, 2 and 3, and voltages V4, V5 at the internal nodes.]
Thompson, Higgins & Gibson
$V_4 = 2I_1 = 2I_2$
$V_5 = 2I_1 + 3(I_1 + I_2) = 4I_3$
$\Rightarrow\quad I_1 : I_2 : I_3 = 1 : 1 : 2$
[Figure: the example tree as a network: leaves 1 and 2 connect to node 4 through resistances of 2 each, node 4 connects to the root (node 5) through a resistance of 3, and leaf 3 connects to the root through a resistance of 4.]
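To make the numbers concrete, here is a minimal Python sketch (mine, not from the slides) that reproduces these currents by reducing the network with series/parallel resistances:

    # Leaves 1 and 2 hang off node 4 through resistances of 2 each; node 4 connects
    # to the root (node 5) through 3; leaf 3 connects to the root through 4.
    # Apply a unit voltage at the root, ground the leaves, and read off the currents.
    V5 = 1.0
    R_par = 1.0 / (1.0 / 2.0 + 1.0 / 2.0)     # the two leaf edges below node 4, in parallel
    I_branch = V5 / (R_par + 3.0)             # current from the root into node 4
    V4 = I_branch * R_par                     # voltage at node 4
    I1 = V4 / 2.0                             # current into leaf 1
    I2 = V4 / 2.0                             # current into leaf 2
    I3 = V5 / 4.0                             # current into leaf 3
    total = I1 + I2 + I3
    print([I / total for I in (I1, I2, I3)])  # [0.25, 0.25, 0.5]  ->  1 : 1 : 2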
Gerstein, Sonnhammer & Chothia
Works up the tree, incrementing the weights
– Initially the weights are set to the leaf edge lengths $t_n$ (the resistances in the previous example)
– Moving up the tree, each internal edge's length $t_n$ is shared among the leaves $i$ below node $n$, in proportion to their current weights:
$\Delta w_i = t_n \, \dfrac{w_i}{\sum_{\text{leaves } k \text{ below } n} w_k}$
Gerstein, Sonnhammer & Chothia

Initially $w_1 = 2,\ w_2 = 2,\ w_3 = 4$ (the leaf edge lengths).
The internal edge of length 3 above node 4 is shared between leaves 1 and 2:
$w_1 = w_2 = 2 + 3 \cdot \tfrac{2}{2+2} = 2 + 1.5 = 3.5$
$\Rightarrow\quad w_1 : w_2 : w_3 = 3.5 : 3.5 : 4 = 7 : 7 : 8$
[Figure: the same example tree, with edge lengths 2 and 2 to leaves 1 and 2, 3 on the internal edge, and 4 to leaf 3.]
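A minimal Python sketch of this bottom-up scheme for the same example tree (the data layout and function name are my own):

    # Leaf weights start at the leaf edge lengths; each internal edge's length is
    # then shared among the leaves below it, proportionally to their current weights.
    weights = {1: 2.0, 2: 2.0, 3: 4.0}          # initial weights = leaf edge lengths

    def share_edge(edge_length, leaves_below, weights):
        total = sum(weights[k] for k in leaves_below)
        for k in leaves_below:
            weights[k] += edge_length * weights[k] / total

    share_edge(3.0, [1, 2], weights)            # internal edge of length 3 above node 4
    print(weights)                              # {1: 3.5, 2: 3.5, 3: 4.0}  ->  7 : 7 : 8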
Gerstein, Sonnhammer & Chothia

Small difference with Thompson, Higgins & Gibson?
For a two-leaf tree with edge lengths 1 and 2 below the root:
T, H & G: $I_1 : I_2 = 2 : 1$
G, S & C: $w_1 : w_2 = 1 : 2$
Root weights from Gaussian parameters

Continuous instead of discrete members of an alphabet
Probability density instead of a substitution matrix
Example: Gaussian
$P(x \mid y, t) \propto \exp\!\left(-\dfrac{(x-y)^2}{2t}\right)$
Root weights from Gaussian parameters

$P(x \text{ at node } 4 \mid L_1, L_2) = K_1 \exp\!\left(-\dfrac{(x-x_1)^2}{2t_1}\right) \exp\!\left(-\dfrac{(x-x_2)^2}{2t_2}\right) = K_1 \exp\!\left(-\dfrac{(x - v_1 x_1 - v_2 x_2)^2}{2t_{12}}\right)$
with $v_1 = t_2/(t_1+t_2)$, $v_2 = t_1/(t_1+t_2)$ and $t_{12} = t_1 t_2/(t_1+t_2)$.
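This combination step is just completing the square in $x$; the $x$-independent remainder is absorbed into the constant:

$-\dfrac{(x-x_1)^2}{2t_1} - \dfrac{(x-x_2)^2}{2t_2} = -\dfrac{(x - v_1 x_1 - v_2 x_2)^2}{2\,t_1 t_2/(t_1+t_2)} - \dfrac{(x_1-x_2)^2}{2(t_1+t_2)}$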
Root weights from Gaussian parameters

Altschul-Carroll-Lipman weights for a tree with three leaves:
$P(x \text{ at node } 5 \mid L_1, L_2, L_3) = K_2 \exp\!\left(-\dfrac{(x - w_1 x_1 - w_2 x_2 - w_3 x_3)^2}{2t_{123}}\right)$
Root weights from Gaussian parameters

$w_1 = \dfrac{t_2 t_3}{t_1 t_2 + (t_3+t_4)(t_1+t_2)},\quad w_2 = \dfrac{t_1 t_3}{t_1 t_2 + (t_3+t_4)(t_1+t_2)},\quad w_3 = \dfrac{t_1 t_2 + t_4(t_1+t_2)}{t_1 t_2 + (t_3+t_4)(t_1+t_2)}$
For the example tree: $w_1 : w_2 : w_3 = 1 : 1 : 2$
[Figure: the same example tree, with edges of length 2 to leaves 1 and 2, an internal edge of length 3, and an edge of length 4 to leaf 3.]
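Plugging the example edge lengths into these formulas (assuming, as in the earlier tree, $t_1 = t_2 = 2$ for leaves 1 and 2, $t_4 = 3$ for the internal edge and $t_3 = 4$ for leaf 3):

$t_1 t_2 + (t_3+t_4)(t_1+t_2) = 4 + 7 \cdot 4 = 32$
$w_1 = \dfrac{2 \cdot 4}{32} = \dfrac{8}{32},\quad w_2 = \dfrac{2 \cdot 4}{32} = \dfrac{8}{32},\quad w_3 = \dfrac{4 + 3 \cdot 4}{32} = \dfrac{16}{32}$
so $w_1 : w_2 : w_3 = 1 : 1 : 2$.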
Weighting schemes based on trees
Thompson, Higgins & Gibson (electric current): 1 : 1 : 2
Gerstein, Sonnhammer & Chothia: 7 : 7 : 8
Altschul-Carroll-Lipman weights for a tree with three leaves: 1 : 1 : 2
Weighting scheme using ‘sequence space’

Voronoi weights:
$w_i = \dfrac{n_i}{\sum_k n_k}$
where $n_i$ is the number of randomly sampled sequences that lie closest to sequence $i$.
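A minimal Monte-Carlo sketch of Voronoi weighting in Python (the sampling scheme is an assumption made for illustration: uniform random sequences over the DNA alphabet, Hamming distance, ties split evenly):

    import random

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def voronoi_weights(seqs, n_samples=10000, alphabet="ACGT"):
        counts = [0.0] * len(seqs)                # n_i: samples closest to sequence i
        length = len(seqs[0])
        for _ in range(n_samples):
            r = "".join(random.choice(alphabet) for _ in range(length))
            dists = [hamming(r, s) for s in seqs]
            best = min(dists)
            winners = [i for i, d in enumerate(dists) if d == best]
            for i in winners:                     # split ties evenly
                counts[i] += 1.0 / len(winners)
        total = sum(counts)
        return [n / total for n in counts]        # w_i = n_i / sum_k n_k

    print(voronoi_weights(["AGAA", "CCTC", "AGTC"]))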
More weighting schemes
Maximum discrimination weights
Maximum entropy weights
– Based on averaging
– Based on maximum ‘uniformity’ (entropy)
Maximum discrimination weights
Does not try to maximize likelihood or posterior probability
Instead, it focuses on deciding whether a sequence is a member of a family
Maximum discrimination weights
$P(M \mid x) = \dfrac{P(x \mid M)\,P(M)}{P(x \mid M)\,P(M) + P(x \mid R)\,P(R)}$
Discrimination $D$:
$D = \prod_k P(M \mid x^k)$
Maximize $D$; the emphasis is then on distant or difficult members
Maximum discrimination weights
Differences with the previous schemes
– Iterative method
  Initial weights give rise to a model
  Newly calculated posterior probabilities P(M|x) give rise to new weights, and hence a new model, until convergence is reached
– It optimizes performance for what the model is designed for: classifying whether a sequence is a member of a family
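A schematic Python sketch of that iterative loop. The model-fitting and likelihood functions are placeholders (they are not specified on the slides), and the reweighting rule used here (more weight for sequences with low P(M|x)) is one way to put the emphasis on distant or difficult members:

    def max_discrimination_weights(seqs, fit_model, lik_M, lik_R,
                                   prior_M=0.5, n_iter=20):
        weights = [1.0 / len(seqs)] * len(seqs)      # start from uniform weights
        for _ in range(n_iter):
            model = fit_model(seqs, weights)         # model built from weighted sequences
            posteriors = []
            for x in seqs:
                pM = lik_M(model, x) * prior_M       # P(x|M) P(M)
                pR = lik_R(x) * (1.0 - prior_M)      # P(x|R) P(R)
                posteriors.append(pM / (pM + pR))    # P(M|x)
            # emphasise the sequences the current model classifies poorly
            weights = [1.0 - p + 1e-9 for p in posteriors]
            total = sum(weights)
            weights = [w / total for w in weights]
        return weights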
Maximum entropy weights
Entropy = a measure of the average uncertainty of an outcome (maximal when we are maximally uncertain about the outcome)
Averaging:
$w_k = \sum_i \dfrac{1}{m_i\, k_{i x_i^k}}$
with weight $w_k$ for sequence $k$, $m_i$ the number of different residue types in column $i$, and $k_{ia}$ the number of residues of type $a$ in column $i$.
Maximum entropy weights
Sequences: AGAA, CCTC, AGTC

Per-column contributions $1/(m_i k_{i x_i^k})$ and their sums:

            col 1   col 2   col 3   col 4   sum
    AGAA     1/4     1/4     1/2     1/2    3/2
    CCTC     1/2     1/2     1/4     1/4    3/2
    AGTC     1/4     1/4     1/4     1/4     1

Normalized: $w_1 = \tfrac{3}{8},\ w_2 = \tfrac{3}{8},\ w_3 = \tfrac{2}{8}$

For example, in column 1: $m_1 = 2$ (A and C), $k_{1A} = 2$, $k_{1C} = 1$.
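A minimal Python sketch of this ‘averaging’ weighting that reproduces the numbers above (the function name is mine):

    from collections import Counter

    def averaging_weights(seqs):
        raw = [0.0] * len(seqs)
        for i in range(len(seqs[0])):
            counts = Counter(s[i] for s in seqs)   # k_{ia}: residues of type a in column i
            m_i = len(counts)                      # number of different residue types
            for k, s in enumerate(seqs):
                raw[k] += 1.0 / (m_i * counts[s[i]])
        total = sum(raw)
        return [w / total for w in raw]

    print(averaging_weights(["AGAA", "CCTC", "AGTC"]))   # [0.375, 0.375, 0.25] = 3/8, 3/8, 2/8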
Maximum entropy weights
‘Uniformity’:
Shannon: $H(X) = -\sum_i P(x_i) \log P(x_i)$
Choose the weights $w_k$ to maximize the total column entropy $\sum_i H_i(w)$, where
$H_i(w) = -\sum_a p_{ia} \log(p_{ia})$
and $p_{ia}$ is the weighted frequency of residue $a$ in column $i$ (the sum of the weights $w_k$ of the sequences with residue $a$ at position $i$).
Maximum entropy weights
Sequences: AGAA, CCTC, AGTC
$H_1(w) = -(w_1+w_3)\log(w_1+w_3) - w_2 \log w_2$
$H_2(w) = -(w_1+w_3)\log(w_1+w_3) - w_2 \log w_2$
$H_3(w) = -w_1 \log w_1 - (w_2+w_3)\log(w_2+w_3)$
$H_4(w) = -w_1 \log w_1 - (w_2+w_3)\log(w_2+w_3)$
Maximum entropy weights
Setting the derivatives of $\sum_i H_i(w)$ equal to each other (with $w_1 + w_2 + w_3 = 1$) gives
$(w_1+w_3)^2\, w_1^2 = w_2^2\, (w_2+w_3)^2 = (w_1+w_3)^2\, (w_2+w_3)^2$
Solving the equations leads to:
$w_1 = \tfrac{1}{2},\quad w_2 = \tfrac{1}{2},\quad w_3 = 0$
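A brute-force Python check of this result (a coarse grid search over the weight simplex; purely illustrative, not how one would solve it in practice):

    import math

    seqs = ["AGAA", "CCTC", "AGTC"]

    def total_entropy(w):
        H = 0.0
        for i in range(len(seqs[0])):
            p = {}                                   # weighted residue frequencies p_ia
            for wk, s in zip(w, seqs):
                p[s[i]] = p.get(s[i], 0.0) + wk
            H -= sum(q * math.log(q) for q in p.values() if q > 0)
        return H

    steps = 100
    best_H, best_w = -1.0, None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            w = (a / steps, b / steps, (steps - a - b) / steps)
            H = total_entropy(w)
            if H > best_H:
                best_H, best_w = H, w

    print(best_w)                                    # (0.5, 0.5, 0.0)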
Summary of the entropy methods
Maximum entropy weights (averaging):
$w_1 = \tfrac{3}{8},\quad w_2 = \tfrac{3}{8},\quad w_3 = \tfrac{2}{8}$
Maximum entropy weights (‘uniformity’):
$w_1 = \tfrac{1}{2},\quad w_2 = \tfrac{1}{2},\quad w_3 = 0$
Conclusion
Many different methods
Which one to use depends on the problem
Questions??