Proofs and Details for the paper
“A Generative-Discriminative Hybrid Method for Multi-View Object Detection”
1 Proof of Theorem 1
Proof. We start from the posterior probability $p(X|O, H=1)$. By Bayes' rule,
\[
p(X|O, H=1) = \frac{1}{C}\, p(O|X, H=1)\, p(X|H=1)
\]
where $C$ is the normalization term, which is the positive likelihood $p(O|H=1)$:
\[
C = p(O|H=1) = \sum_X p(O|X, H=1)\, p(X|H=1) \tag{1}
\]
Next, let us rewrite the posterior probability $p(X|O, H=1)$ as
\[
p(X|O, H=1) = \frac{p(O|X, H=1)\, p(X|H=1)\, Z}{\prod_u f^{+}_{B_1}(y_u) \prod_{uv} f^{+}_{B_2}(y_{uv})} \cdot \frac{\prod_u f^{+}_{B_1}(y_u) \prod_{uv} f^{+}_{B_2}(y_{uv})}{C Z} \tag{2}
\]
Using the independence assumption
\[
p(O|X, H=1) = \prod_{uv} p(y_{uv} \,|\, x_{1u}, x_{1v}, \ldots, x_{Nu}, x_{Nv}, H=1) \prod_{u} p(y_u \,|\, x_{1u}, \ldots, x_{Nu}, H=1)
\]
and plugging in the parameter mapping equations (see also Section 2.2, page 4, of the main paper)
\[
p(y_u \,|\, x_{11}=0, \ldots, x_{iu}=1, \ldots, H=1) = f_i(y_u)
\]
\[
p(y_{uv} \,|\, x_{11}=0, \ldots, x_{iu}=1, x_{jv}=1, \ldots, H=1) = f_{ij}(y_{uv})
\]
and
\[
p(y_u \,|\, x_{11}=0, x_{iu}=0, \ldots, x_{NM}=0, H=1) = f^{+}_{B_1}(y_u)
\]
\[
p(y_{uv} \,|\, x_{11}=0, x_{iu}=0, \ldots, x_{NM}=0, H=1) = f^{+}_{B_2}(y_{uv})
\]
Comparing with the terms of the Gibbs distribution in Eq. (8) of the main paper, we note that for every matching $X$ we have
\[
\frac{p(O|X, H=1)\, p(X|H=1)\, Z}{\prod_u f^{+}_{B_1}(y_u) \prod_{uv} f^{+}_{B_2}(y_{uv})} = \prod_{iu,jv} \varsigma_{iu,jv}(x_{iu}, x_{jv}) \prod_{iu} \eta_{iu}(x_{iu})
\]
Since $p(X|O, H=1)$ and the Gibbs distribution in Eq. (7) of the main paper are defined over the same space of matchings, their normalization constants must also be equal, i.e.
\[
\frac{C Z}{\prod_u f^{+}_{B_1}(y_u) \prod_{uv} f^{+}_{B_2}(y_{uv})} = Z'
\]
Therefore the positive likelihood is
\[
p(O|H=1) = C = \frac{Z'}{Z} \prod_u f^{+}_{B_1}(y_u) \prod_{uv} f^{+}_{B_2}(y_{uv}) \tag{3}
\]
and the likelihood ratio is
\[
\frac{p(O|H=1)}{p(O|H=0)} = \frac{Z' \prod_u f^{+}_{B_1}(y_u) \prod_{uv} f^{+}_{B_2}(y_{uv})}{Z \prod_u f^{-}_{B_1}(y_u) \prod_{uv} f^{-}_{B_2}(y_{uv})} = \frac{Z'}{Z}\, \sigma \tag{4}
\]
It is also easy to show that the same result holds for the integer-based representation of the matching.
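For intuition, here is a minimal numerical sketch (in Python) of the normalization argument behind Eqs. (1)–(3); all numbers and array names are made up for illustration and are not part of the paper.

```python
import numpy as np

# Toy check of the normalization argument: if, for every matching X,
#   p(O|X, H=1) p(X|H=1) * Z / B  equals the unnormalized Gibbs term g(X),
# then summing over X gives  C * Z / B = Z', hence C = (Z'/Z) * B  as in Eq. (3).
rng = np.random.default_rng(0)

n_matchings = 8                              # pretend there are 8 admissible matchings X
joint = rng.uniform(0.1, 1.0, n_matchings)   # stand-in for p(O|X,H=1) p(X|H=1)
B = 0.37                                     # stand-in for prod_u f_B1^+ * prod_uv f_B2^+
Z = 2.9                                      # partition function of the matching prior

g = joint * Z / B                            # unnormalized Gibbs terms, one per matching
Z_prime = g.sum()                            # their normalizer Z'

C_direct = joint.sum()                       # C = p(O|H=1) by direct summation, Eq. (1)
C_theorem = (Z_prime / Z) * B                # C recovered via Eq. (3)
assert np.isclose(C_direct, C_theorem)
```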
2 Proof of Lemma 1
Proof. We first prove the special case $N \le M$ for the unpruned MRF. We enumerate the matchings in which $i$ nodes $n_{I_1}, n_{I_2}, \ldots, n_{I_i}$ of the RARG are matched to nodes of the ARG, where $1 \le i \le N$ and $I_1, I_2, \ldots, I_i$ are the indices of the RARG nodes. The corresponding summation is
\[
M(M-1)(M-2)\cdots(M-i+1)\, z_{I_1} z_{I_2} \cdots z_{I_i} = \binom{M}{i}\, i!\, z_{I_1} z_{I_2} \cdots z_{I_i}
\]
Summing over all matchings in which exactly $i$ RARG nodes are matched, this becomes
\[
\binom{M}{i}\, i! \sum_{1 \le I_1 < I_2 < \cdots < I_i \le N} z_{I_1} z_{I_2} \cdots z_{I_i} = \binom{M}{i}\, i!\, \Pi_i(z_1, z_2, \ldots, z_N)
\]
where
\[
\Pi_i(z_1, z_2, \ldots, z_N) = \sum_{1 \le I_1 < I_2 < \cdots < I_i \le N} z_{I_1} z_{I_2} \cdots z_{I_i}
\]
is the elementary symmetric polynomial of degree $i$. Therefore $Z$ can be written in the following form and satisfies the inequality
\[
Z = \sum_{i=0}^{N} M(M-1)\cdots(M-i+1)\, \Pi_i(z_1, z_2, \ldots, z_N) \le \sum_{i=0}^{N} M^i\, \Pi_i(z_1, z_2, \ldots, z_N) \tag{5}
\]
The bound becomes tight as $N/M$ tends to zero. We also have the following relationship
\[
\sum_{i=0}^{N} \Pi_i(z_1, z_2, \ldots, z_N) = 1 + z_1 + z_2 + \cdots + z_N + z_1 z_2 + \cdots + z_{N-1} z_N + \cdots = \prod_{i=1}^{N} (1 + z_i)
\]
and
\[
M^i\, \Pi_i(z_1, z_2, \ldots, z_N) = \Pi_i(M z_1, M z_2, \ldots, M z_N)
\]
Therefore, the right-hand side of Eq. (5) can be simplified as
\[
\sum_{i=0}^{N} M^i\, \Pi_i(z_1, z_2, \ldots, z_N) = \sum_{i=0}^{N} \Pi_i(M z_1, M z_2, \ldots, M z_N) = \prod_{i=1}^{N} (1 + M z_i)
\]
This last expression is in fact the partition function of the Gibbs distribution with the one-to-one constraints removed. Likewise, in the general case (whether the MRF is pruned or not), the partition function is upper-bounded by the partition function of the Gibbs distribution with the one-to-one constraints removed, which, for the pruned MRF, can be obtained by enumerating the matchings as
\[
1 + d_1 z_1 + d_2 z_2 + \cdots + d_N z_N + d_1 d_2 z_1 z_2 + \cdots = \prod_{i=1}^{N} (1 + d_i z_i)
\]
where $d_i$ is the number of nodes in the ARG that are allowed to match node $i$ in the RARG after pruning the Association Graph. Therefore we have
\[
Z \le \prod_{i=1}^{N} (1 + d_i z_i), \qquad \text{i.e.} \quad \ln Z \le \sum_{i=1}^{N} \ln(1 + d_i z_i)
\]
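As a sanity check of Lemma 1, the following Python sketch enumerates the one-to-one matchings of an unpruned MRF (so $d_i = M$) for small toy sizes and verifies the bound $Z \le \prod_i (1 + M z_i)$; the sizes and weights are hypothetical.

```python
import itertools
import numpy as np

# Brute-force check of Lemma 1 for the unpruned MRF with N <= M (so d_i = M):
# Z = sum over one-to-one matchings of the product of z_i over matched RARG nodes,
# and Z <= prod_i (1 + M z_i), the partition function without one-to-one constraints.
N, M = 3, 5                               # toy RARG and ARG sizes
z = np.array([0.2, 0.05, 0.4])            # hypothetical per-node weights z_i

Z = 0.0
for i in range(N + 1):                    # i = number of matched RARG nodes
    for subset in itertools.combinations(range(N), i):
        # number of injective assignments of the i chosen RARG nodes to the M ARG nodes
        n_assign = np.prod([M - t for t in range(i)])
        Z += n_assign * np.prod([z[j] for j in subset])

bound = np.prod(1.0 + M * z)
print(Z, bound)
assert Z <= bound
```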
3 Proof of Lemma 2
Proof. The partition function can be calculated by enumerating the admissible matchings (matchings that do not violate the one-to-one constraint) as follows
\[
Z(N; M; z_1, z_2, \ldots, z_N) = \sum_X \prod_{iu,jv} \psi_{iu,jv}(x_{iu}, x_{jv}) \prod_{iu} \phi_{iu}(x_{iu}) = \sum_{\text{admissible } X} \;\; \prod_{iu:\, x_{iu}=1} z_i
\]
where $z_i = \phi_{iu}(1)$ is defined in Section 2.2 of the main paper. Therefore, the partition function is a summation of monomials whose variables have maximum power $1$; this holds for both the pruned and unpruned MRF. We can separate the above summation of monomials into two polynomials, one containing $z_i$ and one not:
\[
Z(N; M; z_1, z_2, \ldots, z_N) = V_1(z_1, z_2, \ldots, z_i, \ldots, z_N) + V_2(z_1, z_2, \ldots, z_{i-1}, z_{i+1}, \ldots, z_N)
\]
Then the occurrence probability $r_i$, defined as the prior probability of the occurrence of node $i$ in the generated ARG, is
\[
r_i = \frac{V_1}{Z} = \frac{z_i \frac{\partial V_1}{\partial z_i}}{Z} = \frac{z_i \frac{\partial Z}{\partial z_i}}{Z} = z_i \frac{\partial \ln Z}{\partial z_i}
\]
where we have used the fact that $V_1$ is a summation of monomials of the form $z_{I_1} z_{I_2} \cdots z_{I_L}$, which have the following invariant property
\[
z_{I_1} z_{I_2} \cdots z_{I_L} = z_{I_k} \frac{\partial}{\partial z_{I_k}} \left( z_{I_1} z_{I_2} \cdots z_{I_L} \right), \qquad \forall\, I_k \in \{I_1, I_2, \ldots, I_L\}
\]
Note also that $\partial Z / \partial z_i = \partial V_1 / \partial z_i$, since $V_2$ does not depend on $z_i$.
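The identity $r_i = z_i\, \partial \ln Z / \partial z_i$ can also be checked numerically; the Python sketch below enumerates the admissible matchings for toy sizes (sizes and weights are assumptions for illustration) and compares the exact occurrence probability with a finite-difference estimate of the derivative.

```python
import itertools
import numpy as np

N, M = 3, 4                               # toy RARG and ARG sizes
z = np.array([0.3, 0.8, 0.15])            # hypothetical weights z_i = phi_iu(1)

def partition_and_occurrence(z):
    """Enumerate admissible (one-to-one) matchings; each matching is weighted by the
    product of z_i over matched RARG nodes. Returns Z and the occurrence probabilities r_i."""
    Z, occ = 0.0, np.zeros(N)
    for i in range(N + 1):
        for subset in itertools.combinations(range(N), i):
            for _ in itertools.permutations(range(M), i):   # injective assignments
                w = np.prod([z[j] for j in subset])
                Z += w
                occ[list(subset)] += w
    return Z, occ / Z

Z, r = partition_and_occurrence(z)

# compare r_i with z_i * d(ln Z)/dz_i computed by a central finite difference
eps = 1e-6
for i in range(N):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    dlnZ = (np.log(partition_and_occurrence(zp)[0]) -
            np.log(partition_and_occurrence(zm)[0])) / (2 * eps)
    assert np.isclose(r[i], z[i] * dlnZ, atol=1e-5)
```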
4 Proof of Equation (10) in the Main Paper
We need to estimate the occurrence probability $r_i$. The overall log-likelihood of the positive training data is defined as
\[
L = \sum_{k=1}^{K} \ln p(O_k \,|\, H=1) \tag{6}
\]
where $K$ is the number of positive training instances. The variational approximation of the overall log-likelihood is
\[
L \approx \sum_{k=1}^{K} \sum_{iu} \hat{q}(x^k_{iu} = 1) \ln z_i \;-\; K \ln Z(N; M; z_1, z_2, \ldots, z_N) \;+\; \alpha \tag{7}
\]
where $\alpha$ is a term independent of the occurrence probabilities $r_1, r_2, \ldots, r_N$. To maximize the approximated likelihood with respect to $z_i$, we take the partial derivative of Eq. (7) with respect to $z_i$ and equate it to zero. With the help of Lemma 2, we obtain
\[
\frac{\partial L}{\partial z_i}
= \frac{\partial}{\partial z_i} \Big[ \sum_{k=1}^{K} \sum_{iu} \hat{q}(x^k_{iu} = 1) \ln z_i \Big]
- K \frac{\partial}{\partial z_i} \ln Z(N; M; z_1, z_2, \ldots, z_N)
= \frac{1}{z_i} \sum_{k=1}^{K} \sum_{u} \hat{q}(x^k_{iu} = 1) - K\, \frac{r_i}{z_i} = 0 \tag{8}
\]
Since $z_i$ is assumed to be non-zero, the above equation leads to the defining equation of $r_i$:
\[
r_i = \frac{1}{K} \sum_{k} \sum_{u} \hat{q}(x^k_{iu} = 1) \tag{9}
\]
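In code, the update in Eq. (9) is a single averaging step over the posterior marginals delivered by the E-step. The sketch below assumes a hypothetical array `q_hat` of shape `(K, N, M)` holding $\hat{q}(x^k_{iu}=1)$.

```python
import numpy as np

K, N, M = 10, 4, 6                                   # toy numbers of instances / RARG / ARG nodes
rng = np.random.default_rng(1)
q_hat = rng.uniform(0.0, 1.0 / M, size=(K, N, M))    # stand-in E-step marginals q_hat(x^k_iu = 1)

# Eq. (9): r_i = (1/K) * sum_k sum_u q_hat(x^k_iu = 1)
r = q_hat.sum(axis=2).mean(axis=0)
print(r)                                             # one occurrence probability per RARG node
```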
5 The Update Equations for the Gaussian Density Functions at Nodes and Edges of the RARG
As referenced in the E-M step in Section 2.4 of the main paper (page 5), the E-M update equations are listed as follows:
\[
\xi^k_{iu} = \hat{q}(x^k_{iu} = 1), \quad \xi^k_{iu,jv} = \hat{q}(x^k_{iu} = 1, x^k_{jv} = 1); \qquad \bar{\xi}^k_{iu} = 1 - \xi^k_{iu}, \quad \bar{\xi}^k_{iu,jv} = 1 - \xi^k_{iu,jv}
\]
\[
\mu_i = \frac{\sum_k \sum_u \xi^k_{iu}\, y^k_u}{\sum_k \sum_u \xi^k_{iu}}, \qquad
\Sigma_i = \frac{\sum_k \sum_u \xi^k_{iu}\, (y^k_u - \mu_i)(y^k_u - \mu_i)^T}{\sum_k \sum_u \xi^k_{iu}}
\]
\[
\mu_{ij} = \frac{\sum_k \sum_{uv} \xi^k_{iu,jv}\, y^k_{uv}}{\sum_k \sum_{uv} \xi^k_{iu,jv}}, \qquad
\Sigma_{ij} = \frac{\sum_k \sum_{uv} \xi^k_{iu,jv}\, (y^k_{uv} - \mu_{ij})(y^k_{uv} - \mu_{ij})^T}{\sum_k \sum_{uv} \xi^k_{iu,jv}}
\]
\[
\mu^{+}_{B_1} = \frac{\sum_k \sum_{iu} \bar{\xi}^k_{iu}\, y^k_u}{\sum_k \sum_{iu} \bar{\xi}^k_{iu}}, \qquad
\Sigma^{+}_{B_1} = \frac{\sum_k \sum_{iu} \bar{\xi}^k_{iu}\, (y^k_u - \mu^{+}_{B_1})(y^k_u - \mu^{+}_{B_1})^T}{\sum_k \sum_{iu} \bar{\xi}^k_{iu}}
\]
\[
\mu^{+}_{B_2} = \frac{\sum_k \sum_{iu,jv} \bar{\xi}^k_{iu,jv}\, y^k_{uv}}{\sum_k \sum_{iu,jv} \bar{\xi}^k_{iu,jv}}, \qquad
\Sigma^{+}_{B_2} = \frac{\sum_k \sum_{iu,jv} \bar{\xi}^k_{iu,jv}\, (y^k_{uv} - \mu^{+}_{B_2})(y^k_{uv} - \mu^{+}_{B_2})^T}{\sum_k \sum_{iu,jv} \bar{\xi}^k_{iu,jv}}
\]
where $k$ indexes the positive training instances (as in Eq. (6)), $\mu_i$ and $\Sigma_i$ are the mean and covariance matrix of the Gaussian random variable at the nodes of the Random ARG, and $\mu_{ij}$ and $\Sigma_{ij}$ are the mean and covariance matrix of the Gaussian random variable at the edges of the Random ARG. $\mu^{+}_{B_1}$, $\mu^{+}_{B_2}$, $\Sigma^{+}_{B_1}$, $\Sigma^{+}_{B_2}$ are the corresponding parameters of the Gaussian random variables modelling the background.
The above equations are obtained by maximizing the variational approximation of the overall log-likelihood of the positive training data defined in Eq. (6).
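The node-level updates are standard responsibility-weighted mean and covariance estimates. The Python sketch below illustrates them for a single RARG node $i$; the array shapes (`xi` of shape `(K, N, M)`, node features `y` of shape `(K, M, D)`) and the function name are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def update_node_gaussian(weights, y, i):
    """Responsibility-weighted mean and covariance for the Gaussian at RARG node i.
    weights[k, i, u] plays the role of xi^k_iu, y[k, u, :] is the feature vector of
    ARG node u in training instance k."""
    w = weights[:, i, :]                                  # (K, M) weights for node i
    wsum = w.sum()
    mu = (w[..., None] * y).sum(axis=(0, 1)) / wsum       # weighted mean
    d = y - mu                                            # centred features, (K, M, D)
    sigma = np.einsum('ku,kud,kue->de', w, d, d) / wsum   # weighted covariance
    return mu, sigma

# usage sketch with random stand-in data
K, N, M, D = 10, 4, 6, 3
rng = np.random.default_rng(2)
xi = rng.uniform(0.0, 1.0 / M, size=(K, N, M))            # E-step responsibilities
y = rng.normal(size=(K, M, D))                            # node features
mu_1, sigma_1 = update_node_gaussian(xi, y, 1)            # Gaussian at RARG node i = 1

# the background parameters (mu_B1^+, Sigma_B1^+) pool the complementary weights
# 1 - xi over all nodes i and u rather than fixing a single i
```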