Proofs and Details for the paper “A Generative-Discriminative Hybrid Method for Multi-View Object Detection”

1 Proof of Theorem 1

Proof. We start from the posterior probability p(X|O, H = 1). By Bayes' rule,

p(X|O, H = 1) = \frac{1}{C}\, p(O|X, H = 1)\, p(X|H = 1)

where C is the normalization term, which is the positive likelihood p(O|H = 1):

C = p(O|H = 1) = \sum_X p(O|X, H = 1)\, P(X|H = 1)    (1)

Next, let us rewrite the posterior probability p(X|O, H = 1) as

p(X|O, H = 1) = \frac{p(O|X, H = 1)\, p(X|H = 1)\, Z}{\prod_u f^+_{B1}(y_u) \prod_{uv} f^+_{B2}(y_{uv})} \cdot \frac{\prod_u f^+_{B1}(y_u) \prod_{uv} f^+_{B2}(y_{uv})}{C Z}    (2)

Using the independence assumption

p(O|X, H = 1) = \prod_{uv} p(y_{uv}|x_{1u}, x_{1v}, \ldots, x_{Nu}, x_{Nv}, H = 1) \prod_u p(y_u|x_{1u}, \ldots, x_{Nu}, H = 1)

and plugging in the parameter mapping equations (see also page 4, Section 2.2 of the main paper)

p(y_u | x_{11} = 0, \ldots, x_{iu} = 1, \ldots, H = 1) = f_i(y_u)
p(y_{uv} | x_{11} = 0, \ldots, x_{iu} = 1, x_{jv} = 1, \ldots, H = 1) = f_{ij}(y_{uv})

and

p(y_u | x_{11} = 0, x_{iu} = 0, \ldots, x_{NM} = 0, H = 1) = f^+_{B1}(y_u)
p(y_{uv} | x_{11} = 0, x_{iu} = 0, \ldots, x_{NM} = 0, H = 1) = f^+_{B2}(y_{uv})

and comparing with the terms of the Gibbs distribution in Eq. (8) of the main paper, we note that for every matching X we have

\frac{p(O|X, H = 1)\, p(X|H = 1)\, Z}{\prod_u f^+_{B1}(y_u) \prod_{uv} f^+_{B2}(y_{uv})} = \prod_{iu,jv} \varsigma_{iu,jv}(x_{iu}, x_{jv}) \prod_{iu} \eta_{iu}(x_{iu})

Since the spaces of all matchings of p(X|O, H = 1) and of the Gibbs distribution in Eq. (7) of the main paper are the same, the normalization constants must also be equal, i.e.,

\frac{C Z}{\prod_u f^+_{B1}(y_u) \prod_{uv} f^+_{B2}(y_{uv})} = Z'

Therefore the positive likelihood is

p(O|H = 1) = C = \frac{Z'}{Z} \prod_u f^+_{B1}(y_u) \prod_{uv} f^+_{B2}(y_{uv})    (3)

and the likelihood ratio is

\frac{p(O|H = 1)}{p(O|H = 0)} = \frac{Z' \prod_u f^+_{B1}(y_u) \prod_{uv} f^+_{B2}(y_{uv})}{Z \prod_u f^-_{B1}(y_u) \prod_{uv} f^-_{B2}(y_{uv})} = \sigma \frac{Z'}{Z}    (4)

It is also straightforward to show that the integer-based representation of the matching leads to the same result.

2 Proof of Lemma 1

Proof. We first prove the special case N \le M for the unpruned MRF. Enumerate the matchings in which exactly i nodes n_{I_1}, n_{I_2}, \ldots, n_{I_i} of the RARG are matched to nodes of the ARG, where 1 \le i \le N and I_1, I_2, \ldots, I_i are the indices of the RARG nodes. The corresponding summation is

M(M-1)(M-2)\cdots(M-i+1)\, z_{I_1} z_{I_2} \cdots z_{I_i} = \binom{M}{i}\, i!\, z_{I_1} z_{I_2} \cdots z_{I_i}

Over all matchings in which exactly i RARG nodes are matched, the summation becomes

\binom{M}{i}\, i! \sum_{1 \le I_1 < I_2 < \cdots < I_i \le N} z_{I_1} z_{I_2} \cdots z_{I_i} = \binom{M}{i}\, i!\, \Pi_i(z_1, z_2, \ldots, z_N)

where

\Pi_i(z_1, z_2, \ldots, z_N) = \sum_{1 \le I_1 < I_2 < \cdots < I_i \le N} z_{I_1} z_{I_2} \cdots z_{I_i}

is shorthand for the elementary symmetric polynomial. Therefore Z can be written in the following form and satisfies the inequality

Z = \sum_{i=0}^{N} M(M-1)\cdots(M-i+1)\, \Pi_i(z_1, z_2, \ldots, z_N) \le \sum_{i=0}^{N} M^i\, \Pi_i(z_1, z_2, \ldots, z_N)    (5)

The inequality approaches equality as N/M tends to zero. We also have the relationships

\sum_{i=0}^{N} \Pi_i(z_1, z_2, \ldots, z_N) = 1 + z_1 + z_2 + \cdots + z_N + z_1 z_2 + \cdots + z_{N-1} z_N + \cdots = \prod_{i=1}^{N} (1 + z_i)

and

M^i\, \Pi_i(z_1, z_2, \ldots, z_N) = \Pi_i(M z_1, M z_2, \ldots, M z_N)

Therefore the right-hand side of inequality (5) simplifies to

\sum_{i=0}^{N} M^i\, \Pi_i(z_1, z_2, \ldots, z_N) = \sum_{i=0}^{N} \Pi_i(M z_1, M z_2, \ldots, M z_N) = \prod_{i=1}^{N} (1 + M z_i)

This function is in fact the partition function of the Gibbs distribution obtained by removing the one-to-one constraints.
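Before turning to the general case, a brief numerical sanity check of the special case may be helpful. The following Python sketch is our own illustration (not part of the original derivation): it enumerates all partial one-to-one matchings of N RARG nodes to M ARG nodes for small, arbitrarily chosen weights z_i, computes Z as in inequality (5), and confirms the bound Z \le \prod_i (1 + M z_i).

```python
# Illustrative check of the special case of Lemma 1 (N <= M, unpruned MRF).
# We enumerate every partial one-to-one matching of N RARG nodes to M ARG nodes,
# sum the corresponding monomials to obtain Z, and compare with prod_i (1 + M z_i).
from itertools import combinations
from math import prod

def partition_function(z, M):
    """Z = sum over all one-to-one (partial) matchings of the product of z_i
    over the matched RARG nodes; the empty matching contributes 1."""
    N = len(z)
    total = 1.0
    for i in range(1, N + 1):
        # M(M-1)...(M-i+1) ways to place i chosen RARG nodes onto distinct ARG nodes
        n_placements = prod(range(M - i + 1, M + 1))
        for subset in combinations(range(N), i):
            total += n_placements * prod(z[j] for j in subset)
    return total

z = [0.3, 0.7, 0.1]   # toy node weights z_i = phi_iu(1); values chosen arbitrarily
M = 5                 # number of ARG nodes
Z = partition_function(z, M)
bound = prod(1 + M * zi for zi in z)
print(Z, bound, Z <= bound)   # the Lemma 1 bound should hold
```

With these toy values the script reports Z = 13.96 against the bound 16.875, consistent with inequality (5).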
Likewise, in the general case, and for the pruned MRF as well, the partition function is upper-bounded by the partition function of the Gibbs distribution obtained by removing the one-to-one constraints, which for the pruned MRF, by enumerating the matchings, can be written as

1 + d_1 z_1 + d_2 z_2 + \cdots + d_N z_N + d_1 d_2 z_1 z_2 + \cdots = \prod_{i=1}^{N} (1 + d_i z_i)

where d_i is the number of nodes in the ARG that are allowed to match node i of the RARG after pruning the association graph. Therefore we have

Z \le \prod_{i=1}^{N} (1 + d_i z_i), \quad \text{i.e.,} \quad \ln Z \le \sum_{i=1}^{N} \ln(1 + d_i z_i)

3 Proof of Lemma 2

Proof. The partition function can be calculated by enumerating the admissible matchings (matchings that do not violate the one-to-one constraint):

Z(N; M; z_1, z_2, \ldots, z_N) = \sum_X \prod_{iu,jv} \psi_{iu,jv}(x_{iu}, x_{jv}) \prod_{iu} \phi_{iu}(x_{iu}) = \sum_{\text{admissible } X} \prod_{iu} z_i^{x_{iu}}

where z_i = \phi_{iu}(1) is defined in Section 2.2 of the main paper. Therefore the partition function is a summation of monomials in which each variable appears with power at most 1, and this holds for both the pruned and the unpruned MRF. We can separate this monomial summation into two polynomials, the one containing z_i and the one that does not:

Z(N; M; z_1, z_2, \ldots, z_N) = V_1(z_1, z_2, \ldots, z_i, \ldots, z_N) + V_2(z_1, z_2, \ldots, z_{i-1}, z_{i+1}, \ldots, z_N)

Then the occurrence probability r_i, defined as the prior probability of the occurrence of node i in the generated ARG, is

r_i = \frac{V_1}{Z} = \frac{z_i \frac{\partial V_1}{\partial z_i}}{Z} = \frac{z_i \frac{\partial Z}{\partial z_i}}{Z} = z_i \frac{\partial \ln Z}{\partial z_i}

where we have used the fact that V_1 is a summation of monomials of the form z_{I_1} z_{I_2} \cdots z_{I_L}, which have the invariance property

z_{I_1} z_{I_2} \cdots z_{I_L} = z_{I_k} \frac{\partial}{\partial z_{I_k}} \big( z_{I_1} z_{I_2} \cdots z_{I_L} \big), \quad \forall I_k \in \{I_1, I_2, \ldots, I_L\}

and the fact that V_2 does not depend on z_i, so that \partial Z / \partial z_i = \partial V_1 / \partial z_i.

4 Proof of Equation (10) in the Main Paper

We need to estimate the occurrence probabilities r_i. The overall likelihood of the positive training data is defined as

L = \sum_{k=1}^{K} \ln p(O^k | H = 1)    (6)

where K is the number of positive training instances. The variational approximation of the overall log-likelihood is

L \approx \sum_{k=1}^{K} \sum_{iu} \hat{q}(x^k_{iu} = 1) \ln z_i - K \ln Z(N; M; z_1, z_2, \ldots, z_N) + \alpha    (7)

where \alpha is a term independent of the occurrence probabilities r_1, r_2, \ldots, r_N. To maximize the approximated likelihood with respect to z_i, we compute the partial derivative of Eq. (7) with respect to z_i and set it to zero. With the help of Lemma 2, we obtain

\frac{\partial L}{\partial z_i} = \frac{\partial}{\partial z_i} \Big[ \sum_{k=1}^{K} \sum_{iu} \hat{q}(x^k_{iu} = 1) \ln z_i \Big] - K \frac{\partial}{\partial z_i} \ln Z(N; M; z_1, z_2, \ldots, z_N) = \sum_{k=1}^{K} \sum_{u} \hat{q}(x^k_{iu} = 1) \frac{1}{z_i} - K \frac{r_i}{z_i} = 0    (8)

Since z_i is assumed to be non-zero, the above equation leads to the defining equation of r_i:

r_i = \frac{1}{K} \sum_k \sum_u \hat{q}(x^k_{iu} = 1)    (9)

5 The Update Equations for the Gaussian Density Functions at the Nodes and Edges of the RARG

These are the E-M update equations referred to by the E-M step in Section 2.4 of the main paper (page 5). With

\xi^k_{iu} = \hat{q}(x^k_{iu} = 1), \quad \xi^k_{iu,jv} = \hat{q}(x^k_{iu} = 1, x^k_{jv} = 1), \quad \bar{\xi}^k_{iu} = 1 - \xi^k_{iu}, \quad \bar{\xi}^k_{iu,jv} = 1 - \xi^k_{iu,jv}

the updates are

\mu_i = \frac{\sum_k \sum_u \xi^k_{iu} y^k_u}{\sum_k \sum_u \xi^k_{iu}}, \quad \Sigma_i = \frac{\sum_k \sum_u \xi^k_{iu} (y^k_u - \mu_i)(y^k_u - \mu_i)^T}{\sum_k \sum_u \xi^k_{iu}}

\mu_{ij} = \frac{\sum_k \sum_{uv} \xi^k_{iu,jv} y^k_{uv}}{\sum_k \sum_{uv} \xi^k_{iu,jv}}, \quad \Sigma_{ij} = \frac{\sum_k \sum_{uv} \xi^k_{iu,jv} (y^k_{uv} - \mu_{ij})(y^k_{uv} - \mu_{ij})^T}{\sum_k \sum_{uv} \xi^k_{iu,jv}}

\mu^+_{B1} = \frac{\sum_k \sum_u \bar{\xi}^k_{iu} y^k_u}{\sum_k \sum_u \bar{\xi}^k_{iu}}, \quad \Sigma^+_{B1} = \frac{\sum_k \sum_u \bar{\xi}^k_{iu} (y^k_u - \mu^+_{B1})(y^k_u - \mu^+_{B1})^T}{\sum_k \sum_u \bar{\xi}^k_{iu}}

\mu^+_{B2} = \frac{\sum_k \sum_{uv} \bar{\xi}^k_{iu,jv} y^k_{uv}}{\sum_k \sum_{uv} \bar{\xi}^k_{iu,jv}}, \quad \Sigma^+_{B2} = \frac{\sum_k \sum_{uv} \bar{\xi}^k_{iu,jv} (y^k_{uv} - \mu^+_{B2})(y^k_{uv} - \mu^+_{B2})^T}{\sum_k \sum_{uv} \bar{\xi}^k_{iu,jv}}

where k indexes the positive training instances, \mu_i and \Sigma_i are the mean and covariance matrix of the Gaussian random variable at the nodes of the random ARG, and \mu_{ij} and \Sigma_{ij} are the mean and covariance matrix of the Gaussian random variable at the edges of the random ARG. \mu^+_{B1}, \mu^+_{B2}, \Sigma^+_{B1}, \Sigma^+_{B2} are the corresponding parameters of the Gaussian random variables modelling the background. These equations are obtained by maximizing the variational approximation of the overall likelihood of the positive training data, as defined in equation (6).
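For concreteness, the following is a minimal sketch of how the node updates above translate into code. It is our own illustration with hypothetical array names (Y, xi), not code from the paper: each update is simply a responsibility-weighted sample mean and covariance, and the edge and background updates follow the same pattern with the weights \xi^k_{iu,jv} and 1 - \xi, respectively.

```python
import numpy as np

def weighted_gaussian_update(Y, W):
    """One M-step update: responsibility-weighted mean and covariance.

    Y : array of shape (n_samples, d)  -- stacked feature vectors y_u^k (or y_uv^k)
    W : array of shape (n_samples,)    -- stacked responsibilities xi_iu^k (or 1 - xi)
    Returns (mu, Sigma) as in the node/edge update equations above.
    """
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    w_sum = W.sum()
    mu = (W[:, None] * Y).sum(axis=0) / w_sum
    diff = Y - mu
    Sigma = (W[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0) / w_sum
    return mu, Sigma

# Toy usage with made-up numbers: four stacked observations of dimension 2.
Y = np.array([[0.1, 0.2], [0.3, 0.1], [0.0, 0.4], [0.2, 0.2]])
xi = np.array([0.9, 0.7, 0.2, 0.5])                    # responsibilities q_hat(x_iu^k = 1)
mu_i, Sigma_i = weighted_gaussian_update(Y, xi)        # foreground node Gaussian
mu_B1, Sigma_B1 = weighted_gaussian_update(Y, 1 - xi)  # background node Gaussian
```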