IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (TPAMI)

Attribute fusion in a latent process model for time series of graphs
Minh Tang, Youngser Park, Nam H. Lee, and Carey E. Priebe

Appendix: Proofs of some stated results

Corollary 2: The maximizer of $\mu_\lambda$ also maximizes
$$
\mu_\lambda^2 = \frac{\lambda^T \zeta \zeta^T \lambda}{\lambda^T \xi \lambda}.
$$
Because $\xi$ is positive definite, there exists a positive definite matrix $\xi^{1/2}$ such that $\xi^{1/2}\xi^{1/2} = \xi$. Letting $\nu = \xi^{1/2}\lambda$, the above expression can be rewritten as
$$
\mu_\lambda^2 = \frac{\nu^T \xi^{-1/2}\zeta\zeta^T\xi^{-1/2}\nu}{\nu^T\nu}.
$$
The claim then follows directly from the Rayleigh-Ritz theorem for Hermitian matrices.
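As an illustrative aside (not part of the original proof), the Rayleigh-Ritz step can be checked numerically. The following Python sketch uses hypothetical stand-ins `zeta` and `xi` for the vector and positive definite matrix of Corollary 2, and verifies that $\lambda = \xi^{-1/2}\nu$, with $\nu$ the top eigenvector of $\xi^{-1/2}\zeta\zeta^T\xi^{-1/2}$, maximizes $\mu_\lambda^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5

# Hypothetical stand-ins for the quantities in Corollary 2:
# zeta is a K-vector, xi a K x K positive definite matrix.
A = rng.normal(size=(K, K))
xi = A @ A.T + K * np.eye(K)
zeta = rng.normal(size=K)

# xi^{-1/2} via the spectral decomposition of xi.
w, V = np.linalg.eigh(xi)
xi_inv_half = V @ np.diag(w ** -0.5) @ V.T

# Rayleigh-Ritz step: the top eigenvector nu of xi^{-1/2} zeta zeta^T xi^{-1/2}
# maximizes the Rayleigh quotient; mapping back gives lambda = xi^{-1/2} nu.
M = xi_inv_half @ np.outer(zeta, zeta) @ xi_inv_half
evals, evecs = np.linalg.eigh(M)              # eigenvalues in ascending order
lam_star = xi_inv_half @ evecs[:, -1]

def mu_sq(lam):
    # mu_lambda^2 = (lambda^T zeta)^2 / (lambda^T xi lambda)
    return (lam @ zeta) ** 2 / (lam @ xi @ lam)

# No random direction should beat the maximizer.
assert all(mu_sq(lam_star) >= mu_sq(rng.normal(size=K)) for _ in range(1000))
print(mu_sq(lam_star), evals[-1])             # both equal zeta^T xi^{-1} zeta
```

Because $\zeta\zeta^T$ has rank one, the maximizer is proportional to $\xi^{-1}\zeta$ and the maximum value is $\zeta^T\xi^{-1}\zeta$, which the script confirms.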
Lemma 3: $\tau_\lambda(G)$ is a U-statistic with kernel function $h(Y_1, Y_2, Y_3) = Y_1 Y_2 Y_3$. By the theory of U-statistics, we know that
$$
\frac{\tau_\lambda(G) - E[\tau^*_\lambda(G)]}{\sqrt{\mathrm{Var}[\tau^*_\lambda(G)]}} \xrightarrow{d} N(0,1) \tag{1}
$$
provided that $\mathrm{Var}[\tau_\lambda(G) - \tau^*_\lambda(G)] = o(\mathrm{Var}[\tau^*_\lambda(G)])$.

By the independent edge assumption, we have
$$
E[h(Y_i, Y_j, Y_k)] = E[Y_i]E[Y_j]E[Y_k], \qquad E[h(Y_i, Y_j, Y_k) \mid Y_i] = Y_i\, E[Y_j]E[Y_k]. \tag{2}
$$
Thus, for $t < t^*$, we have
$$
E[\tau^*_\lambda(G(t))] = \binom{n}{3}\langle\lambda, \pi_{00}\rangle^3 \tag{3}
$$
and
$$
\mathrm{Var}[\tau^*_\lambda(G(t))] = \mathrm{Var}\Bigl[\sum_{\{u,v,w\}} Y_{uv}\,E[Y_{uw}]E[Y_{vw}]\Bigr]
= (n-2)^2\langle\lambda,\pi_{00}\rangle^4\, \mathrm{Var}\Bigl[\sum_{\{u,v\}} Y_{uv}\Bigr]
= (n-2)^2\binom{n}{2}\langle\lambda,\pi_{00}\rangle^4\langle\lambda,\eta_{00}\lambda\rangle. \tag{4}
$$

We now sketch the derivation of $\mathrm{Var}[\tau^*_\lambda(G(t))]$ for $t = t^*$. We partition the pairs $\{u,v\} \in \binom{V}{2}$ into the sets $\mathcal{S}_1 = \{\{u,v\} : u,v \in [m]\}$, $\mathcal{S}_2 = \{\{u,v\} : u \in [m],\, v \in [n]\setminus[m]\}$, and $\mathcal{S}_3 = \{\{u,v\} : u,v \in [n]\setminus[m]\}$. Now, for $\{u,v\} \in \mathcal{S}_1$, we have
$$
S_1 Y_{uv} = \sum_{w \neq u,v} E[h(Y_{uv}, Y_{uw}, Y_{vw}) \mid Y_{uv}]
= \bigl((m-2)\langle\lambda,\pi_{11}\rangle^2 + (n-m)\langle\lambda,\pi_{10}\rangle^2\bigr) Y_{uv}. \tag{5}
$$
(Here $S_1$ denotes the resulting scalar coefficient, while $\mathcal{S}_1$ denotes the set of pairs.) The above expression is reasoned as follows. If $w \in [m]$, then $E[Y_{uw}] = E[Y_{vw}] = \langle\lambda,\pi_{11}\rangle$, and there are $m-2$ possible choices of $w \in [m]$ different from $u$ and $v$. If $w \in [n]\setminus[m]$, then $E[Y_{uw}] = E[Y_{vw}] = \langle\lambda,\pi_{10}\rangle$, and there are $n-m$ possible choices of $w$. Analogous reasoning gives the expressions for $S_2$ and $S_3$ in the statement of the lemma. We can thus decompose $\mathrm{Var}[\tau^*_\lambda(G(t))]$ as
$$
\mathrm{Var}[\tau^*_\lambda(G(t))] = S_1^2\,\mathrm{Var}\Bigl[\sum_{\{u,v\}\in\mathcal{S}_1} Y_{uv}\Bigr] + S_2^2\,\mathrm{Var}\Bigl[\sum_{\{u,v\}\in\mathcal{S}_2} Y_{uv}\Bigr] + S_3^2\,\mathrm{Var}\Bigl[\sum_{\{u,v\}\in\mathcal{S}_3} Y_{uv}\Bigr]. \tag{6}
$$
We also have
$$
\mathrm{Var}[Y_{uv}] = \begin{cases} \langle\lambda,\eta_{11}\lambda\rangle & \text{if } \{u,v\}\in\mathcal{S}_1 \\ \langle\lambda,\eta_{01}\lambda\rangle & \text{if } \{u,v\}\in\mathcal{S}_2 \\ \langle\lambda,\eta_{00}\lambda\rangle & \text{if } \{u,v\}\in\mathcal{S}_3 \end{cases} \tag{7}
$$
and thus
$$
\mathrm{Var}[\tau^*_\lambda(G(t))] = \binom{m}{2}\langle\lambda,\eta_{11}\lambda\rangle S_1^2 + m(n-m)\langle\lambda,\eta_{01}\lambda\rangle S_2^2 + \binom{n-m}{2}\langle\lambda,\eta_{00}\lambda\rangle S_3^2,
$$
as desired. To complete the proof one must show that $\mathrm{Var}[\tau_\lambda(G) - \tau^*_\lambda(G)] = o(\mathrm{Var}[\tau^*_\lambda(G)])$; this follows directly from the arguments in [1] or [2].

Proposition 5: Let $v \in V(t)$ and denote by $d_\lambda(v;t)$ the (fused) degree of vertex $v$, i.e.,
$$
d_\lambda(v;t) = \sum_{w \in N(v)} \langle\lambda, \Gamma_{vw}\rangle.
$$
For $t < t^*$, each of the $\Gamma_{vw}$ is a multinomial trial with probability vector $\pi_{00}$. The following statements are made as $n \to \infty$ for fixed $K$. By the central limit theorem, we have
$$
\frac{d_\lambda(v;t) - (n-1)\langle\lambda,\pi_{00}\rangle}{\sqrt{(n-1)\langle\lambda,\eta_{00}\lambda\rangle}} \xrightarrow{d} N(0,1). \tag{8}
$$
We can thus consider the degree sequence of $G(t)$ for $t < t^*$ as a sequence of dependent, normally distributed random variables. By an argument analogous to that for Erdős-Rényi random graphs in [3, §III.1], we can show that the dependency among the $\{d_\lambda(v;t)\}_{v \in V(t)}$ can be ignored. Another way to see this is to note that the covariance between $X_u$ and $X_v$, where $X_u$ and $X_v$ denote the ratio in Eq. (8) for vertices $u$ and $v$, is given by
$$
r = \mathrm{Cov}(X_u, X_v) = \frac{3\langle\lambda,\pi_{00}\rangle}{\sqrt{(n-1)\langle\lambda,\eta_{00}\lambda\rangle}}. \tag{9}
$$
Because $r \log n \to 0$ as $n \to \infty$, the sample maximum of the $X_u$ converges to the sample maximum of a sequence of independent $N(0,1)$ random variables. The $d_\lambda(v;t)$ can thus be considered a sequence of independent random variables from a normal distribution. It is well known that the sample maximum of standard normal random variables converges weakly to a Gumbel distribution [4, §2.3]. It is, however, not clear whether the convergence of $\Delta_\lambda(t)$ to a Gumbel distribution continues to hold under the composition of weak convergences outlined above. We avoid this problem by showing directly that
$$
P\Bigl(\max_{v\in[n]} \zeta_v \le a_n + b_n x\Bigr) \to e^{-e^{-x}}, \tag{10}
$$
where $\zeta_v = \frac{d_\lambda(v;t) - (n-1)\langle\lambda,\pi_{00}\rangle}{\sqrt{(n-1)\langle\lambda,\eta_{00}\lambda\rangle}}$.

Let $F_n(u) = P(\zeta_v \le u)$. If $n \to \infty$ and $u = O(\sqrt{\log n})$, we have the following moderate deviations result [5, §XVI.7], [6, Theorem 2]:
$$
\frac{1 - F_n(u)}{1 - \Phi(u)} = 1 + \frac{C u^3}{\sqrt{n}} + O\Bigl(\frac{u^6}{n}\Bigr) \tag{11}
$$
for some constant $C$. Letting $u_n = a_n + b_n x$ in Eq. (11), and using $1 - \Phi(u_n) = O(n^{\delta - 1})$, we have
$$
F_n(u_n) = 1 - (1 - \Phi(u_n))\Bigl(1 + \frac{C u_n^3}{\sqrt{n}} + O\Bigl(\frac{u_n^6}{n}\Bigr)\Bigr)
= \Phi(u_n) + O\Bigl(\frac{1}{n^{1-\delta}}\Bigr)\Bigl(\frac{C u_n^3}{\sqrt{n}} + O\Bigl(\frac{u_n^6}{n}\Bigr)\Bigr)
= \Phi(u_n) + O\Bigl(\frac{u_n^5}{n^{3/2-\delta}}\Bigr)
$$
for some sufficiently small $\delta > 0$. We therefore have
$$
P\Bigl(\max_{v\in[n]} \zeta_v \le u_n\Bigr) = (F_n(u_n))^n
= \Bigl[\Phi(u_n) + O\Bigl(\frac{u_n^5}{n^{3/2-\delta}}\Bigr)\Bigr]^n
= (\Phi(u_n))^n + O\Bigl(\frac{u_n^5}{n^{1/2-\delta}}\Bigr) \to e^{-e^{-x}}. \tag{12}
$$
Eq. (10) is thus established, and we obtain the limiting Gumbel distribution of $\Delta_\lambda(t)$ for $t < t^*$.

The case $t = t^*$ can be derived in a similar manner. We first show that if $m = \Omega(\sqrt{n \log n})$, then $\Delta_\lambda(t^*) \xrightarrow{d} \max_{v\in[m]} d_\lambda(v;t^*)$ [7, Lemma 3.1]. We then show, again by the central limit theorem, that for $v \in [m]$, $\frac{d_\lambda(v;t^*) - \mu_2}{\sigma_2} \xrightarrow{d} N(0,1)$. It then follows, similar to our previous reasoning for the case $t < t^*$, that $\max_{v\in[m]} \frac{d_\lambda(v;t^*) - \mu_2}{\sigma_2} \xrightarrow{d} G(a_m, b_m)$, and we obtain the limiting Gumbel distribution of $\Delta_\lambda(t)$ for $t = t^*$.

Theorem 6: Let $X \sim G(\alpha, \beta)$. We consider the normalization $\frac{X - \mu}{\sigma}$. We have
$$
P\Bigl[\frac{X-\mu}{\sigma} \le z\Bigr] = P[X \le z\sigma + \mu] = e^{-e^{-(z\sigma + \mu - \alpha)/\beta}} = e^{-e^{-(z - (\alpha-\mu)/\sigma)/(\beta/\sigma)}}.
$$
Thus $\frac{X-\mu}{\sigma} \sim G\bigl(\frac{\alpha-\mu}{\sigma}, \frac{\beta}{\sigma}\bigr)$. Because the sample mean and the sample variance are consistent estimators, the claim follows after an application of Slutsky's theorem.
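As a quick sanity check of Theorem 6 (again an illustrative aside, with arbitrary hypothetical parameters `alpha` and `beta`), one can sample from $G(\alpha,\beta)$, standardize, and compare the empirical distribution with $G\bigl(\frac{\alpha-\mu}{\sigma}, \frac{\beta}{\sigma}\bigr)$; the agreement when the sample mean and standard deviation replace $\mu$ and $\sigma$ is exactly the Slutsky step:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical location/scale for X ~ G(alpha, beta); any values work.
alpha, beta = 3.0, 2.0
n = 200_000
x = rng.gumbel(loc=alpha, scale=beta, size=n)

# Population moments of G(alpha, beta): mean alpha + beta*gamma,
# standard deviation beta*pi/sqrt(6).
euler_gamma = 0.5772156649015329
mu = alpha + beta * euler_gamma
sigma = beta * np.pi / np.sqrt(6.0)

def gumbel_cdf(t, a, b):
    # CDF of G(a, b): exp(-exp(-(t - a)/b)).
    return np.exp(-np.exp(-(t - a) / b))

z = np.linspace(-2.0, 4.0, 13)
target = gumbel_cdf(z, (alpha - mu) / sigma, beta / sigma)

# (X - mu)/sigma is exactly G((alpha - mu)/sigma, beta/sigma) ...
emp_exact = np.array([((x - mu) / sigma <= t).mean() for t in z])
# ... and by Slutsky's theorem, standardizing with the sample mean and
# sample standard deviation has the same limit.
x_std = (x - x.mean()) / x.std()
emp_slutsky = np.array([(x_std <= t).mean() for t in z])

print(np.abs(emp_exact - target).max())    # small (Monte Carlo error only)
print(np.abs(emp_slutsky - target).max())  # also small for large n
```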
Lemma 7: Let $\phi_\lambda(v;t) = \psi_\lambda(v;t) - d_\lambda(v;t)$ be the (fused) locality statistic of vertex $v$ at time $t$, not including the (fused) degree of $v$, i.e.,
$$
\phi_\lambda(v;t) = \sum_{\substack{u,w \in N(v)\\ u,w \neq v}} \langle\lambda, \Gamma_{uw}\rangle. \tag{13}
$$
The following statements are conditional on $|N(v)| = l$. First of all, we have
$$
\phi_\lambda(v;t) = \sum_{k=1}^K \lambda_k z_k, \quad \text{where } (z_1, z_2, \ldots, z_K) \sim \mathrm{multinomial}\Bigl(\binom{l}{2}, \pi_{00}\Bigr).
$$
By the central limit theorem, we have
$$
\frac{\phi_\lambda(v;t) - \binom{l}{2}\langle\lambda,\pi_{00}\rangle}{\sqrt{\binom{l}{2}\langle\lambda,\eta_{00}\lambda\rangle}} \xrightarrow{d} N(0,1).
$$
Let $\lambda^{(2)}$ be the element-wise square of $\lambda$. Define $C_{00}$ and $p_{00}$ to be
$$
C_{00} = \frac{\langle\lambda^{(2)},\pi_{00}\rangle}{\langle\lambda,\pi_{00}\rangle}; \qquad p_{00} = \frac{(\langle\lambda,\pi_{00}\rangle)^2}{\langle\lambda^{(2)},\pi_{00}\rangle}. \tag{14}
$$
We note that $p_{00} \in [0,1]$ by the Cauchy-Schwarz inequality. Now let $Y_l = C_{00}\,\mathrm{Bin}\bigl(\binom{l}{2}, p_{00}\bigr)$. Then $E[Y_l] = \binom{l}{2}\langle\lambda,\pi_{00}\rangle$ and $\mathrm{Var}[Y_l] = \binom{l}{2}\langle\lambda,\eta_{00}\lambda\rangle$, and again by the central limit theorem we have
$$
\frac{\phi_\lambda(v;t) - \binom{l}{2}\langle\lambda,\pi_{00}\rangle}{\sqrt{\binom{l}{2}\langle\lambda,\eta_{00}\lambda\rangle}} \xrightarrow{d} \frac{Y_l - \binom{l}{2}\langle\lambda,\pi_{00}\rangle}{\sqrt{\binom{l}{2}\langle\lambda,\eta_{00}\lambda\rangle}}. \tag{15}
$$
Eq. (15) states that the locality statistic of our attributed random graph model with $t < t^*$ can be approximated by the locality statistic of an Erdős-Rényi graph with edge probability $p_{00}$. The lemma then follows from Theorem 1.1 in [7].

Lemma 8: For ease of exposition we drop the index $t^*$ from the discussion. Let $\phi_\lambda(v) = \psi_\lambda(v) - d_\lambda(v)$. Let $M(v)$ be the number of neighbors of $v$ that lie in $[m]$, and $W(v)$ the number of neighbors of $v$ that lie in $[n]\setminus[m]$. The following statements are conditional on $M(v) = l_\zeta$ and $W(v) = l_\xi$. We have
$$
\phi_\lambda(v) = \sum_{k=1}^K \lambda_k \bigl(y_k^{(\zeta)} + y_k^{(\xi)} + y_k^{(\omega)}\bigr) \tag{16}
$$
where $(y_1^{(\zeta)},\ldots,y_K^{(\zeta)})$, $(y_1^{(\xi)},\ldots,y_K^{(\xi)})$, and $(y_1^{(\omega)},\ldots,y_K^{(\omega)})$ are distributed as
$$
(y_1^{(\zeta)},\ldots,y_K^{(\zeta)}) \sim \mathrm{multinomial}\Bigl(\binom{l_\zeta}{2}, \pi_{11}\Bigr), \quad
(y_1^{(\xi)},\ldots,y_K^{(\xi)}) \sim \mathrm{multinomial}\Bigl(\binom{l_\xi}{2}, \pi_{00}\Bigr), \quad
(y_1^{(\omega)},\ldots,y_K^{(\omega)}) \sim \mathrm{multinomial}(l_\zeta l_\xi, \pi_{10}).
$$
Let $\rho$ and $\varsigma$ be defined by
$$
\rho = \Bigl\langle\lambda,\ \binom{l_\zeta}{2}\pi_{11} + \binom{l_\xi}{2}\pi_{00} + l_\zeta l_\xi \pi_{10}\Bigr\rangle, \qquad
\varsigma^2 = \Bigl\langle\lambda,\ \Bigl(\binom{l_\zeta}{2}\eta_{11} + \binom{l_\xi}{2}\eta_{00} + l_\zeta l_\xi \eta_{10}\Bigr)\lambda\Bigr\rangle.
$$
By the central limit theorem, as $l_\zeta \to \infty$ and $l_\xi \to \infty$,
$$
\frac{\phi_\lambda(v) - \rho}{\varsigma} \xrightarrow{d} N(0,1). \tag{17}
$$
Let $\lambda^{(2)}$ be the element-wise square of $\lambda$. Define $C_{00}$, $C_{11}$, $C_{10}$ and $p_{00}$, $p_{11}$, $p_{10}$ to be
$$
C_{00} = \frac{\langle\lambda^{(2)},\pi_{00}\rangle}{\langle\lambda,\pi_{00}\rangle}; \qquad p_{00} = \frac{(\langle\lambda,\pi_{00}\rangle)^2}{\langle\lambda^{(2)},\pi_{00}\rangle} \tag{18}
$$
$$
C_{11} = \frac{\langle\lambda^{(2)},\pi_{11}\rangle}{\langle\lambda,\pi_{11}\rangle}; \qquad p_{11} = \frac{(\langle\lambda,\pi_{11}\rangle)^2}{\langle\lambda^{(2)},\pi_{11}\rangle} \tag{19}
$$
$$
C_{10} = \frac{\langle\lambda^{(2)},\pi_{10}\rangle}{\langle\lambda,\pi_{10}\rangle}; \qquad p_{10} = \frac{(\langle\lambda,\pi_{10}\rangle)^2}{\langle\lambda^{(2)},\pi_{10}\rangle} \tag{20}
$$
We note that $p_{00}$, $p_{11}$, and $p_{10}$ are all elements of $[0,1]$. Now let $Y_\zeta \sim C_{11}\,\mathrm{Bin}\bigl(\binom{l_\zeta}{2}, p_{11}\bigr)$, $Y_\xi \sim C_{00}\,\mathrm{Bin}\bigl(\binom{l_\xi}{2}, p_{00}\bigr)$, and $Y_\omega \sim C_{10}\,\mathrm{Bin}(l_\zeta l_\xi, p_{10})$. We also set $Y = Y_\zeta + Y_\xi + Y_\omega$. By the central limit theorem, we have
$$
\frac{\phi_\lambda(v) - \rho}{\varsigma} \xrightarrow{d} \frac{Y - \rho}{\varsigma}. \tag{21}
$$
Eq. (21) states that the locality statistic $\phi_\lambda(v)$ of our attributed random graph model at time $t = t^*$ can be approximated by the locality statistic $Y(v)$ of an unattributed kidney-egg model. The limiting distribution of the scan statistic in unattributed kidney-egg graphs was previously considered in [7]. We provide a sketch of the arguments from [7] below, along with some minor changes to handle the case where the probabilities of kidney-kidney and kidney-egg connections are different.

Let $G$ be an instance of $\kappa(n, m, p_{11}, p_{10}, p_{00})$, an unattributed kidney-egg graph with the probabilities of egg-egg, egg-kidney, and kidney-kidney connections being $p_{11}$, $p_{10}$, and $p_{00}$, respectively. $D(v) = M(v) + W(v)$ is then the degree of $v$ in $G$. We now show two inequalities relating the tail distributions of $\Delta(G)$ and $\Upsilon(G) = \max_{v \in V(G)} Y(v)$, namely
$$
\limsup P(\Upsilon(G) \ge a_{n,m}) \le \lim P(\Delta(G) \ge N_\kappa), \tag{22}
$$
$$
\liminf P(\Upsilon(G) \ge a_{n,m}) \ge \lim P(\Delta(G) \ge N_\kappa). \tag{23}
$$
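Before proving the two inequalities, the binomial surrogate construction itself can be illustrated by simulation. The following sketch (illustrative only; the probability vectors, weights `lam`, and conditioned neighbor counts are hypothetical values) draws the three multinomial blocks of Eq. (16), forms the matching $C\,\mathrm{Bin}(\cdot,\cdot)$ surrogates of Eqs. (18)-(20), and checks that the two constructions agree in their first two moments, as the shared normal limit in Eq. (21) requires:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical attribute distributions (K = 3) and fusion weights lambda.
pi11 = np.array([0.5, 0.3, 0.2])
pi00 = np.array([0.6, 0.3, 0.1])
pi10 = np.array([0.7, 0.2, 0.1])
lam = np.array([1.0, 2.0, 0.5])

l_zeta, l_xi = 60, 80                  # conditioned values of M(v), W(v)
n_zz = l_zeta * (l_zeta - 1) // 2      # C(l_zeta, 2) egg-egg pairs
n_xx = l_xi * (l_xi - 1) // 2          # C(l_xi, 2) kidney-kidney pairs
n_zx = l_zeta * l_xi                   # egg-kidney pairs
B = 200_000                            # Monte Carlo replicates

def fused_block(pi, n_pairs):
    # One multinomial block of Eq. (16), fused through lambda.
    return rng.multinomial(n_pairs, pi, size=B) @ lam

def surrogate_block(pi, n_pairs):
    # The matching C * Bin(n_pairs, p) surrogate of Eqs. (18)-(20).
    C = (lam**2 @ pi) / (lam @ pi)
    p = (lam @ pi) ** 2 / (lam**2 @ pi)
    return C * rng.binomial(n_pairs, p, size=B)

phi = fused_block(pi11, n_zz) + fused_block(pi00, n_xx) + fused_block(pi10, n_zx)
Y = (surrogate_block(pi11, n_zz) + surrogate_block(pi00, n_xx)
     + surrogate_block(pi10, n_zx))

print(phi.mean(), Y.mean())   # both approximately rho
print(phi.var(), Y.var())     # both approximately varsigma^2
```

The agreement holds exactly in expectation: for each block, $C\,n\,p = n\langle\lambda,\pi\rangle$ and $C^2 n p(1-p) = n(\langle\lambda^{(2)},\pi\rangle - \langle\lambda,\pi\rangle^2) = n\langle\lambda,\eta\lambda\rangle$.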
Eq. (22): Let $C^* = \max\{C_{11}, C_{10}, C_{00}\}$ and $d_{n,m} = \sqrt{2 a_{n,m}/C^*}$. We first note that $\Upsilon(G) \ge a_{n,m}$ implies $C^*\binom{D(v)}{2} \ge a_{n,m}$, and hence $D(v) \ge d_{n,m}$, for some vertex $v$. Let us define $h(v) = E[Y(v)]$, i.e.,
$$
h(v) = C_{00}p_{00}\binom{D(v)}{2} + (C_{11}p_{11} - C_{00}p_{00})\binom{M(v)}{2} + (C_{10}p_{10} - C_{00}p_{00})M(v)W(v).
$$
We then have
$$
P(\Upsilon(G) \ge a_{n,m}) = P\Bigl(\bigcup_{v \in V(G)} \{Y(v) \ge a_{n,m}\}\Bigr) = P\Bigl(\bigcup_{v \in V(G)} \{Y(v) \ge a_{n,m},\ D(v) \ge d_{n,m}\}\Bigr) \le P_1 + P_2,
$$
where
$$
\vartheta_n = C_{00}\Bigl[\binom{n}{2}p_{00}(1-p_{00})\Bigr]^{1/2}\log n,
$$
$$
P_1 = P\Bigl(\bigcup_{v \in V(G)} \{D(v) \ge d_{n,m},\ h(v) \ge a_{n,m} - \vartheta_n\}\Bigr), \qquad
P_2 = P\Bigl(\bigcup_{v \in V(G)} \{D(v) \ge d_{n,m},\ Y(v) - h(v) \ge \vartheta_n\}\Bigr).
$$
We now show that $P_2$ is negligible as $n \to \infty$. To proceed, let $A$ be the event $\{M(v) = e,\ W(v) = f\}$ and let $p_{e,f} = P(A)$. $P_2$ can then be bounded as follows:
$$
P_2 \le n \sum_{e+f \ge d_{n,m}} P(Y(v) - h(v) \ge \vartheta_n \mid A)\, p_{e,f}
= n \sum_{e+f \ge d_{n,m}} P\Bigl(\frac{Y(v)-h(v)}{\mathrm{Var}[Y(v)]^{1/2}} \ge \frac{\vartheta_n}{\mathrm{Var}[Y(v)]^{1/2}} \Bigm| A\Bigr) p_{e,f}
\le n \sum_{e+f \ge d_{n,m}} (1 + o(1))\, P(Z \ge \Theta(\log n))\, p_{e,f} = o(n^{-1}),
$$
where $Z$ denotes a standard normal random variable. We now consider $P_1$. We note that $P_1 \le R_1 + R_2$, where
$$
R_1 = P\Bigl(\bigcup_{v \in [m]} \{D(v) \ge d_{n,m},\ h(v) \ge a_{n,m} - \vartheta_n\}\Bigr), \qquad
R_2 = P\Bigl(\bigcup_{v \in [n]\setminus[m]} \{D(v) \ge d_{n,m},\ h(v) \ge a_{n,m} - \vartheta_n\}\Bigr).
$$
Let us define $g(v) = h(v) - C_{00}p_{00}\binom{D(v)}{2}$. $R_1$ is then bounded as follows:
$$
R_1 \le P\Bigl(\bigcup_{v \in [m]} \{h(v) \ge a_{n,m} - \vartheta_n\}\Bigr)
\le P\Bigl(\bigcup_{v \in [m]} \Bigl\{D(v) \ge \sqrt{\frac{2(a_{n,m} - \vartheta_n - g(v))}{C_{00}p_{00}}}\Bigr\}\Bigr). \tag{24}
$$
We now consider the term $a_{n,m} - g(v)$. We have
$$
a_{n,m} - g(v) = C_{00}p_{00}\binom{N_\kappa}{2} + (C_{11}p_{11} - C_{00}p_{00})\Bigl(\binom{\mu_E}{2} - \binom{M(v)}{2}\Bigr) + (C_{10}p_{10} - C_{00}p_{00})\bigl(\mu_E\mu_F - M(v)W(v)\bigr).
$$
Let $\mathcal{E}$ and $\mathcal{F}$ be sets of vertices defined by
$$
\mathcal{E} = \{v : |M(v) - \mu_E| \le \sigma_E \log m\} \tag{25}
$$
$$
\mathcal{F} = \{v : |W(v) - \mu_F| \le \sigma_F \log (n-m)\}. \tag{26}
$$
Then we have, for $v \in \mathcal{E} \cap \mathcal{F}$,
$$
a_{n,m} - g(v) = C_{00}p_{00}\binom{N_\kappa}{2} + \Theta(m^{3/2}\log m) + \Theta(m\sqrt{n-m}). \tag{27}
$$
When $m = \Omega(\sqrt{n \log n})$, Eq. (27) gives
$$
a_{n,m} - g(v) = \frac{C_{00}p_{00}N_\kappa^2}{2}\bigl(1 + O(n^{-1/2-a}\log n)\bigr) \tag{28}
$$
for some $a > 0$. The set $\{v \in [m]\}$ can be partitioned into $\{v \in [m] \cap (\mathcal{E}\cap\mathcal{F})\}$ and $\{v \in [m] \setminus (\mathcal{E}\cap\mathcal{F})\}$. We can show that $P\{v \in [m]\setminus(\mathcal{E}\cap\mathcal{F})\} = o(1)$ by using a concentration inequality, e.g., Hoeffding's bound. We thus have
$$
R_1 \le P\Bigl(\bigcup_{\substack{v \in [m]\\ v \in \mathcal{E}\cap\mathcal{F}}} \Bigl\{D(v) \ge N_\kappa\sqrt{1 + O\Bigl(\frac{\log n}{n^{1/2+a}}\Bigr)}\Bigr\}\Bigr) + o(n^{-1})
= P\Bigl(\bigcup_{v \in [m]} \bigl\{D(v) \ge N_\kappa + O(n^{1/2-a}\log n)\bigr\}\Bigr) + o(n^{-1}) \tag{29}
$$
$$
= P\Bigl(\Delta(G) \ge \mu_{E+F} + \sigma_{E+F}\Bigl(z_m + O\Bigl(\frac{\log n}{n^a}\Bigr)\Bigr)\Bigr) + o(n^{-1})
\to P(\Delta(G) \ge N_\kappa).
$$
The same argument can be applied to $R_2$ to show that
$$
R_2 \le P\Bigl(\bigcup_{v \in [n]\setminus[m]} \bigl\{D(v) \ge N_\kappa(1 + o(1))\bigr\}\Bigr) = o(1). \tag{30}
$$
Eq. (22) is therefore established.

Eq. (23): We start by noting that
$$
P(\Upsilon(G) \ge a_{n,m}) = P\Bigl(\bigcup_{v \in [n]} \{Y(v) \ge a_{n,m}\}\Bigr)
\ge P\Bigl(\bigcup_{v \in [m]} \{Y(v) \ge a_{n,m},\ D(v) \ge N_\kappa\}\Bigr)
\ge P\Bigl(\bigcup_{v \in [m]} \{D(v) \ge N_\kappa\}\Bigr) - P\Bigl(\bigcup_{v \in [m]} \{Y(v) < a_{n,m},\ D(v) \ge N_\kappa\}\Bigr).
$$
We now show that $P(\bigcup_{v \in [m]} \{Y(v) < a_{n,m},\ D(v) \ge N_\kappa\}) \to 0$ as $n \to \infty$. Let $v \in [m]$ be arbitrary. It is then sufficient to show that $m\,P(Y(v) < a_{n,m},\ D(v) \ge N_\kappa) = o(1)$. We note that $P(Y(v) < a_{n,m},\ D(v) \ge N_\kappa)$ can be rewritten as
$$
\sum_{e+f \ge N_\kappa} P(Y(v) \le a_{n,m} \mid M(v) = e,\ W(v) = f)\, p_{e,f}. \tag{31}
$$
We now split the index set $\{e + f \ge N_\kappa\}$ in Eq. (31) into three parts $S_1$, $S_2$, and $S_3$, namely
$$
S_1 = \{e \ge \mu_E + \sigma_E \log m\} \tag{32}
$$
$$
S_2 = \{e \le \mu_E + \sigma_E \log m,\ e + f \le N_\kappa + \varphi(n)\} \tag{33}
$$
$$
S_3 = \{e \le \mu_E + \sigma_E \log m,\ e + f \ge N_\kappa + \varphi(n)\} \tag{34}
$$
where $\varphi(n) = \Theta(n^{1/2-a})$ for some $a > 0$. We can then show that $m\,P(M(v) = e,\ W(v) = f,\ \{e,f\} \in S_1) = o(1)$ by applying a concentration inequality. Similarly, $e + f \ge N_\kappa$ and $e \le \mu_E + \sigma_E \log m$ imply that
$$
f \ge \mu_F + (z_m - o(1))\sigma_F, \tag{35}
$$
and once again, by a concentration inequality, we can show that $m\,P(M(v) = e,\ W(v) = f,\ \{e,f\} \in S_2) = o(1)$. As for $S_3$, from the fact that $e + f \ge N_\kappa + \varphi(n)$ we have the bound
$$
a_{n,m} - h(v) \le (C_{11}p_{11} - C_{10}p_{10})\Bigl[m\sigma_E \log m + \frac{\sigma_E^2 \log^2 m}{2}\Bigr] - C_{00}p_{00}N_\kappa \varphi(n). \tag{36}
$$
As $\mathrm{Var}[Y(v)] = \Theta(N_\kappa)$ for $\{M(v), W(v)\} \in S_3$, we have
$$
p_{S_3} = \sum_{\{e,f\} \in S_3} P(Y(v) < a_{n,m})\, p_{e,f}
\le \sum_{\{e,f\} \in S_3} P\Bigl(\frac{Y(v) - h(v)}{\mathrm{Var}[Y(v)]^{1/2}} \le \frac{a_{n,m} - h(v)}{\mathrm{Var}[Y(v)]^{1/2}}\Bigr) p_{e,f}
\le \sum_{\{e,f\} \in S_3} P\Bigl[Z \le O\Bigl(\frac{m^{3/2}\log m}{N_\kappa}\Bigr) - \varphi(n)\Bigr] p_{e,f}. \tag{37}
$$
We now set $a = \frac{1}{2(k+1)}$. Then for $m = O(n^{k/(k+1)})$ and $\varphi(n) = O(n^{1/2-a})$ we have
$$
\frac{m^{3/2}\log m}{N_\kappa} - \varphi(n) = -O\bigl(n^{k/(2(k+1))}\bigr), \tag{38}
$$
which then implies
$$
m\, p_{S_3} \le m \sum_{\{e,f\} \in S_3} P\bigl[Z \le -O\bigl(n^{k/(2(k+1))}\bigr)\bigr]\, p_{e,f} = o(1). \tag{39}
$$
Thus $P(Y(v) < a_{n,m},\ D(v) \ge N_\kappa) \to 0$, as desired. From Eq. (22) and Eq. (23), we have
$$
\lim P(\Upsilon(G) \ge a_{n,m}) = \lim P(\Delta(G) \ge N_\kappa). \tag{40}
$$
Let $N_{\kappa,y} = N_\kappa + y\,\frac{\sigma_{E+F}}{\sqrt{2\log m}}$. We now define $a_{n,m,y}$ as
$$
a_{n,m,y} = \langle\lambda,\pi_{00}\rangle\binom{N_{\kappa,y}}{2} + \langle\lambda,\ \pi_{11} - \pi_{00}\rangle\binom{\mu_E}{2} + \langle\lambda,\ \pi_{10} - \pi_{00}\rangle\mu_E\mu_F.
$$
The above expression is equal to
$$
a_{n,m,y} = a_{n,m} + \langle\lambda,\pi_{00}\rangle\Bigl(y\,\frac{\sigma_{E+F}}{\sqrt{2\log m}}N_\kappa + y^2\frac{\sigma_{E+F}^2}{2 \cdot 2\log m}\Bigr) + O(1). \tag{41}
$$
We thus have $a_{n,m,y} = a_{n,m} + (y + o(1))b_{n,m}$. We therefore have
$$
\lim P(\Upsilon(G) \ge a_{n,m,y}) = \lim P\Bigl(\frac{\Upsilon(G) - a_{n,m}}{b_{n,m}} \ge y\Bigr) = \lim P(\Delta(G) \ge N_{\kappa,y}) = \lim P\Bigl(\frac{\Delta(G) - N_\kappa}{\sigma_{E+F}/\sqrt{2\log m}} \ge y\Bigr).
$$
Because $\Delta(G)$ converges weakly to a Gumbel distribution in the limit ([3], [7]), we have
$$
P\Bigl(\frac{\Upsilon(G) - a_{n,m}}{b_{n,m}} \le y\Bigr) \to e^{-e^{-y}}. \tag{42}
$$
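The regrouping of $E[Y(v)]$ into $h(v)$ in the proof of Eq. (22) rests on the identity $\binom{D}{2} = \binom{M}{2} + \binom{W}{2} + MW$ for $D = M + W$. A short symbolic check (illustrative only; the symbols `c11`, `c10`, `c00` stand in for $C_{11}p_{11}$, $C_{10}p_{10}$, $C_{00}p_{00}$):

```python
import sympy as sp

M, W = sp.symbols('M W')
c11, c10, c00 = sp.symbols('c11 c10 c00')  # stand-ins for C11*p11, C10*p10, C00*p00

def choose2(t):
    return t * (t - 1) / 2

D = M + W

# E[Y(v) | M(v)=M, W(v)=W] computed block by block ...
block_form = c11 * choose2(M) + c00 * choose2(W) + c10 * M * W
# ... versus the regrouped expression h(v) used in the proof of Eq. (22).
h = c00 * choose2(D) + (c11 - c00) * choose2(M) + (c10 - c00) * M * W

print(sp.expand(block_form - h))  # prints 0
```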
References

[1] K. Nowicki and J. C. Wierman, "Subgraph counts in random graphs using incomplete U-statistics methods," Discrete Mathematics, vol. 72, pp. 299-310, 1988.
[2] A. Rukhin, "Asymptotic analysis of various statistics for random graph inference," Ph.D. dissertation, Johns Hopkins University, 2009.
[3] B. Bollobás, Random Graphs. Academic Press, 1985.
[4] J. Galambos, The Asymptotic Theory of Extreme Order Statistics. John Wiley & Sons, 1987.
[5] W. Feller, An Introduction to Probability Theory and Its Applications, 2nd ed. John Wiley and Sons, 1971.
[6] H. Rubin and J. Sethuraman, "Probabilities of moderate deviations," Sankhyā: The Indian Journal of Statistics, Series A, vol. 27, pp. 325-346, 1965.
[7] A. Rukhin and C. E. Priebe, "On the limiting distribution of a graph scan statistic," Communications in Statistics: Theory and Methods (in press).