Attribute fusion in a latent process model for time series of graphs
Minh Tang, Youngser Park, Nam H. Lee, and Carey E. Priebe
Appendix
Proofs of some stated results

Corollary 2: The maximizer of $\mu_\lambda$ also maximizes
\[
\mu_\lambda^2 = \frac{\lambda^T \zeta\zeta^T \lambda}{\lambda^T \xi \lambda}.
\]
Because $\xi$ is positive definite, there exists a positive definite matrix $\xi^{1/2}$ such that $\xi^{1/2}\xi^{1/2} = \xi$. Letting $\nu = \xi^{1/2}\lambda$, the above expression can be rewritten as
\[
\mu_\lambda^2 = \frac{\nu^T \xi^{-1/2}\zeta\zeta^T\xi^{-1/2} \nu}{\nu^T \nu}.
\]
The claim then follows directly from the Rayleigh-Ritz theorem for Hermitian matrices.
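Since $\zeta\zeta^T$ has rank one, the Rayleigh-Ritz argument gives the maximizer in closed form: $\lambda \propto \xi^{-1}\zeta$, with maximal value $\zeta^T\xi^{-1}\zeta$. The following minimal numerical sketch (Python; the synthetic $\xi$ and $\zeta$ are illustrative stand-ins, not quantities from the model) checks this against random directions:

import numpy as np

rng = np.random.default_rng(0)
K = 5
A = rng.normal(size=(K, K))
xi = A @ A.T + K * np.eye(K)          # a generic positive definite xi
zeta = rng.normal(size=K)

def rayleigh(lam):
    return (lam @ zeta) ** 2 / (lam @ xi @ lam)

lam_star = np.linalg.solve(xi, zeta)  # candidate maximizer, proportional to xi^{-1} zeta
best_random = max(rayleigh(rng.normal(size=K)) for _ in range(10000))
print(rayleigh(lam_star) >= best_random)    # True: no random direction does better
print(rayleigh(lam_star), zeta @ lam_star)  # both equal zeta^T xi^{-1} zeta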
Lemma 3: $\tau_\lambda(G)$ is a U-statistic with kernel function $h(Y_1, Y_2, Y_3) = Y_1 Y_2 Y_3$. By the theory of U-statistics, we know that
\[
\frac{\tau_\lambda(G) - E[\tau^*_\lambda(G)]}{\sqrt{\operatorname{Var}[\tau^*_\lambda(G)]}} \xrightarrow{d} N(0,1)
\tag{1}
\]
provided that $\operatorname{Var}[\tau_\lambda(G) - \tau^*_\lambda(G)] = o(\operatorname{Var}[\tau^*_\lambda(G)])$. By the independent edge assumption, we have
\[
E[h(Y_i, Y_j, Y_k)] = E[Y_i]\,E[Y_j]\,E[Y_k]
\tag{2}
\]
and
\[
E[h(Y_i, Y_j, Y_k) \mid Y_i] = Y_i\, E[Y_j]\,E[Y_k].
\tag{3}
\]
Thus, for $t < t^*$, we have
\[
E[\tau^*_\lambda(G(t))] = \binom{n}{3} \langle \lambda, \pi_{00}\rangle^3
\tag{4}
\]
and
\begin{align}
\operatorname{Var}[\tau^*_\lambda(G(t))]
&= \operatorname{Var}\Bigl[\,\sum_{\{u,v,w\}} Y_{uv}\, E[Y_{uw}]\, E[Y_{vw}]\Bigr] \notag \\
&= (n-2)^2 \langle\lambda,\pi_{00}\rangle^4 \operatorname{Var}\Bigl[\,\sum_{\{u,v\}} Y_{uv}\Bigr] \tag{5} \\
&= (n-2)^2 \binom{n}{2} \langle\lambda,\pi_{00}\rangle^4 \langle\lambda,\eta_{00}\lambda\rangle. \tag{6}
\end{align}
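Here and throughout, the quadratic form $\langle\lambda,\eta_{00}\lambda\rangle$ is the variance of a single fused edge weight. Assuming, as the notation and the later moment calculations suggest, that $\eta_{00}$ denotes the covariance matrix $\operatorname{diag}(\pi_{00}) - \pi_{00}\pi_{00}^T$ of a multinomial trial with probability vector $\pi_{00}$, this is the elementary identity
\[
\operatorname{Var}[Y_{uv}] = \operatorname{Var}[\langle\lambda,\Gamma_{uv}\rangle]
= \lambda^T\bigl(\operatorname{diag}(\pi_{00}) - \pi_{00}\pi_{00}^T\bigr)\lambda
= \langle\lambda^{(2)},\pi_{00}\rangle - \langle\lambda,\pi_{00}\rangle^2
= \langle\lambda,\eta_{00}\lambda\rangle,
\]
with $\lambda^{(2)}$ the element-wise square of $\lambda$ (as in Lemma 7 below); the analogous identities hold for $\eta_{01}$, $\eta_{10}$, and $\eta_{11}$.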
We now sketch the derivation of $\operatorname{Var}[\tau^*_\lambda(G(t))]$ for $t = t^*$. We partition the set of pairs $\{u,v\} \in \binom{V}{2}$ into the sets
\[
S_1 = \{u, v \in [m]\}, \quad
S_2 = \{u \in [m],\ v \in [n]\setminus[m]\}, \quad
S_3 = \{u, v \in [n]\setminus[m]\}.
\]
Now, for $\{u,v\} \in S_1$, we have
\[
S_1 Y_{uv} = \sum_{w \neq u,v} E[h(Y_{uv}, Y_{uw}, Y_{vw}) \mid Y_{uv}]
= \bigl((m-2)\langle\lambda,\pi_{11}\rangle^2 + (n-m)\langle\lambda,\pi_{10}\rangle^2\bigr)\, Y_{uv}.
\]
The above expression is reasoned as follows. If $w \in [m]$, then $E[Y_{uw}] = E[Y_{vw}] = \langle\lambda,\pi_{11}\rangle$ and there are $m-2$ possible choices for $w \in [m]$ different from $u$ and $v$. If $w \in [n]\setminus[m]$, then $E[Y_{vw}] = E[Y_{uw}] = \langle\lambda,\pi_{10}\rangle$ and there are $n-m$ possible choices for $w$. Analogous reasoning gives the expressions for $S_2$ and $S_3$ in the statement of the lemma. We can thus decompose $\operatorname{Var}[\tau^*_\lambda(G(t))]$ as
\[
\operatorname{Var}[\tau^*_\lambda(G(t))]
= S_1^2 \operatorname{Var}\Bigl[\sum_{\{u,v\}\in S_1} Y_{uv}\Bigr]
+ S_2^2 \operatorname{Var}\Bigl[\sum_{\{u,v\}\in S_2} Y_{uv}\Bigr]
+ S_3^2 \operatorname{Var}\Bigl[\sum_{\{u,v\}\in S_3} Y_{uv}\Bigr].
\]
We also have
\[
\operatorname{Var}[Y_{uv}] =
\begin{cases}
\langle\lambda,\eta_{11}\lambda\rangle & \text{if } \{u,v\} \in S_1 \\
\langle\lambda,\eta_{01}\lambda\rangle & \text{if } \{u,v\} \in S_2 \\
\langle\lambda,\eta_{00}\lambda\rangle & \text{if } \{u,v\} \in S_3
\end{cases}
\tag{7}
\]
and thus
\[
\operatorname{Var}[\tau^*_\lambda(G(t^*))]
= \binom{m}{2}\langle\lambda,\eta_{11}\lambda\rangle S_1^2
+ m(n-m)\langle\lambda,\eta_{01}\lambda\rangle S_2^2
+ \binom{n-m}{2}\langle\lambda,\eta_{00}\lambda\rangle S_3^2
\]
as desired. To complete the proof one must show that $\operatorname{Var}[\tau_\lambda(G) - \tau^*_\lambda(G)] = o(\operatorname{Var}[\tau^*_\lambda(G)])$ and this follows directly from the argument in [1] or [2].
Proposition 5: Let $v \in V(t)$ and denote by $d_\lambda(v;t)$ the (fused) degree of vertex $v$, i.e.,
\[
d_\lambda(v;t) = \sum_{w \in N(v)} \langle\lambda, \Gamma_{vw}\rangle.
\]
For $t < t^*$, each of the $\Gamma_{vw}$ is a multinomial trial with probability vector $\pi_{00}$. The following statements are made as $n \to \infty$ for fixed $K$. By the central limit theorem, we have
\[
\frac{d_\lambda(v;t) - (n-1)\langle\lambda,\pi_{00}\rangle}{\sqrt{(n-1)\langle\lambda,\eta_{00}\lambda\rangle}} \xrightarrow{d} N(0,1).
\tag{8}
\]
We can thus consider the degree sequence of $G(t)$ for $t < t^*$ as a sequence of dependent normally distributed random variables. By an argument analogous to the argument for Erdős–Rényi random graphs in [3, §III.1] we can show that
the dependency among the $\{d_\lambda(v;t)\}_{v \in V(t)}$ can be ignored. Another way of doing this is to note that the covariance between $X_u$ and $X_v$, where $X_u$ and $X_v$ are the ratios in Eq. (8) for vertices $u$ and $v$, is given by
\[
r = \operatorname{Cov}(X_u, X_v) = \frac{3\langle\lambda,\pi_{00}\rangle}{\sqrt{(n-1)\langle\lambda,\eta_{00}\lambda\rangle}}.
\]
Because $r \log n \to 0$ as $n \to \infty$, the sample maximum of the $X_u$ converges to the sample maximum of a sequence of independent $N(0,1)$ random variables, and the $d_\lambda(v;t)$ can thus be treated as a sequence of independent random variables from a normal distribution. It is well known that the sample maximum of standard normal random variables converges weakly to a Gumbel distribution [4, §2.3]. It is, however, not clear whether the convergence of $\Delta_\lambda(t)$ to a Gumbel distribution continues to hold under the composition of weak convergence outlined above. We avoid this problem by showing directly that
\[
P\Bigl(\frac{\Delta_\lambda(t) - (n-1)\langle\lambda,\pi_{00}\rangle}{\sqrt{(n-1)\langle\lambda,\eta_{00}\lambda\rangle}} \leq a_n + b_n x\Bigr) \to e^{-e^{-x}}.
\tag{10}
\]
Let
\[
\zeta_v = \frac{d_\lambda(v;t) - (n-1)\langle\lambda,\pi_{00}\rangle}{\sqrt{(n-1)\langle\lambda,\eta_{00}\lambda\rangle}}
\]
and let $F_n(u) = P(\zeta_v \leq u)$. If $n \to \infty$ and $u = O(\sqrt{\log n})$, we have the following moderate deviations result [5], [6, Theorem 2, §XVI.7]:
\[
\frac{1 - F_n(u)}{1 - \Phi(u)} = 1 + C\frac{u^3}{\sqrt{n}} + O\Bigl(\frac{u^6}{n}\Bigr)
\tag{11}
\]
for some constant $C$. Letting $u_n = a_n + b_n x$ in Eq. (11), we have
\begin{align}
F_n(u_n) &= 1 - (1 - \Phi(u_n))\Bigl(1 + C\frac{u_n^3}{\sqrt{n}} + O\Bigl(\frac{u_n^6}{n}\Bigr)\Bigr) \notag \\
&= \Phi(u_n) - (1 - \Phi(u_n))\Bigl(C\frac{u_n^3}{\sqrt{n}} + O\Bigl(\frac{u_n^6}{n}\Bigr)\Bigr) \notag \\
&= \Phi(u_n) + O\Bigl(\frac{u_n}{n^{1-\delta}}\Bigr)\Bigl(C\frac{u_n^3}{\sqrt{n}} + O\Bigl(\frac{u_n^6}{n}\Bigr)\Bigr) \notag \\
&= \Phi(u_n) + O\Bigl(\frac{u_n^5}{n^{3/2-\delta}}\Bigr) \tag{12}
\end{align}
for some sufficiently small $\delta > 0$. We therefore have
\begin{align}
P\bigl(\max_{v \in [n]} \zeta_v \leq u_n\bigr) &= \bigl(F_n(u_n)\bigr)^n
= \Bigl[\Phi(u_n) + O\Bigl(\frac{u_n^5}{n^{3/2-\delta}}\Bigr)\Bigr]^n \notag \\
&= \bigl(\Phi(u_n)\bigr)^n + O\Bigl(\frac{u_n^5}{n^{1/2-\delta}}\Bigr) \to e^{-e^{-x}}. \notag
\end{align}
Eq. (10) is established and we obtain the limiting Gumbel distribution for $\Delta_\lambda(t)$ for $t < t^*$.

The case when $t = t^*$ can be derived in a similar manner. We first show that if $m = \Omega(\sqrt{n \log n})$ then $\Delta_\lambda(t^*) \xrightarrow{d} \max_{v \in [m]} d_\lambda(v; t^*)$ [7, Lemma 3.1]. We then show, again by the central limit theorem, that for $v \in [m]$, $(d_\lambda(v;t^*) - \mu)/\sigma \xrightarrow{d} N(0,1)$. It then follows, similar to our previous reasoning for the case where $t < t^*$, that $\max_{v \in [m]} (d_\lambda(v;t^*) - \mu)/\sigma \xrightarrow{d} G(a_m, b_m)$, and we obtain the limiting Gumbel distribution for $\Delta_\lambda(t)$ for $t = t^*$.
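The following small simulation (Python) illustrates the convergence in Eq. (10), with iid standard normals standing in for the standardized degrees $\zeta_v$; the normalizing constants $a_n$ and $b_n$ below are the standard choices for normal maxima, which the text leaves implicit:

import numpy as np

rng = np.random.default_rng(1)
n, reps = 5000, 2000
logn = np.log(n)
# standard normalizing constants for the maximum of n iid N(0,1) variables
a_n = np.sqrt(2 * logn) - (np.log(logn) + np.log(4 * np.pi)) / (2 * np.sqrt(2 * logn))
b_n = 1 / np.sqrt(2 * logn)
M = (rng.standard_normal((reps, n)).max(axis=1) - a_n) / b_n
x = 1.0
print(np.mean(M <= x), np.exp(-np.exp(-x)))  # empirical CDF approx. Gumbel CDF e^{-e^{-x}}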
Theorem 6: Let $X \sim G(\alpha, \beta)$. We consider the normalization $(X-\mu)/\sigma$. We have
\[
P\Bigl[\frac{X-\mu}{\sigma} \leq z\Bigr] = P[X \leq z\sigma + \mu]
= e^{-e^{-(z\sigma + \mu - \alpha)/\beta}}
= e^{-e^{-(z - (\alpha-\mu)/\sigma)/(\beta/\sigma)}}.
\tag{9}
\]
Thus $(X-\mu)/\sigma \sim G\bigl((\alpha-\mu)/\sigma,\ \beta/\sigma\bigr)$. Because the sample mean and the sample variance are consistent estimators, the claim follows after an application of Slutsky's theorem.
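A quick numerical check of Theorem 6 (a Python sketch; the closed-form mean $\alpha + \gamma\beta$ and standard deviation $\pi\beta/\sqrt{6}$ of $G(\alpha,\beta)$, with $\gamma$ the Euler-Mascheroni constant, are standard facts not restated in the paper):

import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 3.0, 2.0
x = rng.gumbel(alpha, beta, size=200000)   # X ~ G(alpha, beta)
mu_hat, sigma_hat = x.mean(), x.std()      # consistent estimators of mu, sigma
z = (x - mu_hat) / sigma_hat
gamma = 0.5772156649015329                 # Euler-Mascheroni constant
mu, sigma = alpha + gamma * beta, np.pi * beta / np.sqrt(6)
a, b = (alpha - mu) / sigma, beta / sigma  # parameters of G((alpha-mu)/sigma, beta/sigma)
t = 0.5
print(np.mean(z <= t), np.exp(-np.exp(-(t - a) / b)))  # empirical vs predicted CDF at t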
Lemma 7: Let $\phi_\lambda(v;t) = \psi_\lambda(v;t) - d_\lambda(v;t)$ be the (fused) locality statistic for vertex $v$ at time $t$ not including the (fused) degree of $v$, i.e.,
\[
\phi_\lambda(v;t) = \sum_{\substack{u,w \in N(v) \\ u,w \neq v}} \langle\lambda, \Gamma_{uw}\rangle.
\tag{13}
\]
The following statements are conditional on $|N(v)| = l$. First of all, we have
\[
\phi_\lambda(v;t) = \sum_{k=1}^K \lambda_k z_k
\]
where the $(z_1, \dots, z_K)$ are distributed as
\[
(z_1, z_2, \dots, z_K) \sim \operatorname{multinomial}\Bigl(\binom{l}{2}, \pi_{00}\Bigr).
\]
By the central limit theorem, we have
\[
\frac{\phi_\lambda(v;t) - \binom{l}{2}\langle\lambda,\pi_{00}\rangle}{\sqrt{\binom{l}{2}\langle\lambda,\eta_{00}\lambda\rangle}} \xrightarrow{d} N(0,1).
\]
Let $\lambda^{(2)}$ be the element-wise square of $\lambda$. Define $C_{00}$ and $p_{00}$ to be
\[
C_{00} = \frac{\langle\lambda^{(2)},\pi_{00}\rangle}{\langle\lambda,\pi_{00}\rangle}, \qquad
p_{00} = \frac{(\langle\lambda,\pi_{00}\rangle)^2}{\langle\lambda^{(2)},\pi_{00}\rangle}.
\tag{14}
\]
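The constants in Eq. (14) are chosen so that the scaled binomial below matches the fused locality statistic in its first two moments; under the reading of $\eta_{00}$ noted after Lemma 3, the verification is immediate:
\begin{align}
E[C_{00}\operatorname{Bin}(N, p_{00})] &= N C_{00} p_{00} = N \langle\lambda,\pi_{00}\rangle, \notag \\
\operatorname{Var}[C_{00}\operatorname{Bin}(N, p_{00})] &= N C_{00}^2 p_{00}(1-p_{00})
= N\bigl(\langle\lambda^{(2)},\pi_{00}\rangle - \langle\lambda,\pi_{00}\rangle^2\bigr)
= N \langle\lambda,\eta_{00}\lambda\rangle, \notag
\end{align}
with $N = \binom{l}{2}$. That $p_{00} \leq 1$ follows from the Cauchy-Schwarz inequality, since $\langle\lambda,\pi_{00}\rangle^2 \leq \langle\lambda^{(2)},\pi_{00}\rangle \sum_k \pi_{00,k} \leq \langle\lambda^{(2)},\pi_{00}\rangle$.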
We note that $p_{00} \in [0,1]$. Now let $Y_l = C_{00}\operatorname{Bin}\bigl(\binom{l}{2}, p_{00}\bigr)$. Then $E[Y_l] = \binom{l}{2}\langle\lambda,\pi_{00}\rangle$ and $\operatorname{Var}[Y_l] = \binom{l}{2}\langle\lambda,\eta_{00}\lambda\rangle$ and again by the central limit theorem, we have
\[
\frac{\phi_\lambda(v;t) - \binom{l}{2}\langle\lambda,\pi_{00}\rangle}{\sqrt{\binom{l}{2}\langle\lambda,\eta_{00}\lambda\rangle}}
\xrightarrow{d}
\frac{Y_l - \binom{l}{2}\langle\lambda,\pi_{00}\rangle}{\sqrt{\binom{l}{2}\langle\lambda,\eta_{00}\lambda\rangle}}.
\tag{15}
\]
Eq. (15) states that the locality statistic for our attributed random graph model with $t < t^*$ can be approximated by the locality statistic for an Erdős–Rényi graph with edge probability $p_{00}$. The lemma then follows from Theorem 1.1 in [7].
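A short Monte Carlo sketch of the approximation behind Eq. (15) (Python; the values of $\lambda$ and $\pi_{00}$ are illustrative, and $\pi_{00}$ is taken to sum to one here so that a separate "no edge" category is unnecessary):

import numpy as np

rng = np.random.default_rng(3)
lam = np.array([0.5, 0.3, 0.2])    # illustrative fusion weights lambda
pi00 = np.array([0.6, 0.3, 0.1])   # illustrative probability vector pi00 (sums to one)
l = 40                             # conditioned neighborhood size |N(v)|
N = l * (l - 1) // 2               # binom(l, 2) potential edges among the neighbors
reps = 100000
phi = rng.multinomial(N, pi00, size=reps) @ lam  # fused locality statistic phi_lambda
C00 = (lam**2 @ pi00) / (lam @ pi00)
p00 = (lam @ pi00) ** 2 / (lam**2 @ pi00)
Y = C00 * rng.binomial(N, p00, size=reps)        # scaled binomial surrogate Y_l
print(phi.mean(), Y.mean())  # both are N * <lam, pi00>
print(phi.std(), Y.std())    # both are sqrt(N * <lam, eta00 lam>)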
Lemma 8: For ease of exposition we drop the index $t^*$ from our discussion. Let $\phi_\lambda(v) = \psi_\lambda(v) - d_\lambda(v)$. Let $M(v)$ be the number of neighbors of $v$ that lie in $[m]$ and $W(v)$ be the number of neighbors of $v$ that lie in $[n]\setminus[m]$. The following statements are conditional on $M(v) = l_\zeta$ and $W(v) = l_\xi$. We have
\[
\phi_\lambda(v) = \sum_{k=1}^K \lambda_k \bigl( y_k^{(\zeta)} + y_k^{(\xi)} + y_k^{(\omega)} \bigr)
\tag{16}
\]
where $(y_1^{(\zeta)},\dots,y_K^{(\zeta)})$, $(y_1^{(\xi)},\dots,y_K^{(\xi)})$, and $(y_1^{(\omega)},\dots,y_K^{(\omega)})$ are distributed as
\begin{align}
(y_1^{(\zeta)},\dots,y_K^{(\zeta)}) &\sim \operatorname{multinomial}\Bigl(\binom{l_\zeta}{2}, \pi_{11}\Bigr) \notag \\
(y_1^{(\xi)},\dots,y_K^{(\xi)}) &\sim \operatorname{multinomial}\Bigl(\binom{l_\xi}{2}, \pi_{00}\Bigr) \notag \\
(y_1^{(\omega)},\dots,y_K^{(\omega)}) &\sim \operatorname{multinomial}\bigl(l_\zeta l_\xi, \pi_{10}\bigr). \notag
\end{align}
Let $\rho$ and $\varsigma$ be defined as
\begin{align}
\rho &= \Bigl\langle \lambda,\ \binom{l_\zeta}{2}\pi_{11} + \binom{l_\xi}{2}\pi_{00} + l_\zeta l_\xi \pi_{10} \Bigr\rangle \notag \\
\varsigma^2 &= \Bigl\langle \lambda,\ \Bigl(\binom{l_\zeta}{2}\eta_{11} + \binom{l_\xi}{2}\eta_{00} + l_\zeta l_\xi \eta_{10}\Bigr)\lambda \Bigr\rangle. \notag
\end{align}
By the central limit theorem, as $l_\zeta \to \infty$ and $l_\xi \to \infty$,
\[
\frac{\phi_\lambda(v) - \rho}{\varsigma} \xrightarrow{d} N(0,1).
\tag{17}
\]
Let $\lambda^{(2)}$ be the element-wise square of $\lambda$. Define $C_{00}$, $C_{11}$, $C_{10}$ and $p_{00}$, $p_{11}$, $p_{10}$ to be
\begin{align}
C_{00} = \frac{\langle\lambda^{(2)},\pi_{00}\rangle}{\langle\lambda,\pi_{00}\rangle}; &\qquad
p_{00} = \frac{(\langle\lambda,\pi_{00}\rangle)^2}{\langle\lambda^{(2)},\pi_{00}\rangle} \tag{18} \\
C_{11} = \frac{\langle\lambda^{(2)},\pi_{11}\rangle}{\langle\lambda,\pi_{11}\rangle}; &\qquad
p_{11} = \frac{(\langle\lambda,\pi_{11}\rangle)^2}{\langle\lambda^{(2)},\pi_{11}\rangle} \tag{19} \\
C_{10} = \frac{\langle\lambda^{(2)},\pi_{10}\rangle}{\langle\lambda,\pi_{10}\rangle}; &\qquad
p_{10} = \frac{(\langle\lambda,\pi_{10}\rangle)^2}{\langle\lambda^{(2)},\pi_{10}\rangle} \tag{20}
\end{align}
We note that $p_{00}$, $p_{11}$, and $p_{10}$ are all elements of $[0,1]$. Now let $Y_\zeta \sim C_{11}\operatorname{Bin}\bigl(\binom{l_\zeta}{2}, p_{11}\bigr)$, $Y_\xi \sim C_{00}\operatorname{Bin}\bigl(\binom{l_\xi}{2}, p_{00}\bigr)$ and $Y_\omega \sim C_{10}\operatorname{Bin}\bigl(l_\zeta l_\xi, p_{10}\bigr)$. We also set $Y = Y_\zeta + Y_\xi + Y_\omega$. By the central limit theorem, we have
\[
\frac{\phi_\lambda(v) - \rho}{\varsigma} \xrightarrow{d} \frac{Y - \rho}{\varsigma}.
\tag{21}
\]
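As in Lemma 7, each scaled binomial matches the corresponding multinomial contribution in its first two moments, so Eq. (21) is again a moment-matching approximation; for instance, for the egg-egg term (with the same reading of $\eta_{11}$ as before),
\[
E[Y_\zeta] = \binom{l_\zeta}{2}\langle\lambda,\pi_{11}\rangle, \qquad
\operatorname{Var}[Y_\zeta] = \binom{l_\zeta}{2}\bigl(\langle\lambda^{(2)},\pi_{11}\rangle - \langle\lambda,\pi_{11}\rangle^2\bigr) = \binom{l_\zeta}{2}\langle\lambda,\eta_{11}\lambda\rangle,
\]
and summing the three independent terms gives $E[Y] = \rho$ and $\operatorname{Var}[Y] = \varsigma^2$.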
Eq. (21) states that the locality statistic $\phi_\lambda(v)$ for our attributed random graph model at time $t = t^*$ can be approximated by the locality statistic $Y(v)$ for an unattributed kidney-egg model. The limiting distribution for the scan statistic in unattributed kidney-egg graphs had previously been considered in [7]. We provide a sketch of the arguments from [7] below, along with some minor changes to handle the case where the probabilities of kidney-kidney and kidney-egg connections are different.

Let $G$ be an instance of $\kappa(n, m, p_{11}, p_{10}, p_{00})$, an unattributed kidney-egg graph with the probability of egg-egg, egg-kidney, and kidney-kidney connections being $p_{11}$, $p_{10}$, and $p_{00}$, respectively. $D(v) = M(v) + W(v)$ is then the degree of $v$ in $G$. We now show two inequalities relating the tail distributions of $\Delta(G)$ and $\Upsilon(G) = \max_{v \in V(G)} Y(v)$:
\begin{align}
\limsup P(\Upsilon(G) \geq a_{n,m}) &\leq \lim P(\Delta(G) \geq N_\kappa), \tag{22} \\
\liminf P(\Upsilon(G) \geq a_{n,m}) &\geq \lim P(\Delta(G) \geq N_\kappa). \tag{23}
\end{align}
Eq. (22): Let $C^* = \max\{C_{11}, C_{10}, C_{00}\}$ and $d_{n,m} = \sqrt{2a_{n,m}/C^*}$. We first note that
\[
\Upsilon(G) \geq a_{n,m} \;\Rightarrow\; C^*\binom{D(v)}{2} \geq a_{n,m} \text{ for some } v \;\Rightarrow\; D(v) \geq d_{n,m} \text{ for some } v.
\]
Let us define $h(v) = E[Y(v)]$, i.e.,
\[
h(v) = C_{00}p_{00}\binom{D(v)}{2} + (C_{11}p_{11} - C_{00}p_{00})\binom{M(v)}{2} + (C_{10}p_{10} - C_{00}p_{00})\, M(v)W(v).
\]
We then have
\begin{align}
P(\Upsilon(G) \geq a_{n,m}) &= P\Bigl(\bigcup_{v \in V(G)} \{Y(v) \geq a_{n,m}\}\Bigr) \notag \\
&= P\Bigl(\bigcup_{v \in V(G)} \{Y(v) \geq a_{n,m},\ D(v) \geq d_{n,m}\}\Bigr) \notag \\
&\leq P_1 + P_2 \notag
\end{align}
where, with
\[
\vartheta_n = C_{00}\Bigl[\binom{n}{2} p_{00}(1 - p_{00})\Bigr]^{1/2} \log n,
\]
\begin{align}
P_1 &= P\Bigl(\bigcup_{v \in V(G)} \{D(v) \geq d_{n,m},\ h(v) \geq a_{n,m} - \vartheta_n\}\Bigr), \notag \\
P_2 &= P\Bigl(\bigcup_{v \in V(G)} \{D(v) \geq d_{n,m},\ Y(v) - h(v) \geq \vartheta_n\}\Bigr). \notag
\end{align}
We now show that $P_2$ is negligible as $n \to \infty$. To proceed, let $A$ be the event $\{M(v) = e, W(v) = f\}$ and let $p_{e,f} = P(A)$. $P_2$ can then be bounded as follows:
\begin{align}
P_2 &\leq n \sum_{e+f \geq d_{n,m}} P(Y(v) - h(v) \geq \vartheta_n \mid A)\, p_{e,f} \notag \\
&= n \sum_{e+f \geq d_{n,m}} P\Bigl( \frac{Y(v) - h(v)}{\operatorname{Var}[Y(v)]^{1/2}} \geq \frac{\vartheta_n}{\operatorname{Var}[Y(v)]^{1/2}} \Bigm| A \Bigr) p_{e,f} \notag \\
&\leq n \sum_{e+f \geq d_{n,m}} (1 + o(1))\, P(Z \geq \Theta(\log n))\, p_{e,f} \notag \\
&= o(n^{-1}), \notag
\end{align}
where $Z$ denotes a standard normal random variable.
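The final two steps combine the normal approximation for $Y(v) - h(v)$ with the standard Gaussian tail bound $P(Z \geq t) \leq e^{-t^2/2}$; writing $c > 0$ for the constant hidden in $\Theta(\log n)$,
\[
n\, P(Z \geq c \log n) \leq n\, e^{-c^2 \log^2 n / 2} = n^{1 - (c^2/2)\log n} = o(n^{-r}) \quad \text{for every fixed } r > 0.
\]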
We now consider $P_1$. We note that $P_1 \leq R_1 + R_2$ where
\begin{align}
R_1 &= P\Bigl(\bigcup_{v \in [m]} \{D(v) \geq d_{n,m},\ h(v) \geq a_{n,m} - \vartheta_n\}\Bigr), \notag \\
R_2 &= P\Bigl(\bigcup_{v \in [n]\setminus[m]} \{D(v) \geq d_{n,m},\ h(v) \geq a_{n,m} - \vartheta_n\}\Bigr). \notag
\end{align}
Let us define $g(v) = h(v) - C_{00}p_{00}\binom{D(v)}{2}$. $R_1$ is then bounded as follows:
\begin{align}
R_1 &\leq P\Bigl(\bigcup_{v \in [m]} \{h(v) \geq a_{n,m} - \vartheta_n\}\Bigr) \notag \\
&\leq P\Bigl(\bigcup_{v \in [m]} \Bigl\{D(v) \geq \sqrt{\frac{2(a_{n,m} - \vartheta_n - g(v))}{C_{00}p_{00}}}\Bigr\}\Bigr). \tag{24}
\end{align}
We now consider the term $a_{n,m} - g(v)$. We have
\[
a_{n,m} - g(v) = C_{00}p_{00}\binom{N_\kappa}{2}
+ (C_{11}p_{11} - C_{00}p_{00})\Bigl(\binom{\mu_E}{2} - \binom{M(v)}{2}\Bigr)
+ (C_{10}p_{10} - C_{00}p_{00})\bigl(\mu_E\mu_F - M(v)W(v)\bigr).
\]
Let $E$ and $F$ be sets of vertices defined by
\begin{align}
E &= \{v : |M(v) - \mu_E| \leq \sigma_E \log m\}, \tag{25} \\
F &= \{v : |W(v) - \mu_F| \leq \sigma_F \log (n-m)\}. \tag{26}
\end{align}
Then we have, for $v \in E \cap F$,
\[
a_{n,m} - g(v) = C_{00}p_{00}\binom{N_\kappa}{2} + \Theta(m^{3/2}\log m) + \Theta(m\sqrt{n-m}).
\tag{27}
\]
When $m = \Omega(\sqrt{n \log n})$, Eq. (27) gives
\[
a_{n,m} - g(v) = \binom{N_\kappa}{2} C_{00}p_{00}\bigl(1 + O(n^{-1/2-a}\log n)\bigr)
\tag{28}
\]
for some $a > 0$. The set $\{v \in [m]\}$ can be partitioned into $\{v \in [m] \cap (E \cap F)\}$ and $\{v \in [m] \setminus (E \cap F)\}$. We can show that $P\{v \in [m]\setminus(E \cap F)\} = o(1)$ by using a concentration inequality, e.g., Hoeffding's bound. We thus have
\begin{align}
R_1 &\leq P\Bigl(\bigcup_{\substack{v \in [m] \\ v \in E \cap F}} \Bigl\{D(v) \geq N_\kappa\sqrt{1 + O\bigl(\tfrac{\log n}{n^{1/2+a}}\bigr)}\Bigr\}\Bigr) + o(n^{-1}) \notag \\
&= P\Bigl(\bigcup_{v \in [m]} \{D(v) \geq N_\kappa + O(n^{1/2-a}\log n)\}\Bigr) + o(n^{-1}) \tag{29} \\
&= P\bigl(\Delta \geq \mu_{E+F} + \sigma_{E+F}\bigl(z_m + O\bigl(\tfrac{\log n}{n^a}\bigr)\bigr)\bigr) + o(n^{-1}) \notag \\
&\to P(\Delta \geq N_\kappa). \notag
\end{align}
The same argument can be applied to $R_2$ to show that
\[
R_2 \leq P\Bigl(\bigcup_{v \in [n]\setminus[m]} \{D(v) \geq N_\kappa(1 + o(1))\}\Bigr) = o(1),
\tag{30}
\]
which then implies
\[
\limsup P(\Upsilon(G) \geq a_{n,m}) \leq \lim P(\Delta(G) \geq N_\kappa).
\]
Eq. (22) is therefore established.
Eq. (23): We start by noting that
\begin{align}
P(\Upsilon(G) \geq a_{n,m}) &= P\Bigl(\bigcup_{v \in [n]} \{Y(v) \geq a_{n,m}\}\Bigr) \notag \\
&\geq P\Bigl(\bigcup_{v \in [m]} \{Y(v) \geq a_{n,m},\ D(v) \geq N_\kappa\}\Bigr) \notag \\
&\geq P\Bigl(\bigcup_{v \in [m]} \{D(v) \geq N_\kappa\}\Bigr) - P\Bigl(\bigcup_{v \in [m]} \{Y(v) < a_{n,m},\ D(v) \geq N_\kappa\}\Bigr). \notag
\end{align}
We now show that $P(\bigcup_{v \in [m]} \{Y(v) < a_{n,m}, D(v) \geq N_\kappa\}) \to 0$ as $n \to \infty$. Let $v \in [m]$ be arbitrary. It is then sufficient to show that $m\, P(Y(v) < a_{n,m}, D(v) \geq N_\kappa) = o(1)$. We note that $P(Y(v) < a_{n,m}, D(v) \geq N_\kappa)$ can be rewritten as
\[
\sum_{e+f \geq N_\kappa} P(Y(v) \leq a_{n,m} \mid M(v) = e, W(v) = f)\, p_{e,f}.
\tag{31}
\]
We now split the index set $e + f \geq N_\kappa$ in Eq. (31) into three parts $S_1$, $S_2$ and $S_3$, namely
\begin{align}
S_1 &= \{e \geq \mu_E + \sigma_E \log m\}, \tag{32} \\
S_2 &= \{e \leq \mu_E + \sigma_E \log m,\ e + f \leq N_\kappa + \varphi(n)\}, \tag{33} \\
S_3 &= \{e \leq \mu_E + \sigma_E \log m,\ e + f \geq N_\kappa + \varphi(n)\} \tag{34}
\end{align}
where $\varphi(n) = \Theta(n^{1/2-a})$ for some $a > 0$. We can then show that $m\, P(M(v) = e, W(v) = f, \{e,f\} \in S_1) = o(1)$ by applying a concentration inequality. Similarly, $e + f \geq N_\kappa$ and $e \leq \mu_E + \sigma_E \log m$ implies that
\[
f \geq \mu_F + (z_m - o(1))\sigma_F
\tag{35}
\]
and once again, by a concentration inequality, we can show that $m\, P(M(v) = e, W(v) = f, \{e,f\} \in S_2) = o(1)$. As for $S_3$, from the fact that $e + f \geq N_\kappa + \varphi(n)$, we have the bound
\[
a_{n,m} - h(v) \leq (C_{11}p_{11} - C_{10}p_{10})\Bigl[m\sigma_E \log m + \frac{\sigma_E^2 \log^2 m}{2}\Bigr] - C_{00}p_{00} N_\kappa \varphi(n).
\tag{36}
\]
As $\operatorname{Var}[Y(v)] = \Theta(N_\kappa^2)$ for $\{M(v), W(v)\} \in S_3$, we have
\begin{align}
p_{S_3} &= \sum_{\{e,f\} \in S_3} P(Y(v) < a_{n,m} \mid M(v) = e, W(v) = f)\, p_{e,f} \notag \\
&\leq \sum_{\{e,f\} \in S_3} P\Bigl( \frac{Y(v) - h(v)}{\operatorname{Var}[Y(v)]^{1/2}} \leq \frac{a_{n,m} - h(v)}{\operatorname{Var}[Y(v)]^{1/2}} \Bigr) p_{e,f} \notag \\
&\leq \sum_{\{e,f\} \in S_3} P\Bigl[ Z \leq O\Bigl(\frac{m^{3/2}\log m}{N_\kappa}\Bigr) - \varphi(n) \Bigr] p_{e,f}. \tag{37}
\end{align}
We now set $a = \frac{1}{2(k+1)}$. Then for $m = O(n^{k/(k+1)})$ and $\varphi(n) = O(n^{1/2-a})$ we have
\[
\frac{m^{3/2}\log m}{N_\kappa} - \varphi(n) = -O(n^{k/(2(k+1))})
\tag{38}
\]
which then implies
\[
m\, p_{S_3} \leq m \sum_{\{e,f\} \in S_3} P\bigl(Z \leq -O(n^{k/(2(k+1))})\bigr)\, p_{e,f} = o(1).
\tag{39}
\]
Thus $m\, P(Y(v) < a_{n,m},\ D(v) \geq N_\kappa) = o(1)$ as desired, and Eq. (23) follows.

From Eq. (22) and Eq. (23), we have
\[
\lim P(\Upsilon(G) \geq a_{n,m}) = \lim P(\Delta(G) \geq N_\kappa).
\]
Let $N_{\kappa,y} = N_\kappa + y\frac{\sigma_{E+F}}{\sqrt{2\log m}}$. We now define $a_{n,m,y}$ as
\[
a_{n,m,y} = \langle\lambda,\pi_{00}\rangle\binom{N_{\kappa,y}}{2}
+ \langle\lambda,\pi_{11} - \pi_{00}\rangle\binom{\mu_E}{2}
+ \langle\lambda,\pi_{10} - \pi_{00}\rangle\mu_E\mu_F.
\tag{40}
\]
The above expression is equal to
\[
a_{n,m} + \langle\lambda,\pi_{00}\rangle\, y\, \frac{\sigma_{E+F}}{\sqrt{2\log m}}\Bigl(N_\kappa + y\frac{\sigma_{E+F}}{2\sqrt{2\log m}}\Bigr) + O(1).
\tag{41}
\]
We thus have $a_{n,m,y} = a_{n,m} + (y + o(1))\, b_{n,m}$. We therefore have
\begin{align}
\lim P(\Upsilon(G) \geq a_{n,m,y}) &= \lim P\Bigl( \frac{\Upsilon(G) - a_{n,m}}{b_{n,m}} \geq y \Bigr) \notag \\
&= \lim P(\Delta(G) \geq N_{\kappa,y}) \notag \\
&= \lim P\Bigl( \frac{\Delta(G) - N_\kappa}{\sigma_{E+F}/\sqrt{2\log m}} \geq y \Bigr). \notag
\end{align}
Because $\Delta(G)$ converges weakly to a Gumbel distribution in the limit ([3], [7]), we have
\[
P\Bigl( \frac{\Upsilon(G) - a_{n,m}}{b_{n,m}} \leq y \Bigr) \to e^{-e^{-y}}.
\tag{42}
\]
References

[1] K. Nowicki and J. C. Wierman, "Subgraph counts in random graphs using incomplete U-statistics methods," Discrete Mathematics, vol. 72, pp. 299-310, 1988.
[2] A. Rukhin, "Asymptotic analysis of various statistics for random graph inference," Ph.D. dissertation, Johns Hopkins University, 2009.
[3] B. Bollobás, Random Graphs. Academic Press, 1985.
[4] J. Galambos, The Asymptotic Theory of Extreme Order Statistics. John Wiley & Sons, 1987.
[5] W. Feller, An Introduction to Probability Theory and Its Applications, 2nd ed. John Wiley and Sons, 1971.
[6] H. Rubin and J. Sethuraman, "Probabilities of moderate deviations," Sankhyā: The Indian Journal of Statistics, Series A, vol. 27, pp. 325-346, 1965.
[7] A. Rukhin and C. E. Priebe, "On the limiting distribution of a graph scan statistic," Communications in Statistics: Theory and Methods, in press.