Bursty and Hierarchical Structure in Streams

Jon Kleinberg
ACM ’02
OUTLINE
 Introduction
 Preliminary
 Automaton model
 Experiment
 Conclusion

INTRODUCTION

Motivation
Document streams connect to a large body of related work:
 text mining, topic detection and tracking, and visualization.
 Can the “bursts” of activity in a stream be useful structure?

PRELIMINARY
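n + 1 messages arrive over a period of time of length T.
 If the messages were evenly spaced, the gaps would have size $\hat{g} = T/n$.
 Gaps are modeled with an exponential density function: $f(x) = \alpha e^{-\alpha x}$, with rate $\alpha > 0$ and expected gap $1/\alpha$.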
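
To make the setup concrete, here is a minimal Python sketch that turns a list of arrival times into gaps, computes the even-spacing gap T/n, and evaluates the exponential density. The function names and the sample arrival times are illustrative assumptions, not part of the original slides.

```python
import math

def mean_gap(arrival_times):
    """Return (gaps, g_hat) for sorted arrival times of n + 1 messages.

    g_hat = T / n, where T is the total time span and n the number of gaps.
    """
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    total = arrival_times[-1] - arrival_times[0]  # T
    return gaps, total / len(gaps)

def exp_density(x, alpha):
    """Exponential density f(x) = alpha * exp(-alpha * x) for x >= 0."""
    return alpha * math.exp(-alpha * x)

# Hypothetical arrival times for n + 1 = 5 messages.
times = [0.0, 4.0, 5.0, 5.5, 10.0]
gaps, g_hat = mean_gap(times)
print(gaps, g_hat)
```

A short gap is more likely under a high rate than under a low one, which is what lets the model associate runs of short gaps with a burst.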

AUTOMATON MODEL

Two-state model
The stream is modeled with a probabilistic automaton A.
 A has two states q0 and q1, which we can think of as “low” and “high”.
 q0: $f_0(x) = \alpha_0 e^{-\alpha_0 x}$
 q1: $f_1(x) = \alpha_1 e^{-\alpha_1 x}$, with $\alpha_1 > \alpha_0$


A changes state with probability p∈(0, 1)

A begins in state q0. Before each message is emitted, A changes state
with probability p.
For n + 1 messages, the gaps are $x = (x_1, x_2, \ldots, x_n)$.
 A state sequence is $q = (q_{i_1}, \ldots, q_{i_n})$.
 $f_q(x_1, \ldots, x_n) = \prod_{t=1}^{n} f_{i_t}(x_t)$


Let b denote the number of state transitions in the sequence q, that is, the number of indices $i_t$ such that $q_{i_t} \neq q_{i_{t+1}}$.
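Probability of q given x, using the prior $\Pr[q] = p^{b}(1-p)^{n-b}$ (with $i_0 = 0$, since A begins in q0) and Bayes' rule:

$\Pr[q \mid x] = \frac{1}{Z} \left(\frac{p}{1-p}\right)^{b} (1-p)^{n} \prod_{t=1}^{n} f_{i_t}(x_t)$

Z is the normalizing constant $\sum_{q'} \Pr[q'] f_{q'}(x)$.

$-\ln \Pr[q \mid x] = b \ln\left(\frac{1-p}{p}\right) + \left(\sum_{t=1}^{n} -\ln f_{i_t}(x_t)\right) - n \ln(1-p) + \ln Z$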
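
As a minimal sketch of the two-state cost, the function below evaluates -ln Pr[q|x] up to the additive constant ln Z. It assumes the prior Pr[q] = p^b (1-p)^(n-b) implied by the transition rule, and it counts a switch away from the start state q0 before the first gap as a transition. The function name and the numbers in the usage lines are illustrative, not from the original.

```python
import math

def neg_log_posterior(states, gaps, alphas, p):
    """Return -ln Pr[q|x] without the constant ln Z.

    states: state indices i_1..i_n (each 0 or 1), one per gap.
    gaps:   gap values x_1..x_n.
    alphas: (alpha_0, alpha_1), the rates of the two states.
    p:      probability of changing state before each message.
    """
    n = len(gaps)
    # Count transitions, assuming the automaton starts in state 0.
    b, prev = 0, 0
    for s in states:
        if s != prev:
            b += 1
        prev = s
    # -ln f_i(x) = alpha_i * x - ln(alpha_i) for the exponential density.
    fit = sum(alphas[s] * x - math.log(alphas[s]) for s, x in zip(states, gaps))
    return b * math.log((1 - p) / p) + fit - n * math.log(1 - p)

gaps = [4.0, 1.0, 0.5, 4.5]
print(neg_log_posterior([0, 0, 0, 0], gaps, (0.4, 2.0), 0.1))
print(neg_log_posterior([0, 1, 1, 0], gaps, (0.4, 2.0), 0.1))
```

Minimizing this quantity over state sequences is the same as maximizing the posterior: the first term penalizes state changes, the second rewards states whose rate fits the observed gaps.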


Infinite-state model
Bursts of greater and greater intensity are associated with gaps smaller and smaller than $\hat{g}$.
 State $q_i$ has rate $\alpha_i = \hat{g}^{-1} s^{i}$, where s > 1 is a parameter.
 $f_i(x) = \alpha_i e^{-\alpha_i x}$
For every i and j, there is a cost $\tau(i, j)$ associated with a state transition from $q_i$ to $q_j$.
 When j > i, moving from $q_i$ to $q_j$ incurs a cost of $(j - i)\gamma \ln n$, where γ > 0 is a parameter; when j < i, the cost is 0.
 This automaton, with its associated parameters s and γ, is denoted $A^*_{s,\gamma}$.
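
The two ingredients of the infinite-state model, the state rates and the transition costs, can be sketched in a few lines of Python. The function names are hypothetical, and the values in the usage lines are only for illustration.

```python
import math

def rate(i, g_hat, s):
    """Rate of state q_i: alpha_i = s**i / g_hat, with s > 1."""
    return (s ** i) / g_hat

def transition_cost(i, j, gamma, n):
    """Cost tau(i, j): (j - i) * gamma * ln(n) when moving up, 0 otherwise."""
    return (j - i) * gamma * math.log(n) if j > i else 0.0

# Illustrative values: g_hat = 2.5, s = 2, gamma = 1, n = 4 gaps.
print([rate(i, 2.5, 2.0) for i in range(4)])
print(transition_cost(0, 2, 1.0, 4), transition_cost(2, 0, 1.0, 4))
```

Because moving up costs an amount proportional to the number of levels climbed while moving down is free, the model enters a high-intensity state only when enough short gaps justify the price.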



Computing a minimum-cost state sequence
The task is to find a state sequence $q = (q_{i_1}, \ldots, q_{i_n})$ in $A^*_{s,\gamma}$ that minimizes the cost $c(q \mid x)$. Such a sequence is called optimal.
 For a natural number k, delete all states but $q_0, q_1, \ldots, q_{k-1}$ from $A^*_{s,\gamma}$, and denote the resulting k-state automaton by $A^k_{s,\gamma}$.
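
For a fixed k, a minimum-cost state sequence can be found with a standard Viterbi-style dynamic program. The sketch below is one possible way to do this, not necessarily the author's implementation. It assumes the cost c(q|x) is the sum of the transition costs τ plus the terms -ln f_i(x_t), that the automaton starts in state 0, and it uses illustrative defaults s = 2 and γ = 1; the function name is hypothetical.

```python
import math

def optimal_states(gaps, k, s=2.0, gamma=1.0):
    """Return (state sequence, cost) minimizing c(q|x) over k states."""
    n = len(gaps)
    g_hat = sum(gaps) / n  # T / n
    alphas = [s ** i / g_hat for i in range(k)]

    def tau(i, j):
        # Moving up costs (j - i) * gamma * ln(n); moving down is free.
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    def fit(j, x):
        # -ln f_j(x) for the exponential density with rate alphas[j].
        return alphas[j] * x - math.log(alphas[j])

    # cost[j]: minimum cost of explaining the gaps so far, ending in state j.
    cost = [tau(0, j) + fit(j, gaps[0]) for j in range(k)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for x in gaps[1:]:
        prev = [min(range(k), key=lambda i: cost[i] + tau(i, j)) for j in range(k)]
        cost = [cost[prev[j]] + tau(prev[j], j) + fit(j, x) for j in range(k)]
        back.append(prev)

    # Trace back from the cheapest final state.
    j = min(range(k), key=lambda i: cost[i])
    total = cost[j]
    path = [j]
    for prev in reversed(back):
        j = prev[j]
        path.append(j)
    path.reverse()
    return path, total

# Hypothetical stream: long gaps, a run of short gaps, then long gaps again.
gaps = [10.0] * 3 + [0.1] * 6 + [10.0] * 3
print(optimal_states(gaps, k=2))
```

The running time is O(n k^2), which is why reducing the infinite automaton to a finite k-state one matters in practice.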


two-state automaton $A^2_{s,\gamma}$


Let $q^* = (q_{l_1}, \ldots, q_{l_n})$ be an optimal state sequence in $A^k_{s,\gamma}$.
Let $q = (q_{i_1}, \ldots, q_{i_n})$ be an arbitrary state sequence in $A^*_{s,\gamma}$.
The goal is to show that $c(q^* \mid x) \le c(q \mid x)$.
If q does not contain any states of index greater than k-1, this inequality follows from the fact that q* is an optimal state sequence in $A^k_{s,\gamma}$.
 Otherwise, define the truncated sequence $q' = (q_{i'_1}, \ldots, q_{i'_n})$, where $i'_t = \min(i_t, k-1)$.



Since q' is a state sequence in $A^k_{s,\gamma}$, and since q* is an optimal state sequence for this automaton, it follows that
$c(q^* \mid x) \le c(q' \mid x) \le c(q \mid x)$
EXPERIMENT
CONCLUSION