Distributed Top-K Monitoring
Brian Babcock & Chris Olston
Presented by Yuval Altman
To be presented at ACM SIGMOD 2003 International
Conference on Management of Data
1
The problem
Continuously report the k largest
values obtained from
distributed data streams.
2
Motivation
Google is the most popular search engine in the world.
Servers in multiple sites in the world handle
millions of queries an hour.
What are the top 20 search terms?
3
The problem
Continuously report the k largest values obtained
from distributed data streams.
Multiple sources - physically far away
  - Communication is expensive
  - Inefficient to transmit large amounts of data
Streaming model
  - Values change over time
Approximation may be sufficient
4
Motivation – Detecting DDoS attacks
5
Formal problem definition
m+1 nodes:
  - Monitor nodes: $N_1, N_2, \ldots, N_m$
  - Coordinator node: $N_0$
Set of n data objects $U = \{O_1, O_2, \ldots, O_n\}$
  - e.g. search terms, IP addresses
Objects are associated with real values $V_1, V_2, \ldots, V_n$
  - e.g. number of requests (DNS queries) for an IP address in the last 15 minutes
6
Distributed streaming model
Updates to values arrive as a sequence of tuples $\langle O_i, N_j, \Delta \rangle$, where:
  - $N_j$ detects a change $\Delta$ in the value $V_i$ of $O_i$
  - The change is not seen by other nodes $N_k$ ($k \neq j$)
For each node $N_j$, define partial values $V_{1,j}, V_{2,j}, \ldots, V_{n,j}$:
  $V_{i,j} = \sum_{\langle O_i, N_j, \Delta \rangle} \Delta$
The value $V_i$ for an object $O_i$:
  $V_i = \sum_{j=1}^{m} V_{i,j}$
7
Model example
U = {O1, O2, O3, O4}

N1 tuples:         <O1, N1, 2>  <O2, N1, 3>  <O4, N1, 4>  <O3, N1, 2>  <O1, N1, 1>
N1 partial values: V1,1 = 3   V2,1 = 3   V3,1 = 2   V4,1 = 4

N2 tuples:         <O2, N2, 3>  <O4, N2, 5>  <O4, N2, -2>  <O3, N2, 4>  <O3, N2, 5>
N2 partial values: V1,2 = 0   V2,2 = 3   V3,2 = 9   V4,2 = 3

N3 tuples:         <O2, N3, -1>  <O3, N3, 4>  <O2, N3, 2>  <O3, N3, 3>  <O2, N3, 5>
N3 partial values: V1,3 = 0   V2,3 = 6   V3,3 = 7   V4,3 = 0

V1 = 3, V2 = 12, V3 = 18, V4 = 7
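The bookkeeping in this example can be replayed in a few lines. The following is a minimal sketch, not part of the original slides; the function names `partial_values` and `global_values` are hypothetical, and tuples are written as `(object, node, delta)`.

```python
from collections import defaultdict

def partial_values(tuples):
    """Accumulate V[obj][node]: the sum of all deltas seen for obj at node."""
    partial = defaultdict(lambda: defaultdict(int))
    for obj, node, delta in tuples:
        partial[obj][node] += delta
    return partial

def global_values(partial):
    """V_i is the sum of the partial values V_{i,j} over all monitor nodes."""
    return {obj: sum(per_node.values()) for obj, per_node in partial.items()}

# Update tuples from the model example, as (object, node, delta).
updates = [
    ("O1", "N1", 2), ("O2", "N1", 3), ("O4", "N1", 4), ("O3", "N1", 2), ("O1", "N1", 1),
    ("O2", "N2", 3), ("O4", "N2", 5), ("O4", "N2", -2), ("O3", "N2", 4), ("O3", "N2", 5),
    ("O2", "N3", -1), ("O3", "N3", 4), ("O2", "N3", 2), ("O3", "N3", 3), ("O2", "N3", 5),
]

partial = partial_values(updates)
totals = global_values(partial)
print(totals)
```

No single node can compute `totals` on its own, which is why the coordinator is needed.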
8
Using the model
Top-k IP addresses in the last 15 minutes:
  - <IPAddr, Router, 1> when receiving a request for an IP address
  - A cancelling <IPAddr, Router, -1> 15 minutes afterwards
Can adopt a different strategy:
  - <IPAddr, Router, 15> when receiving a request
  - <IPAddr, Router, -1> once a minute, 15 times

9
The problem
The coordinator node $N_0$ must report a set $T \subseteq U$, $|T| = k$, that
represents the top-k data objects.
The set must be correct within $\epsilon$.
Formally, if $O_t \in T$ and $O_s \in U - T$:
  $V_t + \epsilon \ge V_s$
Example
$\epsilon = 5$
100
97
95
92
90
88
87
83
80
75
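The correctness condition reduces to one comparison: the smallest value inside T, plus the allowed error, must be at least the largest value outside T. The sketch below checks that against the ten example values. The object names `a` to `j`, the choice k = 3, and the function name `is_valid_top_k` are assumptions made for illustration only.

```python
def is_valid_top_k(values, top_k, epsilon):
    """True if V_t + epsilon >= V_s for every t in top_k and every s outside it."""
    inside = [v for obj, v in values.items() if obj in top_k]
    outside = [v for obj, v in values.items() if obj not in top_k]
    if not inside or not outside:
        return True
    return min(inside) + epsilon >= max(outside)

# The ten example values, attached to hypothetical object names.
values = dict(zip("abcdefghij", [100, 97, 95, 92, 90, 88, 87, 83, 80, 75]))

print(is_valid_top_k(values, {"a", "b", "c"}, 5))  # the exact top-3 set
print(is_valid_top_k(values, {"a", "b", "d"}, 5))  # 92 + 5 >= 95
print(is_valid_top_k(values, {"a", "b", "f"}, 5))  # 88 + 5 < 95
```

With an error of 5, several different sets are acceptable answers, which is what gives the algorithm room to avoid communication.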
10
Related work
One time distributed top-k calculation


Bruno, Gravano, Marian 2002
Fagin, Lotem, Naor 2001
Much better than transmitting all the values to
coordinator node
Not streaming


No means to detect changes to data
Running algorithm continuously is very expensive
Monitor nodes have limited query capabilities

Sorted (GetNext) and random (GetValue)
11
Related work
Streaming top-k monitoring from single source



Charikar, Chen, Farach-Colton 2002
Manku, Motwani 2002
Gibbons, Matias 1998
Randomized Algorithms

Focus on minimizing space
Reminder: The objective is to minimize
communication costs
12
Overview of algorithm
Initialize a top-k set at the coordinator node
Set arithmetic constraints at monitor nodes

Depend on current top-k set
Constraints valid → no communication
Constraints invalidated → resolution
  - Possibly a new top-k set
  - Reallocation of constraints

13
Choosing the constraints
Ideally, data is distributed evenly across monitor nodes, such that the local
top-k sets are the same
  - In this case, the global top-k set matches the local top-k sets
It suffices that the local constraints remain valid

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Sex=30         Money=50     Money=170
Sex=98       Money=20       Sex=5        Sex=133
Health=94    Mail=5         Mail=4       Mail=101
Mail=92      Health=3       Health=1     Health=98
14
Adjustment factors
In real life, data is not distributed evenly
Updates: <N1,Sex,-8>, <N3,Health,5>

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Sex=30         Money=50     Money=170
Health=94    Money=20       Health=6     Sex=125
Mail=92      Mail=5         Sex=5        Health=103
Sex=90       Health=3       Mail=4       Mail=101

Local constraints are invalidated, but the global top-k set is still valid
15
Adjustment factors
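For each node $N_j$ and object $O_i$, associate an adjustment factor $\delta_{i,j}$
Constraints are evaluated after adding the adjustment factors
  - If $O_t \in T$ and $O_s \in U - T$: $V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
Adjustment factors for each object sum to zero:
  - $\sum_j \delta_{i,j} = 0$
This ensures the adjusted partial values of each object still sum to its true value $V_i$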
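A small check makes the effect of adjustment factors concrete. The sketch below uses the three-node search-term numbers from this deck's adjustment factor example, with k = 2 and top-k set {Money, Sex}. The function name `local_constraints_hold` and the data layout are assumptions, not something the slides prescribe.

```python
def local_constraints_hold(values, deltas, top_k):
    """Check V_t + delta_t >= V_s + delta_s for every t in top_k and s outside it."""
    adjusted = {obj: v + deltas.get(obj, 0) for obj, v in values.items()}
    lowest_inside = min(adjusted[obj] for obj in top_k)
    highest_outside = max(a for obj, a in adjusted.items() if obj not in top_k)
    return lowest_inside >= highest_outside

TOP_K = {"Money", "Sex"}
nodes = {
    "N1": {"Money": 100, "Health": 94, "Mail": 92, "Sex": 90},
    "N2": {"Sex": 30, "Money": 20, "Mail": 5, "Health": 3},
    "N3": {"Money": 50, "Health": 6, "Sex": 5, "Mail": 4},
}
# Adjustment factors for Sex at each node; they sum to zero.
sex_deltas = {"N1": 10, "N2": -15, "N3": 5}

before = {n: local_constraints_hold(v, {}, TOP_K) for n, v in nodes.items()}
after = {n: local_constraints_hold(v, {"Sex": sex_deltas[n]}, TOP_K)
         for n, v in nodes.items()}
print(before)
print(after)
```

The factors only move "credit" for Sex from N2 to N1 and N3; the global value of Sex is unchanged.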
16
Adjustment factors example
Before adjustment:

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Sex=30         Money=50     Money=170
Health=94    Money=20       Health=6     Sex=125
Mail=92      Mail=5         Sex=5        Health=103
Sex=90       Health=3       Mail=4       Mail=101

With $\delta_{Sex,1} = 10$, $\delta_{Sex,2} = -15$, $\delta_{Sex,3} = 5$, the adjusted values are:

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Money=20       Money=50     Money=170
Sex=100      Sex=15         Sex=10       Sex=125
Health=94    Mail=5         Health=6     Health=103
Mail=92      Health=3       Mail=4       Mail=101
17
Coordinator adjustment factor
For each object $O_i$, add an adjustment factor $\delta_{i,0}$ at the coordinator node
  - Factors for each object $O_i$ must still sum to 0: $\sum_{j=0}^{m} \delta_{i,j} = 0$
To allow error, if $O_t \in T$ and $O_s \in U - T$:
  - Give $O_t$ values a "bonus" of $\epsilon$
  - Let $V_{t,0} = V_{s,0} = 0$
  - The constraint: $\delta_{t,0} + \epsilon \ge \delta_{s,0}$

18
Allowing error – example
$\epsilon = 5$
Update: <N3,Health,40>

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Sex=30         Money=50     Money=170
Sex=98       Money=20       Health=41    Health=138
Health=94    Mail=5         Sex=5        Sex=133
Mail=92      Health=3       Mail=4       Mail=101

$\delta_{Sex,1} = -4$, $\delta_{Sex,2} = -25$, $\delta_{Sex,3} = 29$
$\delta_{Health,2} = 2$, $\delta_{Health,3} = -7$
The trick: $\delta_{Health,0} = 5$
$\delta_{Sex,0} + 5 \ge \delta_{Health,0}$
19
Why do adjustment factors work?
For $O_t \in T$ and $O_s \in U - T$:
As long as, for each node $N_j$, the adjusted constraints and the coordinator
constraint are valid:
  - $V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
  - $\delta_{t,0} + \epsilon \ge \delta_{s,0}$
we can sum over the nodes, add the error constraint, and get:
  $V_t + \epsilon \ge V_s$
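Written out, the summation step looks as follows. This is a sketch of the argument that relies only on the definitions already given: $V_i = \sum_{j=1}^{m} V_{i,j}$ and $\sum_{j=0}^{m} \delta_{i,j} = 0$ for every object.

```latex
\begin{align*}
\sum_{j=1}^{m}\bigl(V_{t,j}+\delta_{t,j}\bigr) + \delta_{t,0} + \epsilon
  &\ge \sum_{j=1}^{m}\bigl(V_{s,j}+\delta_{s,j}\bigr) + \delta_{s,0} \\
V_t + \sum_{j=0}^{m}\delta_{t,j} + \epsilon
  &\ge V_s + \sum_{j=0}^{m}\delta_{s,j} \\
V_t + \epsilon &\ge V_s
\end{align*}
```

The first line adds the $m$ monitor constraints to the coordinator constraint, the second regroups the terms, and the third uses the fact that the adjustment factors of each object sum to zero.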
20
Algorithm details
Coordinator node $N_0$ maintains
  - Current approximate top-k set
  - All adjustment factors $\delta_{i,j}$
Each monitor node $N_j$ maintains
  - Current approximate top-k set
  - For each object $O_i$:
      - Partial value: $V_{i,j}$
      - Relevant adjustment factor: $\delta_{i,j}$

21
Algorithm details
Initialization. Coordinator:
  - Computes the approximate top-k set once
  - Chooses adjustment factors
  - Sends adjustment factors and top-k set to monitors
Monitor node constraints:
  - For $O_t \in T$ and $O_s \in U - T$: $V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
Adjustment factor constraints:
  - For each object $O_i$: $\sum_{j=0}^{m} \delta_{i,j} = 0$
  - For objects $O_t \in T$ and $O_s \in U - T$: $\delta_{t,0} + \epsilon \ge \delta_{s,0}$
22
Algorithm for monitor node Nj
While (1)
  Read tuple $\langle O_i, N_j, \Delta \rangle$
  $V_{i,j} = V_{i,j} + \Delta$
  Check constraints: for $O_t \in T$ and $O_s \in U - T$:
    $V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
  If invalid, initiate resolution.
End
To check constraints: use two heaps (or Fibonacci heaps)
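The loop above can be sketched as a small class. This is an illustration under assumptions, not the authors' implementation: the class name `MonitorNode` and its methods are hypothetical, and the constraint check is a plain scan over all pairs rather than the two heaps suggested on the slide, which keeps the sketch short at the cost of speed.

```python
class MonitorNode:
    """Tracks partial values and reports violated local constraints."""

    def __init__(self, top_k, deltas=None):
        self.top_k = set(top_k)          # current approximate top-k set T
        self.deltas = dict(deltas or {}) # adjustment factors delta_{i,j}
        self.values = {}                 # partial values V_{i,j}

    def adjusted(self, obj):
        return self.values.get(obj, 0) + self.deltas.get(obj, 0)

    def update(self, obj, delta):
        """Apply one tuple and return the set of violated (t, s) pairs."""
        self.values[obj] = self.values.get(obj, 0) + delta
        return self.violations()

    def violations(self):
        outside = set(self.values) - self.top_k
        return {(t, s) for t in self.top_k for s in outside
                if self.adjusted(t) < self.adjusted(s)}

# Node N2 from the Phase 2 resolution example, with all factors at zero.
node = MonitorNode(top_k={"Money", "Sex"})
for obj, value in [("Money", 35), ("Sex", 20), ("Mail", 5), ("Health", 3)]:
    node.update(obj, value)

violated = node.update("Mail", 17)  # Mail rises to 22, above Sex at 20
print(violated)
```

A non-empty result is the point at which the node would initiate resolution; the objects appearing in the returned pairs form the set F.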
23
Resolution – phase 1
First, $N_j$ sends a message to $N_0$ with:
  - F: the set of objects involved in violated constraints
  - All partial values for objects in $R = F \cup T$
  - The border value $B_j$: the maximum adjusted value among objects not in the resolution set

Example, N3 (Japan):
  Money=50, Mail=10, Sex=5, Health=1, Love=0
  $F_3$ = {Mail, Sex}
  $R_3$ = {Money, Mail, Sex}
  $V_{money,3} = 50$, $V_{mail,3} = 10$, $V_{sex,3} = 5$
  $B_3 = 1$
24
Resolution – phase 2
The coordinator $N_0$ attempts to resolve the constraints using the $\delta_{*,0}$ slack
For each violated constraint, $N_0$ tests:
  $V_{t,j} + \delta_{t,j} + \delta_{t,0} + \epsilon \ge V_{s,j} + \delta_{s,j} + \delta_{s,0}$
If all tests succeed, the top-k set is valid, and there's no need to communicate
with other nodes.
  - $N_0$ reallocates adjustment factors
  - Resolution is over
If at least one test fails, proceed to phase 3
25
Phase 2 resolution example
$\epsilon = 5$, $\delta_{*,*} = 0$

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Money=35       Money=50     Money=185
Sex=98       Sex=20         Sex=5        Sex=123
Mail=96      Mail=5         Mail=4       Mail=105
Health=92    Health=3       Health=1     Health=96

Update: <N2,Mail,17>

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Money=35       Money=50     Money=185
Sex=98       Mail=22        Sex=5        Sex=123
Mail=96      Sex=20         Mail=4       Mail=122
Health=92    Health=3       Health=1     Health=96

To fix: $\delta_{Sex,0} = -2$, $\delta_{Sex,2} = 2$
26
Phase 2 resolution failure
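Update: <N2,Sex,5>
$\delta_{Sex,0} = -2$, $\delta_{Sex,2} = 2$

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Money=35       Money=50     Money=185
Sex=98       Sex=27         Sex=5        Sex=128
Mail=96      Mail=22        Mail=4       Mail=122
Health=92    Health=3       Health=1     Health=96

Update: <N3,Mail,5>

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Money=35       Money=50     Money=185
Sex=98       Sex=27         Mail=9       Sex=128
Mail=96      Mail=22        Sex=5        Mail=127
Health=92    Health=3       Health=1     Health=96

Can't "loan" 4 from $\delta_{Sex,0}$: it would drop to $-6$, and with $\epsilon = 5$ as in
the previous example, $-6 + \epsilon = -1 < \delta_{Mail,0} = 0$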
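The phase 2 test is a single inequality per violated constraint, so both examples can be replayed directly. The sketch below assumes $\epsilon = 5$ in both cases and that every adjustment factor not mentioned on the slides is zero. The function name `phase2_test` and its argument order are hypothetical.

```python
def phase2_test(v_t, d_t, d_t0, v_s, d_s, d_s0, epsilon):
    """Coordinator test: V_t,j + d_t,j + d_t,0 + eps >= V_s,j + d_s,j + d_s,0."""
    return v_t + d_t + d_t0 + epsilon >= v_s + d_s + d_s0

EPSILON = 5

# Success case: at N2, Sex=20 against Mail=22, all factors still zero.
ok = phase2_test(v_t=20, d_t=0, d_t0=0, v_s=22, d_s=0, d_s0=0, epsilon=EPSILON)

# Failure case: at N3, Sex=5 against Mail=9, after delta_{Sex,0} was lowered to -2.
failed = phase2_test(v_t=5, d_t=0, d_t0=-2, v_s=9, d_s=0, d_s0=0, epsilon=EPSILON)

print(ok, failed)
```

In the first case 25 >= 22 holds, so the coordinator can settle the violation alone; in the second 8 >= 9 fails and phase 3 is required.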
27
Resolution – phase 3
The coordinator N0 contacts all the nodes Ni
excluding Nj, requesting:


Partial values for objects in $R = F \cup T$
Border values Bi
N0 sums the partial values and sorts them to
compute new top-k list T’
N0 reallocates new adjustment factors for T’
N0 sends T’ and adjustment factors to all nodes
28
Resolution – summary
Phase 1 – $N_j$ detects failed constraints and notifies $N_0$, initiating
resolution for $R = F \cup T$
Phase 2 – $N_0$ attempts to resolve constraints using $\delta_{*,0}$, the "bank"

If success, reallocate adjustment factors & stop
Phase 3 - N0 requests all updated partial
values for R, sorts, computes new top-k list

Reallocate adjustment factors
29
Resolution Performance
Means to measure algorithm performance
Messages are usually small

  - Only the resolution set $R = F \cup T$ is involved
Two-phase resolution
  - Initiation + reallocation
  - Only two messages
Three-phase resolution
  - Initiation + query + reallocation
  - 1 + 2(m-1) + m = 3m-1 messages

30
Adjustment factor reallocation
Input:
  - Top-k list T'
  - Partial values in resolution set R
  - Border values
Output:
  - New adjustment factors $\delta_{i,j}$
Method, for each object:
  - Meet border value constraints
  - Calculate leeway
  - Distribute leeway evenly

Example, input from one node:
  Money=50, Mail=10, Sex=5, Health=1, Love=0
  F = {Mail, Sex}
  R = {Money, Mail, Sex}
  $V_{money} = 50$, $V_{mail} = 10$, $V_{sex} = 5$
  B = 1

31
Leeway computation
For each object in R, compute the leeway $\lambda_i$: the slack above the sum of border values
Define:
  - Sum of border values: $B = \sum_j B_j$
  - Computed values: $V_i = \sum_j V_{i,j}$
  - $V_{i,0} = 0$; $B_0 = \max_{O_i \notin R} \delta_{i,0}$
If $O_i \in T'$: $\lambda_i = V_i - B + \epsilon$
Otherwise: $\lambda_i = V_i - B$
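The leeway formula is easy to evaluate by hand, but a short function makes the role of each term explicit. The sketch below uses the global values and border values of the worked example in this deck. It assumes $B_0 = 0$ and T' = {Money, Sex}, neither of which is stated on the slides, and the function name `leeway` is hypothetical.

```python
def leeway(values, border_values, new_top_k, epsilon):
    """lambda_i = V_i - B, plus epsilon for objects in the new top-k set."""
    total_border = sum(border_values)
    return {
        obj: v - total_border + (epsilon if obj in new_top_k else 0)
        for obj, v in values.items()
    }

# Global values of the resolution set R.
values = {"Money": 170, "Sex": 133, "Mail": 107}
# B0 (assumed 0), B1, B2, B3.
borders = [0, 94, 5, 1]

lam = leeway(values, borders, new_top_k={"Money", "Sex"}, epsilon=0)
print(lam)
```

Raising `epsilon` only increases the leeway of objects in the new top-k set, which is the "bonus" that the coordinator factor later absorbs.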
32
Leeway computation example
N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Sex=30         Money=50     Money=170
Sex=98       Money=20       Mail=10      Sex=133
Health=94    Mail=5         Sex=5        Mail=107
Mail=92      Love=5         Health=1     Health=98
Love=85      Health=3       Love=0       Love=90

$B = 94 + 5 + 1 = 100$
$\lambda_{money} = 170 - B = 70$
$\lambda_{sex} = 133 - B = 33$
$\lambda_{mail} = 107 - B = 7$
$\epsilon = 0$
33
Leeway distribution
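Initialization: meet constraints
  - $\delta_{i,j} = B_j - V_{i,j}$
  - For $O_i \in T'$, j = 0: $\delta_{i,0} = B_0 - \epsilon$
Leeway distribution:
  - $\delta_{i,j} = \delta_{i,j} + \lambda_i / m$
Correctness: $V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
  - If $O_s \notin R$: follows from $V_{t,j} + \delta_{t,j} \ge B_j$
  - If $O_s \in R$: follows from $\lambda_t \ge \lambda_s$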

34
Leeway distribution example
N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Sex=30         Money=50     Money=170
Sex=98       Money=20       Mail=10      Sex=133
Health=94    Mail=5         Sex=5        Mail=107
Mail=92      Love=5         Health=1     Health=98
Love=85      Health=3       Love=0       Love=90

$\lambda_{sex} = 33$
$\delta_{sex,1} = B_1 - V_{sex,1} + 33/3 = 94 - 98 + 11 = 7$
$\delta_{sex,2} = B_2 - V_{sex,2} + 33/3 = 5 - 30 + 11 = -14$
$\delta_{sex,3} = B_3 - V_{sex,3} + 33/3 = 1 - 5 + 11 = 7$
35
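Leeway distribution example
Leeway is handed out in whole units: 70 = 24 + 23 + 23 and 7 = 3 + 2 + 2
$\lambda_{money} = 70$
$\delta_{money,1} = B_1 - V_{money,1} + 70/3 = 94 - 100 + 24 = 18$
$\delta_{money,2} = B_2 - V_{money,2} + 70/3 = 5 - 20 + 23 = 8$
$\delta_{money,3} = B_3 - V_{money,3} + 70/3 = 1 - 50 + 23 = -26$
$\lambda_{mail} = 7$
$\delta_{mail,1} = B_1 - V_{mail,1} + 7/3 = 94 - 92 + 3 = 5$
$\delta_{mail,2} = B_2 - V_{mail,2} + 7/3 = 5 - 5 + 2 = 2$
$\delta_{mail,3} = B_3 - V_{mail,3} + 7/3 = 1 - 10 + 2 = -7$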
36
Reallocation Results
Partial values before reallocation:

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=100    Sex=30         Money=50     Money=170
Sex=98       Money=20       Mail=10      Sex=133
Health=94    Mail=5         Sex=5        Mail=107
Mail=92      Love=5         Health=1     Health=98
Love=85      Health=3       Love=0       Love=90

Adjusted values after reallocation:

N1 (US)      N2 (Germany)   N3 (Japan)   Global List
Money=118    Money=28       Money=24     Money=170
Sex=105      Sex=16         Sex=12       Sex=133
Mail=97      Mail=7         Mail=3       Mail=107
Health=94    Love=5         Health=1     Health=98
Love=85      Health=3       Love=0       Love=90
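The reallocation arithmetic of the last few slides can be reproduced end to end. The sketch below is an illustration under assumptions: it gives no leeway to the coordinator, takes $\epsilon = 0$, and splits each leeway into whole units with the remainder going to the lowest-numbered nodes, which is how the slide figures (24, 23, 23 and 3, 2, 2) appear to be rounded. The function names `split_evenly` and `reallocate` are hypothetical.

```python
def split_evenly(amount, parts):
    """Split an integer amount into whole shares that add up to the amount."""
    base, extra = divmod(amount, parts)
    return [base + 1 if i < extra else base for i in range(parts)]

def reallocate(partials, borders, leeways):
    """delta[obj][j] = B_j - V_{obj,j} + node j's share of lambda_obj."""
    m = len(borders)
    deltas = {}
    for obj, per_node in partials.items():
        shares = split_evenly(leeways[obj], m)
        deltas[obj] = [borders[j] - per_node[j] + shares[j] for j in range(m)]
    return deltas

# Partial values at N1, N2, N3 for the objects in the resolution set R.
partials = {"Money": [100, 20, 50], "Sex": [98, 30, 5], "Mail": [92, 5, 10]}
borders = [94, 5, 1]                            # B1, B2, B3
leeways = {"Money": 70, "Sex": 33, "Mail": 7}   # lambda values from the example

deltas = reallocate(partials, borders, leeways)
adjusted = {obj: [v + d for v, d in zip(partials[obj], deltas[obj])]
            for obj in partials}
print(deltas)
print(adjusted)
```

Each object's factors sum to zero, so the global values are untouched while every node's local ordering now agrees with the global top-k set.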
37
Leeway distribution to N0
Leeway is also distributed to the coordinator node $N_0$
  - $\epsilon$ is added to the leeway computation for $O_t \in T'$
  - Initialization of $\delta_{t,0}$ for $O_t \in T'$ is $B_0 - \epsilon$
  - Any addition can be "loaned" to monitor nodes
Amount distributed to $N_0$
  - Higher ($\lambda_i / 2$): less chance of phase 3 in resolution
  - Lower (0): fewer resolutions (more leeway to monitor nodes)

38
Proportional leeway distribution
Allocate more leeway to monitor nodes
updated more often
Top-k likely to change more
Good for monitor nodes that exhibit characteristic behavior
  - Google locations
  - Enterprise routers

39
Experiments
Query 1:
  - FIFA '98 servers at 4 locations throughout the world
  - Top 20 Web pages by hit statistics
Query 2:
  - Most loaded server in a cluster
  - Single value per monitor node
Query 3:
  - Berkeley-to-world WAN link, with 4 monitor points
  - Top 20 destination hosts by number of outgoing TCP packets
40
Results – Query 1
41
Results – Query 2
42
Results – Query 3
43
Analysis of results
Allowing error improves results
dramatically
Leeway for $N_0$ is the dominant factor
  - Low $\epsilon$: half the leeway to $N_0$
      - Low $\epsilon$ → little leeway
      - Resolutions are bound to happen, so make them less expensive
  - High $\epsilon$: no leeway to $N_0$
44
Analysis of results
Even vs. proportional leeway distribution: the better choice depends on the query
  - Server load: proportional
  - Berkeley WAN: monitor nodes are simulated, so even distribution is better
  - FIFA: proportional for lower $\epsilon$, even for higher $\epsilon$

45
Comparison to alternative
Caching
Coordinator holds cached partial data values
  - A monitor must send an update to the coordinator when a partial value deviates by $\epsilon / (2m)$
The coordinator's value for each object is always correct within $\epsilon / 2$
(m partial values, each within $\epsilon / (2m)$)
Top-k list always correct within $\epsilon$
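The caching baseline can be sketched from the monitor's side. This is a minimal illustration, not the implementation used in the experiments: the class name `CachingMonitor` is hypothetical, and it assumes a message is sent as soon as the deviation strictly exceeds $\epsilon / (2m)$.

```python
class CachingMonitor:
    """Pushes a partial value only when it drifts too far from the cached copy."""

    def __init__(self, epsilon, num_monitors):
        self.threshold = epsilon / (2 * num_monitors)
        self.values = {}  # current partial values V_{i,j}
        self.cached = {}  # values last sent to the coordinator

    def update(self, obj, delta):
        """Apply a delta; return (obj, value) if a message must be sent, else None."""
        self.values[obj] = self.values.get(obj, 0) + delta
        if abs(self.values[obj] - self.cached.get(obj, 0)) > self.threshold:
            self.cached[obj] = self.values[obj]
            return obj, self.values[obj]
        return None

# epsilon = 12 and m = 3 give a per-value threshold of 2.
monitor = CachingMonitor(epsilon=12, num_monitors=3)
messages = [monitor.update("x", 1) for _ in range(3)]
print(messages)
```

Every object is tracked independently here, so an object far from the top-k boundary still triggers messages, which is one reason this baseline costs more than the constraint-based approach.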
46
Results:
Note the
log scale!
47
Summary
Problem – find top-k set within error $\epsilon$
  - Distributed: multiple sources
  - Streaming: frequent updates
Naive approach
  - Transmit streams to coordinator node
  - If error is allowed, transmit only when deviation from cached value threatens correctness
New approach offers dramatic improvement over naïve approach for low-medium $\epsilon$.
48
Summary
Use adjustment factors to establish constraints
Monitor node initiates resolution when a constraint is broken
Resolution
  - Attempt to use coordinator node leeway. If successful, fix constraints by adjustment factor reallocation.
  - Otherwise, get partial values for resolution set from all nodes, compute new top-k set, and reallocate leeway to all nodes.
Reallocation
  - Distribute leeway evenly between monitor nodes
  - Distribute leeway to the coordinator node only for low $\epsilon$
49
Questions?
50