Rank centrality

Sahand Negahban Sewoong Oh Devavrat Shah
Yale + UIUC + MIT
o Given partial preferences
o Compute global ranking with scores to reflect intensity
o Sports
o Outcome of games between teams/players
o Social recommendations
o Ratings of few restaurants/movies
o Competitive conference/Graduate admission
o Ordering of few papers/applicants
o Partial preferences are revealed in different forms
o Sports: Win and Loss
o Social: Starred rating
o Conferences: Scores
o All can be viewed as pair-wise comparisons
o IND beats AUS: IND > AUS
o South Indies ***** vs MTR ***: SI > MTR
o Ranking Paper 10/10 vs Other Paper 5/10: Ranking > Other
o Revealed preferences lead to
o Bag of pair-wise comparisons
o Sports, Social, Conferences, Transactions, etc.
o Question of interest
o Obtain global ranking over objects of interest
o Teams/Players, Restaurants, Papers, Applicants.
o Along with intensity/score for each object
o Using given partial preferences/pair-wise comparisons
[Figure: comparison graph on objects 1 to 6; the edge between 1 and 2 carries weights A12 (# times 1 defeats 2) and A21]
o Q1. Given weighted comparison graph G=(V, E, A)
o Find ranking of/scores associated with objects
o Q2. When possible (e.g. Conference/Crowd-Sourcing), choose G so as to
o Minimize the number of comparisons required to find ranking/scores
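A minimal sketch of how a bag of pair-wise outcomes could be turned into the weighted comparison graph G=(V, E, A) of Q1. The function name `build_comparison_graph` and the sample outcomes are hypothetical, chosen only for illustration.

```python
from collections import Counter

def build_comparison_graph(outcomes):
    """Turn a bag of (winner, loser) pairs into a weighted comparison graph.

    Returns (V, E, A) where A[(i, j)] is the number of times i defeats j.
    """
    A = Counter()
    for winner, loser in outcomes:
        A[(winner, loser)] += 1
    V = sorted({x for pair in A for x in pair})
    # Undirected edge set: a pair is an edge if it was compared at least once.
    E = {tuple(sorted(pair)) for pair in A}
    return V, E, A

# Hypothetical outcomes, for illustration only.
outcomes = [("IND", "AUS"), ("IND", "AUS"), ("AUS", "IND"), ("ENG", "AUS")]
V, E, A = build_comparison_graph(outcomes)
print(V)                                      # ['AUS', 'ENG', 'IND']
print(sorted(E))                              # [('AUS', 'ENG'), ('AUS', 'IND')]
print(A[("IND", "AUS")], A[("AUS", "IND")])   # 2 1
```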
o We posit
o Distribution over permutations as ground-truth
o Pair-wise comparisons are drawn from this distribution
Data: A > B > C, B > C > A, B > C > A, B > C > A
Distribution: 0.25 on A > B > C, 0.75 on B > C > A
Ranking: B > C > A
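The Data to Distribution step in this example can be reproduced with a few lines. This is a sketch under the assumption that the distribution is simply the empirical frequency of each observed permutation; the helper `prob_prefers` is a hypothetical name for the induced pair-wise marginal.

```python
from collections import Counter

data = ["A>B>C", "B>C>A", "B>C>A", "B>C>A"]
rankings = [tuple(r.split(">")) for r in data]

# Empirical distribution over permutations.
counts = Counter(rankings)
dist = {perm: c / len(rankings) for perm, c in counts.items()}

def prob_prefers(dist, i, j):
    """P(i is ranked above j) under a distribution over permutations."""
    return sum(p for perm, p in dist.items() if perm.index(i) < perm.index(j))

print(dist[("A", "B", "C")], dist[("B", "C", "A")])  # 0.25 0.75
print(prob_prefers(dist, "B", "A"))                  # 0.75
print(prob_prefers(dist, "B", "C"))                  # 1.0
```

A pair-wise comparison drawn from this distribution would report B > A with probability 0.75.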
o Input: complete preference (not comparisons)
o Axiomatic impossibility [Arrow ’51]
o Some algorithms
o Kemeny optimal: minimize disagreements
o Extended Condorcet Criteria
o NP-hard, 2-approx algorithm [Dwork et al ’01]
o Borda count: average position is score
o Simple
o Useful axiomatic properties [Young ‘74]
o Algorithm with comparisons
o Variant of Kemeny optimal:
\[ \arg\min_{\sigma} \sum_{i,j} A_{ij} \, I(\sigma(i) < \sigma(j)) \]
o NP-hard
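For intuition, the Kemeny-style objective can be minimized by brute force when there are only a handful of objects. The sketch below assumes the indicator is read as "i is ranked below j", so each term counts comparisons that i won against j while being placed below it; the counts in `A` are hypothetical. It enumerates all n! orderings, which is exactly why this approach does not scale.

```python
from itertools import permutations

def kemeny_brute_force(items, A):
    """Return the ordering (best first) with the fewest disagreements.

    A[(i, j)] is the number of times i defeats j.
    """
    best_order, best_cost = None, float("inf")
    for order in permutations(items):
        position = {x: r for r, x in enumerate(order)}  # 0 = ranked first
        # A comparison won by i over j disagrees if i is placed below j.
        cost = sum(count for (i, j), count in A.items()
                   if position[i] > position[j])
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

# Hypothetical comparison counts, for illustration only.
A = {("A", "B"): 1, ("B", "A"): 3, ("B", "C"): 4, ("A", "C"): 1, ("C", "A"): 3}
order, cost = kemeny_brute_force(["A", "B", "C"], A)
print(order, cost)  # ('B', 'C', 'A') 2
```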
o Variant of Borda count: average position from comparison?
o If pij = Aij/(Aij + Aji) represents the pair-wise marginal distribution
o Then the Borda count is given by
\[ c(i) \propto \sum_{j} p_{ij} \]
[Ammar, Shah ’11]
o Requires: G complete, many comparisons per pair
o Also see (short list of related works):
[Diaconis ‘87], [Alder et al ‘87], [Braverman-Mossel ’09], [Caramanis et al ‘11], [Fernoud et al ’11], [Duchi et al ‘12]…
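The Borda-count variant from comparisons, with pij = Aij/(Aij + Aji) and c(i) proportional to the sum of pij over j, can be sketched as follows. The function name and the counts are hypothetical; the explicit error for an uncompared pair reflects the requirement that G be complete.

```python
def borda_from_comparisons(items, A):
    """Score each object i by the sum over j of p_ij = A_ij / (A_ij + A_ji)."""
    scores = {}
    for i in items:
        total = 0.0
        for j in items:
            if j == i:
                continue
            wins, losses = A.get((i, j), 0), A.get((j, i), 0)
            if wins + losses == 0:
                # p_ij is undefined without data: the method needs G complete.
                raise ValueError(f"no comparisons between {i} and {j}")
            total += wins / (wins + losses)
        scores[i] = total
    return scores

# Hypothetical comparison counts, for illustration only.
A = {("A", "B"): 1, ("B", "A"): 3, ("B", "C"): 4, ("A", "C"): 1, ("C", "A"): 3}
scores = borda_from_comparisons(["A", "B", "C"], A)
print(scores)  # {'A': 0.5, 'B': 1.75, 'C': 0.75}
```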
o General model
o Effectively impossible to do aggregation
o Practically
o Restrict choice model
o A popular model is an instance of Thurstone’s ‘27 model
o Used for transportation systems (cf. McFadden)
o TrueSkill uses it for ranking online gamers
o Pricing in airline industry (cf. Talluri and Van Ryzin)
o…
o Choice model (distribution over permutations)
[Bradley-Terry-Luce (BTL) or MNL Model]
o Each object i has an associated weight wi > 0
o When objects i and j are compared
o P(i > j) = wi /(wi + wj)
o Sampling model
o Edges E of graph G are selected
o For each (i,j) ∈ E, sample k pair-wise comparisons
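The sampling model can be simulated directly: for every selected edge, draw k independent comparisons with P(i > j) = wi/(wi + wj). This is a sketch; the function name, the weights, and the fixed seed are assumptions made for reproducibility of the illustration.

```python
import random

def sample_btl_comparisons(w, edges, k, seed=0):
    """Sample k BTL comparisons per edge; A[(i, j)] = # times i defeats j."""
    rng = random.Random(seed)
    A = {}
    for i, j in edges:
        p_i_beats_j = w[i] / (w[i] + w[j])
        wins_i = sum(1 for _ in range(k) if rng.random() < p_i_beats_j)
        A[(i, j)] = wins_i
        A[(j, i)] = k - wins_i
    return A

# Hypothetical weights and edge set, for illustration only.
w = {0: 3.0, 1: 1.0, 2: 1.0}
edges = [(0, 1), (1, 2)]
A = sample_btl_comparisons(w, edges, k=10000)
print(A[(0, 1)] / 10000)  # close to 3 / (3 + 1) = 0.75
```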
o Random walk on comparison graph G=(V,E,A)
o d = max (undirected) vertex degree of G
o For each edge (i,j):
o Pij = (Aji + 1)/(Aij + Aji + 2) × 1/(d + 1)
o For each node i:
o Pii = 1- Σj≠i Pij
o Let G be connected
o Let s be the unique stationary distribution of RW P
\[ s^T = s^T P \]
o Ranking:
o Use s as scores of objects
o Closely related to Dwork et al ‘01 + Saaty ‘03
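The random walk above can be sketched in plain Python. Assumptions for this illustration: objects are labeled 0..n-1, `A[(i, j)]` holds the number of times i defeats j, a pair is an edge if it was compared at least once, and the stationary distribution is found by power iteration (one convenient choice, not something the slides specify).

```python
def rank_centrality(n, A, iters=10000, tol=1e-12):
    """Scores s with s^T = s^T P for the random walk on the comparison graph."""
    # Undirected neighbors: pairs that were compared at least once.
    nbrs = [set() for _ in range(n)]
    for (i, j), count in A.items():
        if count > 0:
            nbrs[i].add(j)
            nbrs[j].add(i)
    d = max(len(s) for s in nbrs)  # max undirected vertex degree

    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            a_ij, a_ji = A.get((i, j), 0), A.get((j, i), 0)
            # Move from i to j more often when j has beaten i more often.
            P[i][j] = (a_ji + 1) / (a_ij + a_ji + 2) / (d + 1)
        P[i][i] = 1.0 - sum(P[i])  # diagonal is still 0 when summed

    # Power iteration: s^T <- s^T P until the update is negligible.
    s = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(s[i] * P[i][j] for i in range(n)) for j in range(n)]
        done = max(abs(a - b) for a, b in zip(new, s)) < tol
        s = new
        if done:
            break
    return s

# Hypothetical counts on the path 0 - 1 - 2: 0 beats 1 three times out of four,
# and 1 beats 2 three times out of four.
A = {(0, 1): 3, (1, 0): 1, (1, 2): 3, (2, 1): 1}
s = rank_centrality(3, A)
print([round(x, 4) for x in s])  # [0.5714, 0.2857, 0.1429]
```

For this small example the chain is a birth-death chain, so the result can be checked by hand: the scores are proportional to (4, 2, 1).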
\[ s(i) = \frac{1}{Z(i)} \sum_{j \ne i} \left( \frac{A_{ij} + 1}{A_{ij} + A_{ji} + 2} \right) s(j) \]
o That is, object i has higher score if
o It beats object j with higher score,
o Or, beats many objects.
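The fixed-point form of s(i) follows from the stationarity condition by a short calculation. The derivation below is a sketch that assumes the transition probabilities Pij = (Aji + 1)/(Aij + Aji + 2) × 1/(d + 1) on edges of G, with sums over j ≠ i understood to run over the neighbors of i; the explicit expression for Z(i) is derived here and is not stated on the slides.

```latex
\begin{align*}
s(i) &= \sum_{j} s(j)\, P_{ji}
      = s(i)\, P_{ii} + \sum_{j \ne i} s(j)\, P_{ji} \\
s(i)\, (1 - P_{ii}) &= \sum_{j \ne i} s(j)\, P_{ji},
  \qquad 1 - P_{ii} = \sum_{j \ne i} P_{ij} \\
s(i) \sum_{j \ne i} \frac{A_{ji} + 1}{A_{ij} + A_{ji} + 2}
  &= \sum_{j \ne i} \frac{A_{ij} + 1}{A_{ij} + A_{ji} + 2}\, s(j)
  \qquad \text{(the common factor } 1/(d+1) \text{ cancels)} \\
Z(i) &= \sum_{j \ne i} \frac{A_{ji} + 1}{A_{ij} + A_{ji} + 2}
\end{align*}
```

Under this reading, Z(i) grows with how often i loses to its neighbors, so frequent wins raise s(i) through both the numerator and the normalization.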
o Compared to variant of Borda count:
\[ s_b(i) = \sum_{j \ne i} \left( \frac{A_{ij} + 1}{A_{ij} + A_{ji} + 2} \right) \]
International Cricket Ranking
o Error(s):
\[ \mathrm{Error}(s) = \frac{1}{\|w\|} \left( \sum_{i > j} (w_i - w_j)^2 \, I\{ (s(i) - s(j))(w_i - w_j) < 0 \} \right)^{1/2} \]
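The error metric penalizes each mis-ordered pair by the squared gap between the true weights, so swapping two nearly equal objects costs little. A minimal sketch, with `rank_error` as a hypothetical function name and made-up weights:

```python
import math

def rank_error(s, w):
    """Weighted pair-wise disagreement between scores s and true weights w."""
    norm_w = math.sqrt(sum(x * x for x in w))
    total = 0.0
    for i in range(len(w)):
        for j in range(i):  # each pair with i > j once
            if (s[i] - s[j]) * (w[i] - w[j]) < 0:  # pair is mis-ordered
                total += (w[i] - w[j]) ** 2
    return math.sqrt(total) / norm_w

w = [3.0, 2.0, 1.0]
print(rank_error([0.5, 0.3, 0.2], w))  # 0.0, same order as w
print(rank_error([0.5, 0.2, 0.3], w))  # 1/sqrt(14): one adjacent swap
print(rank_error([0.2, 0.3, 0.5], w))  # sqrt(6/14): fully reversed
```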
o G: Erdos-Renyi graph with edge prob. d/n
[Figure: plots with axes labeled k and d/n]
o Theorem 1 (Negahban-Oh-Shah).
o Let R = max_{i,j} w_i / w_j.
o Let G be an Erdos-Renyi graph.
o Under Rank centrality, with d = Ω(log n)
\[ \frac{\|s - w\|}{\|w\|} \le C \sqrt{\frac{R^5 \log n}{k d}} \]
o That is, sufficient to have O(R^5 n log n) samples
o Optimal dependence on n for ER graph
o Dependence on R ?
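The sample count quoted for Theorem 1 can be recovered from the error bound with a line of arithmetic. This sketch introduces a target accuracy ε (a symbol not on the slides) and uses the fact that an Erdos-Renyi graph with edge probability d/n has about nd/2 edges, each compared k times.

```latex
\begin{align*}
C \sqrt{\frac{R^5 \log n}{k d}} \le \epsilon
  &\iff k d \ge \frac{C^2 R^5 \log n}{\epsilon^2} \\
\text{total samples} \approx k \cdot \frac{n d}{2}
  &= \frac{n}{2}\, k d
   = O\!\left( \frac{R^5\, n \log n}{\epsilon^2} \right)
  \quad \text{when } k d \text{ is taken at the threshold}
\end{align*}
```

For a fixed constant ε this is the O(R^5 n log n) figure stated in the theorem.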
o Information theoretic lower-bound: for any algorithm
\[ \frac{\|s - w\|}{\|w\|} \ge C' \frac{1}{\sqrt{k d}} \]
o Theorem 2 (Negahban-Oh-Shah).
o Let R = max_{i,j} w_i / w_j.
o Let G be any connected graph:
o L = D^{-1} E is its Laplacian
o Δ = 1 - λmax(L)
o κ = dmax / dmin
o Under Rank centrality, with kd = Ω(log n)
\[ \frac{\|s - w\|}{\|w\|} \le \frac{C \kappa}{\Delta} \sqrt{\frac{R^5 \log n}{k d}} \]
o That is, the number of samples required is O(R^5 κ^2 n log n × Δ^{-2})
o Graph structure plays a role through its Laplacian
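To see how the graph enters the bound, Δ and κ can be computed for small graphs. The sketch below makes one interpretive assumption that the slides do not spell out: since the largest eigenvalue of D^{-1} E is always 1, λmax(L) is read here as the largest absolute value among the remaining eigenvalues. It uses power iteration on the squared, deflated, symmetrized matrix, and assumes a simple connected graph and a starting vector that overlaps the relevant eigenvector. All names are hypothetical.

```python
import math

def spectral_gap_and_kappa(n, edges, iters=500):
    """Return (Delta, kappa) for a simple connected graph on nodes 0..n-1."""
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    deg = [len(a) for a in nbrs]
    kappa = max(deg) / min(deg)

    # M = D^{-1/2} E D^{-1/2} is symmetric with the same eigenvalues as D^{-1} E.
    def apply_M(x):
        return [sum(x[j] / math.sqrt(deg[i] * deg[j]) for j in nbrs[i])
                for i in range(n)]

    # Unit eigenvector of M for the trivial eigenvalue 1: proportional to sqrt(deg).
    total = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / total for d in deg]

    def deflate(x):
        c = sum(a * b for a, b in zip(x, v1))
        return [a - c * b for a, b in zip(x, v1)]

    x = deflate([float(i + 1) for i in range(n)])
    lam_sq = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(a * a for a in x))
        if norm == 0.0:
            break
        x = [a / norm for a in x]
        y = deflate(apply_M(apply_M(x)))
        lam_sq = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient of M^2
        x = y
    return 1.0 - math.sqrt(max(lam_sq, 0.0)), kappa

complete4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
star4 = [(0, 1), (0, 2), (0, 3)]
print(spectral_gap_and_kappa(4, complete4))  # gap about 2/3, kappa 1.0
print(spectral_gap_and_kappa(5, cycle5))     # gap about 0.191, kappa 1.0
print(spectral_gap_and_kappa(4, star4))      # gap about 0, kappa 3.0
```

Under this reading, the complete graph has a large gap while the sparse cycle and the bipartite star have small or vanishing gaps, which matches the message that well-connected comparison graphs need fewer samples.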
o Choice of graph G
o Subject to constraints, choose G so that
o Spectral gap Δ is maximized
o SDP [Boyd, Diaconis, Xiao ‘04]
o Bound on
o Use of comparison theorem [Diaconis-Saloff Coste ‘94]++
o Bound on
o Use of (modified) concentration of measure inequality for matrices
o Finally, use this to further bound Error(s)
o MIT admission system
o ACM conferences (MobiHoc ‘11, Sigmetrics ‘13)
o Used over the past few years for efficient reviewing
o Daily polls (cf. A. Ammar)
o polls.mit.edu
o Netflix
o?
o Pair-wise comparisons
o Universal way to look at partial preferences
o Rank centrality
o Simple and intuitive algorithm for rank aggregation
o The comparison graph plays important role in aggregation
o Choose G to maximize spectral gap of natural RW