atri@cse.buffalo.edu
Research Interests
Theoretical Computer Science
Coding Theory
Algorithmic Game Theory
Sublinear algorithms
Approximation and online algorithms
Computational Complexity
The “algorithmic” side
Coding Theory
The setup
Mapping $C$: $x \mapsto C(x)$
Error-correcting code, or just code
Encoding: $x \to C(x)$
Decoding: $y \to x$, or give up
$C(x)$ is a codeword
$y = C(x) + \text{error}$
Different Channels and Codes
Internet
Checksum used in multiple layers of TCP/IP stack
Cell phones
Satellite broadcast
TV
Deep space telecommunications
Mars Rover
“Unusual” Channels
Data Storage
CDs and DVDs
RAID
ECC memory
Paper bar codes
UPS (MaxiCode)
Codes are all around us
Redundancy vs. Error-correction
Repetition code: Repeat every bit, say, 100 times
Good error correcting properties
Too much redundancy
Parity code: Add a parity bit
Minimum amount of redundancy
Bad error correcting properties
Two errors go completely undetected
1 1 1 0 0 1
1 0 0 0 0 1
Neither of these codes is satisfactory
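The two codes above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the repetition factor is 3 rather than 100 to keep the output readable. The parity example reuses the two 6-bit words shown on the slide.

```python
def repetition_encode(bits, r=3):
    """Repeat every bit r times."""
    return [b for b in bits for _ in range(r)]


def repetition_decode(received, r=3):
    """Majority vote inside each block of r copies."""
    blocks = [received[i:i + r] for i in range(0, len(received), r)]
    return [1 if 2 * sum(block) > r else 0 for block in blocks]


def parity_encode(bits):
    """Append one bit so that the total number of ones is even."""
    return bits + [sum(bits) % 2]


def parity_check(received):
    """True if the word has an even number of ones, i.e. looks like a codeword."""
    return sum(received) % 2 == 0


# Repetition code: one error per block is corrected, at 3x the length.
sent = repetition_encode([1, 0, 1])
sent[0] ^= 1
sent[4] ^= 1
print(repetition_decode(sent))   # [1, 0, 1]

# Parity code: two errors turn one codeword into another codeword.
word = parity_encode([1, 1, 1, 0, 0])   # [1, 1, 1, 0, 0, 1]
word[1] ^= 1
word[2] ^= 1                            # now [1, 0, 0, 0, 0, 1]
print(parity_check(word))               # True: the errors go undetected
```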
Two main challenges in coding theory
Problem with parity example
Messages mapped to codewords which do not differ in many places
Need to pick a lot of codewords that differ a lot from each other
Efficient decoding
Naive algorithm: check received word with all codewords
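A minimal sketch of the naive algorithm, assuming binary messages of length k and Hamming distance as the measure of closeness. The names `naive_decode` and `repeat3` are hypothetical. The loop visits all 2^k messages, which is why this approach does not count as efficient decoding.

```python
from itertools import product


def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def naive_decode(received, encode, k):
    """Compare the received word with every codeword; return the closest message."""
    return min(product((0, 1), repeat=k),
               key=lambda msg: hamming_distance(encode(msg), received))


def repeat3(msg):
    """Toy code used for the demo: repeat every bit three times."""
    return tuple(b for b in msg for _ in range(3))


# repeat3((1, 0, 1)) with two flipped bits
received = (1, 0, 1, 0, 0, 0, 1, 1, 0)
print(naive_decode(received, repeat3, k=3))   # (1, 0, 1)
```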
The fundamental tradeoff
Correct as many errors as possible with as little redundancy as possible
Can one achieve the “optimal” tradeoff with efficient encoding and decoding?
A “low level” view
Think of each symbol of the codeword as being a packet
The setup
Sender wants to send k packets
After encoding sends n packets
Some packets get corrupted
Receiver needs to recover the original k packets
The Optimal Tradeoff
C(x) sent, y received
How much of y must be correct to recover x?
At least k packets must be correct
[Guruswami, R. STOC 2006]
An explicit code along with efficient decoding algorithm
Works as long as (almost) k packets are correct
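To see why k correct packets can be enough, here is a Reed-Solomon-style sketch. It is not the [Guruswami, R.] construction: it only handles the much easier case where the receiver already knows which packets are correct. The assumptions are that packets are integers modulo a small prime `P`, that n is at most `P`, and that Python 3.8+ is available for `pow(x, -1, P)`. All names are hypothetical.

```python
P = 257  # assumed field size; every packet is an integer in range(P)


def encode(message, n):
    """Use the k message packets as polynomial coefficients; packet x is the value at x."""
    return [sum(c * pow(x, j, P) for j, c in enumerate(message)) % P
            for x in range(n)]


def recover(points, k):
    """Rebuild the k coefficients from k correct (x, y) pairs by Lagrange interpolation."""
    points = points[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                # Multiply the basis polynomial by (x - xj), lowest degree first.
                basis = [(a - xj * b) % P for a, b in zip([0] + basis, basis + [0])]
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs


message = [72, 105, 33]                 # k = 3 packets
codeword = encode(message, n=7)         # n = 7 packets are sent
survivors = [(x, codeword[x]) for x in (1, 4, 6)]
print(recover(survivors, k=3))          # [72, 105, 33]
```

Any 3 of the 7 packets determine the degree-2 polynomial, so any 3 known-correct packets give back the message.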
So what is left to do?
I cheated a bit in the last slide
The result only holds for large packets
We do not know an “optimal” code over smaller symbols (for example bits)
Computational Complexity
Collisions lead to shallower decision trees
[Aspnes, Demirbas, O’Donnell, R., Uurtamo 2008]
Wireless Sensor Networks
Murat Demirbas’ specialty
Compute Aggregate Functions
Each mote has one bit of information
Do at least 2 motes have a temperature of at least 70F?
Is the temperature at least 70F?
One possible solution
Ask each mote, one at a time: Is your temp at least 70F?
In the worst case, might have to ask ALL motes
Can we do any better?
Formalize/Generalize this question
Decision Tree model
Inputs: $x_1, x_2, \ldots, x_n \in \{0,1\}$
Function: $f : \{0,1\}^n \to \{0,1\}$
Minimum # queries to the input to determine $f(x_1, x_2, \ldots, x_n)$ in the worst case
This worst case number of queries is called the decision tree complexity of f
Very well studied complexity measure of functions
Back to our ≥ 70F example
The 2-threshold function $T_{2,n}$
Previously saw $D(T_{2,n}) \le n$
In fact $D(f) \le n$
Also $D(T_{2,n}) \ge n$
For the $t$-threshold function, $D(T_{t,n}) \ge n$
Logical OR and Majority are special cases
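For small n the decision tree complexity can be checked by brute force. The sketch below tries every query order against every possible answer, so it only scales to a handful of bits. The names `decision_tree_complexity` and `threshold` are hypothetical, and inputs are modelled as tuples of 0/1 values.

```python
from functools import lru_cache
from itertools import product


def decision_tree_complexity(f, n):
    """Worst-case number of single-bit queries needed to determine f on n bits."""

    def completions(partial):
        """All full inputs consistent with the bits queried so far."""
        free = [i for i, v in enumerate(partial) if v is None]
        for values in product((0, 1), repeat=len(free)):
            x = list(partial)
            for i, v in zip(free, values):
                x[i] = v
            yield tuple(x)

    @lru_cache(maxsize=None)
    def depth(partial):
        # No further query is needed once f is constant on every completion.
        if len({f(x) for x in completions(partial)}) == 1:
            return 0
        best = n
        for i, v in enumerate(partial):
            if v is None:
                # The adversary picks the answer that forces the deeper subtree.
                worst = max(depth(partial[:i] + (b,) + partial[i + 1:])
                            for b in (0, 1))
                best = min(best, 1 + worst)
        return best

    return depth((None,) * n)


def threshold(t):
    """The t-threshold function: 1 iff at least t input bits are one."""
    return lambda x: int(sum(x) >= t)


print(decision_tree_complexity(threshold(2), 4))   # 4
```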
So are we done?
The central node can broadcast/multicast to ALL motes
Is your temp at least 70F?
No
Replies from the motes
Answer back only if answer is yes
Is your temp at least 70F?
Scenario 1: all the answers are 0
Central node hears “silence”
Is your temp at least 70F?
Scenario 2: Only one answer is 1
Central node hears a “yes”
Is your temp at least 70F?
Yes
Scenario 3: ≥ 2 answers are 1
There is a collision
Central node can detect it!
All done with ONE query!
Is your temp at least 70F?
Yes
Yes
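The three scenarios can be simulated with a toy model of what the central node hears. This is a sketch under the assumption that silence, a single reply, and a collision are always distinguishable; the function names are made up for the example.

```python
def query_2plus(bits, subset):
    """What the central node hears: 0 (silence), 1 (one yes) or '2+' (collision)."""
    ones = sum(bits[i] for i in subset)
    return ones if ones < 2 else "2+"


def at_least_two_hot(bits):
    """Decide the 2-threshold function with ONE broadcast to all motes."""
    return query_2plus(bits, range(len(bits))) == "2+"


print(query_2plus([0, 0, 0, 0], range(4)))   # 0    (scenario 1)
print(query_2plus([0, 1, 0, 0], range(4)))   # 1    (scenario 2)
print(query_2plus([1, 1, 0, 1], range(4)))   # 2+   (scenario 3)
```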
Feedback to complexity theory
A new “decision tree” model
Inputs: $x_1, x_2, \ldots, x_n \in \{0,1\}$
Function: $f : \{0,1\}^n \to \{0,1\}$
Minimum # queries to the input to determine $f(x_1, x_2, \ldots, x_n)$ in the worst case
Queries are more general
Query any subset of bits
Answer is 0, 1, or 2+ depending on the number of ones in the subset
k+ decision trees
Our Results
$D^{2+}(T_{t,n})$ is $O(t \log(n/t))$
$D^{2+}(T_{t,n})$ is $\Omega(t)$
More general results
Understand $D^{2+}(f)$ fairly well
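One simple way to compute the t-threshold function with 2+ queries is to locate ones by binary search and set them aside. The sketch below is not the algorithm from the paper: it uses roughly t(1 + log n) queries, which is weaker than the O(t log(n/t)) bound stated above. The `Motes` oracle and all function names are hypothetical.

```python
class Motes:
    """Hypothetical oracle: answers subset queries with 0, 1 or 2 (meaning '2+')."""

    def __init__(self, bits):
        self.bits = bits
        self.queries = 0

    def ask(self, subset):
        self.queries += 1
        return min(sum(self.bits[i] for i in subset), 2)


def find_one(motes, candidates):
    """Binary search for a one in `candidates`, which must contain at least one."""
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        candidates = half if motes.ask(half) > 0 else candidates[len(half):]
    return candidates[0]


def threshold(motes, n, t):
    """Return True iff at least t of the n bits are one."""
    remaining, found = list(range(n)), 0
    while True:
        answer = motes.ask(remaining)
        if found + answer >= t:
            return True    # enough ones are certainly present
        if answer < 2:
            return False   # the exact count is known and it is below t
        remaining.remove(find_one(motes, remaining))
        found += 1


motes = Motes([0, 1, 1, 0, 1, 0, 0, 1])
print(threshold(motes, n=8, t=3))   # True
```

For t = 2 the loop stops after the first query, matching the one-query solution from the collision scenarios.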
Approximation Algorithms
Ranking in Tournaments
[Coppersmith, Fleischer, R. SODA 2006]
US Open 2005
Players: Venus Williams, Maria Sharapova, Kim Clijsters, Nadia Petrova (labeled #1 through #4 in the figure)
Everyone plays everyone
Rank the players
Min #upsets
Rank by number of wins
Break ties
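A small sketch of the "rank by number of wins" heuristic, compared against the true minimum number of upsets found by trying every ordering. The players A to D and their results are invented for illustration and are not the US Open data. A tournament is modelled as a list of (winner, loser) pairs, and ties are broken by input order.

```python
from itertools import permutations


def upsets(ranking, beats):
    """Number of games in which the lower-ranked player beat the higher-ranked one."""
    position = {p: i for i, p in enumerate(ranking)}
    return sum(position[w] > position[l] for w, l in beats)


def rank_by_wins(players, beats):
    """Order players by number of wins, most wins first."""
    wins = {p: 0 for p in players}
    for winner, _ in beats:
        wins[winner] += 1
    # sorted() is stable, so tied players keep their order from `players`.
    return sorted(players, key=lambda p: -wins[p])


def min_upsets(players, beats):
    """Brute force over all orderings; only feasible for a few players."""
    return min(upsets(r, beats) for r in permutations(players))


players = ["A", "B", "C", "D"]
beats = [("A", "B"), ("B", "C"), ("C", "A"),   # A, B, C beat each other in a cycle
         ("A", "D"), ("B", "D"), ("C", "D")]   # everyone beats D
ranking = rank_by_wins(players, beats)
print(ranking, upsets(ranking, beats), min_upsets(players, beats))
# ['A', 'B', 'C', 'D'] 1 1
```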
Ranking in Tournament results
[Coppersmith, Fleischer, R. SODA 2006]
Ordering by number of wins is a 5-approximation
Ties broken arbitrarily
Problem shown to be NP-hard in 2005
Application in Rank Aggregation
Gives provable guarantee for Borda’s method (1781!)
Future Directions
Try to analyze (variants of) heuristics that work well in practice
Research Interests
Theoretical Computer Science
Coding Theory
Algorithmic Game Theory
Sublinear algorithms
Approximation and online algorithms
Computational Complexity
The “algorithmic” side
For more information…
My Office is Bell 123: drop by!
atri@cse.buffalo.edu
CSE 545 in Spring 09
Course on error correcting codes
Algorithmic Game Theory
Online auction of digital goods
[Blum, Kumar, R., Wu SODA 2003]
Online Auctions of Digital goods
Say you want to sell mp3s of a song
Can make copies with no extra cost
Buyers arrive one by one
Specify how much they are willing to pay
You need to decide to sell or not
At what price?
You want to make lots of money
However…
Why not just sell at the value specified by a buyer?
Buyers are selfish
They will lie to get a better deal
Why not charge a single fixed price?
Do not know best price in advance
The challenge
Build an online pricing scheme that gives buyers no incentive to cheat
Our work gives a pricing scheme as good as the best fixed price
[Blum, Kumar, R., Wu SODA 2003]
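One standard way to remove the incentive to lie is to offer each buyer a price that depends only on earlier bids, so a buyer's own bid can never change the price they face. The sketch below follows that idea with a simple rule: offer the price that would have earned the most on the bids seen so far. It is an illustration only, not the scheme from the paper, and no revenue guarantee is claimed for it. All names are hypothetical.

```python
def revenue_at(price, bids):
    """Revenue from charging `price` to everyone whose bid is at least `price`."""
    return price * sum(b >= price for b in bids)


def best_fixed_price_revenue(bids):
    """Revenue of the best single price, chosen in hindsight."""
    return max((revenue_at(p, bids) for p in set(bids)), default=0)


def posted_prices(bids):
    """Price offered to each buyer, computed only from the bids that came before."""
    prices = []
    for i in range(len(bids)):
        earlier = bids[:i]
        prices.append(max(set(earlier), key=lambda p: revenue_at(p, earlier),
                          default=0))
    return prices


def online_revenue(bids):
    """A buyer pays the posted price if their bid is at least that price."""
    return sum(p for p, b in zip(posted_prices(bids), bids) if b >= p)


bids = [4, 4, 4, 4]
print(posted_prices(bids))              # [0, 4, 4, 4]
print(online_revenue(bids))             # 12
print(best_fixed_price_revenue(bids))   # 16
```

The first buyer gets the song for free here because there is no history to price from, which is one reason this rule falls short of the best fixed price.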
Problems I am interested in
Problems motivated by game theory
Sometimes, “old” problem with a twist
What is the best way to pair up potential couples in a dating site?
Twist on the classical graph matching problem
Sublinear Algorithms
Data Streams
[Beame, Jayram, R. STOC 2007]
Data Streams (one application)
Databases are huge
Fully reside in disk memory
Main memory
Fast, not much of it
Disk memory
Slow, lots of it
Random access is expensive
Sequential scan is reasonably cheap
Data Streams (one application)
Given a restriction on number of random accesses to disk memory
How much main memory is required?
For computations such as join of tables
Answer: a lot
[Beame, Jayram, R. STOC 2007]
Open question: computing other functions?