Tracing a Single User

Joint work with Noga Alon
Group Testing
Dorfman raised the following problem in 1941:



All American inductees gave blood samples, which
were tested for the presence of a syphilitic antigen.
We assume that the number of infected blood
samples r is much smaller than the total number m.
Testing each sample separately requires m tests.
Group Testing (cont.)




Instead, one can test pools that contain blood from a
set of samples.
If the outcome is negative – none of the samples in
the pool is infected.
Otherwise, the pool contains at least one infected
sample, which can be determined by further tests.
This way, fewer than m tests are needed.
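The pooling idea above can be illustrated with a minimal sketch. It assumes one simple two-stage scheme (test fixed-size pools, then retest every member of a positive pool); this is only one way to realize "further tests", and the function name, the pool size and the sample data are hypothetical.

```python
def two_stage_pooling(infected, pool_size):
    """Two-stage pooled testing (illustrative sketch).

    infected:  list of booleans, one per sample (hypothetical input data).
    pool_size: number of samples mixed into each first-stage pool.
    Returns (indices of infected samples, number of tests performed).
    """
    tests = 0
    found = []
    for start in range(0, len(infected), pool_size):
        pool = range(start, min(start + pool_size, len(infected)))
        tests += 1  # one test for the whole pool
        if any(infected[i] for i in pool):
            # Positive pool: fall back to testing each member separately.
            for i in pool:
                tests += 1
                if infected[i]:
                    found.append(i)
    return found, tests


# m = 100 samples, r = 2 infected ones.
samples = [False] * 100
samples[17] = samples[62] = True
found, tests = two_stage_pooling(samples, pool_size=10)
print(found, tests)  # [17, 62] 30
```

Here 10 pool tests plus 2 positive pools of 10 samples each give 30 tests instead of 100.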
Molecular Biology



In recent years this problem has gained popularity
again in the field of molecular biology.
For example, given a large set of DNA sequences, we
look for all those that contain a specific short
subsequence.
We can use a method similar to that of the blood
testing problem.
Molecular Biology (cont.)

In some applications, we are interested in finding one
sequence that contains the short subsequence, rather
than all of them.
Parallelization


Often, we would prefer to conduct all experiments
simultaneously, even at the cost of increasing the
number of experiments.
Thus, we need our tests to be non-adaptive, i.e. the
pool tested in each experiment is independent of the
outcomes of other experiments.
Non-Adaptive Tests
        a1  a2   .   .   .   .   .  am
T1       0   1   1   1   1   0   0   1
T2       1   1   1   1   0   0   1   1
.        0   1   0   0   1   1   0   1
.        1   0   1   0   0   0   0   0
.        1   0   0   1   1   0   1   1
Tn       0   0   1   0   0   1   1   0
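A non-adaptive design can be written as a fixed 0/1 matrix, and every outcome is then determined by the infected set alone. The sketch below assumes that rows are tests and columns are samples, with a 1 meaning the sample is in that test's pool; the small matrix and the name `pool_outcomes` are hypothetical.

```python
def pool_outcomes(matrix, infected):
    """Outcome of every test for a given set of infected sample indices.

    matrix[i][j] == 1 is assumed to mean: sample j is in the pool of test i.
    A test is positive (1) iff its pool contains at least one infected sample.
    """
    return [int(any(row[j] for j in infected)) for row in matrix]


# Hypothetical design with n = 3 tests and m = 4 samples.
matrix = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
print(pool_outcomes(matrix, {0, 2}))  # [1, 1, 0]
print(pool_outcomes(matrix, {3}))     # [0, 0, 1]
```

All rows are fixed in advance, so the tests can be run simultaneously.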
r-SUT Definition
Definition: Let F be a family of subsets of
[n] = {1,…,n}. F is called r-single-user-tracing
superimposed (r-SUT) if for all F1,…,Fk ⊆ F with |Fi| ≤ r,
$$\bigcup_{A \in F_1} A = \cdots = \bigcup_{A \in F_k} A \;\Longrightarrow\; \bigcap_{i=1}^{k} F_i \neq \emptyset.$$
In other words, given the union of up to r sets from F,
one can identify at least one of those sets.
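For tiny families the definition can be checked by brute force. The sketch below groups all subfamilies of size 1 to r by their union and verifies that the subfamilies sharing a union have a common member. Restricting to non-empty subfamilies is an assumption, and the name `is_r_sut` is hypothetical. The running time is exponential, so this is for illustration only.

```python
from itertools import combinations


def is_r_sut(family, r):
    """Brute-force r-SUT check (assumes non-empty subfamilies of size <= r)."""
    sets = [frozenset(s) for s in family]
    # Maps each union to the indices common to every subfamily with that union.
    common = {}
    for size in range(1, r + 1):
        for idx in combinations(range(len(sets)), size):
            union = frozenset().union(*(sets[i] for i in idx))
            if union in common:
                common[union] &= set(idx)
            else:
                common[union] = set(idx)
    # An empty intersection means some union cannot be traced to any set.
    return all(common.values())


print(is_r_sut([{1}, {2}, {3}], 2))     # True
print(is_r_sut([{1}, {2}, {1, 2}], 2))  # False
```

In the second family the union {1,2} arises both from the set {1,2} alone and from {1} together with {2}, and these two subfamilies share no set.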
Communication

Suppose that m users share a common channel.

Each user is associated with a vector in {0,1}n.


All active users transmit their vectors, and a single
receiver gets the OR of all transmitted vectors.
Given that at most r users are active simultaneously,
we would like the receiver to be able to identify at
least one of them.
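The receiver's task can be sketched as an exhaustive decoder: it enumerates every set of at most r users whose OR equals the received vector and reports the users common to all of them. The codewords and the name `trace_users` are hypothetical, and the search is exponential, so this only illustrates the decoding requirement.

```python
from itertools import combinations


def trace_users(codewords, received, r):
    """Users that are active in every explanation of `received`.

    codewords: list of 0/1 tuples of length n, one per user.
    received:  the OR of the codewords of the (at most r) active users.
    """
    n = len(received)
    candidates = None
    for size in range(1, r + 1):
        for users in combinations(range(len(codewords)), size):
            combined = tuple(
                int(any(codewords[u][j] for u in users)) for j in range(n)
            )
            if combined == tuple(received):
                # Keep only users present in every consistent explanation.
                if candidates is None:
                    candidates = set(users)
                else:
                    candidates &= set(users)
    return candidates or set()


codewords = [(1, 1, 0), (1, 0, 0), (0, 0, 1)]
print(trace_users(codewords, (1, 1, 0), r=2))  # {0}
print(trace_users(codewords, (1, 0, 1), r=2))  # {1, 2}
```

For the received vector (1,1,0), user 1 may or may not be active, but user 0 certainly is, which is all that single-user tracing asks for.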
Maximal r-SUT Families


Let g(n,r) denote the maximum size of an
r-SUT family of subsets of [n].
Let $R_g(r) = \limsup_{n \to \infty} \frac{\log g(n,r)}{n}$.
Csűrös and Ruszinkó: There exist constants c1,c2>0 s.t.
$$\frac{c_1}{r^2} \le R_g(r) \le \frac{c_2}{r}.$$
Our result: Rg(r) = Ω(1/r) (and hence Θ(1/r)).
Lower Bound



Let $m = 2^{n/(20r)}$.
We construct a family F={F1,…,Fm} of subsets of [n]
at random as follows:
For every 1 ≤ i ≤ m and 1 ≤ j ≤ n, independently, put j
in Fi with probability 1/r.
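The random construction can be sampled directly. The sketch below reads the family size as m = 2^(n/(20r)), rounded down to an integer; the function name, the seed parameter and the concrete values of n and r are hypothetical. It only draws the family and does not verify the r-SUT property.

```python
import random


def random_family(n, r, m, seed=None):
    """Draw m random subsets of [n]; each j joins each set with prob. 1/r."""
    rng = random.Random(seed)
    return [
        frozenset(j for j in range(1, n + 1) if rng.random() < 1 / r)
        for _ in range(m)
    ]


n, r = 200, 2
m = int(2 ** (n / (20 * r)))  # 2^5 = 32 sets for these parameters
family = random_family(n, r, m, seed=0)
print(len(family))  # 32
```

Each set has expected size n/r, so with n = 200 and r = 2 the sets contain about 100 elements each.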
Lower Bound (cont.)

We show that F is r-SUT with positive probability.

We say a configuration of F1,…,Fk ⊆ F with |Fi| ≤ r and
$\bigcap_{i=1}^{k} F_i = \emptyset$ is bad if all the unions
$\bigcup_{A \in F_i} A$ are equal.
We show that with positive probability there are no
bad configurations.
Lower Bound (cont.)


We show that with probability > ½ no small
configuration is bad, and that with probability > ½
no large configuration is bad.
Therefore, with positive probability there is no bad
configuration.
Small Configurations
$$\left|\bigcup_{i=1}^{k} F_i\right| < 2r$$
Proposition: With probability > ½ the following holds:
for all s<2r and distinct A1,…,As ∈ F, there is a j ∈ [n] that
belongs to exactly one of the sets A1,…,As.
Corollary: With probability > ½ no small configuration is
bad.
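The property in the proposition can be tested exhaustively on small families. The sketch below reads it as: every collection of fewer than 2r distinct sets from F has an element lying in exactly one of them. The function names are hypothetical, and the enumeration is exponential in r.

```python
from collections import Counter
from itertools import combinations


def has_private_element(sets):
    """True if some element belongs to exactly one of the given sets."""
    counts = Counter(j for s in sets for j in s)
    return any(c == 1 for c in counts.values())


def small_property_holds(family, r):
    """Check all collections of s < 2r distinct sets from the family."""
    sets = [frozenset(s) for s in family]
    for s in range(1, 2 * r):
        for group in combinations(sets, s):
            if not has_private_element(group):
                return False
    return True


print(small_property_holds([{1, 2}, {2, 3}, {3, 4}], 2))  # True
print(small_property_holds([{1, 2}, {2, 3}, {1, 3}], 2))  # False
```

The corollary follows because an element in exactly one set A forces A into every Fi of a configuration with equal unions, so the Fi cannot have an empty intersection.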
Small Configurations (cont.)
[Figure: nine sets, labeled A1,…,A9]
Large Configurations
$$\left|\bigcup_{i=1}^{k} F_i\right| \ge 2r$$
Proposition: With probability > ½ the following holds.
For all distinct A1,…,Ar,B1,…,Br ∈ F,
$\bigcup_{i=1}^{r} A_i \not\subseteq \bigcup_{i=1}^{r} B_i$.
Corollary: With probability > ½ no large configuration is
bad.
Large Configurations (cont.)
[Figure: sets A1, A2, A3 (with a generic Ai) and B1, B2, B3]
Tracing Multiple Users


Recently, Laczay and Ruszinkó have introduced the
following generalization of r-SUT families.
For integers n, r2, and 1kr, a family F of subsets
of [n] is called k-out-of-r multiple-user-tracing
superimposed (MUTk(r)) if given the union of any ℓr
sets from F, one can identify at least min(k,ℓ) of
them.
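A brute-force check of this definition is sketched below, under one interpretation: the sets that can be identified from a union are those common to all subfamilies of size 1 to r producing that union, and their number must be at least min(k,ℓ) for every such subfamily of size ℓ. The name `is_mut` and this reading are assumptions, and the search is exponential.

```python
from itertools import combinations


def is_mut(family, r, k):
    """Brute-force k-out-of-r multiple-user-tracing check (illustrative)."""
    sets = [frozenset(s) for s in family]
    common = {}   # union -> indices present in every subfamily with that union
    largest = {}  # union -> largest subfamily size producing that union
    for size in range(1, r + 1):
        for idx in combinations(range(len(sets)), size):
            union = frozenset().union(*(sets[i] for i in idx))
            common[union] = common.get(union, set(idx)) & set(idx)
            largest[union] = size  # sizes are visited in increasing order
    return all(len(common[u]) >= min(k, largest[u]) for u in common)


family = [{1, 2}, {1}, {3}]
print(is_mut(family, r=2, k=1))  # True
print(is_mut(family, r=2, k=2))  # False
```

With k = 1 this reduces to single-user tracing. The example family fails for k = 2 because the union {1,2} only certifies the set {1,2}, while the subfamily consisting of {1,2} and {1} has two members.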
Tracing Multiple Users (cont.)



Let h(n,r,k) denote the maximum size of a
MUTk(r) family of subsets of [n].
Let $R_h(r,k) = \limsup_{n \to \infty} \frac{\log h(n,r,k)}{n}$.
We have shown that there are constants c1,c2,c3,c4>0
s.t.
$$\min\left(\frac{c_1}{r}, \frac{c_2}{k^2}\right) \le R_h(r,k) \le \min\left(\frac{c_3}{r}, \frac{c_4 \log k}{k^2}\right).$$

Open Problems

We have shown that Rg(r) = Θ(1/r), but the question
of finding the exact constant is still open.

This problem is open even for the case of r = 2.

1/3 ≤ Rg(2) ≤ 1/2+o(1).
Lower bound: by a careful analysis of the random
construction.
Upper bound: follows from a result of Coppersmith
and Shearer.
Open Problems (cont.)


We show how to construct an r-SUT family in time
$m^{O(r)}$, where m is the size of the family.
It would be interesting to find explicit constructions
for all r.
There are other related problems for which there are
still gaps between lower and upper bounds:
- Multiple-user tracing families
- r-superimposed families
- Disjointly r-superimposed families
- Graph identifying codes