Topics In Applied Probability
March 12, 2013

Contents

1 Noise Sensitivity And Boolean Functions
  1.1 Definitions And Examples
  1.2 Noise Sensitivity and Stability
  1.3 Fourier Analysis Of Boolean Functions
  1.4 Hypercontractivity
  1.5 Percolation

2 Concentration
  2.1 Efron-Stein Inequality
  2.2 Martingale Method
  2.3 Convex Hull Approximation
  2.4 Entropy
  2.5 Stein's Method

1 Noise Sensitivity And Boolean Functions
1.1 Definitions And Examples
Definition 1.1. Boolean Function:
A Boolean function is a map f : {−1, 1}^n → {−1, 1} for some n ∈ N.
Theorem 1.1. Suppose {X_i}_{i=1}^n are independent Bernoulli(p) random variables which represent votes for a correct answer 1 and an incorrect answer −1. If we choose the result chosen by the majority of the X_i and let R(n, p) be the probability that the overall result is correct, then:
• for p > 1/2, R(n, p) is increasing in n with limit 1;
• for p < 1/2, argmax_{n≥1} R(n, p) = 1.
Proof. Let n = 2m − 1. We show that increasing n by two increases the probability of the correct result for p > 1/2 and decreases it for p < 1/2. The result changes only if the first n votes are balanced to within one vote: it flips from incorrect to correct iff exactly m − 1 of the first n votes are correct and both new votes are correct, and from correct to incorrect iff exactly m of the first n are correct and both new votes are incorrect. Writing C(n, m) for the binomial coefficient,
R(n + 2, p) − R(n, p) = P(n + 2 are correct, first n are incorrect) − P(n + 2 are incorrect, first n are correct)
= C(n, m) p^{m−1}(1 − p)^m p² − C(n, m) p^m(1 − p)^{m−1}(1 − p)²
= C(n, m) p^m(1 − p)^m (2p − 1),
which is > 0 for p > 1/2 and < 0 for p < 1/2.
Definition 1.2. Uniform Measure:
The uniform measure on Ω_n := {−1, 1}^n is P = (δ₁/2 + δ₋₁/2)^{⊗n}, the product measure which independently assigns probability 1/2 to each bit taking either of the two possible values.
Example 1.1. There are four important Boolean functions which we will consider:
• Dictator Function: DICT_n(ω) = ω₁, which represents the first bit having all of the influence.
• Parity Function: PAR_n(ω) = ∏_{i=1}^n ω_i, which equals 1 if #{i : ω_i = −1} is even and −1 otherwise; note this is not a monotone Boolean function.
• Majority Function: MAJ_n(ω) = sign(∑_{i=1}^n ω_i), which equals 1 if #{i : ω_i = 1} > n/2 and −1 otherwise (take n odd so there are no ties); this represents a voter model where the outcome is decided by a majority vote.
• Iterated Majority: for n a power of 3,
ITMAJ_n(ω) = MAJ₃(ITMAJ_{n/3}(ω₁, ..., ω_{n/3}), ITMAJ_{n/3}(ω_{1+n/3}, ..., ω_{2n/3}), ITMAJ_{n/3}(ω_{1+2n/3}, ..., ω_n)),
which represents a voter system where the voters are grouped into blocks of three, each block takes a value based on a majority vote, these results are then grouped into blocks of three, and the process repeats. In this case the order of the voters matters.
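To make these concrete, here is a minimal Python sketch of the four functions (NumPy assumed; the function names mirror the definitions above and are otherwise hypothetical; MAJ assumes n odd and ITMAJ assumes n a power of 3):

    import numpy as np

    def dict_n(w):
        # Dictator: the first bit decides.
        return int(w[0])

    def par_n(w):
        # Parity: product of all bits; +1 iff the number of -1's is even.
        return int(np.prod(w))

    def maj_n(w):
        # Majority: sign of the sum; assumes len(w) is odd so no ties occur.
        return int(np.sign(np.sum(w)))

    def itmaj_n(w):
        # Iterated majority: split into three blocks, recurse, then MAJ_3.
        if len(w) == 1:
            return int(w[0])
        k = len(w) // 3
        return maj_n(np.array([itmaj_n(w[:k]), itmaj_n(w[k:2*k]), itmaj_n(w[2*k:])]))

    w = np.random.choice([-1, 1], size=27)   # a uniform point of Omega_27
    print(dict_n(w), par_n(w), maj_n(w), itmaj_n(w))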
Denote [n] := {1, ..., n} and, for i ∈ [n] and ω ∈ Ω_n, define ω^i by
ω^i(j) = ω(j) if j ≠ i,  and  ω^i(j) = −ω(j) if j = i.
Definition 1.3. Pivotal:
For f : Ω_n → {−1, 1} and ω ∈ Ω_n we say that i ∈ [n] is pivotal if f(ω) ≠ f(ω^i).
Definition 1.4. Pivot Set:
For f : Ω_n → {−1, 1} and ω ∈ Ω_n the pivot set is
P_f(ω) := {i ∈ [n] : i is pivotal for f, ω}.
Definition 1.5. Influence:
For f : Ω_n → {−1, 1} the influence of i ∈ [n] is
I_i(f) = P(i ∈ P_f).
We denote by Inf(f) the vector of influences.
Definition 1.6. Total Influence:
The total influence of f : Ω_n → {−1, 1} is
I(f) := ∑_{i=1}^n I_i(f) = ||Inf(f)||₁.
Example 1.2. Tribes:
Suppose we have a population of n = mk voters split into m tribes of size k, where m ≈ 2^k log(2). Each member votes in {−1, 1}, and we take decision 1 if at least one tribe unanimously voted 1:
f(ω) = 1 if ∃ a tribe all voting 1, and −1 otherwise.
This means n ≈ k 2^k log(2), so k ≈ log₂(n) − log₂(log₂(n)), and
(1 − 2^{−k})^m ≈ 1/2,
which is the probability that no tribe votes unanimously. By symmetry,
I_i(f) = P(1 is pivotal)
= P(other k − 1 members of the first tribe vote 1, no other tribe is unanimous)
= P(other k − 1 members of the first tribe vote 1) P(no other tribe is unanimous)   (independence of tribes)
= (1/2)^{k−1} (1 − 2^{−k})^{m−1}
≈ 2^{−k}
≈ c log(n)/n.
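A quick Monte Carlo check of this influence computation (a sketch; the parameters k and m are hypothetical choices that roughly satisfy (1 − 2^{−k})^m ≈ 1/2):

    import numpy as np

    k, m = 4, 11                 # tribe size and number of tribes; n = mk = 44
    n = m * k
    rng = np.random.default_rng(0)

    def tribes(w):
        # +1 iff some tribe votes 1 unanimously.
        return 1 if np.any(w.reshape(m, k).min(axis=1) == 1) else -1

    trials, pivotal = 100_000, 0
    for _ in range(trials):
        w = rng.choice([-1, 1], size=n)
        w_flip = w.copy(); w_flip[0] = -w_flip[0]
        pivotal += tribes(w) != tribes(w_flip)
    print("I_1 estimate:", pivotal / trials)
    print("exact (1/2)^{k-1}(1-2^{-k})^{m-1}:", 0.5**(k-1) * (1 - 0.5**k)**(m-1))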
Theorem 1.2. KKL Theorem:
If P(f(ω) = 1) ≈ 1/2 then
max_i I_i(f) ≥ c log(n)/n.
1.2 Noise Sensitivity and Stability
Notation 1.1. Let ω ∼ U({−1, 1}^n). We denote by ω^(ε) the configuration obtained from ω by independently rerandomizing each bit with probability ε. It follows that P(ω_i^(ε) = ω_i) = 1 − ε/2.
Definition 1.7. Noise Sensitive:
The sequence of Boolean functions {f_n}_{n∈N} is noise sensitive if ∀ε ∈ (0, 1) we have
lim_{n→∞} E[f_n(ω) f_n(ω^(ε))] − E[f_n(ω)]² = 0.
For this definition it suffices that the limit holds for some ε ∈ (0, 1), since this implies it for all ε ∈ (0, 1).
Intuitively this says that rerandomizing a few bits leads to asymptotic independence of f_n(ω) and f_n(ω^(ε)) for large population sizes n: a small amount of noise changes the value with substantial probability, which happens when individuals are likely to be pivotal.
Definition 1.8. Noise Stable:
The sequence of Boolean functions {f_n}_{n∈N} is noise stable if
lim_{ε→0} sup_n P(f_n(ω) ≠ f_n(ω^(ε))) = 0.
Intuitively this says that for any population size, changing approximately nε/2 bits has no effect as ε → 0.
Lemma 1.1. If f_n is both noise sensitive and noise stable then lim_{n→∞} Var[f_n] = 0.
Proof. Let p_n = P(f_n(ω) = 1). Then
Var[f_n(ω)] = E[f_n(ω)²] − E[f_n(ω)]² = 1 − (1 − 2p_n)² = 4p_n(1 − p_n),
which is zero iff p_n ∈ {0, 1}. From noise sensitivity we have, for every ε ∈ (0, 1),
lim_{n→∞} E[f_n(ω) f_n(ω^(ε))] − E[f_n(ω)]² = 0,
which, since E[f_n(ω)²] = 1, we may rewrite as
lim_{n→∞} E[f_n(ω)(f_n(ω^(ε)) − f_n(ω))] + Var[f_n(ω)] = 0.
Given δ > 0, noise stability lets us choose ε > 0 such that sup_n P(f_n(ω) ≠ f_n(ω^(ε))) < δ, and then
|E[f_n(ω)(f_n(ω^(ε)) − f_n(ω))]| ≤ E[|f_n(ω^(ε)) − f_n(ω)|] < 2δ.
Hence limsup_n Var[f_n(ω)] ≤ 2δ for every δ > 0, which gives the result.
Example 1.3. The dictator function f_n(ω) = ω₁ is noise stable and not noise sensitive.
Proof. We have
P(f_n(ω) ≠ f_n(ω^(ε))) = ε/2,
which is independent of n and converges to zero as ε → 0, so the function is noise stable.
We can write ω^(ε) = ω × ω′, where ω′ is independent of ω with
P(ω′_i = 1) = 1 − ε/2  and  P(ω′_i = −1) = ε/2.
Then E[f_n(ω)]² = 0 and
E[f_n(ω) f_n(ω^(ε))] = E[ω₁ · ω₁ω′₁] = E[ω₁²] E[ω′₁] = E[ω′₁] = (1 − ε/2) − ε/2 = 1 − ε,
which does not converge to 0 as n → ∞, hence the function is not noise sensitive.
Example 1.4. The parity function f_n(ω) = ∏_{i=1}^n ω_i is noise sensitive and not noise stable.
Proof. With ω^(ε) = ω × ω′ as above,
E[f_n(ω) f_n(ω^(ε))] = E[(∏_{i=1}^n ω_i)(∏_{i=1}^n ω_i^(ε))] = E[∏_{i=1}^n ω_i² ω′_i] = ∏_{i=1}^n E[ω′_i] = (1 − ε)^n   (independence),
and E[f_n(ω)] = 0, while
lim_{n→∞} (1 − ε)^n = 0  ∀ε > 0,
hence the function is indeed noise sensitive.
For stability, let X ∼ Bin(n, ε/2) count the flipped bits; then
P(f_n(ω) ≠ f_n(ω^(ε))) = P(X odd),  and  lim_{n→∞} P(X odd) = 1/2,
hence sup_n P(f_n(ω) ≠ f_n(ω^(ε))) does not converge to zero as ε → 0.
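The two proofs above are easy to check by simulation; a minimal sketch (NumPy assumed) estimating E[f(ω)f(ω^(ε))] for the dictator and parity functions, rerandomizing each bit independently with probability ε:

    import numpy as np

    rng = np.random.default_rng(1)
    n, eps, trials = 50, 0.2, 100_000

    w = rng.choice([-1, 1], size=(trials, n))
    rerand = rng.random((trials, n)) < eps                 # bits to rerandomize
    w_eps = np.where(rerand, rng.choice([-1, 1], size=(trials, n)), w)

    dict_corr = np.mean(w[:, 0] * w_eps[:, 0])             # dictator correlation
    par_corr = np.mean(np.prod(w, axis=1) * np.prod(w_eps, axis=1))
    print("dictator:", dict_corr, "vs 1 - eps =", 1 - eps)
    print("parity:  ", par_corr, "vs (1 - eps)^n =", (1 - eps)**n)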
Example 1.5. The iterated majority function is noise sensitive and not noise stable.
Proof. We consider the function level by level. Write δ = ε/2 for the probability that a single bit is flipped: a bit rerandomizes with probability ε and takes a different value with probability 1/2 given that a rerandomization takes place. For n = 1 we only have one member of the population, hence
P(f₁(ω) ≠ f₁(ω^(ε))) = ε/2.
For n = 3, let ω = +++ denote the event that the three members agree and ω = ++− the event that one member votes differently from the other two; by symmetry P(ω = +++) = 1/4 = 1 − P(ω = ++−). Conditioning on these events:
P(f₃(ω) ≠ f₃(ω^(ε)) | +++) = 3δ²(1 − δ) + δ³   (at least two of the three bits must flip),
P(f₃(ω) ≠ f₃(ω^(ε)) | ++−) = 2δ(1 − δ)² + δ²(1 − δ) + δ³   (either exactly one of the two majority bits flips while the minority bit stays, or both majority bits flip),
so, unconditionally,
P(f₃(ω) ≠ f₃(ω^(ε))) = (1/4)[3δ²(1 − δ) + δ³] + (3/4)[2δ(1 − δ)² + δ²(1 − δ) + δ³] = (3/2)δ − (3/2)δ² + δ³.
In terms of ε this is 3ε/4 − 3ε²/8 + ε³/8. Writing ε_k for the flip probability at level k (so ε₁ = ε/2), we inductively have
ε_{k+1} = (3/2)ε_k − (3/2)ε_k² + ε_k³ =: g(ε_k).
We look for a stable equilibrium of ε_{k+1} = ε_k. There are three equilibria, ε = 0, 1/2, 1, and
|g′(0)| = 3/2 > 1,  |g′(1/2)| = 3/4 < 1,  |g′(1)| = 3/2 > 1,
so ε = 1/2 is the unique stable equilibrium, hence for every ε ∈ (0, 1)
lim_{n→∞} P(f_n(ω) ≠ f_n(ω^(ε))) = 1/2.
Since E[f_n] = 0 by symmetry and E[f_n(ω)f_n(ω^(ε))] = 1 − 2P(f_n(ω) ≠ f_n(ω^(ε))) → 0, the function is indeed noise sensitive; and since the limit 1/2 does not vanish as ε → 0, it is not noise stable.
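The recursion can be iterated directly to watch the attraction to 1/2; a minimal sketch (ε₁ = ε/2 is the flip probability at the leaves, and 40 levels corresponds to n = 3^40 voters):

    def level_up(e):
        # Flip probability of MAJ_3 when each input flips independently w.p. e.
        return 1.5 * e - 1.5 * e**2 + e**3

    for eps in (0.01, 0.2, 0.9):
        e = eps / 2                    # flip probability at the leaves
        for _ in range(40):
            e = level_up(e)
        print(f"eps = {eps}: flip probability -> {e:.6f}")   # -> 0.5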
Theorem 1.3. Discrete Poincaré:
If f : Ω_n → {−1, 1} then
Var(f) ≤ I(f) ≤ n ||Inf(f)||_∞;
in particular
||Inf(f)||_∞ ≥ Var(f)/n.
Proof. Let ω, ω̃ be i.i.d. uniform on Ω_n; then Var(f) = E[(f(ω) − f(ω̃))²]/2. Define
ω_k(i) := ω(i) if i > k,  and  ω_k(i) := ω̃(i) if i ≤ k,
for i ∈ [n] and k = 0, 1, ..., n, so that ω₀ = ω and ω_n = ω̃. Telescoping,
f(ω) − f(ω̃) = f(ω₀) − f(ω_n) = ∑_{i=1}^n (f(ω_{i−1}) − f(ω_i)).
Since f takes values in {−1, 1}, each difference lies in {−2, 0, 2}, so each difference x satisfies 2|x| = x², and
(f(ω) − f(ω̃))² = 2|f(ω) − f(ω̃)| ≤ 2 ∑_{i=1}^n |f(ω_{i−1}) − f(ω_i)| = ∑_{i=1}^n (f(ω_{i−1}) − f(ω_i))².
Now ω_{i−1} and ω_i agree except possibly in coordinate i, where ω_i carries the independently resampled bit ω̃(i); this bit differs from ω(i) with probability 1/2, and when it differs f changes iff i is pivotal, so
E[(f(ω_{i−1}) − f(ω_i))²] = 4 P(f(ω_{i−1}) ≠ f(ω_i)) = 2 I_i(f).
Therefore
Var(f) = E[(f(ω) − f(ω̃))²]/2 ≤ (1/2) ∑_{i=1}^n 2 I_i(f) = I(f) ≤ n ||Inf(f)||_∞.
Theorem 1.4. For some universal constant c,
||Inf(f)||_∞ ≥ c Var(f) log(n)/n.
Lemma 1.2. There exists a universal constant c such that
I(f) = ∑_{i=1}^n I_i(f) ≥ c Var(f) log(1/||Inf(f)||_∞).
Theorem 1.5. If
lim_{n→∞} ∑_{i=1}^n I_i(f_n)² = 0
then f_n is noise sensitive.
In general the converse fails: the parity function is noise sensitive but the condition fails, since every bit is always pivotal for parity, so I_i(PAR_n) = 1 and ∑_i I_i² = n.
Theorem 1.6. For c > 0.234 we have
Cov(f_n(ω), f_n(ω^(ε))) ≤ 20 (∑_{i=1}^n I_i(f)²)^{cε}.
Lemma 1.3. If the f_n are monotone and
lim_{n→∞} ∑_{i=1}^n I_i(f_n)² ≠ 0
then f_n is not noise sensitive.
1.3 Fourier Analysis Of Boolean Functions
Definition 1.9. Partial Ordering:
We can partially order Ω_n by saying ω ≤ ω′ iff ω_i ≤ ω′_i ∀i ∈ [n].
Definition 1.10. Increasing:
We say that f is monotone (increasing) iff x ≤ y ⟹ f(x) ≤ f(y).
Notation 1.2. We denote by L²(Ω_n) := {f : Ω_n → R} the space of real-valued functions on Ω_n.
Definition 1.11. Inner Product:
For any f, g ∈ L²(Ω_n) we define the inner product
⟨f, g⟩ := E[fg] = ∑_{ω∈Ω_n} 2^{−n} f(ω) g(ω).
Definition 1.12. Walsh Function:
For S ⊆ [n] we define the Walsh function χ_S : Ω_n → {−1, 1} by
χ_S(ω) = ∏_{i∈S} ω_i.
Lemma 1.4. {χ_S}_{S⊆[n]} forms an orthonormal basis for L²(Ω_n).
Proof. For S, S′ ⊆ [n],
⟨χ_S, χ_{S′}⟩ = E[χ_S χ_{S′}] = E[∏_{i∈S} ω_i ∏_{j∈S′} ω_j] = E[∏_{i∈S∆S′} ω_i] = ∏_{i∈S∆S′} E[ω_i] = 1 if S = S′, and 0 otherwise,
where ∆ denotes the symmetric difference (the factors ω_i² = 1 for i ∈ S ∩ S′ drop out).
It remains to show that the set of Walsh functions spans. If f ∈ L²(Ω_n) then we can write
f(ω) = ∑_{ω′∈Ω_n} f(ω′) I_{ω′}(ω).
Since the 2^n indicators {I_{ω′}}_{ω′∈Ω_n} span L²(Ω_n), its dimension is 2^n, which is also the number of Walsh functions; since the Walsh functions are orthonormal, they are linearly independent and hence must span.
Definition 1.13. Fourier-Walsh Decomposition:
For f ∈ L²(Ω_n) we write the Fourier-Walsh decomposition
f = ∑_{S⊆[n]} f̂(S) χ_S,
where f̂(S) := ⟨f, χ_S⟩ is the Fourier-Walsh coefficient.
Definition 1.14. Energy Spectrum:
For f ∈ L²(Ω_n) we define the energy spectrum
E_f(m) := ∑_{S⊆[n] : |S|=m} f̂(S)².
Typically, if f̂ is large on small sets S then f is stable, and if f̂ is large on large sets S then f is unstable.
Lemma 1.5.
Var[f] = ∑_{m=1}^n E_f(m).
Proof. By orthonormality,
∑_{m=0}^n E_f(m) = ∑_{S⊆[n]} f̂(S)² = ∑_{S,S′⊆[n]} ⟨f, χ_S⟩⟨χ_S, χ_{S′}⟩⟨f, χ_{S′}⟩ = ⟨f, f⟩ = E[f²],
while
E_f(0) = f̂(∅)² = E[f]².
Hence
∑_{m=1}^n E_f(m) = ∑_{m=0}^n E_f(m) − E_f(0) = E[f²] − E[f]² = Var(f).
Lemma 1.6. For f : Ω_n → R we have
Cov(f(ω), f(ω^(ε))) = ∑_{m=1}^n E_f(m)(1 − ε)^m.
Proof. Expanding both factors,
E[f(ω) f(ω^(ε))] = E[(∑_{S⊆[n]} f̂(S)χ_S(ω))(∑_{S′⊆[n]} f̂(S′)χ_{S′}(ω^(ε)))] = ∑_{S,S′⊆[n]} f̂(S) f̂(S′) E[χ_S(ω) χ_{S′}(ω^(ε))].
If X ∼ Bin(|S′|, ε/2) counts the bits of S′ flipped in ω^(ε), then χ_{S′}(ω^(ε)) = χ_{S′}(ω)(−1)^X with X independent of ω, so
E[χ_S(ω) χ_{S′}(ω^(ε))] = E[χ_S(ω) χ_{S′}(ω)] E[(−1)^X] = 0 if S ≠ S′, and (1 − ε)^{|S|} if S = S′,
using E[(−1)^X] = (1 − ε/2 − ε/2)^{|S′|} = (1 − ε)^{|S′|}. Hence
E[f(ω) f(ω^(ε))] = ∑_{S⊆[n]} f̂(S)²(1 − ε)^{|S|} = ∑_{m=0}^n E_f(m)(1 − ε)^m = E[f(ω)]² + ∑_{m=1}^n E_f(m)(1 − ε)^m,
and subtracting E[f(ω)]² = E[f(ω)]E[f(ω^(ε))] gives the result.
If the concentration of energy is on small m then the convergence with respect to ε will be slow.
Corollary 1.1. Cov(f(ω), f(ω^(ε))) is positive and decreasing in ε.
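For small n the Fourier-Walsh coefficients can be computed by brute force and the covariance formula checked exactly; a minimal sketch for MAJ₃ (NumPy assumed):

    import numpy as np
    from itertools import product, combinations

    n = 3
    points = list(product([-1, 1], repeat=n))
    f = {w: int(np.sign(sum(w))) for w in points}          # MAJ_3

    def chi(S, w):
        out = 1
        for i in S:
            out *= w[i]
        return out

    # f_hat(S) = <f, chi_S> = 2^{-n} sum_w f(w) chi_S(w)
    fhat = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            fhat[S] = sum(f[w] * chi(S, w) for w in points) / 2**n

    eps = 0.3
    cov = sum(fhat[S]**2 * (1 - eps)**len(S) for S in fhat if len(S) >= 1)
    print("sum_{m>=1} E_f(m)(1-eps)^m =", cov)   # equals Cov(f(w), f(w^eps))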
Proposition 1.1. A sequence {f_n}_{n=1}^∞ : Ω_n → {−1, 1} is noise sensitive iff ∀k ≥ 1 we have
lim_{n→∞} ∑_{m=1}^k E_{f_n}(m) = 0.
Proof. Suppose f_n is noise sensitive. By definition, for ε ∈ (0, 1),
lim_{n→∞} Cov(f_n(ω), f_n(ω^(ε))) = 0,
so by the previous lemma
lim_{n→∞} ∑_{m=1}^n E_{f_n}(m)(1 − ε)^m = 0.
For k ≥ 1 we have
∑_{m=1}^k E_{f_n}(m) ≤ (1 − ε)^{−k} ∑_{m=1}^n E_{f_n}(m)(1 − ε)^m,
and since (1 − ε)^{−k} is a fixed constant we conclude lim_{n→∞} ∑_{m=1}^k E_{f_n}(m) = 0.
Now suppose that for every k, lim_{n→∞} ∑_{m=1}^k E_{f_n}(m) = 0. Then
∑_{m=1}^n E_{f_n}(m)(1 − ε)^m ≤ ∑_{m=1}^k E_{f_n}(m) + (1 − ε)^{k+1} ∑_{m=k+1}^n E_{f_n}(m).
By our assumption the first term tends to 0, and
∑_{m=k+1}^n E_{f_n}(m) ≤ ∑_{m=1}^n E_{f_n}(m) = Var[f_n] ≤ 1,
so
limsup_{n→∞} ∑_{m=1}^n E_{f_n}(m)(1 − ε)^m ≤ (1 − ε)^{k+1}  ∀k.
Since this holds for arbitrary k we can conclude
lim_{n→∞} ∑_{m=1}^n E_{f_n}(m)(1 − ε)^m = 0,
so lim_{n→∞} Cov(f_n(ω), f_n(ω^(ε))) = 0 by the previous lemma, hence the sequence {f_n}_{n=1}^∞ is noise sensitive.
Lemma 1.7. If f is a Boolean function then
2 P(f(ω) ≠ f(ω^(ε))) = Var[f(ω)] − Cov[f(ω), f(ω^(ε))].
Proof. Since f(ω)f(ω^(ε)) ∈ {−1, 1},
E[f(ω) f(ω^(ε))] = P(f(ω) = f(ω^(ε))) − P(f(ω) ≠ f(ω^(ε))) = 1 − 2P(f(ω) ≠ f(ω^(ε))) = E[f(ω)²] − 2P(f(ω) ≠ f(ω^(ε))).
Subtracting E[f(ω)]² from both sides,
Cov[f(ω), f(ω^(ε))] = Var[f(ω)] − 2P(f(ω) ≠ f(ω^(ε))),
so the result follows by rearrangement.
Corollary 1.2. For a Boolean function f we have
2 P(f(ω) ≠ f(ω^(ε))) = ∑_{m=1}^n E_f(m)(1 − (1 − ε)^m).
Proof.
2 P(f(ω) ≠ f(ω^(ε))) = Var[f(ω)] − Cov[f(ω), f(ω^(ε))] = ∑_{m=1}^n E_f(m) − ∑_{m=1}^n E_f(m)(1 − ε)^m = ∑_{m=1}^n E_f(m)(1 − (1 − ε)^m).
Proposition 1.2. A sequence of Boolean functions {f_n}_{n=1}^∞ is noise stable iff ∀ε ∈ (0, 1) ∃k_ε ∈ N such that
sup_{n∈N} ∑_{m=k_ε}^n E_{f_n}(m) ≤ ε.
Proof. Suppose the sequence f_n is noise stable. For any δ ∈ (0, 1) and any k,
sup_n ∑_{m=k}^n E_{f_n}(m) ≤ (1 − (1 − δ)^k)^{−1} sup_n ∑_{m=k}^n E_{f_n}(m)(1 − (1 − δ)^m)
≤ (1 − (1 − δ)^k)^{−1} sup_n ∑_{m=1}^n E_{f_n}(m)(1 − (1 − δ)^m)
= (1 − (1 − δ)^k)^{−1} sup_n 2P(f_n(ω) ≠ f_n(ω^(δ))).
Given ε ∈ (0, 1), noise stability provides δ ∈ (0, 1) with sup_n P(f_n(ω) ≠ f_n(ω^(δ))) ≤ ε/4; choosing k = k_ε such that 1 − (1 − δ)^k ≥ 1/2 then gives
sup_n ∑_{m=k_ε}^n E_{f_n}(m) ≤ ε.
Conversely, suppose that ∀ε ∈ (0, 1) ∃k_ε ∈ N with sup_n ∑_{m=k_ε}^n E_{f_n}(m) ≤ ε. Then for any δ ∈ (0, 1),
2 sup_n P(f_n(ω) ≠ f_n(ω^(δ))) ≤ sup_n ∑_{m=1}^{k_ε−1} E_{f_n}(m)(1 − (1 − δ)^m) + ε
≤ sup_n Var[f_n(ω)](1 − (1 − δ)^{k_ε}) + ε
≤ (1 − (1 − δ)^{k_ε}) + ε.
We can choose ε arbitrarily small and then δ small enough that (1 − (1 − δ)^{k_ε}) is small, so indeed
lim_{δ→0} sup_n P(f_n(ω) ≠ f_n(ω^(δ))) = 0.
Proposition 1.3. If f_n is noise sensitive and g_n is noise stable then lim_{n→∞} Cov(f_n, g_n) = 0.
Proof. By orthonormality of the Walsh functions,
Cov(f_n, g_n) = E[f_n g_n] − E[f_n]E[g_n] = ∑_{S,S′⊆[n]} f̂(S)⟨χ_S, χ_{S′}⟩ĝ(S′) − f̂(∅)ĝ(∅) = ∑_{S⊆[n]} f̂(S)ĝ(S) − f̂(∅)ĝ(∅) = ∑_{|S|≥1} f̂(S)ĝ(S).
Fix ε > 0. Since g_n is noise stable, by Proposition 1.2 there exists k such that
sup_{n∈N} ∑_{m≥k} E_{g_n}(m) < ε².
By Cauchy-Schwarz,
|∑_{|S|≥k} f̂(S)ĝ(S)| ≤ (∑_{m=k}^n E_{f_n}(m))^{1/2} (∑_{m=k}^n E_{g_n}(m))^{1/2} ≤ (Var(f_n) ε²)^{1/2} ≤ ε.
So
limsup_{n→∞} |Cov(f_n, g_n)| = limsup_{n→∞} |∑_{|S|≥1} f̂(S)ĝ(S)|
≤ ε + limsup_{n→∞} |∑_{1≤|S|<k} f̂(S)ĝ(S)|
≤ ε + limsup_{n→∞} (∑_{m=1}^{k−1} E_{f_n}(m))^{1/2} (∑_{m=1}^{k−1} E_{g_n}(m))^{1/2}
≤ ε + limsup_{n→∞} (Var(g_n))^{1/2} (∑_{m=1}^{k−1} E_{f_n}(m))^{1/2}
= ε,
since f_n is noise sensitive (Proposition 1.1). As ε was arbitrary, the result follows.
Lemma 1.8. Parseval's Identity:
⟨f, f⟩ = ∑_{S⊆[n]} f̂(S)².
Definition 1.15. Discrete Derivative:
For f : Ω_n → R we define the operator
∇_k f(ω) = f(ω) − f(ω^k),
where ω^k_i = ω_i for i ≠ k and ω^k_k = −ω_k, to be the discrete derivative.
Proposition 1.4. If f : Ω_n → {−1, 1} then
I_k(f) = ∑_{S∋k} f̂(S)².
Proof. Notice that
χ_S(ω^k) = ∏_{i∈S} ω^k_i = χ_S(ω) if k ∉ S, and −χ_S(ω) if k ∈ S.
So we have
∇_k f(ω) = ∑_{S⊆[n]} f̂(S)(χ_S(ω) − χ_S(ω^k)) = 2 ∑_{S∋k} f̂(S) χ_S(ω),
and hence
∇̂_k f(S) = ⟨∇_k f, χ_S⟩ = 2 ∑_{S′∋k} f̂(S′)⟨χ_{S′}, χ_S⟩ = 2f̂(S) if k ∈ S, and 0 if k ∉ S.
So by the definition of the influence, since ∇_k f ∈ {−2, 0, 2},
I_k(f) = P(∇_k f ≠ 0) = P(∇_k f(ω) = 2) + P(∇_k f(ω) = −2)
= (1/4) ∑_{m∈{−2,0,2}} m² P(∇_k f(ω) = m)
= (1/4) E[(∇_k f)²]
= (1/4) ∑_{S⊆[n]} ∇̂_k f(S)²   (Parseval)
= ∑_{S∋k} f̂(S)².
Corollary 1.3. If f : Ω_n → {−1, 1} then
I(f) = ∑_{k=1}^n I_k(f) = ∑_{S⊆[n]} |S| f̂(S)².
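Both identities are easy to verify by enumeration for small n; a self-contained sketch for MAJ₃, where each I_k = 1/2 and I(f) = 3/2 (NumPy assumed):

    import numpy as np
    from itertools import product, combinations

    n = 3
    points = list(product([-1, 1], repeat=n))
    f = {w: int(np.sign(sum(w))) for w in points}                   # MAJ_3
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    chi = lambda S, w: int(np.prod([w[i] for i in S])) if S else 1
    fhat = {S: sum(f[w] * chi(S, w) for w in points) / 2**n for S in subsets}

    def influence(k):
        # P(bit k is pivotal), by enumeration.
        flip = lambda w: tuple(-v if i == k else v for i, v in enumerate(w))
        return sum(f[w] != f[flip(w)] for w in points) / 2**n

    for k in range(n):
        print(influence(k), sum(fhat[S]**2 for S in subsets if k in S))  # both 0.5
    print(sum(influence(k) for k in range(n)),
          sum(len(S) * fhat[S]**2 for S in subsets))                     # both 1.5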
Definition 1.16. Monotonic:
If ω, ω′ ∈ Ω_n then ω ≤ ω′ iff ω_i ≤ ω′_i ∀i; f : Ω_n → {−1, 1} is monotonic iff ω ≤ ω′ ⟹ f(ω) ≤ f(ω′).
Proposition 1.5. If f is monotonic then
I_k(f) = f̂({k}).
Proof.
f̂({k}) = ⟨f, χ_{k}⟩ = E[f(ω) ω_k] = E[f(ω) ω_k (I_{k∈P_f} + I_{k∉P_f})] = E[f(ω) ω_k I_{k∈P_f}],
since when k is not pivotal f(ω) does not depend on ω_k, so averaging over ω_k kills that term. Since f is monotonic, k ∈ P_f implies f(ω) = ω_k, so
f̂({k}) = E[I_{k∈P_f}] = P(k ∈ P_f) = I_k(f).
Corollary 1.4. If f_n is a noise sensitive sequence of monotonic functions then
lim_{n→∞} ∑_{k=1}^n I_k(f_n)² = 0.
Proof. By monotonicity,
∑_{k=1}^n I_k(f_n)² = ∑_{k=1}^n f̂_n({k})² = E_{f_n}(1),
which converges to 0 by noise sensitivity (Proposition 1.1 with k = 1).
Corollary 1.5. If f is monotonic then I(f) ≤ √n.
Proof. By Proposition 1.5 and Cauchy-Schwarz,
I(f) = ∑_{k=1}^n f̂({k}) ≤ (∑_{k=1}^n 1)^{1/2} (∑_{k=1}^n f̂({k})²)^{1/2} ≤ √n ⟨f, f⟩^{1/2} = √n.

1.4 Hypercontractivity
Definition 1.17. Noise Operator:
For ρ ∈ [0, 1] we define the noise operator T_ρ of f with respect to ρ by
T_ρ f(ω) := E[f(ω^{(1−ρ)}) | ω].
Lemma 1.9. For f ∈ L²(Ω_n) we have
T̂_ρf(S) = ρ^{|S|} f̂(S).
Proof. Write ω^{(1−ρ)} = ω × ω′ with ω′ independent of ω and P(ω′_i = 1) = 1 − (1 − ρ)/2, so that E[ω′_i] = ρ. Then
T̂_ρf(S) = ⟨T_ρf, χ_S⟩ = E[E[f(ω^{(1−ρ)}) | ω] χ_S(ω)] = E[f(ω^{(1−ρ)}) χ_S(ω)] = E[f(ω) χ_S(ω^{(1−ρ)})]   (exchangeability of the pair)
= E[f(ω) χ_S(ω) χ_S(ω′)] = E[f(ω) χ_S(ω)] E[χ_S(ω′)] = f̂(S) ρ^{|S|}.
Corollary 1.6.
T_ρ f(ω) = ∑_{S⊆[n]} ρ^{|S|} f̂(S) χ_S(ω).
Notice that because ρ < 1, if f is stable then the noise operator changes f very little, because the weight is concentrated on small S, whereas if f is noise sensitive then the reverse happens.
Theorem 1.7. BGB (hypercontractivity):
For f ∈ L²(Ω_n) and ρ ∈ [0, 1] we have
||T_ρ f||₂ ≤ ||f||_{1+ρ²}.
Theorem 1.8. BKS:
If
lim_{n→∞} ∑_{i=1}^n I_i(f_n)² = 0
then f_n is noise sensitive.
Proof. We treat the case where ∃c, δ > 0 such that
∑_{i=1}^n I_i(f_n)² ≤ c n^{−δ}.
By Proposition 1.1 it suffices to show that, for each fixed k,
lim_{n→∞} ∑_{m=1}^k E_{f_n}(m) = 0.
Recall Hölder's inequality: for 1 ≤ p, q ≤ ∞ with p^{−1} + q^{−1} = 1, ||fg||₁ ≤ ||f||_p ||g||_q. Fix ρ ∈ (0, 1). Then
∑_{m=1}^k E_{f_n}(m) = ∑_{1≤|S|≤k} f̂_n(S)²
≤ ∑_{1≤|S|≤k} |S| f̂_n(S)²
= ∑_{i=1}^n ∑_{1≤|S|≤k} f̂_n(S)² I_{i∈S}
= (1/4) ∑_{i=1}^n ∑_{1≤|S|≤k} ∇̂_i f_n(S)²
≤ (1/4) ∑_{i=1}^n ∑_{1≤|S|≤k} ρ^{2(|S|−k)} ∇̂_i f_n(S)²   (ρ^{2(|S|−k)} ≥ 1 for |S| ≤ k)
= (ρ^{−2k}/4) ∑_{i=1}^n ∑_{1≤|S|≤k} T̂_ρ(∇_i f_n)(S)²
≤ (ρ^{−2k}/4) ∑_{i=1}^n ∑_{S⊆[n]} T̂_ρ(∇_i f_n)(S)²
= (ρ^{−2k}/4) ∑_{i=1}^n ⟨T_ρ(∇_i f_n), T_ρ(∇_i f_n)⟩   (Parseval)
≤ (ρ^{−2k}/4) ∑_{i=1}^n ||∇_i f_n||²_{1+ρ²}   (hypercontractivity).
For Boolean f we have ∇_i f ∈ {−2, 0, 2}, so pointwise |∇_i f|^{1+ρ²} = (∇_i f)² 2^{ρ²−1}, and hence
||∇_i f||²_{1+ρ²} = (E[|∇_i f|^{1+ρ²}])^{2/(1+ρ²)} = (2^{ρ²−1} E[(∇_i f)²])^{2/(1+ρ²)} = (2^{1+ρ²} I_i(f))^{2/(1+ρ²)} = 4 I_i(f)^{2/(1+ρ²)},
using E[(∇_i f)²] = 4 I_i(f). Therefore
∑_{m=1}^k E_{f_n}(m) ≤ ρ^{−2k} ∑_{i=1}^n I_i(f_n)^{2/(1+ρ²)}
≤ ρ^{−2k} (∑_{i=1}^n I_i(f_n)²)^{1/(1+ρ²)} n^{ρ²/(1+ρ²)}   (Hölder with p = 1 + ρ²)
≤ ρ^{−2k} (c n^{−δ})^{1/(1+ρ²)} n^{ρ²/(1+ρ²)}
= c^{1/(1+ρ²)} ρ^{−2k} n^{−(δ−ρ²)/(1+ρ²)}.
By choosing ρ² < δ this tends to 0 as n → ∞, so indeed f_n is noise sensitive.
Example 1.6. The iterated majority function is noise sensitive.
For n = 3^k we have I_i(f) = 2^{−k} (a bit is pivotal iff at each of the k levels the other two inputs of its triple disagree, which has probability 1/2 per level), so
∑_{i=1}^n I_i(f)² = 3^k (2^{−k})² = (3/4)^k = n^{log₃(3/4)} = n^{−δ}
with δ = log₃(4/3) > 0.
Example 1.7. The tribes function is noise sensitive.
We have I_i(f) ≈ c log(n)/n. Fix ε ∈ (0, 1); then
∑_{i=1}^n I_i(f)² = c²(log(n))²/n ≤ n^{ε−1}
for sufficiently large n.
The tribes function is the most noise sensitive monotone function. In general the converse to the BKS theorem is false, however the following lemma shows when it can be true.
Lemma 1.10. If f_n is a noise sensitive sequence of monotonic functions then
lim_{n→∞} ∑_{i=1}^n I_i(f_n)² = 0.
Proof. We have I_k(f_n) = f̂_n({k}) for k = 1, ..., n, and by noise sensitivity
lim_{n→∞} ∑_{m=1}^k E_{f_n}(m) = 0
for every k; in particular this must hold for k = 1, so
∑_{k=1}^n I_k(f_n)² = ∑_{k=1}^n f̂_n({k})² = E_{f_n}(1) → 0.
Theorem 1.9. KKL:
∃c ∈ (0, ∞) such that for all f : Ω_n → {−1, 1},
I(f) = ∑_{i=1}^n I_i(f) ≥ c Var(f) log(1/I_max),
where I_max := ||Inf(f)||_∞.
Proof. Let ρ² = 1/2, so that 1 + ρ² = 3/2 and 2/(1 + ρ²) = 4/3. As in the proof of Theorem 1.8,
4 I_i(f)^{4/3} = 4 I_i(f)^{2/(1+ρ²)} = ||∇_i f||²_{1+ρ²} ≥ ||T_ρ ∇_i f||₂² = ∑_{S⊆[n]} ∇̂_i f(S)² ρ^{2|S|} = 4 ∑_{S∋i} f̂(S)² 2^{−|S|},
so I_i(f)^{4/3} ≥ ∑_{S∋i} f̂(S)² 2^{−|S|}. Summing over i, for any b ≤ n,
∑_{i=1}^n I_i(f)^{4/3} ≥ ∑_{S} |S| f̂(S)² 2^{−|S|} = ∑_{m=1}^n m E_f(m) 2^{−m} ≥ 2^{−b} ∑_{m=1}^b E_f(m).
On the other hand,
I(f) = ∑_{S⊆[n]} |S| f̂(S)² ≥ b ∑_{m=b+1}^n E_f(m),
and
∑_{i=1}^n I_i(f)^{4/3} ≤ I_max^{1/3} ∑_{i=1}^n I_i(f) = I_max^{1/3} I(f).
Combining these,
Var(f) = ∑_{m=1}^n E_f(m) ≤ 2^b I_max^{1/3} I(f) + b^{−1} I(f) = (2^b I_max^{1/3} + b^{−1}) I(f).
Now we can find b ≥ A log(1/I_max), for some fixed constant A > 0, such that 2^b I_max^{1/3} ≤ 1/b, which gives
Var(f) ≤ 2 I(f)/b ≤ 2 I(f)/(A log(1/I_max)).
Corollary 1.7.
I_max ≥ c Var(f) log(n)/n.
Proof. I(f) ≤ n I_max by definition, so by KKL
I_max ≥ c Var(f) log(1/I_max)/n.
If I_max ≥ n^{−1/2} then I_max ≥ c′ log(n)/n holds trivially for large n; otherwise log(1/I_max) ≥ (1/2) log(n), and the displayed bound gives I_max ≥ (c/2) Var(f) log(n)/n.
1.5 Percolation
Definition 1.18. Path:
γ(x, y) = (z₀, ..., z_n) is a path from x to y if z_i ∈ Z² ∀i, x = z₀, y = z_n, and ||z_i − z_{i−1}||₁ = 1 ∀i.
Definition 1.19. First Passage Time:
For 0 < a < b we independently assign the edges of Z² weights ω_e ∈ {a, b} such that P(ω_e = a) = 1/2 = P(ω_e = b). We then let
weight(γ(x, y)) = ∑_{e∈γ} ω_e.
We typically want to find the path with the minimum weight, hence our aim is often to find the first passage time
inf_{γ(x,y)} weight(γ).
For f : {0, 1}^n → {0, 1} denote
P_p(ω) := ∏_{i=1}^n p^{ω(i)} (1 − p)^{1−ω(i)},
and let λ be the Lebesgue measure on the cube [0, 1]^n.
Theorem 1.10. BKKKL:
Let I_k(A) = λ({x : x_k determines I_A(x)}) be the influence of the kth variable, i.e. the probability that the kth variable is critical in determining whether X ∈ A. Then
∑_{k=1}^n I_k(A) ≥ c λ(A) λ(A^c) log(1/I_max)
(note λ(A)λ(A^c) = Var(I_A)), and
I_max ≥ c λ(A) λ(A^c) log(n)/n.
Definition 1.20. Embedding:
For fixed p ∈ [0, 1] and x ∈ [0, 1]^n write
ω_i = 1 if x_i > 1 − p, and ω_i = 0 otherwise.
With this definition we have ω ∼ P_p.
Corollary 1.8. If we let I_{k,p}(B) = P_p(ω_k affects I_B) then as a result of BKKKL we have
∑_{k=1}^n I_{k,p}(B) ≥ c P_p(B) P_p(B^c) log(1/I_max)
and
I_max ≥ c P_p(B) P_p(B^c) log(n)/n.
Theorem 1.11. Russo's Formula:
If A ⊆ {0, 1}^n is increasing then
d/dp P_p(A) = ∑_{i=1}^n I_{i,p}(A).
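Russo's formula is easy to verify numerically on a small increasing event; a sketch for the hypothetical event A = "at least 2 of 3 bits are open", where P_p(A) = 3p²(1 − p) + p³ and each I_{i,p}(A) = P(the other two bits disagree) = 2p(1 − p):

    p = 0.3
    P = lambda p: 3 * p**2 * (1 - p) + p**3     # P_p(A)
    h = 1e-6
    deriv = (P(p + h) - P(p - h)) / (2 * h)     # numerical d/dp P_p(A)
    infl_sum = 3 * 2 * p * (1 - p)              # sum of the three influences
    print(deriv, infl_sum)                      # both ~ 1.26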
Lemma 1.11. For A increasing with A ≠ ∅, {0, 1}^n, define p_q by P_{p_q}(A) = q. If every influence satisfies I_{i,p}(A) ≤ δ over the relevant range of p, then for ε ∈ (0, 1/2) we have
p_{1−ε} − p_ε ≤ c log(1/(2ε)) / log(1/δ).
In particular, for symmetric increasing functions (where all influences are equal and hence small) we have
p_{1−ε} − p_ε ≤ O(1/log(n)).
Tribes, majority and iterated majority are examples of symmetric, increasing functions.
Theorem 1.12. Harris-Kesten:
For bond percolation on Z² we have p_c = 1/2 and θ(1/2) = 0, where θ(p) denotes the percolation probability.
2 Concentration

2.1 Efron-Stein Inequality
Theorem 2.1. Efron-Stein Inequality:
If {X_i}_{i=1}^n is a sequence of i.i.d. random variables and f : R^n → R then
Var[f] ≤ (1/2) ∑_{i=1}^n E[(f(X) − f(X^{(i)}))²],
where
X^{(i)}_j = X_j if j ≠ i, and X^{(i)}_j = X′_j if j = i,
and {X′_i}_{i=1}^n is a sequence of i.i.d. random variables with the same distribution as, and independent of, {X_i}_{i=1}^n.
Proof. Let
X̂^{(i)}_j := X_j if j > i, and X̂^{(i)}_j := X′_j if j ≤ i,
so that X̂^{(0)} = X and X̂^{(n)} = X′. Then
Var[f] = E[f(X)²] − E[f(X)]² = E[f(X)²] − E[f(X)]E[f(X′)] = E[f(X)²] − E[f(X)f(X′)]   (independence)
= E[f(X)(f(X) − f(X′))].
Telescoping,
f(X) − f(X′) = ∑_{i=1}^n (f(X̂^{(i−1)}) − f(X̂^{(i)})),
so
Var[f] = ∑_{i=1}^n E[f(X)(f(X̂^{(i−1)}) − f(X̂^{(i)}))].
Swapping X_i and X′_i (which preserves the joint distribution) sends X ↦ X^{(i)}, X̂^{(i−1)} ↦ X̂^{(i)} and X̂^{(i)} ↦ X̂^{(i−1)}, hence also
Var[f] = ∑_{i=1}^n E[f(X^{(i)})(f(X̂^{(i)}) − f(X̂^{(i−1)}))].
By summing the last two identities and dividing by 2 we get
Var[f] = (1/2) ∑_{i=1}^n E[(f(X) − f(X^{(i)}))(f(X̂^{(i−1)}) − f(X̂^{(i)}))]
≤ (1/2) ∑_{i=1}^n E[(f(X) − f(X^{(i)}))²]^{1/2} E[(f(X̂^{(i−1)}) − f(X̂^{(i)}))²]^{1/2}   (Cauchy-Schwarz)
= (1/2) ∑_{i=1}^n E[(f(X) − f(X^{(i)}))²],
since (X̂^{(i−1)}, X̂^{(i)}) is equal in distribution to (X, X^{(i)}).
Notice that the inequality only arises from Cauchy-Schwarz, so for functions where equality is obtained in the Cauchy-Schwarz inequality we also have equality in Efron-Stein.
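A minimal numerical sketch of the inequality (NumPy assumed), for the hypothetical choice f = max of n i.i.d. uniforms:

    import numpy as np

    rng = np.random.default_rng(2)
    n, trials = 10, 100_000
    X = rng.random((trials, n))
    Xp = rng.random((trials, n))               # independent copies

    es_bound = 0.0
    for i in range(n):
        Xi = X.copy(); Xi[:, i] = Xp[:, i]     # resample coordinate i
        es_bound += 0.5 * np.mean((X.max(axis=1) - Xi.max(axis=1))**2)

    print("Var[max]:", np.var(X.max(axis=1)), "<= Efron-Stein:", es_bound)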
Corollary 2.1. First Passage Percolation:
Suppose we have a percolation grid of size [1, n]², and each vertex takes some independent random time ω_x ≥ 0 to open once a neighbour is open, with E[ω_x²] < ∞. Given that (1, 1) is initially open, we want to find the earliest time at which (n, n) becomes open; furthermore assume that paths are directed, i.e. coordinatewise increasing. For a path T : (1, 1) → (n, n) the time taken for the path is ∑_{x∈T} ω_x, so the first passage time is
T_FPT := min_{T : (1,1)→(n,n)} ∑_{x∈T} ω_x.
For any particular fixed path T,
Var[∑_{x∈T} ω_x] = ∑_{x∈T} Var[ω_x] = |T| Var[ω_x] ≈ 2n Var[ω_x];
ideally we would like to know the variance of T_FPT itself. By Efron-Stein, resampling the weight at a single site x,
Var[T_FPT(ω)] ≤ (1/2) ∑_{x∈[1,n]²} E[(T_FPT(ω) − T_FPT(ω^{(x)}))²]
= ∑_{x∈[1,n]²} E[(T_FPT(ω) − T_FPT(ω^{(x)}))²; ω^{(x)}_x ≥ ω_x]   (symmetry between ω_x and ω^{(x)}_x)
= ∑_{x} E[(T_FPT(ω) − T_FPT(ω^{(x)}))²; ω^{(x)}_x ≥ ω_x, x ∈ T_FPT(ω)],
since if x ∉ T_FPT(ω) then there is no change from increasing the time at that site. On this event T_FPT(ω) ≤ T_FPT(ω^{(x)}); in particular
0 ≤ T_FPT(ω^{(x)}) − T_FPT(ω) ≤ ω^{(x)}_x − ω_x ≤ ω^{(x)}_x,
and so
0 ≤ (T_FPT(ω^{(x)}) − T_FPT(ω))² ≤ (ω^{(x)}_x)².
Since a directed path visits at most 2n sites, we conclude
Var[T_FPT(ω)] ≤ 2n E[(ω^{(x)}_x)²].
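A sketch checking this bound by simulation, computing the directed first passage time by dynamic programming (Exp(1) weights are a hypothetical choice, for which E[ω²] = 2):

    import numpy as np

    rng = np.random.default_rng(3)
    n, trials = 20, 2000

    def fpt(w):
        # T(i,j) = w(i,j) + min(T(i-1,j), T(i,j-1)) over directed paths.
        T = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                prev = 0.0 if (i == 0 and j == 0) else min(
                    T[i-1, j] if i else np.inf, T[i, j-1] if j else np.inf)
                T[i, j] = w[i, j] + prev
        return T[-1, -1]

    samples = [fpt(rng.exponential(1.0, (n, n))) for _ in range(trials)]
    print("Var[T_FPT]:", np.var(samples), "bound 2nE[w^2]:", 2 * n * 2.0)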
2.2 Martingale Method
Lemma 2.1.
P(f − E[f] > t) ≤ inf_{λ>0} e^{−λt} E[e^{λ(f−E[f])}].
Proof. Let λ > 0; then by Markov's inequality
P(f − E[f] > t) = P(e^{λ(f−E[f])} > e^{λt}) ≤ e^{−λt} E[e^{λ(f−E[f])}].
Since λ > 0 was arbitrary, the statement holds.
Corollary 2.2. If
E[e^{λ(f−E[f])}] ≤ e^{σ²λ²/2}  ∀λ > 0,
then
P(f − E[f] > t) ≤ e^{−t²/(2σ²)}.
(Optimize the bound of Lemma 2.1 at λ = t/σ².)
Definition 2.1. Martingale Difference:
For a probability space (Ω, F, P) with filtration {F_i}_{i=0}^n, where F_i = σ({X_j}_{j=0}^i) for some random variables {X_j}_{j=0}^n on (Ω, F, P), and an integrable F_n-measurable function f, we define the martingale differences
d_i := E[f | F_i] − E[f | F_{i−1}].
Theorem 2.2. Suppose f ∈ L¹ is F_n-measurable and let D² := ∑_{i=1}^n ||d_i||²_∞. Then
P(f − E[f] > t) ≤ e^{−t²/(2D²)}.
Proof. Since f − E[f] = ∑_{i=1}^n d_i,
E[e^{λ(f−E[f])}] = E[e^{λ∑_{i=1}^n d_i}] = E[E[e^{λ∑_{i=1}^n d_i} | F_{n−1}]] = E[e^{λ∑_{i=1}^{n−1} d_i} E[e^{λd_n} | F_{n−1}]].
We want to estimate E[e^{λd_n} | F_{n−1}]. For u ∈ [−1, 1], convexity of the exponential gives
e^{λu} ≤ ((1 + u)/2) e^λ + ((1 − u)/2) e^{−λ}.
In particular, with u = d_n/||d_n||_∞ and λ||d_n||_∞ in place of λ,
e^{λd_n} ≤ ((1 + d_n/||d_n||_∞)/2) e^{λ||d_n||_∞} + ((1 − d_n/||d_n||_∞)/2) e^{−λ||d_n||_∞}.
Since E[d_n | F_{n−1}] = 0,
E[e^{λd_n} | F_{n−1}] ≤ (e^{λ||d_n||_∞} + e^{−λ||d_n||_∞})/2 = cosh(λ||d_n||_∞) ≤ e^{λ²||d_n||²_∞/2}.
Iterating over i = n, n − 1, ..., 1,
E[e^{λ(f−E[f])}] ≤ E[e^{λ∑_{i=1}^{n−1} d_i}] e^{λ²||d_n||²_∞/2} ≤ e^{λ²∑_{i=1}^n ||d_i||²_∞/2} = e^{λ²D²/2},
so the result follows from Corollary 2.2.
Definition 2.2. Length:
A finite metric space (X, d) has length at most ℓ if there exist a sequence of partitions {X^{(i)}}_{i=0}^n with each X^{(i+1)} refining X^{(i)}, X^{(0)} = {X}, and X^{(n)} = {{x} : x ∈ X} the complete partition, together with numbers a₁, ..., a_n with
ℓ = (∑_{i=1}^n a_i²)^{1/2},
such that: whenever A_j^{(i)}, A_k^{(i)} ∈ X^{(i)} are contained in the same element A_p^{(i−1)} ∈ X^{(i−1)}, there is a map ψ : A_j^{(i)} → A_k^{(i)} with d(x, ψ(x)) ≤ a_i for every x ∈ A_j^{(i)}.
Proposition 2.1. Let (X, d) be a metric space of length at most ℓ and let
µ := (1/|X|) ∑_{i=1}^{|X|} δ_{x_i}
be the uniform measure on X. If f ∈ Lip₁(X), i.e. |f(x) − f(y)| ≤ d(x, y), then
µ(f − E[f] > t) ≤ e^{−t²/(2ℓ²)}.
Definition 2.3. Hamming Distance:
For X = {0, 1}^n we denote by
d_H(x, y) := (1/n) ∑_{i=1}^n |x_i − y_i|
the (normalised) Hamming distance.
Lemma 2.2. Construct the partition sequence on the n-dimensional hypercube in which X^{(i)} fixes the first i coordinates,
X^{(i)} = {{x : x_j = y_j, j = 1, ..., i} : y ∈ {0, 1}^i}.
Then each a_i = 1/n (the map ψ flips one coordinate, moving a distance 1/n in d_H), so ℓ² = n(1/n)² = 1/n, and if f ∈ Lip₁(d_H), i.e. |f(x) − f(y)| ≤ d_H(x, y), then
µ(f − ∫f dµ > t) ≤ e^{−t²/(2ℓ²)} = e^{−t²n/2}.
2.3 Convex Hull Approximation
Lemma 2.3. If f : R^n → R is such that |∂f/∂x_i| ≤ 1/√n for all i, then
µ(f − ∫f dµ > s) ≤ e^{−s²/2}.
Proof. For x, y ∈ {0, 1}^n, by the mean value theorem along coordinates,
|f(x) − f(y)| = |∑_{i=1}^n (∂f/∂x_i)(ξ_{x_i,y_i})(y_i − x_i)|,  ξ_{x_i,y_i} ∈ (x_i, y_i),
≤ sup_i sup_ξ |(∂f/∂x_i)(ξ)| ∑_{i=1}^n |x_i − y_i| ≤ (1/√n) · n d_H(x, y) = √n d_H(x, y).
So f/√n ∈ Lip₁(d_H), which by Lemma 2.2 gives
µ(f − ∫f dµ > t√n) ≤ e^{−t²n/2},
hence choosing t such that s = t√n gives the required result.
The bound in the above lemma is independent of n, hence this is a dimension-free concentration inequality. In general this won't hold if we replace the uniform coordinate bound by ||∇f||_{l²} ≤ 1.
Definition 2.4. Convex:
A function f : X → R is convex if whenever x, y ∈ X and λ ∈ (0, 1) we have
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).
Lemma 2.4. If f is convex with ||∇f||_{l²} ≤ 1 then for any x, y ∈ X we have
f(x) ≤ f(y) + max_{||a||_{l²}≤1} ∑_{i=1}^n a_i(y_i − x_i).
Proof. By convexity, f(y) ≥ f(x) + ⟨∇f(x), y − x⟩, so
f(x) ≤ f(y) + ∑_{i=1}^n (∂f/∂x_i)(x)(x_i − y_i) ≤ f(y) + max_{||a||_{l²}≤1} ∑_{i=1}^n a_i(y_i − x_i),
using ||∇f(x)||_{l²} ≤ 1 and the fact that the maximum over ||a|| ≤ 1 equals |x − y| whichever sign convention is used.
Corollary 2.3. For f : {0, 1}^n → R as above we have
f(x) ≤ f(y) + max_{||a||_{l²}≤1} ∑_{i=1}^n a_i I_{x_i≠y_i}.
We let d_a(x, y) := ∑_{i=1}^n a_i I_{x_i≠y_i} and D^c(x, y) := max_{||a||_{l²}≤1} d_a(x, y).
Theorem 2.3. Talagrand:
Let A ⊆ ⊗_{i=1}^n Ω^{(i)} =: X, where we consider X with the product measure P, and write D^c(x, A) := inf_{y∈A} D^c(x, y). Then
∫ e^{D^c(x,A)²/4} dP(x) ≤ 1/P(A).
Moreover,
P(D^c(x, A) ≥ r) ≤ e^{−r²/4}/P(A).
Proof. Let U_A(x) := {(I_{x₁≠y₁}, ..., I_{x_n≠y_n}) : y ∈ A} and let V_A(x) be the smallest convex set containing U_A(x). We have
D^c(x, A) = inf_{y∈A} sup_{||α||≤1} ∑_{i=1}^n α_i I_{x_i≠y_i} = inf_{s∈U_A(x)} sup_{||α||≤1} ⟨α, s⟩ = inf_{s∈V_A(x)} sup_{||α||≤1} ⟨α, s⟩ = inf_{s∈V_A(x)} ||s||.
We complete the proof inductively, so suppose n = 1. If x ∈ A then D^c(x, A) = 0; if x ∈ A^c then U_A(x) = {1} and D^c(x, A) = 1. From this we have
∫_Ω e^{D^c(x,A)²/4} dP(x) = ∫_A 1 dP(x) + ∫_{A^c} e^{1/4} dP(x) = P(A) + e^{1/4}(1 − P(A)) ≤ 1/P(A),
where the final inequality holds for all P(A) ∈ (0, 1] since u(u + e^{1/4}(1 − u)) ≤ 1 on [0, 1]. So the case n = 1 holds.
Suppose the statement holds for n − 1; we show it for n. Write z = (ω, x) ∈ Ω^{(1)} × ⊗_{k=2}^n Ω^{(k)}, let
B := {y ∈ ⊗_{k=2}^n Ω^{(k)} : (ω′, y) ∈ A for some ω′}
be the projection of A onto ⊗_{k=2}^n Ω^{(k)}, and let A(ω) := {y ∈ ⊗_{k=2}^n Ω^{(k)} : (ω, y) ∈ A}. If s ∈ U_{A(ω)}(x) then (0, s) ∈ U_A(z), and if t ∈ U_B(x) then (1, t) ∈ U_A(z). So by convexity of V_A(z), for θ ∈ [0, 1],
θ(0, s) + (1 − θ)(1, t) = (1 − θ, θs + (1 − θ)t) ∈ V_A(z),
and hence, by convexity of |·|²,
D^c(z, A)² = inf_{v∈V_A(z)} ||v||² ≤ inf_{s,t} ((1 − θ)² + |θs + (1 − θ)t|²) ≤ inf_{s,t} ((1 − θ)² + θ|s|² + (1 − θ)|t|²) = (1 − θ)² + θ D^c(x, A(ω))² + (1 − θ) D^c(x, B)².
Writing P = P₁ ⊗ P′ with P′ := ⊗_{k=2}^n P_k, Hölder's inequality and the induction hypothesis give
∫ e^{D^c(z,A)²/4} dP(z) ≤ e^{(1−θ)²/4} ∫ (∫ e^{D^c(x,A(ω))²/4} dP′(x))^θ (∫ e^{D^c(x,B)²/4} dP′(x))^{1−θ} dP₁(ω)
≤ e^{(1−θ)²/4} ∫ (1/P′(A(ω)))^θ (1/P′(B))^{1−θ} dP₁(ω)
= e^{(1−θ)²/4} (1/P′(B)) ∫ (P′(A(ω))/P′(B))^{−θ} dP₁(ω).
Moreover, for u ∈ (0, 1],
inf_{θ∈[0,1]} e^{(1−θ)²/4} u^{−θ} ≤ 2 − u,
so optimising over θ (for each ω, with u = P′(A(ω))/P′(B) ≤ 1) gives
∫ e^{D^c(z,A)²/4} dP(z) ≤ (1/P′(B)) ∫ (2 − P′(A(ω))/P′(B)) dP₁(ω) = (1/P′(B)) (2 − P(A)/P′(B)) ≤ 1/P(A),
where the final step uses v(2 − v) ≤ 1 with v = P(A)/P′(B).
The tail bound follows from Markov's inequality:
P(D^c(x, A) ≥ r) ≤ e^{−r²/4} ∫ e^{D^c(x,A)²/4} dP(x) ≤ e^{−r²/4}/P(A).
Corollary 2.4. If f : X → R is such that ∀x ∈ X ∃α with ||α||_{l²} ≤ 1 such that f(x) ≤ f(y) + d_α(x, y) ∀y, then
f(x) ≤ m + D^c(x, A),  where A := {x : f(x) ≤ m}.
Proof. If x ∈ A then D^c(x, A) = 0 and this holds trivially, so assume x ∉ A. For any x₀ ∈ A,
f(x) ≤ f(x₀) + d_α(x, x₀) ≤ m + D^c(x, x₀),
and taking the infimum over x₀ ∈ A gives f(x) ≤ m + D^c(x, A).
Corollary 2.5. Choose m to be a median of f; then P(A) ≥ 1/2 and
P(f ≥ m + r) ≤ P(D^c(x, A) ≥ r) ≤ e^{−r²/4}/P(A) ≤ 2e^{−r²/4}.

2.4 Entropy
Definition 2.5. Entropy:
For a probability measure µ, the entropy of a nonnegative function f is defined as
Ent_µ(f) := ∫ f log(f) dµ − ∫ f dµ log(∫ f dµ) = ∫ f log(f/∫f dµ) dµ.
Definition 2.6. Log-Sobolev Inequality:
A probability measure µ satisfies the Log-Sobolev inequality if for all (suitably integrable) f we have
Ent_µ(f²) ≤ C ∫ |∇f|² dµ.
Definition 2.7. Poincaré Inequality:
A probability measure µ satisfies the Poincaré inequality if for all such f
Var_µ(f) ≤ C ∫ |∇f|² dµ.
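A quick numerical sketch (NumPy assumed) checking that the two expressions for the entropy agree, on a small hypothetical discrete example:

    import numpy as np

    mu = np.array([0.2, 0.5, 0.3])             # a probability measure
    f = np.array([1.0, 2.0, 4.0])              # a positive function

    mean = np.sum(f * mu)
    ent1 = np.sum(f * np.log(f) * mu) - mean * np.log(mean)
    ent2 = np.sum(f * np.log(f / mean) * mu)
    print(ent1, ent2)                          # equal, and nonnegative by Jensen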
Corollary 2.6. If µ satisfies the Log-Sobolev inequality then it also satisfies the Poincaré inequality (apply the Log-Sobolev inequality to f = 1 + εg and expand to second order in ε).
Lemma 2.5. If µ is the Lebesgue measure on a domain Ω with µ(Ω) = 1, and ∫ f(x) dµ(x) = 0, then
Var_µ(f) = ∫_Ω f²(x) dµ(x) ≤ C ∫ |∇f|² dµ(x),
and the optimal C is 1/λ for the principal eigenvalue λ of the Laplacian, i.e. the smallest λ > 0 satisfying
−∆u(x) = λu(x), x ∈ Ω;  u(x) = 0, x ∈ ∂Ω.
Definition 2.8. Generator:
Given a stochastic process X_t satisfying dX_t = σ(X_t)dB_t + b(X_t)dt, the generator is
L = (1/2) ∑_{i,j=1}^n (σσ*)_{i,j} ∂²/∂x_i∂x_j + b(x)·∇.
Definition 2.9. Semi-Group:
If X_t is a stochastic process then the operator P_t satisfying
(P_t f)(x) = E_x[f(X_t)] = ∫ f(y) p_t(x, y) dy
is called the semi-group of X.
Lemma 2.6. If P_t is the semi-group of a stochastic process then P_{t+s} = P_t P_s for any t, s ≥ 0.
Lemma 2.7. If X is a stochastic process with semi-group P_t and generator L then P_t = e^{tL}, and in particular
(d/dt) P_t = L e^{tL} = L P_t = P_t L.
Lemma 2.8. If P_t is the Ornstein-Uhlenbeck semi-group
(P_t f)(x) = ∫ f(e^{−t}x + √(1 − e^{−2t}) y) dγ(y),
with γ the standard Gaussian density, then
P₀f(x) = f(x)  and  P_∞f(x) := lim_{t→∞} P_tf(x) = ∫ f(y) dγ(y).
Proof. We have
(P_t f)(x) = E_γ[f(e^{−t}x + √(1 − e^{−2t}) Y)],  Y ∼ N(0, 1).
(Note that if X ⊥ Y with X, Y ∼ N(0, 1) then e^{−t}X + √(1 − e^{−2t}) Y ∼ N(0, 1), so γ is invariant for P_t.) At t = 0,
P₀f(x) = E_γ[f(x)] = f(x),
and as t → ∞,
lim_{t→∞} P_tf(x) = E_γ[f(Y)] = ∫ f(y) dγ(y).
Theorem 2.4. If γ is the density of the standard Gaussian on R^n then
Ent_γ(f²) ≤ 2 ∫_{R^n} |∇f|² dγ(x).
Proof. For positive f, writing the entropy along the Ornstein-Uhlenbeck flow,
Ent_γ(f) = ∫ f log(f) dγ − ∫ f dγ log(∫ f dγ)
= ∫ (P₀f) log(P₀f) dγ − ∫ (P_∞f) log(P_∞f) dγ
= −∫₀^∞ (d/dt) ∫ (P_tf) log(P_tf) dγ dt
= −∫₀^∞ ∫ (LP_tf) log(P_tf) dγ dt − ∫₀^∞ ∫ (P_tf) ((d/dt)P_tf)/(P_tf) dγ dt.
The second term vanishes, since by invariance of γ
∫ (d/dt)(P_tf)(x) dγ(x) = (d/dt) ∫ (P_tf)(x) dγ(x) = (d/dt) ∫ f dγ = 0.
So we have
Ent_γ(f) = −∫₀^∞ ∫ (LP_tf)(x) log(P_tf)(x) dγ(x) dt,
and using the Ornstein-Uhlenbeck generator L = ∆ − x·∇ together with Gaussian integration by parts (∫(Lg)h dγ = −∫∇g·∇h dγ),
Ent_γ(f) = ∫₀^∞ ∫ (|∇P_tf(x)|²/P_tf(x)) dγ(x) dt.
Moreover, differentiating under the integral,
∇P_tf(x) = ∇∫ f(e^{−t}x + √(1 − e^{−2t})y) dγ(y) = e^{−t} ∫ (∇f)(e^{−t}x + √(1 − e^{−2t})y) dγ(y) = e^{−t}(P_t∇f)(x),
so, writing |∇f| = √f · (|∇f|/√f) and applying Cauchy-Schwarz,
|∇P_tf(x)| ≤ e^{−t} P_t(|∇f|)(x) ≤ e^{−t} (P_tf(x))^{1/2} (P_t(|∇f|²/f)(x))^{1/2}.
Hence
Ent_γ(f) ≤ ∫₀^∞ e^{−2t} ∫ P_t(|∇f|²/f)(x) dγ(x) dt = ∫₀^∞ e^{−2t} dt ∫ (|∇f|²/f) dγ = (1/2) ∫ (|∇f|²/f) dγ,
using the invariance of γ under P_t again. So it follows, applying this to f² in place of f, that
Ent_γ(f²) ≤ (1/2) ∫ (|∇f²|²/f²) dγ = (1/2) ∫ (4f²|∇f|²/f²) dγ = 2 ∫ |∇f|² dγ.
Corollary 2.7. If dµ = e^{−u(x)} dx with Hessian ∇²u ≥ cI then
Ent_µ(f²) ≤ (2/c) ||∇f||²_{L²(µ)}.
Proposition 2.2. If µ is a probability measure satisfying the Log-Sobolev inequality Ent_µ(g²) ≤ C ||∇g||²_{L²(µ)}, and F is Lipschitz, i.e. |F(x) − F(y)| ≤ ||F||_Lip |x − y| ∀x, y ∈ R^n, then
µ(F − ∫F dµ ≥ t) ≤ e^{−t²/(C||F||²_Lip)}.
Proof. (Herbst's argument.) Without loss of generality ∫F dµ = 0. For λ > 0 set g² = e^{λF} and
Λ(λ) := ∫ e^{λF} dµ.
Since |∇e^{λF/2}|² = (λ²/4)|∇F|² e^{λF} ≤ (λ²/4)||F||²_Lip e^{λF} (F Lipschitz), the Log-Sobolev inequality gives
Ent_µ(e^{λF}) = λΛ′(λ) − Λ(λ) log Λ(λ) ≤ C (λ²/4) ||F||²_Lip Λ(λ).
Writing K(λ) := λ^{−1} log Λ(λ), this differential inequality says K′(λ) ≤ C||F||²_Lip/4; since K(λ) → ∫F dµ = 0 as λ → 0⁺, integrating yields
Λ(λ) ≤ e^{Cλ²||F||²_Lip/4}.
Then by Lemma 2.1,
µ(F − ∫F dµ ≥ t) ≤ inf_{λ>0} e^{−λt} Λ(λ) ≤ inf_{λ>0} e^{Cλ²||F||²_Lip/4 − λt} = e^{−t²/(C||F||²_Lip)},
taking λ = 2t/(C||F||²_Lip), as required.
Lemma 2.9. For x > 0 and all y ∈ R we have
x log(x) − x ≥ xy − e^y.
Lemma 2.10. Variational Characterisation:
For f ≥ 0 with ∫f dµ = 1,
Ent_µ(f) = sup{∫ fg dµ : ∫ e^g dµ ≤ 1, g ∈ C_b(R)} = sup_{g∈C_b(R)} (∫ fg dµ − log ∫ e^g dµ);
in particular Ent_µ(f) ≥ ∫ fg dµ − log ∫ e^g dµ for every bounded g, and Ent_µ(f) ≥ ∫ fg dµ whenever ∫ e^g dµ ≤ 1.
Theorem 2.5. Tensorisation:
Let X = ⊗_{i=1}^n Ω_i and let P = ⊗_{i=1}^n µ_i be a product measure on X. If f : X → R₊ is measurable and each µ_i satisfies Log-Sobolev, then
Ent_P(f) ≤ ∑_{i=1}^n ∫ Ent_{µ_i}(f_i) dP,
where f_i(y) := f(x₁, ..., x_{i−1}, y, x_{i+1}, ..., x_n).
Proof. We may assume ∫f dP = 1. For g such that ∫ e^g dP ≤ 1 define
g_i(x₁, ..., x_n) := log( ∫ e^{g(x₁,...,x_n)} dµ₁(x₁)···dµ_{i−1}(x_{i−1}) / ∫ e^{g(x₁,...,x_n)} dµ₁(x₁)···dµ_i(x_i) ).
Notice that:
• For each fixed value of the other coordinates,
∫ e^{g_i} dµ_i(x_i) = ∫ e^g dµ₁···dµ_{i−1}dµ_i / ∫ e^g dµ₁···dµ_i = 1.
• The sum telescopes:
∑_{i=1}^n g_i = log( e^g / ∫ e^g dµ₁(x₁)···dµ_n(x_n) ) = g − log ∫ e^g dP ≥ g.
• Hence, applying the variational characterisation coordinatewise (each g_i satisfies ∫ e^{g_i} dµ_i = 1),
∫ fg dP ≤ ∑_{i=1}^n ∫ f g_i dP = ∑_{i=1}^n ∫ (∫ f_i g_i dµ_i) dP ≤ ∑_{i=1}^n ∫ Ent_{µ_i}(f_i) dP.
Taking the supremum over such g,
Ent_P(f) = sup_g ∫ fg dP ≤ ∑_{i=1}^n ∫ Ent_{µ_i}(f_i) dP.
Corollary 2.8. If
Ent_{µ_i}(f_i) ≤ C_i ∫ |∇_i f|² dµ_i  ∀i,
then
Ent_P(f) ≤ max_i C_i ∫ ||∇f||²_{l²} dP.
Corollary 2.9. If ∫ f dµ = 1 then
Ent_µ(f) = ∫ f log(f) dµ = ∫ log(f) (f dµ);
since dν := f dµ is a probability measure, this reads
Ent_µ(f) = ∫ log(dν/dµ) dν.
Definition 2.10. Relative Entropy:
The relative entropy of ν given µ is defined as
H(ν|µ) := Ent_µ(dν/dµ) if ν ≪ µ, and ∞ otherwise.
Corollary 2.10. The entropy of a function is always nonnegative; moreover H(ν|µ) = 0 iff ν = µ.
Definition 2.11. Total-Variation Distance:
For measures µ, ν we define the total variation distance
||µ − ν||_TV := sup_A |µ(A) − ν(A)|.
Corollary 2.11.
||µ − ν||_TV = (1/2) ∫ |dν/dµ(x) − 1| dµ(x).
Lemma 2.11. (Pinsker)
||µ − ν||_TV ≤ √(H(ν|µ)/2).

2.5 Stein's Method
Lemma 2.12. W ∼ N(0, σ²) iff
E[W φ(W)] = σ² E[φ′(W)]  for all Lipschitz φ ∈ C¹.
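This characterisation is easy to test numerically; a minimal sketch (NumPy assumed; tanh is a hypothetical choice of Lipschitz C¹ test function):

    import numpy as np

    rng = np.random.default_rng(4)
    sigma2 = 2.0
    W = rng.normal(0.0, np.sqrt(sigma2), size=1_000_000)
    phi, dphi = np.tanh, lambda x: 1.0 / np.cosh(x)**2
    print(np.mean(W * phi(W)), sigma2 * np.mean(dphi(W)))   # approximately equal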
Theorem 2.6. Let γ_{σ²} be the density of a N(0, σ²) random variable and suppose W is a random variable with distribution µ. If there exists a random variable T such that
E[W φ(W)] = E[T φ′(W)]  for all Lipschitz φ ∈ C¹,
then
||µ − γ_{σ²}||_TV ≤ (2/σ²) E[|T − σ²|].
Proof. Given any bounded f and Z ∼ N(0, σ²), the Stein equation
xφ(x) − σ²φ′(x) = f(x) − E[f(Z)]
has a Lipschitz solution φ with ||φ′||_∞ ≤ 2/σ². So
W φ(W) − σ²φ′(W) = f(W) − E[f(Z)],
and taking expectations of both sides,
E[f(W)] − E[f(Z)] = E[W φ(W)] − σ²E[φ′(W)] = E[T φ′(W)] − σ²E[φ′(W)] = E[(T − σ²)φ′(W)].
Hence
|E[f(W)] − E[f(Z)]| ≤ E[|T − σ²||φ′(W)|] ≤ (2/σ²) E[|T − σ²|],
and taking the supremum over such f yields
||µ − γ_{σ²}||_TV ≤ (2/σ²) E[|T − σ²|].
In this case T is called a Stein operator (or Stein coefficient) for W.
Corollary 2.12.
Setting φ(x) = 1 gives E[W] = 0.
Setting φ(x) = x gives Var[W] = E[W²] = E[T].
Setting σ² = Var[W] then gives, by Cauchy-Schwarz (E[|T − σ²|] ≤ √Var[T] since E[T] = σ²),
||µ − γ_{σ²}||_TV ≤ (2/σ²) √Var[T].
Theorem 2.7. If W = f(X₁, ..., X_n) for X_i ∼ i.i.d. N(0, 1) and E[W] = 0, then
T := ∫₀¹ (1/(2√t)) ∑_{i=1}^n ∂_i f(X) ∂_i f(√t X + √(1 − t) Y) dt,
where Y is independent of X with the same distribution, satisfies the condition of Theorem 2.6.
Proof. Let X_t := √t X + √(1 − t) Y and Z_t := √(1 − t) X − √t Y, so that X_t and Z_t are independent standard Gaussian vectors and X = √t X_t + √(1 − t) Z_t. We have
E[W φ(W)] = E[f(X) φ(f(X))]
= E[f(X)(φ(f(X)) − φ(f(Y)))] + E[f(X)] E[φ(f(Y))]   (X ⊥ Y)
= E[f(X)(φ(f(X)) − φ(f(Y)))]   (E[f(X)] = 0)
= E[f(X) ∫₀¹ (d/dt) φ(f(√t X + √(1 − t) Y)) dt]
= E[f(X) ∫₀¹ φ′(f(X_t)) ∑_{i=1}^n ∂_i f(X_t) (d/dt)(√t X_i + √(1 − t) Y_i) dt]
= ∫₀¹ ∑_{i=1}^n E[f(X) φ′(f(X_t)) ∂_i f(X_t) (X_i/(2√t) − Y_i/(2√(1 − t)))] dt
= ∫₀¹ (1/(2√(t(1 − t)))) ∑_{i=1}^n E[f(X) φ′(f(X_t)) ∂_i f(X_t) Z_{t,i}] dt,
since X_i/(2√t) − Y_i/(2√(1 − t)) = Z_{t,i}/(2√(t(1 − t))). As Z_t is standard Gaussian and independent of X_t, Gaussian integration by parts in Z_{t,i} (at fixed X_t, using ∂X/∂Z_{t,i} = √(1 − t) e_i) gives
E[f(X) φ′(f(X_t)) ∂_i f(X_t) Z_{t,i}] = √(1 − t) E[∂_i f(X) φ′(f(X_t)) ∂_i f(X_t)],
so
E[W φ(W)] = ∫₀¹ (1/(2√t)) ∑_{i=1}^n E[∂_i f(X) φ′(f(X_t)) ∂_i f(X_t)] dt.
Finally, the pair (X, X_t) is exchangeable, so we may swap the roles of X and X_t:
E[W φ(W)] = ∫₀¹ (1/(2√t)) ∑_{i=1}^n E[∂_i f(X_t) φ′(f(X)) ∂_i f(X)] dt = E[(∫₀¹ (1/(2√t)) ∑_{i=1}^n ∂_i f(X) ∂_i f(X_t) dt) φ′(f(X))] = E[T φ′(W)].
Corollary 2.13.
Var[W] ≤ E[|∇f(X)|²].
Proof. From the proof of Theorem 2.7,
E[W φ(W)] = ∫₀¹ (1/(2√t)) ∑_{i=1}^n E[∂_i f(X) ∂_i f(X_t) φ′(W)] dt.
Setting φ(x) = x,
Var[W] = E[W φ(W)] = ∫₀¹ (1/(2√t)) ∑_{i=1}^n E[∂_i f(X) ∂_i f(X_t)] dt
≤ ∫₀¹ (1/(2√t)) E[|∇f(X)||∇f(X_t)|] dt   (Cauchy-Schwarz in R^n)
≤ ∫₀¹ (1/(2√t)) E[|∇f(X)|²]^{1/2} E[|∇f(X_t)|²]^{1/2} dt   (Cauchy-Schwarz)
= ∫₀¹ (1/(2√t)) dt · E[|∇f(X)|²] = E[|∇f(X)|²],
since X_t has the same distribution as X and ∫₀¹ dt/(2√t) = 1.
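This Gaussian Poincaré inequality is straightforward to check by simulation; a sketch for the hypothetical choice f(x) = ∑_i sin(x_i) (NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 5
    X = rng.normal(size=(1_000_000, n))
    f = np.sin(X).sum(axis=1)                  # f(x) = sum_i sin(x_i)
    grad_sq = (np.cos(X)**2).sum(axis=1)       # |grad f|^2 = sum_i cos(x_i)^2
    print("Var[f]:", np.var(f), "<= E[|grad f|^2]:", np.mean(grad_sq))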
Lemma 2.13. If W = f(X) is as above with f Lipschitz, then
P(W − E[W] > t) ≤ e^{−t²/(2||f||²_Lip)}.
Proof. Again from the proof of Theorem 2.7,
E[W φ(W)] = ∫₀¹ (1/(2√t)) ∑_{i=1}^n E[∂_i f(X) ∂_i f(X_t) φ′(W)] dt.
Set φ(x) = e^{λx} for λ > 0 and let Λ(λ) := E[e^{λW}] be the moment generating function of W. This gives
Λ′(λ) = E[W e^{λW}] = ∫₀¹ (λ/(2√t)) ∑_{i=1}^n E[∂_i f(X) ∂_i f(X_t) e^{λW}] dt
≤ ∫₀¹ (λ/(2√t)) E[|∇f(X)||∇f(X_t)| e^{λW}] dt   (Cauchy-Schwarz)
≤ ∫₀¹ (λ/(2√t)) ||f||²_Lip E[e^{λW}] dt = λ ||f||²_Lip Λ(λ),
using |∇f| ≤ ||f||_Lip pointwise. This differential inequality gives
(log Λ)′(λ) = Λ′(λ)/Λ(λ) ≤ λ ||f||²_Lip,
and integrating from 0 (where log Λ(0) = 0 and (log Λ)′(0⁺) = E[W] = 0),
log Λ(λ) ≤ (λ²/2) ||f||²_Lip,  i.e.  Λ(λ) ≤ e^{λ²||f||²_Lip/2}.
Furthermore, by Lemma 2.1,
P(W − E[W] > t) ≤ inf_{λ>0} e^{−λt} Λ(λ) ≤ inf_{λ>0} e^{λ²||f||²_Lip/2 − λt}.
The infimum is attained where (d/dλ)(λ²||f||²_Lip/2 − λt) = 0, i.e. λ||f||²_Lip − t = 0, hence λ = t/||f||²_Lip, which gives
P(W − E[W] > t) ≤ e^{−t²/(2||f||²_Lip)}.
Theorem 2.8. If W = f(X₁, ..., X_n) where X_i ∼ i.i.d. N(0, 1), E[W] = 0 and Var[W] = σ², then
||µ − γ_{σ²}||_TV ≤ (√10/σ²) E[||H(f)(X)||⁴]^{1/4} E[|∇f(X_t)|⁴]^{1/4},
where H(f) is the Hessian,
H(f)_{i,j} = ∂²f/∂x_i∂x_j,  and  ||H(f)|| := (∑_{i,j=1}^n H_{i,j}²)^{1/2}
denotes its Hilbert-Schmidt norm.
Proof. If T = T(X, Y) is Stein's operator from Theorem 2.7 then
E[W φ(W)] = E[T(X, Y) φ′(f(X))] = E[E[T(X, Y)|X] φ′(f(X))],
so T̃(X) := E[T(X, Y)|X] is also a Stein operator, and
T̃(x) = ∑_{i=1}^n ∫₀¹ (1/(2√t)) ∂_i f(x) E_Y[∂_i f(√t x + √(1 − t) Y)] dt.
Differentiating under the integral, with x_t := √t x + √(1 − t) Y,
∂T̃/∂x_j (x) = ∑_{i=1}^n ∫₀¹ (1/(2√t)) (∂_{i,j} f(x) E_Y[∂_i f(x_t)] + √t ∂_i f(x) E_Y[∂_{i,j} f(x_t)]) dt.
From Corollary 2.12 and the Gaussian Poincaré inequality (Corollary 2.13 applied to T̃),
||µ − γ_{σ²}||_TV ≤ (2/σ²) √Var[T̃] ≤ (2/σ²) √E[|∇T̃(X)|²].
Using (a + b)² ≤ 2a² + 2b² and Cauchy-Schwarz over the t-integral (the weight (2√t)^{−1}dt is a probability measure on [0, 1]),
E[|∇T̃(X)|²] = E[∑_{j=1}^n (∂T̃/∂x_j)²]
≤ 2 E[∑_j ∫₀¹ (1/(2√t)) (∑_i ∂_{i,j}f(X) E_Y[∂_if(X_t)])² dt] + 2 E[∑_j ∫₀¹ (1/(2√t)) (∑_i ∂_if(X) E_Y[∂_{i,j}f(X_t)])² dt].
The Hessian matrix is symmetric (∂_{i,j} = ∂_{j,i}), so for the first term, with H = H(f)(X),
∑_{j=1}^n (∑_{i=1}^n H_{i,j} E_Y[∇f(X_t)]_i)² = |H E_Y[∇f(X_t)]|² ≤ ||H||² |E_Y[∇f(X_t)]|²,
and therefore, by Cauchy-Schwarz and Jensen (E_Y[·]² ≤ E_Y[(·)²]),
E[||H||² |E_Y[∇f(X_t)]|²] ≤ E[||H||⁴]^{1/2} E[|∇f(X_t)|⁴]^{1/2};
the second term is bounded in the same way with the roles of the gradient and Hessian at X and X_t exchanged. Collecting the constants, we conclude
||µ − γ_{σ²}||_TV ≤ (2/σ²) √E[|∇T̃(X)|²] ≤ (√10/σ²) E[||H||⁴]^{1/4} E[|∇f(X_t)|⁴]^{1/4}.

Theorem 2.9. Chatterjee:
If X, Y ∈ R^n are independent random vectors such that the components {X_i}, {Y_i} are independent, let
A_i := E[ |E[X_i | {X_j}_{j<i}] − E[Y_i]| ],
B_i := E[ |E[X_i² | {X_j}_{j<i}] − E[Y_i²]| ],
M₃ := max_i (E[|X_i|³] + E[|Y_i|³]),
L_r(f) := sup_x max_i |∂_i^r f(x)|.
Then
|E[f(X) − f(Y)]| ≤ L₁(f) ∑_{i=1}^n A_i + (1/2) L₂(f) ∑_{i=1}^n B_i + (1/6) L₃(f) M₃ n.
Lemma 2.14. If A is an n × n matrix then we define the norms
||A|| := sup_{|x|=1} |Ax| = √(λ_max(AᵀA)),
||A||_{H.S} := (∑_{i,j=1}^n a_{i,j}²)^{1/2} = √(Tr(AᵀA)),
where λ_max is the maximal eigenvalue of AᵀA. Then
• ||A|| ≤ ||A||_{H.S} ≤ √n ||A||
• ||AB||_{H.S} ≤ ||A|| ||B||_{H.S}
• |Tr(AB)| ≤ ||A||_{H.S} ||B||_{H.S}
• ||AB|| ≤ ||A|| ||B||
Proposition 2.3. Suppose X = (X_{i,j})_{i,j=1}^n has independent N(0, 1) entries and let A := X/√n be the normalised matrix, with eigenvalues {λ_i(A)}_{i=1}^n. For k ∈ N we have
∑_{i=1}^n λ_i^k = Tr(A^k) = ∑_{i₁,...,i_k} a_{i₁,i₂} a_{i₂,i₃} ··· a_{i_{k−1},i_k} a_{i_k,i₁},
which can be written as a function F of the Gaussian vector (X_{1,1}, ..., X_{1,n}, X_{2,1}, ..., X_{n,n}), and hence has a Stein coefficient T with
||Tr(A^k) − γ_{σ²}||_TV ≤ (2/σ²) √Var(T) ≤ (√10/σ²) E[||H_F(X)||⁴]^{1/4} E[|∇F(X)|⁴]^{1/4}.
Furthermore there exist constants C₁(k), C₂(k) such that
C₁(k) ≤ σ² := Var(Tr(A^k)) ≤ C₂(k).
Proof. We want to bound both E[|∇F(X)|⁴]^{1/4} and E[||H_F(X)||⁴]^{1/4} above and Var(Tr(A^k)) below, so we split the proof into three parts.
• We start by bounding E[|∇F(X)|⁴] above. Since ∂A/∂x_{p,q} = (1/√n) e_p e_qᵀ,
∂/∂x_{i,j} Tr(A^k) = ∑_{r=0}^{k−1} Tr(A^r (∂A/∂x_{i,j}) A^{k−r−1}) = k Tr((∂A/∂x_{i,j}) A^{k−1}) = (k/√n) Tr(e_i e_jᵀ A^{k−1}) = (k/√n) (A^{k−1})_{j,i},
using the cyclicity of the trace. Hence
|∇F(X)|² = ∑_{i,j=1}^n (∂ Tr(A^k)/∂x_{i,j})² = (k²/n) ∑_{i,j=1}^n ((A^{k−1})_{i,j})² = (k²/n) ||A^{k−1}||²_{H.S} ≤ k² ||A^{k−1}||² ≤ k² ||A||^{2(k−1)},
so
E[|∇F(X)|⁴] ≤ k⁴ E[||A||^{4(k−1)}].
• We now show that Var(Tr(A^k)) is bounded below. If two variables are positively correlated then Cov ≥ 0, so the variance of a sum of positively correlated variables is at least the sum of their variances. The monomials {X_{i₁,i₂} ··· X_{i_k,i₁}} are positively correlated, so
Var(Tr(A^k)) = Var(n^{−k/2} ∑_{i₁,...,i_k} X_{i₁,i₂} ··· X_{i_k,i₁}) ≥ n^{−k} ∑_{i₁,...,i_k} Var(X_{i₁,i₂} ··· X_{i_k,i₁}) ≥ Var(X_{1,1})^k,
since there are n^k index choices. So indeed we have a lower bound uniform in n.
• Finally we show that E[||H_{Tr(A^k)}||⁴]^{1/4} is bounded above. Differentiating once more,
∂² Tr(A^k)/∂x_{i,j}∂x_{p,q} = k ∑_{r=0}^{k−2} Tr((∂A/∂x_{i,j}) A^r (∂A/∂x_{p,q}) A^{k−r−2}).
Note ||A|| = sup_{|y|=1}|Ay| = sup_{|x|=|y|=1} ⟨x, Ay⟩, and similarly the Hilbert-Schmidt norm of the Hessian can be written
||H_{Tr(A^k)}|| = sup{ ∑_{i,j,p,q=1}^n c_{i,j} d_{p,q} ∂²Tr(A^k)/∂x_{i,j}∂x_{p,q} : ∑_{i,j} c_{i,j}² = ∑_{p,q} d_{p,q}² = 1 }.
Write C = (c_{i,j})_{i,j=1}^n and D = (d_{p,q})_{p,q=1}^n, so ||C||²_{H.S} = ∑ c_{i,j}² = 1 and ||D||²_{H.S} = ∑ d_{p,q}² = 1. Then
∑_{i,j,p,q=1}^n c_{i,j} d_{p,q} ∂²Tr(A^k)/∂x_{i,j}∂x_{p,q} = (k/n) ∑_{r=0}^{k−2} Tr(C A^r D A^{k−r−2}).
Moreover, since
|Tr(C A^r D A^{k−r−2})| = |Tr(A^r D A^{k−r−2} C)| ≤ ||A^r D||_{H.S} ||A^{k−r−2} C||_{H.S} ≤ ||A^r|| ||A^{k−r−2}|| ||C||_{H.S} ||D||_{H.S} ≤ ||A||^{k−2},
it follows that
(k/n) |∑_{r=0}^{k−2} Tr(C A^r D A^{k−r−2})| ≤ (k(k−1)/n) ||A||^{k−2},
so
E[||H_{Tr(A^k)}||⁴]^{1/4} ≤ (k(k−1)/n) E[||A||^{4(k−2)}]^{1/4}.
So we have our final bound.
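The Gaussian fluctuations of Tr(A^k) are visible in simulation: the variance of Tr(A^k) stays of order one as n grows, as the proposition asserts. A sketch (NumPy assumed; the sample sizes are hypothetical):

    import numpy as np

    rng = np.random.default_rng(6)
    k, trials = 3, 3000
    for n in (20, 50, 100):
        t = [np.trace(np.linalg.matrix_power(rng.normal(size=(n, n)) / np.sqrt(n), k))
             for _ in range(trials)]
        print(n, np.var(t))          # roughly constant in n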
Lemma 2.15. If
P_t f(x) = ∫ f(e^{−t}x + √(1 − e^{−2t}) y) dγ(y)
then
||P_t f||_{L²(γ)} ≤ ||f||_{L^{q*(t)}(γ)},
where q*(t) = 1 + e^{−2t}.
Theorem 2.10. Benaim-Rossignol:
If γ is the standard Gaussian density on R^n then
Var(f) ≤ 2 ∑_{i=1}^n ||∂f/∂x_i||²_{L²(γ)} φ( ||∂f/∂x_i||_{L¹(γ)} / ||∂f/∂x_i||_{L²(γ)} ),
where
φ(u) = ∫₀¹ u^{2t}/(1 + t)² dt.
Proof. Along the Ornstein-Uhlenbeck semigroup,
Var_γ(f) = ∫ f²(x) dγ(x) − (∫ f(x) dγ(x))²
= ∫ f²(x) dγ(x) − ∫ (P_∞f(x))² dγ(x)
= −∫₀^∞ (d/dt) ∫ (P_tf(x))² dγ(x) dt
= −2 ∫₀^∞ ∫ (P_tf)(x) (LP_tf)(x) dγ(x) dt
= 2 ∫₀^∞ ∫ |∇P_tf(x)|² dγ(x) dt   (Gaussian integration by parts)
= 2 ∫₀^∞ e^{−2t} ∑_{i=1}^n ∫ ((P_t ∂_if)(x))² dγ(x) dt   (∇P_tf = e^{−t}P_t∇f)
= 2 ∫₀^∞ e^{−2t} ∑_{i=1}^n ||P_t ∂_if||²_{L²(γ)} dt
≤ 2 ∫₀^∞ e^{−2t} ∑_{i=1}^n ||∂_if||²_{L^{q*(t)}(γ)} dt   (hypercontractivity, Lemma 2.15).
By Hölder's inequality, for 1 ≤ q*(t) ≤ 2,
||∂_if||²_{L^{q*(t)}(γ)} = (∫ |∂_if|^{q*(t)} dγ)^{2/q*(t)} = (∫ |∂_if|^{2(q*(t)−1)} |∂_if|^{2−q*(t)} dγ)^{2/q*(t)}
≤ (∫ |∂_if|² dγ)^{2(q*(t)−1)/q*(t)} (∫ |∂_if| dγ)^{2(2−q*(t))/q*(t)}
= ||∂_if||²_{L²(γ)} ( ||∂_if||_{L¹(γ)} / ||∂_if||_{L²(γ)} )^{4/q*(t) − 2}.
Hence, with R_i := ||∂_if||_{L¹(γ)}/||∂_if||_{L²(γ)},
Var_γ(f) ≤ 2 ∑_{i=1}^n ||∂_if||²_{L²(γ)} ∫₀^∞ e^{−2t} R_i^{4/q*(t)−2} dt,
and the substitution s = e^{−2t} (followed by v = 4/(1 + s) − 2 and t = v/2) turns the time integral into φ(R_i) = ∫₀¹ R_i^{2t}/(1 + t)² dt, giving
Var_γ(f) ≤ 2 ∑_{i=1}^n ||∂f/∂x_i||²_{L²(γ)} φ( ||∂f/∂x_i||_{L¹(γ)} / ||∂f/∂x_i||_{L²(γ)} ).