
Topics in Applied Probability

March 12, 2013

Contents

1 Noise Sensitivity and Boolean Functions
  1.1 Definitions and Examples
  1.2 Noise Sensitivity and Stability
  1.3 Fourier Analysis of Boolean Functions
  1.4 Hypercontractivity
  1.5 Percolation

2 Concentration
  2.1 Efron–Stein Inequality
  2.2 Martingale Method
  2.3 Convex Hull Approximation
  2.4 Entropy
  2.5 Stein's Method

1 Noise Sensitivity and Boolean Functions

1.1 Definitions and Examples

Definition 1.1. Boolean Function: A Boolean function is a map f : {−1, 1}^n → {−1, 1} for some n ∈ N.

Theorem 1.1. Suppose {X_i}_{i=1}^n are independent random variables, each equal to 1 (a vote for the correct answer) with probability p and to −1 (a vote for the incorrect answer) with probability 1 − p. If the overall result is the one chosen by the majority of the X_i, and R(n, p) is the probability that the overall result is correct, then:
• for p > 1/2, R(n, p) is increasing in n with limit 1;
• for p < 1/2, argmax_{n≥1} R(n, p) = 1.

Proof. Let n = 2m − 1. We will show that increasing m by one increases the probability of the correct result for p > 1/2 and decreases it for p < 1/2.
  R(n + 2, p) − R(n, p)
    = P(majority of the n + 2 votes correct, majority of the first n incorrect)
      − P(majority of the n + 2 votes incorrect, majority of the first n correct)
    = C(n, m−1) p^{m−1} (1−p)^m · p² − C(n, m) p^m (1−p)^{m−1} · (1−p)²
    = C(n, m) p^m (1−p)^m (2p − 1)    [since C(n, m−1) = C(n, m) when n = 2m − 1]
which is > 0 for p > 1/2 and < 0 for p < 1/2, where C(n, k) denotes the binomial coefficient. □

Definition 1.2. Uniform Measure: The uniform measure on Ω_n := {−1, 1}^n is P = (δ_1/2 + δ_{−1}/2)^{⊗n}, the product measure which independently assigns probability 1/2 to each bit taking either of the two possible values.

Example 1.1. There are four important Boolean functions which we will consider:

• Dictator Function: DICT_n(ω) = ω_1, which represents the first bit having all of the influence.

• Parity Function: PAR_n(ω) = ∏_{i=1}^n ω_i, equal to 1 if #{i : ω_i = −1} is even and −1 otherwise. This Boolean function is not monotone.

• Majority Function: MAJ_n(ω) = sign(Σ_{i=1}^n ω_i) for n odd, equal to 1 if #{i : ω_i = 1} > n/2 and −1 otherwise, which represents a voter model where the outcome is decided by a majority vote.

• Iterated Majority: for n a power of 3,
    ITMAJ_n(ω) = MAJ_3( ITMAJ_{n/3}(ω_1, …, ω_{n/3}), ITMAJ_{n/3}(ω_{1+n/3}, …, ω_{2n/3}), ITMAJ_{n/3}(ω_{1+2n/3}, …, ω_n) ),
  which represents a voter system where the voters are grouped into blocks of three, each block takes a value by a majority vote, these results are then grouped into blocks of three, and the process repeats. In this case the order of the bits is important.

Denote [n] := {1, …, n} and, for i, j ∈ [n] and ω ∈ Ω_n,
  ω^i(j) = ω(j) if j ≠ i, and −ω(j) if j = i.

Definition 1.3. Pivotal: For f : Ω_n → {−1, 1} and ω ∈ Ω_n we say that i ∈ [n] is pivotal if f(ω) ≠ f(ω^i).

Definition 1.4. Pivot Set: For f : Ω_n → {−1, 1} and ω ∈ Ω_n the pivot set is P_f(ω) := {i ∈ [n] : i is pivotal for f, ω}.

Definition 1.5. Influence: For f : Ω_n → {−1, 1} the influence of i ∈ [n] is I_i(f) = P(i ∈ P_f). We denote by Inf(f) the vector of influences.

Definition 1.6. Total Influence: The total influence of f : Ω_n → {−1, 1} is I(f) := Σ_{i=1}^n I_i(f) = ||Inf(f)||_1.

Example 1.2. Tribes: Suppose that we have a population of n = mk individuals split into m tribes, where m ≈ 2^k log(2).
If we let k be the size of a tribe, let each member of each tribe vote in {−1, 1}; we take decision 1 if at least one tribe unanimously voted 1, so
  f(ω) = 1 if ∃ a tribe all voting 1, and −1 otherwise.
This means that n ≈ k 2^k log(2), so k ≈ log_2(n) − log_2(log_2(n)), and
  (1 − 2^{−k})^m ≈ 1/2,
which is the probability that no tribe votes unanimously. Then
  I_i(f) = P(1 is pivotal)    (by symmetry)
    = P(the other k − 1 members of the first tribe vote 1, no other tribe is unanimous)
    = P(the other k − 1 members of the first tribe vote 1) P(no other tribe is unanimous)    (by independence of the tribes)
    = (1/2)^{k−1} (1 − 2^{−k})^{m−1}
    ≈ 2^{−k} ≈ c log(n)/n.

Theorem 1.2. KKL Theorem: If P(f(ω) = 1) ≈ 1/2 then max_i I_i(f) ≥ c log(n)/n.

1.2 Noise Sensitivity and Stability

Notation 1.1. Let ω ∼ U{−1, 1}^n. We denote by ω^(ε) the configuration obtained from ω by independently rerandomizing each bit with probability ε. It follows that P(ω_i^(ε) = ω_i) = 1 − ε/2.

Definition 1.7. Noise Sensitive: The sequence of Boolean functions {f_n}_{n∈N} is noise sensitive if ∀ε ∈ (0, 1) we have
  lim_{n→∞} E[f_n(ω) f_n(ω^(ε))] − E[f_n(ω)]² = 0.
For the previous definition it suffices that the limit holds for some ε ∈ (0, 1), since this implies it ∀ε ∈ (0, 1). Intuitively this is saying that changing a few bits randomly leads to independence between f_n(ω) and f_n(ω^(ε)) for large population sizes n. In this sense a small amount of noise changes the value sufficiently, and hence it is more likely that an individual is pivotal.

Definition 1.8. Noise Stable: The sequence of Boolean functions {f_n}_{n∈N} is noise stable if
  lim_{ε→0} sup_n P(f_n(ω) ≠ f_n(ω^(ε))) = 0.
Intuitively this says that, for any population size, changing approximately nε/2 bits has no effect as ε → 0.

Lemma 1.1. If f_n is both noise sensitive and noise stable then lim_{n→∞} Var[f_n] = 0.

Proof.
Let P(f_n(ω) = 1) = p. Then
  Var[f_n(ω)] = E[f_n(ω)²] − E[f_n(ω)]² = 1 − (2p − 1)² = 4p(1 − p),
which is equal to zero iff p ∈ {0, 1}. From noise stability we have
  lim_{ε→0} sup_n P(f_n(ω) ≠ f_n(ω^(ε))) = 0,
and from noise sensitivity we have, ∀ε > 0,
  lim_{n→∞} E[f_n(ω) f_n(ω^(ε))] − E[f_n(ω)]² = 0.
Since f_n(ω)² = 1 we have E[f_n(ω)]² = E[f_n(ω)²] − Var[f_n(ω)] = E[f_n(ω) f_n(ω)] − Var[f_n(ω)], which gives us
  lim_{n→∞} E[f_n(ω)(f_n(ω^(ε)) − f_n(ω))] + Var[f_n(ω)] = 0.
For δ > 0 choose ε > 0 s.t. sup_n P(f_n(ω) ≠ f_n(ω^(ε))) < δ, which gives us
  |E[f_n(ω)(f_n(ω^(ε)) − f_n(ω))]| ≤ E[|f_n(ω^(ε)) − f_n(ω)|] < 2δ.
Hence limsup_n Var[f_n] ≤ 2δ and, since δ > 0 was arbitrary, Var[f_n] → 0. □

Example 1.3. The dictator function f(ω) = ω_1 is noise stable and not noise sensitive.

Proof. P(f_n(ω) ≠ f_n(ω^(ε))) = ε/2, which is independent of n and converges to zero as ε → 0, so indeed the function is noise stable. We can write ω^(ε) = ω × ω′ coordinatewise, where ω′ is independent of ω with ω′_i = 1 with probability 1 − ε/2 and ω′_i = −1 with probability ε/2. For this we have E[f_n(ω)]² = 0 and
  E[f_n(ω) f_n(ω^(ε))] = E[ω_1 · ω_1 ω′_1] = E[ω_1²] E[ω′_1] = E[ω′_1] = (1 − ε/2) − ε/2 = 1 − ε,
which does not converge to 0 as n → ∞, hence the function is not noise sensitive. □

Example 1.4. The parity function f_n(ω) = ∏_{i=1}^n ω_i is noise sensitive and not noise stable.

Proof.
  E[f_n(ω) f_n(ω^(ε))] = E[ ∏_{i=1}^n ω_i ∏_{i=1}^n ω_i^(ε) ] = E[ ∏_{i=1}^n ω_i² ω′_i ] = ∏_{i=1}^n E[ω′_i]    (by independence)
  = (1 − ε)^n,
and lim_{n→∞} (1 − ε)^n = 0 ∀ε > 0, hence indeed the function is noise sensitive. Now let X ∼ Bin(n, ε/2) count the bits that differ; then P(f_n(ω) ≠ f_n(ω^(ε))) = P(X odd), and lim_{n→∞} P(X odd) = 1/2, hence sup_n P(f_n(ω) ≠ f_n(ω^(ε))) does not converge to zero as ε → 0. □

Example 1.5. The iterated majority function is noise sensitive and not noise stable.

Proof. We want to consider the function iteratively, so notice that for n = 1 we only have one member of the population, hence P(f_1(ω) ≠ f_1(ω^(ε))) = ε/2, since the probability that the bit is rerandomized is ε and the probability of the new value being different, given a rerandomization takes place, is 1/2.
For n = 3 let ω = + + + denote the event that the three members agree and ω = + + − the event where one member votes differently from the other two. We clearly have P(ω = + + +) = 1/4 and P(ω = + + −) = 3/4. Since each bit of ω^(ε) independently differs from the corresponding bit of ω with probability ε/2, conditioning on these events gives:
  P(f_3(ω) ≠ f_3(ω^(ε)) | ω = + + +) = 3(ε/2)²(1 − ε/2) + (ε/2)³
  P(f_3(ω) ≠ f_3(ω^(ε)) | ω = + + −) = 2(ε/2)(1 − ε/2)² + (ε/2)²(1 − ε/2) + (ε/2)³
  P(f_3(ω) ≠ f_3(ω^(ε))) = (1/4)[3(ε/2)²(1 − ε/2) + (ε/2)³] + (3/4)[2(ε/2)(1 − ε/2)² + (ε/2)²(1 − ε/2) + (ε/2)³]
    = 3ε/4 − 3ε²/8 + ε³/8.
Moreover, if we write ε_k for the probability that the output differs at level k (i.e. for n = 3^k), we inductively have
  ε_{k+1} = (3/2)ε_k − (3/2)ε_k² + ε_k³ =: F(ε_k).
We want to find a stable equilibrium of ε_{k+1} = ε_k. There are three equilibria, ε = 0, 1/2, 1, and since F′(ε) = 3/2 − 3ε + 3ε²,
  |F′(0)| = 3/2 > 1,  |F′(1/2)| = 3/4 < 1,  |F′(1)| = 3/2 > 1,
so ε = 1/2 is the stable equilibrium. Hence for every ε ∈ (0, 1)
  lim_{n→∞} P(f_n(ω) ≠ f_n(ω^(ε))) = 1/2.
So indeed the function is sensitive (by symmetry E[f_n] = 0, so E[f_n(ω) f_n(ω^(ε))] = 1 − 2P(f_n(ω) ≠ f_n(ω^(ε))) → 0) and not stable. □

Theorem 1.3. Discrete Poincaré: If f : Ω_n → {−1, 1} then
  Var(f) ≤ I(f) ≤ n ||Inf(f)||_∞;
in particular ||Inf(f)||_∞ ≥ Var(f)/n.

Proof. Let ω, ω̃ be independent and uniform on Ω_n; then Var(f) = E[(f(ω) − f(ω̃))²]/2. If we define
  ω_k(i) := ω(i) for i > k and ω̃(i) for i ≤ k,
for i, k = 1, …, n, then ω_0 = ω and ω_n = ω̃, so
  f(ω) − f(ω̃) = Σ_{i=1}^n [f(ω_{i−1}) − f(ω_i)].
Since f takes values in {−1, 1}, (f(ω) − f(ω̃))² ∈ {0, 4}, and if it is nonzero then some increment d_i := f(ω_{i−1}) − f(ω_i) satisfies d_i² = 4; hence (f(ω) − f(ω̃))² ≤ Σ_{i=1}^n d_i². Now ω_{i−1} and ω_i differ only in coordinate i, which is independently resampled, so
  E[d_i²] = 4 P(f(ω_{i−1}) ≠ f(ω_i)) = 4 · (1/2) I_i(f) = 2 I_i(f),
and therefore
  Var(f) = E[(f(ω) − f(ω̃))²]/2 ≤ (1/2) Σ_{i=1}^n E[d_i²] = Σ_{i=1}^n I_i(f) = I(f) ≤ n ||Inf(f)||_∞. □

Theorem 1.4. ||Inf(f)||_∞ ≥ c Var(f) log(n)/n for some universal constant c.

Lemma 1.2. ∃ a universal constant c such that
  I(f) = Σ_{i=1}^n I_i(f) ≥ c Var(f) log( 1/||Inf(f)||_∞ ).

Theorem 1.5. If lim_{n→∞} Σ_{i=1}^n I_i(f_n)² = 0 then f_n is noise sensitive.

In general the converse to this fails; this is clear because the parity function is noise sensitive but the condition fails (there I_i(f_n) = 1 for every i).
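As a quick numerical sanity check on Examples 1.3 and 1.4 (not part of the original notes), the correlation E[f(ω) f(ω^(ε))] can be estimated by Monte Carlo; the function names below are our own, and the rerandomization follows Notation 1.1.

```python
import random

def noisy_copy(omega, eps, rng):
    # Rerandomize each bit independently with probability eps,
    # so P(omega_i^(eps) = omega_i) = 1 - eps/2 as in Notation 1.1.
    return [rng.choice((-1, 1)) if rng.random() < eps else x for x in omega]

def noise_correlation(f, n, eps, trials=20000, seed=0):
    # Monte Carlo estimate of E[f(omega) f(omega^(eps))].
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        omega = [rng.choice((-1, 1)) for _ in range(n)]
        total += f(omega) * f(noisy_copy(omega, eps, rng))
    return total / trials

def dictator(omega):
    return omega[0]

def parity(omega):
    p = 1
    for x in omega:
        p *= x
    return p

# Dictator: correlation is 1 - eps for every n (noise stable, Example 1.3).
# Parity: correlation is (1 - eps)^n, vanishing as n grows (noise sensitive, Example 1.4).
```

For instance, with ε = 0.2 and n = 50 the dictator estimate should be near 0.8 while the parity estimate is already indistinguishable from 0, matching the closed forms 1 − ε and (1 − ε)^n.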
Theorem 1.6. For c > 0.234 we have
  Cov(f_n(ω), f_n(ω^(ε))) ≤ 20 ( Σ_{i=1}^n I_i(f_n)² )^{cε}.

Lemma 1.3. If the f_n are monotone and lim_{n→∞} Σ_{i=1}^n I_i(f_n)² ≠ 0, then f_n is not noise sensitive.

1.3 Fourier Analysis of Boolean Functions

Definition 1.9. Partial Ordering: We can partially order Ω_n by saying ω ≤ ω′ iff ω_i ≤ ω′_i ∀i ∈ [n].

Definition 1.10. Increasing: We say that f is monotone iff x ≤ y ⟹ f(x) ≤ f(y).

Notation 1.2. We denote by L²(Ω_n) := {f : Ω_n → R} the space of functions on Ω_n.

Definition 1.11. Inner Product: For any f, g ∈ L²(Ω_n) we define the inner product
  ⟨f, g⟩ := E[fg] = Σ_{ω∈Ω_n} 2^{−n} f(ω) g(ω).

Definition 1.12. Walsh Function: For S ⊆ [n] we define the Walsh function χ_S : Ω_n → {−1, 1} as
  χ_S(ω) = ∏_{i∈S} ω_i.

Lemma 1.4. {χ_S}_{S⊆[n]} forms an orthonormal basis for L²(Ω_n).

Proof.
  ⟨χ_S, χ_{S′}⟩ = E[χ_S χ_{S′}] = E[ ∏_{i∈S} ω_i ∏_{j∈S′} ω_j ] = E[ ∏_{i∈S∆S′} ω_i ] = ∏_{i∈S∆S′} E[ω_i] = 1 if S = S′ and 0 otherwise,
where ∆ represents the symmetric difference. It remains to show that the set of Walsh functions spans. If f ∈ L²(Ω_n) then we can write f(ω) = Σ_{ω′∈Ω_n} f(ω′) I_{ω′}(ω). Since there are precisely 2^n indicators {I_{ω′}}_{ω′∈Ω_n}, the dimension of L²(Ω_n) is 2^n, which is also the number of Walsh functions; hence, since they are orthonormal, they must span. □

Definition 1.13. Fourier–Walsh Decomposition: For f ∈ L²(Ω_n) we write the Fourier–Walsh decomposition as
  f = Σ_{S⊆[n]} f̂(S) χ_S,
where f̂(S) = ⟨f, χ_S⟩ is the Fourier–Walsh coefficient.

Definition 1.14. Energy Spectrum: For f ∈ L²(Ω_n) we define the energy spectrum to be
  E_f(m) := Σ_{S⊆[n] : |S|=m} f̂(S)².

Typically, if f̂ is large on small sets S then f is stable, and if f̂ is large on large sets S then f is sensitive.

Lemma 1.5. Var[f] = Σ_{m=1}^n E_f(m).

Proof.
  Σ_{m=0}^n E_f(m) = Σ_{S⊆[n]} f̂(S)² = Σ_{S,S′⊆[n]} f̂(S) ⟨χ_S, χ_{S′}⟩ f̂(S′) = Σ_{S,S′⊆[n]} ⟨f, χ_S⟩⟨χ_S, χ_{S′}⟩⟨f, χ_{S′}⟩ = ⟨f, f⟩ = E[f²],
and E_f(0) = f̂(∅)² = E[f]², so
  Σ_{m=1}^n E_f(m) = Σ_{m=0}^n E_f(m) − E_f(0) = E[f²] − E[f]² = Var(f). □

Lemma 1.6. For f : Ω_n → R we have
  Cov(f(ω), f(ω^(ε))) = Σ_{m=1}^n E_f(m)(1 − ε)^m.

Proof.
  E[f(ω) f(ω^(ε))] = E[ Σ_{S⊆[n]} f̂(S)χ_S(ω) Σ_{S′⊆[n]} f̂(S′)χ_{S′}(ω^(ε)) ] = Σ_{S,S′⊆[n]} f̂(S) f̂(S′) E[χ_S(ω) χ_{S′}(ω^(ε))].
If X ∼ Bin(|S′|, ε/2) counts the bits of S′ that differ, then
  E[χ_S(ω) χ_{S′}(ω^(ε))] = E[χ_S(ω) χ_{S′}(ω)(−1)^X] = E[χ_S(ω) χ_{S′}(ω)] E[(−1)^X] = (1 − ε)^{|S|} if S = S′ and 0 otherwise.
Hence
  E[f(ω) f(ω^(ε))] = Σ_{S⊆[n]} f̂(S)²(1 − ε)^{|S|} = Σ_{m=0}^n E_f(m)(1 − ε)^m = E[f(ω)]² + Σ_{m=1}^n E_f(m)(1 − ε)^m,
so
  Cov(f(ω), f(ω^(ε))) = Σ_{m=1}^n E_f(m)(1 − ε)^m. □

If the concentration of energy is on small m then the decay of the covariance in ε is slow.

Corollary 1.1. Cov(f(ω), f(ω^(ε))) is nonnegative and decreasing in ε.

Proposition 1.1. A sequence {f_n}_{n=1}^∞, f_n : Ω_n → {−1, 1}, is noise sensitive iff ∀k ≥ 1 we have
  lim_{n→∞} Σ_{m=1}^k E_{f_n}(m) = 0.

Proof. Suppose f_n is noise sensitive; then by definition lim_{n→∞} Cov(f_n(ω), f_n(ω^(ε))) = 0, so by the previous lemma
  lim_{n→∞} Σ_{m=1}^n E_{f_n}(m)(1 − ε)^m = 0.
For k ≥ 1 we have
  Σ_{m=1}^k E_{f_n}(m) ≤ (1 − ε)^{−k} Σ_{m=1}^n E_{f_n}(m)(1 − ε)^m,
and since (1 − ε)^{−k} is a fixed constant, lim_{n→∞} Σ_{m=1}^k E_{f_n}(m) = 0.

Now suppose that lim_{n→∞} Σ_{m=1}^k E_{f_n}(m) = 0 for every k. Then
  Σ_{m=1}^n E_{f_n}(m)(1 − ε)^m ≤ Σ_{m=1}^k E_{f_n}(m) + (1 − ε)^{k+1} Σ_{m=k+1}^n E_{f_n}(m).
By our assumption the first term tends to 0, and Σ_{m=k+1}^n E_{f_n}(m) ≤ Var[f_n] ≤ 1, so
  limsup_{n→∞} Σ_{m=1}^n E_{f_n}(m)(1 − ε)^m ≤ (1 − ε)^{k+1}  ∀k.
Since this holds for arbitrary k we can conclude
  limsup_{n→∞} Σ_{m=1}^n E_{f_n}(m)(1 − ε)^m = 0,
so lim_{n→∞} Cov(f_n(ω), f_n(ω^(ε))) = 0 by the previous lemma; hence the sequence {f_n} is noise sensitive. □

Lemma 1.7.
If f is a Boolean function then
  2 P(f(ω) ≠ f(ω^(ε))) = Var[f(ω)] − Cov[f(ω), f(ω^(ε))].

Proof.
  E[f(ω) f(ω^(ε))] = P(f(ω) = f(ω^(ε))) − P(f(ω) ≠ f(ω^(ε))) = 1 − 2P(f(ω) ≠ f(ω^(ε))) = E[f(ω) f(ω)] − 2P(f(ω) ≠ f(ω^(ε))).
Subtracting E[f(ω)]² from both sides gives
  Cov[f(ω), f(ω^(ε))] = Var[f(ω)] − 2P(f(ω) ≠ f(ω^(ε))),
so indeed the result follows by rearrangement. □

Corollary 1.2. For a Boolean function f we have
  2 P(f(ω) ≠ f(ω^(ε))) = Σ_{m=1}^n E_f(m)(1 − (1 − ε)^m).

Proof.
  2P(f(ω) ≠ f(ω^(ε))) = Var[f(ω)] − Cov[f(ω), f(ω^(ε))] = Cov[f(ω), f(ω)] − Cov[f(ω), f(ω^(ε))]
  = Σ_{m=1}^n E_f(m) − Σ_{m=1}^n E_f(m)(1 − ε)^m = Σ_{m=1}^n E_f(m)(1 − (1 − ε)^m). □

Proposition 1.2. A sequence of Boolean functions {f_n}_{n=1}^∞ is noise stable iff ∀ε ∈ (0, 1) ∃k_ε ∈ N s.t.
  sup_{n∈N} Σ_{m=k_ε}^n E_{f_n}(m) ≤ ε.

Proof. Suppose the sequence f_n is noise stable. For any δ ∈ (0, 1) and any k,
  sup_n Σ_{m=k}^n E_{f_n}(m) ≤ (1 − (1 − δ)^k)^{−1} sup_n Σ_{m=k}^n E_{f_n}(m)(1 − (1 − δ)^m)
    ≤ (1 − (1 − δ)^k)^{−1} sup_n Σ_{m=1}^n E_{f_n}(m)(1 − (1 − δ)^m)
    = (1 − (1 − δ)^k)^{−1} sup_n 2P(f_n(ω) ≠ f_n(ω^(δ))).
Since f_n is noise stable, ∀ε ∈ (0, 1) ∃δ ∈ (0, 1) s.t. sup_n P(f_n(ω) ≠ f_n(ω^(δ))) ≤ ε/4. Choosing k ∈ N such that 1 − (1 − δ)^k ≥ 1/2 then gives
  sup_n Σ_{m=k}^n E_{f_n}(m) ≤ ε.

Now suppose that ∀ε ∈ (0, 1) ∃k_ε ∈ N s.t. sup_n Σ_{m=k_ε}^n E_{f_n}(m) ≤ ε. Then, writing k = k_ε and using 1 − (1 − δ)^m ≤ 1 on the tail,
  2 sup_n P(f_n(ω) ≠ f_n(ω^(δ))) ≤ sup_n Σ_{m=1}^{k−1} E_{f_n}(m)(1 − (1 − δ)^m) + ε
    ≤ sup_n Var[f_n(ω)](1 − (1 − δ)^{k−1}) + ε ≤ (1 − (1 − δ)^{k−1}) + ε.
We can choose ε arbitrarily small and then δ small enough that the first term is small, so indeed
  lim_{δ→0} sup_n P(f_n(ω) ≠ f_n(ω^(δ))) = 0. □

Proposition 1.3. If f_n is noise sensitive and g_n is noise stable then lim_{n→∞} Cov(f_n, g_n) = 0.

Proof.
  Cov(f_n, g_n) = E[f_n(ω) g_n(ω)] − E[f_n(ω)]E[g_n(ω)] = Σ_{S,S′⊆[n]} f̂_n(S)⟨χ_S, χ_{S′}⟩ĝ_n(S′) − f̂_n(∅)ĝ_n(∅)
  = Σ_{S⊆[n]} f̂_n(S)ĝ_n(S) − f̂_n(∅)ĝ_n(∅) = Σ_{|S|≥1} f̂_n(S) ĝ_n(S).
Fix ε > 0; then ∃k s.t. sup_n Σ_{m≥k} E_{g_n}(m) < ε², since g_n is noise stable, by Proposition 1.2.
By Cauchy–Schwarz,
  | Σ_{|S|≥k} f̂_n(S) ĝ_n(S) | ≤ ( Σ_{m=k}^n E_{f_n}(m) )^{1/2} ( Σ_{m=k}^n E_{g_n}(m) )^{1/2} ≤ ( Var(f_n) ε² )^{1/2} ≤ ε.
So
  limsup_{n→∞} Cov(f_n, g_n) = limsup_{n→∞} Σ_{|S|≥1} f̂_n(S)ĝ_n(S)
    ≤ ε + limsup_{n→∞} | Σ_{1≤|S|<k} f̂_n(S)ĝ_n(S) |
    ≤ ε + limsup_{n→∞} ( Σ_{m=1}^{k} E_{f_n}(m) )^{1/2} ( Σ_{m=1}^{k} E_{g_n}(m) )^{1/2}
    ≤ ε + limsup_{n→∞} ( Var(g_n) )^{1/2} ( Σ_{m=1}^{k} E_{f_n}(m) )^{1/2} = ε,
since f_n is noise sensitive (Proposition 1.1). As ε > 0 was arbitrary the result follows. □

Lemma 1.8. Parseval's Identity: ⟨f, f⟩ = Σ_{S⊆[n]} f̂(S)².

Definition 1.15. Discrete Derivative: For f : Ω_n → R we define the operator
  ∇_k f(ω) = f(ω) − f(ω^k),
where ω^k_i = ω_i for i ≠ k and ω^k_k = −ω_k; this is the discrete derivative.

Proposition 1.4. If f : Ω_n → {−1, 1} then I_k(f) = Σ_{S∋k} f̂(S)².

Proof. Notice that
  χ_S(ω^k) = χ_S(ω) if k ∉ S, and −χ_S(ω) if k ∈ S.
So we have
  ∇_k f(ω) = Σ_{S⊆[n]} f̂(S)(χ_S(ω) − χ_S(ω^k)) = 2 Σ_{S∋k} f̂(S) χ_S(ω),
and hence
  ∇̂_k f(S) = ⟨∇_k f, χ_S⟩ = 2 Σ_{S′∋k} f̂(S′)⟨χ_{S′}, χ_S⟩ = 2f̂(S) if k ∈ S and 0 otherwise.
By definition of the influence, and since ∇_k f ∈ {−2, 0, 2},
  I_k(f) = P(∇_k f ≠ 0) = P(∇_k f(ω) = 2) + P(∇_k f(ω) = −2)
    = (1/4) Σ_{m∈{−2,0,2}} m² P(∇_k f(ω) = m) = (1/4) E[(∇_k f)²] = Σ_{S∋k} f̂(S)²,
using Parseval in the last step. □

Corollary 1.3. If f : Ω_n → {−1, 1} then I(f) = Σ_{k=1}^n I_k(f) = Σ_{S⊆[n]} |S| f̂(S)².

Definition 1.16. Monotonic: If ω, ω′ ∈ Ω_n then ω ≤ ω′ iff ω_i ≤ ω′_i ∀i, and f : Ω_n → {−1, 1} is monotonic iff ω ≤ ω′ ⟹ f(ω) ≤ f(ω′).

Proposition 1.5. If f is monotonic then I_k(f) = f̂({k}).

Proof.
  f̂({k}) = ⟨f, χ_{{k}}⟩ = E[f(ω) ω_k] = E[f(ω) ω_k (I_{k∈P_f} + I_{k∉P_f})] = E[f(ω) ω_k I_{k∈P_f}],
since on {k ∉ P_f} the value f(ω) does not depend on ω_k, so averaging over ω_k kills that term. Since f is monotonic, k ∈ P_f ⟹ f(ω) = ω_k, so
  f̂({k}) = E[I_{k∈P_f}] = P(k ∈ P_f) = I_k(f). □

Corollary 1.4. If f_n is a noise sensitive sequence of monotonic functions then lim_{n→∞} Σ_{k=1}^n I_k(f_n)² = 0.

Proof. Σ_{k=1}^n I_k(f_n)² = Σ_{k=1}^n f̂_n({k})² (by monotonicity) = E_{f_n}(1), which converges to 0 by noise sensitivity. □

Corollary 1.5. If f is monotonic then I(f) ≤ √n.

Proof.
  I(f) = Σ_{k=1}^n f̂({k})    (by monotonicity)
    ≤ ( Σ_{k=1}^n 1 )^{1/2} ( Σ_{k=1}^n f̂({k})² )^{1/2}    (by Cauchy–Schwarz)
    ≤ √n ⟨f, f⟩^{1/2} = √n. □

1.4 Hypercontractivity

Definition 1.17. Noise Operator: For ρ ∈ [0, 1] we say that
  (T_ρ f)(ω) := E[f(ω^(1−ρ)) | ω]
is the noise operator applied to f with respect to ρ.

Lemma 1.9. For f ∈ L²(Ω_n) we have T̂_ρf(S) = ρ^{|S|} f̂(S).

Proof.
  T̂_ρf(S) = ⟨T_ρ f, χ_S⟩ = E[ E[f(ω^(1−ρ)) | ω] χ_S(ω) ] = E[f(ω^(1−ρ)) χ_S(ω)] = E[f(ω) χ_S(ω^(1−ρ))] = E[f(ω) χ_S(ω) χ_S(ω′)],
where ω′ is independent of ω with P(ω′_i = 1) = 1 − (1 − ρ)/2, so that ω^(1−ρ) has the law of the coordinatewise product ω × ω′. Hence
  T̂_ρf(S) = E[f(ω) χ_S(ω)] E[χ_S(ω′)] = f̂(S) ρ^{|S|},
since E[ω′_i] = (1 − (1 − ρ)/2) − (1 − ρ)/2 = ρ. □

Corollary 1.6. T_ρ f(ω) = Σ_{S⊆[n]} ρ^{|S|} f̂(S) χ_S(ω).

Notice that because ρ < 1, if f is stable then the noise operator changes f very little, because the weight is concentrated on small S, whereas if f is noise sensitive then the reverse happens.

Theorem 1.7. BGB (hypercontractivity): For f ∈ L²(Ω_n) and ρ ∈ [0, 1] we have ||T_ρ f||_2 ≤ ||f||_{1+ρ²}.

Theorem 1.8. BKS: If lim_{n→∞} Σ_{i=1}^n I_i(f_n)² = 0 then f_n is noise sensitive.

Proof. We treat the case where ∃c, δ > 0 s.t. Σ_{i=1}^n I_i(f_n)² ≤ cn^{−δ}. We want to show that lim_{n→∞} Σ_{m=1}^k E_{f_n}(m) = 0. Recall Hölder's inequality: for 1 ≤ p, q ≤ ∞ with p^{−1} + q^{−1} = 1 we have ||fg||_1 ≤ ||f||_p ||g||_q. So, for any ρ ∈ (0, 1),
  Σ_{m=1}^k E_{f_n}(m) = Σ_{1≤|S|≤k} f̂_n(S)² ≤ Σ_{1≤|S|≤k} |S| f̂_n(S)² = Σ_{i=1}^n Σ_{1≤|S|≤k} f̂_n(S)² I_{i∈S}
    ≤ Σ_{i=1}^n Σ_{1≤|S|≤k} ∇̂_i f_n(S)²
    ≤ Σ_{i=1}^n Σ_{1≤|S|≤k} ρ^{2(|S|−k)} ∇̂_i f_n(S)²
    = ρ^{−2k} Σ_{i=1}^n Σ_{1≤|S|≤k} T̂_ρ(∇_i f_n)(S)²
    ≤ ρ^{−2k} Σ_{i=1}^n Σ_{S⊆[n]} T̂_ρ(∇_i f_n)(S)²
    = ρ^{−2k} Σ_{i=1}^n ⟨T_ρ(∇_i f_n), T_ρ(∇_i f_n)⟩    (by Parseval)
    ≤ ρ^{−2k} Σ_{i=1}^n ||∇_i f_n||²_{1+ρ²}    (by hypercontractivity).
For f Boolean we have ∇_i f ∈ {−2, 0, 2}, hence |∇_i f|^{1+ρ²} = |∇_i f|² 2^{ρ²−1} pointwise, so
  ||∇_i f||²_{1+ρ²} = ( E[|∇_i f|²] 2^{ρ²−1} )^{2/(1+ρ²)} = ( 2^{1+ρ²} I_i(f) )^{2/(1+ρ²)} = 4 I_i(f)^{2/(1+ρ²)},
using E[(∇_i f)²] = 4I_i(f). Therefore
  Σ_{m=1}^k E_{f_n}(m) ≤ 4ρ^{−2k} Σ_{i=1}^n I_i(f_n)^{2/(1+ρ²)}
and, by Hölder with exponents 1 + ρ² and (1 + ρ²)/ρ²,
  Σ_{i=1}^n I_i(f_n)^{2/(1+ρ²)} ≤ ( Σ_{i=1}^n I_i(f_n)² )^{1/(1+ρ²)} n^{ρ²/(1+ρ²)},
so
  Σ_{m=1}^k E_{f_n}(m) ≤ 4ρ^{−2k} (cn^{−δ})^{1/(1+ρ²)} n^{ρ²/(1+ρ²)} = 4ρ^{−2k} c^{1/(1+ρ²)} n^{−(δ−ρ²)/(1+ρ²)}.
By choosing ρ² < δ we get lim_{n→∞} Σ_{m=1}^k E_{f_n}(m) = 0, so indeed f_n is noise sensitive. □

Example 1.6. The iterated majority function is noise sensitive. For n = 3^k we have I_i(f) = 2^{−k}, so
  Σ_{i=1}^n I_i(f)² = 3^k (2^{−k})² = (3/4)^k = n^{log_3(3/4)} = n^{−δ}, with δ = log_3(4/3) > 0.

Example 1.7. The tribes function is noise sensitive. We have I_i(f) ≈ c log(n)/n, so, fixing ε ∈ (0, 1),
  Σ_{i=1}^n I_i(f)² = c² (log(n))²/n² · n = c² (log(n))²/n ≤ n^{ε−1}
for sufficiently large n. In this sense the tribes function is the most noise sensitive monotone function.

In general the converse to the BKS theorem is false; however, the following lemma shows when it holds.

Lemma 1.10. If f_n is a noise sensitive sequence of monotonic functions then lim_{n→∞} Σ_{i=1}^n I_i(f_n)² = 0.

Proof. We have I_k(f_n) = f̂_n({k}) for k = 1, …, n, and by noise sensitivity lim_{n→∞} Σ_{m=1}^k E_{f_n}(m) = 0; in particular this must hold for k = 1, so
  Σ_{k=1}^n I_k(f_n)² = Σ_{k=1}^n f̂_n({k})² = E_{f_n}(1) → 0. □

Theorem 1.9. KKL: ∃c ∈ (0, ∞) s.t. for all f : Ω_n → {−1, 1},
  I(f) = Σ_{i=1}^n I_i(f) ≥ c Var(f) log( 1/I_max ),
where I_max = ||Inf(f)||_∞.

Proof. Let ρ² = 1/2, so 1 + ρ² = 3/2 and 2/(1 + ρ²) = 4/3. Since |½∇_i f| ∈ {0, 1},
  I_i(f)^{4/3} = ( E[|½∇_i f|^{3/2}] )^{4/3} = ||½∇_i f||²_{3/2} ≥ ||T_ρ(½∇_i f)||²_2 = Σ_{S∋i} f̂(S)² ρ^{2|S|} = Σ_{S∋i} f̂(S)² 2^{−|S|},
using hypercontractivity and Lemma 1.9. Summing over i, for any b ≤ n,
  Σ_{i=1}^n I_i(f)^{4/3} ≥ Σ_{S⊆[n]} |S| f̂(S)² 2^{−|S|} = Σ_{m=1}^n m E_f(m) 2^{−m} ≥ 2^{−b} Σ_{m=1}^b E_f(m).
Also
  I(f) = Σ_{S⊆[n]} |S| f̂(S)² ≥ b Σ_{m=b+1}^n E_f(m),
and
  Σ_{i=1}^n I_i(f)^{4/3} ≤ I_max^{1/3} I(f).
Combining,
  Var(f) = Σ_{m=1}^n E_f(m) ≤ ( 2^b I_max^{1/3} + b^{−1} ) I(f).
Now we can choose b ≈ A log(1/I_max), with A > 0 a suitably small fixed constant, such that 2^b I_max^{1/3} ≤ 1/b, which gives
  Var(f) ≤ (2/b) I(f),  i.e.  I(f) ≥ (b/2) Var(f) ≥ c Var(f) log(1/I_max). □

Corollary 1.7. I_max ≥ c Var(f) log(n)/n.

Proof. I(f) ≤ n I_max by definition, so by KKL we have
  I_max / log(1/I_max) ≥ c Var(f)/n,
and by monotonicity of the left-hand side in I_max the result follows. □

1.5 Percolation

Definition 1.18.
Path: γ(x, y) = (z_0, …, z_n) is a path if z_i ∈ Z² ∀i, x = z_0, y = z_n and ||z_i − z_{i−1}||_1 = 1 ∀i.

Definition 1.19. First Passage Time: For 0 < a < b we independently assign to the edges of Z² weights ω_e ∈ {a, b} such that P(ω_e = a) = 1/2 = P(ω_e = b). We then let
  weight(γ(x, y)) = Σ_{e∈γ} ω_e.
We typically want to find the path with the minimum weight, hence our goal is often to find the first passage time
  inf_{γ(x,y)} weight(γ).

For f : {0, 1}^n → {0, 1} denote
  P_p(ω) := ∏_{i=1}^n p^{ω(i)} (1 − p)^{1−ω(i)},
and let λ be the Lebesgue measure on the cube [0, 1]^n.

Theorem 1.10. BKKKL: Let I_k(A) = λ({X_k determines I_A(X)}) be the influence of the k-th variable, which is the probability that the k-th variable is critical in determining whether X ∈ A. Then
  Σ_{k=1}^n I_k(A) ≥ c λ(A) λ(A^c) log( 1/I_max )
(note that λ(A)λ(A^c) = Var(I_A)), and
  I_max ≥ c λ(A) λ(A^c) log(n)/n.

Definition 1.20. Embedding: For fixed p ∈ [0, 1] and x ∈ [0, 1]^n write
  ω_i = 1 if x_i > 1 − p, and 0 otherwise.
With this definition we have ω ∼ P_p.

Corollary 1.8. If we let I_{k,p}(B) = P_p(ω_k affects 1_B), then as a result of BKKKL we have
  Σ_{k=1}^n I_{k,p}(B) ≥ c P_p(B) P_p(B^c) log( 1/I_max )
and
  I_max ≥ c P_p(B) P_p(B^c) log(n)/n.

Theorem 1.11. Russo's Formula: If A ⊆ {0, 1}^n is increasing then
  (d/dp) P_p(A) = Σ_{i=1}^n I_{i,p}(A).

Lemma 1.11. For A increasing with A ∉ {∅, {0, 1}^n}, define p_q := inf{p : P_p(A) = q}. Then for ε ∈ (0, 1/2) we have
  p_{1−ε} − p_ε ≤ c log(1/(2ε)) / log(1/δ),
where δ bounds the maximal influence max_i I_{i,p}(A) over the relevant range of p. In particular, for symmetric increasing functions,
  p_{1−ε} − p_ε ≤ O( 1/log(n) ).
Tribes, majority and iterated majority are examples of symmetric, increasing functions.

Theorem 1.12. Harris–Kesten: For bond percolation on Z² we have p_c = 1/2 and θ(1/2) = 0.

2 Concentration

2.1 Efron–Stein Inequality

Theorem 2.1. Efron–Stein Inequality: If {X_i}_{i=1}^n is a sequence of i.i.d. random variables and f : R^n → R, then
  Var[f] ≤ (1/2) Σ_{i=1}^n E[(f(X) − f(X^{(i)}))²],
where X^{(i)}_j = X_j for j ≠ i and X^{(i)}_i = X′_i, and {X′_i}_{i=1}^n is a sequence of i.i.d. random variables with the same distribution as, and independent of, {X_i}_{i=1}^n.

Proof.
Let X̂^{(i)}_j := X_j for j > i and X′_j for j ≤ i, so that X̂^{(0)} = X and X̂^{(n)} = X′. Then
  Var[f] = E[f(X)²] − E[f(X)]² = E[f(X)²] − E[f(X)] E[f(X′)] = E[f(X)²] − E[f(X) f(X′)]    (by independence)
  = E[f(X)(f(X) − f(X′))],
and, telescoping,
  f(X) − f(X′) = Σ_{i=1}^n [ f(X̂^{(i−1)}) − f(X̂^{(i)}) ],
so
  Var[f] = Σ_{i=1}^n E[ f(X)( f(X̂^{(i−1)}) − f(X̂^{(i)}) ) ].
Swapping X_i and X′_i (which leaves the joint distribution unchanged, exchanges X̂^{(i−1)} and X̂^{(i)}, and replaces X by X^{(i)}) gives
  Var[f] = Σ_{i=1}^n E[ f(X^{(i)})( f(X̂^{(i)}) − f(X̂^{(i−1)}) ) ].
By summing the last two identities and dividing by 2 we get
  Var[f] = (1/2) Σ_{i=1}^n E[ (f(X) − f(X^{(i)}))( f(X̂^{(i−1)}) − f(X̂^{(i)}) ) ]
    ≤ (1/2) Σ_{i=1}^n E[(f(X) − f(X^{(i)}))²]^{1/2} E[(f(X̂^{(i−1)}) − f(X̂^{(i)}))²]^{1/2}    (by Cauchy–Schwarz)
    = (1/2) Σ_{i=1}^n E[(f(X) − f(X^{(i)}))²],
since (X̂^{(i−1)}, X̂^{(i)}) is equal in distribution to (X, X^{(i)}). □

Notice that the inequality only arises from Cauchy–Schwarz, so for functions where equality is attained in Cauchy–Schwarz we also have equality in Efron–Stein.

Corollary 2.1. First Passage Percolation: Suppose we have a percolation grid of size [1, n]² and each vertex takes some independent random time ω_x ≥ 0 to open once a neighbour is open. Given that (1, 1) is initially open, we want to find the earliest time at which (n, n) becomes open. Furthermore assume that paths are directed to be increasing and that E[ω_x²] < ∞. For a path T : (1, 1) → (n, n) the time taken for the path is Σ_{x∈T} ω_x, so the first passage time is
  T_FPT := min_{T : (1,1)→(n,n)} Σ_{x∈T} ω_x.
For any particular fixed path,
  Var[ Σ_{x∈T} ω_x ] = Σ_{x∈T} Var[ω_x] = |T| Var[ω_x] = 2n Var[ω_x];
ideally we would like a comparable bound for the variance of T_FPT itself. By Efron–Stein,
  Var[T_FPT(ω)] ≤ (1/2) Σ_{x∈[1,n]²} E[(T_FPT(ω) − T_FPT(ω^{(x)}))²]
    = Σ_{x∈[1,n]²} E[(T_FPT(ω) − T_FPT(ω^{(x)}))² ; ω_x^{(x)} ≥ ω_x]    (by symmetry)
    = Σ_{x∈[1,n]²} E[(T_FPT(ω) − T_FPT(ω^{(x)}))² ; ω_x^{(x)} ≥ ω_x, x ∈ T_FPT(ω)],
since if x ∉ T_FPT(ω) then there will be no change by increasing the time at that site.
Hence T_FPT(ω) ≤ T_FPT(ω^{(x)}) on this event, and in particular
  0 ≤ T_FPT(ω^{(x)}) − T_FPT(ω) ≤ ω_x^{(x)} − ω_x ≤ ω_x^{(x)},
so
  0 ≤ (T_FPT(ω^{(x)}) − T_FPT(ω))² ≤ (ω_x^{(x)})².
Since the minimizing path contains at most 2n sites, indeed
  Var[T_FPT(ω)] ≤ 2n E[(ω_x^{(x)})²] = 2n E[ω_x²].

2.2 Martingale Method

Lemma 2.1. P(f − E[f] > t) ≤ inf_{λ>0} e^{−λt} E[e^{λ(f − E[f])}].

Proof. Let λ > 0; then by Markov's inequality
  P(f − E[f] > t) = P(e^{λ(f−E[f])} > e^{λt}) ≤ e^{−λt} E[e^{λ(f−E[f])}].
Since λ > 0 was arbitrary, the statement holds. □

Corollary 2.2. If E[e^{λ(f−E[f])}] ≤ e^{σ²λ²/2} for all λ > 0, then
  P(f − E[f] > t) ≤ e^{−t²/(2σ²)}.

Definition 2.1. Martingale Difference: For a probability space (Ω, F, P) with filtration {F_i}_{i=0}^n, where F_i = σ({X_j}_{j=0}^i) for some random variables {X_j}_{j=0}^n on (Ω, F, P), and an integrable function f, we define the martingale differences
  d_i := E[f | F_i] − E[f | F_{i−1}].

Theorem 2.2. Suppose f ∈ L¹ and let D² := Σ_{i=1}^n ||d_i||²_∞. Then
  P(f − E[f] > t) ≤ e^{−t²/(2D²)}.

Proof.
  E[e^{λ(f−E[f])}] = E[e^{λ Σ_{i=1}^n d_i}] = E[ E[ e^{λ Σ_{i=1}^n d_i} | F_{n−1} ] ] = E[ e^{λ Σ_{i=1}^{n−1} d_i} E[e^{λ d_n} | F_{n−1}] ].
We want to estimate E[e^{λ d_n} | F_{n−1}]. For u ∈ [−1, 1], by convexity,
  e^{λu} ≤ (1 + u)/2 · e^λ + (1 − u)/2 · e^{−λ}.
Applying this with u = d_n/||d_n||_∞ and λ||d_n||_∞ in place of λ, and using E[d_n | F_{n−1}] = 0,
  E[e^{λ d_n} | F_{n−1}] ≤ ( e^{λ||d_n||_∞} + e^{−λ||d_n||_∞} )/2 = cosh(λ||d_n||_∞) ≤ e^{λ²||d_n||²_∞/2}.
Iterating,
  E[e^{λ(f−E[f])}] ≤ E[e^{λ Σ_{i=1}^{n−1} d_i}] e^{λ²||d_n||²_∞/2} ≤ e^{λ²D²/2},
so the result follows from Corollary 2.2. □

Definition 2.2. Length: A finite metric space (X, d) has length at most ℓ if there exist a sequence of successively refining partitions X^{(0)}, …, X^{(n)} with X^{(0)} = {X} and X^{(n)} = {{x} : x ∈ X} the complete partition, and constants a_1, …, a_n with ℓ = ( Σ_{i=1}^n a_i² )^{1/2}, such that whenever A_j^{(i)}, A_k^{(i)} ∈ X^{(i)} lie in the same A_p^{(i−1)} ∈ X^{(i−1)}, there is a map ψ : A_j^{(i)} → A_k^{(i)} with d(x, ψ(x)) ≤ a_i for all x ∈ A_j^{(i)}.

Proposition 2.1.
Let (X, d) be a finite metric space of length at most ℓ and let
  μ := (1/|X|) Σ_{i=1}^{|X|} δ_{x_i}
be the uniform measure on X. If f ∈ Lip_1(X), i.e. |f(x) − f(y)| ≤ d(x, y), then
  μ( f − E_μ[f] > t ) ≤ e^{−t²/(2ℓ²)}.

Definition 2.3. Hamming Distance: For X = {0, 1}^n we denote by
  d_H(x, y) := (1/n) Σ_{i=1}^n |x_i − y_i|
the (normalized) Hamming distance.

Lemma 2.2. Construct the partitions X^{(i)} of the n-dimensional hypercube by fixing the first i coordinates, i.e. X^{(i)} = {{x : x_1 = b_1, …, x_i = b_i} : b ∈ {0, 1}^i}. Each refinement step moves a point by at most a_i = 1/n in d_H, so the hypercube has length at most ℓ = (n · n^{−2})^{1/2} = n^{−1/2}, and if f ∈ Lip_1(d_H), i.e. |f(x) − f(y)| ≤ d_H(x, y), then
  μ( f − ∫f dμ > t ) ≤ e^{−t²/(2ℓ²)} = e^{−t²n/2}.

2.3 Convex Hull Approximation

Lemma 2.3. If f : R^n → R is such that |∂f/∂x_i| ≤ 1/√n for all i, then
  μ( f − ∫f dμ > s ) ≤ e^{−s²/2}.

Proof. For x, y ∈ {0, 1}^n, by the mean value theorem, with ξ_{x_i,y_i} ∈ (x_i, y_i),
  |f(x) − f(y)| = | Σ_{i=1}^n (∂f/∂x_i)(ξ_{x_i,y_i})(y_i − x_i) | ≤ sup_i |∂f/∂x_i| Σ_{i=1}^n |x_i − y_i| = sup_i |∂f/∂x_i| · n d_H(x, y) ≤ √n d_H(x, y).
So f/√n ∈ Lip_1(d_H), which gives
  μ( f − ∫f dμ > t√n ) ≤ e^{−t²n/2};
choosing t such that s = t√n gives the required result. □

The bound in the above lemma is independent of n, hence this is a dimension-free concentration inequality. In general this won't hold if we replace the uniform bound on the partial derivatives by ||∇f||_{ℓ²} ≤ 1.

Definition 2.4. Convex: A function f : X → R is convex if, whenever x, y ∈ X and λ ∈ (0, 1),
  f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).

Lemma 2.4. If f is convex with ||∇f||_{ℓ²} ≤ 1, then for any x, y we have
  f(x) ≤ f(y) + max_{||a||_{ℓ²}≤1} Σ_{i=1}^n a_i (y_i − x_i).

Proof. By convexity, f(y) ≥ f(x) + ∇f(x)·(y − x), so
  f(x) ≤ f(y) − ∇f(x)·(y − x) = f(y) + Σ_{i=1}^n (∂f/∂x_i)(x)(x_i − y_i) ≤ f(y) + max_{||a||_{ℓ²}≤1} Σ_{i=1}^n a_i (y_i − x_i),
since the maximum equals ||x − y||_{ℓ²}. □

Corollary 2.3. For such f : {0, 1}^n → R we have
  f(x) ≤ f(y) + max_{||a||_{ℓ²}≤1} Σ_{i=1}^n a_i I_{x_i≠y_i}.
We let d_a(x, y) = Σ_{i=1}^n a_i I_{x_i≠y_i} and D^c(x, y) = max_{||a||_{ℓ²}≤1} d_a(x, y).

Theorem 2.3. Talagrand: Let A ⊆ ⊗_{i=1}^n Ω^{(i)} =: X, where we consider X with the product measure P, and write D^c(x, A) for the convex distance from x to A (defined in the proof). Then
  ∫ e^{D^c(x,A)²/4} dP(x) ≤ 1/P(A),
and moreover
  P(D^c(x, A) ≥ r) ≤ e^{−r²/4}/P(A).

Proof.
Let U_A(x) := {(I_{x_1≠y_1}, …, I_{x_n≠y_n}) : y ∈ A} and let V_A(x) be the smallest convex set containing U_A(x). We have
  D^c(x, A) := sup_{||a||≤1} inf_{y∈A} d_a(x, y) = inf_{s∈V_A(x)} sup_{||a||≤1} ⟨a, s⟩ = inf_{s∈V_A(x)} ||s||,
where the exchange of sup and inf is a minimax argument (V_A(x) is convex and compact).

We prove ∫ e^{D^c(x,A)²/4} dP ≤ 1/P(A) by induction on n. For n = 1: if x ∈ A then D^c(x, A) = 0, and if x ∉ A then D^c(x, A) = 1 (for A ≠ ∅). From this, writing u = P(A),
  ∫_Ω e^{D^c(x,A)²/4} dP(x) = ∫_A 1 dP(x) + ∫_{A^c} e^{1/4} dP(x) = u + e^{1/4}(1 − u) ≤ 1/u,
where the last inequality holds for all u ∈ (0, 1] (the function u ↦ u(u + e^{1/4}(1 − u)) is increasing on [0, 1] with value 1 at u = 1). So the case n = 1 holds.

Suppose the statement holds for n − 1; we show it for n. Write z = (ω, x) ∈ Ω^{(1)} × ⊗_{k=2}^n Ω^{(k)} and P̄ = ⊗_{k=2}^n P_k, let
  B = {y ∈ ⊗_{k=2}^n Ω^{(k)} : (ω′, y) ∈ A for some ω′}
be the projection of A onto ⊗_{k=2}^n Ω^{(k)}, and let A(ω) = {y ∈ ⊗_{k=2}^n Ω^{(k)} : (ω, y) ∈ A} be the slice. If s ∈ U_{A(ω)}(x) then (0, s) ∈ U_A(z), and if t ∈ U_B(x) then (1, t) ∈ U_A(z). So by convexity of V_A(z) we have, for θ ∈ [0, 1],
  θ(0, s) + (1 − θ)(1, t) = (1 − θ, θs + (1 − θ)t) ∈ V_A(z),
hence, using convexity of ||·||²,
  D^c(z, A)² ≤ inf_{s,t} [ (1 − θ)² + ||θs + (1 − θ)t||² ] ≤ (1 − θ)² + θ D^c(x, A(ω))² + (1 − θ) D^c(x, B)².
Therefore
  ∫ e^{D^c(z,A)²/4} dP(z) ≤ ∫ e^{(1−θ)²/4} [ ∫ e^{θD^c(x,A(ω))²/4} e^{(1−θ)D^c(x,B)²/4} dP̄(x) ] dP_1(ω)
    ≤ ∫ e^{(1−θ)²/4} ( ∫ e^{D^c(x,A(ω))²/4} dP̄(x) )^θ ( ∫ e^{D^c(x,B)²/4} dP̄(x) )^{1−θ} dP_1(ω)    (by Hölder)
    ≤ ∫ e^{(1−θ)²/4} P̄(A(ω))^{−θ} P̄(B)^{−(1−θ)} dP_1(ω)    (by the induction hypothesis)
    = (1/P̄(B)) ∫ e^{(1−θ)²/4} ( P̄(A(ω))/P̄(B) )^{−θ} dP_1(ω).
Moreover we have inf_{θ∈[0,1]} e^{(1−θ)²/4} u^{−θ} ≤ 2 − u for u ∈ [0, 1], so optimising over θ for each ω gives
  ∫ e^{D^c(z,A)²/4} dP(z) ≤ (1/P̄(B)) ∫ ( 2 − P̄(A(ω))/P̄(B) ) dP_1(ω) = (1/P̄(B)) ( 2 − P(A)/P̄(B) ) ≤ 1/P(A),
using ∫ P̄(A(ω)) dP_1(ω) = P(A) and v(2 − v) ≤ 1 with v = P(A)/P̄(B). The tail bound then follows from Markov's inequality:
  P(D^c(x, A) ≥ r) ≤ e^{−r²/4} ∫ e^{D^c(x,A)²/4} dP(x) ≤ e^{−r²/4}/P(A). □

Corollary 2.4. If f : X → R is such that ∀x ∃a with ||a||_{ℓ²} ≤ 1 s.t. f(x) ≤ f(y) + d_a(x, y) ∀y, then
  f(x) ≤ m + D^c(x, A),  where A = {x : f(x) ≤ m}.

Proof. If x ∈ A then D^c(x, A) = 0 and this holds trivially, so assume x ∉ A.
Let a be as in the hypothesis for this x. Then
  f(x) ≤ inf_{x_0∈A} [ f(x_0) + d_a(x, x_0) ] ≤ m + inf_{x_0∈A} d_a(x, x_0) ≤ m + sup_{||a||≤1} inf_{x_0∈A} d_a(x, x_0) = m + D^c(x, A). □

Corollary 2.5. Choosing m to be a median of f, so that P(A) ≥ 1/2,
  P(f ≥ m + r) ≤ P(D^c(x, A) ≥ r) ≤ e^{−r²/4}/P(A) ≤ 2e^{−r²/4}.

2.4 Entropy

Definition 2.5. Entropy: For a probability measure μ, the entropy of a nonnegative integrable function f is defined as
  Ent_μ(f) = ∫ f log(f) dμ − ∫f dμ log( ∫f dμ ) = ∫ f log( f / ∫f dμ ) dμ.

Definition 2.6. Log-Sobolev Inequality: A probability measure μ satisfies the log-Sobolev inequality if for all suitably integrable f we have
  Ent_μ(f²) ≤ C ∫ |∇f|² dμ.

Definition 2.7. Poincaré Inequality: A probability measure μ satisfies the Poincaré inequality if for all such f
  Var_μ(f) ≤ C ∫ |∇f|² dμ.

Corollary 2.6. If μ satisfies the log-Sobolev inequality then it also satisfies the Poincaré inequality.

Lemma 2.5. Let μ be the Lebesgue measure on a domain Ω with μ(Ω) = 1. If ∫ f dμ = 0 then
  Var_μ(f) = ∫_Ω f² dμ ≤ C ∫ |∇f|² dμ,
and the optimal constant is C = 1/λ_1, where λ_1 is the smallest eigenvalue of the Dirichlet Laplacian:
  −∆u(x) = λu(x), x ∈ Ω;  u(x) = 0, x ∈ ∂Ω.

Definition 2.8. Generator: Given a stochastic process X_t satisfying dX_t = σ(X_t) dB_t + b(X_t) dt, we have the generator
  L = (1/2) Σ_{i,j=1}^n (σσ*)_{i,j} ∂²/∂x_i∂x_j + b(x)·∇.

Definition 2.9. Semi-Group: If X_t is a stochastic process then the operator P_t satisfying
  (P_t f)(x) = E_x[f(X_t)] = ∫ f(y) p_t(x, y) dy
is called the semi-group of X.

Lemma 2.6. If P_t is the semi-group of a stochastic process X then P_{t+s} = P_t P_s for any t, s ≥ 0.

Lemma 2.7. If X is a stochastic process with semi-group P_t and generator L, then P_t = e^{tL}, and in particular
  (d/dt) P_t = L e^{tL} = L P_t = P_t L.

Lemma 2.8. If P_t is the (Ornstein–Uhlenbeck) semi-group of a stochastic process with the standard Gaussian measure γ as stationary density, then
  P_∞ f(x) := lim_{t→∞} P_t f(x) = ∫ f(y) dγ(y), and P_0 f(x) = f(x).

Proof.
By Mehler's formula,
\[
(P_tf)(x) = \int f\big(e^{-t}x+\sqrt{1-e^{-2t}}\,y\big)\,d\gamma(y)
= \mathbb E_\gamma\big[f\big(e^{-t}x+\sqrt{1-e^{-2t}}\,Y\big)\big],\qquad Y\sim N(0,1).
\]
If $X\perp Y$ with $X,Y\sim N(0,1)$ then $e^{-t}X+\sqrt{1-e^{-2t}}\,Y\sim N(0,1)$, so $\gamma$ is invariant for $P_t$. At $t=0$ we get $P_0f(x)=\mathbb E_\gamma[f(x)]=f(x)$, and
\[
\lim_{t\to\infty}P_tf(x) = \mathbb E_\gamma[f(Y)] = \int f(y)\,d\gamma(y). \qquad\square
\]

Theorem 2.4. If $\gamma$ is the standard Gaussian measure on $\mathbb R^n$ then
\[
\operatorname{Ent}_\gamma(f^2)\le 2\int_{\mathbb R^n}|\nabla f|^2\,d\gamma(x).
\]

Proof. For $f>0$,
\[
\operatorname{Ent}_\gamma(f) = \int f\log f\,d\gamma - \int f\,d\gamma\,\log\int f\,d\gamma
= \int (P_0f)\log(P_0f)\,d\gamma - (P_\infty f)\log(P_\infty f)
= -\int_0^\infty \frac{d}{dt}\int (P_tf)\log(P_tf)\,d\gamma\,dt.
\]
Differentiating under the integral,
\[
\frac{d}{dt}\int (P_tf)\log(P_tf)\,d\gamma
= \int (LP_tf)\log(P_tf)\,d\gamma + \int (P_tf)\,\frac{\frac{d}{dt}(P_tf)}{P_tf}\,d\gamma
= \int (LP_tf)\log(P_tf)\,d\gamma + \frac{d}{dt}\int P_tf\,d\gamma,
\]
and the second term vanishes since $\int P_tf\,d\gamma = \int f\,d\gamma$ for all $t$ by invariance of $\gamma$. Hence
\[
\operatorname{Ent}_\gamma(f) = -\int_0^\infty\int (LP_tf)\log(P_tf)\,d\gamma\,dt.
\]
The Ornstein–Uhlenbeck generator is $L=\Delta - x\cdot\nabla$, which satisfies the integration-by-parts formula $\int (Lg)h\,d\gamma = -\int \nabla g\cdot\nabla h\,d\gamma$, so
\[
\operatorname{Ent}_\gamma(f) = \int_0^\infty\int \nabla P_tf\cdot\nabla\log P_tf\,d\gamma\,dt
= \int_0^\infty\int \frac{|\nabla P_tf(x)|^2}{P_tf(x)}\,d\gamma(x)\,dt.
\]
Moreover
\[
\nabla P_tf(x) = \nabla\int f\big(e^{-t}x+\sqrt{1-e^{-2t}}\,y\big)\,d\gamma(y)
= e^{-t}\int (\nabla f)\big(e^{-t}x+\sqrt{1-e^{-2t}}\,y\big)\,d\gamma(y),
\]
so, writing $|\nabla f| = \sqrt f\cdot\frac{|\nabla f|}{\sqrt f}$ and applying Cauchy–Schwarz,
\[
|\nabla P_tf(x)| \le e^{-t}\,P_t|\nabla f|(x)
\le e^{-t}\big(P_tf(x)\big)^{1/2}\Big(P_t\frac{|\nabla f|^2}{f}(x)\Big)^{1/2}.
\]
Hence
\[
\operatorname{Ent}_\gamma(f) \le \int_0^\infty e^{-2t}\int P_t\Big(\frac{|\nabla f|^2}{f}\Big)\,d\gamma\,dt
= \int_0^\infty e^{-2t}\int \frac{|\nabla f|^2}{f}\,d\gamma\,dt
= \frac12\int \frac{|\nabla f|^2}{f}\,d\gamma,
\]
again using invariance of $\gamma$. So it follows, applying this to $f^2$, that
\[
\operatorname{Ent}_\gamma(f^2)\le \frac12\int \frac{|\nabla f^2|^2}{f^2}\,d\gamma(x) = 2\int |\nabla f|^2\,d\gamma(x). \qquad\square
\]

Corollary 2.7. If $d\mu = e^{-u(x)}\,dx$ with $\nabla^2u\ge cI$ then
\[
\operatorname{Ent}_\mu(f^2)\le \frac2c\,\|\nabla f\|^2_{L^2(\mu)}.
\]

Proposition 2.2. If $\mu$ is a probability measure such that $\operatorname{Ent}_\mu(f^2)\le C\|\nabla f\|^2_{L^2(\mu)}$ and $F$ satisfies $|F(x)-F(y)|\le\|F\|_{\mathrm{Lip}}|x-y|$ for all $x,y\in\mathbb R^n$, then
\[
\mu\Big(F-\int F\,d\mu\ge t\Big)\le e^{-t^2/(C\|F\|^2_{\mathrm{Lip}})}.
\]
Proof.
Assume without loss of generality that $\int F\,d\mu=0$, and set $f^2 = e^{\lambda F}$, so that
\[
|\nabla f|^2 = \frac{\lambda^2}{4}|\nabla F|^2e^{\lambda F}\le \frac{\lambda^2}{4}\|F\|^2_{\mathrm{Lip}}\,e^{\lambda F}
\]
since $F$ is Lipschitz. Letting $\Lambda(\lambda):=\int e^{\lambda F}\,d\mu$, the log-Sobolev inequality then reads
\[
\lambda\Lambda'(\lambda) - \Lambda(\lambda)\log\Lambda(\lambda) = \operatorname{Ent}_\mu(e^{\lambda F}) \le \frac{C\lambda^2}{4}\|F\|^2_{\mathrm{Lip}}\,\Lambda(\lambda).
\]
Dividing by $\lambda^2\Lambda(\lambda)$, the left-hand side becomes $\frac{d}{d\lambda}\big(\frac1\lambda\log\Lambda(\lambda)\big)$, and $\frac1\lambda\log\Lambda(\lambda)\to\int F\,d\mu=0$ as $\lambda\downarrow 0$. Integrating the resulting differential inequality yields
\[
\Lambda(\lambda)\le e^{C\lambda^2\|F\|^2_{\mathrm{Lip}}/4},
\]
and by Chebyshev's inequality, optimising over $\lambda$ (take $\lambda = 2t/(C\|F\|^2_{\mathrm{Lip}})$),
\[
\mu(F\ge t)\le e^{-\lambda t}\Lambda(\lambda)\le e^{-t^2/(C\|F\|^2_{\mathrm{Lip}})},
\]
as required. $\square$

Lemma 2.9. For all $x>0$ and $y\in\mathbb R$ we have that $x\log x - x\ge xy - e^y$.

Lemma 2.10. Variational characterisation:
\[
\operatorname{Ent}_\mu(f) = \sup\Big\{\int fg\,d\mu : \int e^g\,d\mu\le 1,\ g\in C_b(\mathbb R)\Big\}
= \sup_{g\in C_b(\mathbb R)}\Big(\int fg\,d\mu - \log\int e^g\,d\mu\Big)
\ge \int fg\,d\mu - \log\int e^g\,d\mu.
\]

Theorem 2.5. Tensorisation: let $X=\bigotimes_{i=1}^n\Omega_i$ and let $P=\bigotimes_{i=1}^n\mu_i$ be a product measure on $X$. If $f : X\to\mathbb R$ is measurable then
\[
\operatorname{Ent}_P(f)\le \sum_{i=1}^n\int \operatorname{Ent}_{\mu_i}(f_i)\,dP,
\]
where $f_i(y) = f(x_1,\dots,x_{i-1},y,x_{i+1},\dots,x_n)$.

Proof. For $g$ such that $\int e^g\,dP\le 1$ define
\[
g_i(x_1,\dots,x_n) := \log\frac{\int e^{g(x_1,\dots,x_n)}\,d\mu_1(x_1)\cdots d\mu_{i-1}(x_{i-1})}{\int e^{g(x_1,\dots,x_n)}\,d\mu_1(x_1)\cdots d\mu_i(x_i)}.
\]
Notice that:
• The $g_i$ are normalised in the $i$-th coordinate:
\[
\int e^{g_i}\,d\mu_i(x_i)
= \frac{\int\!\!\int e^g\,d\mu_1\cdots d\mu_{i-1}\,d\mu_i(x_i)}{\int e^g\,d\mu_1\cdots d\mu_i}
= 1.
\]
• The sum telescopes:
\[
\sum_{i=1}^n g_i
= \log\frac{e^{g(x_1,\dots,x_n)}}{\int e^{g(x_1,\dots,x_n)}\,d\mu_1(x_1)\cdots d\mu_n(x_n)}
= g - \log\int e^g\,dP
\ge g,
\]
since $\int e^g\,dP\le 1$.
• Hence, using the variational characterisation (Lemma 2.10) coordinate by coordinate,
\[
\int fg\,dP \le \sum_{i=1}^n\int fg_i\,dP
= \sum_{i=1}^n\int\!\!\int f_i(x_i)\,g_i(x)\,d\mu_i(x_i)\,dP
\le \sum_{i=1}^n\int \operatorname{Ent}_{\mu_i}(f_i)\,dP.
\]
Taking the supremum over $g$ gives
\[
\operatorname{Ent}_P(f) = \sup_g\int fg\,dP \le \sum_{i=1}^n\int \operatorname{Ent}_{\mu_i}(f_i)\,dP. \qquad\square
\]

Corollary 2.8. If for every $i$
\[
\operatorname{Ent}_{\mu_i}(f_i)\le C_i\int|\nabla_i f|^2\,d\mu_i,
\]
then
\[
\operatorname{Ent}_P(f)\le \max_iC_i\int \|\nabla f\|^2\,dP.
\]

Corollary 2.9.
If $\int f\,d\mu = 1$ then
\[
\operatorname{Ent}_\mu(f) = \int f\log f\,d\mu = \int \log f\,(f\,d\mu),
\]
and since $d\nu := f\,d\mu$ is a probability measure,
\[
\operatorname{Ent}_\mu(f) = \int \log\Big(\frac{d\nu}{d\mu}\Big)\,d\nu.
\]

Definition 2.10. Relative entropy: the relative entropy of $\nu$ given $\mu$ is defined as
\[
H(\nu|\mu) = \begin{cases}\operatorname{Ent}_\mu\big(\frac{d\nu}{d\mu}\big) & \nu\ll\mu\\ \infty & \text{otherwise.}\end{cases}
\]

Corollary 2.10. Relative entropy is always non-negative; moreover $H(\nu|\mu)=0$ iff $\nu=\mu$.

Definition 2.11. Total variation distance: for probability measures $\mu,\nu$ we define the total variation distance to be
\[
\|\mu-\nu\|_{TV} := \sup_A|\mu(A)-\nu(A)|.
\]

Corollary 2.11. If $\nu\ll\mu$ then
\[
\|\mu-\nu\|_{TV} = \frac12\int\Big|\frac{d\nu}{d\mu}(x)-1\Big|\,d\mu(x).
\]

Lemma 2.11 (Pinsker's inequality). $\|\mu-\nu\|_{TV}\le\sqrt{H(\nu|\mu)/2}$.

2.5 Stein's Method

Lemma 2.12. $W\sim N(0,\sigma^2)$ iff
\[
\mathbb E[W\varphi(W)] = \sigma^2\,\mathbb E[\varphi'(W)]\qquad\text{for all Lipschitz }\varphi\in C^1.
\]

Theorem 2.6. Let $\gamma_{\sigma^2}$ be the law of a $N(0,\sigma^2)$ random variable and suppose $W$ is a random variable with distribution $\mu$. If there exists a random variable $T$ such that
\[
\mathbb E[W\varphi(W)] = \mathbb E[T\varphi'(W)]\qquad\text{for all Lipschitz }\varphi\in C^1,
\]
then
\[
\|\mu-\gamma_{\sigma^2}\|_{TV}\le\frac{2}{\sigma^2}\,\mathbb E[|T-\sigma^2|].
\]

Proof. Given any bounded $f$ with $\|f\|_\infty\le 1$ and $Z\sim N(0,\sigma^2)$, the Stein equation
\[
x\varphi(x) - \sigma^2\varphi'(x) = f(x)-\mathbb E[f(Z)]
\]
has a Lipschitz solution $\varphi$ with $\|\varphi'\|_\infty\le 2/\sigma^2$. So
\[
W\varphi(W)-\sigma^2\varphi'(W) = f(W)-\mathbb E[f(Z)],
\]
and taking expectations of both sides,
\[
\mathbb E[f(W)]-\mathbb E[f(Z)] = \mathbb E[W\varphi(W)]-\sigma^2\mathbb E[\varphi'(W)]
= \mathbb E[T\varphi'(W)]-\sigma^2\mathbb E[\varphi'(W)]
= \mathbb E[(T-\sigma^2)\varphi'(W)].
\]
Hence
\[
|\mathbb E[f(W)-f(Z)]|\le \mathbb E\big[|T-\sigma^2|\,|\varphi'(W)|\big]\le\frac{2}{\sigma^2}\,\mathbb E[|T-\sigma^2|],
\]
and taking the supremum over test functions $f$ with $\|f\|_\infty\le 1$ yields
\[
\|\mu-\gamma_{\sigma^2}\|_{TV}\le\frac{2}{\sigma^2}\,\mathbb E[|T-\sigma^2|]. \qquad\square
\]
In this case $T$ is called a Stein coefficient (Stein's operator) for $W$.

Corollary 2.12. Setting $\varphi(x)=1$ gives $\mathbb E[W]=0$. Setting $\varphi(x)=x$ gives $\operatorname{Var}[W]=\mathbb E[W^2]=\mathbb E[T]$. Setting $\sigma^2=\operatorname{Var}[W]$ gives
\[
\|\mu-\gamma_{\sigma^2}\|_{TV}\le\frac{2}{\sigma^2}\sqrt{\operatorname{Var}[T]}.
\]

Theorem 2.7. If $W=f(X_1,\dots,X_n)$ for $X_i\sim N(0,1)$ i.i.d. and $\mathbb E[W]=0$, then
\[
T = \int_0^1\frac{1}{2\sqrt t}\sum_{i=1}^n\partial_if(X)\,\partial_if\big(\sqrt t\,X+\sqrt{1-t}\,Y\big)\,dt,
\]
where $Y$ is independent of $X$ with the same distribution, satisfies the condition of Theorem 2.6.

Proof. Let $X_t := \sqrt t\,X+\sqrt{1-t}\,Y$ and $Z_t := \sqrt{1-t}\,X-\sqrt t\,Y$; then $X_t$ and $Z_t$ are independent standard Gaussian vectors and $X = \sqrt t\,X_t + \sqrt{1-t}\,Z_t$.
We have that
\[
\mathbb E[W\varphi(W)] = \mathbb E[f(X)\varphi(f(X))]
= \mathbb E\big[f(X)\big(\varphi(f(X))-\varphi(f(Y))\big)\big] + \mathbb E[f(X)]\,\mathbb E[\varphi(f(Y))]
= \mathbb E\big[f(X)\big(\varphi(f(X))-\varphi(f(Y))\big)\big],
\]
using the independence of $X$ and $Y$ and $\mathbb E[f(X)]=\mathbb E[W]=0$. Since $X_1=X$ and $X_0=Y$,
\[
\mathbb E[W\varphi(W)]
= \mathbb E\Big[f(X)\int_0^1\frac{d}{dt}\varphi(f(X_t))\,dt\Big]
= \mathbb E\Big[f(X)\int_0^1\varphi'(f(X_t))\sum_{i=1}^n\partial_if(X_t)\Big(\frac{X_i}{2\sqrt t}-\frac{Y_i}{2\sqrt{1-t}}\Big)\,dt\Big]
= \int_0^1\sum_{i=1}^n\frac{1}{2\sqrt{t(1-t)}}\,\mathbb E\big[f(X)\,\varphi'(f(X_t))\,\partial_if(X_t)\,Z_{t,i}\big]\,dt,
\]
since $\frac{X_i}{2\sqrt t}-\frac{Y_i}{2\sqrt{1-t}} = \frac{Z_{t,i}}{2\sqrt{t(1-t)}}$. As $Z_t$ is standard Gaussian and independent of $X_t$, Gaussian integration by parts in $Z_{t,i}$ (note $X=\sqrt t\,X_t+\sqrt{1-t}\,Z_t$, so $\partial f(X)/\partial Z_{t,i}=\sqrt{1-t}\,\partial_if(X)$) gives
\[
\mathbb E[W\varphi(W)]
= \int_0^1\sum_{i=1}^n\frac{\sqrt{1-t}}{2\sqrt{t(1-t)}}\,\mathbb E\big[\partial_if(X)\,\varphi'(f(X_t))\,\partial_if(X_t)\big]\,dt
= \int_0^1\frac{1}{2\sqrt t}\sum_{i=1}^n\mathbb E\big[\partial_if(X)\,\varphi'(f(X_t))\,\partial_if(X_t)\big]\,dt.
\]
Since the pair $(X,X_t)$ is exchangeable we may swap the roles of $X$ and $X_t$:
\[
\mathbb E[W\varphi(W)]
= \int_0^1\frac{1}{2\sqrt t}\sum_{i=1}^n\mathbb E\big[\partial_if(X_t)\,\varphi'(f(X))\,\partial_if(X)\big]\,dt
= \mathbb E\Big[\Big(\int_0^1\frac{1}{2\sqrt t}\sum_{i=1}^n\partial_if(X)\,\partial_if(X_t)\,dt\Big)\varphi'(f(X))\Big]
= \mathbb E[T\varphi'(f(X))]. \qquad\square
\]

Corollary 2.13 (Gaussian Poincaré inequality). $\operatorname{Var}[W]\le\mathbb E[|\nabla f(X)|^2]$.

Proof. From the proof of Theorem 2.7 we have, for any Lipschitz $\varphi$,
\[
\mathbb E[W\varphi(W)] = \int_0^1\frac{1}{2\sqrt t}\sum_{i=1}^n\mathbb E\big[\partial_if(X)\,\partial_if(X_t)\,\varphi'(W)\big]\,dt.
\]
So set $\varphi(x)=x$; then
\[
\operatorname{Var}[W] = \mathbb E[W\varphi(W)]
= \int_0^1\frac{1}{2\sqrt t}\sum_{i=1}^n\mathbb E[\partial_if(X)\,\partial_if(X_t)]\,dt
\le \int_0^1\frac{1}{2\sqrt t}\,\mathbb E\big[|\nabla f(X)|\,|\nabla f(X_t)|\big]\,dt
\le \int_0^1\frac{1}{2\sqrt t}\,\mathbb E[|\nabla f(X)|^2]^{1/2}\,\mathbb E[|\nabla f(X_t)|^2]^{1/2}\,dt
= \mathbb E[|\nabla f(X)|^2],
\]
using Cauchy–Schwarz twice, $X_t\overset{d}{=}X$, and $\int_0^1\frac{dt}{2\sqrt t}=1$. $\square$

Lemma 2.13. If $f$ is Lipschitz and $W = f(X)-\mathbb E[f(X)]$ for a standard Gaussian vector $X$, then
\[
\mathbb P(W>t)\le e^{-t^2/(2\|f\|^2_{\mathrm{Lip}})}.
\]

Proof. We have, as above, that
\[
\mathbb E[W\varphi(W)] = \int_0^1\frac{1}{2\sqrt t}\sum_{i=1}^n\mathbb E\big[\partial_if(X)\,\partial_if(X_t)\,\varphi'(W)\big]\,dt.
\]
So set $\varphi(x)=e^{\lambda x}$ and let $\Lambda(\lambda)=\mathbb E[e^{\lambda W}]$ be the moment generating function of $W$.
This gives us that
\[
\Lambda'(\lambda) = \mathbb E[We^{\lambda W}]
= \lambda\int_0^1\frac{1}{2\sqrt t}\sum_{i=1}^n\mathbb E\big[\partial_if(X)\,\partial_if(X_t)\,e^{\lambda W}\big]\,dt
\le \lambda\int_0^1\frac{1}{2\sqrt t}\,\mathbb E\big[|\nabla f(X)|\,|\nabla f(X_t)|\,e^{\lambda W}\big]\,dt
\le \lambda\|f\|^2_{\mathrm{Lip}}\,\mathbb E[e^{\lambda W}]
= \lambda\|f\|^2_{\mathrm{Lip}}\,\Lambda(\lambda),
\]
using Cauchy–Schwarz, $|\nabla f|\le\|f\|_{\mathrm{Lip}}$ and $\int_0^1\frac{dt}{2\sqrt t}=1$. This differential inequality, together with $\Lambda(0)=1$, gives $(\log\Lambda(\lambda))'\le\lambda\|f\|^2_{\mathrm{Lip}}$, so
\[
\log\Lambda(\lambda)\le\frac{\lambda^2}{2}\|f\|^2_{\mathrm{Lip}},
\qquad\text{i.e.}\qquad
\Lambda(\lambda)\le e^{\lambda^2\|f\|^2_{\mathrm{Lip}}/2}.
\]
Furthermore, by Chebyshev's inequality, for every $\lambda>0$,
\[
\mathbb P(W>t)\le e^{-\lambda t}\Lambda(\lambda)\le e^{\lambda^2\|f\|^2_{\mathrm{Lip}}/2-\lambda t}.
\]
By monotonicity, the infimum over $\lambda>0$ is attained where
\[
\frac{d}{d\lambda}\Big(\frac{\lambda^2}{2}\|f\|^2_{\mathrm{Lip}}-\lambda t\Big) = \lambda\|f\|^2_{\mathrm{Lip}}-t = 0,
\qquad\text{i.e. at }\lambda = \frac{t}{\|f\|^2_{\mathrm{Lip}}},
\]
which gives us that
\[
\mathbb P(W>t)\le e^{-t^2/(2\|f\|^2_{\mathrm{Lip}})}. \qquad\square
\]

Theorem 2.8 (second-order Poincaré inequality). If $W=f(X_1,\dots,X_n)$, where the $X_i\sim N(0,1)$ are i.i.d., with $\mathbb E[W]=0$ and $\operatorname{Var}[W]=\sigma^2$, then
\[
\|\mu-\gamma_{\sigma^2}\|_{TV}\le\frac{2\sqrt{10}}{\sigma^2}\,\mathbb E\big[\|H_f(X)\|^4\big]^{1/4}\,\mathbb E\big[|\nabla f(X)|^4\big]^{1/4},
\]
where $H_f$ is the Hessian, $(H_f)_{i,j}=\frac{\partial^2f}{\partial x_i\partial x_j}$, and $\|\cdot\|$ is the operator norm of Lemma 2.14 below.

Proof. If $T$ is the Stein coefficient of Theorem 2.7 then
\[
\mathbb E[W\varphi(W)] = \mathbb E[T(X,Y)\varphi'(f(X))] = \mathbb E\big[\mathbb E[T(X,Y)\mid X]\,\varphi'(f(X))\big],
\]
so $\tilde T(X):=\mathbb E[T(X,Y)\mid X]$ is also a Stein coefficient; moreover
\[
\tilde T(x) = \sum_{i=1}^n\int_0^1\frac{1}{2\sqrt t}\,\partial_if(x)\,\mathbb E_Y\big[\partial_if(\sqrt t\,x+\sqrt{1-t}\,Y)\big]\,dt.
\]
So it follows that, writing $x_t=\sqrt t\,x+\sqrt{1-t}\,Y$,
\[
\frac{\partial\tilde T}{\partial x_j}(x) = \sum_{i=1}^n\int_0^1\frac{1}{2\sqrt t}\Big(\partial_{i,j}f(x)\,\mathbb E_Y[\partial_if(x_t)] + \partial_if(x)\,\sqrt t\,\mathbb E_Y[\partial_{i,j}f(x_t)]\Big)\,dt.
\]
From Corollaries 2.12 and 2.13 we have
\[
\|\mu-\gamma_{\sigma^2}\|_{TV}\le\frac{2}{\sigma^2}\sqrt{\operatorname{Var}[\tilde T]}\le\frac{2}{\sigma^2}\sqrt{\mathbb E[|\nabla\tilde T(X)|^2]}.
\]
Then, since $(a+b)^2\le 2a^2+2b^2$ and by Jensen's inequality with respect to the probability measure $\frac{dt}{2\sqrt t}$ on $[0,1]$ (noting $\sqrt t\le 1$),
\[
\mathbb E[|\nabla\tilde T(X)|^2] = \mathbb E\sum_{j=1}^n\Big(\frac{\partial\tilde T}{\partial x_j}\Big)^2
\le 2\,\mathbb E\sum_{j=1}^n\int_0^1\frac{1}{2\sqrt t}\Big(\sum_{i=1}^n\partial_{i,j}f(X)\,\mathbb E_Y[\partial_if(X_t)]\Big)^2dt
+ 2\,\mathbb E\sum_{j=1}^n\int_0^1\frac{1}{2\sqrt t}\Big(\sum_{i=1}^n\partial_if(X)\,\mathbb E_Y[\partial_{i,j}f(X_t)]\Big)^2dt.
\]
The Hessian matrix is symmetric (i.e.
$\partial_{i,j}=\partial_{j,i}$), so, writing $H=H_f(X)$, we get that
\[
\sum_{j=1}^n\Big(\sum_{i=1}^n\partial_{i,j}f(X)\,\mathbb E_Y[\partial_if(X_t)]\Big)^2
= \sum_{j=1}^n\Big(\sum_{i=1}^nH_{j,i}\,\mathbb E_Y[\nabla f(X_t)]_i\Big)^2
= \sum_{j=1}^n\big|\big(H\,\mathbb E_Y[\nabla f(X_t)]\big)_j\big|^2
= \big\|H\,\mathbb E_Y[\nabla f(X_t)]\big\|^2,
\]
and the second term is handled analogously. By Cauchy–Schwarz and Jensen,
\[
\mathbb E\big[\|H\,\mathbb E_Y[\nabla f(X_t)]\|^2\big]
\le \mathbb E\big[\|H\|^2\,\mathbb E_Y[|\nabla f(X_t)|]^2\big]
\le \mathbb E[\|H\|^4]^{1/2}\,\mathbb E\big[\mathbb E_Y[|\nabla f(X_t)|]^4\big]^{1/2}.
\]
We can conclude that
\[
\|\mu-\gamma_{\sigma^2}\|_{TV}\le\frac{2}{\sigma^2}\sqrt{\mathbb E[|\nabla\tilde T(X)|^2]}
\le C\,\mathbb E[\|H\|^4]^{1/4}\,\mathbb E[|\nabla f(X)|^4]^{1/4}. \qquad\square
\]

Theorem 2.9 (Chatterjee). If $X,Y\in\mathbb R^n$ are independent random vectors such that the coordinates $\{Y_i\}_{i=1}^n$ are independent, let
\[
A_i = \mathbb E\Big|\mathbb E\big[X_i\mid\{X_j\}_{j<i}\big] - \mathbb E[Y_i]\Big|,
\qquad
B_i = \mathbb E\Big|\mathbb E\big[X_i^2\mid\{X_j\}_{j<i}\big] - \mathbb E[Y_i^2]\Big|,
\]
\[
M_3 = \max_i\big(\mathbb E[|X_i|^3]+\mathbb E[|Y_i|^3]\big),
\qquad
L_r(f) = \max_i\sup_x|\partial_i^rf(x)|.
\]
Then
\[
|\mathbb E[f(X)-f(Y)]|\le L_1(f)\sum_{i=1}^nA_i + \frac{L_2(f)}{2}\sum_{i=1}^nB_i + \frac{n}{6}L_3(f)M_3.
\]

Lemma 2.14. If $A$ is an $n\times n$ matrix then we define the norms
\[
\|A\| = \sup_{|x|=1}|Ax| = \sqrt{\lambda_{\max}(A^TA)},
\qquad
\|A\|_{H.S} = \Big(\sum_{i,j=1}^na_{i,j}^2\Big)^{1/2} = \sqrt{\operatorname{Tr}(A^TA)},
\]
where $\lambda_{\max}$ is the maximal eigenvalue of $A^TA$. Then:
• $\|A\|\le\|A\|_{H.S}\le\sqrt n\,\|A\|$
• $\|AB\|_{H.S}\le\|A\|\,\|B\|_{H.S}$
• $|\operatorname{Tr}(AB)|\le\|A\|_{H.S}\,\|B\|_{H.S}$
• $\|AB\|\le\|A\|\,\|B\|$

Proposition 2.3. Suppose $X=(X_{i,j})_{i,j=1}^n$ has independent $N(0,1)$ entries. Let $A:=X/\sqrt n$ be the normalised matrix, which has eigenvalues $\{\lambda_i(A)\}_{i=1}^n$. For $k\in\mathbb N$ we have that
\[
\sum_{i=1}^n\lambda_i^k = \operatorname{Tr}(A^k) = \sum_{i_1,\dots,i_k}a_{i_1,i_2}a_{i_2,i_3}\cdots a_{i_{k-1},i_k}a_{i_k,i_1},
\]
which can be written as a function $F$ of the Gaussian vector $(X_{1,1},\dots,X_{1,n},X_{2,1},\dots,X_{n,n})$ and hence has a Stein coefficient $T$, with
\[
\|\operatorname{Tr}(A^k)-\gamma_{\sigma^2}\|_{TV}\le\frac{2}{\sigma^2}\sqrt{\operatorname{Var}(T)}\le\frac{2\sqrt{10}}{\sigma^2}\,\mathbb E[\|H_F(X)\|^4]^{1/4}\,\mathbb E[|\nabla F(X)|^4]^{1/4}.
\]
Furthermore there exist constants $C_1(k),C_2(k)$ such that
\[
C_1(k)\le\sigma^2=\operatorname{Var}\big(\operatorname{Tr}(A^k)\big)\le C_2(k).
\]

Proof. We want to bound both $\mathbb E[\|H_F(X)\|^4]^{1/4}$ and $\mathbb E[|\nabla F(X)|^4]^{1/4}$ above and then $\operatorname{Var}(\operatorname{Tr}(A^k))$ below, so we split the proof into three parts:
• We start by bounding $\mathbb E[|\nabla F(X)|^4]^{1/4}$ above.
Since $\big(\frac{\partial A}{\partial x_{i,j}}\big)_{p,q} = \frac{1}{\sqrt n}\mathbf 1_{\{(p,q)=(i,j)\}}$, i.e. $\frac{\partial A}{\partial x_{i,j}} = \frac{1}{\sqrt n}e_ie_j^T$, we have by cyclicity of the trace that
\[
\frac{\partial}{\partial x_{i,j}}\operatorname{Tr}(A^k)
= \sum_{r=0}^{k-1}\operatorname{Tr}\Big(A^r\frac{\partial A}{\partial x_{i,j}}A^{k-r-1}\Big)
= k\operatorname{Tr}\Big(\frac{\partial A}{\partial x_{i,j}}A^{k-1}\Big)
= \frac{k}{\sqrt n}\operatorname{Tr}\big(e_ie_j^TA^{k-1}\big)
= \frac{k}{\sqrt n}\big(A^{k-1}\big)_{j,i}.
\]
Hence
\[
|\nabla F(X)|^2 = \sum_{i,j=1}^n\Big(\frac{\partial}{\partial x_{i,j}}\operatorname{Tr}(A^k)\Big)^2
= \frac{k^2}{n}\sum_{i,j=1}^n\big(A^{k-1}\big)_{j,i}^2
= \frac{k^2}{n}\|A^{k-1}\|_{H.S}^2
\le k^2\|A^{k-1}\|^2
\le k^2\|A\|^{2(k-1)},
\]
so we have that
\[
\mathbb E[|\nabla F(X)|^4]\le k^4\,\mathbb E\big[\|A\|^{4(k-1)}\big].
\]
• We now show that $\operatorname{Var}(\operatorname{Tr}(A^k))$ is bounded below. If $X,Y$ are positively correlated then $0\le\operatorname{Cov}(X,Y)=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]$, so $\mathbb E[XY]\ge\mathbb E[X]\mathbb E[Y]$. Furthermore the products $X_{i_1,i_2}X_{i_2,i_3}\cdots X_{i_k,i_1}$ are positively correlated, so
\[
\operatorname{Var}\big(\operatorname{Tr}(A^k)\big)
= \operatorname{Var}\Big(\frac{1}{n^{k/2}}\sum_{i_1,\dots,i_k}X_{i_1,i_2}\cdots X_{i_k,i_1}\Big)
\ge\frac{1}{n^k}\sum_{i_1,\dots,i_k}\operatorname{Var}(X_{1,1})^k
= \operatorname{Var}(X_{1,1})^k,
\]
so indeed we have a lower bound.
• Finally we show that $\mathbb E[\|H_{\operatorname{Tr}(A^k)}\|^4]^{1/4}$ is bounded above. Differentiating again,
\[
\frac{\partial^2}{\partial x_{i,j}\partial x_{p,q}}\operatorname{Tr}(A^k)
= k\sum_{r=0}^{k-2}\operatorname{Tr}\Big(\frac{\partial A}{\partial x_{i,j}}A^r\frac{\partial A}{\partial x_{p,q}}A^{k-r-2}\Big).
\]
Note that $\|A\| = \sup_{|y|=1}|Ay| = \sup_{|x|=|y|=1}\langle x,Ay\rangle$, so the operator norm of the Hessian can be written
\[
\|H_{\operatorname{Tr}(A^k)}\| = \sup\Big\{\sum_{i,j,p,q=1}^nc_{i,j}d_{p,q}\frac{\partial^2\operatorname{Tr}(A^k)}{\partial x_{i,j}\partial x_{p,q}} \;:\; \sum_{i,j=1}^nc_{i,j}^2 = \sum_{p,q=1}^nd_{p,q}^2 = 1\Big\}.
\]
Write $C=(c_{i,j})_{i,j=1}^n$ and $D=(d_{p,q})_{p,q=1}^n$, so that $\|C\|_{H.S}^2 = \sum_{i,j=1}^nc_{i,j}^2 = 1$ and $\|D\|_{H.S}^2 = \sum_{p,q=1}^nd_{p,q}^2 = 1$. Then, since $\sum_{i,j}c_{i,j}e_ie_j^T = C$ and similarly for $D$,
\[
\sum_{i,j,p,q=1}^nc_{i,j}d_{p,q}\frac{\partial^2\operatorname{Tr}(A^k)}{\partial x_{i,j}\partial x_{p,q}}
= \frac{k}{n}\sum_{r=0}^{k-2}\operatorname{Tr}\big(CA^rDA^{k-r-2}\big).
\]
Moreover, since $|\operatorname{Tr}(CA^rDA^{k-r-2})|\le\|CA^r\|_{H.S}\|DA^{k-r-2}\|_{H.S}\le\|A\|^{k-2}\|C\|_{H.S}\|D\|_{H.S}$ by Lemma 2.14, it follows that
\[
\Big|\frac{k}{n}\sum_{r=0}^{k-2}\operatorname{Tr}\big(CA^rDA^{k-r-2}\big)\Big|
\le\frac{k(k-1)}{n}\|A\|^{k-2},
\]
so
\[
\|H_{\operatorname{Tr}(A^k)}\|\le\frac{k(k-1)}{n}\|A\|^{k-2}
\qquad\Rightarrow\qquad
\mathbb E\big[\|H_{\operatorname{Tr}(A^k)}\|^4\big]^{1/4}\le\frac{k(k-1)}{n}\,\mathbb E\big[\|A\|^{4(k-2)}\big]^{1/4}.
\]
So we have our final bound. $\square$

Lemma 2.15 (hypercontractivity). If
\[
P_tf(x) = \int f\big(e^{-t}x+\sqrt{1-e^{-2t}}\,y\big)\,d\gamma(y)
\]
then $\|P_tf\|_{L^2(\gamma)}\le\|f\|_{L^{q^*(t)}(\gamma)}$, where $q^*(t)=1+e^{-2t}$.

Theorem 2.10.
Benaïm–Rossignol: if $\gamma$ is the standard Gaussian measure on $\mathbb R^n$ then
\[
\operatorname{Var}_\gamma(f)\le 2\sum_{i=1}^n\varphi\left(\frac{\|\partial_if\|_{L^1(\gamma)}}{\|\partial_if\|_{L^2(\gamma)}}\right)\|\partial_if\|^2_{L^2(\gamma)},
\qquad\text{where}\qquad
\varphi(u) = \int_0^1\frac{u^{2t}}{(1+t)^2}\,dt.
\]

Proof.
\[
\operatorname{Var}_\gamma(f) = \int f^2(x)\,d\gamma(x) - \Big(\int f(x)\,d\gamma(x)\Big)^2
= \int f^2\,d\gamma - \int(P_\infty f)^2\,d\gamma
= -\int_0^\infty\frac{d}{dt}\int(P_tf(x))^2\,d\gamma(x)\,dt
= -2\int_0^\infty\int(P_tf)(LP_tf)\,d\gamma\,dt
= 2\int_0^\infty\int|\nabla P_tf|^2\,d\gamma\,dt,
\]
using integration by parts for $L=\Delta-x\cdot\nabla$ against $\gamma$. By the commutation $\partial_iP_tf = e^{-t}P_t\partial_if$ and hypercontractivity (Lemma 2.15),
\[
\operatorname{Var}_\gamma(f) = 2\int_0^\infty e^{-2t}\sum_{i=1}^n\|P_t\partial_if\|^2_{L^2(\gamma)}\,dt
\le 2\int_0^\infty e^{-2t}\sum_{i=1}^n\|\partial_if\|^2_{L^{q^*(t)}(\gamma)}\,dt.
\]
Next, for $1\le q\le 2$, Hölder's inequality with exponents $\frac{1}{q-1}$ and $\frac{1}{2-q}$ gives
\[
\int|\partial_if|^q\,d\gamma = \int|\partial_if|^{2(q-1)}\,|\partial_if|^{2-q}\,d\gamma
\le\Big(\int|\partial_if|^2\,d\gamma\Big)^{q-1}\Big(\int|\partial_if|\,d\gamma\Big)^{2-q},
\]
so that
\[
\|\partial_if\|^2_{L^q(\gamma)}\le\left(\frac{\|\partial_if\|_{L^1(\gamma)}}{\|\partial_if\|_{L^2(\gamma)}}\right)^{\frac4q-2}\|\partial_if\|^2_{L^2(\gamma)}.
\]
Substituting $q=q^*(t)=1+e^{-2t}$ and changing variables in the time integral (the substitution $v=\frac{1-e^{-2t}}{1+e^{-2t}}$ turns $\int_0^\infty e^{-2t}u^{\frac{4}{q^*(t)}-2}\,dt$ into $\int_0^1\frac{u^{2v}}{(1+v)^2}\,dv=\varphi(u)$), we conclude that
\[
\operatorname{Var}_\gamma(f)\le 2\sum_{i=1}^n\varphi\left(\frac{\|\partial_if\|_{L^1(\gamma)}}{\|\partial_if\|_{L^2(\gamma)}}\right)\|\partial_if\|^2_{L^2(\gamma)}. \qquad\square
\]
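As a numerical sanity check on the Benaïm–Rossignol inequality, the following sketch (Python, standard library only; the helper names `phi` and `gauss_mean` are ours, not from the notes) evaluates both sides for the test function $f(x)=x^2$ in one dimension, where $\operatorname{Var}_\gamma(f)=2$, and checks $\varphi(1)=\int_0^1(1+t)^{-2}\,dt=1/2$:

```python
import math

def phi(u, m=2000):
    # phi(u) = ∫_0^1 u^{2t} / (1+t)^2 dt, by the composite trapezoidal rule.
    h = 1.0 / m
    total = 0.0
    for j in range(m + 1):
        t = j * h
        w = 0.5 if j in (0, m) else 1.0
        total += w * u ** (2 * t) / (1 + t) ** 2
    return total * h

def gauss_mean(g, m=4000, L=8.0):
    # E[g(Z)] for Z ~ N(0,1), by trapezoidal quadrature on [-L, L].
    h = 2 * L / m
    total = 0.0
    for j in range(m + 1):
        x = -L + j * h
        w = 0.5 if j in (0, m) else 1.0
        total += w * g(x) * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return total * h

# Test function f(x) = x^2, so f'(x) = 2x.
var_f = gauss_mean(lambda x: x ** 4) - gauss_mean(lambda x: x ** 2) ** 2  # = 2
l1 = gauss_mean(lambda x: abs(2 * x))                 # ||f'||_{L^1(gamma)}
l2 = math.sqrt(gauss_mean(lambda x: 4 * x * x))       # ||f'||_{L^2(gamma)}
bound = 2 * phi(l1 / l2) * l2 ** 2                    # Benaim-Rossignol RHS

print(var_f, bound)
assert var_f <= bound        # the bound dominates the variance
assert abs(phi(1.0) - 0.5) < 1e-3
```

Note that for a linear function such as $f(x)=x$ we have $\|f'\|_{L^1}=\|f'\|_{L^2}=1$, so the right-hand side is $2\varphi(1)=1=\operatorname{Var}_\gamma(f)$: the inequality is sharp on linear functions.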