Exponential Type Inequalities for Markov Chains

Witold Bednorz
Department of Mathematics, University of Warsaw

MCMC, Warwick, 16-20 March 2009

Joint work with R. Adamczak (University of Warsaw)
Outline

1. Sums of Independent Random Variables
2. Markov Chains
Independent Random Variables

Let X_1, X_2, ..., X_n be a family of independent random variables such that E X_i = 0 and E(exp(|X_i|/c) − 1) ≤ 1.

To prove a concentration inequality we use Markov's inequality:
\[
\mathbb{P}\Big(\sum_{i=1}^{n} X_i \ge t\Big) \le e^{-\lambda t} \prod_{i=1}^{n} \mathbb{E}\exp(\lambda X_i).
\]

The simplest bound on the Laplace transform is
\[
\mathbb{E}\exp(\lambda X_i) \le 1 + \sum_{k=2}^{\infty} \frac{\mathbb{E}\,\lambda^{k}|X_i|^{k}}{k!} \le 1 + \sum_{k=2}^{\infty} \lambda^{k} c^{k} \le \exp\Big(\frac{\lambda^{2}c^{2}}{1-\lambda c}\Big).
\]

Therefore
\[
\mathbb{P}\Big(\sum_{i=1}^{n} X_i \ge t\Big) \le e^{-\lambda t} \exp\Big(\frac{\lambda^{2}c^{2}n}{1-\lambda c}\Big).
\]
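The slides leave the optimization over λ implicit; as a sketch (my own bookkeeping, not from the talk), one convenient though non-optimal choice already gives a bound of the Bennett type stated below, up to absolute constants:
\[
\lambda = \frac{t}{2nc^{2}+ct}
\quad\Rightarrow\quad
-\lambda t + \frac{\lambda^{2}c^{2}n}{1-\lambda c}
= -\frac{t^{2}}{2nc^{2}+ct} + \frac{t^{2}}{2(2nc^{2}+ct)}
= -\frac{t^{2}}{2(2nc^{2}+ct)},
\]
so that P(Σ X_i ≥ t) ≤ exp(−t²/(4nc² + 2ct)). Note that λc = ct/(2nc² + ct) < 1 always holds for this choice, so the geometric series bound above remains valid.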
Variance Improvement

We keep the assumption that X_1, ..., X_n are independent, E X_i = 0 and E(exp(|X_i|/c) − 1) ≤ 1. Let σ_i² = E X_i².

If |X_i| ≤ M, then the previous argument shows that
\[
\prod_{i=1}^{n} \mathbb{E}\exp(\lambda X_i) \le \exp\Big(\frac{\lambda^{2}\sum_{i=1}^{n}\sigma_i^{2}}{2(1-(1/3)\lambda M)}\Big).
\]

When there is no bound |X_i| ≤ M, we can generate one artificially using the decomposition
\[
X_i = \big(X_i 1_{|X_i|\le M} - \mathbb{E}\,X_i 1_{|X_i|\le M}\big) + \big(X_i 1_{|X_i|> M} - \mathbb{E}\,X_i 1_{|X_i|> M}\big).
\]

Choosing M = c log n one can show that
\[
\prod_{i=1}^{n} \mathbb{E}\exp(\lambda X_i) \le L \exp\Big(\frac{\lambda^{2}\sum_{i=1}^{n}\sigma_i^{2}}{1-(2/3)\lambda(c\log n)}\Big),
\]
for a universal constant L.
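A one-line check (my own, under the stated exponential-moment assumption) of why M = c log n is the natural truncation level: by Markov's inequality,
\[
\mathbb{P}(|X_i| > M) \le \frac{\mathbb{E}\exp(|X_i|/c)}{\exp(M/c)} \le \frac{2}{e^{M/c}} = \frac{2}{n} \quad\text{for } M = c\log n,
\]
so each unbounded part X_i 1_{|X_i|>M} is nonzero with probability at most 2/n, which is why the tail contributions can be absorbed into the constant L.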
Basic Inequalities

(Bennett's Inequality) Suppose that X_1, X_2, ..., X_n are real independent random variables such that E X_i = 0 and E(exp(|X_i|/c) − 1) ≤ 1. Then
\[
\mathbb{P}\Big(\Big|\sum_{i=1}^{n} X_i\Big| \ge t\Big) \le 2\exp\Big(-\frac{t^{2}}{nc^{2}+ct}\Big).
\]

(Bernstein's Inequality) Suppose that X_1, X_2, ..., X_n are real independent random variables such that E X_i = 0, |X_i| ≤ M and Σ_{i=1}^n E X_i² = nσ². Then
\[
\mathbb{P}\Big(\Big|\sum_{i=1}^{n} X_i\Big| \ge t\Big) \le 2\exp\Big(-\frac{t^{2}}{2(n\sigma^{2}+(1/3)Mt)}\Big).
\]
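A minimal Monte Carlo sanity check of Bernstein's inequality (an illustrative sketch of my own; the uniform distribution and all parameters are placeholders, not from the talk):

# Empirically compare the tail of a sum of bounded, centered variables
# with the Bernstein bound 2 exp(-t^2 / (2(n sigma^2 + M t / 3))).
import numpy as np

rng = np.random.default_rng(0)
n, M, trials = 1000, 1.0, 10_000

# X_i uniform on [-M, M]: E X_i = 0, |X_i| <= M, E X_i^2 = M^2 / 3.
sigma2 = M**2 / 3
sums = rng.uniform(-M, M, size=(trials, n)).sum(axis=1)

for t in (20.0, 40.0, 60.0):
    empirical = np.mean(np.abs(sums) >= t)
    bernstein = 2 * np.exp(-t**2 / (2 * (n * sigma2 + M * t / 3)))
    print(f"t={t:5.1f}  empirical={empirical:.4f}  Bernstein bound={bernstein:.4f}")

The empirical tail should sit below the bound at every t; at small t the bound exceeds 1 and is vacuous, which is expected.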
Markov Chains
CLT Inequality
Suppose that X1 , X2 , ..., Xn are real independent random
variables
such that EXi = 0, E(exp(|Xi |/c) − 1) 6 1,
Pn
2 = nσ 2 , then
EX
i=1
i
P(|
n
X
i=1
Xi | > t) 6 K exp(−
t2
),
(nσ 2 + (2/3)(c log n)t)
for some universal constant K .
Sums of Independent Random Variables
Markov Chains
CLT Inequality
Suppose that X1 , X2 , ..., Xn are real independent random
variables
such that EXi = 0, E(exp(|Xi |/c) − 1) 6 1,
Pn
2 = nσ 2 , then
EX
i=1
i
P(|
n
X
i=1
Xi | > t) 6 K exp(−
t2
),
(nσ 2 + (2/3)(c log n)t)
for some universal constant K .
The meaning of the result is that for exponentially fast
decaying random variables the CLT (Gaussian √
concentration) can be observed for t 6 (σ 2 /c)( n/ log n)
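To locate the Gaussian regime explicitly (a short check of my own): the denominator is dominated by the variance term exactly when
\[
\tfrac{2}{3}(c\log n)\,t \le n\sigma^{2}
\quad\Longleftrightarrow\quad
t \le \frac{3}{2}\cdot\frac{n\sigma^{2}}{c\log n},
\]
in which case the exponent is at least t²/(2nσ²), i.e. genuinely Gaussian; measuring t on the CLT scale σ√n turns this into a range of the form quoted above, up to constants.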
Markov Chains

Let (X_i)_{i≥0} be a Markov chain on a general state space (X, B).

We assume that there exists a stationary distribution π on (X, B).

We assume that (X_n)_{n≥0} verifies the geometric drift condition: there exist a small set C, constants b < ∞, β > 0 and a function V ≥ 1 satisfying
\[
PV(x) - V(x) \le -\beta V(x) + b\,1_C(x), \qquad x \in \mathcal{X}.
\]

The geometric drift condition is equivalent to the existence of a small set C ∈ B and a constant c > 0 such that
\[
\mathbb{E}_x\Big(\exp\Big(\frac{\tau_C}{c}\Big) - 1\Big) \le 1 \quad \text{for } x \in C.
\]
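A toy illustration of the exponential-moment reformulation (my own example, not from the talk; the AR(1) kernel, the small set C and the trial constant c are all assumptions):

# X_{n+1} = 0.5 X_n + N(0,1) is geometrically ergodic with small set
# C = [-2, 2]; we estimate E_x[exp(tau_C / c) - 1] for a trial constant c.
import numpy as np

rng = np.random.default_rng(1)

def return_time(x0: float, radius: float = 2.0, cap: int = 10_000) -> int:
    """First n >= 1 with X_n in C = [-radius, radius], starting from x0."""
    x = x0
    for n in range(1, cap):
        x = 0.5 * x + rng.standard_normal()
        if abs(x) <= radius:
            return n
    return cap  # truncation guard; essentially never reached for this chain

c = 3.0  # trial constant; the theory only asserts that some finite c works
taus = np.array([return_time(x0=2.0) for _ in range(20_000)])
print("E_x[exp(tau_C/c) - 1] ~", np.mean(np.expm1(taus / c)))

For this chain the estimate comes out well below 1, consistent with the drift condition holding with room to spare.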
Simplified Setting

Let f : X → R be a π-integrable function, and write f̄ = f − π(f).

The goal is to show concentration inequalities of the type
\[
\mathbb{P}_\mu\Big(\Big|\sum_{i=1}^{n} \bar f(X_i)\Big| \ge t\Big) \le \exp(-\Phi(\mu, t, n)),
\]
for a suitably chosen Φ, where μ is the initial distribution.

We first consider concentration inequalities for bounded f, i.e. we assume that |f| ≤ a.

Whenever the geometric drift condition holds, there exists a set S of full π-measure such that
\[
\mathbb{E}_x\Big(\exp\Big((2ac_x)^{-1}\sum_{i=0}^{\tau_C-1} \bar f(X_i)\Big) - 1\Big) \le 1, \quad \text{for } x \in S.
\]
Renewal Decomposition

We can start the Markov chain from any regular point x ∈ S.

Using the split chain construction the problem can be reduced to the case where a true atom α exists (C = α).

Let τ_α(1) = τ_α and τ_α(k+1) = inf{n > τ_α(k) : X_n ∈ α}. We define
\[
Y_1 = \sum_{i=1}^{\tau_\alpha} \bar f(X_i), \qquad
Y_2 = \sum_{i=\tau_\alpha(N)+1}^{n} \bar f(X_i), \qquad
N = \sum_{k=1}^{n} 1_{X_k \in \alpha}.
\]

Then
\[
\sum_{i=1}^{n} \bar f(X_i) = Y_1 + \sum_{i=1}^{N-1} Z_i + Y_2,
\]
where Z_i = \sum_{j=\tau_\alpha(i)+1}^{\tau_\alpha(i+1)} \bar f(X_j).
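A minimal numerical sketch of the renewal decomposition (a toy chain of my own; for a finite chain the single state 0 is a true atom, so the identity can be checked exactly):

# Verify sum f_bar(X_i) = Y1 + Z_1 + ... + Z_{N-1} + Y2 on a sample path.
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])   # birth-death chain on {0, 1, 2}
f_bar = lambda x: x - 1.0            # f(x) = x centered at pi(f) = 1

n, x = 2000, 0                       # start at the atom alpha = {0}
path = np.empty(n + 1, dtype=int)
path[0] = x
for i in range(1, n + 1):
    x = rng.choice(3, p=P[x])
    path[i] = x

visits = np.flatnonzero(path[1:] == 0) + 1   # regeneration times tau_alpha(k)
N = len(visits)
Y1 = f_bar(path[1:visits[0] + 1]).sum()
Y2 = f_bar(path[visits[-1] + 1:]).sum()
Z = [f_bar(path[visits[i] + 1:visits[i + 1] + 1]).sum() for i in range(N - 1)]
# Identity from the slide: total sum equals Y1 + sum of the Z_i + Y2.
print(f_bar(path[1:]).sum(), "==", Y1 + sum(Z) + Y2)

The point of the decomposition is that, by the strong Markov property, the blocks Z_1, Z_2, ... are i.i.d., which reduces the problem to the independent case of the first part of the talk.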
Estimates for the Entrance and Exit Parts

We use the assumption E_x(exp(τ_α/c_x) − 1) ≤ 1 to get
\[
\mathbb{P}_x(|Y_1| \ge t) \le 2\exp\Big(-\frac{t}{2ac_x}\Big).
\]

We use the assumption E_α(exp(τ_α/c) − 1) ≤ 1 to get
\[
\mathbb{P}_x(|Y_2| \ge t) \le 2\exp\Big(-\frac{t}{2\pi(\alpha)ac^{2}}\Big).
\]

In a similar way we show that
\[
\mathbb{P}_x(|Z_N| \ge t) \le 2\exp\Big(-\frac{t}{2\pi(\alpha)ac^{2}}\Big).
\]
Estimate for the Main Body

To bound Σ_{i=1}^N Z_i we use the martingale method, which leads to
\[
\mathbb{E}_x\Big(\exp\Big(\lambda \sum_{i=1}^{N} Z_i\Big)\Big) \le \Big(\mathbb{E}_x\big[(\mathbb{E}_\alpha \exp(2\lambda Z_1))^{N}\big]\Big)^{1/2}.
\]

Then one can apply bounds on the Laplace transform of Z_1, e.g.
\[
\mathbb{E}_\alpha \exp(2\lambda Z_1) \le \exp\Big(\frac{4\lambda^{2}c^{2}a^{2}}{1-2\lambda ca}\Big),
\]
which gives
\[
\mathbb{E}_x\Big(\exp\Big(\lambda \sum_{i=1}^{N} Z_i\Big)\Big) \le \Big(\mathbb{E}_x \exp\Big(\frac{4N\lambda^{2}c^{2}a^{2}}{1-2\lambda ca}\Big)\Big)^{1/2}.
\]
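The slide invokes the martingale method without detail; the following is a sketch of one standard route (my reconstruction, via Cauchy-Schwarz and optional stopping), writing S_N = Σ_{i=1}^N Z_i and ψ(2λ) = log E_α exp(2λZ_1):
\[
\mathbb{E}_x\, e^{\lambda S_N}
= \mathbb{E}_x\Big[e^{\lambda S_N - \frac{N}{2}\psi(2\lambda)}\, e^{\frac{N}{2}\psi(2\lambda)}\Big]
\le \Big(\mathbb{E}_x\, e^{2\lambda S_N - N\psi(2\lambda)}\Big)^{1/2}
\Big(\mathbb{E}_x\, e^{N\psi(2\lambda)}\Big)^{1/2}.
\]
The first factor is at most 1 because exp(2λS_k − kψ(2λ)) is a (super)martingale in k (the blocks Z_i are i.i.d. by the regeneration structure) and N ≤ n is a bounded stopping time; the second factor is exactly the right-hand side of the first display above.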
Bounded Case Result

The construction of N, i.e. {N ≤ k} = {τ_α(k+1) > n}, provides that
\[
\mathbb{P}_x(N \ge k) \le \exp\Big(-\frac{k}{4\pi(\alpha)^{2}c^{2}}\Big), \quad \text{for } k \ge 2n\pi(\alpha).
\]

Theorem (R. Adamczak, W. Bednorz, 2009)
Let (X_n)_{n≥0} be a Markov chain with values in (X, B) satisfying the geometric drift condition. Then for any regular initial point x ∈ S
\[
\mathbb{P}_x\Big(\Big|\sum_{i=1}^{n} \bar f(X_i)\Big| \ge t\Big) \le K \exp\Big(-\frac{1}{K}\min\Big(\frac{t^{2}}{n\pi(\alpha)c^{2}a^{2}}, \frac{t}{\pi(\alpha)c^{2}a}, \frac{t}{c_x a}\Big)\Big).
\]
Remarks

The result shows that Gaussian-type concentration works for |t| ≤ a√n.

If we apply N ≤ n and the variance improvement to bound E_α exp(λZ_1), then the following can be obtained; consequently the CLT works well for t ≤ (σ²/(ac²))(√n / log n).

Theorem (R. Adamczak, W. Bednorz, 2009)
Let (X_n)_{n≥0} be a Markov chain with values in (X, B) satisfying the geometric drift condition. Then for any regular initial point x ∈ S
\[
\mathbb{P}_x\Big(\Big|\sum_{i=1}^{n} \bar f(X_i)\Big| \ge t\Big) \le K \exp\Big(-\frac{1}{K}\min\Big(\frac{t^{2}}{n\pi(\alpha)\sigma^{2}}, \frac{t(\log n)^{-1}}{\pi(\alpha)c^{2}a}, \frac{t}{c_x a}\Big)\Big),
\]
where σ² = E_α Z_1².
General Setting

In the general case we assume the multiplicative drift condition: there exist a small set C, a function V : X → R_+ and constants b < ∞, β > 0 such that
\[
\exp(-V(x))\, P(\exp(V))(x) \le \exp(-\beta|f(x)| + b\,1_C(x)).
\]

The multiplicative drift condition is a natural setting in which to show multiplicative ergodicity. It guarantees (I. Kontoyiannis, S. P. Meyn, 2005) that there exist a small set C and d < ∞ such that
\[
\mathbb{E}_x\Big(\exp\Big(d^{-1}\sum_{k=0}^{\tau_C-1} |f(X_k)|\Big) - 1\Big) \le 1, \quad \text{for } x \in C.
\]

Moreover, there exists a set S of full π-measure such that
\[
\mathbb{E}_x\Big(\exp\Big((2d_x)^{-1}\sum_{k=0}^{\tau_C-1} |f(X_k)|\Big) - 1\Big) \le 1, \quad \text{for } x \in S.
\]
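A toy check of the Kontoyiannis-Meyn exponential-moment condition (my own example, continuing the AR(1) chain from before; f, C and the trial constant d are all assumptions):

# Estimate E_x[exp(d^{-1} sum_{k < tau_C} |f(X_k)|) - 1] for f(x) = x
# and the small set C = [-2, 2].
import numpy as np

rng = np.random.default_rng(3)

def excursion_sum(x0: float, radius: float = 2.0, cap: int = 10_000) -> float:
    """sum_{k=0}^{tau_C - 1} |f(X_k)| with f(x) = x, starting from x0."""
    x, total = x0, abs(x0)              # the k = 0 term
    for _ in range(cap):
        x = 0.5 * x + rng.standard_normal()
        if abs(x) <= radius:
            return total                # hit C at time tau_C; stop summing
        total += abs(x)
    return total

d = 12.0   # trial constant; the theory only asserts that some finite d works
vals = np.array([excursion_sum(2.0) for _ in range(20_000)])
print("E_x[exp(sum/d) - 1] ~", np.mean(np.expm1(vals / d)))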
Main Result

Following the idea with the split chain we reduce the problem to the case where a true atom C = α exists.

In the atomic case our assumptions can be rewritten as
\[
\mathbb{E}_\alpha\Big(\exp\Big(d^{-1}\sum_{k=0}^{\tau_\alpha-1} |f(X_k)|\Big) - 1\Big) \le 1, \qquad
\mathbb{E}_\alpha\big(\exp(c^{-1}\tau_\alpha) - 1\big) \le 1.
\]

Theorem (R. Adamczak, W. Bednorz, 2009)
Let (X_n)_{n≥0} be a Markov chain with values in (X, B) satisfying the multiplicative drift condition. Then for any regular initial point x ∈ S
\[
\mathbb{P}_x\Big(\Big|\sum_{i=1}^{n} \bar f(X_i)\Big| \ge t\Big) \le K \exp\Big(-\frac{1}{K}\min\Big(\frac{t^{2}}{n\pi(\alpha)d^{2}}, \frac{t}{\pi(\alpha)cd}, \frac{t}{d_x}\Big)\Big).
\]

Theorem (R. Adamczak, W. Bednorz, 2009)
Under the same assumptions,
\[
\mathbb{P}_x\Big(\Big|\sum_{i=1}^{n} \bar f(X_i)\Big| \ge t\Big) \le K \exp\Big(-\frac{1}{K}\min\Big(\frac{t^{2}}{n\pi(\alpha)\sigma^{2}}, \frac{t(\log n)^{-1}}{\pi(\alpha)cd}, \frac{t}{d_x}\Big)\Big),
\]
where σ² = E_α Z_1².

It shows that CLT-type concentration can be seen for t ≤ (σ²/(cd))(√n / log n).

The result makes it possible to derive the independent-like concentration inequality in terms of the parameters given in the drift conditions.
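As an illustration of that last point, a hedged sketch of how one would evaluate the tail bound of the final theorem once the drift parameters are in hand (all numerical values below are placeholders of my own, not values from the talk, and K stands in for the unspecified universal constant):

# Evaluate K * exp(-min(t^2/(n pi(alpha) sigma^2),
#                        t/(log n * pi(alpha) c d), t/d_x) / K).
import math

def tail_bound(t, n, pi_alpha, sigma2, c, d, d_x, K=10.0):
    """Right-hand side of the final theorem for given drift parameters."""
    exponent = min(t**2 / (n * pi_alpha * sigma2),
                   t / (math.log(n) * pi_alpha * c * d),
                   t / d_x)
    return K * math.exp(-exponent / K)

for t in (50, 100, 200, 400):
    print(t, tail_bound(t, n=10_000, pi_alpha=0.1, sigma2=4.0,
                        c=2.0, d=3.0, d_x=5.0))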