Exponential type inequalities for Markov Chains

Witold Bednorz
Department of Mathematics, University of Warsaw
MCMC, Warwick, 16-20 March 2009
Joint work with R. Adamczak (University of Warsaw)

Outline
1 Sums of Independent Random Variables
2 Markov Chains

Independent Random Variables

Let X_1, X_2, \dots, X_n be a family of independent random variables such that E X_i = 0 and E(\exp(|X_i|/c) - 1) \le 1.

To prove a concentration inequality we use the Markov (Chernoff) inequality

    P\Big(\sum_{i=1}^n X_i \ge t\Big) \le e^{-\lambda t} \prod_{i=1}^n E \exp(\lambda X_i).

The simplest bound on the Laplace transform is

    E \exp(\lambda X_i) \le 1 + \sum_{k=2}^{\infty} \frac{\lambda^k E|X_i|^k}{k!} \le 1 + \sum_{k=2}^{\infty} \lambda^k c^k \le \exp\Big(\frac{\lambda^2 c^2}{1 - \lambda c}\Big).

Therefore

    P\Big(\sum_{i=1}^n X_i \ge t\Big) \le e^{-\lambda t} \exp\Big(\frac{\lambda^2 c^2 n}{1 - \lambda c}\Big).

Variance Improvement

We keep the assumption that X_1, \dots, X_n are independent, E X_i = 0 and E(\exp(|X_i|/c) - 1) \le 1. Let \sigma_i^2 = E X_i^2.

If |X_i| \le M, then the previous argument shows that

    \prod_{i=1}^n E \exp(\lambda X_i) \le \exp\Big(\frac{\lambda^2 \sum_{i=1}^n \sigma_i^2}{2(1 - (1/3)\lambda M)}\Big).

When there is no bound |X_i| \le M, we can generate one artificially using the decomposition

    X_i = (X_i 1_{|X_i| \le M} - E X_i 1_{|X_i| \le M}) + (X_i 1_{|X_i| > M} - E X_i 1_{|X_i| > M}).

Choosing M = c \log n, one can show that

    \prod_{i=1}^n E \exp(\lambda X_i) \le L \exp\Big(\frac{\lambda^2 \sum_{i=1}^n \sigma_i^2}{1 - (2/3)\lambda (c \log n)}\Big).

Basic Inequalities

(Bennett's Inequality) Suppose that X_1, X_2, \dots, X_n are real independent random variables such that E X_i = 0 and E(\exp(|X_i|/c) - 1) \le 1. Then

    P\Big(\Big|\sum_{i=1}^n X_i\Big| \ge t\Big) \le 2 \exp\Big(-\frac{t^2}{nc^2 + ct}\Big).
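A quick numerical sanity check of this bound (our illustrative sketch, not part of the talk): for the Laplace(0,1) distribution one has E(\exp(|X|/2) - 1) = 1, so the hypotheses hold with c = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, trials = 200, 2.0, 100_000   # Laplace(0,1) satisfies E(exp(|X|/2) - 1) = 1

# Empirical tails of S_n = X_1 + ... + X_n for centered Laplace variables,
# compared with the Bennett-type bound 2*exp(-t^2 / (n c^2 + c t)).
S = rng.laplace(0.0, 1.0, size=(trials, n)).sum(axis=1)
for t in (20, 40, 60, 80):
    empirical = np.mean(np.abs(S) >= t)
    bound = 2 * np.exp(-t**2 / (n * c**2 + c * t))
    print(f"t={t:3d}  empirical={empirical:.2e}  bound={bound:.2e}")
```

The bound is conservative, but it displays the right shape: Gaussian decay in t for moderate t and exponential decay for large t.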
(Bernstein's Inequality) Suppose that X_1, X_2, \dots, X_n are real independent random variables such that E X_i = 0, |X_i| \le M and \sum_{i=1}^n E X_i^2 = n\sigma^2. Then

    P\Big(\Big|\sum_{i=1}^n X_i\Big| \ge t\Big) \le 2 \exp\Big(-\frac{t^2}{2(n\sigma^2 + (1/3)Mt)}\Big).

CLT Inequality

Suppose that X_1, X_2, \dots, X_n are real independent random variables such that E X_i = 0, E(\exp(|X_i|/c) - 1) \le 1 and \sum_{i=1}^n E X_i^2 = n\sigma^2. Then

    P\Big(\Big|\sum_{i=1}^n X_i\Big| \ge t\Big) \le K \exp\Big(-\frac{t^2}{n\sigma^2 + (2/3)(c \log n) t}\Big)

for some universal constant K.

The meaning of the result is that for exponentially fast decaying random variables the CLT (Gaussian concentration) can be observed for t \le (\sigma^2/c)(\sqrt{n}/\log n).

Markov Chains

Let (X_i)_{i \ge 0} be a Markov chain on a general state space (\mathcal X, \mathcal B). We assume that there exists a stationary distribution \pi on (\mathcal X, \mathcal B).

We assume that (X_n)_{n \ge 0} verifies the geometric drift condition: there exist a small set C, constants b < \infty, \beta > 0 and a function V \ge 1 satisfying

    PV(x) - V(x) \le -\beta V(x) + b 1_C(x),  x \in \mathcal X

(a toy numerical check of this condition is given at the end of this part).

The geometric drift condition is equivalent to the existence of a small set C \in \mathcal B and a constant c > 0 such that

    E_x\big(\exp(\tau_C/c) - 1\big) \le 1,  for x \in C.

Simplified Setting

Let f : \mathcal X \to \mathbb R be a \pi-integrable function and write \bar f = f - \pi(f). The goal is to show concentration inequalities of the type

    P_\mu\Big(\Big|\sum_{i=1}^n \bar f(X_i)\Big| \ge t\Big) \le \exp(-\Phi(\mu, t, n)),

for suitably chosen \Phi, where \mu is the initial distribution.
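As promised above, here is a minimal numerical check of the geometric drift condition. The AR(1) chain, the drift function V(x) = 1 + x^2 and all constants below are our illustrative choices, not from the talk; for X_{n+1} = \rho X_n + \varepsilon_n with standard Gaussian innovations, PV has the closed form PV(x) = 2 + \rho^2 x^2, so the condition can be verified directly.

```python
import numpy as np

# Check of PV - V <= -beta*V + b*1_C on a grid for the AR(1) chain
# X_{n+1} = rho*X_n + eps, eps ~ N(0,1), with V(x) = 1 + x^2.
rho = 0.5
beta = (1 - rho**2) / 2           # our choice of beta
b = 1 + beta                      # and of b
C_radius = 2.0                    # C = [-2, 2]

def V(x):
    return 1.0 + x**2

def PV(x):
    # E[V(rho*x + eps)] = 1 + rho^2*x^2 + E[eps^2], in closed form
    return 2.0 + rho**2 * x**2

xs = np.linspace(-10, 10, 2001)
lhs = PV(xs) - V(xs)
rhs = -beta * V(xs) + b * (np.abs(xs) <= C_radius)
print("drift condition holds on the grid:", bool(np.all(lhs <= rhs + 1e-12)))
```

Any smaller \beta, with a correspondingly enlarged small set C and constant b, works equally well; the drift condition only pins down the geometric rate up to constants.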
We first consider concentration inequalities for bounded f, i.e. we assume that |f| \le a.

Whenever the geometric drift condition holds, there exists a set S of full \pi-measure such that

    E_x\Big(\exp\Big((2ac_x)^{-1} \sum_{i=0}^{\tau_C - 1} \bar f(X_i)\Big) - 1\Big) \le 1,  for x \in S.

Renewal Decomposition

We can start the Markov chain from any regular point x \in S. Using the split chain construction, the problem can be reduced to the case where a true atom C = \alpha exists.

Let \tau_\alpha(1) = \tau_\alpha and \tau_\alpha(k+1) = \inf\{n > \tau_\alpha(k) : X_n \in \alpha\}. We define

    Y_1 = \sum_{i=1}^{\tau_\alpha} \bar f(X_i),  Y_2 = \sum_{i=\tau_\alpha(N)+1}^{n} \bar f(X_i),  N = \sum_{k=1}^{n} 1_{X_k \in \alpha}.

Then

    \sum_{i=1}^n \bar f(X_i) = Y_1 + \sum_{i=1}^{N-1} Z_i + Y_2,

where Z_i = \sum_{j=\tau_\alpha(i)+1}^{\tau_\alpha(i+1)} \bar f(X_j) (a simulation illustrating this decomposition follows the theorem below).

Estimates for the Entrance and Exit Parts

We use the assumption E_x(\exp(\tau_\alpha/c_x) - 1) \le 1, which defines the constant c_x attached to the regular point x, to get

    P_x(|Y_1| \ge t) \le 2 \exp\Big(-\frac{t}{2ac_x}\Big).

We use the assumption E_\alpha(\exp(\tau_\alpha/c) - 1) \le 1 to get

    P_x(|Y_2| \ge t) \le 2 \exp\Big(-\frac{t}{2\pi(\alpha) a c^2}\Big).

In a similar way we show that

    P_x(|Z_N| \ge t) \le 2 \exp\Big(-\frac{t}{2\pi(\alpha) a c^2}\Big).

Estimate for the Main Body

To bound \sum_{i=1}^N Z_i we use the martingale method, which leads to

    E_x\Big(\exp\Big(\lambda \sum_{i=1}^N Z_i\Big)\Big) \le \Big(E_x \big(E_\alpha \exp(2\lambda Z_1)\big)^N\Big)^{1/2}.

Then one can apply bounds on the Laplace transform of Z_1, e.g.

    E_\alpha \exp(2\lambda Z_1) \le \exp\Big(\frac{4\lambda^2 c^2 a^2}{1 - 2\lambda c a}\Big),

which gives

    E_x\Big(\exp\Big(\lambda \sum_{i=1}^N Z_i\Big)\Big) \le \Big(E_x \exp\Big(\frac{4N\lambda^2 c^2 a^2}{1 - 2\lambda c a}\Big)\Big)^{1/2}.

Bounded Case Result

The construction of N, i.e. \{N \le k\} = \{\tau_\alpha(k+1) > n\}, provides that

    P_x(N \ge k) \le \exp\Big(-\frac{k}{4\pi(\alpha)^2 c^2}\Big),  for k \ge 2n\pi(\alpha).

Theorem (R. Adamczak, W.B., 2009)
Let (X_n)_{n \ge 0} be a Markov chain with values in (\mathcal X, \mathcal B) satisfying the geometric drift condition. Then for any regular initial point x \in S

    P_x\Big(\Big|\sum_{i=1}^n \bar f(X_i)\Big| \ge t\Big) \le K \exp\Big(-\frac{1}{K} \min\Big(\frac{t^2}{n\pi(\alpha) c^2 a^2}, \frac{t}{\pi(\alpha) c^2 a}, \frac{t}{c_x a}\Big)\Big).
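To illustrate the renewal decomposition used in the proof, the sketch below simulates a toy atomic chain (a birth-death walk on the nonnegative integers, our own illustrative choice; the singleton {0} serves as the atom \alpha) and verifies the identity \sum_{i=1}^n f(X_i) = Y_1 + \sum_{i=1}^{N-1} Z_i + Y_2, which holds as an algebraic fact for any f, centered or not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy atomic chain (our illustrative choice): a birth-death walk on
# {0, 1, 2, ...} with downward drift; the singleton {0} is the atom alpha.
def step(x):
    return x + 1 if rng.random() < 0.4 else max(x - 1, 0)

n = 10_000
X = np.empty(n + 1, dtype=int)
X[0] = 5                                 # an arbitrary starting point x
for i in range(n):
    X[i + 1] = step(X[i])

def f(x):
    return float(x)                      # any f works for the identity below

visits = [i for i in range(1, n + 1) if X[i] == 0]   # regeneration times
N = len(visits)                          # number of visits to the atom

# Y1: up to the first regeneration; Z_i: full excursions between consecutive
# regenerations; Y2: the incomplete excursion after the last one.
Y1 = sum(f(X[i]) for i in range(1, visits[0] + 1))
Z = [sum(f(X[i]) for i in range(visits[k] + 1, visits[k + 1] + 1))
     for k in range(N - 1)]
Y2 = sum(f(X[i]) for i in range(visits[-1] + 1, n + 1))

total = sum(f(X[i]) for i in range(1, n + 1))
print("decomposition matches:", np.isclose(total, Y1 + sum(Z) + Y2))
```

The point of the decomposition is that the Z_i are i.i.d. under P_\alpha, which reduces the Markov problem to the independent case treated in the first part.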
Remarks

The result shows that Gaussian type concentration works for |t| \le a\sqrt{n}.

If we apply N \le n and the variance improvement to bound E_\alpha \exp(\lambda Z_1), then the following can be obtained.

Theorem (R. Adamczak, W.B., 2009)
Let (X_n)_{n \ge 0} be a Markov chain with values in (\mathcal X, \mathcal B) satisfying the geometric drift condition. Then for any regular initial point x \in S

    P_x\Big(\Big|\sum_{i=1}^n \bar f(X_i)\Big| \ge t\Big) \le K \exp\Big(-\frac{1}{K} \min\Big(\frac{t^2}{n\pi(\alpha)\sigma^2}, \frac{t(\log n)^{-1}}{\pi(\alpha) c^2 a}, \frac{t}{c_x a}\Big)\Big),

where \sigma^2 = E_\alpha Z_1^2. Consequently the CLT works well for t \le (\sigma^2/(ac^2))(\sqrt{n}/\log n).

General Setting

In the general case we assume the multiplicative drift condition: there exist a small set C, a function V : \mathcal X \to \mathbb R_+ and constants b < \infty, \beta > 0 such that

    \exp(-V(x)) P(\exp(V))(x) \le \exp(-\beta |f(x)| + b 1_C(x))

(a toy numerical check of this condition follows the main theorem below).

The multiplicative drift condition is a natural setting in which to show multiplicative ergodicity. It guarantees (I. Kontoyiannis, S.P. Meyn, 2005) that there exist a small set C and d < \infty such that

    E_x\Big(\exp\Big(d^{-1} \sum_{k=0}^{\tau_C - 1} |f(X_k)|\Big) - 1\Big) \le 1,  for x \in C.

Moreover, there exists a set S of full \pi-measure such that

    E_x\Big(\exp\Big((2d_x)^{-1} \sum_{k=0}^{\tau_C - 1} |f(X_k)|\Big) - 1\Big) \le 1,  for x \in S.

Main Result

Following the idea with the split chain, we reduce the problem to the case where a true atom C = \alpha exists. In the atomic case our assumptions can be rewritten as

    E_\alpha\Big(\exp\Big(d^{-1} \sum_{k=0}^{\tau_\alpha - 1} |f(X_k)|\Big) - 1\Big) \le 1,  E_\alpha\big(\exp(c^{-1}\tau_\alpha) - 1\big) \le 1.

Theorem (R. Adamczak, W.B., 2009)
Let (X_n)_{n \ge 0} be a Markov chain with values in (\mathcal X, \mathcal B) satisfying the multiplicative drift condition. Then for any regular initial point x \in S

    P_x\Big(\Big|\sum_{i=1}^n \bar f(X_i)\Big| \ge t\Big) \le K \exp\Big(-\frac{1}{K} \min\Big(\frac{t^2}{n\pi(\alpha) d^2}, \frac{t}{\pi(\alpha) c d}, \frac{t}{d_x}\Big)\Big).
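As promised, a toy numerical check of the multiplicative drift condition. Everything here is our illustrative choice, not from the talk: the AR(1) chain X_{n+1} = \rho X_n + \varepsilon_n with f(x) = x, the drift function V(x) = \delta|x|, and the grid quadrature for the Gaussian expectation.

```python
import numpy as np

# Check of exp(-V(x)) * P(exp(V))(x) <= exp(-beta*|f(x)| + b*1_C(x)) for the
# AR(1) chain with f(x) = x and V(x) = delta*|x| (all choices illustrative).
rho, delta = 0.5, 0.5
beta = delta * (1 - rho) / 2      # any beta < delta*(1 - rho) can be made to work
M = 8.0                           # C = [-M, M]

z = np.linspace(-10, 10, 4001)    # quadrature grid for eps ~ N(0,1)
phi = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
dz = z[1] - z[0]

def P_expV(x):
    # E[exp(delta*|rho*x + eps|)] by grid integration
    return np.sum(np.exp(delta * np.abs(rho * x + z)) * phi) * dz

xs = np.linspace(-20, 20, 201)
lhs = np.array([np.exp(-delta * abs(x)) * P_expV(x) for x in xs])
outside = np.abs(xs) > M
ok_outside = np.all(lhs[outside] <= np.exp(-beta * np.abs(xs[outside])))
b_needed = np.max(np.log(lhs) + beta * np.abs(xs))   # smallest b valid on the grid
print("holds outside C with b = 0:", bool(ok_outside))
print("b needed on C: %.3f" % b_needed)
```

The check rests on the triangle inequality |\rho x + \varepsilon| \le \rho|x| + |\varepsilon|, which gives \exp(-V(x)) P(\exp V)(x) \le \exp(-\delta(1-\rho)|x|) E\exp(\delta|\varepsilon|); the script confirms numerically that \beta = \delta(1-\rho)/2 works outside a sufficiently large C.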
With the variance improvement one also obtains the following.

Theorem (R. Adamczak, W.B., 2009)
Let (X_n)_{n \ge 0} be a Markov chain with values in (\mathcal X, \mathcal B) satisfying the multiplicative drift condition. Then for any regular initial point x \in S

    P_x\Big(\Big|\sum_{i=1}^n \bar f(X_i)\Big| \ge t\Big) \le K \exp\Big(-\frac{1}{K} \min\Big(\frac{t^2}{n\pi(\alpha)\sigma^2}, \frac{t(\log n)^{-1}}{\pi(\alpha) c d}, \frac{t}{d_x}\Big)\Big),

where \sigma^2 = E_\alpha Z_1^2.

It shows that the CLT-type concentration can be seen for t \le (\sigma^2/(cd))(\sqrt{n}/\log n).

The result makes it possible to compute the independent-like concentration inequality in terms of the parameters given in the drift conditions, as the sketch below illustrates.
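A minimal sketch of how the final bound is assembled from the drift parameters. The numerical values, and the value of the universal constant K (which the theorem leaves unspecified), are purely illustrative.

```python
import numpy as np

# Evaluate K*exp(-min(...)/K) for the variance version of the theorem, given
# the drift-condition parameters (all values below are illustrative).
def tail_bound(t, n, pi_alpha, c, d, d_x, sigma2, K=10.0):
    gaussian = t**2 / (n * pi_alpha * sigma2)        # CLT regime
    excursion = t / (np.log(n) * pi_alpha * c * d)   # excursion-driven regime
    entrance = t / d_x                               # entrance (initial point) regime
    return K * np.exp(-min(gaussian, excursion, entrance) / K)

n, pi_alpha, c, d, d_x, sigma2 = 10_000, 0.2, 2.0, 3.0, 5.0, 1.5
# The Gaussian term dominates for t below this crossover; dividing by sqrt(n)
# recovers the CLT range (sigma2/(c*d)) * sqrt(n)/log(n) quoted above.
t_cross = n * sigma2 / (c * d * np.log(n))
print(f"Gaussian regime up to t ~ {t_cross:.0f}")
for t in (100, 500, 2000):
    print(f"t={t:5d}  bound={tail_bound(t, n, pi_alpha, c, d, d_x, sigma2):.3e}")
```

Which of the three terms attains the minimum identifies the active regime: Gaussian for moderate deviations, excursion-controlled for large t, and entrance-controlled when the starting point is far from stationarity.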