Averaging and Rates of Averaging for Uniform Families of Deterministic Fast-Slow Skew Product Systems

A. Korepanov, Z. Kosloff, I. Melbourne

Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK

21 September 2015

Abstract

We consider families of fast-slow skew product maps of the form

x_{n+1} = x_n + ε a(x_n, y_n, ε),   y_{n+1} = T_ε y_n,

where T_ε is a family of nonuniformly expanding maps, and prove averaging and rates of averaging for the slow variables x as ε → 0. Similar results are obtained also for continuous time systems ẋ = ε a(x, y, ε), ẏ = g_ε(y). Our results include cases where the family of fast dynamical systems consists of intermittent maps, unimodal maps (along the Collet-Eckmann parameters) and Viana maps.

1 Introduction

The classical Krylov-Bogolyubov averaging method [32] deals with skew product flows of the form

ẋ = ε a(x, y, ε),   ẏ = g(y).

Let ν be an ergodic invariant probability measure for the fast flow generated by g. Under a uniform Lipschitz condition on a, it can be shown that solutions to the slow x dynamics, suitably rescaled, converge almost surely to solutions of an averaged ODE Ẋ = ā(X), where ā(x) = ∫ a(x, y, 0) dν(y).

A considerably harder problem is to handle the fully-coupled situation

ẋ = ε a(x, y, ε),   ẏ = g(x, y, ε).

Here it is supposed that there is a distinguished family of ergodic invariant probability measures ν_{x,ε} for the fast vector fields g(x, ·, ε), and the averaged vector field is given by ā(x) = ∫ a(x, y, 0) dν_{x,0}(y). The first results on averaging for fully-coupled systems were due to Anosov [6], who considered the case where the fast vector fields are Anosov with ν_{x,ε} absolutely continuous. Convergence here is in the sense of convergence in probability with respect to Lebesgue measure. Kifer [24, 25] extended the results of [6] to the case where the fast vector fields are Axiom A (uniformly hyperbolic) with SRB measures ν_{x,ε}. More generally, Kifer considers the case where x ↦ ν_{x,0} is sufficiently regular so that ā is Lipschitz, and gives necessary and sufficient conditions for averaging to hold. However, the only situations where the conditions in [24, 25] are verified are in the Axiom A case, even though it is hoped [25] that the conditions are verifiable for nonuniformly hyperbolic examples. Analogous results for the discrete time case are obtained in [23]. See also [15, Theorem 5] for certain partially hyperbolic fast vector fields.

Here, we consider an intermediate class of examples that lies between the classical uncoupled situation and the fully coupled systems of [6, 24], namely families of skew products of the form

ẋ = ε a(x, y, ε),   ẏ = g(y, ε),   (1.1)

with distinguished family of ergodic invariant measures ν_ε and averaged vector field ā(x) = ∫ a(x, y, 0) dν_0(y). Notice that in this way we avoid issues concerned with the regularity of the averaged vector field ā, but we still have to deal with the ε-dependence of the measures ν_ε as well as of the fast vector fields. In other words, linear response (differentiability) of the invariant measures is replaced by statistical stability (weak convergence), which is more tractable. Indeed, one aspect of the general framework in this paper is that our averaging theorems hold in a similar generality to the methods of Alves & Viana [2, 5] for proving statistical stability. Hence, we obtain results on averaging and rates of averaging for a large class of families of skew products (1.1), going far beyond the uniformly hyperbolic setting, both in discrete and continuous time.
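To fix ideas, here is a minimal numerical sketch of the classical uncoupled setting recalled above; it is purely illustrative and plays no role in the formal development. The fast dynamics is taken to be the linear flow ẏ = (1, √2) on the two-torus, which is uniquely ergodic with Haar measure as ν, and the slow drift a(x, y) = sin(2πy_1) + cos(2πy_2) − x is an arbitrary choice, so that ā(x) = −x and the averaged solution is X(t) = x_0 e^{−t}. All numerical parameters (step size, values of ε) are likewise arbitrary.

```python
import math

# Sketch of uncoupled averaging: dx/ds = eps*a(x, y), dy/ds = g(y).
# Fast dynamics: linear flow dy/ds = (1, sqrt(2)) on the 2-torus (uniquely ergodic,
# Haar measure).  Slow drift (illustrative choice): a(x, y) = sin(2*pi*y1) + cos(2*pi*y2) - x,
# so abar(x) = -x and the averaged ODE has solution X(t) = x0*exp(-t).

def sup_error(eps, x0=1.0, y1=0.3, y2=0.7, dt=1e-3):
    """Euler-integrate the slow equation up to time 1/eps and return
    sup_{t in [0,1]} |x(t/eps) - X(t)| along the computed trajectory."""
    n_steps = int(1.0 / (eps * dt))
    x, err = x0, 0.0
    for k in range(n_steps + 1):
        t = k * eps * dt                                  # rescaled time in [0, 1]
        err = max(err, abs(x - x0 * math.exp(-t)))
        a = math.sin(2 * math.pi * y1) + math.cos(2 * math.pi * y2) - x
        x += eps * dt * a                                 # slow variable
        y1 = (y1 + dt) % 1.0                              # fast linear flow on T^2
        y2 = (y2 + dt * math.sqrt(2.0)) % 1.0
    return err

if __name__ == "__main__":
    for eps in [0.1, 0.01, 0.001]:
        print(f"eps={eps:6.3f}   sup-error = {sup_error(eps):.4f}")
```

The printed sup-norm errors shrink as ε decreases, in line with the averaging principle.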
Our examples include situations where the fast dynamics is given by intermittent maps with arbitrarily poor mixing properties, unimodal maps where linear response fails, and flows built as suspensions over such maps.

We obtain results also on rates of averaging. In the very simple situation ẋ = ε a(y), ẏ = g(y), where g is a uniformly expanding semiflow or uniformly hyperbolic flow, it is easily seen that the optimal rate of averaging in L¹ is O(ε^{1/2}). For systems of the form (1.1), we often obtain the essentially optimal rate O(ε^{1/2−}), where ε^{q−} denotes ε^{q−a} for all a > 0.

We have chosen to focus in this paper on the case of noninvertible dynamical systems. In this situation, the measures of interest are absolutely continuous and we are able to present the main ideas without going into the technical issues presented by dealing with non-absolutely continuous measures as required in the invertible setting. The invertible case will be covered in a separate paper.

Even in the noninvertible setting, our results depend strongly on extensions and clarifications of the classical first order and second order averaging theorems. These prerequisites are presented in Appendices A and B and may be of independent interest.

The remainder of the paper is organised as follows. In Section 2, we set up the averaging problem for families of fast-slow skew product systems in the discrete time case, leading to a general result, Theorem 2.2, for such systems. In Section 3, we show that Theorem 2.2 leads easily to averaging when the fast dynamics is a family of uniformly expanding maps. Section 4 is the heart of the paper and deals with the case when the fast dynamics is a family of nonuniformly expanding maps. Our main examples are presented in Section 5. In Section 6, we show how the continuous time case reduces to the discrete time case. In Section 7, we present a simple example to show that almost sure convergence fails for families of skew products.

2 General averaging theorem for families of skew products

Let T_ε : M → M, 0 ≤ ε < ε_0, be a family of transformations defined on a measurable space M. For each ε ∈ [0, ε_0), let ν_ε denote a T_ε-invariant ergodic probability measure on M. We consider the family of fast-slow systems

x^(ε)_{n+1} = x^(ε)_n + ε a(x^(ε)_n, y^(ε)_n, ε),   y^(ε)_{n+1} = T_ε y^(ε)_n,   x^(ε)_0 = x_0,   y^(ε)_0 = y_0,   (2.1)

where the initial condition x^(ε)_0 = x_0 is fixed throughout. The initial condition y_0 ∈ M is chosen randomly with respect to various measures that are specified in the statements of the results. Here a : R^d × M × [0, ε_0) → R^d is a family of functions satisfying certain regularity hypotheses.

Define ā(x) = ∫_M a(x, y, 0) dν_0(y) and consider the ODE

Ẋ = ā(X),   X(0) = x_0.   (2.2)

We are interested in the convergence, and rate of convergence, of the slow variables x^(ε)_n, suitably rescaled, to solutions X(t) of this ODE. More precisely, define x̂^(ε) : [0, 1] → R^d by setting x̂^(ε)(t) = x^(ε)_{[t/ε]}. We study convergence of the difference

z_ε = sup_{t ∈ [0,1]} |x̂^(ε)(t) − X(t)|.

Remark 2.1 The restriction to the time interval [0, 1] entails no loss of generality: the results apply to arbitrary bounded intervals by rescaling ε.

Regularity assumptions. Given a function g : R^d → R^n, we define ‖g‖_Lip = max{|g|_∞, Lip g} where Lip g = sup_{x ≠ x′} |g(x) − g(x′)|/|x − x′| and |x − x′| = max_i |x_i − x′_i|. In this section, and also in Appendices A and B, we consider functions g : R^d × M × [0, ε_0) → R^n where there is no metric structure assumed on M. In that case, ‖g‖_Lip = sup_{y ∈ M} sup_{ε ∈ [0, ε_0)} ‖g(·, y, ε)‖_Lip.
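The following sketch illustrates the fast-slow system (2.1) and the averaged ODE (2.2) numerically; it is an illustration only, and every concrete choice in it is an assumption made for this purpose rather than anything required by the results. The fast map is the uniformly expanding circle map T_ε y = 2y + √2 mod 1 (independent of ε, with Lebesgue measure as ν_ε), and a(x, y, ε) = sin(2πy) + 1 − x, so that ā(x) = 1 − x and X(t) = 1 + (x_0 − 1)e^{−t}.

```python
import math
import random

# Sketch of the fast-slow map (2.1): x_{n+1} = x_n + eps*a(x_n, y_n, eps), y_{n+1} = T_eps y_n.
# Illustrative choices: T_eps y = 2y + sqrt(2) mod 1 (uniformly expanding, Lebesgue-invariant,
# independent of eps) and a(x, y, eps) = sin(2*pi*y) + 1 - x, so that abar(x) = 1 - x and
# the averaged ODE (2.2) has solution X(t) = 1 + (x0 - 1)*exp(-t).

rng = random.Random(0)

def z_eps(eps, x0=0.0):
    """One sample of z_eps = sup_{t in [0,1]} |xhat_eps(t) - X(t)|, with y0 drawn from Lebesgue."""
    y = rng.random()
    x, z = x0, 0.0
    for n in range(int(1.0 / eps) + 1):
        X = 1.0 + (x0 - 1.0) * math.exp(-n * eps)          # X at rescaled time t = n*eps
        z = max(z, abs(x - X))
        x += eps * (math.sin(2 * math.pi * y) + 1.0 - x)   # slow step of (2.1)
        y = (2.0 * y + math.sqrt(2.0)) % 1.0               # fast step of (2.1)
    return z

if __name__ == "__main__":
    for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
        zs = [z_eps(eps) for _ in range(20)]
        print(f"eps={eps:8.1e}   mean z_eps over 20 initial conditions = {sum(zs) / len(zs):.4f}")
```

Since y_0 is drawn from Lebesgue measure, the printed means are crude empirical proxies for the quantities ∫_M z_ε dν_ε that the results below show tend to 0.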
If E ⊂ Rd , then kg|E kLip is computed by restricting to x, x0 ∈ E (and y ∈ M , ∈ [0, 0 )). ∂ . If g : Rd × M × [0, 0 ) → Rn , then Dg : Rd × Throughout, we write D = ∂x M × [0, 0 ) → Rn×d and kDgkLip is defined accordingly. Similarly for kDg|E kLip when E ⊂ Rd . Below, L1 , L2 , L3 ≥ 1 are constants. For first order averaging we require that a is globally Lipschitz in x: kakLip ≤ L1 . (2.3) Set E = {x ∈ Rd : |x − x0 | ≤ L1 }. For second order averaging we require in addition that Da|E is Lipschitz in x: kDa|E kLip ≤ L2 . (2.4) In both cases, we suppose that sup sup |a(x, y, ) − a(x, y, 0)| ≤ L3 . (2.5) x∈E y∈M In the sequel we let L = max{L1 , L3 } when performing first order averaging, and L = max{L1 , L2 , L3 } when performing second order averaging. 2.1 Order functions and a general averaging theorem R Define ā(x, ) = M a(x, y, ) dν (y) and let v,x (y) = a(x, y, ) − ā(x, ). We define the first order function δ1, : M → R, δ1, = sup sup |v,x,n | where v,x,n = x∈E 1≤n≤1/ n−1 X v,x ◦ Tj . j=0 Next, we define the second order function δ : M → R, δ = δ1, + δ2, , δ2, = sup sup |V,x,n |, x∈E 1≤n≤1/ Theorem 2.2 Let S = supx∈E | R M where V,x,n n−1 X = (Dv,x ) ◦ Tj . j=0 a(x, y, 0) (dν − dν0 )(y)| + . 4 (a) Assume conditions (2.3) and (2.5). Then z ≤ 6e2L ( p δ1, + S ). (b) Assume conditions (2.3)—(2.5). If δ ≤ 21 , then z ≤ 6e2L (δ + S ). The proof of Theorem 2.2 is postponed to Appendices A and B. Remark 2.3 For averaging without rates, it suffices instead of condition (2.5) that lim→0 |a(x, y, ) − a(x, y, 0)| = 0 for all x ∈ E, y ∈ M . Remark 2.4 As shown in Section 7, almost sure convergence in the averaging theorem is not likely to hold for fast-slow systems of type (2.1). Hence we consider convergence in Lq with respect to certain absolutely continuous probability measures on M . Since z ≤ 2L and δ ≤ 4L, convergence in Lp is equivalent to convergence in Lq for all p, q ∈ (0, ∞). For brevity, we restrict statements to convergence in L1 except when speaking of rates. Throughout the remainder of this paper, it is assumed that conditions (2.3)—(2.5) are satisfied, and we apply Theorem 2.2(b). By Theorem 2.2(a), the results without rates go through unchanged when condition (2.4) fails, and results with rates hold usually with weaker rates of convergence, but for brevity these rates are not stated explicitly. According to Theorem 2.2(b), results on averaging reduce to estimating the scalar quantity S and the random variable δ = δ (y0 ). These quantities are discussed below in Subsections 2.2 and 2.3 respectively. 2.2 Statistical stability In this subsection, we suppose that M is a topological space and that the σ-algebra of measurable sets is the σ-algebra of Borel sets. Recall that the family of measures ν is statistically stable at R R = 0 if ν0 is the weak limit of ν as → 0 (ν →w ν0 ). This means that M φ dν → M φ dν0 for all continuous bounded functions φ : M → R. In the noninvertible setting, often a stronger property known as strong statistical stability holds. Let m be a reference measure on M and suppose that ν is absolutely continuous with respect to m for all ≥ 0. Then ν0 is strongly statistically stable if R the densities ρ = dν /dm satisfy lim→0 R = 0 where R = M |ρ − ρ0 | dm. We note that S ≤ LR + . Proposition 2.5 If ν →w ν0 , then lim→0 S = 0. R R Proof Let A (x) = M a(x, y, 0) dν (y)− M a(x, y, 0) dν0 (y). Let δ > 0. Since ν →w ν0 , we have that A (x) → 0 for each x, so there exists x > 0 such that |A (x)| < δ for all ∈ (0, x ). 
Moreover, |A (z)| < 2δ for all ∈ (0, x ) and z ∈ Bδ/(2L) (x). Since E is covered by finitely many such balls Bδ/(2L) (x), there exists ¯ > 0 such that 5 supx∈E |A (x)| < 2δ for all ∈ (0, ¯). Hence zero uniformly in x. R M a(x, y, 0) (dν − dν0 )(y) converges to Hence for proving averaging theorems, statistical stability takes care of the term S in Theorem 2.2. In specific examples, we are able to appeal to results on statistical stability with rates, yielding effective estimates. Proposition 2.6 Let q ≥ 1. There is a constant C > 0 such that |z |Lq (ν ) ≤ C(|δ |Lq (ν ) + S ), for all ∈ [0, 0 ). Assuming strong statistical stability, there is a constant C > 0 such that |z |Lq (ν0 ) ≤ C(|δ |Lq (ν ) + R1/q + ), for all ∈ [0, 0 ). Proof Let A = {y ∈ M : δ (y) ≤ 21 }. Then Theorem 2.2(b) applies on A and Z Z Z Z q q q q 2L q 1 z dν = z dν + z dν ≤ (2L) ν (δ > 2 ) + (6e ) (δ + S )q dν M M \A A M Z Z Z q q 2L q q 2L q ≤ (4L) δ dν + (6e ) (δ + S ) dν ≤ (12e ) (δ + S )q dν . M M M Hence |z |Lq (ν ) ≤ 12e2L |δ + S |Lq (ν ) ≤ 12e2L |δ |Lq (ν ) + 12e2L S , yielding the first estimate. Next, Z Z Z Z q q q 2L q (δ + S )q dν + (2L)q R z dν0 = z dν + z (dν0 − dν ) ≤ (12e ) M M M M Z 2L q ≤ (12e ) (δ + LR + )q dν + (2L)q R M Z 2L q ≤ (12Le ) (δ + R1/q + )q dν + (2L)q R ZM ≤ (24Le2L )q (δ + R1/q + )q dν . M Hence |z |Lq (ν0 ) ≤ 24Le2L |δ + R1/q + |Lq (ν ) ≤ 24Le2L (|δ |Lq (ν ) + R1/q + ), yielding the second estimate. 6 Corollary 2.7 (a) Assume statistical stability and that lim→0 R lim→0 M z dν = 0. R M δ dν = 0. Then (b) Assume in addition strong statistical stability and that µ is a probability measure R on M with µ ν0 . Then lim→0 M z dµ = 0. Proof Part (a), and part (b) in the special case µ = ν0 , are immediate from Proposition 2.6. To prove the general case of part (b), suppose R R for contradiction that z dµ → b > 0 along some subsequence k → 0. Since M zk dν0 → 0, by passing M k without loss to a further subsequence, we can suppose also that zk → 0 on a set of full measure with R respect to ν0 and hence with respect to µ. By the bounded convergence theorem, M zk dµ → 0 which is the desired contradiction. The next result is useful in situations where ν0 is absolutely continuous but whose support is not the whole of M . R Corollary 2.8 Assume strong statistical stability and that lim→0 M δ dν0 = 0. Suppose further that each T is nonsingular with respect to m and that for almost every y ∈ MR, there exists N ≥ 1 such that TN y ∈ supp ν0 for all ∈ [0, 0 ). Then lim→0 M z dµ = 0 for every probability measure µ on M with µ m. Proof First, we note that for all N ≥ 1, ≥ 0, |δ ◦ TN − δ |∞ ≤ 8LN . (2.6) R By the arguments in the proof of Corollary 2.7, it suffices R to prove that δ dm → 0. Suppose this is not the case. By Corollary 2.7(b), supp ν0 δ dm → 0. M Hence there exists a subsequence k → 0 and a subsetR A ⊂ supp ν0 with m(supp ν0 \ A) = 0 such that (i) δk → 0 pointwise on A and (ii) M δk dm → b > 0. Since each T is nonsingular, there exists M 0 ⊂ M with m(M 0 ) = 1 such that 0 M ∩ T−n (supp ν0 \ A) = ∅ for all k, n ≥ 1. By the hypothesis of the result, there is k a subset M 00 ⊂ M 0 with m(M 00 ) = 1 such that for any y ∈ M 00 there exists N ≥ 1 such that TNk y ∈ A for all k ≥ 1. Hence it follows from (i) R and (2.6) that δk → 0 pointwise on M 00 . By the bounded convergence theorem M δk dm → 0. Together with (ii), this yields the desired contradiction. Remark 2.9 The hypotheses of Corollary 2.8 are particularly straightforward to apply when supp ν0 has nonempty interior. 
This property is automatic for large classes of nonuniformly expanding maps, see [3, Lemma 5.6] (taking G = supp ν0 ∩ H(σ), it follows that Ḡ contains a disk). 2.3 Estimating the order function By Proposition 2.6 and Corollary 2.7, it remains to deal with the order function δ . This is a random variable depending on the initial condition y0 . Here we give a useful 7 estimate. Since δ = δ1, + δ2, and the definition of δ2, is identical to that of δ1, with a replaced by Da, it suffices to consider δ1, . From now on, Γ denotes a constant that only depends on d, p, L and whose value may change from line to line. Lemma 2.10 Let µ be a probability measure on M . Then for all p ≥ 0, ∈ [0, 0 ), Z Z p+d+1 p δ1, dµ ≤ Γ sup sup |v,x,n |p dµ. x∈E 1≤n≤1/ M M Proof For most of the proof, we work pointwise on M suppressing the initial condition y0 ∈ M . There exist x̃ ∈ E, ñ ∈ [0, 1/] such that δ1, = |v,x̃,ñ |. Observe that |v,x,n − v,x̃,ñ | ≤ |v,x,n − v,x̃,n | + |v,x̃,n − v,x̃,ñ | ≤ 2L−1 |x − x̃| + 2L|n − ñ| for every x ∈ E and n ≤ 1/. Define A = {x ∈ E : |x − x̃| ≤ 1 δ }, 8L 1, B = {n ∈ [0, −1 ] : |n − ñ| ≤ 1 δ }. 8L 1, Then for every x ∈ A and n ∈ B we have |v,x,n | ≥ δ1, /2. Moreover, since δ1, ≤ 2L, d we have Leb(A) ≥ Γδ1, and #B ≥ −1 δ1, /8L. Hence p [1/]−1 Z X p |v,x,n | dx ≥ (#B) Leb(A) E n=0 δ1, 2 p p+d+1 ≥ Γ−1 δ1, . Finally, Z p+d+1 δ1, p+1 dµ ≤ Γ M [1/]−1 Z X n=0 Z p p Z |v,x,n | dµ dx ≤ Γ sup E sup x∈E 1≤n≤1/ M |v,x,n |p dµ, M as required. R Remark R2.11 Often, estimating M |v,x,n | dµ leads to an essentially identical estimate for M sup1≤n≤1/ |v,x,n | dµ. In this case, slightly better convergence rates for δ1, can be obtained using the estimate Z Z p+d p δ1, dµ ≤ Γ sup sup |v,x,n |p dµ (2.7) M x∈E for all p ≥ 0, ∈ [0, 0 ). 8 M 1≤n≤1/ 3 Examples: Uniformly expanding maps Let T : M → M be a family of maps defined on a metric space (M, dM ) with invariant ergodic Borel ν . Let P denote the corresponding R probability measures R transfer operators, so M P v w dν = M v w ◦ T dν for all v ∈ L1 (ν ), w ∈ L∞ (ν ). From now on, we require Lipschitz regularity in the M variables in addition to the Rd variables as was required in assumptions (2.3) and (2.4). So for g : Rd × M × [0, 0 ) → Rn we define kgkLip = |g|∞ + sup∈[0,0 ) Lip g(·, ·, ) where Lip g(·, ·, ) = supx6=x0 supy6=y0 |g(x, y, ) − g(x0 , y 0 , )|/(|x − x0 | + dM (y, y 0 )). We continue to assume conditions (2.3)—(2.5) with this modified definition of k kLip . Proposition R R 3.1 Suppose that there is a sequence of constants an → 0 such that n |P v − MR v dµ | ≤ an kvkLip for all Lipschitz v : M → R and all n ≥ 1, ≥ 0. M Then lim→0 M δ dν = 0. Proof We prove the result for δ1, and δ2, separately. By Remark 2.4, we can work in Lq for any choice of q and we take q = d + 3. For every Lipschitz v, we have Z X n−1 M j=0 v◦ Tj 2 dν = n Z X j=0 Z =n 2 v ◦ Tj dν + 2 M v 2 dν + 2 =n X Z (n − k) 1≤k≤n−1 v 2 dν + 2 M X v v ◦ Tk dν M Z (n − k) 1≤k≤n−1 v ◦ Ti v ◦ Tj dν M 0≤i<j≤n−1 M Z Z X Pk v v dν . M Hence for v Lipschitz and mean zero, Z X n−1 M v◦ Tj 2 dν ≤ bn kvk2Lip , j=0 P where bn = n + 2n 1≤k≤n−1 ak = o(n2 ) by the assumption on an . R By condition (2.3), kv,x kLip ≤ 2L for all , x, so supx∈E sup1≤n≤1/ M |v,x,n |2 dν ≤ R d+3 4L2 bn . By Lemma 2.10, it follows that lim→0 MR δ1, dν = 0. Similarly, by condition (2.4), kV,x kLip ≤ 2L, so supx∈E sup1≤n≤1/ M |V,x,n |2 dν ≤ 4L2 bn and hence R d+3 lim→0 M δ2, dν = 0. Remark 3.2 The proof uses only that n−1 Pn k=1 an → 0. 
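As a numerical sanity check on the mechanism behind Proposition 3.1, the following sketch estimates ∫ |Σ_{j<n} v ∘ T^j|² dν for a single uniformly expanding circle map and a Lipschitz mean-zero observable; both choices are illustrative assumptions, not taken from the text. The second moment grows linearly in n, so dividing by n² exhibits the o(n²) behaviour of b_n used in the proof. (For this particular map and observable the correlations ∫ P^k v · v dν in fact vanish for k ≥ 1, so the growth is exactly linear.)

```python
import numpy as np

# Empirical check of the second-moment bound in the proof of Proposition 3.1:
# for a uniformly expanding circle map with summable correlations, the Birkhoff sums
# S_n = sum_{j<n} v o T^j of a Lipschitz mean-zero observable satisfy E[S_n^2] = O(n),
# hence n^{-2} E[S_n^2] -> 0.  Illustrative choices: T y = 3y mod 1 (Lebesgue-invariant),
# v(y) = cos(2*pi*y).

rng = np.random.default_rng(1)

def second_moment(n, samples=10_000):
    y = rng.random(samples)              # initial points drawn from the invariant measure
    S = np.zeros(samples)
    for _ in range(n):
        S += np.cos(2 * np.pi * y)       # v has mean zero with respect to Lebesgue
        y = (3.0 * y) % 1.0              # uniformly expanding fast map
    return float(np.mean(S ** 2))

if __name__ == "__main__":
    for n in [10, 100, 1000, 10000]:
        m2 = second_moment(n)
        print(f"n={n:6d}   E[S_n^2]/n = {m2 / n:7.4f}   E[S_n^2]/n^2 = {m2 / n**2:.6f}")
```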
Proposition 3.1 is useful in situations where T is a family of (piecewise) uniformly expanding maps. A general result of Keller & Liverani [22] guarantees uniform spectral properties of the transfer operators P under mild conditions, and consequently 9 R lim→0 M δq dν = 0 for all q. We mention two situations where this idea can be applied. Again, for brevity we work in L1 except when discussing convergence rates (see Remark 2.4). Example 3.3 (Uniformly expanding maps) Suppose that M = Tk ∼ = R/ Zk is a torus with Haar measure m and distance dM inherited from Euclidean distance on Rk and normalised so that diam M = 1. We say that a C 2 map T : M → M is uniformly expanding if there exists λ > 1 such that |(DT )y v| ≥ λ|v| for all y ∈ M , v ∈ Rk . There is a unique absolutely continuous invariant probability measure, and the density is C 1 and nonvanishing. If T : M → M , ∈ [0, 1], is a continuous family of C 2 maps, each of which is uniformly expanding, with corresponding probability measures ν , then R it follows from [22] that we are in the situation of Proposition 3.1, and so lim→0 M δ dν = 0. Moreover, it is well-known that ν0 is uniformly equivalent to m and is strongly statistically stable. Hence by R R Corollary 2.7 we obtain the averaging result lim→0 M z dν = 0 and lim→0 M z dµ = 0 for every absolutely continuous probability measure µ. Suppose further that T : M → M , ∈ [0, 1], is a C k family of C 2 maps, for some k ∈ (0, (for instance [22] with Banach spaces C 0 and C 1 ), R 1]. By standard results R = M |ρ − ρ0 | dm = O(k ). If k ∈ (0, 21 ), then by Remark 4.4 below we obtain the convergence rate O(k ) for z in Lq (ν ) and Lq (m) for all q > 0. If k ≥ 12 , then the 1 convergence rate for z is O( 2 − ). Example 3.4 (Piecewise uniformly expanding maps) Let M = [−1, 1] with Lebesgue measure m. We consider continuous maps T : M → M with T (−1) = T (1) = −1 such that T is C 2 on [−1, 0] and [0, 1]. We require that there exists λ > 1 such that T 0 ≥ λ on [−1, 0) and T 0 ≤ −λ on (0, 1]. There exists a unique absolutely continuous invariant probability measure with density of bounded variation. The results are analogous to those in Example 3.3. Let → T be a continuous family of such maps on [−1, 1] with associated measures ν . We assume that T0 is topologically mixing on the interval [T02 (0), T0 (0)] and that 0 is not periodic (which guarantees that T is mixing for small). Then ν0 is strongly statistically stable, so by Corollaries 2.7 and 2.8 we obtain averaging in L1 (ν ) and also in L1 (µ) for every absolutely continuous probability measure µ. Now suppose that → T is a C 1 family of such maps on [−1, 1] with densities ρ = dν /dm. Keller [21] showed that → ρ is C 1− as a map into L1 densities. Hence 1 − q 2 q > 0 and in L1 (ν0 ). we obtain the convergence rate O( R ) for z in L (ν ), for all More precisely, [21] shows that M |ρ −ρ0 | dm = O( log −1 ). By [8], this estimate is optimal, so this is a situation where linear response fails in contrast to Example 3.3. 10 4 Families of nonuniformly expanding maps In this section, we consider the situation where the fast dynamics is generated by nonuniformly expanding maps T , such that the nonuniform expansion is uniform in the parameter . In Subsection 4.1, we recall the notion of nonuniformly expanding map. In Subsection 4.2, we describe the uniformity criteria on the family T and state our main result on averaging, Theorem 4.3, for such families. In Subsection 4.3 we establish some basic estimates for nonuniformly expanding maps. 
In Subsection 4.4 we prove Theorem 4.3. 4.1 Nonuniformly expanding maps Let (M, dM ) be a locally compact separable bounded metric space with finite Borel measure m and let T : M → M be a nonsingular transformation for which m is ergodic. Let Y ⊂ M be a subset of positive measure, and normalise m so that m(Y ) = 1. Let α be an at most countable measurable partition of Y with m(a) > 0 for all a ∈ α. We suppose that there is an L1 return time function τ : Y → Z+ , constant on each a with value τ (a) ≥ 1, and constants λ > 1, η ∈ (0, 1], C0 , C1 ≥ 1 such that for each a ∈ α, (1) F = T τ restricts to a (measure-theoretic) bijection from a onto Y . (2) dM (F x, F y) ≥ λdM (x, y) for all x, y ∈ a. (3) dM (T ` x, T ` y) ≤ C0 dM (F x, F y) for all x, y ∈ a, 0 ≤ ` < τ (a). (4) ζ = dm|Y dm|Y ◦F satisfies | log ζ(x) − log ζ(y)| ≤ C1 dM (F x, F y)η for all x, y ∈ a. Such a dynamical system T : M → M is called nonuniformly expanding. We refer to F = T τ : Y → Y as the induced map. (It is not required that τ is the first return time to Y .) It follows from standard results (recalled later) that there is a unique absolutely continuous ergodic T -invariant probability measure ν on M . Remark 4.1 The uniformly expanding maps in Example 3.3 are clearly nonuniformly expanding: take Y = M , η = 1, τ = 1. Then conditions (1) and (2) are immediate, condition (3) is vacuously satisfied, and condition (4) holds with C1 = supx,y∈M, x6=y |(DT )x − (DT )y |/dM (x, y). 4.2 Uniformity assumptions Now suppose that T : M → M , ∈ [0, 0 ), is a family of nonuniformly expanding maps as defined in Subsection 4.1, with corresponding absolutely continuous ergodic invariant probability measures ν . 11 Definition 4.2 Let p > 1. We say that T : M → M is a uniform family of nonuniformly expanding maps (of order p) if (i) The constants C0 , C1 ≥ 1, λ > 1, η ∈ (0, 1] can be chosen independent of ∈ [0, 0 ). + p (ii) The return R timepfunctions τ : Y → Z lie in L for all ∈ [0, 0 ), and moreover sup∈[0,0 ) Y |τ | dm < ∞. We can now state our main result for this section. Recall the set up in Section 2. Theorem 4.3 If T : M → M is a uniform family of nonuniformly expanding maps of order p, then there is a constant C > 0 such that for all ∈ [0, 0 ) ( Z C(p−1)/2 , p>3 δp+d−1 dν ≤ , p−1 p−2 C| log | , p ∈ (2, 3] M and ( C(p−1)/2 , δp+d dν ≤ 2 C| log |p−1 (p−1) /p , M Z p ∈ (2, 3] . p ∈ (1, 2] Remark 4.4 In the case p > 3, it follows from Theorem 4.3 that |δ |Lq (ν ) = O((p−1)/(2(p+d−1)) ), for all q ≤ p + d − 1. Since δ is uniformly bounded, |δ |Lq (ν ) = O((p−1)/(2q) ) for all q > p + d − 1. Similar comments apply for p ∈ (1, 3]. In particular, if p can be taken arbitrarily large in Definition 4.2, then we obtain 1 that |δ |Lq (ν ) = O( 2 − ) for all q > 0. R 1 If in addition ν0 is strongly statistically stable with R = M |ρ −ρ0 | dm = O( 2 − ), 1 then by Proposition 2.6 we obtain |z |Lq (ν ) = O( 2 − ) for all q > 0 and |z |L1 (ν0 ) = 1 O( 2 − ). Remark 4.5 Alves & Viana [5] prove strong statistical stability for a large class of noninvertible dynamical systems. These maps are uniform families of nonuniformly expanding maps in a sense that is very similar to our definition. In fact it is almost the case that their definition includes our definition, so it is almost the case that verifying the assumptions of [5] is sufficient to obtain averaging and rates of averaging via Theorem 4.3. To be more precise, let us momentarily ignore assumption (3) in Subsection 4.1. 
Then Definition 4.2(i) with η = 1 is immediate from [5, (U1)], and Definition 4.2(ii) is immediate from [5, (U20 )] which follows from their conditions (U1) and (U2). Hence it remains to discuss assumption (3). This assumption is not explicitly mentioned in [5] since it is not required for the statement of their main results. 12 However, in specific applications, the hypotheses in [5] are often verified via the method of hyperbolic times [1]. When the return time function τ is a hyperbolic time, then it is automatic that C0 = 1 (see for example [2, Proposition 3.3(3)]). Alves et al. [4] introduced a general method for constructing inducing schemes, where τ is not necessarily a hyperbolic time but is close enough that C0 can still be chosen uniformly. Alves [2] combined the methods of [4] and [5] to prove statistical stability for large classes of examples. We show now that in the situation discussed by [2], assumption (3) holds with uniform C0 and hence our main results hold. Certain quantities δ1 > 0 and N0 ≥ 1 are introduced in [2, Lemma 3.2] and [2, eq. (16)] respectively, and are explictly uniform in . Moreover τ = n + m where n is a hyperbolic time and m ≤ N0 (see [2, Section 4.3]), so C0 depends only on at most N0 iterates of T . The construction in [2] (see in particular the proof of [2, Lemma 4.2]) ensures that the derivative of T is bounded along these iterates, so assumption (3) holds and C0 is uniform in . We mention also the extension of [4] due to Gouëzel [19] where C0 = 1 (see [19, Theorem 3.1 4)].) Finally, we note that when [4] is used to obtain polynomial decay of correlations with rate O(1/nβ ), β > 0, the resulting uniform family is of order p = β + 1−. (Uniformity in in Definition 4.2(ii) follows from [2, Lemma 5.1].) 4.3 Explicit estimates for nonuniformly expanding maps Throughout this subsection, we work with a fixed nonuniformly expanding map T : M → M . Some standard constructions and estimates are described. The main novelty is that we stress the dependence of various constants on the underlying constants C0 , C1 , λ and η. For convenience, we normalise the metric dM so that diam M = 1. Symbolic metric Fix θ ∈ [λ−η , 1) and define the symbolic metric dθ (x, y) = θs(x,y) where the separation time s(x, y) is the least integer n ≥ 0 such that F n x and F n y lie in distinct partition element. It is assumed that the partition α separates orbits of F , so s(x, y) is finite for all x 6= y guaranteeing that dθ is a metric. Proposition 4.6 dM (x, y)η ≤ dθ (x, y) for all x, y ∈ Y . Proof Let n = s(x, y). By condition (2), 1 ≥ diam Y ≥ dM (F n x, F n y) ≥ λn dM (x, y) ≥ (θ1/η )−n dM (x, y). Hence dM (x, y)η ≤ θn = dθ (x, y). 13 Invariant measures and transfer operators For a positive observable ψ : Y → (0, ∞), define the log-Lipschitz seminorm |ψ|θ` = | log ψ(x) − log ψ(y)| . dθ (x, y) x,y∈Y, x6=y sup We note that e −|ψ|θ` Z ψ dm ≤ ψ ≤ e |ψ|θ` Y Z ψ dm. (4.1) Y Also, for ψk : Y → (0, ∞), k ≥ 1, ∞ X ψk ≤ sup |ψk |θ` . k=1 θ` (4.2) k≥1 1 1 operator corresponding to F and m, RLet P̃ : L (Y ) →R L (Y ) denote the transfer ∞ so Y φ ◦ F ψ dm = Y φ P̃ ψ dm for all φ ∈ L and ψ ∈ L1 . Proposition 4.7 Let ψ : Y → (0, ∞). Then (a) |P̃ ψ|θ` ≤ C1 + θ|ψ|θ` , and R R (b) e−(C1 +θ|ψ|θ` ) ψ dm ≤ P̃ ψ ≤ eC1 +θ|ψ|θ` ψ dm. Proof Let a ∈ α and write ψa = 1a ψ. For y ∈ Y , we have (P̃ ψa )(y) = ζ(ya )ψ(ya ) where ya is the unique preimage of y under F lying in a. Let x, y ∈ Y with preimages xa , ya ∈ a. By condition (4) and Proposition 4.6, | log ζ(xa ) − log ζ(ya )| ≤ C1 dθ (x, y). 
Hence | log(P̃ ψa )(x)− log(P̃ ψa )(y)| ≤ | log ζ(xa ) − log ζ(ya )| + | log ψ(xa ) − log ψ(ya )| ≤ C1 dθ (x, y) + |ψ|θ` dθ (xa , ya ) ≤ (C1 + θ|ψ|θ` )dθ (x, y), and so |P̃Rψa |θ` ≤ C1 +Rθ|ψ|θ` . Part (a) follows from (4.2). Since Y P̃ ψ dm = Y ψ dm, part (b) follows from (4.1) and part (a). Proposition 4.8 There is a unique ergodic F -invariant probability measure µ on Y equivalent to m|Y . Moreover, the density h = dµ/dm|Y satisfies −1 e−C1 (1−θ) ≤ h ≤ eC1 (1−θ) −1 and |h|θ` ≤ C1 (1 − θ)−1 . Proof There is at most one ergodic invariant probability measure equivalent to m|Y . To prove existence of such a measure with the desired R properties, we define the sequence of positive functions ĥ0 = 1, ĥn+1 = P̃ ĥn . Then Y ĥn dm = 1 for all n ≥ 0. 14 Inductively, it follows from Proposition 4.7(a) that |ĥn |θ` ≤ C1 (1 + θ + · · · + θn−1 ) ≤ C1 (1 − θ)−1 for all n ≥ 1. Also, by Proposition 4.7(b), ĥn = P̃ ĥn−1 ≤ eC1 +θ|ĥn−1 |θ` ≤ eC1 +C1 θ(1−θ) −1 −1 = eC1 (1−θ) , −1 1 (1−θ) and similarly ĥn ≥ e−C for all P n ≥ 1. R P n−1 j n−1 −1 −1 P̃ 1. It is immediate that ĥ = n h dm = 1 Define hn = n j j=0 j=0 Y n Pn−1 C1 (1−θ)−1 −C1 (1−θ)−1 . By (4.2), |hn |θ` = | j=0 ĥj |θ` ≤ C1 (1 − θ)−1 . In ≤ hn ≤ e and e particular, the sequence hn is bounded and equicontinuous. By Arzela-Ascoli, there exists a subsequential limit h. The limit inherits the properties Z −1 −1 h dm = 1, e−C1 (1−θ) ≤ h ≤ eC1 (1−θ) , |h|θ` ≤ C1 (1 − θ)−1 . Y Moreover, P̃ hn = hn + n−1 (P̃ n 1 − 1) and so P̃ h = h. Hence dµ = h dm defines an F -invariant probability measure. W −j α Define g : Y → R by setting g|a = dµ|a /d(µ ◦ F |a ) for a ∈ α. Let αn = n−1 j=0 F denote the set of n-cylinders in Y , and write gn = (g ◦ F n−1 ) · · · (g ◦ F ) · g. Thus for a ∈ αn , gn |a = dµ|a /d(µ ◦ F n |a ) is the inverse of the Jacobian of F n |a. Let P : L1 (Y )R→ L1 (Y ) denote Rthe (normalised) transfer operator corresponding to F and µ,Pso Y φ ◦ F ψ dµ = Y φ P ψ dµ for all φ ∈ L∞ and ψ ∈ L1 . Then (P n v)(y) = a∈αn g(ya )v(ya ) where ya is the unique preimage of y under F n lying in a. Proposition 4.9 For all x, y ∈ a, a ∈ αn , n ≥ 1, gn (y) ≤ C2 µ(a) and where C2 = 2C1 (1 − θ)−2 e4C1 |gn (x) − gn (y)| ≤ C2 µ(a)dθ (F n x, F n y), (1−θ)−2 (4.3) . Proof First, suppose that n = 1 and let x, y ∈ a, a ∈ α. We have g = ζh/h ◦ F , so using Proposition 4.8, | log g(x) − log g(y)| ≤ | log ζ(x) − log ζ(y)| + | log h(x) − log h(y)| + | log h(F x) − log h(F y)| ≤ C1 dθ (F x, F y) + |h|θ` dθ (x, y) + |h|θ` dθ (F x, F y) ≤ C1 (1 + θ(1 − θ)−1 + (1 − θ)−1 )dθ (F x, F y) = 2C1 (1 − θ)−1 dθ (F x, F y). Next, let x, y ∈ a, a ∈ αn for general n ≥ 1. Then | log gn (x) − log gn (y)| ≤ n−1 X | log g(F j x) − log g(F j y)| j=0 −1 ≤ 2C1 (1 − θ) n X j j dθ (F x, F y) ≤ 2C1 (1 − θ) j=1 −2 −1 n X θn−j dθ (F n x, F n y) j=1 n n ≤ 2C1 (1 − θ) dθ (F x, F y). (4.4) 15 −2 In particular gn (x)/gn (y) ≤ e2C1 (1−θ) . Hence Z Z −2 µ(a) = 1a dµ = P n 1a dµ ≥ inf P n 1a = inf gn (y) ≥ e−2C1 (1−θ) sup gn (y), Y y∈a Y y∈a and so −2 gn (y) ≤ e2C1 (1−θ) µ(a). (4.5) Next, we note the inequality t − 1 ≤ t log t which is valid for all t ≥ 1. Suppose that gn (y) ≤ gn (x). Setting t = gn (x)/gn (y) ≥ 1 and using (4.4), gn (x) gn (x) gn (x) −2 −1≤ log ≤ e2C1 (1−θ) 2C1 (1 − θ)−2 dθ (F n x, F n y). 
gn (y) gn (y) gn (y) Hence −2 gn (x) − gn (y) ≤ gn (y)e2C1 (1−θ) 2C1 (1 − θ)−2 dθ (F n x, F n y), and the result follows from (4.5), There is a standard procedure to pass from µ to a T -invariant ergodic absolutely continuous probability measure ν on M , which we now briefly recall, since the construction is required in the proof of Lemma 4.17. Define the Young tower [36] ∆ = {(y, `) ∈ Y × Z : 0 ≤ ` < τ (y)}, (4.6) R with probability measure µ∆ = µ × {counting}/ Y τ dµ. Define π∆ : ∆ → M , π∆ (y, `) = T ` y. Then ν = (π∆ )∗ µ∆ is the desired probability measure on M . Induced observables Given φ : Y → Rd , we define kφkθ = |φ|∞ + |φ|θ where |φ|θ = supx6=y |φ(x) − φ(y)|/dθ (x, y). Then φ is dθ -Lipschitz if kφkθ < ∞. We say that φ : Y → Rd is locally dθ -Lipschitz if supy∈a |φ(y)| < ∞ and supx,y∈a, x6=y |φ(x) − φ(y)|/dθ (x, y) < ∞ for each a ∈ α. Given an observable v : M → Rd , we define the induced observable V : Y → R, τ (y)−1 V (y) = X v(T ` y). `=0 If v : M → Rd satisfies R v dν = 0, then M R Y V dµ = 0. Proposition 4.10 If v : M → Rd is dM -Lipschitz, then V : Y → Rd is locally dθ -Lipschitz and P V : Y → Rd is dθ -Lipschitz. Moreover, |V (y)| ≤ τ (a)|v|∞ , |V (x) − V (y)| ≤ C0 θ−1 τ (a)Lip v dθ (x, y), for all x, y ∈ a, a ∈ α, and |P V |∞ ≤ C2 |τ |L1 (µ) |v|∞ , |P V |θ ≤ C0 C2 θ−1 |τ |L1 (µ) kvkLip . 16 Proof The estimate for V (y) is immediate. By condition (3) and Proposition 4.6, τ (a)−1 |V (x) − V (y)| ≤ Lip v X τ (a)−1 ` ` dM (T x, T y) ≤ C0 Lip v `=0 X dM (F x, F y) `=0 = C0 τ (a)Lip v dM (F x, F y) ≤ C0 τ (a)Lip v dθ (F x, F y)1/η ≤ C0 τ (a)Lip v dθ (F x, F y) = C0 θ−1 τ (a)Lip v dθ (x, y), completing the estimates P for V . Next, (P V )(y) = a∈α g(ya )V (ya ), so by (4.3), X |P V |∞ ≤ C2 µ(a)τ (a)|v|∞ = C2 |τ |L1 (µ) |v|∞ . a∈α Also, |(P V )(x) − (P V )(y)| ≤ X |g(xa ) − g(ya )||V (xa )| + a∈α ≤ C2 X X g(ya )|V (xa ) − V (ya )| a∈α µ(a)dθ (x, y)τ (a)|v|∞ + C2 a∈α X µ(a)C0 θ−1 τ (a)Lip v dθ (x, y), a∈α yielding the estimate for |P V |θ . Explicit coupling argument For the remainder of this subsection, Lp -norms are computed using the invariant probability measure µ. Note that Z Z |ψ|θ` −|ψ|θ` e ψ dµ ≤ ψ ≤ e ψ dµ. (4.7) Y Y Proposition 4.11 Let ψ : Y → (0, ∞). Then (a) |P ψ|θ` ≤ C2 + θ|ψ|θ` , and R R (b) e−(C2 +θ|ψ|θ` ) ψ dµ ≤ P ψ ≤ eC2 +θ|ψ|θ` ψ dµ. Proof By the beginning of the proof of Proposition 4.9, | log g(x) − log g(y)| ≤ 2C1 (1 − θ)−1 dθ (F x, F y) ≤ C2 dθ (F x, F y) for all x, y lying in a common partition element. The proof now proceeds exactly as in Proposition 4.7 with P̃ , m, ζ and C1 replaced by P , µ, g and C2 . R By (4.7), ψ − ξ Y ψ dµ is positive for all ξ ∈ [0, e−|ψ|θl ). 17 Proposition 4.12 Let ψ : Y → (0, ∞). For each ξ ∈ [0, e−|ψ|θl ) Z |ψ|θl ψ dµ ≤ . ψ − ξ 1 − ξe|ψ|θl θl Y Proof Let κ(y) = log ψ(y). Note that Z d eκ 1 κ R R log e − ξ ψ dµ = κ = . −κ dκ e − ξ Y ψ dµ 1 − ξe ψ dµ Y Y By (4.7), 1 1− ξe−κ(y) R Y ψ dµ = 1 1 R ≤ , −1 1 − ξe|ψ|θl 1 − ξψ(y) ψ dµ Y for all y ∈ Y . Hence, by the mean value theorem, for x, y ∈ Y , Z Z |κ(x) − κ(y)| |ψ|θl dθ (x, y) κ(y) κ(x) ψ dµ ψ dµ − log e − ξ log e − ξ ≤ . ≤ 1 − ξe|ψ|θl 1 − ξe|ψ|θl Y Y This completes the proof. Choose R large enough so that C2 + θR < R. Define ξ = e Lemma 4.13 Let φ : Y → R be an observable with R −R C2 + θR 1− . R φ dµ = 0, and |φ|θ ≤ R. Then |P n φ|1 ≤ 2(1 + |φ|1 ))(1 − ξ)n . Proof Write φ = ψ0+ − ψ0−R , where ψ0−R = 1 − min{0, φ} and ψ0+ = 1 + max{0, φ}. Then ψ0± : Y → [1, ∞) and Y ψ0+ dµ = Y ψ0− dµ ≤ 1 + |φ|1 . Define Z ± ± ψn+1 = P ψn − ξ ψn± dµ, n ≥ 0. 
Y R + ± ± Then ψ dµ = (1 − ξ) ψ dµ, so inductively we have that ψn dµ = n+1 n Y Y R − Y R ± n n n + n − + − dµ. Moreover, P φ = P ψ − P ψ = ψ − ψ ψ dµ = (1 − ξ) ψ 0 0 0 n n. n Y Y ± We claim that the functions ψn are nonnegative for all n. Then Z Z n + − + n |P φ|1 ≤ |ψn |1 + |ψn |1 = 2 ψn dµ = 2(1 − ξ) ψ0+ dµ ≤ 2(1 − ξ)n (1 + |φ|1 ), R R Y Y as required. It remains to verify the claim. Since ξ < e−R , it suffices by Proposition 4.11(b) and the choice of R that |ψn± |θl ≤ R for all n. 18 For n = 0, we suppose first that φ(x) ≥ φ(y) ≥ 0. Then φ(x) − φ(y) log ψ0+ (x) − log ψ0+ (y) = log(1 + φ(x)) − log(1 + φ(y) = log 1 + 1 + φ(y) φ(x) − φ(y) ≤ φ(x) − φ(y). ≤ 1 + φ(y) Also, if φ(x) ≥ 0 ≥ φ(y), log ψ0+ (x) − log ψ0+ (y) = log(1 + φ(x)) ≤ φ(x) ≤ φ(x) − φ(y). It follows that |ψ0+ |θ` ≤ |φ|θ ≤ R. Noting that ψ0− = 1 + max{−φ, 0}, we obtain that |ψ0± |θ` ≤ R. Suppose inductively that |ψn± |θl ≤ R for some n. By Proposition 4.11(a), ± |P ψn± |θl ≤ C2 + θR < R, and so ξ < e−R < e−|P ψn |θl . By Proposition 4.12 and the definition of ξ, Z C2 + θR ± ± |ψn+1 |θl = P ψn − ξ P ψn± dµ ≤ = R, 1 − ξeR θl Y completing the induction argument. Corollary 4.14 Let p > 0, R > 0. There exists a constant γ ∈ (0, 1) depending 1−1/p only on p, θ, C2 and R such that |P n φ|p ≤ 2dγ n (1 + |φ|1 )1/p |φ|∞ for all mean zero dθ -Lipschitz φ : Y → Rd with |φ|θ ≤ R and all n ≥ 1. Proof The result holds for p ≤ 1 and d = 1 by Lemma 4.13. Working componentwise, the extension to general d is immediate. Since µ is T -invariant, it follows from the duality definition of P that |P φ|∞ ≤ |φ|∞ for all φ ∈ L∞ . Let γ̃ = 1 − ξ be the constant for p = 1 and suppose that p > 1. Then Z Z Z n p n p−1 n p−1 |P φ| dµ = |φ|∞ |P n φ| dµ ≤ 2dγ̃ n (1 + |φ|∞ )|φ|p−1 |P φ| dµ ≤ |P φ|∞ ∞ . Y Y Y The result follows with γ = γ̃ 1/p . Explicit moment estimate Proposition 4.15 Let p ≥ 1, R ≥ 1. Suppose that |τ |p kvkLip ≤ R. Then there exists m, χ ∈ Lp (Y, Rd ) such that V = m + χ ◦ F − χ, and m ∈ ker P . Moreover, |m|p ≤ 3C3 , and |χ|p ≤ C3 , where C3 ≥ R depends only on d, θ, C0 , C2 and R. 19 Proof Let R̂ = C0 C2 θ−1 R, and choose γ̂ corresponding to R̂ as in Corollary 4.14. By Proposition 4.10, |P V |θ ≤ R̂. Hence by Proposition 4.10 and Corollary 4.14 with φ = P V , for n ≥ 1, 1−1/p |P n V |p ≤ 2dγ̂ n−1 (1 + |P V |1 )1/p |P V |∞ ≤ 2dγ̂ n−1 (1 + C2 R)1/p (C2 R)1−1/p ≤ 4dC2 Rγ̂ n−1 . P k p −1 It follows that χ = ∞ k=1 P V lies in L and |χ|p ≤ C3 where C3 = 4dC2 R(1 − γ̂) . p Write V = m+χ◦F −χ; then m ∈ L and P m = 0. Finally, |m|p ≤ |V |p +2|χ|p ≤ R + 2|χ|p ≤ 3C3 . Pn−1 j Corollary 4.16 Define Vn = j=0 V ◦ F . Let p > 1, R ≥ 1. Suppose that |τ |p kvkLip ≤ R. Then there exists a constant C4 ≥ R depending only on p and C3 such that ( C4 n1/2 p>2 . max |Vj | ≤ 1/p 1≤j≤n p C4 (log2 n)n p ∈ (1, 2] Pn−1 Proof First note that Vn = mn + χ ◦ F n − χ where mn = j=0 m ◦ F . Hence |Vn |p ≤ |mn |p + 2|χ|p . Since m ∈ ker P , an application of Burkholder’s inequality [13] shows that |mn |p ≤ C(p)|m|p nmax{1/p,1/2} , (see for example the proof of [29, Proposition 4.3]). Hence |Vn |p ≤ C(p)(|m|p + 2|χ|p )nmax{1/p,1/2} . By Billingsley [12, Problem 10.3, p. 112] (see also [33, Equations (2.2) and (2.3)] with ui = 1, ν = p, α = max{1, p/2}), and using stationarity, max |Vj | ≤ C(p)(|m|p + 2|χ|p )(log2 n)nmax{1/p,1/2} . 1≤j≤n p For p ∈ (1, 2], the result follows from Proposition 4.15 with C4 = 15C3 C(p). By [30, Theorem 1], the logarithmic factor can be removed for p > 2 provided C4 is increased by a factor depending only on p. Lemma 4.17 Let p > 1, R ≥ 1. 
Suppose that |τ |p kvkLip ≤ R. Let vn = Then ( 1/(p−1) 1/2 3C4 |τ |p n p>2 |vn |p−1 ≤ . 1/q 3C4 |τ |p (log2 n)n1/p p ∈ (1, 2] Pn−1 j=0 v ◦ T j. Moreover, there exists a constant C5 ≥ R depending only on p and C4 such that ( 1/(p−1) 1/2 3C5 |τ |p n p>3 max |vj | ≤ . 1/(p−1) p−1 1/(p−1) j≤n 3C5 |τ |p (log2 n)n p ∈ (2, 3] 20 Proof Let q = p − 1. Define the tower ∆ as in (4.6) with tower map f : ∆ → ∆ where ( (y, ` + 1), ` ≤ τ (y) − 2 f (y, `) = . (F y, 0), ` = τ (y) − 1 R Recall that µ∆ = µ × counting/τ̄ on ∆ where τ̄ = Y τ dµ. Also, ν = (π∆ )∗ µ∆ where π∆ : ∆ → M is the projection π∆ (y,P`) = T ` y. R R n−1 Let v̂ = v ◦ π∆ and define v̂n = j=0 v̂ ◦ f j . Then M |vn |q dν = ∆ |v̂n |q dµ∆ . Next, let Nn : ∆ → {0, 1, . . . , n} be the number of laps by time n, Nn (y, `) = #{j ∈ {0, . . . , n} : f j (y, `) ∈ Y }. Then v̂n (y, `) = VNn (y,`) (y) + H ◦ f n (y, `) − v̂` (y, 0), P where H(y, `) = ``0 =1 v̂(y, `0 ) = v̂`+1 (y, 0) − v̂(y, 0). Note that |v̂` (y, 0)| ≤ |v|∞ τ (y) and |H(y, `)| ≤ |v|∞ τ (y) for all (y, `) ∈ ∆. Now Z Z n q |H ◦ f | dµ∆ = ∆ Z q |H| dµ∆ = (1/τ̄ ) ∆ ≤ |v|q∞ Z Y τ (y)−1 Y X j=0 τ (y)−1 τ q dµ = |v|q∞ Z X |H(y, `)|q dµ j=0 τ p dµ = |τ |pp |v|q∞ = (|τ |p |v|∞ )q |τ |p ≤ Rq |τ |p , Y R where we have used the fact that τ̄ ≥ 1. Similarly, ∆ |v̂` (y, 0)|q dµ∆ ≤ Rq |τ |p . Next, using Hölder’s inequality, Z Z Z q q |VNj (y,`) (y)| dµ∆ ≤ max |Vj (y)| dµ∆ = (1/τ̄ ) τ max |Vj |q dµ j≤n ∆ ∆ j≤n Y q ≤ |τ |p max |Vj |q = |τ |p max |Vj | . j≤n p/q j≤n p By the triangle inequality and Corollary 4.16, |vn |p−1 = |v̂n |Lq (∆) )≤ ≤ |τ |1/q (2R + max |V | j p p j≤n ( 1/q 3C4 |τ |p n1/2 1/q 3C4 |τ |p (log2 n)n1/p p>2 . p ∈ (1, 2] Suppose that p > 3. By [30, Theorem increasing C4 by a factor that 1], after 1/q depends on p, we have that maxj≤n |vn | p−1 ≤ 3C4 |τ |p n1/2 . 1/q If p ∈ (1, 3], then we have |vn |p−1 ≤ 3C4 |τ |p n1/(p−1) . For p ∈ (2, 3], it follows 1/q from [30, Theorem 3] that maxj≤n |vn |p−1 ≤ 3C4 |τ |p (log2 n)n1/(p−1) . 21 4.4 Proof of Theorem 4.3 RDefine v,x and v,x,n as in Section 2. Note that Lip v,x ≤ 2L for all , x and v dν = 0. Take R = 2L sup≥0 |τ |p in Lemma 4.17. M ,x First suppose that p > 3. It follows from Lemma 4.17 that max |v,x,j | ≤ 6C5 |τ |p1/(p−1) n1/2 , p−1 j≤n for all ≥ 0, x ∈ R, n ≥ 1. By (2.7), Z Z p+d−1 p−1 δ1, dν ≤ Γ sup M ≤ max |v,x,n |p−1 dν n≤1/ x∈E M p−1 ΓC5 |τ |p p−1 (1−p)/2 = ΓC5p−1 |τ |p (p−1)/2 . Similarly, v,x , v,x,n and δ1, by V,x , V,x,n and δ2, , we obtain that R p+d−1 replacing p−1 δ dν ≤ ΓC |τ |p (p−1)/2 . Since sup∈[0,0 ) |τ |p < ∞, this completes the 5 M 2, proof. From now on we restrict attention entirely to δ1, since the same estimate for δ2, is then immediate as above. Suppose that p ∈ (2, 3]. We can proceed as for p > 3, using the for R estimate p+d−1 maxj≤n |vj | given by Lemma 4.17 combined with (2.7); this gives M δ1, dν ≤ ΓC5p−1 |τ |p | log |p−1 p−2 . Alternatively,Rwe can use the estimate for vn in Lemma 4.17 p+d combined with Lemma 2.10 to obtain M δ1, dν ≤ ΓC4p−1 |τ |p (p−1)/2 . Finally, if p ∈ (1, 2], use the estimate for vn in Lemma 4.17 combined with R wep+d 2 Lemma 2.10 to obtain M δ1, dν ≤ ΓC4p−1 |τ |p | log |p−1 (p−1) /p . 5 Examples: Nonuniformly expanding maps Example 5.1 (Intermittent maps) Let M = [0, 1] with Lebesgue measure m and consider the intermittent maps T : M → M given by ( x(1 + 2a xa ), x ∈ [0, 21 ] . (5.1) Tx = 2x − 1, x ∈ ( 21 , 1] These were studied in [27]. Here a > 0 is a parameter. For a ∈ (0, 1) there is a unique absolutely continuous invariant probability measure with C ∞ nonvanishing density. 
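For readers who wish to experiment with the maps (5.1), the following sketch iterates a single long orbit and histograms it to approximate the invariant density, which is largest near the indifferent fixed point at 0, where it is known to behave like x^{−a}. The orbit length, burn-in, bin count and the value a = 0.4 are arbitrary illustrative choices, and convergence is slow when a is close to 1.

```python
import random

# The intermittent map (5.1) on [0, 1] with parameter a in (0, 1).
def T(x, a):
    return x * (1.0 + (2.0 * x) ** a) if x <= 0.5 else 2.0 * x - 1.0

def density_histogram(a, n_iter=1_000_000, burn_in=10_000, bins=50, seed=0):
    """Crude approximation of the invariant density from one long orbit
    (Lebesgue-a.e. orbit equidistributes with respect to the a.c.i.p. when a < 1)."""
    x = random.Random(seed).random()
    for _ in range(burn_in):
        x = T(x, a)
    counts = [0] * bins
    for _ in range(n_iter):
        x = T(x, a)
        counts[min(int(x * bins), bins - 1)] += 1
    return [c / n_iter * bins for c in counts]   # normalised so the histogram integrates to 1

if __name__ == "__main__":
    a, bins = 0.4, 50
    hist = density_histogram(a, bins=bins)
    for i in [0, 1, 2, 10, 25, 49]:
        print(f"x ~ {(i + 0.5) / bins:.2f}   estimated density ~ {hist[i]:.3f}")
```

The long excursions the orbit makes near 0 (visible if one prints consecutive iterates) are the source of the polynomial, rather than exponential, decay of correlations for these maps.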
We consider a family T : M → M , ∈ [0, 0 ), of such intermittent maps with parameter a ∈ (0, 1) depending continuously on . Let ν denote the corresponding family of absolutely continuous invariant probability measures. For each , we take Y = [ 21 , 1] and let τ : Y → Z+ be the first return time τ (y) = inf{n ≥ 1 : Tn y ∈ Y }. Define the first return map F = Tτ : Y → Y . Let α = {Y (n), n ≥ 1} where Y (n) = {y ∈ Y : τ (y) = n}. 22 It is standard that each T is a nonuniformly expanding map in the sense of Section 4.1 with τ ∈ Lp for all p < 1/a . Fix p ∈ (1, 1/a0 ) and choose 0 < a− < a0 < a+ < 1 such that p < 1/a+ . Without loss we can shrink 0 so that a ∈ [a− , a+ ] for all ∈ [0, 0 ). We show that T , ∈ [0, 0 ), satisfies Definition 4.2 for this choice of p. Since T0 ≥ 1 on M and T0 = 2 on Y , it is immediate that conditions (1)—(3) in Section 4.1 are satisfied with λ = 2 and C0 = 1. dm|Y Recall that ζ = dm| . Note that ζ (y) = 1/F0 (y). For each Y (n) ∈ α , define Y ◦F −1 −1 0 . By [26, ) = ζ ◦ F,n the bijection F,n = F |Y (n) : Y (n) → M . Let G,n = (F,n Assumption A2 and Theorem 3.1], there is a constant K, depending only on a− and −1 0 a+ , such that |(log G,n )0 | ≤ K for all ∈ [0, 0 ). Hence |(log ζ ◦F,n ) | = |(log G,n )0 | ≤ K. By the mean value theorem, for x, y ∈ Y (n), −1 −1 | log ζ (x) − log ζ (y)| = |(log ζ ◦ F,n )(F x) − (log ζ ◦ F,n )(F y)| ≤ K|F x − F y|. This proves condition (4) in Section 4.1, and so condition (i) in Definition 4.2 is satisfied. Define x1 = 21 and inductively xn+1 < xn (depending on ) with T xn+1 = xn . Then T (Y (n)) = [xn , xn−1 ] for n ≥ 2 and it is standard that xn = O(1/nα ) as a function of n. By [26, Lemma 5.2], there is a constant K, depending only on a− and a+ , such that xn ≤ Kn−1/α+ for all n ≥ 1, ∈ [0, 0 ). Hence m(τ > n) = m([1/2, (xn + 1)/2]) = xn /2 ≤ Kn−1/α+ . R Since p < 1/α+ it follows that sup∈[0,0 ) Y |τ |p dm < ∞ so condition (ii) in Definition 4.2 is satisfied. R By [11, 26], ν is strongly statistically stable and the densities ρ satisfy 1R = |ρ − ρ0 | dm = O(a − a0 ). Hence, by Corollary 2.7, we obtain averaging in L with respect to ν and also with respect to any absolutely continuous probability measure. Finally, if 7→ a is Lipschitz say, so that R = O(), then we obtain the rates described in Remark 4.4 with p = (1/a0 )−. Example 5.2 (Logistic family) We consider the family of quadratic maps T : [−1, 1] → [−1, 1] given by T (x) = 1 − ax2 , a ∈ [−2, 2], with m taken to be Lebesgue measure. Let b, c > 0. The map T satisfies the Collet-Eckmann condition [14] with constants b, c if |(T n )0 (1)| ≥ cebn for all n ≥ 0. (5.2) In thisScase, we write a ∈ Qb,c . The set of Collet-Eckmann parameters is given by P1 = b,c>0 Qb,c and is a Cantor set of positive Lebesgue measure [20, 10]. When a ∈ P1 , the map T has an invariant set Λ consisting of a finite union of intervals with an ergodic absolutely continuous invariant probability measure νa . The density for νa 23 is bounded below on Λ and lies in L2− . The invariant set attracts Lebesgue almost every trajectory in [−1, 1]. There is also an open dense set of parameters P0 ⊂ [−2, 2] for which T possesses a periodic sink attracting Lebesgue almost every trajectory in [−1, 1]. By Lyubich [28], P0 ∪ P1 has full measure. For a ∈ P0 , we let νa denote the invariant probability measure supported on the periodic attractor, so we have a map a 7→ νa defined on P0 ∪ P1 . It is clear that statistical stability holds on P0 , and that strong statistical stability fails everywhere in P0 ∪ P1 . 
Moreover, Thunberg [34, Corollary 1] shows that on any full measure subset of E ⊂ [−2, 2] the map a → νa is not statistically stable at any point of P1 ∩ E. On the other hand, Freitas & Todd [18] proved that strong statistical stability holds on Qb,c for all constants b, c > 0. That is, the map a → ρa = dνa /dm from Qb,c → L1 is continuous. (See also [16, 17] for the same result restricted to the Benedicks-Carleson parameters [10].) We consider families → T where each T is a quadratic map with parameter a = a depending continuously on . Fix b, c > 0 such that a0 ∈ Qb,c . We claim that Z lim z dνa = 0. →0 a ∈Qb,c Moreover, using Corollary 2.8 we obtain convergence in L1 (ν) for every absolutely continuous probability measure ν. Given the above results on strong statistical stability, it suffices to verify that T is a uniform family of nonuniformly expanding maps. For the Benedicks-Carleson parameters, the method in [16, 17] is the approach of [2] and we can apply Remark 4.5. In the general case, a different method exploiting negative Schwarzian derivative and Koebe spaces [18, Proof of Theorem B in Section 6] shows that the conditions in [5] are satisfied. By Remark 4.5, this completes the proof of averaging with the possible exception of condition (3). However, a standard consequence of negative Schwarzian derivative and the Koebe distortion property (as discussed in [18, Lemma 4.1] and used in [18, Remark 3.2]) is that bounded distortion holds at intermediate steps and not just at the inducing time as in condition (4). Hence there is a uniform constant C̃1 such that |Tj x−Tj y| x−F y| ≤ C̃1 |Fdiam for all partition elements a, all x, y ∈ a and all j ≤ τ (a). Y diam T j a In particular, |Tj x − Tj y| ≤ (2C̃1 / diam Y )|F x − F y|, yielding condition (3) uniformly in . Next, we discuss rates of convergence. By [18, Lemma 4.1], condition (ii) in 1 Definition 4.2 is satisfied for any p > 1. Hence |δ |Lq (ν ) = O( 2 − ). If 7→ a is R 1 C 1 , then it follows from Baladi et al. [9] that R = |ρ − ρ0 | dm = O( 2 − ). By 1 Remark 4.4, we obtain averaging with rate O( 2 − ) in Lq (ν ) for all q > 0 and in L1 (ν0 ). 24 Example 5.3 (Multimodal maps) Freitas & Todd [18] also consider families of multimodal maps where each critical point c satisfies a Collet-Eckmann condition along the orbit of T c with constants uniform in . Hence the averaging result for the quadratic family in Example 5.2 extends immediately to multimodal maps. Example 5.4 (Viana maps) Viana [35] introduced a C 3 open class of multidimensional nonuniformly expanding maps T : M → M . For definiteness, we restrict attention to the case M = S 1 × R. Let S : M → M be the map S(θ, y) = (16θ mod 1, a0 + a sin 2πθ − y 2 ). Here a0 is chosen so that 0 is a preperiodic point for the quadratic map y 7→ a0 − y 2 and a is fixed sufficiently small. Let T , 0 ≤ < 0 be a continuous family of C 3 maps sufficiently close to S. It follows from [1, 5] that there is an interval I ⊂ (−2, 2) such that, for each ∈ [0, 0 ), there is a unique absolutely continuous T -invariant ergodic probability measure ν supported in the interior of S 1 × I. Moreover the invariant set Λ = supp ν attracts almost every initial condition in S 1 × I. By Alves & Viana [5], ν0 is strongly statistically stable. Moreover, the inducing method of [4] and the arguments in [2] apply to this example, so T is a uniform family of nonuniformly expanding maps by Remark 4.5. Also, Corollary 2.8 is applicable by Remark 2.9. 
Hence we obtain averaging in L1 (ν ) and in L1 (µ) for all absolutely continuous µ. 6 Averaging for continuous time fast-slow systems Let φt : M → M , 0 ≤ < 0 , be a family of semiflows defined on the metric space (M, dM ). For each ≥ 0, let ν denote a φt -invariant ergodic Borel probability measure. Let a : Rd × M × [0, 0 ) → Rd be a family of vector fields on Rd satisfying conditions (2.3)—(2.5). We consider the family of fast-slow systems ẋ() = a(x() , y () , ), x() (0) = x0 , y () (t) = φt y0 , where the initial condition x() (0) = x0 is fixed throughout. The initial condition y0 ∈ M is again chosen randomly with respect to various measures that are specified in the statements of the results. Define x̂() : [0, 1] → Rd by setting x̂() (t) = x() (t/). Let X : [0, 1] → Rd be the solution to the ODE (2.2) and define z = sup |x̂() (t) − X(t)|. t∈[0,1] d R Recall that E = {x ∈ R : |x − x0 | ≤ L1 }. As in Section 2.1, define ā(x, ) = a(x, y, ) dν (y) and let v,x (y) = a(x, y, ) − ā(x, ). We define the order function M 25 δ = δ1, + δ2, : M → R where t Z δ1, = sup sup |v,x,t | where v,x ◦ φs ds, v,x,t = x∈E 0≤t≤1/ 0 Z δ2, = sup sup |V,x,t | where V,x,t = x∈E 0≤t≤1/ t (Dv,x ) ◦ φs ds. 0 The next results is the continuous time analogue of Theorem 2.2. The proof is entirely analogous, and hence is omitted. R Theorem 6.1 Let S = supx∈E | M a(x, y, 0) (dν − dν0 )(y)| + . Assume conditions (2.3)—(2.5). If δ ≤ 21 , then z ≤ 6e2L (δ + S ). As in the discrete time setting, statistical stability together with L1 convergence of δ implies averaging, and similarly for rates of averaging. In the remainder of this section, we consider statistical stability and convergence of δ . Under mild assumptions, we show that results for continuous time follow directly from those for discrete time with identical convergence rates. Fix a Borel subset M 0 ⊂ M and a finite Borel measure m0 supported on M 0 . Let h : M 0 → R+ be a family of Lipschitz functions such that φh (y) (y) ∈ M 0 for almost all y ∈ M 0 . The map T : M 0 → M 0 , T (y) = φh (y) (y), is then defined almost everywhere. We suppose throughout that there are constants K2 ≥ K1 ≥ 1 such that for all x, y ∈ M 0 , ∈ [0, 0 ), • K1−1 ≤ h ≤ K1 , Lip h ≤ K1 , |h − h0 |∞ ≤ K1 . • dM (φt x, φt y) ≤ K2 dM (x, y) and dM (φt y, φ0t y) ≤ K2 for all t ≤ K1 . (These assumptions are easily weakened; in particular changing the estimates to 1/2 will not affect anything.) As usual, we suppose that there is a family ν0 of ergodic T -invariant probability measures on M 0 . Define the suspension Mh = {(y, u) ∈ M 0 × R : 0 ≤ u ≤ h (y)}/ ∼ where (y, h (y)) ∼ (T y, 0). The suspension semiflow ft : Mh →RMh is given by ft (y, u) = (y, u + t) computed modulo identifications. Let h̄ = M 0 h dν0 . Then ν00 = (ν0 × Lebesgue)/h̄ is an ergodic absolutely continuous ft -invariant probability measure on Mh . The projection π : Mh → M given by π (y, u) = φu y is a semiconjugacy between ft and φt . Hence ν = π,∗ ν00 is an ergodic φt -invariant probability measure on M . It is easy to see that if ν0 is absolutely continuous on M 0 , then ν is absolutely continuous on M . Moreover, (strong) statistical stability of ν00 implies (strong) statistical stability of ν0 . Rates of strong statistical stability are also preserved. Let ρ = dν /dm and ρ0 = dν0 /dm0 denote the relevant densities. 26 Proposition 6.2 S ≤ 4K24 L Z + M0 |ρ0 − ρ00 | dm0 . Proof Fix x and let v(y) = a(x, y, 0). 
Then Z Z Z 00 v ◦ π dν − v ◦ π0 dν000 v (dν − dν0 ) = M M h0 M h Z h Z v◦ = (1/h̄ ) M0 π du dν0 Z Z − (1/h̄0 ) M0 0 h0 v ◦ π0 du dν00 0 = I1 + I2 + I3 + I4 where Z Z h Z Z h I1 = (1/h̄ − 1/h̄0 ) v◦ I2 = (1/h̄0 ) (v ◦ π − v ◦ π0 ) du dν0 , 0 0 M 0 M 0 Z Z h0 Z Z h v ◦ π0 du dν0 − v ◦ π0 du dν0 , I3 = (1/h̄0 ) M0 0 M0 0 Z Z h0 Z Z h0 0 0 v ◦ π0 du dν − v ◦ π0 du dν0 . I4 = (1/h̄0 ) M0 π du dν0 , M0 0 0 Now |I1 | ≤ K12 (h̄ |I2 | ≤ K12 − h̄0 )K1 |v|∞ ≤ sup y∈M 0 sup Lip v 0≤u≤K1 K14 L Z + M0 0 dM (φu y, φu y) ≤ K12 K2 L, Z 2 |I4 | ≤ K1 L |ρ0 − ρ00 | dm0 . |I3 | ≤ K1 |v|∞ |h − h0 |∞ ≤ K12 L, R Altogether, M v(dν − dν0 ) ≤ 3K24 L( + |ρ0 − ρ00 | dm0 , M0 R M0 |ρ0 − ρ00 | dm0 ) and the result follows. Next we show how the order function for the flows φt : M → M is related to the order function for the maps T : M 0 → M 0 . We restrict attention to δ1, since the corresponding statement for δ2, is identical. Define the family of induced observables wx, : M 0 → R, Z h (y) w,x (y) = v,x (φu y) du. 0 Note that R M0 w,x dν0 = 0 and w,x is dM -Lipschitz with kw kLip ≤ 2K1 K2 L. Let ∆1, = sup sup |w,x,n | where w,x,n = x∈E 1≤n≤K1 / n−1 X j=0 We can now state our main result for this section. 27 w,x ◦ Tj . Lemma 6.3 Let q ≥ 1. There is a constant C ≥ 1 such that Z Z q q 0 q δ1, dν ≤ C ∆1, dν + . M0 M Proof Let v̂,x = v,x ◦ π and define v̂,x,t = {0, 1, . . . , K1 t} be the number of laps by time t, Rt 0 v̂,x ◦ fu du. Let N,t : Mh → N,t (y, u) = #{s ∈ {0, . . . , K1 t} : fs (y, u) ∈ M 0 }. Then v̂,x,t (y, u) = w,x,N,t (y,u) (y) + H,x ◦ ft (y, u) − H,x (y, u), Ru where H,x (y, u) = 0 v̂,x (y, u0 ) du0 = v̂,x,u (y, 0). Note that |H,x |∞ ≤ 2K1 L. Hence sup |v,x,s | ◦ π (y, u) = sup |v̂,x,s (y, u)| ≤ sup |w,x,Ns (y,u) (y)| + 4K1 L s≤t s≤t s≤t ≤ max |w,x,j (y)| + 4K1 L, j≤K1 t and so δ1, ≤ ∆1, + 4K1 L. The result follows. As a consequence of Proposition 6.2 and Lemma 6.3, many of our results for maps go through immediately for semiflows. For example, suppose that the maps T (y) = φh (y) (y) are a family of quadratic maps as in Example 5.2. Then we obtain averaging in L1 (ν ) and in L1 (µ) for any absolutely continuous probability measure µ 1 on M . Moreover, we obtain the rate O( 2 − ) for convergence in Lq (ν ) for any q > 0. 7 Counterexample for almost sure convergence It is known [7] that almost sure convergence fails for fully-coupled fast-slow systems. Here we give an example to show that almost sure convergence fails also in the simpler context of families of skew products as considered in this paper. Fix β > 0. We consider the family of maps T : [0, 1] → [0, 1] given by T y = 2y + β mod 1 with invariant measure ν taken to be Lebesgue for all ≥ 0. Let a(x, y, ) = cos 2πy. Since a has mean zero, the averaged ODE is given by Ẋ = 0. We take x0 = 0 so that X(t) ≡ 0. Nevertheless, we prove: Proposition 7.1 For every y0 ∈ [0, 1], lim sup→0 x̂() (1) = 1. Proof Let y0 ∈ [0, 1] and δ > 0. Let N = [δ −1/2 ] and choose an integer 1 ≤ k ≤ 2N such that y0 ∈ [−δ β + (k − 1)2−N , −δ β + k2−N ]. Choose such that y0 = −β + (k − 1)2−N . 28 Then δ β − 2−N ≤ β ≤ δ β . If δ is small enough, then δ β − 2−N = δ β − 2−[δ Now −1/2 ] > 0, and so 0 < ≤ δ. yn() = 2n y0 + (2n − 1)β mod 1 = −β + (k − 1)2n−N mod 1, () () for all n ≥ 0. In particular, for n ≥ N we have yn = −β mod 1, and cos 2πyn ≥ 1− P −1 () πβ . Note that N ≤ −1/2 . Hence x̂() (1) = [n=0]−1 cos(2πyn ) = 1+O(1/2 )+O(β ). Since ∈ (0, δ] is arbitrarily small, the result follows. A Proof of first order averaging In this appendix, we prove Theorem 2.2(a). 
Our proof is analogous to that of [32, Theorem 4.3.6], but our setup is different, and we work with discrete time rather than continuous time. First, we consider the case where T is independent of . Let T : M → M be a transformation, and a : Rd × M → Rd and ā : Rd → Rd be functions satisfying kakLip = supy∈M ka(·, y)kLip ≤ L and kākLip ≤ L, where L ≥ 1. For > 0, consider the discrete fast-slow system xn+1 = xn + a(xn , yn ), yn+1 = T yn with x0 ∈ Rd and y0 ∈ M given. Define x̂ : [0, 1] → Rd , x̂ (t) = x[t/] , and let X : [0, 1] → Rd be the solution of the ODE Ẋ = ā(X) with initial condition X(0) = x0 . As in Section 2, we define δ1, n−1 X = sup sup (a(x, yj ) − ā(x)). x 1≤n≤1/ j=0 Theorem A.1 For all > 0, x0 ∈ Rd , y0 ∈ M and t ∈ [0, 1], q 2L δ1, (y0 ) + . |x̂ (t) − X(t)| ≤ 5e First we recall a discrete version of Gronwall’s lemma. Proposition A.2 C, D ≥ 0 such P Suppose that bn ≥ 0 and that there exist constants n that bn ≤ C + D n−1 b for all n ≥ 0. Then b ≤ C(D + 1) . n m=0 m Proof This follows by induction. 29 We are interested in the sequences xn ∈ Rd , yn ∈ M for n ≤ −1 , and for convenience and without loss of generality we assume that a(x, yn ) = ā(x) for all n > −1 and all x ∈ Rd . This gives us n−1 δ (y ) X 1, 0 (a(x, ym ) − ā(x)) ≤ m=0 (A.1) for every n, from the definition of δ1, . Define the local averages of x and a, xN,n N −1 1 X xn+m , = N m=0 N −1 1 X aN,n (x) = a(x, yn+m ). N m=0 The proof consists mainly of five short lemmas, all of them uniform in N, n ∈ Z, x0 ∈ Rd , y0 ∈ M , > 0, with 0 ≤ n ≤ −1 . Lemma A.3 |xn − xN,n | ≤ LN . Proof We have xN,n n+m−1 N −1 N −1 n+m−1 i X X X 1 Xh a(xk , yk ) = xn + = xn + a(xk , yk ), N m=0 N m=0 k=n k=n and so |xn − XN,n | ≤ N |a|∞ ≤ LN . n−1 X aN,m (xm ) ≤ 2L2 N . Lemma A.4 xN,n − x0 − m=0 Proof First compute that N −1 n+m−1 X X xN,n − x0 = xn − x0 + a(xk , yk ) N m=0 k=n N −1 n+m−1 N −1 n+m−1 X X X X a(xk , yk ) = a(xk , yk ). = a(xk , yk ) + N m=0 k=n N m=0 k=0 k=0 n−1 X Next, n−1 N −1 1 XX aN,m (xm ) = a(xm , ym+k ) = A(n, N ) + B(n, N ), N m=0 k=0 m=0 n−1 X 30 where n−1 N −1 1 XX A(n, N ) = [a(xm , ym+k ) − a(xm+k , ym+k )], N m=0 k=0 n−1 N −1 1 XX a(xm+k , ym+k ). N m=0 k=0 B(n, N ) = P PN −1 Note that |A(n, N )| ≤ NL n−1 m=0 k=0 |xm − xm+k | and |xm − xm+k | ≤ Lk ≤ LN so that |A(n, N )| ≤ L2 nN ≤ L2 N . Also B(n, N ) = N −1 n−1 N −1 n+m−1 1 XX 1 X X a(xm+k , ym+k ) = a(xk , yk ). N m=0 k=0 N m=0 k=m Altogether, xN,n − x0 − = N = N n−1 X aN,m (xm ) m=0 N −1 X n+m−1 X m=0 k=0 N −1 m−1 X X N −1 n+m−1 X X a(xk , yk ) − A(n, N ) − a(xk , yk ) N m=0 k=m a(xk , yk ) − A(n, N ). m=0 k=0 Hence |xN,n − x0 − n−1 X aN,m (xm )| ≤ LN + |A(n, N )| ≤ 2L2 N, m=0 as required. Define a sequence un ∈ Rd by setting u0 = x0 and un+1 = un + aN,n (un ). Lemma A.5 |xn − un | ≤ 3L2 eL N . Proof Note that Lip(aN,n ) ≤ L for all N, n. Hence it follows from Lemma A.4 that |xN,n − un | ≤ n−1 X 2 |aN,m (xm ) − aN,m (um )| + 2L N ≤ L m=0 n−1 X m=0 By Lemma A.3, |xn − un | ≤ L n−1 X |xm − um | + 3L2 N. m=0 31 |xm − um | + 2L2 N. By Proposition A.2, |xn − un | ≤ 3L2 N (1 + L)n ≤ 3L2 N (1 + L)1/ ≤ 3L2 eL N . Define a sequence zn ∈ Rd with zn+1 = zn + ā(zn ), Lemma A.6 |un − zn | ≤ z0 = x0 . 2eL δ1, . N Proof By equation (A.1), n+N −1 n−1 2δ X 1 X 1, |aN,n (x) − ā(x)| = [a(x, ym ) − ā(x)] − [a(x, ym ) − ā(x)] ≤ , N m=0 N m=0 for all x. Hence, |un − zn | ≤ n−1 X |aN,m (um ) − ā(zm )| m=0 ≤ n−1 X |aN,m (um ) − ā(um )| + m=0 n−1 X |ā(um ) − ā(zm )| ≤ m=0 By Proposition A.2, |un − zn | ≤ 2δ1, (1 N + L)n ≤ n−1 X 2δ1, |um − zm |. + L N m=0 2eL δ1, . 
Lemma A.7 $|X(n\epsilon)-z_n|\le L^2e^L\epsilon$.

Proof Write
$$X(n\epsilon)=x_0+\sum_{m=0}^{n-1}\int_{m\epsilon}^{(m+1)\epsilon}\bigl[\bar a(X(s))-\bar a(X(m\epsilon))\bigr]\,ds+\epsilon\sum_{m=0}^{n-1}\bar a(X(m\epsilon)).$$
Since $|\bar a(X(t_1))-\bar a(X(t_2))|\le L|X(t_1)-X(t_2)|\le L^2|t_1-t_2|$ for all $t_1,t_2$, we obtain that
$$\Bigl|X(n\epsilon)-x_0-\epsilon\sum_{m=0}^{n-1}\bar a(X(m\epsilon))\Bigr|\le L^2n\epsilon^2\le L^2\epsilon.$$
Hence
$$|X(n\epsilon)-z_n|\le\epsilon\sum_{m=0}^{n-1}|\bar a(X(m\epsilon))-\bar a(z_m)|+L^2\epsilon\le\epsilon L\sum_{m=0}^{n-1}|X(m\epsilon)-z_m|+L^2\epsilon.$$
The result follows from Proposition A.2.

Proof of Theorem A.1 For convenience, note that $L^2e^L\le e^{2L}$ and $2e^L\le e^{2L}$. By Lemmas A.5 and A.6,
$$|x_n-z_n|\le e^{2L}\frac{\delta_{1,\epsilon}}{N\epsilon}+3e^{2L}N\epsilon. \qquad\qquad (A.2)$$
Choosing $N=[\delta_{1,\epsilon}^{1/2}/\epsilon]+1$, we obtain $|x_n-z_n|\le 4e^{2L}\delta_{1,\epsilon}^{1/2}+3e^{2L}\epsilon$. Overall, using Lemma A.7,
$$|x_{[t/\epsilon]}-X(t)|\le|X(t)-X(\epsilon[t/\epsilon])|+|X(\epsilon[t/\epsilon])-z_{[t/\epsilon]}|+|z_{[t/\epsilon]}-x_{[t/\epsilon]}|\le L\epsilon+L^2e^L\epsilon+4e^{2L}\sqrt{\delta_{1,\epsilon}}+3e^{2L}\epsilon\le 5e^{2L}\bigl(\sqrt{\delta_{1,\epsilon}}+\epsilon\bigr).$$
This completes the proof.

Proof of Theorem 2.2(a) Replacing $T$, $a(x,y)$ and $\bar a(x)$ in Theorem A.1 by $T_\epsilon$, $a(x,y,\epsilon)$ and $\bar a(x,\epsilon)$, we obtain that
$$|\hat x_\epsilon(t)-X_\epsilon(t)|\le 5e^{2L}\bigl(\sqrt{\delta_{1,\epsilon}}+\epsilon\bigr),$$
where $\dot X_\epsilon=\bar a(X_\epsilon,\epsilon)$, $X_\epsilon(0)=x_0$. Let $A_\epsilon=\sup_{x\in E}|\bar a(x,\epsilon)-\bar a(x,0)|$. Then
$$|X_\epsilon(t)-X(t)|\le\int_0^t|\bar a(X_\epsilon(s),\epsilon)-\bar a(X(s),0)|\,ds\le tA_\epsilon+\int_0^t|\bar a(X_\epsilon(s),0)-\bar a(X(s),0)|\,ds\le A_\epsilon+L\int_0^t|X_\epsilon(s)-X(s)|\,ds.$$
By Gronwall's lemma, $|X_\epsilon(t)-X(t)|\le e^LA_\epsilon$ for all $t\le1$. Next,
$$A_\epsilon\le L\epsilon+\sup_{x\in E}\Bigl|\int_M a(x,y,0)\,(d\nu_\epsilon-d\nu_0)(y)\Bigr|.$$
Combining these estimates, we obtain that
$$|\hat x_\epsilon(t)-X(t)|\le 5e^{2L}\sqrt{\delta_{1,\epsilon}}+6e^{2L}\epsilon+e^L\sup_{x\in E}\Bigl|\int_M a(x,y,0)\,(d\nu_\epsilon-d\nu_0)(y)\Bigr|,$$
yielding the result.
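Theorem 2.2(b), proved in Appendix B below, involves both order functions $\delta_{1,\epsilon}$ and $\delta_{2,\epsilon}$ defined there. For a concrete family these quantities can be estimated empirically along a single fast orbit. The following Python sketch is again only illustrative: the fast map, the observable $a(x,y)=\sin(x)(y-\tfrac12)-x$ (so that $\bar a(x)=-x$ and $D\bar a(x)=-1$), the grid of $x$ values standing in for the supremum over $x\in E$, and the initial condition $y_0$ are assumptions made only for this computation.

import math

def T(y):                        # fast map (full logistic map), illustrative choice
    return 4.0 * y * (1.0 - y)

def a(x, y):                     # a(x, y); its average over y is abar(x) = -x
    return math.sin(x) * (y - 0.5) - x

def Da(x, y):                    # derivative in x; its average over y is -1
    return math.cos(x) * (y - 0.5) - 1.0

def abar(x):
    return -x

def Dabar(x):
    return -1.0

eps = 1e-3
n_max = int(1.0 / eps)
y0 = 0.271                                   # illustrative initial condition
xs = [k / 10.0 for k in range(-10, 11)]      # grid standing in for the sup over x in E

delta1 = delta2 = 0.0
for x in xs:
    S1 = S2 = 0.0
    y = y0
    for _ in range(n_max):
        S1 += a(x, y) - abar(x)              # partial sums of a(x, y_j) - abar(x)
        S2 += Da(x, y) - Dabar(x)            # partial sums of Da(x, y_j) - Dabar(x)
        delta1 = max(delta1, eps * abs(S1))
        delta2 = max(delta2, eps * abs(S2))
        y = T(y)

print("delta_1,eps ~", delta1)
print("delta_2,eps ~", delta2)
# Rates of Theorems A.1 and B.1, up to the factor 5*exp(2L); the second order
# bound requires delta_1 + delta_2 <= 1/2.
print("first order :", math.sqrt(delta1) + eps)
print("second order:", delta1 + delta2 + eps)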
B Proof of second order averaging

In this appendix, we prove Theorem 2.2(b). This is a quantitative version of a result due to [31], with a somewhat simplified proof. Again we work with discrete time rather than continuous time.

As in Appendix A, we consider first the case where $T\colon M\to M$ is independent of $\epsilon$. Suppose that $a\colon\mathbb{R}^d\times M\to\mathbb{R}^d$ and $\bar a\colon\mathbb{R}^d\to\mathbb{R}^d$ are functions. Assume that $\|a\|_{\operatorname{Lip}}\le L$ and $\|Da\|_{\operatorname{Lip}}\le L$, where $D=\frac{d}{dx}$ and $L\ge1$. Define $\delta_\epsilon=\delta_{1,\epsilon}+\delta_{2,\epsilon}$ where
$$\delta_{1,\epsilon}(y_0)=\sup_x\,\sup_{1\le n\le1/\epsilon}\epsilon\Bigl|\sum_{j=0}^{n-1}\bigl(a(x,y_j)-\bar a(x)\bigr)\Bigr|,\qquad \delta_{2,\epsilon}(y_0)=\sup_x\,\sup_{1\le n\le1/\epsilon}\epsilon\Bigl|\sum_{j=0}^{n-1}\bigl(Da(x,y_j)-D\bar a(x)\bigr)\Bigr|.$$

Theorem B.1 Let $\epsilon>0$, $x_0\in\mathbb{R}^d$. For all $y_0\in M$ with $\delta_\epsilon(y_0)\le\frac12$ and all $t\in[0,1]$,
$$|\hat x_\epsilon(t)-X(t)|\le 5e^{2L}\bigl(\delta_\epsilon(y_0)+\epsilon\bigr).$$

Define a function $u\colon\mathbb{R}^d\times\{0,1,\dots,1/\epsilon\}\to\mathbb{R}^d$ by setting $u(x,0)\equiv0$ and
$$u(x,n)=\frac{\epsilon}{\delta_\epsilon}\sum_{j=0}^{n-1}\bigl(a(x,y_j)-\bar a(x)\bigr),\qquad n\ge1.$$

Proposition B.2 For any $n\le1/\epsilon$, we have $|u(\cdot,n)|_\infty\le1$ and $\operatorname{Lip}u(\cdot,n)\le1$.

Proof For all $x$,
$$|Du(x,n)|=\frac{\epsilon}{\delta_\epsilon}\Bigl|\sum_{j=0}^{n-1}\bigl(Da(x,y_j)-D\bar a(x)\bigr)\Bigr|\le\frac{\delta_{2,\epsilon}}{\delta_\epsilon}\le1.$$
Hence the second estimate follows from the mean value theorem, and the first estimate is easier.

Define a new sequence $w_n$ by setting $w_0=x_0$ and $w_n=x_n-\delta_\epsilon u(w_{n-1},n)$, $n\ge1$.

Lemma B.3 For all $0\le n\le1/\epsilon$ and all $y_0\in M$ with $\delta_\epsilon(y_0)\le\frac12$,
$$\Bigl|w_n-w_0-\epsilon\sum_{k=0}^{n-1}\bar a(w_k)\Bigr|\le 4L\delta_\epsilon.$$

Proof By definition, for all $0\le k\le1/\epsilon$,
$$\delta_\epsilon\bigl[u(w_k,k+1)-u(w_k,k)\bigr]=\epsilon\bigl[a(w_k,y_k)-\bar a(w_k)\bigr].$$
Thus
$$\delta_\epsilon u(w_{n-1},n)=\delta_\epsilon\sum_{k=0}^{n-1}\bigl\{u(w_k,k+1)-u(w_{k-1},k)\bigr\}=\delta_\epsilon\sum_{k=0}^{n-1}\bigl\{u(w_k,k+1)-u(w_k,k)\bigr\}+\delta_\epsilon\sum_{k=0}^{n-1}\bigl\{u(w_k,k)-u(w_{k-1},k)\bigr\}=\epsilon\sum_{k=0}^{n-1}\bigl\{a(w_k,y_k)-\bar a(w_k)\bigr\}+I_n,$$
where $I_n=\delta_\epsilon\sum_{k=1}^{n-1}\bigl\{u(w_k,k)-u(w_{k-1},k)\bigr\}$. This together with the definition of $x_n$ yields
$$w_n=x_0+\epsilon\sum_{k=0}^{n-1}a(x_k,y_k)-\delta_\epsilon u(w_{n-1},n)=w_0+\epsilon\sum_{k=0}^{n-1}a(x_k,y_k)-\epsilon\sum_{k=0}^{n-1}\bigl\{a(w_k,y_k)-\bar a(w_k)\bigr\}-I_n=w_0+\epsilon\sum_{k=0}^{n-1}\bar a(w_k)-I_n+II_n,$$
where $II_n=\epsilon\sum_{k=0}^{n-1}\bigl\{a(x_k,y_k)-a(w_k,y_k)\bigr\}$.

We claim that $|w_{n+1}-w_n-\epsilon\bar a(w_n)|\le 4L\delta_\epsilon\epsilon$ for all $0\le n\le1/\epsilon$. The result follows by summing over $n$.

It remains to prove the claim. It is easy to check that $w_1-w_0-\epsilon\bar a(w_0)=0$. Inductively, suppose that $|w_n-w_{n-1}-\epsilon\bar a(w_{n-1})|\le 4L\delta_\epsilon\epsilon$. Notice that
$$|II_{n+1}-II_n|\le\epsilon\operatorname{Lip}a\,|x_n-w_n|\le\epsilon\operatorname{Lip}a\,\delta_\epsilon|u(w_{n-1},n)|\le L\delta_\epsilon\epsilon.$$
Also,
$$|I_{n+1}-I_n|\le\delta_\epsilon\operatorname{Lip}u\,|w_n-w_{n-1}|\le\delta_\epsilon\bigl(\epsilon|\bar a(w_{n-1})|+4L\delta_\epsilon\epsilon\bigr)\le 3L\delta_\epsilon\epsilon,$$
where the second inequality follows from the induction hypothesis and the third inequality uses $\delta_\epsilon\le\frac12$. Therefore
$$|w_{n+1}-w_n-\epsilon\bar a(w_n)|\le|I_{n+1}-I_n|+|II_{n+1}-II_n|\le 4L\delta_\epsilon\epsilon,$$
proving the claim.

Define the sequence $z_{n+1}=z_n+\epsilon\bar a(z_n)$, $z_0=x_0$.

Corollary B.4 For all $1\le n\le1/\epsilon$ and all $y_0\in M$ with $\delta_\epsilon(y_0)\le\frac12$, we have $|x_n-z_n|\le 5\delta_\epsilon e^{2L}$.

Proof Write
$$w_n-z_n=w_n-x_0-\epsilon\sum_{k=0}^{n-1}\bar a(z_k)=w_n-w_0-\epsilon\sum_{k=0}^{n-1}\bar a(w_k)+\epsilon\sum_{k=0}^{n-1}\bigl\{\bar a(w_k)-\bar a(z_k)\bigr\}.$$
By Lemma B.3,
$$|w_n-z_n|\le 4L\delta_\epsilon+\epsilon\sum_{k=0}^{n-1}|\bar a(w_k)-\bar a(z_k)|\le 4L\delta_\epsilon+\epsilon L\sum_{k=0}^{n-1}|w_k-z_k|.$$
By Proposition A.2, $|w_n-z_n|\le 4L\delta_\epsilon e^L\le 4\delta_\epsilon e^{2L}$. Moreover, $|x_n-w_n|\le\delta_\epsilon|u|_\infty\le\delta_\epsilon$, and the result follows.

Proof of Theorem B.1 This is identical to the proof of Theorem A.1, using Corollary B.4 in place of (A.2), together with Lemma A.7.

Proof of Theorem 2.2(b) This follows from Theorem B.1 in exactly the same way that Theorem 2.2(a) followed from Theorem A.1.

Acknowledgements This research was supported in part by a European Advanced Grant StochExtHomog (ERC AdG 320977). We are grateful to Vitor Araújo, Jorge Freitas, Vilton Pinheiro, Mike Todd and Paulo Varandas for helpful discussions.

References

[1] J. F. Alves. SRB measures for non-hyperbolic systems with multidimensional expansion. Ann. Sci. École Norm. Sup. (4) 33 (2000) 1–32.
[2] J. F. Alves. Strong statistical stability of non-uniformly expanding maps. Nonlinearity 17 (2004) 1193–1215.
[3] J. F. Alves, C. Bonatti and M. Viana. SRB measures for partially hyperbolic systems whose central direction is mostly expanding. Invent. Math. 140 (2000) 351–398.
[4] J. F. Alves, S. Luzzatto and V. Pinheiro. Markov structures and decay of correlations for non-uniformly expanding dynamical systems. Ann. Inst. H. Poincaré Anal. Non Linéaire 22 (2005) 817–839.
[5] J. F. Alves and M. Viana. Statistical stability for robust classes of maps with non-uniform expansion. Ergodic Theory Dynam. Systems 22 (2002) 1–32.
[6] D. V. Anosov. Averaging in systems of ordinary differential equations with rapidly oscillating solutions. Izv. Akad. Nauk SSSR Ser. Mat. 24 (1960) 721–742.
[7] V. Bakhtin and Y. Kifer. Nonconvergence examples in averaging. Geometric and Probabilistic Structures in Dynamics, Contemp. Math. 469, Amer. Math. Soc., Providence, RI, 2008, pp. 1–17.
[8] V. Baladi. On the susceptibility function of piecewise expanding interval maps. Comm. Math. Phys. 275 (2007) 839–859.
[9] V. Baladi, M. Benedicks and D. Schnellmann. Whitney-Hölder continuity of the SRB measure for transversal families of smooth unimodal maps. Invent. Math. To appear.
[10] M. Benedicks and L. Carleson. On iterations of $1-ax^2$ on $(-1,1)$. Ann. of Math. 122 (1985) 1–25.
[11] V. Baladi and M. Todd. Linear response for intermittent maps. Preprint, 2015.
[12] P. Billingsley. Convergence of Probability Measures, second ed., Wiley Series in Probability and Statistics, John Wiley & Sons, New York, 1999.
[13] D. L. Burkholder. Distribution function inequalities for martingales. Ann. Probability 1 (1973) 19–42.
[14] P. Collet and J.-P. Eckmann. Positive Liapunov exponents and absolute continuity for maps of the interval. Ergodic Theory Dynam. Systems 3 (1983) 13–46.
[15] D. Dolgopyat. Averaging and invariant measures. Mosc. Math. J. 5 (2005) 537–576, 742.
[16] J. M. Freitas. Continuity of SRB measure and entropy for Benedicks-Carleson quadratic maps. Nonlinearity 18 (2005) 831–854.
[17] J. M. Freitas. Exponential decay of hyperbolic times for Benedicks-Carleson quadratic maps. Port. Math. 67 (2010) 525–540.
[18] J. M. Freitas and M. Todd. The statistical stability of equilibrium states for interval maps. Nonlinearity 22 (2009) 259–281.
[19] S. Gouëzel. Decay of correlations for nonuniformly expanding systems. Bull. Soc. Math. France 134 (2006) 1–31.
[20] M. V. Jakobson. Absolutely continuous invariant measures for one-parameter families of one-dimensional maps. Comm. Math. Phys. 81 (1981) 39–88.
[21] G. Keller. Stochastic stability in some chaotic dynamical systems. Monatsh. Math. 94 (1982) 313–333.
[22] G. Keller and C. Liverani. Stability of the spectrum for transfer operators. Annali della Scuola Normale Superiore di Pisa, Classe di Scienze 28 (1999) 141–152.
[23] Y. Kifer. Averaging in difference equations driven by dynamical systems. Geometric Methods in Dynamics II, Astérisque 287 (2003) 103–123.
[24] Y. Kifer. Averaging principle for fully coupled dynamical systems and large deviations. Ergodic Theory Dynam. Systems 24 (2004) 847–871.
[25] Y. Kifer. Another proof of the averaging principle for fully coupled dynamical systems with hyperbolic fast motions. Discrete Contin. Dyn. Syst. 13 (2005) 1187–1201.
[26] A. Korepanov. Linear response for intermittent maps with summable and nonsummable decay of correlations. Preprint, 2015.
[27] C. Liverani, B. Saussol and S. Vaienti. A probabilistic approach to intermittency. Ergodic Theory Dynam. Systems 19 (1999) 671–685.
[28] M. Lyubich. Almost every real quadratic map is either regular or stochastic. Ann. of Math. 156 (2002) 1–78.
[29] I. Melbourne and R. Zweimüller. Weak convergence to stable Lévy processes for nonuniformly hyperbolic dynamical systems. Ann. Inst. H. Poincaré Probab. Statist. 51 (2015) 545–556.
[30] F. Móricz. Moment inequalities and the strong laws of large numbers. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 35 (1976) 299–314.
[31] J. A. Sanders. On the fundamental theorem of averaging. SIAM J. Math. Anal. 14 (1983) 1–10.
[32] J. A. Sanders, F. Verhulst and J. Murdock. Averaging Methods in Nonlinear Dynamical Systems, second ed., Applied Mathematical Sciences 59, Springer, New York, 2007.
[33] R. J. Serfling. Moment inequalities for the maximum cumulative sum. Ann. Math. Statist. 41 (1970) 1227–1234.
[34] H. Thunberg. Unfolding of chaotic unimodal maps and the parameter dependence of natural measures. Nonlinearity 14 (2001) 323–337.
[35] M. Viana. Multidimensional nonhyperbolic attractors. Inst. Hautes Études Sci. Publ. Math. (1997) 63–96.
[36] L.-S. Young. Recurrence times and rates of mixing. Israel J. Math. 110 (1999) 153–188.