arXiv:1504.04889v1 [math.PR] 19 Apr 2015

CONTROLLED EQUILIBRIUM SELECTION IN STOCHASTICALLY PERTURBED DYNAMICS

ARI ARAPOSTATHIS, ANUP BISWAS, AND VIVEK S. BORKAR

Abstract. We consider a dynamical system with finitely many equilibria that is perturbed by small noise and, in addition, controlled by an ‘expensive’ control. We study the invariant distribution of the controlled process as the variance of the noise becomes vanishingly small. It is shown that, depending on the relative magnitudes of the noise variance and the ‘running cost’ for control, one can identify three regimes, in each of which the optimal control forces the invariant distribution of the process to concentrate near equilibria that can be characterized according to the regime. We also show that in the vicinity of the points of concentration the density of the invariant distribution approximates the density of a Gaussian, and we explicitly solve for its covariance matrix.

1. Introduction

The study of dynamical systems has a long and profound history. A lot of effort has been devoted to understanding the behavior of a system when it is perturbed by additive noise [6, 16, 22]. Small noise diffusions have found applications in climate modeling [4, 5], electrical engineering [9, 27], finance [15] and many other areas. In this article we consider a controlled dynamical system with small noise, which is modelled as a $d$-dimensional controlled diffusion $X = [X_1, \dots, X_d]^{\mathsf T}$ governed by the stochastic integral equation
\[
X_t = X_0 + \int_0^t \bigl( m(X_s) + \varepsilon U_s \bigr)\, ds + \varepsilon^{\nu} W_t , \qquad t \ge 0 . \tag{1.1}
\]
Here:
• $m = [m_1, \dots, m_d]^{\mathsf T} : \mathbb{R}^d \to \mathbb{R}^d$ is a bounded $C^\infty$ function with bounded derivatives,
• $W$ is a standard Brownian motion in $\mathbb{R}^d$,
• $U$ is an $\mathbb{R}^d$-valued control process with measurable paths satisfying the nonanticipativity condition: for $t > s$, $W_t - W_s$ is independent of $X_0$, $W_r$, $U_r$, $r \le s$. As pointed out in [10, p. 18], we may, without loss of generality, consider $U$ to be adapted to the natural filtration of $X$.
(The set of such ‘admissible’ controls is denoted by $\mathfrak{U}$.)
• $0 < \varepsilon \ll 1$,
• $\nu > 0$.

Date: April 21, 2015. 2000 Mathematics Subject Classification. 35R60, 93E20. Key words and phrases. Controlled diffusion, equilibrium selection, large deviations, small noise, expensive controls, ergodic control, HJB equation, ergodic LQG.

We view (1.1) as the o.d.e. (for ordinary differential equation)
\[
\dot{x}(t) = m\bigl(x(t)\bigr) , \tag{1.2}
\]
perturbed by the ‘small noise’ $\varepsilon^{\nu} W_t$ (‘small’ because $\varepsilon \ll 1$) and the control $\varepsilon U_t$. In the case when the control $U_t \equiv 0$, Freidlin and Wentzell developed in [16] a general framework, based on the theory of large deviations, for the analysis of dynamical systems perturbed by small noise. The goal of this article is to study the effect of the additional control as the parameter $\varepsilon$ tends to $0$. We assume that the set of nonwandering points of (1.2) consists of finitely many hyperbolic equilibria, and that these are contained in some bounded open set which is positively invariant under the flow. The objective is to minimize the long run average (or ergodic) cost
\[
\mathbb{E}\biggl[ \limsup_{T\to\infty}\, \frac{1}{T} \int_0^T \Bigl( \ell(X_s) + \tfrac{1}{2}\,|U_s|^2 \Bigr)\, ds \biggr] \tag{1.3}
\]
over all admissible controls. Here $\ell : \mathbb{R}^d \to \mathbb{R}_+$ is a prescribed smooth, Lipschitz function satisfying the condition
\[
\lim_{|x|\to\infty} \ell(x) = \infty .
\]
Since $\varepsilon$ is small, the control enters the dynamics only through $\varepsilon U$ while it is charged at the full rate $\tfrac12 |U|^2$; in this sense the control is expensive. Under a stochastic Lyapunov condition that we introduce later, the cost is finite for $U \equiv 0$, which ensures in particular that the set of controls with finite cost is nonempty. It is evident from ergodic theory that for $U \equiv 0$ the limit in (1.3) is the integral of $\ell$ with respect to the invariant measure of (1.1). It is not hard to show that as $\varepsilon \searrow 0$, the collection of invariant measures is tight and concentrates on the set of equilibria of $m$. To find the actual support of the limit, in the case of multiple equilibria, one often looks at the large deviation properties of these invariant measures [16].
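The dynamics (1.1) and the cost (1.3) can be explored numerically. The following is a minimal sketch, assuming a scalar state, an Euler–Maruyama discretization, and a stationary feedback control $U_t = v(X_t)$; the function names, the drift `-tanh(x)` and the step size are illustrative assumptions, not part of the paper.

```python
import math
import random


def simulate(m, v, x0, eps, nu, dt, n_steps, seed=0):
    """Euler-Maruyama path of dX = (m(X) + eps*v(X)) dt + eps**nu dW (scalar state).

    m, v : callables for the drift and the feedback control (both hypothetical here).
    Returns the list [X_0, X_dt, ..., X_{n_steps*dt}].
    """
    rng = random.Random(seed)
    sigma = eps ** nu                      # noise intensity eps^nu from (1.1)
    x = x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x = x + (m(x) + eps * v(x)) * dt + sigma * dw
        path.append(x)
    return path


def running_average_cost(path, ell, v):
    """Empirical time average of ell(x) + |v(x)|^2 / 2 along a path, cf. (1.3)."""
    n = len(path) - 1
    return sum(ell(x) + 0.5 * v(x) ** 2 for x in path[:-1]) / n


if __name__ == "__main__":
    # Illustrative data: bounded smooth drift with a single stable equilibrium at 0,
    # zero control, and a smooth Lipschitz running cost.
    drift = lambda x: -math.tanh(x)
    no_control = lambda x: 0.0
    cost = lambda x: math.sqrt(1.0 + x * x) - 1.0
    p = simulate(drift, no_control, x0=1.0, eps=0.1, nu=1.5, dt=0.01, n_steps=20000)
    print(running_average_cost(p, cost, no_control))
```

As $\varepsilon$ decreases, the empirical average for this toy drift should approach $\ell(0)$, in line with the concentration of the invariant measures on the set of equilibria described above; the sketch does not attempt to compute the optimal control.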
There are several studies in the literature that deal with the large deviation principle for invariant measures of dynamical systems. Among the most relevant to the present work are [14, 24], which obtain a large deviation principle for invariant measures (more precisely, invariant densities) of (1.1) under the assumption that there is only one equilibrium point. This is generalized to multiple equilibria in [8]. A large deviation principle for invariant measures for a class of reaction-diffusion systems is established in [13]. None of the above mentioned studies has any control component in its dynamics. One of our motivations for this work is to add a control to the dynamics and study its effect, alongside that of the noise, on the selection of equilibria.

Remark 1.1. The vector field $m$ is assumed bounded for simplicity. The reader might notice, however, that the hypotheses in [3, Section 4.6.1], which underlie the regularity results in [3] on which the characterization of optimality is based (see Theorem 2.2), permit $m$ to be unbounded as long as
\[
\limsup_{|x|\to\infty} \frac{|m(x)|^2}{\ell(x)} < \infty .
\]
Provided that this condition is satisfied, the assumption that the drift is bounded can be waived, and all the results of this paper hold unaltered, with the proofs requiring no major modification.

Let us now state the following result on existence of solutions to (1.1), whose proof is given in Appendix A.

Lemma 1.1. Under the condition $\mathbb{E}\bigl[\int_0^t |U_s|^2\, ds\bigr] < \infty$ for all $t \ge 0$, the diffusion in (1.1) has a unique weak solution.

The qualitative properties of the dynamics are best understood if we consider the special case $d = 1$ and $m = -\frac{dF}{dx}$ for some continuously differentiable $F : \mathbb{R} \to \mathbb{R}$. Then the trajectory of (1.2) converges to a critical point of $F$. In fact, generically (i.e., for $x(0)$ in an open dense set) it converges to a stable one, i.e., to a local minimum. If one views the graph of $F$ as a ‘landscape’, the local minima are the bottoms of its ‘valleys’.
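The ‘landscape’ picture can be illustrated with a few lines of code. The sketch below is only an illustration under assumptions of our own: the double-well potential $F(x) = x^4/4 - x^2/2$, the explicit Euler scheme and the step size are hypothetical choices, not taken from the paper.

```python
def gradient_flow(dF, x0, dt=1e-3, n_steps=20000):
    """Explicit Euler approximation of x'(t) = -F'(x(t)) started at x0.

    dF is the derivative of the (hypothetical) potential F; the function
    returns the state reached after n_steps steps of size dt.
    """
    x = x0
    for _ in range(n_steps):
        x -= dF(x) * dt
    return x


if __name__ == "__main__":
    # Illustrative double well F(x) = x**4/4 - x**2/2 with F'(x) = x**3 - x:
    # local minima ('valleys') at -1 and 1, local maximum at 0.
    dF = lambda x: x ** 3 - x
    for start in (0.5, -0.3, 0.0):
        print(start, gradient_flow(dF, start))
```

Starting points on either side of $0$ slide into the nearest valley, while the nongeneric starting point $x(0) = 0$ stays at the unstable critical point.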
The behavior of the stochastically perturbed (albeit uncontrolled) version of this model, notably the analysis of where the stationary probability distribution concentrates, has been of considerable interest to physicists (see, e.g., [23, Chapter 8] or [16, Chapter 6]). Recent work on ‘stochastic resonance’ (see, e.g., [21]) introduces an additional external input to the dynamics that may be viewed as a control. This is chosen so that it ‘resonates’ with the noise in an appropriate sense and induces transitions between valleys. The model in (1.1) goes a step further and considers the full-fledged optimal control version of this, wherein one tries to induce a preferred equilibrium behavior through a feedback control. The reason the latter has to be ‘expensive’ is that this captures the physically realistic situation in which one can ‘tweak’ the dynamics but cannot replace it by something altogether different without incurring considerable expense. The function $\ell$ captures the relative preference among different points in the state space. Let us also point out that this problem can be viewed as a multi-scale diffusion problem, since the control and the noise are scaled differently.

The following hypothesis on the vector field $m$ is in effect throughout the paper.

Hypothesis 1.1. The set $\mathcal{S} := \{x \in \mathbb{R}^d : m(x) = 0\}$ is finite and its elements are hyperbolic, i.e., the Jacobian matrix $Dm(x)$ of $m$ at each point $x \in \mathcal{S}$ has no eigenvalues on the imaginary axis. Also, there exist a twice continuously differentiable function $\bar{\mathcal{V}} : \mathbb{R}^d \to \mathbb{R}_+$ and a bounded set $K \subset \mathbb{R}^d$ containing $\mathcal{S}$, with the following properties:
(H1) $c_1 |x|^2 \le \bar{\mathcal{V}}(x) \le c_2 |x|^2$ for some positive constants $c_1$, $c_2$, and all $x \in K^c$.
(H2) $\nabla\bar{\mathcal{V}}$ is Lipschitz and satisfies
\[
\bigl\langle m(x), \nabla\bar{\mathcal{V}}(x) \bigr\rangle < -2\gamma\, |x| \tag{1.4}
\]
for some $\gamma > 0$, and all $x \in K^c$.

We need some definitions.

Definition 1.1. Let $\mathcal{S}_s \subset \mathcal{S}$ denote the set of stable equilibria of (1.2), i.e., the set of points $z \in \mathcal{S}$ for which the eigenvalues of $Dm(z)$ have negative real parts.
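Throughout the paper $\beta_*^\varepsilon$ denotes the optimal value of (1.3), $\eta_*^\varepsilon$ denotes the stationary probability distribution of the process $X$ under an optimal stationary Markov control (which is denoted by $v_*^\varepsilon$), and $\varrho_*^\varepsilon$ its density (for existence and uniqueness see Theorem 2.2). We also define the ‘running cost’ $R[v] : \mathbb{R}^d \to \mathbb{R}$ corresponding to a stationary Markov control $v : \mathbb{R}^d \to \mathbb{R}^d$ by
\[
R[v](x) := \ell(x) + \tfrac{1}{2}\, |v(x)|^2 .
\]
We say that a set $K \subset \mathbb{R}^d$ is stochastically stable (or that $\eta_*^\varepsilon$ concentrates on $K$) if it is compact and $\lim_{\varepsilon\searrow0} \eta_*^\varepsilon(K^c) = 0$. It is evident that the class $\mathfrak{S}$ of stochastically stable sets, if nonempty, is closed under intersections. Hence we define the minimal stochastically stable set $\mathscr{S}$ by
\[
\mathscr{S} := \bigcap_{K \in \mathfrak{S}} K .
\]

Definition 1.2. For a square matrix $M \in \mathbb{R}^{d\times d}$, let $E_+(M)$ denote the sum of its eigenvalues that lie in the open right half complex plane. For $z \in \mathcal{S}$, and with $M \equiv Dm(z)$, we let $\widehat{Q}_z$ and $\widehat{\Sigma}_z$ be the symmetric, nonnegative definite, square matrices solving the pair of equations
\[
\begin{aligned}
M^{\mathsf T} \widehat{Q}_z + \widehat{Q}_z M &= \widehat{Q}_z^{\,2} , \\
\bigl(M - \widehat{Q}_z\bigr) \widehat{\Sigma}_z + \widehat{\Sigma}_z \bigl(M - \widehat{Q}_z\bigr)^{\mathsf T} &= -I .
\end{aligned} \tag{1.5}
\]
By Lemma 3.4, which appears later in the paper, there exists a unique pair $(\widehat{Q}_z, \widehat{\Sigma}_z)$ of symmetric positive semidefinite matrices solving (1.5). It is also evident from (1.5) that $\widehat{\Sigma}_z$ is invertible.

The main results of the paper are summarized in Theorems 1.1–1.3 that follow.

Theorem 1.1. The minimal stochastically stable set $\mathscr{S}$ is a subset of $\mathcal{S}$ for all $\nu > 0$. We define
\[
\begin{aligned}
\mathcal{Z}_1 &:= \operatorname*{Arg\,min}_{z\in\mathcal{S}}\, \ell(z) , &
\mathcal{J}_1 &:= \min_{z\in\mathcal{S}}\, \ell(z) , \\
\mathcal{Z}_2 &:= \operatorname*{Arg\,min}_{z\in\mathcal{S}}\, \bigl\{ \ell(z) + E_+\bigl(Dm(z)\bigr) \bigr\} , &
\mathcal{J}_2 &:= \min_{z\in\mathcal{S}}\, \bigl\{ \ell(z) + E_+\bigl(Dm(z)\bigr) \bigr\} , \\
\mathcal{Z}_3 &:= \operatorname*{Arg\,min}_{z\in\mathcal{S}_s}\, \ell(z) , &
\mathcal{J}_3 &:= \min_{z\in\mathcal{S}_s}\, \ell(z) .
\end{aligned}
\]
The set $\mathscr{S}$ and the optimal value $\beta_*^\varepsilon$ depend on $\nu$ as follows:
(i) For $\nu > 1$ (‘supercritical’ regime), $\mathscr{S} \subset \mathcal{Z}_1$, and $\beta_*^\varepsilon = \mathcal{J}_1 + O(\varepsilon^{2\nu-2})$ if $\nu \in (1, 2)$, while $O(\varepsilon^{2}) \le \beta_*^\varepsilon - \mathcal{J}_1 \le O(\varepsilon^{2\nu-2})$ for $\nu \ge 2$.
(ii) For $\nu < 1$ (‘subcritical’ regime), $\mathscr{S} \subset \mathcal{Z}_3$, and $O(\varepsilon^{2-2\nu}) \le \beta_*^\varepsilon - \mathcal{J}_3 \le O(\varepsilon^{4\nu-2})$ if $\nu \in (2/3, 1)$, while $\beta_*^\varepsilon = \mathcal{J}_3 + O(\varepsilon^{\nu})$ if $\nu \le 2/3$.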
(iii) For $\nu = 1$ (‘critical’ regime), $\mathscr{S} \subset \mathcal{Z}_2$, $\lim_{\varepsilon\searrow0} \beta_*^\varepsilon = \mathcal{J}_2$, and $\beta_*^\varepsilon \le \mathcal{J}_2 + O(\varepsilon^{2})$.

To prove Theorem 1.1, we first identify the optimal control for $\varepsilon > 0$ from the associated Hamilton–Jacobi–Bellman (HJB) equation and establish a rigorous connection of the HJB equation studied in [2] with the ergodic control problem. It is not hard to show that the optimal invariant measures $\eta_*^\varepsilon$ concentrate on $\mathcal{S}$ as $\varepsilon \searrow 0$ (see Lemma 3.1). In Theorem 1.1 we identify three regimes, corresponding to different values of $\nu$, and characterize the limiting value of $\beta_*^\varepsilon$ in each. For $\nu > 1$ one can find a control $U$ under which the invariant measure of the dynamics (1.1) concentrates on a point in $\mathcal{S}$. Construction of invariant measures with similar properties is also possible for $z \in \mathcal{S}_s$ when $\nu < 1$. The important difference is that for $\nu < 1$ the optimal invariant measure $\eta_*^\varepsilon$ cannot concentrate on $\mathcal{S} \setminus \mathcal{S}_s$ (see Lemma 3.3). To show this fact we construct a suitable Lyapunov function for the Morse–Smale dynamics (see Theorem 2.1). The analysis in the critical regime $\nu = 1$ turns out to be more subtle than in the other two regimes. To establish the results for this regime we study the ergodic control problem for LQG systems (Lemma 3.4). Using certain moment estimates (Lemma 3.5), we scale the space suitably and show that the resulting invariant measures are also tight. In particular, we examine the asymptotic behavior of $\eta_*^\varepsilon$ and show that under an appropriate spatial scaling it ‘converges’ to a Gaussian distribution in the vicinity of the minimal stochastically stable set.

Theorem 1.2. Let $\nu \in (0, 2)$. Suppose that for $z \in \mathcal{S}$ there exists a sequence $\varepsilon_n \searrow 0$ such that, for some open neighborhood $N$ of $z$ whose closure does not contain any other elements of $\mathcal{S}$, it holds that $\liminf_{\varepsilon_n\searrow0} \eta_*^{\varepsilon_n}(N) > 0$. Then along this sequence we have
\[
\frac{\varepsilon^{\nu d}\, \varrho_*^\varepsilon(\varepsilon^{\nu} x + z)}{\eta_*^\varepsilon(N)}
\;\xrightarrow[\varepsilon\searrow0]{}\;
\frac{1}{(2\pi)^{d/2}\, |\det \widehat{\Sigma}_z|^{1/2}}\,
\exp\Bigl( -\tfrac{1}{2}\, x^{\mathsf T} \widehat{\Sigma}_z^{-1} x \Bigr) ,
\]
uniformly on compact sets, where $\det \widehat{\Sigma}_z$ denotes the determinant of $\widehat{\Sigma}_z$.

The following theorem shows that in the supercritical case, the set $\mathscr{S}$ contains only those points $z \in \mathcal{Z}_1$ for which $E_+\bigl(Dm(z)\bigr)$ is minimal. In particular, if $\mathcal{Z}_1 \cap \mathcal{S}_s \ne \emptyset$, then $\mathscr{S} \subset \mathcal{Z}_3$.

Theorem 1.3. We assume $\nu \in (1, 2)$. Define the subset $\mathcal{Z}_1^* \subset \mathcal{Z}_1$ by
\[
\mathcal{Z}_1^* := \operatorname*{Arg\,min}_{z\in\mathcal{Z}_1}\, E_+\bigl(Dm(z)\bigr) .
\]
If $z \in \mathcal{S} \setminus \mathcal{Z}_1^*$ and $N$ is a neighborhood of $z$ whose closure does not contain other elements of $\mathcal{S}$, then $\eta_*^\varepsilon(N) \to 0$ as $\varepsilon \searrow 0$. Moreover, for $z \in \mathcal{Z}_1^*$, it holds that
\[
\beta_*^\varepsilon = \ell(z) + \varepsilon^{2\nu-2}\, E_+\bigl(Dm(z)\bigr) + o(\varepsilon^{2\nu-2}) . \tag{1.6}
\]

Remark 1.2. It is evident from Theorem 1.1 (i) that $\nu = 2$ is a critical value. For $\nu \in (1, 2)$ and a point $z \in \mathcal{Z}_1^* \setminus \mathcal{S}_s$ we have $\liminf_{\varepsilon\searrow0} \varepsilon^{2-2\nu} \bigl( \beta_*^\varepsilon - \ell(z) \bigr) > 0$. Since $\beta_*^\varepsilon = \int_{\mathbb{R}^d} \ell\, d\eta_*^\varepsilon + \int_{\mathbb{R}^d} \tfrac12 |v_*^\varepsilon|^2\, d\eta_*^\varepsilon$, this means that the control effort $\int_{\mathbb{R}^d} \tfrac12 |v_*^\varepsilon|^2\, d\eta_*^\varepsilon$ exceeds the reduction $\ell(z) - \int_{\mathbb{R}^d} \ell\, d\eta_*^\varepsilon$ in the state cost for all $\varepsilon$ sufficiently small. The opposite can occur if $\nu > 2$. A simple example where this happens is the one-dimensional model with data $m(x) = x$ and $\ell(x) = (x + 1)^2$. For this example, direct substitution shows that the solution of the HJB equation (see (2.7)) is
\[
V^\varepsilon(x) = \frac{1 - b(\varepsilon)}{b(\varepsilon)}\, \bigl( x + b(\varepsilon) \bigr)^2 ,
\qquad
\beta_*^\varepsilon = 1 + \varepsilon^{2\nu}\, \frac{1 - b(\varepsilon)}{b(\varepsilon)} - b(\varepsilon) \bigl( 2 - b(\varepsilon) \bigr) ,
\]
where
\[
b(\varepsilon) := 2\varepsilon^2 \Bigl( 1 + 2\varepsilon^2 + \sqrt{1 + 2\varepsilon^2} \Bigr)^{-1} .
\]
A simple calculation shows that if $\nu \in (1, 2)$ then $\liminf_{\varepsilon\searrow0} \varepsilon^{2-2\nu} (\beta_*^\varepsilon - 1) > 0$, while if $\nu > 2$, then $\limsup_{\varepsilon\searrow0} \varepsilon^{-2} (\beta_*^\varepsilon - 1) < 0$. Therefore (1.6) does not hold for this example when $\nu > 2$.

Theorem 1.1 is the combination of Lemma 3.1, Lemma 3.3, Corollary 3.1, Corollary 3.2 and Theorems 3.1–3.3 presented in Section 3. Theorem 1.2 follows from Lemmas 3.7 and 3.8; see also Theorems 3.5 and 3.6 for related results. The proof of Theorem 1.3 can be found in Section 3.5.

1.1. Notation. The following notation is used in this paper. The symbols $\mathbb{R}$ and $\mathbb{C}$ denote the fields of real numbers and complex numbers, respectively. Also, $\mathbb{N}$ denotes the set of natural numbers. The Euclidean norm on $\mathbb{R}^d$ is denoted by $|\cdot|$, and $\langle \cdot\,, \cdot \rangle$ denotes the inner product.
A ball of radius $r > 0$ in $\mathbb{R}^d$ around a point $x$ is denoted by $B_r(x)$, or by $B_r$ if $x = 0$. For a compact set $K$, $B_r(K)$ denotes the open $r$-neighborhood of $K$. For a set $A \subset \mathbb{R}^d$, we use $\bar{A}$, $A^c$, and $\partial A$ to denote the closure, the complement, and the boundary of $A$, respectively. We write $A \Subset B$ to indicate that $\bar{A} \subset B$. We define $C_b^k(\mathbb{R}^d)$, $k \ge 0$, as the set of functions whose $i$-th derivatives, $i = 1, \dots, k$, are continuous and bounded in $\mathbb{R}^d$, and denote by $C_c^k(\mathbb{R}^d)$ the subset of $C_b^k(\mathbb{R}^d)$ consisting of functions with compact support. The space of all probability measures on a Polish space $\mathcal{X}$ with the Prohorov topology is denoted by $\mathcal{P}(\mathcal{X})$. The density of the $d$-dimensional Gaussian distribution with mean $0$ and covariance matrix $\Sigma$ is denoted by $\rho_\Sigma$.

The symbols $O(x)$ and $o(x)$ denote the sets of functions $f : \mathbb{R}^d \to \mathbb{R}$ having the property
\[
\limsup_{|x|\searrow0} \frac{|f(x)|}{|x|} < \infty ,
\qquad\text{and}\qquad
\limsup_{|x|\searrow0} \frac{|f(x)|}{|x|} = 0 ,
\]
respectively. Abusing the notation, $O(x)$ and $o(x)$ occasionally denote generic members of these sets. Also $\kappa_1, \kappa_2, \dots$ are generic constants whose definition differs from place to place.

In Section 2 we present a basic property of gradient-like flows (Theorem 2.1), and characterize the optimal control problem via an HJB equation (Theorem 2.2). Section 3 is devoted to the study of the support of the limit, as $\varepsilon \searrow 0$, of the optimal stationary probability distribution $\eta_*^\varepsilon$. Appendix A contains the more technical proofs of Lemma 1.1 and Theorem 2.2.

2. Gradient-like flows and the optimal control problem

Recall the function $\bar{\mathcal{V}}$ defined in Hypothesis 1.1. Since $\nabla\bar{\mathcal{V}}$ is Lipschitz, $\Delta\bar{\mathcal{V}}$ is bounded, and thus (1.4) implies that for
\[
L_0^\varepsilon f(x) := \frac{\varepsilon^{2\nu}}{2}\, \Delta f(x) + \bigl\langle m(x), \nabla f(x) \bigr\rangle
\qquad \forall\, x \in \mathbb{R}^d ,\ f \in C^2(\mathbb{R}^d) ,
\]
we have
\[
L_0^\varepsilon \bar{\mathcal{V}}(x) \le \gamma_0 - \gamma\, |x| \qquad \forall\, \varepsilon \in (0, 1) ,
\]
for some constant $\gamma_0 > 0$. This is the ‘stochastic Lyapunov condition’, which implies in particular that the process $X$ with $U \equiv 0$ has a unique stationary probability distribution $\eta_0^\varepsilon$, and
\[
\lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}\biggl[ \int_0^T |X_t|\, dt \biggr]
= \int_{\mathbb{R}^d} |x|\, \eta_0^\varepsilon(dx) \le \frac{\gamma_0}{\gamma}
\qquad \forall\, \varepsilon \in (0, 1) . \tag{2.1}
\]
In view of Theorem 2.2 and Proposition 2.3 of [19], this follows from the ergodic theory of Markov processes. Since $\ell$ is Lipschitz, (2.1) implies that there exists a constant $C$ independent of $\varepsilon$ such that $\int \ell\, d\eta_0^\varepsilon \le C < \infty$. Moreover, from [8] there exists a unique Lipschitz continuous function $Z \ge 0$ such that $\min_{\mathbb{R}^d} Z = 0$, $Z(x) \to \infty$ as $|x| \to \infty$, and
\[
Z(x) = \inf_{\varphi \,:\, \varphi(t) \to x_i ,\ x_i \in \mathcal{S}}
\biggl[ \frac{1}{2} \int_0^\infty \bigl| \dot{\varphi}(s) + m\bigl(\varphi(s)\bigr) \bigr|^2 ds + Z(x_i) \biggr] ,
\qquad \varphi(0) = x ,
\]
and, if $\varrho_0^\varepsilon$ denotes the density of $\eta_0^\varepsilon$, then $-\varepsilon^{2\nu} \ln \varrho_0^\varepsilon(x) \to Z(x)$ uniformly on compact subsets of $\mathbb{R}^d$ as $\varepsilon \searrow 0$. The function $Z$ is generally referred to as the quasi-potential.

2.1. Gradient-Like Morse–Smale dynamical systems. It is well known from the theory of dynamical systems that if the set of nonwandering points of a flow on a compact manifold consists of hyperbolic fixed points, then the associated vector field is generically gradient-like (see Definition 2.1 and Theorem 2.1 below). This is also the case under Hypothesis 1.1, since the ‘point at infinity’ is a source for the flow of $m$. The theorem below is well known [20, 25]. What we have added to its statement is the assertion that the energy function can be chosen in such a manner that its Laplacian at critical points of positive index is negative. Recall that a function $f : \mathbb{R}^d \to \mathbb{R}$ is called inf-compact if the set $\{x \in \mathbb{R}^d : f(x) \le C\}$ is compact (or empty) for every $C \in \mathbb{R}$. We start with the following definition.

Definition 2.1. We say that $\mathcal{V} \in C^\infty(\mathbb{R}^d)$ is an energy function if it is inf-compact and has a finite set $\mathcal{S} = \{z_1, \dots, z_n\}$ of critical points, which are all nondegenerate. A $C^\infty$ vector field $m$ on $\mathbb{R}^d$ is called gradient-like relative to an energy function $\mathcal{V}$ provided that the set of nonwandering points of its flow is $\mathcal{S}$, that every point in $\mathcal{S}$ is a hyperbolic singular point of $m$, and
\[
m(x) \cdot \nabla\mathcal{V}(x) < 0 \qquad \forall\, x \in \mathbb{R}^d \setminus \mathcal{S} .
\]
If $m$ satisfies these properties, we also say that $m$ is adapted to $\mathcal{V}$.
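As a hedged illustration of Definition 2.1 (the specific function below is an example of our own choosing, not one from the paper, and its gradient is not bounded, so it illustrates the definition only and not the standing boundedness assumption on $m$), consider a one-dimensional gradient field:

```latex
% Illustrative example (assumed data): a double-well energy function on R.
\[
  \mathcal{V}(x) = \frac{x^4}{4} - \frac{x^2}{2},
  \qquad
  m(x) := -\mathcal{V}'(x) = x - x^3 .
\]
% V is smooth and inf-compact, with critical points S = {-1, 0, 1}; they are
% nondegenerate because V''(\pm 1) = 2 and V''(0) = -1 are nonzero.
\[
  m(x)\,\mathcal{V}'(x) = -\bigl(x^3 - x\bigr)^2 < 0
  \qquad \forall\, x \in \mathbb{R} \setminus \{-1, 0, 1\} ,
\]
% Each point of S is a hyperbolic singular point of m:
\[
  Dm(\pm 1) = -2 < 0 \quad\text{(stable, index } 0\text{)},
  \qquad
  Dm(0) = 1 > 0 \quad\text{(unstable, index } 1\text{)} .
\]
```

Since $\mathcal{V}$ decreases strictly along every nonconstant trajectory, the nonwandering set of the flow is exactly $\{-1, 0, 1\}$, so this $m$ is adapted to $\mathcal{V}$ in the sense of the definition. Note also that $\Delta\mathcal{V}(0) = -1 < 0$ at the point of positive index.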
For a hyperbolic singular point $\hat{x}$ of a vector field $m$, we let $W_s(\hat{x})$ and $W_u(\hat{x})$ denote the stable and unstable manifolds of the flow. Recall that the index of $\hat{x}$ is defined as the dimension of $W_u(\hat{x})$.

Theorem 2.1. Suppose that a $C^\infty$ vector field $m$ in $\mathbb{R}^d$ satisfies (H1)–(H2) and
(a) The set of nonwandering points of its flow is a finite set $\mathcal{S} = \{z_1, \dots, z_n\}$ of hyperbolic singular points.
(b) If $y$ and $z$ are singular points of $m$, then $W_s(y)$ and $W_u(z)$ intersect transversally (if they intersect).
Then $m$ is adapted to an energy function $\mathcal{V}$ which satisfies
(i) $\mathcal{V}(z_i) \ne \mathcal{V}(z_j)$ for $i \ne j$.
(ii) $\Delta\mathcal{V}(z) < 0$ for all $z \in \mathcal{S} \setminus \mathcal{S}_s$, where $\mathcal{S}_s$, as defined earlier, denotes the set of singular points of index $0$, i.e., the stable equilibria of the flow of $m$.
(iii) For each $z \in \mathcal{S}$ there exist some open neighborhood $N_z$ of $z$ and a constant $C_0 > 0$ such that
\[
-C_0^{-1}\, |x - z|^2 \le \bigl\langle m(x), \nabla\mathcal{V}(x) \bigr\rangle \le -C_0\, |x - z|^2 \qquad \forall\, x \in N_z .
\]
(iv) There exist some open neighborhood $Q$ of $\mathcal{S}$ and a constant $C_0' > 0$ such that
\[
-\frac{1}{C_0'}\, \bigl\langle m(x), \nabla\mathcal{V}(x) \bigr\rangle \le |\nabla\mathcal{V}(x)|^2 \le -C_0'\, \bigl\langle m(x), \nabla\mathcal{V}(x) \bigr\rangle \qquad \forall\, x \in Q .
\]

Proof. Let $\hat{x}$ be a singular point of $m$ of index $q > 0$. Translating the coordinates we may assume that $\hat{x} \equiv 0$. Since $m(0) = 0$, $m$ takes the form $m(x) = Mx + o(|x|)$ locally around $x = 0$, where $M = Dm(0)$ is the Jacobian of $m$ at $0$. By hypothesis, $M$ has exactly $q$ eigenvalues in the open right half complex plane and $d - q$ eigenvalues in the open left half complex plane. Therefore, since the corresponding eigenspaces are invariant under $M$, there exists a linear coordinate transformation $T$ such that, in the new coordinates $\tilde{x} = T(x)$, the linear map $x \mapsto Mx$ has the matrix representation $\tilde{M} = T M T^{-1}$ and $\tilde{M} = \operatorname{diag}(\tilde{M}_1, -\tilde{M}_2)$, where $\tilde{M}_1$ and $\tilde{M}_2$ are square Hurwitz matrices of dimension $d - q$ and $q$, respectively.
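By the Lyapunov theorem there exist positive definite matrices $\tilde{Q}_i$, $i = 1, 2$, satisfying
\[
\tilde{M}_1^{\mathsf T} \tilde{Q}_1 + \tilde{Q}_1 \tilde{M}_1 = -I_{d-q} ,
\qquad
\tilde{M}_2^{\mathsf T} \tilde{Q}_2 + \tilde{Q}_2 \tilde{M}_2 = -I_q , \tag{2.2}
\]
where $I_{d-q}$ and $I_q$ are the identity matrices of dimension $d - q$ and $q$, respectively. Let $\theta > 1$ be such that
\[
\theta\, \operatorname{trace}\bigl( T^{\mathsf T} \operatorname{diag}(0, \tilde{Q}_2)\, T \bigr) > \operatorname{trace}\bigl( T^{\mathsf T} \operatorname{diag}(\tilde{Q}_1, 0)\, T \bigr) , \tag{2.3}
\]
and define $\mathcal{V}$ in some neighborhood of $0$ by
\[
\mathcal{V}(x) := a + x^{\mathsf T}\, T^{\mathsf T} \operatorname{diag}(\tilde{Q}_1, -\theta \tilde{Q}_2)\, T\, x , \tag{2.4}
\]
where $a$ is a constant to be determined later. By (2.3) we obtain $\Delta\mathcal{V}(0) < 0$, and thus (ii) holds. We have
\[
\bigl\langle m(x), \nabla\mathcal{V}(x) \bigr\rangle
= x^{\mathsf T} \Bigl( M^{\mathsf T}\, T^{\mathsf T} \operatorname{diag}(\tilde{Q}_1, -\theta \tilde{Q}_2)\, T + T^{\mathsf T} \operatorname{diag}(\tilde{Q}_1, -\theta \tilde{Q}_2)\, T\, M \Bigr) x + o(|x|^2) .
\]
Expanding, and using $M = T^{-1} \tilde{M} T$, we obtain
\[
T^{\mathsf T} \operatorname{diag}(\tilde{Q}_1, -\theta \tilde{Q}_2)\, T\, M
= T^{\mathsf T} \operatorname{diag}(\tilde{Q}_1, -\theta \tilde{Q}_2)\, T\, T^{-1} \tilde{M}\, T
= T^{\mathsf T} \operatorname{diag}(\tilde{Q}_1 \tilde{M}_1, \theta \tilde{Q}_2 \tilde{M}_2)\, T .
\]
By (2.2) we have
\[
\bigl\langle m(x), \nabla\mathcal{V}(x) \bigr\rangle = -x^{\mathsf T}\, T^{\mathsf T} \operatorname{diag}(I_{d-q}, \theta I_q)\, T\, x + o(|x|^2) .
\]
Therefore, since $\theta > 1$, we have
\[
-\theta\, |Tx|^2 + o(|x|^2) \le \bigl\langle m(x), \nabla\mathcal{V}(x) \bigr\rangle \le -|Tx|^2 + o(|x|^2) ,
\]
thus establishing (iii). Property (iv) follows by (iii) and (2.4). As shown in [25], one can select $n$ distinct real numbers $a_i$ and define $\mathcal{V}$ on $\mathcal{S}$ by setting $\mathcal{V}(z_i) = a_i$ in a consistent manner: if $z_i$ and $z_j$ are the $\alpha$- and $\omega$-limit points of some trajectory, then $a_i > a_j$. Thus $\mathcal{V}$ can be defined in nonoverlapping neighborhoods of the singular points by (2.4) so as to satisfy (i). This function can then be extended to $\mathbb{R}^d$ by the construction in [25].

The energy function $\mathcal{V}$ can be constructed in such a manner that it agrees, outside some ball, with the Lyapunov function $\bar{\mathcal{V}}$ in Hypothesis 1.1. This is stated in the following lemma.

Lemma 2.1. Under the assumptions of Theorem 2.1, the energy function $\mathcal{V}$ can be selected so that $\mathcal{V} = \bar{\mathcal{V}}$ on the complement of some ball which contains $\mathcal{S}$.

Proof. Recall the definition of $K$ in Hypothesis 1.1.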
It is evident that we can find radii $0 < R_1 < R_2 < R_3 < R_4$ and balls centered at the origin such that $\mathcal{S} \subset B_{R_1}$, $K \subset B_{R_3}$, and the following are satisfied:
\[
\begin{aligned}
c_1 &:= \sup_{B_{R_1}} \mathcal{V} \;<\; c_2 := \inf_{B_{R_3} \setminus B_{R_2}} \mathcal{V} , \\
c_3 &:= \sup_{B_{R_3} \setminus B_{R_2}} \mathcal{V} \;<\; c_4 := \inf_{B_{R_4}^c} \bar{\mathcal{V}} , \\
\bar{c}_2 &:= \sup_{B_{R_2}} \bar{\mathcal{V}} \;<\; \bar{c}_3 := \inf_{B_{R_3}^c} \bar{\mathcal{V}} .
\end{aligned}
\]
Let $\psi : \mathbb{R} \to \mathbb{R}$ be a smooth non-decreasing function such that $\psi(t) = t$ for $t \le c_1$, $\psi(t) = c_4$ for $t \ge c_4$, and whose derivative is strictly positive on the interval $[c_1, c_3]$. Similarly, let $\bar{\psi} : \mathbb{R} \to \mathbb{R}$ be a smooth non-decreasing function such that $\bar{\psi}(t) = 0$ for $t \le \bar{c}_2 - c_4$ and $\bar{\psi}(t) = t$ for $t \ge \bar{c}_3 - c_4$. Let $G := \psi \circ \mathcal{V} + \bar{\psi} \circ (\bar{\mathcal{V}} - c_4)$. By construction $G$ agrees with $\mathcal{V}$ on $B_{R_1}$ and with $\bar{\mathcal{V}}$ on $B_{R_4}^c$. It can also be easily verified that $\sup_{B_{R_4} \setminus B_{R_1}} m \cdot \nabla G < 0$. Replacing $\mathcal{V}$ with $G$ we obtain a new energy function $\mathcal{V}$ such that $|\nabla\mathcal{V}|$ is Lipschitz.

2.2. Existence of an optimal control and the HJB equation. Recall that a stationary Markov control is of the form $U_t = v(X_t)$, where $v : \mathbb{R}^d \to \mathbb{R}^d$ is a measurable map. Theorem 2.2 below establishes the existence of a stationary control that is optimal. Before stating this result we mention the occupation measure based formulation, which is closely related to the ergodic control problem considered here [7, 18]. Define, for $f \in C^2(\mathbb{R}^d)$,
\[
L^\varepsilon[f](x, u) := \frac{\varepsilon^{2\nu}}{2}\, \Delta f(x) + \bigl\langle m(x) + \varepsilon u, \nabla f(x) \bigr\rangle .
\]
Now consider the following minimization problem:
\[
\inf_{\pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \Bigl( \ell(x) + \tfrac{1}{2}\, |u|^2 \Bigr)\, \pi(dx, du) ,
\qquad\text{subject to:}\quad
\int_{\mathbb{R}^d \times \mathbb{R}^d} L^\varepsilon[f](x, u)\, \pi(dx, du) = 0 \quad \forall\, f \in C_c^2(\mathbb{R}^d) . \tag{2.5}
\]
It is not hard to show that the value of this minimization problem does not exceed the optimal value $\beta_*^\varepsilon$ of the ergodic cost (1.3) under the controlled dynamics (1.1). Using the lower semi-continuity of the cost it is also easy to show that there exists a minimizer $\pi_*^\varepsilon \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$ which attains the optimal value of (2.5). In [18] it is shown that there exists a càdlàg stationary process $(X^*, v_*^\varepsilon(X^*))$ with distribution $\pi_*^\varepsilon$ that satisfies the martingale problem with respect to the generator $L^\varepsilon$.
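But it is not obvious that one can construct a weak solution to (1.1) with control $v_*^\varepsilon$. We adopt an analytic approach to find an optimal stationary control and characterize the optimal value $\beta_*^\varepsilon$.

Theorem 2.2. The HJB equation for the ergodic control problem, given by
\[
\frac{\varepsilon^{2\nu}}{2}\, \Delta V^\varepsilon + \min_{u \in \mathbb{R}^d} \Bigl[ \bigl\langle m + \varepsilon u, \nabla V^\varepsilon \bigr\rangle + \ell + \tfrac{1}{2}\, |u|^2 \Bigr] = \beta^\varepsilon , \tag{2.6}
\]
has a solution $(V^\varepsilon, \beta^\varepsilon)$ in $C^2(\mathbb{R}^d) \times \mathbb{R}$, where $V^\varepsilon$ is unique in the class of functions in $C^2(\mathbb{R}^d)$ satisfying $\lim_{|x|\to\infty} V^\varepsilon(x) = \infty$, $V^\varepsilon(0) = 0$, and $\beta^\varepsilon$ is uniquely specified as $\beta^\varepsilon = \beta_*^\varepsilon$. Moreover, the control $U_t = v_*^\varepsilon(X_t)$, $t \ge 0$, with $v_*^\varepsilon = -\varepsilon \nabla V^\varepsilon$, is optimal.

Proof. The proof is contained in Appendix A.

Remark 2.1. In view of the smoothness assumptions on the coefficients and the cost function, standard elliptic regularity theory allows us to improve the regularity of $V^\varepsilon$ to $C^k(\mathbb{R}^d)$, for any $k \ge 2$.

Since the minimum in (2.6) is attained at $u = -\varepsilon \nabla V^\varepsilon$, it follows from (2.6) that
\[
\frac{\varepsilon^{2\nu}}{2}\, \Delta V^\varepsilon + m \cdot \nabla V^\varepsilon - \frac{\varepsilon^{2}}{2}\, |\nabla V^\varepsilon|^2 + \ell = \beta_*^\varepsilon . \tag{2.7}
\]

3. The support of the limit of the stationary probability distribution

Throughout the rest of the paper $\eta_*^\varepsilon$ denotes the stationary probability distribution under the optimal stationary control $v_*^\varepsilon$ in Theorem 2.2. We start the analysis with the following lemma, which states that $\eta_*^\varepsilon$ concentrates on $\mathcal{S}$ as $\varepsilon \searrow 0$.

Lemma 3.1. The family $\{\eta_*^\varepsilon , \varepsilon \in (0, 1)\}$ is tight, and any sub-sequential limit as $\varepsilon \searrow 0$ has support on $\mathcal{S}$.

Proof. Since $\ell$ is inf-compact and $V^\varepsilon$ is bounded below, it follows by (2.6) that the stationary control $v_*^\varepsilon$ defined in Theorem 2.2 is stable. Let $\eta_*^\varepsilon$ be the invariant probability measure of the SDE
\[
dX_t = \bigl( m(X_t) + \varepsilon v_*^\varepsilon(X_t) \bigr)\, dt + \varepsilon^{\nu}\, dW_t .
\]
Also by Theorem 2.2 we have
\[
\int_{\mathbb{R}^d} \Bigl( \ell(x) + \tfrac{1}{2}\, |v_*^\varepsilon(x)|^2 \Bigr)\, \eta_*^\varepsilon(dx) = \beta_*^\varepsilon .
\]
Recall that $\eta_0^\varepsilon$ is the invariant probability measure of (1.1) under the control $U \equiv 0$. Define
\[
\beta_0^\varepsilon := \int_{\mathbb{R}^d} \ell(x)\, \eta_0^\varepsilon(dx) .
\]
By (2.1) we have
\[
\int_{\mathbb{R}^d} \ell(x)\, \eta_*^\varepsilon(dx) \le \beta_*^\varepsilon \le \beta_0^\varepsilon \le \frac{\gamma_0}{\gamma} \qquad \forall\, \varepsilon \in (0, 1) . \tag{3.1}
\]
Since $\ell$ is inf-compact, (3.1) implies that $\{\eta_*^\varepsilon , \varepsilon \in (0, 1)\}$ is tight.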
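Let $x(t)$ be the solution of (1.2). Then, if $C_m$ denotes a Lipschitz constant of $m$ and $X_0 = x(0)$, we obtain
\[
|X_t - x(t)| \le C_m \int_0^t |X_s - x(s)|\, ds + \varepsilon \int_0^t |v_*^\varepsilon(X_s)|\, ds + \varepsilon^{\nu}\, |W_t| . \tag{3.2}
\]
Hence, applying Gronwall’s inequality, we obtain from (3.2) that
\[
|X_s - x(s)| \le e^{C_m t} \biggl( \varepsilon \int_0^t |v_*^\varepsilon(X_r)|\, dr + \varepsilon^{\nu} \sup_{r \le t} |W_r| \biggr) , \qquad s \le t . \tag{3.3}
\]
In turn, for any $\delta > 0$, (3.3) implies that
\[
\mathbb{P}_{x(0)}\bigl( |X_t - x(t)| \ge \delta \bigr)
\le \mathbb{P}_{x(0)}\biggl( \int_0^t |v_*^\varepsilon(X_s)|\, ds \ge \frac{\delta e^{-C_m t}}{2\varepsilon} \biggr)
+ \mathbb{P}_{x(0)}\biggl( \sup_{s \le t} |W_s| \ge \frac{\delta e^{-C_m t}}{2\varepsilon^{\nu}} \biggr)
\qquad \text{for } t > 0 .
\]
By Jensen’s inequality, followed by Chebyshev’s inequality, we have
\[
\mathbb{P}_{x(0)}\biggl( \int_0^t |v_*^\varepsilon(X_s)|\, ds \ge \frac{\delta e^{-C_m t}}{2\varepsilon} \biggr)
\le \mathbb{P}_{x(0)}\biggl( \int_0^t |v_*^\varepsilon(X_s)|^2\, ds \ge \frac{\delta^2 e^{-2 C_m t}}{4 t \varepsilon^2} \biggr)
\le \frac{4 t \varepsilon^2}{\delta^2}\, e^{2 C_m t}\; \mathbb{E}_{x(0)}\biggl[ \int_0^t |v_*^\varepsilon(X_s)|^2\, ds \biggr] .
\]
Therefore, for any compact set $K \subset \mathbb{R}^d$ we have
\[
\int_K \mathbb{P}_x\bigl( |X_t - x(t)| \ge \delta \bigr)\, \eta_*^\varepsilon(dx)
\le \frac{4 t^2 \varepsilon^2}{\delta^2}\, e^{2 C_m t} \int_{\mathbb{R}^d} |v_*^\varepsilon(x)|^2\, \eta_*^\varepsilon(dx)
+ \sup_{x \in K}\, \mathbb{P}_x\biggl( \sup_{s \le t} |W_s| \ge \frac{\delta}{2\varepsilon^{\nu}}\, e^{-C_m t} \biggr) . \tag{3.4}
\]
Since $\int_{\mathbb{R}^d} |v_*^\varepsilon|^2\, d\eta_*^\varepsilon \le 2\beta_*^\varepsilon$ is bounded uniformly in $\varepsilon$ by (3.1), the right hand side of (3.4) tends to $0$ as $\varepsilon \searrow 0$. Suppose that $\eta_*^\varepsilon \to \bar{\eta}$ along some subsequence as $\varepsilon \searrow 0$. We claim that for any $f \in C_b(\mathbb{R}^d)$,
\[
\int_{\mathbb{R}^d} f_t(x)\, \bar{\eta}(dx) = \int_{\mathbb{R}^d} f(x)\, \bar{\eta}(dx) , \tag{3.5}
\]
where $f_t(x) := f(x(t))$, and $x(t)$ is the solution of (1.2) with initial condition $x(0) = x$. Since the $\omega$-limit set of the trajectories of (1.2) is contained in $\mathcal{S}$, (3.5) shows that $\bar{\eta}$ has support on $\mathcal{S}$. Next we prove (3.5). It is enough to prove the claim for a bounded Lipschitz function $f$. Since $\eta_*^\varepsilon$ is an invariant probability measure we have
\[
\int_{\mathbb{R}^d} \mathbb{E}_x\bigl[ f(X_t) \bigr]\, \eta_*^\varepsilon(dx) = \int_{\mathbb{R}^d} f(x)\, \eta_*^\varepsilon(dx) ,
\]
where $X$ solves (1.1) with control $v_*^\varepsilon$. Hence to prove (3.5) it is enough to show that
\[
\int_{\mathbb{R}^d} \mathbb{E}_x\bigl[ f(X_t) \bigr]\, \eta_*^\varepsilon(dx) \;\xrightarrow[\varepsilon\searrow0]{}\; \int_{\mathbb{R}^d} f_t(x)\, \bar{\eta}(dx)
\]
for all bounded, Lipschitz functions $f$. Since $f_t : \mathbb{R}^d \to \mathbb{R}$ is a bounded continuous function, it suffices to show that, for any compact set $K$, we have
\[
\int_K \Bigl| \mathbb{E}_x\bigl[ f(X_t) \bigr] - f_t(x) \Bigr|\, \eta_*^\varepsilon(dx) \;\xrightarrow[\varepsilon\searrow0]{}\; 0 . \tag{3.6}
\]
But (3.6) follows by the Lipschitz property of $f$ and (3.4).

We next consider the three separate cases, i.e., the supercritical, subcritical and critical regimes.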
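Before treating the regimes one at a time, the candidate sets $\mathcal{Z}_1$, $\mathcal{Z}_2$, $\mathcal{Z}_3$ and values $\mathcal{J}_1$, $\mathcal{J}_2$, $\mathcal{J}_3$ of Theorem 1.1 can be tabulated for a scalar model, where $E_+(a) = \max(a, 0)$ for a $1 \times 1$ matrix $a$. The sketch below is only illustrative: the function names are hypothetical, and the data (drift $m(x) = -x(x-2)(x+1)$ with equilibria $-1, 0, 2$, and cost $\ell(x) = c\,x^2$ evaluated at the equilibria only) are assumptions made for the example.

```python
def argmin_set(points, score, tol=1e-12):
    """Return (list of minimizers of score over points, minimal value)."""
    best = min(score(z) for z in points)
    return [z for z in points if score(z) <= best + tol], best


def regime_candidates(equilibria, ell, dm):
    """Candidate concentration sets for a scalar model, following Theorem 1.1.

    equilibria : zeros of the drift m
    ell        : running cost, evaluated at the equilibria
    dm         : derivative of m; in one dimension E_+(Dm(z)) = max(dm(z), 0)
    """
    stable = [z for z in equilibria if dm(z) < 0]
    e_plus = lambda z: max(dm(z), 0.0)
    return {
        "supercritical": argmin_set(equilibria, ell),                          # Z_1, J_1
        "critical": argmin_set(equilibria, lambda z: ell(z) + e_plus(z)),      # Z_2, J_2
        "subcritical": argmin_set(stable, ell),                                # Z_3, J_3
    }


if __name__ == "__main__":
    eq = [-1.0, 0.0, 2.0]
    dm = lambda x: -(3 * x * x - 2 * x - 2)   # derivative of m(x) = -(x**3 - x**2 - 2*x)
    for c in (1.0, 3.0):
        print(c, regime_candidates(eq, lambda x: c * x * x, dm))
```

For these data the unstable equilibrium $0$ is cheapest in the supercritical regime, the stable equilibrium $-1$ is selected in the subcritical regime, and the critical-regime candidate switches from $-1$ to $0$ once $c$ exceeds $E_+\bigl(Dm(0)\bigr) = 2$.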
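We use the following running example:

Example 3.1. Let $m$ be a vector field in $\mathbb{R}$ of the form $m = -\nabla F$, for $F$ a ‘double well potential’ given as follows: $F(x) := \frac{x^4}{4} - \frac{x^3}{3} - x^2$ on $[-10, 10]$, with $F$ suitably extended so that it is globally Lipschitz and does not have any critical points outside the interval $[-10, 10]$. Then $\nabla F$ vanishes at exactly three points: $-1$, $0$, $2$. Of these, $0$ is a local maximum, hence an unstable equilibrium for the o.d.e. $\dot{x}(t) = m(x(t))$, and both $-1$ and $2$ are local minima, hence stable equilibria thereof. Let $\ell(x) = c\,|x|^2$ on $[-10, 10]$ for a suitable $c > 0$, modified suitably outside $[-10, 10]$ to render it globally Lipschitz. Note that $F(0) = 0$, $F(-1) = -\frac{5}{12}$, $F(2) = -\frac{8}{3}$. Thus $x = 2$ is the unique global minimum of $F$.

3.1. Supercritical regime ($\nu > 1$). Let $z \in \mathcal{S}$ and consider the control
\[
U_t^\varepsilon = v(X_t) := -\frac{1}{\varepsilon}\, \bigl( m(X_t) + (X_t - z) \bigr) , \qquad t \ge 0 .
\]
Then $X$ is given by
\[
dX_t = -(X_t - z)\, dt + \varepsilon^{\nu}\, dW_t , \qquad t \ge 0 . \tag{3.7}
\]
Since (3.7) has a unique strong solution and $U$ is adapted to the family of sub-$\sigma$-fields generated by $X$, the control $U$ satisfies the nonanticipativity condition and is therefore admissible. Moreover, a standard argument using the Lipschitz property of $m$ and Gronwall’s inequality shows that, for some $K > 0$, we have
\[
\mathbb{E}\biggl[ \int_0^t |U_s^\varepsilon|^2\, ds \biggr] \le \frac{K}{\varepsilon^2}\; \mathbb{E}\biggl[ \int_0^t \bigl( 1 + |X_s|^2 \bigr)\, ds \biggr] < \infty \qquad \forall\, t \ge 0 ,
\]
as desired. The diffusion in (3.7) has a Gaussian stationary distribution $\mu^\varepsilon$ with mean $z$ and variance $O(\varepsilon^{2\nu})$. In particular,
\[
\int_{\mathbb{R}^d} |v|^2\, d\mu^\varepsilon
\le 2 \int_{\mathbb{R}^d} \bigl( |m(x)|^2 + |x - z|^2 \bigr)\, \varepsilon^{-2}\, \mu^\varepsilon(dx)
\le \frac{C}{\varepsilon^2} \int_{\mathbb{R}^d} |x - z|^2\, \mu^\varepsilon(dx)
\le O(\varepsilon^{2\nu-2}) \tag{3.8}
\]
for some $C > 0$, where the second inequality follows from the Lipschitz property of $m$ combined with the fact that $m(z) = 0$. It follows by (3.8) that the corresponding ergodic cost is $\ell(z) + O(\varepsilon^{2\nu-2}) \approx \ell(z)$ for sufficiently small $\varepsilon$. In particular, this in conjunction with Lemma 3.1 leads to:

Theorem 3.1. For $\nu > 1$, it holds that
\[
\lim_{\varepsilon\searrow0} \beta_*^\varepsilon = \min\, \{ \ell(z) : z \in \mathcal{S} \} ,
\]
and
\[
\beta_*^\varepsilon \le \min\, \{ \ell(z) : z \in \mathcal{S} \} + O\bigl( \varepsilon^{2\nu-2} \bigr) \qquad \forall\, \varepsilon \in (0, 1) .
\]

3.1.1. An asymptotically optimal control that uses the energy function. It is worth mentioning here that if $z \in \mathcal{S}_s$, then an asymptotically optimal control can be synthesized from an energy function. Let $\mathcal{V}$ be an energy function that attains a unique global minimum at $z$ and such that $|\nabla\mathcal{V}|$ is Lipschitz and inf-compact. Such a function exists by Theorem 2.1 and Lemma 2.1. Consider the control
\[
v^\varepsilon(x) := -\frac{1}{\varepsilon}\, \bigl( m(x) + \nabla\mathcal{V}(x) \bigr) .
\]
Then $X$ is given by
\[
dX_t = -\nabla\mathcal{V}(X_t)\, dt + \varepsilon^{\nu}\, dW_t , \qquad t \ge 0 .
\]
Let $\mu^\varepsilon$ denote its unique stationary probability distribution, and $L^\varepsilon_{v^\varepsilon} := \frac{\varepsilon^{2\nu}}{2}\, \Delta - \nabla\mathcal{V} \cdot \nabla$ its controlled extended generator. Since
\[
L^\varepsilon_{v^\varepsilon} \mathcal{V} \le \frac{\varepsilon^{2\nu}}{2}\, \|\Delta\mathcal{V}\|_\infty - |\nabla\mathcal{V}|^2 ,
\]
it follows that
\[
2 \int_{\mathbb{R}^d} |\nabla\mathcal{V}|^2\, d\mu^\varepsilon \le \varepsilon^{2\nu}\, \|\Delta\mathcal{V}\|_\infty .
\]
Note that $\mu^\varepsilon$ has density $\varrho^\varepsilon(x) = C(\varepsilon)\, e^{-2\mathcal{V}(x)/\varepsilon^{2\nu}}$, where $C(\varepsilon)$ is a normalizing constant. Therefore we have
\[
\begin{aligned}
\int_{\mathbb{R}^d} |v^\varepsilon(x)|^2\, \mu^\varepsilon(dx)
&\le 2 \int_{\mathbb{R}^d} \bigl( |m(x)|^2 + |\nabla\mathcal{V}(x)|^2 \bigr)\, \varepsilon^{-2}\, \mu^\varepsilon(dx) \\
&\le 2 \int_{\mathbb{R}^d} \varepsilon^{-2}\, |m(x)|^2\, \mu^\varepsilon(dx) + \varepsilon^{2\nu-2}\, \|\Delta\mathcal{V}\|_\infty \\
&\le O(\varepsilon^{2\nu-2}) + \varepsilon^{2\nu-2}\, \|\Delta\mathcal{V}\|_\infty .
\end{aligned}
\]
For the last inequality we used the fact that $m$ is bounded, $m(z) = 0$, and that $\mathcal{V}$ is locally quadratic around $z$.

In Example 3.1, $\ell$ attains its minimum over $\{-1, 0, 2\}$ at $0$. Thus in the supercritical regime we have $\beta_*^\varepsilon \approx \ell(0) = 0$.

3.2. Subcritical regime ($\nu < 1$). Recall that $\mathcal{S}_s$ is the collection of stable equilibrium points. The following lemma holds for any $\nu > 0$. It shows that if $z \in \mathcal{S}_s$ then there exists a Markov stationary control with asymptotic cost $O(\varepsilon^n)$ for any $n \in \mathbb{N}$ that renders $\{z\}$ stochastically stable.

Lemma 3.2. For any $\nu > 0$ and $z \in \mathcal{S}_s$ there exists a Markov control $v^\varepsilon$, for $\varepsilon \in (0, 1)$, with the following properties:
(a) With $\mu^\varepsilon$ denoting the invariant probability measure of (1.1) under the control $v^\varepsilon$, it holds that
\[
\limsup_{\varepsilon\searrow0}\, \frac{1}{\varepsilon^n} \int_{\mathbb{R}^d} |v^\varepsilon(x)|^2\, \mu^\varepsilon(dx) = 0 \qquad \forall\, n \in \mathbb{N} .
\]
(b) If $\Sigma$ is the symmetric positive definite solution of the matrix Lyapunov equation $Dm(z)\, \Sigma + \Sigma\, Dm(z)^{\mathsf T} = -I$, and $C_\ell$ a Lipschitz constant for $\ell$, then
\[
\limsup_{\varepsilon\searrow0}\, \frac{1}{\varepsilon^{\nu}} \int_{\mathbb{R}^d} \bigl| \ell(x) - \ell(z) \bigr|\, \mu^\varepsilon(dx) \le C_\ell\, \sqrt{\operatorname{trace} \Sigma} .
\]
(c) If $\nu \in (2/3, 1)$, then $\beta_*^\varepsilon \le \mathcal{J}_3 + O(\varepsilon^{4\nu-2})$.

Proof. We let $M := Dm(z)$. In order to simplify the notation, we translate the origin so that $z \equiv 0$. Let $R$ be the symmetric positive definite solution to the equation
\[
M^{\mathsf T} R + R M = -3 R^2 . \tag{3.9}
\]
We define the control $v^\varepsilon$ by
\[
v^\varepsilon(x) :=
\begin{cases}
\dfrac{1}{\varepsilon}\, \bigl( M x - m(x) \bigr) & \text{if } |x| \ge \varepsilon^{\nu/2} , \\[1ex]
0 & \text{otherwise.}
\end{cases}
\]
Let $L^\varepsilon_{v^\varepsilon} := \frac{\varepsilon^{2\nu}}{2}\, \Delta + \bigl( m(x) + \varepsilon v^\varepsilon(x) \bigr) \cdot \nabla$ denote the corresponding controlled extended generator. We apply $L^\varepsilon_{v^\varepsilon}$ to the function $F(x) := \varepsilon^{2\nu} \exp\bigl( \frac{x^{\mathsf T} R x}{\varepsilon^{2\nu}} \bigr)$. For $|x| \ge \varepsilon^{\nu/2}$, using (3.9), we obtain
\[
\begin{aligned}
L^\varepsilon_{v^\varepsilon} F(x)
&= \Bigl( \varepsilon^{2\nu} \operatorname{trace}(R) + |Rx|^2 + x^{\mathsf T} \bigl( M^{\mathsf T} R + R M \bigr) x \Bigr)\, e^{\frac{x^{\mathsf T} R x}{\varepsilon^{2\nu}}} \\
&\le \bigl( \varepsilon^{2\nu} \operatorname{trace}(R) - 2\, |Rx|^2 \bigr)\, e^{\frac{x^{\mathsf T} R x}{\varepsilon^{2\nu}}} \\
&\le -|Rx|^2\, e^{\frac{x^{\mathsf T} R x}{\varepsilon^{2\nu}}}
\qquad \forall\, \varepsilon \le \bigl( \operatorname{trace}(R)\, \|R^{-1}\|^2 \bigr)^{-1/\nu} .
\end{aligned} \tag{3.10}
\]
For $|x| \le \varepsilon^{\nu/2}$, we have
\[
\begin{aligned}
L^\varepsilon_{v^\varepsilon} F(x)
&= \bigl( \varepsilon^{2\nu} \operatorname{trace}(R) + |Rx|^2 + 2\, m(x) \cdot R x \bigr)\, e^{\frac{x^{\mathsf T} R x}{\varepsilon^{2\nu}}} \\
&\le \bigl( \varepsilon^{2\nu} \operatorname{trace}(R) - 2\, |Rx|^2 + 2\, |M x - m(x)|\, |Rx| \bigr)\, e^{\frac{x^{\mathsf T} R x}{\varepsilon^{2\nu}}} .
\end{aligned} \tag{3.11}
\]
However, for some constant $\kappa_0 > 0$, it holds that $|M x - m(x)| \le \kappa_0\, |x|^2$ for all sufficiently small $|x|$. Therefore, for some $\varepsilon_0 > 0$ we have
\[
-2\, |Rx|^2 + 2\, |M x - m(x)|\, |Rx| \le -|Rx|^2 \qquad \text{for } |x| \le \varepsilon^{\nu/2} , \quad \forall\, \varepsilon \le \varepsilon_0 . \tag{3.12}
\]
However, since
\[
\varepsilon^{2\nu} \operatorname{trace}(R) - |Rx|^2 \le 0 \qquad \text{for } |x| \ge \varepsilon^{\nu}\, \|R^{-1}\|\, \sqrt{\operatorname{trace}(R)} ,
\]
it follows by (3.11)–(3.12) that there exists a positive constant $\kappa_1$, not depending on $\varepsilon$, such that
\[
\sup\, \bigl\{ L^\varepsilon_{v^\varepsilon} F(x) : |x| \le \varepsilon^{\nu/2} \bigr\} \le \kappa_1\, \varepsilon^{2\nu} \qquad \forall\, \varepsilon \le \varepsilon_0 . \tag{3.13}
\]
By (3.10) and (3.13), for some $\varepsilon_0' > 0$, we obtain
\[
L^\varepsilon_{v^\varepsilon} F(x) \le \kappa_1\, \varepsilon^{2\nu} - |Rx|^2\, e^{\frac{x^{\mathsf T} R x}{\varepsilon^{2\nu}}}\, \mathbb{1}_{\{|x| \ge \varepsilon^{\nu/2}\}}(x) \qquad \forall\, \varepsilon \le \varepsilon_0' . \tag{3.14}
\]
By (3.14) we obtain
\[
\|R^{-1}\|^{-2} \exp\bigl( \varepsilon^{-\nu}\, \|R^{-1}\|^{-1} \bigr) \int_{\{|x| \ge \varepsilon^{\nu/2}\}} |x|^2\, \mu^\varepsilon(dx)
\le \int_{\{|x| \ge \varepsilon^{\nu/2}\}} |Rx|^2\, e^{\frac{x^{\mathsf T} R x}{\varepsilon^{2\nu}}}\, \mu^\varepsilon(dx)
\le \kappa_1\, \varepsilon^{2\nu} \qquad \forall\, \varepsilon \le \varepsilon_0' , \tag{3.15}
\]
which implies part (a). Consider the ‘scaled’ diffusion
\[
d\hat{X}_t = \hat{b}^\varepsilon(\hat{X}_t)\, dt + dW_t , \qquad t \ge 0 ,
\]
where
\[
\hat{b}^\varepsilon(x) := \frac{m(\varepsilon^{\nu} x) + \varepsilon\, v^\varepsilon(\varepsilon^{\nu} x)}{\varepsilon^{\nu}} ,
\]
and let $\hat{\mu}^\varepsilon$ denote its invariant probability measure. If $\varrho^\varepsilon$ and $\hat{\varrho}^\varepsilon$ denote the densities of $\mu^\varepsilon$ and $\hat{\mu}^\varepsilon$, respectively, then $\varepsilon^{\nu d}\, \varrho^\varepsilon(\varepsilon^{\nu} x) = \hat{\varrho}^\varepsilon(x)$ for all $x \in \mathbb{R}^d$. The (discontinuous) drift $\hat{b}^\varepsilon$ converges uniformly to $M x$ as $\varepsilon \searrow 0$. Therefore $\hat{\varrho}^\varepsilon$ converges to the Gaussian density $\rho_\Sigma$ as $\varepsilon \searrow 0$, uniformly on compact sets. Moreover, $\sup_{\varepsilon \in (0,1)} \int_{\mathbb{R}^d} |x|^4\, \hat{\mu}^\varepsilon(dx) < \infty$ by (3.15). Therefore, by uniform integrability, recalling that $z \equiv 0$, and the Cauchy–Schwarz inequality, we obtain
\[
\lim_{\varepsilon\searrow0}\, \frac{1}{\varepsilon^{\nu}} \int_{\mathbb{R}^d} \bigl| \ell(x) - \ell(0) \bigr|\, \mu^\varepsilon(dx)
\le C_\ell\, \lim_{\varepsilon\searrow0} \biggl( \int_{\mathbb{R}^d} \frac{|x|^2}{\varepsilon^{2\nu}}\, \mu^\varepsilon(dx) \biggr)^{1/2}
\le C_\ell\, \sqrt{\operatorname{trace} \Sigma} .
\]
For part (c) apply the control $v^\varepsilon(x) = \varepsilon^{-1} \bigl( M x - m(x) \bigr)$ for $x \in \mathbb{R}^d$. Using the bound $|M x - m(x)| \le \kappa_1\, |x|^2$ for some constant $\kappa_1$, we obtain
\[
\int_{\mathbb{R}^d} |v^\varepsilon|^2\, d\mu^\varepsilon \le \int_{\mathbb{R}^d} \kappa_1^2\, \varepsilon^{-2}\, |x|^4\, \mu^\varepsilon(dx) \in O(\varepsilon^{4\nu-2}) .
\]
Also, using the Taylor series expansion of $\ell$, and the fact that $\mu^\varepsilon$ is Gaussian with mean $0$ and covariance matrix $\propto \varepsilon^{2\nu} I$, we have
\[
\int_{\mathbb{R}^d} \bigl( \ell(x) - \ell(0) \bigr)\, \mu^\varepsilon(dx) \in O(\varepsilon^{2\nu}) . \tag{3.16}
\]
This completes the proof.

From Lemma 3.2 we see that we can always find a stable admissible control such that the corresponding invariant probability measure concentrates on a stable equilibrium point. Now we proceed to show that for $\nu < 1$, $\eta_*^\varepsilon$ does not concentrate on $\mathcal{S} \setminus \mathcal{S}_s$.

Lemma 3.3. Suppose $\nu < 1$. Then $\eta_*^\varepsilon(\mathcal{S} \setminus \mathcal{S}_s) \to 0$ as $\varepsilon \searrow 0$.

Proof. We argue by contradiction. If the statement of the lemma is not true, then there exist $\hat{x} \in \mathcal{S} \setminus \mathcal{S}_s$ and a decreasing sequence $\{\varepsilon_n\}$, with $\varepsilon_n \searrow 0$, such that
\[
\liminf_{n\to\infty}\, \eta_*^{\varepsilon_n}\bigl( B_r(\hat{x}) \bigr) > 0 \tag{3.17}
\]
for all balls $B_r(\hat{x})$ of radius $r > 0$ centered at $\hat{x}$. In the sequel all limits as $\varepsilon \searrow 0$ are meant to be along this sequence $\{\varepsilon_n\}$. Without loss of generality we may assume that $\mathcal{V}$ satisfies (i)–(iv) in Theorem 2.1. Let $\delta > 0$ be such that the interval $\bigl( \mathcal{V}(\hat{x}) - 3\delta, \mathcal{V}(\hat{x}) + 3\delta \bigr)$ contains no critical values of $\mathcal{V}$ other than $\mathcal{V}(\hat{x})$ (such a $\delta$ exists by (i) of Theorem 2.1).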
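Let $\varphi:\mathbb{R}\to\mathbb{R}$ be a smooth function such that
(a) $\varphi(\mathcal{V}(\hat x)+y) = y$ for $y\in\bigl(\mathcal{V}(\hat x)-\delta,\ \mathcal{V}(\hat x)+\delta\bigr)$;
(b) $\varphi'\in[0,1]$ on $\bigl(\mathcal{V}(\hat x)-2\delta,\ \mathcal{V}(\hat x)+2\delta\bigr)$;
(c) $\varphi'\equiv0$ on $\bigl(\mathcal{V}(\hat x)-2\delta,\ \mathcal{V}(\hat x)+2\delta\bigr)^c$.

By Theorem 2.1 (ii) there exists $r>0$ such that $\sup_{x\in B_r(\hat x)}\Delta\mathcal{V}(x)<0$. We may also choose this $r$ small enough so that
\[ B_r(\hat x) \subset \operatorname{support}\,\varphi'(\mathcal{V}(\cdot)) \subset B_r^c\bigl(S\setminus\{\hat x\}\bigr) . \]
Thus
\[ \int_{\mathbb{R}^d}\varphi'(\mathcal{V})\Delta\mathcal{V}\,d\eta_*^\varepsilon = \int_{B_r(\hat x)}\varphi'(\mathcal{V})\Delta\mathcal{V}\,d\eta_*^\varepsilon + \int_{B_r^c(S)}\varphi'(\mathcal{V})\Delta\mathcal{V}\,d\eta_*^\varepsilon . \tag{3.18} \]
Since $\mathcal{V}$ is inf-compact, it follows that $\varphi\circ\mathcal{V}$ is constant outside a compact set. Therefore the support of $\varphi'(\mathcal{V}(\cdot))$ is compact, and as a result $\Delta\mathcal{V}$ is bounded on this set. By (3.18), since $\eta_*^\varepsilon\bigl(B_r^c(S)\bigr)\searrow0$ as $\varepsilon\searrow0$, there exists $\varepsilon_0>0$ such that
\[ \int_{\mathbb{R}^d}\varphi'(\mathcal{V})\Delta\mathcal{V}\,d\eta_*^\varepsilon < 0 \quad \forall\,\varepsilon<\varepsilon_0 . \tag{3.19} \]
By the infinitesimal characterization of an invariant probability measure we have
\[ \int_{\mathbb{R}^d}\mathcal{L}^\varepsilon[\varphi\circ\mathcal{V}]\bigl(x, v_*^\varepsilon(x)\bigr)\,\eta_*^\varepsilon(dx) = 0 , \]
which we write as
\[ \frac{\varepsilon^{2\nu}}{2}\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\Delta\mathcal{V}\,d\eta_*^\varepsilon + \varepsilon\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,(v_*^\varepsilon\cdot\nabla\mathcal{V})\,d\eta_*^\varepsilon + \frac{\varepsilon^{2\nu}}{2}\int_{\mathbb{R}^d}\varphi''(\mathcal{V})\,|\nabla\mathcal{V}|^2\,d\eta_*^\varepsilon = -\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,\langle m,\nabla\mathcal{V}\rangle\,d\eta_*^\varepsilon . \tag{3.20} \]
By the Cauchy–Schwarz inequality we have
\[ \Bigl|\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,(v_*^\varepsilon\cdot\nabla\mathcal{V})\,d\eta_*^\varepsilon\Bigr| \le \Bigl(\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,|v_*^\varepsilon|^2\,d\eta_*^\varepsilon\Bigr)^{1/2}\Bigl(\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,|\nabla\mathcal{V}|^2\,d\eta_*^\varepsilon\Bigr)^{1/2} \le \sqrt{2\beta_*^\varepsilon}\ \sup_{\mathbb{R}^d}\sqrt{\varphi'(\mathcal{V})}\ \Bigl(\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,|\nabla\mathcal{V}|^2\,d\eta_*^\varepsilon\Bigr)^{1/2} . \tag{3.21} \]
Let
\[ \zeta^\varepsilon := \Bigl(\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,|\nabla\mathcal{V}|^2\,d\eta_*^\varepsilon\Bigr)^{1/2} . \tag{3.22} \]
By (3.19)–(3.21) and Theorem 2.1 (iv) we obtain
\[ \frac{1}{C_0'}(\zeta^\varepsilon)^2 - \varepsilon\sqrt{2\beta_*^\varepsilon}\ \sup_{\mathbb{R}^d}\sqrt{\varphi'(\mathcal{V})}\ \zeta^\varepsilon - \frac{\varepsilon^{2\nu}}{2}\int_{\mathbb{R}^d}\varphi''(\mathcal{V})\,|\nabla\mathcal{V}|^2\,d\eta_*^\varepsilon \le 0 \quad \forall\,\varepsilon<\varepsilon_0 . \tag{3.23} \]
Using the quadratic formula on (3.23) we obtain $\zeta^\varepsilon\in O(\varepsilon^\nu)$. On the other hand, since $\varphi''(\mathcal{V})=0$ on some open neighborhood of $S$, it follows that
\[ \int_{\mathbb{R}^d}\varphi''(\mathcal{V})\,|\nabla\mathcal{V}|^2\,d\eta_*^\varepsilon \xrightarrow[\varepsilon\searrow0]{} 0 . \]
Therefore, since by (3.20)–(3.21) we have
\[ \frac{\varepsilon^{2\nu}}{2}\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\Delta\mathcal{V}\,d\eta_*^\varepsilon + \varepsilon\sqrt{2\beta_*^\varepsilon}\ \sup_{\mathbb{R}^d}\sqrt{\varphi'(\mathcal{V})}\ \zeta^\varepsilon + \frac{\varepsilon^{2\nu}}{2}\int_{\mathbb{R}^d}\varphi''(\mathcal{V})\,|\nabla\mathcal{V}|^2\,d\eta_*^\varepsilon \ge 0 , \]
it follows that
\[ \liminf_{\varepsilon\searrow0}\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\Delta\mathcal{V}\,d\eta_*^\varepsilon \ge 0 . \]
However, this contradicts (3.17), since $\Delta\mathcal{V}(\hat x)<0$.

Lemmas 3.2 and 3.3 imply the following:

Theorem 3.2.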
If $\nu<1$, then
\[ \beta_*^\varepsilon \le \begin{cases} \min\{\ell(z) : z\in S_s\} + O(\varepsilon^\nu) & \text{if } \nu\in(0,\tfrac23) , \\ \min\{\ell(z) : z\in S_s\} + O(\varepsilon^{4\nu-2}) & \text{if } \nu\in(\tfrac23,1) . \end{cases} \]
In Example 3.1, this leads to $\beta_*^\varepsilon\approx\ell(-1)=c$.

3.3. Critical regime ($\nu=1$). Recall that a square matrix is called Hurwitz if its eigenvalues lie in the open left half complex plane. We need the following definition.

Definition 3.1. Let $\mathcal{A}$ denote the collection of all $d\times d$ Hurwitz matrices. For $M\in\mathbb{R}^{d\times d}$ define
\[ \Lambda(M) := \min_{A\in\mathcal{A}}\ \frac12\operatorname{trace}\bigl((A-M)\,\Sigma_A\,(A-M)^T\bigr) , \tag{3.24} \]
where $\Sigma_A$ is the (unique) symmetric solution of the Lyapunov equation
\[ A\,\Sigma_A + \Sigma_A A^T = -I . \tag{3.25} \]
Suppose $z\in S$, and without loss of generality assume that $z=0$. Consider a family of controls $v_A^\varepsilon$ of the form
\[ v_A^\varepsilon(x) := \frac{Ax - m(x)}{\varepsilon} , \]
where $A\in\mathbb{R}^{d\times d}$ is Hurwitz. The stationary probability distribution $\eta_A^\varepsilon$ of the diffusion $dX_t = AX_t\,dt + \varepsilon\,dW_t$ is Gaussian, with zero mean, and covariance matrix $\varepsilon^2\Sigma_A$, with $\Sigma_A$ the solution of (3.25). Let $M = Dm(0)$. We have
\[ |Ax-m(x)|^2 = |(A-M)x|^2 + 2\bigl\langle (A-M)x,\ Mx-m(x)\bigr\rangle + |Mx-m(x)|^2 . \]
Using the Taylor series for $Mx-m(x)$ we write $Mx-m(x) = f(x)+g(x)$, where $f$ is an even quadratic function, and $|g(x)|\le\kappa|x|^3$ for some constant $\kappa$. Using also the Taylor series for $\ell$ as in (3.16), it follows that, when $A$ attains the minimum in (3.24), there exists a constant $\kappa_3$ such that
\[ \beta_*^\varepsilon \le \int_{\mathbb{R}^d}\bigl(\ell(x) + \tfrac12|v_A^\varepsilon(x)|^2\bigr)\,\eta_A^\varepsilon(dx) \le \ell(0) + \Lambda(M) + \varepsilon^2\kappa_3 . \tag{3.26} \]
Therefore, by (3.26) we obtain
\[ \limsup_{\varepsilon\searrow0}\ \beta_*^\varepsilon \le \min_{y\in S}\ \bigl[\ell(y) + \Lambda\bigl(Dm(y)\bigr)\bigr] + O(\varepsilon^2) . \tag{3.27} \]
This establishes the second part of Theorem 1.1 (iii).

Remark 3.1. If $y\in S_s$ then $Dm(y)$ is Hurwitz, and by choosing $A = Dm(y)$ in (3.24), it follows that $\Lambda(Dm(y)) = 0$.

It is worthwhile at this point to present the following one-dimensional example, which shows how the limiting value of $\beta_*^\varepsilon$ bifurcates as we cross the critical regime.

Example 3.2. Let $d=1$, $m(x)=Mx$, and $\ell(x)=\tfrac12Lx^2$, with $M>0$ and $L>0$.
Then the solution to (2.7) is:
\[ V^\varepsilon = \frac{M+\sqrt{M^2+L\varepsilon^2}}{2\varepsilon^2}\,x^2 , \qquad \beta_*^\varepsilon = \frac{\varepsilon^{2\nu-2}}{2}\Bigl(M+\sqrt{M^2+L\varepsilon^2}\Bigr) . \]
Note that $\beta_*^\varepsilon\to\ell(0)=0$, $\beta_*^\varepsilon\to M$, and $\beta_*^\varepsilon\to\infty$, as $\varepsilon\searrow0$, when $\nu>1$, $\nu=1$, and $\nu<1$, respectively.

Recall from Theorem 1.1 that $E_+(M)$ denotes the sum of eigenvalues of a matrix $M$ that lie in the open right half complex plane. The following result is certainly not new, but since we could not locate it in this form in the literature, a proof is included.

Lemma 3.4. Suppose that $M\in\mathbb{R}^{d\times d}$ has no eigenvalues on the imaginary axis, and let $Q$ be the (unique) positive semidefinite symmetric solution of the matrix Riccati equation
\[ M^TQ + QM = Q^2 , \tag{3.28} \]
satisfying
\[ (M-Q)\Sigma + \Sigma(M-Q)^T = -I , \tag{3.29} \]
for some symmetric positive definite matrix $\Sigma$. Let $\mathfrak{U}^*_{\mathrm{SM}}$ denote the class of stationary Markov controls $v$ with the following properties:
(i) $v:\mathbb{R}^d\to\mathbb{R}^d$ is continuous and has at most linear growth.
(ii) The diffusion $dX_t = \bigl(MX_t + v(X_t)\bigr)\,dt + dW_t$ is positive recurrent and its invariant probability distribution $\eta_v$ has finite second moments.
Let $\Lambda(M)$ be as defined in (3.24). Then we have the following properties:
(a) It holds that
\[ \inf_{v\in\mathfrak{U}^*_{\mathrm{SM}}}\ \int_{\mathbb{R}^d}\tfrac12|v(x)|^2\,\eta_v(dx) = E_+(M) = \Lambda(M) . \tag{3.30} \]
(b) The Markov control $v(x) = -Qx$ is the unique control in $\mathfrak{U}^*_{\mathrm{SM}}$ which attains this infimum and it holds that $\Lambda(M) = \tfrac12\operatorname{trace}(Q)$.

Proof. It is well known that there exists at most one symmetric matrix $Q$ satisfying (3.28)–(3.29) [12, Theorem 3, p. 150]. For $\kappa>0$, consider the ergodic control problem of minimizing
\[ J_M(U) := \limsup_{T\to\infty}\ \frac1T\,E\Bigl[\int_0^T\bigl(\kappa|X_s|^2 + \tfrac12|U_s|^2\bigr)\,ds\Bigr] \tag{3.31} \]
over $\mathfrak{U}^*_{\mathrm{SM}}$, subject to the linear controlled diffusion
\[ X_t = X_0 + \int_0^t\bigl(MX_s + U_s\bigr)\,ds + W_t , \quad t\ge0 . \tag{3.32} \]
As is well known, an optimal stationary Markov control for this problem takes the form $U_t = -Q_\kappa X_t$, where $Q_\kappa$ is the unique positive definite symmetric solution to the matrix Riccati equation (see [12, Theorem 1, p. 147])
\[ Q_\kappa^2 - M^TQ_\kappa - Q_\kappa M = 2\kappa I . \tag{3.33} \]
Moreover, $\Psi_\kappa(x) = \tfrac12(x^TQ_\kappa x)$ is a solution of the associated HJB equation
\[ \frac12\Delta\Psi_\kappa(x) + \min_{u\in\mathbb{R}^d}\Bigl[(Mx+u)\cdot\nabla\Psi_\kappa(x) + \tfrac12|u|^2\Bigr] + \kappa|x|^2 = \frac12\operatorname{trace}(Q_\kappa) . \tag{3.34} \]
The HJB equation (3.34) characterizes the optimal cost, i.e.,
\[ \inf_{U\in\mathfrak{U}^*_{\mathrm{SM}}} J_M(U) = \frac12\operatorname{trace}(Q_\kappa) . \]
Since the stationary probability distribution of (3.32) under the control $U_t = -Q_\kappa X_t$ is Gaussian, it follows by (3.31) that the matrix $A_\kappa := M-Q_\kappa$ minimizes
\[ J_\kappa(A) := \kappa\operatorname{trace}\Sigma_A + \frac12\operatorname{trace}\bigl((A-M)\,\Sigma_A\,(A-M)^T\bigr) \]
over all Hurwitz matrices $A$, where $\Sigma_A$ is as in (3.25) (compare with (3.24)). Combining this with (3.34) we have
\[ \inf_{A\in\mathcal{A}} J_\kappa(A) = \frac12\operatorname{trace}(Q_\kappa) . \]
It follows that $\Lambda(M)\le\tfrac12\operatorname{trace}(Q_\kappa)$ for all $\kappa>0$. Note also that $\tfrac12\operatorname{trace}(Q_\kappa)$ decreases as $\kappa\searrow0$. It is straightforward to show that
\[ \Lambda(M) = \lim_{\kappa\searrow0}\ \frac12\operatorname{trace}(Q_\kappa) . \tag{3.35} \]
It is also well known that $Q_{\kappa'}-Q_\kappa$ is nonnegative definite if $\kappa'\ge\kappa$. Therefore $Q_\kappa$ has a unique limit $Q$ as $\kappa\searrow0$ which is nonnegative definite and satisfies (3.28).

Next we prove that $M-Q$ is Hurwitz. Let $T$ be a coordinate transformation such that $\tilde M := TMT^{-1} = \operatorname{diag}(\tilde M_1, -\tilde M_2)$, where $\tilde M_1$ and $\tilde M_2$ are square Hurwitz matrices of dimension $d-q$ and $q$ respectively. We may select $T = [T_1, T_2]^T$ with $T_1\in\mathbb{R}^{d\times(d-q)}$ and $T_2\in\mathbb{R}^{d\times q}$ so that the columns of each submatrix $T_i$ are orthonormal. By the orthonormality of the columns of $T_1$ and $T_2$ we have
\[ TT^T = \begin{pmatrix} I_{d-q} & T_1^TT_2 \\ T_2^TT_1 & I_q \end{pmatrix} , \tag{3.36} \]
where $I_{d-q}$ and $I_q$ are the identity matrices of dimension $d-q$ and $q$, respectively. Let $\tilde Q := (T^T)^{-1}QT^{-1}$. Then (3.28) takes the form
\[ \tilde M^T\tilde Q + \tilde Q\tilde M = \tilde Q\,TT^T\,\tilde Q . \tag{3.37} \]
We write $\tilde Q$ in the block form
\[ \tilde Q = \begin{pmatrix} \tilde Q_1 & R \\ R^T & \tilde Q_2 \end{pmatrix} \]
with $\tilde Q_1\in\mathbb{R}^{(d-q)\times(d-q)}$, $R\in\mathbb{R}^{(d-q)\times q}$ and $\tilde Q_2\in\mathbb{R}^{q\times q}$. Block multiplication in (3.37) results in
\[ \tilde M_1^T\tilde Q_1 + \tilde Q_1\tilde M_1 = \begin{pmatrix}\tilde Q_1 & R\end{pmatrix} TT^T \begin{pmatrix}\tilde Q_1 \\ R^T\end{pmatrix} , \tag{3.38} \]
and since $\tilde M_1$ is Hurwitz, $\tilde Q_1$ is positive semidefinite, and also the right-hand side is positive semidefinite, we must have $\tilde Q_1 = 0$.
However this means that the right hand side of (3.38) equals $RR^T$, which implies that $R=0$. Hence by (3.37) we obtain
\[ \tilde M_2^T\tilde Q_2 + \tilde Q_2\tilde M_2 = -\tilde Q_2^2 . \tag{3.39} \]
We claim that $\tilde Q_2$ is positive definite. Indeed, since by (3.39) the nullspace of $\tilde Q_2$ is invariant under $\tilde M_2$, if $\tilde Q_2$ is not invertible, then this implies that for some $y\in\mathbb{C}^q$, with $y\ne0$, we have $\tilde Q_2y=0$, and $\tilde M_2y=\lambda y$ for some $\lambda\in\mathbb{C}$. Since $\tilde M_2$ is Hurwitz, $\lambda$ must have negative real part. If we define $\tilde y := T^{-1}\binom{0}{y}$, with $\binom{0}{y}\in\mathbb{C}^d$, then we have
\[ (M-Q)\tilde y = \bigl(T^{-1}\tilde MT - T^T\tilde QT\bigr)\,T^{-1}\binom{0}{y} = -\lambda\tilde y . \tag{3.40} \]
However, $M-Q_\kappa$ is Hurwitz for all $\kappa>0$ by (3.33), and thus, by continuity, $M-Q$ cannot have an eigenvalue with positive real part. This contradicts (3.40) and proves the claim.

If $M-Q$ is not Hurwitz, then for some nonzero vector $x\in\mathbb{C}^d$, and $\rho\in\mathbb{C}$ with nonnegative real part, we must have $(M-Q)x=\rho x$. The identity $(M-Q)^TQ + Q(M-Q) = -Q^2$ and the fact that $Q$ is nonnegative definite imply that $Qx=0$, and therefore $Mx=\rho x$. Moreover, $Qx=0$ implies that $\tilde QTx=0$, and since $\tilde Q_2$ is invertible we must have $T_2^Tx=0$. Hence $T_2^TMx=0$, and we obtain $\tilde M_1T_1^Tx = T_1^TMx = \rho\,T_1^Tx$, which contradicts the fact that $\tilde M_1$ is Hurwitz. Therefore $M-Q$ must be Hurwitz. It follows that $Q$ satisfies (3.29) for some symmetric positive definite matrix $\Sigma$. Parenthetically, we mention that
\[ \tilde Q_2 = \Bigl(\int_0^\infty e^{\tilde M_2t}\,e^{\tilde M_2^Tt}\,dt\Bigr)^{-1} . \]
It remains to calculate the trace of $Q$. By (3.36) we obtain
\[ \operatorname{trace}(Q) = \operatorname{trace}\bigl(T^T\operatorname{diag}(0,\tilde Q_2)\,T\bigr) = \operatorname{trace}\bigl(\operatorname{diag}(0,\tilde Q_2)\,TT^T\bigr) = \operatorname{trace}(\tilde Q_2) . \tag{3.41} \]
On the other hand, by (3.39) we have
\[ \operatorname{trace}(\tilde Q_2) = -\operatorname{trace}(\tilde M_2^T + \tilde M_2) = -2\operatorname{trace}(\tilde M_2) . \tag{3.42} \]
Thus by (3.35) and (3.41)–(3.42) we have shown that $E_+(M) = \Lambda(M) = \tfrac12\operatorname{trace}(Q)$ and that $v(x)=-Qx$ attains the infimum over all linear controls.

Now let $\hat v\in\mathfrak{U}^*_{\mathrm{SM}}$ be any control.
Since $\hat v$ is suboptimal for the problem associated with (3.34), from (3.34) we obtain
\[ \frac12\Delta\Psi_\kappa(x) + \bigl(Mx+\hat v(x)\bigr)\cdot\nabla\Psi_\kappa(x) + \tfrac12|\hat v(x)|^2 + \kappa|x|^2 = \frac12\operatorname{trace}(Q_\kappa) + f_\kappa(x) , \tag{3.43} \]
where $f_\kappa(x) = \tfrac12|Q_\kappa x + \hat v(x)|^2$. Applying Itô's formula to (3.43), and using the fact that $\eta_{\hat v}$ has finite second moments and $\Psi_\kappa$ is quadratic, a standard argument gives
\[ \int_{\mathbb{R}^d}\bigl(\kappa|x|^2 - f_\kappa(x) + \tfrac12|\hat v(x)|^2\bigr)\,\eta_{\hat v}(dx) = \frac12\operatorname{trace}(Q_\kappa) . \tag{3.44} \]
Since $\int_{\mathbb{R}^d}|x|^2\,\eta_{\hat v}(dx)<\infty$, and $\lim_{\kappa\searrow0}\tfrac12\operatorname{trace}(Q_\kappa) = \Lambda(M)$, it follows by (3.44) that $\int_{\mathbb{R}^d}\tfrac12|\hat v(x)|^2\,\eta_{\hat v}(dx)\ge\Lambda(M)$. Hence (3.30) holds. Suppose $\hat v$ is optimal. Then taking limits in (3.44) we must have
\[ \lim_{\kappa\searrow0}\int_{\mathbb{R}^d}f_\kappa(x)\,\eta_{\hat v}(dx) = 0 . \]
Since $f_\kappa$ is nonnegative and locally equi-continuous, and $\eta_{\hat v}$ has a density, by Fatou's lemma we deduce that $f_\kappa\to0$ as $\kappa\searrow0$, uniformly on compacta. Therefore, $\hat v(x) = -Qx$ for all $x\in\mathbb{R}^d$.

We need the following lemma, which is valid for all $\nu$.

Lemma 3.5. For any bounded domain $G$, such that $S\subset G$, there exists a constant $\hat\kappa_0 = \hat\kappa_0(G,\nu)$ such that
\[ \sup_{\varepsilon\in(0,1)}\ \int_G\frac{\bigl(\operatorname{dist}(x,S)\bigr)^2}{\varepsilon^{2(\nu\wedge2)}}\,\eta_*^\varepsilon(dx) \le \hat\kappa_0 \quad \forall\,\nu>0 , \]
where $\operatorname{dist}(x,S)$ denotes the Euclidean distance of $x$ from the set $S$.

Proof. Let $\mathcal{V}$ be an energy function as in Theorem 2.1 which agrees with the Lyapunov function $\bar{\mathcal{V}}$ in Hypothesis 1.1 on the complement of some ball containing $S$ (see Lemma 2.1). We fix some bounded domain $G$ which contains $S$, and choose some number $\delta$ such that $\delta\ge\sup_{x\in G}\mathcal{V}(x)$. Let $\varphi:\mathbb{R}\to\mathbb{R}$ be a smooth function such that
(a) $\varphi(y)=y$ for $y\in(-\infty,\delta)$;
(b) $\varphi'\in(0,1)$ on $(\delta,2\delta)$;
(c) $\varphi'\equiv0$ on $[2\delta,\infty)$.

If $\nu\le1$, then the steps in the proof of Lemma 3.3 show that all the terms on the left hand side of (3.20) are of $O(\varepsilon^{2\nu})$. Therefore, there exists a constant $C_0>0$ such that
\[ \inf_{\varepsilon\in(0,1)}\ \int_{\{x\,:\,\mathcal{V}(x)\le\delta\}}\frac{\langle m,\nabla\mathcal{V}\rangle}{\varepsilon^{2\nu}}\,d\eta_*^\varepsilon \ge -C_0 . \]
Thus, by Theorem 2.1 (iii) we can find a constant $\hat\kappa_0$ such that
\[ \sup_{\varepsilon\in(0,1)}\ \int_{\{x\,:\,\mathcal{V}(x)\le\delta\}}\frac{\bigl(\operatorname{dist}(x,S)\bigr)^2}{\varepsilon^{2\nu}}\,\eta_*^\varepsilon(dx) \le \hat\kappa_0 . \]
Next we turn to the case $\nu>1$.
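Define
\[ G_\varepsilon := \int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,|v_*^\varepsilon|^2\,d\eta_*^\varepsilon . \]
By (3.21)–(3.22) we obtain
\[ \Bigl|\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,(v_*^\varepsilon\cdot\nabla\mathcal{V})\,d\eta_*^\varepsilon\Bigr| \le \zeta^\varepsilon\sqrt{G_\varepsilon} . \tag{3.45} \]
By (3.45) and Theorem 2.1 (iv), we have (compare with (3.23))
\[ \frac{1}{C_0'}(\zeta^\varepsilon)^2 - \varepsilon\sqrt{G_\varepsilon}\,\zeta^\varepsilon + O(\varepsilon^{2\nu}) \le 0 . \tag{3.46} \]
Moreover, by Theorem 2.1 (iii) and Hypothesis 1.1, for some constant $\kappa_1>0$, it holds that
\[ \int_{\{x\,:\,\mathcal{V}(x)\le\delta\}}\bigl(\operatorname{dist}(x,S)\bigr)^2\,\eta_*^\varepsilon(dx) \le \kappa_1(\zeta^\varepsilon)^2 . \tag{3.47} \]
Let $G$ be a bounded open set containing $S$ such that $\ell(x)>\min\{\ell(z) : z\in S\}$ for all $x\in G^c$. Since $\ell$ is Lipschitz, we have $\ell(x)-\min\{\ell(z) : z\in S\}\ge -C_\ell\operatorname{dist}(x,S)$ in $G$. Using the Cauchy–Schwarz inequality, we deduce from (3.47) that
\[ \int_{\mathbb{R}^d}\ell\,d\eta_*^\varepsilon - \min\{\ell(z) : z\in S\} \ge -\kappa_2\,\zeta^\varepsilon \tag{3.48} \]
for some constant $\kappa_2>0$. Thus by (3.48) and Theorem 3.1 we have
\[ -\kappa_2\,\zeta^\varepsilon \le \beta_*^\varepsilon - \min\{\ell(z) : z\in S\} \le O(\varepsilon^{2\nu-2}) , \]
which implies that
\[ G_\varepsilon \le \kappa_3\bigl(\zeta^\varepsilon\vee\varepsilon^{2\nu-2}\bigr) \tag{3.49} \]
for some constant $\kappa_3>0$. Applying the quadratic formula to (3.46) and using (3.49) we obtain $G_\varepsilon\in O\bigl(\varepsilon\sqrt{G_\varepsilon}\vee\varepsilon^\nu\vee\varepsilon^{2\nu-2}\bigr)$, which implies that
\[ \varepsilon\sqrt{G_\varepsilon} \in O\bigl(\varepsilon^2\vee\varepsilon^{1+\nu/2}\vee\varepsilon^\nu\bigr) . \]
Therefore $\varepsilon\sqrt{G_\varepsilon}\in O(\varepsilon^\nu)$ when $\nu\in(1,2)$, and $\varepsilon\sqrt{G_\varepsilon}\in O(\varepsilon^2)$ when $\nu\ge2$. The result then follows by applying the quadratic formula to (3.46) and using (3.47).

Note that the above proof gives us $\zeta^\varepsilon\in O(\varepsilon^{2\nu-2})$ for $\nu\in[1,2]$. In the proof of Lemma 3.5 we have established the following useful fact.

Corollary 3.1. For $\nu\in[1,2)$ we have $\int_{\mathbb{R}^d}|v_*^\varepsilon|^2\,d\eta_*^\varepsilon\in O(\varepsilon^{2\nu-2})$, whereas if $\nu\ge2$ then $\int_{\mathbb{R}^d}\ell\,d\eta_*^\varepsilon - \min\{\ell(z) : z\in S\}\in O(\varepsilon^{2\nu-2})$. In particular, $\int_{\mathbb{R}^d}|v_*^\varepsilon|^2\,d\eta_*^\varepsilon\in O(\varepsilon^2)$. Moreover, using Theorem 3.1 we obtain that
\[ \beta_*^\varepsilon = \begin{cases} \min\{\ell(z) : z\in S\} + O(\varepsilon^{2\nu-2}) & \text{for } \nu\in(1,2) , \\ \min\{\ell(z) : z\in S\} + O(\varepsilon^2) & \text{for } \nu\ge2 . \end{cases} \]
We need an estimate on the growth of $\nabla V^\varepsilon$. First a definition.

Definition 3.2. For the rest of the paper $\{B_z : z\in S\}$ is some collection of nonempty, disjoint balls, with each $B_z$ centered around $z$, and we define $B_S := \cup_{z\in S}B_z$.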
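We let $r:[0,1]\to[0,1]$ be a continuous increasing function with $r(0)=0$ and such that
\[ \breve r(\varepsilon) := \frac{r(\varepsilon)}{\varepsilon^\nu} \to \infty \quad \text{as } \varepsilon\searrow0 . \]
For $z\in S$, we define
\[ \hat V_z^\varepsilon(x) := V^\varepsilon(\varepsilon^\nu x+z) , \]
and analogously, the 'scaled' vector field and penalty by
\[ \hat m_z^\varepsilon(x) := \frac{m(\varepsilon^\nu x+z)}{\varepsilon^\nu} , \qquad \hat\ell_z^\varepsilon(x) := \ell(\varepsilon^\nu x+z) . \]
We define the 'scaled' density $\hat\varrho_z^\varepsilon(x) := \varepsilon^{\nu d}\varrho_*^\varepsilon(\varepsilon^\nu x+z)$, and denote by $\hat\eta_z^\varepsilon$ the corresponding probability measure in $\mathbb{R}^d$. We also define
\[ \breve\varrho_z^\varepsilon(x) := \begin{cases} \dfrac{\hat\varrho_z^\varepsilon(x)}{\eta_*^\varepsilon(B_{r(\varepsilon)}(z))} & \text{if } \varepsilon^\nu x+z\in B_{\breve r(\varepsilon)}(z) , \\ 0 & \text{otherwise,} \end{cases} \qquad \mathring\varrho_z^\varepsilon(x) := \begin{cases} \dfrac{\hat\varrho_z^\varepsilon(x)}{\eta_*^\varepsilon(B_z)} & \text{if } \varepsilon^\nu x+z\in B_z , \\ 0 & \text{otherwise,} \end{cases} \]
and $\breve\eta_z^\varepsilon(dx) = \breve\varrho_z^\varepsilon(x)\,dx$, $\mathring\eta_z^\varepsilon(dx) = \mathring\varrho_z^\varepsilon(x)\,dx$.

Lemma 3.6. Assume $\nu\in(0,2]$, and let $\tilde V^\varepsilon := \varepsilon^2V^\varepsilon$. There exists a constant $c_0$ such that
\[ |\nabla\tilde V^\varepsilon(x)| \le c_0(1+|x|) \quad \forall\,\varepsilon\in(0,1) ,\ \forall\,x\in\mathbb{R}^d . \tag{3.50} \]
Also, the same applies to $\breve V_z^\varepsilon := \varepsilon^{2(1-\nu)}\hat V_z^\varepsilon$, with $\hat V_z^\varepsilon$ as in Definition 3.2.

Proof. The function $f^\varepsilon := \varepsilon^{2-2\nu}V^\varepsilon$ satisfies
\[ \frac{\varepsilon^{2\nu}}{2}\Delta f^\varepsilon + m\cdot\nabla f^\varepsilon - \frac{\varepsilon^{2\nu}}{2}|\nabla f^\varepsilon|^2 = \varepsilon^{2-2\nu}\bigl(\beta_*^\varepsilon-\ell\bigr) . \tag{3.51} \]
Thus for $\nu\in(0,1]$, the result follows by applying [19, Lemma 5.1] to (3.51) and using the assumptions on the growth of $m$ and $\ell$.

We next turn to the case $\nu\in(1,2]$. Let $z\in Z_1$ (see Theorem 1.1). The function $\breve V_z^\varepsilon$ satisfies
\[ \frac12\Delta\breve V_z^\varepsilon(x) + \hat m_z^\varepsilon(x)\cdot\nabla\breve V_z^\varepsilon(x) - \frac12|\nabla\breve V_z^\varepsilon(x)|^2 = \varepsilon^{2(1-\nu)}\bigl(\beta_*^\varepsilon - \hat\ell_z^\varepsilon(x)\bigr) . \tag{3.52} \]
Since $\ell$ is Lipschitz, the map $x\mapsto\varepsilon^{2(1-\nu)}\bigl(\hat\ell_z^\varepsilon(x)-\ell(z)\bigr)$ is bounded on compact subsets of $\mathbb{R}^d$, uniformly in $\varepsilon\in(0,1)$. By Theorem 1.1 (i), which is established in Corollary 3.1, the constants $\varepsilon^{2(1-\nu)}\bigl(\beta_*^\varepsilon-\ell(z)\bigr)$ are uniformly bounded in $\varepsilon\in(0,1)$. Applying [19, Lemma 5.1] to (3.52) it follows that $\breve V_z^\varepsilon$ satisfies (3.50). Therefore, we have
\[ \bigl|\nabla_xV^\varepsilon(x+z)\bigr| = \varepsilon^{-\nu}\,\bigl|\nabla_y\hat V_z^\varepsilon(y)\bigr|_{y=\varepsilon^{-\nu}x} \le \frac{\varepsilon^{-\nu}}{\varepsilon^{2(1-\nu)}}\,c_0\bigl(1+|\varepsilon^{-\nu}x|\bigr) = \frac{c_0}{\varepsilon^2}\bigl(\varepsilon^\nu+|x|\bigr) , \]
and the proof is complete.

We continue with a version of Lemma 3.5 for unbounded domains.

Proposition 3.1. Let $\nu\in(0,2]$.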
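Then for any $k\in\mathbb{N}$ and $r>0$, there exist constants $\varepsilon_0>0$ and $\hat\kappa_1$ such that
\[ \sup_{\varepsilon\in(0,\varepsilon_0)}\ \int_{B_r^c(S)}\frac{\bigl(\operatorname{dist}(x,S)\bigr)^{2k}}{\varepsilon^{2\wedge2\nu}}\,\eta_*^\varepsilon(dx) \le \hat\kappa_1 . \]
Proof. Let $\tilde V^\varepsilon := \varepsilon^2V^\varepsilon$. By Lemma 3.6 the function $\tilde V^\varepsilon$ is locally bounded, uniformly in $\varepsilon>0$. Applying the operator
\[ \mathcal{L}_*^\varepsilon := \frac{\varepsilon^{2\nu}}{2}\Delta + \bigl(m-\varepsilon^2\nabla V^\varepsilon\bigr)\cdot\nabla \]
to the function $\mathcal{V}^{2k}e^{\tilde V^\varepsilon}$, and using the identity $\mathcal{L}_*^\varepsilon\tilde V^\varepsilon = \varepsilon^2\bigl(\beta_*^\varepsilon-\ell\bigr) - \tfrac12|\nabla\tilde V^\varepsilon|^2$, after some calculations we obtain
\[ \mathcal{L}_*^\varepsilon\bigl(\mathcal{V}^{2k}e^{\tilde V^\varepsilon}\bigr) = \mathcal{V}^{2k}e^{\tilde V^\varepsilon}\biggl[\varepsilon^2\bigl(\beta_*^\varepsilon-\ell\bigr) + k\varepsilon^{2\nu}\frac{\Delta\mathcal{V}}{\mathcal{V}} - \frac{1-\varepsilon^{2\nu}}{2}\Bigl|\nabla\tilde V^\varepsilon + \frac{2k}{1-\varepsilon^{2\nu}}\frac{\nabla\mathcal{V}}{\mathcal{V}}\Bigr|^2 + 2k\,\frac{m\cdot\nabla\mathcal{V}}{\mathcal{V}} + k\Bigl((2k-1)\varepsilon^{2\nu} + \frac{2k}{1-\varepsilon^{2\nu}}\Bigr)\frac{|\nabla\mathcal{V}|^2}{\mathcal{V}^2}\biggr] . \tag{3.53} \]
By Hypothesis 1.1 and parts (iii) and (iv) of Theorem 2.1, we can add a positive constant to $\mathcal{V}$ so that
\[ \Bigl((2k-1)\varepsilon^{2\nu} + \frac{2k}{1-\varepsilon^{2\nu}}\Bigr)\frac{|\nabla\mathcal{V}|^2}{\mathcal{V}^2} + \frac{m\cdot\nabla\mathcal{V}}{\mathcal{V}} \le \frac{m\cdot\nabla\mathcal{V}}{2\mathcal{V}} \quad \text{on } \mathbb{R}^d , \tag{3.54} \]
for all sufficiently small $\varepsilon$. Let $\kappa_0$ be a bound of $\beta_*^\varepsilon + k\frac{|\Delta\mathcal{V}|}{\mathcal{V}}$, and define
\[ G_0 := \ell + \frac{1-\varepsilon^{2\nu}}{2\varepsilon^2}\Bigl|\nabla\tilde V^\varepsilon + \frac{2k}{1-\varepsilon^{2\nu}}\frac{\nabla\mathcal{V}}{\mathcal{V}}\Bigr|^2 . \]
Using (3.54), we write (3.53) as
\[ \frac{1}{\varepsilon^{2\wedge2\nu}}\,\mathcal{L}_*^\varepsilon\bigl(\mathcal{V}^{2k}e^{\tilde V^\varepsilon}\bigr)(x) \le \mathcal{V}^{2k}(x)\,e^{\tilde V^\varepsilon(x)}\Bigl[\kappa_0 - \varepsilon^{2-2\wedge2\nu}\,G_0(x) + \frac{k}{\varepsilon^{2\wedge2\nu}}\frac{m(x)\cdot\nabla\mathcal{V}(x)}{\mathcal{V}(x)}\Bigr] . \tag{3.55} \]
Let $F^\varepsilon(x)$ denote the right-hand side of (3.55). Since $\ell$ is inf-compact, there exists $r_0>0$ such that $F^\varepsilon<0$ on $B_{r_0}^c$. On the other hand, since $m(x)\cdot\nabla\mathcal{V}(x)$ is approximately quadratic near $S$ and negative on $S^c$, it follows that there exist $\kappa>0$ and $\varepsilon_0>0$ such that
\[ \{x : F^\varepsilon(x)>0\} \subset B_{\rho(\varepsilon)}(S) \quad \forall\,\varepsilon<\varepsilon_0 , \]
where $\rho(\varepsilon) := \kappa\,\varepsilon^{1\wedge\nu}$. By Lemma 3.6, $\kappa_0\mathcal{V}^{2k}(x)\,e^{\tilde V^\varepsilon(x)}$ is locally bounded, uniformly in $\varepsilon>0$. Let $\kappa_1$ be such a bound on $B_{\rho(\varepsilon)}(S)$. Then
\[ \frac{1}{\varepsilon^{2\wedge2\nu}}\,\mathcal{L}_*^\varepsilon\bigl(\mathcal{V}^{2k}e^{\tilde V^\varepsilon}\bigr)(x) \le \kappa_1 - \mathcal{V}^{2k}(x)\,e^{\tilde V^\varepsilon(x)}\Bigl[\varepsilon^{2-2\wedge2\nu}\,G_0(x) - \frac{k}{\varepsilon^{2\wedge2\nu}}\frac{m(x)\cdot\nabla\mathcal{V}(x)}{\mathcal{V}(x)}\Bigr]\,\mathbb{I}_{B^c_{\rho(\varepsilon)}(S)}(x) \tag{3.56} \]
for all $x\in\mathbb{R}^d$. As a result of the strong maximum principle, $V^\varepsilon$ attains its infimum over $\mathbb{R}^d$ in the set $\{x\in\mathbb{R}^d : \ell(x)\le\beta_*^\varepsilon\}$. Therefore, $\tilde V^\varepsilon$ is bounded below in $\mathbb{R}^d$ by Lemma 3.6. Thus, from (3.56) we obtain
\[ \int_{B^c_{\rho(\varepsilon)}(S)}\frac{|m(x)\cdot\nabla\mathcal{V}(x)|}{\varepsilon^{2\wedge2\nu}}\,\mathcal{V}^{2k-1}(x)\,\eta_*^\varepsilon(dx) \le \frac{\kappa_2}{k\,\inf_{\mathbb{R}^d}e^{\tilde V^\varepsilon}} \quad \forall\,\varepsilon\le\varepsilon_0 , \tag{3.57} \]
for some constant $\kappa_2$.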
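Since by Hypothesis 1.1, $\mathcal{V}$ has strict quadratic growth and $|m\cdot\nabla\mathcal{V}|$ has strict linear growth, (3.57) implies that for some constant $\kappa_3$, we must have
\[ \frac{1}{\varepsilon^{2\wedge2\nu}}\int_{B^c_{\rho(\varepsilon)}(S)}\bigl(\operatorname{dist}(x,S)\bigr)^{4k-1}\,\eta_*^\varepsilon(dx) \le \kappa_3 \quad \forall\,\varepsilon\le\varepsilon_0 . \]
This finishes the proof.

As a corollary to the above result we obtain the following.

Corollary 3.2. For $\nu\in(0,1)$, we have $\beta_*^\varepsilon - J_3 \ge -O\bigl(\varepsilon^{\nu\wedge(2-2\nu)}\bigr)$.

Proof. Recall from Theorem 1.1 that $J_3 = \min\{\ell(z) : z\in S_s\}$. Consider $z\in S\setminus S_s$, and let $\varphi$, $B_r(z)$, and $\zeta^\varepsilon$ be as in Lemma 3.3. Recall from the proof of Lemma 3.3 that $\sup_{B_r(z)}\Delta\mathcal{V}<0$. Thus, using Theorem 2.1 (iv) and Young's inequality, we deduce from (3.20) that
\[ -\frac12\int_{B_r(z)}\varphi'(\mathcal{V})\Delta\mathcal{V}\,d\eta_*^\varepsilon + \frac{\kappa_1}{\varepsilon^{2\nu}}(\zeta^\varepsilon)^2 \le \kappa_2\,\varepsilon^{2-2\nu}\int_{\mathbb{R}^d}\varphi'(\mathcal{V})\,|v_*^\varepsilon|^2\,d\eta_*^\varepsilon + \kappa_3(\zeta^\varepsilon)^2 \]
for some positive constants $\kappa_i$, $i=1,2,3$. It follows that for some $r_0\le r$ we have $\eta_*^\varepsilon(B_{r_0}(z))\in O(\varepsilon^{2-2\nu})$. Therefore there exists a neighborhood $B$ of $S\setminus S_s$ such that $\eta_*^\varepsilon(B)\in O(\varepsilon^{2-2\nu})$. Let $K\subset\mathbb{R}^d$ be a compact set satisfying $\ell(x)-J_3\ge0$ for $x\notin K$, and define $G_1 := \cup_{z\in S}B_{r_0}(z)\setminus B$ and $G_2 := K\setminus(G_1\cup B)$. We can choose $r_0$ and $B$ in such a way that $\{B_{r_0}(z) : z\in S\}$ are disjoint and $B\cap S_s=\emptyset$. By Proposition 3.1 we have $\eta_*^\varepsilon(G_2)\in O(\varepsilon^{2\nu})$. Therefore
\[ \beta_*^\varepsilon - J_3 \ge \int_{\mathbb{R}^d}\ell\,d\eta_*^\varepsilon - \min\{\ell(z) : z\in S_s\} \ge -\kappa_4\Bigl(\eta_*^\varepsilon(B) + \int_{G_1}\operatorname{dist}(x,S)\,d\eta_*^\varepsilon + \eta_*^\varepsilon(G_2)\Bigr) \ge -O(\varepsilon^{2-2\nu}) - O(\varepsilon^\nu) - O(\varepsilon^{2\nu}) \ge -O\bigl(\varepsilon^{\nu\wedge(2-2\nu)}\bigr) , \]
for some constant $\kappa_4$, where we use Lemma 3.5 in the third inequality.

We proceed to prove the converse inequality to (3.27).

Lemma 3.7. Assume $\nu=1$, and let $\{B_z : z\in S\}$ and $r(\varepsilon)$ be as in Definition 3.2. Suppose that for some $z\in S$ and some sequence $\varepsilon_n\searrow0$ it holds that $\eta_*^{\varepsilon_n}(B_z)\to\xi_z>0$. All limits are assumed along this sequence. Normalize $V^\varepsilon$ so that $V^\varepsilon(z)=0$, and let $M\equiv Dm(z)$. Let $\hat V_z^\varepsilon$ be as in Definition 3.2.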
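Then
(a) The sequence $\hat V_z^{\varepsilon_n}$ converges to some function $V\in C^2(\mathbb{R}^d)$, along a subsequence of $\varepsilon_n\searrow0$ (also denoted as $\{\varepsilon_n\}$), and $V$ satisfies
\[ \frac12\Delta V(x) + \min_{u\in\mathbb{R}^d}\Bigl[(Mx-u)\cdot\nabla V(x) + \tfrac12|u|^2\Bigr] + \ell(z) = \bar\beta . \tag{3.58} \]
(b) It holds that $\bar\beta\ge\ell(z)+E_+\bigl(Dm(z)\bigr)$.
(c) The diffusion
\[ dX_t = \bigl(MX_t - \nabla V(X_t)\bigr)\,dt + dW_t \tag{3.59} \]
is positive recurrent, and its invariant probability measure $\bar\eta$ has finite second moments.
(d) The densities $\breve\varrho_z^\varepsilon$ and $\mathring\varrho_z^\varepsilon$ in Definition 3.2 converge to the density $\bar\varrho$ of $\bar\eta$, uniformly on compact sets.
(e) It holds that
\[ \liminf_{\varepsilon_n\searrow0}\ \int_{B_{r(\varepsilon_n)}(z)}\bigl(\ell(x) + \tfrac12|v_*^{\varepsilon_n}(x)|^2\bigr)\,\eta_*^{\varepsilon_n}(dx) \ge \xi_z\bigl(\ell(z)+E_+\bigl(Dm(z)\bigr)\bigr) . \]
Proof. By (2.7) we obtain
\[ \frac12\Delta\hat V_z^\varepsilon + \hat m_z^\varepsilon\cdot\nabla\hat V_z^\varepsilon - \frac12|\nabla\hat V_z^\varepsilon|^2 + \hat\ell_z^\varepsilon = \beta_*^\varepsilon . \]
By [19, Lemma 5.1] (see also Lemma 3.6) and the assumptions on $m$ and $\ell$, there exists a constant $c_0$ such that
\[ |\nabla\hat V_z^\varepsilon(x)| \le c_0(1+|x|) \quad \forall\,\varepsilon\in(0,1) ,\ \forall\,x\in\mathbb{R}^d . \tag{3.60} \]
It then follows that $\hat V_z^\varepsilon$ is locally bounded in $C^{2,\alpha}(\mathbb{R}^d)$ for any $\alpha\in(0,1)$. Taking limits in (2.6) along $\varepsilon_n\searrow0$ we obtain a function $V\in C^2(\mathbb{R}^d)$ and a constant $\bar\beta$ which satisfy (3.58). This proves part (a).

In order to show that the diffusion in (3.59) is positive recurrent, consider the diffusion
\[ dX_t = \bigl(\hat m_z^\varepsilon(X_t) - \nabla\hat V_z^\varepsilon(X_t)\bigr)\,dt + dW_t . \tag{3.61} \]
Recall from Definition 3.2 that $\hat\eta_z^\varepsilon$ and $\hat\varrho_z^\varepsilon$ denote the invariant probability measure of (3.61) and its density, respectively. Let
\[ \hat{\mathcal{L}}_z^\varepsilon := \frac12\Delta + \bigl(\hat m_z^\varepsilon - \nabla\hat V_z^\varepsilon\bigr)\cdot\nabla \]
denote the extended generator of (3.61). By Lemma 3.5 and the Markov inequality we obtain $\eta_*^\varepsilon\bigl(B_z\setminus B_{n\varepsilon}(z)\bigr)\le\frac{\hat\kappa_0}{n^2}$ for all $n\in\mathbb{N}$. It follows that $\{\mathring\eta_z^{\varepsilon_n} : n\in\mathbb{N}\}$ is a tight family of measures. Since, by the Markov inequality just mentioned, we have
\[ \eta_*^\varepsilon\bigl(B_{r(\varepsilon)}(z)\bigr) \ge \eta_*^\varepsilon(B_z) - \frac{\hat\kappa_0\,\varepsilon^2}{r^2(\varepsilon)} , \tag{3.62} \]
and $\eta_*^{\varepsilon_n}(B_z)\to\xi_z>0$ as $\varepsilon_n\searrow0$, the family $\{\breve\eta_z^{\varepsilon_n} : n\in\mathbb{N}\}$ is also tight. By the Harnack inequality the family $\{\hat\varrho_z^{\varepsilon_n} : n\in\mathbb{N}\}$ is locally bounded, and locally Hölder equi-continuous, and the same of course applies to $\{\breve\varrho_z^{\varepsilon_n} : n\in\mathbb{N}\}$.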
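Moreover, the tightness of $\{\breve\eta_z^{\varepsilon_n} : n\in\mathbb{N}\}$ implies the uniform integrability of $\{\breve\varrho_z^{\varepsilon_n} : n\in\mathbb{N}\}$. Select any subsequence, also denoted by $\{\varepsilon_n\}$, along which $\breve\varrho_z^{\varepsilon_n}$ converges locally uniformly, and denote the limit by $\bar\varrho$. By uniform integrability, $\breve\varrho_z^{\varepsilon_n}$ also converges in $L^1(\mathbb{R}^d)$, as $n\to\infty$, and hence $\int_{\mathbb{R}^d}\bar\varrho(x)\,dx=1$. Therefore $\bar\eta(dx) := \bar\varrho(x)\,dx$ is a probability measure. Let $f$ be a smooth function with compact support, and
\[ \bar{\mathcal{L}} := \frac12\Delta + \bigl(Mx-\nabla V(x)\bigr)\cdot\nabla . \]
Then
\[ \Bigl|\int_{\mathbb{R}^d}\hat{\mathcal{L}}_z^{\varepsilon_n}f(x)\,\breve\varrho_z^{\varepsilon_n}(x)\,dx - \int_{\mathbb{R}^d}\bar{\mathcal{L}}f(x)\,\bar\varrho(x)\,dx\Bigr| \le \Bigl|\int_{\mathbb{R}^d}\hat{\mathcal{L}}_z^{\varepsilon_n}f(x)\bigl(\breve\varrho_z^{\varepsilon_n}(x)-\bar\varrho(x)\bigr)\,dx\Bigr| + \Bigl|\int_{\mathbb{R}^d}\bigl(\hat{\mathcal{L}}_z^{\varepsilon_n}f(x)-\bar{\mathcal{L}}f(x)\bigr)\,\bar\varrho(x)\,dx\Bigr| . \tag{3.63} \]
Since $\breve\varrho_z^{\varepsilon_n}\to\bar\varrho$ in $L^1(\mathbb{R}^d)$, the first term on the right hand side of (3.63) converges to $0$ as $n\to\infty$. Similarly, since $\hat m_z^{\varepsilon_n}(x)\to Mx$ and $\hat V_z^{\varepsilon_n}\to V$ uniformly on compacta, the second term also converges to $0$. Since $\hat\eta_z^\varepsilon$ is an invariant probability measure of (3.61), by the definition of $\breve\varrho_z^{\varepsilon_n}$ we have $\int_{\mathbb{R}^d}\hat{\mathcal{L}}_z^{\varepsilon_n}f(x)\,\breve\varrho_z^{\varepsilon_n}(x)\,dx=0$ for all sufficiently large $n$, which implies that $\int_{\mathbb{R}^d}\bar{\mathcal{L}}f(x)\,\bar\varrho(x)\,dx=0$. Hence, $\bar\eta$ is an infinitesimal invariant probability measure of (3.59), and since the diffusion is regular, it is also an invariant probability measure. This proves part (d).

Since the diffusion in (3.59) has an invariant probability measure, it follows that it is positive recurrent. By Lemma 3.5 we have
\[ \sup_{\varepsilon\in(0,1)}\ \int_{\{\varepsilon^\nu x+z\,\in\,B_z\}}|x|^2\,\hat\eta_z^\varepsilon(dx) < \infty . \]
By Fatou's lemma we obtain $\int_{\mathbb{R}^d}|x|^2\,\bar\eta(dx)<\infty$. This completes the proof of part (c).

Since $V$ has at most quadratic growth by (3.60), we have $\int_{\mathbb{R}^d}|V(x)|\,\bar\eta(dx)<\infty$. Thus, by Lemma 4.12 in [17], if $E_x$ denotes the expectation operator for the process governed by (3.59), it is the case that $E_x\bigl[V(X_t)\bigr]$ converges as $t\to\infty$. Integrating both sides of (3.58) with respect to $\bar\eta$, we deduce that
\[ \int_{\mathbb{R}^d}\tfrac12|\nabla V(x)|^2\,\bar\eta(dx) = \bar\beta-\ell(z) . \tag{3.64} \]
Since the Markov control $v(X_t) = -\nabla V(X_t)$ is in general suboptimal for the problem in (3.30), by Lemma 3.4 we must have $E_+(M)\le\bar\beta-\ell(z)$. This proves part (b).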
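The bound in part (b) rests on the identity $E_+(M)=\tfrac12\operatorname{trace}(Q)$ of Lemma 3.4. As a quick numerical sanity check, the following sketch verifies that identity on a small non-normal example. It is only an illustration under stated assumptions: the test matrix $M=\begin{pmatrix}a&c\\0&-b\end{pmatrix}$ with $a,b>0$, the helper names, and the rank-one candidate $Q=2a\,ww^T/|w|^2$ built from the left eigenvector $w=(a+b,\,c)$ of $M$ for the unstable eigenvalue $a$ are all choices made here, not constructions taken from the paper.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    """Transpose of a 2x2 matrix."""
    return [[A[j][i] for j in range(2)] for i in range(2)]


def candidate_Q(a, b, c):
    """Rank-one candidate Q = 2a * w w^T / |w|^2 for M = [[a, c], [0, -b]].

    w = (a + b, c) is a left eigenvector of M for the unstable eigenvalue a
    (assumption of this sketch: a > 0 and b > 0).
    """
    w = (a + b, c)
    norm2 = w[0] ** 2 + w[1] ** 2
    return [[2.0 * a * w[i] * w[j] / norm2 for j in range(2)] for i in range(2)]


def check_lemma_3_4(a, b, c):
    """Return (Riccati residual, is M - Q Hurwitz, trace(Q) / 2)."""
    M = [[a, c], [0.0, -b]]
    Q = candidate_Q(a, b, c)
    MtQ = matmul(transpose(M), Q)
    QM = matmul(Q, M)
    QQ = matmul(Q, Q)
    # Residual of the Riccati equation M^T Q + Q M = Q^2, cf. (3.28).
    residual = max(abs(MtQ[i][j] + QM[i][j] - QQ[i][j])
                   for i in range(2) for j in range(2))
    # A real 2x2 matrix is Hurwitz iff its trace is < 0 and its determinant is > 0.
    A = [[M[i][j] - Q[i][j] for j in range(2)] for i in range(2)]
    trace_A = A[0][0] + A[1][1]
    det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    hurwitz = trace_A < 0 and det_A > 0
    half_trace_Q = 0.5 * (Q[0][0] + Q[1][1])
    return residual, hurwitz, half_trace_Q


if __name__ == "__main__":
    a, b, c = 2.0, 1.0, 3.0          # eigenvalues of M are a = 2 and -b = -1
    residual, hurwitz, half_trace_Q = check_lemma_3_4(a, b, c)
    print("Riccati residual:", residual)
    print("M - Q Hurwitz:   ", hurwitz)
    print("trace(Q)/2:      ", half_trace_Q, " E_+(M):", a)
```

For this family the only eigenvalue of $M$ in the open right half plane is $a$, so $E_+(M)=a$, and the check confirms that the candidate $Q$ solves (3.28), stabilizes $M-Q$, and has half-trace $a$.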
By (3.62) we have $\hat\varrho_z^\varepsilon/\breve\varrho_z^\varepsilon\to\xi_z$ along $\{\varepsilon_n\}$ as $n\to\infty$, uniformly on compacts. Therefore, using Fatou's lemma, we obtain by part (b) that
\[ \liminf_{\varepsilon_n\searrow0}\ \int_{B_{r(\varepsilon_n)}(z)}\bigl(\ell(x)+\tfrac12|v_*^{\varepsilon_n}(x)|^2\bigr)\,\eta_*^{\varepsilon_n}(dx) = \liminf_{\varepsilon_n\searrow0}\ \int_{\{\varepsilon^\nu x+z\,\in\,B_{\breve r(\varepsilon)}(z)\}}\bigl(\hat\ell_z^{\varepsilon_n}(x)+\tfrac12|\nabla\hat V_z^{\varepsilon_n}(x)|^2\bigr)\,\frac{\hat\varrho_z^{\varepsilon_n}(x)}{\breve\varrho_z^{\varepsilon_n}(x)}\,\breve\eta_z^{\varepsilon_n}(dx) \ge \xi_z\bar\beta \ge \xi_z\bigl(E_+(M)+\ell(z)\bigr) . \tag{3.65} \]
This proves part (e) and thus completes the proof.

The statement in Theorem 1.1 (iii) follows from the following result.

Theorem 3.3. Recall the definition of $J_2$ from Theorem 1.1. Provided $\nu=1$, it holds that
\[ \lim_{\varepsilon\searrow0}\ \beta_*^\varepsilon = J_2 . \]
Proof. In view of (3.27), it is enough to show that $\liminf_{\varepsilon\searrow0}\beta_*^\varepsilon\ge J_2$. Choose any sequence $\varepsilon_n\searrow0$ such that $\eta_*^{\varepsilon_n}(B_z)\to\xi_z$, for all $z\in S$. Let $S^o := \{z\in S : \xi_z>0\}$. Thus $\sum_{z\in S^o}\xi_z=1$. Therefore, by Lemma 3.7 (e) we have
\[ \liminf_{n\to\infty}\ \beta_*^{\varepsilon_n} \ge \liminf_{n\to\infty}\ \sum_{z\in S^o}\int_{B_{r(\varepsilon_n)}(z)}\bigl(\ell(x)+\tfrac12|v_*^{\varepsilon_n}(x)|^2\bigr)\,\eta_*^{\varepsilon_n}(dx) \ge \sum_{z\in S^o}\xi_z\bigl(\ell(z)+E_+\bigl(Dm(z)\bigr)\bigr) \ge J_2 , \tag{3.66} \]
and the proof is complete.

Recall the definition of $Z_2$ from Theorem 1.1, and let the function $r$ be as in Definition 3.2. The inequality in (3.66) suggests that the control effort concentrates in the vicinity of $Z_2$. This is asserted in the theorem that follows.

Theorem 3.4. Suppose $\nu=1$. Then $\eta_*^\varepsilon\bigl(B^c_{r(\varepsilon)}(Z_2)\bigr)\to0$ as $\varepsilon\searrow0$. Moreover,
\[ \lim_{\varepsilon\searrow0}\ \int_{B^c_{r(\varepsilon)}(Z_2)}|v_*^\varepsilon(x)|^2\,\eta_*^\varepsilon(dx) = 0 . \]
Proof. The first claim follows by Lemma 3.5 and Theorem 3.3. To prove the second claim, given any sequence $\varepsilon_n\searrow0$, we extract a subsequence, also denoted by $\varepsilon_n$, along which $\eta_*^{\varepsilon_n}(B_z)\to\xi_z$ for all $z\in S$. As in the proof of Theorem 3.3, let $S^o := \{z\in S : \xi_z>0\}$. Then $\sum_{z\in S^o}\xi_z=1$, and $S^o\subset Z_2$ by Theorem 3.3. Then, by (3.65) we have
\[ \liminf_{n\to\infty}\ \int_{B_{r(\varepsilon_n)}(S^o)}\bigl(\ell(x)+\tfrac12|v_*^{\varepsilon_n}(x)|^2\bigr)\,\eta_*^{\varepsilon_n}(dx) \ge J_2 , \]
and hence, using Theorem 3.3 we obtain
\[ \liminf_{n\to\infty}\ \int_{B^c_{r(\varepsilon_n)}(S^o)}|v_*^{\varepsilon_n}(x)|^2\,\eta_*^{\varepsilon_n}(dx) = 0 , \]
and the proof is complete.

We next show that, under the hypothesis of Theorem 1.2,
the scaled density $\mathring\varrho_z^\varepsilon(x)$ in Definition 3.2 converges to that of a Gaussian distribution as $\varepsilon\searrow0$. This establishes Theorem 1.2 for the critical regime.

Theorem 3.5. Let $\nu=1$, and suppose that for some $z\in S$ and a sequence $\varepsilon_n\searrow0$, we have $\lim_{n\to\infty}\eta_*^{\varepsilon_n}(B_z)=\xi_z>0$. Set $M\equiv Dm(z)$, and let $(\hat Q_z,\hat\Sigma_z)$ be the pair of matrices which solves (1.5). The following hold:
(a) Provided that we normalize $V^\varepsilon$ by setting $V^\varepsilon(z)=0$, it holds that $V^{\varepsilon_n}(\varepsilon_nx+z)\to\tfrac12x^T\hat Q_zx$ as $n\to\infty$, in $C^{2,\alpha}(\mathbb{R}^d)$, $\alpha\in(0,1)$.
(b) The density $\mathring\varrho_z^{\varepsilon_n}$ in Definition 3.2 converges as $n\to\infty$ (uniformly on compact sets) to the density of a Gaussian with mean $0$ and covariance matrix $\hat\Sigma_z$.

Proof. Since $\bar\beta$ in Lemma 3.7 satisfies $\bar\beta\le\limsup_{\varepsilon\searrow0}\beta_*^\varepsilon$, it follows by Theorem 3.3 that $\bar\beta=J_2$. Therefore, by (3.64) we have
\[ \int_{\mathbb{R}^d}\tfrac12|\nabla V(x)|^2\,\bar\eta(dx) = E_+\bigl(Dm(z)\bigr) . \]
That $\nabla V(x)=\hat Q_zx$ for all $x\in\mathbb{R}^d$, as claimed, follows by Lemma 3.4 (b). This proves the first part of the result.

Since $\nabla V(x)=\hat Q_zx$, the invariant probability measure $\bar\eta$ of (3.59) is Gaussian with covariance matrix $\hat\Sigma_z$ as stated. By Lemma 3.7 (d), $\breve\varrho_z^\varepsilon\to\bar\varrho$ and $\hat\varrho_z^\varepsilon/\breve\varrho_z^\varepsilon\to\xi_z$ as $\varepsilon\searrow0$, uniformly on compact sets. Since $\eta_*^{\varepsilon_n}(B_z)\to\xi_z$, the second part of the theorem follows and the proof is complete.

It is interesting to note that part (a) of Theorem 3.5 holds for any $z\in Z_2$. We state this as follows.

Theorem 3.6. Suppose $\nu=1$. Recall the definition of $Z_2$ from Theorem 1.1. Then for any $z\in Z_2$, and subject to the normalization $V^\varepsilon(z)=0$, it holds that
\[ \hat V_z^\varepsilon(x) \xrightarrow[\varepsilon\to0]{} \frac12x^T\hat Q_zx , \]
uniformly on compact sets.

Proof. Let $M=Dm(z)$. Let $\varepsilon_n\searrow0$ be any sequence, and extract a subsequence, also denoted as $\{\varepsilon_n\}$, along which $\hat V_z^{\varepsilon_n}\to V\in C^2(\mathbb{R}^d)$. Taking limits as $\varepsilon_n\searrow0$ in (2.6), and using Theorem 3.3 and the fact that $E_+\bigl(Dm(z)\bigr)=\tfrac12\operatorname{trace}\hat Q_z$, we obtain
\[ \frac12\Delta V(x) + \min_{u\in\mathbb{R}^d}\Bigl[(Mx+u)\cdot\nabla V(x)+\tfrac12|u|^2\Bigr] = \frac12\operatorname{trace}(\hat Q_z) . \]
By the suboptimality of $u(x)=-\hat Q_zx$ we obtain
\[ \frac12\Delta V(x) + \bigl(Mx-\hat Q_zx\bigr)\cdot\nabla V(x) + \frac12|\hat Q_zx|^2 = \frac12\operatorname{trace}(\hat Q_z) + f(x) , \tag{3.67} \]
for some nonnegative function $f$. However, since $\nabla V$ has linear growth, both $f$ and $V$ have at most quadratic growth. Hence, repeating the argument that led to the derivation of (3.44), we obtain from (3.67) that $\int_{\mathbb{R}^d}f(x)\rho_\Sigma(x)\,dx=0$, hence $f\equiv0$. If $\hat E$ denotes the expectation operator for the diffusion $X_t$ with linear drift $(M-\hat Q_z)X_t$ and identity diffusion matrix, then the stochastic representation of the solution of (3.67) gives
\[ q(x) := \frac12x^T\hat Q_zx = \liminf_{r\searrow0}\ \hat E_x\Bigl[\int_0^{\hat\tau_r}\Bigl(\frac12|\hat Q_zX_t|^2 - \frac12\operatorname{trace}(\hat Q_z)\Bigr)\,dt\Bigr] , \]
where $\hat\tau_r$ is the first hitting time of the ball $B_r$ [1, Theorem 3.7.12]. Applying Dynkin's formula to (3.67), and since $f\equiv0$, we obtain $V\ge q$. Also with $\bar{\mathcal{L}} := \tfrac12\Delta+(Mx-\nabla V)\cdot\nabla$, we have $\bar{\mathcal{L}}(V-q)\le0$. Since $V(0)=q(0)$, we have $V=q$ on $\mathbb{R}^d$ by the comparison principle, and the proof is complete.

Remark 3.2. In summary, when $\nu=1$, the control cost for having $\eta_*^\varepsilon$ concentrate around a point $z\in S$ is $E_+(Dm(z))$. Therefore a point $z'\in S$ will be chosen over a point $z\in S$ if and only if $\ell(z')+E_+(Dm(z'))<\ell(z)+E_+(Dm(z))$. Recall also that $E_+(Dm(z))=0$ for $z\in S_s$. In Example 3.1, $Dm(0)=2$. Therefore, applying Theorem 3.3 and Lemma 3.4, we conclude that in the critical regime the invariant probability distribution $\eta_*^\varepsilon$ concentrates at $x=0$ if $c>2$, and at $x=-1$ if $c<2$.

3.4. The supercritical and subcritical regimes revisited. We return to the analysis of the supercritical and subcritical regimes, in order to determine the asymptotic behavior of the invariant probability distribution in the vicinity of the minimal stochastically stable set. In these regimes there are two scales. If we center the coordinates around a point in $S$, then we have $V^\varepsilon(x)\in O(\varepsilon^{-2}x)$, while $-\log\varrho_*^\varepsilon(x)\in O(\varepsilon^{-2\nu}x)$. We start with the subcritical regime.

3.4.1. Subcritical Regime.
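Recall the notation introduced in Definition 3.2. Let $z \in \mathcal{Z}_3$ be such that for some sequence $\varepsilon_n \searrow 0$ it holds that $\liminf_{n\to\infty} \eta_*^{\varepsilon_n}(B_z) > 0$. Let $r$ be as in Definition 3.2. Normalize $V^\varepsilon$ by setting $V^\varepsilon(z) = 0$, and let $M := Dm(z)$. We scale the space as $1/\varepsilon^\nu$, to obtain
\[ \tfrac12 \Delta \widehat{V}_z^\varepsilon(x) + \widehat{m}_z^\varepsilon(x)\cdot\nabla \widehat{V}_z^\varepsilon(x) - \frac{\varepsilon^{2(1-\nu)}}{2}\,|\nabla \widehat{V}_z^\varepsilon(x)|^2 + \widehat{\ell}_z^\varepsilon(x) = \beta_*^\varepsilon\,. \tag{3.68} \]
By Lemma 3.6, $\nabla \breve{V}_z^\varepsilon = \varepsilon^{2(1-\nu)}\nabla \widehat{V}_z^\varepsilon$ is locally bounded and has at most linear growth. We write (3.68) in the form (3.52), which corresponds to the scaled diffusion
\[ d\widehat{X}_t = \Bigl(\widehat{m}_z^\varepsilon(\widehat{X}_t) - \varepsilon^{2(1-\nu)}\nabla \widehat{V}_z^\varepsilon(\widehat{X}_t)\Bigr)\,dt + d\widehat{W}_t\,, \tag{3.69} \]
and the associated HJB equation
\[ \tfrac12 \Delta \breve{V}_z^\varepsilon(x) + \min_{u\in\mathbb{R}^d}\Bigl[\bigl(\widehat{m}_z^\varepsilon(x) + \varepsilon^{1-\nu}u\bigr)\cdot\nabla \breve{V}_z^\varepsilon(x) + \frac{\varepsilon^{2(1-\nu)}}{2}\,|u|^2\Bigr] = \varepsilon^{2(1-\nu)}\bigl(\beta_*^\varepsilon - \widehat{\ell}_z^\varepsilon(x)\bigr)\,. \tag{3.70} \]
Letting $\tilde{u} = \varepsilon^{1-\nu}u$ and taking limits in (3.69)–(3.70) along some subsequence $\varepsilon_n \searrow 0$, we obtain a function $V \in C^2(\mathbb{R}^d)$ of at most quadratic growth satisfying
\[ d\widehat{X}_t = \bigl(M\widehat{X}_t - \nabla V(\widehat{X}_t)\bigr)\,dt + d\widehat{W}_t\,, \tag{3.71} \]
and
\[ \tfrac12 \Delta V(x) + \min_{\tilde{u}\in\mathbb{R}^d}\Bigl[\bigl(Mx + \tilde{u}\bigr)\cdot\nabla V(x) + \tfrac12 |\tilde{u}|^2\Bigr] = 0\,. \]
Following the proofs of Lemma 3.7 and Theorem 3.5, and using Lemma 3.5, we deduce that the densities $\breve{\varrho}_z^\varepsilon$ and $\mathring{\varrho}_z^\varepsilon$ in Definition 3.2 converge as $\varepsilon_n \searrow 0$ to the density of a Gaussian distribution with covariance matrix $\widehat{\Sigma}_z$, uniformly on compact sets. Also, $V(x) = \tfrac12 x^{T}\widehat{Q}_z x$. However, since $Dm(z)$ is Hurwitz, $\widehat{Q}_z = 0$ by Remark 3.1 and Lemma 3.4.

3.4.2. Supercritical Regime. We assume $\nu \in (1, 2)$, and we use the same scaling and definitions as in Section 3.4.1 for the subcritical regime, except that $z \in \mathcal{Z}_1$. It is clear that $\varepsilon^{2(1-\nu)}\bigl(\widehat{\ell}_z^\varepsilon(x) - \ell(z)\bigr) \to 0$ as $\varepsilon \searrow 0$. By Corollary 3.1 the constants $\varepsilon^{2(1-\nu)}\bigl(\beta_*^\varepsilon - \ell(z)\bigr)$ are bounded, uniformly in $\varepsilon \in (0, 1)$. Extract a subsequence $\varepsilon_n \searrow 0$ along which $\varepsilon_n^{2(1-\nu)}\bigl(\beta_*^{\varepsilon_n} - \ell(z)\bigr)$ converges to a constant $\hat\beta$, and a further subsequence along which $\breve{V}_z^\varepsilon$ converges to $V \in C^2(\mathbb{R}^d)$, uniformly on compact sets. We obtain
\[ \tfrac12 \Delta V(x) + \min_{\tilde{u}\in\mathbb{R}^d}\Bigl[\bigl(Mx + \tilde{u}\bigr)\cdot\nabla V(x) + \tfrac12 |\tilde{u}|^2\Bigr] = \hat\beta\,. \]
With $\tilde{u} = \varepsilon^{1-\nu}u$, as defined earlier, and noting that the scaled diffusion in (3.71) has invariant probability distribution $\hat\eta_z^\varepsilon$, we obtain by (3.70) that
\[ \int_{\mathbb{R}^d} \tfrac12 |\tilde{u}(\varepsilon^\nu x + z)|^2\,\hat\eta_z^\varepsilon(dx) + \int_{\mathbb{R}^d} \frac{\widehat{\ell}_z^\varepsilon(x) - \ell(z)}{\varepsilon^{2\nu-2}}\,\hat\eta_z^\varepsilon(dx) = \varepsilon^{2(1-\nu)}\bigl(\beta_*^\varepsilon - \ell(z)\bigr)\,. \tag{3.72} \]
Let $G$ be a bounded open set containing $\mathcal{S}$ such that $\ell(x) > \ell(z)$ for all $x \in G^c$. Since $z \in \mathcal{Z}_1$ and $\ell$ is Lipschitz, we have $\ell(x) - \ell(z) \ge C_\ell \operatorname{dist}(x, \mathcal{S})$ in $G$. Hence, using the fact that $2\nu - 2 < \nu$ together with the Cauchy–Schwarz inequality and Lemma 3.5, it follows that the second term in (3.72) vanishes as $\varepsilon \searrow 0$. Therefore, following the arguments in Lemma 3.7, it follows that $\hat\beta \ge E_+\bigl(Dm(z)\bigr)$, which implies that
\[ \liminf_{\varepsilon\searrow 0}\ \frac{\varepsilon^{2(1-\nu)}}{2}\int_{\varepsilon^\nu x + z\,\in\, B_z} |v_*^\varepsilon(\varepsilon^\nu x + z)|^2\,\mathring{\eta}_z^\varepsilon(dx) \ge E_+\bigl(Dm(z)\bigr)\,. \]
However, under the control $u^\varepsilon(x) = -\tfrac{1}{\varepsilon}\bigl(m(x) - M(x - z) + \widehat{Q}_z(x - z)\bigr)$ applied to the non-scaled problem, and with $\mu^\varepsilon$ denoting the stationary distribution of the controlled process, we obtain
\[ \lim_{\varepsilon\searrow 0}\ \frac{\varepsilon^{2(1-\nu)}}{2}\int_{\mathbb{R}^d} |u^\varepsilon(x)|^2\,\mu^\varepsilon(dx) = E_+\bigl(Dm(z)\bigr)\,. \]
Therefore $\hat\beta = E_+\bigl(Dm(z)\bigr)$. Hence, arguing as in Theorem 3.3, we can show that
\[ \lim_{\varepsilon\searrow 0}\ \frac{\varepsilon^{2(1-\nu)}}{2}\int_{\varepsilon^\nu x + z\,\in\, B_z} |v_*^\varepsilon(\varepsilon^\nu x + z)|^2\,\mathring{\eta}_z^\varepsilon(dx) = E_+\bigl(Dm(z)\bigr)\,. \tag{3.73} \]
Moreover, by the proof of Theorem 3.5 we obtain $V(x) = \tfrac12 x^{T}\widehat{Q}_z x$, and that $\mathring{\varrho}_z^\varepsilon$ converges to the density of a Gaussian distribution with covariance matrix $\widehat{\Sigma}_z$.

3.5. On Theorem 1.3. We summarize the conclusions of Sections 3.4.1–3.4.2 in the following lemma.

Lemma 3.8. Assume $\nu \in (0, 2)$ and let $\{B_z : z \in \mathcal{S}\}$ be as in Definition 3.2. Suppose that for some $z \in \mathcal{S}$ and some sequence $\varepsilon_n \searrow 0$ it holds that $\liminf_{n\to\infty} \eta_*^{\varepsilon_n}(B_z) > 0$. Let $\imath = 1$, $2$, or $3$, according as $\nu > 1$, $\nu = 1$, or $\nu < 1$, respectively. Then the density $\mathring{\varrho}_z^{\varepsilon_n}$ in Definition 3.2 converges as $n \to \infty$ (uniformly on compact sets) to the density of a Gaussian with mean $0$ and covariance matrix $\widehat{\Sigma}_z$.

The ergodic control problem in (1.1) and (1.3) is of course equivalent to minimizing
\[ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}\biggl[\int_0^T \Bigl(\ell(X_s) + \frac{1}{2\varepsilon^2}\,|\tilde{U}_s - m(X_s)|^2\Bigr)\,ds\biggr] \]
over all admissible controls $\tilde{U}$, subject to
\[ X_t = X_0 + \int_0^t \tilde{U}_s\,ds + \varepsilon^\nu W_t\,, \qquad t \ge 0\,. \]
It follows that if $v_*^\varepsilon$ is an optimal stationary Markov control for (1.1), (1.3), then $\tilde{v}_*^\varepsilon := m(x) + \varepsilon v_*^\varepsilon$ is an optimal control for (3.31), (3.24), and vice-versa.

Suppose $z \in \mathcal{S}$. We define the 'running cost' $\widetilde{R}_z[v](x)$ by
\[ \widetilde{R}_z[v](x) := \ell(x) - \ell(z) + \frac{1}{2\varepsilon^2}\,|v(x) - m(x)|^2\,. \]
Then
\[ \beta_*^\varepsilon = \ell(z) + \int_{\mathbb{R}^d} \widetilde{R}_z[\tilde{v}_*^\varepsilon](x)\,\eta_*^\varepsilon(dx)\,. \]
Suppose $\nu \ge 1$. Using the suboptimal control $\tilde{v}_z(x) = (M - \widehat{Q}_z)(x - z)$, with $M = Dm(z)$, instead of $\tilde{v}_*^\varepsilon$, we obtain
\[ \lim_{\varepsilon\searrow 0} \int_{\mathbb{R}^d} \frac{\ell(x) - \ell(z)}{\varepsilon^{2\nu}}\,\rho^\varepsilon_{\widehat{\Sigma}_z}(x)\,dx = \tfrac12 \operatorname{trace}\bigl(\widehat{\Sigma}_z D^2\ell(z)\bigr)\,, \tag{3.74} \]
where $\rho^\varepsilon_{\widehat{\Sigma}_z}$ denotes the Gaussian density with covariance matrix $\varepsilon^{2\nu}\widehat{\Sigma}_z$. We write
\[ |\tilde{v}_z(x) - m(x)|^2 = |\widehat{Q}_z(x - z)|^2 + R_z(x)\,, \tag{3.75} \]
where
\[ R_z(x) := \bigl(M(x - z) - m(x)\bigr)\cdot\bigl((M - 2\widehat{Q}_z)(x - z) - m(x)\bigr)\,. \]
Since $|R_z(x)| \le C|x - z|^3$ for some constant $C$, and the third moments of $\rho^\varepsilon_{\widehat{\Sigma}_z}$ are $0$, it follows by Lemma 3.4 and (3.75) that
\[ \int_{\mathbb{R}^d} \frac{1}{2\varepsilon^{2\nu}}\,|\tilde{v}_z(x) - m(x)|^2\,\rho^\varepsilon_{\widehat{\Sigma}_z}(x)\,dx - E_+\bigl(Dm(z)\bigr) \in O(\varepsilon^{2\nu})\,. \tag{3.76} \]
By (3.74) and (3.76), we obtain
\[ \lim_{\varepsilon\searrow 0}\ \frac{1}{\varepsilon^{2\nu-2}} \int_{\mathbb{R}^d} \widetilde{R}_z[\tilde{v}_z](x)\,\rho^\varepsilon_{\widehat{\Sigma}_z}(x)\,dx = E_+\bigl(Dm(z)\bigr)\,. \]
We are now ready for the proof of the theorem.

Proof of Theorem 1.3. Suppose first that $\nu \in (1, 2)$. Select the collection of balls $\{B_z : z \in \mathcal{S}\}$ so as to satisfy
\[ \inf_{B_z} \ell > \min_{\mathcal{S}} \ell \qquad \forall\, z \in \mathcal{S}\setminus\mathcal{Z}_1\,. \tag{3.77} \]
Let $\varepsilon_n \searrow 0$ be any sequence. Extract some subsequence, also denoted by $\{\varepsilon_n\}$, along which $\eta_*^{\varepsilon_n}(B_z)$ converges to $\xi_z \in [0, 1]$ for all $z \in \mathcal{Z}_1$. Let
\[ \mathcal{Z}_1' := \{z \in \mathcal{Z}_1 : \xi_z > 0\}\,. \]
Also define
\[ \delta(\varepsilon) := \sup_{\varepsilon_n < \varepsilon}\ \max_{z \in \mathcal{S}\setminus\mathcal{Z}_1'} \eta_*^{\varepsilon_n}(B_z)\,. \]
By Theorem 1.1 (i) and the definition of $\mathcal{Z}_1'$ we have $\delta(\varepsilon) \to 0$ as $\varepsilon \searrow 0$. In the rest of the proof all limits or estimates are meant to be along the subsequence $\{\varepsilon_n\}$.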
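As a sanity check on (3.74)–(3.76), the scalar case can be worked out by hand. The computation below is only a sketch under assumptions that are not restated in this section: $d = 1$ and $M = Dm(z) = a > 0$; $\widehat{Q}_z$ is taken to be the nonzero root obtained by substituting $V(x) = \tfrac12\widehat{Q}_z x^2$ into (3.67) with $f \equiv 0$; $\widehat{\Sigma}_z$ is taken to be the stationary variance of the closed-loop linear diffusion in (3.71); and $\rho^\varepsilon_{\widehat{\Sigma}_z}$ is assumed to be centered at $z$.

```latex
% Scalar sanity check: d = 1, M = Dm(z) = a > 0 (all assumptions are listed in the lead-in).
\begin{gather*}
\text{(3.67) with } f \equiv 0,\ V(x) = \tfrac12\widehat{Q}_z x^2:\quad
\tfrac12\widehat{Q}_z + \bigl(a\widehat{Q}_z - \tfrac12\widehat{Q}_z^{\,2}\bigr)x^2 = \tfrac12\widehat{Q}_z
\ \Longrightarrow\ \widehat{Q}_z = 2a\,,\\
d\widehat{X}_t = (a - \widehat{Q}_z)\widehat{X}_t\,dt + d\widehat{W}_t = -a\widehat{X}_t\,dt + d\widehat{W}_t
\ \Longrightarrow\ \widehat{\Sigma}_z = \frac{1}{2a}\,,\\
\frac{1}{2\varepsilon^{2\nu}}\int_{\mathbb{R}} |\widehat{Q}_z(x - z)|^2\,\rho^\varepsilon_{\widehat{\Sigma}_z}(x)\,dx
 = \frac{\widehat{Q}_z^{\,2}\,\varepsilon^{2\nu}\,\widehat{\Sigma}_z}{2\varepsilon^{2\nu}}
 = \frac{4a^2}{4a} = a = \tfrac12\operatorname{trace}(\widehat{Q}_z)\,,\\
\lim_{\varepsilon\searrow 0}\int_{\mathbb{R}} \frac{\ell(x) - \ell(z)}{\varepsilon^{2\nu}}\,\rho^\varepsilon_{\widehat{\Sigma}_z}(x)\,dx
 = \tfrac12\,\widehat{\Sigma}_z\,\ell''(z) = \frac{\ell''(z)}{4a}\,.
\end{gather*}
```

In this scalar setting the leading term of (3.76) equals $a$, which agrees with the threshold $c = 2$ quoted in Remark 3.2 for $Dm(0) = 2$.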
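By (3.73) we have
\[ \int_{B_z} \widetilde{R}_z[\tilde{v}_*^\varepsilon](x)\,\varrho_*^\varepsilon(x)\,dx \ge \varepsilon^{2\nu-2}\,\eta_*^\varepsilon(B_z)\Bigl(E_+\bigl(Dm(z)\bigr) - \delta_1(\varepsilon)\Bigr) \qquad \forall\, z \in \mathcal{Z}_1'\,, \tag{3.78} \]
where $\delta_1(\varepsilon) \to 0$ as $\varepsilon \searrow 0$. Let $z^* \in \mathcal{Z}_1^*$. By the Cauchy–Schwarz inequality and Lemma 3.5 we obtain
\[ \sup_{\varepsilon\in(0,1)} \int_{B_z} \frac{\ell(x) - \ell(z)}{\varepsilon^\nu}\,\eta_*^\varepsilon(dx) \le \sup_{\varepsilon\in(0,1)} \biggl(\int_{B_z} \frac{|\ell(x) - \ell(z)|^2}{\varepsilon^{2\nu}}\,\eta_*^\varepsilon(dx)\biggr)^{1/2} < \infty\,. \tag{3.79} \]
It follows by (3.77), (3.78)–(3.79), and Proposition 3.1 that
\[ \begin{aligned} \beta_*^\varepsilon - \ell(z^*) &\ge \sum_{z\in\mathcal{Z}_1'} \int_{B_z} \widetilde{R}_z[\tilde{v}_*^\varepsilon](x)\,\eta_*^\varepsilon(dx) + \sum_{z\in\mathcal{Z}\setminus\mathcal{Z}_1'} \int_{B_z} \bigl(\ell(x) - \ell(z)\bigr)\,\eta_*^\varepsilon(dx) + \int_{B_{\mathcal{S}}^c} \bigl(\ell(x) - \ell(z)\bigr)\,\eta_*^\varepsilon(dx) \\ &\ge \varepsilon^{2\nu-2} \sum_{z\in\mathcal{Z}_1'} \eta_*^\varepsilon(B_z)\Bigl(E_+\bigl(Dm(z)\bigr) - \delta_1(\varepsilon)\Bigr) + O(\varepsilon^\nu) + O(\varepsilon^2)\,. \end{aligned} \tag{3.80} \]
On the other hand, with $\tilde{v}_{z^*}(x) = (M - \widehat{Q}_{z^*})(x - z^*)$, we have by (3.76) that
\[ \int_{\mathbb{R}^d} \widetilde{R}_{z^*}[\tilde{v}_{z^*}](x)\,\rho^\varepsilon_{\widehat{\Sigma}_{z^*}}(x)\,dx \le \varepsilon^{2\nu-2}\,E_+\bigl(Dm(z^*)\bigr) + O(\varepsilon^2)\,. \tag{3.81} \]
Combining (3.80)–(3.81), and since $\tilde{v}_{z^*}$ is suboptimal, we obtain
\[ \sum_{z\in\mathcal{Z}_1'} \eta_*^\varepsilon(B_z)\,E_+\bigl(Dm(z)\bigr) \le E_+\bigl(Dm(z^*)\bigr) + \delta_2(\varepsilon)\,, \tag{3.82} \]
with $\delta_2(\varepsilon) \to 0$ as $\varepsilon \searrow 0$. By the definition of $\delta$ we have
\[ \sum_{z\in\mathcal{Z}_1'} \eta_*^\varepsilon(B_z) \ge 1 - \delta(\varepsilon)\,. \tag{3.83} \]
Therefore, by (3.82)–(3.83) we obtain
\[ \sum_{z\in\mathcal{Z}_1'} \eta_*^\varepsilon(B_z)\Bigl(E_+\bigl(Dm(z)\bigr) - E_+\bigl(Dm(z^*)\bigr)\Bigr) \xrightarrow[\varepsilon\searrow 0]{} 0\,, \]
which together with (3.83) and the definition of $\mathcal{Z}_1'$ implies that
\[ \eta_*^\varepsilon(B_z) \xrightarrow[\varepsilon\searrow 0]{} 0 \qquad \forall\, z \in \mathcal{Z}_1\setminus\mathcal{Z}_1^*\,. \]
Also, by (3.80) and (3.83) we have
\[ \beta_*^\varepsilon - J_1 \ge \varepsilon^{2\nu-2}\,E_+\bigl(Dm(z^*)\bigr) + o\bigl(\varepsilon^{2\nu-2}\bigr) \qquad \forall\, z^* \in \mathcal{Z}_1^*\,, \]
which together with the upper bound in (3.81) implies (1.6), and the proof is complete.

4. Concluding remarks

In general, Morse–Smale flows may contain hyperbolic closed orbits, and it would be desirable to extend the results of the paper accordingly. An energy function $V$ as in Theorem 2.1 may be constructed to account for critical elements that are closed orbits [20, 25]. Note that under the control used in Section 3.1.1 the optimal stationary probability distribution concentrates on the minimum of $V$. In the case that $z \in \mathbb{R}^d$ belongs to a stable periodic orbit with period $T_0$, we can construct $V$ so that it attains its minimum on this closed orbit.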
In this manner, if $\varphi_t$ denotes the flow of the vector field $m$, then under the control used in Section 3.1.1 we obtain
\[ \int_{\mathbb{R}^d} \ell(x)\,\mu^\varepsilon(dx) \xrightarrow[\varepsilon\searrow 0]{} \frac{1}{T_0}\int_0^{T_0} \ell\bigl(\varphi_t(z)\bigr)\,dt\,. \]
The same can be done in the subcritical regime, by modifying the proof of Lemma 3.2. We leave it up to the reader to verify that Lemma 3.1 still holds if the set of critical elements $\mathcal{S}$ contains hyperbolic closed orbits. Let us define
\[ \breve{\ell}(z) := \frac{1}{T_0}\int_0^{T_0} \ell\bigl(\varphi_t(z)\bigr)\,dt\,, \]
when $z$ belongs to a closed orbit, and $\breve{\ell}(z) = \ell(z)$ when $m(z) = 0$. Then, provided $\operatorname{Arg\,min}_{z\in\mathcal{S}} \breve{\ell}(z)$ contains only stable critical elements, the support of the limit of the stationary probability distribution lies in $\mathcal{S}_s$, and this is true in any of the three regimes. However, the full analysis when unstable closed orbits are involved seems to be much more difficult.

Appendix A. Proofs of Lemma 1.1 and Theorem 2.2

Proof of Lemma 1.1. Let $(\Omega, \mathcal{F}, \mathcal{F}_t, \mathbb{P})$ be a complete probability space, and $\{W_t\}$ be a $d$-dimensional standard Brownian motion defined on it. Consider any admissible control $U$. Let
\[ \tau_N := \inf\Bigl\{t \ge 0 : \int_0^t |U_s|^2\,ds \ge N\varepsilon^{-1}\Bigr\}\,, \qquad N \ge 1\,. \]
Since
\[ \mathbb{E}\biggl[\int_0^T |U_s|^2\,ds\biggr] < \infty \qquad \forall\, T \in (0, \infty)\,, \]
it follows that $\tau_N \uparrow \infty$ a.s., as $N \to \infty$. Define
\[ \Lambda_N(t) := \exp\biggl(-\varepsilon^{1-\nu}\int_0^{t\wedge\tau_N} \langle U_s, dW_s\rangle - \frac{\varepsilon^{2-2\nu}}{2}\int_0^{t\wedge\tau_N} |U_s|^2\,ds\biggr)\,, \qquad t \in (0, \infty)\,. \]
Let $\mathbb{P}_0$ and $\mathbb{E}_0[\,\cdot\,]$ denote the law of the process with $U \equiv 0$ and the expectation operator under this law, respectively. In turn, $\mathbb{P}_N$ and $\mathbb{E}_N$ denote the measure defined by $d\mathbb{P}_N/d\mathbb{P}_0 = \Lambda_N(T)$ and the corresponding expectation, respectively. Then the laws of $(X_{\cdot\wedge\tau_N}, U_{\cdot\wedge\tau_N})$ under $\mathbb{P}_N$ and $\mathbb{P}$ coincide. Therefore
\[ \begin{aligned} \mathbb{E}_0\bigl[\Lambda_N(T)\ln\Lambda_N(T)\bigr] &= \mathbb{E}_N\bigl[\ln\Lambda_N(T)\bigr] \\ &= \mathbb{E}_N\biggl[-\varepsilon^{1-\nu}\int_0^{T\wedge\tau_N} \langle U_s, dW_s\rangle - \frac{\varepsilon^{2-2\nu}}{2}\int_0^{T\wedge\tau_N} |U_s|^2\,ds\biggr] \\ &= \mathbb{E}_N\biggl[\frac{1}{2}\,\varepsilon^{2-2\nu}\int_0^{T\wedge\tau_N} |U_s|^2\,ds\biggr] \\ &\le \mathbb{E}\biggl[\frac{1}{2}\,\varepsilon^{-2\nu}\int_0^{T} |U_s|^2\,ds\biggr] < \infty\,. \end{aligned} \]
The third equality follows from the fact that under $(\Omega, \mathcal{F}_{t\wedge\tau_N}, \mathbb{P}_N)$,
\[ W_{t\wedge\tau_N} + \varepsilon^{1-\nu}\int_0^{t\wedge\tau_N} U_s\,ds\,, \qquad t \ge 0\,, \]
is a Brownian motion stopped at $\tau_N$. Since $x(\ln x)^+ \le x\ln x + e^{-1}$, it follows that
\[ \sup_N\ \mathbb{E}_0\Bigl[\Lambda_N(T)\bigl(\ln\Lambda_N(T)\bigr)^+\Bigr] < \infty\,. \]
By [11, Theorem 1.3.4, p. 10], $\{\Lambda_N(T\wedge\tau_N)\}$ are uniformly integrable and converge a.s. and in $L^1$ to $\Lambda(T)$, which then satisfies $\mathbb{E}_0[\Lambda(T)] = 1$. Defining the probability measure $\mathbb{P}'$ by $d\mathbb{P}'/d\mathbb{P}_0 = \Lambda(T)$, we obtain
\[ W_t + \varepsilon^{1-\nu}\int_0^t U_s\,ds\,, \qquad t \ge 0\,, \]
as an $(\Omega, \mathcal{F}_t, \mathbb{P}')$ Brownian motion. This gives a weak solution for the control $U_s$ over $[0, T]$ under the probability measure $\mathbb{P}_0$. To show the weak uniqueness, we see that given two solutions we can always define $\mathbb{P}'$ as above and get two weak solutions of (1.1) with $U = 0$ under the law of $\mathbb{P}'$. But we have weak uniqueness in the latter. Hence weak uniqueness follows.

The rest of this section is devoted to the proof of Theorem 2.2. Without loss of generality we fix $\varepsilon = 1$. We consider the equation
\[ \tfrac12 \Delta V + \min_{u\in\mathbb{R}^d}\Bigl[\langle m + u, \nabla V\rangle + \ell + \tfrac12 |u|^2\Bigr] = \beta\,. \]
If $V$ is regular then the above equation can be rewritten as
\[ \ell - \tfrac12 |\nabla V|^2 = \beta - \tfrac12 \Delta V - m\cdot\nabla V\,. \tag{A.1} \]
It is shown in [3] that there exists a unique pair $(V, \beta) \in C^2(\mathbb{R}^d)\times\mathbb{R}$ satisfying (A.1) such that $V(x) \to \infty$ as $|x| \to \infty$. The existence of this solution is established as a limit of the solution $V_\alpha$ to the discounted problem [2, p. 175]
\[ \ell - \tfrac12 |\nabla V_\alpha|^2 = \alpha V_\alpha - \tfrac12 \Delta V_\alpha - m\cdot\nabla V_\alpha\,. \tag{A.2} \]
In [2, Theorem 4.18, p. 177] it is proved that (A.2) has a solution in $W^{2,s}_{\mathrm{loc}}(\mathbb{R}^d)$, $1 \le s < \infty$. Then, using standard elliptic PDE results, we can improve the solution so that $V_\alpha \in C^2(\mathbb{R}^d)$. The solution $V_\alpha$ to (A.2) is obtained in [2] as a strong $W^{1,2}_{\mathrm{loc}}(\mathbb{R}^d)$ limit of the solution to the Neumann boundary value problem
\[ \begin{aligned} \ell - \tfrac12 |\nabla v_R|^2 &= \alpha v_R - \tfrac12 \Delta v_R - m\cdot\nabla v_R \quad \text{in } B_R(0)\,, \\ \frac{\partial v_R}{\partial n} &= 0 \quad \text{on } \partial B_R(0)\,, \end{aligned} \tag{A.3} \]
as $R \to \infty$ along some subsequence. Here $n$ denotes the unit outward normal on $\partial B_R(0)$. In fact, there exists $v_R \in L^\infty(B_R(0)) \cap W^{1,2}(B_R(0))$ solving the Neumann problem. In the following lemma we obtain an estimate of the growth of $V_\alpha$.

Lemma A.1.
For every $\alpha > 0$, the solution $V_\alpha$ of (A.2), which is obtained as a limit of $v_R$ as $R \to \infty$, is bounded below and has at most linear growth.

Proof. The proof mimics the calculations in [3]. By the proof of [3, Theorem 4.18] there exist constants $\kappa_0$ and $R_0$ which do not depend on $\alpha$ such that
\[ \alpha\, v_R(x) \ge -\kappa_0 \qquad \forall\, x \in B_R(0)\,,\ \forall\, R > R_0\,, \tag{A.4} \]
and for all $\alpha \in (0, 1)$. We divide the proof into a number of steps.

Step 1. The calculation here is based on the computations done in [3, pp. 179–180] (mentioned as Local bound from above). Choose $0 < 4r < 1$. Consider a smooth function $\psi$ such that $\psi = 1$ on $B_{2r}(x_0)$, $\psi = 0$ outside of $B_{4r}(x_0)$, and $0 \le \psi \le 1$. It is clear that we can choose $\psi$ such that $|\nabla\psi| \le c_r$, where $c_r$ does not depend on $x_0$, as we can translate this $\psi$ to any point. Now we choose $x_0$ in $B_R(0)$ such that $|x_0| + 4r < R$. Multiplying (A.3) by $\psi^2$ and integrating on $B_R(0)$ we have (see [2, p. 179])
\[ \frac12 \int_{B_R} |\nabla v_R|^2\psi^2 = \int_{B_R} (m\cdot\nabla v_R)\psi^2 - \int_{B_R} (\nabla v_R\cdot\nabla\psi)\psi - \alpha\int_{B_R} v_R\psi^2 + \int_{B_R} \ell\,\psi^2\,, \]
which can also be written as
\[ \alpha\int_{B_{4r}(x_0)} v_R\psi^2 + \int_{B_{4r}(x_0)} (\nabla v_R\cdot\nabla\psi)\psi - \int_{B_{4r}(x_0)} (m\cdot\nabla v_R)\psi^2 + \frac12 \int_{B_{4r}(x_0)} |\nabla v_R|^2\psi^2 = \int_{B_{4r}(x_0)} \ell\,\psi^2\,. \tag{A.5} \]
Applying Young's inequality to the second and third terms of (A.5) and using (A.4) we obtain
\[ \begin{aligned} \int_{B_{2r}(x_0)} |\nabla v_R|^2 &\le \int_{B_{4r}(x_0)} |\nabla v_R|^2\psi^2 \\ &\le 4\biggl[\int_{B_{4r}(x_0)} \ell\,\psi^2 + \int_{B_{4r}(x_0)} |\nabla\psi - m\psi|^2 - \alpha\int_{B_{4r}(x_0)} v_R\psi^2\biggr] \\ &\le \kappa_1(r)\biggl(1 + \int_{B_{4r}(x_0)} \ell\,\psi^2\biggr) \qquad \forall\, \alpha \in (0, 1)\,, \end{aligned} \]
and all $R$ sufficiently large, where $\kappa_1(r)$ is a constant depending on $\|m\|_\infty$, $\kappa_0$, and $r$. Since $\ell$ is nonnegative and Lipschitz continuous we have
\[ \ell(x) \le \kappa_2\bigl(1 + |x|\bigr) \le \kappa_2\bigl(1 + |x_0| + |x - x_0|\bigr)\,. \]
Therefore, for some constant $\kappa_3(r)$, independent of $R$, we have
\[ \int_{B_{2r}(x_0)} |\nabla v_R|^2 \le \kappa_3(r)\bigl(1 + |x_0|\bigr) \qquad \forall\, \alpha \in (0, 1)\,, \tag{A.6} \]
and all $R$ sufficiently large.

Step 2. Let us first consider $d = 1$. Then $v_R$ is absolutely continuous. Since $v_R \to V_\alpha$ strongly in $W^{1,2}_{\mathrm{loc}}(\mathbb{R})$ there exists a point, say $0$, such that $v_R(0) \to V_\alpha(0)$ as $R \to \infty$. Thus $|v_R(0)|$ is bounded.
Now let $x \in \mathbb{R}$ and choose $R$ such that $|x| + 4r < R$. Let $N$ be such that $4rN \ge |x| > 4r(N - 1)$. We have
\[ \begin{aligned} |v_R(x)| &\le |v_R(0)| + \biggl|\int_0^x v_R'(s)\,ds\biggr| \\ &\le |v_R(0)| + \int_0^{4rN} |v_R'(s)|\,ds \\ &= |v_R(0)| + \sum_{i=0}^{N-1} \int_{4ri}^{4r(i+1)} |v_R'(s)|\,ds \\ &\le |v_R(0)| + \sqrt{4r}\,\sum_{i=0}^{N-1} \biggl(\int_{4ri}^{4r(i+1)} |v_R'(s)|^2\,ds\biggr)^{1/2}. \end{aligned} \]
So if we use the estimate (A.6) then it follows that
\[ \begin{aligned} |v_R(x)| &\le |v_R(0)| + \sqrt{4r}\,N\,\bigl(\kappa_3(r)(2 + |x|)\bigr)^{1/2} \\ &\le |v_R(0)| + \frac{\sqrt{4r}\,(|x| + 4r)}{4r}\,\bigl(\kappa_3(r)(2 + |x|)\bigr)^{1/2} \\ &\le \kappa_4(r)\bigl(1 + |x|^2\bigr)\,. \end{aligned} \]
Here, $\kappa_4(r)$ does not depend on $R$. So in the limit we obtain $|V_\alpha(x)| \le \kappa_4(r)\bigl(1 + |x|^2\bigr)$ and that $V_\alpha$ is bounded from below. Now rewriting (A.2) as
\[ \tfrac12 \Delta V_\alpha + \min_{u\in\mathbb{R}^d}\Bigl[\langle m + u, \nabla V_\alpha\rangle + \ell + \tfrac12 |u|^2\Bigr] = \alpha V_\alpha\,, \]
and using the fact that the control $U \equiv 0$ is suboptimal, we obtain
\[ V_\alpha(x) \le \mathbb{E}_x\biggl[\int_0^t e^{-\alpha s}\ell(X_s^0)\,ds\biggr] + \mathbb{E}_x\bigl[e^{-\alpha t}V_\alpha(X_t^0)\bigr]\,. \tag{A.7} \]
In (A.7), $X^0$ denotes the solution to (1.1) for $U \equiv 0$. It is straightforward to show using Hypothesis 1.1 that
\[ \mathbb{E}_x\biggl[\int_0^\infty e^{-\alpha s}\ell(X_s^0)\,ds\biggr] \le \kappa_5\bigl(1 + |x|\bigr)\,, \]
for a constant $\kappa_5$ which depends on $\alpha$, and that $\mathbb{E}_x\bigl[|X_t^0|^2\bigr]e^{-\alpha t} \to 0$ as $t \to \infty$. Therefore from (A.7), using also the fact that $V_\alpha$ is bounded from below, we conclude that $V_\alpha$ has at most linear growth.

Step 3. Let $d \ge 2$. We can use the Green function in this case, as used in [3, p. 179]. Again we consider $x_0 \in B_R(0)$ such that $|x_0| + 4r < R$. We consider a Green function $G_{x_0}(x)$ on $B_{2r}(x_0)$. We also use the estimates in [3, (4.83), p. 179] for Green functions, where the constants depend only on $r$. Now we consider a test function $\eta$ such that $\eta = 1$ on $B_r(x_0)$, $\eta = 0$ outside of $B_{3r/2}(x_0)$, and $0 \le \eta \le 1$. Now we test (A.3) with $\eta^2 G$ and we have (the first expression on [3, p. 180])
\[ \begin{aligned} v_R(x_0) &+ \alpha\int_{B_{2r}(x_0)} v_R\,\eta^2 G + \frac12 \int_{B_{2r}(x_0)} \nabla v_R\cdot\nabla\eta^2\,G + \frac12 \int_{B_{2r}(x_0)} \operatorname{div}\bigl(v_R\nabla\eta^2\bigr)G - \int_{B_{2r}(x_0)} m\cdot\nabla v_R\,\eta^2 G \\ &= -\frac12 \int_{B_{2r}(x_0)} |\nabla v_R|^2\eta^2 G + \int_{B_{2r}(x_0)} \ell\,\eta^2 G\,. \end{aligned} \]
At this point we observe that because of the choice of $r$, $G$ is positive on $B_{2r}(x_0)$ (for $d = 2$ we use that $2r < 1$, see [3, p. 73]). Also $m\cdot\nabla v_R \le |m|^2 + \tfrac14 |\nabla v_R|^2$. Therefore
\[ \begin{aligned} v_R(x_0) &+ \alpha\int_{B_{2r}(x_0)} v_R\,\eta^2 G + \frac12 \int_{B_{2r}(x_0)} \nabla v_R\cdot\nabla\eta^2\,G + \frac12 \int_{B_{2r}(x_0)} \operatorname{div}\bigl(v_R\nabla\eta^2\bigr)G \\ &\le \int_{B_{2r}(x_0)} |m|^2\eta^2 G + \int_{B_{2r}(x_0)} \kappa_2\bigl(1 + |x_0| + 2r\bigr)\eta^2 G\,. \end{aligned} \]
Now we observe that $\nabla\eta$ is nonzero only for $|x - x_0| \ge r$ and $x \in B_{3r/2}(x_0)$, where we can bound $G$ (from (4.83) of [3]). Again using the lower bound for $v_R$, the gradient estimate in (A.6), and the $L^1$ bound of $G$ (see [2, (4.38)]), we have
\[ v_R(x_0) \le \kappa_6(r)\bigl(1 + |x_0|\bigr)\,. \]
Here, the constant $\kappa_6(r)$ does not depend on $R$ and $x_0$ (observe that one can translate $\eta$ and $G$ so that the bounds on $\nabla\eta$, $\nabla^2\eta$, and $G$ do not change as $x_0$ varies). Hence, passing to the limit as $R \nearrow \infty$, we obtain $V_\alpha(x) \le \kappa_6(r)(1 + |x|)$ and that $V_\alpha$ is bounded from below. This concludes the proof.

Lemma A.2. For every $\alpha > 0$, $V_\alpha(x) \to \infty$ as $|x| \to \infty$. Moreover, there exists a compact set $B$ such that $\min_{\mathbb{R}^d} V_\alpha = \min_B V_\alpha$ for all $\alpha \in (0, 1)$.

Proof. Since $V_\alpha$ is bounded from below we may add a suitable constant to both sides of (A.2) and assume that $V_\alpha \ge 0$ ($\ell$ is translated by a constant accordingly). Define for $x_0 \in \mathbb{R}^d$,
\[ \zeta(x) := c_1\bigl(1 - |x - x_0|^2\bigr)\,, \qquad \text{for } |x - x_0| \le 1\,, \]
where $c_1$ is a constant to be chosen later. Let $\theta(x) = V_\alpha(x) - \zeta(x)$ in $B_1(x_0)$. Then in $B_1(x_0)$ we have
\[ \begin{aligned} -\tfrac12 \Delta\theta - m\cdot\nabla\theta + \alpha\theta &= -\tfrac12 \Delta V_\alpha - m\cdot\nabla V_\alpha + \alpha V_\alpha + \tfrac12 \Delta\zeta + m\cdot\nabla\zeta - \alpha\zeta \\ &= -\tfrac12 |\nabla V_\alpha|^2 + \ell + \tfrac12 c_1(-2d) - 2c_1\, m\cdot(x - x_0) - \alpha\zeta\,. \end{aligned} \]
Therefore in $B_1(x_0)$
\[ \begin{aligned} -\tfrac12 \Delta\theta - m\cdot\nabla\theta + \tfrac12 (\nabla V_\alpha + \nabla\zeta)\cdot\nabla\theta + \alpha\theta &= -\tfrac12 |\nabla\zeta|^2 + \ell - d c_1 - 2c_1\, m\cdot(x - x_0) - \alpha\zeta \\ &\ge \ell - 2c_1^2 - M_0 c_1 - \alpha c_1\,, \end{aligned} \]
where $M_0$ is a constant that depends on the bound of $m$. Since $\ell(x) \to \infty$ as $|x| \to \infty$ we can choose $c_1 = c_1(|x_0|)$ such that $c_1(|x_0|) \to \infty$ as $|x_0| \to \infty$ and
\[ \ell(x) - 2c_1^2(|x_0|) - M_0 c_1(|x_0|) - \alpha c_1(|x_0|) > 0 \qquad \forall\, x \in B_1(x_0)\,. \]
Therefore
\[ -\tfrac12 \Delta\theta - m\cdot\nabla\theta + \tfrac12 (\nabla V_\alpha + \nabla\zeta)\cdot\nabla\theta + \alpha\theta > 0 \quad \text{in } B_1(x_0)\,. \]
Since $\theta \ge 0$ on $\partial B_1(x_0)$ and $\alpha > 0$, applying the maximum principle we have $\theta \ge 0$ in $B_1(x_0)$. In particular, $V_\alpha(x_0) \ge c_1(|x_0|)$. This proves that $V_\alpha(x) \to \infty$ as $|x| \to \infty$.
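For readers who want the barrier estimate spelled out, one admissible choice of the constants can be sketched as follows. The specific values are assumptions made for illustration and are not taken from the text: $M_0 := d + 2\|m\|_\infty$ is one possible value of the constant $M_0$, $L_0 := \inf_{B_1(x_0)} \ell$ is assumed positive (true for $|x_0|$ large), and $b := M_0 + 1$ is used so that a single $c_1$ works for every $\alpha \in (0, 1)$.

```latex
% Barrier \zeta(x) = c_1 (1 - |x - x_0|^2) on B_1(x_0); M_0, L_0, b are illustrative choices.
\begin{gather*}
\nabla\zeta(x) = -2c_1 (x - x_0)\,, \qquad \Delta\zeta = -2d\,c_1\,, \qquad 0 \le \zeta \le c_1 \ \text{ on } B_1(x_0)\,,\\
-\tfrac12 |\nabla\zeta|^2 - d\,c_1 - 2c_1\, m\cdot(x - x_0) - \alpha\zeta
 \ \ge\ -2c_1^2 - \bigl(d + 2\|m\|_\infty\bigr)c_1 - \alpha c_1
 \ =\ -2c_1^2 - M_0 c_1 - \alpha c_1\,,\\
c_+ := \tfrac14\Bigl(-b + \sqrt{b^2 + 8L_0}\Bigr) \quad \text{(the positive root of } 2c^2 + bc = L_0\text{)}\,,\\
c_1 := \tfrac12 c_+ \ \Longrightarrow\ 2c_1^2 + (M_0 + \alpha)c_1 < 2c_+^2 + b\,c_+ = L_0 \le \ell(x)
 \quad \text{for } x \in B_1(x_0),\ \alpha \in (0, 1)\,.
\end{gather*}
```

Since $\ell(x) \to \infty$ as $|x| \to \infty$, we have $L_0 \to \infty$ as $|x_0| \to \infty$, so this $c_1$ grows like $\sqrt{L_0/8}$ and in particular tends to infinity, as the argument requires.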
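In particular, $V_\alpha$ attains its minimum at some $\hat{x} \in \mathbb{R}^d$. Also $\Delta V_\alpha(\hat{x}) \ge 0$. Thus from (A.2) we obtain
\[ \ell(\hat{x}) = \alpha V_\alpha(\hat{x}) - \tfrac12 \Delta V_\alpha(\hat{x}) \le \alpha V_\alpha(\hat{x})\,. \tag{A.8} \]
Let $\eta_0$ be the invariant probability measure corresponding to the control $U \equiv 0$. Then $\int_{\mathbb{R}^d} \ell\,d\eta_0 \le \kappa$ for some constant $\kappa$. Applying Lemma A.3 we obtain
\[ V_\alpha(\hat{x}) \le \int_{\mathbb{R}^d} V_\alpha\,d\eta_0 \le \int_{\mathbb{R}^d} \mathbb{E}_x\biggl[\int_0^\infty e^{-\alpha s}\ell(X_s^0)\,ds\biggr]\,d\eta_0 \le \frac{\kappa}{\alpha}\,. \]
Therefore using (A.8) we have $\ell(\hat{x}) \le \kappa$. The result follows using the fact that $\ell(x) \to \infty$ as $|x| \to \infty$.

Lemma A.3. For all $x \in \mathbb{R}^d$,
\[ V_\alpha(x) = \inf_{U\in\mathfrak{U}}\ \mathbb{E}_x^U\biggl[\int_0^\infty e^{-\alpha s}\Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr]\,, \]
where $\mathbb{E}^U$ denotes the expectation under the law of $(X, U)$.

Proof. From Lemma A.1 we have $|V_\alpha(x)| \le M_1 + M_2|x|$ and $V_\alpha$ is bounded from below. Let us write (A.2) in the form
\[ \tfrac12 \Delta V_\alpha + \min_{u\in\mathbb{R}^d}\Bigl[\langle m + u, \nabla V_\alpha\rangle + \ell + \tfrac12 |u|^2\Bigr] = \alpha V_\alpha\,. \tag{A.9} \]
Let $U \in \mathfrak{U}$ be any admissible control such that
\[ \mathbb{E}_x^U\biggl[\int_0^\infty e^{-\alpha s}\Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr] < \infty\,, \]
and
\[ X_t = x + \int_0^t m(X_s)\,ds + \int_0^t U_s\,ds + W_t\,, \qquad t \ge 0\,. \]
Since $\mathbb{E}_x^U\bigl[\int_0^t |U_s|^2\,ds\bigr]$ is finite it is easy to see that
\[ \mathbb{E}_x^U\Bigl[\sup_{0\le s\le t} |X_s|\Bigr] \le C\biggl(|x| + t + \mathbb{E}_x^U\biggl[\int_0^t |U_s|\,ds\biggr]\biggr) < \infty\,. \]
Therefore, multiplying both sides by $e^{-\alpha t}$ we get
\[ \begin{aligned} e^{-\alpha t}\,\mathbb{E}_x^U\bigl[|X_t|\bigr] &\le C(|x| + t)e^{-\alpha t} + C e^{-\frac{\alpha}{2}t}\,\mathbb{E}_x^U\biggl[\int_0^t e^{-\frac{\alpha}{2}s}|U_s|\,ds\biggr] \\ &\le C(|x| + t)e^{-\alpha t} + C\sqrt{t}\,e^{-\frac{\alpha}{2}t}\,\mathbb{E}_x^U\biggl[\biggl(\int_0^t e^{-\alpha s}|U_s|^2\,ds\biggr)^{1/2}\biggr] \\ &\le C(|x| + t)e^{-\alpha t} + C\sqrt{t}\,e^{-\frac{\alpha}{2}t}\biggl(\mathbb{E}_x^U\biggl[\int_0^t e^{-\alpha s}|U_s|^2\,ds\biggr]\biggr)^{1/2} \xrightarrow[t\to\infty]{} 0\,. \end{aligned} \]
Now applying Itô's formula to (A.9) we have
\[ V_\alpha(x) - e^{-\alpha t}\,\mathbb{E}_x^U\bigl[V_\alpha(X_t)\bigr] \le \mathbb{E}_x^U\biggl[\int_0^t e^{-\alpha s}\Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr]\,. \]
Hence using Lemma A.1 we have
\[ V_\alpha(x) \le \mathbb{E}_x^U\biggl[\int_0^\infty e^{-\alpha s}\Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr]\,. \tag{A.10} \]
Since (A.10) holds for all $U \in \mathfrak{U}$, the desired upper bound on $V_\alpha$ is established. To show the converse inequality let $v_\alpha$ be a measurable selector from the minimizer of (A.9). With $\tau_R$ denoting the first exit time from a ball of radius $R$ we then have
\[ V_\alpha(x) = \mathbb{E}_x^{v_\alpha}\biggl[\int_0^{t\wedge\tau_R} e^{-\alpha s}\Bigl(\ell(X_s) + \tfrac12 |v_\alpha(X_s)|^2\Bigr)\,ds\biggr] + \mathbb{E}_x^{v_\alpha}\Bigl[e^{-\alpha(t\wedge\tau_R)}V_\alpha(X_{t\wedge\tau_R})\Bigr]\,. \tag{A.11} \]
We claim that $\tau_R \to \infty$ as $R \to \infty$, $\mathbb{P}_x^{v_\alpha}$-a.s.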
Suppose that $\tau_R \not\to \infty$ as $R \to \infty$ in $\mathbb{P}_x^{v_\alpha}$-probability. Then there exist $t_0 > 0$ and $\varepsilon_0 > 0$ such that $\mathbb{P}_x^{v_\alpha}(\tau_R < t_0) > \varepsilon_0$ for all $R > 0$. Let $R_0 > 0$ be such that
\[ \min_{\partial B_{R_0}} V_\alpha > \frac{2}{\varepsilon_0}\,e^{\alpha t_0}\,V_\alpha(x)\,. \]
Such $R_0$ exists by Lemma A.2. We then obtain
\[ \mathbb{E}_x^{v_\alpha}\Bigl[e^{-\alpha(t_0\wedge\tau_{R_0})}V_\alpha(X_{t_0\wedge\tau_{R_0}})\Bigr] > \varepsilon_0\,e^{-\alpha t_0}\min_{\partial B_{R_0}} V_\alpha > 2V_\alpha(x)\,, \]
which contradicts (A.11), since $\ell$ is nonnegative. Therefore $\tau_R \to \infty$, as $R \to \infty$, in probability, and therefore also $\mathbb{P}_x^{v_\alpha}$-a.s., since it is monotone in $R$. Taking limits in (A.11) as $R \to \infty$, and applying Fatou's lemma, we obtain
\[ V_\alpha(x) \ge \mathbb{E}_x^{v_\alpha}\biggl[\int_0^t e^{-\alpha s}\Bigl(\ell(X_s) + \tfrac12 |v_\alpha(X_s)|^2\Bigr)\,ds\biggr] + \mathbb{E}_x^{v_\alpha}\bigl[e^{-\alpha t}V_\alpha(X_t)\bigr]\,, \]
and taking limits once more as $t \to \infty$, using the fact that $V_\alpha$ is bounded below, we have
\[ V_\alpha(x) \ge \mathbb{E}_x^{v_\alpha}\biggl[\int_0^\infty e^{-\alpha s}\Bigl(\ell(X_s) + \tfrac12 |v_\alpha(X_s)|^2\Bigr)\,ds\biggr]\,. \tag{A.12} \]
By (A.10) we must have equality in (A.12). Moreover, (A.10) and (A.12) imply that every measurable selector from the minimizer of (A.9), and in particular the control $U_t = -\nabla V_\alpha(X_t)$, is optimal for the $\alpha$-discounted problem and that $V_\alpha$ is the associated value function. Since the solution of (1.1) under the control $U_t = -\nabla V_\alpha(X_t)$ is regular, and since the drift $m - \nabla V_\alpha$ is locally Lipschitz, it follows that (1.1) under the optimal $\alpha$-discounted control $U_t = -\nabla V_\alpha(X_t)$ has a unique strong solution.

The following is a special case of the Hardy–Littlewood theorem [26, Theorem 2.2].

Lemma A.4. Let $\{a_n\}$ be a sequence of non-negative real numbers. Then
\[ \limsup_{\beta\nearrow 1}\ (1 - \beta)\sum_{n=1}^\infty \beta^n a_n \le \limsup_{N\to\infty}\ \frac{1}{N}\sum_{n=1}^N a_n\,. \]
Concerning the proof of Lemma A.4, note that if the right-hand side of the above display is finite then the set $\{a_n/n\}$ is bounded. Therefore $\sum_{n=1}^\infty \beta^n a_n$ is finite for every $\beta < 1$. Hence we can apply [26, Theorem 2.2] to obtain the result.

Continuing, we define
\[ \beta_* := \inf_{U\in\mathfrak{U}}\ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}\biggl[\int_0^T \Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr]\,. \]
To be more precise we should define $\beta_*$ as a function of the initial condition $x$.
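But we shall see that the infimum does not depend on the initial condition.

Lemma A.5. Let $\beta$ be given by (A.1). Then $\beta \le \beta_*$.

Proof. Fix $x \in \mathbb{R}^d$. Let $U \in \mathfrak{U}$ be such that
\[ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}_x\biggl[\int_0^T \Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr] < \infty\,. \]
It is easy to see that
\[ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}_x\biggl[\int_0^T \Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr] = \limsup_{N\to\infty}\ \frac{1}{N}\,\mathbb{E}_x\biggl[\int_0^N \Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr]\,, \]
where $N$ runs over the set of natural numbers. Define
\[ a_n := \mathbb{E}_x\biggl[\int_{n-1}^n \Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr]\,, \qquad n \ge 1\,. \]
Therefore, applying Lemma A.4 we have
\[ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}_x\biggl[\int_0^T \Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr] \ge \limsup_{\beta\nearrow 1}\ (1 - \beta)\sum_{n=1}^\infty \beta^n a_n\,. \tag{A.13} \]
For $\alpha > 0$, define $\beta = e^{-\alpha}$, and thus we have
\[ \sum_{n=1}^\infty \beta^n a_n \ge \sum_{n=1}^\infty e^{-\alpha}\,\mathbb{E}_x\biggl[\int_{n-1}^n e^{-\alpha s}\Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr] = e^{-\alpha}\,\mathbb{E}_x\biggl[\int_0^\infty e^{-\alpha s}\Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr]\,. \]
Therefore, combining (A.13) with Lemma A.3 we obtain
\[ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}_x\biggl[\int_0^T \Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr] \ge \limsup_{\alpha\searrow 0}\ (1 - e^{-\alpha})\,e^{-\alpha}\,V_\alpha(x)\,. \]
Since $\alpha V_\alpha(x) \to \beta$ as $\alpha \searrow 0$ from [2], we obtain from the above that
\[ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}_x\biggl[\int_0^T \Bigl(\ell(X_s) + \tfrac12 |U_s|^2\Bigr)\,ds\biggr] \ge \beta\,. \]
$U \in \mathfrak{U}$ being arbitrary, this concludes the proof.

Proof of Theorem 2.2. From Lemma A.5 we have $\beta \le \beta_*$ where $\beta$ is given by (A.1). It is easy to see that $v_* = -\nabla V$ is a minimizing selector of
\[ \tfrac12 \Delta V + \min_{u\in\mathbb{R}^d}\Bigl[\langle m + u, \nabla V\rangle + \ell + \tfrac12 |u|^2\Bigr] = \beta\,, \]
and the HJB takes the following form
\[ \tfrac12 \Delta V + \langle m, \nabla V\rangle - \tfrac12 |\nabla V|^2 + \ell = \beta\,. \]
By [19, Lemma 5.1] we have $|\nabla V(x)| \le C_1 + C_2|x|$ for some constants $C_1$, $C_2$. Since $v_*$ is locally Lipschitz with at most linear growth there exists a unique strong solution to (1.1) with $U = v_*$. Applying Itô's formula to $V$ we have
\[ \mathbb{E}_x\bigl[V(X_{T\wedge\tau_R})\bigr] - V(x) = \mathbb{E}_x\biggl[\int_0^{T\wedge\tau_R} \Bigl(\beta - \ell(X_s) - \tfrac12 |\nabla V(X_s)|^2\Bigr)\,ds\biggr]\,, \]
where $\tau_R$ denotes the exit time from the ball of radius $R > 0$ around $0$. Since $V$ is bounded from below and $\tau_R \to \infty$ as $R \to \infty$ we have
\[ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}_x\biggl[\int_0^T \Bigl(\ell(X_t) + \tfrac12 |\nabla V(X_t)|^2\Bigr)\,dt\biggr] \le \beta\,. \]
Hence $\beta = \beta_*$.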
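As an aside, the HJB equation above and the bound on the long-run average cost under $v_* = -\nabla V$ can be checked by hand in a scalar linear-quadratic toy model. The sketch below is only an illustration under assumptions that are not part of the paper: $\varepsilon = 1$, $m(x) = ax$ and $\ell(x) = \tfrac12 c x^2$, which violate the standing hypotheses ($m$ is unbounded and $\ell$ is not Lipschitz), so it is not an instance of Theorem 2.2. The function names are hypothetical. With $V(x) = \tfrac12 q x^2$ the HJB equation reduces to $q^2 - 2aq - c = 0$ and $\beta = q/2$, and the closed-loop diffusion $dX = (a - q)X\,dt + dW$ has invariant law $N\bigl(0, 1/(2(q - a))\bigr)$.

```python
import math
from typing import Tuple


def solve_scalar_hjb(a: float, c: float) -> Tuple[float, float]:
    """Return (q, beta) such that V(x) = q x^2 / 2 solves
    (1/2) V'' + a x V' - (1/2) (V')^2 + (c/2) x^2 = beta.

    Matching powers of x gives q^2 - 2 a q - c = 0 and beta = q / 2.
    The root q = a + sqrt(a^2 + c) makes the closed-loop drift a - q negative.
    """
    if c < 0 or (a == 0 and c == 0):
        raise ValueError("need c >= 0 and (a, c) != (0, 0)")
    q = a + math.sqrt(a * a + c)
    return q, 0.5 * q


def stationary_average_cost(a: float, c: float, q: float, n: int = 20001) -> float:
    """Trapezoidal integral of l(x) + |V'(x)|^2 / 2 = (c + q^2) x^2 / 2 against
    the invariant density of dX = (a - q) X dt + dW, i.e. N(0, 1 / (2 (q - a)))."""
    var = 1.0 / (2.0 * (q - a))
    sd = math.sqrt(var)
    lo, hi = -12.0 * sd, 12.0 * sd  # tails beyond 12 standard deviations are negligible
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 * h if i in (0, n - 1) else h
        density = math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        total += weight * 0.5 * (c + q * q) * x * x * density
    return total


if __name__ == "__main__":
    # Stable drift with a state cost, and unstable drift with no state cost.
    for a, c in [(-1.0, 3.0), (2.0, 0.0)]:
        q, beta = solve_scalar_hjb(a, c)
        print(a, c, q, beta, stationary_average_cost(a, c, q))
```

In both test cases the quadrature reproduces $\beta = q/2$ up to numerical error. The case $c = 0$, $a > 0$ gives $\beta = a$, the cost of stabilizing an unstable linear drift with no state cost. The remaining step of the proof of Theorem 2.2 follows.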
The preceding bound also proves that $v_*$ is an optimal stable Markov control, and by the ergodic theorem
\[ \beta = \int_{\mathbb{R}^d} \Bigl(\ell(x) + \tfrac12 |\nabla V(x)|^2\Bigr)\,d\eta_*\,, \]
where $\eta_*$ is the invariant probability measure corresponding to the control $v_*$.

Acknowledgment

This work was initiated during Vivek Borkar's visit to the Department of Electrical Engineering, Technion, supported by Technion. Thanks are due to Prof. Rami Atar for suggesting the problem as well as for valuable discussions. The work of Ari Arapostathis was supported in part by the Office of Naval Research through grant N00014-14-1-0196, and in part by a grant from the POSTECH Academy-Industry Foundation. The work of Anup Biswas was supported in part by an award from the Simons Foundation (# 197982) to The University of Texas at Austin and in part by the Office of Naval Research through the Electric Ship Research and Development Consortium. This research of Anup Biswas was also supported in part by an INSPIRE faculty fellowship. The work of Vivek Borkar was supported in part by a J. C. Bose Fellowship from the Department of Science and Technology, Government of India.

References

[1] A. Arapostathis, V. S. Borkar, and M. K. Ghosh. Ergodic control of diffusion processes, volume 143 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2011.
[2] A. Bensoussan and J. Frehse. On Bellman equations of ergodic control in R^n. J. Reine Angew. Math., 429:125–160, 1992.
[3] A. Bensoussan and J. Frehse. Regularity results for nonlinear elliptic systems and applications, volume 151 of Applied Mathematical Sciences. Springer-Verlag, Berlin, 2002.
[4] R. Benzi, G. Parisi, A. Sutera, and A. Vulpiani. A theory of stochastic resonance in climatic change. SIAM J. Appl. Math., 43(3):565–578, 1983.
[5] N. Berglund and B. Gentz. Metastability in simple climate models: pathwise analysis of slowly driven Langevin equations. Stoch. Dyn., 2(3):327–356, 2002. Special issue on stochastic climate models.
[6] N. Berglund and B. Gentz. Noise-induced phenomena in slow-fast dynamical systems. A sample-paths approach. Probability and its Applications (New York). Springer-Verlag London, Ltd., London, 2006.
[7] A. G. Bhatt and V. S. Borkar. Occupation measures for controlled Markov processes: characterization and optimality. Ann. Probab., 24(3):1531–1562, 1996.
[8] A. Biswas and V. S. Borkar. Small noise asymptotics for invariant densities for a class of diffusions: a control theoretic view. J. Math. Anal. Appl., 360(2):476–484, 2009. Erratum at arXiv:1107.2277.
[9] B. Z. Bobrovsky, M. M. Zakai, and O. Zeitouni. Error bounds for the nonlinear filtering of signals with small diffusion coefficients. IEEE Trans. Inform. Theory, 34(4):710–721, 1988.
[10] V. S. Borkar. Optimal control of diffusion processes, volume 203 of Pitman Research Notes in Mathematics Series. Longman Scientific & Technical, Harlow; copublished in the United States with John Wiley & Sons, Inc., New York, 1989.
[11] V. S. Borkar. Probability theory. An advanced course. Universitext. Springer-Verlag, New York, 1995.
[12] R. W. Brockett. Finite dimensional linear systems. John Wiley & Sons, 1970.
[13] S. Cerrai and M. Röckner. Large deviations for invariant measures of stochastic reaction-diffusion systems with multiplicative noise and non-Lipschitz reaction term. Ann. Inst. H. Poincaré Probab. Statist., 41(1):69–105, 2005.
[14] M. V. Day. Recent progress on the small parameter exit problem. Stochastics, 20(2):121–150, 1987.
[15] J. Feng, M. Forde, and J.-P. Fouque. Short-maturity asymptotics for a fast mean-reverting Heston stochastic volatility model. SIAM J. Financial Math., 1(1):126–141, 2010.
[16] M. I. Freidlin and A. D. Wentzell. Random perturbations of dynamical systems, volume 260 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, New York, second edition, 1998.
[17] N. Ichihara. Large time asymptotic problems for optimal stochastic control with superlinear cost. Stochastic Process. Appl., 122(4):1248–1275, 2012.
[18] T. G. Kurtz and R. H. Stockbridge. Existence of Markov controls and characterization of optimal Markov controls. SIAM J. Control Optim., 36(2):609–653, 1998.
[19] G. Metafune, D. Pallara, and A. Rhandi. Global properties of invariant measures. J. Funct. Anal., 223(2):396–424, 2005.
[20] K. R. Meyer. Energy functions for Morse Smale systems. Amer. J. Math., 90:1031–1040, 1968.
[21] F. Moss. Stochastic resonance: From the ice ages to the monkey's ear. In George H. Weiss, editor, Contemporary Problems in Statistical Physics, chapter 5, pages 205–253. SIAM, Philadelphia, 1994.
[22] E. Olivieri and M. E. Vares. Large deviations and metastability, volume 100 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2005.
[23] Z. Schuss. Theory and applications of stochastic differential equations. John Wiley & Sons, Inc., New York, 1980. Wiley Series in Probability and Statistics.
[24] S. J. Sheu. Asymptotic behavior of the invariant density of a diffusion Markov process with small diffusion. SIAM J. Math. Anal., 17(2):451–460, 1986.
[25] S. Smale. On gradient dynamical systems. Ann. of Math. (2), 74:199–206, 1961.
[26] R. Sznajder and J. A. Filar. Some comments on a theorem of Hardy and Littlewood. J. Optim. Theory Appl., 75(1):201–208, 1992.
[27] O. Zeitouni and M. Zakai. On the optimal tracking problem. SIAM J. Control Optim., 30(2):426–439, 1992.

Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station, Austin, TX 78712
Department of Mathematics, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune 411008, India
Department of Electrical Engineering, Indian Institute of Technology, Powai, Mumbai