Estimation of the degree of jump activity, for irregularly sampled processes in presence of noise

Jean Jacod – Viktor Todorov
UPMC (Paris 6) – Northwestern University

The aim

Make inference on the jumps of a 1-dimensional process
$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + \text{jumps driven by a Poisson measure},$$
observed at discrete times within the fixed time interval $[0,1]$.

Three features:
• The Brownian part $\int_0^t \sigma_s\,dW_s$ is typically dominating, but we are interested in the jumps.
• The sampling times $0 = T(n,0) < T(n,1) < \cdots < T(n,i) < \cdots$ may be irregular, possibly random.
• There is microstructure noise: instead of $X_{T(n,i)}$ we observe $Y^n_i = X_{T(n,i)} + \chi^n_i$ (so tick-by-tick data, or all-transactions data, can be used to do inference).

Notation

$\Delta(n,i) = T(n,i) - T(n,i-1)$
$\Delta^n_i V = V_{T(n,i)} - V_{T(n,i-1)}$, for $V$ any process
$\Delta V_t = V_t - V_{t-}$, for $V$ any càdlàg process
Regular sampling means $\Delta(n,i) = \Delta_n$.

Spot Lévy measures of $X$: the compensator $\nu$ of the jump measure of $X$ is assumed to have the factorization
$$\nu(\omega, dt, dx) = dt\, F_{\omega,t}(dx)$$
(this is the "Itô semimartingale property" for the jumps). The measures $F_t = F_{\omega,t}$ are the spot Lévy measures, and $\int_0^t ds \int (x^2 \wedge 1)\, F_s(dx) < \infty$.

Warning: We do not want to "estimate" the jumps $\Delta X_t$ themselves, since
• the "big jumps" over $[0,1]$ have no predictive value for the model;
• the "small" jumps are infinitely many, hence impossible to "estimate".
Rather, we will estimate the "degree of activity" of the jumps, and also the "intensity" of the small jumps (as specified below).

Assumptions on X (typically a log-price)

We have a filtered space $(\Omega, \mathcal F, (\mathcal F_t)_{t\ge 0}, \mathbb P)$ on which
$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + \int_0^t\!\!\int_E \delta(s,z)\,(\underline p - \underline q)(ds,dz) + \int_0^t\!\!\int_E \delta'(s,z)\,\underline p(ds,dz)$$
$$\sigma_t = \sigma_0 + \int_0^t b^\sigma_s\,ds + \int_0^t H^\sigma_s\,dW'_s + \int_0^t\!\!\int_E \delta^\sigma(s,z)\,(\underline p - \underline q)(ds,dz) + \int_0^t\!\!\int_E \delta'^\sigma(s,z)\,\underline p(ds,dz)$$
• $W$ and $W'$ are two correlated Brownian motions.
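The jump components above are driven, in the leading examples, by (tempered) stable processes; the key fact exploited throughout is the explicit characteristic function of the symmetric stable law, $E[\cos(uS)] = e^{-|u|^\beta}$ for a standard symmetric $\beta$-stable variable $S$. A minimal simulation sketch (ours, not from the talk; all names are illustrative), using the Chambers-Mallows-Stuck representation:

```python
import numpy as np

# Draw standard symmetric beta-stable variables and check their
# characteristic function E[cos(u S)] = exp(-|u|^beta) empirically.

rng = np.random.default_rng(0)

def sym_stable(beta, size, rng):
    """Chambers-Mallows-Stuck draws of a standard symmetric beta-stable law."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(beta * U) / np.cos(U) ** (1.0 / beta)
            * (np.cos((1.0 - beta) * U) / W) ** ((1.0 - beta) / beta))

beta = 1.5
S = sym_stable(beta, 200_000, rng)
u = 1.0
ecf = np.mean(np.cos(u * S))   # should be close to exp(-|u|^beta) = exp(-1)
```

With $\beta = 1.5$ and $u = 1$ the empirical average of $\cos(uS)$ should land close to $e^{-1} \approx 0.368$; this exponential decay in $|u|^\beta$ is what the cosine statistics below pick up.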
• $\underline p$ is a Poisson measure on $\mathbb R_+ \times E$ with (deterministic) compensator $\underline q(dt,dz) = dt \otimes \eta(dz)$ ($\eta$ a $\sigma$-finite measure on the Polish space $E$).
• "Standard" assumptions on the coefficients $b_t, b^\sigma_t, H^\sigma_t$: locally bounded, and (up to localization) $b_t, H^\sigma_t$ satisfy, for all finite stopping times $T \le S$:
$$E\Big( \sup_{s\in[T,S]} |V_s - V_T|^2 \Big) \le K\,E(S-T) \qquad (2)$$
• $|\delta(t,z)|^{r'},\ 1_{\{\delta'(t,z)\neq 0\}} \le J(z)$ for some non-random $\eta$-integrable function $J$ and some $r' \in [0,2)$, and the same for $\delta^\sigma, \delta'^\sigma$ (up to localization).

MOREOVER, we need a structural assumption on the high-activity jumps of $X$, expressed in terms of the BG (Blumenthal-Getoor) index, or successive BG indices:

There is an integer $M \ge 0$, numbers $2 > \beta_1 > \cdots > \beta_M > 0$, and nonnegative càdlàg processes $a^1_t, \dots, a^M_t$, such that each $(a^m_t)^{1/\beta_m}$ satisfies (2), and the symmetrized Lévy measures $\breve F_t(A) = F_t(A) + F_t(-A)$ are such that the (signed) measure
$$F'_t(dx) = \breve F_t(dx) - \sum_{m=1}^M \frac{\beta_m\, a^m_t}{|x|^{1+\beta_m}}\, 1_{\{0<|x|\le 1\}}\, dx$$
satisfies ($|F'_t|$ being the "absolute value" of $F'_t$):
$$|F'_t|\big([-x,x]^c\big) \le \Gamma\, x^{-r'} \qquad \forall\, x \in (0,1].$$

Example:
$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + \sum_{m=1}^M \int_0^t \gamma^m_{s-}\,dY^m_s + \int_0^t\!\!\int_E \delta'(s,z)\,\underline p(ds,dz)$$
with $Y^m$ independent stable or tempered stable processes (with arbitrary dependence with $\underline p$, and indices $\beta_m$), and the $\gamma^m$'s Itô semimartingales. We then have $a^m_t = |\gamma^m_t|^{\beta_m}$ (up to a multiplicative constant).

REGULAR OBSERVATIONS – NO NOISE

Choose a sequence $k_n$ of integers ($\to \infty$) and set
$$L(y)^n_j = \frac{1}{k_n} \sum_{l=0}^{k_n-1} \cos\Big( y\,\big( \Delta^n_{j+1+2l}X - \Delta^n_{j+2+2l}X \big) \big/ \sqrt{\Delta_n} \Big).$$
We take differences of two successive returns to "symmetrize" the jump measure around zero and kill the drift. We have approximately, with $\chi(\beta) = \int_0^\infty \frac{\sin y}{y^\beta}\,dy$:
$$E\big( L(y)^n_j \mid \mathcal F_{j\Delta_n} \big) \approx \exp\Big( -y^2\,\sigma^2_{j\Delta_n} - 2 \sum_{m=1}^M a^m_{j\Delta_n}\,\chi(\beta_m)\,y^{\beta_m}\,\Delta_n^{1-\beta_m/2} \Big),$$
hence the following is an estimator of the spot (squared) volatility $c_t = \sigma_t^2$ for $t \approx j\Delta_n$:
$$\hat c(y)^n_j = -\frac{1}{y^2}\,\log\Big( L(y)^n_j \vee \frac{1}{\log(1/\Delta_n)} \Big).$$
Introducing a de-biasing term, we set
$$\hat C(y)^n_t = 2k_n\Delta_n \sum_{j=0}^{[t/v_n]-1} \Big( \hat c(y)^n_j - \frac{1}{2y^2k_n}\,\sinh\big( y^2\,\hat c(y)^n_j \big)^2 \Big),$$
where $\sinh(x) = \frac12(e^x - e^{-x})$ is the hyperbolic sine. This "estimates"
$$C_t + \sum_{m=1}^M A^{m,n}(y)_t, \quad\text{where}\quad A^{m,n}(y)_t = y^{\beta_m-2}\,\Delta_n^{1-\beta_m/2}\,A^m_t, \qquad A^m_t = 2\chi(\beta_m)\int_0^t a^m_s\,ds,$$
and the normalized error term is
$$Z(y)^n_t = \frac{1}{\sqrt{\Delta_n}}\Big( \hat C(y)^n_t - C_t - \sum_{m=1}^M A^{m,n}(y)_t \Big).$$

The CLT: Let $\mathcal Y$ be a finite subset of $(0,\infty)$. Choose $k_n$ and $u_n$ such that
$$k_n\sqrt{\Delta_n} \to 0, \qquad k_n\,\Delta_n^{1/2-\varepsilon} \to \infty \ \ \forall\,\varepsilon>0, \qquad u_n \to 0, \qquad \frac{\sqrt{k_n\Delta_n}}{u_n^2} \to 0.$$

Theorem We have the (functional) stable convergence in law:
$$\Big( \frac{1}{u_n^2}\,Z(u_n)^n,\ \Big( \frac{1}{u_n^2}\big( Z(yu_n)^n - Z(u_n)^n \big) \Big)_{y\in\mathcal Y} \Big) \ \overset{\mathcal L-s}{\Longrightarrow}\ \Big( Z,\ \big( (y^2-1)\,\overline Z \big)_{y\in\mathcal Y} \Big),$$
where the limit is defined on an extension $(\widetilde\Omega, \widetilde{\mathcal F}, (\widetilde{\mathcal F}_t)_{t\ge0}, \widetilde{\mathbb P})$ of the original space $(\Omega, \mathcal F, (\mathcal F_t)_{t\ge0}, \mathbb P)$ and can be written as
$$Z_t = 2\int_0^t c_s\,dW^{(1)}_s, \qquad \overline Z_t = \frac{2}{\sqrt 3}\int_0^t c_s^2\,dW^{(2)}_s,$$
where $W^{(1)}$ and $W^{(2)}$ are two independent Brownian motions, independent of the $\sigma$-field $\mathcal F$.

Estimation of $\beta_1$ (when $M = 1$, for simplicity): Observe that
$$C(y)^n_t = \hat C(u_n y)^n_t - \hat C(u_n)^n_t = A^{1,n}(u_n y)_t - A^{1,n}(u_n)_t + \sqrt{\Delta_n}\,\big( Z(yu_n)^n_t - Z(u_n)^n_t \big) = u_n^{\beta_1-2}\big( y^{\beta_1-2} - 1 \big)\,\Delta_n^{1-\beta_1/2}\,A^1_t + \sqrt{\Delta_n}\,\big( Z(yu_n)^n_t - Z(u_n)^n_t \big).$$
The function $f(x) = \frac{4^{x-2}-1}{2^{x-2}-1}$ is $C^\infty$ on $(1,2)$, with a $C^\infty$ inverse $f^{-1}$. Then a natural estimator for $\beta_1$ is, for example,
$$\hat\beta^n_t = f^{-1}\Big( \frac{C(4)^n_t}{C(2)^n_t} \Big),$$
and we have
$$\frac{u_n^{\beta_1-2}}{\Delta_n^{(\beta_1-1)/2}}\,\big( \hat\beta^n_t - \beta_1 \big) \ \overset{\mathcal L-s}{\longrightarrow}\ \mathcal Y$$
for some mixed centered normal $\mathcal Y$ with a known (conditional) variance. The rate is thus almost $1/\Delta_n^{(\beta_1-1)/2}$, better than the rates obtained by other methods when $\beta_1 > 4/3$.

The choice $k_n \approx 1/\sqrt{\Delta_n}$ and $u_n$ going to 0 very slowly is in fact not "optimal" for this problem, so better (but complicated...) rates can actually be achieved.
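As a quick numerical illustration of the chain $L(y) \to \hat c(y)$ above, here is a sketch (ours, not from the talk) on a pure Brownian sample, where $M = 0$ and $\hat c$ should recover $\sigma^2$; the $f$-inversion step is shown separately, since with no jumps there is no $\beta_1$ to estimate:

```python
import numpy as np

# Cosine statistic L(y) on differenced, Delta_n^{-1/2}-scaled Brownian returns,
# then the spot-variance proxy c_hat = -log(L)/y^2; names are illustrative.

rng = np.random.default_rng(0)
n, sigma = 20_000, 0.3
Dn = 1.0 / n
dX = sigma * np.sqrt(Dn) * rng.standard_normal(n)   # Brownian increments

def L(y, dX, j, k_n, Dn):
    l = np.arange(k_n)
    d = dX[j + 1 + 2 * l] - dX[j + 2 + 2 * l]       # difference of two returns
    return np.mean(np.cos(y * d / np.sqrt(Dn)))

k_n, y = 2_000, 1.0        # a large window, for a tight illustration
c_hat = -np.log(L(y, dX, 0, k_n, Dn)) / y ** 2      # expect ~ sigma^2 = 0.09

# The beta_1 estimator inverts f(x) = (4^{x-2}-1)/(2^{x-2}-1); since
# f(x) = 2^{x-2} + 1, the inverse is explicit:
def f(x):
    return (4.0 ** (x - 2) - 1.0) / (2.0 ** (x - 2) - 1.0)

def f_inv(v):
    return 2.0 + np.log2(v - 1.0)
```

The closed form $f(x) = 2^{x-2} + 1$ follows from $4^{x-2} - 1 = (2^{x-2}-1)(2^{x-2}+1)$, which is why $f^{-1}$ is smooth and explicit.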
One can also estimate $A^1_t = 2\chi(\beta_1)\int_0^t a^1_s\,ds$ (or similarly estimate $\int_0^t a^1_s\,ds$), using for example
$$\hat A^{n,1}_t = \frac{C(2)^n_t}{u_n^{\hat\beta^n_t-2}\,\big( 2^{\hat\beta^n_t-2} - 1 \big)\,\Delta_n^{1-\hat\beta^n_t/2}}.$$
Then we have a CLT for $\hat A^{n,1}_t$, with the same kind of limit, and the same rate divided by $\log(1/\Delta_n)$ to an appropriate power.

NOISE AND IRREGULAR SAMPLING

Assumptions on the observation scheme

Assumption: The inter-observation lags are $\Delta(n,i+1) = \Delta_n\,\lambda_{T(n,i)}\,\Phi^n_{i+1}$, where
• $\lambda_t$ is positive, càdlàg, adapted, satisfying $E(|\lambda_T - \lambda_S|) \le E(T-S)$ for all stopping times $S \le T$, and $1/K \le \lambda_t \le K$ (up to localization).
• The variables $\Phi^n_i$ are positive, with $E(\Phi^n_i) = 1$ and $\sup_{n,i} E\big( (\Phi^n_i)^p \big) < \infty$ for all $p > 0$.
• The variables $(\Phi^n_i : i \ge 1)$ are mutually independent and independent of $\mathcal F_\infty$.

This implies in particular that $N^n_t = \sum_{i\ge1} 1_{\{T(n,i)\le t\}}$ satisfies
$$\Delta_n\, N^n_t \ \overset{\text{u.c.p.}}{\Longrightarrow}\ \Lambda_t := \int_0^t \frac{1}{\lambda_s}\,ds.$$
We denote by $\mathcal H^n_\infty$ the $\sigma$-field generated by $\mathcal F_\infty$ and all $(\Phi^n_i : i \ge 1)$.

Two assumptions on the noise

We observe $Y^n_i = X_{T(n,i)} + \gamma'_{T(n,i)}\,\varepsilon^n_i$, where

(N1): The process $\gamma'$ is an Itô semimartingale, and the variables $(\varepsilon^n_i : i \ge 1)$ are i.i.d., independent of $\mathcal H^n_\infty$, with $E(\varepsilon^n_i) = 0$, $E\big( (\varepsilon^n_i)^2 \big) = 1$, $E(|\varepsilon^n_i|^p) < \infty$.

(N2): The variables $(\varepsilon^n_i : i \ge 1)$ are independent, conditionally on $\mathcal H^n_\infty$, with
$$E\big( (\varepsilon^n_i)^p \mid \mathcal H^n_\infty \big) = \gamma^{(p)}_{T(n,i)}, \qquad P\big( \varepsilon^n_i \in B \mid \mathcal H^n_\infty \big) = P\big( \varepsilon^n_i \in B \mid \mathcal H^n_{T(n,i)} \big)$$
(where $(\mathcal H^n_t)$ is the smallest filtration containing $(\mathcal F_t)$ and for which all $T(n,i)$ are stopping times). Moreover $\gamma^{(1)}_t = 0$ and $\gamma^{(2)}_t = 1$; moreover $\gamma'_t$ and all $\gamma^{(p)}_t$ satisfy (2) and are $(\mathcal F_t)$-adapted.

(N1) is much stronger than (N2), and not very realistic. Later on, $\gamma_t = (\gamma'_t)^2$ (this is the "variance" of the noise).

An important example satisfying (N2) but not (N1). Let $\rho^j_t \ge 0$ be càdlàg, adapted, nonnegative, with $\sum_{j\in\mathbb Z} \rho^j_t = 1$, $\rho^j_t = \rho^{-j}_t$, and $\sup_t \sum_{j\in\mathbb Z} \rho^j_t\,|j|^p < \infty$. For each $n$ let $(Z^n_i : i \ge 1)$ be i.i.d. conditionally on $\mathcal H^n_\infty$, with density $x \mapsto \sum_{j\in\mathbb Z} \rho^j_{T(n,i)}\, 1_{[j,j+1)}(x)$.
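The sampling scheme just described is easy to simulate; the sketch below (ours; $\lambda$ taken constant, $\Phi$ i.i.d. exponential with mean 1, which satisfies the moment conditions) checks the law of large numbers $\Delta_n N^n_t \to \Lambda_t$:

```python
import numpy as np

# Simulate T(n, i+1) = T(n, i) + Delta_n * lambda * Phi_{i+1} on [0, 1] and
# check Delta_n * N^n_1 ~ Lambda_1 = 1 / lambda (lambda constant here).

rng = np.random.default_rng(1)
Dn, lam = 1e-4, 2.0

T, N = 0.0, 0
while True:
    T += Dn * lam * rng.exponential(1.0)   # next inter-observation lag
    if T > 1.0:
        break
    N += 1

approx = Dn * N    # should approximate Lambda_1 = 1 / lam = 0.5
```

With $\Delta_n = 10^{-4}$ there are about $1/(\Delta_n\lambda) = 5000$ observation times, so the relative fluctuation of $\Delta_n N^n_1$ around $1/\lambda$ is of order $1/\sqrt{5000}$.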
The observation at time $T(n,i)$ is $Y^n_i = [X_{T(n,i)} + Z^n_i]$ (integer part), so we have additive (modulated) white noise plus rounding.

Remark: If we have "pure rounding", i.e. if we observe $[X_{T(n,i)}]$ (or $[X_{T(n,i)}] + \frac12$, to "center" the noise), then no consistent estimator for $C_t$ exists.

Pre-averaging

We choose 3 tuning parameters $u_n, h_n, k_n$, all going to $\infty$: here $u_n > 0$ will be the argument of the empirical characteristic function below, and $h_n, k_n$ (two integers) are window sizes. The de-noising method is pre-averaging, but other methods could probably be used as well.

Take a weight (or kernel) function $g$ on $\mathbb R$ with:
$g$ continuous, piecewise $C^1$ with a piecewise Lipschitz derivative $g'$; $\quad s \notin (0,1) \Rightarrow g(s) = 0$; $\quad \int_0^1 g(s)^2\,ds > 0$;
for example $g(x) = (x \wedge (1-x))\, 1_{[0,1]}(x)$.

With the sequence $h_n \to \infty$ of integers, set
$$g^n_i = g(i/h_n), \qquad \bar g^n_i = g^n_{i+1} - g^n_i,$$
$$\phi_n = \frac{1}{h_n}\sum_{i\in\mathbb Z} (g^n_i)^2 \ \to\ \phi = \int g(u)^2\,du, \qquad \bar\phi_n = h_n \sum_{i\in\mathbb Z} (\bar g^n_i)^2 \ \to\ \bar\phi = \int g'(u)^2\,du,$$
$$\tilde\phi^{(\beta)}_n = \frac{1}{h_n}\sum_{i\in\mathbb Z} |g^n_i|^\beta \ \to\ \tilde\phi^{(\beta)} = \int |g(u)|^\beta\,du.$$

The pre-averaged returns of the observed values $Y^n_i$ are
$$\widetilde Y^n_i = \sum_{j=1}^{h_n-1} g^n_j\,\big( Y^n_{i+j} - Y^n_{i+j-1} \big) = -\sum_{j=0}^{h_n-1} \bar g^n_j\, Y^n_{i+j}.$$

Initial estimators

For any $y > 0$, set
$$L(y)^n_i = \frac{1}{k_n} \sum_{l=0}^{k_n-1} \cos\Big( y\,u_n \big( \widetilde Y^n_{i+2lh_n} - \widetilde Y^n_{i+(2l+1)h_n} \big) \Big)$$
(a proxy for the real part of the empirical characteristic function of the returns, over a window of $2h_nk_n$ successive observations). Taking a difference above allows us to "symmetrize" the problem.

Then, for any $y > 0$, a natural estimator for the "integrated volatility" over the time interval $[T(n,i), T(n,i+2h_nk_n)]$ is
$$\frac{2k_n}{y^2 u_n^2 \phi_n}\,v(y)^n_i, \qquad\text{where}\quad v(y)^n_i = -\log\Big( L(y)^n_i \vee \frac{1}{h_n} \Big).$$

We need to de-bias these estimators, to account for the noise, and also for some intrinsic distortion present even when there is no noise. With $f(x,y) = \frac12\big( e^{2x-y} + e^{2x} - 2 \big)$, set
$$\hat C(y)^n_t = \frac{k_n}{y^2 u_n^2 \phi_n} \sum_{j=0}^{[N^n_t/2h_nk_n]-1} \Big( 2\,v(y)^n_{2jh_nk_n} - \frac{1}{k_n}\,f\big( v(y)^n_{2jh_nk_n},\, v(2y)^n_{2jh_nk_n} \big) - \bar\phi_n\,y^2 u_n^2 \sum_{l=1}^{2h_nk_n} \big( \Delta^n_{2jh_nk_n+l} Y \big)^2 \Big).$$
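The noise-killing effect of the pre-averaging step can be seen on pure noise: with the kernel $g(x) = x \wedge (1-x)$ of the slides, the pre-averaged returns of i.i.d. noise have standard deviation of order $1/\sqrt{h_n}$ times the noise level. A sketch (ours; all names illustrative):

```python
import numpy as np

# Pre-average pure i.i.d. noise with g(x) = min(x, 1-x): the pre-averaged
# "returns" tilde_Y have std ~ noise_std * sqrt(sum (g-increments)^2)
# ~ noise_std / sqrt(h_n), which is why pre-averaging tames the noise.

rng = np.random.default_rng(2)
h_n, noise_std = 50, 0.01
Y = noise_std * rng.standard_normal(5_000)      # observations = pure noise
j = np.arange(1, h_n)
g = np.minimum(j / h_n, 1.0 - j / h_n)          # kernel weights g(j/h_n)

def pre_avg(Y, i, g):
    """tilde_Y_i = sum_{j=1}^{h_n-1} g(j/h_n) * (Y_{i+j} - Y_{i+j-1})."""
    return np.dot(g, np.diff(Y[i:i + len(g) + 1]))

tilde = np.array([pre_avg(Y, i, g) for i in range(4_000)])
ratio = tilde.std() / noise_std                 # expect ~ 1/sqrt(h_n) ~ 0.14
```

By summation by parts the pre-averaged value is $-\sum_j \bar g^n_j Y_{i+j}$, so its variance against i.i.d. noise is $\gamma \sum_j (\bar g^n_j)^2 = \gamma\,\bar\phi_n/h_n$, which for this kernel is $\gamma/h_n$.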
Finally, we set
$$Z(y)^n_t = \hat C(y)^n_t - C_t - \frac{2}{\phi_n} \sum_{m=1}^M |y|^{\beta_m-2}\,u_n^{\beta_m-2}\,\tilde\phi^{(\beta_m)}_n\,A^m_t.$$

The basic CLT

Below, $\mathcal Y$ is any finite subset of $(0,\infty)$.

Theorem Under appropriate conditions on $u_n, h_n, k_n$, plus
$$\frac{u_n^{\beta_1} h_n^3 \Delta_n}{u_n^{\beta_1} h_n^3 \Delta_n + u_n^4 (1 + h_n^2\Delta_n)^2} \to \eta, \qquad \frac{h_n^2\Delta_n}{1 + h_n^2\Delta_n} \to \eta',$$
and with
$$v_n = \sqrt{ \frac{k_n\, h_n^3\,\Delta_n}{u_n^4 (1 + h_n^2\Delta_n)^2 + u_n^{\beta_1} h_n^3 \Delta_n} },$$
for any $t > 0$ the variables $\big( v_n\, Z(y)^n_t \big)_{y\in\mathcal Y}$ converge $\mathcal F_\infty$-stably in law to $(Z(y)_t)_{y\in\mathcal Y}$, which is defined on an extension $(\widetilde\Omega, \widetilde{\mathcal F}, \widetilde{\mathbb P})$ and is, conditionally on $\mathcal F$, centered Gaussian with variance-covariance given by
$$\widetilde E\big( Z(y)_t\, Z(y')_t \mid \mathcal F \big) = \int_0^t \Big( \eta\,\psi(\beta_1, y, y')\,a^1_s \lambda_s + (1-\eta)\,y^2 y'^2 \big( \eta'\,\phi\,\sigma_s^2 \lambda_s + (1-\eta')\,\bar\phi\,\gamma_s \big)^2 \Big) \frac{1}{\lambda_s}\,ds.$$

Estimation of $\beta_1$

Assume again $M = 1$. We use estimators analogous to the estimators in the regular no-noise case, that is
$$\hat\beta^n_t = f^{-1}\Big( \frac{C(4)^n_t}{C(2)^n_t} \Big), \qquad\text{where}\quad f(x) = \frac{4^{x-2}-1}{2^{x-2}-1}$$
and
$$C(y)^n_t = \hat C(y)^n_t - \hat C(1)^n_t = \frac{2}{\phi_n}\,\tilde\phi^{(\beta_1)}_n\,u_n^{\beta_1-2}\big( y^{\beta_1-2} - 1 \big)\,A^1_t + \big( Z(y)^n_t - Z(1)^n_t \big),$$
and we have (when the basic CLT holds):
$$v_n\,u_n^{\beta_1-2}\,\big( \hat\beta^n_t - \beta_1 \big) \ \overset{\mathcal L-s}{\longrightarrow}\ \mathcal Y$$
for some mixed centered normal $\mathcal Y$ with a known (conditional) variance. We similarly have estimators with an analogous CLT for $A^1_t$.

Problem: Choose $(u_n, h_n, k_n)$ with the appropriate conditions, plus $u_n$ as large as possible. Reporting only the rate for $\beta_1$, we find the following (sub-)optimal rates (depending on the value of $\beta_1$, and on an arbitrary $\varepsilon > 0$):

• If $\sigma_t \equiv 0$ and (N1): $\ 1/\Delta_n^{\beta_1(1-\varepsilon)/(10-4\beta_1)}$ if $\beta_1 \le \frac34$; $\ 1/\Delta_n^{\beta_1(1-\varepsilon)/7}$ if $\frac34 \le \beta_1 \le \frac32$; $\ 1/\Delta_n^{\beta_1(1-\varepsilon)/(4+2\beta_1)}$ if $\beta_1 \ge \frac32$.
• If $\sigma_t \equiv 0$ and (N2): $\ 1/\Delta_n^{3\beta_1(1-\varepsilon)/(30-11\beta_1)}$ if $\beta_1 \le \frac34$; $\ 1/\Delta_n^{3\beta_1(1-\varepsilon)/(21+\beta_1)}$ if $\frac34 \le \beta_1 \le \frac32$; $\ 1/\Delta_n^{3\beta_1(1-\varepsilon)/(12+7\beta_1)}$ if $\beta_1 \ge \frac32$.
• If $\sigma_t$ not $\equiv 0$ and (N1): $\ 1/\Delta_n^{\beta_1(1-\varepsilon)/(16-5\beta_1)}$ if $\beta_1 \le \frac{16}{11}$; $\ 1/\Delta_n^{3\beta_1(1-\varepsilon)/(32-4\beta_1)}$ if $\beta_1 \ge \frac{16}{11}$.
• If $\sigma_t$ not $\equiv 0$ and (N2): $\ 1/\Delta_n^{3\beta_1(1-\varepsilon)/(48-11\beta_1)}$.

Some problems:
1 - Optimality concerning $\beta_1$.
2 - When $M \ge 2$, what does the rate for $\beta_1$ become, and can we estimate $\beta_2, \beta_3, \dots$?
3 - How to choose in practice $(u_n, h_n, k_n)$? (So far, only "mathematical" results are known.)
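The piecewise rate exponents reported above can be sanity-checked numerically: writing the rate as $1/\Delta_n^{\rho(\beta_1)(1-\varepsilon)}$, each of the four regimes is continuous in $\beta_1$ at its breakpoints. A sketch (ours) of that check:

```python
# Piecewise rate exponents rho(beta_1) for the four regimes of the slides,
# so the rate reads 1/Delta_n^{rho * (1 - eps)}; the loop below verifies
# continuity of each formula at its breakpoints.

def rho(beta, sigma_vanishes, noise_N1):
    if sigma_vanishes and noise_N1:
        if beta <= 0.75:
            return beta / (10 - 4 * beta)
        if beta <= 1.5:
            return beta / 7
        return beta / (4 + 2 * beta)
    if sigma_vanishes:                       # noise assumption (N2)
        if beta <= 0.75:
            return 3 * beta / (30 - 11 * beta)
        if beta <= 1.5:
            return 3 * beta / (21 + beta)
        return 3 * beta / (12 + 7 * beta)
    if noise_N1:                             # sigma_t not identically 0
        if beta <= 16 / 11:
            return beta / (16 - 5 * beta)
        return 3 * beta / (32 - 4 * beta)
    return 3 * beta / (48 - 11 * beta)       # sigma_t not 0, (N2)

for b, args in [(0.75, (True, True)), (1.5, (True, True)),
                (0.75, (True, False)), (1.5, (True, False)),
                (16 / 11, (False, True))]:
    assert abs(rho(b - 1e-9, *args) - rho(b + 1e-9, *args)) < 1e-6
```

These continuity checks all pass, which is a useful cross-validation of the exponents; note also that every exponent stays below $1/2$, i.e. noise and irregular sampling make the problem strictly harder than the no-noise case.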