Acta Mathematicae Applicatae Sinica, English Series Vol. 20, No. 2 (2004) 337–352

Strong Approximations of Martingale Vectors and Their Applications in Markov-Chain Adaptive Designs

Li-xin Zhang
Department of Mathematics, Zhejiang University, Hangzhou 310028, China (E-mail: lxzhang@mail.hz.zj.cn)

Abstract  The strong approximations of a class of $\mathbb{R}^d$-valued martingales are considered. The conditions used in this paper are easier to check than those used in [3] and [9]. As an application, the strong approximation of a class of non-homogeneous Markov chains is established, and the asymptotic properties are established for the multi-treatment Markov chain adaptive designs in clinical trials.

Keywords  Martingale, non-homogeneous Markov chain, Wiener process, strong approximation, adaptive designs, asymptotic properties

2000 MR Subject Classification  60F15, 60G42, 62L05, 62G20

(Manuscript received October 20, 2003. Revised April 1, 2004. Supported by the National Natural Science Foundation of China, No. 10071072.)

1 Introduction and Main Results

Throughout this paper, we assume that $\{X_n, \mathcal{F}_n; n \ge 1\}$ is a square-integrable sequence of $\mathbb{R}^d$-valued martingale differences defined on $(\Omega, \mathcal{F}, P)$, where $\mathcal{F}_0$ is the trivial $\sigma$-field. The probability space $(\Omega, \mathcal{F}, P)$ is also assumed to be rich enough that there is a uniformly distributed random variable $U$ which is independent of $\{X_n; n \ge 1\}$. Denote $S_n(m) = \sum_{k=m+1}^{m+n} X_k$ and $S_n = S_n(0)$. Let
$$\sigma_n = E[X_n' X_n \mid \mathcal{F}_{n-1}],$$
whose trace is $\operatorname{tr}(\sigma_n) = E[\|X_n\|^2 \mid \mathcal{F}_{n-1}]$, where $\|u\| = (u_1^2 + \cdots + u_d^2)^{1/2}$ for $u \in \mathbb{R}^d$. Also denote
$$\Sigma_n = \sum_{k=1}^n \sigma_k, \qquad \Sigma_n(m) = \sum_{k=m+1}^{m+n} \sigma_k, \qquad V_n = \operatorname{tr}(\Sigma_n) = \sum_{k=1}^n \operatorname{tr}(\sigma_k) = \sum_{k=1}^n E[\|X_k\|^2 \mid \mathcal{F}_{k-1}].$$
For a $d \times d$ matrix $A$, denote by $\|A\|$ its maximum norm. If $A$ is a covariance matrix, the maximum norm is equivalent to the norm defined by $\max\{u A u' : u \in \mathbb{R}^d, \|u\| = 1\}$.

The first strong approximation theorem for martingales can be found in [13]. In the case $d = 1$, suppose that $V_n \to \infty$ a.s. and that $\{X_n\}$ satisfies a kind of Lindeberg condition. Using the Skorohod embedding theorem, Strassen [13] proved that if the underlying probability space has a rich enough structure, then the martingale can be approximated by a standard Brownian motion scaled according to the conditional variance of the given martingale sequence, i.e.,
$$\sum_{m \ge 1} X_m I\{V_m \le n\} - B(n) = o(n^{1/2}) \quad \text{a.s.} \tag{1.1}$$

For $d \ge 2$, Monrad and Philipp [8] proved that it is impossible to embed a general $\mathbb{R}^d$-valued martingale in an $\mathbb{R}^d$-valued Gaussian process. However, Morrow and Philipp [10], Eberlein [3], Monrad and Philipp [9], etc., established some approximations for $\mathbb{R}^d$-valued martingales. By using a result of Eberlein [3] (see [9]), one can obtain the following theorem.

Theorem A. Suppose that
$$\sup_{n \ge 1} E\|X_n\|^{2+\delta} < \infty \quad \text{for some } \delta > 0, \tag{1.2}$$
and that there exist a covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, and some $0 < \theta < 1$ such that, uniformly in $m$,
$$\big\| E[\Sigma_n(m) \mid \mathcal{F}_m] - nT \big\|_1 = O(n^{1-\theta}). \tag{1.3}$$
Then there exist a $\kappa > 0$ and a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$S_n - \sum_{m \le n} Y_m T^{1/2} = O(n^{1/2-\kappa}) \quad \text{a.s.} \tag{1.4}$$

However, it is not easy to show that (1.3) holds uniformly in $m$ (even when $d = 1$). In many cases, we only know that $\Sigma_n/n$ or $\Sigma_n/V_n$ is close to some covariance matrix $T$. Morrow and Philipp [10] established an approximation similar to (1.1) under the condition that $\Sigma_n / V_n \to T$.

Proposition 1.1.
Let $f$ be a non-decreasing function with $f(x) \to \infty$ as $x \to \infty$, and such that $f(x)(\log x)^\alpha / x$ is non-increasing for some $\alpha > 50d$. Suppose that $V_n \to \infty$ a.s. and
$$\sum_{n \ge 1} E\big[\|X_n\|^2 I\{\|X_n\|^2 \ge f(V_n)\}\big] / f(V_n) < \infty. \tag{1.5}$$
Suppose also that there exists a covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, such that
$$E\Big[\max_{n \ge 1} \exp\big(\|\Sigma_n - T V_n\| / f(V_n)\big)\Big] < \infty. \tag{1.6}$$
Then there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$\sum_{m \ge 1} X_m I\{V_m \le n\} - \sum_{m \le n} Y_m T^{1/2} = O\big(n^{1/2} (f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{1.7}$$

Monrad and Philipp [9] weakened Condition (1.6) to the requirement that, for some $0 < p \le 1$,
$$E\Big[\max_{n \ge 1} \big(\|\Sigma_n - T V_n\| / f(V_n)\big)^p\Big] < \infty. \tag{1.8}$$
However, since it is difficult to compute the expectation of the maximum of a sequence of random variables, (1.8) is still not easy to verify. It should be noticed that (1.6) or (1.8) implies $\|\Sigma_n - T V_n\| = O(f(V_n))$ a.s., while neither is implied by $\|\Sigma_n - T V_n\| = O(f(V_n))$ a.s. or $\|\Sigma_n - T V_n\| = o(f(V_n))$ a.s.

The aim of this paper is to replace Conditions (1.3), (1.6) and (1.8) by related conditions which are easier to verify; another aim is to replace (1.5) by a kind of conditional Lindeberg condition. Our first result is as follows.

Theorem 1.1. Let $f$ be a non-decreasing function such that $f(x) \to \infty$ as $x \to \infty$, $f(x)(\log x)^\alpha / x$ is non-increasing for some $\alpha > 50d$, and $f(x)/x^\varepsilon$ is non-decreasing for some $0 < \varepsilon < 1$. Suppose that $V_n \to \infty$ a.s. and
$$\sum_{n \ge 1} E\big[\|X_n\|^2 I\{\|X_n\|^2 \ge f(V_n)\} \mid \mathcal{F}_{n-1}\big] / f(V_n) < \infty \quad \text{a.s.} \tag{1.9}$$
Moreover, assume that there exists a covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, such that
$$\|\Sigma_n - T V_n\| = o(f(V_n)) \quad \text{a.s.} \tag{1.10}$$
Then there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$\sum_{m \ge 1} X_m I\{V_m \le n\} - \sum_{m \le n} Y_m T^{1/2} = O\big(n^{1/2} (f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{1.11}$$

If $\Sigma_n / n$ is near $T$, then we have the following result.

Theorem 1.2. Let $f$ be a non-decreasing function such that $f(x) \to \infty$ as $x \to \infty$, $f(x)(\log x)^\alpha / x$ is non-increasing for some $\alpha > 50d$, and $f(x)/x^\varepsilon$ is non-decreasing for some $0 < \varepsilon < 1$. Suppose that
$$\sum_{n \ge 1} E\big[\|X_n\|^2 I\{\|X_n\|^2 \ge f(n)\} \mid \mathcal{F}_{n-1}\big] / f(n) < \infty \quad \text{a.s.} \tag{1.12}$$
In addition, suppose also that there exists a covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, such that
$$\|\Sigma_n - nT\| = o(f(n)) \quad \text{a.s.} \tag{1.13}$$
Then there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$S_n - \sum_{m \le n} Y_m T^{1/2} = O\big(n^{1/2} (f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{1.14}$$

The next theorem tells us that the uniformity condition (1.3) in Theorem A can be replaced by $\|\Sigma_n - nT\|_1 = O(n^{1-\theta})$.

Theorem 1.3. Suppose that there exist constants $0 < \theta, \varepsilon < 1$ and a covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, such that
$$\|\Sigma_n - nT\| = O(n^{1-\theta}) \ \text{a.s.} \quad \text{or} \quad \|\Sigma_n - nT\|_1 = O(n^{1-\theta}), \tag{1.15}$$
$$\sum_{n=1}^\infty E\big[\|X_n\|^2 I\{\|X_n\|^2 \ge n^{1-\varepsilon}\} \mid \mathcal{F}_{n-1}\big] / n^{1-\varepsilon} < \infty \quad \text{a.s.} \tag{1.16}$$
Then there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$S_n - \sum_{m \le n} Y_m T^{1/2} = O(n^{1/2-\kappa}) \quad \text{a.s.},$$
where $\kappa > 0$ is a constant depending only on $\theta$, $\varepsilon$ and $d$. A numerical illustration of Theorem 1.2 at the CLT scale is given below, before we turn to a corollary.
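The theorems above are coupling statements and so cannot be reproduced by an independent simulation, but their consequence at the central-limit scale can be illustrated numerically. The following Python sketch is only such an illustration of Theorem 1.2; the matrices $T$ and $D$ and the coin-driven covariance rule are our own illustrative choices, not part of the paper. The conditional covariance $\sigma_k$ is $\mathcal{F}_{k-1}$-measurable, $\|\Sigma_n - nT\| = O(\sqrt{n\log\log n})$ a.s. $= o(f(n))$ for $f(x) = x^{0.9}$, so (1.13) holds, and the covariance of $n^{-1/2}S_n$ should be close to $T$, as (1.14) predicts.

```python
import numpy as np

# Illustrative sketch only (not the paper's coupling construction): T, D and
# the coin rule are our own choices.  sigma_k equals T + D or T - D according
# to a Rademacher coin drawn at step k-1, so sigma_k is F_{k-1}-measurable and
# Sigma_n - n T = (sum of the coins) * D = O(sqrt(n log log n)) = o(n^{0.9}).
rng = np.random.default_rng(0)
T = np.array([[1.0, 0.3], [0.3, 1.0]])
D = np.array([[0.2, 0.0], [0.0, -0.2]])        # T + D and T - D are covariance matrices
L_plus = np.linalg.cholesky(T + D)
L_minus = np.linalg.cholesky(T - D)

def scaled_sum(n):
    S = np.zeros(2)
    sign = 1.0                                 # sigma_1 is deterministic
    for _ in range(n):
        L = L_plus if sign > 0 else L_minus    # sigma_k is F_{k-1}-measurable
        S += L @ rng.standard_normal(2)        # X_k with E[X_k' X_k | F_{k-1}] = sigma_k
        sign = rng.choice([-1.0, 1.0])         # coin for sigma_{k+1}, part of F_k
    return S / np.sqrt(n)

sample = np.array([scaled_sum(1000) for _ in range(1000)])
print("empirical covariance of n^{-1/2} S_n:\n", np.round(np.cov(sample.T), 2))
print("T:\n", T)
```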
Corollary 1.1. Suppose that there exists a constant $0 < \varepsilon < 1$ such that (1.16) is satisfied, and that $T$ is a covariance matrix measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$. Then for any $\delta > 0$ there exist a $\kappa > 0$ and a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$S_n - \sum_{m \le n} Y_m T^{1/2} = O(n^{1/2-\kappa}) + O\big(\alpha_n^{1/2+\delta}\big) \quad \text{a.s.},$$
where $\alpha_n = \max_{m \le n} \|\Sigma_m - mT\|$.

2 Proofs

Proof of Theorem 1.1. Let $\overline{X}_k = X_k I\{\|X_k\|^2 \le f(V_k)\} - E[X_k I\{\|X_k\|^2 \le f(V_k)\} \mid \mathcal{F}_{k-1}]$ and $\widehat{X}_k = X_k - \overline{X}_k$ for each $k \ge 1$. By Condition (1.9) and the conditional Borel–Cantelli lemma [5],
$$P\big(\|X_k\|^2 > f(V_k) \ \text{i.o.}\big) = 0 \quad \text{and} \quad \sum_{k=1}^n E\big[X_k I\{\|X_k\|^2 > f(V_k)\} \mid \mathcal{F}_{k-1}\big] = O(1) \quad \text{a.s.}$$
It follows that
$$\sum_{k=1}^n \widehat{X}_k = \sum_{k=1}^n \Big( X_k I\{\|X_k\|^2 > f(V_k)\} - E\big[X_k I\{\|X_k\|^2 > f(V_k)\} \mid \mathcal{F}_{k-1}\big] \Big) = O(1) \quad \text{a.s.}$$
Also, by (1.9) again,
$$\sum_{k=1}^n E[\overline{X}_k' \overline{X}_k \mid \mathcal{F}_{k-1}] = \Sigma_n + O(1) \sum_{k=1}^n E\big[\|X_k\|^2 I\{\|X_k\|^2 \ge f(V_k)\} \mid \mathcal{F}_{k-1}\big] = \Sigma_n + o(f(V_n)) \quad \text{a.s.}$$
Notice that $\|\overline{X}_k\| \le 2 f^{1/2}(V_k)$. We can assume that $\|X_k\|^2 \le f(V_k)$, $k = 1, 2, \cdots$, for otherwise we can replace $X_k$ by $\overline{X}_k$ and $f(x)$ by $4f(x)$.

Now, let $\{Z_k = (Z_{k1}, \cdots, Z_{kd}); k \ge 1\}$ be a sequence of i.i.d. $\mathbb{R}^d$-valued random vectors such that, for each $k$, $Z_{k1}, \cdots, Z_{kd}$ are also i.i.d. random variables with $P(Z_{ki} = \pm 1) = 1/2$. Denote $\alpha_n = \max_{k \le n} \|\Sigma_k - V_k T\|$, and let $\tau_0 = 0$, $\tau_k = \max\{n \ge \tau_{k-1}: V_n \le k\}$, $k \ge 1$. Then each $\tau_k$ is a stopping time with respect to $\{\mathcal{F}_{n-1}\}$, and
$$n \ge V_{\tau_n} = V_{\tau_n + 1} - \operatorname{tr}(\sigma_{\tau_n + 1}) \ge n - f(V_{\tau_n + 1}) \ge n - f(n+1).$$

Let $\nu_k = \tau_{2^k}$. If $m \le \nu_0 = \tau_1$, define $\beta_m = \sum_{i=1}^m \|\sigma_i - T \operatorname{tr}(\sigma_i)\|$, $\delta_m = 1$, $I_m = I\{\beta_m = 0\}$ and $I_m^c = 1 - I_m$. If $\nu_k < m \le \nu_{k+1}$, $k \ge 0$, define
$$\delta_m = 1 + f(V_{\nu_k}), \qquad \beta_{\nu_k+1, m} = \max_{\nu_k + 1 \le l \le m} \Big\| \sum_{i=\nu_k+1}^{l} \big(\sigma_i - T \operatorname{tr}(\sigma_i)\big) \Big\|,$$
$$\beta_m = \beta_{\nu_0} + \sum_{l=0}^{k-1} \beta_{\nu_l+1, \nu_{l+1}} + \beta_{\nu_k+1, m}, \qquad I_m = I\{\beta_m \le \delta_m\} = I\{\beta_m \le 1 + f(V_{\nu_k})\}, \qquad I_m^c = 1 - I_m.$$
Let $\widetilde{X}_k = X_k I_k + I_k^c Z_k T^{1/2} \operatorname{tr}^{1/2}(\sigma_k)$, $\mathcal{A}_n = \sigma(\mathcal{F}_n, Z_1, \cdots, Z_n)$, $\widetilde{S}_n(m) = \sum_{k=m+1}^{m+n} \widetilde{X}_k$ and $\widetilde{S}_n = \widetilde{S}_n(0)$. Then
$$\widetilde{\Sigma}_n := \sum_{k=1}^n E[\widetilde{X}_k' \widetilde{X}_k \mid \mathcal{A}_{k-1}] = \sum_{k=1}^n I_k \sigma_k + \sum_{k=1}^n I_k^c T \operatorname{tr}(\sigma_k).$$
So, since taking traces in (1.10) forces $\operatorname{tr}(T) = 1$,
$$\widetilde{V}_n = \operatorname{tr}\big(\widetilde{\Sigma}_n\big) = \sum_{k=1}^n I_k \operatorname{tr}(\sigma_k) + \sum_{k=1}^n I_k^c \operatorname{tr}(T) \operatorname{tr}(\sigma_k) = \sum_{k=1}^n \operatorname{tr}(\sigma_k) = V_n$$
and
$$\widetilde{\Sigma}_n - T \widetilde{V}_n = \sum_{k=1}^n I_k \big(\sigma_k - T \operatorname{tr}(\sigma_k)\big).$$
First, we show that
$$\|\widetilde{\Sigma}_n - T \widetilde{V}_n\| \le \beta_n. \tag{2.1}$$
It is obvious that
$$\|\widetilde{\Sigma}_n - T \widetilde{V}_n\| \le \sum_{i=1}^n \|\sigma_i - T \operatorname{tr}(\sigma_i)\| = \beta_n \quad \text{if } n \le \nu_0.$$
Suppose that $n > \nu_0$ and that (2.1) holds for $1, 2, \cdots, n-1$. We shall show that (2.1) holds for $n$. We can find some $k \ge 0$ such that $\nu_k + 1 \le n \le \nu_{k+1}$. Notice that on $\{\beta_n \le \delta_n\}$,
$$\beta_m \le \beta_n \le \delta_n = \delta_m \quad \text{for all } \nu_k + 1 \le m \le n,$$
which implies $I_m = 1$, $I_m^c = 0$ for $\nu_k + 1 \le m \le n$. So, by induction,
$$\|\widetilde{\Sigma}_n - T \widetilde{V}_n\| I_n = \Big\| \widetilde{\Sigma}_{\nu_k} - T \widetilde{V}_{\nu_k} + \sum_{i=\nu_k+1}^n \big(\sigma_i - T \operatorname{tr}(\sigma_i)\big) \Big\| I_n \le \|\widetilde{\Sigma}_{\nu_k} - T \widetilde{V}_{\nu_k}\| + \beta_{\nu_k+1, n} \le \beta_{\nu_k} + \beta_{\nu_k+1, n} = \beta_n.$$
However, on $\{\beta_n > \delta_n\}$, $I_n \big(\sigma_n - T \operatorname{tr}(\sigma_n)\big) = 0$. So
$$\|\widetilde{\Sigma}_n - T \widetilde{V}_n\| I_n^c = \|\widetilde{\Sigma}_{n-1} - T \widetilde{V}_{n-1}\| I_n^c \le \beta_{n-1} \le \beta_n.$$
Hence (2.1) is proved.

Next, we show that
$$\|\widetilde{\Sigma}_n - T \widetilde{V}_n\| \le \delta_n \le 1 + f(V_n). \tag{2.2}$$
It is obvious that $\|\widetilde{\Sigma}_n - T \widetilde{V}_n\| = 0$ if $n \le \nu_0$. If $n > \nu_0$, assume that (2.2) holds for $1, 2, \cdots, n-1$. Then for $n$ we can find some $k \ge 0$ such that $\nu_k + 1 \le n \le \nu_{k+1}$, so
$$\|\widetilde{\Sigma}_n - T \widetilde{V}_n\| I_n \le \beta_n I\{\beta_n \le \delta_n\} \le \delta_n$$
by (2.1), and
$$\|\widetilde{\Sigma}_n - T \widetilde{V}_n\| I_n^c = \|\widetilde{\Sigma}_{n-1} - T \widetilde{V}_{n-1}\| I_n^c \le \delta_{n-1} \le \delta_n$$
by induction. (2.2) is now proved.

On the other hand, since $\|X_n\|^2 \le f(V_n)$ and $\operatorname{tr}(\sigma_n) = E[\|X_n\|^2 \mid \mathcal{F}_{n-1}] \le f(V_n)$, we have
$$\|\widetilde{X}_n\|^2 \le f(V_n) I_n + I_n^c \|Z_n T^{1/2}\|^2 \operatorname{tr}(\sigma_n) \le C f(V_n) = C f(\widetilde{V}_n). \tag{2.3}$$
By (2.2), (2.3) and Proposition 1.1, there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$\sum_{m \ge 1} \widetilde{X}_m I\{V_m \le n\} - \sum_{m \le n} Y_m T^{1/2} = O\big(n^{1/2} (f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{2.4}$$

Finally, notice that $\|\Sigma_n - T V_n\| = o(f(V_n))$ a.s. It follows that $\alpha_n = o(f(V_n))$ a.s. So, for $\nu_k + 1 \le n \le \nu_{k+1}$,
$$\beta_n \le \beta_{\nu_0} + \sum_{l=0}^{k-1} \beta_{\nu_l+1, \nu_{l+1}} + \beta_{\nu_k+1, n} \le \beta_{\nu_0} + 2 \sum_{l=0}^{k-1} \alpha_{\nu_{l+1}} + 2\alpha_n = o(1) \sum_{l=1}^{k} f(V_{\nu_l}) + o(1) f(V_n) \quad \text{a.s.}$$
Also, noticing that $f(x)/x^\varepsilon$ is non-decreasing and that $V_{\nu_l} \sim 2^l$, we have
$$\sum_{l=1}^k f(V_{\nu_l}) \le C \frac{f(V_n)}{V_n^\varepsilon} \sum_{l=1}^k V_{\nu_l}^\varepsilon \le C \frac{f(V_n)}{V_n^\varepsilon} \sum_{l=1}^k 2^{l\varepsilon} \le C \frac{f(V_n)}{V_n^\varepsilon} 2^{k\varepsilon} \le C \frac{f(V_n)}{V_n^\varepsilon} V_{\nu_k}^\varepsilon \le C f(V_n).$$
On the other hand, since $f(x)(\log x)^\alpha / x$ is non-increasing, we have for $\nu_k + 1 \le n \le \nu_{k+1}$,
$$f(V_n) \le f(V_{\nu_{k+1}}) \le \frac{(\log V_{\nu_k})^\alpha / V_{\nu_k}}{(\log V_{\nu_{k+1}})^\alpha / V_{\nu_{k+1}}} f(V_{\nu_k}) = O(1) f(V_{\nu_k}) = O(\delta_n).$$
It follows that
$$\beta_n = o(f(V_n)) = o(\delta_n) \quad \text{a.s.}$$
Hence $P(X_n \ne \widetilde{X}_n \ \text{i.o.}) = 0$. Therefore
$$\sum_{m \ge 1} X_m I\{V_m \le n\} - \sum_{m \le n} Y_m T^{1/2} = O\big(n^{1/2} (f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.}$$
The proof of Theorem 1.1 is now completed.

Proof of Theorem 1.2. Without loss of generality, we assume that $\operatorname{tr}(T) = 1$. By (1.13), $V_n - n = o(f(n))$ a.s., so Conditions (1.9) and (1.10) are satisfied (if necessary, we can replace $f(x)$ by $C_1 f(C_2 x)$). First, we assume that
$$\|X_n\|^2 \le f(n) \ \text{a.s.} \quad \text{and} \quad \|\Sigma_n - nT\| \le f(n) \ \text{a.s.}, \qquad n \ge 1. \tag{2.5}$$
By Theorem 1.1, there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that (1.11) holds. That is,
$$\sum_{m=1}^{\tau_n} X_m - \sum_{m \le n} Y_m T^{1/2} = O\big(n^{1/2} (f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.}, \tag{2.6}$$
where $\tau_0 = 0$, $\tau_k = \max\{n \ge \tau_{k-1}: V_n \le k\}$, $k \ge 1$. Next, we show that
$$\sum_{m=\tau_n+1}^{n} X_m + \sum_{m=n+1}^{\tau_n} X_m = O\big(n^{1/2} (f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.}, \tag{2.7}$$
which, together with (2.6), implies (1.14). Here $\sum_{m=i}^{j} (\cdot) = 0$ for $j < i$.

Notice that $V_n - n = o(f(n))$ a.s. and $n \ge V_{\tau_n} \ge n - f(n+1)$. We have $\tau_n - n = O(f(n))$ a.s. To prove (2.7), it suffices to show that for any $K \ge 1$,
$$\max_{|k| \le K f(n)} \|S_{n+k} - S_n\| = O\big(n^{1/2} (f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{2.8}$$
Notice that $\|X_n\|^2 \le f(n)$, and $|V_n - n| \le d f(n)$ by (2.5). By the Rosenthal inequality, for $q \ge 2$ we have
$$P\Big( \max_{0 \le k \le K f(n)} \|S_{n+k} - S_n\| \ge x \Big) \le x^{-q} E\Big[ \max_{0 \le k \le K f(n)} \|S_{n+k} - S_n\|^q \Big]$$
$$\le C x^{-q} \Big( E\big[ \big(V_{n+Kf(n)} - V_n\big)^{q/2} \big] + E\Big[ \max_{n \le k \le n + K f(n)} \|X_k\|^q \Big] \Big)$$
$$\le C x^{-q} \Big( \big(K f(n) + d f(n + K f(n))\big)^{q/2} + \big(f(n + K f(n))\big)^{q/2} \Big) \le C x^{-q} \big(f(n)\big)^{q/2},$$
and similarly
$$P\Big( \max_{-K f(n) \le k \le 0} \|S_{n+k} - S_n\| \ge x \Big) \le C x^{-q} \big(f(n)\big)^{q/2}.$$
Hence
$$P\Big( \max_{|k| \le K f(n)} \|S_{n+k} - S_n\| \ge x \Big) \le C x^{-q} \big(f(n)\big)^{q/2}.$$
Let $0 < \varepsilon < 1/2$ and $F(n) = n^{1/2} \big(f(n)/n\big)^{1/2 - \varepsilon}$, and notice that $f(n)/n \le c (\log n)^{-\alpha}$. It follows that for $q \ge 2$ large enough,
$$P\Big( \max_{2^m \le n \le 2^{m+1}} \max_{|k| \le K f(n)} \|S_{n+k} - S_n\| / F(n) \ge 1 \Big) \le P\Big( \max_{n \le 2^{m+1}} \max_{|k| \le K f(2^{m+1})} \|S_{n+k} - S_n\| \ge c F(2^{m+1}) \Big)$$
$$\le \sum_{j \le 2^{m+1}/(K f(2^{m+1}))} P\Big( \max_{|k| \le 2 K f(2^{m+1})} \big\|S_{j K f(2^{m+1}) + k} - S_{j K f(2^{m+1})}\big\| \ge c F(2^{m+1}) \Big)$$
$$\le C \frac{2^{m+1}}{f(2^{m+1})} \big(F(2^{m+1})\big)^{-q} \big(f(2^{m+1})\big)^{q/2} \le C \Big( \frac{f(2^{m+1})}{2^{m+1}} \Big)^{\varepsilon q/2 - 1} \le C m^{-\alpha(\varepsilon q/2 - 1)} \le C m^{-2},$$
which, together with the Borel–Cantelli lemma, implies that
$$\max_{|k| \le K f(n)} \|S_{n+k} - S_n\| = O\big(n^{1/2} (f(n)/n)^{1/2 - \varepsilon}\big) \quad \text{a.s.}$$
Taking $\varepsilon$ small enough that $1/2 - \varepsilon \ge \frac{1}{50d}$, (2.8) is proved.

Finally, we remove Condition (2.5). Let $\overline{X}_k = X_k I\{\|X_k\|^2 \le f(k)\} - E[X_k I\{\|X_k\|^2 \le f(k)\} \mid \mathcal{F}_{k-1}]$ and $\widehat{X}_k = X_k - \overline{X}_k$ for each $k \ge 1$. Then, by Condition (1.12),
$$\sum_{k=1}^n \widehat{X}_k = O(1) \quad \text{a.s.}$$
and
$$\sum_{k=1}^n E[\overline{X}_k' \overline{X}_k \mid \mathcal{F}_{k-1}] = \Sigma_n + O(1) \sum_{k=1}^n E\big[\|X_k\|^2 I\{\|X_k\|^2 \ge f(k)\} \mid \mathcal{F}_{k-1}\big] = \Sigma_n + o(f(n)) \quad \text{a.s.}$$
So we can assume that $\|X_k\|^2 \le f(k)$, $k = 1, 2, \cdots$, for otherwise we can replace $X_k$ by $\overline{X}_k$ and $f(x)$ by $4f(x)$.
Now, let $\{Z_k = (Z_{k1}, \cdots, Z_{kd}); k \ge 1\}$ be a sequence of i.i.d. $\mathbb{R}^d$-valued random vectors such that, for each $k$, $Z_{k1}, \cdots, Z_{kd}$ are also i.i.d. random variables with $P(Z_{ki} = \pm 1) = 1/2$. Denote $\alpha_n = \max_{k \le n} \|\Sigma_k - kT\|$. Let $\beta_1 = \|\sigma_1 - T\|$, $\delta_1 = 1$, $I_1 = I\{\beta_1 = 0\}$ and $I_1^c = 1 - I_1$. If $2^k < m \le 2^{k+1}$, $k \ge 0$, define
$$\delta_m = 1 + f(2^k), \qquad \beta_{2^k+1, m} = \max_{2^k+1 \le l \le m} \Big\| \sum_{i=2^k+1}^{l} (\sigma_i - T) \Big\|,$$
$$\beta_m = \beta_1 + \sum_{l=0}^{k-1} \beta_{2^l+1, 2^{l+1}} + \beta_{2^k+1, m}, \qquad I_m = I\{\beta_m \le \delta_m\} = I\{\beta_m \le 1 + f(2^k)\}, \qquad I_m^c = 1 - I_m.$$
Let $\widetilde{X}_k = X_k I_k + I_k^c Z_k T^{1/2}$, $\mathcal{A}_n = \sigma(\mathcal{F}_n, Z_1, \cdots, Z_n)$, $\widetilde{S}_n(m) = \sum_{k=m+1}^{m+n} \widetilde{X}_k$ and $\widetilde{S}_n = \widetilde{S}_n(0)$. Then
$$\|\widetilde{X}_n\| \le I_n \|X_n\| + \|Z_n T^{1/2}\| \le f^{1/2}(n) + C$$
and
$$\widetilde{\Sigma}_n := \sum_{k=1}^n E[\widetilde{X}_k' \widetilde{X}_k \mid \mathcal{A}_{k-1}] = \sum_{k=1}^n I_k \sigma_k + \sum_{k=1}^n I_k^c T = nT + \sum_{k=1}^n I_k (\sigma_k - T).$$
Similarly to (2.1) and (2.2), we have
$$\|\widetilde{\Sigma}_n - nT\| \le \beta_n \quad \text{and} \quad \|\widetilde{\Sigma}_n - nT\| \le \delta_n \le 1 + f(n).$$
Also, for $2^k + 1 \le n \le 2^{k+1}$,
$$\beta_n \le \beta_1 + 2 \sum_{l=0}^{k-1} \alpha_{2^{l+1}} + 2\alpha_n = o(1) \sum_{l=1}^{k} f(2^l) + o(1) f(n) = o(f(n)) = o(\delta_n) \quad \text{a.s.}$$
It follows that $\{\widetilde{X}_n, \mathcal{A}_n; n \ge 1\}$ satisfies (1.12), (1.13) and (2.5) (with $f(x)$ replaced by $f(x) + C$), and furthermore $P(\widetilde{X}_n \ne X_n \ \text{i.o.}) = 0$. The proof of Theorem 1.2 is now completed.

Proof of Theorem 1.3. By Theorem 1.2, it suffices to show that $\|\Sigma_n - nT\|_1 = O(n^{1-\theta})$ implies $\|\Sigma_n - nT\| = o(n^{1-\theta^*})$ a.s. for some $\theta^* > 0$. Choose $p, q \ge 2$ such that $\theta p/2 > 1$ and $\theta p/q < 1$, and let $n_k = k^p$. Notice that for $n_k \le n \le n_{k+1}$ and any $u \in \mathbb{R}^d$ with $\|u\| = 1$,
$$u(\Sigma_n - nT)u' \le u \Sigma_{n_{k+1}} u' - n_k u T u' = u(\Sigma_{n_{k+1}} - n_{k+1} T)u' + (n_{k+1} - n_k) u T u'$$
and
$$u(nT - \Sigma_n)u' \le n_{k+1} u T u' - u \Sigma_{n_k} u' = u(n_k T - \Sigma_{n_k})u' + (n_{k+1} - n_k) u T u'.$$
It follows that
$$\max_{n_k \le n \le n_{k+1}} \|\Sigma_n - nT\| / n^{1-\theta/q} \le C \|\Sigma_{n_{k+1}} - n_{k+1} T\| / n_k^{1-\theta/q} + C \|\Sigma_{n_k} - n_k T\| / n_k^{1-\theta/q} + C (n_{k+1} - n_k) / n_k^{1-\theta/q}$$
$$\le C \|\Sigma_{n_{k+1}} - n_{k+1} T\| / n_{k+1}^{1-\theta/q} + C \|\Sigma_{n_k} - n_k T\| / n_k^{1-\theta/q} + O(k^{p\theta/q - 1}).$$
On the other hand,
$$P\big( \|\Sigma_{n_k} - n_k T\| \ge n_k^{1-\theta/q} \big) \le \|\Sigma_{n_k} - n_k T\|_1 / n_k^{1-\theta/q} \le C n_k^{-\theta(1 - 1/q)} \le C k^{-p\theta/2},$$
which is summable. It follows that
$$\max_{n_k \le n \le n_{k+1}} \|\Sigma_n - nT\| / n^{1-\theta/q} = O(1) \quad \text{a.s.}$$
The proof is now completed.

To prove Corollary 1.1, we need a lemma.

Lemma 2.1. Let $\{U_n = \sum_{k=1}^n Y_k, \mathcal{F}_n; n \ge 1\}$ be an $\mathbb{R}^1$-valued martingale sequence. Assume that there exist two positive sequences $\{b_n\}$ and $\{K_n\}$ such that $b_n$ and $K_n$ are $\mathcal{F}_{n-1}$-measurable for each $n$, $K_n \to 0$ a.s., $\sum_{k=1}^n E[Y_k^2 \mid \mathcal{F}_{k-1}] \le b_n$ a.s. and
$$|Y_n| \le K_n \sqrt{b_n / \log\log b_n} \quad \text{a.s.}$$
Then
$$\limsup_{n \to \infty} \frac{|U_n|}{\sqrt{2 b_n \log\log b_n}} \le 1 \quad \text{a.s.}$$
Proof. See [12].

Now we begin the proof of Corollary 1.1. By Condition (1.16), without loss of generality we assume that $0 < \theta < \varepsilon/2$ and that $\|X_n\|^2 \le n^{1-\varepsilon} \le n^{1-\theta}$, $n \ge 1$. Let $\delta_n$, $\beta_n$, $\widetilde{X}_n$ and $\widetilde{S}_n$ be defined as in the proof of Theorem 1.2, with $f(x) = x^{1-\theta}$. Then there exist a $\kappa > 0$ and a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$\widetilde{S}_n - B_n T^{1/2} = o(n^{1/2-\kappa}) \quad \text{a.s.}, \tag{2.9}$$
where $B_n = \sum_{k=1}^n Y_k$. It is obvious that for any $2^k + 1 \le n \le 2^{k+1}$,
$$(n/2)^{1-\theta} \le \delta_n \le n^{1-\theta}$$
and
$$\alpha_n \le \beta_n \le \beta_1 + 2 \sum_{l=0}^{k} \alpha_{2^{l+1}} \le 2(k+2)\alpha_n \le C \alpha_n \log n. \tag{2.10}$$
Let $p > 1$. Then for all $[n^{1/p}] + 1 \le m \le n$, on the event $\big\{\beta_n \le \frac{1}{4} n^{(1-\theta)/p}\big\}$ we have
$$\beta_m \le \beta_n \le \frac{n^{(1-\theta)/p}}{4} \le \frac{\big([n^{1/p}] + 1\big)^{1-\theta}}{2} \le \delta_{[n^{1/p}]+1} \le \delta_m,$$
and then $I_m = 1$.
It follows that
$$S_n I\Big\{\beta_n \le \frac{n^{(1-\theta)/p}}{4}\Big\} = S_{[n^{1/p}]} I\Big\{\beta_n \le \frac{n^{(1-\theta)/p}}{4}\Big\} + \sum_{k=[n^{1/p}]+1}^{n} \widetilde{X}_k I\Big\{\beta_n \le \frac{n^{(1-\theta)/p}}{4}\Big\}$$
and
$$S_n - B_n T^{1/2} = \widetilde{S}_n - B_n T^{1/2} + \big(S_{[n^{1/p}]} - \widetilde{S}_{[n^{1/p}]}\big) I\Big\{\beta_n \le \frac{n^{(1-\theta)/p}}{4}\Big\} + \big(S_n - \widetilde{S}_n\big) I\Big\{\beta_n > \frac{n^{(1-\theta)/p}}{4}\Big\}. \tag{2.11}$$
Choose $b_n = n\|T\| + \alpha_n$. Then for each $i = 1, \cdots, d$,
$$|X_{ni}| \le n^{(1-\varepsilon)/2} \le b_n^{(1-\varepsilon)/2} \quad \text{and} \quad \sum_{k=1}^n E[X_{ki}^2 \mid \mathcal{F}_{k-1}] = n T_{ii} + \Big( \sum_{k=1}^n E[X_{ki}^2 \mid \mathcal{F}_{k-1}] - n T_{ii} \Big) \le b_n,$$
where $X_{ni}$ is the $i$-th component of $X_n$ and $T_{ii}$ is the $(i,i)$-th element of $T$. By Lemma 2.1,
$$S_{ni} = O\big(\sqrt{b_n \log\log b_n}\big) = O\big(\sqrt{n \log\log n}\big) + O\big(\sqrt{\alpha_n \log\log \alpha_n}\big) \quad \text{a.s.}$$
So
$$S_n = O\big(\sqrt{n \log\log n}\big) + O\big(\sqrt{\alpha_n \log\log \alpha_n}\big) \quad \text{a.s.} \tag{2.12}$$
Also, by (2.9) and the law of the iterated logarithm for i.i.d. random variables,
$$\widetilde{S}_n = O\big(\sqrt{n \log\log n}\big) \quad \text{a.s.} \tag{2.13}$$
Noticing (2.10) and combining (2.9) and (2.11)–(2.13) yields
$$S_n - B_n T^{1/2} = O(n^{1/2-\kappa}) + O\big(n^{\frac{1}{2p}} \sqrt{\log\log n}\big) + O\big(\sqrt{\alpha_n \log\log \alpha_n}\big) + O\big(\sqrt{n \log\log n}\big) I\Big\{\alpha_n \ge \frac{n^{(1-\theta)/p}}{C \log n}\Big\}$$
$$= O(n^{1/2-\kappa}) + O\big(n^{\frac{1}{2p}} \sqrt{\log\log n}\big) + O\Big(\alpha_n^{\frac{1}{2}\big(\frac{p}{1-\theta}+\theta\big)}\Big).$$
Choosing $p = 1/(1-\theta)$, we then have
$$\frac{p}{1-\theta} + \theta = \frac{1}{(1-\theta)^2} + \theta \le (1+2\theta)^2 + \theta \le 1 + 7\theta,$$
so that
$$S_n - B_n T^{1/2} = O(n^{1/2-\kappa}) + o(n^{1/2-\theta/3}) + o\big(\alpha_n^{1/2+4\theta}\big) \quad \text{a.s.}$$
The proof is now completed.

3 Applications to Markov Chain Adaptive Designs

In clinical trials, adaptive designs are sequential designs in which treatments are selected according to the outcomes of previously chosen treatments. The goal is to assign more patients to better treatments. Designs of this kind have also been applied in bioassay, psychophysics, etc.

Consider a $d$-treatment clinical trial, and suppose that at stage $m$ the $m$-th patient is assigned to the $i$-th treatment. Then the $(m+1)$-th patient will be assigned to the $j$-th treatment in accordance with certain probabilities, which depend on the response of the $m$-th patient; denote this probability by $h_{ij}(m)$. After $n$ assignments, let $N_{ni}$ be the number of patients assigned to the $i$-th treatment, $i = 1, \cdots, d$, and write $N_n = (N_{n1}, \cdots, N_{nd})$ and $H_n = \big(h_{ij}(n)\big)_{i,j=1}^d$. Obviously, $N_n \mathbf{1}' = N_{n1} + \cdots + N_{nd} = n$ and $H_n \mathbf{1}' = \mathbf{1}'$, where $\mathbf{1} = (1, \cdots, 1)$.

Let $X_n = (X_{n1}, \cdots, X_{nd})$, where $X_{ni} = 1$ if the $n$-th patient is assigned to the $i$-th treatment and $X_{ni} = 0$ otherwise, $i = 1, \cdots, d$, and denote by $e_i$ the vector whose $i$-th component is 1 and whose other components are 0, $i = 1, \cdots, d$. Then
$$N_n = \sum_{k=1}^n X_k \quad \text{and} \quad P(X_{m+1} = e_j \mid X_m = e_i) = h_{ij}(m).$$
It follows that
$$E[X_{m+1} \mid \mathcal{F}_m] = X_m H_m,$$
where $\mathcal{F}_m = \sigma(X_1, \cdots, X_m)$. Obviously, $\{X_n; n \ge 1\}$ is a Markov chain with transition probability matrices $H_n$. If $H_n = H$ for all $n$, then $\{X_n; n \ge 1\}$ is called homogeneous; otherwise it is non-homogeneous. It is usually assumed that $H_n \to H$ a.s. for some non-random matrix $H$. It is obvious that $\lambda_1 = 1$ is an eigenvalue of $H$. Let $\lambda_2, \cdots, \lambda_d$ be the other $d-1$ eigenvalues of $H$, and let $\lambda = \max\{\operatorname{Re}(\lambda_2), \cdots, \operatorname{Re}(\lambda_d)\}$. We assume $\lambda < 1$; notice that $|\lambda_i| \le 1$, $i = 1, \cdots, d$. The condition $\lambda < 1$ is not a difficult one. For example, if $H$ is a regular transition probability matrix of a Markov chain, then $\lambda < 1$. A small numerical sketch of how this condition can be checked is given below.
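The quantities $v$ and $\lambda$ can be computed directly from a given limit matrix $H$. The following Python fragment is a minimal sketch of such a check; the 3-treatment matrix used here is a hypothetical example of ours, not one from the paper.

```python
import numpy as np

# Sketch (our own hypothetical 3-treatment example): compute the left
# eigenvector v for lambda_1 = 1 and lambda = max{Re(lambda_2),...,Re(lambda_d)}.
H = np.array([[0.70, 0.15, 0.15],
              [0.25, 0.50, 0.25],
              [0.20, 0.20, 0.60]])

eigvals, left_vecs = np.linalg.eig(H.T)    # right eigenvectors of H' = left eigenvectors of H
i1 = int(np.argmin(np.abs(eigvals - 1.0))) # index of the eigenvalue lambda_1 = 1
v = np.real(left_vecs[:, i1])
v = v / v.sum()                            # normalize so that v 1' = 1
lam = max(ev.real for j, ev in enumerate(eigvals) if j != i1)
print("v =", v)                            # limiting assignment proportions
print("lambda =", lam, "(condition lambda < 1:", lam < 1, ")")
```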
Theorem 3.1. In a probability space in which there exists a $d$-dimensional standard Brownian motion $\{B_t\}$, we can redefine the sequence $\{X_n\}$, without changing its distribution, such that
$$N_n - nv - B_n \Sigma^{1/2} = o(n^{1/2-\kappa}) + O\Big( \sum_{k=1}^n \|H_k - H\| \Big) \quad \text{a.s.} \tag{3.1}$$
for some $\kappa > 0$, where $v = (v_1, \cdots, v_d)$ is the left eigenvector corresponding to the largest eigenvalue $\lambda_1 = 1$ of $H$ with $v_1 + \cdots + v_d = 1$,
$$\Sigma = (I - \widetilde{H}')^{-1} \big( \operatorname{diag}(v) - H' \operatorname{diag}(v) H \big) (I - \widetilde{H})^{-1} \quad \text{and} \quad \widetilde{H} = H - \mathbf{1}' v.$$

Remark 3.1. In particular, if
$$\sum_{k=1}^n \|H_k - H\| = o(n^{1/2}) \quad \text{a.s.}, \tag{3.2}$$
then we have the asymptotic normality
$$n^{-1/2}(N_n - nv) \xrightarrow{\ D\ } N(\mathbf{0}, \Sigma). \tag{3.3}$$

Remark 3.2. In the case $d = 2$, write $h_{11} = \alpha$ and $h_{22} = \beta$. Then it can be checked that
$$H = \begin{pmatrix} \alpha & 1-\alpha \\ 1-\beta & \beta \end{pmatrix}, \qquad v = \Big( \frac{1-\beta}{2-\alpha-\beta}, \frac{1-\alpha}{2-\alpha-\beta} \Big),$$
and
$$\Sigma_1 = \frac{(1-\alpha)(1-\beta)(\alpha+\beta)}{2-\alpha-\beta} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad \Sigma = \frac{(1-\alpha)(1-\beta)(\alpha+\beta)}{(2-\alpha-\beta)^3} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$

Before proving the theorem, we give some examples first. Let $p_i(n) = P(\text{“success”} \mid X_{ni} = 1)$ and $q_i(n) = 1 - p_i(n)$, $i = 1, 2, \cdots, d$. Assume that $p_i(n) \to p_i$, $i = 1, 2, \cdots, d$, and let $q_i = 1 - p_i$.

Example 3.1. Suppose that the $m$-th patient is assigned to the $i$-th treatment. If the response of the $m$-th patient is a “success”, then the $(m+1)$-th patient is assigned to the same treatment $i$; if the response of the $m$-th patient is a “failure”, then the $(m+1)$-th patient is assigned to each of the other $d-1$ treatments with probability $\frac{1}{d-1}$. It is easily seen that
$$H_n = \begin{pmatrix} p_1(n) & q_1(n)/(d-1) & \cdots & q_1(n)/(d-1) \\ q_2(n)/(d-1) & p_2(n) & \cdots & q_2(n)/(d-1) \\ \cdots & \cdots & \cdots & \cdots \\ q_d(n)/(d-1) & q_d(n)/(d-1) & \cdots & p_d(n) \end{pmatrix}, \qquad H = \begin{pmatrix} p_1 & q_1/(d-1) & \cdots & q_1/(d-1) \\ q_2/(d-1) & p_2 & \cdots & q_2/(d-1) \\ \cdots & \cdots & \cdots & \cdots \\ q_d/(d-1) & q_d/(d-1) & \cdots & p_d \end{pmatrix},$$
$$v = \Big( \frac{1/q_1}{\sum_{j=1}^d 1/q_j}, \frac{1/q_2}{\sum_{j=1}^d 1/q_j}, \cdots, \frac{1/q_d}{\sum_{j=1}^d 1/q_j} \Big), \tag{3.4}$$
and $\|H_n - H\| \le C \sum_{i=1}^d |p_i(n) - p_i|$. So, by Theorem 3.1, if $0 < p_i < 1$, $i = 1, \cdots, d$, then
$$N_n - nv - B_n \Sigma^{1/2} = o(n^{1/2-\kappa}) + O\Big( \sum_{k=1}^n \sum_{i=1}^d |p_i(k) - p_i| \Big) \quad \text{a.s.} \tag{3.5}$$

Example 3.2. Suppose that the $m$-th patient is assigned to the $i$-th treatment. If the response of the $m$-th patient is a “success”, then the $(m+1)$-th patient is assigned to the same treatment $i$. If the response of the $m$-th patient is a “failure”, then the $(m+1)$-th patient is assigned to the $(i+1)$-th treatment, where the $(d+1)$-th treatment means the 1-st treatment. This assignment scheme is called the cyclic play-the-winner (PWC) rule [6]. It is easily seen that
$$H_n = \begin{pmatrix} p_1(n) & q_1(n) & 0 & \cdots & 0 \\ 0 & p_2(n) & q_2(n) & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ q_d(n) & 0 & \cdots & \cdots & p_d(n) \end{pmatrix}, \qquad H = \begin{pmatrix} p_1 & q_1 & 0 & \cdots & 0 \\ 0 & p_2 & q_2 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ q_d & 0 & \cdots & \cdots & p_d \end{pmatrix},$$
$v$ is the same as in (3.4), and $\|H_n - H\| \le C \sum_{i=1}^d |p_i(n) - p_i|$. So (3.5) holds whenever $0 < p_i < 1$, $i = 1, \cdots, d$.

The designs in the above two examples give the same limiting proportions of patients assigned to each treatment as the design proposed by Wei and Durham [15] for the two-treatment case and by Wei [14] for the general case. But it is known that, for Wei's design, the asymptotic normality holds only when the condition $\max\{\operatorname{Re}(\lambda_2), \cdots, \operatorname{Re}(\lambda_d)\} \le 1/2$ is satisfied. Such a condition is not easy to check when $d \ge 3$. (A numerical check of the formula for $\Sigma$ in the two-treatment case is given in the sketch below.)
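As a sanity check on the matrix $\Sigma$ of Theorem 3.1, the following sketch assembles $\Sigma$ from its definition in the two-treatment case and compares it with the closed form of Remark 3.2; the values of $\alpha$ and $\beta$ are illustrative choices of ours.

```python
import numpy as np

# Sketch: assemble Sigma from its definition in Theorem 3.1 for d = 2 and
# compare it with the closed form in Remark 3.2.  alpha, beta are illustrative
# values of ours.
alpha, beta = 0.7, 0.6
H = np.array([[alpha, 1 - alpha],
              [1 - beta, beta]])
v = np.array([1 - beta, 1 - alpha]) / (2 - alpha - beta)

H_tilde = H - np.outer(np.ones(2), v)          # H~ = H - 1'v
Sigma1 = np.diag(v) - H.T @ np.diag(v) @ H     # Sigma_1 = diag(v) - H' diag(v) H
inv = np.linalg.inv(np.eye(2) - H_tilde)
Sigma = inv.T @ Sigma1 @ inv                   # (I - H~')^{-1} Sigma_1 (I - H~)^{-1}

c = (1 - alpha) * (1 - beta) * (alpha + beta) / (2 - alpha - beta) ** 3
print(np.round(Sigma, 6))                               # from the definition
print(np.round(c * np.array([[1.0, -1.0], [-1.0, 1.0]]), 6))  # Remark 3.2
```

The two printed matrices agree, since $(1, -1)$ is a left eigenvector of $\widetilde{H}$ with eigenvalue $\alpha + \beta - 1$, so that $(I - \widetilde{H})^{-1}$ acts on $\Sigma_1$ by the scalar $(2-\alpha-\beta)^{-1}$ on each side.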
Example 3.3. Assume that $p_i(n) = p_i$ for all $n$ and $i$. Suppose that the $m$-th patient is assigned to the $i$-th treatment. If the response of the $m$-th patient is a “success”, then the $(m+1)$-th patient is assigned to the same treatment $i$. When the response of the $m$-th patient is a “failure”, our purpose is to assign the $(m+1)$-th patient to the best one among the other $d-1$ treatments. For this purpose, we use $\widehat{p}_{mi} = \frac{S_{mi}+1}{N_{mi}+1}$ to estimate $p_i$ and write $\widehat{q}_{mi} = 1 - \widehat{p}_{mi}$, where $S_{mi}$ is the number of “successes” among those $N_{mi}$ patients on the $i$-th treatment in the first $m$ assignments, $i = 1, \cdots, d$. Now, when the response of the $m$-th patient on the $i$-th treatment is a “failure”, the $(m+1)$-th patient is assigned to that one of the other $d-1$ treatments for which $\widehat{p}_{mj}$ ($j \ne i$) is the largest. If there is more than one treatment for which $\widehat{p}_{mj}$ ($j \ne i$) is the largest, the $(m+1)$-th patient is assigned to each of these treatments with the same probability. So $h_{ii}(m) = p_i$ and
$$h_{ij}(m) = q_i\, I\Big\{ \widehat{p}_{mj} = \max_{l \ne i} \widehat{p}_{ml} \Big\} \Big/ \#\Big\{ t \ne i : \widehat{p}_{mt} = \max_{l \ne i} \widehat{p}_{ml} \Big\}, \qquad j \ne i.$$
To ensure that each treatment is tested by enough patients, i.e., $N_{ni} \to \infty$ a.s., $i = 1, \cdots, d$, we replace $h_{ij}(m)$ by
$$h_{ij}^*(m) = \Big(1 - \frac{1}{m}\Big) h_{ij}(m) + \frac{1}{md}.$$
Then we can show that $N_{ni} \to \infty$ a.s. and $\widehat{p}_{ni} \to p_i$ a.s., $i = 1, \cdots, d$ (see the note at the end of this section). Without loss of generality, we assume that $p_1 > p_2 > p_3 \ge \cdots \ge p_d$. Then almost surely there exists an $n_0$ such that for all $n \ge n_0$, $\max_{l \ne 1} \widehat{p}_{nl} = \widehat{p}_{n2}$ and $\max_{l \ne i} \widehat{p}_{nl} = \widehat{p}_{n1}$ for $i \ne 1$. It follows that
$$H_n - H = O\Big(\frac{1}{n}\Big) \quad \text{a.s.},$$
where $H_n = \big(h_{ij}^*(n)\big)$ and
$$H = \begin{pmatrix} p_1 & q_1 & 0 & \cdots & 0 \\ q_2 & p_2 & 0 & \cdots & 0 \\ q_3 & 0 & p_3 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ q_d & 0 & 0 & \cdots & p_d \end{pmatrix}.$$
It is easy to check that
$$v = \Big( \frac{q_2}{q_1+q_2}, \frac{q_1}{q_1+q_2}, 0, \cdots, 0 \Big) \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{22} & 0 \\ 0 & 0 \end{pmatrix},$$
where
$$\Sigma_{22} = \frac{q_1 q_2 (p_1 + p_2)}{(q_1+q_2)^3} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} =: \sigma^2 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$
According to Theorem 3.1, we can define a one-dimensional standard Brownian motion $\{B_t\}$ such that
$$N_{n1} - n\frac{q_2}{q_1+q_2} - \sigma B_n = o(n^{1/2-\kappa}) \quad \text{a.s.},$$
$$N_{n2} - n\frac{q_1}{q_1+q_2} + \sigma B_n = o(n^{1/2-\kappa}) \quad \text{a.s.},$$
$$N_{ni} = o(n^{1/2-\kappa}) \quad \text{a.s.}, \qquad i = 3, \cdots, d.$$
A simulation sketch of this design is given after Remark 3.3.

Remark 3.3. When $d = 2$, the designs in Examples 3.1–3.3 are all the play-the-winner (PW) rule proposed by Zelen [16]. In Example 3.3, the limiting proportions of patients assigned to the best two treatments are the same as those in the PW rule, and the other treatments can be neglected. There are many other classes of adaptive designs. As for asymptotic properties, one can refer to Bai and Hu [1], Bai et al. [2], etc. for urn-model type adaptive designs, and to Eisele and Woodroofe [4] and Hu and Zhang [7] for doubly adaptive biased coin designs. For more discussions, one can refer to Rosenberger and Lachin [11].
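The following is a minimal simulation sketch of the design in Example 3.3; the success probabilities $p$ below are illustrative values of ours. It checks that $N_n/n$ approaches $v = \big(q_2/(q_1+q_2),\, q_1/(q_1+q_2),\, 0, \cdots, 0\big)$.

```python
import numpy as np

# Simulation sketch of the design in Example 3.3 (illustrative p of ours):
# the two best treatments should receive proportions q2/(q1+q2) and q1/(q1+q2)
# of the patients, and the remaining treatments o(n) patients.
rng = np.random.default_rng(1)
p = np.array([0.8, 0.7, 0.5, 0.4]); q = 1.0 - p; d = len(p)
n = 100_000

N = np.zeros(d); S = np.zeros(d)   # patients and successes per treatment
i = 0                              # assign the first patient to treatment 1 (a convention)
for m in range(1, n + 1):
    N[i] += 1
    success = rng.random() < p[i]
    S[i] += success
    p_hat = (S + 1) / (N + 1)      # \hat p_{mi} = (S_{mi} + 1) / (N_{mi} + 1)
    if success:                    # h_{ij}(m): stay after a success, ...
        j = i
    else:                          # ... otherwise move to the best other arm
        others = [t for t in range(d) if t != i]
        best = max(p_hat[t] for t in others)
        j = int(rng.choice([t for t in others if p_hat[t] == best]))
    if rng.random() < 1.0 / m:     # h*_{ij}(m): with probability 1/m, a uniform arm
        j = int(rng.integers(d))
    i = j

print("N_n / n =", np.round(N / n, 3))
print("v[:2]   =", np.round(np.array([q[1], q[0]]) / (q[0] + q[1]), 3))  # (0.6, 0.4)
```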
Proof of Theorem 3.1. Set $X_0 = 0$. Let $Z_n = X_n - E[X_n \mid \mathcal{F}_{n-1}]$ and $M_n = \sum_{k=1}^n Z_k$, $n \ge 1$. Then
$$X_n = Z_n + X_{n-1} H_{n-1} = Z_n + X_{n-1} H + X_{n-1}(H_{n-1} - H) = Z_n + (X_{n-1} - v)H + v + X_{n-1}(H_{n-1} - H) = Z_n + (X_{n-1} - v)\widetilde{H} + v + X_{n-1}(H_{n-1} - H),$$
since $vH = v$ and $(X_{n-1} - v)\mathbf{1}' = 1 - 1 = 0$. So
$$\sum_{k=2}^n (X_k - v) = \sum_{k=2}^n Z_k + \sum_{k=1}^{n-1} (X_k - v)\widetilde{H} + \sum_{k=1}^{n-1} X_k (H_k - H) = \sum_{k=2}^n Z_k + \sum_{k=1}^{n} (X_k - v)\widetilde{H} + \sum_{k=1}^{n} X_k (H_k - H) + v - X_n H_n.$$
Thus
$$\sum_{k=1}^n (X_k - v) = M_n + \sum_{k=1}^n (X_k - v)\widetilde{H} + \sum_{k=1}^n X_k (H_k - H) + E[X_1 \mid \mathcal{F}_0] - E[X_{n+1} \mid \mathcal{F}_n].$$
It follows that
$$(N_n - nv)(I - \widetilde{H}) = M_n + \sum_{k=1}^n X_k (H_k - H) + E[X_1] - E[X_{n+1} \mid \mathcal{F}_n] =: M_n + R_{n1}.$$
Notice that the real parts of all eigenvalues of $I - \widetilde{H}$ are greater than 0, so $(I - \widetilde{H})^{-1}$ exists. Then
$$N_n - nv = (M_n + R_{n1})(I - \widetilde{H})^{-1}. \tag{3.6}$$
Obviously,
$$\|R_{n1}\| \le \sum_{k=1}^n \|H_k - H\| + 2.$$
On the other hand,
$$E[Z_n' Z_n \mid \mathcal{F}_{n-1}] = E[X_n' X_n \mid \mathcal{F}_{n-1}] - E[X_n \mid \mathcal{F}_{n-1}]' E[X_n \mid \mathcal{F}_{n-1}]$$
$$= E[\operatorname{diag}(X_n) \mid \mathcal{F}_{n-1}] - H_{n-1}' X_{n-1}' X_{n-1} H_{n-1}$$
$$= \operatorname{diag}(X_{n-1} H_{n-1}) - H_{n-1}' \operatorname{diag}(X_{n-1}) H_{n-1}$$
$$= \operatorname{diag}(X_{n-1} H) - H' \operatorname{diag}(X_{n-1}) H + \operatorname{diag}\big(X_{n-1}(H_{n-1} - H)\big) - H_{n-1}' \operatorname{diag}(X_{n-1})(H_{n-1} - H) - (H_{n-1} - H)' \operatorname{diag}(X_{n-1}) H$$
$$= \operatorname{diag}(vH) - H' \operatorname{diag}(v) H + \operatorname{diag}\big((X_{n-1} - v)H\big) - H' \operatorname{diag}(X_{n-1} - v) H + r_{n2}$$
$$= \Sigma_1 + \operatorname{diag}\big((X_{n-1} - v)H\big) - H' \operatorname{diag}(X_{n-1} - v) H + r_{n2},$$
where $\Sigma_1 = \operatorname{diag}(v) - H' \operatorname{diag}(v) H$ and $\|r_{n2}\| \le C \|H_{n-1} - H\|$. It follows that
$$\Big\| \sum_{k=1}^n E[Z_k' Z_k \mid \mathcal{F}_{k-1}] - n\Sigma_1 \Big\| \le \big\| \operatorname{diag}\big((N_{n-1} - nv)H\big) \big\| + \big\| H' \operatorname{diag}(N_{n-1} - nv) H \big\| + \sum_{k=1}^n \|r_{k2}\|$$
$$\le \big\| \operatorname{diag}\big(M_{n-1}(I - \widetilde{H})^{-1} H\big) \big\| + \big\| H' \operatorname{diag}\big(M_{n-1}(I - \widetilde{H})^{-1}\big) H \big\| + C\|R_{n-1,1}\| + \sum_{k=1}^n \|r_{k2}\| + C$$
$$\le C\|M_{n-1}\| + C \sum_{k=1}^n \|H_k - H\| + C. \tag{3.7}$$
Write
$$\alpha_n = \max_{m \le n} \Big\| \sum_{k=1}^m E[Z_k' Z_k \mid \mathcal{F}_{k-1}] - m\Sigma_1 \Big\|. \tag{3.8}$$
Obviously, $\alpha_n \le Cn$. Notice that $\|Z_n\| \le 2$. By Corollary 1.1, for any $0 < \delta < 1$ there exist a $\kappa > 0$ and a $d$-dimensional standard Brownian motion $\{B_t\}$ such that
$$M_n - B_n \Sigma_1^{1/2} = o(n^{1/2-\kappa}) + O\big(\alpha_n^{1/2+\delta}\big) \quad \text{a.s.} \tag{3.9}$$
By (3.9) and the law of the iterated logarithm for Brownian motion,
$$\|M_n\| = O(n^{1/2+\delta}) \quad \text{a.s.},$$
which, together with (3.7) and (3.8), implies that
$$\alpha_n = O(n^{1/2+\delta}) + O\Big( \sum_{k=1}^n \|H_k - H\| \Big) \quad \text{a.s.}$$
From (3.9) again, it follows that
$$M_n - B_n \Sigma_1^{1/2} = o(n^{1/2-\kappa}) + O(n^{1/4 + \delta/2}) + O\Big( \Big( \sum_{k=1}^n \|H_k - H\| \Big)^{1/2+\delta} \Big) = o(n^{1/2-\kappa}) + O\Big( \Big( \sum_{k=1}^n \|H_k - H\| \Big)^{1/2+\delta} \Big) \quad \text{a.s.},$$
which, together with (3.6), implies that
$$N_n - nv - B_n \Sigma^{1/2} = o(n^{1/2-\kappa}) + O\Big( \Big( \sum_{k=1}^n \|H_k - H\| \Big)^{1/2+\delta} \Big) + O\Big( \sum_{k=1}^n \|H_k - H\| \Big) = o(n^{1/2-\kappa}) + O\Big( \sum_{k=1}^n \|H_k - H\| \Big) \quad \text{a.s.}$$
The proof is now completed.

A note on Example 3.3. We shall show that $\widehat{p}_{ni} \to p_i$ a.s., $i = 1, \cdots, d$. Notice that for each $i$ and each $j$,
$$P(X_{n+1,i} = 1 \mid X_{n,j} = 1) = h_{ji}^*(n) \ge \frac{1}{nd}.$$
It follows that
$$\sum_{n=1}^\infty P(X_{n+1,i} = 1 \mid \mathcal{F}_n) \ge \sum_{n=1}^\infty \frac{1}{nd} = \infty,$$
which, together with the extended Borel–Cantelli lemma, implies that $N_{ni} \to \infty$ a.s. Then, by Lemma A.4 of [8], one has $\widehat{p}_{ni} \to p_i$ a.s., $i = 1, \cdots, d$.

References

[1] Bai, Z.D., Hu, F. Asymptotic theorems for urn models with nonhomogeneous generating matrices. Stochastic Process. Appl., 80: 87–101 (1999)
[2] Bai, Z.D., Hu, F., Zhang, L.X. The Gaussian approximation theorems for urn models and their applications. Ann. Appl. Probab., 12: 1149–1173 (2002)
[3] Eberlein, E. On strong invariance principles under dependence assumptions. Ann. Probab., 14: 260–270 (1986)
[4] Eisele, J., Woodroofe, M. Central limit theorems for doubly adaptive biased coin designs. Ann. Statist., 23: 234–254 (1995)
[5] Hall, P., Heyde, C.C. Martingale Limit Theory and its Applications. Academic Press, London, 1980
[6] Hoel, D.G., Sobel, M. Comparison of sequential procedures for selecting the best binomial population. Proc. Sixth Berkeley Symp. Math. Statist. Probab., 4: 53–69 (1972)
[7] Hu, F., Zhang, L.X. Asymptotic properties of doubly adaptive biased coin designs for multi-treatment clinical trials. Ann. Statist., 32: 268–301 (2004)
[8] Monrad, D., Philipp, W. The problem of embedding vector-valued martingales in a Gaussian process. Teor. Veroyatn. Primen., 35: 384–387 (1990)
[9] Monrad, D., Philipp, W. Nearby variables with nearby conditional laws and a strong approximation theorem for Hilbert space valued martingales. Probab. Theory Relat. Fields, 88: 381–404 (1991)
[10] Morrow, G.J., Philipp, W. An almost sure invariance principle for Hilbert space valued martingales. Trans. Amer. Math. Soc., 273: 231–251 (1982)
[11] Rosenberger, W.F., Lachin, J.M. Randomization in Clinical Trials: Theory and Practice. Wiley, New York, 2002
[12] Stout, W.F. Almost Sure Convergence. Academic Press, New York, 1974
[13] Strassen, V. Almost sure behavior of sums of independent random variables and martingales. Proc. Fifth Berkeley Symp. Math. Statist. Prob., II(1): 315–343 (1967)
[14] Wei, L.J. The generalized Polya's urn design for sequential medical trials. Ann. Statist., 7: 291–296 (1979)
[15] Wei, L.J., Durham, S. The randomized play-the-winner rule in medical trials. J. Amer. Statist. Assoc., 73: 840–843 (1978)
[16] Zelen, M. Play the winner rule and the controlled clinical trial. J. Amer. Statist. Assoc., 64: 131–146 (1969)