Acta Mathematicae Applicatae Sinica, English Series
Vol. 20, No. 2 (2004) 337–352
Strong Approximations of Martingale Vectors and Their
Applications in Markov-Chain Adaptive Designs
Li-xin Zhang
Department of Mathematics, Zhejiang University, Hangzhou 310028, China (E-mail: lxzhang@mail.hz.zj.cn)
Manuscript received October 20, 2003. Revised April 1, 2004.
Supported by the National Natural Science Foundation of China (No. 10071072).
Abstract The strong approximations of a class of $\mathbb{R}^d$-valued martingales are considered. The conditions used in this paper are easier to check than those used in [3] and [9]. As an application, the strong approximation of a class of non-homogeneous Markov chains is established, and asymptotic properties are established for multi-treatment Markov-chain adaptive designs in clinical trials.
Keywords Martingale, non-homogeneous Markov chain, Wiener process, strong approximation, adaptive designs, asymptotic properties
2000 MR Subject Classification 60F15, 60G42, 62L05, 62G20

1 Introduction and Main Results
Throughout this paper, we assume that $\{X_n, \mathcal{F}_n; n \ge 1\}$ is a square-integrable sequence of $\mathbb{R}^d$-valued martingale differences defined on $(\Omega, \mathcal{F}, \mathsf{P})$, where $\mathcal{F}_0$ is the trivial $\sigma$-field. The probability space $(\Omega, \mathcal{F}, \mathsf{P})$ is also assumed to be rich enough that there is a uniformly distributed random variable $U$ which is independent of $\{X_n; n \ge 1\}$. Denote $S_n(m) = \sum_{k=m+1}^{m+n} X_k$ and $S_n = S_n(0)$. Let
$$\sigma_n = \mathsf{E}\big[X_n'X_n \mid \mathcal{F}_{n-1}\big], \qquad \operatorname{tr}(\sigma_n) = \mathsf{E}\big[\|X_n\|^2 \mid \mathcal{F}_{n-1}\big],$$
where $\|u\| = (u_1^2 + \cdots + u_d^2)^{1/2}$ for $u \in \mathbb{R}^d$. Also, denote $\Sigma_n = \sum_{k=1}^n \sigma_k$, $\Sigma_n(m) = \sum_{k=m+1}^{m+n} \sigma_k$ and
$$V_n = \operatorname{tr}(\Sigma_n) = \sum_{k=1}^n \operatorname{tr}(\sigma_k) = \sum_{k=1}^n \mathsf{E}\big[\|X_k\|^2 \mid \mathcal{F}_{k-1}\big].$$
For a $d \times d$ matrix $A$, denote by $\|A\|$ its maximum norm. If $A$ is a covariance matrix, the maximum norm is equivalent to the norm defined by $\max\{uAu' : u \in \mathbb{R}^d, \|u\| = 1\}$.
The first strong approximation theorem for martingales can be found in [13]. In the case $d = 1$, suppose $V_n \to \infty$ a.s. and that $\{X_n\}$ satisfies a kind of Lindeberg condition. Using the Skorokhod embedding theorem, Strassen [13] proved that if the underlying probability space has a rich enough structure, then the martingale can be approximated by a standard Brownian motion scaled according to the conditional variance of the given martingale sequence, i.e.,
$$\sum_{m \ge 1} X_m I\{V_m \le n\} - B(n) = o(n^{1/2}) \quad \text{a.s.} \tag{1.1}$$
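To make the embedding idea concrete, here is a minimal numerical sketch (not from the paper; it assumes numpy, and the grid step, horizon and seed are arbitrary choices): a $\pm 1$ random walk is realized as $B(\tau_n)$ via successive exit times of a simulated Brownian path from unit intervals, and the coupling error $|S_n - B(n)|$ is checked against $n^{1/2}$.

```python
import numpy as np

# Sketch of the Skorokhod-embedding idea behind (1.1) for d = 1: a +/-1
# random walk S_n is realized as B(tau_n), where tau_n are successive exit
# times of the Brownian path from unit intervals; since E[tau_n] = n, S_n
# should stay well within n^{1/2} of B(n).  Grid simulation, so the exit
# times are only approximate.
rng = np.random.default_rng(0)
dt, T = 1e-3, 2000.0
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), int(T / dt)))])

S, pos = [], 0.0
for idx, b in enumerate(B):
    if abs(b - pos) >= 1.0:          # exit from (pos - 1, pos + 1)
        pos += np.sign(b - pos)      # embedded +/-1 martingale step
        S.append(pos)

n = len(S)
t = np.minimum((np.arange(1, n + 1) / dt).astype(int), len(B) - 1)
err = np.abs(np.array(S) - B[t])     # coupling error |S_n - B(n)|
print(n, err.max(), err.max() / np.sqrt(n))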
For $d \ge 2$, Monrad and Philipp [8] proved that it is impossible to embed a general $\mathbb{R}^d$-valued martingale in an $\mathbb{R}^d$-valued Gaussian process. However, Morrow and Philipp [10], Eberlein [3], Monrad and Philipp [9], etc. established some approximations for $\mathbb{R}^d$-valued martingales. By using a result of Eberlein [3] (see [9]), one can obtain

Theorem A. Suppose
$$\sup_{n \ge 1} \mathsf{E}\|X_n\|^{2+\delta} < \infty \quad \text{for some } \delta > 0, \tag{1.2}$$
and there exist some covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, and some $0 < \theta < 1$ such that, uniformly in $m$,
$$\big\|\mathsf{E}[\Sigma_n(m) \mid \mathcal{F}_m] - nT\big\|_1 = O(n^{1-\theta}). \tag{1.3}$$
Then there exist a $\kappa > 0$ and a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$\Big\|S_n - \sum_{m \le n} Y_mT^{1/2}\Big\| = O(n^{1/2-\kappa}) \quad \text{a.s.} \tag{1.4}$$
However, it is not easy to show that (1.3) holds uniformly in $m$ (even when $d = 1$). In many cases, we only know that $\Sigma_n/n$ or $\Sigma_n/V_n$ is close to some covariance matrix $T$. Morrow and Philipp [10] established an approximation similar to (1.1) under the condition that $\Sigma_n/V_n \to T$.
Proposition 1. Let $f$ be a non-decreasing function with $f(x) \to \infty$ as $x \to \infty$, and such that $f(x)(\log x)^\alpha/x$ is non-increasing for some $\alpha > 50d$. Suppose that $V_n \to \infty$ a.s., and
$$\sum_{n \ge 1} \mathsf{E}\big[\|X_n\|^2 I\{\|X_n\|^2 \ge f(V_n)\}\big]/f(V_n) < \infty; \tag{1.5}$$
also, suppose there exists some covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, such that
$$\mathsf{E}\Big[\max_{n \ge 1} \exp\big(\|\Sigma_n - TV_n\|/f(V_n)\big)\Big] < \infty. \tag{1.6}$$
Then there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors independent of $T$, such that
$$\Big\|\sum_{m \ge 1} X_m I\{V_m \le n\} - \sum_{m \le n} Y_mT^{1/2}\Big\| = O\big(n^{1/2}(f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{1.7}$$
Monrad and Philipp [9] weakened Condition (1.6) to the requirement that, for some $0 < p \le 1$,
$$\mathsf{E}\Big[\max_{n \ge 1} \big(\|\Sigma_n - TV_n\|/f(V_n)\big)^p\Big] < \infty. \tag{1.8}$$
But since it is difficult to compute the expectation of the maximum of a sequence of random variables, (1.8) is still not easy to verify. It should be noticed that (1.6) or (1.8) implies $\|\Sigma_n - TV_n\| = O\big(f(V_n)\big)$ a.s., while neither is implied by $\|\Sigma_n - TV_n\| = O\big(f(V_n)\big)$ a.s. or $\|\Sigma_n - TV_n\| = o\big(f(V_n)\big)$ a.s.

The aim of this paper is to replace Conditions (1.3), (1.6) and (1.8) by related conditions which are easier to verify; another aim is to replace (1.5) by a kind of conditional Lindeberg condition. Our first result is as follows.
Theorem 1.1. Let $f$ be a non-decreasing function such that $f(x) \to \infty$ as $x \to \infty$, $f(x)(\log x)^\alpha/x$ is non-increasing for some $\alpha > 50d$, and $f(x)/x^\varepsilon$ is non-decreasing for some $0 < \varepsilon < 1$. Suppose $V_n \to \infty$ a.s. and
$$\sum_{n \ge 1} \mathsf{E}\big[\|X_n\|^2 I\{\|X_n\|^2 \ge f(V_n)\} \mid \mathcal{F}_{n-1}\big]/f(V_n) < \infty \quad \text{a.s.} \tag{1.9}$$
Moreover, assume that there exists some covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, such that
$$\|\Sigma_n - TV_n\| = o\big(f(V_n)\big) \quad \text{a.s.} \tag{1.10}$$
Then there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors, independent of $T$, such that
$$\Big\|\sum_{m \ge 1} X_m I\{V_m \le n\} - \sum_{m \le n} Y_mT^{1/2}\Big\| = O\big(n^{1/2}(f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{1.11}$$
If $\Sigma_n/n$ is close to $T$, then we have the following result.

Theorem 1.2. Let $f$ be a non-decreasing function such that $f(x) \to \infty$ as $x \to \infty$, $f(x)(\log x)^\alpha/x$ is non-increasing for some $\alpha > 50d$, and $f(x)/x^\varepsilon$ is non-decreasing for some $0 < \varepsilon < 1$. Suppose
$$\sum_{n \ge 1} \mathsf{E}\big[\|X_n\|^2 I\{\|X_n\|^2 \ge f(n)\} \mid \mathcal{F}_{n-1}\big]/f(n) < \infty \quad \text{a.s.} \tag{1.12}$$
In addition, suppose there exists some covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, such that
$$\|\Sigma_n - nT\| = o\big(f(n)\big) \quad \text{a.s.} \tag{1.13}$$
Then there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors independent of $T$ such that
$$\Big\|S_n - \sum_{m \le n} Y_mT^{1/2}\Big\| = O\big(n^{1/2}(f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{1.14}$$
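The flavor of Theorem 1.2 can be illustrated numerically. The following sketch (not from the paper; the matrix $T$, the scaling sequence $c_k$ and the sample sizes are illustrative choices) simulates martingale differences with conditional covariances $\sigma_k = c_kT$, $c_k \to 1$, so that $\|\Sigma_n - nT\| = O(\sqrt{n})$, which is $o(f(n))$ for admissible $f$, and checks the approximate $N(0,T)$ behaviour of $n^{-1/2}S_n$ that the coupling with $\sum_{m \le n} Y_mT^{1/2}$ implies.

```python
import numpy as np

# Martingale differences X_k with conditional covariance c_k * T, c_k -> 1,
# so Sigma_n - nT = O(sqrt(n)); then n^{-1/2} S_n should be ~ N(0, T).
rng = np.random.default_rng(1)
d, n, reps = 2, 2000, 1000
T = np.array([[1.0, 0.3],
              [0.3, 0.5]])
L = np.linalg.cholesky(T)                        # T = L L'

c = 1.0 + 1.0 / np.sqrt(np.arange(1, n + 1))     # c_k -> 1
Z = rng.normal(size=(reps, n, d))                # i.i.d. N(0, I) drivers
X = np.sqrt(c)[None, :, None] * (Z @ L.T)        # X_k ~ N(0, c_k T)
S = X.sum(axis=1) / np.sqrt(n)                   # n^{-1/2} S_n per replication

print(np.round(np.cov(S, rowvar=False), 3))      # empirical covariance ~ T
print(T)
```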
The next theorem tells us that the uniformity condition (1.3) in Theorem A can be replaced by $\|\Sigma_n - nT\|_1 = O(n^{1-\theta})$.

Theorem 1.3. Suppose there exist constants $0 < \theta, \varepsilon < 1$ and a covariance matrix $T$, measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$, such that
$$\|\Sigma_n - nT\| = O(n^{1-\theta}) \ \text{a.s.} \quad \text{or} \quad \|\Sigma_n - nT\|_1 = O(n^{1-\theta}), \tag{1.15}$$
$$\sum_{n=1}^{\infty} \mathsf{E}\big[\|X_n\|^2 I\{\|X_n\|^2 \ge n^{1-\varepsilon}\} \mid \mathcal{F}_{n-1}\big]/n^{1-\varepsilon} < \infty \quad \text{a.s.} \tag{1.16}$$
Then there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors independent of $T$ such that
$$\Big\|S_n - \sum_{m \le n} Y_mT^{1/2}\Big\| = O(n^{1/2-\kappa}) \quad \text{a.s.},$$
where $\kappa > 0$ is a constant depending only on $\theta$, $\varepsilon$ and $d$.
Corollary 1.1. Suppose there exists a constant $0 < \varepsilon < 1$ such that (1.16) is satisfied, and $T$ is a covariance matrix measurable with respect to $\mathcal{F}_k$ for some $k \ge 0$. Then for any $\delta > 0$, there exist a $\kappa > 0$ and a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors independent of $T$ such that
$$\Big\|S_n - \sum_{m \le n} Y_mT^{1/2}\Big\| = O(n^{1/2-\kappa}) + O\big(\alpha_n^{1/2+\delta}\big) \quad \text{a.s.},$$
where $\alpha_n = \max_{m \le n} \|\Sigma_m - mT\|$.
2 Proofs
Proof of Theorem 1.1. Let $\overline{X}_k = X_k I\{\|X_k\|^2 \le f(V_k)\} - \mathsf{E}\big[X_k I\{\|X_k\|^2 \le f(V_k)\} \mid \mathcal{F}_{k-1}\big]$ and $\widehat{X}_k = X_k - \overline{X}_k$ for each $k \ge 1$. By Condition (1.9) and the conditional Borel–Cantelli lemma [5],
$$\mathsf{P}\big(\|X_k\|^2 > f(V_k) \ \text{i.o.}\big) = 0 \quad \text{and} \quad \sum_{k=1}^n \mathsf{E}\big[X_k I\{\|X_k\|^2 > f(V_k)\} \mid \mathcal{F}_{k-1}\big] = O(1) \quad \text{a.s.}$$
It follows that
$$\sum_{k=1}^n \widehat{X}_k = \sum_{k=1}^n \Big(X_k I\{\|X_k\|^2 > f(V_k)\} - \mathsf{E}\big[X_k I\{\|X_k\|^2 > f(V_k)\} \mid \mathcal{F}_{k-1}\big]\Big) = O(1) \quad \text{a.s.}$$
Also, by (1.9) again,
$$\sum_{k=1}^n \mathsf{E}\big[\overline{X}_k'\overline{X}_k \mid \mathcal{F}_{k-1}\big] = \Sigma_n + \sum_{k=1}^n \mathsf{E}\big[\widehat{X}_k'\widehat{X}_k \mid \mathcal{F}_{k-1}\big] = \Sigma_n + O(1)\sum_{k=1}^n \mathsf{E}\big[\|X_k\|^2 I\{\|X_k\|^2 \ge f(V_k)\} \mid \mathcal{F}_{k-1}\big] = \Sigma_n + o\big(f(V_n)\big) \quad \text{a.s.}$$
Notice that $\|\overline{X}_k\|^2 \le 2f(V_k)$; thus we can assume that $\|X_k\|^2 \le f(V_k)$, $k = 1, 2, \cdots$, for otherwise we can replace $X_k$ by $\overline{X}_k$ and $f(x)$ by $2f(x)$.

Now, let $\{Z_k = (Z_{k1}, \cdots, Z_{kd}); k \ge 1\}$ be a sequence of i.i.d. $\mathbb{R}^d$-valued random vectors such that, for each $k$, $Z_{k1}, \cdots, Z_{kd}$ are also i.i.d. random variables with $\mathsf{P}(Z_{ki} = \pm 1) = 1/2$. Denote $\alpha_n = \max_{k \le n} \|\Sigma_k - V_kT\|$; let $\tau_0 = 0$ and $\tau_k = \max\{n \ge \tau_{k-1}: V_n \le k\}$, $k \ge 1$. Then each $\tau_k$ is a stopping time with respect to $\{\mathcal{F}_{n-1}\}$ and
$$n \ge V_{\tau_n} = V_{\tau_n+1} - \operatorname{tr}(\sigma_{\tau_n+1}) \ge n - f(V_{\tau_n+1}) \ge n - f(n+1).$$
Let $\nu_k = \tau_{2^k}$. If $m \le \nu_0 = \tau_1$, define $\beta_m = \sum_{i=1}^m \|\sigma_i - T\operatorname{tr}(\sigma_i)\|$, $\delta_m = 1$, $I_m = I\{\beta_m = 0\}$ and $I_m^c = 1 - I_m$; if $\nu_k < m \le \nu_{k+1}$, $k \ge 0$, define $\delta_m = 1 + f(V_{\nu_k})$,
$$\beta_{\nu_k+1,m} = \max_{\nu_k+1 \le l \le m}\Big\|\sum_{i=\nu_k+1}^{l}\big(\sigma_i - T\operatorname{tr}(\sigma_i)\big)\Big\|, \qquad \beta_m = \beta_{\nu_0} + \sum_{l=0}^{k-1}\beta_{\nu_l+1,\nu_{l+1}} + \beta_{\nu_k+1,m},$$
$$I_m = I\{\beta_m \le \delta_m\} = I\{\beta_m \le 1 + f(V_{\nu_k})\} \quad \text{and} \quad I_m^c = 1 - I_m.$$
Let $\widetilde{X}_k = X_kI_k + I_k^cZ_kT^{1/2}\sqrt{\operatorname{tr}(\sigma_k)}$, $\mathcal{A}_n = \sigma(\mathcal{F}_n, Z_1, \cdots, Z_n)$, $\widetilde{S}_n(m) = \sum_{k=m+1}^{m+n}\widetilde{X}_k$ and $\widetilde{S}_n = \widetilde{S}_n(0)$. Then
$$\widetilde{\Sigma}_n := \sum_{k=1}^n \mathsf{E}\big[\widetilde{X}_k'\widetilde{X}_k \mid \mathcal{A}_{k-1}\big] = \sum_{k=1}^n I_k\sigma_k + \sum_{k=1}^n I_k^cT\operatorname{tr}(\sigma_k),$$
$$\widetilde{V}_n = \operatorname{tr}\big(\widetilde{\Sigma}_n\big) = \sum_{k=1}^n I_k\operatorname{tr}(\sigma_k) + \sum_{k=1}^n I_k^c\operatorname{tr}(T)\operatorname{tr}(\sigma_k) = \sum_{k=1}^n \operatorname{tr}(\sigma_k) = V_n$$
and
$$\widetilde{\Sigma}_n - T\widetilde{V}_n = \sum_{k=1}^n I_k\big(\sigma_k - T\operatorname{tr}(\sigma_k)\big).$$
First, we show that
$$\big\|\widetilde{\Sigma}_n - T\widetilde{V}_n\big\| \le \beta_n. \tag{2.1}$$
It is obvious that
$$\big\|\widetilde{\Sigma}_n - T\widetilde{V}_n\big\| \le \sum_{i=1}^n \|\sigma_i - T\operatorname{tr}(\sigma_i)\| = \beta_n \quad \text{if } n \le \nu_0.$$
Suppose $n > \nu_0$ and (2.1) holds for $1, 2, \cdots, n-1$. We shall show that (2.1) holds for $n$. We can find some $k \ge 0$ such that $\nu_k + 1 \le n \le \nu_{k+1}$. Notice that on $\{\beta_n \le \delta_n\}$, $\beta_m \le \beta_n \le \delta_n = \delta_m$ for all $\nu_k + 1 \le m \le n$, which implies $I_m = 1$ and $I_m^c = 0$ for $\nu_k + 1 \le m \le n$. So by induction,
$$\big\|\widetilde{\Sigma}_n - T\widetilde{V}_n\big\|I_n = \Big\|\widetilde{\Sigma}_{\nu_k} - T\widetilde{V}_{\nu_k} + \sum_{i=\nu_k+1}^{n}\big(\sigma_i - T\operatorname{tr}(\sigma_i)\big)\Big\|I_n \le \big\|\widetilde{\Sigma}_{\nu_k} - T\widetilde{V}_{\nu_k}\big\| + \beta_{\nu_k+1,n} \le \beta_{\nu_k} + \beta_{\nu_k+1,n} = \beta_n.$$
However, on $\{\beta_n > \delta_n\}$, $I_n\big(\sigma_n - T\operatorname{tr}(\sigma_n)\big) = 0$. So
$$\big\|\widetilde{\Sigma}_n - T\widetilde{V}_n\big\|I_n^c = \big\|\widetilde{\Sigma}_{n-1} - T\widetilde{V}_{n-1}\big\|I_n^c \le \beta_{n-1} \le \beta_n.$$
Hence (2.1) is proved.

Next, we show that
$$\big\|\widetilde{\Sigma}_n - T\widetilde{V}_n\big\| \le \delta_n \le 1 + f(V_n). \tag{2.2}$$
It is obvious that $\|\widetilde{\Sigma}_n - T\widetilde{V}_n\| = 0$ if $n \le \nu_0$. If $n > \nu_0$, assume (2.2) holds for $1, 2, \cdots, n-1$; then for $n$, we can find some $k \ge 0$ such that $\nu_k + 1 \le n \le \nu_{k+1}$. So
$$\big\|\widetilde{\Sigma}_n - T\widetilde{V}_n\big\|I_n \le \beta_nI\{\beta_n \le \delta_n\} \le \delta_n$$
by (2.1), and
$$\big\|\widetilde{\Sigma}_n - T\widetilde{V}_n\big\|I_n^c = \big\|\widetilde{\Sigma}_{n-1} - T\widetilde{V}_{n-1}\big\|I_n^c \le \delta_{n-1} \le \delta_n$$
by induction. (2.2) is now proved.

On the other hand, since $\|X_n\|^2 \le f(V_n)$ and $\operatorname{tr}(\sigma_n) = \mathsf{E}\big[\|X_n\|^2 \mid \mathcal{F}_{n-1}\big] \le f(V_n)$, we have
$$\big\|\widetilde{X}_n\big\|^2 \le f(V_n)I_n + I_n^cf(V_n)\big\|Z_nT^{1/2}\big\|^2 \le Cf(V_n) = Cf\big(\widetilde{V}_n\big). \tag{2.3}$$
By (2.2), (2.3) and Proposition 1, there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors independent of $T$ such that
$$\Big\|\sum_{m \ge 1}\widetilde{X}_mI\{V_m \le n\} - \sum_{m \le n}Y_mT^{1/2}\Big\| = O\big(n^{1/2}(f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{2.4}$$
Finally, notice that $\|\Sigma_n - TV_n\| = o\big(f(V_n)\big)$ a.s. by (1.10). It follows that $\alpha_n = o\big(f(V_n)\big)$ a.s. So, for $\nu_k + 1 \le n \le \nu_{k+1}$,
$$\beta_n \le \beta_{\nu_0} + \sum_{l=0}^{k-1}\beta_{\nu_l+1,\nu_{l+1}} + \beta_{\nu_k+1,n} \le \beta_{\nu_0} + 2\sum_{l=0}^{k-1}\alpha_{\nu_{l+1}} + 2\alpha_n = o(1)\sum_{l=1}^{k}f(V_{\nu_l}) + o(1)f(V_n) \quad \text{a.s.}$$
Also, noticing that $f(x)/x^\varepsilon$ is non-decreasing and $V_{\nu_l} \sim 2^l$, we have
$$\sum_{l=1}^{k}f(V_{\nu_l}) \le C\frac{f(V_n)}{V_n^\varepsilon}\sum_{l=1}^{k}V_{\nu_l}^\varepsilon \le C\frac{f(V_n)}{V_n^\varepsilon}\sum_{l=1}^{k}2^{l\varepsilon} \le C\frac{f(V_n)}{V_n^\varepsilon}2^{k\varepsilon} \le C\frac{f(V_n)}{V_n^\varepsilon}V_{\nu_k}^\varepsilon \le Cf(V_n).$$
On the other hand, since $f(x)(\log x)^\alpha/x$ is non-increasing, we have for $\nu_k + 1 \le n \le \nu_{k+1}$,
$$f(V_n) \le f(V_{\nu_{k+1}}) \le \frac{(\log V_{\nu_k})^\alpha/V_{\nu_k}}{(\log V_{\nu_{k+1}})^\alpha/V_{\nu_{k+1}}}f(V_{\nu_k}) = O(1)f(V_{\nu_k}) = O(\delta_n).$$
It follows that
$$\beta_n = o\big(f(V_n)\big) = o(\delta_n) \quad \text{a.s.}$$
Hence $\mathsf{P}\big(X_n \ne \widetilde{X}_n \ \text{i.o.}\big) = 0$. Therefore
$$\Big\|\sum_{m \ge 1}X_mI\{V_m \le n\} - \sum_{m \le n}Y_mT^{1/2}\Big\| = O\big(n^{1/2}(f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.}$$
The proof of Theorem 1.1 is now completed.
Proof of Theorem 1.2. Without loss of generality, we assume $\operatorname{tr}(T) = 1$. By (1.13), $V_n - n = o\big(f(n)\big)$ a.s., so Conditions (1.9) and (1.10) are satisfied (if necessary, we can replace $f(x)$ by $C_1f(C_2x)$). First, we assume
$$\|X_n\|^2 \le f(n) \quad \text{and} \quad \|\Sigma_n - nT\| \le f(n) \ \text{a.s.}, \qquad n \ge 1. \tag{2.5}$$
By Theorem 1.1, there exists a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors independent of $T$ such that (1.11) holds. That is,
$$\Big\|\sum_{m=1}^{\tau_n}X_m - \sum_{m \le n}Y_mT^{1/2}\Big\| = O\big(n^{1/2}(f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.}, \tag{2.6}$$
where $\tau_0 = 0$ and $\tau_k = \max\{n \ge \tau_{k-1}: V_n \le k\}$, $k \ge 1$. Next, we show that
$$\Big\|\sum_{m=\tau_n+1}^{n}X_m\Big\| + \Big\|\sum_{m=n+1}^{\tau_n}X_m\Big\| = O\big(n^{1/2}(f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.}, \tag{2.7}$$
which, together with (2.6), implies (1.14). Here $\sum_{m=i}^{j}(\cdot) = 0$ for $j < i$.

Notice that $V_n - n = o\big(f(n)\big)$ a.s. and $n \ge V_{\tau_n} \ge n - f(n+1)$. We have
$$\tau_n - n = O\big(f(n)\big) \quad \text{a.s.}$$
To prove (2.7), it suffices to show that for any $K \ge 1$,
$$\max_{|k| \le Kf(n)}\big\|S_{n+k} - S_n\big\| = O\big(n^{1/2}(f(n)/n)^{\frac{1}{50d}}\big) \quad \text{a.s.} \tag{2.8}$$
Notice that $\|X_n\|^2 \le f(n)$, and $|V_n - n| \le df(n)$ by (2.5). By the Rosenthal inequality, for $q \ge 2$ we have
$$\mathsf{P}\Big\{\max_{0 \le k \le Kf(n)}\|S_{n+k} - S_n\| \ge x\Big\} \le x^{-q}\,\mathsf{E}\Big[\max_{0 \le k \le Kf(n)}\|S_{n+k} - S_n\|^q\Big] \le Cx^{-q}\Big\{\mathsf{E}\big[(V_{n+Kf(n)} - V_n)^{q/2}\big] + \mathsf{E}\Big[\max_{n \le k \le n+Kf(n)}\|X_k\|^q\Big]\Big\} \le Cx^{-q}\Big\{\big(Kf(n) + df(n + Kf(n))\big)^{q/2} + \big(f(n + Kf(n))\big)^{q/2}\Big\} \le Cx^{-q}\big(f(n)\big)^{q/2},$$
and similarly,
$$\mathsf{P}\Big\{\max_{-Kf(n) \le k \le 0}\|S_{n+k} - S_n\| \ge x\Big\} \le Cx^{-q}\big(f(n)\big)^{q/2}.$$
Hence
$$\mathsf{P}\Big\{\max_{|k| \le Kf(n)}\|S_{n+k} - S_n\| \ge x\Big\} \le Cx^{-q}\big(f(n)\big)^{q/2}.$$
Let $0 < \varepsilon < 1/2$ and $F(n) = n^{1/2}\big(f(n)/n\big)^{(1-\varepsilon)/2}$, and notice that $f(n)/n \le c(\log n)^{-\alpha}$. It follows that for $q \ge 2$ large enough,
$$\mathsf{P}\Big\{\max_{2^m \le n \le 2^{m+1}}\max_{|k| \le Kf(n)}\|S_{n+k} - S_n\|/F(n) \ge 1\Big\} \le \mathsf{P}\Big\{\max_{n \le 2^{m+1}}\max_{|k| \le Kf(2^{m+1})}\|S_{n+k} - S_n\| \ge cF(2^{m+1})\Big\} \le \sum_{j \le 2^{m+1}/(Kf(2^{m+1}))}\mathsf{P}\Big\{\max_{|k| \le 2Kf(2^{m+1})}\big\|S_{jKf(2^{m+1})+k} - S_{jKf(2^{m+1})}\big\| \ge cF(2^{m+1})\Big\} \le C\frac{2^{m+1}}{f(2^{m+1})}\big(F(2^{m+1})\big)^{-q}\big(f(2^{m+1})\big)^{q/2} \le c\Big(\frac{f(2^{m+1})}{2^{m+1}}\Big)^{\varepsilon q/2 - 1} \le Cm^{-\alpha(\varepsilon q/2 - 1)} \le Cm^{-2},$$
which, together with the Borel–Cantelli lemma, implies that
$$\max_{|k| \le Kf(n)}\|S_{n+k} - S_n\| = O\big(n^{1/2}(f(n)/n)^{(1-\varepsilon)/2}\big) \quad \text{a.s.}$$
(2.8) is proved.

Finally, we remove Condition (2.5). Let
$$\overline{X}_k = X_kI\{\|X_k\|^2 \le f(k)\} - \mathsf{E}\big[X_kI\{\|X_k\|^2 \le f(k)\} \mid \mathcal{F}_{k-1}\big]$$
and $\widehat{X}_k = X_k - \overline{X}_k$ for each $k \ge 1$. Then, by Condition (1.12),
$$\sum_{k=1}^n\widehat{X}_k = O(1) \quad \text{a.s.}$$
and
$$\sum_{k=1}^n\mathsf{E}\big[\overline{X}_k'\overline{X}_k \mid \mathcal{F}_{k-1}\big] = \Sigma_n + \sum_{k=1}^n\mathsf{E}\big[\widehat{X}_k'\widehat{X}_k \mid \mathcal{F}_{k-1}\big] = \Sigma_n + O(1)\sum_{k=1}^n\mathsf{E}\big[\|X_k\|^2I\{\|X_k\|^2 \ge f(k)\} \mid \mathcal{F}_{k-1}\big] = \Sigma_n + o\big(f(n)\big) \quad \text{a.s.}$$
So, we can assume that $\|X_k\|^2 \le f(k)$, $k = 1, 2, \cdots$, for otherwise we can replace $X_k$ by $\overline{X}_k$ and $f(x)$ by $2f(x)$. Now, let $\{Z_k = (Z_{k1}, \cdots, Z_{kd}); k \ge 1\}$ be a sequence of i.i.d. $\mathbb{R}^d$-valued random vectors such that, for each $k$, $Z_{k1}, \cdots, Z_{kd}$ are also i.i.d. random variables with $\mathsf{P}(Z_{ki} = \pm 1) = 1/2$. Denote $\alpha_n = \max_{k \le n}\|\Sigma_k - kT\|$; let $\beta_1 = \|\sigma_1 - T\|$, $\delta_1 = 1$, $I_1 = I\{\beta_1 = 0\}$ and $I_1^c = 1 - I_1$; if $2^k < m \le 2^{k+1}$, $k \ge 0$, define $\delta_m = 1 + f(2^k)$,
$$\beta_{2^k+1,m} = \max_{2^k+1 \le l \le m}\Big\|\sum_{i=2^k+1}^{l}(\sigma_i - T)\Big\|, \qquad \beta_m = \beta_1 + \sum_{l=0}^{k-1}\beta_{2^l+1,2^{l+1}} + \beta_{2^k+1,m},$$
$$I_m = I\{\beta_m \le \delta_m\} = I\{\beta_m \le 1 + f(2^k)\} \quad \text{and} \quad I_m^c = 1 - I_m.$$
Let $\widetilde{X}_k = X_kI_k + I_k^cZ_kT^{1/2}$, $\mathcal{A}_n = \sigma(\mathcal{F}_n, Z_1, \cdots, Z_n)$, $\widetilde{S}_n(m) = \sum_{k=m+1}^{m+n}\widetilde{X}_k$ and $\widetilde{S}_n = \widetilde{S}_n(0)$. Then
$$\big\|\widetilde{X}_n\big\| \le I_n\|X_n\| + \big\|Z_nT^{1/2}\big\| \le \sqrt{f(n)} + C$$
and
$$\widetilde{\Sigma}_n := \sum_{k=1}^n\mathsf{E}\big[\widetilde{X}_k'\widetilde{X}_k \mid \mathcal{A}_{k-1}\big] = \sum_{k=1}^nI_k\sigma_k + \sum_{k=1}^nI_k^cT = nT + \sum_{k=1}^nI_k(\sigma_k - T).$$
Similarly to (2.1) and (2.2), we have
$$\big\|\widetilde{\Sigma}_n - nT\big\| \le \beta_n \quad \text{and} \quad \big\|\widetilde{\Sigma}_n - nT\big\| \le \delta_n \le 1 + f(n).$$
Also, for $2^k + 1 \le n \le 2^{k+1}$,
$$\beta_n \le \beta_1 + 2\sum_{l=0}^{k-1}\alpha_{2^{l+1}} + 2\alpha_n = o(1)\sum_{l=1}^{k}f(2^l) + o(1)f(n) = o\big(f(n)\big) = o(\delta_n) \quad \text{a.s.}$$
It follows that $\{\widetilde{X}_n, \mathcal{A}_n; n \ge 1\}$ satisfies (1.12), (1.13) and (2.5) (with $f(x)$ replaced by $f(x) + C$), and furthermore $\mathsf{P}\big(\widetilde{X}_n \ne X_n \ \text{i.o.}\big) = 0$. The proof of Theorem 1.2 is now completed.
Proof of Theorem 1.3. By Theorem 1.2, it suffices to show that $\|\Sigma_n - nT\|_1 = O(n^{1-\theta})$ implies $\|\Sigma_n - nT\| = o(n^{1-\theta^*})$ a.s. for some $\theta^* > 0$. Choose $p, q \ge 2$ such that $\theta p/2 > 1$ and $\theta p/q < 1$. Let $n_k = k^p$. Notice that for $n_k \le n \le n_{k+1}$ and any $u \in \mathbb{R}^d$ with $\|u\| = 1$,
$$u(\Sigma_n - nT)u' \le u\Sigma_{n_{k+1}}u' - n_k\,uTu' = u(\Sigma_{n_{k+1}} - n_{k+1}T)u' + (n_{k+1} - n_k)\,uTu'$$
and
$$u(nT - \Sigma_n)u' \le n_{k+1}\,uTu' - u\Sigma_{n_k}u' = u(n_kT - \Sigma_{n_k})u' + (n_{k+1} - n_k)\,uTu'.$$
It follows that
$$\max_{n_k \le n \le n_{k+1}}\|\Sigma_n - nT\|/n^{1-\theta/q} \le C\|\Sigma_{n_{k+1}} - n_{k+1}T\|/n_k^{1-\theta/q} + C\|\Sigma_{n_k} - n_kT\|/n_k^{1-\theta/q} + C(n_{k+1} - n_k)/n_k^{1-\theta/q} \le C\|\Sigma_{n_{k+1}} - n_{k+1}T\|/n_{k+1}^{1-\theta/q} + C\|\Sigma_{n_k} - n_kT\|/n_k^{1-\theta/q} + O\big(k^{p\theta/q-1}\big).$$
On the other hand,
$$\mathsf{P}\big(\|\Sigma_{n_k} - n_kT\| \ge n_k^{1-\theta/q}\big) \le \|\Sigma_{n_k} - n_kT\|_1/n_k^{1-\theta/q} \le Cn_k^{-\theta(1-1/q)} \le Ck^{-p\theta/2},$$
which is summable. It follows that
$$\max_{n_k \le n \le n_{k+1}}\|\Sigma_n - nT\|/n^{1-\theta/q} = O(1) \quad \text{a.s.}$$
The proof is now completed.
To prove Corollary 1.1, we need a lemma.

Lemma 2.1. Let $\{U_n = \sum_{k=1}^nY_k, \mathcal{F}_n; n \ge 1\}$ be an $\mathbb{R}^1$-valued martingale sequence. Assume that there exist two positive sequences $b_n$ and $K_n$ such that $b_n$ and $K_n$ are $\mathcal{F}_{n-1}$-measurable for each $n$, $K_n \to 0$ a.s., $\sum_{k=1}^n\mathsf{E}[Y_k^2 \mid \mathcal{F}_{k-1}] \le b_n$ a.s. and
$$|Y_n| \le K_n\sqrt{b_n/\log\log b_n} \quad \text{a.s.}$$
Then
$$\limsup_{n\to\infty}\frac{|U_n|}{\sqrt{2b_n\log\log b_n}} \le 1 \quad \text{a.s.}$$
Proof. See [12].

Now we begin the proof of Corollary 1.1. By Condition (1.16), without loss of generality we assume $0 < \theta < \varepsilon/2$ and $\|X_n\|^2 \le n^{1-\varepsilon} \le n^{1-\theta}$, $n \ge 1$. Let $\delta_n$, $\beta_n$, $\widetilde{X}_n$ and $\widetilde{S}_n$ be defined as in the proof of Theorem 1.2, where $f(x) = x^{1-\theta}$. Then there exist a $\kappa > 0$ and a sequence $\{Y_n; n \ge 1\}$ of i.i.d. $\mathbb{R}^d$-valued standard normal random vectors independent of $T$ such that
$$\big\|\widetilde{S}_n - B_nT^{1/2}\big\| = o(n^{1/2-\kappa}) \quad \text{a.s.}, \tag{2.9}$$
where $B_n = \sum_{k=1}^nY_k$. It is obvious that for any $2^k + 1 \le n \le 2^{k+1}$,
$$(n/2)^{1-\theta} \le \delta_n \le n^{1-\theta} \quad \text{and} \quad \alpha_n \le \beta_n \le \beta_1 + 2\sum_{l=0}^{k}\alpha_{2^l} \le 2(k+2)\alpha_n \le C\alpha_n\log n. \tag{2.10}$$
Let $p > 1$. Then for all $[n^{1/p}] + 1 \le m \le n$, on the event $\big\{\beta_n \le \frac14n^{(1-\theta)/p}\big\}$ we have
$$\beta_m \le \beta_n \le \tfrac14n^{(1-\theta)/p} \le \tfrac12\big([n^{1/p}] + 1\big)^{1-\theta} \le \delta_{[n^{1/p}]+1} \le \delta_m,$$
and then $I_m = 1$. It follows that
$$S_n\,I\big\{\beta_n \le \tfrac14n^{(1-\theta)/p}\big\} = S_{[n^{1/p}]}\,I\big\{\beta_n \le \tfrac14n^{(1-\theta)/p}\big\} + \sum_{k=[n^{1/p}]+1}^{n}\widetilde{X}_k\,I\big\{\beta_n \le \tfrac14n^{(1-\theta)/p}\big\}$$
and
$$S_n - B_nT^{1/2} = \widetilde{S}_n - B_nT^{1/2} + \big(S_{[n^{1/p}]} - \widetilde{S}_{[n^{1/p}]}\big)I\big\{\beta_n \le \tfrac14n^{(1-\theta)/p}\big\} + \big(S_n - \widetilde{S}_n\big)I\big\{\beta_n > \tfrac14n^{(1-\theta)/p}\big\}. \tag{2.11}$$
Choose $b_n = n\|T\| + \alpha_n$. Then for each $i = 1, \cdots, d$,
$$|X_{ni}| \le n^{(1-\varepsilon)/2} \le b_n^{(1-\varepsilon)/2} \quad \text{and} \quad \sum_{k=1}^n\mathsf{E}[X_{ki}^2 \mid \mathcal{F}_{k-1}] = nT_{ii} + \Big(\sum_{k=1}^n\mathsf{E}[X_{ki}^2 \mid \mathcal{F}_{k-1}] - nT_{ii}\Big) \le b_n,$$
where $X_{ni}$ is the $i$-th component of $X_n$ and $T_{ii}$ is the $(i,i)$-th element of $T$. By Lemma 2.1,
$$S_{ni} = O\big(\sqrt{b_n\log\log b_n}\big) = O\big(\sqrt{n\log\log n}\big) + O\big(\sqrt{\alpha_n\log\log\alpha_n}\big) \quad \text{a.s.}$$
So
$$\|S_n\| = O\big(\sqrt{n\log\log n}\big) + O\big(\sqrt{\alpha_n\log\log\alpha_n}\big) \quad \text{a.s.} \tag{2.12}$$
Also, by (2.9) and the law of the iterated logarithm for i.i.d. random variables,
$$\big\|\widetilde{S}_n\big\| = O\big(\sqrt{n\log\log n}\big) \quad \text{a.s.} \tag{2.13}$$
Noticing (2.10) and combining (2.9) and (2.11)–(2.13) yields
$$\big\|S_n - B_nT^{1/2}\big\| = O(n^{1/2-\kappa}) + O\big(n^{\frac{1}{2p}}\sqrt{\log\log n}\big) + O\big(\sqrt{\alpha_n\log\log\alpha_n}\big) + O\big(\sqrt{n\log\log n}\big)I\Big\{\alpha_n \ge \frac{n^{(1-\theta)/p}}{C\log n}\Big\} = O\big(n^{1/2-\kappa}\big) + O\big(n^{\frac{1}{2p}}\sqrt{\log\log n}\big) + O\Big(\alpha_n^{\frac12\left(\frac{p}{1-\theta}+\theta\right)}\Big).$$
Choosing $p = 1/(1-\theta)$, we then have
$$\frac{p}{1-\theta} + \theta = \frac{1}{(1-\theta)^2} + \theta \le (1+2\theta)^2 + \theta \le 1 + 7\theta,$$
and then
$$\big\|S_n - B_nT^{1/2}\big\| = O(n^{1/2-\kappa}) + o(n^{1/2-\theta/3}) + o\big(\alpha_n^{1/2+4\theta}\big) \quad \text{a.s.}$$
The proof is now completed.
3 Applications to Markov Chain Adaptive Designs

In clinical trials, adaptive designs are sequential designs in which treatments are selected according to the outcomes of previously chosen treatments. The goal is to assign more patients to better treatments. This kind of design has also been applied to bioassay, psychophysics, etc.
Consider a $d$-treatment clinical trial. Suppose that at stage $m$ the $m$-th patient is assigned to the $i$-th treatment; then the $(m+1)$-th patient will be assigned to the $j$-th treatment in accordance with certain probabilities, which depend on the response of the $m$-th patient. Denote this probability by $h_{ij}(m)$. After $n$ assignments, let $N_{ni}$ be the number of patients assigned to the $i$-th treatment, $i = 1, \cdots, d$; write $N_n = (N_{n1}, \cdots, N_{nd})$ and $H_n = \big(h_{ij}(n)\big)_{i,j=1}^d$. Obviously, $N_n\mathbf{1}' = N_{n1} + \cdots + N_{nd} = n$ and $H_n\mathbf{1}' = \mathbf{1}'$, where $\mathbf{1} = (1, \cdots, 1)$. Let $X_n = (X_{n1}, \cdots, X_{nd})$, where $X_{ni} = 1$ if the $n$-th patient is assigned to the $i$-th treatment, $i = 1, \cdots, d$, and denote by $e_i$ the vector whose $i$-th component is 1 and whose other components are 0, $i = 1, \cdots, d$. Then
$$N_n = \sum_{k=1}^nX_k \quad \text{and} \quad \mathsf{P}(X_{m+1} = e_j \mid X_m = e_i) = h_{ij}(m).$$
It follows that
$$\mathsf{E}[X_{m+1} \mid \mathcal{F}_m] = X_mH_m,$$
where $\mathcal{F}_m = \sigma(X_1, \cdots, X_m)$. Obviously, $\{X_n; n \ge 1\}$ is a Markov chain with transition probability matrices $H_n$. If $H_n = H$ for all $n$, then $\{X_n; n \ge 1\}$ is called homogeneous; otherwise, it is non-homogeneous. It is usually assumed that $H_n \to H$ a.s. for some non-random matrix $H$. It is obvious that $\lambda_1 = 1$ is an eigenvalue of $H$. Let $\lambda_2, \cdots, \lambda_d$ be the other $d-1$ eigenvalues of $H$, and let $\lambda = \max\{\operatorname{Re}(\lambda_2), \cdots, \operatorname{Re}(\lambda_d)\}$; notice that $|\lambda_i| \le 1$, $i = 1, \cdots, d$. We assume $\lambda < 1$. This condition is not a difficult one: for example, if $H$ is a regular transition probability matrix of a Markov chain, then $\lambda < 1$.
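To fix ideas, the following is a minimal simulation sketch (not from the paper) of the chain $\{X_n\}$; the function `make_H` below is a hypothetical stand-in for the design's transition matrices, returning here a fixed $2 \times 2$ matrix so that the chain is homogeneous.

```python
import numpy as np

# Minimal sketch of the design chain {X_n}: given that the m-th patient
# received treatment i, the next treatment is drawn from row i of H_m.
# With a fixed H, the proportions N_n / n approach the left eigenvector v.
def make_H(m: int) -> np.ndarray:
    return np.array([[0.7, 0.3],
                     [0.4, 0.6]])        # rows sum to 1: H 1' = 1'

def simulate(n: int, d: int, rng: np.random.Generator) -> np.ndarray:
    N = np.zeros(d, dtype=int)           # N_n = sum_{k<=n} X_k
    i = int(rng.integers(d))             # arbitrary initial treatment
    for m in range(1, n + 1):
        N[i] += 1
        i = int(rng.choice(d, p=make_H(m)[i]))  # P(X_{m+1}=e_j | X_m=e_i) = h_ij(m)
    return N

rng = np.random.default_rng(2)
print(simulate(100_000, 2, rng) / 100_000)   # ~ v = (4/7, 3/7) for this H
```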
Theorem 3.1. In a probability space in which there exists a $d$-dimensional standard Brownian motion $\{B_t\}$, we can redefine the sequence $\{X_n\}$ without changing its distribution such that
$$\big\|N_n - nv - B_n\Sigma^{1/2}\big\| = o\big(n^{1/2-\kappa}\big) + O\Big(\sum_{k=1}^n\|H_k - H\|\Big) \quad \text{a.s.} \tag{3.1}$$
for some $\kappa > 0$, where $v = (v_1, \cdots, v_d)$ is the left eigenvector corresponding to the largest eigenvalue $\lambda_1 = 1$ of $H$ with $v_1 + \cdots + v_d = 1$,
$$\Sigma = (I - \widetilde{H}')^{-1}\big(\operatorname{diag}(v) - H'\operatorname{diag}(v)H\big)(I - \widetilde{H})^{-1} \quad \text{and} \quad \widetilde{H} = H - \mathbf{1}'v.$$
Remark 3.1. In particular, if
$$\sum_{k=1}^n\|H_k - H\| = o(n^{1/2}) \quad \text{a.s.}, \tag{3.2}$$
then we have the asymptotic normality:
$$n^{-1/2}(N_n - nv) \xrightarrow{\ D\ } N(\mathbf{0}, \Sigma). \tag{3.3}$$

Remark 3.2. In the case $d = 2$, write $h_{11} = \alpha$ and $h_{22} = \beta$. Then it can be checked that
$$H = \begin{pmatrix} \alpha & 1-\alpha \\ 1-\beta & \beta \end{pmatrix}, \qquad v = \Big(\frac{1-\beta}{2-\alpha-\beta}, \frac{1-\alpha}{2-\alpha-\beta}\Big),$$
and
$$\Sigma_1 = \frac{(1-\alpha)(1-\beta)(\alpha+\beta)}{2-\alpha-\beta}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad \Sigma = \frac{(1-\alpha)(1-\beta)(\alpha+\beta)}{(2-\alpha-\beta)^3}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$
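These $d = 2$ formulas can be verified numerically. The following sketch (not from the paper; the values of $\alpha$ and $\beta$ are arbitrary) checks that $v$ is the stationary left eigenvector of $H$ and that $\Sigma = (I - \widetilde{H}')^{-1}\Sigma_1(I - \widetilde{H})^{-1}$ reproduces the displayed $\Sigma$.

```python
import numpy as np

# Numerical check of Remark 3.2: v H = v, and Sigma equals Sigma_1
# sandwiched between the (I - H~)^{-1} factors from Theorem 3.1.
alpha, beta = 0.7, 0.6
H = np.array([[alpha, 1 - alpha],
              [1 - beta, beta]])
v = np.array([1 - beta, 1 - alpha]) / (2 - alpha - beta)
print(np.allclose(v @ H, v))                      # v is the left eigenvector

c = (1 - alpha) * (1 - beta) * (alpha + beta)
J = np.array([[1.0, -1.0], [-1.0, 1.0]])
Sigma1 = c / (2 - alpha - beta) * J
Sigma = c / (2 - alpha - beta) ** 3 * J

Ht = H - np.outer(np.ones(2), v)                  # H~ = H - 1'v
M = np.linalg.inv(np.eye(2) - Ht)                 # (I - H~)^{-1}
print(np.allclose(M.T @ Sigma1 @ M, Sigma))       # Sigma as in Theorem 3.1
```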
Before proving the theorem, we first give some examples. Let $p_i(n) = \mathsf{P}(\text{"success"} \mid X_{ni} = 1)$ and $q_i(n) = 1 - p_i(n)$, $i = 1, 2, \cdots, d$. Assume that $p_i(n) \to p_i$, $i = 1, 2, \cdots, d$, and let $q_i = 1 - p_i$.

Example 3.1. Suppose the $m$-th patient is assigned to the $i$-th treatment. If the response of the $m$-th patient is "success", then the $(m+1)$-th patient is assigned to the same treatment $i$; if the response of the $m$-th patient is "failure", then the $(m+1)$-th patient is assigned to each of the other $d-1$ treatments with probability $\frac{1}{d-1}$. It is easily seen that
$$H_n = \begin{pmatrix} p_1(n) & q_1(n)/(d-1) & \cdots & q_1(n)/(d-1) \\ q_2(n)/(d-1) & p_2(n) & \cdots & q_2(n)/(d-1) \\ \cdots & \cdots & \cdots & \cdots \\ q_d(n)/(d-1) & q_d(n)/(d-1) & \cdots & p_d(n) \end{pmatrix}, \qquad H = \begin{pmatrix} p_1 & q_1/(d-1) & \cdots & q_1/(d-1) \\ q_2/(d-1) & p_2 & \cdots & q_2/(d-1) \\ \cdots & \cdots & \cdots & \cdots \\ q_d/(d-1) & q_d/(d-1) & \cdots & p_d \end{pmatrix},$$
$$v = \bigg(\frac{1/q_1}{\sum_{j=1}^d 1/q_j}, \frac{1/q_2}{\sum_{j=1}^d 1/q_j}, \cdots, \frac{1/q_d}{\sum_{j=1}^d 1/q_j}\bigg), \tag{3.4}$$
and $\|H_n - H\| \le C\sum_{i=1}^d|p_i(n) - p_i|$. So, by Theorem 3.1, if $0 < p_i < 1$, $i = 1, \cdots, d$, then (3.1) becomes
$$\big\|N_n - nv - B_n\Sigma^{1/2}\big\| = o\big(n^{1/2-\kappa}\big) + O\Big(\sum_{k=1}^n\sum_{i=1}^d|p_i(k) - p_i|\Big) \quad \text{a.s.} \tag{3.5}$$
Example 3.2. Suppose the $m$-th patient is assigned to the $i$-th treatment. If the response of the $m$-th patient is "success", then the $(m+1)$-th patient is assigned to the same treatment $i$. If the response of the $m$-th patient is "failure", then the $(m+1)$-th patient is assigned to the $(i+1)$-th treatment, where the $(d+1)$-th treatment means the 1st treatment. This assignment scheme is called the cyclic play-the-winner (PWC) rule [6]. It is easily seen that
$$H_n = \begin{pmatrix} p_1(n) & q_1(n) & 0 & \cdots & 0 \\ 0 & p_2(n) & q_2(n) & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ q_d(n) & 0 & \cdots & \cdots & p_d(n) \end{pmatrix}, \qquad H = \begin{pmatrix} p_1 & q_1 & 0 & \cdots & 0 \\ 0 & p_2 & q_2 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ q_d & 0 & \cdots & \cdots & p_d \end{pmatrix},$$
$v$ is the same as in (3.4), and $\|H_n - H\| \le C\sum_{i=1}^d|p_i(n) - p_i|$. So, (3.5) holds whenever $0 < p_i < 1$, $i = 1, \cdots, d$.
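A quick simulation sketch of the PWC rule (not from the paper; the success probabilities below are illustrative) shows the proportions $N_n/n$ settling near the $v$ of (3.4):

```python
import numpy as np

# Cyclic play-the-winner rule of Example 3.2: stay on treatment i after a
# success, move to treatment i+1 (mod d) after a failure.
rng = np.random.default_rng(3)
p = np.array([0.8, 0.6, 0.5])            # hypothetical p_i, 0 < p_i < 1
d, n = len(p), 200_000

N = np.zeros(d, dtype=int)
i = 0
for _ in range(n):
    N[i] += 1
    if rng.random() >= p[i]:             # "failure": cycle to next treatment
        i = (i + 1) % d

q = 1.0 - p
v = (1.0 / q) / np.sum(1.0 / q)          # limiting proportions from (3.4)
print(np.round(N / n, 3), np.round(v, 3))
```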
The designs in the above two examples give the same limiting proportions of patients assigned to each treatment as the design proposed by Wei and Durham [15] for the two-treatment case and by Wei [14] for the general case. But it is known that, for Wei's design, the asymptotic normality holds only when the condition $\max\{\operatorname{Re}(\lambda_2), \cdots, \operatorname{Re}(\lambda_d)\} \le 1/2$ is satisfied. Such a condition is not easy to check when $d \ge 3$.
Example 3.3. Assume that $p_i(n) = p_i$ for all $n$ and $i$. Suppose the $m$-th patient is assigned to the $i$-th treatment. If the response of the $m$-th patient is "success", then the $(m+1)$-th patient is assigned to the same treatment $i$. When the response of the $m$-th patient is "failure", our purpose is to assign the $(m+1)$-th patient to the best one among the other $d-1$ treatments. For this purpose, we use $\widehat{p}_{mi} = \frac{S_{mi}+1}{N_{mi}+1}$ to estimate $p_i$ and write $\widehat{q}_{mi} = 1 - \widehat{p}_{mi}$, where $S_{mi}$ is the number of "successes" among those $N_{mi}$ patients on treatment $i$ in the first $m$ assignments, $i = 1, \cdots, d$.

Now, when the response of the $m$-th patient on treatment $i$ is "failure", the $(m+1)$-th patient is assigned to the one of the other $d-1$ treatments for which $\widehat{p}_{mj}$ ($j \ne i$) is the largest. If there is more than one treatment for which $\widehat{p}_{mj}$ ($j \ne i$) is the largest, the $(m+1)$-th patient is assigned to each of these treatments with the same probability. So $h_{ii}(m) = p_i$ and
$$h_{ij}(m) = q_i\,I\big\{\widehat{p}_{mj} = \max_{l\ne i}\widehat{p}_{ml}\big\}\Big/\#\big\{t : \widehat{p}_{mt} = \max_{l\ne i}\widehat{p}_{ml}\big\}.$$
To ensure that each treatment is tested by enough patients, i.e., $N_{ni} \to \infty$ a.s., $i = 1, \cdots, d$, we replace $h_{ij}(m)$ by
$$h^*_{ij}(m) = \Big(1 - \frac{1}{m}\Big)h_{ij}(m) + \frac{1}{md}.$$
Then we can show that $N_{ni} \to \infty$ a.s. and $\widehat{p}_{ni} \to p_i$ a.s., $i = 1, \cdots, d$ (see the note at the end of this section). Without loss of generality, we assume that $p_1 > p_2 > p_3 \ge \cdots \ge p_d$. Then almost surely there exists an $n_0$ such that for all $n \ge n_0$, $\max_{l\ne 1}\widehat{p}_{nl} = \widehat{p}_{n2}$, and $\max_{l\ne i}\widehat{p}_{nl} = \widehat{p}_{n1}$ for $i \ne 1$. It follows that
$$\|H_n - H\| = O\Big(\frac{1}{n}\Big) \quad \text{a.s.},$$
where $H_n = \big(h^*_{ij}(n)\big)$ and
$$H = \begin{pmatrix} p_1 & q_1 & 0 & \cdots & 0 \\ q_2 & p_2 & 0 & \cdots & 0 \\ q_3 & 0 & p_3 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ q_d & 0 & 0 & \cdots & p_d \end{pmatrix}.$$
It is easy to check that
$$v = \Big(\frac{q_2}{q_1+q_2}, \frac{q_1}{q_1+q_2}, 0, \cdots, 0\Big) \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{22} & 0 \\ 0 & 0 \end{pmatrix},$$
where
$$\Sigma_{22} = \frac{q_1q_2(p_1+p_2)}{(q_1+q_2)^3}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} =: \sigma^2\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$
According to Theorem 3.1, we can define a one-dimensional standard Brownian motion $\{B_t\}$ such that
$$N_{n1} - n\frac{q_2}{q_1+q_2} - \sigma B_n = o(n^{1/2-\kappa}) \quad \text{a.s.},$$
$$N_{n2} - n\frac{q_1}{q_1+q_2} + \sigma B_n = o(n^{1/2-\kappa}) \quad \text{a.s.},$$
$$N_{ni} = o(n^{1/2-\kappa}) \quad \text{a.s.,} \quad i = 3, \cdots, d.$$
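As a concrete check on these limiting proportions, here is a simulation sketch (not from the paper; the $p_i$ are illustrative, and ties among the estimates are broken by taking the first maximizer rather than randomizing, a simplification of the rule above):

```python
import numpy as np

# Example 3.3 sketch: stay after a success; after a failure, move to the
# treatment j != i with the largest estimate p^_mj = (S_mj+1)/(N_mj+1),
# mixed with the uniform 1/(md) perturbation defining h*_ij(m).
rng = np.random.default_rng(4)
p = np.array([0.8, 0.7, 0.5, 0.4])
d, n = len(p), 100_000

N = np.zeros(d); S = np.zeros(d)
i = 0
for m in range(1, n + 1):
    N[i] += 1
    success = rng.random() < p[i]
    S[i] += success
    if success:
        j = i                              # stay on the same treatment
    else:
        phat = (S + 1.0) / (N + 1.0)       # p^_mj estimates
        phat[i] = -np.inf                  # exclude the current treatment
        j = int(np.argmax(phat))           # best of the others (first maximizer)
    # h*_ij(m) = (1 - 1/m) h_ij(m) + 1/(md): with prob 1/m pick uniformly
    i = int(rng.integers(d)) if rng.random() < 1.0 / m else j

q = 1.0 - p
print(np.round(N / n, 3))                  # ~ (q2/(q1+q2), q1/(q1+q2), 0, 0)
print(np.round([q[1] / (q[0] + q[1]), q[0] / (q[0] + q[1])], 3))
```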
Remark 3.3. When $d = 2$, the designs in Examples 3.1–3.3 all reduce to the play-the-winner (PW) rule proposed by Zelen [16]. In Example 3.3, the limiting proportions of patients assigned to the best two treatments are the same as those in the PW rule, and the other treatments can be neglected.
There are many other classes of adaptive designs. As for asymptotic properties, one can refer to Bai and Hu [1] and Bai, Hu and Zhang [2] for urn-model-type adaptive designs, and to Eisele and Woodroofe [4] and Hu and Zhang [7] for doubly adaptive biased coin designs. For more discussion, one can refer to Rosenberger and Lachin [11].
Proof of Theorem 3.1. Set $X_0 = 0$. Let $Z_n = X_n - \mathsf{E}[X_n \mid \mathcal{F}_{n-1}]$ and $M_n = \sum_{k=1}^nZ_k$, $n \ge 1$. Then
$$X_n = Z_n + X_{n-1}H_{n-1} = Z_n + X_{n-1}H + X_{n-1}(H_{n-1} - H) = Z_n + (X_{n-1} - v)H + v + X_{n-1}(H_{n-1} - H) = Z_n + (X_{n-1} - v)\widetilde{H} + v + X_{n-1}(H_{n-1} - H),$$
since $vH = v$ and $(X_{n-1} - v)\mathbf{1}' = 1 - 1 = 0$. So
$$\sum_{k=2}^n(X_k - v) = \sum_{k=2}^nZ_k + \sum_{k=1}^{n-1}(X_k - v)\widetilde{H} + \sum_{k=1}^{n-1}X_k(H_k - H) = \sum_{k=2}^nZ_k + \sum_{k=1}^{n}(X_k - v)\widetilde{H} + \sum_{k=1}^{n}X_k(H_k - H) - (X_n - v)\widetilde{H} - X_n(H_n - H).$$
Thus
$$\sum_{k=1}^n(X_k - v) = M_n + \sum_{k=1}^n(X_k - v)\widetilde{H} + \sum_{k=1}^nX_k(H_k - H) + \mathsf{E}[X_1 \mid \mathcal{F}_0] - \mathsf{E}[X_{n+1} \mid \mathcal{F}_n].$$
It follows that
$$(N_n - nv)(I - \widetilde{H}) = M_n + \sum_{k=1}^nX_k(H_k - H) + \mathsf{E}[X_1 \mid \mathcal{F}_0] - \mathsf{E}[X_{n+1} \mid \mathcal{F}_n] =: M_n + R_{n1}.$$
Notice that the real parts of all eigenvalues of $I - \widetilde{H}$ are greater than 0, so $(I - \widetilde{H})^{-1}$ exists. Then
$$N_n - nv = (M_n + R_{n1})(I - \widetilde{H})^{-1}. \tag{3.6}$$
Obviously,
$$\|R_{n1}\| \le \sum_{k=1}^n\|H_k - H\| + 2.$$
On the other hand,
$$\begin{aligned}\mathsf{E}[Z_n'Z_n \mid \mathcal{F}_{n-1}] &= \mathsf{E}[X_n'X_n \mid \mathcal{F}_{n-1}] - \big(\mathsf{E}[X_n \mid \mathcal{F}_{n-1}]\big)'\,\mathsf{E}[X_n \mid \mathcal{F}_{n-1}] \\ &= \mathsf{E}[\operatorname{diag}(X_n) \mid \mathcal{F}_{n-1}] - H_{n-1}'X_{n-1}'X_{n-1}H_{n-1} \\ &= \operatorname{diag}(X_{n-1}H_{n-1}) - H_{n-1}'\operatorname{diag}(X_{n-1})H_{n-1} \\ &= \operatorname{diag}(X_{n-1}H) - H'\operatorname{diag}(X_{n-1})H + \operatorname{diag}\big(X_{n-1}(H_{n-1} - H)\big) - H_{n-1}'\operatorname{diag}(X_{n-1})(H_{n-1} - H) - (H_{n-1} - H)'\operatorname{diag}(X_{n-1})H \\ &= \operatorname{diag}(vH) - H'\operatorname{diag}(v)H - H'\operatorname{diag}(X_{n-1} - v)H + \operatorname{diag}\big((X_{n-1} - v)H\big) + r_{n2} \\ &=: \Sigma_1 - H'\operatorname{diag}(X_{n-1} - v)H + \operatorname{diag}\big((X_{n-1} - v)H\big) + r_{n2},\end{aligned}$$
where $\Sigma_1 = \operatorname{diag}(v) - H'\operatorname{diag}(v)H$ and $\|r_{n2}\| \le C\|H_{n-1} - H\|$. It follows that
$$\begin{aligned}\Big\|\sum_{k=1}^n\mathsf{E}[Z_k'Z_k \mid \mathcal{F}_{k-1}] - n\Sigma_1\Big\| &\le \big\|H'\operatorname{diag}(N_{n-1} - nv)H\big\| + \big\|\operatorname{diag}\big((N_{n-1} - nv)H\big)\big\| + \sum_{k=1}^n\|r_{k2}\| \\ &\le \big\|H'\operatorname{diag}\big[M_{n-1}(I - \widetilde{H})^{-1}\big]H\big\| + \big\|\operatorname{diag}\big[M_{n-1}(I - \widetilde{H})^{-1}H\big]\big\| + C\|R_{n-1,1}\| + \sum_{k=1}^n\|r_{k2}\| + C \\ &\le C\|M_{n-1}\| + C\sum_{k=1}^n\|H_k - H\| + C.\end{aligned} \tag{3.7}$$
Write
$$\alpha_n = \max_{m \le n}\Big\|\sum_{k=1}^m\mathsf{E}[Z_k'Z_k \mid \mathcal{F}_{k-1}] - m\Sigma_1\Big\|. \tag{3.8}$$
Obviously, $\alpha_n \le Cn$. Notice that $\|Z_n\| \le 2$. By Corollary 1.1, for any $0 < \delta < 1$ there exist a $\kappa > 0$ and a $d$-dimensional standard Brownian motion $\{B_t\}$ such that
$$\big\|M_n - B_n\Sigma_1^{1/2}\big\| = o(n^{1/2-\kappa}) + O\big(\alpha_n^{1/2+\delta}\big) \quad \text{a.s.} \tag{3.9}$$
By (3.9) and the law of the iterated logarithm for Brownian motion,
$$\|M_n\| = O(n^{1/2+\delta}) \quad \text{a.s.},$$
which, together with (3.7) and (3.8), implies that
$$\alpha_n = O(n^{1/2+\delta}) + O\Big(\sum_{k=1}^n\|H_k - H\|\Big) \quad \text{a.s.}$$
From (3.9) again, it follows that
$$\big\|M_n - B_n\Sigma_1^{1/2}\big\| = o(n^{1/2-\kappa}) + O(n^{1/4+\delta/2}) + O\Big(\Big(\sum_{k=1}^n\|H_k - H\|\Big)^{1/2+\delta}\Big) = o(n^{1/2-\kappa}) + O\Big(\sum_{k=1}^n\|H_k - H\|\Big) \quad \text{a.s.},$$
which, together with (3.6), implies
$$\big\|N_n - nv - B_n\Sigma^{1/2}\big\| = o(n^{1/2-\kappa}) + O\Big(\Big(\sum_{k=1}^n\|H_k - H\|\Big)^{1/2+\delta}\Big) + O\Big(\sum_{k=1}^n\|H_k - H\|\Big) = o(n^{1/2-\kappa}) + O\Big(\sum_{k=1}^n\|H_k - H\|\Big) \quad \text{a.s.}$$
The proof is now completed.
A note on Example 3.3. We shall show that $\widehat{p}_{ni} \to p_i$ a.s., $i = 1, \cdots, d$. Notice that for each $i$,
$$\mathsf{P}(X_{n+1,i} = 1 \mid X_{n,j} = 1) = h^*_{ji}(n) \ge \frac{1}{nd}, \qquad j = 1, \cdots, d.$$
It follows that
$$\sum_{n=1}^{\infty}\mathsf{P}(X_{n+1,i} = 1 \mid \mathcal{F}_n) \ge \sum_{n=1}^{\infty}\frac{1}{nd} = \infty,$$
which, together with the extended Borel–Cantelli lemma, implies that $N_{ni} \to \infty$ a.s. Then, by Lemma A.4 of [8], one has $\widehat{p}_{ni} \to p_i$ a.s., $i = 1, \cdots, d$.
References
[1] Bai, Z.D., Hu, F. Asymptotic theorems for urn models with nonhomogeneous generating matrices. Stochastic Process. Appl., 80: 87–101 (1999)
[2] Bai, Z.D., Hu, F., Zhang, L.X. The Gaussian approximation theorems for urn models and their applications.
Ann. Appl. Probab., 12: 1149–1173 (2002)
[3] Eberlein, E. On strong invariance principles under dependence. Ann. Probab., 14: 260–270 (1986)
[4] Eisele, J., Woodroofe, M. Central limit theorems for doubly adaptive biased coin designs. Ann. Statist.,
23: 234–254 (1995)
[5] Hall, P., Heyde, C.C. Martingale Limit Theory and its Applications. Academic Press, London, 1980
[6] Hoel, D.G., Sobel, M. Comparison of sequential procedures for selecting the best binomial population.
Proc. Sixth. Berkeley Symp. Math. Statist. and Probabilities, 4: 53–69 (1972)
[7] Hu, F., Zhang, L.X. Asymptotic properties of doubly adaptive biased coin designs for multi-treatment
clinical trials. Ann. Statist., 32: 268–301 (2004)
[8] Monrad, D., Philipp, W. The problem of embedding vector-valued martingales in a Gaussian process.
Teor. Veroyatn. Primen., 35: 384–387 (1990)
[9] Monrad, D., Philipp, W. Nearby variables with nearby conditional laws and a strong approximation theorem
for Hilbert space valued martingales. Probab. Theory Relat. Fields, 88: 381–404 (1991)
[10] Morrow, G.J., Philipp, W. An almost sure invariance principle for Hilbert space valued martingales. Trans.
Amer. Math. Soc., 273: 231–251 (1982)
[11] Rosenberger, W.F., Lachin, J.M. Randomization in Clinical Trials: Theory and Practice. Wiley, New
York, 2002
[12] Stout, W.F. Almost Sure Convergence. Academic Press, New York, 1974
[13] Strassen, V. Almost sure behavior of sums of independent random variables and martingales. Proc. Fifth Berkeley Symp. Math. Stat. Prob., II(1): 315–343 (1967)
[14] Wei, L.J. The generalized Polya’s urn design for sequential medical trials. Ann. Statist., 7: 291–296 (1979)
[15] Wei, L.J., Durham, S. The randomized play-the-winner rule in medical trials. J. Amer. Statist. Assoc.,
73: 840–843 (1978)
[16] Zelen, M. Play the winner rule and the controlled clinical trial. J. Amer. Statist. Assoc., 64: 131–146
(1969)