Test of mean difference in longitudinal data based on block resampling

advertisement
Test of mean difference in longitudinal data
based on block resampling
Hirohito Sakurai1 and Masaaki Taguri2
1
2
Division of Computer Science, Hokkaido University, Sapporo 060–0814, JAPAN
sakurai@main.eng.hokudai.ac.jp
Research Division, National Center for University Entrance Examinations,
Tokyo 153–8501, JAPAN taguri@rd.dnc.ac.jp
Summary. This paper proposes a testing method for detecting the difference of
two means or mean curves in longitudinal data using a block resamplingapproach
with permutation analogy For the detection of mean difference, we consider four
types of test statistics. Monte Carlo simulations are carried out in order to examine
the sizes and powers of the proposed test.
Key words: block resampling, longitudinal data
1 Introduction
1
2
Suppose that there are two samples given by {Yi (t)}qi=1
and {Xj (t)}qj=1
for t =
1, . . . , n, and assume that they are mutually independent, where q1 and q2 are numbers of subjects, and n is the number of observed points. We also assume that, for
fixed t, Yi (t) and Xj (t) are independent over q1 and q2 subjects, respectively. Then
we consider the model
(
Yi (t) = p1 (t) + εi (t),
Xj (t) = p2 (t) + ηj (t),
i = 1, . . . , q1 ,
j = 1, . . . , q2 ,
(1)
where p1 (t) and p2 (t) are unknown regression functions, and εi (t) and ηj (t) are the
error terms having means 0 and finite variances. More general formulations for (1)
can be found, for example, in [HH90], [FS04], and references therein. Our problem
is then to test
(
H0 : p1 (t) = p2 (t) for all t,
H1 : p1 (t) 6= p2 (t) for some t,
(2)
where H0 and H1 are the null and alternative hypotheses, respectively.
For example, Fig. 1 shows the wind velocity data which are obtained by an
artificial satellite (left panel) and a radar (right panel) on the earth, and are measured
at altitudes from 80 km to 90 km every 1 km (11 subjects) during (n =)13 days.
For these data, we want to know whether the mean behavior of the two devices in
1088
Hirohito Sakurai and Masaaki Taguri
100 120
80
60
0
20
40
Wind Velocity (m/s)
80
60
40
0
20
Wind Velocity (m/s)
100 120
measuring wind velocity is equal or not. Then the problem is formulated as (2) and
the significant difference between them is detected by some methods, which is briefly
explained in Section 4.
0
2
4
6
8
10 12 14
0
2
4
Day
6
8
10 12 14
Day
Fig. 1. Wind velocity data (left: satellite, right: radar)
Several methods on the comparison of two means or regression curves have been
proposed, and most of them assume that the error terms are independent and identically distributed (i.i.d.), whose distributions are normal. However, when we analyze
a real dataset, it may be unrealistic. If we cannot assume the normality, the nonparametric approach, for example by [BY96], is available. Another possible approach
would be an application of resampling.
In some settings of dependent data, for example, in time series analysis, the
naive bootstrap usually fails to capture the dependent structure of data because of
ignoring the order of observations. To overcome this problem, two kinds of bootstrap
methods are proposed. One is a model-based approach which resamples from approximately i.i.d. residuals, and the other is a nonparametric, purely model-free bootstrap
scheme, which resamples from blocks of observations; see [B02], [HHK03], [L03], and
references therein.
This paper concerns with the case where longitudinal datafrom two groups are
given, and proposes a testing method for detecting the difference between two means
or two mean curves in longitudinal data based on block resamplingpproach. As seen
in the numerical examinations given below, our approach is superior to [BY96] in
power.
The rest of the paper is constructed as follows. Section 2 proposes a testing procedure which generates the resamples corresponding to two samples by permutation
analogy and calculates a p-value (achieved significance level). In order to investigate
the sizes and powers of the proposed testing method, Monte Carlo simulations are
carried out in Section 3, and some concluding remarks are summarized in Section 4.
2 Testing Method
There are some approaches to detecting the difference between two mean curves,
p1 (t) and p2 (t), in (1). In this paper, our interest concentrates on the behavior of
four types of test statistics given below. The following statistic is proposed by [HH90]:
Test of mean difference in longitudinal data based on block resampling
2n−1 j+g
X X
Sn = Sn (D1 , . . . , Dn ) = 4
j=0
P
t=j+1
!2 3 " n−1
X (Dt+1 − Dt )2 #−1
5
n
Dt
,
2
t=1
1089
(3)
P
where Dt = Yt − Xt for t = 1, . . . , n or Dt = Yt−n − Xt−n for t = n + 1, . . . , n + g,
1
2
Yt = qi=1
Yi (t)/q1 , Xt = qj=1
Xj (t)/q2 , g = [np] is the integer part of np, and p is
a tuning constant satisfying 0 < p < 1 which is determined by the fully data-driven
approach; the second approach described in [HH90, pp.1043–1044]. The statistic (3)
is essentially based on kernel estimators of p1 (t) and p2 (t). As another type of test
statistics, we can consider
T1n = T1n (D1 , . . . , Dn ) =
n
X
|Dt |,
(4)
Dt2 .
(5)
t=1
n
T2n = T2n (D1 , . . . , Dn ) =
X
t=1
R
In addition to (3), (4) and (5), we here also focus attention on area-difference
|p1 (t) − p2 (t)| dt, and then consider the following test statistic T3n =
A =
T3n (D1 , . . . , Dn ) as an estimator of A constructed by the trapezoidal rule with linear
interpolations of adjacent observation values:
T3n = T3n (D1 , . . . , Dn ) =
1
2
X
n−1
(|Dt | + |Dt+1 |)I{Dt Dt+1 ≥ 0}
t=1
+
1
2
X |Dt |2 + |Dt+1|2
n−1
t=1
|Dt | + |Dt+1 |
I{Dt Dt+1 < 0},
(6)
where I{·} is the indicator function.
The values of Sn and Trn (r = 1, 2, 3) may be small when H0 is true, and
large when H0 is false. Therefore, the above four statistics enable us to measure the
discrepancy between p1 (t) and p2 (t).
In this section, we propose a nonparametric testing method for the problem
(2) using (3), (4), (5) and (6). We call our testing procedure stated below “Mixed
Moving Block Resampling (MMBR) Test.” The main ideas of our testing method
are that we generate blocks of observationsn each sample similar to the moving
block bootstrap( [K89]), and draw resamples without replacement from the mixed
blockshich are constructed by two samples. This is motivated from the technique
that can reflect the null hypothesis by resampling from a combined sample. The idea
of combining observations of two samples and drawing resamples with replacement
from the combined sample is previously considered by [BJV89] and [WT96]. The
former is the test of homogeneity of scale, and the latter is that of equality of two
means. For the mean difference in longitudinal data, [ST05] proposes the test which
generates the resamples with replacement from mixed blocks.
For simplicity, let T be a generic notation for Sn , T1n , T2n or T3n . For a given significance level α, the proposed testing algorithm together with Monte Carlo method
is described as follows.
1. Calculate tobs = T (Y, X) = T (D1 , . . . , Dn ).
2. Put Cy,t = Yt − Ȳ and Cx,t = Xt − X̄, where Ȳ =
n
t=1 Xt /n.
P
Pn
t=1
Yt /n and X̄ =
1090
Hirohito Sakurai and Masaaki Taguri
3. Divide {Cy,1 , . . . , Cy,n } and {Cx,1 , . . . , Cx,n } into k(= n − l + 1) successive
overlapping blocks with each length l, and put the collection of blocks ξy =
{ξy,1 , . . . , ξy,k } and ξx = {ξx,1 , . . . , ξx,k }, where ξy,t = {Cy,t , . . . , Cy,t+l−1 } and
ξx,t = {Cx,t , . . . , Cx,t+l−1 } (t = 1, . . . , k), respectively.
4. Combine ξy and ξx , and put ξpooled = {ξy,1 , . . . , ξy,k , ξx,1 , . . . , ξx,k }.
∗b
∗b
∗b
∗b
5. Draw 2m blocks, ξy∗b = {ξy,1
, . . . , ξy,m
} and ξx∗b = {ξx,1
, . . . , ξx,m
}, without
∗b
∗b
∗b
replacement from ξpooled to obtain resamples Y = {Y1 , . . . , Yn } and X ∗b =
{X1∗b , . . . , Xn∗b } (b = 1, . . . , B), where m = n/l (if [n/l] is an integer) or m =
[n/l] + 1 (otherwise), and [n/l] is the integer part of a real n/l.
6. Calculate t∗b = T (Y ∗b , X ∗b ) = T (D1∗b , . . . , Dn∗b ).
7. Repeating steps 5 and 6 an appropriate number of times B, calculate
t∗1 , . . . , t∗B .
8. From steps 1 and 7, approximate the achieved significance levelby ASL =
B
∗b
≥ tobs }/B, and reject H0 when ASL ≤ α.
b=1 I{t
P
d
d
Note that if l = 1, this reduces to permutation test.
3 Numerical Examination
We conduct Monte Carlo simulations to investigate the size and power properties
of our testing procedure proposed in Section 2. The step 5 of the testing algorithm
generates resamples without replacement, however we also make the comparisons
of resampling algorithms between “with” and “without” replacement in order to
examine the difference of the results. Further, we conduct Bowman and Young’s
test for unpaired data ( [BY96]; hereafter termed “BY” for short) in order to clarify
the properties of our test. In the level and power studies, the nominal level is α = 0.05
and 0.10. All our results are based on independent 2000 pairs of two samples, {Yi (t)}
and {Xj (t)}, where B = 2000 replications of resampling are applied to every two
samples in our test. Naturally, the same initial samples are used for the comparisons.
We generate initial samples according to (1) whose means are specified by p1 (t) =
0 and p2 (t) = c, where c = 0, 0.2, 0.4, 0.6, 0.8, 1.0. The case of c = 0 or c 6= 0
corresponds to the null hypothesis or the alternative hypothesis being true. The
values, q1 , q2 and n, are (q1 , q2 ) = (10, 10), (10, 20), (10, 30), (20, 20), (20, 30), (30, 30)
and n = 10. The size of n may be small, however it is useful in practice to examine
such a behavior of similar settings for the real data described in Section 1. As for
the error terms εi (t) and ηj (t), if p1 (t) and p2 (t) explain most of the correlation
structure contained in the data, then we may consider that the errors are nearly
i.i.d., or very weak dependency exists in εi (t) and ηj (t). Thus, we choose the following
Gaussian AR(1) errors, εi (t) = φεi (t − 1) + zi(t) and ηj (t) = φηj (t − 1) + zj (t), where
i.i.d.
i.i.d.
zi (t) ∼ N (0, τ12 ), zj (t) ∼ N (0, τ22 ), φ = 0, ±0.1, ±0.2, τ12 = τ22 = (1−φ2 )V(εi (t)),
and V(εi (t)) = 1, 2, 3, 4, 5. The computation has been carried out for all combinations
of these parameters, however, to save space, we restrict ourselves to discussing the
case of α = 0.05 and V(εi (t)) = 3.
In the MMBR test, it contains a parameter l, namely block length, to be estimated. Since it is preferable that the empirical level is nearly equal to the nominal
level α, our choice of l is then the length where the empirical level is close to α. If
there are some candidates which have the same level errors, we make the conservative choice, viz., we choose the block length such that the empirical level is less than
Test of mean difference in longitudinal data based on block resampling
1091
the nominal level. Further if there are some candidates whose empirical levels are
equal, we select the length where the empirical power is the highest among them.
The resulting block lengths are given in Table 1.
Table 1. Optimum block length for α = 0.05 and V(εi (t)) = 3
(q1 , q2 ) = (10, 10)
(q1 , q2 ) = (10, 20)
(q1 , q2 ) = (10, 30)
φ −0.2 −0.1 0 0.1 0.2 −0.2 −0.1 0 0.1 0.2 −0.2 −0.1 0 0.1 0.2
(a) T1
5
4 2 1 1
5
5 2 1 1
5
5 3 2 1
T2
4
3 2 1 1
5
4 3 1 1
5
4 3 1 1
T3
4
2 1 3 3
4
2 1 3 3
5
3 1 1 2
Sn
2
2 1 1 2
2
2 1 1 2
2
2 1 1 1
(b) T1
5
4 4 4 2
5
5 3 3 3
5
5 2 3 3
T2
4
2 3 3 3
5
2 3 3 3
5
4 3 2 3
T3
4
2 4 4 3
2
1 3 3 4
5
2 3 3 3
Sn
2
1 1 1 2
2
2 1 1 2
2
1 1 1 1
(q1 , q2 ) = (20, 20)
(q1 , q2 ) = (20, 30)
(q1 , q2 ) = (30, 30)
φ −0.2 −0.1 0 0.1 0.2 −0.2 −0.1 0 0.1 0.2 −0.2 −0.1 0 0.1 0.2
(a) T1
5
4 3 1 1
5
5 3 2 2
5
5 3 2 1
T2
5
3 2 1 1
5
4 3 1 1
5
3 2 1 1
T3
4
3 1 2 2
4
3 1 2 2
4
3 1 1 2
Sn
3
2 1 1 3
3
1 1 1 3
2
2 1 2 2
(b) T1
5
4 2 2 2
5
5 3 3 3
4
4 2 2 2
T2
5
2 3 3 3
5
2 2 3 3
4
3 2 2 2
T3
4
3 3 3 3
4
2 3 3 3
4
2 2 2 3
Sn
2
2 1 1 2
3
1 1 1 1
2
1 1 1 2
(a): Resampling with replacement,
(b): Resampling without replacement
Now, let us first summarize the results of the level studies. The empirical level
of BY and our tests are given in Table 2. BY test seems to have a tendency to keep
the nominal level, however it underestimates the nominal level except for the case of
(q1 , q2 ) = (10, 10). We have also observed that the empirical levels of Trn (r = 1, 2, 3)
and Sn keep the nominal level α, and that Trn (r = 1, 2, 3) needs longer block length
than Sn to keep the nominal level as is shown in Table 1. Most of the resulting block
length for Sn is 1 or 2, while those for T1n , T2n and T3n are greater than or equal
to 3. On the other hand, the comparison of resampling methods between “with” and
“without” replacement shows the similar behavior. For φ < 0, both methods have
a tendency to keep the nominal level, while the level error for φ ≥ 0 seems to be
slightly larger when resampling is done without replacement.
Next, we discuss the power studies. Since we found similar tendencies among the
six cases of (q1 , q2 ), we show the results for (q1 , q2 ) = (20, 20) with φ = 0, 0.1, −0.2.
The empirical power seems to be affected by the numbers of q1 and q2 rather than
the variance of noise. The increase of noise variance causes the decrease of empirical
1092
Hirohito Sakurai and Masaaki Taguri
Table 2. Empirical level (α = 0.05 and V(εi (t)) = 3)
(q1 , q2 )
φ
(10, 10) −0.2
−0.1
0
0.1
0.2
(10, 20) −0.2
−0.1
0
0.1
0.2
(10, 30) −0.2
−0.1
0
0.1
0.2
(20, 20) −0.2
−0.1
0
0.1
0.2
(20, 30) −0.2
−0.1
0
0.1
0.2
(30, 30) −0.2
−0.1
0
0.1
0.2
T1n
0.049
0.048
0.050
0.058
0.077
0.040
0.049
0.051
0.051
0.071
0.036
0.045
0.050
0.055
0.066
0.041
0.050
0.051
0.052
0.070
0.037
0.048
0.049
0.056
0.076
0.036
0.047
0.053
0.053
0.071
with replacement
T2n
T3n
Sn
0.050 0.047 0.059
0.046 0.048 0.064
0.053 0.062 0.054
0.060 0.099 0.075
0.084 0.129 0.092
0.050 0.046 0.054
0.047 0.057 0.062
0.050 0.053 0.053
0.054 0.089 0.078
0.070 0.126 0.097
0.045 0.052 0.047
0.049 0.044 0.056
0.048 0.055 0.048
0.050 0.084 0.068
0.072 0.116 0.089
0.050 0.046 0.054
0.043 0.050 0.051
0.051 0.058 0.051
0.062 0.086 0.075
0.082 0.115 0.096
0.046 0.045 0.050
0.051 0.047 0.038
0.051 0.059 0.058
0.057 0.084 0.077
0.078 0.114 0.100
0.045 0.048 0.045
0.046 0.056 0.053
0.045 0.057 0.059
0.061 0.084 0.077
0.075 0.114 0.092
without replacement
T1n
T2n
T3n
Sn
0.046 0.051 0.044 0.063
0.046 0.050 0.052 0.034
0.054 0.064 0.074 0.057
0.069 0.083 0.095 0.076
0.092 0.112 0.129 0.095
0.038 0.046 0.046 0.059
0.049 0.046 0.056 0.068
0.045 0.055 0.068 0.056
0.059 0.069 0.090 0.080
0.092 0.092 0.126 0.103
0.035 0.045 0.049 0.053
0.046 0.048 0.049 0.035
0.049 0.053 0.064 0.048
0.067 0.073 0.095 0.068
0.089 0.092 0.122 0.094
0.038 0.049 0.044 0.048
0.048 0.051 0.050 0.054
0.057 0.060 0.071 0.053
0.070 0.081 0.093 0.076
0.094 0.104 0.120 0.100
0.035 0.047 0.045 0.054
0.046 0.052 0.048 0.038
0.051 0.064 0.063 0.059
0.062 0.072 0.087 0.078
0.083 0.092 0.117 0.103
0.036 0.043 0.045 0.051
0.044 0.049 0.049 0.042
0.052 0.060 0.069 0.060
0.065 0.076 0.095 0.083
0.087 0.104 0.126 0.095
BY
0.067
0.056
0.054
0.054
0.056
0.037
0.038
0.037
0.038
0.037
0.041
0.043
0.042
0.042
0.042
0.021
0.021
0.025
0.025
0.025
0.029
0.029
0.025
0.025
0.026
0.022
0.022
0.019
0.019
0.020
power as a whole, however the power properties among the five variances given
above are nearly the same each other. Thus, we choose and discuss the case where
V(εi (t)) = 3. Fig. 2 shows the empirical powers corresponding to the four test
statistics, (3) – (6), and BY test. The figure shows that the empirical power of T3n is
most powerful among them, and that the relationship among powers corresponding
to our and BY tests is given by T3n ≥ T2n ≥ T1n ≥ Sn ≥ BY. This indicates the
numerical superiority of our test using the four statistics in power. Especially, the
numerical superiority of T3n in power can be confirmed from Fig. 2. As the number of
subjects increases, the empirical power is improved. Within the values of 0 ≤ c ≤ 1,
the empirical power of T1n is nearly equal to that of T2n , however T2n is slightly
Test of mean difference in longitudinal data based on block resampling
1093
0.6
0.8
1.0
0.2
1.0
0.8
0.4
1.0
1.0
1.0
0.4
0.6
0.8
1.0
0.6
0.8
1.0
T_{1n}
T_{2n}
T_{3n}
Sn
BY
0.8
power
0.6
0.0
0.2
power
0.8
T_{1n}
T_{2n}
T_{3n}
Sn
BY
0.0
0.4
0.2
(iii) φ = −0.2
0.2
0.4
0.0
0.2
0.0
c
0.4
1.0
0.8
T_{1n}
T_{2n}
T_{3n}
Sn
BY
0.6
0.8
(ii) φ = 0.1
0.2
power
0.6
c
(i) φ = 0
0.0
0.6
power
0.2
0.0
c
0.6
0.4
0.4
0.2
0.0
0.2
0.0
0.0
T_{1n}
T_{2n}
T_{3n}
Sn
BY
0.4
1.0
0.8
T_{1n}
T_{2n}
T_{3n}
Sn
BY
0.6
power
0.6
0.4
0.0
0.2
power
0.8
T_{1n}
T_{2n}
T_{3n}
Sn
BY
0.4
1.0
higher than T1n . The results of “with” and “without” resampling were nearly equal
to each other.
0.0
c
(iv) φ = 0
0.2
0.4
0.6
c
(v) φ = 0.1
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
c
(vi) φ = −0.2
Fig. 2. Empirical power for (q1 , q2 ) = (20, 20)
For each combination of q1 and q2 , the panels (i)–(iii) are obtained by resampling
with replacement and the ones (iv)–(vi) are by resampling without replacement.
4 Concluding Remarks
In this paper we have proposed the testing method for two means in longitudinal data
based on the block resampling technique with permutation analogy. Our numerical
studies indicate the applicability of MMBR test for weakly dependent data even
when the sample size is very small. In some cases the effectiveness of application of
block resampling could be confirmed. Applying our test with every possible block
length to the data given in Fig. 1, we have observed that there is a possibility of
the significant difference between the satellite and radar in measuring wind velocity,
which is also supported by the result of [BY96]. However, the problem on block
length selection in the block resampling is very important, and the development of
a fully data-driven approach to selecting block length in MMBR test will be needed
for practical data analyses.
References
[BJV89]
Boos, D., Janssen, P. and Veraverbeke, N.: Resampling from centered
data in the two sample problem. J. Statist. Plan. Inf., 21, 327–345 (1989)
1094
[BY96]
Hirohito Sakurai and Masaaki Taguri
Bowman, A. and Young, S.: Graphical comparison of nonparametric
curves. Appl. Statist., 45, 83–98 (1996)
[B02]
Bühlmann, P.: Bootstraps for time series. Statist. Sci., 17, 52–72 (2002)
[FS04]
Ferreira, E. and Stute, W.: Testing for differences between conditional
means in a time series context. J. Amer. Statist. Assoc., 99, 169–174
(2004)
[HH90]
Hall, P. and Hart, J. D.: Bootstrap test for difference between means in
nonparametric regression. J. Amer. Statist. Assoc., 85, 1039–1049 (1990)
[HHK03] Härdle, W., Horowitz, J. and Kreiss, J.-P.: Bootstrap methods for time
series. Internat. Statist. Rev., 71, 435–459 (2003)
[K89]
Künsch, H. R.: The jackknife and the bootstrap for general stationary
observations. Ann. Statist., 17, 1217–1241 (1989)
[L03]
Lahiri, S. N.: Resampling Methods for Dependent Data. Springer, New
York (2003)
[ST05]
Sakurai, H. and Taguri, M.: Test for difference of two means in longitudinal data using moving block bootstrap. Proc. 5th IASC Asian Conference
on Statistical Computing, 139–142 (2005)
[WT96] Wang, J. and Taguri, M.: Bootstrap method — an introduction from a
two sample problem (in Japanese). Proc. Inst. Statist. Math., 44, 3–18
(1996)
Download