An asymptotic two dependent samples test of
equality of means of fuzzy random variables
González-Rodríguez, Gil¹, Colubi, Ana¹, Gil, Angeles M.¹ and D’Urso, Pierpaolo²
¹ Dpto. de Estadística e I.O. Universidad de Oviedo. 33007 Oviedo. Spain.
{gil,colubi,magil}@uniovi.es
² Dipartimento di Scienze Economiche, Gestionali e Sociali. Università degli Studi
del Molise. 86100 Campobasso. Italy. durso@unimol.it
Summary. In this paper, we present an asymptotic procedure to test the equality of (fuzzy) mean values of two fuzzy random variables measured on the same
population. We propose a test statistic and we obtain its asymptotic distribution
under the null hypothesis. Since the limit distribution is unknown, we suggest ways
for estimating it and we develop an asymptotic testing procedure on this basis. We
show some simulation results for different dependence degrees between the random
elements which suggest that moderate/large sample sizes are required. Finally, we
present an illustrative example of the practical application of the test.
Key words: fuzzy random variable, equality of means test, dependent samples
1 Introduction
Fuzzy random variables in Puri & Ralescu’s sense [PR86] are employed to manage
random experiments whose outcomes are associated with imprecise values, in such
a way that randomness is modeled through a probability space and the imprecise
values are described by means of fuzzy sets. The expected value of a fuzzy random
variable (see [PR86]) plays an important role in this context, and it is justified by
means of the strong law of large numbers (see, for instance, [CDLG99]). For this
reason, one of the first efforts in developing statistical inference with fuzzy random
variables has been focused on testing hypotheses about the expected value (see, for instance,
[MCCG04], [Kor00] or [GMGCC06]). Effective bootstrap-based methods for testing
hypotheses about the means of fuzzy random variables are gathered in [GMCG06].
As a first step in dealing with dependent samples, in this paper we present
an asymptotic procedure to test the equality of (fuzzy) mean values of two fuzzy
random variables measured on the same population. In Section 2 we include the
preliminaries concerning fuzzy random variables. In Section 3 we introduce the test
and the statistic and we obtain its asymptotic distribution under the null hypothesis.
Since the limit distribution is unknown, we suggest a method for estimating it and
we develop an asymptotic testing procedure on this basis. In Section 4 we show some
simulation results and, finally, in Section 5, we present an illustrative example of the
practical application of the test.
2 Preliminaries
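Let $\mathcal{K}_c(\mathbb{R}^p)$ be the class of the nonempty compact convex subsets of $\mathbb{R}^p$ endowed with
the Minkowski sum and the product by a scalar, that is, $A + B = \{a + b \mid a \in A,\ b \in B\}$
and $\lambda A = \{\lambda a \mid a \in A\}$ for all $A, B \in \mathcal{K}_c(\mathbb{R}^p)$ and $\lambda \in \mathbb{R}$. We will consider the class
of fuzzy sets
\[ \mathcal{F}_c(\mathbb{R}^p) = \bigl\{ U : \mathbb{R}^p \to [0, 1] \;\big|\; U_\alpha \in \mathcal{K}_c(\mathbb{R}^p) \text{ for all } \alpha \in [0, 1] \bigr\}, \]
where $U_\alpha$ is the $\alpha$-level of $U$ (i.e. $U_\alpha = \{x \in \mathbb{R}^p \mid U(x) \geq \alpha\}$) for all $\alpha \in (0, 1]$,
and $U_0$ is the closure of the support of $U$. The space $\mathcal{F}_c(\mathbb{R}^p)$ can be endowed with
the sum and the product by a scalar based on Zadeh’s extension principle [Zad75],
which satisfies that $(U + V)_\alpha = U_\alpha + V_\alpha$ and $(\lambda U)_\alpha = \lambda U_\alpha$ for all $U, V \in \mathcal{F}_c(\mathbb{R}^p)$,
$\lambda \in \mathbb{R}$ and $\alpha \in [0, 1]$.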
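For the one-dimensional case $p = 1$, the level-wise arithmetic above can be illustrated numerically. The following is only a minimal sketch and is not taken from the paper: it assumes that a fuzzy number is stored as an array of interval endpoints $[\inf U_\alpha, \sup U_\alpha]$ on a finite grid of $\alpha$-levels, and the names `ALPHAS`, `triangular`, `fuzzy_sum` and `fuzzy_scale` are hypothetical.

```python
import numpy as np

# Assumed finite grid of alpha-levels (the paper works with all alpha in [0, 1]).
ALPHAS = np.linspace(0.0, 1.0, 101)

def triangular(left, center, right, alphas=ALPHAS):
    """Alpha-levels of a triangular fuzzy number with support [left, right]
    and peak at center, as an array of shape (len(alphas), 2) = [inf, sup]."""
    lo = left + alphas * (center - left)
    hi = right - alphas * (right - center)
    return np.column_stack([lo, hi])

def fuzzy_sum(U, V):
    """Level-wise Minkowski sum of intervals: (U + V)_alpha = U_alpha + V_alpha."""
    return U + V

def fuzzy_scale(lam, U):
    """Level-wise product by a scalar: (lam U)_alpha = lam U_alpha.
    A negative scalar swaps the roles of the two endpoints."""
    scaled = lam * U
    return scaled if lam >= 0 else scaled[:, ::-1]

U = triangular(0.0, 1.0, 3.0)
# 0-level of U + (-1)U: an interval around 0, not the single point {0}.
zero_level = fuzzy_sum(U, fuzzy_scale(-1.0, U))[0]
```

In this sketch `zero_level` is the interval $[-3, 3]$, which shows that adding $(-1)U$ to $U$ does not give the crisp value $0$.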
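The support function of a fuzzy set $U \in \mathcal{F}_c(\mathbb{R}^p)$ is $s_U(u, \alpha) = \sup_{w \in U_\alpha} \langle u, w \rangle$ for
any $u \in \mathbb{S}^{p-1}$ and $\alpha \in [0, 1]$, where $\mathbb{S}^{p-1}$ is the unit sphere in $\mathbb{R}^p$ and $\langle \cdot, \cdot \rangle$ denotes
the inner product. The support function allows us to embed $\mathcal{F}_c(\mathbb{R}^p)$ onto a cone of
the continuous and Lebesgue integrable functions $\mathcal{L}(\mathbb{S}^{p-1})$ by means of the mapping
$s : \mathcal{F}_c(\mathbb{R}^p) \to \mathcal{L}(\mathbb{S}^{p-1} \times [0, 1])$, where $s(U) = s_U$ (see [DK94]).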
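In the one-dimensional case the unit sphere reduces to $\mathbb{S}^0 = \{-1, 1\}$, so that $s_U(1, \alpha) = \sup U_\alpha$ and $s_U(-1, \alpha) = -\inf U_\alpha$. The sketch below, which is an illustration under that assumption and not part of the paper, evaluates the support function for fuzzy numbers stored as arrays of $[\inf U_\alpha, \sup U_\alpha]$ and checks numerically that it turns the level-wise sum into an ordinary sum of functions. The function name `support_function` and the example values are hypothetical.

```python
import numpy as np

def support_function(U):
    """Support function of a fuzzy number for p = 1.

    U is an array of shape (n_alpha, 2) holding [inf U_alpha, sup U_alpha].
    Column 0 of the result is s_U(-1, alpha) = -inf U_alpha and
    column 1 is s_U(+1, alpha) = sup U_alpha.
    """
    U = np.asarray(U, dtype=float)
    return np.column_stack([-U[:, 0], U[:, 1]])

# Two fuzzy numbers on the alpha grid {0, 0.5, 1} (illustrative values only).
U = np.array([[0.0, 3.0], [0.5, 2.0], [1.0, 1.0]])
V = np.array([[-1.0, 1.0], [-0.5, 0.5], [0.0, 0.0]])

# Additivity: s_{U+V} = s_U + s_V, with U + V computed level-wise.
additive = np.allclose(support_function(U + V),
                       support_function(U) + support_function(V))
```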
We will consider the generalized metric by Körner and Näther [KN02], which is
defined so that
\[ [D_K(U, V)]^2 = \int_{(\mathbb{S}^{p-1})^2 \times [0,1]^2} \bigl(s_U(u, \alpha) - s_V(u, \alpha)\bigr) \bigl(s_U(v, \beta) - s_V(v, \beta)\bigr) \, dK(u, \alpha, v, \beta), \]
for all $U, V \in \mathcal{F}_c(\mathbb{R}^p)$, where $K$ is a positive definite and symmetric kernel; thus $D_K$
coincides with the generic $L^2$ distance $\|\cdot\|_2$ on the Banach space $\mathcal{L}(\mathbb{S}^{p-1} \times [0, 1])$.
Let $(\Omega, \mathcal{A}, P)$ be a probability space. A Fuzzy Random Variable (FRV) in Puri &
Ralescu’s sense [PR86] is a mapping $\mathcal{X} : \Omega \to \mathcal{F}_c(\mathbb{R}^p)$ so that the $\alpha$-level mappings
$\mathcal{X}_\alpha : \Omega \to \mathcal{K}_c(\mathbb{R}^p)$, defined so that $\mathcal{X}_\alpha(\omega) = (\mathcal{X}(\omega))_\alpha$ for all $\omega \in \Omega$, are random sets
(that is, Borel-measurable mappings with the Borel $\sigma$-field generated by the topology
associated with the well-known Hausdorff metric $d_H$ on $\mathcal{K}(\mathbb{R}^p)$). Alternatively, an
FRV is an $\mathcal{F}_c(\mathbb{R}^p)$-valued random element (i.e. a Borel-measurable mapping) when
the Skorokhod metric is considered on $\mathcal{F}_c(\mathbb{R}^p)$ (see [CDLR02]).
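If $\mathcal{X} : \Omega \to \mathcal{F}_c(\mathbb{R}^p)$ is a fuzzy random variable such that $d_H(\{0\}, \mathcal{X}_0) \in
L^1(\Omega, \mathcal{A}, P)$, then the expected value (or mean) of $\mathcal{X}$ is the unique $E(\mathcal{X}) \in \mathcal{F}_c(\mathbb{R}^p)$
such that $(E(\mathcal{X}))_\alpha$ is Aumann’s integral (see [Aum65]) of the random set $\mathcal{X}_\alpha$ for all $\alpha \in [0, 1]$,
that is,
\[ (E(\mathcal{X}))_\alpha = \bigl\{ E(X \mid P) \;\big|\; X : \Omega \to \mathbb{R}^p,\ X \in L^1(\Omega, \mathcal{A}, P),\ X \in \mathcal{X}_\alpha \text{ a.s. } [P] \bigr\}. \]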
3 Two dependent samples asymptotic test for fuzzy
random variables
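Let $(\mathcal{X}, \mathcal{Y}) : \Omega \to \mathcal{F}_c(\mathbb{R}^p) \times \mathcal{F}_c(\mathbb{R}^p)$ be a two-dimensional fuzzy random variable and
let $(\mathcal{X}_1, \mathcal{Y}_1), \ldots, (\mathcal{X}_n, \mathcal{Y}_n)$ be a random sample from $(\mathcal{X}, \mathcal{Y})$, that is, $\{(\mathcal{X}_i, \mathcal{Y}_i)\}_{i=1}^n$ are
independent random elements distributed as $(\mathcal{X}, \mathcal{Y})$.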
We consider the problem of testing $H_0 : E(\mathcal{X}) = E(\mathcal{Y})$ against $H_1 : E(\mathcal{X}) \neq
E(\mathcal{Y})$. When real-valued data are available, the usual statistic for the two dependent
sample problem is based on the sample mean of the difference of the paired data.
Since the difference based on Zadeh’s extension principle does not satisfy that $A -
A = 0$ and Hukuhara’s difference is not well-defined for all elements in $\mathcal{F}_c(\mathbb{R}^p)$, we
consider a statistic based on the distance between the sample means
$D_K(\overline{\mathcal{X}}_n, \overline{\mathcal{Y}}_n)$, where $\overline{\mathcal{X}}_n = \sum_{i=1}^n \mathcal{X}_i / n$ and $\overline{\mathcal{Y}}_n = \sum_{i=1}^n \mathcal{Y}_i / n$. By following the ideas
in [Kor00] and in [GMCG06], we can take advantage of the isometry induced by the
support function to get the following asymptotic result:
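Theorem 1. Let $(\Omega, \mathcal{A}, P)$ be a probability space, $(\mathcal{X}, \mathcal{Y}) : \Omega \to \mathcal{F}_c(\mathbb{R}^p) \times \mathcal{F}_c(\mathbb{R}^p)$
be a two-dimensional fuzzy random element and $\{(\mathcal{X}_i, \mathcal{Y}_i)\}_{i=1}^n$ be a family of independent
random elements distributed as $(\mathcal{X}, \mathcal{Y})$. If $D_K^2(\mathcal{X}, \mathcal{Y}) \in L^1(\Omega, \mathcal{A}, P)$ is
nondegenerate, then $\sqrt{n}\, D_K\bigl(\overline{\mathcal{X}}_n + E(\mathcal{Y}), \overline{\mathcal{Y}}_n + E(\mathcal{X})\bigr)$ converges in distribution to
$\|Z\|_2$, where $Z$ is a Gaussian variable on $\mathcal{L}(\mathbb{S}^{p-1} \times [0, 1])$ with mean 0 and covariance function
$C(u, \alpha, v, \beta) = \mathrm{Cov}\bigl((s_{\mathcal{X}} - s_{\mathcal{Y}})(u, \alpha), (s_{\mathcal{X}} - s_{\mathcal{Y}})(v, \beta)\bigr)$ for all
$(u, \alpha), (v, \beta) \in \mathbb{S}^{p-1} \times [0, 1]$.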
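Proof: It is easy to check that
\[ \sqrt{n}\, D_K\bigl(\overline{\mathcal{X}}_n + E(\mathcal{Y}), \overline{\mathcal{Y}}_n + E(\mathcal{X})\bigr) = \left\| \sqrt{n} \left[ \frac{1}{n} \sum_{i=1}^n (s_{\mathcal{X}_i} - s_{\mathcal{Y}_i}) - E(s_{\mathcal{X}} - s_{\mathcal{Y}}) \right] \right\|_2 . \]
Consequently, since $0 < E(\|s_{\mathcal{X}} - s_{\mathcal{Y}}\|_2^2) = E(D_K^2(\mathcal{X}, \mathcal{Y})) < +\infty$ we can apply the
CLT in $\mathcal{L}(\mathbb{S}^{p-1} \times [0, 1])$ (see, for instance, [LR79]) to obtain the result. □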
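The identity used in the proof can be written out step by step. The derivation below is a sketch that relies on three standard properties of support functions which the proof uses implicitly: additivity ($s_{U+V} = s_U + s_V$), positive homogeneity ($s_{\lambda U} = \lambda s_U$ for $\lambda \geq 0$) and the fact that the support function of the expected value is the expected value of the support function ($s_{E(\mathcal{X})} = E(s_{\mathcal{X}})$).

```latex
\begin{align*}
s_{\overline{\mathcal{X}}_n + E(\mathcal{Y})} - s_{\overline{\mathcal{Y}}_n + E(\mathcal{X})}
  &= \bigl(s_{\overline{\mathcal{X}}_n} + s_{E(\mathcal{Y})}\bigr)
   - \bigl(s_{\overline{\mathcal{Y}}_n} + s_{E(\mathcal{X})}\bigr) \\
  &= \frac{1}{n}\sum_{i=1}^{n} \bigl(s_{\mathcal{X}_i} - s_{\mathcal{Y}_i}\bigr)
   - \bigl(E(s_{\mathcal{X}}) - E(s_{\mathcal{Y}})\bigr) \\
  &= \frac{1}{n}\sum_{i=1}^{n} \bigl(s_{\mathcal{X}_i} - s_{\mathcal{Y}_i}\bigr)
   - E\bigl(s_{\mathcal{X}} - s_{\mathcal{Y}}\bigr).
\end{align*}
```

Taking the norm $\|\cdot\|_2$ and multiplying by $\sqrt{n}$ gives the expression in the proof. Under $H_0$ the terms $s_{E(\mathcal{X})}$ and $s_{E(\mathcal{Y})}$ cancel, so the left-hand side has norm $D_K(\overline{\mathcal{X}}_n, \overline{\mathcal{Y}}_n)$.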
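On the basis of Theorem 1, we could obtain a consistent asymptotic testing
procedure along the lines of the one in [Kor00]. However, the limit distribution depends
on the covariance function $C$, which is unknown. To overcome this difficulty we can
approximate $\|Z\|_2$ by means of $\|Z^*\|_2$, where $Z^*$ is a Gaussian variable on $\mathcal{L}(\mathbb{S}^{p-1} \times
[0, 1])$ with mean 0 and covariance function
\[ C_n(u, \alpha, v, \beta) = \frac{1}{n} \sum_{i=1}^n (s^*_{\mathcal{X}_i} - s^*_{\mathcal{Y}_i})(u, \alpha)\, (s^*_{\mathcal{X}_i} - s^*_{\mathcal{Y}_i})(v, \beta) \]
for all $(u, \alpha), (v, \beta) \in \mathbb{S}^{p-1} \times [0, 1]$, where $s^*_{\mathcal{X}_i} = s_{\mathcal{X}_i} - s_{\overline{\mathcal{X}}_n}$ and, analogously,
$s^*_{\mathcal{Y}_i} = s_{\mathcal{Y}_i} - s_{\overline{\mathcal{Y}}_n}$.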
To obtain the exact distribution of $\|Z^*\|_2$ is a difficult task, but there are several
ways in the literature to approximate it, namely,
1st Approach) If we consider the particular distance
\[ [d_2(U, V)]^2 = \int_{\mathbb{S}^{p-1} \times [0,1]} \bigl(s_U(u, \alpha) - s_V(u, \alpha)\bigr)^2 \, du \, d\alpha = \int_{[0,1]} \left[ \frac{1}{2} (\inf U_\alpha - \inf V_\alpha)^2 + \frac{1}{2} (\sup U_\alpha - \sup V_\alpha)^2 \right] d\alpha, \]
the distribution of $\|Z^*\|_2^2$ is equal to that of $\sum_{k=1}^{\infty} \lambda_k \xi_k^2$, where $\lambda_1 \geq \lambda_2 \geq \ldots$ are the
eigenvalues of $C_n$ and $\xi_1, \xi_2, \ldots$ are independent random variables with distribution
$N(0, 1)$ (see the ideas in [Kor00]). Following Körner’s paper, it is easy to
approximate the distribution of the statistic from the sample when all fuzzy sets are of
the same L-R type. Otherwise, the approximation of the linear combination becomes
much more complex.
2nd Approach) On the other hand, we could follow the ideas in [CFF04] in order
to estimate $\|Z^*\|_2$ by applying the Monte Carlo method and a discretization of the
Gaussian process. However, in the fuzzy case, this procedure is not operational due to
the strong dependence existing between the $\alpha$-level mappings.
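3rd Approach) As an alternative approach we suggest in this paper to approximate $\|Z^*\|_2$ by means of the CLT as follows: we consider the finite population
$\{(\mathcal{X}_1, \mathcal{Y}_1), \ldots, (\mathcal{X}_n, \mathcal{Y}_n)\}$ (that is, the original sample) and we resample it to obtain
$m$ i.i.d. random elements $(\mathcal{X}_1^*, \mathcal{Y}_1^*), \ldots, (\mathcal{X}_m^*, \mathcal{Y}_m^*)$ with $m \in \mathbb{N}$. Theorem 1 assures
that, for the resampled means $\overline{\mathcal{X}}^*_m = \sum_{j=1}^m \mathcal{X}_j^* / m$ and $\overline{\mathcal{Y}}^*_m = \sum_{j=1}^m \mathcal{Y}_j^* / m$,
\[ \sqrt{m}\, D_K\bigl(\overline{\mathcal{X}}^*_m + \overline{\mathcal{Y}}_n, \overline{\mathcal{Y}}^*_m + \overline{\mathcal{X}}_n\bigr) \]
converges in distribution to $\|Z^*\|_2$ as $m$ tends to infinity. Consequently, we can
approximate $\|Z^*\|_2$ by means of $\sqrt{m}\, D_K\bigl(\overline{\mathcal{X}}^*_m + \overline{\mathcal{Y}}_n, \overline{\mathcal{Y}}^*_m + \overline{\mathcal{X}}_n\bigr)$ for $m$ large enough.
Thus we propose the following procedure: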
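Asymptotic testing procedure
Step 1: Compute the value of the statistic $T = \sqrt{n}\, D_K\bigl(\overline{\mathcal{X}}_n, \overline{\mathcal{Y}}_n\bigr)$.
Step 2: Obtain a random sample $(\mathcal{X}_1^*, \mathcal{Y}_1^*), \ldots, (\mathcal{X}_m^*, \mathcal{Y}_m^*)$ from $\{(\mathcal{X}_i, \mathcal{Y}_i)\}_{i=1}^n$, compute its
sample means $\overline{\mathcal{X}}^*_m$ and $\overline{\mathcal{Y}}^*_m$, and compute the value $T^* = \sqrt{m}\, D_K\bigl(\overline{\mathcal{X}}^*_m + \overline{\mathcal{Y}}_n, \overline{\mathcal{Y}}^*_m + \overline{\mathcal{X}}_n\bigr)$.
Step 3: Repeat Step 2 a large number $b$ of times and approximate the p-value as
the proportion of values in $\{T_1^*, \ldots, T_b^*\}$ greater than $T$.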
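The three steps above can be sketched in code for the one-dimensional case. This is an illustration under stated assumptions rather than the authors' implementation: fuzzy numbers are stored as arrays of $[\inf U_\alpha, \sup U_\alpha]$ on a uniform grid of $\alpha$-levels, the distance $d_2$ from the 1st Approach is used in place of a general $D_K$, its integral is approximated by the average over the grid, and the default `b=1_000` is arbitrary (the paper only asks for a large $b$). The names `d2` and `paired_fuzzy_mean_test` are hypothetical.

```python
import numpy as np

def d2(U, V):
    """d_2 distance for p = 1 between arrays of shape (..., n_alpha, 2).

    The last axis holds [inf, sup] of each alpha-level on a uniform alpha grid;
    the integral over [0, 1] is approximated by the average over the grid.
    """
    sq = 0.5 * (U[..., 0] - V[..., 0]) ** 2 + 0.5 * (U[..., 1] - V[..., 1]) ** 2
    return np.sqrt(sq.mean(axis=-1))

def paired_fuzzy_mean_test(X, Y, m=10_000, b=1_000, seed=None):
    """Return (T, p_value) for paired samples X, Y of shape (n, n_alpha, 2)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    X_bar, Y_bar = X.mean(axis=0), Y.mean(axis=0)

    # Step 1: observed statistic.
    T = np.sqrt(n) * d2(X_bar, Y_bar)

    # Step 2, repeated b times: drawing m pairs with replacement is encoded
    # by multinomial counts; the same counts are used for X and Y so that
    # the pairing is preserved.
    counts = rng.multinomial(m, np.full(n, 1.0 / n), size=b)  # shape (b, n)
    Xs_bar = np.tensordot(counts, X, axes=(1, 0)) / m         # (b, n_alpha, 2)
    Ys_bar = np.tensordot(counts, Y, axes=(1, 0)) / m
    T_star = np.sqrt(m) * d2(Xs_bar + Y_bar, Ys_bar + X_bar)

    # Step 3: proportion of resampled values greater than T.
    return T, float(np.mean(T_star > T))
```

A call such as `T, p = paired_fuzzy_mean_test(X, Y, seed=0)` returns the observed statistic and the approximate p-value. Working with counts instead of explicit resamples only avoids materialising the $m$ resampled pairs; it does not change the resampling scheme.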
4 Simulation studies
In many situations, fuzzy sets describing the imprecise values are combinations of
different types of L-R fuzzy numbers. For this reason we have simulated some of
these combinations in order to analyze the empirical size of the test.
Firstly, we have considered two fuzzy random variables $Z$ and $T$, where $Z$ is a
triangular fuzzy number with left spread behaving as a $\chi^2_1$ random variable, center
varying as a $N(0, 1)$ random variable, and right spread behaving as a $\chi^2_3$ random
variable. Values of the variable $T$ are the addition of an S-curve and a Z-curve (see, for
instance, [Cox92]), the parameters of the S-curve varying as $N(0, 1)$, $\chi^2_1$ and $\chi^2_2$
random variables, respectively, and the parameters of the Z-curve varying as $N(0, 1)$, $\chi^2_1$
and $\chi^2_2$ random variables, respectively.
We consider $X_p = Z + pE[Z] + (1 - p)E[T]$ and $Y_p = pZ + (1 - p)T + E[Z]$ with
$p \in [0, 1]$. It should be noted that $X_p$ and $Y_p$ have the same expected value (that
is, $H_0$ is satisfied) for all $p \in [0, 1]$, and $p$ allows us to fix the degree of statistical
dependence between the variables. For the simulations we have considered three values of
$p$, namely, $p = 0$, which corresponds to the independence situation, $p = .4$, for a medium
degree of dependence, and $p = .8$, for a high degree of dependence.
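That $H_0$ holds for every $p$ can be checked directly. The short computation below is a sketch that assumes the usual properties of the expected value of fuzzy random variables: it is additive, it commutes with the product by a nonnegative scalar (here $p \geq 0$ and $1 - p \geq 0$), and the expected value of a nonrandom fuzzy set is the set itself.

```latex
\begin{align*}
E(X_p) &= E(Z) + pE(Z) + (1 - p)E(T), \\
E(Y_p) &= pE(Z) + (1 - p)E(T) + E(Z).
\end{align*}
```

Both right-hand sides contain the same three summands, so $E(X_p) = E(Y_p)$ for all $p \in [0, 1]$.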
Each simulation corresponds to 10,000 iterations of the test at a nominal significance level of .05 for different sample sizes $n$. In this section, the generalized distance
has been chosen to be that of Bertoluzza et al. [BCS95] with the Lebesgue measures on $[0, 1]$. In order to approximate the Gaussian process, we have considered
$m = 10{,}000$. The results are gathered in Table 1.
Table 1. Empirical percentage of rejections under H0.
          n = 30   n = 100   n = 300
p = .0     6.57     5.46      4.96
p = .4     6.46     5.51      5.05
p = .8     6.49     5.73      5.52
Table 1 shows that applying the asymptotic procedure proposed in this paper in practice requires moderate/large sample sizes for all dependence degrees
between the variables, since for $n < 100$ the empirical percentages of rejections are
far from the nominal significance level.
5 Illustrative example
The undergraduate students in a course of the University of Oviedo were asked
about the time they spent daily watching TV at the beginning and at the end of the
academic year. The answers were imprecise ($\tilde{x}_1$ = around 1 hour or less, $\tilde{x}_2$ =
approximately 1 to 2 hours and $\tilde{x}_3$ = around 2 hours or more) and have been described
by means of the fuzzy numbers in Figure 1.
Fig. 1. Fuzzy numbers describing the time spent daily on watching TV.
$\tilde{x}_1$ (thick line), $\tilde{x}_2$ (dotted line), $\tilde{x}_3$ (dashed line)
The collected data are gathered in Table 2. In Figure 2, the sample means
corresponding to the beginning and to the end of the academic year are represented.
They are quite different, although the sample variability is also quite large. If we
quantify the variability of an FRV $\mathcal{X}$ as
\[ \mathrm{Desv}(\mathcal{X}) = \sqrt{E\bigl([D_K(\mathcal{X}, E(\mathcal{X}))]^2\bigr)} \]
(see [KN02]), we have a dispersion of 43.2709 at the beginning and 46.6361 at the
end. Thus, in order to check whether the corresponding population expected values are the
same, we can apply the testing procedure developed in Section 3.
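A sample counterpart of this dispersion measure can be sketched as follows for one-dimensional fuzzy data. The code is only illustrative and makes assumptions that are not in the paper: fuzzy numbers are stored as arrays of $[\inf U_\alpha, \sup U_\alpha]$ on a uniform grid of $\alpha$-levels, the distance $d_2$ replaces the general $D_K$, the expectation is replaced by a plain average over the sample, and the name `sample_dispersion` is hypothetical. It is therefore not meant to reproduce the values reported above.

```python
import numpy as np

def sample_dispersion(X):
    """Plug-in dispersion of a fuzzy sample X of shape (n, n_alpha, 2).

    Returns the square root of the average squared d_2 distance between
    each observation and the level-wise sample mean.
    """
    X = np.asarray(X, dtype=float)
    X_bar = X.mean(axis=0)
    # Squared d_2 integrand at every alpha-level of every observation.
    sq = 0.5 * (X[..., 0] - X_bar[..., 0]) ** 2 + 0.5 * (X[..., 1] - X_bar[..., 1]) ** 2
    # Average over the alpha grid (integral), then over the sample (expectation).
    return float(np.sqrt(sq.mean(axis=-1).mean()))
```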
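Table 2. Sample data of the time spent daily on watching TV.
Beginning  $\tilde{x}_1$  $\tilde{x}_1$  $\tilde{x}_1$  $\tilde{x}_2$  $\tilde{x}_2$  $\tilde{x}_2$  $\tilde{x}_3$  $\tilde{x}_3$  $\tilde{x}_3$
End        $\tilde{x}_1$  $\tilde{x}_2$  $\tilde{x}_3$  $\tilde{x}_1$  $\tilde{x}_2$  $\tilde{x}_3$  $\tilde{x}_1$  $\tilde{x}_2$  $\tilde{x}_3$
Frequency  12             8              5              24             17             7              29             19             21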
Fig. 2. Fuzzy sample means of the time spent daily on watching TV.
Beginning (thick line). End (dashed line).
We conclude that the null hypothesis asserting the equality of the mean times
spent watching TV by students at the beginning and at the end of the academic
year is rejected at the usual nominal significance levels, since the p-value obtained
by the suggested asymptotic testing procedure is equal to 0.
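To apply the procedure to data recorded as in Table 2, the frequency table first has to be expanded into one (beginning, end) pair per student. The sketch below does only this bookkeeping step, using the frequencies transcribed from Table 2; the integer labels 1, 2, 3 are a hypothetical encoding of the three fuzzy answers, and the membership functions of Figure 1, which would be needed to turn the labels into fuzzy numbers, are not reproduced here.

```python
from collections import Counter

# (beginning, end, frequency) triples transcribed from Table 2; the labels
# 1, 2, 3 stand for the three fuzzy answers (hypothetical encoding).
TABLE_2 = [
    (1, 1, 12), (1, 2, 8), (1, 3, 5),
    (2, 1, 24), (2, 2, 17), (2, 3, 7),
    (3, 1, 29), (3, 2, 19), (3, 3, 21),
]

# One (beginning, end) pair per student, keeping the pairing intact.
pairs = [(b, e) for b, e, freq in TABLE_2 for _ in range(freq)]

n = len(pairs)
beginning_counts = Counter(b for b, _ in pairs)
end_counts = Counter(e for _, e in pairs)
```

The expansion yields $n = 142$ paired observations, and the two marginal counts summarise how the answers are distributed at each of the two time points.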
References
[Aum65] Aumann, R.J.: Integrals of set-valued functions. J. Math. Anal. Appl. 12,
1–12 (1965)
[BCS95] Bertoluzza, C., Corral, N., Salas A.: On a new class of distances between
fuzzy numbers. Mathware & Soft Computing 2, 71–84 (1995)
[CDLG99] Colubi, A., Domínguez-Menchero, J. S., López-Díaz, M., Gil, M. A.: A
generalized Strong Law of Large Numbers. Prob. Theory Rel. Fields 114,
401–417 (1999)
[CDLR02] Colubi, A., Domínguez-Menchero, J. S., López-Díaz, M., Ralescu,
D.A.: A $D_E[0, 1]$-representation of random upper semicontinuous functions. Proc. Am. Math. Soc. 130, 3237–3242 (2002)
[CFG02] Colubi, A., Fernández-García, C., Gil, M. A.: Simulation of random fuzzy
variables: an empirical approach to statistical/probabilistic studies with
fuzzy experimental data. IEEE Trans. Fuzzy Syst. 10, 384–390 (2002)
[Cox92] Cox, E.: The Fuzzy Systems Handbook. Academic Press, Cambridge
(1992)
[CFF04] Cuevas, A., Febrero, M., Fraiman, R.: An anova test for functional data.
Comput. Statist. Data Anal. 47, 111–122 (2004)
[DK94] Diamond, P., Kloeden, P.: Metric Spaces of Fuzzy Sets. World Scientific,
Singapore (1994)
[GMGCC06] Gil, M.A., Montenegro, M., González-Rodríguez, G., Colubi, A.,
Casals, M.R.: Bootstrap approach to the Multi-Sample Test of Means
with Imprecise Data. Comput. Statist. Data Anal. (2006, to appear)
[GZ90] Giné, E., Zinn, J.: Bootstrapping general empirical measures. Ann.
Probab. 18, 851–869 (1990)
[GMCG06] González-Rodríguez, G., Montenegro, M., Colubi, A., Gil, M.A.: Bootstrap techniques and fuzzy random variables: Synergy in hypothesis testing with fuzzy data. Fuzzy Sets and Systems (2006, to appear)
[Kor00] Körner, R.: An asymptotic α-test for the expectation of random fuzzy
variables. J. Stat. Plann. Inference 83, 331–346 (2000)
[KN02] Körner, R., Näther, W.: On the variance of random fuzzy variables. In:
Bertoluzza, C., Gil, M.A., Ralescu, D.A. (Eds.) Statistical Modeling,
Analysis and Management of Fuzzy Data. Physica-Verlag, Heidelberg,
pp. 22–39 (2002)
[LR79] Laha, R.G., Rohatgi, V.K.: Probability Theory. Wiley, New York (1979)
[MCCG04] Montenegro, M., Colubi, A., Casals, M. R., Gil, M. A.: Asymptotic and
Bootstrap techniques for testing the expected value of a fuzzy random
variable. Metrika 59, 31–49 (2004)
[PR86] Puri, M. L., Ralescu, D. A.: Fuzzy random variables. J. Math. Anal. Appl.
114, 409–422 (1986)
[Zad75] Zadeh, L.A.: The concept of a linguistic variable and its application to
approximate reasoning, Part 1. Inform. Sci. 8, 199–249; Part 2. Inform.
Sci. 8, 301–353; Part 3. Inform. Sci. 9, 43–80 (1975)