An asymptotic two dependent samples test of equality of means of fuzzy random variables

González-Rodríguez, Gil; Colubi, Ana; Gil, Angeles M.¹ and D'Urso, Pierpaolo²

¹ Dpto. de Estadística e I.O. Universidad de Oviedo. 33007 Oviedo. Spain. {gil,colubi,magil}@uniovi.es
² Dipartimento di Scienze Economiche, Gestionali e Sociali. Università degli Studi del Molise. 86100 Campobasso. Italy. durso@unimol.it

Summary. In this paper, we present an asymptotic procedure to test the equality of the (fuzzy) mean values of two fuzzy random variables measured on the same population. We propose a test statistic and obtain its asymptotic distribution under the null hypothesis. Since the limit distribution is unknown, we suggest ways of estimating it and develop an asymptotic testing procedure on this basis. We show some simulation results for different degrees of dependence between the random elements, which suggest that moderate/large sample sizes are required. Finally, we present an illustrative example of the practical application of the test.

Key words: fuzzy random variable, equality of means test, dependent samples

1 Introduction

Fuzzy random variables in Puri & Ralescu's sense [PR86] are employed to manage random experiments whose outcomes are associated with imprecise values, in such a way that randomness is modeled through a probability space and the imprecise values are described by means of fuzzy sets. The expected value of a fuzzy random variable (see [PR86]) plays an important role in this context, and it is justified by means of the strong law of large numbers (see, for instance, [CDLG99]). For this reason, one of the first efforts in developing statistical inference with fuzzy random variables has focused on testing hypotheses about the expected value (see, for instance, [MCCG04], [Kor00] or [GMGCC06]). Effective bootstrap-based methods for testing hypotheses about the means of fuzzy random variables are gathered in [GMCG06].
As a first step in dealing with dependent samples, in this paper we present an asymptotic procedure to test the equality of the (fuzzy) mean values of two fuzzy random variables measured on the same population. In Section 2 we include the preliminaries concerning fuzzy random variables. In Section 3 we introduce the test and the statistic, and we obtain its asymptotic distribution under the null hypothesis. Since the limit distribution is unknown, we suggest a method for estimating it and we develop an asymptotic testing procedure on this basis. In Section 4 we show some simulation results and, finally, in Section 5, we present an illustrative example of the practical application of the test.

2 Preliminaries

Let $\mathcal{K}_c(\mathbb{R}^p)$ be the class of the nonempty compact convex subsets of $\mathbb{R}^p$ endowed with the Minkowski sum and the product by a scalar, that is, $A + B = \{a + b \mid a \in A,\ b \in B\}$ and $\lambda A = \{\lambda a \mid a \in A\}$ for all $A, B \in \mathcal{K}_c(\mathbb{R}^p)$ and $\lambda \in \mathbb{R}$. We will consider the class of fuzzy sets

$$\mathcal{F}_c(\mathbb{R}^p) = \big\{ U : \mathbb{R}^p \to [0,1] \;\big|\; U_\alpha \in \mathcal{K}_c(\mathbb{R}^p) \text{ for all } \alpha \in [0,1] \big\},$$

where $U_\alpha$ is the $\alpha$-level of $U$ (i.e. $U_\alpha = \{x \in \mathbb{R}^p \mid U(x) \geq \alpha\}$) for all $\alpha \in (0,1]$, and $U_0$ is the closure of the support of $U$.

The space $\mathcal{F}_c(\mathbb{R}^p)$ can be endowed with the sum and the product by a scalar based on Zadeh's extension principle [Zad75], which satisfy $(U + V)_\alpha = U_\alpha + V_\alpha$ and $(\lambda U)_\alpha = \lambda U_\alpha$ for all $U, V \in \mathcal{F}_c(\mathbb{R}^p)$, $\lambda \in \mathbb{R}$ and $\alpha \in [0,1]$.

The support function of a fuzzy set $U \in \mathcal{F}_c(\mathbb{R}^p)$ is $s_U(u, \alpha) = \sup_{w \in U_\alpha} \langle u, w \rangle$ for any $u \in S^{p-1}$ and $\alpha \in [0,1]$, where $S^{p-1}$ is the unit sphere in $\mathbb{R}^p$ and $\langle \cdot, \cdot \rangle$ denotes the inner product. The support function allows us to embed $\mathcal{F}_c(\mathbb{R}^p)$ onto a cone of the continuous and Lebesgue integrable functions $\mathcal{L}(S^{p-1} \times [0,1])$ by means of the mapping $s : \mathcal{F}_c(\mathbb{R}^p) \to \mathcal{L}(S^{p-1} \times [0,1])$, where $s(U) = s_U$ (see [DK94]).
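For $p = 1$ these definitions can be made concrete with a few lines of code. The following Python sketch is only an illustration: the class name `Triangular`, its fields and the helper `support` are assumptions introduced here, and triangular fuzzy numbers are chosen merely because their $\alpha$-levels are intervals with linear endpoints. It checks numerically that the support function turns the fuzzy sum into an ordinary sum of functions, which is the property behind the embedding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triangular:
    """Triangular fuzzy number on R with support [lo, hi] and peak at mode."""
    lo: float
    mode: float
    hi: float

    def level(self, alpha):
        """Return the alpha-level as the interval (inf U_alpha, sup U_alpha)."""
        return (self.lo + alpha * (self.mode - self.lo),
                self.hi - alpha * (self.hi - self.mode))

    def __add__(self, other):
        # Levelwise Minkowski sum; for triangular numbers the parameters add.
        return Triangular(self.lo + other.lo, self.mode + other.mode,
                          self.hi + other.hi)

    def scale(self, lam):
        # Product by a scalar; a negative scalar swaps the two endpoints.
        lo, hi = sorted((lam * self.lo, lam * self.hi))
        return Triangular(lo, lam * self.mode, hi)


def support(U, u, alpha):
    """s_U(u, alpha) = sup over w in U_alpha of u * w, for u in S^0 = {-1, +1}."""
    inf_u, sup_u = U.level(alpha)
    return max(u * inf_u, u * sup_u)


U = Triangular(0.0, 1.0, 3.0)
V = Triangular(-1.0, 0.0, 2.0)
for alpha in (0.0, 0.5, 1.0):
    for u in (-1, 1):
        # The support function of U + V equals the sum of the support functions.
        gap = support(U + V, u, alpha) - support(U, u, alpha) - support(V, u, alpha)
        print(alpha, u, abs(gap) < 1e-12)
```

In this one-dimensional setting $s_U(1, \alpha) = \sup U_\alpha$ and $s_U(-1, \alpha) = -\inf U_\alpha$, so a fuzzy number is fully described by two real functions of $\alpha$.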
We will consider the generalized metric by Körner and Näther [KN02], which is defined so that

$$[D_K(U, V)]^2 = \int_{(S^{p-1})^2 \times [0,1]^2} \big(s_U(u, \alpha) - s_V(u, \alpha)\big)\big(s_U(v, \beta) - s_V(v, \beta)\big)\, dK(u, \alpha, v, \beta)$$

for all $U, V \in \mathcal{F}_c(\mathbb{R}^p)$, where $K$ is a positive definite and symmetric kernel; thus $D_K$ coincides with the generic $L^2$ distance $\|\cdot\|_2$ on the Banach space $\mathcal{L}(S^{p-1} \times [0,1])$.

Let $(\Omega, \mathcal{A}, P)$ be a probability space. A Fuzzy Random Variable (FRV) in Puri & Ralescu's sense [PR86] is a mapping $\mathcal{X} : \Omega \to \mathcal{F}_c(\mathbb{R}^p)$ so that the $\alpha$-level mappings $\mathcal{X}_\alpha : \Omega \to \mathcal{K}_c(\mathbb{R}^p)$, defined so that $\mathcal{X}_\alpha(\omega) = \big(\mathcal{X}(\omega)\big)_\alpha$ for all $\omega \in \Omega$, are random sets (that is, Borel-measurable mappings with the Borel $\sigma$-field generated by the topology associated with the well-known Hausdorff metric $d_H$ on $\mathcal{K}(\mathbb{R}^p)$). Alternatively, an FRV is an $\mathcal{F}_c(\mathbb{R}^p)$-valued random element (i.e. a Borel-measurable mapping) when the Skorokhod metric is considered on $\mathcal{F}_c(\mathbb{R}^p)$ (see [CDLR02]).

If $\mathcal{X} : \Omega \to \mathcal{F}_c(\mathbb{R}^p)$ is a fuzzy random variable such that $d_H(\{0\}, \mathcal{X}_0) \in L^1(\Omega, \mathcal{A}, P)$, then the expected value (or mean) of $\mathcal{X}$ is the unique $E(\mathcal{X}) \in \mathcal{F}_c(\mathbb{R}^p)$ such that $\big(E(\mathcal{X})\big)_\alpha$ is Aumann's integral of the random set $\mathcal{X}_\alpha$ for all $\alpha \in [0,1]$, that is,

$$\big(E(\mathcal{X})\big)_\alpha = \big\{ E(X \mid P) \;\big|\; X : \Omega \to \mathbb{R}^p,\ X \in L^1(\Omega, \mathcal{A}, P),\ X \in \mathcal{X}_\alpha \text{ a.s. } [P] \big\}.$$

3 Two dependent samples asymptotic test for fuzzy random variables

Let $(\mathcal{X}, \mathcal{Y}) : \Omega \to \mathcal{F}_c(\mathbb{R}^p) \times \mathcal{F}_c(\mathbb{R}^p)$ be a two-dimensional fuzzy random variable and let $(\mathcal{X}_1, \mathcal{Y}_1), \ldots, (\mathcal{X}_n, \mathcal{Y}_n)$ be a random sample from $(\mathcal{X}, \mathcal{Y})$, that is, $\{(\mathcal{X}_i, \mathcal{Y}_i)\}_{i=1}^n$ are independent random elements distributed as $(\mathcal{X}, \mathcal{Y})$. We consider the problem of testing $H_0 : E(\mathcal{X}) = E(\mathcal{Y})$ against $H_1 : E(\mathcal{X}) \neq E(\mathcal{Y})$.

When real-valued data are available, the usual statistic for the two dependent samples problem is based on the sample mean of the differences of the paired data.
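Since the difference based on Zadeh's extension principle does not satisfy $A - A = 0$ and Hukuhara's difference is not well-defined for all elements in $\mathcal{F}_c(\mathbb{R}^p)$, we consider a statistic based on the distance between the sample means, $D_K(\overline{\mathcal{X}}_n, \overline{\mathcal{Y}}_n)$, where $\overline{\mathcal{X}}_n = \sum_{i=1}^n \mathcal{X}_i / n$ and $\overline{\mathcal{Y}}_n = \sum_{i=1}^n \mathcal{Y}_i / n$. By following the ideas in [Kor00] and in [GMCG06], we can take advantage of the isometry induced by the support function to get the following asymptotic result:

Theorem 1. Let $(\Omega, \mathcal{A}, P)$ be a probability space, $(\mathcal{X}, \mathcal{Y}) : \Omega \to \mathcal{F}_c(\mathbb{R}^p) \times \mathcal{F}_c(\mathbb{R}^p)$ be a two-dimensional fuzzy random element and $\{(\mathcal{X}_i, \mathcal{Y}_i)\}_{i=1}^n$ be a family of independent random elements distributed as $(\mathcal{X}, \mathcal{Y})$. If $D_K^2(\mathcal{X}, \mathcal{Y}) \in L^1(\Omega, \mathcal{A}, P)$ is nondegenerate, then $\sqrt{n}\, D_K\big(\overline{\mathcal{X}}_n + E(\mathcal{Y}),\ \overline{\mathcal{Y}}_n + E(\mathcal{X})\big)$ converges in distribution to $\|Z\|_2$, where $Z$ is a Gaussian variable on $\mathcal{L}(S^{p-1} \times [0,1])$ with mean 0 and covariance function

$$C(u, \alpha, v, \beta) = \mathrm{Cov}\big((s_{\mathcal{X}} - s_{\mathcal{Y}})(u, \alpha),\ (s_{\mathcal{X}} - s_{\mathcal{Y}})(v, \beta)\big)$$

for all $(u, \alpha), (v, \beta) \in S^{p-1} \times [0,1]$.

Proof: It is easy to check that

$$\sqrt{n}\, D_K\big(\overline{\mathcal{X}}_n + E(\mathcal{Y}),\ \overline{\mathcal{Y}}_n + E(\mathcal{X})\big) = \left\| \sqrt{n} \left[ \frac{1}{n} \sum_{i=1}^n (s_{\mathcal{X}_i} - s_{\mathcal{Y}_i}) - E(s_{\mathcal{X}} - s_{\mathcal{Y}}) \right] \right\|_2 .$$

Consequently, since $0 < E(\|s_{\mathcal{X}} - s_{\mathcal{Y}}\|_2^2) = E(D_K^2(\mathcal{X}, \mathcal{Y})) < +\infty$, we can apply the CLT in $\mathcal{L}(S^{p-1} \times [0,1])$ (see, for instance, [LR79]) to obtain the result. □

On the basis of Theorem 1, we could obtain a consistent asymptotic testing procedure along the lines of the one in [Kor00]. However, the limit distribution depends on the covariance function, which is unknown. To overcome this difficulty we can approximate $\|Z\|_2$ by means of $\|Z^*\|_2$, where $Z^*$ is a Gaussian variable on $\mathcal{L}(S^{p-1} \times [0,1])$ with mean 0 and covariance function

$$C_n(u, \alpha, v, \beta) = \frac{1}{n} \sum_{i=1}^n (s^*_{\mathcal{X}_i} - s^*_{\mathcal{Y}_i})(u, \alpha)\, (s^*_{\mathcal{X}_i} - s^*_{\mathcal{Y}_i})(v, \beta)$$

for all $(u, \alpha), (v, \beta) \in S^{p-1} \times [0,1]$, where $s^*_{\mathcal{X}_i} = s_{\mathcal{X}_i} - s_{\overline{\mathcal{X}}_n}$ and, analogously, $s^*_{\mathcal{Y}_i} = s_{\mathcal{Y}_i} - s_{\overline{\mathcal{Y}}_n}$.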
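The plug-in covariance function can be illustrated numerically. The following Python sketch makes several assumptions that are not part of the paper: it takes $p = 1$, stores each triangular observation as a row (lo, mode, hi), evaluates the support functions on a finite grid of $\alpha$ values, centres $s_{\mathcal{Y}_i}$ in the same way as $s_{\mathcal{X}_i}$, and uses arbitrarily simulated paired data. The function names `support_matrix` and `empirical_cov` are hypothetical.

```python
import numpy as np


def support_matrix(tri, alphas):
    """Discretized support functions of triangular data (rows: lo, mode, hi).

    Columns hold s(-1, alpha) = -inf U_alpha followed by s(+1, alpha) = sup U_alpha.
    """
    lo, mode, hi = tri[:, [0]], tri[:, [1]], tri[:, [2]]
    inf = lo + alphas * (mode - lo)
    sup = hi - alphas * (hi - mode)
    return np.hstack([-inf, sup])


def empirical_cov(sx, sy):
    """C_n on the grid: mean outer product of the centred differences."""
    d = (sx - sx.mean(axis=0)) - (sy - sy.mean(axis=0))
    return d.T @ d / d.shape[0]


rng = np.random.default_rng(0)
alphas = np.linspace(0.0, 1.0, 11)
n = 200
cx = rng.normal(size=n)
cy = 0.5 * cx + rng.normal(size=n)  # paired, hence dependent, centres
x = np.column_stack([cx - rng.chisquare(1, n), cx, cx + rng.chisquare(3, n)])
y = np.column_stack([cy - rng.chisquare(2, n), cy, cy + rng.chisquare(2, n)])

sx, sy = support_matrix(x, alphas), support_matrix(y, alphas)
C = empirical_cov(sx, sy)
print(C.shape)  # one row and column per grid point (u, alpha)
```

On the grid, $C_n$ is simply the (biased) sample covariance matrix of the differences $s_{\mathcal{X}_i} - s_{\mathcal{Y}_i}$, so it is symmetric and positive semidefinite by construction.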
Obtaining the exact distribution of $\|Z^*\|_2$ is a difficult task, but there are several ways in the literature to approximate it, namely:

1st Approach) If we consider the particular distance

$$[d_2(U, V)]^2 = \int_{S^{p-1} \times [0,1]} \big(s_U(u, \alpha) - s_V(u, \alpha)\big)^2\, du\, d\alpha,$$

which for $p = 1$ reads

$$[d_2(U, V)]^2 = \int_{[0,1]} \left[ \frac{1}{2} (\inf U_\alpha - \inf V_\alpha)^2 + \frac{1}{2} (\sup U_\alpha - \sup V_\alpha)^2 \right] d\alpha,$$

the distribution of $\|Z^*\|_2^2$ is equal to that of $\sum_{k=1}^{\infty} \lambda_k \xi_k^2$, where $\lambda_1 \geq \lambda_2 \geq \ldots$ are the eigenvalues of $C_n$ and $\xi_1, \xi_2, \ldots$ are independent random variables with distribution $N(0, 1)$ (see the ideas in [Kor00]). On the basis of Körner's paper, it is easy to approximate the distribution of the statistic from the sample when all fuzzy sets are of the same L-R type. Otherwise, the approximation of the linear combination becomes much more complex.

2nd Approach) On the other hand, we could follow the ideas in [CFF04] in order to estimate $\|Z^*\|_2$ by applying the Monte Carlo method and a discretization of the Gaussian process. However, in the fuzzy case this procedure is not operational, owing to the strong dependence between the $\alpha$-level mappings.

3rd Approach) As an alternative, we suggest in this paper approximating $\|Z^*\|_2$ by means of the CLT as follows: we consider the finite population $\{(\mathcal{X}_1, \mathcal{Y}_1), \ldots, (\mathcal{X}_n, \mathcal{Y}_n)\}$ (that is, the original sample) and we resample it to obtain i.i.d. random elements $(\mathcal{X}_1^*, \mathcal{Y}_1^*), \ldots, (\mathcal{X}_m^*, \mathcal{Y}_m^*)$ with $m \in \mathbb{N}$. Theorem 1 assures that

$$\sqrt{m}\, D_K\big(\overline{\mathcal{X}^*}_m + \overline{\mathcal{Y}}_n,\ \overline{\mathcal{Y}^*}_m + \overline{\mathcal{X}}_n\big)$$

converges in distribution to $\|Z^*\|_2$ as $m$ tends to infinity. Consequently, we can approximate $\|Z^*\|_2$ by means of $\sqrt{m}\, D_K\big(\overline{\mathcal{X}^*}_m + \overline{\mathcal{Y}}_n,\ \overline{\mathcal{Y}^*}_m + \overline{\mathcal{X}}_n\big)$ for $m$ large enough. Thus we propose the following procedure:

Asymptotic testing procedure

Step 1: Compute the value of the statistic $T = \sqrt{n}\, D_K\big(\overline{\mathcal{X}}_n, \overline{\mathcal{Y}}_n\big)$.

Step 2: Obtain a random sample $(\mathcal{X}_1^*, \mathcal{Y}_1^*), \ldots, (\mathcal{X}_m^*, \mathcal{Y}_m^*)$ from $\{(\mathcal{X}_i, \mathcal{Y}_i)\}_{i=1}^n$ and compute the value $T^* = \sqrt{m}\, D_K\big(\overline{\mathcal{X}^*}_m + \overline{\mathcal{Y}}_n,\ \overline{\mathcal{Y}^*}_m + \overline{\mathcal{X}}_n\big)$.
Step 3: Repeat Step 2 a large number $b$ of times and approximate the p-value as the proportion of values in $\{T_1^*, \ldots, T_b^*\}$ greater than $T$.

4 Simulation studies

In many situations, the fuzzy sets describing the imprecise values are combinations of different types of L-R fuzzy numbers. For this reason we have simulated some of these combinations in order to analyze the empirical size of the test.

Firstly, we have considered two fuzzy random variables $\mathcal{Z}$ and $\mathcal{T}$, where $\mathcal{Z}$ is a triangular fuzzy number with left spread behaving as a $\chi^2_1$ random variable, center varying as a $N(0, 1)$ random variable, and right spread behaving as a $\chi^2_3$ random variable. The values of $\mathcal{T}$ are the sum of an S-curve and a Z-curve (see, for instance, [Cox92]), the parameters of the S-curve varying as $N(0, 1)$, $\chi^2_1$ and $\chi^2_2$ random variables, respectively, and the parameters of the Z-curve varying as $N(0, 1)$, $\chi^2_1$ and $\chi^2_2$ random variables, respectively.

We consider $\mathcal{X}_p = \mathcal{Z} + pE[\mathcal{Z}] + (1 - p)E[\mathcal{T}]$ and $\mathcal{Y}_p = p\mathcal{Z} + (1 - p)\mathcal{T} + E[\mathcal{Z}]$ with $p \in [0, 1]$. It should be noted that $\mathcal{X}_p$ and $\mathcal{Y}_p$ have the same expected value (that is, $H_0$ is satisfied) for all $p \in [0, 1]$, and that $p$ allows us to fix the degree of statistical dependence between the variables. For the simulations we have considered three values of $p$, namely, $p = 0$, which corresponds to independence, $p = .4$ for a medium degree of dependence, and $p = .8$ for a high degree of dependence.

Each simulation corresponds to 10,000 iterations of the test at a nominal significance level .05 for different sample sizes $n$. In this section, the generalized distance has been chosen to be that of Bertoluzza et al. [BCS95] with the Lebesgue measures on $[0, 1]$. In order to approximate the Gaussian process, we have considered $m = 10{,}000$. The results are gathered in Table 1.

Table 1. Empirical percentage of rejections under $H_0$.

          n = 30    n = 100    n = 300
p = .0     6.57      5.46       4.96
p = .4     6.46      5.51       5.05
p = .8     6.49      5.73       5.52

Table 1 shows that applying the asymptotic procedure proposed in this paper in practice requires moderate/large sample sizes for all degrees of dependence between the variables, since for $n < 100$ the empirical percentages of rejections are far from the nominal significance level.

5 Illustrative example

The undergraduate students in a course at the University of Oviedo were asked about the time they spent daily watching TV at the beginning and at the end of the academic year. The answers were imprecise ($\tilde{x}_1$ = around 1 hour or less, $\tilde{x}_2$ = approximately 1 to 2 hours and $\tilde{x}_3$ = around 2 hours or more) and have been described by means of the fuzzy numbers in Figure 1.

Fig. 1. Fuzzy numbers describing the time spent daily on watching TV. $\tilde{x}_1$ (thick line), $\tilde{x}_2$ (dotted line), $\tilde{x}_3$ (dashed line)

The collected data are gathered in Table 2. In Figure 2, the sample means corresponding to the beginning and to the end of the academic year are represented. They are quite different, although the sample variability is also quite large. If we quantify the variability of an FRV $\mathcal{X}$ as $\mathrm{Desv}(\mathcal{X}) = \sqrt{E\big([D_K(\mathcal{X}, E(\mathcal{X}))]^2\big)}$ (see [KN02]), we have a dispersion of 43.2709 at the beginning and 46.6361 at the end. Thus, in order to check whether the corresponding population expected values are the same, we can apply the testing procedure developed in Section 3.

Table 2. Sample data of the time spent daily on watching TV.

Beginning        End              Frequency
$\tilde{x}_1$    $\tilde{x}_1$    12
$\tilde{x}_1$    $\tilde{x}_2$     8
$\tilde{x}_1$    $\tilde{x}_3$     5
$\tilde{x}_2$    $\tilde{x}_1$    24
$\tilde{x}_2$    $\tilde{x}_2$    17
$\tilde{x}_2$    $\tilde{x}_3$     7
$\tilde{x}_3$    $\tilde{x}_1$    29
$\tilde{x}_3$    $\tilde{x}_2$    19
$\tilde{x}_3$    $\tilde{x}_3$    21

Fig. 2. Fuzzy sample means of the time spent daily on watching TV. Beginning (thick line). End (dashed line).
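The three steps of the asymptotic testing procedure can be sketched in Python on the frequencies of Table 2. Several ingredients below are assumptions rather than the setting used in this paper: the trapezoidal shapes assigned to $\tilde{x}_1$, $\tilde{x}_2$, $\tilde{x}_3$ are hypothetical, because the exact membership functions of Figure 1 are not specified in the text; the distance is a grid approximation of $d_2$ for $p = 1$; and $m$ and $b$ are kept small for speed. The names `SHAPES`, `support`, `l2_norm` and `asymptotic_test` are illustrative, and the output is not meant to reproduce the reported figures.

```python
import numpy as np

# Hypothetical trapezoids (a, b, c, d) in hours: support [a, d], core [b, c].
SHAPES = {1: (0.0, 0.0, 1.0, 1.5), 2: (0.5, 1.0, 2.0, 2.5), 3: (1.5, 2.0, 4.0, 4.0)}
# Table 2 rows: (label at the beginning, label at the end, frequency).
TABLE2 = [(1, 1, 12), (1, 2, 8), (1, 3, 5), (2, 1, 24), (2, 2, 17), (2, 3, 7),
          (3, 1, 29), (3, 2, 19), (3, 3, 21)]


def support(label, alphas):
    """Discretized support function for p = 1: (-inf U_alpha, sup U_alpha)."""
    a, b, c, d = SHAPES[label]
    return np.concatenate([-(a + alphas * (b - a)), d - alphas * (d - c)])


def l2_norm(s):
    """Grid approximation of the d_2 norm of a (difference of) support function."""
    return np.sqrt(np.mean(s ** 2))


def asymptotic_test(sx, sy, m=2000, b=500, seed=0):
    """Return (T, approximate p-value) for paired support-function samples."""
    rng = np.random.default_rng(seed)
    diff = sx - sy                       # s_{X_i} - s_{Y_i}, one row per pair
    n = diff.shape[0]
    mean_diff = diff.mean(axis=0)
    t = np.sqrt(n) * l2_norm(mean_diff)  # Step 1
    t_star = np.empty(b)
    for j in range(b):                   # Steps 2 and 3
        idx = rng.integers(0, n, size=m)  # resample pairs with replacement
        t_star[j] = np.sqrt(m) * l2_norm(diff[idx].mean(axis=0) - mean_diff)
    return t, np.mean(t_star > t)


alphas = np.linspace(0.0, 1.0, 21)
pairs = [(bgn, end) for bgn, end, freq in TABLE2 for _ in range(freq)]
sx = np.array([support(bgn, alphas) for bgn, _ in pairs])
sy = np.array([support(end, alphas) for _, end in pairs])
t, p_value = asymptotic_test(sx, sy)
print(f"T = {t:.3f}, approximate p-value = {p_value:.3f}")
```

The sketch relies on the isometry induced by the support function: because $s$ is additive, $D_K\big(\overline{\mathcal{X}^*}_m + \overline{\mathcal{Y}}_n,\ \overline{\mathcal{Y}^*}_m + \overline{\mathcal{X}}_n\big)$ reduces to the norm of the resampled mean difference minus the original mean difference, so no fuzzy arithmetic is needed inside the loop.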
We conclude that the null hypothesis asserting the equality of the mean times spent watching TV by students at the beginning and at the end of the academic year is rejected at the usual nominal significance levels, since the p-value obtained by the suggested asymptotic testing procedure is equal to 0.

References

[Aum65] Aumann, R.J.: Integrals of set-valued functions. J. Math. Anal. Appl. 12, 1–12 (1965)
[BCS95] Bertoluzza, C., Corral, N., Salas, A.: On a new class of distances between fuzzy numbers. Mathware & Soft Computing 2, 71–84 (1995)
[CDLG99] Colubi, A., Domínguez-Menchero, J.S., López-Díaz, M., Gil, M.A.: A generalized Strong Law of Large Numbers. Prob. Theory Rel. Fields 114, 401–417 (1999)
[CDLR02] Colubi, A., Domínguez-Menchero, J.S., López-Díaz, M., Ralescu, D.A.: A DE[0, 1]-representation of random upper semicontinuous functions. Proc. Am. Math. Soc. 130, 3237–3242 (2002)
[CFG02] Colubi, A., Fernández-García, C., Gil, M.A.: Simulation of random fuzzy variables: an empirical approach to statistical/probabilistic studies with fuzzy experimental data. IEEE Trans. Fuzzy Syst. 10, 384–390 (2002)
[Cox92] Cox, E.: The Fuzzy Systems Handbook. Academic Press, Cambridge (1992)
[CFF04] Cuevas, A., Febrero, M., Fraiman, R.: An anova test for functional data. Comput. Statist. Data Anal. 47, 111–122 (2004)
[DK94] Diamond, P., Kloeden, P.: Metric Spaces of Fuzzy Sets. World Scientific, Singapore (1994)
[GMGCC06] Gil, M.A., Montenegro, M., González-Rodríguez, G., Colubi, A., Casals, M.R.: Bootstrap approach to the Multi-Sample Test of Means with Imprecise Data. Comp. Stat. Data Anal. (2006, to appear)
[GZ90] Giné, E., Zinn, J.: Bootstrapping general empirical measures. Ann. Probab. 18, 851–869 (1990)
[GMCG06] González-Rodríguez, G., Montenegro, M., Colubi, A., Gil, M.A.: Bootstrap techniques and fuzzy random variables: Synergy in hypothesis testing with fuzzy data. Fuzzy Sets and Systems (2006, to appear)
[Kor00] Körner, R.: An asymptotic α-test for the expectation of random fuzzy variables. J. Stat. Plann. Inference 83, 331–346 (2000)
[KN02] Körner, R., Näther, W.: On the variance of random fuzzy variables. In: Bertoluzza, C., Gil, M.A., Ralescu, D.A. (Eds.) Statistical Modeling, Analysis and Management of Fuzzy Data. Physica-Verlag, Heidelberg, pp. 22–39 (2002)
[LR79] Laha, R.G., Rohatgi, V.K.: Probability Theory. Wiley, New York (1979)
[MCCG04] Montenegro, M., Colubi, A., Casals, M.R., Gil, M.A.: Asymptotic and Bootstrap techniques for testing the expected value of a fuzzy random variable. Metrika 59, 31–49 (2004)
[PR86] Puri, M.L., Ralescu, D.A.: Fuzzy random variables. J. Math. Anal. Appl. 114, 409–422 (1986)
[Zad75] Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning, Part 1. Inform. Sci. 8, 199–249; Part 2. Inform. Sci. 8, 301–353; Part 3. Inform. Sci. 9, 43–80 (1975)