A Note on Hilbertian Elliptically Contoured Distributions

Yehua Li
Department of Statistics, University of Georgia, Athens, GA 30602, USA
E-mail address: yehuali@uga.edu

Abstract. In this paper, we discuss elliptically contoured distributions for random variables defined on a separable Hilbert space. They generalize the multivariate elliptically contoured distribution to distributions on infinite-dimensional spaces. Some theoretical properties of the Hilbertian elliptically contoured distribution are discussed, and examples with functional data are investigated to illustrate the applications of such distributions.

Keywords: Elliptically contoured distribution, functional data, Hilbertian random variable.

1 Introduction

Elliptically contoured distributions form an important class of distributions in multivariate analysis, with some very nice symmetry properties. They are widely used in statistical practice, for example in dimension reduction (Li, 1991; Cook and Weisberg, 1991; Schott, 1994) and regression graphics (Cook, 1998). The most important example of this class is of course the multivariate Gaussian distribution. Properties of multivariate elliptically contoured distributions are well studied; see for example Cambanis, Huang, and Simons (1981) and Eaton (1986).

Recent developments in statistics have led us to look beyond random vectors in Euclidean space. Statistical models for random variables defined on infinite-dimensional Hilbert spaces are in demand. One important example is functional data analysis (Ramsay and Silverman, 2005), where the data are vectors in a function space, e.g. the $L^2$ space. Among Hilbertian distributions, the Gaussian distribution is still the best understood. For example, in functional data analysis, the random functions are usually modeled as Gaussian processes. The class of elliptically contoured distributions is an important generalization of the Gaussian. It has important applications in dimension reduction, see the recent literature on functional sliced inverse regression (Ferré and Yao, 2003; Li and Hsing, 2007), yet its theoretical properties are not well studied. The goal of this paper is to fill this gap.

The rest of the paper is organized as follows. We introduce some background and definitions regarding linear operators and Hilbertian random variables in Section 2. A random representation for Hilbertian elliptically contoured random variables is introduced in Section 3. Some theoretical properties of the Hilbertian elliptically contoured distribution are discussed in Section 4, including marginal and conditional distributions of a random variable X when it is mapped into different Hilbert spaces. Finally, we give some examples in Section 5 to illustrate the applications of the theory derived in the previous sections, especially its application in functional data analysis.

2 Definitions and Background

2.1 Linear operators

We first introduce some notation and background for linear operators on Hilbert spaces. More theory on linear operators can be found in Dunford and Schwartz (1988). We restrict our discussion to separable Hilbert spaces. A separable Hilbert space H is a Hilbert space with a countable basis, $\{e_1, e_2, \cdots\}$.

For two Hilbert spaces H and H', a linear operator $T: H \to H'$ is a linear map from H to H', i.e. $T(ax) = a(Tx)$ and $T(x + y) = Tx + Ty$, for any $x, y \in H$ and any scalar $a$. T is bounded if
$$\|Tx\|_{H'} \le M\|x\|_H, \quad \forall x \in H,$$
for some non-negative real number M. Denote the class of bounded linear operators from H to H' as $\mathcal{B}(H, H')$; when H' = H, it is abbreviated as $\mathcal{B}(H)$.

The adjoint of an operator $T \in \mathcal{B}(H, H')$, denoted $T^*$, is an operator mapping H' to H, with
$$\langle y, Tx\rangle_{H'} = \langle T^*y, x\rangle_H, \quad \forall x \in H,\ \forall y \in H'.$$
When H = H', T is called self-adjoint if $T^* = T$.
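In a finite-dimensional truncation these notions reduce to familiar matrix facts: a bounded operator is a matrix and the adjoint is its transpose. The following is a minimal numerical sketch (the names and dimensions are illustrative, not part of the paper) checking the adjoint identity and self-adjointness.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3                      # truncated dimensions of H and H'
T = rng.standard_normal((m, n))  # an operator H -> H'; any matrix is bounded
T_adj = T.T                      # in the truncated setting the adjoint is the transpose

x = rng.standard_normal(n)       # x in H
y = rng.standard_normal(m)       # y in H'

# adjoint identity: <y, T x>_{H'} = <T* y, x>_H
assert np.isclose(y @ (T @ x), (T_adj @ y) @ x)

# a self-adjoint operator on H: S = S*
S = T.T @ T
assert np.allclose(S, S.T)
```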
2.2 Hilbertian random variables

Let H be a separable Hilbert space with inner product $\langle\cdot,\cdot\rangle_H$, and let $(\Omega, \mathcal{F}, P)$ be a probability space; a Hilbertian random variable is a map $X: (\Omega, \mathcal{F}, P) \to H$. Since finite-dimensional Hilbert spaces are isomorphic to Euclidean spaces, where the theory of multivariate analysis applies, we are mainly interested in random variables on an infinite-dimensional Hilbert space. In functional data analysis, H could be the $L^2$ function space, a Sobolev space, etc.

The mean of X, if it exists, is defined as $\mu_X = EX = \int X(\omega)\,dP(\omega)$, which is the element of H satisfying $\langle b, EX\rangle = E\langle b, X\rangle$ for all $b \in H$. The variance of X is an operator on H, defined as
$$V_X(g) = E\{(X - EX) \otimes (X - EX)\}(g) = E\{\langle X - EX, g\rangle (X - EX)\}, \quad \forall g \in H.$$
It is easy to show that $V_X$ is a self-adjoint, non-negative definite operator. The characteristic function of a Hilbertian random variable is
$$\phi_X(f) = E\{\exp(i\langle f, X\rangle)\}, \qquad (1)$$
for all $f \in H$.

For a separable Hilbert space, there is a countable basis $\{e_1, e_2, \cdots\}$. Define $x_j = \langle e_j, X\rangle$; these are univariate random variables. Then X has the coordinates $(x_1, x_2, \cdots)$, which form an $\ell^2$ random variable. Denote the space of H-valued random variables as $\mathcal{X}$; then $\mathcal{X}$ is isomorphic to the space of $\ell^2$ random variables.

An operator T is nuclear if its trace is finite and independent of the choice of basis. The trace of an operator is defined as
$$\mathrm{tr}(T) = \sum_{i=1}^{\infty} \langle e_i, T e_i\rangle,$$
for any complete orthonormal basis of H. The covariance operator of X considered in this paper is a self-adjoint, non-negative definite, nuclear operator, denoted as Γ. Γ has the spectral decomposition
$$\Gamma = \sum_j \lambda_j \psi_j \otimes \psi_j, \qquad (2)$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues of Γ and the $\psi_j$ are the corresponding eigenvectors. If $\{\psi_j\}$ is incomplete, we can always extend it to a complete basis of H by including a basis of the null space of Γ. The $\psi_j$'s are the principal components of the random variable X.

Definition 1 A Hilbertian random variable X has an elliptically contoured distribution if the characteristic function of X − µ has the form $\phi_{X-\mu}(f) = \phi_0(\langle f, \Gamma f\rangle)$ for a univariate function $\phi_0$, where Γ is a self-adjoint, non-negative definite, nuclear operator on H. Denote the distribution of X as $\mathrm{HEC}_H(\mu, \Gamma, \phi_0)$.

One important example of an elliptically contoured distribution is the Gaussian distribution, whose characteristic function has the form $\phi_{X-\mu}(f) = \exp(-\langle f, \Gamma f\rangle/2)$.

3 Random representation for Hilbertian elliptically contoured random variables

For a fixed self-adjoint, non-negative definite, nuclear operator Γ, we can define a metric on H, $d_\Gamma(x, y) = \|x - y\|_\Gamma = \langle x - y, \Gamma(x - y)\rangle_H^{1/2}$.

Lemma 2 Suppose $\phi_{X-\mu}(f) = \phi_0(\langle f, \Gamma f\rangle)$ is the characteristic function of an elliptically contoured distribution; then $\phi_0(\cdot)$ is a non-negative definite function on H with respect to the metric $d_\Gamma(\cdot,\cdot)$, i.e.
$$\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \phi_0\{d_\Gamma^2(f_i, f_j)\} \ge 0,$$
for any finite collection $\{f_i;\ i = 1, \cdots, n\} \subset H$ and any real values $\{a_i;\ i = 1, \cdots, n\}$.

Lemma 2 is a straightforward application of the Sazonov theorem (Vakhania, Tarieladze and Chobanyan, 1987), which is a generalization of Bochner's theorem to infinite-dimensional Hilbert spaces.

By the definition of the characteristic function, one can easily see that $\phi_{X-\mu}(f - g) = \overline{\phi_{X-\mu}(g - f)}$, which leads to $\phi_0(\|f - g\|_\Gamma^2) = \overline{\phi_0(\|f - g\|_\Gamma^2)}$. This means $\phi_0(\cdot)$ must be real valued.
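For the Gaussian case, $\phi_0(t) = \exp(-t/2)$, the kernel in Lemma 2 is a Gaussian kernel in the metric $d_\Gamma$, and its non-negative definiteness can be checked numerically on any finite collection. Below is a small sketch under illustrative assumptions (the truncation, eigenvalues, and names are mine, not from the paper): it builds the matrix $[\phi_0\{d_\Gamma^2(f_i, f_j)\}]$ for random $f_i$ in a truncated space and verifies that its smallest eigenvalue is non-negative up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 8                              # truncation dimension, collection size
lam = 1.0 / np.arange(1, m + 1) ** 2      # eigenvalues of a nuclear Gamma (summable)
Gamma = np.diag(lam)

phi0 = lambda t: np.exp(-t / 2.0)         # Gaussian case of Definition 1

F = rng.standard_normal((n, m))           # a finite collection f_1, ..., f_n
D2 = np.empty((n, n))                     # D2[i, j] = d_Gamma^2(f_i, f_j)
for i in range(n):
    for j in range(n):
        d = F[i] - F[j]
        D2[i, j] = d @ Gamma @ d

K = phi0(D2)                              # the matrix [phi0{d^2_Gamma(f_i, f_j)}]
print(np.linalg.eigvalsh(K).min())        # >= 0 up to rounding error, as Lemma 2 asserts
```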
Theorem 3 $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$ if and only if
$$X \stackrel{d}{=} \mu + RU, \qquad (3)$$
where $U \sim \mathrm{Gaussian}(0, \Gamma)$ and R is a non-negative univariate random variable independent of U.

Proof: We first prove the "if" part. Suppose (3) is true, and let F be the distribution function of R. Then
$$\phi_{X-\mu}(f) = E\exp(iR\langle f, U\rangle) = \int_{[0,\infty)} \exp(-r^2\langle f, \Gamma f\rangle/2)\,dF(r). \qquad (4)$$
By Definition 1, X is a Hilbertian elliptically contoured random variable.

Conversely, suppose X is an elliptically contoured Hilbertian random variable with characteristic function $\phi_{X-\mu}(f) = \phi_0(\|f\|_\Gamma^2)$. By Lemma 2, $g(t) = \phi_0(t^2)$ is a positive definite function. By Theorem 2 in Schoenberg (1938),
$$g(t) = \int_0^{\infty} \exp(-t^2 u^2)\,d\alpha(u),$$
for some bounded non-decreasing function $\alpha(u)$. By the definition of the characteristic function, we have $1 = \phi_0(0) = \int_0^{\infty} d\alpha(u)$. Therefore, $\alpha(\cdot)$ is the cumulative distribution function of a non-negative random variable. We now change variables: let $t = \|f\|_\Gamma$ and define a random variable R such that $2^{-1/2}R$ has distribution function $\alpha(\cdot)$. Let F be the distribution function of R; then $F(r) = \alpha(2^{-1/2}r)$. We have
$$\phi_{X-\mu}(f) = \phi_0(\langle f, \Gamma f\rangle) = \int_0^{\infty} \exp(-r^2\langle f, \Gamma f\rangle/2)\,dF(r).$$
Therefore, X has the stochastic representation (3).

4 Properties of the elliptically contoured distribution

We first discuss moment properties of the elliptically contoured distribution. Suppose the first two moments of X exist. By (3), $EX = \mu$ and $V(X) = E(R^2)\Gamma$. On the other hand, if we start from the characteristic function, assuming that $\phi_0$ is twice differentiable,
$$V(X) = -\phi^{(2)}_{X-\mu}(0) = -\big\{2\phi_0'(\langle f, \Gamma f\rangle)\Gamma + 4\phi_0''(\langle f, \Gamma f\rangle)(\Gamma f)\otimes(\Gamma f)\big\}\big|_{f=0} = -2\phi_0'(0)\Gamma.$$
To make Γ identifiable, we can let $E(R^2) = -2\phi_0'(0) = 1$; then $\Gamma = V(X)$ is the covariance operator of X.
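The identity $V(X) = E(R^2)\Gamma$ and the normalization $E(R^2) = 1$ are easy to check by simulation in a truncated space. The sketch below is illustrative only: the particular choice $R = \sqrt{\chi^2_\nu/\nu}$, which satisfies $E(R^2) = 1$, and all names are my assumptions, not constructions from the paper. It draws from representation (3) and compares the sample covariance with Γ.

```python
import numpy as np

rng = np.random.default_rng(2)
m, N, nu = 10, 200_000, 5
lam = 1.0 / np.arange(1, m + 1) ** 2          # eigenvalues of Gamma (nuclear)

# representation (3): X - mu = R * U, U ~ Gaussian(0, Gamma), R independent of U
U = rng.standard_normal((N, m)) * np.sqrt(lam)
R = np.sqrt(rng.chisquare(nu, size=N) / nu)   # E(R^2) = 1, so V(X) = Gamma
X = R[:, None] * U

# sample covariance should match Gamma = diag(lam) up to Monte Carlo error
print(np.allclose(np.diag(np.cov(X.T)), lam, atol=0.02))
```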
Theorem 4 Let H, H1 and H2 be separable Hilbert spaces, suppose $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$, and let $P_1 \in \mathcal{B}(H, H_1)$, $P_2 \in \mathcal{B}(H, H_2)$ be two bounded operators. Define $X_i = P_i X$, $\mu_i = P_i\mu$, $\Gamma_{ij} = P_i\Gamma P_j^*$, for i, j = 1, 2. Suppose $\Gamma_{12} = 0$. Then $X_1 \sim \mathrm{HEC}_{H_1}(\mu_1, \Gamma_{11}, \phi_0)$. If $\Gamma_{22}$ is a finite-dimensional operator, then
$$X_1 | X_2 \sim \mathrm{HEC}_{H_1}(\mu_1, \Gamma_{11}, \phi_{T(X_2)}), \qquad (5)$$
where $\phi_{T(X_2)}(t^2)$ is a non-negative definite function that depends on $T(X_2) = \langle X_2 - \mu_2, \Gamma_{22}^-(X_2 - \mu_2)\rangle_{H_2}^{1/2}$, and $\Gamma_{22}^-$ is a generalized inverse of $\Gamma_{22}$. If $\Gamma_{22}$ is an infinite-dimensional operator, then
$$X_1 | X_2 \sim \mathrm{Gaussian}_{H_1}(\mu_1, r^2(X_2)\Gamma_{11}), \qquad (6)$$
where $r(\cdot)$ is the deterministic function given in (7).

Proof: By (3), $P_i X = P_i\mu + RU_i$, where $U_i = P_i U \sim \mathrm{Gaussian}(0, \Gamma_{ii})$, for i = 1, 2. Therefore, $X_1 \sim \mathrm{HEC}_{H_1}(\mu_1, \Gamma_{11}, \phi_0)$ by Theorem 3. Since $\mathrm{Cov}(U_1, U_2) = P_1\Gamma P_2^* = 0$, by the properties of Gaussian variables, $U_1$ is independent of $U_2$. $X_1$ depends on $X_2$ only through the information on R provided by $X_2$.

Suppose $\Gamma_{22}$ is finite dimensional; then $X_2 | R \sim \mathrm{Gaussian}_{H_2}(\mu_2, R^2\Gamma_{22})$. Notice that $\Gamma_{22}$ is a finite-dimensional operator (matrix) with a generalized inverse $\Gamma_{22}^-$. From the theory of the finite-dimensional Gaussian, $T(X_2)$ is a sufficient statistic for R, i.e. $X_2 | T(X_2)$ is independent of R. Therefore,
$$X_1 | X_2 \stackrel{d}{=} P_1\mu + U_1 \times \{R | T(X_2)\}.$$
Then (5) is obtained from Theorem 3.

On the other hand, if $\Gamma_{22}$ is infinite dimensional, we claim that $X_2$ provides all the information about R. It is easy to see that $\Gamma_{22}$ is self-adjoint, non-negative definite and nuclear, therefore it has the spectral decomposition
$$\Gamma_{22} = \sum_{j=1}^{\infty} \lambda_j \psi_j \otimes \psi_j,$$
where the $\lambda_j$'s are the positive eigenvalues of $\Gamma_{22}$. Define
$$r_n(X_2) = \Big\{\frac{1}{n}\sum_{j=1}^{n} \lambda_j^{-1}\langle X_2 - \mu_2, \psi_j\rangle_{H_2}^2\Big\}^{1/2}.$$
Notice that $X_2 - \mu_2 = RU_2$, and therefore $\lambda_j^{-1/2}\langle X_2 - \mu_2, \psi_j\rangle_{H_2} \stackrel{d}{=} RU_{2j}$, where the $U_{2j}$ are i.i.d. Normal(0, 1) random variables independent of R. By the Law of Large Numbers,
$$r(X_2) = \lim_{n\to\infty} r_n(X_2) = R \qquad (7)$$
with probability 1. Therefore, $X_1 | X_2 \stackrel{d}{=} \mu_1 + r(X_2)U_1$, which is the Gaussian distribution in (6).
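The function $r(\cdot)$ in (7) can be approximated in practice by truncating the sum at a large n. The sketch below (variable names and the eigenvalue sequence are illustrative assumptions) generates one realization of $X_2 - \mu_2 = RU_2$ in a truncated eigenbasis and shows that $r_n(X_2)$ recovers the latent scale R.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000                                     # truncation level for (7)
lam = 1.0 / np.arange(1, n + 1) ** 2         # positive eigenvalues of Gamma_22

R = np.sqrt(rng.chisquare(5) / 5.0)          # one draw of the latent scale
scores = R * np.sqrt(lam) * rng.standard_normal(n)   # <X2 - mu2, psi_j> = R * U_2j

# r_n(X2) = { n^{-1} sum_j lam_j^{-1} <X2 - mu2, psi_j>^2 }^{1/2}  ->  R
r_n = np.sqrt(np.mean(scores ** 2 / lam))
print(R, r_n)                                # close for large n
```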
Theorem 4 gives the conditional distribution of $X_1$ given $X_2$ when they are uncorrelated, i.e. $\Gamma_{12} = \mathrm{Cov}(P_1 X, P_2 X) = 0$. The following corollary gives the conditional distribution in the more general case.

Corollary 5 Let H, H1 and H2 be separable Hilbert spaces, suppose $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$, $P_1 \in \mathcal{B}(H, H_1)$, $P_2 \in \mathcal{B}(H, H_2)$, and define $X_i = P_i X$, $\mu_i = P_i\mu$, $\Gamma_{ij} = P_i\Gamma P_j^*$, for i, j = 1, 2. Define $\mu_1^* = \mu_1 + \Gamma_{12}\Gamma_{22}^-(X_2 - \mu_2)$ and $\Gamma_{11}^* = \Gamma_{11} - \Gamma_{12}\Gamma_{22}^-\Gamma_{21}$. If $\Gamma_{22}$ is a finite-dimensional operator,
$$X_1 | X_2 \sim \mathrm{HEC}_{H_1}(\mu_1^*, \Gamma_{11}^*, \phi_{T(X_2)}), \qquad (8)$$
where $\phi_{T(X_2)}(t^2)$ is a non-negative definite function that depends on $T(X_2) = \langle X_2 - \mu_2, \Gamma_{22}^-(X_2 - \mu_2)\rangle_{H_2}^{1/2}$. If $\Gamma_{22}$ is an infinite-dimensional operator, then
$$X_1 | X_2 \sim \mathrm{Gaussian}_{H_1}(\mu_1^*, r^2(X_2)\Gamma_{11}^*), \qquad (9)$$
where $r(\cdot)$ is the deterministic function given in (7).

Proof: First of all, the $\Gamma_{ij} = \mathrm{Cov}(X_i, X_j)$ are bounded operators; as in multivariate analysis, by the Cauchy inequality, $\Gamma_{11}^*$ is bounded and positive semidefinite. Also, $\mu_1^*$ is well defined, since $\mathrm{Cov}(\mu_1^*) = \Gamma_{12}\Gamma_{22}^-\Gamma_{21}$ is bounded by $\Gamma_{11}$ and therefore $P(\mu_1^* \in H_1) = 1$.

Let $U_i = P_i U$, i = 1, 2. Since the $U_i$'s are Gaussian, it is easy to check through moment calculations that
$$(U_1, U_2) \stackrel{d}{=} (Z_1 + \Gamma_{12}\Gamma_{22}^- Z_2,\ Z_2),$$
where $Z_1 \sim \mathrm{Gaussian}_{H_1}(0, \Gamma_{11}^*)$, $Z_2 \sim \mathrm{Gaussian}_{H_2}(0, \Gamma_{22})$, and they are independent. Therefore
$$X_1 | X_2 \stackrel{d}{=} \{\mu_1 + RZ_1 + \Gamma_{12}\Gamma_{22}^-(RZ_2)\}\,|\,X_2$$
$$\stackrel{d}{=} \mu_1 + \Gamma_{12}\Gamma_{22}^-(X_2 - \mu_2) + RZ_1\,|\,X_2$$
$$\stackrel{d}{=} \mu_1^* + Z_1 \times (R | X_2). \qquad (10)$$
Here we used the fact that $Z_1$ is independent of $X_2$. When the range of the operator $\Gamma_{22}$ is finite dimensional, $X_2$ is also finite dimensional. Using arguments as in the proof of Theorem 4, we can show that $(R | X_2)$ is a non-negative random variable which depends on the value of $X_2$ only through the statistic $T(X_2)$. Then (8) follows from a direct application of Theorem 3. When $\Gamma_{22}$ is infinite dimensional, as in the proof of Theorem 4, $(R | X_2)$ is the deterministic function $r(X_2)$ given by (7). This proves (9).

Remark: Although, in this paper, we are mainly interested in the case that X is defined on an infinite-dimensional Hilbert space H, we do allow the Hilbert spaces H1 and H2 that P1 and P2 map into to be finite dimensional. For example, we allow H1 and H2 to be the Euclidean space $R^m$. See our examples in Section 5.

5 Applications

To show the usefulness of our theory in statistical practice, we provide a few examples which are direct consequences of the theorems in Section 4.

Example 1: (Principal component analysis) Suppose $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$, and the covariance operator Γ has the spectral decomposition in (2). The $\psi_j$'s are the principal components of X. Define the principal component scores $\xi_j = \langle\psi_j, X\rangle_H$ for $j = 1, 2, \cdots$; then X has the decomposition
$$X = \mu + \sum_{j=1}^{\infty} \xi_j \psi_j. \qquad (11)$$
Such a decomposition is also called the Karhunen–Loève decomposition (Ash and Gardner, 1975).

For any finite collection $\{\psi_{j_1}, \cdots, \psi_{j_m}\}$, define an operator $P: H \to R^m$ by $Px = (\langle\psi_{j_1}, x\rangle, \cdots, \langle\psi_{j_m}, x\rangle)^T$, i.e. $P = e_1\otimes\psi_{j_1} + \cdots + e_m\otimes\psi_{j_m}$, where $e_j$ is the j-th column of the $m\times m$ identity matrix. Then by Theorem 4,
$$(\xi_{j_1}, \cdots, \xi_{j_m})^T = P(X - \mu) \sim \mathrm{HEC}_{R^m}\{0, \mathrm{diag}(\lambda_{j_1}, \cdots, \lambda_{j_m}), \phi_0\}.$$
In other words, any finite collection of principal component scores of X follows a multivariate elliptically contoured distribution.

This example also suggests a way to simulate a $\mathrm{HEC}_H(\mu, \Gamma, \phi_0)$ random variable. Since $\lambda_j \to 0$ as $j \to \infty$, we can truncate the series in (11) at a large number m and simulate the first m principal component scores from a multivariate elliptically contoured distribution. This is very useful in simulating functional data; a sketch of this recipe is given below.
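The following is a minimal sketch of this simulation recipe, under illustrative assumptions: eigenfunctions $\psi_j(t) = \sqrt{2}\sin(j\pi t)$ on [0, 1], eigenvalues $\lambda_j = j^{-2}$, and elliptically contoured scores generated through representation (3) with $R = \sqrt{(\nu-2)/\chi^2_\nu}$, a scaled multivariate t, normalized so that $E(R^2) = 1$. None of these choices are prescribed by the paper.

```python
import numpy as np

def simulate_hec_curves(n_curves=100, m=50, nu=5, n_grid=101, seed=4):
    """Simulate functional data from a truncated Karhunen-Loeve expansion (11)
    with elliptically contoured principal component scores."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    j = np.arange(1, m + 1)
    lam = 1.0 / j ** 2                                   # eigenvalues of Gamma
    psi = np.sqrt(2.0) * np.sin(np.outer(j, np.pi * t))  # psi_j(t), shape (m, n_grid)

    # scores xi = R * U, with U ~ N(0, diag(lam)) and R independent of U, E(R^2) = 1
    U = rng.standard_normal((n_curves, m)) * np.sqrt(lam)
    R = np.sqrt((nu - 2.0) / rng.chisquare(nu, size=n_curves))
    xi = R[:, None] * U

    return t, xi @ psi            # each row is one simulated curve X(t), with mu = 0

t, X = simulate_hec_curves()
print(X.shape)                    # (100, 101)
```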
Example 2: (Conditional moments) Let $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$, $P_1 \in \mathcal{B}(H, H_1)$, $P_2 \in \mathcal{B}(H, H_2)$. By Corollary 5,
$$E(P_1 X | P_2 X) = \mu_1 + \Gamma_{12}\Gamma_{22}^-(X_2 - \mu_2).$$
Suppose $\mu = 0$ and $P_1\Gamma P_2^* = 0$; then $\Gamma_{12} = 0$ and we have $E(P_1 X | P_2 X) = 0$.

On the other hand, by the random representation (10),
$$\mathrm{Var}(P_1 X | P_2 X) = \Gamma_{11}^* E(R^2 | X_2).$$
When $\Gamma_{22}$ is a finite-dimensional operator, $E(R^2 | X_2) = g\{T(X_2)\}$ for some univariate function g that depends on the elliptically contoured distribution, where $T(X_2)$ is defined in Theorem 4. Therefore $\mathrm{Var}(P_1 X | P_2 X) = g\{T(X_2)\}\Gamma_{11}^*$. When $\Gamma_{22}$ is an infinite-dimensional operator, by (9), $\mathrm{Var}(P_1 X | P_2 X) = r^2(X_2)\Gamma_{11}^*$. If $P_1\Gamma P_2^* = 0$, then $\Gamma_{11}^* = \Gamma_{11}$.

Example 3: (Functional sliced inverse regression) Suppose the Hilbert space H is the $L^2[0, 1]$ function space, and $X \in H$ is a random function defined on the interval [0, 1]. A general functional regression model is given by
$$Y = f(\langle\beta_1, X\rangle, \langle\beta_2, X\rangle, \cdots, \langle\beta_K, X\rangle, \epsilon), \qquad (12)$$
where Y is a scalar response variable, $\epsilon$ is an error term independent of X, $\beta_1, \cdots, \beta_K$ are linearly independent coefficient functions, and f is a nonparametric link function. Model (12) is very general and can be useful in many applications; see the discussion in Ferré and Yao (2003) and Li and Hsing (2007). Since we do not impose any structure on the link function f, the coefficient functions $\beta_k$ are usually unidentifiable, but the subspace spanned by these functions is. This subspace is called the effective dimension reduction space, or the EDR space. We can choose any K orthonormal basis functions in the EDR space as the $\beta_k$'s; these functions are also called the EDR directions.

The functional sliced inverse regression (FSIR) approach can be used to estimate the EDR directions and to decide the dimension of the EDR space. We will show that the class of processes X with a Hilbertian elliptically contoured distribution satisfies a key assumption for FSIR, and we will discuss an important result for elliptically contoured functional predictors which is useful for FSIR. For a more comprehensive account of the method and theory of FSIR, we refer to Ferré and Yao (2003) and Li and Hsing (2007).

One key assumption for FSIR is that, for any $\beta_0 \in H$,
$$E(\langle\beta_0, X\rangle \mid \langle\beta_1, X\rangle, \langle\beta_2, X\rangle, \cdots, \langle\beta_K, X\rangle) = c_0 + \sum_{k=1}^{K} c_k\langle\beta_k, X\rangle \qquad (13)$$
for some constants $c_0, \cdots, c_K$; see Ferré and Yao (2003). We now show this assumption is satisfied if X is elliptically contoured. Define operators $P_1 x = \langle\beta_0, x\rangle$ and $P_2 x = (\langle\beta_1, x\rangle, \langle\beta_2, x\rangle, \cdots, \langle\beta_K, x\rangle)^T$ for all $x \in H$. Notice that $H_2 = R^K$, and $P_2$ is clearly a finite-dimensional operator. For any vector $v = (v_1, \cdots, v_K)^T$,
$$\langle P_2^* v, x\rangle_H = \langle v, P_2 x\rangle_{H_2} = \Big\langle \sum_{k=1}^{K} v_k\beta_k,\ x\Big\rangle, \quad \forall x \in H.$$
Therefore, $P_2^* v = (\beta_1, \cdots, \beta_K)v$. One can also show that $\Gamma_{22}$ is a $K\times K$ matrix with (j, k)th entry equal to $\langle\beta_j, \Gamma\beta_k\rangle$. Similarly, $\Gamma_{12} = (\langle\beta_0, \Gamma\beta_1\rangle, \cdots, \langle\beta_0, \Gamma\beta_K\rangle)$ is a $1\times K$ matrix. By Corollary 5,
$$E(P_1 X | P_2 X) = P_1\mu + \Gamma_{12}\Gamma_{22}^-(P_2 X - P_2\mu).$$
Therefore, assumption (13) is satisfied.

Example 4: (Functional sliced inverse regression, continued) Suppose X is elliptically contoured with mean $\mu = 0$. Let $P_2$ be the operator defined in the previous example, and define the operator $P_1 x = (\langle\gamma_1, x\rangle, \cdots, \langle\gamma_m, x\rangle)^T$ for a set of orthonormal vectors $\{\gamma_1, \cdots, \gamma_m\}$ in H. Suppose $P_1\Gamma P_2^* = 0$, i.e. $\langle\gamma_j, \Gamma\beta_k\rangle = 0$ for $j = 1, \cdots, m$ and $k = 1, \cdots, K$. By model (12), all the information about X in Y is contained in $P_2 X$, so we have
$$E(P_1 X | Y) = E\{E(P_1 X | P_2 X) | Y\}.$$
By Example 2, $E(P_1 X | P_2 X) = 0$, and therefore
$$P_1 E(X | Y) = 0. \qquad (14)$$
Equation (14) provides information about the shape of the inverse regression curve $E(X | Y)$. On the other hand,
$$\mathrm{Var}(P_1 X | Y) = E\{\mathrm{Var}(P_1 X | P_2 X) | Y\} + \mathrm{Var}\{E(P_1 X | P_2 X) | Y\} = E\{\mathrm{Var}(P_1 X | P_2 X) | Y\},$$
since $E(P_1 X | P_2 X) = 0$. Again, by Example 2, $\mathrm{Var}(P_1 X | P_2 X) = g\{T(P_2 X)\}\Gamma_{11}$, and therefore
$$\mathrm{Var}(P_1 X | Y) = E[g\{T(P_2 X)\} | Y]\Gamma_{11}.$$
Since $\Gamma_{11}$ is the marginal covariance of $P_1 X$, this result shows that the conditional covariance of $P_1 X$ given Y is proportional to the marginal covariance. This result is important for constructing tests for FSIR.

References

[1] Ash, R. B. and Gardner, M. F. (1975). Topics in Stochastic Processes. Academic Press, New York.

[2] Cambanis, S., Huang, S. and Simons, G. (1981). On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11, 368-385.

[3] Cook, R. D. (1998). Regression Graphics: Ideas for Studying Regressions Through Graphics. Wiley, New York.

[4] Cook, R. D. and Weisberg, S. (1991). Comment on "Sliced inverse regression for dimension reduction," by K. C. Li. Journal of the American Statistical Association, 86, 328-332.

[5] Dunford, N. and Schwartz, J. T. (1988). Linear Operators, Part I: General Theory. Wiley, New York.

[6] Eaton, M. L. (1986). A characterization of spherical distributions. Journal of Multivariate Analysis, 20, 272-276.

[7] Ferré, L. and Yao, A. (2003). Functional sliced inverse regression analysis. Statistics, 37, 475-488.

[8] Li, K. C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316-327.

[9] Li, Y. and Hsing, T. (2007). Determination of the dimensionality in functional sliced inverse regression. Manuscript.

[10] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd Edition. Springer-Verlag, New York.

[11] Schoenberg, I. J. (1938). Metric spaces and completely monotone functions. Annals of Mathematics, 39, 811-841.

[12] Schott, J. R. (1994). Determining the dimensionality in sliced inverse regression. Journal of the American Statistical Association, 89, 141-148.

[13] Vakhania, N. N., Tarieladze, V. I. and Chobanyan, S. A. (1987). Probability Distributions on Banach Spaces. D. Reidel, Dordrecht.