A Note on Hilbertian Elliptically Contoured Distributions

Yehua Li
Department of Statistics, University of Georgia, Athens, GA 30602, USA
E-mail: yehuali@uga.edu
Abstract. In this paper, we discuss elliptically contoured distributions for random variables defined on a separable Hilbert space, a generalization of the multivariate elliptically contoured distribution to infinite dimensional spaces. Some theoretical properties of the Hilbertian elliptically contoured distribution are derived, and examples on functional data are investigated to illustrate the applications of such distributions.
Keywords: Elliptically contoured distribution, functional data, Hilbertian random
variable.
1 Introduction
The elliptically contoured distribution is an important class of distributions in multivariate analysis, with some very nice symmetry properties. It is widely used in statistical practice, for example in dimension reduction (Li, 1991; Cook and Weisberg, 1991; Schott, 1994) and regression graphics (Cook, 1998). The most important example in this class is of course the multivariate Gaussian distribution. Properties of the multivariate elliptically contoured distribution are well studied; see for example Cambanis, Huang, and Simons (1981) and Eaton (1986).
Recent developments in statistics have led us to look beyond random vectors in Euclidean space. Statistical models for random variables defined on infinite dimensional Hilbert spaces are in demand. One important example is functional data analysis (Ramsay and Silverman, 2005), where the data are vectors in a function space, e.g. the $L^2$ space.
Among Hilbertian distributions, the Gaussian distribution is still the best understood. For example, in functional data analysis, the random functions are usually modeled as Gaussian processes. The class of elliptically contoured distributions is an important generalization of the Gaussian class. It has important applications in dimension reduction; see the recent literature on functional sliced inverse regression (Ferré and Yao, 2003; Li and Hsing, 2007). Yet its theoretical properties are not well studied. The goal of this paper is to fill this gap.
The rest of the paper is organized as follows. We introduce some background and definitions regarding linear operators and Hilbertian random variables in Section 2. A random representation for Hilbertian elliptically contoured random variables is introduced in Section 3. Some theoretical properties of the Hilbertian elliptically contoured distribution are discussed in Section 4, including marginal and conditional distributions of a random variable $X$ when it is mapped into different Hilbert spaces. Finally, we give some examples in Section 5 to illustrate applications of the theory derived in the previous sections, especially in functional data analysis.
2 Definitions and Background

2.1 Linear operators
We first introduce some notation and background for linear operators on Hilbert spaces. More theory on linear operators can be found in Dunford and Schwartz (1988).
We will restrict our discussion to separable Hilbert spaces. A separable Hilbert space $H$ is a Hilbert space with a countable basis $\{e_1, e_2, \cdots\}$. For two Hilbert spaces $H$ and $H'$, a linear operator $T: H \to H'$ is a linear map from $H$ to $H'$, i.e.
\[ T(ax) = a(Tx), \qquad T(x + y) = Tx + Ty, \]
for any $x, y \in H$ and any scalar $a$. $T$ is bounded if
\[ \|Tx\|_{H'} \le M\|x\|_H, \qquad \forall x \in H, \]
for some non-negative real number $M$. Denote the class of bounded linear operators from $H$ to $H'$ as $B(H, H')$; when $H' = H$ it is simplified to $B(H)$.
The adjoint of an operator $T \in B(H, H')$, denoted $T^*$, is the operator mapping from $H'$ to $H$ with
\[ \langle y, Tx\rangle_{H'} = \langle T^*y, x\rangle_H, \qquad \forall x \in H,\ \forall y \in H'. \]
When $H = H'$, $T$ is called self-adjoint if $T^* = T$.
2.2 Hilbertian random variables
Let $H$ be a separable Hilbert space with inner product $\langle\cdot,\cdot\rangle_H$ and let $(\Omega, \mathcal{F}, P)$ be a probability space; a Hilbertian random variable is then a map $X: (\Omega, \mathcal{F}, P) \to H$. Since finite dimensional Hilbert spaces are isomorphic to Euclidean spaces, where the theory of multivariate analysis applies, we are generically interested in random variables on an infinite dimensional Hilbert space. In functional data analysis, $H$ could be the $L^2$ function space, a Sobolev space, etc.
The mean of $X$, if it exists, is defined as $\mu_X = EX = \int_\Omega X(\omega)\,dP(\omega)$, which is the element of $H$ satisfying $\langle b, EX\rangle = E\langle b, X\rangle$ for all $b \in H$. The variance of $X$ is an operator on $H$, defined as
\[ V_X(g) = E\{(X - EX) \otimes (X - EX)\}(g) = E\{\langle X - EX, g\rangle(X - EX)\}, \qquad \forall g \in H. \]
It is easy to show that $V_X$ is a self-adjoint, non-negative definite operator.
The characteristic function of a Hilbertian random variable is
\[ \phi_X(f) = E\{\exp(i\langle f, X\rangle)\}, \tag{1} \]
for all $f \in H$.
For a separable Hilbert space, there is a countable basis $\{e_1, e_2, \cdots\}$. Define $x_j = \langle e_j, X\rangle$; these are univariate random variables. Then $X$ has the coordinate representation $(x_1, x_2, \cdots)$, which is an $\ell^2$ random variable. Denote the space of $H$-valued random variables as $\mathcal{X}$; then $\mathcal{X}$ is isomorphic to the space of $\ell^2$ random variables.
An operator $T$ is nuclear if its trace is finite and independent of the choice of basis, where the trace is defined as
\[ \mathrm{tr}(T) = \sum_{i=1}^{\infty} \langle e_i, Te_i\rangle, \]
for any complete orthonormal basis $\{e_i\}$ of $H$.
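To make the basis independence of the trace concrete, the following minimal Python sketch (an illustration, not from the paper) uses a finite dimensional truncation as a stand-in for a nuclear operator and evaluates $\sum_i \langle e_i, Te_i\rangle$ in two different orthonormal bases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite-dimensional surrogate for a nuclear operator: a symmetric,
# non-negative definite matrix with summable eigenvalues lambda_j = j^{-2}.
n = 50
lam = 1.0 / (np.arange(1, n + 1) ** 2)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthonormal basis
T = Q @ np.diag(lam) @ Q.T

# Trace in the standard basis ...
trace_std = float(np.trace(T))

# ... and in another orthonormal basis {f_i}: tr(T) = sum_i <f_i, T f_i>.
F, _ = np.linalg.qr(rng.standard_normal((n, n)))
trace_other = sum(F[:, i] @ T @ F[:, i] for i in range(n))

print(trace_std, trace_other, lam.sum())  # all three agree up to rounding
```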
The covariance operator of $X$ considered in this paper is a self-adjoint, non-negative definite, nuclear operator, denoted as $\Gamma$. $\Gamma$ has the spectral decomposition
\[ \Gamma = \sum_j \lambda_j \psi_j \otimes \psi_j, \tag{2} \]
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues of $\Gamma$ and the $\psi_j$ are the corresponding eigenvectors. If $\{\psi_j\}$ is incomplete, we can always extend it to a complete basis of $H$ by including a basis of the null space of $\Gamma$. The $\psi_j$'s are the principal components of the random variable $X$.
Definition 1 A Hilbertian random variable $X$ has an elliptically contoured distribution if the characteristic function of $X - \mu$ has the form $\phi_{X-\mu}(f) = \phi_0(\langle f, \Gamma f\rangle)$ for a univariate function $\phi_0$, where $\Gamma$ is a self-adjoint, non-negative definite, nuclear operator on $H$. Denote the distribution of $X$ as $HEC_H(\mu, \Gamma, \phi_0)$.

One important example of an elliptically contoured distribution is the Gaussian distribution, whose characteristic function has the form $\phi_{X-\mu}(f) = \exp(-\langle f, \Gamma f\rangle/2)$.
3 Random representation for Hilbertian elliptically contoured random variables
For a fixed self-adjoint, non-negative definite, nuclear operator $\Gamma$, we can define a metric on $H$: $d_\Gamma(x, y) = \|x - y\|_\Gamma = \langle x - y, \Gamma(x - y)\rangle_H^{1/2}$.
Lemma 2 Suppose $\phi_{X-\mu}(f) = \phi_0(\langle f, \Gamma f\rangle)$ is the characteristic function of an elliptically contoured distribution; then $\phi_0(\cdot)$ is a non-negative definite function on $H$ with respect to the metric $d_\Gamma(\cdot,\cdot)$, i.e.
\[ \sum_{i=1}^n \sum_{j=1}^n a_i a_j \phi_0\{d_\Gamma^2(f_i, f_j)\} \ge 0, \]
for any finite collection $\{f_i;\ i = 1, \cdots, n\} \subset H$ and any real values $\{a_i;\ i = 1, \cdots, n\}$.
Lemma 2 is a straightforward application of Sazonov's theorem (Vakhania, Tarieladze and Chobanyan, 1987), which generalizes Bochner's theorem to infinite dimensional Hilbert spaces.
By the definition of the characteristic function, $\phi_{X-\mu}(f - g) = \overline{\phi_{X-\mu}(g - f)}$, while ellipticity gives $\phi_{X-\mu}(f - g) = \phi_{X-\mu}(g - f)$; together these imply $\phi_0(\|f - g\|_\Gamma^2) = \overline{\phi_0(\|f - g\|_\Gamma^2)}$. This means $\phi_0(\cdot)$ must be real valued.
Theorem 3 $X \sim HEC_H(\mu, \Gamma, \phi_0)$ if and only if
\[ X \stackrel{d}{=} \mu + RU, \tag{3} \]
where $U \sim \mathrm{Gaussian}(0, \Gamma)$ and $R$ is a non-negative univariate random variable independent of $U$.
Proof: We first prove the "if" part. Suppose (3) is true, and let $F$ be the distribution function of $R$; then
\[ \phi_{X-\mu}(f) = E\exp(iR\langle f, U\rangle) = \int_{[0,\infty)} \exp(-r^2\langle f, \Gamma f\rangle/2)\,dF(r). \tag{4} \]
By Definition 1, $X$ is a Hilbertian elliptically contoured random variable.
Conversely, suppose $X$ is an elliptically contoured Hilbertian random variable with characteristic function $\phi_{X-\mu}(f) = \phi_0(\|f\|_\Gamma^2)$. By Lemma 2, $g(t) = \phi_0(t^2)$ is a positive definite function. By Theorem 2 in Schoenberg (1938),
\[ g(t) = \int_0^\infty \exp(-t^2u^2)\,d\alpha(u), \]
for some bounded non-decreasing function $\alpha(u)$. By the definition of the characteristic function, we have $1 = \phi_0(0) = \int_0^\infty d\alpha(u)$. Therefore, $\alpha(\cdot)$ is the cumulative distribution function of a non-negative random variable. We now change variables: let $t = \|f\|_\Gamma$ and define a random variable $R$ such that $2^{-1/2}R$ has distribution function $\alpha(\cdot)$. Let $F$ be the distribution function of $R$; then $F(r) = \alpha(2^{-1/2}r)$. We have
\[ \phi_{X-\mu}(f) = \phi_0(\langle f, \Gamma f\rangle) = \int_0^\infty \exp(-r^2\langle f, \Gamma f\rangle/2)\,dF(r). \]
Therefore, $X$ has the stochastic representation (3).
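As a sanity check on the equivalence between Definition 1 and the representation (3), the following Python sketch (an illustration, not from the paper) compares the empirical characteristic function of $RU$ with the mixture formula (4) in a finite dimensional surrogate of $H$. The radial law $R^2 \sim \nu/\chi^2_\nu$, which produces a $t$-type elliptical distribution, is an assumption made for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

# Finite-dimensional surrogate of H; Gamma plays the covariance operator.
d, n, nu = 5, 1_000_000, 5.0
A = rng.standard_normal((d, d))
Gamma = A @ A.T / d
L = np.linalg.cholesky(Gamma)

R = np.sqrt(nu / rng.chisquare(nu, size=n))      # radial variable
U = rng.standard_normal((n, d)) @ L.T            # U ~ Gaussian(0, Gamma)
X = R[:, None] * U                               # X - mu = R U, as in (3)

f = rng.standard_normal(d)
lhs = np.mean(np.exp(1j * (X @ f)))                    # E exp(i<f, X - mu>)
rhs = np.mean(np.exp(-R**2 * (f @ Gamma @ f) / 2.0))   # right side of (4)
print(lhs.real, rhs)                                   # agree up to MC error
```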
4 Properties of elliptically contoured distributions
We first discuss moment properties of the elliptically contoured distribution. Suppose the first two moments of $X$ exist. By (3),
\[ EX = \mu, \qquad V(X) = E(R^2)\Gamma. \]
On the other hand, if we start from the characteristic function, assuming that $\phi_0$ is twice differentiable,
\[ V(X) = -\phi_{X-\mu}^{(2)}(0) = -\{2\phi_0'(\langle f, \Gamma f\rangle)\Gamma + 4\phi_0''(\langle f, \Gamma f\rangle)(\Gamma f) \otimes (\Gamma f)\}\big|_{f=0} = -2\phi_0'(0)\Gamma. \]
To make $\Gamma$ identifiable, we can let $E(R^2) = -2\phi_0'(0) = 1$; then $\Gamma = V(X)$ is the covariance operator of $X$. For example, for the Gaussian distribution $\phi_0(s) = \exp(-s/2)$, so $-2\phi_0'(0) = 1$ and no rescaling is needed.
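The identity $E(R^2) = -2\phi_0'(0)$ is easy to verify numerically from the mixture form (4). The sketch below (an illustration, with the radial law $R^2 \sim \nu/\chi^2_\nu$ again an assumption) estimates $\phi_0'(0)$ by a finite difference and compares $-2\phi_0'(0)$ with the sample mean of $R^2$; for this law $E(R^2) = \nu/(\nu - 2)$.

```python
import numpy as np

rng = np.random.default_rng(2)

nu = 5.0
R2 = nu / rng.chisquare(nu, size=2_000_000)   # R^2, with E(R^2) = nu/(nu-2)

# phi_0(s) = E exp(-R^2 s / 2), the mixture representation (4).
def phi0(s):
    return np.mean(np.exp(-R2 * s / 2.0))

h = 1e-4
phi0_prime_0 = (phi0(h) - phi0(0.0)) / h      # one-sided difference at 0

print(-2.0 * phi0_prime_0)                    # approx nu/(nu-2) = 1.666...
print(R2.mean())                              # sample E(R^2), for comparison
```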
Theorem 4 Let $H$, $H_1$ and $H_2$ be separable Hilbert spaces, suppose $X \sim HEC_H(\mu, \Gamma, \phi_0)$, and let $P_1 \in B(H, H_1)$, $P_2 \in B(H, H_2)$ be two bounded operators. Define $X_i = P_iX$, $\mu_i = P_i\mu$, $\Gamma_{ij} = P_i\Gamma P_j^*$, for $i, j = 1, 2$. Suppose $\Gamma_{12} = 0$. Then $X_1 \sim HEC_{H_1}(\mu_1, \Gamma_{11}, \phi_0)$.
If $\Gamma_{22}$ is a finite dimensional operator, then
\[ X_1 \,|\, X_2 \sim HEC_{H_1}(\mu_1, \Gamma_{11}, \phi_{T(X_2)}), \tag{5} \]
where $\phi_{T(X_2)}(t^2)$ is a non-negative definite function depending on $T(X_2) = \langle X_2 - \mu_2, \Gamma_{22}^-(X_2 - \mu_2)\rangle_{H_2}^{1/2}$, and $\Gamma_{22}^-$ is a generalized inverse of $\Gamma_{22}$.
If $\Gamma_{22}$ is an infinite dimensional operator, then
\[ X_1 \,|\, X_2 \sim \mathrm{Gaussian}_{H_1}(\mu_1, r^2(X_2)\Gamma_{11}), \tag{6} \]
where $r(\cdot)$ is the deterministic function given in (7).
Proof: By (3), $P_iX = P_i\mu + RU_i$, where $U_i = P_iU \sim \mathrm{Gaussian}(0, \Gamma_{ii})$, for $i = 1, 2$. Therefore, $X_1 \sim HEC_{H_1}(\mu_1, \Gamma_{11}, \phi_0)$ by Theorem 3.
Since $\mathrm{Cov}(U_1, U_2) = P_1\Gamma P_2^* = 0$, by the properties of Gaussian variables, $U_1$ is independent of $U_2$. $X_1$ depends on $X_2$ only through the information on $R$ provided by $X_2$.
Suppose $\Gamma_{22}$ is finite dimensional; then
\[ X_2 \,|\, R \sim \mathrm{Gaussian}_{H_2}(\mu_2, R^2\Gamma_{22}). \]
Notice that $\Gamma_{22}$ is a finite dimensional operator (matrix) with a generalized inverse $\Gamma_{22}^-$. From the theory of finite dimensional Gaussian distributions, $T(X_2)$ is a sufficient statistic for $R$, i.e. $X_2 \,|\, T(X_2)$ is independent of $R$. Therefore,
\[ X_1 \,|\, X_2 \stackrel{d}{=} \mu_1 + U_1 \times \{R \,|\, T(X_2)\}, \]
and (5) is obtained by Theorem 3.
On the other hand, if $\Gamma_{22}$ is infinite dimensional, we claim that $X_2$ provides all the information about $R$. It is easy to see that $\Gamma_{22}$ is self-adjoint, non-negative definite and nuclear; therefore it has a spectral decomposition
\[ \Gamma_{22} = \sum_{j=1}^{\infty} \lambda_j \psi_j \otimes \psi_j, \]
where the $\lambda_j$'s are the positive eigenvalues of $\Gamma_{22}$. Define $r_n(X_2) = \{n^{-1}\sum_{j=1}^{n} \lambda_j^{-1}\langle X_2 - \mu_2, \psi_j\rangle_{H_2}^2\}^{1/2}$. Notice that $X_2 - \mu_2 = RU_2$, and therefore
\[ \lambda_j^{-1/2}\langle X_2 - \mu_2, \psi_j\rangle_{H_2} \stackrel{d}{=} RU_{2j}, \]
where the $U_{2j}$ are i.i.d. $\mathrm{Normal}(0, 1)$ and independent of $R$. By the law of large numbers, $n^{-1}\sum_{j=1}^{n} U_{2j}^2 \to 1$ almost surely, so
\[ r(X_2) = \lim_{n\to\infty} r_n(X_2) = R \tag{7} \]
with probability 1. Therefore, $X_1 \,|\, X_2 \stackrel{d}{=} \mu_1 + r(X_2)U_1$, which is the Gaussian distribution in (6).
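The fact that an infinite dimensional $X_2$ reveals the realized value of $R$ can be illustrated numerically. In the sketch below (an illustration; the eigenvalue decay $\lambda_j = j^{-2}$ and the radial law are assumptions), $r_n(X_2)$ computed from a single draw of $X_2 - \mu_2 = RU_2$ converges to that draw's $R$, not merely to a moment of $R$.

```python
import numpy as np

rng = np.random.default_rng(3)

# One draw of X2 - mu2 = R * U2 with U2 ~ Gaussian(0, Gamma22); the
# eigenbasis scores <X2 - mu2, psi_j> equal R * sqrt(lambda_j) * N(0,1).
n_max = 20_000
lam = 1.0 / (np.arange(1, n_max + 1) ** 2)
R = np.sqrt(5.0 / rng.chisquare(5.0))                    # single radial draw
scores = R * np.sqrt(lam) * rng.standard_normal(n_max)   # <X2 - mu2, psi_j>

# r_n(X2)^2 = n^{-1} sum_j lambda_j^{-1} <X2 - mu2, psi_j>^2 -> R^2 a.s.
u2 = scores / np.sqrt(lam)                               # = R * U_2j
for n in (100, 1_000, 20_000):
    print(n, np.sqrt(np.mean(u2[:n] ** 2)), R)           # r_n vs. true R
```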
Theorem 4 gives the conditional distribution of $X_1$ given $X_2$ when they are uncorrelated, i.e. $\Gamma_{12} = \mathrm{Cov}(P_1X, P_2X) = 0$. The following corollary gives the conditional distribution in the general case.
Corollary 5 Let $H$, $H_1$ and $H_2$ be separable Hilbert spaces, suppose $X \sim HEC_H(\mu, \Gamma, \phi_0)$, $P_1 \in B(H, H_1)$, $P_2 \in B(H, H_2)$, and define $X_i = P_iX$, $\mu_i = P_i\mu$, $\Gamma_{ij} = P_i\Gamma P_j^*$, for $i, j = 1, 2$. Define $\mu_1^* = \mu_1 + \Gamma_{12}\Gamma_{22}^-(X_2 - \mu_2)$ and $\Gamma_{11}^* = \Gamma_{11} - \Gamma_{12}\Gamma_{22}^-\Gamma_{21}$.
If $\Gamma_{22}$ is a finite dimensional operator,
\[ X_1 \,|\, X_2 \sim HEC_{H_1}(\mu_1^*, \Gamma_{11}^*, \phi_{T(X_2)}), \tag{8} \]
where $\phi_{T(X_2)}(t^2)$ is a non-negative definite function depending on $T(X_2) = \langle X_2 - \mu_2, \Gamma_{22}^-(X_2 - \mu_2)\rangle_{H_2}^{1/2}$.
If $\Gamma_{22}$ is an infinite dimensional operator, then
\[ X_1 \,|\, X_2 \sim \mathrm{Gaussian}_{H_1}(\mu_1^*, r^2(X_2)\Gamma_{11}^*), \tag{9} \]
where $r(\cdot)$ is the deterministic function given in (7).
Proof: First of all, the $\Gamma_{ij} = \mathrm{Cov}(X_i, X_j)$ are bounded operators; as in multivariate analysis, the Cauchy-Schwarz inequality implies that $\Gamma_{11}^*$ is bounded and non-negative definite. Also, $\mu_1^*$ is well defined, since $\mathrm{Cov}(\mu_1^*) = \Gamma_{12}\Gamma_{22}^-\Gamma_{21}$ is bounded by $\Gamma_{11}$ and therefore $P(\mu_1^* \in H_1) = 1$.
Let $U_i = P_iU$, $i = 1, 2$. Since the $U_i$'s are Gaussian, it is easy to check through moment calculations that
\[ (U_1, U_2) \stackrel{d}{=} (Z_1 + \Gamma_{12}\Gamma_{22}^- Z_2, Z_2), \]
where $Z_1 \sim \mathrm{Gaussian}_{H_1}(0, \Gamma_{11}^*)$ and $Z_2 \sim \mathrm{Gaussian}_{H_2}(0, \Gamma_{22})$ are independent. Therefore
\[
\begin{aligned}
X_1 \,|\, X_2 &\stackrel{d}{=} \{\mu_1 + RZ_1 + \Gamma_{12}\Gamma_{22}^-(RZ_2)\} \,|\, X_2 \\
&\stackrel{d}{=} \mu_1 + \Gamma_{12}\Gamma_{22}^-(X_2 - \mu_2) + RZ_1 \,|\, X_2 \\
&\stackrel{d}{=} \mu_1^* + Z_1 \times (R \,|\, X_2).
\end{aligned} \tag{10}
\]
Here we used the fact that $Z_1$ is independent of $X_2$.
When the range of the operator $\Gamma_{22}$ is finite dimensional, $X_2$ is also finite dimensional. Using arguments as in the proof of Theorem 4, we can show that $(R \,|\, X_2)$ is a non-negative random variable that depends on the value of $X_2$ only through the statistic $T(X_2)$. Then (8) follows from a direct application of Theorem 3.
When $\Gamma_{22}$ is of infinite dimension, as in the proof of Theorem 4, $(R \,|\, X_2)$ is the deterministic function $r(X_2)$ given by (7). This proves (9).
Remark: Although in this paper we are only interested in the case that $X$ is defined on an infinite dimensional Hilbert space $H$, we do allow the Hilbert spaces $H_1$ and $H_2$ into which $P_1$ and $P_2$ map to be finite dimensional. For example, we allow $H_1$ and $H_2$ to be the Euclidean space $\mathbb{R}^m$. See our examples in Section 5.
5 Applications
To show the usefulness of our theory in statistical practice, we provide a few examples that are direct consequences of the theorems in Section 4.
Example 1: (Principal component analysis) Suppose $X \sim HEC_H(\mu, \Gamma, \phi_0)$, and the covariance operator $\Gamma$ has the spectral decomposition (2); the $\psi_j$'s are the principal components of $X$. Define the principal component scores $\xi_j = \langle \psi_j, X\rangle_H$ for $j = 1, 2, \cdots$; then $X$ has the decomposition
\[ X = \mu + \sum_{j=1}^{\infty} \xi_j \psi_j. \tag{11} \]
In functional data analysis, such a decomposition is also called the Karhunen-Loève decomposition (Ash and Gardner, 1975).
For any finite collection $\{\psi_{j_1}, \cdots, \psi_{j_m}\}$, define an operator $P: H \to \mathbb{R}^m$ by $P = \psi_{j_1} \otimes e_1 + \cdots + \psi_{j_m} \otimes e_m$, where $e_j$ is the $j$th column of the $m \times m$ identity matrix. Then by Theorem 4,
\[ (\xi_{j_1}, \cdots, \xi_{j_m})^T = P(X - \mu) \sim HEC_{\mathbb{R}^m}\{0, \mathrm{diag}(\lambda_{j_1}, \cdots, \lambda_{j_m}), \phi_0\}. \]
In other words, any finite collection of principal component scores of $X$ follows a multivariate elliptically contoured distribution.
This example also suggests a way to simulate a $HEC_H(\mu, \Gamma, \phi_0)$ random variable: since $\lambda_j \to 0$ as $j \to \infty$, we can truncate the series in (11) at a large number $m$ and simulate the first $m$ principal component scores from a multivariate elliptically contoured distribution. This is very useful in simulating functional data, as the sketch below illustrates.
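Here is a minimal Python sketch of that recipe: functional data simulated from a $t$-type HEC distribution via the truncated expansion (11). The cosine eigenfunctions, the eigenvalue decay $\lambda_j = j^{-2}$, and the radial law (scaled so that $E(R^2) = 1$, matching the identifiability convention of Section 4) are all illustrative assumptions, not prescriptions of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

n, m, n_grid, nu = 100, 25, 201, 5.0
t = np.linspace(0.0, 1.0, n_grid)
mu = np.sin(2 * np.pi * t)                             # mean function
lam = 1.0 / (np.arange(1, m + 1) ** 2)                 # eigenvalues
# Orthonormal cosine basis on [0, 1]: psi_j(t) = sqrt(2) cos(j pi t).
psi = np.sqrt(2.0) * np.cos(np.outer(np.arange(1, m + 1) * np.pi, t))

# Radial part scaled so E(R^2) = 1; then Var(xi_j) = lambda_j.
R = np.sqrt((nu - 2.0) / rng.chisquare(nu, size=n))
Z = rng.standard_normal((n, m))                        # Gaussian scores
xi = R[:, None] * np.sqrt(lam) * Z                     # elliptical PC scores

X = mu + xi @ psi                                      # n curves on the grid
print(X.shape)                                         # (100, 201)
```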
Example 2: (Conditional moments) Let $X \sim HEC_H(\mu, \Gamma, \phi_0)$, $P_1 \in B(H, H_1)$, $P_2 \in B(H, H_2)$. By Corollary 5,
\[ E(P_1X \,|\, P_2X) = \mu_1 + \Gamma_{12}\Gamma_{22}^-(X_2 - \mu_2). \]
Suppose $\mu = 0$ and $P_1\Gamma P_2^* = 0$; then $\Gamma_{12} = 0$ and we have $E(P_1X \,|\, P_2X) = 0$.
On the other hand, by the random representation (10),
\[ \mathrm{Var}(P_1X \,|\, P_2X) = \Gamma_{11}^* E(R^2 \,|\, X_2). \]
When $\Gamma_{22}$ is a finite dimensional operator, $E(R^2 \,|\, X_2) = g\{T(X_2)\}$ for some univariate function $g$ depending on the elliptically contoured distribution, where $T(X_2)$ is defined in Theorem 4. Therefore
\[ \mathrm{Var}(P_1X \,|\, P_2X) = g\{T(X_2)\}\Gamma_{11}^*. \]
When $\Gamma_{22}$ is an infinite dimensional operator, by (9),
\[ \mathrm{Var}(P_1X \,|\, P_2X) = r^2(X_2)\Gamma_{11}^*. \]
If $P_1\Gamma P_2^* = 0$, then $\Gamma_{11}^* = \Gamma_{11}$.
Example 3: (Functional sliced inverse regression) Suppose the Hilbert space $H$ is the $L^2[0,1]$ function space, and $X \in H$ is a random function defined on the interval $[0,1]$. A general functional regression model is given by
\[ Y = f(\langle \beta_1, X\rangle, \langle \beta_2, X\rangle, \cdots, \langle \beta_K, X\rangle, \epsilon), \tag{12} \]
where $Y$ is a scalar response variable, $\epsilon$ is an error term independent of $X$, $\beta_1, \cdots, \beta_K$ are linearly independent coefficient functions, and $f$ is a nonparametric link function. Model (12) is very general and can be useful in many applications; see the discussion in Ferré and Yao (2003) and Li and Hsing (2007). Since we do not impose any structure on the link function $f$, the coefficient functions $\beta_k$ are usually unidentifiable, but the subspace spanned by these functions is. This subspace is called the effective dimension reduction space, or the EDR space. We can choose any $K$ orthonormal basis functions in the EDR space as the $\beta_k$'s; these functions are also called the EDR directions.
The functional sliced inverse regression (FSIR) approach can be used to estimate the EDR directions and to determine the dimension of the EDR space. We will show that the class of processes $X$ with a Hilbertian elliptically contoured distribution satisfies a key assumption for FSIR, and we will discuss an important result for elliptically contoured functional predictors which is useful for FSIR. For a more comprehensive account of the method and theory of FSIR, we refer to Ferré and Yao (2003) and Li and Hsing (2007).
One key assumption for FSIR is that, for any $\beta_0 \in H$,
\[ E(\langle \beta_0, X\rangle \mid \langle \beta_1, X\rangle, \langle \beta_2, X\rangle, \cdots, \langle \beta_K, X\rangle) = c_0 + \sum_{k=1}^{K} c_k\langle \beta_k, X\rangle \tag{13} \]
for some constants $c_0, \cdots, c_K$; see Ferré and Yao (2003). We will show this assumption is satisfied if $X$ is elliptically contoured.
Define operators $P_1x = \langle \beta_0, x\rangle$ and $P_2x = (\langle \beta_1, x\rangle, \langle \beta_2, x\rangle, \cdots, \langle \beta_K, x\rangle)^T$ for all $x \in H$. Notice that $H_2 = \mathbb{R}^K$, and $P_2$ is clearly a finite dimensional operator. For any vector $v = (v_1, \cdots, v_K)^T$,
\[ \langle P_2^*v, x\rangle_H = \langle v, P_2x\rangle_{H_2} = \Big\langle \sum_{k=1}^{K} v_k\beta_k, x\Big\rangle, \qquad \forall x \in H. \]
Therefore, $P_2^*v = (\beta_1, \cdots, \beta_K)v$. One can also show that $\Gamma_{22}$ is a $K \times K$ matrix with $(j, k)$th entry equal to $\langle \beta_j, \Gamma\beta_k\rangle$. Similarly, $\Gamma_{12} = (\langle \beta_0, \Gamma\beta_1\rangle, \cdots, \langle \beta_0, \Gamma\beta_K\rangle)$ is a $1 \times K$ matrix. By Corollary 5,
\[ E(P_1X \,|\, P_2X) = P_1\mu + \Gamma_{12}\Gamma_{22}^-(P_2X - P_2\mu). \]
Therefore, assumption (13) is satisfied; a simulation sketch checking this linearity follows.
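As a numerical illustration (not in the paper), the linearity in (13) can be checked by simulation: for elliptical $X$ with $\mu = 0$, the least squares regression of $\langle\beta_0, X\rangle$ on the $\langle\beta_k, X\rangle$'s recovers the population coefficient vector $\Gamma_{12}\Gamma_{22}^{-1}$. All concrete choices below (eigenvalues, radial law, randomly drawn $\beta$'s represented by their coordinates in the eigenbasis) are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(5)

m, n, nu = 25, 200_000, 5.0
lam = 1.0 / (np.arange(1, m + 1) ** 2)
R = np.sqrt(nu / rng.chisquare(nu, size=n))
xi = R[:, None] * np.sqrt(lam) * rng.standard_normal((n, m))  # scores of X

beta0 = rng.standard_normal(m)        # coordinates of beta_0 in the eigenbasis
B = rng.standard_normal((2, m))       # rows: coordinates of beta_1, beta_2

y = xi @ beta0                        # <beta_0, X>
W = xi @ B.T                          # (<beta_1, X>, <beta_2, X>)

slope = np.linalg.lstsq(W, y, rcond=None)[0]    # empirical regression slope
Gamma = (nu / (nu - 2.0)) * np.diag(lam)        # covariance of the scores
G12 = beta0 @ Gamma @ B.T                       # Gamma_12 (1 x K)
G22 = B @ Gamma @ B.T                           # Gamma_22 (K x K)
print(slope)
print(G12 @ np.linalg.inv(G22))                 # population slope; they agree
```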
Example 4: (Functional sliced inverse regression, continued) Suppose $X$ is elliptically contoured with mean $\mu = 0$. Let $P_2$ be the operator defined in the previous example, and define the operator $P_1x = (\langle \gamma_1, x\rangle, \cdots, \langle \gamma_m, x\rangle)^T$ for a set of orthonormal vectors $\{\gamma_1, \cdots, \gamma_m\}$ in $H$. Suppose $P_1\Gamma P_2^* = 0$, i.e. $\langle \gamma_j, \Gamma\beta_k\rangle = 0$ for $j = 1, \cdots, m$, $k = 1, \cdots, K$.
By model (12), all the information about $X$ contained in $Y$ comes through $P_2X$, so we have
\[ E(P_1X \,|\, Y) = E\{E(P_1X \,|\, P_2X) \,|\, Y\}. \]
By Example 2, $E(P_1X \,|\, P_2X) = 0$; therefore
\[ P_1E(X \,|\, Y) = 0. \tag{14} \]
Equation (14) provides information about the shape of the inverse regression curve $E(X \,|\, Y)$.
On the other hand,
\[ \mathrm{Var}(P_1X \,|\, Y) = E\{\mathrm{Var}(P_1X \,|\, P_2X) \,|\, Y\} + \mathrm{Var}\{E(P_1X \,|\, P_2X) \,|\, Y\} = E\{\mathrm{Var}(P_1X \,|\, P_2X) \,|\, Y\}, \]
since $E(P_1X \,|\, P_2X) = 0$. Again, by Example 2, $\mathrm{Var}(P_1X \,|\, P_2X) = g\{T(P_2X)\}\Gamma_{11}$, and therefore
\[ \mathrm{Var}(P_1X \,|\, Y) = E[g\{T(P_2X)\} \,|\, Y]\Gamma_{11}. \]
Since $\Gamma_{11}$ is the marginal covariance of $P_1X$, this result shows that the conditional covariance of $P_1X$ given $Y$ is proportional to the marginal covariance. This result is important for constructing tests for FSIR.
References
[1] Ash, R. B. and Gardner, M. F. (1975). Topics in Stochastic Processes. Academic Press.
[2] Cambanis, S., Huang, S. and Simons, G. (1981). On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11, 368-385.
[3] Cook, R. D. and Weisberg, S. (1991). Comment on "Sliced inverse regression for dimension reduction," by K. C. Li. Journal of the American Statistical Association, 86, 328-332.
[4] Eaton, M. L. (1986). A characterization of spherical distributions. Journal of Multivariate Analysis, 20, 272-276.
[5] Ferré, L. and Yao, A. (2003). Functional sliced inverse regression analysis. Statistics, 37, 475-488.
[6] Li, K. C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316-327.
[7] Li, Y. and Hsing, T. (2007). Determination of the dimensionality in functional sliced inverse regression. Manuscript.
[8] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd Edition. Springer-Verlag, New York.
[9] Schoenberg, I. J. (1938). Metric spaces and completely monotone functions. Annals of Mathematics, 39, 811-841.
[10] Vakhania, N. N., Tarieladze, V. I. and Chobanyan, S. A. (1987). Probability Distributions on Banach Spaces. D. Reidel, Dordrecht.