Reproducing Kernels and New Approaches in
Compositional Data Analysis
Binglin Li¹ and Jeongyoun Ahn²
¹University of Georgia
²Korea Advanced Institute of Science and Technology
May 4, 2022
Abstract
Compositional data, such as human gut microbiomes, consist of non-negative variables for which only the values relative to other variables are available. Analyzing compositional data requires a careful treatment of the geometry
of the data. A common geometrical understanding of compositional data is via a regular
simplex. The majority of existing approaches rely on log-ratio or power transformations
to overcome the innate simplicial geometry. In this work, based on the key observation
that compositional data are projective in nature, and on the intrinsic connection between projective and spherical geometry, we re-interpret the compositional domain as
the quotient topology of a sphere modded out by a group action. This re-interpretation
allows us to understand the function space on compositional domains in terms of that
on spheres and to use spherical harmonics theory along with reflection group actions
for constructing a compositional Reproducing Kernel Hilbert Space (RKHS). This construction of RKHS for compositional data will widely open research avenues for future
methodology developments. In particular, well-developed kernel embedding methods
can be now introduced to compositional data analysis. The polynomial nature of
compositional RKHS has both theoretical and computational benefits. The wide applicability of the proposed theoretical framework is exemplified with nonparametric
density estimation and kernel exponential family for compositional data.
1 Introduction
The recent popularity of human gut microbiome research has presented many data-analytic and
statistical challenges (Calle, 2019). Among the many features of microbiomes, or meta-genomics,
data, we address their compositional nature in this work. Compositional data consist of n
observations of (d + 1) non-negative variables whose values represent the relative proportions
to other variables in the data. Compositional data have been commonly observed in many
scientific fields, such as biochemistry, ecology, finance, and economics, to name just a few. The
most notable aspect of compositional data is the restriction on their domain, specifically
that the sum of the variables is fixed. The compositional domain is not a classical vector
space, but instead a (regular) simplex, which can be modeled as
$$\Delta^d = \left\{ (x_1, \ldots, x_{d+1}) \in \mathbb{R}^{d+1} \;\Big|\; \sum_{i=1}^{d+1} x_i = 1, \; x_i \ge 0, \; \forall i \right\}, \tag{1}$$
which is topologically compact. The inclusion of zeros in (1) is crucial as most microbiomes
data have a substantial number of zeros.
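For readers who want to work with (1) numerically, here is a minimal NumPy sketch of our own (the function name `closure`, echoing the standard compositional-data term, is our choice and not from the paper) that maps raw non-negative counts onto ∆d while keeping exact zeros:

```python
import numpy as np

def closure(x, atol=1e-12):
    """Normalize a non-negative vector (or the rows of a matrix) to sum to 1,
    i.e., map raw counts/proportions onto the simplex Delta^d in (1)."""
    x = np.asarray(x, dtype=float)
    if np.any(x < -atol):
        raise ValueError("compositional parts must be non-negative")
    s = x.sum(axis=-1, keepdims=True)
    if np.any(s <= 0):
        raise ValueError("each observation must have a positive total")
    return x / s

# Example: raw microbiome-like counts; zeros are kept as exact zeros.
counts = np.array([[10, 0, 30], [5, 5, 0]])
print(closure(counts))   # each row sums to 1
```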
Arguably the most prominent approach to handle the data on a simplex is to take a log-ratio transformation (Aitchison, 1982), for which one has to consider only the open interior
of ∆d , denoted by S d . Zeros are usually taken care of by adding a small number, however,
it has been noted that the results of analysis can be quite dependent on how the zeros
are handled (Lubbe et al., 2021). Gloor et al. (2017) pointed out “the dangers inherent in
ignoring the compositional nature of the data” and argued that microbiome datasets must
be treated as compositions at all stages of analysis. Recently some approaches that analyze
compositional data without any transformation have been gaining popularity (Li et al., 2020;
Rasmussen et al., 2020). The approach proposed in this paper is to construct reproducing
kernels of compositional data by interpreting compositional domains via projective spherical
geometries.
1.1 Methodological Motivation
Besides the motivation from microbiome studies, another source of inspiration for this work
is the current exciting development in statistics and machine learning, in particular the
rising popularity of applying higher tensors and kernel techniques, which allow multivariate
methods to be extended to exotic structures beyond traditional vector spaces, e.g., graphs
(Jorgensen and Tian, 2016), manifolds (Minh and Sindhwani, 2011) or images (Zhou et al.,
2013). This work serves as an attempt to construct reproducing kernel structures for compositional data, so that recent developments of (reproducing) kernel techniques from machine
learning theory can be introduced to this classical field in statistics.
The approach in this work is to model the compositional data as a group quotient of
a sphere Sd /Γ (see (7)), which gives a new connection of compositional data analysis with
directional statistics. The idea of representing data by using tensors and frames is not new
in directional statistics (Arnold et al., 2018), but the authors find it more convenient to
construct reproducing kernels for Sd/Γ (the reason is given in Section 1.4).
We do want to mention that the construction of reproducing kernels for compositional
data indicates a new potential paradigm for compositional data analysis: traditional approaches aim to find direct analogues of multivariate concepts, like the mean, variance-covariance
matrices, and suitable regression analysis frameworks based on those concepts. However,
finding the mean point over non-linear spaces, e.g., on manifolds, is not an easy task, and in
worst-case scenarios, mean points might not even exist on the underlying space (e.g., the
mean point of the uniform distribution on a unit circle does not live on the circle).
In this work we take the perspective of kernel mean embedding (Muandet et al., 2017).
Roughly speaking, instead of finding the “physical ” point for the mean of a distribution, one
could do statistics distributionally. In other words, the mean or expectation is considered as
a linear functional on the RKHS, and this functional is represented by an actual function
in the Hilbert space, which is referred to as “kernel mean E[k(X, ·)]”. Instead of trying to
find another compositional point as the empirical mean of a compositional data set, one
can construct the "kernel mean" as a replacement of the traditional empirical mean, which is
just $\sum_{i=1}^{n} k(X_i, \cdot)/n$. Moreover, one can also construct the analogue of the variance-covariance
matrix purely from kernels; in fact, Fukumizu et al. (2009) considered Gram matrices
constructed out of reproducing kernels as consistent estimators of cross-variance operators
(these operators play the role of covariance and cross-variance matrices in classical Euclidean
spaces).
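As a schematic illustration of the kernel mean and Gram-matrix constructions just described, here is a sketch of our own; the Gaussian kernel below is only a placeholder (the compositional kernels themselves are constructed in Section 3), and all function names are ours:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # placeholder kernel; Section 3 constructs compositional kernels instead
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def kernel_mean(data, kernel):
    """Return the function t -> (1/n) * sum_i k(X_i, t), the empirical kernel mean."""
    return lambda t: np.mean([kernel(xi, t) for xi in data])

def gram_matrix(data, kernel):
    """Gram matrix K_ij = k(X_i, X_j), the building block for cross-variance operators."""
    n = len(data)
    return np.array([[kernel(data[i], data[j]) for j in range(n)] for i in range(n)])

X = np.random.default_rng(0).dirichlet(np.ones(3), size=5)   # toy compositions
mu_hat = kernel_mean(X, gaussian_kernel)
print(mu_hat(X[0]), gram_matrix(X, gaussian_kernel).shape)
```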
Since we remodel the compositional domain using projective/spherical geometry, the compositional domain is not treated as a vector space, but as a quotient topological space Sd/Γ.
Instead of “putting a linear structure on an Aitchison simplex” (Aitchison, 1982), or square
root transformation (which is still transformed from an Aitchison simplex), we choose to
“linearize” compositional data points by using kernel techniques (and possibly higher-tensor
constructions) and one can still do "multivariate analysis". Our construction in this work initiates such an attempt to introduce these recent developments of kernel and tensor techniques
from statistical learning theory into compositional data analysis.
1.2 Contributions of the Present Work
Our contribution in this paper is threefold. First, we propose a new geometric foundation
for compositional data analysis, Pd≥0 , a subspace of a full projective space Pd . Based on the
close connection of spheres with projective spaces, we will also describe Pd≥0 in terms of Sd /Γ,
a sphere modded out by a reflection group action, whose fundamental domain is the first
orthant Sd≥0 (the appearance of "Sd≥0" here is for a totally different reason than in the traditional approach).
Secondly, based on the new geometric foundations of compositional domains, we propose
a new nonparametric compositional density estimation method by making use of the well-developed
spherical density estimation theory. Furthermore, we provide a central limit theorem for
integral squared errors, which leads to a goodness-of-fit test.
Thirdly, also through this new geometric foundation, function spaces on compositional
domains can be related with those on spheres. The space of square integrable functions L2(Sd) on
the sphere is the focus of an ancient subject in mathematics and physics, called "spherical
harmonics”. Moreover, spherical harmonics theory also tells that each Laplacian eigenspace
of L2 (Sd ) is a reproducing kernel Hilbert space, and this allows us to construct reproducing
kernels for compositional data points via “orbital integrals”, which opens a door for machine
learning techniques to be applied to compositional data. We also propose a compositional
exponential family as a general distributional family for compositional data modeling.
1.3 Why Projective and Spherical Geometries?
According to Aitchison (1994), “any meaningful function of a composition must satisfy the
requirement f(ax) = f(x) for any a ≠ 0.” In geometry and topology, a space consisting
of such functions is called a projective space, denoted by Pd , therefore, projective geometry
should be the natural candidate to model compositional data, rather than a simplex. Since
a point in compositional domains cannot have opposite signs, a compositional domain is in
fact a “positive cone” Pd≥0 inside a full projective space.
A key property of projective spaces is that stretching or shrinking the length of a vector
in Pd does not alter the point. Thus one can stretch a point in ∆d to a point in the first
orthant sphere by dividing it by its ℓ2 norm. Figure 1 illustrates this stretching in action
(note that "stretching" is not a transformation from the projective geometry point of view).

Figure 1: Illustration of the stretching action from ∆1 to S1. The stretching keeps the
relative compositions, whereas the square root transformation fails to do so.

In short, projective geometry is a more natural way to model compositional data, following the original
philosophy in Aitchison (1994).
However, spheres are easier to work with because mathematically speaking, the function
space on spheres is a well-treated subject in spherical harmonics theory, and statistically
speaking, we can connect with directional statistics in a more natural way. Our compositional
domain Pd≥0 can be naturally identified with Sd /Γ, a sphere modded out by a reflection group
action. This reflection group Γ acts on the sphere Sd by reflection, and the fundamental
domain of this action is Sd≥0 (notions of group actions, fundamental domains and reflection
groups are all discussed in Section 2). Thus our connection with the first orthant sphere Sd≥0
is a natural consequence of projective geometry and its connection with spheres with group
actions, having nothing to do with square root transformations.
1.4 Why Reproducing Kernels?
As explained in Section 1.1, we strive to use new ideas of tensors and kernel techniques in
machine learning to propose another framework for compositional data analysis, and Section
1.3 explains the new connection with spherical geometry and directional statistics. The idea
of using tensors to represent data points is not new in directional statistics (Arnold et al., 2018),
so a naive idea would be to mimic the approach of directional statistics to
ambiguous rotations: Arnold et al. (2018) studied how to do statistics over the coset
space SO(3)/K where K is a finite subgroup of SO(3). In their case, the subgroup K
has to be a special class of subgroups of special orthogonal groups, and within this class,
they manage to study the corresponding tensors and frames, which gives the inner product
structures of different data points.
However, in our case, a compositional domain is Sd /Γ = O(d) \ O(d + 1)/Γ, a double
coset space. Unlike Arnold et al. (2018), which only considered the d = 3 case, our dimension
d is completely general; moreover, our reflection group Γ is not a subgroup of any special
orthogonal group, so the constructions of tensors and frames in Arnold et al. (2018) do not
apply to our situation directly.
Part of the novelty of this work is to get around this issue by making use of the reproducing
kernel Hilbert space (RKHS) structures on spheres, and "averaging out" the group action
at the reproducing kernel level, which in return gives us a reproducing kernel structure
on compositional domains. Once we have RKHS in hand, we can “add” and take the
“inner product” of two data points, so our linearization strategy can also be regarded as
a combination of “the averaging approach” and the “embedding approach” as in Arnold
et al. (2018). In fact, an abstract function space together with reproducing kernels plays
an increasingly important role. Below we provide some philosophical motivation for the
importance of the function space over the underlying data set:
(a) Hilbert spaces of functions are naturally linear with an inner product structure. With
the existence of (reproducing) kernels, data points are naturally incorporated into
the function space, which leads to interesting interactions between the data set and
functions defined over them. There has been a large body of literature on embedding
distributions into RKHS, e.g. Smola et al. (2007), and using reproducing kernels to
recover exponential families, e.g. Dai et al. (2019). RKHS has also been used to recover
classical statistical tests, e.g. goodness-of-fit test in Chwialkowski et al. (2016), and
regression in de los Campos et al. (2009). Those works are not primarily concerned with the
analysis of the function space, but rather with data analysis on the underlying data set;
nevertheless, all of them are carried out by passing to an RKHS. This reflects the increasing
recognition of the importance of an abstract function space with a (reproducing) kernel structure.
(b) Mathematically speaking, given a geometric space M , the function space on M can
recover the underlying geometric space M itself, and this principle has been playing
a big role in different areas of geometry; in particular, modern algebraic geometry,
following the philosophy of Grothendieck, is based upon this insight. Function spaces
can be generalized to matrix valued function spaces, and this generalization gives rise
to non-commutative RKHS, which is used in shape analysis in Micheli and Glaunés
(2014); moreover, non-commutative RKHS is connected with free probability theory
(Ball et al., 2016), which has been used in random effects and linear mixed effects
models (Zhou and Johnstone, 2019; Fan et al., 2021).
1.5 Structure of the Paper
We describe briefly the content of the main sections of this article:
• In Section 2, we will rebuild the geometric foundation of compositional domains by
using projective geometry and spherical geometry. We will also point out that the old
model using the closed simplex ∆d is topologically the same as the new foundation. In a
diagrammatic way, we establish the following topological equivalence:
$$\Delta^d \cong \mathbb{P}^d_{\ge 0} \cong S^d/\Gamma \cong S^d_{\ge 0}, \tag{2}$$
where Sd≥0 is the first orthant sphere, which is also the fundamental domain of the
group action Γ ↷ Sd. All of the four spaces in (2) will be referred to as "compositional
domains”.
As a direct application, we propose a compositional density estimation method by
using the spherical density estimation theory via a spread-out construction through
the quotient map π : Sd → Sd/Γ, and prove that the integrated squared error of our
compositional density estimator satisfies a central limit theorem (Theorem 2.6),
which can be used for goodness-of-fit tests.
• Section 3 will be devoted to constructing compositional reproducing kernel Hilbert
spaces. Our construction relies on the reproducing kernel structures on spheres, which
is given by spherical harmonics theory. Wahba (1981) constructed splines using reproducing kernel structures on S2 (the 2-dimensional sphere), in which she also used the spherical
harmonics theory in Sansone (1959), which only treats the 2-dimensional case. Our theory
deals with general d-dimensional case, so we need the full power of spherical harmonics
theory, which will be reviewed at the beginning of Section 3, and then we will use
spherical harmonics theory to construct compositional reproducing kernels using an
“orbital integral” type of idea.
• Section 4 will give a couple of applications of our construction of compositional reproducing kernels. (i) The first example is the representer theorem, but with one
caveat: our RKHS is finite dimensional, consisting of degree 2m homogeneous polynomials with no transcendental functions, so linear independence for distinct data points is
not directly available; however, we show that when the degree m is high enough, linear
independence still holds. Our statement of the representer theorem is not new purely from
the RKHS theory point of view. Our point is to demonstrate that intuitions from traditional statistical learning can still be used in compositional data analysis, with some
extra care. (ii) Secondly, we construct the compositional exponential family, which can
be used to model the underlying distribution of compositional data. The flexible construction will enable us to utilize the distribution family in many statistical problems
such as mean tests.
2 New Geometric Foundations of Compositional Domains
In this section, we give a new interpretation of compositional domains as a cone Pd≥0 in
a projective space, based on which compositional domains can be interpreted as spherical
quotients by reflection groups. This connection will yield a “spread-out” construction on
spheres and we demonstrate an immediate application of this new approach to compositional
density estimation.
2.1 Projective and Spherical Geometries and a Spread-out Construction
Compositional data consist of relative proportions of d + 1 variables, which implies that each
observation belongs to a projective space. A d-dimensional projective space Pd is the set of
one-dimensional linear subspaces of Rd+1. A one-dimensional subspace of a vector space is
just a line through the origin, and in projective geometry, all points in a line through the
origin will be regarded as the same point in a projective space. Contrary to the classical
linear coordinates (x1 , · · · , xd+1 ), a point in Pd can be represented by a projective coordinate
(x1 : · · · : xd+1 ), with the following property
$$(x_1 : x_2 : \cdots : x_{d+1}) = (\lambda x_1 : \lambda x_2 : \cdots : \lambda x_{d+1}), \quad \text{for any } \lambda \neq 0.$$
It is natural that an appropriate ambient space for compositional data is the non-negative projective space, which is defined as
$$\mathbb{P}^d_{\ge 0} = \left\{ (x_1 : x_2 : \cdots : x_{d+1}) \in \mathbb{P}^d \;\middle|\; (x_1 : x_2 : \cdots : x_{d+1}) = (|x_1| : |x_2| : \cdots : |x_{d+1}|) \right\}. \tag{3}$$
It is clear that the common representation of compositional data with a (closed) simplex ∆d
in (1) is in fact equivalent to (3), thus we have the first equivalence:
$$\mathbb{P}^d_{\ge 0} \cong \Delta^d. \tag{4}$$
Let Sd denote a d-dimensional unit sphere, defined as
$$S^d = \left\{ (x_1, x_2, \ldots, x_{d+1}) \in \mathbb{R}^{d+1} \;\middle|\; \sum_{i=1}^{d+1} x_i^2 = 1 \right\},$$
and let Sd≥0 denote the first orthant of Sd , a subset in which all coordinates are non-negative.
The following lemma states that Sd≥0 can be a new domain for compositional data as there
exists a bijective map between ∆d and Sd≥0 .
Lemma 2.1. There is a canonical identification of ∆d with Sd≥0 , namely,
$$\Delta^d \;\underset{g}{\overset{f}{\rightleftarrows}}\; S^d_{\ge 0},$$
where f is the inflation map and g is the contraction map, with both f and g being continuous
and inverse to each other.
Proof It is straightforward to construct the inflation map f. For v ∈ ∆d, it is easy to see
that f(v) ∈ Sd≥0 when f(v) = v/‖v‖2, where ‖v‖2 is the ℓ2 norm of v. Note that the inflation
map makes sure that f(v) is in the same projective space as v. To construct the contraction
map g, for s ∈ Sd≥0 we define g(s) = s/‖s‖1, where ‖s‖1 is the ℓ1 norm of s, and see that
g(s) ∈ ∆d. One can easily check that both f and g are continuous and inverse to each other.
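Lemma 2.1 can be rendered directly in code; the following is a sketch of ours (function names are our own choice):

```python
import numpy as np

def inflate(v):
    """f: Delta^d -> S^d_{>=0}, divide by the l2 norm (Lemma 2.1)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, ord=2)

def contract(s):
    """g: S^d_{>=0} -> Delta^d, divide by the l1 norm (Lemma 2.1)."""
    s = np.asarray(s, dtype=float)
    return s / np.abs(s).sum()

z = np.array([0.2, 0.3, 0.5])                    # a point on Delta^2
assert np.allclose(contract(inflate(z)), z)       # f and g are inverse to each other
```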
Based on Lemma 2.1, we now identify ∆d alternatively with the quotient topological
space Sd /Γ for some group action Γ. In order to do so, we first show that the cone Sd≥0
is a strict fundamental domain of Γ, i.e., Sd≥0 ≅ Sd/Γ. We start by defining a coordinate
hyperplane for a group. The i-th coordinate hyperplane Hi ⊂ Rd+1 with respect to a choice
of a standard basis {e1 , e2 , . . . , ed+1 } is a codimension one linear subspace which is defined
as
Hi = {(x1 , . . . , xi , . . . , xd+1 ) ∈ Rd+1 : xi = 0}, i = 1, . . . , d + 1.
We define the reflection group Γ with respect to coordinate hyperplanes as follows:
Definition 2.2. The reflection group Γ is a subgroup of general linear group GL(d + 1) and
it is generated by {γi , i = 1, . . . , d + 1}. Given the same basis {e1 , . . . , ed+1 } for Rd+1 , the
reflection γi is a linear map specified via:
γi : (x1, . . . , xi−1, xi, xi+1, . . . , xd+1) ↦ (x1, . . . , xi−1, −xi, xi+1, . . . , xd+1).
Note that when restricted to Sd, γi is an isometry from the unit sphere Sd to itself;
we denote this action by Γ ↷ Sd. Thus, one can treat the group Γ as a discrete subgroup of the
isometry group of Sd. In what follows we establish that Sd≥0 is a fundamental domain of the
group action Γ ↷ Sd in the topological sense. In general, there is no uniform treatment of
a fundamental domain, but we will follow the approach by Beardon (2012). To introduce a
fundamental domain, let us define an orbit first. For a point z ∈ Sd , an orbit of the group Γ
is the following set:
$$\mathrm{Orbit}^{\Gamma}_z = \{\gamma(z) : \gamma \in \Gamma\}. \tag{5}$$
Note that one can decompose Sd into a disjoint union of orbits. The size of an orbit is not
necessarily the same as the size of the group |Γ|, because of the existence of a stabilizer
subgroup, which is defined as
$$\Gamma_z = \{\gamma \in \Gamma : \gamma(z) = z\}. \tag{6}$$
The set Γz forms a group itself, and we call this group Γz the stabilizer subgroup of Γ. Every
element in OrbitΓz has isomorphic stabilizer subgroups, thus the size of OrbitΓz is the quotient
|Γ|/|Γz|, where | · | denotes the cardinality of a set. There are only finitely many possibilities for
the size of a stabilizer subgroup for the action Γ ↷ Sd, and the size of a stabilizer subgroup
depends on the codimensions of the coordinate hyperplanes.
Definition 2.3. Let G act properly discontinuously on a d-dimensional sphere, with
d > 1. A fundamental domain for the group action G is a closed subset F of the sphere such
that every orbit of G intersects F in at least one point and if an orbit intersects with the
interior of F , then it only intersects F at one point.
A fundamental domain is strict if every orbit of G intersects F at exactly one point. The
following proposition identifies Sd≥0 as the quotient topological space Sd /Γ, i.e., Sd≥0 = Sd /Γ.
Proposition 2.4. Let Γ ↷ Sd be the group action described in Definition 2.2. Then Sd≥0 is a
strict fundamental domain.
In topology, there is a natural quotient map Sd → Sd /Γ. With the identification Sd≥0 =
Sd/Γ, there should be a natural map Sd → Sd≥0. Now define a contraction map c : Sd → Sd≥0
via (x1, . . . , xd+1) ↦ (|x1|, . . . , |xd+1|), by taking component-wise absolute values. Then it is
straightforward to see that c is indeed the topological quotient map Sd → Sd/Γ, under
the identification Sd≥0 = Sd /Γ.
So far, via (4), Lemma 2.1 and Proposition 2.4, we have established the following equivalence:
$$\mathbb{P}^d_{\ge 0} = \Delta^d = S^d_{\ge 0} = S^d/\Gamma. \tag{7}$$
For the rest of the paper we will use the four characterizations of a compositional domain
interchangeably.
2.1.1 Spread-Out Construction
Based on (7), one can turn a compositional data analysis problem into one on a sphere via
spread-out construction. The key idea is to associate one compositional data point z ∈ ∆d =
Sd≥0 with a Γ-orbit of data points OrbitΓz ⊂ Sd in (5). Formally, given a point z ∈ ∆d , we
construct the following data set (not necessarily a set because of possible repetitions):
$$c^{-1}(z) = \big\{\, |\Gamma_{z'}| \text{ copies of } z', \ \text{ for } z' \in \mathrm{Orbit}^{\Gamma}_z \,\big\}, \tag{8}$$
where Γz′ is the stabilizer subgroup of Γ with respect to z′ in (6). In general, if there
are n observations in ∆d, the spread-out construction will create a data set with n · 2^{d+1}
observations on Sd , in which observations with zero coordinates are repeated. Figure 2 (a)
and (b) illustrate this idea with a toy data set with d = 2.
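In code, the spread-out construction (8) amounts to enumerating all sign flips. The following is our own sketch; note that for points with zero coordinates the repeated copies required by (8) appear automatically, since flipping a zero coordinate reproduces the same point:

```python
import numpy as np
from itertools import product

def spread_out(X):
    """Spread-out construction (8): inflate each composition to S^d_{>=0},
    then apply all 2^(d+1) sign flips (the elements of Gamma)."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)          # inflate to the sphere
    d1 = X.shape[1]
    signs = np.array(list(product([1.0, -1.0], repeat=d1)))   # the 2^(d+1) reflections
    return np.concatenate([X * s for s in signs], axis=0)

X = np.random.default_rng(1).dirichlet(np.ones(3), size=4)    # n = 4, d = 2
Y = spread_out(X)
print(Y.shape)   # (4 * 2**3, 3) = (32, 3) points on S^2
```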
2.2 Illustration: Compositional Density Estimation
The spread-out construction in (8) provides a new intimate relation between directional
statistics and compositional data analysis. Indeed, this construction produces a directional
data set out of a compositional data set, then we can literally transform a compositional data
problem into a directional statistics problem via this spread-out construction. For example,
we can perform compositional independence/uniform tests by doing directional independence/uniform tests (Jupp and Spurr, 1985; Jupp, 2008) through spread-out constructions.
In this section, we will give a new compositional density estimation framework by using
spread-out constructions. In directional statistics, density estimation for spherical data has
a long history dating back to the late 70s in Beran (1979). In the 80s, Hall et al. (1987) and
Bai et al. (1989) established systematic framework for spherical density estimation theory.
Spherical density estimation theory became popular later partly because its integrated squared
error (ISE) is closely related to goodness-of-fit testing, as in Zhao and Wu (2001) and the
more recent work of García-Portugués et al. (2015).
The rich development in spherical density estimation theory will yield a compositional
density framework via spread-out constructions. In the following we apply this idea to
nonparametric density estimation for compositional data. Instead of directly estimating the
density on ∆d , one can perform the estimation with the spread-out data on Sd , from which
a density estimate for compositional data can be obtained.
Figure 2: Toy compositional data on the simplex ∆2 in (a) are spread out to the sphere S2 in
(b). The density estimate on S2 in (c) is pulled back to ∆2 in (d).
Let p(·) denote a probability density function of a random vector Z on Sd≥0 , or equivalently
on ∆d . The following proposition gives a form of the density of the spread-out random vector
Γ(Z) on the whole sphere Sd .
Proposition 2.5. Let Z be a random variable on Sd≥0 with probability density p(·). Then the
induced random variable Γ(Z) = {γ(Z)}γ∈Γ has the following density p̃(·) on Sd:
$$\tilde{p}(z) = \frac{|\Gamma_z|}{|\Gamma|}\, p(c(z)), \quad z \in S^d, \tag{9}$$
where |Γz | is the cardinality of the stabilizer subgroup Γz of z.
Let c∗ denote the analogous operation for functions to the contraction map c that applies
to data points. It is clear that given a probability density p̃ on Sd , we can obtain the original
density on the compositional domain via the "pull back" operation c∗:
$$p(z) = c_*(\tilde{p})(z) = \sum_{x \in c^{-1}(z)} \tilde{p}(x), \quad z \in S^d_{\ge 0}.$$
Now consider estimating density on Sd with the spread-out data. Density estimation for
data on a sphere has been well studied in directional statistics (Hall et al., 1987; Bai et al.,
1989). For x1, . . . , xn ∈ Sd, a density estimate for the underlying density is
$$\hat{f}_n(z) = \frac{c_h}{n} \sum_{i=1}^{n} K\!\left( \frac{1 - z^T x_i}{h_n} \right), \quad z \in S^d,$$
where K is a kernel function that satisfies common assumptions in Assumption A.1, and ch
is a normalizing constant. Applying this to the spread-out data c−1 (xi ), i = 1, . . . , n, we
have a density estimate of p̃(·) defined on Sd :
$$\hat{f}^{\Gamma}_n(z) = \frac{c_h}{n|\Gamma|} \sum_{1 \le i \le n,\; \gamma \in \Gamma} K\!\left( \frac{1 - z^T \gamma(x_i)}{h_n} \right), \quad z \in S^d, \tag{10}$$
from which a density estimate on the compositional domain is obtained by applying c∗ . That
is,
$$\hat{p}_n(z) = c_*\hat{f}^{\Gamma}_n(z) = \sum_{x \in c^{-1}(z)} \hat{f}^{\Gamma}_n(x), \quad z \in S^d_{\ge 0}.$$
Figure 2 (c) and (d) illustrate this density estimation process with a toy example.
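The whole pipeline (spread out, estimate on the sphere, pull back) can be sketched as follows. This is our own illustration: the kernel K and bandwidth are placeholders, the normalizing constant c_h in (10) is omitted, and for simplicity the evaluation point is assumed to have no zero coordinates, so its fiber consists of all 2^(d+1) distinct sign flips.

```python
import numpy as np
from itertools import product

def kde_sphere(z, data, h, K=lambda t: np.exp(-t)):
    """Spherical kernel density estimate at z, up to the constant c_h (cf. (10))."""
    return np.mean(K((1.0 - data @ z) / h))

def kde_compositional(z, spread_data, h):
    """Pull-back estimate: sum the spherical estimate over the fiber c^{-1}(z)."""
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z)                                   # inflate to S^d
    flips = (np.array(s) * z for s in product([1.0, -1.0], repeat=z.shape[0]))
    return sum(kde_sphere(x, spread_data, h) for x in flips)

# toy data: n = 50 compositions on Delta^2, spread out to 50 * 2**3 points on S^2
rng = np.random.default_rng(2)
X = rng.dirichlet(np.ones(3), size=50)
S = X / np.linalg.norm(X, axis=1, keepdims=True)
spread = np.concatenate([S * np.array(s) for s in product([1.0, -1.0], repeat=3)], axis=0)
print(kde_compositional([0.2, 0.3, 0.5], spread, h=0.2))
```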
The consistency of the spherical density estimate $\hat{f}_n$ is established by Zhao and Wu (2001) and
García-Portugués et al. (2015), where it is shown that the integrated squared error (ISE) of
$\hat{f}_n$, $\int_{S^d} (\hat{f}_n - f)^2\, dz$, follows a central limit theorem. It is straightforward to show that the
ISE of the proposed compositional density estimator $\hat{p}_n$ on the compositional domain is also
asymptotically normally distributed by a CLT. However, the CLT of the ISE for spherical densities
in Zhao and Wu (2001) contains an unnecessary finite support assumption on the density
kernel function K (which is very different from a reproducing kernel); although in García-Portugués
et al. (2015) such a finite support condition is dropped, their result is for directional-linear
data, and their proof does not directly apply to the purely directional context. For the
readers' convenience, we provide the proof of the CLT of the ISE for both compositional and
spherical data, without the finite support condition of Zhao and Wu (2001).
Theorem 2.6. The CLT for the ISE holds for both directional and compositional data under the mild
conditions (H1, H2 and H3) in Section A.1, without the finite support condition on the density
kernel function K.
The details of the proof of Theorem 2.6, together with the statements of the technical conditions,
can be found in Section A.1.
3 Reproducing Kernels of Compositional Data
This section is devoted to constructing reproducing kernel structures on compositional domains,
based on the topological re-interpretation of ∆d in Section 2. The key idea is that based on
the quotient map π : Sd → Sd /Γ = ∆d , we can use function spaces on spheres to understand
function spaces on compositional domains. Moreover, we can construct reproducing kernel
structures of a compositional domain ∆d based on those on Sd .
The reproducing kernel was first introduced in 1907 by Zaremba when he studied boundary value problems for harmonic and biharmonic functions, but the systematic development
of the subject was finally done in the early 1950s by Aronszajn (1950). Reproducing kernels
on Sd were essentially discovered by Laplace and Legendre in the 19th century, although
the reproducing kernels on spheres were called zonal spherical functions at that time. Both
spherical harmonics theory and RKHS have found applications in theoretical subjects like
functional analysis, representation theory of Lie groups and quantum mechanics. In statistics, the successful application of RKHS in spline models by Wahba (1981) popularized
RKHS theory for Sd . In particular, they used spherical harmonics theory to construct an
RKHS on S2 . Generally speaking, for a fixed topological space X, there exists (and one can
construct) multiple reproducing kernel Hilbert spaces on X; in their work, an RKHS on S2
was constructed by considering a subspace of L2 (S2 ) under a finiteness condition, and the
reproducing kernels were also built out of zonal spherical functions. Their work is motivated
by studying spline models on the sphere, while our motivation has nothing to do with spline
models of any kind. In this work we consider reproducing structures on spheres which are
different from the one in Wahba (1981), but we share the same building block, namely
spherical harmonics theory.
Building on the re-interpretation of a compositional domain ∆d as Sd/Γ, we will construct reproducing kernels of compositional domains by using reproducing kernel structures on spheres.
Spherical harmonics theory gives reproducing kernel structures on Sd, and a compositional domain ∆d is topologically covered by a sphere with deck transformation group
Γ. Thus we naturally wonder (i) whether function spaces on ∆d can be identified with the subspaces of Γ-invariant functions on Sd, and (ii) whether one might "build" Γ-invariant kernels
out of spherical reproducing kernels, and hope that the Γ-invariant kernels can play the
role of “reproducing kernels” on ∆d . It turns out that the answers for both (i) and (ii) are
positive (see Remark 3.8 and Theorem 3.12). The discovery of reproducing kernel structures
on ∆d is crucially based on the reinterpretation of compositional domains via projective and
spherical geometries in Section 2.
By considering Γ-invariant objects in spherical function spaces we managed to construct
reproducing kernel structures for compositional domains, and compositional reproducing
Hilbert spaces. Although the compositional RKHS was first considered as a candidate "inner
product space" for data points to be mapped into, the benefit of working with an RKHS goes
far beyond this, due to the exciting development of kernel techniques in machine learning
theory that can be applied to compositional data analysis, as mentioned in Section 1.1.
This gives a chance to construct a new framework for compositional data analysis, in
which we "upgrade" compositional data points to functions (via reproducing kernels), and the
classical statistical notions, like means and variance-covariances, are "upgraded" to linear
functionals and linear operators over the function space. Traditionally important statistical
topics such as dimension reduction, regression analysis, and many inference problems can be
re-addressed in the light of these new "kernel techniques".
3.1 Recollection of Basic Facts from Spherical Harmonics Theory
We give a brief review of the theory of spherical harmonics in the following. See Atkinson
and Han (2012) for a general introduction to the topic. In classical linear algebra, a finite
dimensional linear space with a linear map to itself can be decomposed into direct sums
of eigenspaces. Such a phenomenon still holds for L2 (Sd ) with Laplacians being the linear
operator to itself. Recall that the Laplacian operator on a function f with d + 1 variables is
$$\Delta f = \sum_{i=1}^{d+1} \frac{\partial^2 f}{\partial x_i^2}.$$
Let Hi be the i-th eigenspace of the Laplacian operator. It is known that L2 (Sd ) can be
orthogonally decomposed as
$$L^2(S^d) = \bigoplus_{i=0}^{\infty} H_i, \tag{11}$$
where the orthogonality is with respect to the inner product in $L^2(S^d)$: $\langle f, g \rangle = \int_{S^d} f \bar{g}$.
Let Pi (d + 1) be the space of homogeneous polynomials of degree i in d + 1 coordinates on
Sd . A homogeneous polynomial is a polynomial whose terms are all monomials of the same
degree, e.g., P4 (3) includes xy 3 + x2 yz. Further, let Hi (d + 1) be the space of homogeneous
harmonic polynomials of degree i on Sd , i.e.,
$$H_i(d+1) = \{P \in P_i(d+1) \mid \Delta P = 0\}. \tag{12}$$
For example, $x^3y + xy^3 - 6xyz^2$ and $x^4 - 6x^2y^2 + y^4$ are members of H4(3).
Importantly, the spherical harmonic theory has established that each eigenspace Hi in
(11) is indeed the same space as Hi (d + 1). This implies that any function in L2 (Sd ) can be
approximated by an accumulated direct sum of orthogonal homogeneous harmonic polynomials. The following well-known proposition further reveals that the Laplacian constraint in
(12) is not necessary to characterize the function space on the sphere.
Proposition 3.1. Let Pm(d + 1) be the space of degree m homogeneous polynomials in d + 1
variables restricted to the unit sphere, and let Hi be the i-th eigenspace of L2(Sd). Then
$$P_m(d+1) = \bigoplus_{i=0}^{\lfloor m/2 \rfloor} H_{m-2i},$$
where $\lfloor \cdot \rfloor$ stands for the round-down integer; that is, $P_m(d+1)$ is the direct sum of the
eigenspaces whose degrees have the same parity as m.
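As a concrete illustration (a worked example we add here, not from the original text): on S², the degree-2 homogeneous polynomial x² splits into its H₀ and H₂ components, using x² + y² + z² = 1 on the sphere:
$$x^2 \;=\; \underbrace{\tfrac{1}{3}\,(x^2 + y^2 + z^2)}_{\in\, H_0} \;+\; \underbrace{\tfrac{1}{3}\,(2x^2 - y^2 - z^2)}_{\in\, H_2},$$
where the second term is harmonic since $\Delta(2x^2 - y^2 - z^2) = 4 - 2 - 2 = 0$.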
From Proposition 3.1, one can see that any L2 function on Sd can be approximated by
homogeneous polynomials. An important feature of spherical harmonics theory is that it
gives reproducing structures on spheres, and now we will recall this fact. For the following
discussion, we will fix a Laplacian eigenspace Hi inside L2 (Sd ), so Hi is a finite dimensional
Hilbert space on Sd ; such a restriction on a single piece Hi is necessary because the entire
Hilbert space L2 (Sd ) does not have a reproducing kernel given that the Delta functional on
L2(Sd) is not a bounded functional.¹
3.2 Zonal Spherical Functions as Reproducing Kernels in Hi
On each Laplacian eigenspace Hi inside L2 (Sd ) on general d-dimensional spheres, we define
a linear functional Lx on Hi , such that for each Y ∈ Hi , Lx (Y ) = Y (x) for a fixed point
x ∈ Sd . General spherical harmonics theory tells us that there exists ki (x, t) such that:
$$L_x(Y) = Y(x) = \int_{S^d} Y(t)\, k_i(x, t)\, dt, \quad x \in S^d;$$
this function ki (x, t) is the representing function of the functional Lx (Y ), and classical spherical harmonics theory refers to the function ki (x, t) as the zonal spherical function, and
furthermore, they are actually “reproducing kernels” inside Hi ⊂ L2 (Sd ) in the sense of
Aronszajn (1950). Another way to appreciate spherical harmonics theory is that it tells that
each Laplacian eigenspace Hi ⊂ L2 (Sd ) is actually a reproducing kernel Hilbert space on Sd ,
and the special case d = 2 was used in Wahba (1981).
Let us recollect some basic facts of zonal spherical functions for readers’ convenience in
the next Proposition. One can find their proofs in almost any modern spherical harmonics
references, in particular in Stein and Weiss (1971, Chapter IV):
Proposition 3.2. The following properties hold for the zonal spherical function ki (x, t),
which is also the reproducing kernel inside Hi ⊂ L2 (Sd ) with dimension ai .
(a) For a choice of orthonormal basis {Y1, . . . , Yai} in Hi, we can express the kernel as
$k_i(x, t) = \sum_{j=1}^{a_i} Y_j(x) Y_j(t)$, but $k_i(x, t)$ does not depend on the choice of basis.

(b) $k_i(x, t)$ is a real-valued function and symmetric, i.e., $k_i(x, t) = k_i(t, x)$.

(c) For any orthogonal matrix $R \in O(d+1)$, we have $k_i(x, t) = k_i(Rx, Rt)$.

(d) $k_i(x, x) = \dfrac{a_i}{\mathrm{vol}(S^d)}$ for any point $x \in S^d$.

(e) $k_i(x, t) \le \dfrac{a_i}{\mathrm{vol}(S^d)}$ for any $x, t \in S^d$.
Remark 3.3. The above proposition “seems” obvious from traditional perspectives, as if it
could be found in any textbook, so readers with rich experience with RKHS theory might
think that we are stating something trivial. However, we want to point out two facts.
¹ At first sight, this might seem to contradict the discussion on splines on 2-dimensional spheres in Wahba
(1981), but a careful reader can find that a finiteness constraint was imposed there, and it was never claimed
that L2(S2) is an RKHS. That is, their RKHS on S2 is a subspace of L2(S2).
(1) Function spaces over underlying spaces with different topological structures behave
very differently. Spheres are compact with no boundary, and their function spaces have
Laplacian operators whose eigenspaces are finite dimensional and possess reproducing kernel structures inside those finite dimensional eigenspaces. These coincidences are
not expected to happen over other general topological spaces.
(2) Relative to classical topological spaces whose RKHS were used more often, e.g. unit
intervals or vector spaces, spheres are more “exotic” topological structures (simply
connected space, but with nontrivial higher homotopy groups), while intervals or vector
spaces are contractible with trivial homotopy groups. One way to appreciate spherical
harmonics theory is that classical “naive” expectations can still happen on spheres.
In the next subsection we discuss the corresponding function space in the compositional
domain ∆d .
3.3 Function Spaces on Compositional Domains
With the identification ∆d = Sd/Γ, the function space L2(∆d) can be identified with
L2(Sd/Γ), i.e., L2(∆d) = L2(Sd/Γ). The function space L2(Sd) is well understood by spherical harmonics theory as above, so we want to relate L2(Sd/Γ) with L2(Sd) as follows. Notice
that a function h ∈ L2(Sd/Γ) is a map from Sd/Γ to (real or complex) numbers. Thus a
natural associated function π^*(h) ∈ L2(Sd) is given by the following composition of maps:
$$h \circ \pi : \; S^d \xrightarrow{\;\pi\;} S^d/\Gamma \xrightarrow{\;h\;} \mathbb{C}.$$
Therefore, the composition h ∘ π = π^*(h) ∈ L2(Sd) gives rise to a natural embedding of the
function space of compositional domains into that of a sphere, π^* : L2(Sd/Γ) → L2(Sd).
The embedding π ∗ identifies the Hilbert space of compositional domains as a subspace of
the Hilbert space of spheres. A natural question is how to characterize the subspace in L2 (Sd )
that corresponds to functions on compositional domains. The following proposition states
that f ∈ im(π ∗ ) if and only if f is constant on fibers of the projection map π : Sd → Sd /Γ,
almost everywhere. In other words, f takes the same values on all Γ orbits, i.e., on the set
of points which are connected to each other by “sign flippings”.
Proposition 3.4. The image of the embedding π^* : L2(Sd/Γ) → L2(Sd) consists of functions
f ∈ L2(Sd) such that f, up to a measure zero set, is constant on π−1(x) for every x ∈ Sd/Γ,
where π is the natural projection Sd → Sd/Γ.
We call a function f ∈ L2 (Sd ) that lies in the image of the embedding π ∗ a Γ-invariant
function. Now we construct the contraction map π_* : L2(Sd) → L2(Sd/Γ); this map
descends every function on the sphere to a function on the compositional domain. To construct π_*,
it suffices to associate a Γ-invariant function to every function in L2 (Sd ). For a point z ∈ Sd
and a reflection γ ∈ Γ, a point γ(z) lies in the set OrbitΓz which is defined in (5). Starting
with a function f ∈ L2 (Sd ), we will define the associated Γ-invariant function f Γ as follows:
Proposition 3.5. Let f be a function in L2(Sd). Then the following f^Γ,
$$f^{\Gamma}(z) = \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} f(\gamma(z)), \quad z \in S^d, \tag{13}$$
is a Γ-invariant function.
Proof Each fiber of the projection map π : Sd → Sd /Γ is OrbitΓz for some z in the fiber.
For any other point y on the same fiber with z for the projection π, there exists a reflection
γ ∈ Γ such that y = γ(z). Then this proposition follows from the identity f Γ (z) = f Γ (γ(z)),
which can be easily checked.
The contraction f ↦ f^Γ on spheres naturally gives the following map:
$$\pi_* : L^2(S^d) \to L^2(S^d/\Gamma), \quad f \mapsto f^{\Gamma}. \tag{14}$$
Remark 3.6. Some readers might argue that each element in an L2 space is a function class
rather than a function, so in that sense π_*(f) = f^Γ is not well-defined; but note that each
element in L2(Sd) can be approximated by polynomials, and π_*, which is well defined on
individual polynomials, induces a well defined map on function classes.
Theorem 3.7. This contraction map π∗ : L2 (Sd ) → L2 (Sd /Γ), as defined in (14), has a
section given by π ∗ , namely the composition π∗ ◦ π ∗ induces the identity map from L2 (Sd /Γ)
to itself. In particular, the contraction map π∗ is a surjection.
Proof One way to look at the relation between the two maps π_* and π^* is through the diagram
$$L^2(S^d/\Gamma) \;\underset{\pi_*}{\overset{\pi^*}{\rightleftarrows}}\; L^2(S^d).$$
The image of π^* consists of Γ-invariant functions in L2(Sd).
Conversely, given a Γ-invariant function g ∈ L2(Sd), the map g ↦ g^Γ is the identity map,
i.e., g = g Γ , thus the theorem follows.
Remark 3.8. Theorem 3.7 identifies functions on compositional domains as Γ-invariant functions in L2 (Sd ). For any function f ∈ L2 (Sd ), we can produce the corresponding Γ-invariant
function f Γ by (13). More importantly, we can “recover” L2 (∆d ) from L2 (Sd ), without losing any information. This allows us to construct reproducing kernels of ∆d from L2 (Sd ) in
Section 3.5.
3.4 Further Reduction to Homogeneous Polynomials of Even Degrees
In this section we provide a further simplification of the homogeneous polynomials in the
finite direct sum space $\bigoplus_{i=0}^{m} H_i$. Proposition 3.1 tells us that if m is even, then
$P_m(d+1) = \bigoplus_{i=0}^{m/2} H_{2i}$, and that if m is odd then $P_m(d+1) = \bigoplus_{i=0}^{(m-1)/2} H_{2i+1}$, where $P_m(d+1)$ is
the space of degree m homogeneous polynomials in d + 1 variables. In either case (m even
or odd), the highest-degree eigenspace appearing in $P_m(d+1)$ is $H_m$. Therefore we can decompose the finite direct sum
space $\bigoplus_{i=0}^{m} H_i$ into the direct sum of two homogeneous polynomial spaces:
$$\bigoplus_{i=0}^{m} H_i = P_m(d+1) \oplus P_{m-1}(d+1).$$
However, we will show that any monomial containing a variable of odd degree collapses to zero upon taking
its Γ-invariant, so only one piece of the above homogeneous polynomial space "survives"
under the contraction map π_*. This will further simplify the function space, which in turn
facilitates easy computation.
Specifically, when working with the accumulated direct sums $\bigoplus_{i=0}^{m} H_i$ on spheres, not every
function is a meaningful function on ∆d = Sd/Γ; e.g., we can find a nonzero function
$f \in \bigoplus_{i=0}^{m} H_i$ with $f^{\Gamma} = 0$. In fact, the odd pieces, i.e., the eigenspaces $H_m$ with m
odd, contribute nothing to L2(∆d) = L2(Sd/Γ). In other words, the accumulated
direct sum $\bigoplus_{i=0}^{m} H_{2i+1}$ is "killed" to zero under π_*, as shown by the following lemma:
Lemma 3.9. For every monomial $\prod_{i=1}^{d+1} x_i^{\alpha_i}$ (each $\alpha_i \ge 0$), if there exists k with $\alpha_k$
odd, then the monomial $\prod_{i=1}^{d+1} x_i^{\alpha_i}$ is a shadow function, that is, $\left(\prod_{i=1}^{d+1} x_i^{\alpha_i}\right)^{\Gamma} = 0$.
An important implication of this lemma is that since each homogeneous polynomial in
$\bigoplus_{i=0}^{k} H_{2i+1}$ is a linear combination of monomials with at least one variable of odd degree, it is killed under
π_*. This implies that all "odd" pieces in $L^2(S^d) = \bigoplus_{i=0}^{\infty} H_i$ contribute nothing to
L2(∆d) = L2(Sd/Γ). Therefore, whenever using spherical harmonics theory to understand
function spaces of compositional domains, it suffices to consider only even i for Hi in L2(Sd).
In summary, the function space on the compositional domain ∆d = Sd/Γ has the following
eigenspace decomposition:
$$L^2(\Delta^d) = L^2(S^d/\Gamma) = \bigoplus_{i=0}^{\infty} H_{2i}^{\Gamma}, \tag{15}$$
where $H_{2i}^{\Gamma} := \{h \in H_{2i} : h = h^{\Gamma}\}$.
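To make (15) concrete, consider a small worked example of our own (not from the original text) with d = 1: the eigenspace H₂ on S¹ is spanned by x² − y² and xy. Averaging over Γ = {(±1, ±1)} gives
$$(x^2 - y^2)^{\Gamma} = x^2 - y^2, \qquad (xy)^{\Gamma} = \tfrac{1}{4}\big(xy - xy - xy + xy\big) = 0,$$
so xy is a shadow function in the sense of Lemma 3.9, and $H_2^{\Gamma}$ is spanned by $x^2 - y^2$ alone.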
3.5 Reproducing Kernels for Compositional Domains
With the understanding of function spaces on compositional domains as invariant functions
on spheres, we are ready to use spherical harmonic theory to construct reproducing kernel
structures on compositional domains.
3.5.1 Γ-invariant Functionals on Hi
The main goal of this section is to establish reproducing kernels for compositional data.
Inside each Laplacian eigenspace Hi in L2 (Sd ), recall that the Γ-invariant subspace HiΓ can
be regarded as a function space on ∆d = Sd/Γ, based on (15). To find a candidate
reproducing kernel inside HiΓ, we first identify the representing function for the following
linear functional LΓz on Hi, which is defined as follows: for any function Y ∈ Hi,
$$L^{\Gamma}_z(Y) = Y^{\Gamma}(z) = \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} Y(\gamma z),$$
for a given z ∈ Sd . One can easily see that LΓz and Lz agree on the subspace HiΓ inside Hi
and also that LΓz can be seen as a composed map LΓz = Lz π∗ : Hi → HiΓ → C. Note that
although LΓz is defined on Hi , it can actually be seen as a “Delta functional” on Sd /Γ = ∆d .
To find the representing function for LΓz , we will use zonal spherical functions: Let ki (·, ·)
be the reproducing kernel in the eigenspace Hi . Define the “compositional” kernel kiΓ (·, ·)
for Hi as
$$k_i^{\Gamma}(x, y) = \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} k_i(\gamma x, y), \quad \forall x, y \in S^d, \tag{16}$$
from which it is straightforward to check that kiΓ (z, ·) represents linear functionals of the
form LΓz , simply by following the definitions.
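The compositional kernel in (16) can be computed directly once a zonal spherical function is available. The sketch below is our own illustration (all function names are ours), assuming the standard addition-theorem formula $k_i(x,t) = \frac{a_i}{\mathrm{vol}(S^d)}\, C_i^{(d-1)/2}(\langle x, t\rangle)/C_i^{(d-1)/2}(1)$ for $d \ge 2$, where $C_i^{\lambda}$ is the Gegenbauer polynomial:

```python
import numpy as np
from itertools import product
from math import comb, gamma, pi
from scipy.special import gegenbauer

def dim_harmonics(i, d):
    """a_i = dim H_i(d+1): dimension of degree-i spherical harmonics on S^d."""
    if i == 0:
        return 1
    if i == 1:
        return d + 1
    return comb(d + i, i) - comb(d + i - 2, i - 2)

def zonal_kernel(i, d):
    """Zonal spherical function k_i(x, t) on S^d (assumes d >= 2), via the
    addition theorem with the Gegenbauer polynomial C_i^{(d-1)/2}."""
    C = gegenbauer(i, (d - 1) / 2.0)
    a_i = dim_harmonics(i, d)
    vol_sd = 2 * pi ** ((d + 1) / 2) / gamma((d + 1) / 2)   # surface area of S^d
    return lambda x, t: a_i / vol_sd * C(np.dot(x, t)) / C(1.0)

def compositional_kernel(i, d):
    """k_i^Gamma(x, y) = (1/|Gamma|) * sum_gamma k_i(gamma x, y), as in (16)."""
    k = zonal_kernel(i, d)
    signs = [np.array(s) for s in product([1.0, -1.0], repeat=d + 1)]
    return lambda x, y: np.mean([k(s * x, y) for s in signs])

# toy check of the symmetry stated in Proposition 3.11
x = np.array([0.2, 0.3, 0.5]); x /= np.linalg.norm(x)
y = np.array([0.1, 0.6, 0.3]); y /= np.linalg.norm(y)
k2 = compositional_kernel(2, d=2)
print(k2(x, y), k2(y, x))
```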
Remark 3.10. The above definition of “compositional kernels” in (16) is not just a trick only
to get rid of the “redundant points” on spheres. This definition is inspired by the notion of
“orbital integrals” in analysis and geometry. In our case, the “integral” is a discrete version,
because the “compact subgroup” in our situation is replaced by a finite discrete reflection
group Γ. In fact, such kind of “discrete orbital integral” construction is not new in statistical
learning theory, e.g., Reisert and Burkhardt (2007) also used the “orbital integral” type of
construction to study equivariant matrix valued kernels.
At first sight, a compositional kernel is not symmetric on the nose, because we are only
"averaging" over the group orbit in the first variable of the function ki(x, y). However,
since ki(x, y) is both symmetric and orthogonally invariant by Proposition 3.2, somewhat
counter-intuitively, compositional kernels are actually symmetric:
Proposition 3.11. Compositional kernels are symmetric, namely kiΓ (x, y) = kiΓ (y, x).
Proof Recall that ki (x, y) = ki (y, x) and that ki (Gx, Gy) = ki (x, y) for any orthogonal
matrix G. Notice that every reflection γ ∈ Γ can be realized as an orthogonal matrix, then
we have
\begin{align*}
k_i^{\Gamma}(x, y) &= \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} k_i(\gamma x, y) \\
&= \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} k_i(y, \gamma x) = \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} k_i(\gamma^{-1} y, \gamma^{-1}(\gamma x)) \\
&= \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} k_i(\gamma^{-1} y, x) \\
&= \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} k_i(\gamma y, x) \\
&= k_i^{\Gamma}(y, x).
\end{align*}
Recall that HiΓ is the space of Γ-invariant functions inside Hi, and by (15), HiΓ is the i-th subspace
of the compositional function space L2(∆d). A naïve candidate for the reproducing kernel inside
HiΓ, denoted as wi(x, y), might be the spherical reproducing kernel ki(x, y), but ki(x, y) is
not Γ-invariant. It turns out that the compositional kernels are actually reproducing with
respect to all Γ-invariant functions in Hi , while being Γ-invariant on both arguments.
Theorem 3.12. Inside Hi , the compositional kernel kiΓ (x, y) is Γ-invariant on both arguments x and y, and moreover kiΓ (x, y) = wi (x, y), i.e., the compositional kernel is the
reproducing kernel for HiΓ .
Proof Firstly by the definition, kiΓ (x, y) is Γ-invariant on the first argument x; by the
symmetry of kiΓ (x, y) in Proposition 3.11, it is then also Γ-invariant on the second argument
y, hence the compositional kernel kiΓ (x, y) is a kernel inside HiΓ .
Secondly, let us prove the reproducing property of kiΓ (x, y). For any Γ-invariant function
f ∈ HiΓ ⊂ Hi ,
\begin{align*}
\langle f(t), k_i^{\Gamma}(x, t) \rangle &= \Big\langle f(t), \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} k_i(\gamma x, t) \Big\rangle \\
&= \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} \langle f(t), k_i(\gamma x, t) \rangle \\
&= \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} f(\gamma x) = \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} f(x) \quad (f \text{ is } \Gamma\text{-invariant}) \\
&= f(x).
\end{align*}
Remark 3.13. Theorem 3.12 justifies that a compositional kernel is actually the reproducing
kernel for functions inside HiΓ . Although the compositional kernel kiΓ (x, y) is symmetric as
is proved in Proposition 3.11, we will still use wi (x, y) to denote kiΓ (x, y) because wi (x, y) is,
notationally speaking, more visually symmetric than the notation for compositional kernels.
3.5.2 Compositional RKHS and Spaces of Homogeneous Polynomials
Recall that based on Proposition 3.1, the direct sum of the even-degree (resp. odd-degree)
eigenspaces can be expressed as the space of homogeneous polynomials of a fixed degree. Further
recall that the direct sum decomposition $L^2(S^d) = \bigoplus_{i=0}^{\infty} H_i$ is an orthogonal one, and so
is the direct sum $L^2(\Delta^d) = \bigoplus_{i=0}^{\infty} H_i^{\Gamma}$. By the orthogonality between eigenspaces, the reproducing kernel for the finite direct sum $\bigoplus_{i=0}^{m} H_i^{\Gamma}$ is naturally the summation $\sum_{i=0}^{m} w_i$. Note
that by Lemma 3.9, it suffices to consider only the even pieces of eigenspaces $H_{2i}$. Finally, we
give a formal definition of "the degree m reproducing kernel Hilbert space" on ∆d, consisting of
degree 2m homogeneous polynomials:
Definition 3.14. Let $w_i$ be the reproducing kernel for Γ-invariant functions in the i-th
eigenspace $H_i \subset L^2(S^d)$. The degree m compositional reproducing kernel Hilbert space is
defined to be the finite direct sum $\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$, and the reproducing kernel for the degree m
compositional reproducing kernel Hilbert space is
$$\omega_m(\cdot, \cdot) = \sum_{i=0}^{m} w_{2i}(\cdot, \cdot). \tag{17}$$
Thus the degree m RKHS for the compositional domain is the pair $\left( \bigoplus_{i=0}^{m} H_{2i}^{\Gamma},\ \omega_m \right)$.
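Definition 3.14 can be implemented by summing the even-degree compositional kernels; the short sketch below is ours and reuses compositional_kernel() from the illustration following (16):

```python
def omega(m, d):
    """Degree-m compositional reproducing kernel (17):
    omega_m(x, y) = sum_{i=0}^{m} w_{2i}(x, y), with w_{2i} = k_{2i}^Gamma."""
    kernels = [compositional_kernel(2 * i, d) for i in range(m + 1)]
    return lambda x, y: sum(k(x, y) for k in kernels)

omega_3 = omega(3, d=2)   # reproducing kernel of the degree-3 compositional RKHS on Delta^2
```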
Recall that the direct sum $\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$ can be identified as a subspace of $\bigoplus_{i=0}^{m} H_{2i}$, which is
isomorphic to the space of degree 2m homogeneous polynomials, so each function in $\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$
can be written as a degree 2m homogeneous polynomial, including the reproducing kernel
$\omega_m(x, \cdot)$, although it is not obvious from (17). Notice that for a point $(x_1, x_2, \ldots, x_{d+1}) \in S^d$,
the sum $\sum_{i=1}^{d+1} x_i^2 = 1$, so one can always use this sum to turn each element in $\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$ into a
homogeneous polynomial. For example, $x^2 + 1$ is not a homogeneous polynomial, but each
point $(x, y, z) \in S^2$ satisfies $x^2 + y^2 + z^2 = 1$, so we have $x^2 + 1 = x^2 + x^2 + y^2 + z^2 = 2x^2 + y^2 + z^2$,
which is a homogeneous polynomial on the sphere $S^2$.
In fact, we can say something more about $\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$. Recall that Lemma 3.9 "killed"
the contributions from the "odd pieces" $H_{2k+1}$ under the contraction map $\pi_* : L^2(S^d) \to L^2(\Delta^d)$.
However, even inside $\bigoplus_{i=0}^{m} H_{2i}$, only a subspace can be identified with a compositional function space, namely, the Γ-invariant homogeneous polynomials. The following proposition
gives a characterization of which homogeneous polynomials inside $\bigoplus_{i=0}^{m} H_{2i}$ come from the
subspace $\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$:
Proposition 3.15. Given any element $\theta \in \bigoplus_{i=0}^{m} H_{2i}^{\Gamma} \subset L^2(S^d/\Gamma)$, there exists
a degree m homogeneous polynomial $p_m$ such that
$$\theta(x_1, x_2, \ldots, x_{d+1}) = p_m(x_1^2, x_2^2, \cdots, x_{d+1}^2). \tag{18}$$
Proof Note that θ is a degree 2m homogeneous Γ-invariant polynomial, so each monomial
in θ has the form $\prod_{i=1}^{d+1} x_i^{a_i}$ with $\sum_{i=1}^{d+1} a_i = 2m$.
Suppose θ contains a monomial $\prod_{i=1}^{d+1} x_i^{a_i}$ with nonzero coefficient such that $a_i$ is odd for some
$1 \le i \le d+1$. Since θ is Γ-invariant, i.e., $\theta = \theta^{\Gamma}$, such a monomial would have to satisfy
$\prod_{i=1}^{d+1} x_i^{a_i} = \left(\prod_{i=1}^{d+1} x_i^{a_i}\right)^{\Gamma}$, but the term $\left(\prod_{i=1}^{d+1} x_i^{a_i}\right)^{\Gamma}$ is zero by Lemma 3.9, a contradiction.
Thus θ is a linear combination of monomials of the form $\prod_{i=1}^{d+1} x_i^{a_i} = \prod_{i=1}^{d+1} (x_i^2)^{a_i/2}$ with each
$a_i$ even and $\sum_i a_i/2 = m$, and the proposition follows.
Recall that the degree m compositional RKHS is $\left( \bigoplus_{i=0}^{m} H_{2i}^{\Gamma},\ \omega_m \right)$ in Definition 3.14, and
$\bigoplus_{i=0}^{m} H_{2i}$ consists of degree 2m homogeneous polynomials while $\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$ is just a subspace
of it. Proposition 3.15 tells us that one can also have a concrete description of the subspace
$\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$ via those degree m homogeneous polynomials in squared variables.
4 Applications of Compositional Reproducing Kernels
The availability of compositional reproducing kernels will open a door to many statistical/machine learning techniques for compositional data analysis. However, we will only
present two application scenarios, as an initial demonstration of the influence of RKHS theory on compositional data analysis. The first application is the representer theorem, which is
motivated by newly developed kernel-based machine learning, especially by the rich theory
of vector-valued regression (Micchelli and Pontil, 2005; Minh and Sindhwani, 2011). The
second one is constructing exponential families on compositional domains. The parameters of
compositional exponential models are compositional reproducing kernels. To the best of the authors' knowledge, these will be the first class of nontrivial examples of explicit distributions
on compositional domains with non-vanishing densities on the boundary.
4.1 Compositional Representer Theorems
Beyond the successful applications to traditional spline models, representer theorems are
increasingly relevant due to the new kernel techniques in machine learning. We will consider
minimal norm interpolations and least squares regularizations in this paper. Regularizations
are especially important in many situations, like structured prediction, multi-task learning,
multi-label classification and related themes that attempt to exploit output structure.
A common theme in the above-mentioned contexts is non-parametric estimation of a
vector-valued function f : X → Y, between a structured input space X and a structured
output space Y. An important framework adopted in those analyses is that of "vector-valued reproducing kernel Hilbert spaces" in Micchelli and Pontil (2005). Unsurprisingly, representer
theorems not only are necessary, but also call for further generalizations in various contexts:
(i) In classical spline models, the most frequently used version of representer theorems is
about scalar-valued kernels; but besides the above-mentioned scenario f : X → Y
in the manifold regularization context, in which vector-valued representer theorems are
needed, higher tensor-valued kernels and their corresponding representer theorems
are also desirable. In Reisert and Burkhardt (2007), matrix valued kernels and their
representer theorems are studied, with applications in image processing.
(ii) Another related application lies in the popular kernel mean embedding theories, in
particular, conditional mean embedding. Conditional mean embedding theory essentially gives an operator from an RKHS to another (Grunewalder et al., 2012). In
order to learn such operators, vector-valued regressions plus corresponding representer
theorems are used.
In the vector-valued regression framework, an important assumption discussed in representer
theorems is the linear independence condition (Micchelli and Pontil, 2005). As our construction of the compositional RKHS is based on finite dimensional spaces of polynomials, the linear
independence condition is not freely satisfied on the nose, so we will address this problem
in this paper. Instead of dealing with vector-valued kernels, we will only focus on the special
case of scalar valued (reproducing) kernels, but the issue can be clearly seen in this special
case.
4.1.1 Linear Independence of Compositional Reproducing Kernels
The compositional RKHS that was constructed in Section 3 takes the form $\left( \bigoplus_{i=0}^{m} H_{2i}^{\Gamma},\ \omega_m \right)$,
indexed by m. Given the finite dimensional nature of the compositional RKHS, it is not even
clear whether different points yield different functions $\omega_m(x_i, \cdot)$ inside $\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$. We will
give a positive answer when m is high enough.

Given a set of distinct compositional data points $\{x_i\}_{i=1}^{n} \subset \Delta^d$, we will show that the
corresponding set of reproducing functions $\{\omega_m(x_i, \cdot)\}_{i=1}^{n}$ forms a linearly independent set
inside $\bigoplus_{i=0}^{m} H_{2i}^{\Gamma}$ if m is high enough.
Theorem 4.1. Let {x_i}_{i=1}^n be distinct data points on a compositional domain ∆^d. Then there exists a positive integer M such that, for any m > M, the set of functions {ω_m(x_i, ·)}_{i=1}^n is linearly independent in ⊕_{i=0}^m H_{2i}^Γ.
Proof  The quotient map c_∆ : S^d → ∆^d factors through a projective space, i.e., c_∆ : S^d → P^d → ∆^d. The main idea is to prove a stronger statement: distinct data points in P^d yield linearly independent projective kernels for large enough m, where projective kernels are reproducing kernels on P^d whose definition is given in Section A.3. We then construct two vector subspaces V_1^m and V_2^m and a linear map h_m from V_1^m to V_2^m. The key trick is that the matrix representing h_m becomes diagonally dominant when m is large enough, which forces the spanning sets of both V_1^m and V_2^m to be linearly independent. More details of the proof are given in Section A.3.
In the proof of Theorem 4.1 we make use of the homogeneous polynomials (y_i · t)^{2m}, which do not live inside a single piece H_{2i}; this is why we use the direct sum space ⊕_{i=0}^m H_{2i} for our construction of the RKHS. One might wonder whether the same argument works without projective kernels; the issue is that the matrix could then have ±1 off-diagonal entries, which prevents diagonal dominance no matter how large m is. We reduce the problem to the linear independence of projective kernels at distinct points because reproducing kernels of distinct compositional data points are linear combinations of distinct projective kernels; in this way, each off-diagonal term is a power of the inner product of two vectors that are neither antipodal nor identical, so the off-diagonal terms vanish as m increases.

Another consequence of Theorem 4.1 is that ω_m(x_i, ·) ≠ ω_m(x_j, ·) whenever i ≠ j, provided m is large enough. A large enough m not only separates points at the level of reproducing kernels, but also gives each data point its "own dimension."
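The mechanism behind Theorem 4.1 can be illustrated numerically. The following minimal sketch (our own illustration, not part of the original development; variable names are ours) forms the l × l matrix described in the proof sketch, with unit diagonal and off-diagonal entries (y_i · y_j)^{2m} for a few distinct unit vectors, and checks that it becomes diagonally dominant, hence full rank, as m grows.

import numpy as np

rng = np.random.default_rng(0)

# A few distinct unit vectors in R^{d+1}; generically no two are identical or antipodal
d = 2
Y = rng.normal(size=(5, d + 1))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

for m in (1, 5, 20, 80):
    # Matrix representing the linear map in the proof of Theorem 4.1:
    # unit diagonal, off-diagonal entries (y_i . y_j)^{2m}
    G = (Y @ Y.T) ** (2 * m)
    off_diag_sum = np.abs(G - np.diag(np.diag(G))).sum(axis=1) - 1.0
    print(m, "diagonally dominant:", bool(np.all(off_diag_sum < 1.0)),
          "rank:", np.linalg.matrix_rank(G))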
4.1.2 Minimal Norm Interpolation and Least Squares Regularization
Once the linear independence is established in Theorem 4.1, it is an easy corollary to obtain the representer theorems for minimal norm interpolation and least squares regularization. Nothing here is new from the point of view of general RKHS theory, but we include these theorems and proofs for completeness. Again, we focus on scalar-valued (reproducing) kernels and functions instead of vector-valued kernels and regressions. However, Theorem 4.1 sheds important light on the linear independence issue, and interested readers can generalize these compositional representer theorems to the vector-valued case by following Micchelli and Pontil (2005).
The first representer theorem we provide solves the minimal norm interpolation problem: for a fixed set of distinct points {x_i}_{i=1}^n in ∆^d and a set of numbers y = {y_i ∈ R}_{i=1}^n, let I_y^m be the set of functions that interpolate the data,

I_y^m = {f ∈ ⊕_{i=0}^m H_{2i}^Γ : f(x_i) = y_i, 1 ≤ i ≤ n},

and our goal is to find the f_0 with minimal norm, i.e.,

‖f_0‖ = inf{‖f‖ : f ∈ I_y^m}.
Theorem 4.2. Choose m large enough that the reproducing kernels {ω_m(x_i, ·)}_{i=1}^n are linearly independent. Then the unique solution of the minimal norm interpolation problem min{‖f‖ : f ∈ I_y^m} is given by the linear combination of the kernels

f_0(t) = Σ_{i=1}^n c_i ω_m(x_i, t),

where {c_i}_{i=1}^n is the unique solution of the following system of linear equations:

Σ_{j=1}^n ω_m(x_i, x_j) c_j = y_i,   1 ≤ i ≤ n.
Proof  For any other f in I_y^m, define g = f − f_0. By considering the decomposition ‖f‖^2 = ‖g + f_0‖^2 = ‖g‖^2 + 2⟨f_0, g⟩ + ‖f_0‖^2, one can argue that the cross term ⟨f_0, g⟩ = 0. The details can be found in Section A.4. We point out that the linear independence of the reproducing kernels guarantees the existence and uniqueness of f_0.
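In practice, once the Gram matrix of the compositional kernel is available, the coefficients in Theorem 4.2 come from a single linear solve. The following is a minimal sketch (our own illustration; it assumes a routine evaluating ω_m is available elsewhere, which we do not provide here):

import numpy as np

def min_norm_coefficients(K, y):
    # K is the n x n Gram matrix with entries K[i, j] = omega_m(x_i, x_j),
    # assumed invertible because m is chosen large enough (Theorem 4.1);
    # y is the vector of interpolated values.  The interpolant is then
    # f_0(t) = sum_i c[i] * omega_m(x_i, t).
    return np.linalg.solve(K, y)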
The second representer theorem is for a more realistic scenario with ℓ² regularization, which has the following objective:

Σ_{i=1}^n (f(x_i) − y_i)^2 + µ ‖f‖^2.     (19)

The goal is to find the Γ-invariant function f_µ ∈ ⊕_{i=0}^m H_{2i}^Γ that minimizes (19). The solution to this problem is provided by the following representer theorem:
Theorem 4.3. For a set of distinct compositional data points {x_i}_{i=1}^n, choose m large enough that the reproducing kernels {ω_m(x_i, ·)}_{i=1}^n are linearly independent. Then the solution to (19) is given by

f_µ(t) = Σ_{i=1}^n c_i ω_m(x_i, t),

where {c_i}_{i=1}^n is the solution of the following system of linear equations:

µ c_i + Σ_{j=1}^n ω_m(x_i, x_j) c_j = y_i,   1 ≤ i ≤ n.
Proof  The details of this proof can be found in Section A.4, but we point out how the linear independence condition plays a role here. In the middle of the proof we need to show that µ f_µ(t) = Σ_{i=1}^n (y_i − f_µ(x_i)) ω_m(x_i, t), where f_µ(t) = Σ_{i=1}^n c_i ω_m(x_i, t). We use the linear independence established in Theorem 4.1 to show the equivalence between this system of linear equations in {c_i}_{i=1}^n and the one given in the theorem.
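Computationally, Theorem 4.3 again reduces to a single linear solve; compared with the interpolation case, the penalty µ only shifts the diagonal of the Gram matrix. A minimal sketch under the same assumptions as before:

import numpy as np

def regularized_coefficients(K, y, mu):
    # Solve (mu * I + K) c = y, the linear system in Theorem 4.3,
    # where K[i, j] = omega_m(x_i, x_j) and mu > 0 is the regularization parameter.
    n = K.shape[0]
    return np.linalg.solve(mu * np.eye(n) + K, y)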
4.2 Compositional Exponential Family
With the construction of the RKHS in hand, one can produce exponential families using the technique developed in Canu and Smola (2006). Recall that for a function space H with inner product ⟨·, ·⟩ on a general topological space X, whose reproducing kernel is given by k(x, ·), the exponential family density p(x, θ) with parameter θ ∈ H is given by

p(x, θ) = exp{⟨θ(·), k(x, ·)⟩ − g(θ)},   where  g(θ) = log ∫_X exp(⟨θ(·), k(x, ·)⟩) dx.
For compositional data we define the density of the mth degree exponential family as

p_m(x, θ) = exp{⟨θ(·), ω_m(x, ·)⟩ − g(θ)},   x ∈ S^d/Γ,     (20)

where θ ∈ ⊕_{i=0}^m H_{2i}^Γ and g(θ) = log ∫_{S^d/Γ} exp(⟨θ(·), ω_m(x, ·)⟩) dx. Note that this density can be made more explicit by using homogeneous polynomials. Recall that any function in ⊕_{i=0}^m H_{2i}^Γ can be written as a degree m homogeneous polynomial in the squared variables by Lemma 3.9. Thus the density in (20) can be simplified to the following form: for x = (x_1, . . . , x_{d+1}) ∈ S^d_{≥0},

p_m(x, θ) = exp{s_m(x_1^2, x_2^2, . . . , x_{d+1}^2; θ) − g(θ)},     (21)

where s_m is a polynomial in the squared variables x_i^2 with θ as coefficients. Note that s_m is invariant under "sign flippings," and the normalizing constant can be computed via integration over the entire sphere as follows:

g(θ) = log ∫_{S^d/Γ} exp(s_m) dx = log( (1/|Γ|) ∫_{S^d} exp(s_m) dx ).
Figure 3 displays three examples of the compositional exponential distribution. The three densities have the following parameters, respectively:

θ_1 = −2x_1^4 − 2x_2^4 − 3x_3^4 + 9x_1^2 x_2^2 + 9x_1^2 x_3^2 − 2x_2^2 x_3^2,
θ_2 = −x_1^4 − x_2^4 − x_3^4 − x_1^2 x_2^2 − x_1^2 x_3^2 − x_2^2 x_3^2,
θ_3 = −3x_1^4 − 2x_2^4 − x_3^4 + 9x_1^2 x_2^2 − 5x_1^2 x_3^2 − 5x_2^2 x_3^2.

The variety of shapes of the densities in the figure implies that the compositional exponential family can be used to model data with a wide range of locations and correlation structures. Further investigation is needed on the estimation of the parameters of the compositional exponential model (21), which we suggest as a future direction of research. A natural starting point is maximum likelihood estimation, together with a regression-based method such as the one discussed by Beran (1979).
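To make (21) concrete, the following is a minimal Monte Carlo sketch (our own illustration, not the estimation procedure left for future work) of how the normalizing constant g(θ_1) could be approximated for the first example parameter above, under the assumption that Γ acts by coordinate sign flips so that |Γ| = 2^{d+1} = 8 on S^2; all function names are ours.

import numpy as np

rng = np.random.default_rng(1)

def s4_theta1(x):
    # s_4(x_1^2, x_2^2, x_3^2; theta_1) for the first example parameter above
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    return (-2 * x1**4 - 2 * x2**4 - 3 * x3**4
            + 9 * x1**2 * x2**2 + 9 * x1**2 * x3**2 - 2 * x2**2 * x3**2)

# Uniform sample on the sphere S^2 (normalized Gaussian vectors)
z = rng.normal(size=(200_000, 3))
z /= np.linalg.norm(z, axis=1, keepdims=True)

# g(theta_1) = log( (1/|Gamma|) * integral_{S^2} exp(s_4) dx ),
# approximated by surface_area * mean(exp(s_4)) / |Gamma|
g_hat = np.log(4.0 * np.pi * np.mean(np.exp(s4_theta1(z))) / 8.0)

# Log-density (21) at the spherical representative sqrt(x) of a compositional point x
x_comp = np.array([0.2, 0.3, 0.5])
log_p = s4_theta1(np.sqrt(x_comp)) - g_hat
print(g_hat, log_p)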
Figure 3: Three example densities from the compositional exponential family: (a) p_4(x, θ_1), (b) p_4(x, θ_2), (c) p_4(x, θ_3). See text for the specification of the parameters θ_1, θ_2, and θ_3.
5 Discussion
A main contribution of this work is the use of projective and spherical geometry to reinterpret compositional domains, which allows us to construct reproducing kernels for compositional data points using spherical harmonics under group actions. With the rapid development of kernel techniques in machine learning, especially the kernel mean embedding philosophy, this work makes it possible to introduce reproducing kernel theory to compositional data analysis.
Consider, for example, the mean estimation problem for compositional data. Under the kernel mean embedding framework surveyed by Muandet et al. (2017), one can focus on the kernel mean E[k(X, ·)] in the function space, rather than a physical mean in the compositional domain. The latter is known to be difficult even to define properly (Paine et al., 2020; Scealy and Welsh, 2011). The kernel mean, on the other hand, enjoys the flexibility and linear structure of the function space. Although inspired by kernel mean embedding techniques, we did not address the computation of the kernel mean E[k(X, ·)] and the cross-covariance operator as replacements for the means and variance-covariance matrices of traditional multivariate analysis. In forthcoming work, the authors will return to kernel means and cross-covariance operators for compositional exponential models by applying deeper functional-analytic techniques.
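As a pointer to what such an analysis could look like, the standard empirical estimate from the kernel mean embedding literature replaces E[ω_m(X, ·)] by the sample average of kernel sections; the sketch below is our own illustration and assumes a callable evaluating ω_m, which this paper does not provide as code.

import numpy as np

def empirical_kernel_mean(omega_m, X, t):
    # Empirical kernel mean evaluated at t: (1/n) * sum_i omega_m(x_i, t),
    # where omega_m(x, t) evaluates the compositional reproducing kernel
    # (assumed available from the construction in Section 3) and X holds the data.
    return np.mean([omega_m(x, t) for x in X])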
Although we construct only scalar-valued reproducing kernels for compositional data, this does not mean that "higher tensors" are outside our consideration. In fact, higher tensor-valued reproducing kernels appear in kernel techniques with applications to manifold regularization (Minh and Sindhwani, 2011) and shape analysis (Micheli and Glaunés, 2014). These approaches to higher tensor-valued reproducing kernels indicate further possibilities for regression frameworks between exotic spaces, f : X → Y, with both the source X and the target Y nonlinear in nature. This extends the intuition of multivariate analysis to nonlinear contexts, and compositional domains (traditionally modeled by an "Aitchison simplex") form an interesting class of examples that can be treated nonlinearly.
A Supplementary Proofs

A.1 Proof of Central Limit Theorems on Integral Squared Errors (ISE) in Section 2.2
Assumption A.1. For all kernel density estimators and bandwidth parameters in this paper, we assume the following:

H1 The kernel function K : [0, ∞) → [0, ∞) is continuous and such that both λ_d(K) and λ_d(K^2) are bounded for d ≥ 1, where λ_d(K) = 2^{d/2−1} vol(S^{d−1}) ∫_0^∞ K(r) r^{d/2−1} dr.

H2 If a function f on S^d ⊂ R^{d+1} is extended to all of R^{d+1} \ {0} via f(x) = f(x/‖x‖), then the extended function must have its first three derivatives bounded.

H3 The bandwidth parameter satisfies h_n → 0 and n h_n^d → ∞ as n → ∞.
Let f be the function extended from S^d to R^{d+1} \ {0} via f(x/‖x‖), and let

φ(f, x) = −x^T ∇f(x) + d^{−1} (∇^2 f(x) − x^T (H_x f) x),

where H_x f is the Hessian matrix of f at the point x and ∇^2 f(x) = tr[H_x f(x)] is the Laplacian. The term b_d(K) in the statement of Theorem 2.6 is defined as

b_d(K) = ∫_0^∞ K(r) r^{d/2} dr / ∫_0^∞ K(r) r^{d/2−1} dr,

and the term φ(h_n) in the statement of Theorem 2.6 is defined as

φ(h_n) = (4 b_d(K)^2 / d^2) σ_x^2 h_n^4.
Proof of Theorem 2.6:

Proof  The strategy of Zhao and Wu (2001) in the directional setup follows Hall (1984), whose key idea is to give asymptotic bounds for degenerate U-statistics so that martingale theory can be used to derive the central limit theorem. The step where the finite support condition was used in Zhao and Wu (2001) is in proving the asymptotic bound E(G_n^2(X_1, X_2)) = O(h^{7d}), where G_n(x, y) = E[H_n(X, x) H_n(X, y)] with

H_n(x, y) = ∫_{S^d} K_n(z, x) K_n(z, y) dz

and the centered kernel K_n(x, y) = K[(1 − x^T y)/h^2] − E{K[(1 − x^T X)/h^2]}. In that proof, they needed to show that the term
T_1 = ∫_{S^d} f(x) dx ∫_{S^d} f(y) dy [ ∫_{S^d} f(z) ( ∫_{S^d} K[(1 − u^T x)/h^2] K[(1 − u^T z)/h^2] du ) ( ∫_{S^d} K[(1 − u^T y)/h^2] K[(1 − u^T z)/h^2] du ) dz ]^2

satisfies T_1 = O(h^{7d}). In this step, in order to give an upper bound for T_1, the finite support condition was used substantially.
The idea for avoiding this assumption is based on an observation in García-Portugués et al. (2015), which concerns only the directional-linear CLT, so their result cannot be used directly in the purely directional case. Based on the method in Lemma 10 of García-Portugués et al. (2015), one can deduce the following asymptotic equivalence:

∫_{S^d} K^j((1 − x^T y)/h^2) φ_i(y) dy ∼ h^d λ_d(K^j) φ_i(x),

where λ_d(K^j) = 2^{d/2−1} vol(S^{d−1}) ∫_0^∞ K^j(r) r^{d/2−1} dr. As a special case we have

∫_{S^d} K^2((1 − x^T y)/h^2) dy ∼ h^d λ_d(K^2) C,  with C a positive constant.
Now we proceed with the proof without the finite support condition:

T_1 = ∫_{S^d} f(x) dx ∫_{S^d} f(y) dy [ ∫_{S^d} f(z) ( ∫_{S^d} K[(1 − u^T x)/h^2] K[(1 − u^T z)/h^2] du ) ( ∫_{S^d} K[(1 − u^T y)/h^2] K[(1 − u^T z)/h^2] du ) dz ]^2
    ∼ ∫_{S^d} f(x) dx ∫_{S^d} f(y) dy [ ∫_{S^d} f(z) λ_d(K) h^d K((1 − x^T z)/h^2) × λ_d(K) h^d K((1 − y^T z)/h^2) dz ]^2
    ∼ λ_d(K)^4 h^{4d} ∫_{S^d} f(x) dx ∫_{S^d} f(y) [ λ_d(K) h^d K((1 − x^T y)/h^2) f(y) ]^2 dy
    = λ_d(K)^6 h^{6d} ∫_{S^d} ∫_{S^d} K^2((1 − x^T y)/h^2) f^3(y) dy f(x) dx
    ∼ λ_d(K)^6 h^{6d} · λ_d(K^2) h^d C ∫_{S^d} f^3(x) f(x) dx
    = C λ_d(K)^6 λ_d(K^2) h^{7d} ∫_{S^d} f^4(x) dx = O(h^{7d}).

Thus we have proved T_1 = O(h^{7d}) without the finite support assumption, and the rest of the proof follows as in Zhao and Wu (2001).
Observe the identity

∫_{S^d_{≥0}} (p̂_n − p)^2 dx = |Γ| ∫_{S^d} (f̂_n − p̃)^2 dy;     (22)

the CLT of the compositional ISE then follows from identity (22) and our proof of the CLT for the spherical ISE without finite support conditions on the kernel.
A.2 Proofs of Shadow Monomials in Section 3
Proof of Proposition 3.9:

Proof  A direct computation, splitting the sum over the sign s_k of a coordinate k whose exponent α_k is odd, yields:

(Π_{i=1}^{d+1} x_i^{α_i})^Γ = (1/|Γ|) Σ_{s_i ∈ {±1}} Π_{i=1}^{d+1} (s_i x_i)^{α_i}
    = (1/|Γ|) [ Σ_{s_i ∈ {±1}, i ≠ k} Π_{i ≠ k} (s_i x_i)^{α_i} x_k^{α_k} + Σ_{s_i ∈ {±1}, i ≠ k} Π_{i ≠ k} (s_i x_i)^{α_i} (−x_k)^{α_k} ]
    = x_k^{α_k} (1/|Γ|) Σ_{s_i ∈ {±1}, i ≠ k} Π_{i ≠ k} (s_i x_i)^{α_i} − x_k^{α_k} (1/|Γ|) Σ_{s_i ∈ {±1}, i ≠ k} Π_{i ≠ k} (s_i x_i)^{α_i}
    = 0,

since (−x_k)^{α_k} = −x_k^{α_k} for odd α_k.
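As an illustrative check of this cancellation (a sketch we add here, not part of the original text), one can symbolically average a monomial over all sign flips and verify that it vanishes whenever some exponent is odd:

import itertools
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")

def gamma_average(exponents):
    # Average of (s1*x1)^a1 * (s2*x2)^a2 * (s3*x3)^a3 over all sign flips s_i in {+1, -1}
    variables = (x1, x2, x3)
    total = 0
    for signs in itertools.product((1, -1), repeat=3):
        term = 1
        for s, v, a in zip(signs, variables, exponents):
            term *= (s * v) ** a
        total += term
    return sp.expand(total / 2**3)

print(gamma_average((2, 1, 2)))  # one odd exponent: the average is 0
print(gamma_average((2, 4, 2)))  # all even: survives as x1**2*x2**4*x3**2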
A.3 Proof of Linear Independence of Reproducing Kernels in Theorem 4.1
We sketch a slightly more detailed (though not complete) proof:

Proof  This is the most technical lemma in this article. We sketch the philosophy of the proof here, which can be understood intuitively in topological terms.
Recall that the projective space P^d is obtained by identifying every pair of antipodal points of the sphere S^d (identifying x with −x); in other words, P^d = S^d/Z_2, where Z_2 = {0, 1} is the cyclic group of order 2. We can then define a projective kernel in H_i ⊂ L^2(S^d) by k_i^p(x, ·) = [k_i(x, ·) + k_i(−x, ·)]/2, and denote the projective kernel inside ⊕_{i=0}^m H_{2i} by k_m^p(x, ·) = Σ_{i=0}^m k_{2i}^p(x, ·).
Now we spread out the data set {x_i}_{i=1}^n by the "spread-out" construction in Section 2.1 and denote the spread-out data set by {Γ · x_i}_{i=1}^n = {c_∆^{−1}(x_i)}_{i=1}^n (a data set, not a set, because of repetitions). A compositional reproducing kernel is a sum of spherical reproducing kernels over c_∆^{−1}(x_i), divided by the number of elements in c_∆^{−1}(x_i). The data set c_∆^{−1}(x_i) has antipodal symmetry, so a compositional kernel is a linear combination of projective kernels. Notice that different fake kernels are linear combinations of different projective kernels. It therefore suffices to show the linear independence of projective kernels for distinct data points and large enough m, which implies the linear independence of the fake kernels {k_m^Γ(x_i, ·)}_{i=1}^n.
We now focus on the linear independence of projective kernels. A projective kernel can be seen as a reproducing kernel for a point in P^d. For a set of distinct points {y_i}_{i=1}^l ⊂ P^d, we show that the corresponding set of projective kernels {k_m^p(y_i, ·)}_{i=1}^l ⊂ ⊕_{i=0}^m H_{2i} is linearly independent for large enough m.

Consider the two vector subspaces V_1^m = span{(y_i · t)^{2m}}_{i=1}^l and V_2^m = span{k_m^p(y_i, t)}_{i=1}^l, both of which lie inside ⊕_{i=0}^m H_{2i} ⊂ L^2(S^d). Define a linear map h_m : V_1^m → V_2^m by setting

h_m((y_i · t)^{2m}) = Σ_{j=1}^l ⟨(y_i · t)^{2m}, k_m^p(y_j, t)⟩ k_m^p(y_j, t).

This linear map h_m is represented by an l × l symmetric matrix whose diagonal elements are 1's and whose off-diagonal elements are [(y_i · y_j)]^{2m}. Notice that y_i ≠ y_j in P^d, which means they are neither identical nor antipodal in S^d, so |y_i · y_j| < 1. When m is large enough, all off-diagonal elements go to zero while the diagonal elements stay constant, so the matrix representing h_m becomes diagonally dominant, hence full rank. When h_m has full rank, the spanning sets {(y_i · t)^{2m}}_{i=1}^l and {k_m^p(y_i, t)}_{i=1}^l must be bases of V_1^m and V_2^m, respectively, and therefore the set of projective kernels {k_m^p(y_i, t)}_{i=1}^l is linearly independent when m is large enough.
A.4 Proof of Representer Theorems in Section 4.1.2
Proof of Theorem 4.2, on minimal norm interpolation:

Proof  Note that the set I_y^m = {f ∈ ⊕_{i=0}^m H_{2i}^Γ : f(x_i) = y_i} is non-empty, because the f_0 defined by the linear system of equations lies in I_y^m. Let f be any other element of I_y^m and define g = f − f_0; then we have

‖f‖^2 = ‖g + f_0‖^2 = ‖g‖^2 + 2⟨f_0, g⟩ + ‖f_0‖^2.

Since g ∈ ⊕_{i=0}^m H_{2i}^Γ and g(x_i) = 0 for 1 ≤ i ≤ n, we have

⟨f_0, g⟩ = ⟨Σ_{i=1}^n c_i ω_m(x_i, ·), g(·)⟩ = Σ_{i=1}^n c_i ⟨ω_m(x_i, ·), g(·)⟩ = Σ_{i=1}^n c_i g(x_i) = 0.

Thus ‖f‖^2 = ‖g + f_0‖^2 = ‖g‖^2 + ‖f_0‖^2, which implies that f_0 is the solution to the minimal norm interpolation problem.
Proof of Theorem 4.3, on the regularization problem:

Proof  First define the loss functional E(f) = Σ_{i=1}^n |f(x_i) − y_i|^2 + µ ‖f‖^2. For any Γ-invariant function f = f^Γ ∈ ⊕_{i=0}^m H_{2i}, let g = f − f_µ; a simple computation then yields

E(f) = E(f_µ) + Σ_{i=1}^n |g(x_i)|^2 − 2 Σ_{i=1}^n (y_i − f_µ(x_i)) g(x_i) + 2µ⟨f_µ, g⟩ + µ‖g‖^2.

We want to show that Σ_{i=1}^n (y_i − f_µ(x_i)) g(x_i) = µ⟨f_µ, g⟩, and an equivalent way of writing this equality is

Σ_{i=1}^n ⟨(y_i − f_µ(x_i)) ω_m(x_i, t), g(t)⟩ = µ⟨f_µ, g⟩.

We now claim that µ f_µ(t) = Σ_{i=1}^n (y_i − f_µ(x_i)) ω_m(x_i, t), which implies the above equality. To prove this claim, plug the linear combination f_µ(t) = Σ_{i=1}^n c_i ω_m(x_i, t) into the claim; this gives a system of linear equations in {c_i}_{i=1}^n, so the proof of the claim reduces to checking that system.

Note that {ω_m(x_i, t)}_{i=1}^n is a linearly independent set, so one can check that the system of linear equations in {c_i}_{i=1}^n produced by the claim holds if and only if {c_i}_{i=1}^n satisfies µ c_k + Σ_{i=1}^n c_i ω_m(x_i, x_k) = y_k for every k with 1 ≤ k ≤ n, which is exactly the condition of the theorem. The equivalence of these two systems of linear equations is given by the linear independence of the set {ω_m(x_i, t)}_{i=1}^n. Therefore we conclude that the claim µ f_µ(t) = Σ_{i=1}^n (y_i − f_µ(x_i)) ω_m(x_i, t) is true.
To finish the proof of the theorem, notice that

E(f) = E(f_µ) + Σ_{i=1}^n |g(x_i)|^2 − 2 Σ_{i=1}^n (y_i − f_µ(x_i)) g(x_i) + 2µ⟨f_µ, g⟩ + µ‖g‖^2
     = E(f_µ) + Σ_{i=1}^n |g(x_i)|^2 + µ‖g‖^2 + 2 [ µ⟨f_µ, g⟩ − Σ_{i=1}^n (y_i − f_µ(x_i)) g(x_i) ]
     = E(f_µ) + Σ_{i=1}^n |g(x_i)|^2 + µ‖g‖^2 + 2 [ µ⟨f_µ, g⟩ − Σ_{i=1}^n ⟨(y_i − f_µ(x_i)) ω_m(x_i, t), g(t)⟩ ]
     = E(f_µ) + Σ_{i=1}^n |g(x_i)|^2 + µ‖g‖^2 + 2 ⟨ µ f_µ(t) − Σ_{i=1}^n (y_i − f_µ(x_i)) ω_m(x_i, t), g(t) ⟩   (the last inner product is 0 by the claim)
     = E(f_µ) + Σ_{i=1}^n |g(x_i)|^2 + µ‖g‖^2.

The term Σ_{i=1}^n |g(x_i)|^2 + µ‖g‖^2 in the above equality is always non-negative, so E(f_µ) ≤ E(f), and the theorem follows.
References
Aitchison, J. (1982), “The Statistical Analysis of Compositional Data,” Journal of the Royal
Statistical Society: Series B (Statistical Methodology), 44, 139–177.
— (1994), “Principles of compositional data analysis,” Lecture Notes-Monograph Series, 73–
81.
Arnold, R., Jupp, P. E., and Schaeben, H. (2018), “Statistics of ambiguous rotations,”
Journal of Multivariate Analysis, 165, 73–85.
Aronszajn, N. (1950), “Theory of Reproducing Kernels,” Transactions of the American
Mathematical Society, 68, 337–404.
Atkinson, K. and Han, W. (2012), Spherical Harmonics and Approximations on the Unit
Sphere: An Introduction, Springer.
Bai, Z., Rao, C. R., and Zhao, L. (1989), “Kernel estimators of density function of directional
data,” Journal of Multivariate Analysis, 27, 24–39.
Ball, J. A., Marx, G., and Vinnikov, V. (2016), “Noncommutative reproducing kernel Hilbert
spaces,” Journal of Functional Analysis, 271, 1844–1920.
Beardon, A. (2012), The Geometry of Discrete Groups, vol. 91, Springer Science & Business
Media.
Beran, R. (1979), “Exponential models for directional data,” Annals of Statistics, 7, 1162–
1178.
Calle, M. L. (2019), “Statistical analysis of metagenomics data,” Genomics & Informatics,
17.
Canu, S. and Smola, A. (2006), “Kernel methods and the exponential family,” Neurocomputing, 69, 714–720.
Chwialkowski, K., Strathmann, H., and Gretton, A. (2016), “A kernel test of goodness of
fit,” International Conference on Machine Learning, 2606–2615.
Dai, B., Dai, H., Gretton, A., Song, L., Schuurmans, D., and He, N. (2019), “Kernel exponential family estimation via doubly dual embedding,” in Proceedings of Machine Learning
Research, pp. 2321–2330.
de los Campos, G., Gianola, D., and Rosa, G. (2009), “Reproducing kernel Hilbert spaces
regression: a general framework for genetic evaluation,” Journal of Animal Science, 87,
1883–1887.
Fan, Z., Sun, Y., and Wang, Z. (2021), “Principal components in linear mixed models with
general bulk,” Annals of Statistics, 49, 1489–1513.
Fukumizu, K., Bach, F., and Jordan, M. (2009), “Kernel Dimension Reduction in Regression,” Annals of Statistics, 37, 1871–1905.
García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. (2015), “Central limit
theorems for directional and linear random variables with applications,” Statistica Sinica,
25, 1207–1229.
Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V., and Egozcue, J. J. (2017), “Microbiome datasets are compositional: and this is not optional,” Frontiers in microbiology, 8,
2224.
Grunewalder, S., Lever, G., Baldassarre, L., Patterson, S., Gretton, A., and Pontil, M.
(2012), “Conditional mean embeddings as regressors,” Proceedings of the 29th International Conference on International Conference on Machine Learning, 1803–1810.
Hall, P. (1984), “Central limit theorem for integrated square error of multivariate nonparametric density estimators,” Journal of Multivariate Analysis, 14, 1–16.
Hall, P., Watson, G. S., and Cabrera, J. (1987), “Kernel density estimation with spherical
data,” Biometrika, 74, 751–762.
Jorgensen, P. and Tian, F. (2016), “Graph Laplacians and discrete reproducing kernel Hilbert
spaces from restrictions,” Stochastic Analysis and Applications, 34, 722–747.
Jupp, P. E. (2008), “Data-driven Sobolev tests of uniformity on compact Riemannian manifolds,” Annals of Statistics, 36, 1246–1260.
Jupp, P. E. and Spurr, B. D. (1985), “Sobolev tests for independence of directions,” Annals
of Statistics, 13, 1140–1155.
Li, G., Li, Y., and Chen, K. (2020), “It’s All Relative: New Regression Paradigm for Microbiome Compositional Data,” arXiv preprint arXiv:2011.05951.
Lubbe, S., Filzmoser, P., and Templ, M. (2021), “Comparison of zero replacement strategies for compositional data with large numbers of zeros,” Chemometrics and Intelligent
Laboratory Systems, 210, 104248.
Micchelli, C. A. and Pontil, M. (2005), “On Learning Vector-Valued Functions,” Neural
Computation, 17, 177–204.
Micheli, M. and Glaunés, J. A. (2014), “Matrix-valued Kernels for Shape Deformation Analysis,” Geometry, Imaging and Computing, 1(1), 57–139.
Minh, H. Q. and Sindhwani, V. (2011), “Vector-valued Manifold Regularization,” in Proceedings of the 28th International Conference on International Conference on Machine
Learning, pp. 57–64.
Muandet, K., Fukumizu, K., Sriperumbudur, B., and Schölkopf, B. (2017), “Kernel Mean
Embedding of Distributions: A Review and Beyond,” Foundations and Trends in Machine
Learning, 10.
Paine, P. J., Preston, S., Tsagris, M., and Wood, A. (2020), “Spherical regression models
with general covariates and anisotropic errors,” Statistics and Computing, 30, 153–165.
Rasmussen, C. L., Palarea-Albaladejo, J., Johansson, M. S., Crowley, P., Stevens, M. L.,
Gupta, N., Karstad, K., and Holtermann, A. (2020), “Zero problems with compositional
data of physical behaviors: a comparison of three zero replacement methods,” International Journal of Behavioral Nutrition and Physical Activity, 17, 1–10.
Reisert, M. and Burkhardt, H. (2007), “Learning Equivariant Functions with Matrix Valued
Kernels,” Journal of Machine Learning Research, 8, 385–408.
Sansone, G. (1959), Orthogonal Functions, Dover Books on Mathematics Series, Dover.
Scealy, J. L. and Welsh, A. H. (2011), “Regression for compositional data by using distributions defined on the hypersphere,” Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 73, 351–375.
Smola, A., Gretton, A., Song, L., and Schölkopf, B. (2007), “A Hilbert space embedding for
distributions,” in International Conference on Algorithmic Learning Theory, Springer, pp.
13–31.
Stein, E. M. and Weiss, G. (1971), Introduction to Fourier Analysis on Euclidean Spaces,
Princeton Mathematical Series, Princeton University Press.
Wahba, G. (1981), “Spline interpolation and smoothing on the sphere,” SIAM Journal on
Scientific and Statistical Computing, 2, 5–16.
Zhao, L. and Wu, C. (2001), “Central limit theorem for integrated square error of kernel
estimators of spherical density,” Science in China Series A: Mathematics, 44, 474–483.
Zhou, F. and Johnstone, I. M. (2019), “Eigenvalue distributions of variance components
estimators in high-dimensional random effects models,” Annals of Statistics, 47, 2855–
2886.
Zhou, H., Li, L., and Zhu, H. (2013), “Tensor Regression with applications in neuroimaging
data analysis,” Journal of the American Statistical Association, 108, 540–552.