Optimal Uniform and Lp Rates for Random Fourier Features

Zoltán Szabó∗ (Gatsby Unit, UCL)
Joint work with Bharath K. Sriperumbudur∗ (Department of Statistics, Pennsylvania State University)
(∗ equal contribution)
December 4, 2015

Outline
Kernels and kernel derivatives.
Random Fourier features (RFFs).
Guarantees on RFF approximation: uniform (L^∞) and L^r.

Kernel, RKHS

k : X × X → R is a kernel on X if there exists a feature map ϕ : X → H (a Hilbert space) such that
    k(a, b) = ⟨ϕ(a), ϕ(b)⟩_H   (∀a, b ∈ X).

Kernel examples on X = R^d (p ∈ N, θ > 0):
    k(a, b) = (⟨a, b⟩ + θ)^p : polynomial,
    k(a, b) = e^{−‖a−b‖₂²/(2θ²)} : Gaussian,
    k(a, b) = e^{−θ‖a−b‖₂} : Laplacian.

In the RKHS H = H(k) (which exists and is unique): ϕ(u) = k(·, u).
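
As a quick illustration, a minimal NumPy sketch of the three example kernels (the function names and default parameters are my own choices, not part of the slides):

    import numpy as np

    def polynomial_kernel(a, b, p=2, theta=1.0):
        # k(a, b) = (<a, b> + theta)^p, p a positive integer
        return (np.dot(a, b) + theta) ** p

    def gaussian_kernel(a, b, theta=1.0):
        # k(a, b) = exp(-||a - b||_2^2 / (2 theta^2))
        return np.exp(-np.sum((a - b) ** 2) / (2.0 * theta ** 2))

    def laplacian_kernel(a, b, theta=1.0):
        # k(a, b) = exp(-theta * ||a - b||_2)
        return np.exp(-theta * np.linalg.norm(a - b))

    a, b = np.array([1.0, 0.0]), np.array([0.5, -0.5])
    print(polynomial_kernel(a, b), gaussian_kernel(a, b), laplacian_kernel(a, b))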

RKHS: evaluation point of view

Let H ⊂ R^X be a Hilbert space.
Consider, for fixed x ∈ X, the map δ_x : f ∈ H ↦ f(x) ∈ R.
The evaluation functional is linear: δ_x(αf + βg) = α δ_x(f) + β δ_x(g).
Def.: H is called an RKHS if δ_x is continuous for every x ∈ X.

RKHS: reproducing point of view

Let H ⊂ R^X be a Hilbert space.
k : X × X → R is called a reproducing kernel of H if for every x ∈ X and f ∈ H
    1. k(·, x) ∈ H,
    2. ⟨f, k(·, x)⟩_H = f(x)   (reproducing property).
Specifically, for all x, y ∈ X: k(x, y) = ⟨k(·, x), k(·, y)⟩_H.

RKHS: positive-definite point of view

Let k : X × X → R be a symmetric function.
k is called positive definite if for every n ≥ 1, (a_1, . . . , a_n) ∈ R^n and (x_1, . . . , x_n) ∈ X^n
    Σ_{i,j=1}^n a_i a_j k(x_i, x_j) = a^T G a ≥ 0,
where G = [k(x_i, x_j)]_{i,j=1}^n is the Gram matrix.
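
A minimal NumPy sketch of this property (illustration only; the point set, kernel and tolerance are my own choices): build the Gram matrix G of the Gaussian kernel and check that its eigenvalues, hence all quadratic forms a^T G a, are (numerically) nonnegative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))                      # n = 20 points in R^3
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    G = np.exp(-sq_dists / 2.0)                       # Gaussian kernel Gram matrix, theta = 1
    print(np.linalg.eigvalsh(G).min() >= -1e-10)      # smallest eigenvalue is (numerically) >= 0
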
Kernel: example domains (X)
Euclidean space: X = R^d.
Graphs, texts, time series, dynamical systems, distributions.

Kernel: application example – ridge regression

Given: {(x_i, y_i)}_{i=1}^ℓ, H = H(k).
Task: find f ∈ H such that f(x_i) ≈ y_i:
    J(f) = (1/ℓ) Σ_{i=1}^ℓ [f(x_i) − y_i]² + λ‖f‖²_H → min over f ∈ H   (λ > 0).

Analytical solution, O(ℓ³) – expensive:
    f(x) = [k(x_1, x), . . . , k(x_ℓ, x)] (G + λℓI)^{−1} [y_1; . . . ; y_ℓ],   G = [k(x_i, x_j)]_{i,j=1}^ℓ.

Idea: Ĝ, matrix-inversion lemma, fast primal solvers → RFF.
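
A minimal NumPy sketch of the closed-form solution above (an illustration with my own choices of kernel, data and λ; ℓ = 50 here):

    import numpy as np

    def gaussian_gram(A, B, theta=1.0):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * theta ** 2))

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(50, 1))              # training inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)   # noisy targets
    lam = 1e-2
    G = gaussian_gram(X, X)
    alpha = np.linalg.solve(G + lam * len(X) * np.eye(len(X)), y)   # the O(l^3) step

    X_test = np.linspace(-3, 3, 5)[:, None]
    f_test = gaussian_gram(X_test, X) @ alpha         # f(x) = sum_i alpha_i k(x_i, x)
    print(f_test)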

Kernels: more generally

Requirement: inner product on the inputs (k : X × X → R).
Loss function (λ > 0):
    J(f) = Σ_{i=1}^ℓ V(y_i, f(x_i)) + λ‖f‖²_{H(k)} → min over f ∈ H(k).

By the representer theorem [f(·) = Σ_{i=1}^ℓ α_i k(·, x_i)]:
    J(α) = Σ_{i=1}^ℓ V(y_i, (Gα)_i) + λ α^T G α → min over α ∈ R^ℓ.

⇒ only the kernel values k(x_i, x_j) matter.

Kernel derivatives: application example

Motivation: fitting ∞-dimensional exponential family distributions [Sriperumbudur et al., 2014]:
    k ↔ sufficient statistics,
    rich family,
    fitting reduces to a linear equation:
        coefficient matrix: (dℓ) × (dℓ), d = dim(x),
        entries: kernel values and derivatives.

Kernel derivatives: more generally

Objective:
    J(f) = (1/ℓ) Σ_{i=1}^ℓ V(y_i, {∂^p f(x_i)}_{p∈J_i}) + λ‖f‖²_{H(k)} → min over f ∈ H(k).

[Zhou, 2008, Shi et al., 2010, Rosasco et al., 2010, Rosasco et al., 2013, Ying et al., 2012]:
    semi-supervised learning with gradient information,
    nonlinear variable selection.
Kernel HMC [Strathmann et al., 2015].

Focus

X = R^d. k: continuous, shift-invariant [k(x, y) = k̃(x − y)].
By Bochner's theorem:
    k(x, y) = ∫_{R^d} e^{iω^T(x−y)} dΛ(ω) = ∫_{R^d} cos(ω^T(x − y)) dΛ(ω).

RFF trick [Rahimi and Recht, 2007] (MC): draw ω_{1:m} := (ω_j)_{j=1}^m i.i.d. from Λ and set
    k̂(x, y) = (1/m) Σ_{j=1}^m cos(ω_j^T(x − y)) = ∫_{R^d} cos(ω^T(x − y)) dΛ_m(ω).
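
A minimal NumPy sketch of this estimator for the Gaussian kernel, whose spectral measure is Λ = N(0, I/θ²) (the dimension, θ and m below are my own choices):

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, theta = 3, 2000, 1.0
    omega = rng.normal(scale=1.0 / theta, size=(m, d))   # omega_1, ..., omega_m ~ Lambda, i.i.d.

    def k_true(x, y):
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * theta ** 2))

    def k_hat(x, y):
        return np.mean(np.cos(omega @ (x - y)))          # (1/m) sum_j cos(omega_j^T (x - y))

    x, y = rng.normal(size=d), rng.normal(size=d)
    print(k_true(x, y), k_hat(x, y))                     # close for large m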

RFF – existing guarantee, basically

Hoeffding inequality + union bound:
    ‖k − k̂‖_{L^∞(S)} = O_p( |S| √(log(m)/m) ),
where the diameter |S| enters linearly.

Characteristic function point of view [Csörgő and Totik, 1983] (asymptotic!):
    1. |S_m| = e^{o(m)} is the optimal rate for a.s. convergence,
    2. for faster growing |S_m|, even convergence in probability fails.

Today: one-page summary

1. Finite-sample L^∞ guarantee; specifically
       ‖k − k̂‖_{L^∞(S)} = O_{a.s.}( √(log |S|) / √m )
   ⇒ S can grow exponentially [|S_m| = e^{o(m)}] – optimal!
2. Finite-sample L^r guarantees, r ∈ [1, ∞).
3. Derivatives: ∂^{p,q} k.

. . . , where

Uniform (r = ∞) and L^r (1 ≤ r < ∞) norms:
    ‖k − k̂‖_{L^∞(S)} := sup_{x,y∈S} |k(x, y) − k̂(x, y)|,
    ‖k − k̂‖_{L^r(S)} := ( ∫_S ∫_S |k(x, y) − k̂(x, y)|^r dx dy )^{1/r}.

Kernel derivatives:
    ∂^{p,q} k(x, y) = ∂^{|p|+|q|} k(x, y) / ( ∂x_1^{p_1} · · · ∂x_d^{p_d} ∂y_1^{q_1} · · · ∂y_d^{q_d} ),   |p| = Σ_{j=1}^d p_j.
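
These two norms can be approximated numerically on a grid; a minimal NumPy sketch for a 1-D Gaussian kernel on S = [−1, 1] (the grid, m and r are my own choices, and the grid maximum / Riemann sum only approximate the true sup / integral):

    import numpy as np

    rng = np.random.default_rng(0)
    m, r = 500, 2
    omega = rng.normal(size=m)                         # Lambda = N(0, 1): Gaussian kernel, theta = 1

    grid = np.linspace(-1.0, 1.0, 101)                 # discretize S = [-1, 1]
    diff = grid[:, None] - grid[None, :]               # all pairwise x - y
    k_true = np.exp(-diff ** 2 / 2.0)
    k_hat = np.mean(np.cos(omega[:, None, None] * diff[None, :, :]), axis=0)

    err = np.abs(k_true - k_hat)
    sup_err = err.max()                                # approximates ||k - k_hat||_{L^inf(S)}
    cell = (grid[1] - grid[0]) ** 2
    lr_err = (np.sum(err ** r) * cell) ** (1.0 / r)    # approximates ||k - k_hat||_{L^r(S)}
    print(sup_err, lr_err)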

‖k − k̂‖_{L^∞(S)}: proof idea

1. Empirical process form [Pg := ∫ g dP], with G := {ω ↦ cos(ω^T(x − y)) : x, y ∈ S}:
       sup_{x,y∈S} |k(x, y) − k̂(x, y)| = sup_{g∈G} |Λg − Λ_m g| = ‖Λ − Λ_m‖_G.
2. f(ω_{1:m}) = ‖Λ − Λ_m‖_G concentrates (bounded difference):
       ‖Λ − Λ_m‖_G ≲ E_{ω_{1:m}} ‖Λ − Λ_m‖_G + 1/√m.
3. G is 'nice' (uniformly bounded, separable Carathéodory) ⇒
       E_{ω_{1:m}} ‖Λ − Λ_m‖_G ≲ E_{ω_{1:m}} R(G, ω_{1:m}),
   where R(G, ω_{1:m}) := E_ε sup_{g∈G} (1/m) Σ_{j=1}^m ε_j g(ω_j) is the Rademacher average.
4. Using Dudley's entropy bound:
       R(G, ω_{1:m}) ≲ (1/√m) ∫_0^{|G|_{L²(Λ_m)}} √( log N(G, L²(Λ_m), u) ) du.
5. G is smoothly parameterized by a compact set ⇒
       N(G, L²(Λ_m), u) ≤ ( 4|S|A/u + 1 )^d,   A(ω_{1:m}) = √( (1/m) Σ_{j=1}^m ‖ω_j‖₂² ).
6. Putting these together [|G|_{L²(Λ_m)} ≤ 2, Jensen's inequality] we get . . .
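
To see roughly how steps 4–6 combine, a sketch with absolute constants suppressed (the precise constants appear on the next slide):

    R(G, ω_{1:m}) ≲ (1/√m) ∫_0^2 √( d log(4|S|A/u + 1) ) du ≲ √(d/m) [ 1 + √(log(4|S|A + 1)) ],
    E_{ω_{1:m}} R(G, ω_{1:m}) ≲ √(d/m) [ 1 + √(log(4|S| E[A] + 1)) ] ≤ √(d/m) [ 1 + √(log(4|S|σ + 1)) ],

using Jensen's inequality twice (√· and log(· + 1) are concave) and E[A] ≤ √(E[A²]) = σ. Combined with steps 2–3, this gives a bound of order √( d log(|S|σ + 1) / m ) + 1/√m.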

L^∞ result for k

Let k be continuous, σ² := ∫ ‖ω‖₂² dΛ(ω) < ∞, and let S ⊂ R^d be compact. Then for every τ > 0
    Λ^m( ‖k̂ − k‖_{L^∞(S)} ≥ ( h(d, |S|, σ) + √(2τ) ) / √m ) ≤ e^{−τ},
where h(d, |S|, σ) is an explicit constant built from the terms 32√(2d log(2|S| + 1)), √(2d / log(2|S| + 1)) and √(2d log(σ + 1)); in particular, h(d, |S|, σ) = O( √(d log(2|S| + 1)) + √(d log(σ + 1)) ).

Consequence-1 (Borel–Cantelli lemma)

A.s. convergence on compact sets: k̂ → k as m → ∞, at rate √( log|S| / m ).
Growing diameter: log|S_m| / m → 0 as m → ∞ is enough, i.e. |S_m| = e^{o(m)}.
Specifically: asymptotic optimality [Csörgő and Totik, 1983, Theorem 2] (if k̃(z) vanishes at ∞).

Consequence-2: L^r result for k (1 ≤ r)

Idea: note that
    ‖k̂ − k‖_{L^r(S)} = ( ∫_S ∫_S |k̂(x, y) − k(x, y)|^r dx dy )^{1/r} ≤ ‖k̂ − k‖_{L^∞(S)} · vol^{2/r}(S),
    vol(S) ≤ vol(B), where B := { x ∈ R^d : ‖x‖₂ ≤ |S|/2 },
    vol(B) = π^{d/2} |S|^d / ( 2^d Γ(d/2 + 1) ),   Γ(t) = ∫_0^∞ u^{t−1} e^{−u} du.  ⇒
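
The volume bound is easy to evaluate; a one-function sketch (Python standard library only; the example diameter and dimension are my own):

    import math

    def ball_volume(diam_S, d):
        # vol(B) = pi^{d/2} |S|^d / (2^d Gamma(d/2 + 1)) for a ball of diameter |S| in R^d
        return math.pi ** (d / 2) * diam_S ** d / (2 ** d * math.gamma(d / 2 + 1))

    print(ball_volume(2.0, 3))   # diameter 2 in R^3: 4*pi/3, approx 4.18879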

L^r result for k

Under the previous assumptions, and for 1 ≤ r < ∞:
    Λ^m( ‖k̂ − k‖_{L^r(S)} ≥ ( π^{d/2} |S|^d / (2^d Γ(d/2 + 1)) )^{2/r} · ( h(d, |S|, σ) + √(2τ) ) / √m ) ≤ e^{−τ}.
Hence
    ‖k̂ − k‖_{L^r(S)} = O_{a.s.}( |S|^{2d/r} √(log|S|) / √m )   [L^r(S)-consistency if this → 0 as m → ∞].
The uniform guarantee allowed |S_m| = e^{m^δ}, δ < 1; now: |S_m|^{2d/r} / √m → 0 ⇒ |S_m| = õ( m^{r/(4d)} ).

Direct L^r result for k (proof idea after the discussion)

Under the previous assumptions, and for 1 < r < ∞:
    Λ^m( ‖k̂ − k‖_{L^r(S)} ≥ ( π^{d/2} |S|^d / (2^d Γ(d/2 + 1)) )^{2/r} · ( C′_r / m^{1−max{1/2, 1/r}} + √(2τ) / √m ) ) ≤ e^{−τ},
where C′_r = O(√r) is a universal constant depending only on r (not on |S| or m).
Note:
    1. If r ≥ 2, then m^{1−max{1/2, 1/r}} = √m,
    2. ‖k̂ − k‖_{L^r(S_m)} → 0 a.s. if |S_m| = o( m^{r/(4d)} ) as m → ∞,
    3. in short, we got rid of √(log|S|): õ → o.

Direct L^r result for k: proof idea

1. f(ω_1, . . . , ω_m) = ‖k − k̂‖_{L^r(S)} concentrates (bounded difference):
       ‖k − k̂‖_{L^r(S)} ≤ E_{ω_{1:m}} ‖k − k̂‖_{L^r(S)} + vol^{2/r}(S) √(2τ/m).
2. By L^{r′} ≅ (L^r)* (1/r + 1/r′ = 1), the separability of L^r(S) (r > 1) and symmetrization:
       E_{ω_{1:m}} ‖k − k̂‖_{L^r(S)} ≤ (2/m) E_{ω_{1:m}} E_ε ‖ Σ_{i=1}^m ε_i cos(⟨ω_i, · − ·⟩) ‖_{L^r(S)} =: (∗).
3. Since L^r(S) is of type min(2, r), there exists C′_r such that
       (∗) ≤ (C′_r / m) ( Σ_{i=1}^m ‖cos(⟨ω_i, · − ·⟩)‖_{L^r(S)}^{min(2,r)} )^{1/min(2,r)}.

Kernel derivatives: N^{2d} ∋ [p; q] ≠ 0

Goal: guarantees for the RFF estimate ∂^{p,q}k̂ of ∂^{p,q}k. If
1. supp(Λ) is bounded:
       C_{k,p,q} := E_{ω∼Λ}[ |ω^{p+q}| ‖ω‖₂² ] < ∞: the L^∞ and L^r guarantees go through (✓), but
       Gaussian, Laplacian, inverse multiquadric, Matérn: :(
       (c₀-universality ⇔ supp(Λ) = R^d, if k̃(z) ∈ C₀(R^d).)
2. supp(Λ) is unbounded:
       G becomes unbounded.
       [Rahimi and Recht, 2007]: 'Hoeffding → Bernstein', but . . .
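
For intuition, the derivative estimator simply differentiates the RFF sum: ∂^{p,q}k̂(x, y) = (1/m) Σ_j ω_j^p (−ω_j)^q cos^{(|p|+|q|)}(ω_j^T(x − y)). A minimal NumPy sketch for the Gaussian kernel with θ = 1 and p = q = (1, 0, . . . , 0) (the dimension and m below are my own choices):

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 3, 20000
    omega = rng.normal(size=(m, d))                    # omega_j ~ Lambda = N(0, I), i.i.d.
    x, y = rng.normal(size=d), rng.normal(size=d)
    z = x - y

    # p = q = e_1: omega^p (-omega)^q cos''(omega^T z) = omega_1^2 cos(omega^T z)
    deriv_hat = np.mean(omega[:, 0] ** 2 * np.cos(omega @ z))
    deriv_true = (1.0 - z[0] ** 2) * np.exp(-np.sum(z ** 2) / 2.0)   # d^2/(dx_1 dy_1) exp(-||x-y||^2/2)
    print(deriv_true, deriv_hat)                       # close for large m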

Kernel derivatives: unbounded supp(Λ)

Assumptions [h_a := cos^{(a)}, the a-th derivative of cos; S_Δ := S − S]:
1. z ↦ ∇_z[∂^{p,q}k(z)] is continuous; S ⊂ R^d is compact; E_{p,q} := E_{ω∼Λ}[ |ω^{p+q}| ‖ω‖₂ ] < ∞.
2. ∃ L > 0, σ > 0 such that
       E_{ω∼Λ} |f(z; ω)|^M ≤ M! σ² L^{M−2} / 2   (∀M ≥ 2, ∀z ∈ S_Δ),
   where f(z; ω) = ∂^{p,q}k(z) − ω^p (−ω)^q h_{|p+q|}(ω^T z).

Then, with F_d := d^{−d/(d+1)} + d^{1/(d+1)}, for every ε > 0
    Λ^m( ‖∂^{p,q}k − ∂^{p,q}k̂‖_{L^∞(S)} ≥ ε ) ≤ 2^{d−1} e^{ −mε² / (8σ²(1 + εL/(2σ²))) }
        + 2^{(4d−1)/(d+1)} F_d ( |S| (D_{p,q,S} + E_{p,q}) / ε )^{d/(d+1)} e^{ −mε² / (8(d+1)σ²(1 + εL/(2σ²))) },
where D_{p,q,S} := sup_{z∈conv(S_Δ)} ‖∇_z[∂^{p,q}k(z)]‖₂.

Summary

Finite-sample L^∞(S) guarantees; specifically, |S_m| = e^{o(m)} – asymptotically optimal!
L^r(S) results (⇐ uniform bound; type of L^r).
Derivative approximation guarantees:
    bounded spectral support: ✓
    unbounded spectral support: trickier – to be continued ;)

Thank you for your attention!
Acknowledgments: This work was supported by the Gatsby Charitable
Foundation.

References

Csörgő, S. and Totik, V. (1983). On how long interval is the empirical characteristic function uniformly consistent? Acta Scientiarum Mathematicarum, 45:141–149.

Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. In Neural Information Processing Systems (NIPS), pages 1177–1184.

Rosasco, L., Santoro, M., Mosci, S., Verri, A., and Villa, S. (2010). A regularization approach to nonlinear variable selection. JMLR W&CP – International Conference on Artificial Intelligence and Statistics (AISTATS), 9:653–660.

Rosasco, L., Villa, S., Mosci, S., Santoro, M., and Verri, A. (2013). Nonparametric sparsity and regularization. Journal of Machine Learning Research, 14:1665–1714.

Shi, L., Guo, X., and Zhou, D.-X. (2010). Hermite learning with gradient data. Journal of Computational and Applied Mathematics, 233:3046–3059.

Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Hyvärinen, A., and Kumar, R. (2014). Density estimation in infinite dimensional exponential families. Technical report. http://arxiv.org/pdf/1312.3516.pdf.

Strathmann, H., Sejdinovic, D., Livingstone, S., Szabó, Z., and Gretton, A. (2015). Gradient-free Hamiltonian Monte Carlo with efficient kernel exponential families. In Neural Information Processing Systems (NIPS).

Ying, Y., Wu, Q., and Campbell, C. (2012). Learning the coordinate gradients. Advances in Computational Mathematics, 37:355–378.

Zhou, D.-X. (2008). Derivative reproducing properties for kernel methods in learning theory. Journal of Computational and Applied Mathematics, 220:456–463.