Distribution Regression: Computational &
Statistical Tradeoffs
Zoltán Szabó (Gatsby Unit, UCL)
Joint work with
◦ Bharath K. Sriperumbudur (Department of Statistics, PSU),
◦ Barnabás Póczos (ML Department, CMU),
◦ Arthur Gretton (Gatsby Unit, UCL)
Princeton University
November 26, 2015
Outline
Motivation: application + theory.
Problem formulation.
Results: computational & statistical tradeoffs.
Numerical examples.
The task
Samples: $\{(x_i, y_i)\}_{i=1}^{\ell}$. Find $f \in \mathcal{H}$ such that $f(x_i) \approx y_i$.
Distribution regression:
◦ the $x_i$-s are distributions,
◦ available only through samples: $\{x_{i,n}\}_{n=1}^{N_i}$, i.e. labelled bags.
Goal: computational & statistical tradeoffs implied by $N := N_i$ ($\forall i$).
Motivation (application): aerosol prediction
Bag := pixels of a multispectral satellite image over an area.
Label of a bag := aerosol value.
Relevance: climate research, sustainability.
Engineered methods [Wang et al., 2012]: 100 × RMSE = 7.5–8.5.
Can distribution regression do better?
Wider context
Context:
◦ machine learning: multi-instance learning,
◦ statistics: point estimation tasks (without analytical formula).
Applications:
◦ computer vision: image = collection of patch vectors,
◦ network analysis: group of people = bag of friendship graphs,
◦ natural language processing: corpus = bag of documents,
◦ time-series modelling: user = set of trial time-series.
Several algorithmic approaches
1. Parametric fit: Gaussian, MOG, exponential family [Jebara et al., 2004, Wang et al., 2009, Nielsen and Nock, 2012].
2. Kernelized Gaussian measures [Jebara et al., 2004, Zhou and Chellappa, 2006].
3. (Positive definite) kernels [Cuturi et al., 2005, Martins et al., 2009, Hein and Bousquet, 2005].
4. Divergence measures (KL, Rényi, Tsallis, ...) [Póczos et al., 2011, Kandasamy et al., 2015].
5. Set metrics: Hausdorff metric [Edgar, 1995]; variants [Wang and Zucker, 2000, Wu et al., 2010, Zhang and Zhou, 2009, Chen and Wu, 2012].
Motivation: theory
MIL dates back to [Haussler, 1999, Gärtner et al., 2002].
Sensible methods in regression require density estimation [Póczos et al., 2013, Oliva et al., 2014, Reddi and Póczos, 2014, Sutherland et al., 2015] + assumptions:
1. compact Euclidean domain,
2. output = $\mathbb{R}$ ([Oliva et al., 2013, Oliva et al., 2015]: distribution/function output).
Input-output requirements
Input: distributions on 'structured' domains $D$ (kernels).
Output:
◦ simplest case: $Y = \mathbb{R}$, but
◦ dependencies might matter: $Y = \mathbb{R}^d$ (or a separable Hilbert space).
Kernel, RKHS
$k : D \times D \to \mathbb{R}$ is a kernel on $D$ if
◦ $\exists \varphi : D \to H$ (a Hilbert space) feature map,
◦ $k(a, b) = \langle \varphi(a), \varphi(b) \rangle_H$ ($\forall a, b \in D$).
Kernel examples on $D = \mathbb{R}^d$ ($p > 0$, $\theta > 0$):
◦ $k(a, b) = (\langle a, b \rangle + \theta)^p$: polynomial,
◦ $k(a, b) = e^{-\|a - b\|_2^2/(2\theta^2)}$: Gaussian,
◦ $k(a, b) = e^{-\theta \|a - b\|_1}$: Laplacian.
In the $H = H(k)$ RKHS ($\exists!$): $\varphi(u) = k(\cdot, u)$.
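As a quick illustration (my own sketch, not part of the talk), the three example kernels in NumPy; the function names and test vectors are mine:

import numpy as np

def polynomial_kernel(a, b, theta=1.0, p=2):
    # k(a, b) = (<a, b> + theta)^p
    return (np.dot(a, b) + theta) ** p

def gaussian_kernel(a, b, theta=1.0):
    # k(a, b) = exp(-||a - b||_2^2 / (2 theta^2))
    return np.exp(-np.sum((a - b) ** 2) / (2 * theta ** 2))

def laplacian_kernel(a, b, theta=1.0):
    # k(a, b) = exp(-theta * ||a - b||_1)
    return np.exp(-theta * np.sum(np.abs(a - b)))

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(polynomial_kernel(a, b), gaussian_kernel(a, b), laplacian_kernel(a, b))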
Kernel: example domains (D)
Euclidean space ($D = \mathbb{R}^d$), graphs, texts, time series, dynamical systems, distributions!
Problem formulation ($Y = \mathbb{R}$)
Given:
◦ labelled bags $\hat z = \{(\hat x_i, y_i)\}_{i=1}^{\ell}$,
◦ $i$-th bag: $\hat x_i = \{x_{i,1}, \ldots, x_{i,N}\} \overset{\text{i.i.d.}}{\sim} x_i \in \mathcal{P}(D)$, $y_i \in \mathbb{R}$.
Task: find a $\mathcal{P}(D) \to \mathbb{R}$ mapping based on $\hat z$.
Construction: mean embedding ($\mu_x$) + ridge regression:
$$\mathcal{P}(D) \xrightarrow{\ \mu = \mu(k)\ } X \subseteq H = H(k) \xrightarrow{\ f \in \mathcal{H} = \mathcal{H}(K)\ } \mathbb{R},$$
where the first arrow involves 2 = two-stage sampling, and the second is 1 = Hilbert → R regression.
1 : Hilbert → R regression, well-specified case
Regression function and expected risk (assume for a moment: $f_\rho \in \mathcal{H}$):
$$f_\rho(\mu_x) = \int_{\mathbb{R}} y \, d\rho(y|\mu_x), \qquad \mathcal{R}[f] = \mathbb{E}_{(x,y)} |f(\mu_x) - y|^2.$$
Ridge estimator ($\lambda > 0$):
$$f_z^{\lambda} = \operatorname*{arg\,min}_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} |f(\mu_{x_i}) - y_i|^2 + \lambda \|f\|_{\mathcal{H}}^2.$$
Excess risk:
$$\mathcal{E}(f_z^{\lambda}, f_\rho) = \mathcal{R}[f_z^{\lambda}] - \mathcal{R}[f_\rho].$$
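For completeness, the standard kernel-ridge step that links this estimator to the closed form on the later 'analytical solution' slide (my own spelled-out derivation, not on the original slide): by the representer theorem, $f_z^{\lambda} = \sum_{i=1}^{\ell} \alpha_i K(\cdot, \mu_{x_i})$, and with the Gram matrix $\mathbf{K} = [K(\mu_{x_i}, \mu_{x_j})] \in \mathbb{R}^{\ell \times \ell}$ the objective becomes
$$\min_{\alpha \in \mathbb{R}^{\ell}} \frac{1}{\ell} \|\mathbf{K}\alpha - y\|_2^2 + \lambda\, \alpha^\top \mathbf{K} \alpha \;\Rightarrow\; \alpha = (\mathbf{K} + \ell\lambda I_\ell)^{-1} y.$$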
1 : Hilbert → R regression
Known [Caponnetto and De Vito, 2007]: if $\rho(\mu_x, y) \in \mathcal{P}(b, c)$, then the best/achieved rate is
$$\mathcal{E}(f_z^{\lambda}, f_\rho) = O_p\big(\ell^{-\frac{bc}{bc+1}}\big) \qquad (1 < b,\ c \in (1, 2]).$$
$\rho \in \mathcal{P}(b, c)$ means: with
$$T = \int_X K(\cdot, \mu_a)\, K^*(\cdot, \mu_a)\, d\rho_X(\mu_a) : \mathcal{H} \to \mathcal{H},$$
the eigenvalues of $T$ decay as $\lambda_n = O(n^{-b})$, and $f_\rho \in \operatorname{Im}\big(T^{\frac{c-1}{2}}\big)$.
Intuition: $1/b$ – effective input dimension, $c$ – smoothness of $f_\rho$.
Our question
Can we reach this $O_p\big(\ell^{-\frac{bc}{bc+1}}\big)$ minimax rate? $N = ?$
$$f_{\hat z}^{\lambda} = \operatorname*{arg\,min}_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} |f(\mu_{\hat x_i}) - y_i|^2 + \lambda \|f\|_{\mathcal{H}}^2,$$
$$f_z^{\lambda} = \operatorname*{arg\,min}_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} |f(\mu_{x_i}) - y_i|^2 + \lambda \|f\|_{\mathcal{H}}^2.$$
2 : mean embedding, $\mu_{x_i} \to \mu_{\hat x_i}$
$k : D \times D \to \mathbb{R}$ kernel; canonical feature map: $\varphi(u) = k(\cdot, u)$.
Mean embedding of a distribution $x$ and of the empirical $\hat x_i \in \mathcal{P}(D)$:
$$\mu_x = \int_D k(\cdot, u)\, dx(u) \in H(k), \qquad \mu_{\hat x_i} = \int_D k(\cdot, u)\, d\hat x_i(u) = \frac{1}{N} \sum_{n=1}^{N} k(\cdot, x_{i,n}).$$
Linear $K$ ⇒ set kernel:
$$K(\mu_{\hat x_i}, \mu_{\hat x_j}) = \big\langle \mu_{\hat x_i}, \mu_{\hat x_j} \big\rangle_H = \frac{1}{N^2} \sum_{n,m=1}^{N} k(x_{i,n}, x_{j,m}).$$
Nonlinear $K$ example:
$$K(\mu_{\hat x_i}, \mu_{\hat x_j}) = e^{-\|\mu_{\hat x_i} - \mu_{\hat x_j}\|_H^2 / (2\sigma^2)}.$$
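A minimal sketch (mine, not from the talk) of both kernels between two bags, assuming a Gaussian base kernel k; the helper names are my own, and the bags may have different sizes:

import numpy as np

def base_gram(X, Y, theta=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 theta^2)), evaluated for all pairs
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * theta**2))

def set_kernel(Xi, Xj, theta=1.0):
    # <mu_xi_hat, mu_xj_hat>_H = (1/(Ni*Nj)) sum_{n,m} k(x_{i,n}, x_{j,m})
    return base_gram(Xi, Xj, theta).mean()

def gaussian_emb_kernel(Xi, Xj, theta=1.0, sigma=1.0):
    # K = exp(-||mu_xi_hat - mu_xj_hat||_H^2 / (2 sigma^2)); the squared
    # distance expands as <mu_i, mu_i> - 2 <mu_i, mu_j> + <mu_j, mu_j>
    d2 = (set_kernel(Xi, Xi, theta) - 2 * set_kernel(Xi, Xj, theta)
          + set_kernel(Xj, Xj, theta))
    return np.exp(-max(d2, 0.0) / (2 * sigma**2))

rng = np.random.default_rng(0)
Xi, Xj = rng.normal(size=(50, 3)), rng.normal(size=(60, 3)) + 1.0  # two bags
print(set_kernel(Xi, Xj), gaussian_emb_kernel(Xi, Xj))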
2 : ridge regression ⇒ analytical solution
Given:
◦ training sample: $\hat z$,
◦ test distribution: $t$.
Prediction on $t$:
$$(f_{\hat z}^{\lambda} \circ \mu)(t) = \mathbf{k}(\mathbf{K} + \ell\lambda I_\ell)^{-1}[y_1; \ldots; y_\ell], \tag{1}$$
$$\mathbf{K} = [K(\mu_{\hat x_i}, \mu_{\hat x_j})] \in \mathbb{R}^{\ell \times \ell}, \tag{2}$$
$$\mathbf{k} = [K(\mu_{\hat x_1}, \mu_t), \ldots, K(\mu_{\hat x_\ell}, \mu_t)] \in \mathbb{R}^{1 \times \ell}. \tag{3}$$
⇒ only the values $K(\mu_x, \mu_{x'}) = \big\langle K(\cdot, \mu_x), K(\cdot, \mu_{x'}) \big\rangle_{\mathcal{H}(K)}$ matter.
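Putting (1)–(3) together, a self-contained sketch of the estimator with the linear (set) kernel K; all names, hyperparameters, and the toy data are my own, so treat this as an illustration of the formulas rather than the authors' code (their implementation is in ITE, linked at the end):

import numpy as np

def base_gram(X, Y, theta=1.0):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * theta**2))

def set_kernel(Xi, Xj, theta=1.0):
    return base_gram(Xi, Xj, theta).mean()  # <mu_xi_hat, mu_xj_hat>_H

def merr_predict(bags, y, T, lam=1e-3, theta=1.0):
    # (f_zhat^lambda o mu)(t) = k (K + l*lam*I_l)^{-1} [y_1; ...; y_l]  -- (1)
    l = len(bags)
    K = np.array([[set_kernel(bags[i], bags[j], theta) for j in range(l)]
                  for i in range(l)])                          # Gram matrix, (2)
    k = np.array([set_kernel(Xi, T, theta) for Xi in bags])    # test row, (3)
    return k @ np.linalg.solve(K + l * lam * np.eye(l), y)

# toy example: the label of a bag is its first coordinate's mean
rng = np.random.default_rng(0)
means = rng.uniform(-2, 2, 40)
bags = [rng.normal(m, 1.0, size=(30, 2)) for m in means]
y = np.array([X[:, 0].mean() for X in bags])
T = rng.normal(1.0, 1.0, size=(30, 2))                         # test bag
print(merr_predict(bags, y, T))                                # should be near 1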
1 - 2 : Why can we get consistency? – intuition
Convergence of the mean embedding:
$$\|\mu_x - \mu_{\hat x}\|_{H(k)} = O_p\big(1/\sqrt{N}\big).$$
Hölder property of $K$ ($L > 0$, $0 < h \le 1$):
$$\|K(\cdot, \mu_x) - K(\cdot, \mu_{\hat x})\|_{\mathcal{H}(K)} \le L\, \|\mu_x - \mu_{\hat x}\|_{H(k)}^{h}.$$
⇒ $f_{\hat z}^{\lambda}$ depends 'nicely' on $\mu_{\hat x}$.
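A quick Monte Carlo sanity check of the first bullet (my own, with a Gaussian base kernel and a large reference sample standing in for the true µ_x):

import numpy as np

rng = np.random.default_rng(1)

def base_gram(X, Y, theta=1.0):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * theta**2))

X_ref = rng.normal(size=(2000, 1))           # proxy sample for the true x
ref_term = base_gram(X_ref, X_ref).mean()    # ~ <mu_x, mu_x>_H
for N in (10, 100, 1000):
    errs = []
    for _ in range(25):
        Xn = rng.normal(size=(N, 1))         # one bag of size N
        d2 = (ref_term - 2 * base_gram(Xn, X_ref).mean()
              + base_gram(Xn, Xn).mean())    # ||mu_x - mu_x_hat||_H^2, expanded
        errs.append(np.sqrt(max(d2, 0.0)))
    print(N, np.mean(errs))                  # shrinks roughly like 1/sqrt(N)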
1 - 2 : Proof idea
Decomposing the excess risk and applying concentration arguments, on $\mathcal{P}(b, c)$ we get
$$\mathcal{E}(f_{\hat z}^{\lambda}, f_\rho) \le \underbrace{\frac{\log^h(\ell)}{N^h \lambda} \left( \frac{1}{\lambda^{1/2} \ell} + 1 + \frac{1}{\ell \lambda^{1/(2b)}} \right)}_{2\, =\, \text{two-stage sampling}} + \underbrace{\lambda^c + \frac{1}{\ell^2 \lambda} + \frac{1}{\ell \lambda^{1/b}}}_{1\, =\, \mathcal{H} \to \mathbb{R}\ \text{regression}} \to 0,$$
$$\text{s.t.}\quad \ell \lambda^{\frac{b+1}{b}} \ge 1, \qquad \frac{\log(\ell)}{\lambda^{2/h}} \le N.$$
Let $N = \ell^{a/h} \log(\ell)$ ⇒ the first term and the constraints simplify.
$a > 0$ is needed, i.e., $N > \log(\ell)$.
Bias–variance trick with constraint-checking ⇒ the tradeoff results on the next slides.
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
a
Meaning (a-dependence, N = ` h log(`)):
’Smaller a’ = computational saving, but reduced statistical
efficiency.
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
a
Meaning (a-dependence, N = ` h log(`)):
’Smaller a’ = computational saving, but reduced statistical
efficiency.
b(c + 1)
Sensible choice: a ≤
< 2:
bc + 1
N sub-quadratic in ` achieves onestage sampled minimax rate! (’=’)
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
a
Meaning (h-dependence, N = ` h log(`)):
smoother K kernel is rewarding = bag-size reduction; see
smoothness of fρ .
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
Meaning (c-dependence):
b(c + 1)
c 7→
decreasing: smaller bags are enough for easier
bc + 1
problems.
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
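A small illustrative calculator (mine) that evaluates the theorem's two regimes for a given (a, b, c, h); the parameter values below are arbitrary examples:

def tradeoff_W(a, b, c, h):
    # threshold between the computational-saving and the minimax regimes
    a_star = b * (c + 1) / (b * c + 1)
    if a <= a_star:
        rate, lam = a * c / (c + 1), a / (c + 1)          # rate degrades with a
    else:
        rate, lam = b * c / (b * c + 1), b / (b * c + 1)  # minimax; extra samples wasted
    return {"a*": a_star, "N ~ l^": a / h,
            "rate = l^-": rate, "lambda = l^-": lam}

# e.g. b = 2, c = 1.5, h = 1: a* = 1.25 < 2, so N = l^{1.25} log(l) -- a
# sub-quadratic bag size -- already achieves the minimax rate l^{-3/4}.
print(tradeoff_W(a=1.25, b=2.0, c=1.5, h=1.0))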
Misspecified setting
Relevant case: $f_\rho \in L^2_{\rho_X} \setminus \mathcal{H}$.
Difficulty parameter of $f_\rho$: $s \in (0, 1]$; larger $s$ = easier problem.
Proof idea:
◦ $\infty$-dimensional exponential family fitting [Sriperumbudur et al., 2014],
◦ ridge solution.
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Meaning (a-dependence):
’Smaller a’ = computational saving, but reduced statistical
efficiency.
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Meaning (a-dependence):
’Smaller a’ = computational saving, but reduced statistical
efficiency.
s +1
2
4
Sensible choice: a ≤
≤ ⇒ 2a ≤ < 2!
s +2
3
3
N can be sub-quadratic in ` again (’=’)!
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Meaning (h-dependence):
2a
h 7→
is decreasing: smoother K kernel is rewarding.
h
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Meaning (s-dependence): s 7→
= better rate,
2s
is increasing, i.e easier task
s +2
s → 0: arbitrary slow rate.
2
s = 1: `− 3 rate.
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
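The analogous sketch (mine) for the misspecified regimes; again the inputs are arbitrary examples:

def tradeoff_M(a, s, h):
    a_star = (s + 1) / (s + 2)                 # threshold, at most 2/3
    if a <= a_star:
        rate, lam = 2 * s * a / (s + 1), a / (s + 1)
    else:
        rate, lam = 2 * s / (s + 2), 1 / (s + 2)
    return {"a*": a_star, "N ~ l^": 2 * a / h,
            "rate = l^-": rate, "lambda = l^-": lam}

# s = 1, a = a* = 2/3, h = 1: N = l^{4/3} log(l) attains the best rate l^{-2/3}
print(tradeoff_M(a=2/3, s=1.0, h=1.0))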
Optimality of the rate (M)
Our rate: $r(\ell) = \ell^{-\frac{2s}{s+2}}$ – range-space assumption ($s$).
One-stage sampled optimal rate: $r_o(\ell) = \ell^{-\frac{2s}{2s+1}}$ [Steinwart et al., 2009],
◦ range-space assumption + eigendecay constraint,
◦ $D$: compact metric, $Y = \mathbb{R}$.
[Figure: the two rate exponents as a function of the smoothness $s \in (0, 1]$: $-\log_\ell[r_o(\ell)] = \frac{2s}{2s+1}$ and $-\log_\ell[r(\ell)] = \frac{2s}{s+2}$.]
Blanket assumptions: both settings
$D$: separable topological domain.
$k$:
◦ bounded: $\sup_{u \in D} k(u, u) \le B_k \in (0, \infty)$,
◦ continuous.
$K$: bounded; Hölder continuous: $\exists L > 0$, $h \in (0, 1]$ such that
$$\|K(\cdot, \mu_a) - K(\cdot, \mu_b)\|_{\mathcal{H}} \le L\, \|\mu_a - \mu_b\|_H^{h}.$$
$y$: bounded.
Hölder K examples (beyond the linear K, for which h = 1)
In case of a compact metric $D$ and universal $k$:
◦ $K_G(\mu_a, \mu_b) = e^{-\|\mu_a - \mu_b\|_H^2/(2\theta^2)}$: $h = 1$,
◦ $K_e(\mu_a, \mu_b) = e^{-\|\mu_a - \mu_b\|_H/(2\theta^2)}$: $h = \frac{1}{2}$,
◦ $K_C(\mu_a, \mu_b) = \big(1 + \|\mu_a - \mu_b\|_H^2/\theta^2\big)^{-1}$: $h = 1$,
◦ $K_t(\mu_a, \mu_b) = \big(1 + \|\mu_a - \mu_b\|_H^{\theta}\big)^{-1}$ ($\theta \le 2$): $h = \frac{\theta}{2}$,
◦ $K_i(\mu_a, \mu_b) = \big(\|\mu_a - \mu_b\|_H^2 + \theta^2\big)^{-1/2}$: $h = 1$.
Functions of $\|\mu_a - \mu_b\|_H$ ⇒ computation: similar to the set kernel.
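Since each of these is a function of the embedding distance d = ‖µ_a − µ_b‖_H, one set-kernel-style computation of d suffices; a small illustration (function names and θ values are mine):

import numpy as np

def K_G(d, theta=1.0): return np.exp(-d**2 / (2 * theta**2))  # h = 1
def K_e(d, theta=1.0): return np.exp(-d / (2 * theta**2))     # h = 1/2
def K_C(d, theta=1.0): return 1 / (1 + d**2 / theta**2)       # h = 1
def K_t(d, theta=1.5): return 1 / (1 + d**theta)              # h = theta/2, theta <= 2
def K_i(d, theta=1.0): return 1 / np.sqrt(d**2 + theta**2)    # h = 1

d = 0.7  # e.g. obtained from two bags via the expanded inner products
print(K_G(d), K_e(d), K_C(d), K_t(d), K_i(d))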
Vector-valued output: similarly
$K(\mu_a, \mu_b) \in L(Y)$.
Prediction on a new test distribution ($t$):
$$(f_{\hat z}^{\lambda} \circ \mu)(t) = \mathbf{k}(\mathbf{K} + \ell\lambda I_\ell)^{-1}[y_1; \ldots; y_\ell], \tag{4}$$
$$\mathbf{K} = [K(\mu_{\hat x_i}, \mu_{\hat x_j})] \in L(Y)^{\ell \times \ell}, \tag{5}$$
$$\mathbf{k} = [K(\mu_{\hat x_1}, \mu_t), \ldots, K(\mu_{\hat x_\ell}, \mu_t)] \in L(Y)^{1 \times \ell}. \tag{6}$$
Specifically: $Y = \mathbb{R}$ ⇒ $L(Y) = \mathbb{R}$; $Y = \mathbb{R}^d$ ⇒ $L(Y) = \mathbb{R}^{d \times d}$.
Demo
Supervised entropy learning:
[Figure: true entropy vs. MERR and DFDR estimates as a function of the rotation angle (β), with an RMSE bar chart; RMSE: MERR = 0.75, DFDR = 2.02.]
Aerosol prediction from satellite images:
◦ state-of-the-art baseline: 7.5–8.5 (±0.1–0.6),
◦ MERR: 7.81 (±1.64).
Summary
Problem: distribution regression (k).
Contribution:
◦ computational & statistical tradeoff analysis,
◦ specifically, the set kernel is consistent: a 16-year-old open question,
◦ the minimax optimal rate is achievable with a sub-quadratic bag size.
Code in ITE; the analysis has been submitted to JMLR:
https://bitbucket.org/szzoli/ite/
http://arxiv.org/abs/1411.2066
Thank you for your attention!
Acknowledgments: This work was supported by the Gatsby Charitable
Foundation, and by NSF grants IIS1247658 and IIS1250350. A part of
the work was carried out while Bharath K. Sriperumbudur was a research
fellow in the Statistical Laboratory, Department of Pure Mathematics and
Mathematical Statistics at the University of Cambridge, UK.
Caponnetto, A. and De Vito, E. (2007).
Optimal rates for the regularized least-squares algorithm.
Foundations of Computational Mathematics, 7:331–368.
Chen, Y. and Wu, O. (2012).
Contextual Hausdorff dissimilarity for multi-instance clustering.
In International Conference on Fuzzy Systems and Knowledge
Discovery (FSKD), pages 870–873.
Cuturi, M., Fukumizu, K., and Vert, J.-P. (2005).
Semigroup kernels on measures.
Journal of Machine Learning Research, 6:1169–1198.
Edgar, G. (1995).
Measure, Topology and Fractal Geometry.
Springer-Verlag.
Gärtner, T., Flach, P. A., Kowalczyk, A., and Smola, A. (2002).
Multi-instance kernels.
In International Conference on Machine Learning (ICML),
pages 179–186.
Haussler, D. (1999).
Convolution kernels on discrete structures.
Technical report, Department of Computer Science, University
of California at Santa Cruz.
(http://cbse.soe.ucsc.edu/sites/default/files/convolutions.pdf).
Hein, M. and Bousquet, O. (2005).
Hilbertian metrics and positive definite kernels on probability
measures.
In International Conference on Artificial Intelligence and
Statistics (AISTATS), pages 136–143.
Jebara, T., Kondor, R., and Howard, A. (2004).
Probability product kernels.
Journal of Machine Learning Research, 5:819–844.
Kandasamy, K., Krishnamurthy, A., Póczos, B., Wasserman,
L., and Robins, J. M. (2015).
Influence functions for machine learning: Nonparametric
estimators for entropies, divergences and mutual informations.
Technical report, Carnegie Mellon University.
http://arxiv.org/abs/1411.4342.
Martins, A. F. T., Smith, N. A., Xing, E. P., Aguiar, P. M. Q.,
and Figueiredo, M. A. T. (2009).
Nonextensive information theoretical kernels on measures.
Journal of Machine Learning Research, 10:935–975.
Nielsen, F. and Nock, R. (2012).
A closed-form expression for the Sharma-Mittal entropy of
exponential families.
Journal of Physics A: Mathematical and Theoretical,
45:032003.
Oliva, J., Neiswanger, W., Póczos, B., Xing, E., and
Schneider, J. (2015).
Fast function to function regression.
In International Conference on Artificial Intelligence and
Statistics (AISTATS), pages 717–725.
Oliva, J., Póczos, B., and Schneider, J. (2013).
Distribution to distribution regression.
International Conference on Machine Learning (ICML; JMLR
W&CP), 28:1049–1057.
Oliva, J. B., Neiswanger, W., Póczos, B., Schneider, J., and
Xing, E. (2014).
Fast distribution to real regression.
International Conference on Artificial Intelligence and
Statistics (AISTATS; JMLR W&CP), 33:706–714.
Póczos, B., Rinaldo, A., Singh, A., and Wasserman, L. (2013).
Distribution-free distribution regression.
International Conference on Artificial Intelligence and
Statistics (AISTATS; JMLR W&CP), 31:507–515.
Póczos, B., Xiong, L., and Schneider, J. (2011).
Nonparametric divergence estimation with applications to
machine learning on distributions.
In Uncertainty in Artificial Intelligence (UAI), pages 599–608.
Reddi, S. J. and Póczos, B. (2014).
k-NN regression on functional data with incomplete
observations.
In Conference on Uncertainty in Artificial Intelligence (UAI),
pages 692–701.
Sriperumbudur, B. K., Fukumizu, K., Kumar, R., Gretton, A.,
and Hyvärinen, A. (2014).
Density estimation in infinite dimensional exponential families.
Technical report.
(http://arxiv.org/pdf/1312.3516).
Steinwart, I., Hush, D. R., and Scovel, C. (2009).
Optimal rates for regularized least squares regression.
In Conference on Learning Theory (COLT).
Sutherland, D. J., Oliva, J. B., Póczos, B., and Schneider, J.
(2015).
Linear-time learning on distributions with approximate kernel
embeddings.
Technical report, Carnegie Mellon University.
http://arxiv.org/abs/1509.07553.
Wang, F., Syeda-Mahmood, T., Vemuri, B. C., Beymer, D.,
and Rangarajan, A. (2009).
Closed-form Jensen-Rényi divergence for mixture of Gaussians
and applications to group-wise shape registration.
Medical Image Computing and Computer-Assisted
Intervention, 12:648–655.
Wang, J. and Zucker, J.-D. (2000).
Solving the multiple-instance problem: A lazy learning
approach.
In International Conference on Machine Learning (ICML),
pages 1119–1126.
Wang, Z., Lan, L., and Vucetic, S. (2012).
Mixture model for multiple instance regression and
applications in remote sensing.
IEEE Transactions on Geoscience and Remote Sensing,
50:2226–2237.
Wu, O., Gao, J., Hu, W., Li, B., and Zhu, M. (2010).
Identifying multi-instance outliers.
In SIAM International Conference on Data Mining (SDM),
pages 430–441.
Zhang, M.-L. and Zhou, Z.-H. (2009).
Multi-instance clustering with applications to multi-instance
prediction.
Applied Intelligence, 31:47–68.
Zhou, S. K. and Chellappa, R. (2006).
From sample similarity to ensemble similarity: Probabilistic
distance measures in reproducing kernel Hilbert space.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 28:917–929.