Distribution Regression: Computational &
Statistical Tradeoffs
Zoltán Szabó (Gatsby Unit, UCL)
Joint work with
◦ Bharath K. Sriperumbudur (Department of Statistics, PSU),
◦ Barnabás Póczos (ML Department, CMU),
◦ Arthur Gretton (Gatsby Unit, UCL)
Princeton University
November 26, 2015
Outline
Motivation: application + theory.
Problem formulation.
Results: computational & statistical tradeoffs.
Numerical examples.
The task
Samples: $\{(x_i, y_i)\}_{i=1}^{\ell}$. Find $f \in \mathcal{H}$ such that $f(x_i) \approx y_i$.
Distribution regression:
◦ the $x_i$-s are distributions,
◦ available only through samples: $\{x_{i,n}\}_{n=1}^{N_i}$, i.e. labelled bags.
Goal: computational & statistical tradeoffs implied by $N := N_i$ ($\forall i$).
Motivation (application): aerosol prediction
Bag := pixels of a multispectral satellite image over an area.
Label of a bag := aerosol value.
Relevance: climate research, sustainability.
Engineered methods [Wang et al., 2012]: 100 × RMSE = 7.5–8.5.
Can distribution regression do better?
Wider context
Context:
◦ machine learning: multi-instance learning,
◦ statistics: point estimation tasks (without analytical formula).
Applications:
◦ computer vision: image = collection of patch vectors,
◦ network analysis: group of people = bag of friendship graphs,
◦ natural language processing: corpus = bag of documents,
◦ time-series modelling: user = set of trial time-series.
Several algorithmic approaches
1. Parametric fit: Gaussian, MOG, exponential family [Jebara et al., 2004, Wang et al., 2009, Nielsen and Nock, 2012].
2. Kernelized Gaussian measures [Jebara et al., 2004, Zhou and Chellappa, 2006].
3. (Positive definite) kernels [Cuturi et al., 2005, Martins et al., 2009, Hein and Bousquet, 2005].
4. Divergence measures (KL, Rényi, Tsallis, ...) [Póczos et al., 2011, Kandasamy et al., 2015].
5. Set metrics: Hausdorff metric [Edgar, 1995]; variants [Wang and Zucker, 2000, Wu et al., 2010, Zhang and Zhou, 2009, Chen and Wu, 2012].
Motivation: theory
MIL dates back to [Haussler, 1999, Gärtner et al., 2002].
Sensible methods in regression require density estimation [Póczos et al., 2013, Oliva et al., 2014, Reddi and Póczos, 2014, Sutherland et al., 2015] + assumptions:
1. compact Euclidean domain,
2. output = $\mathbb{R}$ ([Oliva et al., 2013, Oliva et al., 2015]: distribution/function output).
Input-output requirements
Input: distributions on 'structured' domains $D$ (kernels).
Output:
◦ simplest case: $Y = \mathbb{R}$, but
◦ dependencies might matter: $Y = \mathbb{R}^d$ (or a separable Hilbert space).
Kernel, RKHS
$k : D \times D \to \mathbb{R}$ is a kernel on $D$ if
◦ $\exists \varphi : D \to H$ (a Hilbert space) feature map,
◦ $k(a, b) = \langle \varphi(a), \varphi(b) \rangle_H$ ($\forall a, b \in D$).
Kernel examples on $D = \mathbb{R}^d$ ($p > 0$, $\theta > 0$):
◦ $k(a, b) = (\langle a, b \rangle + \theta)^p$: polynomial,
◦ $k(a, b) = e^{-\|a - b\|_2^2/(2\theta^2)}$: Gaussian,
◦ $k(a, b) = e^{-\theta \|a - b\|_1}$: Laplacian.
In the $H = H(k)$ RKHS ($\exists!$): $\varphi(u) = k(\cdot, u)$.
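As a quick illustration (my own sketch, not part of the talk), the three example kernels in NumPy; the function names and test vectors are mine:

import numpy as np

def polynomial_kernel(a, b, theta=1.0, p=2):
    # k(a, b) = (<a, b> + theta)^p
    return (np.dot(a, b) + theta) ** p

def gaussian_kernel(a, b, theta=1.0):
    # k(a, b) = exp(-||a - b||_2^2 / (2 theta^2))
    return np.exp(-np.sum((a - b) ** 2) / (2 * theta ** 2))

def laplacian_kernel(a, b, theta=1.0):
    # k(a, b) = exp(-theta * ||a - b||_1)
    return np.exp(-theta * np.sum(np.abs(a - b)))

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(polynomial_kernel(a, b), gaussian_kernel(a, b), laplacian_kernel(a, b))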
Kernel: example domains (D)
Euclidean space ($D = \mathbb{R}^d$), graphs, texts, time series, dynamical systems, distributions!
Problem formulation ($Y = \mathbb{R}$)
Given:
◦ labelled bags $\hat z = \{(\hat x_i, y_i)\}_{i=1}^{\ell}$,
◦ $i$-th bag: $\hat x_i = \{x_{i,1}, \ldots, x_{i,N}\} \overset{\text{i.i.d.}}{\sim} x_i \in \mathcal{P}(D)$, $y_i \in \mathbb{R}$.
Task: find a $\mathcal{P}(D) \to \mathbb{R}$ mapping based on $\hat z$.
Construction: mean embedding ($\mu_x$) + ridge regression:
$$\mathcal{P}(D) \xrightarrow{\ \mu = \mu(k)\ } X \subseteq H = H(k) \xrightarrow{\ f \in \mathcal{H} = \mathcal{H}(K)\ } \mathbb{R},$$
where the first arrow involves 2 = two-stage sampling, and the second is 1 = Hilbert → R regression.
1 : Hilbert → R regression, well-specified case
Regression function and expected risk (assume for a moment: $f_\rho \in \mathcal{H}$):
$$f_\rho(\mu_x) = \int_{\mathbb{R}} y \, d\rho(y|\mu_x), \qquad \mathcal{R}[f] = \mathbb{E}_{(x,y)} |f(\mu_x) - y|^2.$$
Ridge estimator ($\lambda > 0$):
$$f_z^{\lambda} = \operatorname*{arg\,min}_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} |f(\mu_{x_i}) - y_i|^2 + \lambda \|f\|_{\mathcal{H}}^2.$$
Excess risk:
$$\mathcal{E}(f_z^{\lambda}, f_\rho) = \mathcal{R}[f_z^{\lambda}] - \mathcal{R}[f_\rho].$$
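For completeness, the standard kernel-ridge step that links this estimator to the closed form on the later 'analytical solution' slide (my own spelled-out derivation, not on the original slide): by the representer theorem, $f_z^{\lambda} = \sum_{i=1}^{\ell} \alpha_i K(\cdot, \mu_{x_i})$, and with the Gram matrix $\mathbf{K} = [K(\mu_{x_i}, \mu_{x_j})] \in \mathbb{R}^{\ell \times \ell}$ the objective becomes
$$\min_{\alpha \in \mathbb{R}^{\ell}} \frac{1}{\ell} \|\mathbf{K}\alpha - y\|_2^2 + \lambda\, \alpha^\top \mathbf{K} \alpha \;\Rightarrow\; \alpha = (\mathbf{K} + \ell\lambda I_\ell)^{-1} y.$$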
1 : Hilbert → R regression
Known [Caponnetto and De Vito, 2007]: if $\rho(\mu_x, y) \in \mathcal{P}(b, c)$, then the best/achieved rate is
$$\mathcal{E}(f_z^{\lambda}, f_\rho) = O_p\big(\ell^{-\frac{bc}{bc+1}}\big) \qquad (1 < b,\ c \in (1, 2]).$$
$\rho \in \mathcal{P}(b, c)$ means: with
$$T = \int_X K(\cdot, \mu_a)\, K^*(\cdot, \mu_a)\, d\rho_X(\mu_a) : \mathcal{H} \to \mathcal{H},$$
the eigenvalues of $T$ decay as $\lambda_n = O(n^{-b})$, and $f_\rho \in \operatorname{Im}\big(T^{\frac{c-1}{2}}\big)$.
Intuition: $1/b$ – effective input dimension, $c$ – smoothness of $f_\rho$.
Our question
Can we reach this $O_p\big(\ell^{-\frac{bc}{bc+1}}\big)$ minimax rate? $N = ?$
$$f_{\hat z}^{\lambda} = \operatorname*{arg\,min}_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} |f(\mu_{\hat x_i}) - y_i|^2 + \lambda \|f\|_{\mathcal{H}}^2,$$
$$f_z^{\lambda} = \operatorname*{arg\,min}_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} |f(\mu_{x_i}) - y_i|^2 + \lambda \|f\|_{\mathcal{H}}^2.$$
2 : mean embedding, $\mu_{x_i} \to \mu_{\hat x_i}$
$k : D \times D \to \mathbb{R}$ kernel; canonical feature map: $\varphi(u) = k(\cdot, u)$.
Mean embedding of a distribution $x$ and of the empirical $\hat x_i \in \mathcal{P}(D)$:
$$\mu_x = \int_D k(\cdot, u)\, dx(u) \in H(k), \qquad \mu_{\hat x_i} = \int_D k(\cdot, u)\, d\hat x_i(u) = \frac{1}{N} \sum_{n=1}^{N} k(\cdot, x_{i,n}).$$
Linear $K$ ⇒ set kernel:
$$K(\mu_{\hat x_i}, \mu_{\hat x_j}) = \big\langle \mu_{\hat x_i}, \mu_{\hat x_j} \big\rangle_H = \frac{1}{N^2} \sum_{n,m=1}^{N} k(x_{i,n}, x_{j,m}).$$
Nonlinear $K$ example:
$$K(\mu_{\hat x_i}, \mu_{\hat x_j}) = e^{-\|\mu_{\hat x_i} - \mu_{\hat x_j}\|_H^2 / (2\sigma^2)}.$$
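A minimal sketch (mine, not from the talk) of both kernels between two bags, assuming a Gaussian base kernel k; the helper names are my own, and the bags may have different sizes:

import numpy as np

def base_gram(X, Y, theta=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 theta^2)), evaluated for all pairs
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * theta**2))

def set_kernel(Xi, Xj, theta=1.0):
    # <mu_xi_hat, mu_xj_hat>_H = (1/(Ni*Nj)) sum_{n,m} k(x_{i,n}, x_{j,m})
    return base_gram(Xi, Xj, theta).mean()

def gaussian_emb_kernel(Xi, Xj, theta=1.0, sigma=1.0):
    # K = exp(-||mu_xi_hat - mu_xj_hat||_H^2 / (2 sigma^2)); the squared
    # distance expands as <mu_i, mu_i> - 2 <mu_i, mu_j> + <mu_j, mu_j>
    d2 = (set_kernel(Xi, Xi, theta) - 2 * set_kernel(Xi, Xj, theta)
          + set_kernel(Xj, Xj, theta))
    return np.exp(-max(d2, 0.0) / (2 * sigma**2))

rng = np.random.default_rng(0)
Xi, Xj = rng.normal(size=(50, 3)), rng.normal(size=(60, 3)) + 1.0  # two bags
print(set_kernel(Xi, Xj), gaussian_emb_kernel(Xi, Xj))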
2 : ridge regression ⇒ analytical solution
Given:
◦ training sample: $\hat z$,
◦ test distribution: $t$.
Prediction on $t$:
$$(f_{\hat z}^{\lambda} \circ \mu)(t) = \mathbf{k}(\mathbf{K} + \ell\lambda I_\ell)^{-1}[y_1; \ldots; y_\ell], \tag{1}$$
$$\mathbf{K} = [K(\mu_{\hat x_i}, \mu_{\hat x_j})] \in \mathbb{R}^{\ell \times \ell}, \tag{2}$$
$$\mathbf{k} = [K(\mu_{\hat x_1}, \mu_t), \ldots, K(\mu_{\hat x_\ell}, \mu_t)] \in \mathbb{R}^{1 \times \ell}. \tag{3}$$
⇒ only the values $K(\mu_x, \mu_{x'}) = \big\langle K(\cdot, \mu_x), K(\cdot, \mu_{x'}) \big\rangle_{\mathcal{H}(K)}$ matter.
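Putting (1)–(3) together, a self-contained sketch of the estimator with the linear (set) kernel K; all names, hyperparameters, and the toy data are my own, so treat this as an illustration of the formulas rather than the authors' code (their implementation is in ITE, linked at the end):

import numpy as np

def base_gram(X, Y, theta=1.0):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * theta**2))

def set_kernel(Xi, Xj, theta=1.0):
    return base_gram(Xi, Xj, theta).mean()  # <mu_xi_hat, mu_xj_hat>_H

def merr_predict(bags, y, T, lam=1e-3, theta=1.0):
    # (f_zhat^lambda o mu)(t) = k (K + l*lam*I_l)^{-1} [y_1; ...; y_l]  -- (1)
    l = len(bags)
    K = np.array([[set_kernel(bags[i], bags[j], theta) for j in range(l)]
                  for i in range(l)])                          # Gram matrix, (2)
    k = np.array([set_kernel(Xi, T, theta) for Xi in bags])    # test row, (3)
    return k @ np.linalg.solve(K + l * lam * np.eye(l), y)

# toy example: the label of a bag is its first coordinate's mean
rng = np.random.default_rng(0)
means = rng.uniform(-2, 2, 40)
bags = [rng.normal(m, 1.0, size=(30, 2)) for m in means]
y = np.array([X[:, 0].mean() for X in bags])
T = rng.normal(1.0, 1.0, size=(30, 2))                         # test bag
print(merr_predict(bags, y, T))                                # should be near 1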
1 - 2 : Why can we get consistency? – intuition
Convergence of the mean embedding:
$$\|\mu_x - \mu_{\hat x}\|_{H(k)} = O_p\big(1/\sqrt{N}\big).$$
Hölder property of $K$ ($L > 0$, $0 < h \le 1$):
$$\|K(\cdot, \mu_x) - K(\cdot, \mu_{\hat x})\|_{\mathcal{H}(K)} \le L\, \|\mu_x - \mu_{\hat x}\|_{H(k)}^{h}.$$
⇒ $f_{\hat z}^{\lambda}$ depends 'nicely' on $\mu_{\hat x}$.
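A quick Monte Carlo sanity check of the first bullet (my own, with a Gaussian base kernel and a large reference sample standing in for the true µ_x):

import numpy as np

rng = np.random.default_rng(1)

def base_gram(X, Y, theta=1.0):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * theta**2))

X_ref = rng.normal(size=(2000, 1))           # proxy sample for the true x
ref_term = base_gram(X_ref, X_ref).mean()    # ~ <mu_x, mu_x>_H
for N in (10, 100, 1000):
    errs = []
    for _ in range(25):
        Xn = rng.normal(size=(N, 1))         # one bag of size N
        d2 = (ref_term - 2 * base_gram(Xn, X_ref).mean()
              + base_gram(Xn, Xn).mean())    # ||mu_x - mu_x_hat||_H^2, expanded
        errs.append(np.sqrt(max(d2, 0.0)))
    print(N, np.mean(errs))                  # shrinks roughly like 1/sqrt(N)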
1 - 2 : Proof idea
Decomposing the excess risk and applying concentration arguments, on $\mathcal{P}(b, c)$ we get
$$\mathcal{E}(f_{\hat z}^{\lambda}, f_\rho) \le \underbrace{\frac{\log^h(\ell)}{N^h \lambda} \left( \frac{1}{\lambda^{1/2} \ell} + 1 + \frac{1}{\ell \lambda^{1/(2b)}} \right)}_{2\, =\, \text{two-stage sampling}} + \underbrace{\lambda^c + \frac{1}{\ell^2 \lambda} + \frac{1}{\ell \lambda^{1/b}}}_{1\, =\, \mathcal{H} \to \mathbb{R}\ \text{regression}} \to 0,$$
$$\text{s.t.}\quad \ell \lambda^{\frac{b+1}{b}} \ge 1, \qquad \frac{\log(\ell)}{\lambda^{2/h}} \le N.$$
Let $N = \ell^{a/h} \log(\ell)$ ⇒ the first term and the constraints simplify.
$a > 0$ is needed, i.e., $N > \log(\ell)$.
Bias–variance trick with constraint-checking ⇒ the tradeoff results on the next slides.
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
a
Meaning (a-dependence, N = ` h log(`)):
’Smaller a’ = computational saving, but reduced statistical
efficiency.
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
a
Meaning (a-dependence, N = ` h log(`)):
’Smaller a’ = computational saving, but reduced statistical
efficiency.
b(c + 1)
Sensible choice: a ≤
< 2:
bc + 1
N sub-quadratic in ` achieves onestage sampled minimax rate! (’=’)
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
a
Meaning (h-dependence, N = ` h log(`)):
smoother K kernel is rewarding = bag-size reduction; see
smoothness of fρ .
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (W)
If
b(c
bc
b(c
a≥
bc
a≤
ac a
+ 1)
, then E fẑλ , fρ = Op `− c+1 with λ = `− c+1 ,
+1
bc
b
+ 1)
then E fẑλ , fρ = Op `− bc+1 with λ = `− bc+1 .
+1
Meaning (c-dependence):
b(c + 1)
c 7→
decreasing: smaller bags are enough for easier
bc + 1
problems.
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
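A small illustrative calculator (mine) that evaluates the theorem's two regimes for a given (a, b, c, h); the parameter values below are arbitrary examples:

def tradeoff_W(a, b, c, h):
    # threshold between the computational-saving and the minimax regimes
    a_star = b * (c + 1) / (b * c + 1)
    if a <= a_star:
        rate, lam = a * c / (c + 1), a / (c + 1)          # rate degrades with a
    else:
        rate, lam = b * c / (b * c + 1), b / (b * c + 1)  # minimax; extra samples wasted
    return {"a*": a_star, "N ~ l^": a / h,
            "rate = l^-": rate, "lambda = l^-": lam}

# e.g. b = 2, c = 1.5, h = 1: a* = 1.25 < 2, so N = l^{1.25} log(l) -- a
# sub-quadratic bag size -- already achieves the minimax rate l^{-3/4}.
print(tradeoff_W(a=1.25, b=2.0, c=1.5, h=1.0))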
Misspecified setting
Relevant case: $f_\rho \in L^2_{\rho_X} \setminus \mathcal{H}$.
Difficulty parameter of $f_\rho$: $s \in (0, 1]$; larger $s$ = easier problem.
Proof idea:
◦ $\infty$-dimensional exponential family fitting [Sriperumbudur et al., 2014],
◦ ridge solution.
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Meaning (a-dependence):
’Smaller a’ = computational saving, but reduced statistical
efficiency.
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Meaning (a-dependence):
’Smaller a’ = computational saving, but reduced statistical
efficiency.
s +1
2
4
Sensible choice: a ≤
≤ ⇒ 2a ≤ < 2!
s +2
3
3
N can be sub-quadratic in ` again (’=’)!
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Meaning (h-dependence):
2a
h 7→
is decreasing: smoother K kernel is rewarding.
h
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
Computational & statistical tradeoff (M)
2a
Let N = ` h log(`) (a > 0). If
2sa a
s +1
a≤
, then E fẑλ , fρ = Op `− s+1 with λ = `− s+1 ,
s +2
2s 1
s +1
a≥
, then E fẑλ , fρ = Op `− s+2 with λ = `− s+2 .
s +2
Meaning (s-dependence): s 7→
= better rate,
2s
is increasing, i.e easier task
s +2
s → 0: arbitrary slow rate.
2
s = 1: `− 3 rate.
Zoltán Szabó
Distribution Regression: Computational & Statistical Tradeoffs
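The analogous sketch (mine) for the misspecified regimes; again the inputs are arbitrary examples:

def tradeoff_M(a, s, h):
    a_star = (s + 1) / (s + 2)                 # threshold, at most 2/3
    if a <= a_star:
        rate, lam = 2 * s * a / (s + 1), a / (s + 1)
    else:
        rate, lam = 2 * s / (s + 2), 1 / (s + 2)
    return {"a*": a_star, "N ~ l^": 2 * a / h,
            "rate = l^-": rate, "lambda = l^-": lam}

# s = 1, a = a* = 2/3, h = 1: N = l^{4/3} log(l) attains the best rate l^{-2/3}
print(tradeoff_M(a=2/3, s=1.0, h=1.0))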
Optimality of the rate (M)
Our rate: $r(\ell) = \ell^{-\frac{2s}{s+2}}$ – range-space assumption ($s$).
One-stage sampled optimal rate: $r_o(\ell) = \ell^{-\frac{2s}{2s+1}}$ [Steinwart et al., 2009],
◦ range-space assumption + eigendecay constraint,
◦ $D$: compact metric, $Y = \mathbb{R}$.
[Figure: the two rate exponents as a function of the smoothness $s \in (0, 1]$: $-\log_\ell[r_o(\ell)] = \frac{2s}{2s+1}$ and $-\log_\ell[r(\ell)] = \frac{2s}{s+2}$.]
Blanket assumptions: both settings
$D$: separable topological domain.
$k$:
◦ bounded: $\sup_{u \in D} k(u, u) \le B_k \in (0, \infty)$,
◦ continuous.
$K$: bounded; Hölder continuous: $\exists L > 0$, $h \in (0, 1]$ such that
$$\|K(\cdot, \mu_a) - K(\cdot, \mu_b)\|_{\mathcal{H}} \le L\, \|\mu_a - \mu_b\|_H^{h}.$$
$y$: bounded.
Hölder K examples (beyond the linear K, for which h = 1)
In case of a compact metric $D$ and universal $k$:
◦ $K_G(\mu_a, \mu_b) = e^{-\|\mu_a - \mu_b\|_H^2/(2\theta^2)}$: $h = 1$,
◦ $K_e(\mu_a, \mu_b) = e^{-\|\mu_a - \mu_b\|_H/(2\theta^2)}$: $h = \frac{1}{2}$,
◦ $K_C(\mu_a, \mu_b) = \big(1 + \|\mu_a - \mu_b\|_H^2/\theta^2\big)^{-1}$: $h = 1$,
◦ $K_t(\mu_a, \mu_b) = \big(1 + \|\mu_a - \mu_b\|_H^{\theta}\big)^{-1}$ ($\theta \le 2$): $h = \frac{\theta}{2}$,
◦ $K_i(\mu_a, \mu_b) = \big(\|\mu_a - \mu_b\|_H^2 + \theta^2\big)^{-1/2}$: $h = 1$.
Functions of $\|\mu_a - \mu_b\|_H$ ⇒ computation: similar to the set kernel.
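Since each of these is a function of the embedding distance d = ‖µ_a − µ_b‖_H, one set-kernel-style computation of d suffices; a small illustration (function names and θ values are mine):

import numpy as np

def K_G(d, theta=1.0): return np.exp(-d**2 / (2 * theta**2))  # h = 1
def K_e(d, theta=1.0): return np.exp(-d / (2 * theta**2))     # h = 1/2
def K_C(d, theta=1.0): return 1 / (1 + d**2 / theta**2)       # h = 1
def K_t(d, theta=1.5): return 1 / (1 + d**theta)              # h = theta/2, theta <= 2
def K_i(d, theta=1.0): return 1 / np.sqrt(d**2 + theta**2)    # h = 1

d = 0.7  # e.g. obtained from two bags via the expanded inner products
print(K_G(d), K_e(d), K_C(d), K_t(d), K_i(d))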
Vector-valued output: similarly
$K(\mu_a, \mu_b) \in L(Y)$.
Prediction on a new test distribution ($t$):
$$(f_{\hat z}^{\lambda} \circ \mu)(t) = \mathbf{k}(\mathbf{K} + \ell\lambda I_\ell)^{-1}[y_1; \ldots; y_\ell], \tag{4}$$
$$\mathbf{K} = [K(\mu_{\hat x_i}, \mu_{\hat x_j})] \in L(Y)^{\ell \times \ell}, \tag{5}$$
$$\mathbf{k} = [K(\mu_{\hat x_1}, \mu_t), \ldots, K(\mu_{\hat x_\ell}, \mu_t)] \in L(Y)^{1 \times \ell}. \tag{6}$$
Specifically: $Y = \mathbb{R}$ ⇒ $L(Y) = \mathbb{R}$; $Y = \mathbb{R}^d$ ⇒ $L(Y) = \mathbb{R}^{d \times d}$.
Demo
Supervised entropy learning:
[Figure: true entropy vs. MERR and DFDR estimates as a function of the rotation angle (β), with an RMSE bar chart; RMSE: MERR = 0.75, DFDR = 2.02.]
Aerosol prediction from satellite images:
◦ state-of-the-art baseline: 7.5–8.5 (±0.1–0.6),
◦ MERR: 7.81 (±1.64).
Summary
Problem: distribution regression (k).
Contribution:
◦ computational & statistical tradeoff analysis,
◦ specifically, the set kernel is consistent: a 16-year-old open question,
◦ the minimax optimal rate is achievable with a sub-quadratic bag size.
Code in ITE; the analysis has been submitted to JMLR:
https://bitbucket.org/szzoli/ite/
http://arxiv.org/abs/1411.2066
Thank you for your attention!
Acknowledgments: This work was supported by the Gatsby Charitable
Foundation, and by NSF grants IIS1247658 and IIS1250350. A part of
the work was carried out while Bharath K. Sriperumbudur was a research
fellow in the Statistical Laboratory, Department of Pure Mathematics and
Mathematical Statistics at the University of Cambridge, UK.
Caponnetto, A. and De Vito, E. (2007).
Optimal rates for the regularized least-squares algorithm.
Foundations of Computational Mathematics, 7:331–368.
Chen, Y. and Wu, O. (2012).
Contextual Hausdorff dissimilarity for multi-instance clustering.
In International Conference on Fuzzy Systems and Knowledge
Discovery (FSKD), pages 870–873.
Cuturi, M., Fukumizu, K., and Vert, J.-P. (2005).
Semigroup kernels on measures.
Journal of Machine Learning Research, 6:1169–1198.
Edgar, G. (1995).
Measure, Topology and Fractal Geometry.
Springer-Verlag.
Gärtner, T., Flach, P. A., Kowalczyk, A., and Smola, A. (2002).
Multi-instance kernels.
In International Conference on Machine Learning (ICML),
pages 179–186.
Haussler, D. (1999).
Convolution kernels on discrete structures.
Technical report, Department of Computer Science, University
of California at Santa Cruz.
(http://cbse.soe.ucsc.edu/sites/default/files/convolutions.pdf).
Hein, M. and Bousquet, O. (2005).
Hilbertian metrics and positive definite kernels on probability
measures.
In International Conference on Artificial Intelligence and
Statistics (AISTATS), pages 136–143.
Jebara, T., Kondor, R., and Howard, A. (2004).
Probability product kernels.
Journal of Machine Learning Research, 5:819–844.
Kandasamy, K., Krishnamurthy, A., Póczos, B., Wasserman,
L., and Robins, J. M. (2015).
Influence functions for machine learning: Nonparametric
estimators for entropies, divergences and mutual informations.
Technical report, Carnegie Mellon University.
http://arxiv.org/abs/1411.4342.
Martins, A. F. T., Smith, N. A., Xing, E. P., Aguiar, P. M. Q.,
and Figueiredo, M. A. T. (2009).
Nonextensive information theoretical kernels on measures.
Journal of Machine Learning Research, 10:935–975.
Nielsen, F. and Nock, R. (2012).
A closed-form expression for the Sharma-Mittal entropy of
exponential families.
Journal of Physics A: Mathematical and Theoretical,
45:032003.
Oliva, J., Neiswanger, W., Póczos, B., Xing, E., and
Schneider, J. (2015).
Fast function to function regression.
In International Conference on Artificial Intelligence and
Statistics (AISTATS), pages 717–725.
Oliva, J., Póczos, B., and Schneider, J. (2013).
Distribution to distribution regression.
International Conference on Machine Learning (ICML; JMLR
W&CP), 28:1049–1057.
Oliva, J. B., Neiswanger, W., Póczos, B., Schneider, J., and
Xing, E. (2014).
Fast distribution to real regression.
International Conference on Artificial Intelligence and
Statistics (AISTATS; JMLR W&CP), 33:706–714.
Póczos, B., Rinaldo, A., Singh, A., and Wasserman, L. (2013).
Distribution-free distribution regression.
International Conference on Artificial Intelligence and
Statistics (AISTATS; JMLR W&CP), 31:507–515.
Póczos, B., Xiong, L., and Schneider, J. (2011).
Nonparametric divergence estimation with applications to
machine learning on distributions.
In Uncertainty in Artificial Intelligence (UAI), pages 599–608.
Reddi, S. J. and Póczos, B. (2014).
k-NN regression on functional data with incomplete
observations.
In Conference on Uncertainty in Artificial Intelligence (UAI),
pages 692–701.
Sriperumbudur, B. K., Fukumizu, K., Kumar, R., Gretton, A.,
and Hyvärinen, A. (2014).
Density estimation in infinite dimensional exponential families.
Technical report.
(http://arxiv.org/pdf/1312.3516).
Steinwart, I., Hush, D. R., and Scovel, C. (2009).
Optimal rates for regularized least squares regression.
In Conference on Learning Theory (COLT).
Sutherland, D. J., Oliva, J. B., Póczos, B., and Schneider, J.
(2015).
Linear-time learning on distributions with approximate kernel
embeddings.
Technical report, Carnegie Mellon University.
http://arxiv.org/abs/1509.07553.
Wang, F., Syeda-Mahmood, T., Vemuri, B. C., Beymer, D.,
and Rangarajan, A. (2009).
Closed-form Jensen-Rényi divergence for mixture of Gaussians
and applications to group-wise shape registration.
Medical Image Computing and Computer-Assisted
Intervention, 12:648–655.
Wang, J. and Zucker, J.-D. (2000).
Solving the multiple-instance problem: A lazy learning
approach.
In International Conference on Machine Learning (ICML),
pages 1119–1126.
Wang, Z., Lan, L., and Vucetic, S. (2012).
Mixture model for multiple instance regression and
applications in remote sensing.
IEEE Transactions on Geoscience and Remote Sensing,
50:2226–2237.
Wu, O., Gao, J., Hu, W., Li, B., and Zhu, M. (2010).
Identifying multi-instance outliers.
In SIAM International Conference on Data Mining (SDM),
pages 430–441.
Zhang, M.-L. and Zhou, Z.-H. (2009).
Multi-instance clustering with applications to multi-instance
prediction.
Applied Intelligence, 31:47–68.
Zhou, S. K. and Chellappa, R. (2006).
From sample similarity to ensemble similarity: Probabilistic
distance measures in reproducing kernel Hilbert space.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 28:917–929.