Fast algorithms for the estimation of the geometric median and unsupervised classification with high-dimensional data
Peggy Cénac-Guesdon - IMB, Dijon
Work in collaboration with Hervé Cardot, Pierre-André Zitt, and Jean-Marie Monnez
CLARA, September 2013
Plan
1. Medians in R and elsewhere
   In dimension one...
   Definition in a Hilbert space
2. Stochastic gradient algorithm
3. Estimation of the functional median
   Functional median
   Conditional geometric median
4. Unsupervised classification
Medians in R and elsewhere
The median in R
Definition (Median)
The (not necessarily unique) value m such that
P(X ≤ m) = 0.5.
More precisely, P(X ≤ m) ≥ 0.5 and P(X ≥ m) ≥ 0.5.
For a random variable with a density function > 0, there is equality, and
E(sign(X − m)) = 0 = E[(X − m)/|X − m|].
One recognizes on the right-hand side (up to sign) the derivative of z ↦ E(|X − z|), so that
m = argmin_{z ∈ R} E(|X − z|).
The median in R (2)
m = argmin_{z ∈ R} E(|X − z|).
Remark
The mean µ is characterized by E(X − µ) = 0, so that
µ = argmin_{z ∈ R} E(|X − z|²).
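To see both characterizations numerically, here is a minimal Python sketch (the exponential sample and the search grid are arbitrary choices made for this illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # an asymmetric sample, so mean != median

z = np.linspace(0.0, 6.0, 601)
abs_loss = np.array([np.mean(np.abs(x - t)) for t in z])   # z -> E|X - z|
sq_loss = np.array([np.mean((x - t) ** 2) for t in z])     # z -> E|X - z|^2

print(z[abs_loss.argmin()], np.median(x))   # both close to the median, 2 log 2 ≈ 1.39
print(z[sq_loss.argmin()], x.mean())        # both close to the mean, 2
```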
Robust statistics
For a distribution P₀, look at the "contaminated distributions":
N_ε(P₀) = {(1 − ε) P₀ + ε Q, Q a probability measure on H}.
The criterion B(ε) = sup{‖m(P) − m(P₀)‖, P ∈ N_ε(P₀)} measures the maximal distance between the true median and the contaminated ones.
Property (Gervini, 2008)
The breakdown point of the median is non trivial:
inf{ε, B(ε) = ∞} = 0.5,  and  lim_{ε→0} B(ε) = 0.
Remark
For the mean, B(ε) = ∞ for all ε > 0.
Robust statistics: gross error sensitivity
The gross error sensitivity, which is also a classical indicator of robustness, is bounded for the median:
F(z) = lim_{ε→0} [ m((1 − ε) P₀ + ε δ_z) − m(P₀) ] / ε.
Property
F(z) = Γ_m^{−1} (z − m)/‖z − m‖   and   sup{‖F(z)‖, z ∈ H} = 1/λ_m.
This indicator is not bounded for the mean.
Definition in a Hilbert space
Definition
Median: "the" (not necessarily unique) value m such that "P(X ≤ m) = 0.5".
There is no natural order relation on R², so this definition does not carry over. Let the median instead be one/the point m minimizing
z ↦ E(‖X − z‖).
The geometric median is not so easy to calculate: there is no closed formula, and recursive algorithms are used for approximation.
Definition in a Hilbert space (2)
Definition
m = argmin_z E(‖X − z‖).
Let us consider
- separable Hilbert spaces,
- some Banach spaces such as L^p for 1 < p < ∞.
In these spaces:
Theorem (Kemperman, 1987)
The geometric median is uniquely defined as soon as the random variable is not concentrated on a straight line.
A possible space: H = L²([0, T]). The median of a random variable in H is here called the median function.
Convexity
Function to minimize: G(z) = E(‖X − z‖).
Property
G is a convex function. Under some assumptions ensuring that the distribution is not too concentrated around some points, G is twice differentiable and
D_x G = −E[(X − x)/‖X − x‖],
D²_x G = ... (closed formula).
Let Γ_m denote the Hessian of the functional G at the minimum m.
Stochastic gradient algorithm
Gradient algorithm: Newton's method
Deterministic case: one looks for a zero of a real function f, starting from a rough estimate x₀.
The idea is to replace the graph of f by its tangent at x₀. The intersection of this tangent with the x-axis is given by
x₁ = x₀ − f(x₀)/f′(x₀).
The method consists in iterating the function
φ(x) = x − f(x)/f′(x).
Newton's algorithm
[Figure: Newton's algorithm on the function f(x) = e^{x/2} − 2, showing the tangent of slope f′(x₀) and the iterates x₀, x₁, x₂ approaching the root α.]
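A minimal Python sketch of this iteration on the function of the figure (the starting point x₀ = 3 and the number of iterations are arbitrary choices; the root is α = 2 log 2):

```python
import math

def newton(f, fprime, x0, n_iter=10):
    """Iterate phi(x) = x - f(x)/f'(x) starting from a rough estimate x0."""
    x = x0
    for _ in range(n_iter):
        x = x - f(x) / fprime(x)
    return x

f = lambda x: math.exp(x / 2) - 2
fprime = lambda x: 0.5 * math.exp(x / 2)

root = newton(f, fprime, x0=3.0)
print(root, 2 * math.log(2))  # both ≈ 1.3863
```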
Robbins-Monro algorithm
To approximate a minimum or a zero of a function, one generally uses deterministic algorithms.
When the function has numerous minima not so far from each other, this kind of algorithm may be trapped in local minima. Moreover, the function f may be known only up to a random perturbation,
f(x) = E[F(x, ξ)].
The Robbins-Monro (1951) algorithm consists in replacing, at each step, the unknown value f(Xₙ) by an observation Y_{n+1} := F(Xₙ, ξ_{n+1}), where ξ₁, ..., ξ_{n+1} are i.i.d.:
X_{n+1} = Xₙ − γ_{n+1} Y_{n+1}.
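A minimal one-dimensional sketch of the Robbins-Monro recursion, with a hypothetical target f(x) = x − 1 observed through additive Gaussian noise and the step choice γₙ = 1/n (both are illustrative assumptions, not part of the original talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x, xi):
    """Noisy observation of f(x) = x - 1, whose zero is 1."""
    return (x - 1.0) + xi

x = 5.0  # rough initial estimate
for n in range(1, 10_001):
    gamma = 1.0 / n                  # step size, neither too big nor too small
    y = F(x, rng.normal(scale=1.0))  # Y_{n+1} = F(X_n, xi_{n+1})
    x = x - gamma * y                # X_{n+1} = X_n - gamma_{n+1} Y_{n+1}

print(x)  # close to the zero, x ≈ 1
```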
Robbins-Monro algorithm (2)
There is a vast literature on this algorithm.
Typical assumptions:
- R or R^d,
- γₙ neither too big nor too small,
- "nice" behaviour of the function f near the zero,
- "nice" behaviour at infinity.
Typical results:
- convergence,
- asymptotic normality.
In practice: very sensitive to the parameters.
Averaging
Very interesting idea: averaging (Polyak & Juditsky, 1992).
- The sequence (γₙ) is chosen to decrease more slowly than the "optimal" rate, for instance γₙ = c n^{−3/4};
- the estimator mₙ then oscillates too much;
- it is smoothed out by averaging: m̄ₙ = (1/n) Σ_{k=1}^{n} m_k.
Advantages:
- the convergence remains true,
- asymptotic normality with the "optimal" variance can be shown,
- in practice, the algorithm is much less sensitive to the parameters.
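Continuing the one-dimensional sketch above, averaging only adds a running mean of the iterates (the step γₙ = n^{−3/4} and the toy target are again illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x, x_bar = 5.0, 0.0
for n in range(1, 10_001):
    gamma = n ** (-0.75)              # slower-than-"optimal" step
    y = (x - 1.0) + rng.normal()      # noisy observation of f(x) = x - 1
    x = x - gamma * y                 # Robbins-Monro iterate (oscillates)
    x_bar = x_bar + (x - x_bar) / n   # running average of the iterates

print(x, x_bar)  # the averaged value is a much steadier estimate of the zero
```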
Estimation of the geometric median of 18 900 vectors of R^336 in less than 3 seconds
Joint work with H. Cardot and P.-A. Zitt.
A new estimator
Let us define the algorithm
m_{n+1} = m_n + γ_n (X_{n+1} − m_n)/‖X_{n+1} − m_n‖,
where the stepsize γ_n > 0 satisfies
Σ_{n≥1} γ_n = ∞  and  Σ_{n≥1} γ_n² < ∞.
Advantages:
- for an n-sample of vectors in R^d: O(nd) operations;
- online estimation: simple update;
- no need to store the data.
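A minimal Python sketch of this recursion (not the authors' implementation); the step sequence γₙ = n^{−3/4} and the simulated Gaussian sample, borrowed from the example a few slides below, are illustrative choices:

```python
import numpy as np

def online_geometric_median(X, c=1.0, alpha=0.75):
    """One pass over the rows of X: O(n d) operations, nothing stored."""
    m = X[0].copy()                      # rough initial estimate m_1
    for n, x in enumerate(X[1:], start=1):
        gamma = c / n ** alpha           # sum gamma_n = inf, sum gamma_n^2 < inf
        diff = x - m
        norm = np.linalg.norm(diff)
        if norm > 0:                     # skip the (null-probability) case x == m
            m = m + gamma * diff / norm  # m_{n+1} = m_n + gamma_n (X_{n+1}-m_n)/||X_{n+1}-m_n||
    return m

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[10, 3], [3, 2]], size=20_000)
print(online_geometric_median(X))  # close to (0, 0) by symmetry
```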
Assumptions
A1. The distribution of X is not concentrated on a straight line.
A2. The distribution is a mixture, µ_X = λ µ_c + (1 − λ) µ_d, where
- µ_c satisfies µ_c({x}) = 0 for all x ∈ H, and α ↦ E(‖X − α‖^{−1}) is uniformly bounded on balls;
- µ_d is a discrete measure whose support does not contain the median m.
Convergence of the algorithm
Theorem (Cardot, Cénac, Zitt 2011)
Under assumptions (A1) and (A2), the sequence (m_n) converges almost surely as n goes to infinity:
‖m_n − m‖ → 0 a.s.
Does it really work?
Take a Gaussian sample with mean (0, 0) and covariance matrix
( 10  3 )
(  3  2 ).
By symmetry, the median m is equal to the mean.
[Figure: scatter plot of the simulated Gaussian sample, X1 on the horizontal axis (−5 to 5) and X2 on the vertical axis (−4 to 4).]
With the stepsize γ_n = g/n^{3/4}, the algorithm reads
m_{n+1} = m_n + (g/n^{3/4}) (X_{n+1} − m_n)/‖X_{n+1} − m_n‖.
[Figure: mean squared error (MSE) over 10 000 iterations for the Robbins-Monro estimator (RM) and its averaged version (AV), with g = 1 and g = 10.]
Averaging
We smooth out:
m̄_n = (1/n) Σ_{j=1}^{n} m_j.
Easy update:
m_{n+1} = m_n + γ_n (X_{n+1} − m_n)/‖X_{n+1} − m_n‖,
m̄_{n+1} = (n/(n+1)) m̄_n + (1/(n+1)) m_{n+1}.
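A sketch of the averaged version, extending the previous sketch with the running average m̄ₙ (same illustrative step-size choices):

```python
import numpy as np

def averaged_geometric_median(X, c=1.0, alpha=0.75):
    """Online geometric median with averaging of the iterates."""
    m = X[0].copy()          # m_1
    m_bar = m.copy()         # running average of m_1, ..., m_n
    for n, x in enumerate(X[1:], start=1):
        diff = x - m
        norm = np.linalg.norm(diff)
        if norm > 0:
            m = m + (c / n ** alpha) * diff / norm
        m_bar = m_bar + (m - m_bar) / (n + 1)   # bar m_{n+1} = (n bar m_n + m_{n+1}) / (n+1)
    return m_bar

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[10, 3], [3, 2]], size=20_000)
print(averaged_geometric_median(X))  # again close to (0, 0)
```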
Averaging (2)
Let Γ_m denote the Hessian at the point m, and V the variance operator of (X − ·)/‖X − ·‖.
Theorem (Cardot, Cénac, Zitt 2011)
Under assumptions (A1) and (A2), assuming γ_n = g/n^α with 0.5 < α < 1, and
sup{E(‖X − h‖^{−2}); h ∈ B(m, ε)} < ∞,
one has
√n (m̄_n − m) → N(0, Γ_m^{−1} V Γ_m^{−1}) in distribution in H.
An application
200 samples with n = 2000. Have a look at the estimation error:
[Figure: boxplots of the estimation error (between 0.00 and 0.25) for g = 0.1, 0.5, 1, 2, 5 and 10, in two panels labelled "Averaging" and "Mean".]
Error estimation of the geometric median

                  n = 250                 n = 2000
g               Q1   median   Q3        Q1   median   Q3
0.2            0.45   0.60   0.80      0.25   0.35   0.47
1              0.15   0.22   0.31      0.05   0.08   0.10
10             0.13   0.18   0.25      0.04   0.06   0.09
50             0.13   0.19   0.26      0.04   0.06   0.09
75             0.14   0.20   0.27      0.05   0.07   0.09
Vardi & Zhang  0.12   0.18   0.25      0.04   0.06   0.08

In one second, we deal with samples of size:
- n = 150 with the algorithm of Vardi & Zhang (2000),
- n = 4 500 with our averaged algorithm,
- n = 90 000 with our averaged algorithm (in C).
An example: television audience (data from Médiamétrie)
Measurements every second on 6/09/2010: n = 5423 and X_i ∈ {0, 1}^{86400}.
First report:
- an algorithm which is very easy to compute,
- not so sensitive to the parameters,
- very fast,
- converging at the "optimal" rate.
Recursive estimation of the conditional geometric median
in Hilbert spaces
Joint work with H. Cardot and P-A. Zitt.
The conditional median
We would like to take into account some covariates, i.e. random variables that are correlated with the functional variable under study.
Let us introduce a weighted algorithm, with weights controlled by a kernel function and an associated bandwidth.
The pair (Y, X) stands for random variables in H × R. The goal is the estimation of the geometric median of Y given X = x, that is, the value m(x) such that
m(x) = argmin_{α ∈ H} E[‖Y − α‖ − ‖Y‖ | X = x].
The solution is unique provided that the conditional distribution of Y given X = x is not supported by a straight line (Kemperman, 1987).
The conditional median (2)
The recursive algorithm for the geometric median,
m_{n+1} = m_n + γ_n (Y_{n+1} − m_n)/‖Y_{n+1} − m_n‖,
becomes, for the conditional geometric median,
Z_{n+1} = Z_n + γ_n (Y_{n+1} − Z_n)/‖Y_{n+1} − Z_n‖ · (1/h_n) K((X_{n+1} − x)/h_n),
with two deterministic sequences (h_n) and (γ_n) and the kernel K as parameters.
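A sketch of this kernel-weighted recursion (the Epanechnikov kernel, the step and bandwidth sequences, and the toy data are all illustrative assumptions):

```python
import numpy as np

def epanechnikov(u):
    """A compactly supported, symmetric kernel integrating to 1."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def conditional_median(Y, X, x, c=1.0, alpha=0.75, h0=1.0, beta=0.2):
    """Recursive estimate of the geometric median of Y given X = x."""
    Z = Y[0].copy()
    for n in range(1, len(Y)):
        gamma = c / n ** alpha        # step size gamma_n
        h = h0 / n ** beta            # bandwidth h_n
        diff = Y[n] - Z
        norm = np.linalg.norm(diff)
        if norm > 0:
            w = epanechnikov((X[n] - x) / h) / h   # kernel weight (1/h_n) K((X_{n+1}-x)/h_n)
            Z = Z + gamma * w * diff / norm
    return Z

# toy data: Y is a 2-D vector whose location shifts with the real covariate X
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=50_000)
Y = np.column_stack([X, -X]) + rng.normal(size=(50_000, 2))
print(conditional_median(Y, X, x=0.5))  # roughly (0.5, -0.5)
```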
Assumptions
A0. The kernel function K is positive, symmetric, bounded with compact support and satisfies
∫_R K(u) du = 1  and  ∫_R K(u)² du = ν².
A1. For all x in the support of X, the law of Y given X = x is not concentrated on a straight line.
A2. The density function of X is bounded, continuous and twice differentiable, with bounded derivatives.
...
Assumptions (2)
...
A3. The conditional law µ_x = L(Y | X = x) varies regularly with x: there are two constants c and β such that
W₂(µ_x, µ_{x′}) ≤ c |x − x′|^β.
A4. There exists a constant C such that
∀α ∈ H, ∀x, E[‖Y − α‖^{−2} | X = x] ≤ C.
Comments
The assumptions A2 and A0 are classical in nonparametric
estimation and could be weakened at the expense of more
complicated proofs.
Assumption A4 is stated quite strongly. It could be relaxed.
Informally it forces the law to be “spread out” and this avoids
pathological behaviors of the algorithm.
Rate of convergence
Let us define
Z̄_{n+1} = (1/n) Σ_{k=1}^{n} Z_k.
Theorem
Under some assumptions on the stepsizes (γ_n) and (h_n),
√(n h_n) (Z̄_n − m) → N(0, Γ_m^{−1} Σ Γ_m^{−1}) in distribution as n → ∞,
with
Σ = lim_{h→0} (1/h) E[ K((X − x)/h)² ((Y − m)/‖Y − m‖) ⊗ ((Y − m)/‖Y − m‖) ].
Application to television audience
[Figure: audience (from 0.0 to 0.8) as a function of the hour of the day (5 to 25), with curves labelled mean, median, q25, q50, q75 and q90.]
Non-hierarchical and automatic classification in R^d
Joint work with H. Cardot and J.-M. Monnez.
Non-hierarchical and automatic classification in R^d
The aim is to partition R^d into a finite number k of clusters. Each cluster is represented by its center, θ^ℓ ∈ R^d, ℓ = 1, ..., k. We minimize the function g: R^{dk} → R defined by
g(θ) = E[ min_{ℓ=1,...,k} φ(‖X − θ^ℓ‖) ],
where φ is an increasing function on R₊.
Two particular cases:
- φ(u) = u², which leads to the k-means algorithm (MacQueen, 1967);
- φ(u) = |u|, which leads to the k-medians algorithm.
Non-hierarchical and automatic classification in R²
Recursive version of the k-means (MacQueen, 1967)
Let I_ℓ(x, θ) stand for the indicator that x ∈ R^d belongs to cluster ℓ:
I_ℓ(x, θ) = Π_{j=1}^{k} 1{‖x − θ^ℓ‖ ≤ ‖x − θ^j‖}.
Initialization: k points of R^d: θ^ℓ_1.
Iterations:
θ^ℓ_{n+1} = θ^ℓ_n − a^ℓ_n I_ℓ(X_n, θ_n) (θ^ℓ_n − X_n),
where a^ℓ_n is the inverse of the size n_ℓ of cluster ℓ. Thus θ^ℓ_{n+1} stands for the mean
θ^ℓ_{n+1} = (1/(1 + n_ℓ)) (θ^ℓ_1 + Σ_{i=1}^{n} I_ℓ(X_i; θ_i) X_i).
A very fast averaged stochastic gradient algorithm.
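A minimal sketch of this online k-means recursion (initializing the centers with the first k observations; the data are simulated for the illustration):

```python
import numpy as np

def online_kmeans(X, k):
    """MacQueen-style recursive k-means: each center is the running mean of its cluster."""
    centers = X[:k].copy()          # initialization: k points of R^d
    counts = np.ones(k)             # n_l, size of each cluster so far
    for x in X[k:]:
        l = np.argmin(np.linalg.norm(x - centers, axis=1))   # I_l(x, theta) picks the closest center
        counts[l] += 1
        a = 1.0 / counts[l]                                   # a_n^l = 1 / n_l
        centers[l] = centers[l] - a * (centers[l] - x)        # theta_{n+1}^l = theta_n^l - a (theta_n^l - x)
    return centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(5_000, 2)) for c in ([0, 0], [5, 0], [0, 5])])
rng.shuffle(X)
print(online_kmeans(X, k=3))  # one center near each of (0,0), (5,0), (0,5) when the initialization is reasonable
```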
Recursive version of the k-medians (CCM, 2011)
Initialization: k points of R^d, θ^ℓ_1, ℓ = 1, ..., k.
Iterations: for n ≥ 1 and ℓ = 1, ..., k,
θ^ℓ_{n+1} = θ^ℓ_n − a^ℓ_n I_ℓ(X_n, θ_n) (θ^ℓ_n − X_n)/‖θ^ℓ_n − X_n‖.
Stepsize: for parameters c_γ, c_α and 1/2 < α ≤ 1 to calibrate,
a^ℓ_n = a^ℓ_{n−1} if I_ℓ(X_n, θ_n) = 0,  and  a^ℓ_n = c_γ/(1 + c_α n_ℓ)^α otherwise.
Iterations (averaged version):
θ̄^ℓ_{n+1} = θ̄^ℓ_n if I_ℓ(X_n, θ_n) = 0,  and  θ̄^ℓ_{n+1} = (n_ℓ θ̄^ℓ_n + θ^ℓ_{n+1})/(n_ℓ + 1) otherwise.
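A sketch of the averaged online k-medians recursion with these stepsizes (c_γ = 1, c_α = 1 and α = 0.75 are arbitrary calibration choices; heavy-tailed toy data illustrate the robustness motivation):

```python
import numpy as np

def online_kmedians(X, k, c_gamma=1.0, c_alpha=1.0, alpha=0.75):
    """Averaged stochastic gradient k-medians: centers move toward the medians of their clusters."""
    theta = X[:k].copy()        # current iterates theta_n^l
    theta_bar = theta.copy()    # averaged iterates bar theta_n^l
    counts = np.zeros(k)        # n_l, number of points assigned to cluster l
    for x in X[k:]:
        l = np.argmin(np.linalg.norm(x - theta, axis=1))      # closest current center
        counts[l] += 1
        a = c_gamma / (1.0 + c_alpha * counts[l]) ** alpha    # stepsize a_n^l
        diff = theta[l] - x
        norm = np.linalg.norm(diff)
        if norm > 0:
            theta[l] = theta[l] - a * diff / norm             # k-medians update
        theta_bar[l] = theta_bar[l] + (theta[l] - theta_bar[l]) / (counts[l] + 1)  # averaging
    return theta_bar

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_t(df=2, size=(5_000, 2)) + c for c in ([0, 0], [8, 0], [0, 8])])
rng.shuffle(X)
print(online_kmedians(X, k=3))  # robust centers, near (0,0), (8,0), (0,8) for a reasonable initialization
```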
Assumptions
(H1)
a) The random vector X has an absolutely continuous density function.
b) X is bounded: there exists K > 0 such that ‖X‖ ≤ K a.s.
c) There exists C such that E[1/‖X − x‖] < C for all x ∈ R^d with ‖x‖ ≤ K + 1.
(H2)
a) ∀n ≥ 1, min_ℓ a^ℓ_n > 0.
b) max_ℓ sup_n a^ℓ_n < min(1/2, 1/(8C)) a.s.
c) Σ_{n=1}^{∞} max_ℓ a^ℓ_n = ∞ a.s.
d) sup_n (max_ℓ a^ℓ_n / min_ℓ a^ℓ_n) < ∞ a.s.
(H3) Σ_{ℓ=1}^{k} Σ_{n=1}^{∞} (a^ℓ_n)² < ∞ a.s.
(H3') Σ_{ℓ=1}^{k} Σ_{n=1}^{∞} E[(a^ℓ_n)² I_ℓ(X_n, θ_n)] < ∞.
About convergence
Proposition (Cardot, Cénac, Monnez, 2011)
If θ_1 is absolutely continuous, ‖θ^ℓ_1‖ ≤ K for ℓ = 1, ..., k, and under (H1), (H2) and (H3) or (H3'), then ∇g(θ_n) and the distance between θ_n and the set of stationary points of g converge almost surely to zero.
Television audience profiles
[Figure: the five cluster-center profiles (Cl. 1 to Cl. 5) of TV audience (0.0 to 1.0) over the day (time index 0 to 1400).]
Thank you !