Fast algorithms for the estimation of the geometric median and unsupervised classification with high-dimensional data

Peggy Cénac-Guesdon - IMB, Dijon
Work in collaboration with Hervé Cardot, Pierre-André Zitt, Jean-Marie Monnez
CLARA, September 2013

Plan
1 Medians in R and elsewhere
    In dimension one...
    Definition in a Hilbert space
2 Stochastic gradient algorithm
3 Estimation of the functional median
    Functional median
    Conditional geometric median
4 Unsupervised classification

Medians in R and elsewhere

The median in R

Definition (Median). The (not necessarily unique) value $m$ such that $P(X \le m) = 0.5$. More precisely, $P(X \le m) \ge 0.5$ and $P(X \ge m) \ge 0.5$.

For a random variable with a density function > 0, there is equality and
$$E(\mathrm{sign}(X - m)) = 0 = E\left(\frac{X - m}{|X - m|}\right).$$
Up to sign, the right-hand side is the derivative of $z \mapsto E(|X - z|)$ at $z = m$, so that
$$m = \arg\min_{z \in \mathbb{R}} E(|X - z|).$$

The median in R (2)

$$m = \arg\min_{z \in \mathbb{R}} E|X - z|.$$

Remark. The mean $\mu$ is characterized by $E(X - \mu) = 0$, so that
$$\mu = \arg\min_{z \in \mathbb{R}} E|X - z|^2.$$

Robust statistics

For a distribution $P_0$, look at the "contaminated distributions":
$$N_\varepsilon(P_0) = \{(1 - \varepsilon) P_0 + \varepsilon Q,\ Q \text{ probability measure on } H\}.$$
The criterion
$$B(\varepsilon) = \sup\{\|m(P) - m(P_0)\|,\ P \in N_\varepsilon(P_0)\}$$
measures the maximal distance between the true median and the contaminated ones.

Property (Gervini, 2008). The breakdown point of the median is non-trivial:
$$\inf\{\varepsilon,\ B(\varepsilon) = \infty\} = 0.5, \quad \text{and} \quad \lim_{\varepsilon \to 0} B(\varepsilon) = 0.$$

Remark. For the mean, $B(\varepsilon) = \infty$ for all $\varepsilon > 0$.

Robust statistics: gross error sensitivity

The gross error sensitivity, which is also a classical indicator of robustness, is bounded for the median:
$$F(z) = \lim_{\varepsilon \to 0} \frac{m((1 - \varepsilon) P_0 + \varepsilon \delta_z) - m(P_0)}{\varepsilon}.$$

Property.
$$F(z) = \Gamma_m^{-1} \frac{z - m}{\|z - m\|} \quad \text{and} \quad \sup\{\|F(z)\|,\ z \in H\} = \frac{1}{\lambda_m}.$$
This indicator is not bounded for the mean.

Definition in a Hilbert space

Definition. Median: "the" (not necessarily unique) value $m$ such that "$P(X \le m) = 0.5$".

There is no natural order relation on $\mathbb{R}^2$, so this definition does not carry over.
Let the median be a (or the) point $m$ minimizing $z \mapsto E(\|X - z\|)$.
The geometric median is not easy to compute: there is no closed-form formula. Recursive algorithms are used to approximate it.
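The one-dimensional characterisations above (the median minimises $E|X - z|$, the mean minimises $E|X - z|^2$) and the contrast in robustness can be checked numerically. The following is a minimal sketch using only the Python standard library; the sample values, the grid and the function names are illustrative assumptions, not taken from the talk.

```python
import statistics


def l1_cost(sample, z):
    """Empirical version of E|X - z|."""
    return sum(abs(x - z) for x in sample) / len(sample)


def l2_cost(sample, z):
    """Empirical version of E|X - z|^2."""
    return sum((x - z) ** 2 for x in sample) / len(sample)


# Hypothetical sample: four "clean" points and one gross outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]

# Brute-force minimisation over a grid of step 0.01 on [0, 100].
grid = [i / 100 for i in range(10001)]
argmin_l1 = min(grid, key=lambda z: l1_cost(sample, z))
argmin_l2 = min(grid, key=lambda z: l2_cost(sample, z))

print("argmin of L1 cost:", argmin_l1, "| median:", statistics.median(sample))
print("argmin of L2 cost:", argmin_l2, "| mean:", statistics.mean(sample))
```

On this sample the single outlier drags the mean to 22 while the median stays at 3, which is the behaviour the breakdown-point remark describes.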

Definition in a Hilbert space (2)

Definition.
$$m = \arg\min_{z} E(\|X - z\|).$$

We consider separable Hilbert spaces, and some Banach spaces such as $L^p$ for $1 < p < \infty$. In these spaces:

Theorem (Kemperman, 1987). The geometric median is uniquely defined as soon as the distribution of the random variable is not concentrated on a straight line.

A possible space: $H = L^2([0, T])$. The median of a random variable in $H$ is called here the median function.

Convexity

Function to minimize: $G(z) = E(\|X - z\|)$.

Property. $G$ is a convex function. Under assumptions ensuring that the distribution is not too concentrated around single points, $G$ is twice differentiable and
$$D_x G = -E\left(\frac{X - x}{\|X - x\|}\right), \qquad D_x^2 G = \dots \text{ (closed formula)}.$$
Let $\Gamma_m$ denote the Hessian of the functional $G$ at the minimum $m$.

Stochastic gradient algorithm

Gradient algorithm: Newton's method

Deterministic case: one looks for a zero of a real function $f$, starting from a rough estimate $x_0$. The idea is to replace the graph of $f$ by its tangent at $x_0$. The intersection of this tangent with the x-axis is given by
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}.$$
The method consists in iterating the function
$$\varphi(x) = x - \frac{f(x)}{f'(x)}.$$
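The iteration of $\varphi$ can be sketched in a few lines. This is a minimal illustration, assuming the test function $f(x) = e^{x/2} - 2$ (whose zero is $2 \ln 2$), the starting point $x_0 = 3$, and a stopping tolerance; the function name `newton` and its parameters are hypothetical choices made for this example.

```python
import math


def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Iterate phi(x) = x - f(x) / f'(x), starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return x


def f(x):
    return math.exp(x / 2) - 2


def fprime(x):
    return 0.5 * math.exp(x / 2)


root = newton(f, fprime, x0=3.0)
print("Newton estimate:", root)
print("exact zero     :", 2 * math.log(2))
```

Because this $f$ is convex and increasing, the iterates started to the right of the zero decrease monotonically towards it.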

Newton's algorithm

[Figure: Newton's algorithm on the function $f(x) = e^{x/2} - 2$, showing the tangent at $x_0$ (slope $f'(x_0)$), the successive iterates $x_0, x_1, x_2$ and the zero $\alpha$.]

Robbins-Monro algorithm

To approximate a minimum or a zero of a function, one generally uses deterministic algorithms. When the function has numerous minima close to each other, this kind of algorithm may get trapped in local minima.

Moreover, the function $f$ may be known only up to a random perturbation:
$$f(x) = E[F(x, \xi)].$$

The Robbins-Monro (1951) algorithm consists in replacing, at each step, the unknown value $f(X_n)$ by an observation $Y_{n+1} := F(X_n, \xi_{n+1})$, where $\xi_1, \dots, \xi_{n+1}$ are i.i.d.:
$$X_{n+1} = X_n - \gamma_{n+1} Y_{n+1}.$$
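As an illustration of the recursion $X_{n+1} = X_n - \gamma_{n+1} Y_{n+1}$, here is a minimal sketch that looks for the zero of $f(x) = E[\mathrm{sign}(x - \xi)]$, which is the median of $\xi$. The exponential distribution, the step sizes $\gamma_n = c / n^{3/4}$, the seed and all function names are assumptions made for this example.

```python
import math
import random


def robbins_monro(noisy_f, draw, x0, n_steps, c=1.0, alpha=0.75):
    """Run X <- X - gamma_n * F(X, xi), the n-th step using gamma_n = c / n**alpha."""
    x = x0
    for n in range(1, n_steps + 1):
        xi = draw()  # fresh i.i.d. observation
        x -= (c / n ** alpha) * noisy_f(x, xi)
    return x


def sign_observation(x, xi):
    """F(x, xi) = sign(x - xi); its mean 2 P(xi <= x) - 1 vanishes at the median."""
    return 1.0 if x > xi else -1.0


rng = random.Random(0)
estimate = robbins_monro(
    sign_observation,
    lambda: rng.expovariate(1.0),  # Exp(1) has median ln 2
    x0=0.0,
    n_steps=20000,
)
print("Robbins-Monro estimate:", estimate)
print("true median (ln 2)    :", math.log(2))
```

Only one observation is used per step and nothing is stored, which is what makes the scheme attractive for large samples.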

Robbins-Monro algorithm (2)

Vast literature on this algorithm.

Typical assumptions:
- the state space is $\mathbb{R}$ or $\mathbb{R}^d$,
- $\gamma_n$ neither too large nor too small,
- "nice" behaviour of the function $f$ near the zero,
- "nice" behaviour at infinity.

Typical results:
- convergence,
- asymptotic normality.

In practice: very sensitive to the parameters.

Averaging

A very interesting idea: averaging (Polyak & Juditsky, 1992).
- The sequence $(\gamma_n)$ is chosen to decrease more slowly than the "optimal" rate, for instance $\gamma_n = c\, n^{-3/4}$;
- the estimator $m_n$ then oscillates too much;
- it is smoothed out by averaging:
$$\bar{m}_n = \frac{1}{n} \sum_{k=1}^{n} m_k.$$

Advantages:
- convergence still holds,
- asymptotic normality with the "optimal" variance can be shown,
- in practice, the algorithm is much less sensitive to the parameters.

Estimation of the geometric median of 18900 vectors of $\mathbb{R}^{336}$ in less than 3 seconds

Joint work with H. Cardot and P.-A. Zitt.

A new estimator

Let us define the algorithm
$$m_{n+1} = m_n + \gamma_n \frac{X_{n+1} - m_n}{\|X_{n+1} - m_n\|},$$
where the stepsize $\gamma_n > 0$ satisfies
$$\sum_{n \ge 1} \gamma_n = \infty \quad \text{and} \quad \sum_{n \ge 1} \gamma_n^2 < \infty.$$

Advantages:
- for an $n$-sample of vectors in $\mathbb{R}^d$: $O(nd)$ operations;
- online estimation: simple update;
- no need to store the data.

Assumptions

A1. The distribution of $X$ is not concentrated on a straight line.
A2. The distribution is a mixture, $\mu_X = \lambda \mu_c + (1 - \lambda) \mu_d$, where
- $\mu_c$ satisfies: $\forall x \in H$, $\mu_c(\{x\}) = 0$, and $\alpha \mapsto E(\|X - \alpha\|^{-1})$ is uniformly bounded on balls;
- $\mu_d$ is a discrete measure whose support does not contain the median $m$.

Convergence of the algorithm

Theorem (Cardot, Cénac, Zitt 2011). Under assumptions (A1) and (A2), the sequence $(m_n)$ converges almost surely as $n$ goes to infinity:
$$\|m_n - m\| \xrightarrow[n \to \infty]{\text{a.s.}} 0.$$

Does it really work? Take a Gaussian sample with mean $(0, 0)$ and covariance matrix
$$\begin{pmatrix} 10 & 3 \\ 3 & 2 \end{pmatrix}.$$
By symmetry, the median $m$ is equal to the mean.
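The experiment just described can be reproduced with a short script. This is a minimal sketch under stated assumptions: pure standard-library Python, step sizes $\gamma_n = g / n^{3/4}$ with $g = 2$, a start from the first observation, and a fixed seed. The helper names `gaussian_2d` and `averaged_median` are hypothetical; the talk's own implementation is not shown in the slides.

```python
import math
import random


def gaussian_2d(rng):
    """Draw from N(0, [[10, 3], [3, 2]]) using the Cholesky factor of the covariance."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    l11 = math.sqrt(10.0)
    l21 = 3.0 / l11
    l22 = math.sqrt(2.0 - l21 ** 2)
    return (l11 * z1, l21 * z1 + l22 * z2)


def averaged_median(draw, n, g=2.0, alpha=0.75):
    """Return (m_n, average of m_1..m_n) for the stochastic gradient median algorithm."""
    m = list(draw())   # m_1: start from the first observation
    m_bar = list(m)    # running average of m_1, ..., m_k
    for k in range(1, n):
        x = draw()
        diff = [xi - mi for xi, mi in zip(x, m)]
        norm = math.hypot(diff[0], diff[1])
        if norm > 0.0:  # the direction is undefined when X_{k+1} equals m_k
            gamma = g / k ** alpha
            m = [mi + gamma * di / norm for mi, di in zip(m, diff)]
        # Recursive average: m_bar_{k+1} = (k * m_bar_k + m_{k+1}) / (k + 1)
        m_bar = [(k * bi + mi) / (k + 1) for bi, mi in zip(m_bar, m)]
    return m, m_bar


rng = random.Random(1)
m_last, m_avg = averaged_median(lambda: gaussian_2d(rng), n=10000)
print("last iterate   :", m_last)
print("averaged value :", m_avg)
```

Each observation is used once and then discarded, so the cost is $O(nd)$ and the memory footprint is two vectors of size $d$.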

[Figure: scatter plot of the simulated Gaussian sample; horizontal axis X1, vertical axis X2.]

[Figure: two panels showing the MSE against the number of iterations (0 to 10000) for the algorithm
$$m_{n+1} = m_n + \frac{g}{n^{3/4}} \frac{X_{n+1} - m_n}{\|X_{n+1} - m_n\|};$$
curves: RM, g = 10; RM, g = 1; AV, g = 10; AV, g = 1.]

Averaging

We smooth out:
$$\bar{m}_n = \frac{1}{n} \sum_{j=1}^{n} m_j.$$

Easy update:
$$m_{n+1} = m_n + \gamma_n \frac{X_{n+1} - m_n}{\|X_{n+1} - m_n\|},$$
$$\bar{m}_{n+1} = \frac{n}{n+1} \bar{m}_n + \frac{1}{n+1} m_{n+1}.$$

Averaging (2)

Let $\Gamma_m$ denote the Hessian at the point $m$, and $V$ the variance operator of $\frac{X - m}{\|X - m\|}$.

Theorem (Cardot, Cénac, Zitt 2011). Under assumptions (A1) and (A2), assuming $\gamma_n = g / n^{\alpha}$ with $0.5 < \alpha < 1$, and
$$\sup\left\{E\left(\|X - h\|^{-2}\right);\ h \in B(m, \varepsilon)\right\} < \infty,$$
one has
$$\sqrt{n}\,(\bar{m}_n - m) \rightsquigarrow N\left(0, \Gamma_m^{-1} V \Gamma_m^{-1}\right)$$
in distribution in $H$.

An application

200 samples with n = 2000.
Have a look at the estimation error:

[Figure: two panels of estimation errors (vertical scale 0.00 to 0.25), titled "Averaging" and "Mean", each for g = 0.1, 0.5, 1, 2, 5, 10.]

Estimation error of the geometric median

                      n = 250                  n = 2000
g                 Q1     median   Q3       Q1     median   Q3
0.2               0.45   0.60     0.80     0.25   0.35     0.47
1                 0.15   0.22     0.31     0.05   0.08     0.10
10                0.13   0.18     0.25     0.04   0.06     0.09
50                0.13   0.19     0.26     0.04   0.06     0.09
75                0.14   0.20     0.27     0.05   0.07     0.09
Vardi & Zhang     0.12   0.18     0.25     0.04   0.06     0.08

In one second, we deal with samples of size:
- n = 150 with the algorithm of Vardi & Zhang (2000) (in [language logo not recovered]),
- n = 4500 with our averaged algorithm (in [language logo not recovered]),
- n = 90 000 with our averaged algorithm (in C).

An example: television audience (data from Médiamétrie)

Measurements every second, 6/09/2010: $n = 5423$ and $X_i \in \{0, 1\}^{86400}$.

First assessment

An algorithm which is
- very easy to compute,
- not very sensitive to the parameters,
- very fast,
- at the "optimal" rate.

Recursive estimation of the conditional geometric median in Hilbert spaces

Joint work with H. Cardot and P.-A. Zitt.

The conditional median

We would like to take into account some covariates, i.e. random variables that are correlated with the functional variable under study. We introduce a weighted algorithm whose weights are controlled by a kernel function and an associated bandwidth.

The pair $(Y, X)$ stands for random variables in $H \times \mathbb{R}$. The goal is to estimate the geometric median of $Y$ given $X = x$, that is, the value $m(x)$ such that
$$m(x) = \arg\min_{\alpha \in H} E\left[\|Y - \alpha\| - \|Y\| \mid X = x\right].$$
The solution is unique provided that the conditional distribution of $Y$ given $X = x$ is not supported by a straight line (Kemperman, 1987).

The conditional median (2)

The recursive algorithm for the geometric median,
$$m_{n+1} = m_n + \gamma_n \frac{Y_{n+1} - m_n}{\|Y_{n+1} - m_n\|},$$
becomes, for the conditional geometric median,
$$Z_{n+1} = Z_n + \gamma_n \frac{Y_{n+1} - Z_n}{\|Y_{n+1} - Z_n\|} \frac{1}{h_n} K\left(\frac{X_{n+1} - x}{h_n}\right),$$
with two deterministic sequences $(h_n)$ and $(\gamma_n)$ and the kernel $K$ as parameters.

Assumptions

A0. The kernel function $K$ is positive, symmetric, bounded with compact support and satisfies
$$\int_{\mathbb{R}} K(u)\,du = 1, \qquad \int_{\mathbb{R}} K(u)^2\,du = \nu^2.$$
A1. For all $x$ in the support of $X$, the distribution of $Y$ given $X = x$ is not concentrated on a straight line.
A2. The density function of $X$ is bounded, continuous and twice differentiable with bounded derivatives.

Assumptions (2)

A3. The conditional law $\mu_x = \mathcal{L}(Y \mid X = x)$ varies regularly with $x$: there are two constants $c$ and $\beta$ such that
$$W_2(\mu_x, \mu_{x'}) \le c\,|x - x'|^{\beta}.$$
A4. There exists a constant $C$ such that
$$\forall \alpha \in H,\ \forall x, \quad E\left[\|Y - \alpha\|^{-2} \mid X = x\right] \le C.$$

Comments

Assumptions A0 and A2 are classical in nonparametric estimation and could be weakened at the expense of more complicated proofs.

Assumption A4 is stated quite strongly and could be relaxed. Informally, it forces the law to be "spread out", which avoids pathological behaviour of the algorithm.

Rate of convergence

Let us define
$$\bar{Z}_{n+1} = \frac{1}{n} \sum_{k=1}^{n} Z_k.$$

Theorem. Under some assumptions on the stepsizes $(\gamma_n)$ and $(h_n)$,
$$\sqrt{n h_n}\,\left(\bar{Z}_n - m\right) \xrightarrow[n \to \infty]{\mathcal{L}} N\left(0, \Gamma_m^{-1} \Sigma \Gamma_m^{-1}\right),$$
with, writing $m = m(x)$,
$$\Sigma = \lim_{h \to 0} \frac{1}{h} E\left[\frac{Y - m}{\|Y - m\|} \otimes \frac{Y - m}{\|Y - m\|}\, K^2\left(\frac{X - x}{h}\right)\right].$$

Application to television audience

[Figure: audience (vertical axis, 0 to 0.8) against hours (horizontal axis); curves: mean, median, q25, q50, q75, q90.]

Non-hierarchical and automatic classification in $\mathbb{R}^d$

Joint work with H. Cardot and J.-M. Monnez.

Non-hierarchical and automatic classification in $\mathbb{R}^d$

The aim is to partition $\mathbb{R}^d$ into a finite number $k$ of clusters. Each cluster is represented by its centre $\theta^{\ell} \in \mathbb{R}^d$, $\ell = 1, \dots, k$. We minimize the function $g : \mathbb{R}^{dk} \to \mathbb{R}$ defined by
$$g(\theta) = E\left[\min_{\ell = 1, \dots, k} \varphi\left(\|X - \theta^{\ell}\|\right)\right],$$
where $\varphi$ is an increasing function on $\mathbb{R}_+$. Two particular cases:
- $\varphi(u) = u^2$ leads to the k-means algorithm (MacQueen, 1967);
- $\varphi(u) = |u|$ leads to the k-medians algorithm.
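For $\varphi(u) = |u|$, a stochastic gradient step on $g$ only moves the centre nearest to the new observation, by a small step along the unit vector pointing to that observation. The sketch below illustrates this on two well-separated Gaussian clusters. The data, the initial centres, the step size $c_\gamma / (1 + c_\alpha n_\ell)^\alpha$ with $c_\gamma = c_\alpha = 1$ and $\alpha = 3/4$, and every function name are assumptions made for this example, not a definitive implementation.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest(centres, x):
    """Index of the centre closest to x, i.e. the cluster x is assigned to."""
    return min(range(len(centres)), key=lambda l: distance(centres[l], x))


def recursive_k_medians(stream, centres, c_gamma=1.0, c_alpha=1.0, alpha=0.75):
    """Stochastic gradient descent on g(theta) = E[min_l ||X - theta^l||]."""
    centres = [list(c) for c in centres]
    counts = [0] * len(centres)  # n_l: observations assigned to cluster l so far
    for x in stream:
        l = nearest(centres, x)
        dist = distance(centres[l], x)
        if dist > 0.0:
            step = c_gamma / (1.0 + c_alpha * counts[l]) ** alpha
            # Move only the nearest centre, by `step`, towards the observation.
            centres[l] = [c + step * (xi - c) / dist for c, xi in zip(centres[l], x)]
        counts[l] += 1
    return centres


rng = random.Random(2)


def draw():
    """Hypothetical data: equal mixture of N((0,0), I) and N((10,10), I)."""
    cx, cy = rng.choice([(0.0, 0.0), (10.0, 10.0)])
    return (cx + rng.gauss(0.0, 1.0), cy + rng.gauss(0.0, 1.0))


centres = recursive_k_medians((draw() for _ in range(4000)), [(1.0, 1.0), (9.0, 9.0)])
print(centres)
```

As with the median algorithm, each observation is processed once and never stored, so the procedure scales linearly in the sample size and the dimension.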

[Figure: non-hierarchical and automatic classification in $\mathbb{R}^2$.]

Recursive version of the k-means (MacQueen, 1967)

Let $I_\ell$ stand for the indicator function that $x \in \mathbb{R}^d$ belongs to cluster $\ell$,
$$I_\ell(x, \theta) = \prod_{j=1}^{k} \mathbf{1}_{\{\|x - \theta^{\ell}\| \le \|x - \theta^{j}\|\}}.$$

Initialization: $k$ points of $\mathbb{R}^d$: $\theta_1^{\ell}$, $\ell = 1, \dots, k$.

Iterations:
$$\theta_{n+1}^{\ell} = \theta_n^{\ell} - a_n^{\ell}\, I_\ell(X_n, \theta_n) \left(\theta_n^{\ell} - X_n\right),$$
where $a_n^{\ell}$ is the inverse of the size $n_\ell$ of cluster $\ell$.

$\theta_{n+1}^{\ell}$ stands for the mean
$$\theta_{n+1}^{\ell} = \frac{1}{1 + n_\ell} \left(\theta_1^{\ell} + \sum_{i=1}^{n} I_\ell(X_i; \theta_i) X_i\right).$$

Very fast averaged stochastic gradient algorithm.

Recursive version of the k-medians (CCM, 2011)

Initialization: $k$ points of $\mathbb{R}^d$, $\theta_1^{\ell}$, $\ell = 1, \dots, k$.

Iterations: for $n \ge 1$ and $\ell = 1, \dots, k$,
$$\theta_{n+1}^{\ell} = \theta_n^{\ell} - a_n^{\ell}\, I_\ell(X_n, \theta_n) \frac{\theta_n^{\ell} - X_n}{\|\theta_n^{\ell} - X_n\|}.$$

Stepsize: for parameters $c_\gamma$, $c_\alpha$ and $\frac{1}{2} < \alpha \le 1$ to calibrate,
$$a_n^{\ell} = \begin{cases} a_{n-1}^{\ell} & \text{if } I_\ell(X_n, \theta_n) = 0, \\ \dfrac{c_\gamma}{(1 + c_\alpha n_\ell)^{\alpha}} & \text{otherwise.} \end{cases}$$

Iterations (averaged version):
$$\bar{\theta}_{n+1}^{\ell} = \begin{cases} \bar{\theta}_n^{\ell} & \text{if } I_\ell(X_n, \theta_n) = 0, \\ \dfrac{n_\ell \bar{\theta}_n^{\ell} + \theta_{n+1}^{\ell}}{n_\ell + 1} & \text{otherwise.} \end{cases}$$

Assumptions

(H1) a) The distribution of the random vector $X$ is absolutely continuous.
     b) $X$ is bounded: $\exists K > 0$: $\|X\| \le K$ a.s.
     c) $\exists C$: $\forall x \in \mathbb{R}^d$ such that $\|x\| \le K + 1$, $E\left[\frac{1}{\|X - x\|}\right] < C$.
(H2) a) $\forall n \ge 1$, $\min_\ell a_n^{\ell} > 0$.
     b) $\max_\ell \sup_n a_n^{\ell} < \min\left(\frac{1}{2}, \frac{1}{8C}\right)$ a.s.
     c) $\sum_{n=1}^{\infty} \max_\ell a_n^{\ell} = \infty$ a.s.
     d) $\sup_n \frac{\max_\ell a_n^{\ell}}{\min_\ell a_n^{\ell}} < \infty$ a.s.
(H3) $\sum_{\ell=1}^{k} \sum_{n=1}^{\infty} \left(a_n^{\ell}\right)^2 < \infty$ a.s.
(H3') $\sum_{\ell=1}^{k} \sum_{n=1}^{\infty} E\left[\left(a_n^{\ell}\right)^2 I_\ell(X_n, \theta_n)\right] < \infty$.

About convergence

Proposition (Cardot, Cénac, Monnez, 2011). If $\theta_1$ is absolutely continuous, $\|\theta_1^{\ell}\| \le K$ for $\ell = 1, \dots, k$, and under (H1), (H2), and (H3) or (H3'), $\nabla g(\theta_n)$ and the distance between $\theta_n$ and the set of stationary points of $g$ converge almost surely to zero.

Television audience profiles

[Figure: TV audience (vertical axis, 0 to 1) against time (horizontal axis, 0 to 1400) for five clusters, Cl. 1 to Cl. 5.]

Thank you!