Quantization for Probability Distributions - An Introduction
Christian Küchler
FS Numerik stochastischer Modelle, SS 2006
May 31, 2006

Outline
1 Introduction, History and Applications
2 Voronoi Regions, Diagrams and Tessellations
3 Properties, Related Problems and Asymptotics
4 Application: Numerical Integration
5 Algorithms
6 Application: Quantization of Stochastic Processes
7 Conclusion

Quantization

"Quantization is the division of a quantity into a discrete number of small parts, often assumed to be integral multiples of a common quantity." (The Random House Dictionary)

Oldest example: rounding off for estimating densities by histograms, Sheppard (1898).

The term "quantization" originates in the theory of signal processing in electrical engineering in the late 1940s: Oliver, Pierce, Shannon (1948) and Bennett (1948). Analog-to-digital conversion, data compression.

History and overview: Gray and Neuhoff (1998).

Quantization

Let $X$ be an $\mathbb{R}^d$-valued random variable with distribution $P$ and $E\|X\|^r < \infty$ for fixed $1 \le r < \infty$. Let $\mathcal{F}_n$ be the set of all Borel measurable maps $f : \mathbb{R}^d \to \mathbb{R}^d$ with $\# f(\mathbb{R}^d) \le n$. Every $f \in \mathcal{F}_n$ is called a quantizer, and $f(X)$ a quantized version of $X$.

The $n$-th quantization error for $P$ of order $r$ is defined by
$$V_{n,r}(P) := \inf_{f \in \mathcal{F}_n} E\|X - f(X)\|^r.$$
In the following we assume $|\operatorname{supp} P| \ge n + 1$.

A quantizer $f$ is called $n$-optimal of order $r$ if $V_{n,r}(P) = E\|X - f(X)\|^r$.

Areas of Application

Quantization problems appear in various scientific fields, e.g.
- Information theory (signal compression)
- Cluster analysis, pattern and speech recognition (quantization of empirical measures)
- Numerical integration
- Simulation of stochastic processes
- Infinite-dimensional quantization (functional quantization)
- Mathematical models in economics

Areas of Application - Information Theory

- Scalar and vector quantization (block source coding)
- Quantization with fixed or variable rate
- High-resolution quantization ($n \to \infty$, asymptotic results)
- Rate-distortion theory

Vector Quantization and Signal Compression, Gersho and Gray (2000).

Voronoi Regions and Diagrams

Georgi Voronoi, 1868-1908. Let $\alpha$ be a (locally) finite subset of $\mathbb{R}^d$ and $\|\cdot\|$ any norm on $\mathbb{R}^d$. The Voronoi region (or Dirichlet region) generated by $a \in \alpha$ is defined as
$$W(a|\alpha) := \left\{ x \in \mathbb{R}^d : \|x - a\| = \min_{b \in \alpha} \|x - b\| \right\}.$$

$W(a|\alpha)$ depends on $\|\cdot\|$ and is closed and star-shaped relative to $a$, but not necessarily convex. (The latter holds if $\|\cdot\|$ is Euclidean.)

The Voronoi diagram of $\alpha$ (or Dirichlet tessellation)
$$\{W(a|\alpha) : a \in \alpha\}$$
is a (locally) finite covering of $\mathbb{R}^d$.

[Figure: A Voronoi diagram. Mathematica: DiagramPlot]

Voronoi Regions

Voronoi regions are determined by their neighbouring regions in the following sense.

Lemma. For $a \in \alpha$, let $\beta := \{b \in \alpha : W(b|\alpha) \cap W(a|\alpha) \ne \emptyset\}$. Then $W(a|\alpha) = W(a|\beta)$.

Voronoi Partition

A Borel measurable partition $\{A_a : a \in \alpha\}$ is called a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ (and a Borel probability measure $P$) if
$$A_a \subset W(a|\alpha) \quad P\text{-a.s. for all } a \in \alpha.$$
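In computations with a finite $\alpha$ and the Euclidean norm, a Voronoi partition is realized by the nearest-neighbour rule. A minimal numpy sketch (the centers and the sample below are illustrative choices, not part of the original slides):

```python
import numpy as np

def voronoi_assign(x, alpha):
    """Index of the Voronoi region W(a|alpha) containing each row of x
    (Euclidean norm; ties are broken towards the smaller index)."""
    # Pairwise distances between sample points and centers: shape (N, n).
    d = np.linalg.norm(x[:, None, :] - alpha[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
alpha = rng.standard_normal((5, 2))        # five centers in R^2
x = rng.standard_normal((1000, 2))         # sample of P = N(0, Id_2)
idx = voronoi_assign(x, alpha)
weights = np.bincount(idx, minlength=len(alpha)) / len(x)  # ~ P[W(a|alpha)]
print(weights)
```

The vector `weights` estimates the cell masses $P[W(a|\alpha)]$, which reappear later in the cubature formula for numerical integration.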
Geometry and Topology

The open Voronoi region generated by $a \in \alpha$ is defined as
$$W_0(a|\alpha) := \left\{ x \in \mathbb{R}^d : \|x - a\| < \min_{b \in \alpha \setminus \{a\}} \|x - b\| \right\}.$$
The open Voronoi regions are pairwise disjoint, but form no covering of $\mathbb{R}^d$.

In general, $W_0(a|\alpha) \ne \operatorname{int} W(a|\alpha)$. (Equality holds if $\|\cdot\|$ is strictly convex, i.e. if $\|x\| = \|y\| = 1$ implies $\|sx + (1-s)y\| < 1$ for $s \in (0,1)$.)

Euclidean Norms

If $\|\cdot\|$ is Euclidean, the following properties hold.
- If $\alpha$ is finite, $W(a|\alpha)$ is polyhedral.
- If $W(a|\alpha)$ is bounded, it is polyhedral.
- $W(a|\alpha)$ is convex (Mann (1935)). Furthermore, convexity of $W(a|\alpha)$ for all $\alpha \in \mathbb{R}^{d \cdot n}$ and $a \in \alpha$ holds if and only if $\|\cdot\|$ is Euclidean.
- $W(a|\alpha)$ is bounded if and only if $a \in \operatorname{int} \operatorname{conv} \alpha$.

Boundary Theorem

Theorem. Each of the following conditions implies $\lambda^d(\partial W(a|\alpha)) = 0$ for $a \in \alpha$:
1 The underlying norm is strictly convex.
2 The underlying norm is the $l_p$-norm with $1 \le p \le \infty$.
3 $d = 2$.

Tessellations

For a Borel set $C \subset \mathbb{R}^d$ and a Borel measure $P$ on $\mathbb{R}^d$, a $P$-tessellation is a countable covering $\{C_n : n \in \mathbb{N}\}$ of $C$ with Borel sets $C_n \subset C$ such that $P(C_n \cap C_m) = 0$ for $n \ne m$. A $\lambda^d$-tessellation is simply called a tessellation.

Proposition. A Voronoi diagram is a $P$-tessellation of $\mathbb{R}^d$ if and only if
$$P\left( \mathbb{R}^d \setminus \bigcup_{a \in \alpha} W_0(a|\alpha) \right) = 0.$$
A Voronoi diagram with respect to a strictly convex norm is a $\lambda^d$-tessellation of $\mathbb{R}^d$.

Quantization, Centers and Voronoi Partitions

For fixed $n \in \mathbb{N}$, searching for an optimal quantizer is equivalent to the $n$-centers problem:

Lemma.
$$V_{n,r}(P) = \inf_{\alpha \subset \mathbb{R}^d,\, \#\alpha \le n} E\left[ \min_{a \in \alpha} \|X - a\|^r \right].$$
$\alpha$ is called an $n$-optimal set of centers for $P$ of order $r$ if it realizes the infimum on the right-hand side.

Proof. "$\ge$": For $f$ we set $\alpha := f(\mathbb{R}^d)$. "$\le$": For $\alpha$ we choose a Voronoi partition $\{A_a\}$ and set $f = \sum_{a \in \alpha} a \cdot 1_{A_a}$.

Weighted sum (or integral) of distances $\Leftrightarrow$ mass transportation problem; nearest neighbour rule $\Leftrightarrow$ optimal redistribution (Dupačová).
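The right-hand side of the lemma is straightforward to estimate by Monte Carlo for any candidate set of centers. A small sketch for $\|\cdot\| = \|\cdot\|_2$ (the distribution and the candidate $\alpha$ are arbitrary choices for illustration):

```python
import numpy as np

def distortion(alpha, sample, r=2.0):
    """Monte Carlo estimate of E[min_{a in alpha} ||X - a||^r]
    from an i.i.d. sample of P (Euclidean norm)."""
    d = np.linalg.norm(sample[:, None, :] - alpha[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) ** r))

rng = np.random.default_rng(1)
sample = rng.standard_normal((100_000, 1))    # P = N(0,1), d = 1
alpha = np.array([[-1.0], [0.0], [1.0]])      # candidate 3-center set
print(distortion(alpha, sample))
```

Up to the sampling error, any such estimate is an upper bound for $V_{n,r}(P)$, since the infimum runs over all sets of at most $n$ centers.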
Quantization, Centers and Voronoi Partitions

The quantization problem is equivalent to the problem of approximating $P$ by a discrete probability with at most $n$ supporting points.

For Borel probability measures $P_1, P_2$ with $\int \|x\|^r \, dP_i(x) < \infty$, let
$$\rho_r(P_1, P_2) := \inf_{\mu :\, \pi_i \mu = P_i,\ i = 1,2} \left( \int \|x - y\|^r \, d\mu(x, y) \right)^{1/r},$$
the $L^r$-Wasserstein metric. By $\mathcal{P}_n$ we denote the set of all discrete probability measures $Q$ with $|\operatorname{supp} Q| \le n$.

Lemma.
$$V_{n,r}(P) = \inf_{f \in \mathcal{F}_n} \rho_r^r(P, P_f) = \inf_{Q \in \mathcal{P}_n} \rho_r^r(P, Q).$$

This yields the following stability result for the functional $V_{n,r}(\cdot)$: for $P_i$, $i = 1, 2$, with $\int \|x\|^r \, dP_i(x) < \infty$ we have
$$\left| V_{n,r}(P_1)^{1/r} - V_{n,r}(P_2)^{1/r} \right| \le \rho_r(P_1, P_2).$$

The functional $V_{n,r}(\cdot)$ is concave:

Lemma. Let $P = \sum_{i=1}^m s_i P_i$ with $\sum_{i=1}^m s_i = 1$, $s_i \ge 0$ and $\int_{\mathbb{R}^d} \|x\|^r \, dP_i(x) < \infty$. Then $V_{n,r}(P) \ge \sum_{i=1}^m s_i V_{n,r}(P_i)$. If $n_i \in \mathbb{N}$ with $\sum_{i=1}^m n_i = n$, we have $V_{n,r}(P) \le \sum_{i=1}^m s_i V_{n_i,r}(P_i)$.

Asymptotics - Zador's Theorem

Lemma. If $E\|X\|^r < \infty$, then $\lim_{n \to \infty} V_{n,r}(P) = 0$.

Theorem (Zador (1962), Bucklew and Wise (1982), Graf and Luschgy (2000)). Suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Let $P_a$ be the absolutely continuous part of $P$ and
$$Q_{r,d} := \inf_{n \ge 1} n^{r/d} V_{n,r}\left( U[0,1]^d \right).$$
Then $Q_{r,d} > 0$ and
$$\lim_{n \to \infty} n^{r/d} V_{n,r}(P) = Q_{r,d} \left\| \frac{dP_a}{d\lambda^d} \right\|_{d/(d+r)}.$$
If $P$ is singular, this yields only $V_{n,r}(P) = o(n^{-r/d})$. $Q_{r,d}$ is unknown for $d \ge 3$; one knows that $Q_{r,1} = \frac{1}{2^r (r+1)}$, $Q_{2,2} = \frac{5}{18\sqrt{3}}$, and $Q_{r,d} \sim \left( \frac{d}{2\pi e} \right)^{r/2}$ as $d \to \infty$.

Asymptotics

Theorem (Asymptotically optimal quantizer point weights). Suppose $P \ll \lambda^d$, $g = \frac{dP}{d\lambda^d}$, $\|\cdot\| = \|\cdot\|_2$ and $d = 1$. Let $\alpha_n$ be an optimal quantizer of $P$. Then, as $n \to \infty$,
$$P\left[ W(a_{i,n}|\alpha_n) \right] \sim \frac{1}{n} \cdot \frac{g(a_{i,n})^{2/(d+2)}}{\int_{\mathbb{R}^d} g(x)^{2/(d+2)} \, dx}$$
(uniformly in $a_{i,n}$, $i = 1, \ldots, n$, on every compact set). This holds as a conjecture for $d > 1$.

[Figure: Point weights and limit function for $N(0,1)$ and $n = 50$.]
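For $d = 1$, $r = 2$ and $P = U[0,1]$, everything in Zador's theorem is explicit: the $n$-optimal centers are the midpoints $a_i = (2i-1)/(2n)$ with $V_{n,2} = 1/(12n^2)$, so $n^2 V_{n,2} = 1/12 = Q_{2,1}$. A quick numerical check of the rate (the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.random(200_000)                                  # sample of U[0,1]
for n in (2, 4, 8, 16):
    alpha = (2.0 * np.arange(1, n + 1) - 1.0) / (2 * n)  # midpoint centers
    v = np.mean(np.min((x[:, None] - alpha[None, :]) ** 2, axis=1))
    print(n, n**2 * v)   # each value should be close to 1/12 = Q_{2,1}
```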
Application: Numerical Integration

Problem: evaluation of $E[f(X)]$. Approximation by $E[f(\hat{X})]$, where $\hat{X} = \sum_{i=1}^n a_i \cdot 1_{A_{a_i}}(X)$ is a quantized version of $X$. Since $\hat{X}$ is a discrete random variable,
$$E[f(\hat{X})] = \sum_{i=1}^n f(a_i) \cdot P_X[A_{a_i}].$$

Error bounds for $\left| E[f(X)] - E[f(\hat{X})] \right|$:
1 If $f \in C^1$ with bounded $Df$: $C \cdot \|X - \hat{X}\|_1 \sim O(n^{-1/d})$.
2 If $f \in C^1$ with Lipschitz-continuous $Df$ and $\hat{X}$ is a $\|\cdot\|_2$-stationary quantizer: $C \cdot \|X - \hat{X}\|_2^2 \sim O(n^{-2/d})$.

Monte Carlo methods: $O(n^{-1/2})$. Comparisons of Monte Carlo confidence intervals with $V_{n,r}$: Pagès, Pham, Printems (2004). (Asymptotically) critical dimension $d = 4$; quantization seems efficient for $n$ not too large, up to $d = 10$.
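The cubature formula $E[f(\hat{X})] = \sum_i f(a_i) P_X[A_{a_i}]$ only needs the centers and their Voronoi weights. A sketch for $X \sim N(0,1)$; the quantile-based centers are a convenient illustrative choice of my own, not the optimal quantizer:

```python
import numpy as np

rng = np.random.default_rng(3)
sample = rng.standard_normal(200_000)          # i.i.d. sample of P = N(0,1)

# Illustrative quantizer: empirical quantiles at the midpoint levels.
n = 20
levels = (2 * np.arange(1, n + 1) - 1) / (2 * n)
alpha = np.quantile(sample, levels)

# Voronoi weights p_i ~ P_X[A_{a_i}] via the nearest-neighbour rule.
idx = np.abs(sample[:, None] - alpha[None, :]).argmin(axis=1)
p = np.bincount(idx, minlength=n) / len(sample)

f = np.cos                                     # test integrand
print(np.sum(f(alpha) * p))                    # quantization cubature
print(np.exp(-0.5))                            # exact E[cos(X)] = e^{-1/2}
```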
Asymptotics in the Infinite-Dimensional Case

Less is known if $P$ is defined on an infinite-dimensional Banach space (e.g. $P$ = Wiener measure on $C([0,1], \|\cdot\|_{\sup})$). Application: numerical integration of functionals on Banach spaces, e.g. path-dependent options.

In the Wiener case we know (Dereich and Scheutzow (2005))
$$\lim_{n \to \infty} (\ln n)^{1/2} V_{n,r}(P) = c$$
with some constant $c > 0$. $V_{n,r}$ is the smallest worst-case error that can be achieved by any deterministic algorithm with computational cost $\le n$ (for numerical integration of Lipschitz continuous functionals).

For random algorithms for diffusions we have (Dereich, Müller-Gronbach, Ritter (2006))
$$\lim_{n \to \infty} n^{1/4} (\ln n)^x \, V_{n,r}^{\text{random}}(P) = c,$$
where $x \in [-\frac{1}{4}, \frac{3}{4}]$. $x = -\frac{1}{4}$ is realized by the Euler Monte Carlo algorithm.

The Quantization Problem

How to find optimal quantizers, $n$-centers or reduced probabilities? Problem: the function $\psi_{n,r} : (\mathbb{R}^d)^n \to \mathbb{R}_+$,
$$\psi_{n,r}(a_1, \ldots, a_n) = E \min_{1 \le i \le n} \|X - a_i\|^r,$$
is continuous, but typically not convex for $n \ge 2$.

Existence of an Optimal Set of Centers

Theorem. We have $V_{n,r} < V_{n-1,r}$. The level set $\{\psi_{n,r} \le c\}$ is compact for every $0 \le c < V_{n-1,r}$, hence optimal sets of centers exist and lie in a bounded set.

Lemma. Suppose the Voronoi diagram of $\alpha = (a_1, \ldots, a_n)$ is a $P$-tessellation of $\mathbb{R}^d$. Then $\psi_{n,r}$ has a one-sided directional derivative at $\alpha$ in every direction $y \in (\mathbb{R}^d)^n$, given by
$$\nabla^+ \psi_{n,r}(\alpha, y) = r \sum_{i=1}^n \int_{W(a_i|\alpha)} \|x - a_i\|^{r-1} \cdot \nabla^+ \|\cdot\|(a_i - x, y_i) \, dP(x).$$
The condition of the above lemma is fulfilled if $\|\cdot\|$ is strictly convex and $P$ is absolutely continuous with respect to $\lambda^d$. If additionally $\|\cdot\|$ is differentiable on $\mathbb{R}^d \setminus \{0\}$ and ($r > 1$ or $P[\alpha] = 0$), then $\psi_{n,r}$ is differentiable at $\alpha$.

Stationarity - Necessary for Optimality

Theorem. Let $\alpha$ be an $n$-optimal set of centers of order $r$. Then $|\alpha| = n$, for every $a \in \alpha$ we have $P[W(a|\alpha)] > 0$, and $a$ is a center of order $r$ of $P[\cdot|W(a|\alpha)]$.

A set $\alpha \subset \mathbb{R}^d$ with $|\alpha| = n$ fulfilling the above condition is called an $n$-stationary set of centers for $P$ of order $r$. Every $n$-stationary set of centers $\alpha$ is a stationary point of $\psi_{n,r}$, i.e.
$$\nabla^+ \psi_{n,r}(\alpha, y) \ge 0 \quad \text{for all } y \in (\mathbb{R}^d)^n.$$
Stationary sets of centers are not necessarily local minima of $\psi_{n,r}$ (Lloyd (1982)).

Lloyd's Algorithm I (k-means Method)

Necessary conditions for optimality of a quantizer $f = \sum_{a \in \alpha} a \cdot 1_{A_a}$ with certain Borel sets $A_a$:
- $\{A_a : a \in \alpha\}$ is a Voronoi partition with respect to $\alpha$ ("nearest neighbour property").
- $a$ is a center of order $r$ of $P[\cdot|A_a]$ ("centroid property").

Lloyd's algorithm (Steinhaus (1956), Lloyd (1957)):
1 Select an initial set $\alpha$ of $n$ points $a_i$.
2 Determine a Voronoi partition $\{A_a : a \in \alpha\}$ with respect to $\alpha$.
3 Choose the center $c_i$ of $A_{a_i}$ and update $\alpha$ by setting $a_i := c_i$.
4 If $E \min_{1 \le i \le n} \|X - a_i\|^r$ does not satisfy a convergence criterion: back to step 2.

Lloyd's algorithm is a descending algorithm, converging to an $n$-stationary set of centers ($d = 1$). Integrals have to be computed, which is easily done if $X$ (or $P$) is discrete; see the sketch below.
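A minimal Lloyd iteration for a discrete $P$ with $r = 2$ and the Euclidean norm, where the centroid of a cell is the conditional mean. Initialization, iteration count and the handling of empty cells are ad-hoc choices:

```python
import numpy as np

def lloyd(points, probs, n, iters=50, seed=0):
    """Lloyd's algorithm (r = 2, Euclidean) for the discrete distribution
    P = sum_j probs[j] * delta_{points[j]}.  Returns centers and distortion."""
    rng = np.random.default_rng(seed)
    alpha = points[rng.choice(len(points), size=n, replace=False)]
    for _ in range(iters):
        # Step 2: Voronoi partition via the nearest-neighbour rule.
        idx = np.linalg.norm(points[:, None, :] - alpha[None, :, :],
                             axis=2).argmin(axis=1)
        # Step 3: move each center to the conditional mean of its cell.
        for i in range(n):
            w = probs[idx == i]
            if w.sum() > 0:                    # leave empty cells unchanged
                alpha[i] = (points[idx == i] * w[:, None]).sum(axis=0) / w.sum()
    d = np.linalg.norm(points[:, None, :] - alpha[None, :, :], axis=2)
    return alpha, float(np.sum(probs * d.min(axis=1) ** 2))

# Reduce a uniform empirical measure on 1000 points in R^2 to n = 10 centers.
rng = np.random.default_rng(1)
pts = rng.standard_normal((1000, 2))
centers, dist = lloyd(pts, np.full(1000, 1.0 / 1000), n=10)
print(dist)
```

Each pass can only decrease the distortion, which is why restarting from different initial sets, as quoted next, is the usual guard against poor local minima.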
"...repeated applications of the Lloyd algorithm with different initial conditions has also proved effective in avoiding local optima." Gray and Neuhoff (1998)

Lloyd's algorithm can be used to improve the result of other methods. A straightforward implementation can be quite slow (costly computation of nearest neighbours); for a recent implementation see Kanungo, Mount, Netanyahu, Piatko, Silverman, Wu (2002).

Lloyd's Algorithm I (k-means Method)

Combination of Lloyd's algorithm with global optimization techniques to avoid local minima (simulated annealing / stochastic relaxation): Generalized Vector Quantization, Möller, Galicki, Baresova, Witte (1998):
1 Select an initial set $\alpha$ of $n$ points $a_i$.
2 Set $\tilde{\alpha} := \alpha + \xi$ with $\xi \sim N(0, \sigma^2 \cdot \mathrm{Id}_{d \cdot n})$.
3 Determine a Voronoi partition $\{A_a : a \in \tilde{\alpha}\}$ with respect to $\tilde{\alpha}$.
4 Choose the center $c_i$ of $A_{\tilde{a}_i}$ and update $\tilde{\alpha}$ by setting $\tilde{a}_i := c_i$.
5 If $E \min_{1 \le i \le n} \|X - \tilde{a}_i\|^r < E \min_{1 \le i \le n} \|X - a_i\|^r$, update $\alpha = \tilde{\alpha}$ and set $\sigma := \sigma \cdot ex$ with $ex > 1$; else set $\sigma := \sigma \cdot ct$ with $ct < 1$.
6 If $E \min_{1 \le i \le n} \|X - a_i\|^r$ does not satisfy a convergence criterion and $\sigma > \sigma_{lb}$: back to step 2.

Pairwise Nearest Neighbor Design

Heuristic for scenario reduction, or for finding a codebook from a training sequence. PNN (Equitz (1987)):
1 Select an initial set $\alpha$ of $N > n$ points $a_i$, interpreted as clusters (each containing a single point).
2 Find the pair of clusters with minimal increase in distortion if they are merged.
3 Replace the two clusters and their centroids by the merged cluster and the corresponding centroid.
4 If the number of remaining clusters exceeds $n$, back to step 2.

Cp. backward reduction.

Kohonen Algorithm - Competitive Learning Vector Quantization

Heuristic to determine a "representative" set $\alpha = (a_1, \ldots, a_n)$. Kohonen algorithm:
1 Select an initial set $\alpha$ of $n$ points $a_i$ and set $j_i := 1$, $i = 1, \ldots, n$.
2 Take a sample $y$ of $P$ and find the $a_{i^*}$ closest to $y$ (competitive phase).
3 Update $a_{i^*} := (j_{i^*} \cdot a_{i^*} + y) / (j_{i^*} + 1)$ and $j_{i^*} := j_{i^*} + 1$ (learning phase).
4 If $\alpha$ does not satisfy a convergence criterion: back to step 2.

Convergence of the approximation error was proved by MacQueen (1967). Alternatively, update every $a_i$ depending on $\|y - a_i\|$, with a varying weighting scheme (or step size); cp. the stochastic gradient method of Pagès and Printems (2003).
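A direct transcription of the Kohonen steps above for $P = N(0, \mathrm{Id}_2)$; the sample count and initialization are placeholders, and the convergence check is omitted:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 10, 2
alpha = rng.standard_normal((n, d))        # step 1: initial centers
j = np.ones(n)                             # counters j_i

for _ in range(100_000):
    y = rng.standard_normal(d)                         # step 2: sample of P
    i = np.linalg.norm(alpha - y, axis=1).argmin()     # competitive phase
    # Step 3 (learning phase): a_{i*} becomes the running mean of the
    # samples it has won so far.
    alpha[i] = (j[i] * alpha[i] + y) / (j[i] + 1)
    j[i] += 1

print(alpha)            # "representative" set of centers for N(0, Id_2)
```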
The Gaussian Case

With $r = 2$ and $\|\cdot\| = \|\cdot\|_2$, the CLVQ approach appears as a stochastic gradient method in Pagès, Printems (2003), Pham, Runggaldier, Sellami (2004), Pagès, Pham, Printems (2004), Bally, Pagès, Printems (2005):
$$\frac{\partial \psi_{n,r}(\alpha)}{\partial a_i} = 2 \int_{W(a_i|\alpha)} (a_i - x) \, P(dx), \quad i = 1, \ldots, n,$$
$$\nabla \psi_{n,r}(\alpha) =: \int_{\mathbb{R}^d} G(\alpha, x) \, P(dx).$$
A gradient-based approach would require computation of the $dP$-expectation. Stochastic gradient approach: take a $P$-sample $\xi_{m+1}$ and update
$$\alpha_{m+1} = \alpha_m - \gamma_{m+1} G(\alpha_m, \xi_{m+1}).$$
This means setting
$$a_i := \begin{cases} a_i - \gamma_{m+1}(a_i - \xi_{m+1}) & \text{if } \xi_{m+1} \in W(a_i|\alpha), \\ a_i & \text{if } \xi_{m+1} \notin W(a_i|\alpha). \end{cases}$$
The most time-consuming task is the computation of $\xi_{m+1}$'s nearest neighbour. The fastest rate of convergence (depending on $(\gamma_m)$) is $\sqrt{m}$ (CLT). Unfortunately, the conditions for $P$-a.s. convergence are not fulfilled.

[Figure: Quantizer of $N(0, \mathrm{Id}_2)$ with $n = 500$.]
[Figure: Quantizer of $N(0, \mathrm{Id}_2)$ with $n = 500$ and its Voronoi diagram.]

The Gaussian Case

Application: discretization of (controlled) diffusions driven by Brownian motion to numerically solve stochastic control problems:
$$dX_t = \mu(X_t, t) \, dt + \sigma(X_t, t) \, dW_t, \quad 0 \le t \le T.$$
1 Time discretization ($0 = t_1 < \ldots < t_M = T$), e.g. by the Euler scheme.
2 Simultaneous spatial discretization through optimal quantization: heuristics for the optimal allocation of $n_{t_i}$ points to the $i$-th quantization problem, $i = 1, \ldots, M$, for fixed $n = \sum_{i=1}^M n_{t_i}$. Sample the vector (discrete-time process) $(X_{t_1}, \ldots, X_{t_M})$ and update the quantizer $\alpha_{t_i}$ with $X_{t_i}$, $i = 1, \ldots, M$. Estimation of transition probabilities.
3 Define a discretized process $\hat{X}$, which is a Markov chain. The Markov property allows using a recombining tree.

A compressed sketch of steps 1-2 follows below.
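The sketch combines Euler sampling of $(X_{t_1}, \ldots, X_{t_M})$, a CLVQ update of each layer $\alpha_{t_k}$, and transition-probability estimation by counting. Drift, volatility, the point allocation ($n_{t_i} \equiv n$) and the step-size schedule are placeholder choices of mine, not the tuned schemes of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(5)
mu  = lambda x, t: -x                  # placeholder drift (OU-type)
sig = lambda x, t: 1.0                 # placeholder volatility
M, T, n = 10, 1.0, 20                  # time steps, horizon, points per layer
dt = T / M
alphas = [np.linspace(-2.0, 2.0, n) for _ in range(M)]   # initial quantizers

def euler_path():
    """One Euler sample of (X_{t_1}, ..., X_{t_M}), started at X_0 = 0."""
    x, path = 0.0, []
    for k in range(M):
        t = k * dt
        x += mu(x, t) * dt + sig(x, t) * np.sqrt(dt) * rng.standard_normal()
        path.append(x)
    return path

# Phase 1: update each layer's quantizer with its own marginal sample (CLVQ).
for m in range(1, 20_001):
    gamma = 1.0 / m                    # placeholder step-size schedule
    for k, x in enumerate(euler_path()):
        i = np.abs(alphas[k] - x).argmin()            # nearest neighbour
        alphas[k][i] -= gamma * (alphas[k][i] - x)    # CLVQ update

# Phase 2: estimate the transition matrices of the quantized chain.
counts = [np.zeros((n, n)) for _ in range(M - 1)]
for _ in range(20_000):
    idx = [np.abs(alphas[k] - x).argmin() for k, x in enumerate(euler_path())]
    for k in range(M - 1):
        counts[k][idx[k], idx[k + 1]] += 1
P_hat = [c / np.maximum(c.sum(axis=1, keepdims=True), 1.0) for c in counts]
print(P_hat[0])
```

The matrices `P_hat` are the estimated transition probabilities of the quantized Markov chain $\hat{X}$ used in the recombining tree of step 3.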
Stability

This yields a process $\hat{X}$ taking values in a finite set with (almost) minimal $\sum_{k=1}^M E\|X_{t_k} - \hat{X}_{t_k}\|$. Problem: in general, it is not enough to consider this stagewise disturbance (Heitsch, Römisch, Strugarek (2005)). Obviously, quantization (or clustering of scenarios) at time $t_k$ can considerably change the optimal value of the optimization problem if
$$P\left[ X_{t_{k+1}} \in \cdot, \ldots, X_{t_M} \in \cdot \mid X_{t_k} = x \right]$$
does not depend continuously on $x$. In which sense? Which conditions on $X$?

Stability

Assumption of Bally, Pagès, Printems (2005): For every $k = 1, \ldots, M-1$ there exists a constant $K > 0$ such that for every Lipschitz continuous mapping $f : \mathbb{R}^d \to \mathbb{R}$ with Lipschitz constant $[f]_{\mathrm{lip}}$, the mapping
$$P_t f : \mathbb{R}^d \to \mathbb{R}, \quad x \mapsto E\left[ f(X_{t_{k+1}}) \mid X_{t_k} = x \right]$$
is Lipschitz continuous with Lipschitz constant $K \cdot [f]_{\mathrm{lip}}$.

In a Markovian, continuous-time framework this condition is known as the Feller property. It is fulfilled by a variety of stochastic processes and guarantees stability for the problem of Bally et al. (American option pricing). Perspective: extension of the stability result to more general multistage stochastic programs.

Conclusion

- Quantization: an old problem with a wide range of applications
- A non-convex, very "bumpy" optimization problem
- A variety of heuristically motivated algorithms
- Active fields of research

Thank you very much.