Quantization for Probability Distributions - An Introduction
Christian Küchler
FS Numerik stochastischer Modelle, SS 2006
May 31, 2006
1. Introduction, History and Applications
2. Voronoi regions, diagrams and tessellations
3. Properties, Related Problems and Asymptotics
4. Application: Numerical Integration
5. Algorithms
6. Application: Quantization of Stochastic Processes
7. Conclusion
Quantization

"Quantization is the division of a quantity into a discrete number of small parts, often assumed to be integral multiples of a common quantity." (The Random House dictionary)

Oldest example: rounding off for estimating densities by histograms, Sheppard (1898).

The term "quantization" originates in the theory of signal processing in electrical engineering in the late 1940s: Oliver, Pierce, Shannon (1948) and Bennett (1948).

Analog-to-digital conversion, data compression.

History and overview: Gray and Neuhoff (1998).
Quantization

Let X be an R^d-valued random variable with distribution P and E‖X‖^r < ∞ for fixed 1 ≤ r < ∞.

F_n is the set of all Borel measurable maps f : R^d → R^d with #f(R^d) ≤ n; every f ∈ F_n is called a quantizer and f(X) a quantized version of X.

The n-th quantization error for P of order r is defined by

    V_{n,r}(P) := inf_{f ∈ F_n} E‖X − f(X)‖^r.

In the following we assume |supp P| ≥ n + 1.

A quantizer f is called n-optimal of order r if V_{n,r}(P) = E‖X − f(X)‖^r.
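For a concrete feel for this definition, the distortion E‖X − f(X)‖^r of a fixed quantizer can be estimated by Monte Carlo. A minimal sketch (not from the slides; d = 1, r = 2, two arbitrary codebooks for a standard normal X), illustrating that enlarging the codebook can only decrease the attainable error:

```python
import random

def distortion(points, samples, r=2):
    """Monte Carlo estimate of E ||X - f(X)||^r for the nearest-
    neighbour quantizer f induced by the given points (d = 1)."""
    total = 0.0
    for x in samples:
        total += min(abs(x - a) for a in points) ** r
    return total / len(samples)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(20000)]

# A coarse and a finer codebook: more points -> smaller distortion.
coarse = [-1.0, 0.0, 1.0]
fine = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
assert distortion(fine, sample) < distortion(coarse, sample)
```

The infimum over all quantizers with n points, V_{n,r}(P), is then the best value this estimate can approach as the codebook is optimized.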
Areas of Application
Quantization problems appear in various scientific fields, e.g.
Information theory (signal compression)
Cluster analysis, pattern and speech recognition (quantization
of empirical measures)
Numerical Integration
Simulation of stochastic processes
Infinite-dimensional quantization (Functional Quantization)
Mathematical models in economics
Areas of Application - Information Theory
Scalar and Vector Quantization (block source coding)
Quantization with fixed or variable rate
High resolution quantization (n → ∞, asymptotic results)
Rate-distortion theory
Vector Quantization and Signal Compression, Gersho and Gray (2000)
Voronoi regions and diagrams

(Georgi Voronoi, 1868-1908)

Let α be a (locally) finite subset of R^d and ‖·‖ any norm on R^d. The Voronoi region (or Dirichlet region) generated by a ∈ α is defined as

    W(a|α) := { x ∈ R^d : ‖x − a‖ = min_{b ∈ α} ‖x − b‖ }.

W(a|α) depends on ‖·‖ and is closed and star-shaped relative to a, but not necessarily convex. (The latter holds if ‖·‖ is Euclidean.)

The Voronoi diagram of α (or Dirichlet tessellation)

    { W(a|α) : a ∈ α }

is a (locally) finite covering of R^d.
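The defining condition of W(a|α) translates directly into a membership test. A small sketch (my illustration, not from the slides), assuming the Euclidean norm and points in the plane; note that boundary points lie in several regions, since the regions are closed:

```python
import math

def in_voronoi_region(x, a, alpha):
    """Check x in W(a|alpha): the distance from x to a attains the
    minimum over all centres b in alpha (Euclidean norm, tuples)."""
    return math.dist(x, a) <= min(math.dist(x, b) for b in alpha)

alpha = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
# (0.9, 0) is closer to the origin than to (2, 0) or (0, 2).
assert in_voronoi_region((0.9, 0.0), (0.0, 0.0), alpha)
# A boundary point belongs to both regions (W(a|alpha) is closed).
assert in_voronoi_region((1.0, 0.0), (0.0, 0.0), alpha)
assert in_voronoi_region((1.0, 0.0), (2.0, 0.0), alpha)
```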
A Voronoi diagram

[Figure: a planar Voronoi diagram, generated with Mathematica's DiagramPlot.]
Voronoi regions

Voronoi regions are determined by their neighbouring regions in the following sense.

Lemma
For a ∈ α, let β := {b ∈ α : W(b|α) ∩ W(a|α) ≠ ∅}. Then W(a|α) = W(a|β).
Voronoi partition

A Borel measurable partition {A_a : a ∈ α} is called a Voronoi partition of R^d with respect to α (and a Borel probability measure P) if

    A_a ⊂ W(a|α)    P-a.s. for all a ∈ α.
Geometry and topology

The open Voronoi region generated by a ∈ α is defined as

    W_0(a|α) := { x ∈ R^d : ‖x − a‖ < min_{b ∈ α\{a}} ‖x − b‖ }.

The open regions are pairwise disjoint, but form no covering of R^d.

In general, W_0(a|α) ≠ int W(a|α). (Equality holds if ‖·‖ is strictly convex, i.e. if ‖x‖ = ‖y‖ = 1 implies ‖sx + (1 − s)y‖ < 1 for s ∈ (0, 1).)
Euclidean norms

If ‖·‖ is Euclidean, the following properties are fulfilled.

If α is finite, W(a|α) is polyhedral.
If W(a|α) is bounded, it is polyhedral.
W(a|α) is convex.

Furthermore, convexity of W(a|α) for all α ∈ R^{d·n} and a ∈ α holds if and only if ‖·‖ is Euclidean (Mann (1935)).

W(a|α) is bounded if and only if a ∈ int conv α.
Boundary theorem

Theorem
Each of the following conditions implies λ^d(∂W(a|α)) = 0 for all a ∈ α:
1. The underlying norm is strictly convex.
2. The underlying norm is the l_p-norm with 1 ≤ p ≤ ∞.
3. d = 2.
Tessellations

For a Borel set C ⊂ R^d and a Borel measure P on R^d, a P-tessellation is a countable covering {C_n : n ∈ N} of C with Borel sets C_n ⊂ C such that P(C_n ∩ C_m) = 0 for n ≠ m. A λ^d-tessellation is simply called a tessellation.

Proposition
A Voronoi diagram is a P-tessellation of R^d if and only if

    P( R^d \ ⋃_{a ∈ α} W_0(a|α) ) = 0.

A Voronoi diagram with respect to a strictly convex norm is a λ^d-tessellation of R^d.
Quantization, Centers and Voronoi partitions

For fixed n ∈ N, searching for an optimal quantizer is equivalent to the n-centers problem:

Lemma

    V_{n,r}(P) = inf_{α ⊂ R^d, #α ≤ n} E[ min_{a ∈ α} ‖X − a‖^r ].

α is called an n-optimal set of centers for P of order r if it realizes the infimum on the right-hand side.

Proof.
"≥": For f we set α := f(R^d).
"≤": For α we choose a Voronoi partition {A_a} and set f = Σ_{a ∈ α} a · 1_{A_a}.

Weighted sum (or integral) of distances ⇔ mass transportation problem.
Nearest neighbour rule ⇔ optimal redistribution, Dupačová.
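The "≤" direction of the proof is constructive and easy to check numerically: the quantizer f = Σ a·1_{A_a} built from a set of centres α via the nearest-neighbour rule attains E[min_{a∈α} ‖X − a‖^r] exactly. A minimal sketch on an empirical sample (my illustration; d = 1, r = 2, arbitrary centres):

```python
import random

def nearest(x, alpha):
    """The quantizer f built from centres alpha via the
    nearest-neighbour rule (a Voronoi partition in d = 1)."""
    return min(alpha, key=lambda a: abs(x - a))

random.seed(1)
alpha = [-1.0, 0.0, 1.5]
sample = [random.gauss(0.0, 1.0) for _ in range(5000)]

# E||X - f(X)||^2 for the induced quantizer ...
lhs = sum(abs(x - nearest(x, alpha)) ** 2 for x in sample) / len(sample)
# ... equals E[min_a ||X - a||^2] over the same centres.
rhs = sum(min(abs(x - a) ** 2 for a in alpha) for x in sample) / len(sample)
assert abs(lhs - rhs) < 1e-12
```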
Quantization, Centers and Voronoi partitions

The quantization problem is equivalent to the problem of approximating P by a discrete probability measure with at most n supporting points.

For Borel probability measures P_1, P_2 with ∫ ‖x‖^r dP_i(x) < ∞, let

    ρ_r(P_1, P_2) := inf_{μ : π_i μ = P_i, i = 1, 2} ( ∫ ‖x − y‖^r dμ(x, y) )^{1/r},

the L^r-Wasserstein metric.

By P_n we denote the set of all discrete probability measures Q with |supp Q| ≤ n.
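On the real line the Wasserstein distance between two empirical measures with equally many atoms has a well-known closed form (not stated in the slides): the optimal coupling is the monotone rearrangement, i.e. it pairs the sorted samples. A sketch:

```python
def wasserstein_r_1d(xs, ys, r=2):
    """rho_r between two empirical measures with equally many atoms
    on R: the optimal coupling pairs the sorted samples."""
    assert len(xs) == len(ys)
    n = len(xs)
    s = sum(abs(x - y) ** r for x, y in zip(sorted(xs), sorted(ys)))
    return (s / n) ** (1.0 / r)

# Approximating five atoms by a measure with two support points
# (given as a list of atoms with multiplicities 3 and 2).
xs = [0.0, 1.0, 2.0, 3.0, 5.0]
ys = [1.5, 1.5, 1.5, 4.0, 4.0]
assert wasserstein_r_1d(xs, xs) == 0.0
assert wasserstein_r_1d(xs, ys) > 0.0
```

Minimizing this distance over all Q with at most n support points is exactly the quantization problem of the next Lemma.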
Quantization, Centers and Voronoi partitions

Lemma

    V_{n,r}(P) = inf_{f ∈ F_n} ρ_r^r(P, P_f) = inf_{Q ∈ P_n} ρ_r^r(P, Q).

This yields the following stability result for the functional V_{n,r}(·): for P_i, i = 1, 2, with ∫ ‖x‖^r dP_i(x) < ∞ we have

    |V_{n,r}(P_1)^{1/r} − V_{n,r}(P_2)^{1/r}| ≤ ρ_r(P_1, P_2).

The functional V_{n,r}(·) is concave:

Lemma
Let P = Σ_{i=1}^m s_i P_i with Σ_{i=1}^m s_i = 1, s_i ≥ 0 and ∫_{R^d} ‖x‖^r dP_i(x) < ∞. Then

    V_{n,r}(P) ≥ Σ_{i=1}^m s_i V_{n,r}(P_i).

If n_i ∈ N with Σ_{i=1}^m n_i = n, we have V_{n,r}(P) ≤ Σ_{i=1}^m s_i V_{n_i,r}(P_i).
Asymptotics - Zador's Theorem

Lemma
If E‖X‖^r < ∞, then lim_{n→∞} V_{n,r}(P) = 0.

Theorem (Zador (1962), Bucklew and Wise (1982), Graf and Luschgy (2000))
Suppose E‖X‖^{r+δ} < ∞ for some δ > 0. Let P_a be the absolutely continuous part of P and

    Q_{r,d} := inf_{n ≥ 1} n^{r/d} V_{n,r}(U[0,1]^d).

Then Q_{r,d} > 0 and

    lim_{n→∞} n^{r/d} V_{n,r}(P) = Q_{r,d} · ‖ dP_a/dλ^d ‖_{d/(d+r)}.

If P is singular, this yields only V_{n,r}(P) = o(n^{-r/d}).

Q_{r,d} is unknown for d ≥ 3; one knows that

    Q_{r,1} = 1/(2^r (r+1)),    Q_{2,2} = 5/(18√3),    and    Q_{r,d} ∼ (d/(2πe))^{r/2} as d → ∞.
Asymptotics

Theorem (Asymptotically optimal quantizer point weights)
Suppose P ≪ λ^d, g = dP/dλ^d, ‖·‖ = ‖·‖_2 and d = 1. Let α_n be an optimal quantizer of P. Then, as n → ∞,

    P[W(a_{i,n}|α_n)] ∼ (1/n) · g(a_{i,n})^{2/(d+2)} / ∫_{R^d} g(x)^{2/(d+2)} dx

(uniformly in a_{i,n}, i = 1, …, n, on every compact set). For d > 1 this holds as a conjecture.
[Figure: point weights and limit function for N(0, 1) and n = 50.]
Application: Numerical Integration

Problem: evaluation of E[f(X)].

Approximation by E[f(X̂)], where X̂ = Σ_{i=1}^n a_i · 1_{A_{a_i}}(X) is a quantized version of X.

X̂ is a discrete random variable:

    E[f(X̂)] = Σ_{i=1}^n f(a_i) · P_X[A_{a_i}].
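The finite sum above can be evaluated once the cell probabilities P_X[A_{a_i}] are known. A sketch (my illustration, not from the slides; d = 1, a crude uniform codebook, cell weights estimated from a sample) against a case with a known answer, E[cos X] = e^{-1/2} for X ~ N(0, 1):

```python
import math
import random

def quantized_expectation(f, centres, sample):
    """E[f(X_hat)] = sum_i f(a_i) * P_X[A_{a_i}], with the cell
    probabilities estimated via the nearest-neighbour rule."""
    counts = {a: 0 for a in centres}
    for x in sample:
        counts[min(centres, key=lambda a: abs(x - a))] += 1
    n = len(sample)
    return sum(f(a) * counts[a] / n for a in centres)

random.seed(2)
sample = [random.gauss(0.0, 1.0) for _ in range(50000)]
centres = [-2.5 + 0.5 * i for i in range(11)]   # crude uniform codebook

approx = quantized_expectation(math.cos, centres, sample)
exact = math.exp(-0.5)   # E[cos X] = e^(-1/2) for X ~ N(0, 1)
assert abs(approx - exact) < 0.05
```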
Application: Numerical Integration

Error bounds for |E[f(X)] − E[f(X̂)]|:

If f ∈ C^1 with bounded Df:

    C · ‖X − X̂‖_2 ∼ O(n^{-1/d}).

If f ∈ C^1 with Lipschitz-continuous Df and X̂ is a ‖·‖_2-stationary quantizer:

    C · ‖X − X̂‖_2^2 ∼ O(n^{-2/d}).

Monte-Carlo methods: O(n^{-1/2}). Comparisons of MC confidence intervals with V_{n,r}: Pagès, Pham, Printems (2004).

(Asymptotically) critical dimension d = 4. Quantization seems efficient for n not too large, up to d = 10.
Asymptotics in the infinite-dimensional case

Less is known if P is defined on an infinite-dimensional Banach space (e.g. P = Wiener measure on C([0,1], ‖·‖_sup)).

Application: numerical integration of functionals on Banach spaces, e.g. path-dependent options.

In the Wiener case we know (Dereich and Scheutzow (2005))

    lim_{n→∞} (ln n)^{1/2} V_{n,r}(P) = c

with some constant c > 0.

V_{n,r} is the smallest worst-case error that can be achieved by any deterministic algorithm with computational cost ≤ n (for numerical integration of Lipschitz continuous functionals).

For random algorithms for diffusions we have (Dereich, Müller-Gronbach, Ritter (2006))

    lim_{n→∞} n^{1/4} · (ln n)^x V_{n,r}^{random}(P) = c,

where x ∈ [−1/4, 3/4]; x = −1/4 is realized for the Euler Monte-Carlo algorithm.
The Quantization Problem

How to find optimal quantizers, n-centers or reduced probabilities?

Problem: the map

    ψ_{n,r} : (R^d)^n → R_+,    ψ_{n,r}(a_1, …, a_n) = E[ min_{1≤i≤n} ‖X − a_i‖^r ]

is continuous, but typically not convex for n ≥ 2.
Existence of optimal sets of centers

Theorem
We have V_{n,r} < V_{n−1,r}. The level set {ψ_{n,r} ≤ c} is compact for every 0 ≤ c < V_{n−1,r}; hence optimal sets of centers exist and lie in a bounded set.

Lemma
Suppose the Voronoi diagram of α = (a_1, …, a_n) is a P-tessellation of R^d. Then ψ_{n,r} has a one-sided directional derivative at α in every direction y ∈ (R^d)^n, given by

    ∇⁺ψ_{n,r}(α, y) = r Σ_{i=1}^n ∫_{W(a_i|α)} ‖x − a_i‖^{r−1} · ∇⁺‖·‖(a_i − x, y) dP(x).

The condition of the Lemma is fulfilled if ‖·‖ is strictly convex and P is absolutely continuous with respect to λ^d. If additionally ‖·‖ is differentiable on R^d \ {0} and (r > 1 or P[α] = 0), then ψ_{n,r} is differentiable at α.
Stationarity - necessary for optimality

Theorem
Let α be an n-optimal set of centers of order r. Then |α| = n, for every a ∈ α we have P[W(a|α)] > 0, and a is a center of order r of P[·|W(a|α)].

A set α ⊂ R^d with |α| = n fulfilling the above condition is called an n-stationary set of centers for P of order r.

Every n-stationary set of centers α is a stationary point of ψ_{n,r}, i.e. ∇⁺ψ_{n,r}(α, y) ≥ 0 for all y ∈ (R^d)^n.

Stationary sets of centers are not necessarily local minima of ψ_{n,r}. Lloyd (1982)
Lloyd's algorithm I (k-means method)

Necessary conditions for optimality of a quantizer f = Σ_{a ∈ α} a · 1_{A_a} with certain Borel sets A_a:

{A_a : a ∈ α} is a Voronoi partition with respect to α ("nearest neighbour property").

a is a center of order r of P[·|A_a] ("centroid property").

Lloyd's algorithm I (Steinhaus (1956), Lloyd (1957))
1. Select an initial set α of n points a_i.
2. Determine a Voronoi partition {A_a : a ∈ α} with respect to α.
3. Choose the center c_i of A_{a_i} and update α by setting a_i := c_i.
4. If E[min_{1≤i≤n} ‖X − a_i‖^r] does not satisfy a convergence criterion: back to step 2.
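The steps above can be sketched directly for an empirical measure, where the integrals reduce to sample means (d = 1, r = 2; a fixed iteration count stands in for the convergence criterion, and the starting codebook is an arbitrary, deliberately poor choice):

```python
import random

def lloyd(sample, centres, iterations=30):
    """Lloyd's algorithm (r = 2, d = 1) on an empirical measure:
    alternate nearest-neighbour partition and centroid update."""
    centres = list(centres)
    for _ in range(iterations):
        # Step 2: Voronoi partition of the sample with respect to alpha.
        cells = {i: [] for i in range(len(centres))}
        for x in sample:
            i = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            cells[i].append(x)
        # Step 3: replace each point by the centroid (mean) of its cell.
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in cells.items()]
    return centres

def distortion(sample, centres):
    return sum(min((x - a) ** 2 for a in centres) for x in sample) / len(sample)

random.seed(3)
sample = [random.gauss(0.0, 1.0) for _ in range(5000)]
start = [-3.0, -2.5, 2.5, 3.0]        # a poor initial codebook
improved = lloyd(sample, start)
assert distortion(sample, improved) < distortion(sample, start)
```

Each iteration can only decrease the empirical distortion, which is the "descending algorithm" property stated on the next slide.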
Lloyd's algorithm I (k-means method)

Descending algorithm, converging to an n-stationary set of centers (d = 1).

Integrals have to be computed; this is easily done if X (or P) is discrete.

"...repeated applications of the Lloyd algorithm with different initial conditions has also proved effective in avoiding local optima." Gray and Neuhoff (1998)

Lloyd's algorithm can be used to improve the results of other methods.

A straightforward implementation can be quite slow (costly computation of nearest neighbours). Recent implementation: Kanungo, Mount, Netanyahu, Piatko, Silverman, Wu (2002).
Lloyd's algorithm I (k-means method)

Combination of Lloyd's algorithm with global optimization techniques to avoid local minima (simulated annealing / stochastic relaxation).

Generalized Vector Quantization (Möller, Galicki, Baresova, Witte (1998))
1. Select an initial set α of n points a_i.
2. Set α̃ := α + ξ with ξ ∼ N(0, σ² · Id_{d·n}).
3. Determine a Voronoi partition {A_a : a ∈ α̃} with respect to α̃.
4. Choose the center c_i of A_{ã_i} and update α̃ by setting ã_i := c_i.
5. If E[min_{1≤i≤n} ‖X − ã_i‖^r] < E[min_{1≤i≤n} ‖X − a_i‖^r], update α := α̃ and set σ := σ · ex with ex > 1; else set σ := σ · ct with ct < 1.
6. If E[min_{1≤i≤n} ‖X − a_i‖^r] does not satisfy a convergence criterion and σ > σ_lb: back to step 2.
Pairwise Nearest Neighbor Design

Heuristics for scenario reduction, or for finding a codebook from a training sequence.

PNN (Equitz (1987))
1. Select an initial set α of N > n points a_i, interpreted as clusters (each containing a single point).
2. Find the pair of clusters with minimal increase in distortion if they are merged.
3. Replace the two clusters and their centroids by the merged cluster and the corresponding centroid.
4. If the number of remaining clusters exceeds n, back to step 2.

cp. backward reduction
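A sketch of the greedy merge loop (my illustration; d = 1, r = 2, equally weighted input points). For squared error, the increase in distortion when merging two clusters with centroids c1, c2 and weights w1, w2 has the closed form w1·w2/(w1+w2)·(c1 − c2)²:

```python
def pnn(points, n):
    """Pairwise Nearest Neighbor sketch (d = 1, r = 2): greedily merge
    the two clusters whose merge increases distortion least."""
    # Each cluster: (centroid, weight, within-cluster distortion).
    clusters = [(x, 1, 0.0) for x in points]
    while len(clusters) > n:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (c1, w1, _), (c2, w2, _) = clusters[i], clusters[j]
                inc = w1 * w2 / (w1 + w2) * (c1 - c2) ** 2
                if best is None or inc < best[0]:
                    best = (inc, i, j)
        inc, i, j = best
        (c1, w1, d1), (c2, w2, d2) = clusters[i], clusters[j]
        merged = ((w1 * c1 + w2 * c2) / (w1 + w2), w1 + w2, d1 + d2 + inc)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [c for c, _, _ in clusters]

# Two well-separated groups collapse onto their two centroids.
codebook = sorted(pnn([0.0, 0.1, 0.2, 10.0, 10.1], 2))
assert abs(codebook[0] - 0.1) < 1e-9 and abs(codebook[1] - 10.05) < 1e-9
```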
Kohonen algorithm - Competitive Learning Vector Quantization

Heuristics to determine a "representative" set α = (a_1, …, a_n).

Kohonen algorithm
1. Select an initial set α of n points a_i and set j_i := 1, i = 1, …, n.
2. Take a sample y of P and find the a_{i*} closest to y. (competitive phase)
3. Update a_{i*} := (j_{i*} · a_{i*} + y) / (j_{i*} + 1) and j_{i*} := j_{i*} + 1. (learning phase)
4. If α does not satisfy a convergence criterion: back to step 2.

Convergence of the approximation error was proved by MacQueen (1967).
Alternatively, update every a_i depending on ‖y − a_i‖.
Varying weighting scheme (or stepsize), cp. the stochastic gradient method of Pagès and Printems (2003).
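The algorithm above can be sketched for samples from N(0, 1) (my illustration; d = 1, a fixed number of steps instead of a convergence criterion, initial points drawn from P itself). The stepsize 1/(j_{i*}+1) makes each centre track the running mean of the samples that fell into its cell:

```python
import random

def clvq(n, sampler, steps=20000, seed=4):
    """Kohonen/CLVQ sketch (d = 1): competitive phase picks the nearest
    centre, learning phase moves it towards the sample."""
    rng = random.Random(seed)
    alpha = [sampler(rng) for _ in range(n)]   # initialize from P itself
    counts = [1] * n                           # the j_i of the slides
    for _ in range(steps):
        y = sampler(rng)
        i = min(range(n), key=lambda k: abs(y - alpha[k]))        # competitive
        alpha[i] = (counts[i] * alpha[i] + y) / (counts[i] + 1)   # learning
        counts[i] += 1
    return sorted(alpha)

alpha = clvq(5, lambda rng: rng.gauss(0.0, 1.0))
# Centres spread around 0 within the bulk of N(0, 1).
assert alpha[0] < 0.0 < alpha[-1]
assert all(-6.0 < a < 6.0 for a in alpha)
```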
The Gaussian case

With r = 2 and ‖·‖ = ‖·‖_2, the CLVQ approach appears as a stochastic gradient method in Pagès, Printems (2003); Pham, Runggaldier, Sellami (2004); Pagès, Pham, Printems (2004); Bally, Pagès, Printems (2005).

    ∂ψ_{n,r}/∂a_i (α) = 2 ∫_{W(a_i|α)} (a_i − x) P(dx),    i = 1, …, n,

    ∇ψ_{n,r}(α) =: ∫_{R^d} G(α, x) P(dx).

A gradient-based approach would require computation of the dP-expectation.

Stochastic gradient approach: take a P-sample ξ_{m+1} and update

    α_{m+1} = α_m − γ_{m+1} G(α_m, ξ_{m+1}).
The Gaussian case

This means setting

    a_i := a_i − γ_{m+1}(a_i − ξ_{m+1})    if ξ_{m+1} ∈ W(a_i|α),
    a_i := a_i                             if ξ_{m+1} ∉ W(a_i|α).

Most time-consuming task: computation of ξ_{m+1}'s nearest neighbour.

The fastest rate of convergence (depending on (γ_m)) is √m (CLT). Unfortunately, conditions for P-a.s. convergence are not fulfilled.
The Gaussian case

[Figure: quantizer of N(0, Id_2) with n = 500, and the same quantizer with its Voronoi diagram.]
The Gaussian case

Application: discretization of (controlled) diffusions driven by Brownian motion, to solve stochastic control problems numerically:

    dX_t = μ(X_t, t) dt + σ(X_t, t) dW_t,    0 ≤ t ≤ T.

1. Time discretization (0 = t_1 < … < t_M = T), e.g. Euler scheme.
2. Simultaneous spatial discretization through optimal quantization:
   Heuristics for the optimal allocation of n_{t_i} points to the i-th quantization problem, i = 1, …, M, for fixed n = Σ_{i=1}^M n_{t_i}.
   Sample the vector (discrete-time process) (X_{t_1}, …, X_{t_M}).
   Update the quantizer α_{t_i} with X_{t_i}, i = 1, …, M.
   Estimation of transition probabilities.
3. Define a discretized process X̂ which is a Markov chain. The Markov property allows using a recombining tree.
Stability

This yields a process X̂ taking values in a finite set with (almost) minimal Σ_{k=1}^M E‖X_{t_k} − X̂_{t_k}‖.

Problem: in general it is not enough to consider this stagewise disturbance. Heitsch, Römisch, Strugarek (2005)

Obviously, quantization (or clustering scenarios) at time t_k can considerably change the optimal value of the optimization problem if

    P[ X_{t_{k+1}} ∈ ·, …, X_{t_M} ∈ · | X_{t_k} = x ]

does not depend continuously on x. In which sense? Under which conditions on X?
Stability

Assumption of Bally, Pagès, Printems (2005)
For every k = 1, …, M − 1 there exists a constant K > 0 such that for every Lipschitz continuous mapping f : R^d → R with Lipschitz constant [f]_lip, the mapping

    P_t f : R^d → R,    x ↦ E[ f(X_{t_{k+1}}) | X_{t_k} = x ]

is Lipschitz continuous with Lipschitz constant K · [f]_lip. (In a Markovian, continuous-time framework, this condition is known as the Feller property.)

This is fulfilled by a variety of stochastic processes and guarantees stability for the problem of Bally et al. (American option pricing).

Perspective: extension of the stability result to more general multistage stochastic programs.
Conclusion

Quantization - an old problem with a wide range of applications.
A non-convex, very "bumpy" optimization problem.
A variety of heuristically motivated algorithms.
Active fields of research.

Thank you very much.