Quad-trees, efficient networks, inferred fibres 2011 W-CAS Afternoon Wilfrid S. Kendall

advertisement
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
2011-11-30
Introduction
2011-WCAS
Quad-trees, efficient networks, inferred fibres
2011 W-CAS Afternoon
Wilfrid S. Kendall
w.s.kendall@warwick.ac.uk
Department of Statistics, University of Warwick
30th November 2011
Quad-trees, efficient networks, inferred fibres
2011 W-CAS Afternoon
Wilfrid S. Kendall
w.s.kendall@warwick.ac.uk
Department of Statistics, University of Warwick
30th November 2011
1
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
Introduction
This talk gives brief sketches of three snippets of research.
1. Quad-trees: image analysis in depth;
2. Efficient networks: how to build fast networks that connect
efficiently;
3. Inferred fibres: guessing curves given an associated point
pattern.
Common theme: Using probability to build useful models.
2
2011-11-30
Introduction
2011-WCAS
Introduction
Introduction
This talk gives brief sketches of three snippets of research.
1. Quad-trees: image analysis in depth;
Introduction
2. Efficient networks: how to build fast networks that connect
efficiently;
3. Inferred fibres: guessing curves given an associated point
pattern.
Common theme: Using probability to build useful models.
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
Ising images
Consider modelling a binary image using an Ising model.
1. Do this by envisioning an ideal image as a finite cartesian
lattice, with bond strengths J1 expressing the thought that
neighbouring pixels are likely to be similar (a “local
Bayesian prior”). The actual observed image is a duplicate
lattice in which neighbouring pixels are un-related to each
other, but relate to corresponding ideal pixels by bonds of
strength K .
2011-11-30
Introduction
2011-WCAS
Quad-trees
Ising images
Ising images
Consider modelling a binary image using an Ising model.
1. Do this by envisioning an ideal image as a finite cartesian
lattice, with bond strengths J1 expressing the thought that
neighbouring pixels are likely to be similar (a “local
Bayesian prior”). The actual observed image is a duplicate
lattice in which neighbouring pixels are un-related to each
other, but relate to corresponding ideal pixels by bonds of
strength K .
2. We can simulate this using the heat bath algorithm
described above. However we wish to simulate from the
Ising model conditioned on the observed noisy image.
Because the heat bath algorithm is reversible, we simply fix
observed pixels!
3. The heat-bath algorithm can be viewed as a collection of
correlated but very simple reflected random walks.
ANIMATION
1. This is of course a very simple model. To make it slightly more realistic, one
might introduce diagonal bonds of different strengths J2 .
2. Super-critical parameters are best, so the algorithm could be quite slow.
3. We can even implement CFTP!
2. We can simulate this using the heat bath algorithm
described above. However we wish to simulate from the
Ising model conditioned on the observed noisy image.
Because the heat bath algorithm is reversible, we simply fix
observed pixels!
3. The heat-bath algorithm can be viewed as a collection of
correlated but very simple reflected random walks.
ANIMATION
3
Quad-trees
Efficient networks
Inferred fibres
Conclusion
Multiresolution (I)
References
2011-11-30
Introduction
2011-WCAS
Quad-trees
Multiresolution (I)
Multiresolution (I)
Kendall and Wilson (2003): Ising model built on a quadtree;
different parent-child and horizontal neighbour connection
strengths (Jτ , Jλ ).
Question: in which range of parameters is the model suitable
for image analysis?
The motivation arises from image recognition algorithms in computer science, which
use hierarchical networks to model the scenes to be analyzed.
Tricks and problems:
1.
2.
3.
4.
Kendall and Wilson (2003): Ising model built on a quadtree;
different parent-child and horizontal neighbour connection
strengths (Jτ , Jλ ).
Question: in which range of parameters is the model suitable
for image analysis?
4
Failure of symmetry!
Hierarchical structure lends itself to structured images.
Connections across sub-trees mitigate “blocky” structures.
FKG inequalities allow one to relate Ising model (tricky) to percolation (slightly
less tricky).
5. One looks for phenomena occurring (a) for small parent-child interactions (layers
of 2-d Ising models), and (b) for small neighbour interactions (nearly like tree
models – but some very significant differences).
6. Tree-like structure is rather crucial to a significant part of the analysis.
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
2011-11-30
Introduction
Multiresolution (II)
http://www.dcs.warwick.ac.uk/˜rgw/sira/sim.html
(a) Jλ = 1, Jτ = 0.5
(b) Jλ = 1, Jτ = 1
(c) Jλ = 1, Jτ = 2
(d) Jλ = 0.5, Jτ = 0.5
(e) Jλ = 0.5, Jτ = 1
(f) Jλ = 0.5, Jτ = 2
(g) Jλ = 0.25, Jτ = 0.5
(h) Jλ = 0.25, Jτ = 1
(i) Jλ = 0.25, Jτ = 2
2011-WCAS
Quad-trees
Multiresolution (II)
Multiresolution (II)
http://www.dcs.warwick.ac.uk/˜rgw/sira/sim.html
(a) Jλ = 1, Jτ = 0.5
(b) Jλ = 1, Jτ = 1
(c) Jλ = 1, Jτ = 2
(d) Jλ = 0.5, Jτ = 0.5
(e) Jλ = 0.5, Jτ = 1
(f) Jλ = 0.5, Jτ = 2
(g) Jλ = 0.25, Jτ = 0.5
(h) Jλ = 0.25, Jτ = 1
(i) Jλ = 0.25, Jτ = 2
1. Only 200 resolution levels;
2. At each level, 1000 sweeps in scan order;
3. At each level, simulate square sub-region of 128 × 128 pixels conditioned by
mother 64 × 64 pixel region;
4. Impose periodic boundary conditions on 128 × 128 square region;
5. At the coarsest resolution, all pixels set white. At subsequent resolutions, ‘all
black’ initial state.
6. Careful analytical work using percolation and FKG comparison inequalities
isolates a regime in which (in the infinite variant) there is a single infinite cluster.
That captures the regime within which one might expect good image modelling.
There is still much to be done here!
5
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
An ancient optimization problem
A Roman
Emperor’s
dilemma:
2011-11-30
Introduction
2011-WCAS
Efficient networks
An ancient optimization problem
An ancient optimization problem
CON: Roads are expensive
to build and maintain;
Pro optimo
quod faciendum est?
6
PRO: Roads are needed to
move legions quickly around
the country;
CON: Roads are expensive
to build and maintain;
Pro optimo
quod faciendum est?
We begin by reviewing the work of Aldous and WSK (2008).
An early illustration of this trade-off: Roman emperors must have had to face the
optimization problem, how many Roman roads to build?
I owe the Latin comment to my erudite colleague Saul Jacka and my learned friend
Diana Barclay.
PRO: Roads are needed to
move legions quickly around
the country;
A Roman
Emperor’s
dilemma:
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
2011-11-30
Introduction
A problem in frustrated optimization
√
Consider N cities x (N) = {x1 , . . . , xN } in square side N.
Assess road network G = G(x (N) ) connecting cities by:
network total road length len(G)
(minimized by Steiner minimum tree ST(x (N) ));
versus
average network distance between two random cities,
��
1
average(G) =
distG (xi , xj ) ,
N(N − 1)
A problem in frustrated optimization
i�=j
(minimized by laying tarmac for complete graph).
Perhaps the average ratio would be a good measure of
performance?
� � distG (xi , xj )
1
i�=j
�xi − xj �
One might reasonably suppose, in order to get average(G(x (N) )) close to the
Euclidean distance (“as the crow flies”), one needs substantially more than the
minimum possible distance (len(ST(x (N) )) = O(N) where ST(x (N) ) is the Steiner
minimum tree for the configuration x (N) ).
We note in passing that computing the Steiner minimum tree is typically difficult
(NP-complete!), though approximation (in planar case) is feasible using randomized
algorithms.
√
Notice: generally we expect average(G(x (N) )) ≥ O( N) while it will be minimized for
the complete planar graph, for which len(G(x (N) )) = O(N 5/2 ).
(minimized by laying tarmac for complete graph).
Perhaps the average ratio would be a good measure of
performance?
� � distG (xi , xj )
1
i�=j
A problem in frustrated optimization
√
Consider N cities x (N) = {x1 , . . . , xN } in square side N.
Assess road network G = G(x (N) ) connecting cities by:
network total road length len(G)
(minimized by Steiner minimum tree ST(x (N) ));
versus
average network distance between two random cities,
��
1
average(G) =
distG (xi , xj ) ,
N(N − 1)
N(N − 1)
i�=j
N(N − 1)
2011-WCAS
Efficient networks
�xi − xj �
7
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
Asymptotics
Theorem
Careful asymptotics for n → ∞ show that
E
�1
�
=
2 len ∂Cx,y
��
�
�
n + 14
(α − sin α) exp − 12 (η − n) d z ≈
R2
�
�
4
5
n+
log n + γ +
3
3
where γ = 0.57721 . . . is the Euler-Mascheroni constant.
Thus a unit-intensity invariant Poisson line process is within
O(log n) of providing connections which are as efficient as
Euclidean connections.
8
2011-11-30
Introduction
2011-WCAS
Efficient networks
Asymptotics
Theorem
Careful asymptotics for n → ∞ show that
E
Asymptotics
�1
2
�
len ∂Cx,y
=
��
�
�
n + 14
(α − sin α) exp − 12 (η − n) d z ≈
R2
�
�
4
5
n+
log n + γ +
3
3
where γ = 0.57721 . . . is the Euler-Mascheroni constant.
Thus a unit-intensity invariant Poisson line process is within
O(log n) of providing connections which are as efficient as
Euclidean connections.
Considerable analytical work required here. The error incurred by the asymptotic can
1
be bounded by constant × 1/3
.
n
Euler-Mascheroni constant γ:
n
�
1
− log n
m
1
→
γ
as n → ∞ .
It is amusing that it is not yet known whether γ is irrational. Computed to about 2 billion
digits of accuracy (information from Wikipedia . . . if you believe that sort of source).
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
2011-11-30
Introduction
Three typical application contexts:
2011-WCAS
Inferred fibres
Three typical application contexts:
Three typical application contexts:
1. Fingerprint sweat pores
Extracted from fingerprint a002-5 from NIST Special database 30 (Watson 2001).
2. Earthquake epicentres
Epicentres in New Madrid region,
taken from CERI (Center for Earthquake Research and Information).
3. Universe within 500 Mly
Image: Richard Powell (atlasoftheuniverse.com/nearsc.html:
Creative Commons Attribution-ShareAlike 2.5 License).
Great variation in length scales between different datasets!
Another interesting application: minefields – mines tend to be laid along (curved) paths.
1. Fingerprints: typical scale 2.0 × 10−5 km
2. Earthquakes: typical scale 100 km
3. Universe: typical scale 4.7 × 1021 km
1. Fingerprint sweat pores
Extracted from fingerprint a002-5 from NIST Special database 30 (Watson 2001).
2. Earthquake epicentres
Epicentres in New Madrid region,
taken from CERI (Center for Earthquake Research and Information).
3. Universe within 500 Mly
Image: Richard Powell (atlasoftheuniverse.com/nearsc.html:
Creative Commons Attribution-ShareAlike 2.5 License).
9
Quad-trees
Efficient networks
Inferred fibres
Conclusion
Our statistical model (I)
Formulation using points clustered around curvilinear fibres
References
2011-11-30
Introduction
2011-WCAS
Inferred fibres
Our statistical model (I)
Formulation using points clustered around curvilinear fibres
Our statistical model (I)
We aspire to a “statistically principled” approach!
Observed points are attached to “anchor points” distributed along fragments of integral
curves of the underlying orientation field.
10
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
2011-11-30
Introduction
Our statistical model (II)
Construction of fibres as integral curves of orientation field
2011-WCAS
Inferred fibres
Our statistical model (II)
Construction of fibres as integral curves of orientation field
Our statistical model (II)
Form a Poisson process of finite-length fibres.
One could view this as a finite sample of the spacings achieved by cutting a “long line”
according to a Poisson process – though this presents undesirable measure-theoretic
complications!
Calculation can be done to ensure constant length intensity per unit area, at price of
inhomogeneous process of seeds marked by lengths.
We ignore issues to do with (a) re-entrant fibres, (b) window censoring.
11
Quad-trees
Efficient networks
Inferred fibres
Our statistical model (III)
Building up a (simplified) DAG
Conclusion
References
2011-11-30
Introduction
2011-WCAS
Inferred fibres
Our statistical model (III)
Building up a (simplified) DAG
Our statistical model (III)
Mark points are placed on fibres (a) according to a Poisson process (in simplest case);
or (b) using a continuous-time renewal process with Gamma-distributed spacings, to
allow for clustering or order. In both cases stationarity is imposed.
Note the extra complexity in the model: points are not classified as signal or noise, but
are given parameters specifying probability of their being signal or noise (a latent or
“hidden variable” approach).
12
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
2011-11-30
Introduction
Our statistical model (IV)
DAG of full model
2011-WCAS
Inferred fibres
Our statistical model (IV)
DAG of full model
Our statistical model (IV)
The full picture is quite involved.
13
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
Orientation Field
Calculating an appropriate orientation field is key.
Possible approaches include using random field theory, eg
extending a Gaussian field . . .
. . . but the configuration space of orientation fields is huge.
Use Empirical Bayes to evade resulting problems.
14
2011-11-30
Introduction
2011-WCAS
Inferred fibres
Orientation Field
Calculating an appropriate orientation field is key.
Possible approaches include using random field theory, eg
extending a Gaussian field . . .
Orientation Field
. . . but the configuration space of orientation fields is huge.
Use Empirical Bayes to evade resulting problems.
The space of possible configurations of random fields has dimension which is very high
(even infinite), and we may expect strict Bayesian inference to perform poorly here for
all sorts of reasons.
Quad-trees
Efficient networks
Inferred fibres
Conclusion
References
2011-11-30
Introduction
Fingerprints
Estimate of clustering of signal points
2011-WCAS
Inferred fibres
Fingerprints
Estimate of clustering of signal points
Fingerprints
Note the clear classification obtained here.
Posterior Probabilities for Number of Fibres
Number of Fibres
15
16
17
Posterior Probability
0.17
0.17
0.25
Other Properties Conditioned on the Number of Fibres
Number of Posterior
50% HPD
Fibres
Mean
Interval
16
15.00
[12,17]
Number of Noise Points
17
18.86
[16,19]
95th Percentile of the
16
3.00
[2.67,2.95]
Distances from Signal
17
3.10
[2.86,3.16]
Points to Fibres
16
864.75
[836,886]
Total Length of Fibres
17
814.43
[788,804]
18
0.11
95% HPD
Interval
[8,21]
[16,23]
[2.50,3.43]
[2.78,3.49]
[784,927]
[788,878]
15
Quad-trees
Efficient networks
Inferred fibres
Conclusion
Conclusion
Bespoke probability models for interesting situations.
16
References
2011-11-30
Introduction
2011-WCAS
Conclusion
Conclusion
Conclusion
Bespoke probability models for interesting situations.
Introduction
Quad-trees
Efficient networks
Inferred fibres
Conclusion
Aldous, D. J. and WSK (2008, March).
Short-length routes in low-cost networks via Poisson line
patterns.
Advances in Applied Probability 40(1), 1–21.
Kendall, W. S. and R. G. Wilson (2003, March).
Ising models and multiresolution quad-trees.
Advances in Applied Probability 35(1), 96–122.
Watson, C. (2001).
Dual Resolution Images from Paired Fingerprint Cards.
17
References
Download