Medical Image Classification with
Advanced Markov Random Field/Gibbs
Classification
A Dissertation Proposal
Zhihong Yang
May 30, 2000
Advisory Committee
Dr. Ian R. Greenshields
Dr. Howard Sholl
Dr. Reda Ammar
5/30/00
Outline
 Medical image classification and its time-consuming nature
 Introduction to MRF/Gibbs classification
 Previous work and the novelty of this study
 Methods and techniques to be used
 Partial parallel algorithm
 Ensemble parallel algorithm
 Locally adaptive cooling schedule based on multiresolution tiling
 Summary of research goals and expected contributions
 Result evaluation plan
 Availability and location of research facilities
Medical Image Classification with
MRF/Gibbs Methods--1/2
 Image classification is a procedure by which desired information is extracted from original image data through a designed algorithm.
 Four elements are involved in the definition of image classification: original data, classified data, classification algorithm, and estimation criterion.
 The scale of the image classification problem:
Original input data: a 256 x 256 lattice grey-level image
Classified image: a 256 x 256 lattice binary image
The number of possible outputs is 2^(256 x 256), an astronomically large number.
Medical Image Classification with
MRF/Gibbs Methods--2/2
 Markov Random Field/Gibbs classification is a well-established method for classifying images based on statistical inference.
 To introduce the problem we are facing, we first summarize the properties of MRF/Gibbs classification:
MAP estimation
Slow convergence due to the cooling schedule
 The principles of MRF/Gibbs classification are presented after the problem statement.
The problem & Proposed methods
 MRF/Gibbs classification of medical image data is time-consuming because of both the volume of the input data and the nature of the algorithm.
 Proposed methods to reduce the time for the MRF/Gibbs algorithm to converge:
Partial parallel algorithm
Ensemble parallel algorithm
Locally adaptive cooling schedule based on multiresolution tiling
Introduction to MRF/Gibbs
Classification
 Priors and Posteriors
 Bayes decision rules
 Maximum A Posteriori estimate
 Markov Random Fields/Neighborhood System
 Gibbs Fields
 Markov Chains, Limit Theorems, Convergence of the
algorithm
 Gibbs Sampler/ Visiting Scheme
 Simulated Annealing/Cooling Schedule
Prior and Posterior Distributions
 The prior gives the model that we expect to see in an image before observation, for example:
How many classes are actually in the image?
What is the percentage distribution of these tissue types?
Given a class, what is the distribution of the data?
If it is a normal distribution, what are the parameters of the distribution?
 The posterior can be interpreted as an adjustment of the prior to the real data after the observation.
Gibbsian form of priors and posteriors
 A strictly positive probability distribution always has the Gibbsian form
Π(x) = (1/Z) exp(-H(x)),  where  Z = Σ_z exp(-H(z))
and H is called the energy function (a term from statistical physics). Large energy values correspond to small probability values.
 The posterior has the form
P(x | y) = Π(x) P(y | x) / Σ_z Π(z) P(y | z)
where Π is the prior on the space of image configurations, P(y | x) is the distribution of the data (observed image) y given the configuration x, and P(x | y) is the posterior (the distribution of configurations x given the observed data y).
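As a concrete illustration of the Gibbsian form, the sketch below builds P(x) = exp(-H(x))/Z over a tiny configuration space; the 2x2 "image" and the disagreement-counting energy are toy choices, not the study's actual potentials.

```python
import math
from itertools import product

def gibbs_distribution(energy, configs):
    """Turn an energy function H into a Gibbs distribution
    P(x) = exp(-H(x)) / Z, with Z summed over all configurations."""
    weights = {x: math.exp(-energy(x)) for x in configs}
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}

# Toy 2x2 binary "image": energy counts disagreeing horizontal/vertical pairs.
def energy(x):
    (a, b), (c, d) = x
    return (a != b) + (c != d) + (a != c) + (b != d)

configs = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]
p = gibbs_distribution(energy, configs)
# Low-energy (smooth) configurations receive the highest probability.
smoothest = max(p, key=p.get)   # an all-equal configuration
```

Note how the large energy of "noisy" configurations translates directly into small probability, exactly as stated on the slide.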
Bayes Decision rules
 What is the best estimate? Two elements have to be taken into account:
The ideal model presented by the priors
The real data that we observed
 The Bayesian approach takes both requirements into account simultaneously by looking for the desired posterior.
 An estimator minimizing the risk is called a Bayes estimator.
 MAP estimators are Bayes estimators for the 0-1 loss function.
MAP estimator
 An estimate from observed data that maximizes the posterior distribution is called a Maximum A Posteriori (MAP) estimate. The image is estimated as a whole in MAP; it is a contextual estimate. Equivalently, one looks for the estimate that minimizes the energy function of the posterior distribution.
 Why the MAP estimator? It is the best estimator under the 0-1 loss function.
 Computational complexity: exponential
Markov Random Fields/Neighborhood
System
 Random field: a strictly positive probability distribution on the space of configurations
Index set of sites/pixels
Space of states/classes
Space of configurations
 Neighborhood system: some axioms
A site is not its own neighbor.
Site S is site T's neighbor if and only if site T is site S's neighbor.
 Local characteristic: the conditional probability of the class at site S, given the configuration of all other sites.
 Markov field: a site's local characteristic depends only on its neighbors. We want the neighborhood to be small.
Gibbs Fields
 Probabilities of Gibbsian form are always strictly positive and hence define random fields.
 A Gibbs field is induced by an energy function.
 The energy function is given by a sum of potentials.
 A potential relates a site's class/state/position to those of the other sites.
 Thus image configurations whose posterior has Gibbsian form constitute a Gibbs field.
Markov Chains: Limit Theorem
 Markov kernel: the transition probability from an old configuration x to a new configuration y. It is a matrix whose x-th row, y-th column entry is that probability. Markov kernels with a strictly positive power are called primitive.
 Markov chain: on the finite space X, given by an initial distribution v and Markov kernels P1, P2, P3, …. If Pi = P for all i, then the chain is called homogeneous.
 The Limit Theorem: a primitive Markov kernel P on a finite space has a unique invariant distribution μ, and vPⁿ → μ as n → ∞, uniformly in all initial distributions v.
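The limit theorem can be seen numerically: a small strictly positive (hence primitive) kernel, iterated from two very different starting distributions, converges to the same invariant μ. The 3-state kernel below is an arbitrary illustrative example.

```python
import numpy as np

# A primitive (strictly positive) Markov kernel on a 3-state space.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

def iterate(v, P, n):
    """Apply the kernel n times: v -> v P^n."""
    for _ in range(n):
        v = v @ P
    return v

# Two very different starting distributions converge to the same limit.
v1 = iterate(np.array([1.0, 0.0, 0.0]), P, 100)
v2 = iterate(np.array([0.0, 0.0, 1.0]), P, 100)
mu = v1
# v1 ≈ v2 (unique limit) and mu P ≈ mu (invariance), as the theorem states.
```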
Gibbs Sampler(1)
 The limit theorem tells us that we should look for a strictly positive Markov kernel for which the desired distribution is invariant.
 One natural construction is based on the local characteristics of the probability measure of the random field.
 A Markov kernel is defined by a site's local characteristic:
π(x, y) = Z⁻¹ exp(-H(y_s x_{S\{s}}))  if y agrees with x off site s;  0 otherwise
 The Gibbs field P and its local characteristics fulfill the detailed balance equation.
 If μ and P fulfill the detailed balance equation, then μ is invariant for P. In particular, Gibbs fields are invariant for their local characteristics. After a very large number of iterations, one ends up with a sample from a distribution close to the Gibbs field.
Gibbs Sampler(2)
 A visiting scheme is an enumeration of the sites whose classes are waiting to be determined.
 Gibbs sampler: the above homogeneous Markov chain with transition probability P induces the following algorithm. An initial configuration x is chosen, or picked at random according to some initial distribution v. In the first step, x is updated at site 1 by sampling from the single-site local characteristic.
 This yields a new configuration y that agrees with x except at site 1, which in turn is updated at site 2. In this way all the sites in S are sequentially updated; this is called a sweep. The first sweep results in a sample from vP. Running the chain for many sweeps produces a sample from vP…P, and the procedure repeats over and over.
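The sweep just described can be sketched in a few lines. This is a minimal Gibbs sampler on a binary lattice with a toy disagreement-counting energy (the study's actual potentials differ); each site is resampled from its single-site local characteristic in raster order.

```python
import math
import random

def local_energy(x, i, j):
    """Energy contribution of site (i, j): the number of 4-neighbors
    whose class disagrees with the class at (i, j)."""
    h, w = len(x), len(x[0])
    nbrs = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return sum(x[i][j] != x[a][b] for a, b in nbrs if 0 <= a < h and 0 <= b < w)

def sweep(x, beta=1.0):
    """One Gibbs-sampler sweep: visit every site in raster order and
    resample its class from the single-site local characteristic."""
    h, w = len(x), len(x[0])
    for i in range(h):
        for j in range(w):
            weights = []
            for s in (0, 1):                     # candidate classes
                x[i][j] = s
                weights.append(math.exp(-beta * local_energy(x, i, j)))
            x[i][j] = random.choices((0, 1), weights=weights)[0]
    return x

random.seed(0)
x = [[random.randint(0, 1) for _ in range(8)] for _ in range(8)]
for _ in range(50):   # many sweeps yield a sample close to the Gibbs field
    sweep(x, beta=2.0)
```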
Simulated Annealing/Cooling Schedule
 The computation of MAP estimators for Gibbs fields amounts to the minimization of energy functions.
 The Gibbs field at inverse temperature β is defined by
Π^β(x) = (Z^β)⁻¹ exp(-βH(x))
 A cooling schedule is an increasing sequence of positive inverse temperatures, for example
β(n) = (1/σ) ln n
 For every n ≥ 1, a Markov kernel is defined over one sweep by
Pn(x, y) = π^β(n)_{s1} π^β(n)_{s2} ⋯ π^β(n)_{sσ}(x, y)
the composition of single-site kernels along the visiting scheme s1, …, sσ.
 A theorem by S. and D. Geman (1984): if M is the set of global minimizers of H, then
lim (n → ∞) v P1 P2 ⋯ Pn(x) = 1/|M| if x ∈ M, and 0 otherwise
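The logarithmic schedule and its effect on uphill moves can be sketched directly; the constant sigma = 8.0 below is an arbitrary placeholder for the problem-dependent constant.

```python
import math

def beta(n, sigma=8.0):
    """Logarithmic cooling schedule beta(n) = (1/sigma) ln n, as in
    Geman & Geman (1984); sigma is a problem-dependent constant."""
    return math.log(n) / sigma

def accept_prob(delta_e, n):
    """Probability of accepting an energy increase delta_e at sweep n."""
    return min(1.0, math.exp(-beta(n) * delta_e))

# The inverse temperature rises (temperature falls) as sweeps proceed,
# so uphill moves become ever less likely:
probs = [accept_prob(2.0, n) for n in (2, 10, 100, 1000)]
```

The slow (logarithmic) growth of beta is exactly why the schedule guarantees convergence to the minimizers yet makes the algorithm so time-consuming.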
A Brief History of MRF/Gibbs
Classification
 S. Geman and D. Geman (1984) built the theoretical schema for this algorithm.
 M.C. Zhang (1990) applied the Geman brothers' work to the image classification problem. Their definitions of the potential function and energy are inherited in our study.
 T. Daggett and I.R. Greenshields (1998) explored 3-D volume medical image classification. The proposed site aging will be refined in our study.
 Numerous other attempts to apply Geman's theory to classification
Literature Review--Parallel Simulated
Annealing
 Parallel computation by speculative computation [Sohn, 1995], [Nabhan, 1995], [Witte, 1991]
 Partial parallel algorithms
Cluster algorithms [Swendsen, 1987], [Sokal, 1989], [Fox, 1995]
Synchronous updating based on independent set partition
[Vemuri, 1997]: not based on independent set partition; no rigorous result on speedup
[Jeng, 1993]: failed to take the communication cost into consideration
 Multiple trials/ensemble parallel
 Pseudo-parallel algorithm [Daggett, 1999]
Comparison of the proposed parallel
techniques
Parallel technique          | Specific problems   | Speedup                   | Properties
Speculative computation     | no                  | 2-3, at no comm. cost     | limited speedup
Cluster algorithm           | Ising/Potts model   | 10 (experimental data)    | limited image model
Independent set partition   | no                  | pixels/chromatic number   | graph coloring is NP-complete
Multiple trials             | no                  | uncertain                 | uncertainty
Pseudo-parallel algorithm   | no                  | number of data chunks     | theoretical ground needs to be built
Shorten the run time by adjusting
cooling schedule
 Various cooling schedules
Polynomial [Young, 1999], [Yuan, 1999]
Adaptive cooling [Steinhofel, 1998]
 Characteristics
Problem-specific
Limited speedup
Preliminary consideration of cooling schedules that are adaptive to the data
Regular data partition
The novelty of this study
 Partial parallel algorithm based on independent set partition on an identified simple neighborhood system
 Locally adaptive cooling schedule based on multiresolution tiling
 Application to medical image classification
Method 1--Partial Parallel Algorithm
 An example
 For a 4-neighborhood with northern, eastern, southern and western neighbors, an update at a 'black' site needs no information about the states of other 'black' sites. Hence, given a configuration x, all 'black' processing units may do their job simultaneously and produce a new configuration y' on the basis of x; the white processors may then update y' in the same way and end up with a configuration y. Thus a sweep is finished after two time steps, and the transition probability is the same as for sequential updating over a sweep in |S| time steps.
Independent sets
 If a neighborhood system is given, then a subset T of S is independent if it contains no pair of neighbors.
 The transition probability of the partially parallel update based on independent sets coincides with the transition probability of one sequential sweep.
 The limit theorem for sampling stays the same as in the sequential case.
Independent set partition--The graph
coloring problem
 The smallest number of independent sets is called the chromatic number of the neighborhood system. It is the smallest number of colors needed to paint the sites in such a fashion that neighbors never have the same color.
 Two extreme cases: if the classes at the sites are independent, then there is no neighborhood at all and the chromatic number is 1. If all sites interact with each other, then the chromatic number is |S|.
 In general this is known as the graph coloring problem, which is NP-complete.
Independent set partition with small
neighborhood
 Though in general the independent set partition is a problem that is even harder than the Gibbs algorithm itself, it is still solvable for a small neighborhood system.
 For example, a 2-D four-neighborhood (East, West, North, and South) partition is given by a checkerboard-style partition (B: black, W: white):
BWBWB
WBWBW
BWBWB
WBWBW
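The checkerboard partition is just the parity of i + j, which can be checked mechanically:

```python
def color_2d(i, j):
    """Checkerboard 2-coloring of the plane: a valid independent-set
    partition for the 4-neighborhood, since moving to any E/W/N/S
    neighbor flips the parity of i + j."""
    return (i + j) % 2

# Reproduce the B/W pattern from the slide and verify independence:
# no two 4-neighbors ever share a color.
rows = [''.join('BW'[color_2d(i, j)] for j in range(5)) for i in range(4)]
for i in range(4):
    for j in range(5):
        for di, dj in ((0, 1), (1, 0)):   # E and S neighbors suffice by symmetry
            if i + di < 4 and j + dj < 5:
                assert color_2d(i, j) != color_2d(i + di, j + dj)
```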
5-neighborhood partition
 A 2-D five neighborhood (East, West, North, South, and
southwestern) partition is given by (R—Red, G—Green,
B—Blue).
RGBRGBRGBRGB
GBRGBRGBRGBR
BRGBRGBRGBRG
RGBRGBRGBRGB
GBRGBRGBRGBR
3-D checkerboard
 The partition in the 3-D case is much more difficult than in the 2-D neighborhood system, but for the 6-neighborhood case (East, West, North, South, top, bottom) the partition is elegant. Suppose we have a checkerboard in each plane; we let one layer begin with a black block (the upper left corner) and the next layer begin with a white block, alternating layer by layer. This is a 3-D checkerboard.
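The 3-D checkerboard is the parity of i + j + k; alternating the starting color between layers is exactly what adding k to the 2-D parity achieves. A quick check:

```python
def color_3d(i, j, k):
    """3-D checkerboard: the parity of i + j + k 2-colors the
    6-neighborhood (E, W, N, S, top, bottom). Each z-slice is a 2-D
    checkerboard, and consecutive slices start with opposite colors."""
    return (i + j + k) % 2

# Every axis-aligned neighbor of a site gets the opposite color.
site = (2, 3, 1)
offsets = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
for d in offsets:
    nbr = tuple(a + b for a, b in zip(site, d))
    assert color_3d(*site) != color_3d(*nbr)
```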
Parallel strategy based on
independent partition
 With the independent set partition, we can consider the parallel strategy of this algorithm. In a cluster computer, we simply cut an image into several data blocks and assign each block to a computer. Here is the key to this implementation: each computer does not update the sites in raster-scan order; it updates the white-colored sites first and the black ones later, or the other way around. Every computer does this in the same order. After the computers complete one color in their assigned data blocks, they exchange the neighboring boundary columns/rows. Then they proceed to update the next color. After all colors have been updated, the next iteration begins.
Data exchange after the black blocks are updated

[Figure: four processors (Processor 1-4), each holding a checkerboard block of the image; after the black sites are updated, boundary rows/columns are exchanged between neighboring processors.]
Data exchange after the white blocks are updated

[Figure: the same four-processor layout; after the white sites are updated, the corresponding boundary rows/columns are exchanged.]
Parallel algorithm on multigrid clusters

[Figure: a 2 x 2 processor grid (P1, P2; P3, P4) covering the image.]
Partial parallel algorithm
Partition the neighborhood manually; label the pixels with colors.
Partition the data into chunks.
for t = start to end do   (t is preselected based on the locally adaptive cooling schedule)
    for color = color0 to colorN do
        for every pixel (i,j,k) of this color in the image do
            let s = c(i,j,k) be the current class at (i,j,k)
            let N(i,j,k) be the neighborhood at (i,j,k)
            compute E[s; i,j,k]   (energy at (i,j,k))
            from the New Class Selection Rule*, let q be a new class for site (i,j,k)
            compute E[q; i,j,k]   (energy at (i,j,k) with class q)
            if E[q; i,j,k] < E[s; i,j,k] then
                c(i,j,k) = q
            else
                let D = Abs(E[q; i,j,k] - E[s; i,j,k])
                let y = Exp[-D / T(t)]
                let x = Random[0..1]   /* a uniform random number in 0..1 */
                if y > x then c(i,j,k) = q
        end for every pixel
        when the other processors holding neighboring pixels have finished
        the update, exchange edge pixels between processors
    end for color
end for t
Collect the result on one of the machines.
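The pseudocode above can be sketched as runnable Python; this is a 2-D, single-process sketch with a toy disagreement-counting energy standing in for the study's potential function, a random choice standing in for the starred New Class Selection Rule, and the MPI edge-pixel exchange omitted.

```python
import math
import random

def local_energy(c, i, j, s):
    """Energy of class s at site (i, j): disagreement count with the
    4-neighborhood (a stand-in for the study's actual potentials)."""
    h, w = len(c), len(c[0])
    nbrs = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return sum(s != c[a][b] for a, b in nbrs if 0 <= a < h and 0 <= b < w)

def update_site(c, i, j, T, classes=(0, 1)):
    """One single-site step of the pseudocode: accept a lower-energy class
    outright, otherwise accept with probability Exp[-D/T]."""
    s = c[i][j]
    q = random.choice([k for k in classes if k != s])  # stand-in selection rule
    d = local_energy(c, i, j, q) - local_energy(c, i, j, s)
    if d < 0 or random.random() < math.exp(-d / T):
        c[i][j] = q

def colored_sweep(c, T):
    """One sweep in color order: all 'black' sites ((i+j) even) first, then
    all 'white' sites. In the parallel version each color phase runs
    simultaneously on every processor, followed by the edge-pixel exchange."""
    h, w = len(c), len(c[0])
    for color in (0, 1):
        for i in range(h):
            for j in range(w):
                if (i + j) % 2 == color:
                    update_site(c, i, j, T)

random.seed(1)
c = [[random.randint(0, 1) for _ in range(6)] for _ in range(6)]
for t in range(1, 30):
    colored_sweep(c, T=1.0 / t)   # decreasing temperature over iterations
```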
Method2--Ensemble parallel
Multiple trials: the simultaneous execution of N instances of the same algorithm over the same dataset, seeded with different starting configurations. If these ensemble instances are driven from the same prior, then there is evidence that they will converge locally to a single instance, offering the possibility that K ensemble instances can be collapsed into one instance over identified point sets in the volume.
Method3--locally adaptive cooling schedule
based on multiresolution tiling
 Site aging
 multiresolution tiling with self-similar sets
 local moments
 Rank temperature based on tile
Site aging
 Observation: background areas
 If, once the schedule has reached the inverse temperature given by N0, the probability that the configuration of a site's neighborhood changes over further iterations is small, then we can change the visitation schedule so that that site no longer invokes the sampler process once the temperature passes N0.
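A minimal sketch of this pruning idea, with illustrative names and parameters (`N0`, `patience`, and the change-tracking dictionary are assumptions, not the study's definitions):

```python
def prune_visitation(schedule, last_change_sweep, n, N0, patience=10):
    """Site-aging sketch: once the schedule passes sweep N0, stop visiting
    any site whose neighborhood has not changed for `patience` sweeps --
    such sites are effectively frozen and need not invoke the sampler."""
    if n < N0:
        return schedule
    return [s for s in schedule if n - last_change_sweep[s] < patience]

sched = [(0, 0), (0, 1), (1, 0)]
last = {(0, 0): 2, (0, 1): 95, (1, 0): 98}   # sweep of each site's last change
pruned = prune_visitation(sched, last, n=100, N0=50)
# (0, 0) froze long ago and is dropped; the other two sites stay scheduled.
```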
Seek heuristic that predicts the
inverse temperature
 Need a flexible and tunable plane partition
Multiresolution tiling with self-similar sets
 Develop a function that reflects the complexity of the data in the sense of classification
Wavelet decomposition (Haar wavelet)
Local moments
 Rank the cooling temperature based on these functions
Multiresolution tilings --1
 A well-established theory by mathematicians
Dilation transformation
Translation transformation
Multiresolution analysis
Scaling function
Definition of wavelet
 If a multiresolution analysis's scaling function is the characteristic function of a measurable set Q with |Q| = 1, then Q can tile the plane.
Multiresolution tilings --2
 If |Q| = 1, Q is the attractor of an affine transformation. Q can be obtained by the iteration Q(n+1) = ∪_k A⁻¹(Q(n) + k), where k ranges over a complete set of coset representatives. Each intermediate Q(n) can tile the plane as well.
 So we have a geometrically tunable, irregular system over which heuristics for the cooling schedule can be developed.
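The iteration Q(n+1) = ∪_k A⁻¹(Q(n) + k) can be run on finite point sets. The matrix A and the two coset representatives below are the classic "twin dragon" choices, used here purely as an illustration of the mechanics:

```python
import numpy as np

# Twin-dragon example: |det A| = 2, so there are two coset representatives.
A = np.array([[1.0, -1.0],
              [1.0,  1.0]])
A_inv = np.linalg.inv(A)
digits = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]  # coset representatives

Q = np.array([[0.0, 0.0]])          # Q(0): start from a single point
for _ in range(12):
    # Q(n+1) = union over k of A^{-1} (Q(n) + k); rows are points.
    Q = np.concatenate([(Q + k) @ A_inv.T for k in digits])

# Q now approximates the self-similar attractor (2**12 points); each
# intermediate Q(n) tiles the plane under the lattice translations.
```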
Local Moments
 Local moment definition: we call the number
m_pq(A) = ∫_A x^p y^q f(x, y) dμ
the (p, q)-th local moment of f with respect to A.
 First moment: mean
 Second moment: variance
 Higher moments
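A discrete version of the moment integral over a tile, with the region given as a boolean mask (the uniform 4x4 tile is an illustrative example):

```python
import numpy as np

def local_moment(f, mask, p, q):
    """Discrete (p, q)-th local moment of image f over region A:
    m_pq(A) = sum over A of x^p * y^q * f(x, y)."""
    ys, xs = np.nonzero(mask)
    return float(np.sum((xs ** p) * (ys ** q) * f[ys, xs]))

# On a uniform 4x4 tile, m_00 is the total mass, and the normalized
# first moments give the tile's centroid.
f = np.ones((4, 4))
tile = np.ones((4, 4), dtype=bool)
m00 = local_moment(f, tile, 0, 0)
cx = local_moment(f, tile, 1, 0) / m00   # centroid x
cy = local_moment(f, tile, 0, 1) / m00   # centroid y
```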
Heuristic
 Choose a tile (an inverse problem, but we are considering choosing one from a tile bank based on some similarity measure)
 Wavelet decomposition over the tiling: tree representation
 Ranking
 Assign the cooling schedule
Evaluation Plan
 Validation tests
Apply the algorithm to synthetic images with known priors
Compare with existing classification results
 Speedup
 Efficiency
 Scalability
Summary of research goals and
expected contributions
 Development of a SIMD MRF/Gibbs classification
algorithm based on independent set partition
 Rigorous medical image classification results based on this algorithm with random initial configurations
 Rigorous results on the speedup and efficiency of this algorithm
 Exploration of ensemble parallel algorithm in the
application of medical image classification
 Exploration of multiresolution tiling on locally adaptive
cooling schedule
Research facilities
 In general, the research facilities for this study are available or can be accessed in the Computer Science and Engineering Department and the Booth Research Center at UConn.
Hardware: SUN, PC, and SGI workstations
Software: mobile agents, MPI, OpenMP, ...
Network: 100BaseT Ethernet, Gigabit Ethernet, OC3 ATM, ...
 The data to be used are the Visible Human Dataset, which the Image Processing Lab is licensed to acquire.
 Supercomputer facilities: NPACI
Bibliography--1
[1] The Visible Human Dataset, National Library of Medicine, http://www.nlm.nih.gov/pubs/factsheets/visible_human.html
[2] Stuart Geman and Donald Geman, "Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images," IEEE PAMI, Vol. 6, No. 6, 1984
[3] M.C. Zhang, Robert M. Haralick, James B. Campbell, "Multispectral Image Context Classification Using Stochastic Relaxation," IEEE PAMI, Vol. 20, No. 1, 1990
[4] T. Daggett, I.R. Greenshields and G. Weerasinghe, "Asynchronous, Parallel Pseudo-Gibbs Classification of the VF Dataset," Proceedings of the Twelfth IEEE Symposium on Computer-Based Medical Systems
[5] Gerhard Winkler, "Image Analysis, Random Fields and Dynamic Monte Carlo Methods--A Mathematical Introduction," Springer, 1995
[6] W.T. Tutte, "Graph Theory," Encyclopedia of Mathematics and its Applications, Cambridge University Press, 1984
[7] Laurent Younes, "Synchronous Random Fields and Image Restoration," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 4, April 1998, pp. 380-390
[8] Soo-Young Lee, Kyung Geun Lee, "Synchronous and Asynchronous Parallel Simulated Annealing with Multiple Markov Chains," IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 10, October 1996, pp. 993-1008
Bibliography--2
[9] Hao Chen, Nicholas S. Flann, and Daniel W. Watson, "Parallel Genetic Simulated Annealing: A Massively Parallel SIMD Algorithm," IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 2, February 1998, pp. 126-136
[10] B. Hajek, "Cooling Schedules for Optimal Annealing," Mathematics of Operations Research, Vol. 13, pp. 311-329, 1988
[11] Andrew Sohn, "Parallel N-ary Speculative Computation of Simulated Annealing," IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 10, October 1995, pp. 997-1005
[12] Tarek M. Nabhan and Albert Y. Zomaya, "A Parallel Simulated Annealing Algorithm with Low Communication Overhead," IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 12, December 1995, pp. 1226-1233
[13] E.E. Witte, R.D. Chamberlain and M.A. Franklin, "Parallel Simulated Annealing Using Speculative Computation," IEEE Transactions on Parallel and Distributed Systems, Vol. 2, No. 4, pp. 483-495, April 1991
[14] Beba C. Vemuri, Chhandomay Mandal, and Shang-Hong Lai, "A Fast Gibbs Sampler for Synthesizing Constrained Fractals," IEEE Transactions on Visualization and Computer Graphics, Vol. 3, No. 4, October-December 1997, pp. 337-351
[15] Fure-Ching Jeng, John W. Woods, and Sanjeev Rastoni, "Compound Gauss-Markov Random Fields for Parallel Image Processing," in Markov Random Fields--Theory and Applications, pp. 11-38, edited by Rama Chellappa and Anil Jain, Academic Press, 1993
Bibliography--3
[16] Zhihong Yang, Ian R. Greenshields, "Volume Visible Human Data Classification with Parallel Dynamic Monte Carlo Methods," to appear in the 4th World Multiconference on Systemics, Cybernetics and Informatics and the 6th International Conference on Information Systems, Analysis and Synthesis
[17] Madych, W., "Some Elementary Properties of Multiresolution Analyses of," in Wavelets: A Tutorial in Theory and Applications, ed. C.K. Chui, Academic Press, 1992
[18] Ian R. Greenshields, Zhihong Yang, "A Multigrid Approach to the Gibbsian Classification of Mammograms," to appear in the 13th IEEE Symposium on Computer-Based Medical Systems
[19] Ian R. Greenshields, "Local Moments, Contractive IFS and Multiresolution Decompositions of 3D Imagery," Proceedings of Microscopy, Holography and Interferometry in Biomedicine, SPIE Vol. 2083, pp. 174-183, 1993
[20] http://www.npaci.com
[21] Danny B. Lange, Mitsuru Oshima, "Programming and Deploying Java Mobile Agents with Aglets," Addison-Wesley, 1998
[22] Ian T. Foster, "Designing and Building Parallel Programs--Concepts and Tools for Parallel Software Engineering," Addison-Wesley Publishing Company, 1994
[23] Joel A. Rosiene, "Affine Transformations and Image Representation," Ph.D. dissertation, the University of Connecticut, 1994
[24] Thomas A. Daggett, "MRF-Gibbs Context-Dependent Classification on a Small-Scale Cluster Computing System," Ph.D. dissertation, the University of Connecticut, 1998