Lecture 3
Math & Probability
Background
ch. 1-2 of Machine Vision by Wesley E. Snyder & Hairong Qi
Spring 2012
BioE 2630 (Pitt) : 16-725 (CMU RI)
18-791 (CMU ECE) : 42-735 (CMU BME)
Dr. John Galeotti
The content of these slides by John Galeotti, © 2012 Carnegie Mellon University (CMU), was made possible in part by NIH NLM contract#
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA. Permissions beyond the scope of this license may be available either from CMU or by emailing [email protected]
The book is an overview of many concepts
Top-quality design requires:
- Experimentation & validation
Two themes
Consistency
- A conceptual tool implemented in many/most algorithms
- Often must fuse information from local measurements to make global conclusions about the image
Optimization
- Mathematical mechanism
- The “workhorse” of machine vision
Image Processing Topics
- Enhancement
- Coding
- Compression
- Restoration
  - “Fix” an image
- Reconstruction
Machine Vision Topics
[Diagram: Original Image → Feature Extraction → Classification & Further Analysis]
AKA:
- Computer vision
- Image analysis
- Image understanding
- Pattern recognition
Our Focus:
1. Measurement of features
   - Features characterize the image, or some part of it
2. Pattern classification
   - Requires knowledge about the possible classes
Feature measurement
[Flowchart: Original Image → Restoration / Noise removal → Segmentation → Shape Analysis → Consistency Analysis (Ch. 10-11) / Matching → Features; chapter labels Ch. 6-7, Ch. 8, Ch. 9, and Ch. 12-16 mark the stages]
Probability
Probability of an event a occurring:
- Pr(a)
Independence
- Pr(a) does not depend on the outcome of event b, and vice-versa
Joint probability
- Pr(a,b) = prob. of both a and b occurring
Conditional probability
- Pr(a|b) = prob. of a if we already know the outcome of event b
- Read “probability of a given b”
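These definitions can be checked on a small discrete example. The sketch below is a toy scenario of my own (two fair dice; the names `pr`, `a`, and `b` are illustrative), not something from the slides:

```python
from fractions import Fraction

# Toy sample space: two fair six-sided dice, all 36 outcomes equally likely.
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def pr(event):
    """Pr(a): fraction of equally likely outcomes where `event` holds."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

a = lambda o: o[0] + o[1] == 7   # event a: the dice sum to 7
b = lambda o: o[0] == 3          # event b: the first die shows 3

pr_a  = pr(a)                          # Pr(a)   = 6/36 = 1/6
pr_b  = pr(b)                          # Pr(b)   = 6/36 = 1/6
pr_ab = pr(lambda o: a(o) and b(o))    # Pr(a,b) = 1/36
pr_a_given_b = pr_ab / pr_b            # Pr(a|b) = Pr(a,b)/Pr(b) = 1/6

# Here Pr(a|b) == Pr(a), so a and b are independent.
```

Note how conditional probability falls out of the joint: Pr(a|b) = Pr(a,b)/Pr(b).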
Probability for continuously-valued functions
Probability distribution function:
- $P(x) = \Pr(z < x)$
Probability density function:
- $p(x) = \frac{d}{dx} P(x)$
- $\int_{-\infty}^{\infty} p(x)\,dx = 1$
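Both identities can be verified numerically. The sketch below uses the standard normal distribution (my choice of example, not the slides') to check p(x) = dP/dx by finite differences and to approximate the unit integral of p:

```python
import math

def pdf(x):   # p(x) for a standard normal
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(x):   # P(x) = Pr(z < x) for a standard normal
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# p(x) = dP/dx: central finite difference of P compared with p at x = 0.7
x, h = 0.7, 1e-5
deriv = (cdf(x + h) - cdf(x - h)) / (2 * h)
assert abs(deriv - pdf(x)) < 1e-8

# ∫ p(x) dx = 1: trapezoidal rule over [-8, 8] (the tails are negligible)
n, lo, hi = 100_000, -8.0, 8.0
dx = (hi - lo) / n
area = sum(pdf(lo + i * dx) for i in range(n + 1)) * dx \
       - 0.5 * dx * (pdf(lo) + pdf(hi))
assert abs(area - 1.0) < 1e-6
```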
Linear algebra
- Vector: $v = [x_1 \; x_2 \; x_3]^T$
- Inner (dot) product: $a^T b = \sum_i a_i b_i$
- Vector length: $|x| = \sqrt{x^T x}$
- Unit vector: $|x| = 1$
- Orthogonal vectors: $x^T y = 0$
- Orthonormal: orthogonal unit vectors
Inner product of continuous functions
- $\langle f(x), g(x) \rangle = \int_a^b f(x)\, g(x)\, dx$
- Orthogonality & orthonormality apply here too
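A quick NumPy sketch of these definitions (the vectors and functions below are my own examples):

```python
import numpy as np

# Dot product, norm, and orthogonality.
a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, -1.0, 0.0])

dot = a @ b                      # a^T b = sum_i a_i b_i
assert dot == 0.0                # a and b happen to be orthogonal

norm_a = np.sqrt(a @ a)          # |a| = sqrt(a^T a) = 3 here
unit_a = a / norm_a              # unit vector: |unit_a| = 1

# Inner product of continuous functions, <f, g> = ∫_a^b f(x) g(x) dx,
# approximated by a Riemann sum on a grid over [0, 2π]:
x = np.linspace(0.0, 2.0 * np.pi, 10_001)
inner = float(np.sum(np.sin(x) * np.cos(x)) * (x[1] - x[0]))
# ≈ 0: sin and cos are orthogonal on [0, 2π]
```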
Linear independence
- No one vector is a linear combination of the others
  - $x_j \neq \sum_i a_i x_i$ for any choice of $a_i$ across all $i \neq j$
- Any linearly independent set of d vectors $\{x_i\}_{i=1}^{d}$ is a basis set that spans the space $\mathbb{R}^d$
  - Any other vector in $\mathbb{R}^d$ may be written as a linear combination of $\{x_i\}$
- Often convenient to use orthonormal basis sets
- Projection: if $y = \sum_i a_i x_i$ then $a_i = y^T x_i$ (for an orthonormal basis)
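The projection formula can be sketched with an orthonormal basis of ℝ²; the 45°-rotated basis below is my illustrative choice:

```python
import numpy as np

# Projection onto an orthonormal basis: a_i = y^T x_i.
s = 1.0 / np.sqrt(2.0)
x1 = np.array([ s, s])
x2 = np.array([-s, s])
assert abs(x1 @ x2) < 1e-12 and abs(x1 @ x1 - 1) < 1e-12  # orthonormal

y = np.array([3.0, 1.0])
a1, a2 = y @ x1, y @ x2          # projection coefficients a_i = y^T x_i
recon = a1 * x1 + a2 * x2        # y = sum_i a_i x_i, recovered exactly
assert np.allclose(recon, y)
```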
Linear transforms
- A linear transform = a matrix, denoted e.g. $A$
- Quadratic form: $x^T A x$
- Its derivative: $\frac{d}{dx}\left(x^T A x\right) = \left(A + A^T\right) x$
Positive definite:
- Applies to $A$ if $x^T A x > 0 \;\; \forall x \in \mathbb{R}^d,\; x \neq 0$
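Both facts can be checked numerically. The sketch below (random A and x of my choosing) compares the derivative formula to finite differences and tests positive definiteness via eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

f = lambda v: v @ A @ v          # scalar quadratic form x^T A x

# Finite-difference gradient vs. the closed form (A + A^T) x:
h = 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                    for e in np.eye(4)])
grad_formula = (A + A.T) @ x
assert np.allclose(grad_fd, grad_formula, atol=1e-4)

# A symmetric matrix is positive definite iff all its eigenvalues are > 0:
B = A @ A.T + 4 * np.eye(4)      # constructed to be positive definite
assert np.all(np.linalg.eigvalsh(B) > 0)
```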
More derivatives
Of a scalar function of x (really important!):
$$\frac{df}{dx} = \left[ \frac{\partial f}{\partial x_1} \;\; \frac{\partial f}{\partial x_2} \;\; \cdots \;\; \frac{\partial f}{\partial x_d} \right]^T$$
Of a vector function of x, called the Jacobian:
$$\frac{d\mathbf{f}}{dx} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_d} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_d} \end{bmatrix}$$
Hessian = matrix of 2nd derivatives of a scalar function:
$$H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_d} \\ \vdots & & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_d \partial x_1} & \frac{\partial^2 f}{\partial x_d \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_d^2} \end{bmatrix}$$
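All three objects can be approximated by finite differences. The sketch below is my own illustration (the functions `f`, `F`, and the step size are example choices); the Hessian is computed as the Jacobian of the gradient:

```python
import numpy as np

def gradient(f, x, h=1e-5):
    """df/dx of scalar f: the vector of partials ∂f/∂x_i."""
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))])

def jacobian(F, x, h=1e-5):
    """dF/dx of vector F: entry (i, j) holds ∂F_i/∂x_j."""
    cols = [(F(x + h * e) - F(x - h * e)) / (2 * h) for e in np.eye(len(x))]
    return np.column_stack(cols)

f = lambda v: v[0] ** 2 + 3.0 * v[0] * v[1]          # scalar function
F = lambda v: np.array([v[0] * v[1], v[0] + v[1]])   # vector function

x = np.array([2.0, 1.0])
g = gradient(f, x)   # analytic: [2*x0 + 3*x1, 3*x0] = [7, 6]
J = jacobian(F, x)   # analytic: [[x1, x0], [1, 1]] = [[1, 2], [1, 1]]
H = jacobian(lambda v: gradient(f, v), x)   # Hessian = Jacobian of gradient
# analytic H: [[2, 3], [3, 0]] (symmetric, as expected)
```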
Misc. linear algebra
Derivative operators
Eigenvalues & eigenvectors
- Loosely translate to the “most important vectors” of a linear transform (e.g., the matrix $A$)
- Characteristic equation: $Ax = \lambda x$, $\lambda \in \mathbb{R}$
- $A$ maps $x$ onto itself with only a change in length
- $\lambda$ is an eigenvalue
- $x$ is its corresponding eigenvector
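The characteristic equation can be verified directly with NumPy; the small symmetric matrix below is my example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lams, vecs = np.linalg.eigh(A)   # eigenvalues in ascending order: 1 and 3

for lam, x in zip(lams, vecs.T):
    # Ax = λx: A only rescales its eigenvectors, never rotates them
    assert np.allclose(A @ x, lam * x)
```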
Function minimization
Find the vector x which produces a minimum of some function f(x)
- x is a parameter vector
- f(x) is a scalar function of x
  - The “objective function”
The minimum value of f is denoted:
- $f(\hat{x}) = \min_x f(x)$
The minimizing value of x is denoted:
- $\hat{x} = \operatorname{argmin}_x f(x)$
Numerical minimization
- The derivative points away from the minimum
- Take small steps, each one in the “down-hill” direction
- Local vs. global minima
- Combinatorial optimization:
  - Use simulated annealing
- Image optimization:
  - Use mean field annealing
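The “small steps down-hill” idea is gradient descent. A minimal sketch, with a quadratic objective, step size, and iteration count of my own choosing (note that on a non-convex objective this would only find a local minimum):

```python
import numpy as np

# Objective f and its analytic gradient (illustrative example).
f      = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)])

x = np.array([4.0, 3.0])         # starting guess
step = 0.1                       # small step size
for _ in range(200):
    x = x - step * grad_f(x)     # step opposite the derivative: down-hill

# x converges to argmin f = (1, -0.5), the global minimizer here
assert np.allclose(x, [1.0, -0.5], atol=1e-6)
```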
Markov models
For temporal processes:
- The probability of something happening is dependent on a thing that just recently happened.
For spatial processes:
- The probability of something being in a certain state is dependent on the state of something nearby.
- Example: The value of a pixel is dependent on the values of its neighboring pixels.
Markov chain
- Simplest Markov model
- Example: symbols transmitted one at a time
  - What is the probability that the next symbol will be w?
- For a Markov chain:
  - “The probability conditioned on all of history is identical to the probability conditioned on the last symbol.”
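A sketch of that property for a 2-symbol alphabet {0, 1}; the transition probabilities below are my own example values:

```python
import numpy as np

P = np.array([[0.9, 0.1],        # P[i, j] = Pr(next symbol = j | last = i)
              [0.4, 0.6]])
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a distribution

# Pr(next = w) depends only on the last symbol, not the full history:
history = [0, 1, 1, 0, 1]
pr_next_is_0 = P[history[-1], 0]         # = P[1, 0] = 0.4

# Symbol distribution after n steps from an initial distribution pi0:
pi0 = np.array([1.0, 0.0])
pi3 = pi0 @ np.linalg.matrix_power(P, 3)
assert np.allclose(pi3.sum(), 1.0)
```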
Hidden Markov models (HMMs)
[Diagram: a 1st Markov process and a 2nd Markov process, each capable of generating the observed output signal f(t)]
HMM switching
- Governed by a finite state machine (FSM)
[Diagram: the FSM switches the output between the 1st process and the 2nd process]
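The switching idea can be sketched as a simulation: a hidden 2-state FSM picks which of two output processes generates each sample. All parameters below (transition matrix, Gaussian emission processes, sequence length) are my illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

T = np.array([[0.95, 0.05],      # hidden-state transitions:
              [0.10, 0.90]])     # T[i, j] = Pr(next state = j | state = i)
means = [0.0, 5.0]               # each state's output process: a Gaussian

state, outputs, states = 0, [], []
for _ in range(500):
    state = rng.choice(2, p=T[state])              # FSM makes a hidden move
    outputs.append(rng.normal(means[state], 1.0))  # visible emission
    states.append(state)

# Only `outputs` would be observed; `states` is the hidden switching signal
# that HMM inference algorithms try to recover.
```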