16.322 Stochastic Estimation and Control, Fall 2004
Prof. Vander Velde
Lecture 7
Last time: Moments of the Poisson distribution from its generating function.
$$G(s) = e^{\mu(s-1)}$$

$$\frac{dG}{ds} = \mu e^{\mu(s-1)}$$

$$\frac{d^2G}{ds^2} = \mu^2 e^{\mu(s-1)}$$

$$\overline{X} = \left.\frac{dG}{ds}\right|_{s=1} = \mu$$

$$\overline{X^2} = \left.\frac{d^2G}{ds^2}\right|_{s=1} + \left.\frac{dG}{ds}\right|_{s=1} = \mu^2 + \mu$$

$$\sigma^2 = \overline{X^2} - \overline{X}^2 = \mu^2 + \mu - \mu^2 = \mu = \overline{X}$$
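These manipulations are easy to verify symbolically; the following is a minimal sketch using sympy (the names G, s, and mu simply mirror the quantities above):

```python
import sympy as sp

s, mu = sp.symbols('s mu', positive=True)
G = sp.exp(mu * (s - 1))  # Poisson generating function

mean = sp.diff(G, s).subs(s, 1)                         # X-bar = dG/ds at s=1
second = (sp.diff(G, s, 2) + sp.diff(G, s)).subs(s, 1)  # X2-bar = G'' + G' at s=1
var = sp.simplify(second - mean**2)

print(mean)  # mu
print(var)   # mu -- the variance equals the mean
```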
Example: Using a telescope to measure the intensity of an object
Photon flux → photoelectron flux. The number of photoelectrons is Poisson
distributed. During an observation we collect N photoelectron emissions; N is
the measure of the signal.
$$S = \overline{N} = \lambda t$$

$$\sigma_N^2 = \mu = \lambda t$$

$$\frac{S}{\sigma_N} = \frac{\lambda t}{\sqrt{\lambda t}} = \sqrt{\lambda t}$$

$$t = \frac{1}{\lambda}\left(\frac{S}{\sigma_N}\right)^2$$
For signal-to-noise ratio of 10, require N = 100 photoelectrons. All this follows
from the property that the variance is equal to the mean.
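As a numerical illustration (the rate λ below is an arbitrary value chosen for the sketch, not from the lecture):

```python
import math

lam = 1000.0         # photoelectron rate lambda, 1/s (illustrative value)
snr_required = 10.0  # desired S / sigma_N

# S / sigma_N = lambda*t / sqrt(lambda*t) = sqrt(lambda*t)
t = snr_required**2 / lam  # t = (1/lambda) * (S/sigma_N)^2
N = lam * t                # expected number of photoelectrons

print(t, N)  # 0.1 s and 100 photoelectrons for an SNR of 10
```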
This is an unbounded experiment (the count can take any nonnegative integer
value), whereas the binomial distribution applies to a fixed number n of trials.
3. The Poisson Approximation to the Binomial Distribution
The binomial distribution, like the Poisson, is that of a random variable taking
only nonnegative integer values. Since it involves factorials, the binomial
distribution is not very convenient for numerical application.
We shall show under what conditions the Poisson expression serves as a good
approximation to the binomial expression – and thus may be used for
convenience.
$$b(k) = \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k}$$
Consider a large number of trials, n, with small probability of success in each, p,
such that the mean of the distribution, np, is of moderate magnitude.
Define $\mu \equiv np$, with n large and p small, so that

$$p = \frac{\mu}{n}$$
Recalling:
$$n! \sim \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n} \quad \text{(Stirling's formula)}$$

$$e^{-\mu} = \lim_{n \to \infty} \left(1 - \frac{\mu}{n}\right)^n$$
$$b(k) = \frac{n!}{k!(n-k)!\, n^k}\, \mu^k \left(1 - \frac{\mu}{n}\right)^{n-k}$$

$$\approx \frac{\sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}}{k!\, \sqrt{2\pi}\, (n-k)^{n-k+\frac{1}{2}} e^{-(n-k)}\, n^k}\, \mu^k \left(1 - \frac{\mu}{n}\right)^{n-k}$$

$$= \frac{\mu^k}{k!\, e^k \left(1 - \frac{k}{n}\right)^{n-k+\frac{1}{2}}} \left(1 - \frac{\mu}{n}\right)^{n-k}$$

$$\approx \frac{\mu^k e^{-\mu}}{k!\, e^{-k}\, e^{k}} \quad \text{as } n \text{ becomes large relative to } k$$

$$= \frac{1}{k!}\, \mu^k e^{-\mu}$$
The relative error in this approximation is of order of magnitude
$$\text{Rel. Error} \sim \frac{(k-\mu)^2}{n}$$
However, for values of k much smaller or much larger than µ, where the relative
error grows, the probability itself becomes small.
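A short numerical check of the approximation, using only the standard library (the values of n and p are illustrative):

```python
import math

n, p = 1000, 0.005  # large n, small p (illustrative values)
mu = n * p          # moderate mean, mu = 5

for k in (0, 2, 5, 10):
    exact = math.comb(n, k) * p**k * (1 - p)**(n - k)     # binomial b(k)
    approx = mu**k * math.exp(-mu) / math.factorial(k)    # Poisson approximation
    print(k, exact, approx, abs(exact - approx) / exact)  # rel. error ~ (k-mu)^2/n
```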
The Normal Distribution
Outline:
1. Describe the common use of the normal distribution
2. The practical employment of the Central Limit Theorem
3. Relation to tabulated functions
• Normal distribution function
• Normal error function
• Complementary error function
1. Describe the common use of the normal distribution
Normally distributed variables appear repeatedly in physical situations.
• Voltage across the plate of a vacuum tube
• Radar angle tracking noise
• Atmospheric gust velocity
• Wave height in the open sea
2. The practical employment of the Central Limit Theorem
$X_i$ $(i = 1, 2, \ldots, n)$ are independent random variables.
Define the sum of these $X_i$ as

$$S = \sum_{i=1}^{n} X_i$$

$$\overline{S} = \sum_{i=1}^{n} \overline{X_i}$$

$$\sigma_S^2 = \sum_{i=1}^{n} \sigma_{X_i}^2$$

Then under the condition

$$\lim_{n \to \infty} \frac{\beta}{\sigma_S^3} = 0, \qquad \beta = \sum_{i=1}^{n} \beta_{X_i}, \qquad \beta_{X_i} = \overline{(X_i - \overline{X_i})^3}$$
the limiting distribution of S is the normal distribution. Note that this is true for
any distributions of the Xi.
These are sufficient conditions under which the theorem can be proved. It is not
clear that they are necessary.
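A minimal simulation sketch of the theorem: sums of uniform random variables, which are individually far from normal, acquire the normal mean and variance predicted above (uniform(0,1) has mean 1/2 and variance 1/12).

```python
import random
import statistics

n_terms, n_trials = 30, 10_000
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_trials)]

# For a sum of n uniform(0,1) variables: mean = n/2, variance = n/12
print(statistics.mean(sums), n_terms / 2)
print(statistics.variance(sums), n_terms / 12)
# A histogram of `sums` is close to the normal density with these parameters.
```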
Notice that each of the noises mentioned earlier depends on the accumulated
effect of a great many small causes, e.g., the voltage across the plate of a
vacuum tube is built up by electrons traveling from cathode to plate.
It is convenient to work with the characteristic function since we are dealing with
the sum of independent variables.
Normal probability density function:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}$$
Normal probability distribution function:
$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(u-m)^2}{2\sigma^2}}\, du = \int_{-\infty}^{\frac{x-m}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv$$

where

$$m = \overline{X}, \qquad v = \frac{u-m}{\sigma}, \qquad dv = \frac{1}{\sigma}\, du$$
This integral, with the integrand normalized, is tabulated. It is called the normal
probability function and is symbolized by Φ.
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv$$
This is a different x; note the relationship between this and the quantity x
defined previously. We use x again here because this is how Φ is usually written.
Not only this function but also its first several derivatives, which appear in
analytic work, are tabulated.
3. Relation to tabulated functions
Even more generally available are the closely related functions:
Error function:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^2}\, du$$

Complementary error function:

$$\operatorname{cerf}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-u^2}\, du$$

$$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$
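Since erf is available directly in most computing environments, Φ is usually evaluated through the relation above; a minimal sketch:

```python
import math

def normal_Phi(x: float) -> float:
    """Normal probability function: Phi(x) = (1/2) * [1 + erf(x / sqrt(2))]."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(normal_Phi(0.0))   # 0.5
print(normal_Phi(1.0))   # ~0.8413
print(normal_Phi(-1.0))  # ~0.1587
```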
The characteristic function of the normal distribution:

$$\phi(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{jtx}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\, dx$$

$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{jt(m+\sigma y)}\, e^{-\frac{y^2}{2}}\, dy, \quad \text{where } y = \frac{x-m}{\sigma}$$

$$= \frac{e^{jtm}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left(\cos(t\sigma y) + j\sin(t\sigma y)\right) e^{-\frac{y^2}{2}}\, dy$$

$$= \frac{2 e^{jtm}}{\sqrt{2\pi}} \int_{0}^{\infty} \cos(t\sigma y)\, e^{-\frac{y^2}{2}}\, dy$$

$$= \frac{2 e^{jtm}}{\sqrt{2\pi}} \sqrt{\frac{\pi}{2}}\, e^{-\frac{t^2\sigma^2}{2}}$$

$$= e^{jtm - \frac{t^2\sigma^2}{2}}$$
Differentiation of this form will correctly yield the first two moments of the
distribution.
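As a check, the k-th moment is obtained as $(1/j^k)\, d^k\phi/dt^k$ evaluated at t = 0; a symbolic sketch using sympy:

```python
import sympy as sp

t, m, sigma = sp.symbols('t m sigma', real=True)
phi = sp.exp(sp.I * t * m - t**2 * sigma**2 / 2)  # characteristic function

first = sp.simplify(sp.diff(phi, t).subs(t, 0) / sp.I)         # E[X] = m
second = sp.simplify(sp.diff(phi, t, 2).subs(t, 0) / sp.I**2)  # E[X^2] = m^2 + sigma^2

print(first, second, sp.simplify(second - first**2))  # m, m**2 + sigma**2, sigma**2
```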
Most important property of normal variables: any linear combination (weighted
sum) of normal variables, whether independent or not, is another normal
variable.
Note that for zero mean variables
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}, \qquad \phi(t) = e^{-\frac{\sigma^2 t^2}{2}}$$
Both are Gaussian forms.
The Normal Approximation to the Binomial Distribution
The binomial distribution deals with the outcomes of n independent trials of an
experiment. Thus if n is large, we should expect the binomial distribution to be
well approximated by the normal distribution. The approximation is given by
the normal distribution having the same mean and variance. Thus
$$b(k, n, p) \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k-np)^2}{2npq}}$$

Relative error is of the order of

$$\frac{(k-np)^3}{(npq)^2}$$

The relative fit is good near the mean if npq is large and degenerates in the tails
where the probability itself is small.
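A numerical sketch of this fit (the parameters are illustrative):

```python
import math

n, p = 100, 0.5
q = 1 - p
npq = n * p * q  # variance, here 25

for k in (50, 55, 60, 70):
    exact = math.comb(n, k) * p**k * q**(n - k)
    approx = math.exp(-(k - n * p)**2 / (2 * npq)) / math.sqrt(2 * math.pi * npq)
    print(k, exact, approx)  # good near the mean np = 50, worse in the tail
```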
The Normal Approximation to the Poisson Distribution
The Poisson distribution likewise depends on the outcomes of independent
events. If there are enough of them,
$$P(k, \mu) \approx \frac{1}{\sqrt{2\pi\mu}}\, e^{-\frac{(k-\mu)^2}{2\mu}}$$

Relative error is of the order of

$$\frac{(k-\mu)^3}{\mu^2}$$
The relative fit is subject to the same behavior as the binomial approximation.
Interpretation of a continuous distribution approximating a discrete one:
The value of the normal density function at any k approximates the value of the
discrete distribution for that value of k. Think of spreading the area of each
impulse over a unit interval. Then the height of each rectangle is the probability
that the corresponding value of k will be taken. The normal curve approximates
this step-wise function.
Note that in summing the probabilities for values of k in some interval, the
approximating normal curve should be integrated over that interval plus ½ on
each end to get all the probability associated with those values of k.
$$P(N_1 \le X \le N_2) = \sum_{k=N_1}^{N_2} P(k, \mu) \approx \int_{N_1 - \frac{1}{2}}^{N_2 + \frac{1}{2}} \frac{1}{\sqrt{2\pi\mu}}\, e^{-\frac{(x-\mu)^2}{2\mu}}\, dx$$

$$= \Phi\left(\frac{N_2 + \frac{1}{2} - \mu}{\sqrt{\mu}}\right) - \Phi\left(\frac{N_1 - \frac{1}{2} - \mu}{\sqrt{\mu}}\right)$$
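A sketch comparing the exact Poisson sum with the corrected normal integral, with Φ computed from erf (the values of µ, N1, N2 are illustrative):

```python
import math

def Phi(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, N1, N2 = 25.0, 20, 30  # illustrative values

exact = sum(mu**k * math.exp(-mu) / math.factorial(k) for k in range(N1, N2 + 1))
approx = Phi((N2 + 0.5 - mu) / math.sqrt(mu)) - Phi((N1 - 0.5 - mu) / math.sqrt(mu))

print(exact, approx)  # the half-unit correction makes the two agree closely
```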
Multidimensional Normal Distribution
Probability density function:
$$f(x) = \frac{1}{(2\pi)^{n/2} |M|^{1/2}} \exp\left[-\frac{1}{2}(x - \overline{X})^T M^{-1} (x - \overline{X})\right]$$
Assuming zero mean, which is often the case:
$$f(x) = \frac{1}{(2\pi)^{n/2} |M|^{1/2}} \exp\left[-\frac{1}{2} x^T M^{-1} x\right]$$
For zero mean variables: contours of constant probability density are given by:
$$x^T M^{-1} x = c^2$$
Not expressed in principal coordinates if the Xi are correlated.
Need to know the rudimentary properties of eigenvalues and eigenvectors.
M is symmetric and full rank.
$$M v_i = \lambda_i v_i$$

$$v_i^T v_j = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$
This probability density function can be better visualized in terms of its principal
coordinates. These coordinates are defined by the directions of the eigenvectors
of the covariance matrix. The appropriate transformation is
$$y = Vx, \qquad V = \begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix}$$
Thus yi is the component of x in the direction vi .
In terms of the new variable y, the contours of constant probability density are:
$$x^T M^{-1} x = y^T V^{-T} M^{-1} V^{-1} y = y^T Y^{-1} y, \qquad Y^{-1} = V^{-T} M^{-1} V^{-1}$$

$$Y = V M V^T = V \begin{bmatrix} M v_1 & \cdots & M v_n \end{bmatrix} = \begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix} \begin{bmatrix} \lambda_1 v_1 & \cdots & \lambda_n v_n \end{bmatrix} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}$$
Y is the covariance matrix for the random variable $Y = VX$, so the $\lambda_i$ are the
variances of the $Y_i$.
$$Y^{-1} = \begin{bmatrix} \frac{1}{\lambda_1} & & \\ & \ddots & \\ & & \frac{1}{\lambda_n} \end{bmatrix}$$

$$y^T Y^{-1} y = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} + \cdots + \frac{y_n^2}{\lambda_n} = c^2$$
These are the principal coordinates, with intercepts at $y_i = \pm c\sqrt{\lambda_i}$, where
$\sqrt{\lambda_i}$ is the standard deviation of $y_i$.
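A numpy sketch of the principal-coordinate transformation (the covariance matrix M is an arbitrary illustrative choice): the matrix V built from the eigenvectors of M diagonalizes it, and the eigenvalues are the variances along the new coordinates.

```python
import numpy as np

M = np.array([[4.0, 1.5],
              [1.5, 1.0]])  # symmetric, positive-definite covariance (illustrative)

lam, vecs = np.linalg.eigh(M)  # eigenvalues, orthonormal eigenvectors (as columns)
V = vecs.T                     # rows of V are the eigenvectors v_i^T

Y = V @ M @ V.T                # Y = V M V^T = diag(lambda_1, ..., lambda_n)
print(np.round(Y, 12))
print(lam)                     # variances of the principal coordinates y_i
```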
Note that two random variables, each having a normal distribution singly, do not
necessarily have a binormal joint distribution.
However, if the random variables are independent and normally distributed,
their joint distribution is clearly a multidimensional normal distribution.
Two-dimensional case (continued):
$$m_{ij} = \overline{X_i X_j}$$

$$f_2(x_1, x_2) = \frac{1}{2\pi\sqrt{m_{11} m_{22} - m_{12}^2}} \exp\left[-\frac{m_{22} x_1^2 - 2 m_{12} x_1 x_2 + m_{11} x_2^2}{2(m_{11} m_{22} - m_{12}^2)}\right]$$

$$m_{11} = \sigma_1^2, \qquad m_{22} = \sigma_2^2, \qquad m_{12} = \mu_{12}, \qquad \rho_{12} = \frac{\mu_{12}}{\sigma_1 \sigma_2}$$
In terms of these symbols:
$$f_2(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}} \exp\left[-\frac{\left(\frac{x_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\frac{x_1}{\sigma_1}\right)\left(\frac{x_2}{\sigma_2}\right) + \left(\frac{x_2}{\sigma_2}\right)^2}{2(1 - \rho_{12}^2)}\right]$$
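The two expressions for f2 are algebraically identical; a quick numerical consistency check (all values illustrative):

```python
import math

s1, s2, rho = 2.0, 1.0, 0.6
m11, m22, m12 = s1**2, s2**2, rho * s1 * s2  # second moments from the definitions
x1, x2 = 0.5, -1.0

det = m11 * m22 - m12**2
f_moments = math.exp(-(m22*x1**2 - 2*m12*x1*x2 + m11*x2**2) / (2*det)) \
            / (2 * math.pi * math.sqrt(det))

z = (x1/s1)**2 - 2*rho*(x1/s1)*(x2/s2) + (x2/s2)**2
f_corr = math.exp(-z / (2 * (1 - rho**2))) \
         / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho**2))

print(f_moments, f_corr)  # identical up to rounding
```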
Note that if a set of random variables having the multidimensional normal
distribution is uncorrelated, they are independent. This implication does not
hold for distributions in general.