Multiple Random Variables

Joint Probability Density
Let X and Y be two random variables. Their joint distribution function is F_XY(x,y) ≡ P[X ≤ x ∩ Y ≤ y].

0 ≤ F_XY(x,y) ≤ 1 , −∞ < x < ∞ , −∞ < y < ∞
F_XY(−∞,−∞) = F_XY(x,−∞) = F_XY(−∞,y) = 0
F_XY(∞,∞) = 1
F_XY(x,y) does not decrease if either x or y increases or both increase
F_XY(∞,y) = F_Y(y) and F_XY(x,∞) = F_X(x)
Joint Probability Density
[Figure: joint distribution function for tossing two dice]
f_XY(x,y) = ∂²/(∂x ∂y) F_XY(x,y)
f_XY(x,y) ≥ 0 , −∞ < x < ∞ , −∞ < y < ∞
∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x,y) dx dy = 1
F_XY(x,y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_XY(α,β) dα dβ
f_X(x) = ∫_{−∞}^{∞} f_XY(x,y) dy   and   f_Y(y) = ∫_{−∞}^{∞} f_XY(x,y) dx
P[x₁ < X ≤ x₂ , y₁ < Y ≤ y₂] = ∫_{y₁}^{y₂} ∫_{x₁}^{x₂} f_XY(x,y) dx dy
E(g(X,Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x,y) f_XY(x,y) dx dy
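These relations are easy to check numerically. The sketch below assumes a simple joint PDF, f_XY(x,y) = e^{−x−y} u(x) u(y) (a hypothetical example, not one from these notes), and evaluates the normalization, a marginal, and E(XY) on a grid with NumPy.

```python
import numpy as np

# Hypothetical joint PDF (assumed for illustration): f_XY(x,y) = exp(-x-y), x,y >= 0
x = np.linspace(0, 15, 1501)
y = np.linspace(0, 15, 1501)
X, Y = np.meshgrid(x, y, indexing="ij")
fXY = np.exp(-X - Y)

total = np.trapz(np.trapz(fXY, y, axis=1), x)          # total probability, ~1
fX = np.trapz(fXY, y, axis=1)                          # marginal f_X(x), here ~exp(-x)
E_XY = np.trapz(np.trapz(X * Y * fXY, y, axis=1), x)   # E(g(X,Y)) with g(x,y) = xy

print(total, np.max(np.abs(fX - np.exp(-x))), E_XY)    # ~1, ~0, ~1
```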
Combinations of Two Random Variables
Example
X and Y are independent, identically distributed (i.i.d.) random variables with common PDF
f_X(x) = e^{−x} u(x)   ,   f_Y(y) = e^{−y} u(y)
Find the PDF of Z = X/Y.
Since X and Y are never negative, Z is never negative.
F_Z(z) = P[X/Y ≤ z] ⇒ F_Z(z) = P[X ≤ zY ∩ Y > 0] + P[X ≥ zY ∩ Y < 0]
Since Y is never negative, F_Z(z) = P[X ≤ zY ∩ Y > 0].
Combinations of Two Random Variables
F_Z(z) = ∫_{−∞}^{∞} ∫_{−∞}^{zy} f_XY(x,y) dx dy = ∫_{0}^{∞} ∫_{0}^{zy} e^{−x} e^{−y} dx dy
Using Leibniz's formula for differentiating an integral,
d/dz [ ∫_{a(z)}^{b(z)} g(x,z) dx ] = (db(z)/dz) g(b(z),z) − (da(z)/dz) g(a(z),z) + ∫_{a(z)}^{b(z)} ∂g(x,z)/∂z dx
f_Z(z) = ∂/∂z F_Z(z) = ∫_{0}^{∞} y e^{−zy} e^{−y} dy , z > 0
f_Z(z) = u(z) / (z + 1)²
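A quick Monte Carlo check of this result (a sketch, not part of the original example): draw i.i.d. exponential samples for X and Y and compare a histogram of Z = X/Y against 1/(z+1)².

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.exponential(1.0, n)        # f_X(x) = e^{-x} u(x)
y = rng.exponential(1.0, n)        # f_Y(y) = e^{-y} u(y)
z = x / y

counts, edges = np.histogram(z, bins=np.linspace(0, 10, 101))
centers = 0.5 * (edges[:-1] + edges[1:])
emp_pdf = counts / (n * np.diff(edges))            # normalize by ALL samples, not just z <= 10
print(np.max(np.abs(emp_pdf - 1.0 / (centers + 1.0) ** 2)))   # small sampling/binning error
```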
Combinations of Two Random Variables
Example
The joint PDF of X and Y is defined as
f_XY(x,y) = { 6x , x ≥ 0, y ≥ 0, x + y ≤ 1
            { 0 , otherwise
Define Z = X − Y. Find the PDF of Z.
Combinations of Two Random Variables
Given the constraints on X and Y, −1 ≤ Z ≤ 1.
The line Z = X − Y intersects X + Y = 1 at X = (1 + Z)/2 , Y = (1 − Z)/2.
For 0 ≤ z ≤ 1,
F_Z(z) = 1 − ∫_{0}^{(1−z)/2} ∫_{y+z}^{1−y} 6x dx dy = 1 − ∫_{0}^{(1−z)/2} [3x²]_{y+z}^{1−y} dy
F_Z(z) = 1 − (3/4)(1 + z)(1 − z)²  ⇒  f_Z(z) = (3/4)(1 − z)(1 + 3z)
Combinations of Two Random Variables
For −1 ≤ z ≤ 0 the two pieces of the region X ≤ Y + z, below and above the line y = (1 − z)/2, contribute equally, so
F_Z(z) = 2 ∫_{−z}^{(1−z)/2} ∫_{0}^{y+z} 6x dx dy = 6 ∫_{−z}^{(1−z)/2} [x²]_{0}^{y+z} dy = 6 ∫_{−z}^{(1−z)/2} (y + z)² dy
F_Z(z) = (1/4)(1 + z)³  ⇒  f_Z(z) = (3/4)(1 + z)²
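This piecewise result can also be checked by simulation (a sketch, not part of the notes): draw samples from f_XY(x,y) = 6x on the triangle by rejection sampling and histogram Z = X − Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
x = rng.uniform(0, 1, n)
y = rng.uniform(0, 1, n)
u = rng.uniform(0, 1, n)
keep = (x + y <= 1) & (u < x)        # rejection sampling for the density 6x on the triangle
x, y = x[keep], y[keep]
z = x - y

def f_Z(z):
    # the piecewise PDF derived above
    return np.where(z < 0, 0.75 * (1 + z) ** 2, 0.75 * (1 - z) * (1 + 3 * z))

hist, edges = np.histogram(z, bins=np.linspace(-1, 1, 81), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - f_Z(centers))))   # small; sampling and binning error only
```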
Joint Probability Density
Let f_XY(x,y) = (1/(w_X w_Y)) rect((x − X₀)/w_X) rect((y − Y₀)/w_Y)
E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f_XY(x,y) dx dy = X₀   ,   E(Y) = Y₀
E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f_XY(x,y) dx dy = X₀ Y₀
f_X(x) = ∫_{−∞}^{∞} f_XY(x,y) dy = (1/w_X) rect((x − X₀)/w_X)
Joint Probability Density
Conditional Probability
Let A = {Y ≤ y}
F_{X|A}(x) = P[X ≤ x ∩ A] / P[A]
F_{X|Y≤y}(x) = P[X ≤ x ∩ Y ≤ y] / P[Y ≤ y] = F_XY(x,y) / F_Y(y)
Let A = {y₁ < Y ≤ y₂}
F_{X|y₁<Y≤y₂}(x) = [F_XY(x,y₂) − F_XY(x,y₁)] / [F_Y(y₂) − F_Y(y₁)]
Joint Probability Density
Let A = {Y = y}
F_{X|Y=y}(x) = lim_{Δy→0} [F_XY(x,y+Δy) − F_XY(x,y)] / [F_Y(y+Δy) − F_Y(y)] = [∂F_XY(x,y)/∂y] / [dF_Y(y)/dy]
F_{X|Y=y}(x) = [∂F_XY(x,y)/∂y] / f_Y(y)   ,   f_{X|Y=y}(x) = ∂/∂x F_{X|Y=y}(x) = f_XY(x,y) / f_Y(y)
Similarly f_{Y|X=x}(y) = f_XY(x,y) / f_X(x)
Joint Probability Density
In a simplified notation, f_{X|Y}(x) = f_XY(x,y) / f_Y(y) and f_{Y|X}(y) = f_XY(x,y) / f_X(x)
Bayes' Theorem: f_{X|Y}(x) f_Y(y) = f_{Y|X}(y) f_X(x)
Marginal PDF's from joint or conditional PDF's
f_X(x) = ∫_{−∞}^{∞} f_XY(x,y) dy = ∫_{−∞}^{∞} f_{X|Y}(x) f_Y(y) dy
f_Y(y) = ∫_{−∞}^{∞} f_XY(x,y) dx = ∫_{−∞}^{∞} f_{Y|X}(y) f_X(x) dx
Joint Probability Density
Example:
Let a message X with a known PDF be corrupted by additive noise N, also with known PDF, and received as Y = X + N. Then the best estimate that can be made of the message X is the value at the peak of the conditional PDF,
f_{X|Y}(x) = f_{Y|X}(y) f_X(x) / f_Y(y)
Joint Probability Density
Let N have the PDF shown [figure]. Then, for any known value of X, the PDF of Y would be that PDF shifted by X [figure]. Therefore, if the PDF of N is f_N(n), the conditional PDF of Y given X is f_N(y − X).
Joint Probability Density
Using Bayes' theorem,
f_{X|Y}(x) = f_{Y|X}(y) f_X(x) / f_Y(y) = f_N(y − x) f_X(x) / f_Y(y)
           = f_N(y − x) f_X(x) / ∫_{−∞}^{∞} f_{Y|X}(y) f_X(x) dx = f_N(y − x) f_X(x) / ∫_{−∞}^{∞} f_N(y − x) f_X(x) dx
Now the conditional PDF of X given Y can be computed.
Joint Probability Density
To make the example concrete let
f_X(x) = (e^{−x/E(X)} / E(X)) u(x)   and   f_N(n) = (1/(σ_N √(2π))) e^{−n²/(2σ_N²)}
Then the conditional PDF of X given Y can be found; its denominator, the PDF of Y, works out to be
f_Y(y) = (1/(2 E(X))) exp[ σ_N²/(2 E(X)²) − y/E(X) ] · [ 1 + erf( (y − σ_N²/E(X)) / (√2 σ_N) ) ]
where erf is the error function.
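A numerical sketch of this example (with assumed values E(X) = 1, σ_N = 0.5 and an observed y = 1.2, which are not from the notes) evaluates the posterior on a grid and checks the closed form of f_Y(y) by quadrature.

```python
import numpy as np
from scipy.special import erf

EX, sigN, y = 1.0, 0.5, 1.2           # assumed example numbers

x = np.linspace(0, 20, 100001)
fX = np.exp(-x / EX) / EX                                         # exponential prior on X
fN = np.exp(-(y - x) ** 2 / (2 * sigN**2)) / (sigN * np.sqrt(2 * np.pi))
fY_quad = np.trapz(fN * fX, x)                                    # f_Y(y) by numerical integration

fY_formula = np.exp(sigN**2 / (2 * EX**2) - y / EX) / (2 * EX) * \
             (1 + erf((y - sigN**2 / EX) / (np.sqrt(2) * sigN)))
print(fY_quad, fY_formula)                                        # the two should agree

posterior = fN * fX / fY_formula                                  # f_{X|Y}(x) for this y
print(x[np.argmax(posterior)])                                    # peak = best estimate of X
```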
Independent Random Variables
If two random variables X and Y are independent then
f_{X|Y}(x) = f_XY(x,y)/f_Y(y) = f_X(x)   and   f_{Y|X}(y) = f_XY(x,y)/f_X(x) = f_Y(y)
Therefore f_XY(x,y) = f_X(x) f_Y(y) and their correlation is the product of their expected values.
E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f_XY(x,y) dx dy = ∫_{−∞}^{∞} y f_Y(y) dy ∫_{−∞}^{∞} x f_X(x) dx = E(X) E(Y)
Independent Random Variables
Covariance
σ_XY ≡ E( [X − E(X)] [Y − E(Y)]* )
σ_XY = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − E(X)) (y − E(Y))* f_XY(x,y) dx dy
σ_XY = E(XY*) − E(X) E(Y*)
If X and Y are independent, σ_XY = E(X) E(Y*) − E(X) E(Y*) = 0
Independent Random Variables
Correlation Coefficient
ρ_XY = E( [X − E(X)]/σ_X × [Y* − E(Y*)]/σ_Y )
ρ_XY = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [(x − E(X))/σ_X] [(y* − E(Y*))/σ_Y] f_XY(x,y) dx dy
ρ_XY = [E(XY*) − E(X) E(Y*)] / (σ_X σ_Y) = σ_XY / (σ_X σ_Y)
If X and Y are independent ρ = 0. If they are perfectly positively correlated ρ = +1 and if they are perfectly negatively correlated ρ = −1.
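In practice σ_XY and ρ_XY are usually estimated from data. A minimal simulation sketch (with an assumed construction Y = 0.6X + 0.8W, chosen so that the true ρ_XY is 0.6):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(0, 1, n)
w = rng.normal(0, 1, n)
y = 0.6 * x + 0.8 * w                               # unit-variance Y with true rho_XY = 0.6

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # sample covariance
rho = cov_xy / (x.std() * y.std())                  # sample correlation coefficient
print(cov_xy, rho, np.corrcoef(x, y)[0, 1])         # all approximately 0.6
```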
Independent Random Variables
If two random variables are independent, their covariance is
zero. However, if two random variables have a zero covariance
that does not mean they are necessarily independent.
Independence ⇒ Zero Covariance
Zero Covariance ⇏ Independence
Independent Random Variables
In the traditional jargon of random variable analysis, two
“uncorrelated” random variables have a covariance of zero.
Unfortunately, this does not also imply that their correlation is zero.
If their correlation is zero they are said to be orthogonal.
X and Y are "Uncorrelated" ⇒ σ_XY = 0
X and Y are "Uncorrelated" ⇏ E(XY) = 0
Independent Random Variables
The variance of a sum of random variables X and Y is
σ_{X+Y}² = σ_X² + σ_Y² + 2σ_XY = σ_X² + σ_Y² + 2ρ_XY σ_X σ_Y
If Z is a linear combination of random variables X_i,
Z = a₀ + Σ_{i=1}^{N} a_i X_i
then E(Z) = a₀ + Σ_{i=1}^{N} a_i E(X_i)
σ_Z² = Σ_{i=1}^{N} Σ_{j=1}^{N} a_i a_j σ_{X_i X_j} = Σ_{i=1}^{N} a_i² σ_{X_i}² + Σ_{i=1}^{N} Σ_{j=1, j≠i}^{N} a_i a_j σ_{X_i X_j}
Independent Random Variables
If the X’s are all independent of each other, the variance of
the linear combination is a linear combination of the variances.
σ_Z² = Σ_{i=1}^{N} a_i² σ_{X_i}²
If Z is simply the sum of the X’s, and the X’s are all independent
of each other, then the variance of the sum is the sum of the
variances.
σ_Z² = Σ_{i=1}^{N} σ_{X_i}²
One Function of Two Random Variables
Let Z = g(X,Y). Find the PDF of Z.
F_Z(z) = P[Z ≤ z] = P[g(X,Y) ≤ z] = P[(X,Y) ∈ R_Z]
where R_Z is the region in the XY plane where g(X,Y) ≤ z.
For example, let Z = X + Y.
Probability Density of a Sum of Random Variables
Let Z = X + Y. Then for Z to be less than z, X must be less than z − Y. Therefore, the distribution function for Z is
F_Z(z) = ∫_{−∞}^{∞} ∫_{−∞}^{z−y} f_XY(x,y) dx dy
If X and Y are independent, F_Z(z) = ∫_{−∞}^{∞} f_Y(y) ( ∫_{−∞}^{z−y} f_X(x) dx ) dy
and it can be shown that f_Z(z) = ∫_{−∞}^{∞} f_Y(y) f_X(z − y) dy = f_Y(z) ∗ f_X(z)
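The convolution result can be illustrated numerically. The sketch below assumes X and Y are independent and uniform on [0, 1] (a hypothetical choice) and convolves their PDFs on a grid; the result matches the triangular PDF of the sum.

```python
import numpy as np

dx = 0.001
x = np.arange(-2, 2, dx)
fX = np.where((x >= 0) & (x <= 1), 1.0, 0.0)     # uniform on [0,1]
fY = fX.copy()

fZ = np.convolve(fX, fY) * dx                    # f_Z = f_X * f_Y, done numerically
z = np.arange(fZ.size) * dx + 2 * x[0]           # grid on which the convolution lives
exact = np.clip(1 - np.abs(z - 1), 0, None)      # exact triangular PDF of the sum
print(np.max(np.abs(fZ - exact)))                # small discretization error
```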
Moment Generating Functions
The moment-generating function Φ_X(s) of a CV random variable X is defined by
Φ_X(s) = E(e^{sX}) = ∫_{−∞}^{∞} f_X(x) e^{sx} dx.
Relation to the Laplace transform → Φ_X(s) = L[f_X(x)]_{s→−s}
d/ds Φ_X(s) = ∫_{−∞}^{∞} f_X(x) x e^{sx} dx
[d/ds Φ_X(s)]_{s→0} = ∫_{−∞}^{∞} x f_X(x) dx = E(X)
Relation to moments → E(Xⁿ) = [dⁿ/dsⁿ Φ_X(s)]_{s→0}
Moment Generating Functions
The moment-generating function Φ_X(z) of a DV random variable X is defined by
Φ_X(z) = E(z^X) = Σ_{n=−∞}^{∞} P[X = n] zⁿ = Σ_{n=−∞}^{∞} p_n zⁿ.
Relation to the z transform → Φ_X(z) = Z(P(n))_{z→z⁻¹}
d/dz Φ_X(z) = E(X z^{X−1})   ,   d²/dz² Φ_X(z) = E(X(X−1) z^{X−2})
Relation to moments →
[d/dz Φ_X(z)]_{z=1} = E(X)
[d²/dz² Φ_X(z)]_{z=1} = E(X²) − E(X)
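As an illustration (a sketch using a Poisson random variable with an assumed λ = 3, not from the notes), the moment relations can be checked by forming Φ_X(z) = Σ p_n zⁿ from the PMF and differentiating it numerically at z = 1.

```python
import numpy as np
from scipy.stats import poisson

lam = 3.0
n = np.arange(0, 200)
p = poisson.pmf(n, lam)                                # P[X = n]

def Phi(z):
    return np.sum(p * z**n)                            # Phi_X(z) = E(z^X)

h = 1e-5
d1 = (Phi(1 + h) - Phi(1 - h)) / (2 * h)               # [d/dz Phi]_{z=1}  -> E(X)
d2 = (Phi(1 + h) - 2 * Phi(1) + Phi(1 - h)) / h**2     # [d2/dz2 Phi]_{z=1} -> E(X^2) - E(X)
print(d1, d2 + d1 - d1**2)                             # mean ~ lam, variance ~ lam
```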
The Chebyshev Inequality
For any random variable X and any ε > 0,
P[|X − μ_X| ≥ ε] = ∫_{−∞}^{μ_X−ε} f_X(x) dx + ∫_{μ_X+ε}^{∞} f_X(x) dx = ∫_{|x−μ_X|≥ε} f_X(x) dx
Also
σ_X² = ∫_{−∞}^{∞} (x − μ_X)² f_X(x) dx ≥ ∫_{|x−μ_X|≥ε} (x − μ_X)² f_X(x) dx ≥ ε² ∫_{|x−μ_X|≥ε} f_X(x) dx
It then follows that P[|X − μ_X| ≥ ε] ≤ σ_X² / ε²
This is known as the Chebyshev inequality. Using this we can put a bound
on the probability of an event with knowledge only of the variance and no
knowledge of the PMF or PDF.
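A quick numerical check of the bound (a sketch with assumed values μ_X = 5 and σ_X = 2, using a Gaussian X):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, 1_000_000)            # mu_X = 5, sigma_X = 2 (assumed)
for eps in (2.0, 4.0, 6.0):
    actual = np.mean(np.abs(x - 5.0) >= eps)   # P[|X - mu_X| >= eps], estimated
    print(eps, actual, 2.0**2 / eps**2)        # never exceeds the Chebyshev bound sigma^2/eps^2
```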
The Markov Inequality
For any random variable X let f_X(x) = 0 for all x < 0 and let ε be a positive constant. Then
E(X) = ∫_{−∞}^{∞} x f_X(x) dx = ∫_{0}^{∞} x f_X(x) dx ≥ ∫_{ε}^{∞} x f_X(x) dx ≥ ε ∫_{ε}^{∞} f_X(x) dx = ε P[X ≥ ε]
Therefore P[X ≥ ε] ≤ E(X)/ε. This is known as the Markov inequality.
It allows us to bound the probability of certain events with knowledge
only of the expected value of the random variable and no knowledge of the
PMF or PDF except that it is zero for negative values.
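The same kind of check works for the Markov inequality (a sketch using an exponential X with assumed E(X) = 2):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(2.0, 1_000_000)            # non-negative, E(X) = 2 (assumed)
for eps in (1.0, 4.0, 10.0):
    print(eps, np.mean(x >= eps), 2.0 / eps)   # P[X >= eps] versus the Markov bound E(X)/eps
```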
The Weak Law of Large Numbers
Consider taking N independent values {X₁, X₂, …, X_N} from a random variable X in order to develop an understanding of the nature of X. They constitute a sampling of X. The sample mean is X̄_N = (1/N) Σ_{n=1}^{N} X_n. The sample size is finite, so different sets of N values will yield different sample means. Thus X̄_N is itself a random variable and it is an estimator of the expected value of X, E(X). A good estimator has two important qualities. It is unbiased and consistent. Unbiased means E(X̄_N) = E(X). Consistent means that as N is increased the variance of the estimator is decreased.
The Weak Law of Large Numbers
Using the Chebyshev inequality we can put a bound on the probable deviation of X̄_N from its expected value.
P[|X̄_N − E(X)| ≥ ε] ≤ σ_{X̄_N}²/ε² = σ_X²/(Nε²) , ε > 0
This implies that
P[|X̄_N − E(X)| < ε] ≥ 1 − σ_X²/(Nε²) , ε > 0
The probability that X̄_N is within some small deviation from E(X) can be made as close to one as desired by making N large enough.
The Weak Law of Large Numbers
Now, in
P[|X̄_N − E(X)| < ε] ≥ 1 − σ_X²/(Nε²) , ε > 0
let N approach infinity.
lim_{N→∞} P[|X̄_N − E(X)| < ε] = 1 , ε > 0
The Weak Law of Large Numbers states that if {X₁, X₂, …, X_N} is a sequence of iid random variable values and E(X) is finite, then
lim_{N→∞} P[|X̄_N − E(X)| < ε] = 1 , ε > 0
This kind of convergence is called convergence in probability.
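A simulation sketch of this convergence (assuming X uniform on [0, 1], so E(X) = 1/2, and an arbitrary ε = 0.05): the fraction of sample means within ε of E(X) approaches one as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
eps, trials = 0.05, 5000
for N in (10, 100, 1000):
    means = rng.uniform(0, 1, (trials, N)).mean(axis=1)   # 'trials' independent sample means
    print(N, np.mean(np.abs(means - 0.5) < eps))          # P[|X_bar_N - E(X)| < eps] -> 1
```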
The Strong Law of Large Numbers
Now consider a sequence {X₁, X₂, …} of independent values of X and let X have an expected value E(X) and a finite variance σ_X². Also consider a sequence of sample means {X̄₁, X̄₂, …} defined by X̄_N = (1/N) Σ_{n=1}^{N} X_n. The Strong Law of Large Numbers says
P[ lim_{N→∞} X̄_N = E(X) ] = 1
This kind of convergence is called almost sure convergence.
The Laws of Large Numbers
The Weak Law of Large Numbers
lim_{N→∞} P[|X̄_N − E(X)| < ε] = 1 , ε > 0
and the Strong Law of Large Numbers
P[ lim_{N→∞} X̄_N = E(X) ] = 1
seem to be saying about the same thing. There is a subtle difference. It can be illustrated by the following example in which a sequence converges in probability but not almost surely.
The Laws of Large Numbers
Let X_nk = { 1 , k/n ≤ ζ < (k+1)/n , 0 ≤ k < n , n = 1,2,3,…
           { 0 , otherwise
and let ζ be uniformly distributed between 0 and 1. As n increases from one we get this "triangular" sequence of X's.
X₁₀
X₂₀  X₂₁
X₃₀  X₃₁  X₃₂
⋮
Now let Y_{n(n−1)/2+k+1} = X_nk, meaning that Y = {X₁₀, X₂₀, X₂₁, X₃₀, X₃₁, X₃₂, …}. X₁₀ is one with probability one. X₂₀ and X₂₁ are each one with probability 1/2 and zero with probability 1/2. Generalizing we can say that X_nk is one with probability 1/n and zero with probability 1 − 1/n.
The Laws of Large Numbers
Y_{n(n−1)/2+k+1} is therefore one with probability 1/n and zero with probability 1 − 1/n. For each n the probability that at least one of the n numbers in each length-n sequence is one is
P[at least one 1] = 1 − P[no ones] = 1 − (1 − 1/n)ⁿ.
In the limit as n approaches infinity this probability approaches 1 − 1/e ≅ 0.632. So no matter how large n gets there is a non-zero probability that at least one 1 will occur in any length-n sequence. This proves that the sequence Y does not converge almost surely because there is always a non-zero probability that a length-n sequence will contain a 1 for any n.
The Laws of Large Numbers
The expected value E(X_nk) is
E(X_nk) = P[X_nk = 1] × 1 + P[X_nk = 0] × 0 = 1/n
and is therefore independent of k and approaches zero as n approaches infinity. The expected value of X_nk² is
E(X_nk²) = P[X_nk = 1] × 1² + P[X_nk = 0] × 0² = E(X_nk) = 1/n
and the variance of X_nk is (n − 1)/n². So the variance of Y approaches zero as n approaches infinity. Then according to the Chebyshev inequality
P[|Y − μ_Y| ≥ ε] ≤ σ_Y²/ε² = (n − 1)/(n²ε²)
implying that as n approaches infinity the variation of Y gets steadily smaller and that says that Y converges in probability to zero.
The Laws of Large Numbers
Consider an experiment in which we toss a fair coin and assign the value 1 to a head and the value 0 to a tail. Let N_H be the number of heads, let N be the number of coin tosses, let r_H be N_H/N and let X be the random variable indicating a head or tail. Then
N_H = Σ_{n=1}^{N} X_n , E(N_H) = N/2 and E(r_H) = 1/2.
The Laws of Large Numbers
σ_{r_H}² = σ_X²/N ⇒ σ_{r_H} = σ_X/√N. Therefore r_H − 1/2 generally approaches zero but not smoothly or monotonically.
σ_{N_H}² = N σ_X² ⇒ σ_{N_H} = √N σ_X. Therefore N_H − E(N_H) does not approach zero. So the variation of N_H increases with N.
Convergence of Sequences of Random Variables
We have already seen two types of convergence of sequences of random
variables, almost sure convergence (in the Strong Law of Large Numbers)
and convergence in probability (in the Weak Law of Large Numbers). Now
we will explore other types of convergence.
Convergence of Sequences of Random Variables
Sure Convergence
A sequence of random variables {X_n(ζ)} converges surely to the random variable X(ζ) if the sequence of functions X_n(ζ) converges to the function X(ζ) as n → ∞ for all ζ in S. Sure convergence requires that every possible sequence converges. Different sequences may converge to different limits but all must converge.
X_n(ζ) → X(ζ) as n → ∞ for all ζ ∈ S
Convergence of Sequences of Random Variables
Almost Sure Convergence
A sequence of random variables {X_n(ζ)} converges almost surely to the random variable X(ζ) if the sequence of functions X_n(ζ) converges to the function X(ζ) as n → ∞ for all ζ in S, except possibly on a set of probability zero.
P[ζ : X_n(ζ) → X(ζ) as n → ∞] = 1
This is the convergence in the Strong Law of Large Numbers.
Convergence of Sequences of Random Variables
Mean Square Convergence
The sequence of random variables {X_n(ζ)} converges in the mean-square sense to the random variable X(ζ) if
E[ (X_n(ζ) − X(ζ))² ] → 0 as n → ∞
If the limiting random variable X(ζ) is not known we can use the Cauchy criterion: the sequence of random variables {X_n(ζ)} converges in the mean-square sense to the random variable X(ζ) if and only if
E[ (X_n(ζ) − X_m(ζ))² ] → 0 as n → ∞ and m → ∞
Convergence of Sequences of Random Variables
Convergence in Probability
The sequence of random variables {X_n(ζ)} converges in probability to the random variable X(ζ) if, for any ε > 0,
P[ |X_n(ζ) − X(ζ)| > ε ] → 0 as n → ∞
This is the convergence in the Weak Law of Large Numbers.
Convergence of Sequences of Random Variables
Convergence in Distribution
The sequence of random variables {X_n} with cumulative distribution functions {F_n(x)} converges in distribution to the random variable X with cumulative distribution function F(x) if
F_n(x) → F(x) as n → ∞
for all x at which F(x) is continuous. The Central Limit Theorem (coming soon) is an example of convergence in distribution.
Long-Term Arrival Rates
Suppose a system has a component that fails at time X₁. It is replaced, and the replacement operates for a time X₂ before it fails, and so on. Let N(t) be the number of components that have failed at time t. N(t) is called a renewal counting process. Let X_j denote the lifetime of the jth component. Then the time when the nth component fails is S_n = X₁ + X₂ + ⋯ + X_n, where we assume that the X_j are iid non-negative random variables with 0 ≤ E(X) = E(X_j) < ∞. We call the X_j's the interarrival or cycle times.
Long-Term Arrival Rates
Since the average interarrival time is E(X) seconds per event, one would expect intuitively that the average rate of arrivals is 1/E(X) events per second.
S_{N(t)} ≤ t ≤ S_{N(t)+1}
Dividing through by N(t),
S_{N(t)}/N(t) ≤ t/N(t) ≤ S_{N(t)+1}/N(t)
S_{N(t)}/N(t) is the average interarrival time for the first N(t) arrivals.
Long-Term Arrival Rates
S_{N(t)}/N(t) = (1/N(t)) Σ_{j=1}^{N(t)} X_j
As t → ∞, N(t) → ∞ and S_{N(t)}/N(t) → E(X). Similarly,
S_{N(t)+1}/N(t) = [S_{N(t)+1}/(N(t)+1)] × [(N(t)+1)/N(t)] → E(X)
So from
S_{N(t)}/N(t) ≤ t/N(t) ≤ S_{N(t)+1}/N(t)
we can say lim_{t→∞} N(t)/t = 1/E(X).
Long-Term Time Averages
Suppose that events occur at random with iid interarrival times X_j and that a cost C_j is associated with each event. Let C(t) be the cost accumulated up to time t. Then C(t) = Σ_{j=1}^{N(t)} C_j. The average cost up to time t is
C(t)/t = (1/t) Σ_{j=1}^{N(t)} C_j = [N(t)/t] × (1/N(t)) Σ_{j=1}^{N(t)} C_j.
In the limit t → ∞, N(t)/t → 1/E(X) and (1/N(t)) Σ_{j=1}^{N(t)} C_j → E(C). Therefore
lim_{t→∞} C(t)/t = E(C)/E(X).
The Central Limit Theorem
Let Y_N = Σ_{n=1}^{N} X_n where the X_n's are an iid sequence of random variable values. Let
Z_N = [Y_N − N E(X)] / (σ_X √N) = Σ_{n=1}^{N} [X_n − E(X)] / (σ_X √N).
E(Z_N) = E( Σ_{n=1}^{N} [X_n − E(X)] / (σ_X √N) ) = Σ_{n=1}^{N} E(X_n − E(X)) / (σ_X √N) = 0
The Central Limit Theorem
σ_{Z_N}² = Σ_{n=1}^{N} (1/(σ_X √N))² σ_X² = 1
The MGF of Z_N is
Φ_{Z_N}(s) = E(e^{s Z_N}) = E( exp( s Σ_{n=1}^{N} [X_n − E(X)] / (σ_X √N) ) ) = E( Π_{n=1}^{N} exp( s [X_n − E(X)] / (σ_X √N) ) )
Φ_{Z_N}(s) = Π_{n=1}^{N} E( exp( s [X_n − E(X)] / (σ_X √N) ) ) = [ E( exp( s [X − E(X)] / (σ_X √N) ) ) ]^N
The Central Limit Theorem
We can expand the exponential function in an infinite series.
Φ_{Z_N}(s) = [ E( 1 + s [X − E(X)]/(σ_X √N) + s² [X − E(X)]²/(2! σ_X² N) + s³ [X − E(X)]³/(3! σ_X³ N√N) + ⋯ ) ]^N
Φ_{Z_N}(s) = [ 1 + s E(X − E(X))/(σ_X √N) + s² E[(X − E(X))²]/(2! σ_X² N) + s³ E[(X − E(X))³]/(3! σ_X³ N√N) + ⋯ ]^N
Using E(X − E(X)) = 0 and E[(X − E(X))²] = σ_X²,
Φ_{Z_N}(s) = [ 1 + s²/(2N) + s³ E[(X − E(X))³]/(3! σ_X³ N√N) + ⋯ ]^N
The Central Limit Theorem
For large N we can neglect the higher-order terms. Then using
lim_{m→∞} (1 + z/m)^m = e^z
we get
Φ_{Z_N}(s) = lim_{N→∞} (1 + s²/(2N))^N = e^{s²/2} ⇒ f_{Z_N}(z) = (1/√(2π)) e^{−z²/2}
Thus the PDF approaches a Gaussian shape, with no assumptions about the shapes of the PDF's of the X_n's. This is convergence in distribution.
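The convergence can be seen numerically. The sketch below assumes exponential X_n (heavily skewed, clearly non-Gaussian) with an assumed N = 30, and compares the empirical CDF of Z_N with the Gaussian CDF.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
N = 30
x = rng.exponential(1.0, (100_000, N))               # E(X) = 1, sigma_X = 1
z = (x.sum(axis=1) - N * 1.0) / (1.0 * np.sqrt(N))   # standardized sums Z_N

def gauss_cdf(v):
    return 0.5 * (1 + erf(v / sqrt(2)))

for v in (-1.0, 0.0, 1.0, 2.0):
    print(v, np.mean(z <= v), gauss_cdf(v))          # empirical CDF ~ Gaussian CDF
```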
The Central Limit Theorem
Comparison of the distribution functions of two different Binomial
random variables and Gaussian random variables with the same expected
value and variance
The Central Limit Theorem
Comparison of the distribution functions of two different Poisson
random variables and Gaussian random variables with the same
expected value and variance
The Central Limit Theorem
Comparison of the distribution functions of two different Erlang random
variables and Gaussian random variables with the same expected value
and variance
The Central Limit Theorem
Comparison of the distribution functions of a sum of five independent
random variables from each of four distributions and a Gaussian random
variable with the same expected value and variance as that sum
The Central Limit Theorem
The PDF of a sum of independent random variables is the convolution of their PDF's. This concept can be extended to any number of random variables. If Z = Σ_{n=1}^{N} X_n then
f_Z(z) = f_{X₁}(z) ∗ f_{X₂}(z) ∗ f_{X₃}(z) ∗ ⋯ ∗ f_{X_N}(z).
As the number of convolutions increases, the shape of the PDF of Z approaches the Gaussian shape.
The Central Limit Theorem
The Gaussian PDF
f_X(x) = (1/(σ_X √(2π))) e^{−(x−μ_X)²/(2σ_X²)}
( μ_X = E(X) and σ_X² = E[ (X − E(X))² ] )
The Central Limit Theorem
The Gaussian PDF
Its maximum value occurs at the mean value of its argument.
It is symmetrical about the mean value.
The points of maximum absolute slope occur at one standard deviation
above and below the mean.
Its maximum value is inversely proportional to its standard deviation.
The limit as the standard deviation approaches zero is a unit impulse.
δ(x − μ_X) = lim_{σ_X→0} (1/(σ_X √(2π))) e^{−(x−μ_X)²/(2σ_X²)}
The Central Limit Theorem
The normal PDF is a Gaussian PDF with a mean of zero and a variance of one.
f_X(x) = (1/√(2π)) e^{−x²/2}
The central moments of the Gaussian PDF are
E[ (X − E(X))ⁿ ] = { 0 , n odd
                   { 1·3·5⋯(n−1) σ_Xⁿ , n even
The Central Limit Theorem
In computing probabilities from a Gaussian PDF it is necessary to evaluate integrals of the form
∫_{x₁}^{x₂} (1/(σ_X √(2π))) e^{−(x−μ_X)²/(2σ_X²)} dx.
Define a function
G(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−λ²/2} dλ.
Then, using the change of variable λ = (x − μ_X)/σ_X, we can convert the integral to
∫_{(x₁−μ_X)/σ_X}^{(x₂−μ_X)/σ_X} (1/√(2π)) e^{−λ²/2} dλ   or   G((x₂ − μ_X)/σ_X) − G((x₁ − μ_X)/σ_X).
The G function is closely related to some other standard functions. For example the "error" function is
erf(x) = (2/√π) ∫_{0}^{x} e^{−λ²} dλ   and   G(x) = (1/2)( erf(x/√2) + 1 ).
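A short sketch of this computation (with assumed numbers μ_X = 10, σ_X = 3, x₁ = 7, x₂ = 16), using the erf relation above:

```python
import numpy as np
from scipy.special import erf

def G(x):
    return 0.5 * (erf(x / np.sqrt(2)) + 1)           # G(x) = (1/2)(erf(x/sqrt(2)) + 1)

mu, sigma = 10.0, 3.0                                # assumed example values
x1, x2 = 7.0, 16.0
print(G((x2 - mu) / sigma) - G((x1 - mu) / sigma))   # P[x1 < X <= x2], about 0.819
```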
The Central Limit Theorem
Jointly Normal Random Variables
f_XY(x,y) = exp{ −[ ((x−μ_X)/σ_X)² − 2ρ_XY (x−μ_X)(y−μ_Y)/(σ_X σ_Y) + ((y−μ_Y)/σ_Y)² ] / [2(1 − ρ_XY²)] } / (2π σ_X σ_Y √(1 − ρ_XY²))
The Central Limit Theorem
Jointly Normal Random Variables
Any cross section of a bivariate Gaussian PDF at any value of x or y is a Gaussian. The marginal PDF's of X and Y can be found using
f_X(x) = ∫_{−∞}^{∞} f_XY(x,y) dy
which turns out to be
f_X(x) = e^{−(x−μ_X)²/(2σ_X²)} / (σ_X √(2π))
Similarly
f_Y(y) = e^{−(y−μ_Y)²/(2σ_Y²)} / (σ_Y √(2π))
The Central Limit Theorem
Jointly Normal Random Variables
The conditional PDF of X given Y is
f_{X|Y}(x) = exp{ −[ x − μ_X − ρ_XY (σ_X/σ_Y)(y − μ_Y) ]² / [2σ_X²(1 − ρ_XY²)] } / (σ_X √(2π(1 − ρ_XY²)))
The conditional PDF of Y given X is
f_{Y|X}(y) = exp{ −[ y − μ_Y − ρ_XY (σ_Y/σ_X)(x − μ_X) ]² / [2σ_Y²(1 − ρ_XY²)] } / (σ_Y √(2π(1 − ρ_XY²)))
Transformations of Joint Probability Density Functions
If W = g(X,Y) and Z = h(X,Y) and both functions are invertible then it is possible to write X = G(W,Z) and Y = H(W,Z) and
P[x < X ≤ x+Δx , y < Y ≤ y+Δy] = P[w < W ≤ w+Δw , z < Z ≤ z+Δz]
f_XY(x,y) ΔxΔy ≅ f_WZ(w,z) ΔwΔz
Transformations of Joint Probability Density Functions
ΔxΔy = |J| ΔwΔz where J = | ∂G/∂w  ∂G/∂z |
                           | ∂H/∂w  ∂H/∂z |
f_WZ(w,z) = |J| f_XY(x,y) = |J| f_XY(G(w,z), H(w,z))
Transformations of Joint Probability Density Functions
Let R = √(X² + Y²) and Θ = tan⁻¹(Y/X) , −π < Θ ≤ π
where X and Y are independent and Gaussian, with zero mean and equal variances. Then
X = R cos(Θ) and Y = R sin(Θ)
and
J = | ∂x/∂r  ∂x/∂θ | = | cos(θ)  −r sin(θ) | = r
    | ∂y/∂r  ∂y/∂θ |   | sin(θ)   r cos(θ) |
Transformations of Joint Probability Density Functions
f_X(x) = (1/(σ_X √(2π))) e^{−x²/(2σ_X²)}   and   f_Y(y) = (1/(σ_Y √(2π))) e^{−y²/(2σ_Y²)}
Since X and Y are independent, with σ_X² = σ_Y² = σ²,
f_XY(x,y) = (1/(2πσ²)) e^{−(x²+y²)/(2σ²)}
Applying the transformation formula,
f_RΘ(r,θ) = (r/(2πσ²)) e^{−r²/(2σ²)} u(r) , −π < θ ≤ π
f_RΘ(r,θ) = (r/(2πσ²)) e^{−r²/(2σ²)} u(r) rect(θ/2π)
Transformations of Joint Probability Density Functions
The radius R is distributed according to the Rayleigh PDF
f_R(r) = ∫_{−π}^{π} (r/(2πσ²)) e^{−r²/(2σ²)} u(r) dθ = (r/σ²) e^{−r²/(2σ²)} u(r)
E(R) = √(π/2) σ and σ_R² = 0.429σ²
The angle is uniformly distributed
f_Θ(θ) = ∫_{−∞}^{∞} (r/(2πσ²)) e^{−r²/(2σ²)} u(r) dr = rect(θ/2π)/(2π) = { 1/(2π) , −π < θ ≤ π
                                                                        { 0 , otherwise
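These two results are easy to confirm by simulation (a sketch with an assumed σ = 2): generate independent zero-mean Gaussian X and Y and examine R and Θ.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 2.0, 1_000_000
x = rng.normal(0, sigma, n)
y = rng.normal(0, sigma, n)
r = np.sqrt(x**2 + y**2)
theta = np.arctan2(y, x)                          # angle in (-pi, pi]

print(r.mean(), sigma * np.sqrt(np.pi / 2))       # E(R) = sqrt(pi/2) * sigma
print(r.var(), (2 - np.pi / 2) * sigma**2)        # sigma_R^2 ~ 0.429 sigma^2
print(np.histogram(theta, bins=8, range=(-np.pi, np.pi))[0] / n)   # ~1/8 per bin: uniform angle
```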
Multivariate Probability Density
F_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N) ≡ P[X₁ ≤ x₁ ∩ X₂ ≤ x₂ ∩ ⋯ ∩ X_N ≤ x_N]
0 ≤ F_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N) ≤ 1 , −∞ < x₁ < ∞ , … , −∞ < x_N < ∞
F_{X₁,X₂,…,X_N}(−∞,…,−∞) = F_{X₁,X₂,…,X_N}(−∞,…,x_k,…,−∞) = F_{X₁,X₂,…,X_N}(x₁,…,−∞,…,x_N) = 0
F_{X₁,X₂,…,X_N}(+∞,…,+∞) = 1
F_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N) does not decrease if any number of x_k's increase
F_{X₁,X₂,…,X_N}(+∞,…,x_k,…,+∞) = F_{X_k}(x_k)
Multivariate Probability Density
f_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N) = ∂^N/(∂x₁ ∂x₂ ⋯ ∂x_N) F_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N)
f_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N) ≥ 0 , −∞ < x₁ < ∞ , … , −∞ < x_N < ∞
∫_{−∞}^{∞} ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N) dx₁ dx₂ ⋯ dx_N = 1
F_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N) = ∫_{−∞}^{x_N} ⋯ ∫_{−∞}^{x₂} ∫_{−∞}^{x₁} f_{X₁,X₂,…,X_N}(λ₁,λ₂,…,λ_N) dλ₁ dλ₂ ⋯ dλ_N
f_{X_k}(x_k) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f_{X₁,X₂,…,X_N}(x₁,…,x_{k−1},x_k,x_{k+1},…,x_N) dx₁ ⋯ dx_{k−1} dx_{k+1} ⋯ dx_N
P[(X₁,X₂,…,X_N) ∈ R] = ∫⋯∫_R f_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N) dx₁ dx₂ ⋯ dx_N
E(g(X₁,X₂,…,X_N)) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} g(x₁,x₂,…,x_N) f_{X₁,X₂,…,X_N}(x₁,x₂,…,x_N) dx₁ dx₂ ⋯ dx_N
Other Important Probability Density Functions
In an ideal gas the three components of molecular velocity are all Gaussian with zero mean and equal variances of
σ_{V_X}² = σ_{V_Y}² = σ_{V_Z}² = σ_V² = kT/m
The speed of a molecule is
V = √(V_X² + V_Y² + V_Z²)
and the PDF of the speed is called Maxwellian and is given by
f_V(v) = √(2/π) (v²/σ_V³) e^{−v²/(2σ_V²)} u(v)
Other Important Probability Density Functions
If χ² = Y₁² + Y₂² + Y₃² + ⋯ + Y_N² = Σ_{n=1}^{N} Y_n² and the random variables Y_n are all mutually independent and normally distributed then
f_{χ²}(x) = [x^{N/2−1} / (2^{N/2} Γ(N/2))] e^{−x/2} u(x)
This is the chi-squared PDF.
E(χ²) = N   ,   σ_{χ²}² = 2N
Other Important Probability Density Functions
Reliability
Reliability is defined by R(t) = P[T > t] where T is the random variable representing the length of time after a system first begins operation until it fails.
F_T(t) = P[T ≤ t] = 1 − R(t)
d/dt R(t) = −f_T(t)
Reliability
Probably the most commonly used term in reliability analysis is mean time to failure (MTTF). MTTF is the expected value of T, which is E(T) = ∫_{−∞}^{∞} t f_T(t) dt. The conditional distribution function and PDF for the time to failure T given the condition T > t₀ are
F_{T|T>t₀}(t) = { 0 , t < t₀
               { [F_T(t) − F_T(t₀)] / [1 − F_T(t₀)] , t ≥ t₀
             = { [F_T(t) − F_T(t₀)] / R(t₀) } u(t − t₀)
f_{T|T>t₀}(t) = [ f_T(t) / R(t₀) ] u(t − t₀)
Reliability
A very common term in reliability analysis is failure rate, which is defined by
λ(t) dt = P[t < T ≤ t + dt | T > t] = f_{T|T>t}(t) dt.
Failure rate is the probability per unit time that a system which has been operating properly up until time t will fail, as a function of t.
λ(t) = f_T(t)/R(t) = −R′(t)/R(t) , t ≥ 0
R′(t) + λ(t) R(t) = 0 , t ≥ 0
Reliability
The solution of R′(t) + λ(t)R(t) = 0 , t ≥ 0 is R(t) = e^{−∫₀ᵗ λ(x) dx} , t ≥ 0.
One of the simplest models for system failure used in reliability analysis is that the failure rate is a constant. Let that constant be K. Then
R(t) = e^{−∫₀ᵗ K dx} = e^{−Kt} and f_T(t) = −R′(t) = K e^{−Kt} ← Exponential PDF
MTTF is 1/K.
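A numerical sketch of the constant-failure-rate model (with an assumed K = 0.002 per hour):

```python
import numpy as np

K = 0.002                         # assumed constant failure rate, per hour
t = np.linspace(0, 5000, 500001)
R = np.exp(-K * t)                # reliability R(t) = e^{-Kt}
fT = K * np.exp(-K * t)           # exponential PDF of the failure time

print(np.trapz(t * fT, t))        # MTTF by quadrature, approximately 1/K = 500 hours
print(np.interp(1 / K, t, R))     # probability of surviving past the MTTF, e^{-1} ~ 0.368
```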
Reliability
In some systems, if any of the subsystems fails, the overall system fails. If subsystem failure mechanisms are independent, the probability that the overall system is operating properly is the product of the probabilities that the subsystems are all operating properly. Let A_k be the event "subsystem k is operating properly" and let A_s be the event "the overall system is operating properly". Then, if there are N subsystems,
P[A_s] = P[A₁] P[A₂] ⋯ P[A_N] and R_s(t) = R₁(t) R₂(t) ⋯ R_N(t)
If the subsystems all have failure times with exponential PDF's then
R_s(t) = e^{−t/τ₁} e^{−t/τ₂} ⋯ e^{−t/τ_N} = e^{−t(1/τ₁ + 1/τ₂ + ⋯ + 1/τ_N)} = e^{−t/τ}
1/τ = 1/τ₁ + 1/τ₂ + ⋯ + 1/τ_N
Reliability
In some systems the overall system fails only if all of the subsystems fail. If subsystem failure mechanisms are independent, the probability that the overall system is not operating properly is the product of the probabilities that the subsystems are all not operating properly. As before let A_k be the event "subsystem k is operating properly" and let A_s be the event "the overall system is operating properly". Then, if there are N subsystems,
P[Ā_s] = P[Ā₁] P[Ā₂] ⋯ P[Ā_N]
and 1 − R_s(t) = (1 − R₁(t))(1 − R₂(t)) ⋯ (1 − R_N(t))
If the subsystems all have failure times with exponential PDF's then
R_s(t) = 1 − (1 − e^{−t/τ₁})(1 − e^{−t/τ₂}) ⋯ (1 − e^{−t/τ_N})
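The two configurations can be compared numerically. This sketch assumes three subsystems with exponential failure times and hypothetical time constants τ of 1000, 2000 and 4000 hours.

```python
import numpy as np

taus = np.array([1000.0, 2000.0, 4000.0])      # assumed subsystem time constants, hours
t = np.linspace(0, 3000, 7)

R_sub = np.exp(-t[:, None] / taus)             # reliability of each subsystem at each time
R_series = R_sub.prod(axis=1)                  # system fails if ANY subsystem fails
R_parallel = 1 - (1 - R_sub).prod(axis=1)      # system fails only if ALL subsystems fail

for ti, rs, rp in zip(t, R_series, R_parallel):
    print("t=%6.0f  series=%.3f  parallel=%.3f" % (ti, rs, rp))
```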
Reliability
An exponential failure rate implies that whether a system has just
begun operation or has been operating properly for a long time,
the probability that it will fail in the next unit of time is the same.
The expected value of the additional time to failure at any arbitrary
time is a constant independent of past history,
E(T | T > t₀) = t₀ + E(T)
This model is fairly reasonable for a wide range of times but not for all times in all systems. Many real systems experience two additional types of failure that are not indicated by an exponential PDF of failure times: infant mortality and wear-out.
Reliability
The “Bathtub” Curve
Reliability
The two higher-failure-rate portions of the bathtub curve are often modeled by the log-normal distribution of failure times. If a random variable X is Gaussian distributed its PDF is
f_X(x) = e^{−(x−μ_X)²/(2σ_X²)} / (σ_X √(2π))
If Y = e^X then dY/dX = e^X = Y , X = ln(Y) and the PDF of Y is
f_Y(y) = f_X(ln(y)) / (dy/dx) = e^{−(ln(y)−μ_X)²/(2σ_X²)} / (y σ_X √(2π))
Y is log-normal distributed with E(Y) = e^{μ_X + σ_X²/2} and σ_Y² = e^{2μ_X + σ_X²} ( e^{σ_X²} − 1 ).
The Log-Normal Distribution
Y = eX
The Log-Normal Distribution
Another common application of the log-normal distribution is to model the PDF of a random variable X that is formed from the product of a large number N of independent random variables X_n.
X = Π_{n=1}^{N} X_n
The logarithm of X is then
log(X) = Σ_{n=1}^{N} log(X_n)
Since log(X) is the sum of a large number of independent random variables its PDF tends to be Gaussian, which implies that the PDF of X is log-normal in shape.
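A simulation sketch of this product effect (assuming N = 50 factors uniform on [0.5, 1.5], a hypothetical choice): the log of the product is close to Gaussian, so the product itself is close to log-normal.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
N = 50
x = rng.uniform(0.5, 1.5, (100_000, N)).prod(axis=1)   # product of N independent factors

logx = np.log(x)                                        # = sum of the logs of the factors
print(skew(logx), kurtosis(logx))                       # both near 0, as for a Gaussian
```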