ECE 3800 – Exam #2 Review

Notes and figures are based on or taken from materials in the course textbook: Probabilistic Methods of Signal and System Analysis (3rd ed.) by George R. Cooper and Clare D. McGillem; Oxford Press, 1999. ISBN: 0-19-512354-9. B.J. Bazuin, Spring 2015.

Contents (textbook page numbers at right)

Chapter 3  Functions of Random Variables  151
  3.1  Introduction  151
       Functions of a Random Variable (FRV): Several Views  154
  3.2  Solving Problems of the Type Y = g(X)  155
       General Formula of Determining the pdf of Y = g(X)  166

Chapter 4  Expectation and Moments  215
  4.1  Expected Value of a Random Variable  215
  4.2  Conditional Expectations  232
  4.3  Moments of Random Variables  242
       Joint Moments  246
  4.4  Chebyshev and Schwarz Inequalities  255
       Markov Inequality  257
       The Schwarz Inequality  258
  4.5  Moment-Generating Functions  261
  4.6  Chernoff Bound  264
  4.7  Characteristic Functions  266
       The Central Limit Theorem  276

Chapter 6  Statistics: Part 1 Parameter Estimation  340
  6.1  Introduction  340
       Independent, Identically Distributed (i.i.d.) Observations  341
       Estimation of Probabilities  343
  6.2  Estimators  346
  6.3  Estimation of the Mean  348
       Properties of the Mean-Estimator Function (MEF)  349
       Procedure for Getting a δ-Confidence Interval on the Mean of a Normal Random Variable When σX Is Known  352
       Confidence Interval for the Mean of a Normal Distribution When σX Is Not Known  352
       Procedure for Getting a δ-Confidence Interval Based on n Observations on the Mean of a Normal Random Variable When σX Is Not Known  355
       Interpretation of the Confidence Interval  355
  6.4  Estimation of the Variance and Covariance  355
       Confidence Interval for the Variance of a Normal Random Variable  357
       Estimating the Standard Deviation Directly  359
       Estimating the Covariance  360
  6.7  Maximum Likelihood Estimators  365

Chapter 7  Statistics: Part 2 Hypothesis Testing  391
  7.1  Bayesian Decision Theory  391
  7.2  Likelihood Ratio Test  396
  7.3  Composite Hypotheses  402
       Generalized Likelihood Ratio Test (GLRT)  403
  7.4  Goodness of Fit  417
Homework problems
Previous homework problem solutions as examples – Dr. Severance’s Skill Examples
Skills #2
Skills #3
Skills #4
Read through the 2015 Exam #2 …
Three of the four to five problems will be:
1. You have earned the chance to do another sum of probabilities problem. (Exam #1,
problem #3, except alternate pdfs or even pmfs are likely to be selected – discrete
convolution)
2. You will be given a joint density function. Compute the marginal density functions.
Compute all the means, 2nd moments, variances, and cross-correlation coefficient.
(2015 exam #2 problem #1)
3. A confidence interval computation will be required, based on a probabilistic variance
value being known (Gaussian normal distribution) AND based on a statistically generated
variance value being known (Student's t-distribution).
And now for a quick chapter review … the important information without the rest!
Chapter 3  Functions of Random Variables (p. 151)
The sum of two random variables, Z = X + Y

We are considering the CDF defined for Z = X + Y:

    F_Z(z) = P[Z \le z]

    f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(z-y,\,y)\,dy

If X and Y are independent,

    f_{X,Y}(x,y) = f_X(x)\,f_Y(y)

and

    f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(z-y,\,y)\,dy = \int_{-\infty}^{\infty} f_X(z-y)\,f_Y(y)\,dy

A note: the convolution can be performed in either of the two forms:

    f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z-x)\,dx = \int_{-\infty}^{\infty} f_X(z-y)\,f_Y(y)\,dy
Extending the convolution result

For Z = a \cdot X + b \cdot Y, assume two changes of variables: V = a \cdot X and W = b \cdot Y. Then the new marginal pdfs can be defined as

    f_V(v) = \frac{1}{|a|}\,f_X\!\left(\frac{v}{a}\right)   and   f_W(w) = \frac{1}{|b|}\,f_Y\!\left(\frac{w}{b}\right)

and we have

    f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|a|}\,f_X\!\left(\frac{v}{a}\right) \cdot \frac{1}{|b|}\,f_Y\!\left(\frac{z-v}{b}\right) dv = \int_{-\infty}^{\infty} \frac{1}{|a|}\,f_X\!\left(\frac{z-w}{a}\right) \cdot \frac{1}{|b|}\,f_Y\!\left(\frac{w}{b}\right) dw
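Since the exam explicitly calls for a discrete-convolution problem of this type, a quick numerical cross-check can help. Below is a minimal sketch (not from the textbook), assuming Python with numpy and two made-up pmfs; np.convolve implements exactly the discrete form of the convolution above.

    import numpy as np

    # Hypothetical pmfs: X on {0,1,2,3}, Y on {0,1,2}; any valid pmfs work.
    pmf_X = np.array([0.1, 0.2, 0.3, 0.4])
    pmf_Y = np.array([0.5, 0.3, 0.2])

    # The pmf of Z = X + Y is the discrete convolution of the two pmfs.
    pmf_Z = np.convolve(pmf_X, pmf_Y)     # supported on {0, 1, ..., 5}

    print(pmf_Z)          # e.g., P[Z=0] = 0.1 * 0.5 = 0.05
    print(pmf_Z.sum())    # 1.0, as a sanity check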
4.1  Expected Value of a Random Variable (p. 215)
This is commonly called the expectation operator or expected value of …
and is mathematically described as:
X  EX  

x f
X
x   dx

For discrete random variables, the integration becomes a summation and
X  EX  

 x  f X x  
x  

 x  Pr  X  x 
x  
General concept of an expected value

In general, the expected value of a function is:

    E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx

    E[g(X)] = \sum_{x=-\infty}^{\infty} g(x)\,f_X(x) = \sum_{x=-\infty}^{\infty} g(x)\,P[X = x]
Estimating a parameter:

If we know the expected value, we have a simple estimate of future expected outcomes:

    \hat{x} = \bar{X} = E[X]

or, for y = g(x),

    \hat{y} = E[y] = E[g(X)]
Example 4.1-2  Expected value of a Gaussian

    f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(\frac{-(x-\mu)^2}{2\sigma^2}\right), for -\infty < x < \infty

    \bar{X} = E[X] = \mu
Moments

The moments of a random variable are defined as the expected values of the powers of the measured output, or

    \overline{X^n} = E[X^n] = \int_{-\infty}^{\infty} x^n\,f_X(x)\,dx
    \overline{X^n} = E[X^n] = \sum_{x=-\infty}^{\infty} x^n\,f_X(x) = \sum_{x=-\infty}^{\infty} x^n\,P[X = x]
Therefore, the mean or average is sometimes called the first moment.
Expected Mean Squared Value or Second Moment

The mean square value or second moment is

    \overline{X^2} = E[X^2] = \int_{-\infty}^{\infty} x^2\,f_X(x)\,dx

    \overline{X^2} = E[X^2] = \sum_{x=-\infty}^{\infty} x^2\,P[X = x]
Central Moments and Variance

The central moments are the moments of the difference between a random variable and its mean:

    \overline{(X-\bar{X})^n} = E[(X-\bar{X})^n] = \int_{-\infty}^{\infty} (x-\bar{X})^n\,f_X(x)\,dx

The second central moment is referred to as the variance of the random variable:

    \sigma^2 = \overline{(X-\bar{X})^2} = E[(X-\bar{X})^2] = \int_{-\infty}^{\infty} (x-\bar{X})^2\,f_X(x)\,dx

The square root of the variance is the standard deviation, σ:

    \sigma = \sqrt{\overline{(X-\bar{X})^2}}

Note that the variance may also be computed as:

    \sigma^2 = \overline{X^2} - \bar{X}^2
Another estimate of future outcomes is the value that minimizes the mean squared error:

    \min E[\text{error}^2] = \min E[(X - \hat{x})^2]
Power and Energy Considerations

DC power/energy is related to the square of the mean:

    E[X]^2 = \mu^2 = \bar{X}^2

AC power/energy is related to the variance:

    E[(X-\bar{X})^2] = \sigma^2
Notice that the 2nd moment has both AC and DC power terms:

    E[X^2] = E[(X-\bar{X})^2] + E[X]^2

    \overline{X^2} = \sigma^2 + \bar{X}^2
Means and Variances of Defined Density Functions

Linearity of the expected value – a linear operator

Expectation is a linear operator: the expected value of a sum of functions is the sum of the expected values of the individual functions.

    E\!\left[\sum_{i=1}^{N} g_i(X)\right] = \sum_{i=1}^{N} E[g_i(X)]
As another example,

    E[X + Y] = E[X] + E[Y]
Example 4.1-7  Variance of a Gaussian

    f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(\frac{-(x-\mu)^2}{2\sigma^2}\right), for -\infty < x < \infty

    E[(X - \mu)^2] = \sigma^2
Example 4.1-15  Geometric Distribution

    P_X(x) = \mathrm{pmf}_X(x) = (1-a)\,a^x, for x = 0, 1, 2, \ldots

    E[X] = \mu = \frac{a}{1-a}

    E[X^2] = \frac{2a^2}{(1-a)^2} + \frac{a}{1-a}

    E[(X-\mu)^2] = \sigma^2 = E[X^2] - \mu^2 = \frac{a^2}{(1-a)^2} + \frac{a}{1-a} = \frac{a}{(1-a)^2}

4.2  Conditional Expectations (p. 232)
Expected Values for Conditional Probability

The conditional expectation is defined as

    E[X \mid B] = \int_{-\infty}^{\infty} x\,f_{X|B}(x \mid B)\,dx

    E[X \mid B] = \sum_i x_i\,P_{X|B}[x_i \mid B] = \sum_i x_i\,\mathrm{pmf}_{X|B}(x_i \mid B)
Example 4.2-1  Conditional expectation of a uniform R.V.

    f_X(x) = \frac{1}{x_1 - x_0}, for x_0 \le x \le x_1

    F_X(x) = \frac{x - x_0}{x_1 - x_0}, for x_0 \le x \le x_1

The condition is B = \{X > a\}. Therefore:
f X x 
1
a  x  x1

,
1  FX a  x1  a
F  x   FX a  x  a
FX  x | B   X
, a  x  x1

1  FX a 
x1  a
f X x | B  
Expected Values for Joint Density Functions and Conditional Probability

Definition 4.2-3  The expected value of a conditional probability.

For the joint density function f_{X,Y}(x,y), we want to know the expected value E[Y \mid X = x]. We know from before that

    f_{Y|X}(y \mid X = x) = \frac{f(x \mid Y = y)\,f_Y(y)}{f_X(x)} = \frac{f(x,y)}{f_X(x)}

    E[Y \mid X = x] = \int y\,f_{Y|X}(y \mid X = x)\,dy

Usefulness or application:

    E[Y] = \int\!\!\int y\,f_{X,Y}(x,y)\,dy\,dx

But f(x,y) = f(x \mid Y = y)\,f_Y(y) = f(y \mid X = x)\,f_X(x), so

    E[Y] = \int\!\!\int y\,f(y \mid X = x)\,f_X(x)\,dy\,dx = \int \left[\int y\,f(y \mid X = x)\,dy\right] f_X(x)\,dx

    E[Y] = \int E[Y \mid X = x]\,f_X(x)\,dx
Properties of Conditional Expectations:

Property I: Expected value of a conditional expected value:

    E[Y] = E[E[Y \mid X]]

Property II: If X and Y are independent (X should not matter):

    E[Y] = E[Y \mid X]
Property III: A conditional chain rule:

    E[Z \mid X] = E[E[Z \mid X, Y] \mid X]

Based on this equation, all values of y have been considered. Therefore,

    E[E[Z \mid X, Y] \mid X] = \int z\,f_{Z|X}(z \mid X)\,dz = E[Z \mid X]
4.3  Moments of Random Variables (p. 242)
Example 4.3-1  Binomial R.V.

    P_X(x) = \mathrm{pmf}_X(x) = \binom{n}{x}\,p^x\,(1-p)^{n-x}

    E[X] = n\,p

    E[X^2] = n(n-1)\,p^2 + n\,p

Determine the variance:

    E[(X-\mu)^2] = E[X^2] - E[X]^2

    E[(X-\mu)^2] = n\,p\,(1-p) = n\,p\,q
Joint Moments (p. 246)
When we have multiple random variables, additional moments can be defined:

    E[g(X,Y)] = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} g(x,y)\,f(x,y)\,dx\,dy
All expected values may be computed using the Joint pdf. There are some “new” relationships.
Correlation and Covariance between Random Variables

The definition of correlation was given as

    E[X \cdot Y] = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} x\,y\,f(x,y)\,dx\,dy
But most of the time we are not interested in the product of the mean values (which is what E[XY] reduces to when X and Y are independent) but in what results when the means are removed prior to the computation. Computing expected values with the random-variable means extracted is defined as computing the covariance.
COV  X , Y   E  X  E  X   Y  E Y  
 
  x      y     f x, y   dx  dy
X
Y
  
This gives rise to another factor, when the random variable means and variances are used to
normalize the factors or correlation/covariance computation. For example, the following
definition – correlation coefficient based on the normalized covariance
 X   X
  E 
  X
  Y  Y
  
  Y
 
 x  X


 
 X

 


  y  Y
  
  Y

  f x, y   dx  dy

COV  X , Y 
 X  Y
Also,

    \rho = \frac{E[X \cdot Y] - \mu_X\,\mu_Y}{\sigma_X\,\sigma_Y}   or   E[X \cdot Y] = \rho\,\sigma_X\,\sigma_Y + \mu_X\,\mu_Y

The short derivation:

    \rho = E\!\left[\frac{(X-\mu_X)(Y-\mu_Y)}{\sigma_X\,\sigma_Y}\right] = E\!\left[\frac{X Y - X\,\mu_Y - Y\,\mu_X + \mu_X\,\mu_Y}{\sigma_X\,\sigma_Y}\right]

The expected value is a linear operator (constants remain constants and sums are sums):

    \rho = \frac{E[X Y] - \mu_Y\,E[X] - \mu_X\,E[Y] + \mu_X\,\mu_Y}{\sigma_X\,\sigma_Y} = \frac{E[X Y] - \mu_X\,\mu_Y - \mu_X\,\mu_Y + \mu_X\,\mu_Y}{\sigma_X\,\sigma_Y} = \frac{E[X Y] - \mu_X\,\mu_Y}{\sigma_X\,\sigma_Y}
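Exam problem #2 asks for exactly these quantities from a given joint density. As a hedged sketch (assuming Python with scipy; the joint pdf f(x,y) = x + y on the unit square is a hypothetical stand-in for whatever pdf the exam gives), the moments and the correlation coefficient can be cross-checked by numerical double integration:

    import numpy as np
    from scipy import integrate

    def f(x, y):                 # hypothetical joint pdf on 0 <= x, y <= 1
        return x + y

    def E(g):                    # E[g(X,Y)] by numerical double integration
        # scipy's dblquad integrates func(y, x) with y as the inner variable
        val, _ = integrate.dblquad(lambda y, x: g(x, y) * f(x, y), 0, 1, 0, 1)
        return val

    mx, my = E(lambda x, y: x), E(lambda x, y: y)
    var_x = E(lambda x, y: x * x) - mx**2
    var_y = E(lambda x, y: y * y) - my**2
    rho = (E(lambda x, y: x * y) - mx * my) / np.sqrt(var_x * var_y)

    print(mx, my, var_x, var_y, rho)   # rho = -1/11 for this particular pdf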
Properties of Uncorrelated Random Variables (p. 248)

For "standardized" random variables (zero mean, unit variance), the correlation coefficient reduces to the correlation value:

    \rho = \frac{E[x \cdot y] - \mu_X\,\mu_Y}{\sigma_X\,\sigma_Y} = \frac{E[x \cdot y] - 0 \cdot 0}{1 \cdot 1} = E[x \cdot y]
For either X or Y a zero-mean variable,

    \rho = \frac{E[X \cdot Y] - 0}{\sigma_X\,\sigma_Y} = \frac{E[X \cdot Y]}{\sigma_X\,\sigma_Y}

For independent random variables,

    E[X \cdot Y] = E[X]\,E[Y] = \mu_X\,\mu_Y

    \rho = \frac{E[X \cdot Y] - \mu_X\,\mu_Y}{\sigma_X\,\sigma_Y} = \frac{\mu_X\,\mu_Y - \mu_X\,\mu_Y}{\sigma_X\,\sigma_Y} = 0
For X and Y uncorrelated, letting Z = X + Y:

    \mathrm{VAR}[Z] = E[(Z - \mu_Z)^2] = \sigma_X^2 + 2\,\rho\,\sigma_X\,\sigma_Y + \sigma_Y^2

For uncorrelated X and Y (\rho = 0),

    \mathrm{VAR}[Z] = \sigma_Z^2 = \sigma_X^2 + \sigma_Y^2
A special note: independent random variables are uncorrelated (as derived above, E[XY] = \mu_X\,\mu_Y gives \rho = 0). However, uncorrelated random variables are not necessarily independent.
Example 4.3-4  Linear Prediction – mean square error

Under a linear assumption of the relationship between X and Y,

    \hat{y} = a \cdot X + b

the error can be defined as

    \epsilon = Y - \hat{y} = Y - a\,X - b

and

    E[\epsilon^2] = E[Y^2] - 2a\,E[Y X] - 2b\,\mu_Y + a^2\,E[X^2] + 2ab\,\mu_X + b^2

Minimization says to take the derivative with respect to a and set it to zero, and then the derivative with respect to b and set it to zero:

    \frac{\partial}{\partial a} E[\epsilon^2] = -2\,E[Y X] + 2a\,E[X^2] + 2b\,\mu_X = 0
 
 
 

E  2  2  Y  2  a   X  2  b  0
b
a
E Y  X    Y   X
 
E X 2  X
2

  X  Y   Y

X
X2
b  Y  a   X  Y 
The linear predictor becomes
yˆ   Y 
  Y
 X   X 
X
  Y  yˆ  Y   Y 
 
  Y
 X
X
  Y
 X   X 
X

E  2   Y   2  Y   Y  1   2
2
2
2
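A quick way to see these formulas at work is to fit the MMSE linear predictor from sample statistics. This sketch (assuming Python with numpy; the simulated data model is hypothetical) recovers a = ρσY/σX, b = μY - aμX, and the residual mean-square error σY²(1 - ρ²):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 2.0, 50_000)
    y = 0.5 * x + rng.normal(0.0, 1.0, 50_000)   # hypothetical linear model

    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    rho = np.corrcoef(x, y)[0, 1]

    a = rho * sy / sx                 # slope of the MMSE linear predictor
    b = my - a * mx                   # intercept
    mse = sy**2 * (1 - rho**2)        # predicted E[eps^2]

    print(a, b, mse)                  # a ~ 0.5, b ~ 0, mse ~ 1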

Jointly Gaussian Random Variables (p. 251)

If two R.V. are jointly Gaussian,

    f_{X,Y}(x,y) = \frac{1}{2\pi\,\sigma_X\,\sigma_Y\,\sqrt{1-\rho^2}}\,\exp\!\left\{\frac{-1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho\,(x-\mu_X)(y-\mu_Y)}{\sigma_X\,\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\}

If \rho = 0, the joint pdf factors into the product of the two marginals:

    f_{X,Y}(x,y) = \frac{1}{2\pi\,\sigma_X\,\sigma_Y}\,\exp\!\left\{-\left[\frac{(x-\mu_X)^2}{2\sigma_X^2} + \frac{(y-\mu_Y)^2}{2\sigma_Y^2}\right]\right\} = \frac{\exp\!\left\{\frac{-(x-\mu_X)^2}{2\sigma_X^2}\right\}}{\sqrt{2\pi}\,\sigma_X} \cdot \frac{\exp\!\left\{\frac{-(y-\mu_Y)^2}{2\sigma_Y^2}\right\}}{\sqrt{2\pi}\,\sigma_Y}
4.4  Chebyshev and Schwarz Inequalities (p. 255)

The Chebyshev inequality furnishes a bound on the probability of how much an R.V. can deviate from its mean value.

Chebyshev inequality (Theorem 4.4-1)

Let X be an arbitrary R.V. with known mean and variance. Then for any \epsilon > 0,
P X   X    
X2
2
If we consider the complement of the probability described,
P X   X     1 
2
2
It may be convenient to define the delta function in terms of a multiple of the variance.
  k 
Then the Chebyshev inequality becomes
P x   X  k    
1
k2
P x   X  k     1 
1
k2
Example 4.4-1  Deviation from the mean for a Normal R.V.

The Gaussian normal CDF is

    \Phi_X(z) = P\!\left[\frac{x - \mu_X}{\sigma} \le z\right] = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\,\exp\!\left(\frac{-v^2}{2}\right) dv

Therefore,

    P\!\left[\left|\frac{x - \mu_X}{\sigma}\right| \le k\right] = \int_{-k}^{k} \frac{1}{\sqrt{2\pi}}\,\exp\!\left(\frac{-v^2}{2}\right) dv = \Phi_X(k) - \Phi_X(-k) = 2\,\Phi_X(k) - 1

    P[\,|x - \mu_X| \le k\,\sigma\,] = 2\,\Phi_X(k) - 1 \ge 1 - \frac{1}{k^2}

and

    P\!\left[\left|\frac{x - \mu_X}{\sigma}\right| \ge k\right] = \int_{-\infty}^{-k} \frac{1}{\sqrt{2\pi}}\,\exp\!\left(\frac{-v^2}{2}\right) dv + \int_{k}^{\infty} \frac{1}{\sqrt{2\pi}}\,\exp\!\left(\frac{-v^2}{2}\right) dv = 2 \int_{k}^{\infty} \frac{1}{\sqrt{2\pi}}\,\exp\!\left(\frac{-v^2}{2}\right) dv

    P\!\left[\left|\frac{x - \mu_X}{\sigma}\right| \ge k\right] = \Phi_X(-k) + 1 - \Phi_X(k) = Q(k) + Q(k) = 2\,Q(k)

    P\!\left[\left|\frac{x - \mu_X}{\sigma}\right| \ge k\right] = 2\,(1 - \Phi_X(k)) = 2\,Q(k) \le \frac{1}{k^2}

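To see how loose (but distribution-free) the Chebyshev bound is, Example 4.4-1 can be checked numerically. A minimal sketch, assuming Python with scipy:

    from scipy.stats import norm

    for k in (1, 2, 3):
        exact = 2 * (1 - norm.cdf(k))   # P(|x - mu| >= k*sigma) = 2 Q(k)
        bound = 1 / k**2                # Chebyshev upper bound
        print(k, exact, bound)
    # k = 2: exact ~ 0.0455 versus the bound 0.25. Chebyshev holds for ANY
    # distribution with finite variance, which is why it is so conservative.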
Markov Inequality (p. 257)

The Markov inequality bounds a tail probability using only the mean value. For a non-negative R.V. (f_X(x) = 0 for x < 0),

    P[X \ge \delta] \le \frac{E[X]}{\delta}
The Schwarz Inequality (p. 258)

The Schwarz inequality bounds the magnitude of the covariance of two R.V.:

    \mathrm{Cov}(X,Y)^2 \le \sigma_X^2\,\sigma_Y^2   or   |\mathrm{Cov}(X,Y)| \le \sigma_X\,\sigma_Y

You have already experienced the Schwarz inequality in other settings. For the inner product

    \langle h, g \rangle = \int_{-\infty}^{\infty} h(x)\,g(x)^*\,dx

it states

    |\langle h, g \rangle| \le \|h\| \cdot \|g\|

You may also see the integral form:

    \left|\int_{-\infty}^{\infty} h(x)\,g(x)^*\,dx\right|^2 \le \int_{-\infty}^{\infty} h(x)\,h(x)^*\,dx \cdot \int_{-\infty}^{\infty} g(x)\,g(x)^*\,dx
Law of Large Numbers

Now that we have discussed the Chebyshev inequality, we can provide a proof of the Law of Large Numbers. The discussion here provides the condition under which the sample mean converges to the ensemble mean; that is, the statistical mean equals the R.V. ensemble mean.

Example 4.4-3  The sample mean converges to the expected value.

For a large enough number of samples, we say that

    \hat{\mu}_X = \frac{1}{n} \sum_{i=1}^{n} X_i
    E[\hat{\mu}_X] = \frac{1}{n} \sum_{i=1}^{n} \mu_X = \frac{1}{n}\,n\,\mu_X = \mu_X

    \mathrm{Var}[\hat{\mu}_X] = \frac{1}{n}\,\sigma_X^2

Therefore, using the Chebyshev inequality we state that

    P[\,|\hat{\mu}_X - \mu_X| \ge \epsilon\,] \le \frac{\sigma_X^2}{n\,\epsilon^2}

Then for any fixed value of \epsilon,

    \lim_{n \to \infty} P[\,|\hat{\mu}_X - \mu_X| \ge \epsilon\,] \le \lim_{n \to \infty} \frac{\sigma_X^2}{n\,\epsilon^2}

and

    \lim_{n \to \infty} P[\,|\hat{\mu}_X - \mu_X| \ge \epsilon\,] = 0
4.5  Moment-Generating Functions (p. 261)
The moment-generating function (MGF) is the two-sided Laplace transform of the probability density function (pdf). If the MGF exists, there is a forward and inverse relationship between the MGF and the pdf. The MGF is defined based on the expected value as

    M_X(t) = E[\exp(t X)]

Therefore,

    M_X(t) = \int_{-\infty}^{\infty} f_X(x)\,\exp(t x)\,dx

If you like s better than t in your Laplace transforms,

    M_X(s) = \int_{-\infty}^{\infty} f_X(x)\,\exp(s x)\,dx

For discrete R.V. we perform a discrete Laplace transform:

    M_X(s) = \sum_{i=-\infty}^{\infty} \mathrm{pmf}_X(x_i)\,\exp(s x_i) = \sum_{i=-\infty}^{\infty} P_X(x_i)\,\exp(s x_i)
Why do we do this?
1. It enables a convenient computation of the higher order moments
2. It can be used to estimate f_X(x) from experimental measurements of the moments
3. It can be used to solve problems involving the computation of the sums of R.V.
4. It is an important analytical instrument that can be used to demonstrate results and
establish additional bounds (the Chernoff Bound and the Central Limit Theorem).
It enables a convenient computation of the higher-order moments:

    \left.\frac{\partial^k}{\partial t^k} M_X(t)\right|_{t=0} = M_X^{(k)}(0) = m_k

The solution is done by performing a two-sided Laplace transform and differentiation!
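The differentiate-and-evaluate-at-zero recipe can be carried out symbolically. A small sketch, assuming Python with sympy, using the Gaussian MGF derived in Example 4.5-1 below:

    import sympy as sp

    t, mu, sigma = sp.symbols('t mu sigma', real=True)
    M = sp.exp(mu * t + sigma**2 * t**2 / 2)    # Gaussian MGF (Example 4.5-1)

    m1 = sp.diff(M, t, 1).subs(t, 0)            # first moment
    m2 = sp.diff(M, t, 2).subs(t, 0)            # second moment

    print(sp.simplify(m1))    # mu
    print(sp.simplify(m2))    # mu**2 + sigma**2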
Example 4.5-1  The MGF of a Gaussian

    f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(\frac{-(x-\mu)^2}{2\sigma^2}\right), for -\infty < x < \infty

    M_X(t) = \exp\!\left(\mu\,t + \frac{\sigma^2\,t^2}{2}\right)
Example 4.5-2  MGF of a Binomial

    P_X(x) = \mathrm{pmf}_X(x) = \binom{n}{x}\,p^x\,(1-p)^{n-x}

    M_X(t) = \left(1 - p + p\,\exp(t)\right)^n

Example 4.5-3  MGF of a Geometric Distribution

    P_X(x) = \mathrm{pmf}_X(x) = (1-a)\,a^x, for 0 \le x

    M_X(t) = \frac{1-a}{1 - a\,\exp(t)}
The MGF of an Exponential

    f_X(x) = \lambda\,\exp(-\lambda x), for 0 \le x < \infty

    M_X(t) = \frac{\lambda}{\lambda - t}, for t < \lambda
The MGF of a Uniform U(a,b)

    f_X(x) = \frac{1}{b-a}, for a \le x \le b

    M_X(t) = \frac{\exp(t\,b) - \exp(t\,a)}{t\,(b-a)}

4.6  Chernoff Bound (p. 264)
The Chernoff bound is based on concepts related to the MGF:

    P[X \ge a] \le \exp(-a\,t)\,M_X(t)

To minimize the bound, the value of t that provides the smallest value should be used. Therefore, differentiate with respect to t, find the value of t for the minimum, and use the derived value in the bound.
Example 4.6-1  Chernoff Bound for a Gaussian

For the Gaussian,

    f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(\frac{-(x-\mu)^2}{2\sigma^2}\right), for -\infty < x < \infty

    M_X(t) = \exp\!\left(\mu\,t + \frac{\sigma^2\,t^2}{2}\right)

The bound becomes

    P[X \ge a] \le \exp(-a\,t)\,M_X(t) = \exp(-a\,t)\,\exp\!\left(\mu\,t + \frac{\sigma^2\,t^2}{2}\right)

Minimizing over t (the minimum is at t = (a-\mu)/\sigma^2) gives

    P[X \ge a] \le \exp\!\left(-\frac{1}{2}\left(\frac{a-\mu}{\sigma}\right)^2\right), for a > \mu
4.7  Characteristic Functions (p. 266)

The characteristic function is defined as

    \Phi_X(\omega) = E[\exp(j\,\omega\,X)]

    \Phi_X(\omega) = \int_{-\infty}^{\infty} f_X(x)\,\exp(j\,\omega\,x)\,dx

For a discrete R.V.,

    \Phi_X(\omega) = \sum_{i=-\infty}^{\infty} \mathrm{pmf}_X(x_i)\,\exp(j\,\omega\,x_i) = \sum_{i=-\infty}^{\infty} P_X(x_i)\,\exp(j\,\omega\,x_i)
Note that there is a relationship between the MGF and CF:

    M_X(t) = E[\exp(t\,X)]   and   \Phi_X(\omega) = E[\exp(j\,\omega\,X)]   so   \Phi_X(\omega) = M_X(j\,\omega)

For sums of R.V., Z = X + Y, as previously shown, the density function is the convolution of the two density functions in X and Y:

    f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z-x)\,dx = \int_{-\infty}^{\infty} f_X(z-y)\,f_Y(y)\,dy

But this can also be performed in the "frequency domain" by multiplication:

    \Phi_Z(\omega) = \Phi_X(\omega) \cdot \Phi_Y(\omega)

As a result, it becomes much easier to handle

    Z = X_1 + X_2 + \cdots + X_N

which becomes

    \Phi_Z(\omega) = \Phi_{X_1}(\omega) \cdot \Phi_{X_2}(\omega) \cdots \Phi_{X_N}(\omega)
Moment Generation with the Characteristic Function

As with the MGF, the CF can generate moments by differentiation:

    \left.\frac{\partial^k}{\partial \omega^k} \Phi_X(\omega)\right|_{\omega=0} = \Phi_X^{(k)}(0) = j^k\,m_k   or   m_k = \frac{1}{j^k}\,\Phi_X^{(k)}(0)
Joint MGF and CF

Joint MGF and CF functions can be and are defined. As may be expected, they can be used to compute "cross-product" moments of random variables, just as the single-variable versions generate moments.

The joint MGF is defined as

    M_{X_1, X_2, \ldots, X_N}(t_1, t_2, \ldots, t_N) = E[\exp(t_1 X_1 + t_2 X_2 + \cdots + t_N X_N)]

The joint CF is defined as

    \Phi_{X_1, X_2, \ldots, X_N}(\omega_1, \omega_2, \ldots, \omega_N) = E[\exp(j\,\omega_1 X_1 + j\,\omega_2 X_2 + \cdots + j\,\omega_N X_N)]

All the joint and marginal moments can then be computed based on

    M_{X,Y}^{(l,n)}(0,0) = \left.\frac{\partial^{l+n}}{\partial t_1^l\,\partial t_2^n} M_{X,Y}(t_1, t_2)\right|_{t_1 = t_2 = 0} = m_{l,n}

or

    \Phi_{X,Y}^{(l,n)}(0,0) = \left.\frac{\partial^{l+n}}{\partial \omega_1^l\,\partial \omega_2^n} \Phi_{X,Y}(\omega_1, \omega_2)\right|_{\omega_1 = \omega_2 = 0}

and

    (-j)^{l+n}\,\Phi_{X,Y}^{(l,n)}(0,0) = m_{l,n}
The Central Limit Theorem (p. 276)
The central limit theorem states that the normalized sum of a large number of mutually
independent R.V. with zero mean and finite variance tends to the Gaussian normal CDF,
provided that the individual variances are small compared to the sum of the variances.
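A hedged simulation sketch of the theorem (assuming Python with numpy/scipy): standardize the sum of n i.i.d. uniforms, whose mean is n/2 and variance n/12, and compare the empirical CDF with Φ.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    n, trials = 30, 100_000

    s = rng.uniform(0.0, 1.0, (trials, n)).sum(axis=1)
    z = (s - n / 2) / np.sqrt(n / 12)            # standardized sum

    for x in (-1.0, 0.0, 1.0):
        print(x, (z <= x).mean(), norm.cdf(x))   # empirical CDF vs Gaussian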
Chapter 6  Statistics: Part 1 Parameter Estimation (p. 340)

Statistics definition: the science of assembling, classifying, tabulating, and analyzing data or facts.

Descriptive statistics – collecting, grouping, and presenting data in a way that can be easily understood or assimilated.

Inductive statistics or statistical inference – using data to draw conclusions about, or estimate parameters of, the environment from which the data came.
Definitions

Population: the collection of data being studied; N is the size of the population.

Sample: a random sample is the part of the population selected; all members of the population must be equally likely to be selected! n is the size of the sample.

Sample mean: the average of the numerical values that make up the sample.

Population: N.   Sample set: S = \{x_1, x_2, x_3, x_4, x_5, \ldots, x_n\}
Sampling Theory – The Sample Mean (the mean itself, not the value of X)

    \hat{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, where the X_i are random variables with a pdf.

Sample mean:

    E[\hat{X}] = E\!\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \frac{1}{n}\,n\,\mu = \mu
Variance of the sample mean (the mean itself, not the value of X)

For N \gg n or N \to \infty (or even a known pdf), using the collected samples,

    \mathrm{Var}[\hat{X}] = E\!\left[\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right)^2\right] - E[\hat{X}]^2

    \mathrm{Var}[\hat{X}] = \frac{1}{n^2}\left[n\,(\sigma_X^2 + \mu_X^2) + n(n-1)\,\mu_X^2\right] - \mu_X^2 = \frac{\sigma_X^2}{n}
Sampling Theory – The Sample Variance

The sample variance of the population (stdevp) is defined as:

    S^2 = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \hat{X}\right)^2

    E[S^2] = \frac{n-1}{n}\,\sigma^2

where \sigma^2 is the true variance of the random variable.

Note: the sample variance is not equal to the true variance; it is a biased estimate!

To create an unbiased estimator, scale by the biasing factor to compute (stdev):

    \tilde{S}^2 = \frac{n}{n-1}\,S^2 = \frac{n}{n-1} \cdot \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \hat{X}\right)^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \hat{X}\right)^2

    E[\tilde{S}^2] = \frac{n}{n-1}\,E[S^2] = \sigma_x^2
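In numpy the biased and unbiased estimates are the ddof = 0 and ddof = 1 variants of var(); a minimal sketch with hypothetical data:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(5.0, 2.0, 10)        # small hypothetical sample of N(5, 4)

    s2_biased = x.var(ddof=0)           # divides by n     (S^2 above)
    s2_unbiased = x.var(ddof=1)         # divides by n - 1 (the unbiased form)

    n = x.size
    print(s2_unbiased, s2_biased * n / (n - 1))   # equal, by the scaling above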
Variance of the sample variance

As before, the variance of the variance can be computed. (Instead of deriving the values, it is given.) It is defined as

    \mathrm{Var}[S^2] = \frac{\mu_4 - \sigma^4}{n}

where \mu_4 is the fourth central moment of the population, defined by

    \mu_4 = E[(X - \bar{X})^4]

For the unbiased variance, the result is

    \mathrm{Var}[\tilde{S}^2] = \left(\frac{n}{n-1}\right)^2 \mathrm{Var}[S^2] = \frac{n^2}{(n-1)^2} \cdot \frac{\mu_4 - \sigma^4}{n} = \frac{n\,(\mu_4 - \sigma^4)}{(n-1)^2}
Statistical Mean and Variance Summary

For taking samples and estimating the mean and variance:

Mean:
    Estimate:              \hat{\mu}_X = \hat{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
    Mean of estimate:      E[\hat{X}] = E[X] = \mu_X   (an unbiased estimate)
    Variance of estimate:  \mathrm{Var}[\hat{X}] = \frac{\sigma_X^2}{n}

Variance (biased):
    Estimate:              S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{X})^2
    Mean of estimate:      E[S^2] = \frac{n-1}{n}\,\sigma_X^2   (a biased estimate)
    Variance of estimate:  \mathrm{Var}[S^2] = \frac{\mu_4 - \sigma_X^4}{n}

Variance (unbiased):
    Estimate:              \tilde{S}^2 = \frac{n}{n-1}\,S^2
    Mean of estimate:      E[\tilde{S}^2] = \frac{n}{n-1}\,E[S^2] = \sigma_X^2   (an unbiased estimate)
    Variance of estimate:  \mathrm{Var}[\tilde{S}^2] = \frac{n\,(\mu_4 - \sigma_X^4)}{(n-1)^2}

where \mu_4 = E[(X - \bar{X})^4].

Building a Confidence Interval

Using the Chebyshev inequality,

    P[\,|X - \mu_X| \ge \epsilon\,] \le \frac{\sigma_X^2}{\epsilon^2}

It may be convenient to define \epsilon as a multiple of the standard deviation, \epsilon = k\,\sigma_{\hat{X}}:

    P[\,|X - \mu_X| \ge k\,\sigma_{\hat{X}}\,] \le \frac{\sigma_{\hat{X}}^2}{k^2\,\sigma_{\hat{X}}^2} = \frac{1}{k^2}

Flipping the bounds on the inequality,

    P[\,|X - \mu_X| < k\,\sigma_{\hat{X}}\,] \ge 1 - \frac{1}{k^2}

Expanding the absolute value and adding the mean,

    P[\,-k\,\sigma_{\hat{X}} < X - \mu_X < k\,\sigma_{\hat{X}}\,] \ge 1 - \frac{1}{k^2}

The confidence interval for the statistical average becomes

    P\!\left[-k\,\frac{\sigma_X}{\sqrt{n}} < \hat{\mu}_X - \mu_X < k\,\frac{\sigma_X}{\sqrt{n}}\right] \ge 1 - \frac{1}{k^2}

If we assume that the mean value has a Gaussian distribution, an exact value can be computed for this probability:

    P\!\left[-k \le \frac{\hat{\mu}_X - \mu_X}{\sigma_X / \sqrt{n}} \le k\right] = \Phi(k) - \Phi(-k) = 2\,\Phi(k) - 1

A confidence interval often chosen is 95%, or 0.95:

    \Phi(k) = \frac{1 + 0.95}{2} = 0.975   \Rightarrow   k = 1.96

and

    P\!\left[-1.96 \le \frac{\hat{\mu}_X - \mu_X}{\sigma_X / \sqrt{n}} \le 1.96\right] = 0.95
Gaussian Confidence Intervals:

For a "two-sided" confidence interval, we want

    P\!\left[-k \le \frac{\hat{\mu}_X - \mu_X}{\sigma_X / \sqrt{n}} \le k\right] = \Phi(k) - \Phi(-k) = 2\,\Phi(k) - 1 = C.I.

(A: textbook steps 1 & 2) Find the appropriate value k for the confidence interval selected.

(B: textbook steps 3 & 4) Based on the known Gaussian mean and variance for the "estimated" R.V. and the known number of samples, compute the bounds on the inequality.

Summary:

    P\!\left[-\text{bound} = -k\,\frac{\sigma_X}{\sqrt{n}} \le \hat{\mu}_X - \mu_X \le k\,\frac{\sigma_X}{\sqrt{n}} = \text{bound}\right] = 2\,\Phi(k) - 1 = C.I.

Compute what you need.

Looking at the numbers: the higher the confidence level, the wider the bounds; the tighter the bounds, the smaller the confidence level.

Gaussian "confidence":

+/- one standard deviation: 68.3%
+/- two standard deviations: 95.44%
+/- three standard deviations: 99.74%

90%: k = +/-1.64 standard deviations
95%: k = +/-1.96 standard deviations
99%: k = +/-2.58 standard deviations
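The exam's confidence-interval problem with a known (probabilistic) variance follows these steps directly. A sketch, assuming Python with scipy and hypothetical numbers:

    import numpy as np
    from scipy.stats import norm

    xbar, sigma_x, n, ci = 10.3, 2.0, 25, 0.95   # hypothetical values

    k = norm.ppf((1 + ci) / 2)        # step A: k = 1.96 for a 95% interval
    half = k * sigma_x / np.sqrt(n)   # step B: bound = k * sigma_X / sqrt(n)

    print(xbar - half, xbar + half)   # 95% confidence interval on the mean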
Confidence Intervals When We Do Not Know the Actual Variance

We have the statistically computed, non-biased variance estimate. Define the new "estimated random variable mean" function as

    T = \frac{\hat{\mu} - \mu}{\tilde{S} / \sqrt{n}} = \frac{\hat{\mu} - \mu}{\hat{\sigma} / \sqrt{n}}

and define

    T_{n-1} = \frac{\hat{\mu} - \mu}{\hat{\sigma} / \sqrt{n}}, where \frac{\hat{\sigma}}{\sqrt{n}} = \sqrt{\frac{1}{n\,(n-1)} \sum_{i=1}^{n} (X_i - \hat{\mu})^2}

To simplify the textbook description involving the chi-squared distribution, this is the basis for Student's t-distribution with n-1 degrees of freedom.

The Student's t probability density function (letting v = n-1, the degrees of freedom) is defined as

    f_T(t) = \frac{\Gamma\!\left(\frac{v+1}{2}\right)}{\sqrt{v\,\pi}\;\Gamma\!\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}

where \Gamma(\cdot) is the gamma function.

For a "two-sided" confidence interval, we want

    P\!\left[-t \le \frac{\hat{\mu}_X - \mu_X}{\hat{\sigma} / \sqrt{n}} \le t\right] = F_{T_{n-1}}(t) - F_{T_{n-1}}(-t) = 2\,F_{T_{n-1}}(t) - 1 = C.I.

(A: textbook steps 1 & 2) Find the appropriate value t based on the value v = n-1 for the confidence interval selected. (Hint: the tables say x and n, but you are looking up v = n-1 (not n) and finding t = x based on F_T.)

(B: textbook steps 3 & 4) Based on the computed variance for the "estimated" R.V. and the known number of samples, compute the bounds on the inequality:

    P\!\left[-t\,\frac{\hat{\sigma}}{\sqrt{n}} \le \hat{\mu}_X - \mu_X \le t\,\frac{\hat{\sigma}}{\sqrt{n}}\right] = C.I.
Or, the bounds on the true mean, based on the confidence interval, are

    P\!\left[\hat{\mu}_X - t\,\frac{\hat{\sigma}}{\sqrt{n}} \le \mu_X \le \hat{\mu}_X + t\,\frac{\hat{\sigma}}{\sqrt{n}}\right] = C.I.

    C.I. \times 100\% = 100 \int_{-t_c}^{t_c} f_T(t)\,dt = F_T(t_c) - F_T(-t_c), for -t_c \le t \le t_c (two-sided)
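The exam's companion case, a statistically generated variance, uses the Student's t lookup above with v = n - 1. A sketch, assuming Python with scipy and a hypothetical sample:

    import numpy as np
    from scipy.stats import t

    rng = np.random.default_rng(4)
    x = rng.normal(10.0, 2.0, 16)       # hypothetical sample, sigma_X unknown

    n, xbar = x.size, x.mean()
    s = x.std(ddof=1)                   # unbiased sample standard deviation

    tc = t.ppf(0.975, df=n - 1)         # 95% two-sided, v = n - 1 = 15
    half = tc * s / np.sqrt(n)

    print(xbar - half, xbar + half)     # t-based confidence interval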
6.2 Definitions

Estimator: a function of the observations vector that estimates a particular parameter.

Unbiased estimator: an estimator is unbiased if the estimate converges to the correct value.

Biased estimator: an estimator may be biased, converging to an offset or gain-adjusted value.

Linear estimation: the estimate is a linear combination of the sample points.

Consistent: the estimator is consistent if the estimate converges to the appropriate value as the number of samples goes to infinity.

There are estimators that minimize the variance in the estimate from the sample values. There are estimators that minimize the mean-squared error for the sample values.
6.7 Maximum Likelihood Estimators

The likelihood function can be "properly" defined as

    L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i \mid \theta)

We are interested in finding the value of theta, \theta, that maximizes this function!

To solve: take the derivative, set it to zero, solve, and determine the minima and maxima. Pick the global maximum!

Easy … right?!
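As a concrete (hypothetical) instance, assuming Python with numpy/scipy: for exponential observations the log-likelihood maximum can be found numerically and compared with the known analytic answer, \hat{\lambda} = 1/\bar{x}.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(5)
    x = rng.exponential(scale=2.0, size=1000)    # true rate lambda = 0.5

    # Negative log-likelihood: -n*log(lam) + lam*sum(x); minimize it.
    nll = lambda lam: -x.size * np.log(lam) + lam * x.sum()
    res = minimize_scalar(nll, bounds=(1e-6, 10.0), method='bounded')

    print(res.x, 1.0 / x.mean())   # numerical and analytic MLE agree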
6.8 Ordering, Ranking and Percentiles

Percentile: https://en.wikipedia.org/wiki/Percentile

"A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found."

Textbook: the u-th percentile of X is the number x_u such that F_X(x_u) = u:

    u = F_X(x_u)

One would say that the result x_u is in the u-th percentile.

Example 6.8-1  Assume a person's IQ is distributed as N(100,100), that is, a Gaussian normal with a mean of 100, a variance of 100, and a standard deviation of 10. Then an IQ of 115 would be at what percentile of the population?

    z = \frac{115 - 100}{10} = 1.5

    \Phi(1.5) = 0.9332

The individual is in the 93rd percentile for IQ.

Median: the median of a population is defined where half of the population is above and half is below the value:

    F_X(x_{\mathrm{median}}) = 0.5

For some of the distributions described, the mean and the median are not equal! For example, for the exponential distribution,

    x_{\mathrm{median}} = \frac{-\ln(0.5)}{\lambda} = \frac{0.69}{\lambda}, whereas \bar{X} = \frac{1}{\lambda}
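Both results above are one-line lookups, e.g. in Python with scipy (a sketch, not the textbook's tables):

    from scipy.stats import norm, expon

    # Example 6.8-1: percentile of an IQ of 115 under N(100, 10^2).
    print(norm.cdf(115, loc=100, scale=10))                # ~0.9332

    # Exponential with lambda = 1: the median and mean differ.
    print(expon.median(scale=1.0), expon.mean(scale=1.0))  # 0.693... vs 1.0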
Chapter 7  Statistics: Part 2 Hypothesis Testing (p. 391)

Hypothesis Testing

Hypothesis testing is statistical decision making. A statement is made in terms of a hypothesis; the goal of interpreting the measured values is to accept or reject the hypothesis.

When there is only one hypothesis, it is referred to as the null hypothesis (H0). When there are potentially multiple hypotheses, we can generate criteria to accept one over another (establishing thresholds for decision making).
7.1  Bayesian Decision Theory

Bayes Hypothesis Testing

Consider a simple binary hypothesis problem where we are to decide between two hypotheses based on a random sample X^n = (X_1, X_2, \ldots, X_n), and we assume that we know that H_0 occurs with probability p_0 and H_1 with probability p_1 = 1 - p_0. There are four possible outcomes of the hypothesis test, and we assign a cost to each outcome as a measure of its relative importance:

1. H_0 true and decide H_0: cost = C_{00}
2. H_0 true and decide H_1 (Type I error): cost = C_{01}
3. H_1 true and decide H_0 (Type II error): cost = C_{10}
4. H_1 true and decide H_1: cost = C_{11}

It is reasonable to assume that the cost of a correct decision is less than that of an erroneous decision, that is, C_{00} < C_{01} and C_{11} < C_{10}. Our objective is to find the decision rule that minimizes the average cost C:

    C = \left[C_{00}\,P(H_0 \mid H_0) + C_{01}\,P(H_1 \mid H_0)\right] p_0 + \left[C_{11}\,P(H_1 \mid H_1) + C_{10}\,P(H_0 \mid H_1)\right] p_1

Note: in detection and estimation, "optimal" is typically defined based on a cost function. The cost functions are often readily related to the operation or parameter (e.g., minimum mean-square-error, maximum likelihood, etc.), but they can include arbitrary terms, as long as they can be readily defined based on the measured values, differences, statistics, or probability.
The following theorem identifies the decision rule that minimizes the average cost.

7.2  Likelihood Ratio Test

When we don't have underlying probabilities or a cost function but do have the conditional probabilities, we define a likelihood ratio test as

    \frac{f(X_1; \theta_1) \cdot f(X_2; \theta_1) \cdots f(X_n; \theta_1)}{f(X_1; \theta_2) \cdot f(X_2; \theta_2) \cdots f(X_n; \theta_2)} = \frac{L(X; \theta_1)}{L(X; \theta_2)} \ge k   \Rightarrow   accept \theta_1 as the choice

    \frac{f(X_1; \theta_1) \cdot f(X_2; \theta_1) \cdots f(X_n; \theta_1)}{f(X_1; \theta_2) \cdot f(X_2; \theta_2) \cdots f(X_n; \theta_2)} = \frac{L(X; \theta_1)}{L(X; \theta_2)} < k   \Rightarrow   reject \theta_1 as the choice

where the threshold k is determined from criteria that could be based on any decision theory desired.
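A minimal sketch of a likelihood ratio test (assuming Python with scipy; the two Gaussian hypotheses and the threshold are hypothetical): working in log form turns the product of densities into a sum.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    x = rng.normal(1.0, 1.0, 20)       # data drawn under the H1 model here

    # H0: X ~ N(0,1) versus H1: X ~ N(1,1); log-likelihood ratio of the sample
    llr = norm.logpdf(x, 1.0, 1.0).sum() - norm.logpdf(x, 0.0, 1.0).sum()

    k = 1.0                            # threshold from the chosen criterion
    print('accept H1' if llr > np.log(k) else 'reject H1', llr)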
Hypothesis Testing

Based on: Probabilistic Methods of Signal and System Analysis (3rd ed.) by George R. Cooper and Clare D. McGillem; Oxford Press, 1999.

Null Hypothesis Testing

A significance test based on a decision rule must be determined. The significance test establishes a level, potentially the confidence level or confidence interval, to determine whether to accept or reject the hypothesis. This is stated as a decision rule:

Accept H0: if the computed value "passes" the significance test.
Reject H0: if the computed value "fails" the significance test.

In general, this involves a significance test that defines an equation or function that can be computed from the measured data (statistics). A performance threshold is then defined that sets an accept/reject or pass/fail boundary:

- inside or outside the confidence interval;
- acceptably meeting the desired criteria or not.
Example (p. 174): A capacitor manufacturer claims that the capacitors have a mean breakdown voltage of 300 V or greater. We establish a significance test at a 99% confidence level. (In this case, we are looking for values above the minimum confidence level as acceptable – a one-sided test.)

In testing, 100 capacitors are tested (note this is a destructive test), and a mean value of 290 V with an unbiased sample standard deviation of 40 V is found. Is the hypothesis accepted or rejected?

The significance test at the 99% confidence level requires

    \bar{X} - t\,\frac{\tilde{S}}{\sqrt{n}} \le \hat{X}

Using v = 99 and F = 0.99 (one-sided test) on Table G-4, t = 2.358 (using the v = 120 entry). Therefore,

    t\,\frac{\tilde{S}}{\sqrt{n}} = 2.358 \cdot \frac{40}{\sqrt{100}} = 9.432

    300 - 9.432 \le \hat{X}   or   290.568 \le \hat{X}

The measurement results cause us to reject the hypothesis! Therefore, we would say that the claimed mean is not valid.

Final note: a 99% confidence level was selected for the significance test in this example; if a 99.5% level were selected, the hypothesis would have been accepted! (300 - 10.468 = 289.532.)

To alleviate (?) this confusion, a "level of significance" can be defined as 1.0 minus the confidence level. This would say that the above test was at a 1% level of significance. Therefore, at a 1% level of significance the hypothesis is rejected, but at a 0.5% level of significance it would be accepted.
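Recomputing the example with scipy instead of the printed table (a sketch; scipy's exact v = 99 point differs slightly from the v = 120 table entry used above):

    import numpy as np
    from scipy.stats import t

    n, xbar, s, claim = 100, 290.0, 40.0, 300.0   # values from the example

    tc = t.ppf(0.99, df=n - 1)            # one-sided 99% point, ~2.3646
    threshold = claim - tc * s / np.sqrt(n)

    print(tc, threshold)                  # ~290.54
    print('accept H0' if xbar >= threshold else 'reject H0')   # reject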
Example: Testing a Fair Coin

The binomial random variable allows us to develop a test to see if a coin is fair when flipped. We need to count the number of heads that occur and test at a level of significance of 5%, or a 95% confidence level.

Accept H0: if the number of heads is inside the 95% confidence level, the coin is fair.
Reject H0: if the number of heads is outside the 95% confidence level, the coin is not fair.

We assume that the number of trials has led to Gaussian statistics, combined with a priori knowledge of the mean and variance expected for a fair coin (p = 0.5) using the binomial random variable. For this distribution, the mean and variance are

    E[X] = \bar{X} = n\,p   and   \mathrm{Var}[X] = \sigma^2 = n\,p\,(1-p)

Assuming 100 trials and that the statistics have become Gaussian,

    \bar{X} - k\,\sigma \le \hat{X} \le \bar{X} + k\,\sigma

We use a two-sided test, therefore k = 1.96 and we have

    k\,\sigma = 1.96 \cdot \sqrt{100 \cdot 0.5 \cdot 0.5} = 1.96 \cdot 5 = 9.8

Therefore the test region for the hypothesis is

    50 - 9.8 \le \hat{X} \le 50 + 9.8   or   40.2 \le \hat{X} \le 59.8

Now you have a criterion to establish whether a coin is fair.

From Matlab, for comparison, a much narrower region captures far less probability: Pr(46<=x<=54) = 0.631798.
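The same numbers, plus exact binomial coverage (a sketch assuming Python with scipy; the last line reproduces the Matlab figure quoted above):

    import numpy as np
    from scipy.stats import binom, norm

    n, p = 100, 0.5
    k = norm.ppf(0.975)                        # 1.96 for a 95% two-sided test
    half = k * np.sqrt(n * p * (1 - p))        # 1.96 * 5 = 9.8 heads

    print(n * p - half, n * p + half)          # acceptance region (40.2, 59.8)

    # Exact coverage of the region 41..59, close to the nominal 0.95:
    print(binom.cdf(59, n, p) - binom.cdf(40, n, p))
    # Exact probability of the narrower region 46..54 (the Matlab value):
    print(binom.cdf(54, n, p) - binom.cdf(45, n, p))   # 0.631798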
Example: Hypothesis Testing in Communications

Signal plus noise is input to the receiver. A "digital symbol" receiver outputs a value corresponding to a symbol plus noise. Hypothesis testing establishes the rules to select one symbol as compared to another. Incorrect selections result in symbol errors, and in digital bit errors once the symbols are translated into bits.

    r(t) = s_i(t) + n(t), for i \cdot T \le t \le (i+1) \cdot T

Bernard Sklar, "Digital Communications, Fundamentals and Applications," Prentice Hall PTR, Second Edition, 2001, Appendix B.

For each symbol, a detection statistic is generated for the symbol period T:

    z(T) = a_i(T) + n_0(T)

Hypothesis testing then determines the estimated symbol value from the detection statistic. The number of hypotheses is equivalent to the number of possible symbols transmitted.
7.3  Composite Hypotheses (p. 402)

Generalized Likelihood Ratio Test (GLRT)

A Generalized Likelihood Ratio Test can be used when multiple hypotheses exist. In this case, we could think of each hypothesis as involving a different subset or subspace of the possible outcomes.

Again, a test statistic can be generated to decide inclusion or exclusion, such as

    \lambda(z) = \frac{L(\hat{\theta}_{LM})}{L(\hat{\theta}_{GM})}

The numerator term involves a local maximization over the restricted parameter region for the hypothesis, while the denominator involves a global maximization over the full range of theta. Multiple values are computed based on the restricted search regions desired for each of the local maxima.

This is most closely associated with detecting multiple signal levels from a multi-level transmission, providing detection statistics within each expected region for possible symbol identification.
7.4  Goodness of Fit (p. 417)

The Pearson test checks for valid statistical measures: flipping coins, rolling dice, etc.

A critical aspect of simulations is verifying that the data actually has the characteristics defined or desired. This is particularly important in creating random variables with mathematically defined densities and distributions.

The general model is that of sorting the data into l bins and comparing the estimated probabilities with the specified probabilities. If all of the estimates are near the desired results, it is likely that the hypothesized population is correct. If they differ in two or more bins, it is likely that it is not the correct density function.

In performing this test, the number and widths of the bins are important. For simple discrete tests, all bins can be considered. For continuous tests, the number and size of the bins drive the complexity, the amount of data that must be taken, and the overall accuracy.

The R.V. is defined as

    X_{ij} = 1 if the j-th observation of X falls in bin i; X_{ij} = 0 otherwise
The probability is

    P[X_{ij} = 1] = p_i

Then we have the sum R.V., the number of outcomes in bin i:

    Y_i = \sum_{j=1}^{n} X_{ij}

This becomes a multinomial probability function, based on

    P[Y_1 = r_1, Y_2 = r_2, \ldots, Y_l = r_l] = \frac{n!}{r_1!\,r_2! \cdots r_l!}\,p_1^{r_1}\,p_2^{r_2} \cdots p_l^{r_l}

The approximation is

    P[Y_1 = r_1, Y_2 = r_2, \ldots, Y_l = r_l] \approx \frac{\exp\!\left(-\frac{1}{2} \sum_{i=1}^{l} \frac{(r_i - n\,p_i)^2}{n\,p_i}\right)}{\sqrt{(2\pi n)^{l-1}\,p_1\,p_2 \cdots p_l}}
Skipping the maximum likelihood discussion in the text …

Under the large-sample assumption, we can approximate using the Gaussian normal,

    Y_i \sim N(n\,p_i,\; n\,p_i), while U_i = \frac{Y_i - n\,p_{0i}}{\sqrt{n\,p_{0i}}}

The statistical test, known as the Pearson test statistic, becomes

    V = \sum_{i=1}^{l} \frac{(Y_i - n\,p_{0i})^2}{n\,p_{0i}}

Pearson's test statistic has the form of a chi-squared R.V. with l-1 degrees of freedom. (Note: chi-squared is based on the sum of squared Gaussian R.V.) Therefore, in assigning acceptance regions or probabilities, the chi-squared table must be used (G-5 Table 3).
Example 7.4-1  Coin fairness

Assume heads and tails each have probability 0.50. Assume 100 flips at a level of significance of alpha = 0.05. If we observe 61 heads,

    V = \sum_{i=1}^{l} \frac{(Y_i - n\,p_{0i})^2}{n\,p_{0i}} = \frac{1}{100 \cdot 0.5}\left[(61 - 50)^2 + (39 - 50)^2\right]

    V = \frac{1}{50}\,(121 + 121) = 4.84

Using the table with l-1 = 1 degree of freedom, 0.95 results in z = 3.84:

    F_{\chi^2}(0.95,\, l-1 = 1) = 3.84 < 4.84 = V

The criterion is not passed.

If there were 41 and 59 instead,

    V = \frac{1}{50}\,(9^2 + 9^2) = 3.24

and the criterion would be passed.
Example 7.4-2  Fair Die

Assume 1002 rolls (the counts below sum to 1002, so the expected count per face is 167) with the following counts for faces 1 through 6: 152, 175, 165, 180, 159, 171.

    V = \sum_{i=1}^{l} \frac{(Y_i - n\,p_{0i})^2}{n\,p_{0i}} = \frac{1}{167}\left[(152-167)^2 + (175-167)^2 + (165-167)^2 + (180-167)^2 + (159-167)^2 + (171-167)^2\right]

    V = \frac{1}{167}\,(225 + 64 + 4 + 169 + 64 + 16) = 3.25

Using the table with l-1 = 5 degrees of freedom, 0.95 results in z = 11.1. Therefore,

    F_{\chi^2}(0.95,\, l-1 = 5) = 11.1 > 3.25 = V

The criterion is passed.
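Both examples map directly onto scipy's chi-square helpers (a sketch; scipy computes the same V and the same 0.95 table point):

    import numpy as np
    from scipy.stats import chisquare, chi2

    observed = np.array([152, 175, 165, 180, 159, 171])    # Example 7.4-2
    expected = np.full(6, observed.sum() / 6)              # 167 per face

    V, p_value = chisquare(observed, expected)
    print(V, p_value)                  # V ~ 3.25, well inside the 95% region
    print(chi2.ppf(0.95, df=5))        # 11.07, the table value (~11.1)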