Chapter 4 Some Popular Probability Models

4.1 INTRODUCTION
The last chapter addressed the notion of probability in relation to sets. The reason for that approach is that once one has a firm understanding of the set (i.e. event), computing its probability is often straightforward. This point was emphasized by focusing on Bernoulli random variables. Since the sample space for a 1-D Bernoulli random variable, X, is S_X = {0, 1}, any event one can fathom is one unique element of the field of events F_X = {∅, {0}, {1}, S_X}. Recall that the p-value for a Bernoulli random variable, X, is defined to be p_X = Pr{1} = Pr[X = 1]. Knowing the numerical value of this parameter allows one to compute the probability of the only other nontrivial member of F_X, which is the set {0}. Specifically, we have Pr{0} = Pr[X = 0] = 1 − p_X. In fact, the two singleton sets {0} and {1} are the only nontrivial subsets of S_X.
In the case of a 2-D Bernoulli random variable (X_1, X_2) with sample space S_(X1,X2) = {(0,0), (1,0), (0,1), (1,1)}, which has four elements, the corresponding field of events has 2^4 = 16 member sets. Even so, if one knows the numerical values of the probability of each of the four singleton sets {(0,0)}, {(1,0)}, {(0,1)} and {(1,1)}, then one can easily compute the probability of any subset of S_(X1,X2). For example, knowledge of the numerical values of the three parameters p_00, p_10 and p_01 immediately provides the numerical value of the parameter p_11, since we must have p_00 + p_10 + p_01 + p_11 = 1. Having knowledge of these, in turn, allows one to compute marginal probabilities associated with the marginal events [X_1 = 1] = {(1,0), (1,1)} and [X_2 = 1] = {(0,1), (1,1)}. The probabilities of these two events are p_X1 = Pr[X_1 = 1] = Pr{(1,0), (1,1)} = p_10 + p_11 and p_X2 = Pr[X_2 = 1] = Pr{(0,1), (1,1)} = p_01 + p_11, which are the p-values for the 1-D Bernoulli random variables X_1 and X_2, respectively. Even computations that appear more difficult are, in fact, no harder if one understands the nature of the events. For example, computing Pr[X_1·X_2 = 1] is trivial if one recognizes that the event [X_1·X_2 = 1] is the unique subset of S_(X1,X2) given by {(x_1, x_2) | x_1·x_2 = 1}. This is simply the set {(1,1)}. Hence, Pr[X_1·X_2 = 1] = p_11.
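To make this bookkeeping concrete, the following short Matlab sketch computes these quantities from an assumed (purely illustrative) set of singleton probabilities.

% Sketch (hypothetical numbers): marginal and joint-event probabilities
% for a 2-D Bernoulli random variable (X1,X2).
p00 = 0.5; p10 = 0.2; p01 = 0.2;   % assumed singleton probabilities
p11 = 1 - (p00 + p10 + p01)        % forced by total probability
pX1 = p10 + p11                    % Pr[X1 = 1]
pX2 = p01 + p11                    % Pr[X2 = 1]
prProd = p11                       % Pr[X1*X2 = 1] = Pr{(1,1)}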
The above 2-D case can be extended to the case of an n-D Bernoulli random variable. In this more general case there are 2^n singleton subsets, and so if one knows the numerical values of the probabilities of any 2^n − 1 of these, then one can compute the probability of any event. In the situation where these n random variables are mutually independent, one only needs to know the p-value of each of the n 1-D Bernoulli random variables. In this chapter we
will address a variety of random variables whose probability structure is completely dictated by a set of
parameters. However, as we shall see, those parameterized structures are associated with assumed shape
models. The Bernoulli probability structure entails no such assumptions, due to its simplicity. In this sense, a
Bernoulli probability model is, in fact, not a model at all. Any ‘0/1’ random variable is, by definition, a
Bernoulli random variable. One need not assume that it has a Bernoulli model for its probability structure. The
following example, taken from the last chapter, illustrates the difference between an assumed probability model
and a probability structure that relies on no assumptions.

Example 4.1 Consider the 3-D random variable X = (X_1, X_2, X_3), where X_k = the act of recording whether the kth of three chosen biopsies taken from a patient is not (0) or is (1) cancerous. Let Y = Σ_{k=1}^{3} X_k be the act of recording the number of cancerous biopsies.

(a) Suppose that we make no assumptions regarding the independence of the components of X. Then the probability structure for Y is parameterized by the probabilities of any 7 of the 8 singleton subsets of S_X. These are not parameters of a probability model, since there is no model.
(b) Suppose now that we assume that the locations of the chosen biopsies are sufficiently far apart that cancer in any one of the locations could not be directly related to cancer in another chosen region (i.e. there are independent cancerous regions in the patient's body). Under this assumption the probability structure of X is completely dictated by the 3 p-values {p_k = Pr[X_k = 1]}, k = 1, 2, 3. This assumed model for Y is a 3-parameter model.

(c) Now, suppose that we assume that the components of X are not only mutually independent, but are also identically distributed (i.e. iid). Under this assumption the three parameters {p_k = Pr[X_k = 1]}, k = 1, 2, 3, are one and the same parameter, call it p_X. In this case, the assumed model for Y is a 1-parameter model.
(d) Compute Pr[Y = 3] for each of the situations (a) through (c).
Solution:
(a): Pr[Y = 3] = Pr[X_1 = 1 ∩ X_2 = 1 ∩ X_3 = 1] = Pr{(1,1,1)} = p_111.
(b): Pr[Y = 3] = Pr[X_1 = 1 ∩ X_2 = 1 ∩ X_3 = 1] = Pr[X_1 = 1]·Pr[X_2 = 1]·Pr[X_3 = 1] = p_X1·p_X2·p_X3.
(c): Pr[Y = 3] = Pr[X_1 = 1 ∩ X_2 = 1 ∩ X_3 = 1] = Pr[X_1 = 1]·Pr[X_2 = 1]·Pr[X_3 = 1] = p_X^3.
(e) Under the assumptions in part (c), the model for the probability structure for Y has a name. It is called a binomial distribution with n = 3 and with parameter p_X. The more general model in (b) does not have a name. □
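As a quick numerical check of part (c), Matlab's binomial pdf gives the same answer for an assumed (illustrative) p-value:

pX = 0.1;            % assumed p-value, for illustration only
binopdf(3, 3, pX)    % equals pX^3 = 0.001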
4.2 MOTIVATION FOR PROBABILITY MODELS
There are many reasons that one might desire to invoke a certain probability model. One of the most common relates to
attempts to understand the probability structure of a random variable via a data histogram. Consider the following
example.
Example 4.2 A company that specializes in fabrication of custom furniture is considering using cedar slats to construct
the seats of chairs and couches designed for outdoor use. In order to arrive at the number of slats needed to carry a
specified design load, the company has decided to conduct bending load tests on 100 slats. The results are summarized in
the following two histograms:
Figure 4.1 Histograms associated with breaking strength measurements of 100 wooden slats. The top plot is a 10-bin histogram and the bottom plot is a 5-bin histogram. [Both plot frequency versus load (lbf) over the range 16 to 25 lbf.]
The 10-bin histogram reveals no measurements in the interval (22.4, 24.1). Were we to use only this histogram to infer
probability, we would have to presume that the probability of recording a breaking load in this region is zero. On the other
hand, if we use the 5-bin histogram, the probability of this interval would not be zero. The 5-bin histogram provides a
more realistic description of the breaking strength probability structure, but at a price. That price is large bins, or
equivalently, less detail in the probability structure.
Suppose that similar testing on other types of wood has suggested that a normal probability model is often reasonable. The normal pdf is a 2-parameter model. The parameters that describe it are the mean and standard deviation of X. From the measurements, we obtain the estimates μ̂_X = x̄ = 20.03 and σ̂_X = s = 1.65. The normal pdf model is shown below.
Figure 4.2 A normal pdf model for X = the act of measuring the breaking strength of a wooden slat. [The pdf f_X(x) is plotted versus load (lbf) over the range 15 to 25 lbf.]
This model allows one to compute the probability of the interval (22.4 , 24.1), or any other interval, in an unambiguous
manner.
(a) Use the normal model to estimate the probability of the interval (0, 15).
Answer: Pr[0 < X ≤ 15] = F_X(15) − F_X(0) ≈ .001 − 3(10^−34) ≈ 0.001.
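A minimal Matlab sketch of this computation, using the estimated mean and standard deviation from the slat data above:

muX = 20.03; sigX = 1.65;                        % estimates from the slat data
normcdf(15, muX, sigX) - normcdf(0, muX, sigX)   % ~0.001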
(b) Suppose the design load for a given chair design is 300#, and that each of m slats to be used will carry an equal load.
Find the value of m such that each slat carries a design load of 15#.
Answer: m = 300/15 = 20 slats.
(c) Suppose that the slats break independently of one another for the 300# load. Find the probability structure for Y = the act of recording the number of slats that break.
Solution: For the kth slat define the Bernoulli random variable W_k via the equivalent events [W_k = 1] ~ [X_k ≤ 15], where X_k = the act of recording the breaking strength of the kth slat. Then the random variables {W_k}, k = 1, ..., 20, are iid Bernoulli random variables with p = Pr[W_k = 1] = Pr[X_k ≤ 15] = .001. Hence, the number of slats that break is Y = Σ_{k=1}^{20} W_k, and it has a binomial pdf with n = 20 and p = .001. The pdf for Y is shown below.
Figure 4.3 A binomial pdf model for Y = the act of recording the number of slats that break. [The pdf f_Y(y) is plotted versus y = 0, 2, 4, ..., 20.]
(d) Use your model in (c) to estimate the probability that no more than one slat will break under a 300# total load.
Solution: Pr[Y ≤ 1] = F_Y(1) = .9998.
(e) Suppose that a customer who has purchased a chair returns it, claiming that while he was sitting on it, the chair broke. Inspection of the damage reveals that 3 adjacent center slats are broken. The breaks all occurred at the midpoints. Discuss how you might evaluate this claim. [Note: you have estimated that the man weighs no more than 220#.]
Answer: First of all, the fact that 3 adjacent slats out of 20 failed strongly suggests that they did not fail independently. The fact that they are in the center of the chair is also important. These facts suggest that a load significantly greater than 15#/slat was placed on the center area of the chair. One way this could happen is if an individual were to stand on the chair. A force of 220# shared by a small fraction of the 20 slats (say, even half of them) would imply that each slat carried a 22# load. This value is 2# greater than the mean breaking strength of a typical slat. From the normal model, the probability that a slat breaks under a load of no more than 22# is ~0.89. If they are assumed to break independently, then the probability that 3 or more of 10 slats would break is ?? □
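A sketch of the computations in (d) and (e), under the stated assumptions (the per-slat breaking probability at a 22# load comes from the normal model above); it shows that the probability left open in (e) is essentially 1:

% Part (d): Pr[Y <= 1] for Y ~ binomial(n = 20, p = 0.001)
binocdf(1, 20, 0.001)               % ~0.9998
% Part (e): per-slat breaking probability at a 22# load, then
% Pr[3 or more of 10 slats break] under an independence assumption
pBreak = normcdf(22, 20.03, 1.65);  % ~0.88
1 - binocdf(2, 10, pBreak)          % essentially 1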
4.3 ONE-DIMENSIONAL PROBABILITY MODELS
There are some popular pdf models for 1-D random variables. They include two types. One relates to discrete random
variables; that is, random variables having a discrete sample space. The other relates to continuous random variables; that
is, ones having a continuous sample space. Very often the biggest problem confronting a person who might desire to use a probability model (as opposed, say, to histogram data) is choosing the most appropriate model. Clearly, the shape of a
histogram can provide valuable guidance. Equally valuable is the type of sample space.
We have, to this point, covered a number of model pdf’s in addition to the non-model Bernoulli pdf. The sum of n
Bernoulli random variables has a sample space {0,1,…,n}. It is a discrete random variable. If n is sufficiently large, such
that the events of interest are much larger than singleton events, then one might invoke a continuous pdf model. In any
case, as noted above, without any assumptions pertaining to these n random variables, it is virtually impossible to justify
any model. If it is assumed that they are iid then their sum is a binomial random variable. If the number n is unknown, and
potentially infinite, then the Poisson model is often used. If n is sufficiently large, then their average can be approximated
by a normal model [due to the Central Limit Theorem (CLT)].
One often finds in standard textbooks on the subject that there are other popular distributions that work well in certain
applications. For example, the time-to-failure of many electronic components is often modeled using an exponential pdf. It is one of a family of models in which the pdf decreases monotonically in x. Again, one should exercise a bit of care in relation to the sample space. The exponential random variable, call it X, has the continuous sample space S_X = [0, ∞). However, if the time-to-failure is recorded in, say, whole years, then the sample space for X is S_X = {0, 1, 2, ...}. In this case, one might resort to a discrete exponential model.
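A minimal sketch of such a discrete exponential model: if the underlying time-to-failure X is exponential with a (hypothetical) mean μ_X and is recorded in whole years, the resulting pmf is geometric-like.

muX = 3;                                % assumed mean time-to-failure, in years
k = 0:10;                               % recorded whole-year values
pmf = exp(-k/muX)*(1 - exp(-1/muX));    % Pr[k <= X < k+1]
stem(k, pmf), xlabel('years'), ylabel('pmf')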
In relation to statistics (i.e. estimators of unknown parameters, such as the mean, variance, correlation coefficient, etc.)
there are commonly used pdf models. They include the student’s t, the chi-squared, the F, and others. These are all model
pdf’s. They depend on assumptions, just as the binomial model depends on the iid assumption in relation to the associated
Bernoulli random variables.
Table 4.1 Some 1-D PDF’s Given in Matlab
betapdf     Beta probability density function
binopdf     Binomial probability density function
chi2pdf     Chi-square probability density function
exppdf      Exponential probability density function
fpdf        F probability density function
gampdf      Gamma probability density function
geopdf      Geometric probability density function
hygepdf     Hypergeometric probability density function
lognpdf     Lognormal probability density function
nbinpdf     Negative binomial probability density function
ncfpdf      Noncentral F probability density function
nctpdf      Noncentral t probability density function
ncx2pdf     Noncentral chi-square probability density function
normpdf     Normal probability density function
poisspdf    Poisson probability density function
raylpdf     Rayleigh probability density function
tpdf        Student's t probability density function
unidpdf     Discrete uniform probability density function
unifpdf     Continuous uniform probability density function
wblpdf      Weibull probability density function
With the advent of powerful and inexpensive computers and statistical software, there is nowadays access to many more pdf models than are discussed in traditional textbooks. Having an awareness of the vast array of possible models can lead one to identify a model that is more appropriate for a given application than the standard models. For this reason, in Table 4.1 we provide a list of some of the pdf models included in the Matlab Statistics Toolbox. The models in Table 4.1 are listed alphabetically. It would be instructive to break them into two groups; namely, discrete and continuous models. [See Problems 4.1 and 4.2.] For each pdf in this table, there is a corresponding random sample generator function.
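For example, the normal model has the companion functions sketched below; the cdf, inverse-cdf, and random-sample functions for the other models in Table 4.1 follow the same naming pattern.

f = normpdf(1.5, 0, 1);     % density at x = 1.5
F = normcdf(1.5, 0, 1);     % Pr[X <= 1.5]
q = norminv(0.975, 0, 1);   % 97.5th percentile (~1.96)
x = normrnd(0, 1, 100, 1);  % 100 random samples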
Example 4.3 Suppose that it is believed that X = the act of recording the time-to-failure of a microprocessor has an exponential pdf model. This is a 1-parameter model, since the pdf requires one parameter to completely define it. Specifically, the sample space and pdf are:

S_X = [0, ∞)   and   f_X(x) = (1/μ_X) e^(−x/μ_X).
Now, suppose that one does not know the numerical value of μ_X, and so a sample of n microprocessors will be tested in order to estimate it. Let X = (X_1, X_2, ..., X_n) be the n-tuple data collection random variable. Suppose that it is reasonable to assume that the components of X are independent, and that each one has the same exponential pdf as the generic random variable, X. Under these assumptions, the estimator of μ_X, which is μ̂_X = X̄ (the sample mean), has mean E(μ̂_X) = μ_X and variance Var(μ̂_X) = σ_X^2/n. Even though we have not discussed the exponential pdf, the reader can easily use the internet to find information about it. From Wikipedia [http://en.wikipedia.org/wiki/Exponential_distribution ] we find that the variance of X ~ exp(μ_X) is σ_X^2 = μ_X^2. And so, the variance of the estimator μ̂_X is Var(μ̂_X) = μ_X^2/n. Suppose that, even though we do not know the value of μ_X, we are confident that μ_X ≤ 3. Then Var(μ̂_X)_max = 9/n.
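A small simulation sketch (with assumed values μ_X = 3 and n = 50) that checks the variance formula Var(μ̂_X) = μ_X^2/n:

muX = 3; n = 50; M = 1e4;          % M = number of simulated samples of size n
muHat = mean(exprnd(muX, n, M));   % M estimates of muX, one per column
[var(muHat)  muX^2/n]              % the two values should be close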
(a) Suppose that n is large enough to invoke the CLT in relation to the statistic μ̂_X = X̄. Then μ̂_X ~ N(μ_X, σ_μ̂), where σ_μ̂ denotes the standard deviation of μ̂_X. Consider the following sequence of equivalent events:

[μ̂_X ≤ x] = [μ̂_X − μ_X ≤ x − μ_X] = [(μ̂_X − μ_X)/σ_μ̂ ≤ (x − μ_X)/σ_μ̂] = [Z ≤ z].

It follows that F_μ̂X(x) = Pr[μ̂_X ≤ x] = Pr[Z ≤ z] = F_Z(z). Since μ̂_X = X̄ has a normal cdf, then so does Z. The random variable Z has a mean of zero and a variance equal to 1. It is called the standard normal random variable.

(b) Since Var(μ̂_X) = σ_X^2/n, we have σ_μ̂ = σ_X/√n. Hence,

(μ̂_X − μ_X)/σ_μ̂ = (μ̂_X − μ_X)/(σ_X/√n)   and   z = (x − μ_X)/(σ_X/√n).

Suppose that we want to ensure, with probability p, that our estimator μ̂_X yields a number that is close to the unknown value μ_X. Specifically, for a specified value of ε, consider the equivalent events (recalling that σ_X = μ_X for the exponential model):

[ −ε ≤ (μ̂_X − μ_X)/μ_X ≤ ε ] = [ −ε√n ≤ (μ̂_X − μ_X)/(μ_X/√n) ≤ ε√n ] = [ −z_1 ≤ Z ≤ z_1 ],  where z_1 = ε√n.

Hence,

p = Pr[ −ε ≤ (μ̂_X − μ_X)/μ_X ≤ ε ] = Pr[−z_1 ≤ Z ≤ z_1] = F_Z(z_1) − F_Z(−z_1) = 1 − 2F_Z(−z_1).

It follows that −z_1 = F_Z^(−1)((1 − p)/2). Since z_1 = ε√n, we arrive at the required sample size:

n = [ F_Z^(−1)((1 − p)/2) / ε ]^2 .    (4.1)
(c) Use the Matlab command norminv(p, 0, 1) to compute the values of (4.1) for the paired values of ε ∈ {.01, .05, .10, .15, .20, .25} and p ∈ {.99, .95, .90, .85, .80}.
Solution: The results and code are given below. The table must be viewed with care, because there are small values of n
that will preclude the reasonableness of the large-n assumption needed to invoke the CLT. Having said that, the table
below provides a quantitative basis for choosing n.
ε \ p      .99        .95        .90        .85        .80
.01      66,349     38,415     27,056     20,723     16,424
.05       2,654      1,537      1,083        829        657
.10         664        385        271        208        165
.15         295        171        121         93         73
.20         166         97         68         52         42
.25         107         62         44         34         27
(d) From the table in (c), we note that for p = .85 and ε = 0.25 the needed sample size is n = 34. Run 10,000 simulations of X = (X_1, X_2, ..., X_n) for X ~ Exp(μ_X = 3.5 yrs) in order to determine if the CLT is reasonable.
Solution:
Figure 4.4 Scaled histogram-based pdf for μ̂_X = X̄ based on 10,000 simulations, compared to a normal pdf model. [Both pdfs are plotted versus x over the range 0 to 7.]
This figure suggests that the sample size n = 34 is at least ‘close’ to being large enough to invoke the CLT in relation to

 X  X . The histogram-based pdf is slightly asymmetric; in particular, it is skewed to the right. □
4.4 TWO-DIMENSIONAL PROBABILITY MODELS
Consider a 2-D random variable (X,Y). While, marginally, X and Y are of interest, a main point of addressing the 2-D
random variable (X,Y) is to investigate the relationship between them. If they are (statistically) independent, then there is
no relationship. Another main point is that, even if they are independent, one might need information related to both of
them in order to address the probability structure of one of them. Before we address this latter point, we will first address
investigations pertaining to the relationship between them.
Example 4.4 It is desired to investigate whether or not the malfunction of the transmission of a given tractor model is
related to failure of a main bearing. Develop the probability structure of the appropriate 2-D random variable that this
problem relates to. Then identify an appropriate parameter in this 2-D structure that the investigation should focus on.
Solution:
For any selected tractor, let X = the act of noting that the transmission has not (0) or has (1) malfunctioned. Let Y = the
act of noting that the main bearing has not (0) or has (1) failed. Then (X,Y) is a 2-D Bernoulli random variable. Recall
that the probability structure of such a random variable is completely defined by any three of the four joint probabilities. Alternatively, it is completely defined by the three parameters p_X, p_Y and p_1|1 = Pr[Y = 1 | X = 1]. It is this conditional probability that the investigation should focus on. □
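A sketch of how this focus parameter might be estimated from field data; the counts below are hypothetical.

% Hypothetical counts of n tractors, cross-classified by transmission (X)
% and main-bearing (Y) status: n11 = both failed, n10 = transmission only, etc.
n11 = 18; n10 = 42; n01 = 25; n00 = 915;
n = n11 + n10 + n01 + n00;
pX_hat   = (n11 + n10)/n;        % estimate of Pr[X = 1]
p11_hat  = n11/n;                % estimate of Pr[(X,Y) = (1,1)]
p1g1_hat = p11_hat/pX_hat        % estimate of p_1|1 = Pr[Y = 1 | X = 1]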
The above example does not involve any probability model. The 2-D Bernoulli random variable was not arrived at via any specified model. The next example does entail specification of a model.
Example 4.5 It is desired to investigate the relationship between temperature and viscosity of a new synthetic oil. A total of six temperatures will be considered: {−40, −20, 0, 40, 100, 200} °C. A given oil sample will be randomly assigned one of these temperatures to have its viscosity measured in mPa·s. [See http://en.wikipedia.org/wiki/Motor_oil ] Then specify what you believe is a reasonable probability structure for the 2-D random variable involved.
(a) Describe an appropriate generic 2-D random variable involved in this study.
Answer: Let T = the act of recording the temperature that any oil sample will be tested at, and let V = the act of measuring the viscosity. Then clearly S_T = {−40, −20, 0, 40, 100, 200}. Without any a priori information related to V, we will assume that S_V = [0, ∞).
(b) Give the pdf for T.
Answer: Since a given oil sample is to be 'randomly' assigned to one of the 6 temperatures, we must assume that T has a uniform pdf on S_T. Hence, f_T(t) = Pr[T = t] = 1/6.
(c) We will assume that at any given temperature, t, the mean and standard deviation of V depend on t. We will also assume that V has a bell-shaped, or normal, pdf for each t-value. Give the formula for this conditional pdf.
Answer:

f_{V|T=t}(v) = [1/(σ_V(t)√(2π))] exp{ −[v − μ_V(t)]^2 / (2σ_V^2(t)) }.
(d) From (b) and (c) obtain the expression for the marginal pdf for V.
Answer: Recall that, in relation to (T, V), the event [V ≤ v] = [V ≤ v ∩ T ∈ S_T]. Hence,

f_V(v) = Σ_{t∈S_T} f_{(T,V)}(v, t) = Σ_{t∈S_T} f_{V|T=t}(v) f_T(t) = (1/6) Σ_{t∈S_T} [1/(σ_V(t)√(2π))] exp{ −[v − μ_V(t)]^2 / (2σ_V^2(t)) }.

(e) Use (d) to compute E(V) = μ_V.
Solution:

E(V) = ∫_{S_V} v f_V(v) dv = (1/6) Σ_{t∈S_T} ∫_{S_V} v [1/(σ_V(t)√(2π))] exp{ −[v − μ_V(t)]^2 / (2σ_V^2(t)) } dv = (1/6) Σ_{t∈S_T} μ_V(t). □
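A minimal Matlab sketch of the mixture pdf in (d) and the mean in (e); the mean and standard deviation functions μ_V(t) and σ_V(t) used here are hypothetical placeholders.

t    = [-40 -20 0 40 100 200];       % the six test temperatures (deg C)
muV  = [900 500 250 90 20 5];        % hypothetical mean viscosities (mPa*s)
sigV = 0.1*muV;                      % hypothetical standard deviations
v  = linspace(0, 1200, 2000);
fV = zeros(size(v));
for k = 1:6
    fV = fV + (1/6)*normpdf(v, muV(k), sigV(k));  % (1/6)*f_{V|T=t_k}(v)
end
plot(v, fV), xlabel('v (mPa*s)'), ylabel('f_V(v)')
EV = (1/6)*sum(muV)                  % E(V), as in part (e)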
The 2-D Normal Random Variable
The 2-D normal random variable is perhaps the most well-known 2-D random variable. Let (X, Y) have components that are, marginally, normal. The marginal pdf's are:

f_X(x) = [1/(σ_X√(2π))] exp{ −(x − μ_X)^2 / (2σ_X^2) } ;  S_X = (−∞, ∞)    (4.2a)

f_Y(y) = [1/(σ_Y√(2π))] exp{ −(y − μ_Y)^2 / (2σ_Y^2) } ;  S_Y = (−∞, ∞)    (4.2b)
Even though X and Y are each marginally normal, it does not necessarily follow that (X, Y) has a normal pdf. Assume that X and Y are jointly normal. Then the pdf for (X, Y) is:

f_{(X,Y)}(x, y) = [1/(2π σ_X σ_Y √(1 − ρ^2))] exp{ −[1/(2(1 − ρ^2))] [ (x − μ_X)^2/σ_X^2 + (y − μ_Y)^2/σ_Y^2 − 2ρ ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) ] }.    (4.2c)

From (4.2c) we see that the 2-D normal pdf is a 5-parameter pdf. In addition to the mean and standard deviation for X and Y, we have the correlation coefficient:

ρ = ρ_XY = σ_XY / (σ_X σ_Y).    (4.3)

Recall that the covariance between X and Y is defined as σ_XY = E[(X − μ_X)(Y − μ_Y)]. The following example illustrates how the various parameters of this pdf influence its shape.
Example 4.6 The axle vibration and interior noise level for a vehicle driving at a given speed are believed to have a joint normal distribution. Let X = the act of measuring the axle vibration with S_X = [0, ∞), and let Y = the act of measuring the interior noise level with S_Y = [0, ∞). Assume the numerical values of the pdf parameters are: μ_X = 30 ft/sec (rms), σ_X = 1 ft/sec (rms), μ_Y = 70 dBA, σ_Y = 3 dBA, and ρ_XY = 0.7.
(a) Use the Matlab command mvnpdf to obtain a plot of the pdf for (X,Y).
Solution: The Matlab code to produce the pdf plot below is given in the Appendix.
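As a cross-check (not part of the appendix code), the value returned by mvnpdf at a single point can be compared to a direct evaluation of (4.2c):

mux = 30; sigx = 1; muy = 70; sigy = 3; rho = 0.7;
x0 = 31; y0 = 72;                                  % an arbitrary test point
zx = (x0 - mux)/sigx;  zy = (y0 - muy)/sigy;
f1 = exp(-(zx^2 + zy^2 - 2*rho*zx*zy)/(2*(1 - rho^2))) / ...
     (2*pi*sigx*sigy*sqrt(1 - rho^2));             % equation (4.2c)
f2 = mvnpdf([x0 y0],[mux muy],[sigx^2 rho*sigx*sigy; rho*sigx*sigy sigy^2]);
[f1 f2]                                            % the two values agree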
Figure 4.4 Surface and contour plots of the pdf for (X, Y). [Panel titles: 'Surface Plot of the PDF for (X,Y)' and 'Contour Plot of the PDF for (X,Y)'; the plots cover roughly 26 ≤ x ≤ 34 and 60 ≤ y ≤ 80.]
(b) From the contour plot in (a) deduce the means and the ±3σ ranges for X and Y.
Solution: The center of the plot is located at the point (30, 70), which corresponds to (μ_X, μ_Y). The ±3σ range for X is the interval (27, 33), which corresponds to σ_X = 1. The ±3σ range for Y is the interval (61, 79), which corresponds to σ_Y = 3.
(c) To determine how the pdf parameters control the slope of the main axis of the elliptical contours, consider the linear prediction model:

Ŷ = mX + b.    (4.4a)

In the next chapter we will show that the 'best' prediction model of the form (4.4a) satisfies two conditions:

(C1) E(Ŷ) = E(Y) (i.e. Ŷ is an unbiased estimator of Y), and
(C2) Cov(Y − Ŷ, X) = 0 (i.e. the prediction error is uncorrelated with X).

In relation to (C1), since the expectation operator E(·) is a linear operation, (C1) becomes:

(C1') mμ_X + b = μ_Y.

We will show that Cov(·) has a similar property that results in:

(C2') σ_XY = mσ_X^2.

Use (C1') and (C2') to determine the expressions and resulting numerical values of the model parameters m and b.
Solution:
From (C2') we have m = σ_XY/σ_X^2 = ρ_XY σ_X σ_Y/σ_X^2 = ρ_XY (σ_Y/σ_X) = 0.7(3/1) = 2.1.
From this and (C1') it follows that b = μ_Y − mμ_X = 70 − 2.1(30) = 7.
(d) Use the contour plot to estimate the slope of the main axis of the elliptical contours. Then compare this number to the numerical values of the prediction model parameter(s).
Solution: From the contour plot, the estimated slope is (81 − 59)/(34 − 26) = 22/8 = 2.75. This is modestly close to m = 2.1, but sufficiently different that it cannot be concluded that the slope of the major axis equals the slope parameter, m, in the linear model (4.4a).
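A brief simulation sketch related to (c) and (d): data drawn from the assumed joint normal model should yield a least-squares line close to y = 2.1x + 7.

mux = 30; sigx = 1; muy = 70; sigy = 3; rho = 0.7;
C  = [sigx^2 rho*sigx*sigy; rho*sigx*sigy sigy^2];
XY = mvnrnd([mux muy], C, 1e5);    % simulated (X,Y) pairs
polyfit(XY(:,1), XY(:,2), 1)       % [slope intercept], near [2.1 7]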
(e) For the case where ρ = 0, it follows from (4.3) that X and Y are uncorrelated. Show that, in this case, it can be claimed that, in fact, X and Y are independent.
Solution: For ρ = 0 equation (4.2c) becomes

f_{(X,Y)}(x, y) = [1/(2π σ_X σ_Y)] exp{ −[ (x − μ_X)^2/(2σ_X^2) + (y − μ_Y)^2/(2σ_Y^2) ] } = [1/(σ_X√(2π))] e^{−(x − μ_X)^2/(2σ_X^2)} · [1/(σ_Y√(2π))] e^{−(y − μ_Y)^2/(2σ_Y^2)}.

This is exactly the relation f_{(X,Y)}(x, y) = f_X(x)·f_Y(y) that is needed to show that X and Y are independent. □
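A one-point numerical check of this factorization (using the parameter values of this example, with ρ set to 0):

mux = 30; sigx = 1; muy = 70; sigy = 3;
x0 = 29.2; y0 = 73.5;                               % an arbitrary test point
fJoint = mvnpdf([x0 y0], [mux muy], [sigx^2 0; 0 sigy^2]);
fProd  = normpdf(x0, mux, sigx)*normpdf(y0, muy, sigy);
[fJoint fProd]                                      % identical up to round-off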
(f) Suppose now that ρ = .99. Obtain a plot of the pdf and draw conclusions as to how ρ controls its shape and orientation.
Answer: The resulting contour plot shows that as ρ → 1, the pdf shape approaches that of a straight line, while the slope of the line remains relatively the same. □
In the last problem we observed that as ρ → 1 the joint pdf 'collapses' to the form of a straight line. To see what effect this has on the linear model (4.4a), suppose that, in fact, we have a perfectly linear relationship between X and Y; that is:

Y = mX + b.    (4.5a)

Taking the expected value of (4.5a) gives:

μ_Y = mμ_X + b.    (4.5b)

Hence,

σ_XY = E[(X − μ_X)(Y − μ_Y)] = E[(X − μ_X){(mX + b) − (mμ_X + b)}] = m E[(X − μ_X)^2] = mσ_X^2.    (4.5c)

Similarly,

σ_Y^2 = E[(Y − μ_Y)^2] = m^2 E[(X − μ_X)^2] = m^2 σ_X^2.    (4.5d)

And so, the correlation coefficient (4.3) becomes

ρ = σ_XY/(σ_X σ_Y) = mσ_X^2/(|m| σ_X^2) = m/|m| = +1 for m > 0, and −1 for m < 0.    (4.5e)
From (4.5), we can make the following
Conclusion: The correlation coefficient is a measure of how linearly related X and Y are.
This conclusion was based on (4.5), which in no way depended on any pdf assumptions. It was demonstrated in
comparing parts (d) and (f) of Example 4.6, wherein the pdf’s were normal. But it holds for any pdf’s.
In the situation where normality is assumed, then it was shown in (e) of Example 4.6 that if X and Y are
uncorrelated, then they are independent. This is one reason that the assumption of normality is so often made. It
is nontrivial to prove that X and Y are independent. But if they are jointly normal, then all one needs to do is
prove that the correlation is zero!
In-Class Problem 1 Prove that if random variables X1 and X2 are independent, then they are uncorrelated.
Proof:

E(X_1 X_2) = ∫_{S_X2} ∫_{S_X1} x_1 x_2 f_{(X_1,X_2)}(x_1, x_2) dx_1 dx_2
           = ∫_{S_X2} ∫_{S_X1} x_1 x_2 f_{X_1}(x_1) f_{X_2}(x_2) dx_1 dx_2     (by independence)
           = ∫_{S_X2} x_2 f_{X_2}(x_2) dx_2 · ∫_{S_X1} x_1 f_{X_1}(x_1) dx_1 = E(X_1) E(X_2)    (by calculus). □
Thus, we see that if two random variables are independent, then they are uncorrelated. However, the converse is
not necessarily true. Uncorrelatedness only means that they are not related in a linear way. This is important!
Many engineers and scientists assume that because X and Y are uncorrelated, they have nothing to do with each
other (i.e. they are independent). It may well be that they are, in fact, very related to one another, as is
illustrated in the following example.
Example 4.7 Modern warfare in urban areas requires that projectiles fired into those areas be sufficiently
accurate to minimize civilian casualties. Consider the act of noting where in a circular target a projectile hits.
This can be defined by the 2-D random variable (R, Ф) where R is the radial distance from the center and Ф is
the angle relative to the horizontal right-pointing direction. Suppose that R has a pdf that is uniform over the
interval [0, ro] and that Ф has a pdf that is uniform over the interval [0, 2π). Thus, the marginal pdf’s are:
f_R(r) = 1/r_o  for r ∈ [0, r_o],   and   f_Φ(φ) = 1/(2π)  for φ ∈ [0, 2π).

Furthermore, suppose that we can assume that these two random variables are statistically independent. Then the joint pdf for (R, Φ) is:

f_{(R,Φ)}(r, φ) = 1/(2π r_o)  for r ∈ [0, r_o] and φ ∈ [0, 2π).

This pdf is shown in Figure 4.5.
Figure 4.5 Plot of f_{(R,Φ)}(r, φ) = 1/(2π r_o) for r ∈ [0, r_o] and φ ∈ [0, 2π), with r_o = 1. [The surface is plotted versus radius r and angle φ (radians).]
(a) Compute the mean value of the location of impact in the x-y plane:
The point of impact of the projectile may be expressed in polar form as W = R e^(iΦ). Find the mean of W.
Solution: Since W = g(R, Φ), where g(r, φ) = r e^(iφ), we have

E(W) = E[g(R, Φ)] = ∫_{r=0}^{r_o} ∫_{φ=0}^{2π} r e^(iφ) f_{(R,Φ)}(r, φ) dφ dr = [1/(2π r_o)] ∫_{r=0}^{r_o} r dr ∫_{φ=0}^{2π} e^(iφ) dφ = [1/(2π r_o)] (r_o^2/2) (e^(i2π) − e^(i0))/i = 0.
(b) Show that X = Rcos(Φ) and Y = Rsin(Φ) are uncorrelated:
Solution: We need to show that E(XY) = E(X)E(Y). To this end,

E(X) = ∫_{r=0}^{r_o} ∫_{φ=0}^{2π} r cos(φ) f_{(R,Φ)}(r, φ) dφ dr = [1/(2π r_o)] (r_o^2/2) [sin(2π) − sin(0)] = 0,

E(Y) = ∫_{r=0}^{r_o} ∫_{φ=0}^{2π} r sin(φ) f_{(R,Φ)}(r, φ) dφ dr = −[1/(2π r_o)] (r_o^2/2) [cos(2π) − cos(0)] = 0, and

E(XY) = ∫_{r=0}^{r_o} ∫_{φ=0}^{2π} r^2 cos(φ) sin(φ) f_{(R,Φ)}(r, φ) dφ dr = [1/(2π r_o)] (r_o^3/3) ∫_{φ=0}^{2π} cos(φ) sin(φ) dφ.

To compute the value of the rightmost integral, one can use a table of integrals, a good calculator, or the trigonometric identity sin(α + β) = sin(α)cos(β) + cos(α)sin(β). We will use this identity for the case where α = β = φ. Thus, cos(φ)sin(φ) = sin(2φ)/2. From this, it can be easily shown that the rightmost integral is zero. Hence, E(XY) = 0 = E(X)E(Y), and we have shown that X and Y are, indeed, uncorrelated.
(c) Simulate {(X_k, Y_k)}, k = 1, ..., n:
To this end, we will (only for convenience) choose r_o = 1 and n = 1000. To simulate one measurement of (R, Φ) we use r = rand(1,1) and φ = 2π·rand(1,1); the corresponding measurement of (X, Y) is then (r·cos(φ), r·sin(φ)). Consequently, a simulation of 1000 iid 2-D actions {(X_k, Y_k)}, k = 1, ..., 1000, is obtained in the same manner from (rand(1000,1), 2π·rand(1000,1)). The scatter plot below shows 1000 simulations associated with (X, Y).
Figure 4.6 A plot of 1000 simulations of (X, Y) = (Rcos(Φ), Rsin(Φ)); equivalently, one simulation of {(X_k, Y_k) = (R_k cos(Φ_k), R_k sin(Φ_k))}, k = 1, ..., 1000. [The points are scattered over the unit disk in the x-y plane.]
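A minimal sketch that reproduces a plot like Figure 4.6 and the correlation check used in (d):

n = 1000; ro = 1;
r   = ro*rand(n,1);                 % R ~ uniform on [0, ro]
phi = 2*pi*rand(n,1);               % Phi ~ uniform on [0, 2*pi)
x = r.*cos(phi);  y = r.*sin(phi);
plot(x, y, '.'), axis equal, xlabel('x'), ylabel('y')
corrcoef(x, y)                      % off-diagonal entries near 0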
(d) Comment as to whether the scatter plot suggests a linear relationship between X & Y:
There is no suggestion of a linear relationship between X and Y. In fact, the sample correlation coefficient
(computed via corrcoef(x,y) ) is -0.036. This is consistent with theory [i.e. part (b)].
(e) Find the joint pdf for (X, Y):
In the r-φ coordinate system we saw that the pdf for (R, Φ) was uniform. However, in the x-y coordinate system, the scatter plot suggests that the probability is higher near the point (0,0) than near the perimeter of the circle. The goal of this part is to arrive at a mathematical formula for f_{(X,Y)}(x, y). We will arrive at this in two different ways. First, we will use the authors' approach, which is based primarily on probability. Then we will use an approach that is based primarily on events. The reason for doing this is to try to convince the reader of the value of thinking more deeply about events, as opposed to relying only on more mathematical tools.
Method 1: Using the Jacobian Transformation in Relation to pdf's
Theorem 4.1 Let X = (X_1, X_2) be a continuous random variable, and let Y = (Y_1, Y_2) = g(X_1, X_2). Assume that the function g(·) satisfies the following two conditions:
(C1) g(x_1, x_2) is differentiable with respect to both x_1 and x_2, and
(C2) g(x_1, x_2) is a one-to-one function; that is, from (y_1, y_2) = g(x_1, x_2) one can uniquely recover (x_1, x_2) by the inverse functions h_1(y_1, y_2) = x_1 and h_2(y_1, y_2) = x_2. Then

f_{(Y_1,Y_2)}(y_1, y_2) = f_{(X_1,X_2)}[h_1(y_1, y_2), h_2(y_1, y_2)] · |J|,

where |J| is the absolute value of the determinant of the Jacobian matrix

J = [ ∂h_1/∂y_1   ∂h_1/∂y_2
      ∂h_2/∂y_1   ∂h_2/∂y_2 ].
In relation to the problem at hand, we have (X, Y) = (Rcos(Φ), Rsin(Φ)), and throughout we take r_o = 1 (the value chosen in part (c)), so that f_{(R,Φ)}(r, φ) = 1/(2π). Let's restrict our attention to the segment of the sample space S_Φ = [0, 2π) that is [0, π/2), because on this interval both the sine and cosine functions are 1-1. Hence, we can uniquely recover r and φ from the relations

r = h_1(x, y) = √(x^2 + y^2)   and   φ = h_2(x, y) = arctan(y/x).

The elements of the Jacobian are:

∂r/∂x = ∂h_1/∂x = x/√(x^2 + y^2) ;  ∂r/∂y = ∂h_1/∂y = y/√(x^2 + y^2)
∂φ/∂x = ∂h_2/∂x = −y/(x^2 + y^2) ;  ∂φ/∂y = ∂h_2/∂y = x/(x^2 + y^2).

And so |J| = 1/√(x^2 + y^2) = 1/r. Hence,

f_{(X,Y)}(x, y) = f_{(R,Φ)}[√(x^2 + y^2), arctan(y/x)] · |J| = [1/(2π)]·|J| = 1/(2π√(x^2 + y^2)).    (1)
Method 2: Using equivalent events:
Consider the equivalent events shown in the figure below.

Figure 4.7 The event A in the sample space S_{(R,Φ)} (left) is equivalent to the event B in the sample space S_{(X,Y)} (right). [Left: the strip r ≤ R ≤ r + Δr in the (r, φ) plane; right: the corresponding annulus of inner radius r and thickness Δr in the (x, y) plane.]
Since the events A and B are equivalent events, they must have the same probability. We know that Pr(A) is the area of the strip A weighted by 1/(2π); that is, Pr(A) = (2π·Δr)/(2π) = Δr. Now, the area on the right is simply the difference of the area of the outer circle and the inner circle, which is:

area(B) = π(r + Δr)^2 − πr^2 = 2πrΔr + π(Δr)^2 ≈ 2πrΔr.

This area must have probability Pr(B) = Δr. Furthermore, this probability must be uniformly spread over this area. Hence, the probability density (i.e. probability per unit area) must be

f_{(X,Y)}(x, y) = Pr(B)/area(B) = Δr/(2πrΔr) = 1/(2πr) = 1/(2π√(x^2 + y^2)).

In conclusion, both methods give the correct result. However, the use of probability in relation to the second method is relatively minimal. It relies primarily on an understanding of the events involved. □
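A small simulation sketch that supports the result: an empirical density estimate near a point (x0, y0) should be close to 1/(2π√(x0^2 + y0^2)) (here r_o = 1, as above).

n = 1e6;  r = rand(n,1);  phi = 2*pi*rand(n,1);
x = r.*cos(phi);  y = r.*sin(phi);
x0 = 0.3; y0 = 0.2; d = 0.02;                      % small box centered at (x0,y0)
inBox = abs(x - x0) < d/2 & abs(y - y0) < d/2;
[mean(inBox)/d^2   1/(2*pi*sqrt(x0^2 + y0^2))]     % the two should be close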
PROBLEM 1 For the Matlab pdf's given above, you are to identify the ones that correspond to discrete random variables. Specifically, for each random variable, X, complete the following table. The first one has been included to give you an idea of the desired format.

Matlab name      pdf name    Sample Space, S_X    pdf, f_X(x)                      μ_X    σ_X^2
binopdf(x,n,p)   binomial    {0, 1, 2, ..., n}    (n choose x) p^x (1 − p)^(n−x)   np     np(1 − p)
PROBLEM 2 For the Matlab pdf's given above, you are to identify the ones that correspond to continuous random variables. Specifically, for each random variable, X, complete the following table. The first one has been included to give you an idea of the desired format.

Matlab name      pdf name   Sample Space, S_X   pdf, f_X(x)                      μ_X        σ_X^2
betapdf(x,a,b)   beta       [0, 1]              x^(a−1)(1 − x)^(b−1) / B(a,b)    a/(a+b)    ab / [(a+b)^2 (a+b+1)]

where B(a,b) is the beta function [ http://en.wikipedia.org/wiki/Beta_distribution ]
PROBLEM 3 Suppose you are stress testing computer memory chips and collecting data on their lifetimes. A test of 500 chips is to be conducted. Prior to conducting the test, you would like to get an idea of what types of histograms you might expect. Suppose that similar chip designs have yielded a mean value of 150.23 hours and a standard deviation of 33.69 hours.
(a) Use the Matlab random number generator normrnd to obtain a sample of 500 chip lifetimes. Then compute a scaled histogram of the data. This histogram should have the following properties:
(P1) It should extend from 0 to 300 hrs. ; (P2) It should include 20 bins ; (P3) Its total area should equal 1.0.
(b) The plot in (a) is an estimate of the pdf of the continuous random variable X = the act of recording the lifetime of any chip that might be tested. From your data, obtain the estimates μ̂_X and σ̂_X. Discuss how these compare to μ_X and σ_X.
(c) On your scaled histogram plot, overlay a plot of a normal pdf that uses μ̂_X and σ̂_X. Comment on how well it captures the shape of the histogram-based pdf estimate.
% PROGRAM NAME: Ch4Pr3.m
mu = 150.23;              % True mean
sigma = 33.69;            % True std. dev.
n = 500;                  % Sample size
m = 10^4;                 % Number of simulations
db = 15;                  % Bin width
bctrs = db/2:db:300-db/2; % Bin centers
x = 0: .1 : 300;          % Essential sample space
%=============================================
X = normrnd(mu,sigma, n,m);  % Simulations (one column per simulated data set)
h = hist(X,bctrs);           % Histograms (one column per simulation)
fX = h/(n*db);               % Scale to unit area
fX = fX';                    % Switch rows & columns
fXmean = mean(fX);           % Mean scaled height of each bin
fXstd = std(fX);             % Std of each bin height
figure(1)
bar(bctrs,fXmean)
hold on
fn = normpdf(x,mu,sigma);
plot(x,fn,'r','LineWidth',2)
xlabel('x')
title('Plots of Histogram-Based Mean pdf (with +/- 2-std values) & Normal pdf')
grid
%---------------------------------------------
% Plot +/- 2-sigma values of the bin heights
plot(bctrs,fXmean - 2*fXstd,'*g')
plot(bctrs,fXmean + 2*fXstd,'*g')
[Resulting plot: the histogram-based mean pdf with ±2-std bin values and the overlaid normal pdf, plotted over 0 ≤ x ≤ 300.]
(d) Repeat part (c), but for a gamma pdf.
Solution: First, one needs to compute the (a,b) parameters of the gamma pdf from the mean and standard deviation values
of the normal pdf. Then, in the above code the ‘normpdf’ command is replaced by the ‘gampdf’ command.
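A sketch of that computation: for Matlab's gampdf parameterization the mean is a·b and the variance is a·b^2, so matching the stated mean and standard deviation gives the (a,b) values used below.

mu = 150.23; sigma = 33.69;
a = (mu/sigma)^2;             % shape parameter
b = sigma^2/mu;               % scale parameter
x = 0:0.1:300;
fg = gampdf(x, a, b);         % used in place of normpdf(x,mu,sigma) above
plot(x, fg)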
PROBLEM 4 In attempting to estimate the total amount of oil released as a result of an off-shore oil rig malfunction, a team of spotters will fly over the affected region. In order for a spill to be detected from the plane, it must have a characteristic dimension of at least 10 meters.
(a) Let X = the act of recording the characteristic dimension of a spill module. Suppose that in a given region, the ocean currents are such that the mean of X is μ_X = 50 m and the standard deviation is σ_X = 30 m. Explain why a normal pdf model is not appropriate.
(b) Suppose that the pdf is well-modeled by a shifted Gamma(a,b) pdf; that is, X = 10 + Q, where Q ~ Gamma(a,b). Find the mean and standard deviation of Q.
(c) Use your answers in (b) to find the values of the Gamma(a,b) parameters.
(d) Use the Matlab command gampdf to obtain a plot of the pdf for X (not Q) over the range [0, 200].
(e) Use the Matlab command gaminv to find the interval [x_1, x_2] such that Pr[X < x_1] = Pr[X > x_2] = 0.025. This interval is called the p = 0.95 2-sided interval for X.
(f) Let N = the act of recording the number of spill modules during a given fly-over. Suppose that, based on historical data in the region, the mean value of N is E(N) = μ_N = 12. A candidate pdf for N that uses only this single piece of information is N ~ Poisson(λ = 12). Give S_N and the mathematical expression for f_N(n), and plot it over the range {0, 1, ..., 30}. Be sure to label your axes and to include a title.
(g) Find the p = 0.95 2-sided interval for N.
(h) Suppose, for simplicity, that the area associated with a spill module with characteristic dimension, x, is approximated by a square with area x^2. Use the concept of equivalent events to identify the event associated with X that is equivalent to the cumulative event [Y ≤ y], where we have defined Y = X^2.
(i) Express the cdf for Y in terms of the cdf for X.
(j) Use the chain rule for differentiation in relation to (i) in order to obtain the expression for the pdf for Y in terms of the pdf for X.
(k) Use the command gampdf, along with other commands, to obtain a plot of f_Y(y) over the range [0, 100^2].
PROBLEM 5 This problem concerns the size of a rain drop. In particular, it relates to the pdf document entitled
‘RainDrops’ in the Homework folder. In this problem you are to focus on the discussion pertaining to the size of a rain
drop. Restrict the shape of the raindrop to a sphere.
Note: the pdf document can also be found at: http://www1.cs.columbia.edu/CAVE/publications/pdfs/Garg_TR04.pdf
(a) Let X = the act of measuring the size (i.e. radius) of any rain drop. Give the quoted sample space for X.
SX =
(b) Give the quoted name and expression for the size distribution (including definitions of all related
parameters/variables).
Name:
Expression:
where the parameters/variables are:
(c) Translate the expression in (b) into an expression for the pdf for X.
f_X(x) =
(d) From the Matlab list of pdf’s identify the pdf that corresponds to your expression in (c). Also, give the expression for
the parameter(s) of that pdf in terms of the parameters/variables given in (b).
(e) In relation to the parameters/variables given in (b), identify which are parameters and which are variables.
(f) Use the pdf you identified in (d) to obtain a plot of f_X(x). Then discuss how your plot compares to the one given in
Figure 2(b) of the article.
(g) Use Google to try to identify reasonable pdf(s) for the variables you identified in (e).
(h) Discuss whether or not the variable(s) you identified can be reasonably assumed to be independent of X.
PROBLEM 6 This problem concerns Example 4.6 of this chapter. In particular, it investigates the sample size, n, that would be needed to obtain a reliable estimator of the linear prediction model parameters (m, b).
(a) For each of the following sample sizes, n = 50, 100, 150, ..., 1000, use simulations to arrive at three plots associated with the two parameters, (m, b). Each plot should include three lines: the mean, and the ±2σ bounds of the estimator of the parameter as a function of n. Be sure to give the mathematical expressions of your estimators of (m, b).
(b) From your plots, arrive at the smallest sample size needed, call it n*, to ensure that the relative error for each parameter estimate is no more than 10%. For example, for a given sample size, n, your plot in (a) will give numerical values of μ_m̂ and of the bounds μ_m̂ ± 2σ_m̂. The relative error for the estimator m̂ is defined here as 2σ_m̂/m.
(c) For the numerical value of n* you found in (b), construct a plot that includes the true linear model y = mx + b, as well as 'worst-case' models ŷ = m̂x + b̂.
A potential topic for a final project: The use of soft woods (e.g. pine) in the construction industry is, presumably, due
to the lack of availability of hard wood. In fact, the major reason for its use has to do with the expense in removing the
moisture. [ http://en.wikipedia.org/wiki/Wood_drying ] Because hardwood has lower permeability than softwood, a
longer drying time is required. The moisture content, mc, is defined as
mc 
wg  wd
wd
where wg = green weight and wd = dry weight (i.e. after drying for 24 hrs. at 103  2 o C .
Let Wg = the act of measuring the green weight, and let Wd = the act of measuring the dry weight of any chosen specimen
of a given type of wood. Suppose the weight is measured to the nearest 0.01#. The equilibrium moisture content, emc, is
the target moisture content. Typically the emc target is on the order of 0.10.
APPENDIX
Matlab code for Example 4.3
% PROGRAM NAME: example4_3.m
e = [.01 .05 .10 .15 .20 .25];
p = [.99 .95 .90 .85 .80];
ne = length(e); np = length(p);
nvals = zeros(ne,np);
for ke = 1:ne
    for kp = 1:np
        pk = (1-p(kp))/2;
        nvals(ke,kp) = (-norminv(pk,0,1)/e(ke))^2;
    end
end
ceil(nvals)
pause
% Run 10,000 simulations for p=0.85 and e=0.25 (=> n=34)
X = exprnd(3.5,34,10000);
Xbar = mean(X);
figure(1)
bw = 7/50;
bvec =bw/2:bw:7-bw/2;
fh = hist(Xbar,bvec);
fh = fh/(bw*10^4);
bar(bvec,fh);
hold on
mu = 3.5;
sig = 3.5/34^.5;
xvec = 0:0.1:7;
fn = normpdf(xvec,mu,sig);
plot(xvec,fn,'r','LineWidth',3)
xlabel('x')
ylabel('f(x)')
title('Histogram-based & Normal pdf plots for mu_Xhat')
Matlab code for Example 4.6
% PROGRAM NAME: 2Dnormpdf.m
mux = 30; sigx = 1;
muy = 70; sigy = 3;
r = 0.7;
sigxy = r*sigx*sigy;
mu = [mux muy];
Varxy = [sigx^2 sigxy ; sigxy sigy^2];
xmin = mux-4*sigx;
xmax = mux+4*sigx;
ymin = muy-4*sigy;
ymax = muy+4*sigy;
dx = (xmax - xmin)/100;
dy = (ymax - ymin)/100;
x = xmin:dx:xmax;
y = ymin:dy:ymax;
[X,Y] = meshgrid(x,y);
fxy = mvnpdf([X(:) Y(:)],mu,Varxy);
fxy = reshape(fxy,length(y),length(x));
figure(1)
surf(x,y,fxy);
xlabel('x'); ylabel('y'); zlabel('f(x,y)');
title('Surface Plot of the PDF for (X,Y)')
pause
figure(2)
contour(x,y,fxy,50)
xlabel('x'); ylabel('y'); zlabel('f(x,y)');
title('Contour Plot of the PDF for (X,Y)')
grid