Chapter 4 Some Popular Probability Models

4.1 INTRODUCTION

The last chapter addressed the notion of probability in relation to sets. The reason for that approach is that once one has a firm understanding of the set (i.e. the event), computing its probability is often straightforward. This point was emphasized by focusing on Bernoulli random variables. Since the sample space for a 1-D Bernoulli random variable, X, is $S_X = \{0,1\}$, any event one can fathom is one element of the field of events $\mathcal{F}_X = \{\varnothing, \{0\}, \{1\}, S_X\}$. Recall that the p-value for a Bernoulli random variable X is defined to be $p_X = \Pr\{1\} = \Pr[X=1]$. Knowing the numerical value of this parameter allows one to compute the probability of the only other nontrivial member of $\mathcal{F}_X$, which is the set {0}. Specifically, we have $\Pr\{0\} = \Pr[X=0] = 1 - p_X$. In fact, the two singleton sets {0} and {1} are the only nontrivial subsets of $S_X$.

In the case of a 2-D Bernoulli random variable $(X_1, X_2)$ with sample space $S_{(X_1,X_2)} = \{(0,0), (1,0), (0,1), (1,1)\}$, which has four elements, the corresponding field of events has $2^4 = 16$ member sets. Even so, if one knows the numerical value of the probability of each of the four singleton sets {(0,0)}, {(1,0)}, {(0,1)} and {(1,1)}, then one can easily compute the probability of any subset of $S_{(X_1,X_2)}$. For example, knowledge of the numerical values of the three parameters $p_{00}$, $p_{10}$ and $p_{01}$ immediately provides the numerical value of the parameter $p_{11}$, since we must have $p_{00} + p_{10} + p_{01} + p_{11} = 1$. Having these, in turn, allows one to compute the marginal probabilities associated with the marginal events $[X_1 = 1] = \{(1,0),(1,1)\}$ and $[X_2 = 1] = \{(0,1),(1,1)\}$. The probabilities of these two events are $p_{X_1} = \Pr[X_1 = 1] = \Pr\{(1,0),(1,1)\} = p_{10} + p_{11}$ and $p_{X_2} = \Pr[X_2 = 1] = \Pr\{(0,1),(1,1)\} = p_{01} + p_{11}$, which are the p-values for the 1-D Bernoulli random variables X1 and X2, respectively.

Even computations that appear more difficult are, in fact, no harder if one understands the nature of the events involved. For example, computing $\Pr[X_1 X_2 = 1]$ is trivial once one recognizes that the event $[X_1 X_2 = 1]$ is the unique subset of $S_{(X_1,X_2)}$ given by $\{(x_1,x_2) \mid x_1 x_2 = 1\}$, which is simply the set {(1,1)}. Hence $\Pr[X_1 X_2 = 1] = p_{11}$.

The above 2-D case can be extended to the case of an n-D Bernoulli random variable. In this more general case there are $2^n$ singleton subsets, and so if one knows the numerical values of the probabilities of any $2^n - 1$ of them, then one can compute the probability of any event. In the situation where the n random variables are mutually independent, one needs to know only the p-value of each of the n 1-D Bernoulli random variables.

In this chapter we will address a variety of random variables whose probability structure is completely dictated by a set of parameters. However, as we shall see, those parameterized structures are associated with assumed shape models. The Bernoulli probability structure entails no such assumptions, due to its simplicity. In this sense, a Bernoulli probability model is, in fact, not a model at all. Any '0/1' random variable is, by definition, a Bernoulli random variable; one need not assume that it has a Bernoulli model for its probability structure. The following example, taken from the last chapter, illustrates the difference between an assumed probability model and a probability structure that relies on no assumptions.
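To make the 2-D Bernoulli bookkeeping concrete, the short Matlab sketch below carries out the computations just described. The numerical values of the joint probabilities p00, p10 and p01 are hypothetical, chosen only for illustration.

% Sketch: marginals of a 2-D Bernoulli random variable from its joint probabilities.
% The numerical values of p00, p10 and p01 are hypothetical.
p00 = 0.5; p10 = 0.2; p01 = 0.2;   % Pr{(0,0)}, Pr{(1,0)}, Pr{(0,1)}
p11 = 1 - (p00 + p10 + p01)        % Pr{(1,1)} is dictated by the other three
pX1 = p10 + p11                    % Pr[X1 = 1], the p-value of X1
pX2 = p01 + p11                    % Pr[X2 = 1], the p-value of X2
PrProd = p11                       % Pr[X1*X2 = 1] = Pr{(1,1)}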
Example 4.1 Consider the 3-D random variable $\vec X = (X_1, X_2, X_3)$, where $X_k$ = the act of recording whether the kth of three chosen biopsies taken from a patient is not (0) or is (1) cancerous. Let $Y = \sum_{k=1}^{3} X_k$ be the act of recording the number of cancerous biopsies.

(a) Suppose that we make no assumptions regarding the independence of the components of $\vec X$. Then the probability structure for Y is parameterized by the probabilities of any 7 of the 8 singleton subsets of $S_{\vec X}$. These are not parameters of a probability model, since there is no model.

(b) Suppose now that we assume that the locations of the chosen biopsies are sufficiently far apart that cancer in any one of the locations could not be directly related to cancer in another chosen region (i.e. the cancerous regions in the patient's body occur independently). Under this assumption the probability structure of $\vec X$ is completely dictated by the 3 p-values $\{p_k = \Pr[X_k = 1]\}_{k=1}^{3}$. This assumed model for Y is a 3-parameter model.

(c) Now suppose that we assume that the components of $\vec X$ are not only mutually independent, but are also identically distributed (i.e. iid). Under this assumption the three parameters $\{p_k = \Pr[X_k = 1]\}_{k=1}^{3}$ are one and the same parameter, call it $p_X$. In this case the assumed model for Y is a 1-parameter model.

(d) Compute $\Pr[Y = 3]$ for each of the situations (a) through (c).

Solution:
(a): $\Pr[Y=3] = \Pr[X_1 = 1 \cap X_2 = 1 \cap X_3 = 1] = \Pr\{(1,1,1)\} = p_{111}$.
(b): $\Pr[Y=3] = \Pr[X_1 = 1 \cap X_2 = 1 \cap X_3 = 1] = \Pr[X_1=1]\Pr[X_2=1]\Pr[X_3=1] = p_{X_1}\, p_{X_2}\, p_{X_3}$.
(c): $\Pr[Y=3] = \Pr[X_1=1]\Pr[X_2=1]\Pr[X_3=1] = p_X^3$.

(e) Under the assumptions in part (c), the model for the probability structure of Y has a name: it is called a binomial distribution with n = 3 and parameter $p_X$. The more general model in (b) does not have a name. □

4.2 MOTIVATION FOR PROBABILITY MODELS

There are many reasons that one might desire to invoke a certain probability model. One of the most common relates to attempts to understand the probability structure of a random variable via a data histogram. Consider the following example.

Example 4.2 A company that specializes in the fabrication of custom furniture is considering using cedar slats to construct the seats of chairs and couches designed for outdoor use. In order to arrive at the number of slats needed to carry a specified design load, the company has decided to conduct bending load tests on 100 slats. The results are summarized in the two histograms of Figure 4.1.

[Figure 4.1 Histograms associated with breaking strength measurements of 100 wooden slats. The top plot is a 10-bin histogram and the bottom plot is a 5-bin histogram. Both plot Frequency against Load (lbf).]

The 10-bin histogram reveals no measurements in the interval (22.4, 24.1). Were we to use only this histogram to infer probability, we would have to presume that the probability of recording a breaking load in this region is zero. On the other hand, if we use the 5-bin histogram, the probability of this interval would not be zero. The 5-bin histogram provides a more realistic description of the breaking strength probability structure, but at a price. That price is larger bins or, equivalently, less detail in the probability structure.
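The bin-width tradeoff of Figure 4.1 is easy to reproduce. The original 100 measurements are not reprinted in this chapter, so the sketch below uses normally distributed stand-in data with the sample statistics quoted just below (a mean of about 20.03 lbf and a standard deviation of about 1.65 lbf); the point is only the change in appearance between 10 and 5 bins.

% Sketch: the 10-bin versus 5-bin histogram comparison of Figure 4.1.
% The data are a hypothetical stand-in for the 100 measured breaking loads.
rng(1);                               % for repeatability
x = normrnd(20.03, 1.65, 100, 1);     % hypothetical stand-in data
figure
subplot(2,1,1); hist(x,10)
xlabel('Load (lbf)'); ylabel('Frequency'); title('10-bin histogram')
subplot(2,1,2); hist(x,5)
xlabel('Load (lbf)'); ylabel('Frequency'); title('5-bin histogram')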
Suppose that similar testing on other types of wood has suggested that a normal probability model is often reasonable. The normal pdf is a 2-parameter model; the parameters that describe it are the mean and standard deviation of X. From the measurements we obtain $\hat\mu_X = \bar x = 20.03$ and $\hat\sigma_X = s = 1.65$. The resulting normal pdf model is shown in Figure 4.2.

[Figure 4.2 A normal pdf model for X = the act of measuring the breaking strength of a wooden slat.]

This model allows one to compute the probability of the interval (22.4, 24.1), or of any other interval, in an unambiguous manner.

(a) Use the normal model to estimate the probability of the interval (0, 15).
Answer: $\Pr[0 < X < 15] = F_X(15) - F_X(0) \approx 0.001 - 3\times 10^{-34} \approx 0.001$.

(b) Suppose the design load for a given chair design is 300#, and that each of the m slats to be used will carry an equal load. Find the value of m such that each slat carries a design load of 15#.
Answer: m = 300/15 = 20 slats.

(c) Suppose that the slats break independently of one another under the 300# load. Find the probability structure for Y = the act of recording the number of slats that break.
Solution: For the kth slat define the Bernoulli random variable $W_k$ via the equivalent events $[W_k = 1] \sim [X_k \le 15]$, where $X_k$ = the act of recording the breaking strength of the kth slat. Then the random variables $\{W_k\}_{k=1}^{20}$ are iid Bernoulli random variables with $p = \Pr[W_k = 1] = \Pr[X_k \le 15] \approx 0.001$. Hence the number of slats that break is $Y = \sum_{k=1}^{20} W_k$, and it has a binomial pdf with n = 20 and p = .001. The pdf for Y is shown in Figure 4.3.

[Figure 4.3 A binomial pdf model for Y = the act of recording the number of slats that break.]

(d) Use your model in (c) to estimate the probability that no more than one slat will break under a 300# total load.
Solution: $\Pr[Y \le 1] = F_Y(1) = .9998$.

(e) Suppose that a customer who has purchased a chair returns it, claiming that while he was sitting on it the chair broke. Inspection of the damage reveals that 3 adjacent center slats are broken; the breaks all occurred at the midpoints. Discuss how you might evaluate this claim. [Note: you have estimated that the man weighs no more than 220#.]
Answer: First of all, the fact that 3 adjacent slats out of 20 failed strongly suggests that they did not fail independently. The fact that they are in the center of the chair is also important. These facts suggest that a load significantly greater than 15#/slat was placed on the center area of the chair. One way this could happen is if an individual were to stand on the chair. A force of 220# shared by a small fraction of the 20 slats (say, even half of them) would imply that each of those slats carried a 22# load. This value is roughly 2# greater than the mean breaking strength of a typical slat. From the normal model, the probability that a slat breaks under a load of no more than 22# is approximately 0.89. If the slats are assumed to break independently, then the probability that 3 or more of the 10 slats would break is ?? (a quick numerical check is sketched below). □
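The '??' left above can be filled in numerically under the stated assumptions, namely that each of the 10 loaded slats breaks independently with the probability given by the normal model at a 22# load. A minimal sketch:

% Sketch: probability that 3 or more of 10 slats break, assuming independent breaks
% with per-slat break probability taken from the normal model at a 22# load.
p = normcdf(22, 20.03, 1.65)        % approximately 0.88 to 0.89
Pr3orMore = 1 - binocdf(2, 10, p)   % Pr[3 or more of 10 break]; very close to 1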
4.3 ONE-DIMENSIONAL PROBABILITY MODELS

There are some popular pdf models for 1-D random variables. They come in two types. One type relates to discrete random variables, that is, random variables having a discrete sample space. The other relates to continuous random variables, that is, ones having a continuous sample space. Very often the biggest problem confronting a person who desires to use a probability model (as opposed, say, to histogram data) is choosing the most appropriate model. Clearly, the shape of a histogram can provide valuable guidance. Equally valuable is the type of sample space.

We have, to this point, covered a number of model pdf's in addition to the non-model Bernoulli pdf. The sum of n Bernoulli random variables has the sample space {0, 1, ..., n}; it is a discrete random variable. If n is sufficiently large, so that the events of interest are much larger than singleton events, then one might invoke a continuous pdf model. In any case, as noted above, without any assumptions pertaining to these n random variables it is virtually impossible to justify any model. If it is assumed that they are iid, then their sum is a binomial random variable. If the number n is unknown, and potentially infinite, then the Poisson model is often used. If n is sufficiently large, then their average can be approximated by a normal model, due to the Central Limit Theorem (CLT).

One often finds in standard textbooks on the subject other popular distributions that work well in certain applications. For example, the time-to-failure of many electronic components is often modeled using an exponential pdf. It is one of a family of models wherein probability is inversely related to x. Again, one should exercise a bit of care in relation to the sample space. The exponential random variable, call it X, has the continuous sample space $S_X = [0, \infty)$. However, if the time-to-failure is recorded in, say, whole years, then the sample space for X is $S_X = \{0, 1, 2, \ldots\}$. In this case one might resort to a discrete exponential model.

In relation to statistics (i.e. estimators of unknown parameters, such as the mean, variance, correlation coefficient, etc.) there are commonly used pdf models. They include the Student's t, the chi-squared, the F, and others. These are all model pdf's. They depend on assumptions, just as the binomial model depends on the iid assumption in relation to the associated Bernoulli random variables.

Table 4.1 Some 1-D pdf's given in Matlab

  betapdf     Beta probability density function
  binopdf     Binomial probability density function
  chi2pdf     Chi-square probability density function
  exppdf      Exponential probability density function
  fpdf        F probability density function
  gampdf      Gamma probability density function
  geopdf      Geometric probability density function
  hygepdf     Hypergeometric probability density function
  lognpdf     Lognormal probability density function
  nbinpdf     Negative binomial probability density function
  ncfpdf      Noncentral F probability density function
  nctpdf      Noncentral t probability density function
  ncx2pdf     Noncentral chi-square probability density function
  normpdf     Normal probability density function
  poisspdf    Poisson probability density function
  raylpdf     Rayleigh probability density function
  tpdf        Student's t probability density function
  unidpdf     Discrete uniform probability density function
  unifpdf     Continuous uniform probability density function
  wblpdf      Weibull probability density function

With the advent of powerful and inexpensive computers and statistical software, there is nowadays access to many more pdf models than are discussed in traditional textbooks. An awareness of this vast array of possible models can lead one to identify a model that is more appropriate to a given application than the standard ones. For this reason, Table 4.1 lists some of the pdf models included in the Matlab Statistics Toolbox. The models in Table 4.1 are listed alphabetically; it would be instructive to break them into two groups, namely discrete and continuous models. [See Problems 4.1 and 4.2.] For each pdf in this table there is a corresponding random sample generator function.
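As a small illustration of the table, and of the pairing of pdf and random sample generator functions, the sketch below evaluates the continuous exponential pdf alongside the geometric pdf (a common discrete counterpart of the exponential) for a hypothetical mean time-to-failure of 5 years. The choice of 5 years, and the use of p = 1/mu for the geometric, are illustrative assumptions only.

% Sketch: a continuous exponential pdf and a discrete (geometric) counterpart,
% for a hypothetical mean time-to-failure of mu = 5 years.
mu = 5;                       % hypothetical mean time-to-failure (years)
t  = 0:0.1:20;                % continuous time axis
k  = 0:20;                    % whole years
fC = exppdf(t, mu);           % continuous exponential pdf
fD = geopdf(k, 1/mu);         % a discrete counterpart: geometric with p = 1/mu
x  = exprnd(mu, 1000, 1);     % the matching random sample generator (not plotted)
figure
plot(t, fC, 'r', 'LineWidth', 2); hold on
stem(k, fD, 'b')
xlabel('time-to-failure (years)'); ylabel('f(t)')
title('Exponential pdf and a discrete (geometric) counterpart')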
Example 4.3 Suppose that it is believed that X = the act of recording the time-to-failure of a microprocessor has an exponential pdf model. This is a 1-parameter model, since the pdf requires one parameter to completely define it. Specifically, the sample space and pdf are:

$S_X = [0, \infty)$  and  $f_X(x) = \dfrac{1}{\mu_X}\, e^{-x/\mu_X}$.

Now, suppose that one does not know the numerical value of $\mu_X$, and so a sample of n microprocessors will be tested in order to estimate it. Let $\vec X = (X_1, X_2, \ldots, X_n)$ be the n-tuple data collection random variable. Suppose that it is reasonable to assume that the components of $\vec X$ are independent, and that each one has the same exponential pdf as the generic random variable X. Under these assumptions the estimator of $\mu_X$, which is $\hat\mu_X = \bar X$, has mean $E(\bar X) = \mu_X$ and variance $\mathrm{Var}(\bar X) = \sigma_X^2/n$. Even though we have not discussed the exponential pdf, the reader can easily use the internet to find information about it. From Wikipedia [http://en.wikipedia.org/wiki/Exponential_distribution] we find that the variance of $X \sim \exp(\mu_X)$ is $\sigma_X^2 = \mu_X^2$. And so the variance of the estimator is $\mathrm{Var}(\bar X) = \mu_X^2/n$. Suppose that, even though we do not know the value of $\mu_X$, we are confident that $\mu_X \le 3$. Then $\mathrm{Var}(\bar X)_{\max} = 9/n$.

(a) Suppose that n is large enough to invoke the CLT in relation to the statistic $\hat\mu_X = \bar X$. Then $\bar X \sim N(\mu_{\bar X}, \sigma_{\bar X})$. Consider the following sequence of equivalent events:

$[\bar X \le x] = \left[\dfrac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} \le \dfrac{x - \mu_{\bar X}}{\sigma_{\bar X}}\right] = [Z \le z]$.

It follows that $F_{\bar X}(x) = \Pr[\bar X \le x] = \Pr[Z \le z] = F_Z(z)$. Since $\bar X$ has a normal cdf, so does Z. The random variable Z has a mean of zero and a variance equal to 1; it is called the standard normal random variable.

(b) Since $\mathrm{Var}(\bar X) = \sigma_X^2/n$, we have $\sigma_{\bar X} = \sigma_X/\sqrt{n}$. Hence

$\dfrac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} = \dfrac{\bar X - \mu_X}{\sigma_X/\sqrt{n}}$.

Suppose that we want to ensure, with probability p, that our estimator $\bar X$ yields a number that is close to the unknown value $\mu_X$. Specifically, for a specified value of $\varepsilon$, and using the fact that here $\sigma_X = \mu_X$, consider the equivalent events:

$\left[\left|\dfrac{\bar X - \mu_X}{\mu_X}\right| \le \varepsilon\right] = \left[-\varepsilon\sqrt{n} \le \dfrac{\bar X - \mu_X}{\mu_X/\sqrt{n}} \le \varepsilon\sqrt{n}\right] = [-z_1 \le Z \le z_1]$,  where $z_1 = \varepsilon\sqrt{n}$.

Hence

$p = \Pr\left[\left|\dfrac{\bar X - \mu_X}{\mu_X}\right| \le \varepsilon\right] = \Pr[-z_1 \le Z \le z_1] = F_Z(z_1) - F_Z(-z_1) = 1 - 2F_Z(-z_1)$.

It follows that $-z_1 = F_Z^{-1}\!\left(\dfrac{1-p}{2}\right)$. Since $z_1 = \varepsilon\sqrt{n}$, we arrive at the required sample size:

$n = \left[F_Z^{-1}\!\left(\dfrac{1-p}{2}\right)\Big/\varepsilon\right]^2$.   (4.1)

(c) Use the Matlab command norminv(p, 0, 1) to compute the values of (4.1) for the paired values of $\varepsilon \in \{.01, .05, .10, .15, .20, .25\}$ and $p \in \{.99, .95, .90, .85, .80\}$.

Solution: The results are given in the table below; the Matlab code (program example4_3.m) is given in the Appendix. The table must be viewed with care, because there are small values of n that will preclude the reasonableness of the large-n assumption needed to invoke the CLT. Having said that, the table provides a quantitative basis for choosing n.

            p = .99     .95     .90     .85     .80
  eps = .01  66,349  38,415  27,056  20,723  16,424
        .05   2,654   1,537   1,083     829     657
        .10     664     385     271     208     165
        .15     295     171     121      93      73
        .20     166      97      68      52      42
        .25     107      62      44      34      27

(d) From the table in (c) we note that for p = .85 and $\varepsilon$ = 0.25 the needed sample size is n = 34. Run 10,000 simulations of $\vec X = (X_1, X_2, \ldots, X_n)$ with $X \sim \exp(\mu_X = 3.5\ \mathrm{yrs})$ in order to determine whether the CLT assumption is reasonable.

Solution:

[Figure 4.4 Scaled histogram-based pdf for $\hat\mu_X = \bar X$ based on 10,000 simulations of samples of size n = 34, compared to a normal pdf model.]

This figure suggests that the sample size n = 34 is at least 'close' to being large enough to invoke the CLT in relation to $\hat\mu_X = \bar X$. The histogram-based pdf is slightly asymmetric; in particular, it is skewed to the right. □
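Before moving on, here is a minimal check of formula (4.1) and of the table entry used in part (d). It repeats, in a few lines, what the Appendix program example4_3.m does for a single (p, epsilon) pair.

% Sketch: evaluate the sample-size formula (4.1) for p = 0.85 and epsilon = 0.25.
p  = 0.85;
ep = 0.25;
z1 = -norminv((1-p)/2, 0, 1);   % z1 such that Pr[-z1 <= Z <= z1] = p
n  = ceil((z1/ep)^2)            % required sample size; returns n = 34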
4.4 TWO-DIMENSIONAL PROBABILITY MODELS

Consider a 2-D random variable (X,Y). While, marginally, X and Y are of interest, a main point of addressing the 2-D random variable (X,Y) is to investigate the relationship between them. If they are (statistically) independent, then there is no relationship. Another main point is that, even if they are independent, one might need information related to both of them in order to address the probability structure of one of them. Before we address this latter point, we will first address investigations pertaining to the relationship between them.

Example 4.4 It is desired to investigate whether or not malfunction of the transmission of a given tractor model is related to failure of a main bearing. Develop the probability structure of the appropriate 2-D random variable that this problem relates to. Then identify an appropriate parameter in this 2-D structure that the investigation should focus on.

Solution: For any selected tractor, let X = the act of noting that the transmission has not (0) or has (1) malfunctioned. Let Y = the act of noting that the main bearing has not (0) or has (1) failed. Then (X,Y) is a 2-D Bernoulli random variable. Recall that the probability structure of such a random variable is completely defined by any three of the four joint probabilities. Alternatively, it is completely defined by the parameters $p_X$, $p_Y$ and $p_{1|1}$. It is this conditional probability that the investigation should focus on. □

The above example does not involve any probability model; the 2-D Bernoulli random variable was not arrived at via any specified model. The next example does entail specification of a model.

Example 4.5 It is desired to investigate the relationship between temperature and viscosity for a new synthetic oil. A total of six temperatures will be considered: {-40, -20, 0, 40, 100, 200} degrees C. A given oil sample will be randomly assigned one of these temperatures and will have its viscosity measured in mPa·s. [See http://en.wikipedia.org/wiki/Motor_oil] The goal is to specify a reasonable probability structure for the resulting 2-D random variable.

(a) Describe an appropriate generic 2-D random variable involved in this study.
Answer: Let T = the act of recording the temperature at which any given oil sample will be tested, and let V = the act of measuring its viscosity. Then clearly $S_T = \{-40, -20, 0, 40, 100, 200\}$. Without any a priori information related to V, we will assume that $S_V = [0, \infty)$.

(b) Give the pdf for T.
Answer: Since a given oil sample is to be 'randomly' assigned to one of the 6 temperatures, we must assume that T has a uniform pdf on $S_T$. Hence $f_T(t) = \Pr[T = t] = 1/6$.

(c) We will assume that at any given temperature, t, the mean and standard deviation of V depend on t. We will also assume that V has a bell-shaped, or normal, pdf for each t-value. Give the formula for this conditional pdf.
Answer:

$f_{V|T=t}(v) = \dfrac{1}{\sigma_V(t)\sqrt{2\pi}} \exp\!\left\{-\dfrac{[v - \mu_V(t)]^2}{2\sigma_V^2(t)}\right\}$.

(d) From (b) and (c), obtain the expression for the marginal pdf for V.
Answer: Recall that, in relation to (T,V), the event $[V \le v] = [V \le v \cap T \in S_T]$. Hence

$f_V(v) = \sum_{t \in S_T} f_{(T,V)}(v,t) = \sum_{t \in S_T} f_{V|T=t}(v)\, f_T(t) = \dfrac{1}{6}\sum_{t \in S_T} \dfrac{1}{\sigma_V(t)\sqrt{2\pi}} \exp\!\left\{-\dfrac{[v - \mu_V(t)]^2}{2\sigma_V^2(t)}\right\}$.

(e) Use (d) to compute $E(V) = \mu_V$.
Solution:

$E(V) = \int_{S_V} v\, f_V(v)\, dv = \dfrac{1}{6}\sum_{t \in S_T}\left\{\int_{S_V} \dfrac{v}{\sigma_V(t)\sqrt{2\pi}} \exp\!\left[-\dfrac{[v - \mu_V(t)]^2}{2\sigma_V^2(t)}\right] dv\right\} = \dfrac{1}{6}\sum_{t \in S_T} \mu_V(t)$.  □
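The marginal pdf in (d) is a six-component mixture of normal pdf's. The example does not specify the functions $\mu_V(t)$ and $\sigma_V(t)$, so the sketch below uses purely hypothetical values for them; it is intended only to show how the mixture and the mean in (e) would be computed and plotted.

% Sketch: the marginal pdf f_V(v) of Example 4.5(d) as a 6-component normal mixture.
% The values in muV and sigV are hypothetical placeholders.
t    = [-40 -20 0 40 100 200];      % the six test temperatures (deg C)
muV  = [900 500 250 90 30 10];      % hypothetical mean viscosities (mPa*s)
sigV = [90 50 25 9 3 1];            % hypothetical standard deviations
v    = 0:0.5:1200;                  % viscosity axis
fV   = zeros(size(v));
for k = 1:6
    fV = fV + (1/6)*normpdf(v, muV(k), sigV(k));  % (1/6) * sum of conditional pdfs
end
EV = (1/6)*sum(muV)                 % E(V) from part (e)
figure; plot(v, fV)
xlabel('viscosity v (mPa*s)'); ylabel('f_V(v)')
title('Marginal pdf of V as a mixture of six conditional normals (hypothetical parameters)')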
The 2-D Normal Random Variable

The 2-D normal random variable is perhaps the most well-known 2-D random variable. Let (X,Y) have components that are, marginally, normal. The marginal pdf's are:

$f_X(x) = \dfrac{1}{\sigma_X\sqrt{2\pi}} \exp\!\left[-\dfrac{(x-\mu_X)^2}{2\sigma_X^2}\right]$; $S_X = (-\infty, \infty)$   (4.2a)

$f_Y(y) = \dfrac{1}{\sigma_Y\sqrt{2\pi}} \exp\!\left[-\dfrac{(y-\mu_Y)^2}{2\sigma_Y^2}\right]$; $S_Y = (-\infty, \infty)$   (4.2b)

Even though X and Y are each marginally normal, it does not necessarily follow that (X,Y) has a jointly normal pdf. Assume that X and Y are jointly normal. Then the pdf for (X,Y) is:

$f_{(X,Y)}(x,y) = \dfrac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\!\left\{-\dfrac{1}{2(1-\rho^2)}\left[\dfrac{(x-\mu_X)^2}{\sigma_X^2} + \dfrac{(y-\mu_Y)^2}{\sigma_Y^2} - 2\rho\left(\dfrac{x-\mu_X}{\sigma_X}\right)\!\left(\dfrac{y-\mu_Y}{\sigma_Y}\right)\right]\right\}$.   (4.2c)

From (4.2c) we see that the 2-D normal pdf is a 5-parameter pdf. In addition to the mean and standard deviation for X and Y, we have the correlation coefficient:

$\rho = \rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X\sigma_Y}$.   (4.3)

Recall that the covariance between X and Y is defined as $\sigma_{XY} = E[(X-\mu_X)(Y-\mu_Y)]$. The following example illustrates how the various parameters of this pdf influence its shape.

Example 4.6 The axle vibration and interior noise level for a vehicle driving at a given speed are believed to have a joint normal distribution. Let X = the act of measuring the axle vibration, with $S_X = [0, \infty)$, and let Y = the act of measuring the interior noise level, with $S_Y = [0, \infty)$. Assume the numerical values of the pdf parameters are: $\mu_X = 30$ ft/sec (rms), $\sigma_X = 1$ ft/sec (rms), $\mu_Y = 70$ dBA, $\sigma_Y = 3$ dBA, and $\rho_{XY} = 0.7$.

(a) Use the Matlab command mvnpdf to obtain a plot of the pdf for (X,Y).
Solution: The Matlab code used to produce the plots (program 2Dnormpdf.m) is given in the Appendix.

[Figure 4.4 Surface and contour plots of the pdf for (X,Y).]

(b) From the contour plot in (a), deduce the means and the 3-sigma ranges for X and Y.
Solution: The center of the plot is located at the point (30, 70), which corresponds to $(\mu_X, \mu_Y)$. The 3-sigma range for X is the interval (27, 33), which corresponds to $\sigma_X = 1$. The 3-sigma range for Y is the interval (61, 79), which corresponds to $\sigma_Y = 3$.

(c) To determine how the pdf parameters control the slope of the main axis of the elliptical contours, consider the linear prediction model:

$\hat Y = mX + b$.   (4.4a)

In the next chapter we will show that the 'best' prediction model of the form (4.4a) satisfies two conditions:

(C1) $E(\hat Y) = E(Y)$ (i.e. $\hat Y$ is an unbiased estimator of Y), and
(C2) $\mathrm{Cov}(Y - \hat Y, X) = 0$ (i.e. the prediction error is uncorrelated with X).

In relation to (C1), since the expectation operator E(*) is a linear operation, (C1) becomes:

(C1') $m\mu_X + b = \mu_Y$.

We will show that Cov(*) has a similar property, which results in:

(C2') $\sigma_{XY} = m\sigma_X^2$.

Use (C1') and (C2') to determine the expressions, and resulting numerical values, of the model parameters m and b.

Solution: From (C2') we have

$m = \dfrac{\sigma_{XY}}{\sigma_X^2} = \dfrac{\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2} = \rho_{XY}\dfrac{\sigma_Y}{\sigma_X} = 0.7\cdot\dfrac{3}{1} = 2.1$.

From this and (C1') it follows that

$b = \mu_Y - m\mu_X = 70 - 2.1(30) = 7$.

(d) Use the contour plot to estimate the slope of the main axis of the elliptical contours. Then compare this number to the numerical values of the prediction model parameter(s).

Solution: [Figure: contour plot of the pdf for (X,Y), with the main axis of the ellipses marked.] The main axis runs from roughly (26, 59) to (34, 81), so the estimated slope is (81 - 59)/(34 - 26) = 22/8 = 2.75. This is modestly close to m = 2.1, but sufficiently different that it cannot be concluded that the slope of the major axis equals the slope parameter, m, in the linear model (4.4a).
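A short sketch of the computations in parts (c) and (d) is given below. It reuses the parameter values of Example 4.6 and the contour-plotting approach of the Appendix program 2Dnormpdf.m, and simply overlays the line y = mx + b on the contours so that its slope can be compared visually with that of the major axis.

% Sketch: best linear predictor for Example 4.6(c), overlaid on the pdf contours.
mux = 30; sigx = 1; muy = 70; sigy = 3; r = 0.7;
m = r*sigy/sigx                        % slope of the best linear predictor (= 2.1)
b = muy - m*mux                        % intercept (= 7)
x = mux-4*sigx : 0.08 : mux+4*sigx;
y = muy-4*sigy : 0.24 : muy+4*sigy;
[X,Y] = meshgrid(x,y);
fxy = mvnpdf([X(:) Y(:)], [mux muy], [sigx^2 r*sigx*sigy; r*sigx*sigy sigy^2]);
figure
contour(x, y, reshape(fxy, length(y), length(x)), 50); hold on
plot(x, m*x + b, 'k', 'LineWidth', 2)  % note: this line is not the major axis
xlabel('x'); ylabel('y'); title('Contours of f(x,y) with the line y = mx + b')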
(e) For the case where $\rho = 0$, it follows from (4.3) that X and Y are uncorrelated. Show that, in this case, it can be claimed that, in fact, X and Y are independent.

Solution: For $\rho = 0$, equation (4.2c) becomes

$f_{(X,Y)}(x,y) = \dfrac{1}{2\pi\sigma_X\sigma_Y}\exp\!\left[-\dfrac{(x-\mu_X)^2}{2\sigma_X^2} - \dfrac{(y-\mu_Y)^2}{2\sigma_Y^2}\right] = \dfrac{1}{\sigma_X\sqrt{2\pi}}\,e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}} \cdot \dfrac{1}{\sigma_Y\sqrt{2\pi}}\,e^{-\frac{(y-\mu_Y)^2}{2\sigma_Y^2}}$.

This is exactly the relation $f_{(X,Y)}(x,y) = f_X(x)\,f_Y(y)$ that is needed to show that X and Y are independent. □

(f) Suppose now that $\rho = .99$. Obtain a plot of the pdf and draw conclusions as to how $\rho$ controls its shape and orientation.

Answer: [Figure: contour plot of the pdf for (X,Y) with $\rho = .99$.] The plot shows that as $\rho \to 1$ the pdf shape approaches that of a straight line, while the slope of that line remains relatively the same. □

In the last part we observed that as $\rho \to 1$ the joint pdf 'collapses' toward the form of a straight line. To see what effect this has on the linear model (4.4a), suppose that, in fact, we have a perfectly linear relationship between X and Y; that is:

$Y = mX + b$.   (4.5a)

Taking the expected value of (4.5a) gives:

$\mu_Y = m\mu_X + b$.   (4.5b)

Hence,

$\sigma_{XY} = E[(X-\mu_X)(Y-\mu_Y)] = E[(X-\mu_X)\{(mX+b) - (m\mu_X + b)\}] = m\,E[(X-\mu_X)^2] = m\sigma_X^2$.   (4.5c)

Similarly,

$\sigma_Y^2 = E[(Y-\mu_Y)^2] = m^2 E[(X-\mu_X)^2] = m^2\sigma_X^2$.   (4.5d)

And so the correlation coefficient (4.3) becomes

$\rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X\sigma_Y} = \dfrac{m\sigma_X^2}{|m|\sigma_X^2} = \dfrac{m}{|m|} = \begin{cases} +1 & \text{for } m > 0 \\ -1 & \text{for } m < 0. \end{cases}$   (4.5e)

From (4.5) we can make the following

Conclusion: The correlation coefficient is a measure of how linearly related X and Y are.

This conclusion was based on (4.5), which in no way depended on any pdf assumptions. It was demonstrated in comparing parts (d) and (f) of Example 4.6, wherein the pdf's were normal, but it holds for any pdf's. In the situation where normality is assumed, it was shown in (e) of Example 4.6 that if X and Y are uncorrelated, then they are independent. This is one reason that the assumption of normality is so often made: it is generally nontrivial to prove that X and Y are independent, but if they are jointly normal, then all one needs to do is show that the correlation is zero!

In-Class Problem 1 Prove that if random variables X1 and X2 are independent, then they are uncorrelated.

Proof:

$E(X_1 X_2) = \int_{S_{X_2}}\!\int_{S_{X_1}} x_1 x_2\, f_{(X_1,X_2)}(x_1,x_2)\, dx_1\, dx_2 = \int_{S_{X_2}}\!\int_{S_{X_1}} x_1 x_2\, f_{X_1}(x_1) f_{X_2}(x_2)\, dx_1\, dx_2$  (by independence)

$= \int_{S_{X_2}} x_2 f_{X_2}(x_2)\, dx_2 \int_{S_{X_1}} x_1 f_{X_1}(x_1)\, dx_1 = E(X_1)E(X_2)$  (by calculus).

Hence $\sigma_{X_1 X_2} = E(X_1 X_2) - E(X_1)E(X_2) = 0$. Thus we see that if two random variables are independent, then they are uncorrelated. However, the converse is not necessarily true. Uncorrelatedness only means that they are not related in a linear way. This is important! Many engineers and scientists assume that because X and Y are uncorrelated, they have nothing to do with each other (i.e. they are independent). It may well be that they are, in fact, very closely related to one another, as is illustrated in Example 4.7 below. (A quick numerical illustration of how the converse can fail is sketched first.)
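The following minimal sketch uses a hypothetical pair (X, Y) in which Y is completely determined by X, and yet the two are essentially uncorrelated, precisely because the relationship is not linear.

% Sketch: uncorrelated does NOT imply independent.
% Y = X^2 is completely determined by X, yet the correlation is ~0. (Hypothetical pair.)
x = randn(10000,1);      % X ~ N(0,1)
y = x.^2;                % Y is a deterministic (nonlinear) function of X
R = corrcoef(x,y);
R(1,2)                   % sample correlation coefficient; close to zero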
Example 4.7 Modern warfare in urban areas requires that projectiles fired into those areas be sufficiently accurate to minimize civilian casualties. Consider the act of noting where in a circular target a projectile hits. This can be described by the 2-D random variable (R, Φ), where R is the radial distance from the center and Φ is the angle relative to the horizontal right-pointing direction. Suppose that R has a pdf that is uniform over the interval $[0, r_o]$ and that Φ has a pdf that is uniform over the interval $[0, 2\pi)$. Thus the marginal pdf's are:

$f_R(r) = 1/r_o$ for $r \in [0, r_o]$,  and  $f_\Phi(\varphi) = 1/(2\pi)$ for $\varphi \in [0, 2\pi)$.

Furthermore, suppose that we can assume that these two random variables are statistically independent. Then the joint pdf for (R, Φ) is:

$f_{(R,\Phi)}(r,\varphi) = \dfrac{1}{2\pi r_o}$ for $r \in [0, r_o]$ and $\varphi \in [0, 2\pi)$.

This pdf is shown in Figure 4.5.

[Figure 4.5 Plot of $f_{(R,\Phi)}(r,\varphi) = 1/(2\pi r_o)$ for $r \in [0, r_o]$ and $\varphi \in [0, 2\pi)$, for $r_o = 1$.]

(a) Compute the mean value of the location of impact in the x-y plane. The point of impact of the projectile may be expressed in polar form as $W = R\,e^{i\Phi}$. Find the mean of W.

Solution: Since W = g(R, Φ), where $g(r,\varphi) = r e^{i\varphi}$, we have

$E(W) = E[g(R,\Phi)] = \int_{0}^{r_o}\!\!\int_{0}^{2\pi} r e^{i\varphi}\, f_{(R,\Phi)}(r,\varphi)\, d\varphi\, dr = \dfrac{1}{2\pi r_o}\int_{0}^{r_o} r\, dr \int_{0}^{2\pi} e^{i\varphi}\, d\varphi = \dfrac{1}{2\pi r_o}\cdot\dfrac{r_o^2}{2}\cdot\dfrac{e^{i2\pi} - e^{i0}}{i} = 0$.

(b) Show that X = Rcos(Φ) and Y = Rsin(Φ) are uncorrelated.

Solution: We need to show that E(XY) = E(X)E(Y). To this end,

$E(X) = \int_{0}^{r_o}\!\!\int_{0}^{2\pi} r\cos(\varphi)\, f_{(R,\Phi)}(r,\varphi)\, d\varphi\, dr = \dfrac{1}{2\pi r_o}\cdot\dfrac{r_o^2}{2}\,[\sin(2\pi) - \sin(0)] = 0$,

$E(Y) = \int_{0}^{r_o}\!\!\int_{0}^{2\pi} r\sin(\varphi)\, f_{(R,\Phi)}(r,\varphi)\, d\varphi\, dr = \dfrac{1}{2\pi r_o}\cdot\dfrac{r_o^2}{2}\,[\cos(0) - \cos(2\pi)] = 0$,  and

$E(XY) = \int_{0}^{r_o}\!\!\int_{0}^{2\pi} r^2\cos(\varphi)\sin(\varphi)\, f_{(R,\Phi)}(r,\varphi)\, d\varphi\, dr = \dfrac{1}{2\pi r_o}\cdot\dfrac{r_o^3}{3}\int_{0}^{2\pi}\cos(\varphi)\sin(\varphi)\, d\varphi$.

To compute the value of the rightmost integral, one can use a table of integrals, a good calculator, or the trigonometric identity sin(α + β) = sin(α)cos(β) + cos(α)sin(β). Using this identity for the case α = β = φ gives cos(φ)sin(φ) = sin(2φ)/2, from which it is easily shown that the rightmost integral is zero. Hence we have shown that X and Y are, indeed, uncorrelated.

(c) Simulate $\{(X_k, Y_k)\}_{k=1:n}$. To this end we will (only for convenience) choose $r_o = 1$ and n = 1000. To simulate a single measurement of (R, Φ) we use r = rand(1,1) and φ = 2π·rand(1,1); the corresponding measurement of (X,Y) is then (r·cos φ, r·sin φ). Consequently, a simulation of 1000 iid measurements of (R, Φ) is obtained from (rand(1000,1), 2π·rand(1000,1)), and these are then transformed to the corresponding $\{(X_k, Y_k)\}_{k=1:1000}$. The scatter plot in Figure 4.6 shows the 1000 resulting simulations of (X,Y).

[Figure 4.6 A plot of 1000 simulations of (X,Y) = (Rcos(Φ), Rsin(Φ)); equivalently, one simulation of $\{(X_k, Y_k) = (R_k\cos(\Phi_k), R_k\sin(\Phi_k))\}_{k=1:1000}$.]
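A sketch of the simulation described in part (c) is given below. The specific code is not printed in the chapter; this version simply follows the description above.

% Sketch: 1000 simulations of (X,Y) = (R cos(Phi), R sin(Phi)) for ro = 1.
n   = 1000;
r   = rand(n,1);          % R ~ uniform on [0,1]
phi = 2*pi*rand(n,1);     % Phi ~ uniform on [0,2*pi)
x   = r.*cos(phi);
y   = r.*sin(phi);
figure
plot(x, y, '.'); axis equal
xlabel('x'); ylabel('y'); title('1000 simulated impact points')
R = corrcoef(x,y);
R(1,2)                    % sample correlation coefficient; near zero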
(d) Comment as to whether the scatter plot suggests a linear relationship between X and Y.
Answer: There is no suggestion of a linear relationship between X and Y. In fact, the sample correlation coefficient (computed via corrcoef(x,y)) is -0.036. This is consistent with the theory [i.e. part (b)].

(e) Find the joint pdf for (X, Y).
In the r-φ coordinate system we saw that the pdf for (R, Φ) is uniform. However, in the x-y coordinate system the scatter plot suggests that the probability is higher near the point (0,0) than near the perimeter of the circle. The goal of this part is to arrive at a mathematical formula for $f_{(X,Y)}(x,y)$. We will arrive at it in two different ways. First, we will use the authors' approach, which is based primarily on probability. Then we will use an approach that is based primarily on events. The reason for doing this is to try to convince the reader of the value of thinking more deeply about events, as opposed to relying only on more mathematical tools.

Method 1: Using the Jacobian transformation in relation to pdf's

Theorem 4.1 Let $\vec X = (X_1, X_2)$ be a continuous random variable, and let $\vec Y = (Y_1, Y_2) = g(X_1, X_2)$. Assume that the function g satisfies the following two conditions:

(C1) $g(x_1, x_2)$ is differentiable with respect to both $x_1$ and $x_2$, and
(C2) $g(x_1, x_2)$ is a one-to-one function; that is, from $(y_1, y_2) = g(x_1, x_2)$ one can uniquely recover $(x_1, x_2)$ via the inverse functions $x_1 = h_1(y_1, y_2)$ and $x_2 = h_2(y_1, y_2)$.

Then

$f_{(Y_1,Y_2)}(y_1, y_2) = f_{(X_1,X_2)}[h_1(y_1,y_2),\, h_2(y_1,y_2)]\;|J|$,

where $|J|$ is the absolute value of the determinant of the Jacobian matrix

$J = \begin{bmatrix} \partial h_1(y_1,y_2)/\partial y_1 & \partial h_1(y_1,y_2)/\partial y_2 \\ \partial h_2(y_1,y_2)/\partial y_1 & \partial h_2(y_1,y_2)/\partial y_2 \end{bmatrix}$.

In relation to the problem at hand we have (X,Y) = (Rcos(Φ), Rsin(Φ)). Let us restrict our attention to the segment of the sample space $S_\Phi = [0, 2\pi)$ given by $[0, \pi/2)$, because on this interval both the sine and cosine functions are one-to-one (the remaining quadrants are handled in the same way). Hence we can uniquely recover r and φ from the relations $r = h_1(x,y) = \sqrt{x^2 + y^2}$ and $\varphi = h_2(x,y) = \arctan(y/x)$. The elements of the Jacobian are:

$\partial r/\partial x = x/\sqrt{x^2 + y^2}$ ;  $\partial r/\partial y = y/\sqrt{x^2 + y^2}$ ;
$\partial\varphi/\partial x = -y/(x^2 + y^2)$ ;  $\partial\varphi/\partial y = x/(x^2 + y^2)$.

And so

$|J| = \left|\dfrac{x^2}{(x^2+y^2)^{3/2}} + \dfrac{y^2}{(x^2+y^2)^{3/2}}\right| = \dfrac{1}{\sqrt{x^2+y^2}} = \dfrac{1}{r}$.

Hence

$f_{(X,Y)}(x,y) = f_{(R,\Phi)}\!\left[\sqrt{x^2+y^2},\, \arctan(y/x)\right]|J| = \dfrac{1}{2\pi r_o}\,|J| = \dfrac{1}{2\pi r_o\sqrt{x^2+y^2}}$,

which, for the choice $r_o = 1$ made in part (c), is $1/(2\pi\sqrt{x^2+y^2})$.

Method 2: Using equivalent events

Consider the equivalent events shown in Figure 4.7.

[Figure 4.7 The event A in the sample space $S_{(R,\Phi)}$ (left) is equivalent to the event B in the sample space $S_{(X,Y)}$ (right): A is the thin strip $r \le R \le r + \Delta r$, $0 \le \Phi < 2\pi$, and B is the corresponding annulus of inner radius r and width $\Delta r$.]

Since the events A and B are equivalent events, they must have the same probability. We know that Pr(A) is the shaded area weighted by the (uniform) joint density $1/(2\pi r_o) = 1/(2\pi)$ for $r_o = 1$; that is, $\Pr(A) = (2\pi\,\Delta r)/(2\pi) = \Delta r$. Now, the area of B is the difference between the areas of the outer circle and the inner circle:

$\mathrm{area}(B) = \pi(r + \Delta r)^2 - \pi r^2 = 2\pi r\,\Delta r + \pi(\Delta r)^2 \approx 2\pi r\,\Delta r$.

This area must carry probability $\Pr(B) = \Pr(A) = \Delta r$. Furthermore, this probability must be spread uniformly over the annulus. Hence the probability density (i.e. probability per unit area) must be

$f_{(X,Y)}(x,y) = \dfrac{\Pr(B)}{\mathrm{area}(B)} = \dfrac{\Delta r}{2\pi r\,\Delta r} = \dfrac{1}{2\pi r} = \dfrac{1}{2\pi\sqrt{x^2+y^2}}$.

In conclusion, both methods give the same result. However, the use of probability machinery in the second method is relatively minimal; it relies primarily on an understanding of the events involved. ۞
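The result of (e) can also be checked numerically against the simulation of part (c). The sketch below (not part of the original example) estimates the probability per unit area near a chosen radius r0 by counting simulated points in a thin annulus, and compares that estimate with the formula 1/(2*pi*r0).

% Sketch: numerical check of f_{(X,Y)}(x,y) = 1/(2*pi*sqrt(x^2+y^2)) for ro = 1.
n   = 1e5;                            % more points than in (c), for a stable estimate
r   = rand(n,1);  phi = 2*pi*rand(n,1);
x   = r.*cos(phi);  y = r.*sin(phi);
r0  = 0.5;  dr = 0.01;                % thin annulus of radius 0.5 and width 0.01
rad = sqrt(x.^2 + y.^2);
inB = (rad >= r0) & (rad < r0 + dr);  % points falling in the annulus
estDensity  = mean(inB) / (2*pi*r0*dr)   % estimated probability per unit area
theoryValue = 1/(2*pi*r0)                % value predicted by the formula (about 0.318)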
PROBLEM 1 For the Matlab pdf's given in Table 4.1, you are to identify the ones that correspond to discrete random variables. Specifically, for each such random variable, X, complete the following table. The first row has been included to give you an idea of the desired format.

  Matlab name      pdf name   Sample space $S_X$    pdf $f_X(x)$                      $\mu_X$   $\sigma_X^2$
  binopdf(x,n,p)   binomial   {0, 1, 2, ..., n}     $\binom{n}{x} p^x (1-p)^{n-x}$    $np$      $np(1-p)$

PROBLEM 2 For the Matlab pdf's given in Table 4.1, you are to identify the ones that correspond to continuous random variables. Specifically, for each such random variable, X, complete the following table. The first row has been included to give you an idea of the desired format.

  Matlab name      pdf name   Sample space $S_X$    pdf $f_X(x)$                        $\mu_X$      $\sigma_X^2$
  betapdf(x,a,b)   beta       [0, 1]                $x^{a-1}(1-x)^{b-1}/B(a,b)$         $a/(a+b)$    $ab/[(a+b)^2(a+b+1)]$

where B(a,b) is the beta function [http://en.wikipedia.org/wiki/Beta_distribution].

PROBLEM 3 Suppose you are stress testing computer memory chips and collecting data on their lifetimes. A sample of 500 chips is to be tested. Prior to conducting the test, you would like to get an idea of what types of histograms you might expect. Suppose that similar chip designs have yielded a mean value of 150.23 hours and a standard deviation of 33.69 hours.

(a) Use the Matlab random number generator normrnd to obtain a sample of 500 chip lifetimes. Then compute a scaled histogram of the data. This histogram should have the following properties: (P1) it should extend from 0 to 300 hrs; (P2) it should include 20 bins; (P3) its total area should equal 1.0.

(b) The plot in (a) is an estimate of the pdf of the continuous random variable X = the act of recording the lifetime of any chip that might be tested. From your data, obtain the estimates $\hat\mu_X$ and $\hat\sigma_X$. Discuss how these compare to $\mu_X$ and $\sigma_X$.

(c) On your scaled histogram plot, overlay a plot of a normal pdf that uses $\hat\mu_X$ and $\hat\sigma_X$. Comment on how well it captures the shape of the histogram-based pdf estimate.

% PROGRAM NAME: Ch4Pr3.m
mu = 150.23;              % True mean
sigma = 33.69;            % True std. dev.
n = 500;                  % Sample size
m = 10^4;                 % Number of simulations
db = 15;                  % Bin width
bctrs = db/2:db:300-db/2; % Bin centers
x = 0:.1:300;             % Essential sample space
%=============================================
X = normrnd(mu,sigma,n,m);  % Simulations
h = hist(X,bctrs);          % Histograms
fX = h/(n*db);              % Scale to unit area
fX = fX';                   % Switch rows & columns
fXmean = mean(fX);          % Mean scaled height
fXstd = std(fX);            % Std of height
figure(1)
bar(bctrs,fXmean)
hold on
fn = normpdf(x,mu,sigma);
plot(x,fn,'r','LineWidth',2)
xlabel('x')
title('Plots of Histogram-Based Mean pdf (with +/- 2-std values) & Normal pdf')
grid
%----------------------
% Plot +/- 2-sigma values
plot(bctrs,fXmean - 2*fXstd,'*g')
plot(bctrs,fXmean + 2*fXstd,'*g')

[Figure: plots of the histogram-based mean pdf (with +/- 2-std values) and the normal pdf.]

(d) Repeat part (c), but for a gamma pdf.
Solution: First, one needs to compute the (a,b) parameters of the gamma pdf from the mean and standard deviation values used for the normal pdf. Then, in the above code, the 'normpdf' command is replaced by the 'gampdf' command.

PROBLEM 4 In attempting to estimate the total amount of oil released as a result of an off-shore oil rig malfunction, a team of spotters will fly over the affected region. In order for a spill to be detected from the plane, it must have a characteristic dimension of at least 10 meters.
(g) Suppose, for simplicity, that the area associated with a spill module with characteristic dimension, x, is approximated by a square with area x2. Use the concept of equivalent events to identify the event associated with X that is equivalent to the cumulative event [Y y ] , where we have defined Y X 2 . (h) Express the cdf for Y in terms of the cdf for X. (i) Use the chain rule for differentiation in relation to (h), in order to obtain the expression for the pdf for Y in terms of the pdf for X. (j) Use the command gampdf, along with other commands to obtain a plot of f Y ( y) over the range [0,1002]. 4.24 PROBLEM 5 This problem concerns the size of a rain drop. In particular, it relates to the pdf document entitled ‘RainDrops’ in the Homework folder. In this problem you are to focus on the discussion pertaining to the size of a rain drop. Restrict the shape of the raindrop to a sphere. Note: the pdf document can also be found at: http://www1.cs.columbia.edu/CAVE/publications/pdfs/Garg_TR04.pdf (a) Let X = the act of measuring the size (i.e. radius) of any rain drop. Give the quoted sample space for X. SX = (b) Give the quoted name and expression for the size distribution (including definitions of all related parameters/variables). Name: Expression: where the parameters/variables are: (c) Translate the expression in (b) into an expression for the pdf for X. f X (x) = (d) From the Matlab list of pdf’s identify the pdf that corresponds to your expression in (c). Also, give the expression for the parameter(s) of that pdf in terms of the parameters/variables given in (b). (e) In relation to the parameters/variables given in (b), identify which are parameters and which are variables. (f) Use the pdf you identified in (d) to obtain a plot of f X (x) . Then discuss how your plot compares to the one given in Figure 2(b) of the article. (g) Use Google to try to identify reasonable pdf(s) for the variables you identified in (e). (h) Discuss whether or not the variable(s) you identified can be reasonably assumed to be independent of X. 4.25 PROBLEM 6 This problem concerns Example 4.6 on p.4.11 of this chapter. In particular, it investigates the sample size, n, that would be needed to obtain a reliable estimator of the linear prediction model parameters (m,b). (a) For each of the following sample sizes, n = 50, 100, 150, …, 1000, use simulations to arrive at three plots associated with the two parameters, (m, b). Each plot should include three lines: the mean, and the 2 bounds of the estimator of the parameter as a function of n. Be sure to give the mathematical expressions of your estimators of (m,b). (b) From your plots, arrive at the smallest sample size needed, call it n*, to ensure that the relative error for each parameter estimate is no more than 10%. For example, for a given sample size, n, your plot in (a) will give numerical values of m and of the bounds m 2 m . The relative error for the estimator m is defined here as 2 m / m . (c) For the numerical value of n* you found in (b), construct a plot that includes the true linear model y mx b , as well as ‘worst-case’ models y mx b . 4.26 A potential topic for a final project: The use of soft woods (e.g. pine) in the construction industry is, presumably, due to the lack of availability of hard wood. In fact, the major reason for its use has to do with the expense in removing the moisture. [ http://en.wikipedia.org/wiki/Wood_drying ] Because hardwood has lower permeability than softwood, a longer drying time is required. 
The moisture content, mc, is defined as mc wg wd wd where wg = green weight and wd = dry weight (i.e. after drying for 24 hrs. at 103 2 o C . Let Wg = the act of measuring the green weight, and let Wd = the act of measuring the dry weight of any chosen specimen of a given type of wood. Suppose the weight is measured to the nearest 0.01#. The equilibrium moisture content, emc, is the target moisture content. Typically the emc target is on the order of 0.10. 4.27 APPENDIX Matlab code for Example 4.3 % PROGRAM NAME: example4_3.m e = [.01 .05 .10 .15 .20 .25]; p = [.99 .95 .90 .85 .80]; ne = length(e); np = length(p); nvals = zeros(ne,np); for ke = 1:ne for kp = 1:np pk =(1-p(kp))/2; nvals(ke,kp)=(-norminv(pk,0,1)/e(ke))^2; end end ceil(nvals) pause % Run 10,000 simulations for p=0.85 and e=0.25 (=> n=34) X = exprnd(3.5,34,10000); Xbar = mean(X); figure(1) bw = 7/50; bvec =bw/2:bw:7-bw/2; fh = hist(Xbar,bvec); fh = fh/(bw*10^4); bar(bvec,fh); hold on mu = 3.5; sig = 3.5/34^.5; xvec = 0:0.1:7; fn = normpdf(xvec,mu,sig); plot(xvec,fn,'r','LineWidth',3) xlabel('x') ylabel('f(x)') title('Histogram-based & Normal pdf plots for mu_Xhat') 4.28 Matlab code for Example 4.6 % PROGRAM NAME: 2Dnormpdf.m mux = 30; sigx = 1; muy = 70; sigy = 3; r = 0.7; sigxy = r*sigx*sigy; mu = [mux muy]; Varxy = [sigx^2 sigxy ; sigxy sigy^2]; xmin = mux-4*sigx; xmax = mux+4*sigx; ymin = muy-4*sigy; ymax = muy+4*sigy; dx = (xmax - xmin)/100; dy = (ymax - ymin)/100; x = xmin:dx:xmax; y = ymin:dy:ymax; [X,Y] = meshgrid(x,y); fxy = mvnpdf([X(:) Y(:)],mu,Varxy); fxy = reshape(fxy,length(y),length(x)); figure(1) surf(x,y,fxy); xlabel('x'); ylabel('y'); zlabel('f(x,y)'); title('Surface Plot of the PDF for (X,Y)') pause figure(2) contour(x,y,fxy,50) xlabel('x'); ylabel('y'); zlabel('f(x,y)'); title('Contour Plot of the PDF for (X,Y)') grid