Chapter 5. Joint Probability Distributions and Random Samples
Weiqi Luo (骆伟祺)
School of Software, Sun Yat-Sen University
Email: weiqi.luo@yahoo.com   Office: # A313

Chapter 5 Outline
5.1 Jointly Distributed Random Variables
5.2 Expected Values, Covariance, and Correlation
5.3 Statistics and Their Distributions
5.4 The Distribution of the Sample Mean
5.5 The Distribution of a Linear Combination

5.1 Jointly Distributed Random Variables

The joint probability mass function for two discrete random variables
Let X and Y be two discrete random variables defined on the sample space S of an experiment. The joint probability mass function p(x, y) is defined for each pair of numbers (x, y) by

    p(x, y) = P(X = x and Y = y)

Let A be any set consisting of pairs of (x, y) values. Then the probability P[(X, Y) ∈ A] is obtained by summing the joint pmf over the pairs in A:

    P[(X, Y) ∈ A] = Σ_{(x,y) ∈ A} p(x, y)

Two requirements for a joint pmf:
1. p(x, y) ≥ 0
2. Σ_x Σ_y p(x, y) = 1

Example 5.1
A large insurance agency services a number of customers who have purchased both a homeowner's policy and an automobile policy from the agency. For each type of policy, a deductible amount must be specified. For an automobile policy, the choices are $100 and $250, whereas for a homeowner's policy the choices are $0, $100, and $200. Suppose an individual with both types of policy is selected at random from the agency's files. Let
    X = the deductible amount on the auto policy,
    Y = the deductible amount on the homeowner's policy.

Joint probability table:

    p(x, y)    y = 0    y = 100    y = 200
    x = 100    0.20     0.10       0.20
    x = 250    0.05     0.15       0.30

Example 5.1 (Cont')
    p(100, 100) = P(X = 100 and Y = 100) = 0.10
    P(Y ≥ 100) = p(100,100) + p(250,100) + p(100,200) + p(250,200) = 0.75

The marginal probability mass function
The marginal probability mass functions of X and Y, denoted by pX(x) and pY(y) respectively, are given by

    pX(x) = Σ_y p(x, y),    pY(y) = Σ_x p(x, y)

In the joint probability table, the row totals give pX(x) and the column totals give pY(y).

Example 5.2 (Ex. 5.1 Cont')
The possible X values are x = 100 and x = 250, so computing row totals in the joint probability table yields
    pX(100) = p(100,0) + p(100,100) + p(100,200) = 0.5
    pX(250) = p(250,0) + p(250,100) + p(250,200) = 0.5
so pX(x) = 0.5 for x = 100, 250, and 0 otherwise.

Example 5.2 (Cont')
Likewise, the column totals give
    pY(0) = p(100,0) + p(250,0) = 0.20 + 0.05 = 0.25
    pY(100) = p(100,100) + p(250,100) = 0.10 + 0.15 = 0.25
    pY(200) = p(100,200) + p(250,200) = 0.20 + 0.30 = 0.50
so pY(y) = 0.25 for y = 0, 100; 0.5 for y = 200; and 0 otherwise. Then

    P(Y ≥ 100) = pY(100) + pY(200) = 0.75, as before.
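The marginal and event probabilities above are just sums over the joint table. A minimal sketch (not from the slides) that tabulates Example 5.1's pmf in Python and recovers pX, pY, and P(Y ≥ 100):

```python
# Joint pmf of Example 5.1: x = auto deductible, y = homeowner deductible.
p = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

# Marginal pmfs: row sums for p_X, column sums for p_Y.
p_X = {x: sum(v for (xx, _), v in p.items() if xx == x) for x in (100, 250)}
p_Y = {y: sum(v for (_, yy), v in p.items() if yy == y) for y in (0, 100, 200)}

# P(Y >= 100): sum the joint pmf over all pairs in the event.
prob = sum(v for (_, y), v in p.items() if y >= 100)

print(p_X)   # {100: 0.5, 250: 0.5}
print(p_Y)   # {0: 0.25, 100: 0.25, 200: 0.5}
print(prob)  # 0.75
```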
The joint probability density function for two continuous random variables
Let X and Y be two continuous random variables. Then f(x, y) is the joint probability density function for X and Y if for any two-dimensional set A

    P[(X, Y) ∈ A] = ∬_A f(x, y) dx dy

Two requirements for a joint pdf:
1. f(x, y) ≥ 0 for all pairs (x, y) in R²
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

In particular, if A is the two-dimensional rectangle {(x, y): a ≤ x ≤ b, c ≤ y ≤ d}, then

    P[(X, Y) ∈ A] = P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx

[Figure: the surface f(x, y) above the shaded rectangle A; the probability is the volume under the surface over A.]

Example 5.3
A bank operates both a drive-up facility and a walk-up window. On a randomly selected day, let
    X = the proportion of time that the drive-up facility is in use,
    Y = the proportion of time that the walk-up window is in use.
Let the joint pdf of (X, Y) be

    f(x, y) = (6/5)(x + y²) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.

1. Verify that f(x, y) is a joint probability density function.
2. Determine the probability P(0 ≤ X ≤ 1/4, 0 ≤ Y ≤ 1/4).

Marginal probability density functions
The marginal probability density functions of X and Y, denoted by fX(x) and fY(y) respectively, are given by

    fX(x) = ∫_{−∞}^{∞} f(x, y) dy for −∞ < x < ∞  (integrate over y with x fixed)
    fY(y) = ∫_{−∞}^{∞} f(x, y) dx for −∞ < y < ∞  (integrate over x with y fixed)

Example 5.4 (Ex. 5.3 Cont')
The marginal pdf of X, which gives the probability distribution of busy time for the drive-up facility without reference to the walk-up window, is

    fX(x) = ∫_0^1 (6/5)(x + y²) dy = (6/5)x + 2/5 for 0 ≤ x ≤ 1, and 0 otherwise.

Similarly,

    fY(y) = (6/5)y² + 3/5 for 0 ≤ y ≤ 1, and 0 otherwise.

Then

    P(1/4 ≤ Y ≤ 3/4) = ∫_{1/4}^{3/4} fY(y) dy = 0.4625

Example 5.5
A nut company markets cans of deluxe mixed nuts containing almonds, cashews, and peanuts. Suppose the net weight of each can is exactly 1 lb, but the weight contribution of each type of nut is random. Because the three weights sum to 1, a joint probability model for any two gives all necessary information about the weight of the third type. Let X = the weight of almonds in a selected can and Y = the weight of cashews. The joint pdf for (X, Y) is

    f(x, y) = 24xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1, and 0 otherwise.

Example 5.5 (Cont')
Checking the two requirements: clearly f(x, y) ≥ 0, and over the triangular support D with vertices (0, 0), (1, 0), (0, 1),

    ∬_D f(x, y) dy dx = ∫_0^1 { ∫_0^{1−x} 24xy dy } dx = ∫_0^1 12x(1 − x)² dx = 1

Example 5.5 (Cont')
Let the two types of nuts together make up at most 50% of the can; then A = {(x, y): 0 ≤ x, 0 ≤ y, x + y ≤ 0.5} and

    P((X, Y) ∈ A) = ∬_A f(x, y) dy dx = ∫_0^{0.5} { ∫_0^{0.5−x} 24xy dy } dx = 0.0625

Example 5.5 (Cont')
The marginal pdf for almonds is obtained by holding X fixed at x and integrating f(x, y) along the vertical line through x (from y = 0 up to y = 1 − x):

    fX(x) = ∫_0^{1−x} 24xy dy = 12x(1 − x)² for 0 ≤ x ≤ 1, and 0 otherwise.

Independent random variables
Two random variables X and Y are said to be independent if for every pair of x and y values,

    p(x, y) = pX(x) · pY(y)  when X and Y are discrete, or
    f(x, y) = fX(x) · fY(y)  when X and Y are continuous.

Otherwise, X and Y are said to be dependent. That is, two variables are independent if their joint pmf or pdf is the product of the two marginal pmf's or pdf's.
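Returning to Example 5.3, both the normalization requirement and the requested probability can be checked by numerical double integration. A sketch, assuming SciPy is available (the exact answer to part 2 is 7/640 ≈ 0.0109):

```python
from scipy.integrate import dblquad

def f(y, x):                 # dblquad integrates its first argument (y) innermost
    return 1.2 * (x + y**2)  # f(x, y) = (6/5)(x + y^2) on the unit square

# Requirement 2: f integrates to 1 over the support.
total, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1)

# Part 2: P(0 <= X <= 1/4, 0 <= Y <= 1/4).
corner, _ = dblquad(f, 0, 0.25, lambda x: 0, lambda x: 0.25)

print(total)   # ~1.0
print(corner)  # ~0.0109  (= 7/640)
```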
Example 5.6
In the insurance situation of Examples 5.1 and 5.2,

    p(100, 100) = 0.10 ≠ (0.5)(0.25) = pX(100) · pY(100)

so X and Y are not independent.

Example 5.7 (Ex. 5.5 Cont')
Because f(x, y) = 24xy has the form of a product, X and Y would appear to be independent. However, although

    fX(x) = ∫_0^{1−x} 24xy dy = 12x(1 − x)²  and, by symmetry,  fY(y) = ∫_0^{1−y} 24xy dx = 12y(1 − y)²,

we have

    fX(3/4) · fY(3/4) = (9/16)(9/16) ≠ 0 = f(3/4, 3/4)

so X and Y are in fact dependent. The product form of f(x, y) is not enough here: the region of positive density is a triangle, not a rectangle.

Example 5.8
Suppose that the lifetimes of two components are independent of one another and that the first lifetime, X1, has an exponential distribution with parameter λ1 whereas the second, X2, has an exponential distribution with parameter λ2. Then the joint pdf is

    f(x1, x2) = fX1(x1) · fX2(x2) = λ1λ2 e^{−λ1x1 − λ2x2} for x1 > 0, x2 > 0, and 0 otherwise.

Let λ1 = 1/1000 and λ2 = 1/1200, so that the expected lifetimes are 1000 and 1200 hours, respectively. The probability that both component lifetimes are at least 1500 hours is

    P(1500 ≤ X1, 1500 ≤ X2) = P(1500 ≤ X1) · P(1500 ≤ X2) = e^{−1500λ1} · e^{−1500λ2} = e^{−1.5} · e^{−1.25} ≈ 0.064

More than two random variables
If X1, X2, …, Xn are all discrete rv's, the joint pmf of the variables is the function

    p(x1, x2, …, xn) = P(X1 = x1, X2 = x2, …, Xn = xn)

If the variables are continuous, the joint pdf of X1, X2, …, Xn is the function f(x1, x2, …, xn) such that for any n intervals [a1, b1], …, [an, bn],

    P(a1 ≤ X1 ≤ b1, …, an ≤ Xn ≤ bn) = ∫_{a1}^{b1} … ∫_{an}^{bn} f(x1, …, xn) dxn … dx1

Independence
The random variables X1, X2, …, Xn are said to be independent if for every subset Xi1, Xi2, …, Xik of the variables, the joint pmf or pdf of the subset is equal to the product of the marginal pmf's or pdf's.

Multinomial experiment
An experiment consisting of n independent and identical trials, in which each trial can result in any one of r possible outcomes. Let pi = P(outcome i on any particular trial), and define random variables by Xi = the number of trials resulting in outcome i (i = 1, …, r). The joint pmf of X1, …, Xr is called the multinomial distribution:

    p(x1, …, xr) = n! / (x1! x2! ⋯ xr!) · p1^{x1} ⋯ pr^{xr} for xi = 0, 1, 2, … with x1 + x2 + … + xr = n, and 0 otherwise.

Note: the case r = 2 gives the binomial distribution.

Example 5.9
If the allele of each of ten independently obtained pea sections is determined, and p1 = P(AA), p2 = P(Aa), p3 = P(aa), X1 = number of AA's, X2 = number of Aa's, and X3 = number of aa's, then

    p(x1, x2, x3) = 10! / (x1! x2! x3!) · p1^{x1} p2^{x2} p3^{x3} for xi = 0, 1, … with x1 + x2 + x3 = 10

If p1 = p3 = 0.25 and p2 = 0.5, then

    P(X1 = 2, X2 = 5, X3 = 3) = p(2, 5, 3) = 0.0769

Example 5.10
When a certain method is used to collect a fixed volume of rock samples in a region, there are four resulting rock types. Let X1, X2, and X3 denote the proportions by volume of rock types 1, 2, and 3 in a randomly selected sample. If the joint pdf of X1, X2, X3 is

    f(x1, x2, x3) = k·x1·x2·(1 − x3) for 0 ≤ x1, x2, x3 ≤ 1 with x1 + x2 + x3 ≤ 1, and 0 otherwise,

then requiring ∭_{D1} f = 1 over the support D1 gives k = 144, and

    P(X1 + X2 ≤ 0.5) = ∭_{D2} f(x1, x2, x3) dV = 0.6066, where D2 = {x1 + x2 ≤ 0.5}.
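The multinomial pmf is easy to evaluate directly. A sketch (not from the slides) checking Example 5.9's value p(2, 5, 3) ≈ 0.0769:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """p(x1, ..., xr) = n!/(x1!...xr!) * p1^x1 * ... * pr^xr."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p**c for p, c in zip(probs, counts))

print(multinomial_pmf((2, 5, 3), (0.25, 0.5, 0.25)))  # ~0.0769
```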
Example 5.11
If X1, …, Xn represent the lifetimes of n components, the components operate independently of one another, and each lifetime is exponentially distributed with parameter λ, then

    f(x1, x2, …, xn) = (λe^{−λx1})(λe^{−λx2}) ⋯ (λe^{−λxn}) = λⁿ e^{−λ(x1 + … + xn)} for x1 ≥ 0, …, xn ≥ 0, and 0 otherwise.

Example 5.11 (Cont')
If these n components constitute a system that fails as soon as a single component fails, the system lasts past time t exactly when every component does:

    P(X1 > t, X2 > t, …, Xn > t) = (∫_t^∞ λe^{−λx1} dx1) ⋯ (∫_t^∞ λe^{−λxn} dxn) = (e^{−λt})ⁿ = e^{−nλt}

Therefore

    P(system lifetime ≤ t) = 1 − e^{−nλt} for t ≥ 0,

i.e., the system lifetime is itself exponentially distributed with parameter nλ.

Conditional distributions
Let X and Y be two continuous rv's with joint pdf f(x, y) and marginal X pdf fX(x). Then for any x value for which fX(x) > 0, the conditional probability density function of Y given that X = x is

    f_{Y|X}(y | x) = f(x, y) / fX(x) for −∞ < y < ∞

If X and Y are discrete, then

    p_{Y|X}(y | x) = p(x, y) / pX(x) for each y

is the conditional probability mass function of Y when X = x.

Example 5.12 (Ex. 5.3 Cont')
X = the proportion of time that a bank's drive-up facility is busy and Y = the analogous proportion for the walk-up window. The conditional pdf of Y given that X = 0.8 is

    f_{Y|X}(y | 0.8) = f(0.8, y) / fX(0.8) = 1.2(0.8 + y²) / 1.36 = (1/34)(24 + 30y²) for 0 ≤ y ≤ 1

The probability that the walk-up facility is busy at most half the time, given that X = 0.8, is then

    P(Y ≤ 0.5 | X = 0.8) = ∫_0^{0.5} f_{Y|X}(y | 0.8) dy = ∫_0^{0.5} (1/34)(24 + 30y²) dy ≈ 0.390
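The conditional-pdf construction can be verified numerically: build fX by integrating the joint pdf over y, divide, then integrate the conditional pdf over [0, 0.5]. A sketch, assuming SciPy:

```python
from scipy.integrate import quad

f = lambda x, y: 1.2 * (x + y**2)                 # joint pdf of Example 5.3
f_X = lambda x: quad(lambda y: f(x, y), 0, 1)[0]  # marginal pdf of X

x0 = 0.8
f_cond = lambda y: f(x0, y) / f_X(x0)   # f_{Y|X}(y | 0.8) = (24 + 30y^2)/34

prob, _ = quad(f_cond, 0, 0.5)          # P(Y <= 0.5 | X = 0.8)
print(prob)                             # ~0.390
```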
Homework (5.1)
Ex. 9, Ex. 12, Ex. 18, Ex. 19

5.2 Expected Values, Covariance, and Correlation

The expected value of a function h(X, Y)
Let X and Y be jointly distributed rv's with pmf p(x, y) or pdf f(x, y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by E[h(X, Y)] or μ_{h(X,Y)}, is given by

    E[h(X, Y)] = Σ_x Σ_y h(x, y) p(x, y)    when X and Y are discrete,
    E[h(X, Y)] = ∬ h(x, y) f(x, y) dx dy    when X and Y are continuous.

Example 5.13
Five friends have purchased tickets to a certain concert. If the tickets are for seats 1-5 in a particular row and the tickets are randomly distributed among the five, what is the expected number of seats separating any particular two of the five? Letting X and Y denote the seat numbers of the two individuals,

    p(x, y) = 1/20 for x = 1, …, 5; y = 1, …, 5; x ≠ y, and 0 otherwise,

and the number of seats separating the two individuals is h(X, Y) = |X − Y| − 1.

Example 5.13 (Cont')

    h(x, y)   y=1   y=2   y=3   y=4   y=5
    x=1       --    0     1     2     3
    x=2       0     --    0     1     2
    x=3       1     0     --    0     1
    x=4       2     1     0     --    0
    x=5       3     2     1     0     --

    E[h(X, Y)] = Σ_{(x,y)} h(x, y) p(x, y) = Σ_{x≠y} (|x − y| − 1) · (1/20) = 1

Example 5.14
In Example 5.5, the joint pdf of the amount X of almonds and amount Y of cashews in a 1-lb can of nuts was f(x, y) = 24xy on the triangle 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1. If 1 lb of almonds costs the company $1.00, 1 lb of cashews costs $1.50, and 1 lb of peanuts costs $0.50, then the total cost of the contents of a can is

    h(X, Y) = (1)X + (1.5)Y + (0.5)(1 − X − Y) = 0.5 + 0.5X + Y

Example 5.14 (Cont')
The expected total cost is

    E[h(X, Y)] = ∫_0^1 ∫_0^{1−x} (0.5 + 0.5x + y) · 24xy dy dx = $1.10

Note: the method of computing E[h(X1, …, Xn)], the expected value of a function of n random variables, is similar to that for two random variables.

Covariance
The covariance between two rv's X and Y is

    Cov(X, Y) = E[(X − μX)(Y − μY)]
              = Σ_x Σ_y (x − μX)(y − μY) p(x, y)      when X and Y are discrete,
              = ∬ (x − μX)(y − μY) f(x, y) dx dy       when X and Y are continuous.

[Figure: scatter plots of (x, y) pairs, each with p(x, y) = 1/10, illustrating (a) positive covariance, (b) negative covariance, and (c) covariance near zero.]

Example 5.15
The joint and marginal pmf's for X = automobile policy deductible amount and Y = homeowner policy deductible amount in Example 5.1 were

    p(x, y)    y = 0    y = 100    y = 200   |  pX(x)
    x = 100    .20      .10        .20       |  .5
    x = 250    .05      .15        .30       |  .5
    pY(y)      .25      .25        .50

from which μX = Σ x pX(x) = 175 and μY = 125. Therefore

    Cov(X, Y) = Σ_{(x,y)} (x − 175)(y − 125) p(x, y)
              = (100 − 175)(0 − 125)(0.20) + … + (250 − 175)(200 − 125)(0.30) = 1875

Proposition

    Cov(X, Y) = E(XY) − μX · μY

Note: Cov(X, X) = E(X²) − μX² = V(X).

Example 5.16 (Ex. 5.5 Cont')
The joint and marginal pdf's of X = amount of almonds and Y = amount of cashews were f(x, y) = 24xy on the triangular support, with

    fX(x) = 12x(1 − x)² for 0 ≤ x ≤ 1, and 0 otherwise;

fY(y) can be obtained by replacing x with y in fX(x). It is easily verified that μX = μY = 2/5, and

    E(XY) = ∬ xy f(x, y) dx dy = ∫_0^1 ∫_0^{1−x} xy · 24xy dy dx = ∫_0^1 8x²(1 − x)³ dx = 2/15

Thus Cov(X, Y) = 2/15 − (2/5)² = 2/15 − 4/25 = −2/75. A negative covariance is reasonable here because more almonds in the can implies fewer cashews.

Correlation
The correlation coefficient of X and Y, denoted by Corr(X, Y), ρX,Y, or just ρ, is defined by

    ρX,Y = Cov(X, Y) / (σX · σY)

It is the normalized version of Cov(X, Y).

Example 5.17
It is easily verified that in the insurance problem of Example 5.15, σX = 75 and σY = 82.92. This gives

    ρ = 1875 / ((75)(82.92)) = 0.301
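Covariance and correlation for a small discrete table reduce to weighted sums. A sketch (not from the slides) reproducing Examples 5.15 and 5.17:

```python
from math import sqrt

# Joint pmf of Example 5.1 / 5.15.
p = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

mu_x = sum(x * v for (x, _), v in p.items())   # 175
mu_y = sum(y * v for (_, y), v in p.items())   # 125

cov = sum((x - mu_x) * (y - mu_y) * v for (x, y), v in p.items())
var_x = sum((x - mu_x) ** 2 * v for (x, _), v in p.items())
var_y = sum((y - mu_y) ** 2 * v for (_, y), v in p.items())
rho = cov / (sqrt(var_x) * sqrt(var_y))

print(cov)  # 1875.0
print(rho)  # ~0.301
```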
Proposition
1. If a and c are either both positive or both negative, Corr(aX + b, cY + d) = Corr(X, Y).
2. For any two rv's X and Y, −1 ≤ Corr(X, Y) ≤ 1.
3. If X and Y are independent, then ρ = 0; but ρ = 0 does not imply independence.
4. ρ = 1 or −1 iff Y = aX + b for some numbers a and b with a ≠ 0.

Example 5.18
Let X and Y be discrete rv's with joint pmf

    p(x, y) = 1/4 for (x, y) = (−4, 1), (4, −1), (2, 2), (−2, −2), and 0 otherwise.

The value of X is completely determined by the value of Y and vice versa, so the two variables are completely dependent. However, by symmetry μX = μY = 0, and

    E(XY) = (−4)(1)(1/4) + (4)(−1)(1/4) + (2)(2)(1/4) + (−2)(−2)(1/4) = 0

so Cov(X, Y) = E(XY) − μXμY = 0 and thus ρX,Y = 0. Although there is perfect dependence, there is also a complete absence of any linear relationship!

Another example
Let (X, Y) be uniformly distributed on the unit circle:

    f(x, y) = 1/π for x² + y² ≤ 1, and 0 otherwise.

Obviously X and Y are dependent; however, Cov(X, Y) = 0.

Homework (5.2)
Ex. 24, Ex. 26, Ex. 33, Ex. 35

5.3 Statistics and Their Distributions

Example 5.19
Consider a Weibull population with α = 2 and β = 5, for which

    μ = 4.4311,  μ̃ (median) = 4.1628,  σ = 2.316

[Figure: the Weibull density curve f(x) over 0 ≤ x ≤ 15.]

Example 5.19 (Cont')
Six samples of size n = 10 drawn from this population:

    Obs    Sample 1   Sample 2   Sample 3   Sample 4   Sample 5   Sample 6
    1      6.1171     5.07611    3.46710    1.55601    3.12372    8.93795
    2      4.1600     6.79279    2.71938    4.56941    6.09685    3.92487
    3      3.1950     4.43259    5.88129    4.79870    3.41181    8.76202
    4      0.6694     8.55752    5.14915    2.49795    1.65409    7.05569
    5      1.8552     6.82487    4.99635    2.33267    2.29512    2.30932
    6      5.2316     7.39958    5.86887    4.01295    2.12583    5.94195
    7      2.7609     2.14755    6.05918    9.08845    3.20938    6.74166
    8      10.2185    8.50628    1.80119    3.25728    3.23209    1.75486
    9      5.2438     5.49510    4.21994    3.70132    6.84426    4.91827
    10     4.5590     4.04525    2.12934    5.50134    4.20694    7.26081

Example 5.19 (Cont')
Statistics computed from the six samples:

    Sample                1       2       3       4       5       6
    Mean                  4.401   5.928   4.229   4.132   3.620   5.761
    Median                4.360   6.144   4.608   3.857   3.221   6.342
    Standard deviation    2.642   2.062   1.611   2.124   1.678   2.496

[Diagram: from one population, samples 1, 2, …, k are drawn; applying the same function of the sample observations to each sample yields quantities #1, #2, …, #k — the observed values of a statistic.]
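A simulation sketch in the spirit of Example 5.19 (assumes NumPy; numpy's Weibull draw has scale 1, so it is multiplied by β = 5): each sample produces a different mean, median, and standard deviation, scattered around μ = 4.4311, μ̃ = 4.1628, and σ = 2.316.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = 5 * rng.weibull(2, size=(6, 10))   # 6 samples of size 10, alpha=2, beta=5

for i, s in enumerate(samples, 1):
    print(i, round(s.mean(), 3), round(np.median(s), 3), round(s.std(ddof=1), 3))
```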
Statistic
A statistic is any quantity whose value can be calculated from sample data (via a function of the observations). Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result; therefore, a statistic is a random variable. A statistic is denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic. The probability distribution of a statistic is sometimes referred to as its sampling distribution. It describes how the statistic varies in value across all samples that might be selected.

The probability distribution of any particular statistic depends on:
1. The population distribution (e.g., normal, uniform, etc.) and the corresponding parameters.
2. The sample size n (see Examples 5.20 and 5.23).
3. The method of sampling, e.g., sampling with replacement or without replacement.

Example
Consider selecting a sample of size n = 2 from a population consisting of just the three values 1, 5, and 10, and suppose that the statistic of interest is the sample variance. If sampling is done with replacement, then S² = 0 will result if X1 = X2. If sampling is done without replacement, then S² cannot equal 0.

Random sample
The rv's X1, X2, …, Xn are said to form a (simple) random sample of size n if
1. the Xi's are independent rv's, and
2. every Xi has the same probability distribution.
When conditions 1 and 2 are satisfied, we say that the Xi's are independent and identically distributed (i.i.d.).
Note: random sampling is one of the most commonly used sampling methods in practice.

Random sample (Cont')
Sampling with replacement, or from an infinite population, is random sampling. Sampling without replacement from a finite population is generally not random sampling; however, if the sample size n is much smaller than the population size N (n/N ≤ 0.05), it is approximately random sampling.
Note: the virtue of random sampling is that the probability distribution of any statistic can be obtained more easily than for any other sampling method.

Deriving the sampling distribution of a statistic
Method #1: calculations based on probability rules (Examples 5.20 and 5.21).
Method #2: carrying out a simulation experiment (Examples 5.22 and 5.23).

Example 5.20
A large automobile service center charges $40, $45, and $50 for a tune-up of four-, six-, and eight-cylinder cars, respectively. If 20% of its tune-ups are done on four-cylinder cars, 30% on six-cylinder cars, and 50% on eight-cylinder cars, then the probability distribution of revenue from a single randomly selected tune-up is

    x       40    45    50
    p(x)    0.2   0.3   0.5        (μ = 46.5, σ² = 15.25)

Suppose on a particular day only two servicing jobs involve tune-ups. Let X1 = the revenue from the first tune-up and X2 = the revenue from the second; together they constitute a random sample from the above distribution.

Example 5.20 (Cont')
Because the population distribution is known, all outcomes can be enumerated:

    x1    x2    p(x1, x2)    x̄      s²
    40    40    0.04         40      0
    40    45    0.06         42.5    12.5
    40    50    0.10         45      50
    45    40    0.06         42.5    12.5
    45    45    0.09         45      0
    45    50    0.15         47.5    12.5
    50    40    0.10         45      50
    50    45    0.15         47.5    12.5
    50    50    0.25         50      0

This yields the sampling distributions

    x̄          40     42.5   45     47.5   50
    pX̄(x̄)     0.04   0.12   0.29   0.30   0.25     with E(X̄) = 46.5 = μ and V(X̄) = 7.625 = σ²/2

    s²          0      12.5   50
    pS²(s²)    0.38   0.42   0.20                    with E(S²) = 15.25 = σ²

Example 5.20 (Cont')
For n = 4, the sampling distribution of X̄ becomes

    x̄         40      41.25   42.5    43.75   45      46.25   47.5    48.75   50
    pX̄(x̄)    .0016   .0096   .0376   .0936   .1761   .2340   .2350   .1500   .0625

and so on for larger n.
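Method #1's enumeration for Example 5.20 is mechanical enough to script. A sketch (not from the slides) that derives the n = 2 sampling distributions of X̄ and S² exactly:

```python
from itertools import product
from collections import defaultdict

pmf = {40: 0.2, 45: 0.3, 50: 0.5}
mean_dist = defaultdict(float)
var_dist = defaultdict(float)

for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    xbar = (x1 + x2) / 2
    s2 = (x1 - xbar) ** 2 + (x2 - xbar) ** 2   # divisor n - 1 = 1 for n = 2
    mean_dist[xbar] += p1 * p2
    var_dist[s2] += p1 * p2

print(dict(mean_dist))  # {40: .04, 42.5: .12, 45: .29, 47.5: .30, 50: .25}
print(dict(var_dist))   # {0: .38, 12.5: .42, 50: .20}
print(sum(x * p for x, p in mean_dist.items()))  # E(Xbar) = 46.5
print(sum(s * p for s, p in var_dist.items()))   # E(S^2)  = 15.25
```

The same idea with `repeat=4` (and the mean and variance computed over four values) reproduces the n = 4 table above.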
Example 5.21
The time that it takes to serve a customer at the cash register in a minimarket is a random variable having an exponential distribution with parameter λ. Suppose X1 and X2 are service times for two different customers, assumed independent of each other. Consider the total service time To = X1 + X2 for the two customers, also a statistic. What is the pdf of To?

The cdf of To is, for t ≥ 0,

    F_{To}(t) = P(X1 + X2 ≤ t) = ∬_{{(x1,x2): x1 + x2 ≤ t}} f(x1, x2) dx1 dx2
              = ∫_0^t λe^{−λx1} [ ∫_0^{t−x1} λe^{−λx2} dx2 ] dx1
              = ∫_0^t [ λe^{−λx1} − λe^{−λt} ] dx1
              = 1 − e^{−λt} − λt e^{−λt}

Example 5.21 (Cont')
The pdf of To is obtained by differentiating F_{To}(t):

    f_{To}(t) = λ² t e^{−λt} for t > 0, and 0 for t ≤ 0.

This is a gamma pdf (α = 2 and β = 1/λ). The pdf of X̄ = To/2 is obtained from the relation {X̄ ≤ x̄} iff {To ≤ 2x̄} as

    f_{X̄}(x̄) = 4λ² x̄ e^{−2λx̄} for x̄ > 0, and 0 for x̄ ≤ 0.

Simulation experiments
This method is usually used when a derivation via probability rules is too difficult or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer, and the following characteristics of the experiment must be specified:
1. The statistic of interest (e.g., sample mean, S, etc.).
2. The population distribution (normal with μ = 100 and σ = 15, uniform with lower limit A = 5 and upper limit B = 10, etc.).
3. The sample size n (e.g., n = 10 or n = 50).
4. The number of replications k (e.g., k = 500 or 1000); the actual sampling distribution emerges as k → ∞.

Example 5.23
Consider a simulation experiment in which the population distribution is quite skewed: the lifetime distribution of a certain type of electronic control, a lognormal distribution with E(ln(X)) = 3 and SD(ln(X)) = 0.4, so that

    E(X) = μ = 21.7584,  V(X) = σ² = 82.1449

[Figure: the lognormal density curve f(x) over 0 ≤ x ≤ 75.]

Example 5.23 (Cont')
Two features of the simulated sampling distributions of X̄:
1. The center of the sampling distribution remains at the population mean.
2. As n increases, the sampling distribution becomes less skewed ("more normal") and more concentrated ("smaller variance").

Homework (5.3)
Ex. 38, Ex. 41

5.4 The Distribution of the Sample Mean

Proposition
Let X1, X2, …, Xn be a random sample (i.i.d. rv's) from a distribution with mean value μ and standard deviation σ. Then

    E(X̄) = μ_X̄ = μ,   V(X̄) = σ²_X̄ = σ²/n,   σ_X̄ = σ/√n

In addition, with To = X1 + … + Xn (the sample total),

    E(To) = nμ,   V(To) = nσ²,   σ_{To} = √n · σ

(Refer to 5.5 for the proof.)

Example 5.24
In a notched tensile fatigue test on a titanium specimen, the expected number of cycles to first acoustic emission (used to indicate crack initiation) is μ = 28,000, and the standard deviation of the number of cycles is σ = 5000. Let X1, X2, …, X25 be a random sample of size 25, where each Xi is the number of cycles on a different randomly selected specimen. Then

    E(X̄) = μ = 28,000,   E(To) = nμ = 25(28,000) = 700,000

and the standard deviations of X̄ and To are

    σ_X̄ = σ/√n = 5000/√25 = 1000,   σ_{To} = √n · σ = √25 (5000) = 25,000

Proposition
Let X1, X2, …, Xn be a random sample from a normal distribution with mean μ and standard deviation σ. Then for any n, X̄ is normally distributed (with mean μ and standard deviation σ/√n), as is To (with mean nμ and standard deviation √n · σ).
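A quick simulation check of the two propositions above (assumes NumPy), using Example 5.24's parameters: the replicated values of X̄ center near μ = 28,000 with standard deviation near σ/√n = 1000.

```python
import numpy as np

mu, sigma, n, k = 28_000, 5_000, 25, 100_000   # k = number of replications
rng = np.random.default_rng(1)

xbars = rng.normal(mu, sigma, size=(k, n)).mean(axis=1)

print(xbars.mean())        # ~28,000 (= mu)
print(xbars.std(ddof=1))   # ~1,000  (= sigma / sqrt(n))
```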
Example 5.25
The time that it takes a randomly selected rat of a certain subspecies to find its way through a maze is a normally distributed rv with μ = 1.5 min and σ = 0.35 min. Suppose five rats are selected. Let X1, X2, …, X5 denote their times in the maze, assumed to be a random sample from this normal distribution.
Q#1: What is the probability that the total time To = X1 + X2 + … + X5 for the five is between 6 and 8 min?
Q#2: Determine the probability that the sample average time X̄ is at most 2.0 min.

Example 5.25 (Cont')
A#1: To has a normal distribution with μ_{To} = nμ = 5(1.5) = 7.5 min and variance σ²_{To} = nσ² = 5(0.1225) = 0.6125, so σ_{To} = 0.783 min. To standardize To, subtract μ_{To} and divide by σ_{To}:

    P(6 ≤ To ≤ 8) = P( (6 − 7.5)/0.783 ≤ Z ≤ (8 − 7.5)/0.783 )
                  = P(−1.92 ≤ Z ≤ 0.64) = Φ(0.64) − Φ(−1.92) = 0.7115

A#2: E(X̄) = μ = 1.5 and σ_X̄ = σ/√n = 0.35/√5 = 0.1565, so

    P(X̄ ≤ 2.0) = P( Z ≤ (2.0 − 1.5)/0.1565 ) = P(Z ≤ 3.19) = Φ(3.19) = 0.9993

The Central Limit Theorem (CLT)
Let X1, X2, …, Xn be a random sample from a distribution (normal or not) with mean μ and variance σ². Then if n is sufficiently large, X̄ has approximately a normal distribution with

    μ_X̄ = μ,   σ²_X̄ = σ²/n

and To also has approximately a normal distribution with μ_{To} = nμ, σ²_{To} = nσ². The larger the value of n, the better the approximation. Rule of thumb: if n > 30, the Central Limit Theorem can be used.

[Figures: simulated sampling distributions of X̄ for a uniform population and for a triangular population, both approaching normality as n increases.]

Example 5.26
When a batch of a certain chemical product is prepared, the amount of a particular impurity in the batch is a random variable with mean value 4.0 g and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity X̄ is between 3.5 and 3.8 g?
Here n = 50 is large enough for the CLT to be applicable. X̄ then has approximately a normal distribution with mean value μ_X̄ = 4.0 and σ_X̄ = 1.5/√50 = 0.2121, so

    P(3.5 ≤ X̄ ≤ 3.8) = P( (3.5 − 4.0)/0.2121 ≤ Z ≤ (3.8 − 4.0)/0.2121 ) = Φ(−0.94) − Φ(−2.36) = 0.1645

Example 5.27
A certain consumer organization customarily reports the number of major defects for each new automobile that it tests. Suppose the number of such defects for a certain model is a random variable with mean value 3.2 and standard deviation 2.4. Among 100 randomly selected cars of this model, how likely is it that the sample average number of major defects exceeds 4?
Let Xi denote the number of major defects for the ith car in the random sample. Notice that Xi is a discrete rv, but the CLT is applicable whether the variable of interest is discrete or continuous.

Example 5.27 (Cont')
Using μ_X̄ = 3.2 and σ_X̄ = 2.4/√100 = 0.24,

    P(X̄ > 4) ≈ P( Z ≥ (4 − 3.2)/0.24 ) = 1 − Φ(3.33) = 0.0004
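These CLT probabilities are simply standard normal cdf evaluations. A sketch (assumes SciPy's norm for Φ) reproducing Examples 5.26 and 5.27:

```python
from math import sqrt
from scipy.stats import norm

# Example 5.26: mu = 4.0, sigma = 1.5, n = 50.
se = 1.5 / sqrt(50)                                          # ~0.2121
p1 = norm.cdf((3.8 - 4.0) / se) - norm.cdf((3.5 - 4.0) / se)
print(p1)   # ~0.1645

# Example 5.27: mu = 3.2, sigma = 2.4, n = 100.
p2 = 1 - norm.cdf((4 - 3.2) / (2.4 / sqrt(100)))
print(p2)   # ~0.0004
```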
Other applications of the CLT
The CLT can be used to justify the normal approximation to the binomial distribution discussed in Chapter 4. Recall that a binomial variable X is the number of successes in a binomial experiment consisting of n independent success/failure trials with p = P(S) for any particular trial. Define new rv's X1, X2, …, Xn by

    Xi = 1 if the ith trial results in a success, 0 if the ith trial results in a failure (i = 1, …, n)

Because the trials are independent and P(S) is constant from trial to trial, the Xi's are i.i.d. (a random sample from a Bernoulli distribution). The CLT then implies that if n is sufficiently large, both the sum and the average of the Xi's have approximately normal distributions. Now the binomial rv X = X1 + … + Xn, and X/n is the sample mean of the Xi's; that is, both X and X/n are approximately normal when n is large. The necessary sample size for this approximation depends on the value of p: when p is close to .5, the distribution of each Xi is reasonably symmetric, but the distribution is quite skewed when p is near 0 or 1.
Rule: use the approximation when np ≥ 10 and n(1 − p) ≥ 10, rather than n > 30.

Proposition
Let X1, X2, …, Xn be a random sample from a distribution for which only positive values are possible [P(Xi > 0) = 1]. Then if n is sufficiently large, the product Y = X1X2·…·Xn has approximately a lognormal distribution. (Note that ln(Y) = ln(X1) + ln(X2) + … + ln(Xn) is a sum to which the CLT applies.)

Supplement: Law of Large Numbers

Chebyshev's inequality
Let X be a random variable (continuous or discrete). Then for any ε > 0,

    P(|X − E(X)| ≥ ε) ≤ D(X)/ε²

Proof (continuous case, with pdf p(x)): on the set B = {x: |x − E(X)| ≥ ε} we have (x − E(X))²/ε² ≥ 1, so

    P(|X − E(X)| ≥ ε) = ∫_B p(x) dx ≤ ∫_B [ (x − E(X))²/ε² ] p(x) dx ≤ (1/ε²) ∫_{−∞}^{∞} (x − E(X))² p(x) dx = D(X)/ε²

Khintchine law of large numbers
Let X1, X2, … be an infinite sequence of i.i.d. random variables with finite expected value E(Xk) = μ and finite variance D(Xk) = σ². Then for any ε > 0,

    lim_{n→∞} P( |(1/n) Σ_{i=1}^n Xi − μ| < ε ) = 1

Proof: with X̄n = (1/n) Σ Xi, we have E(X̄n) = μ and D(X̄n) = σ²/n. According to Chebyshev's inequality,

    P(|X̄n − μ| ≥ ε) ≤ D(X̄n)/ε² = σ²/(nε²) → 0 as n → ∞

hence P(|X̄n − μ| < ε) → 1.

Bernoulli law of large numbers
The empirical probability of success in a series of Bernoulli trials converges to the theoretical probability. Let Ai be the indicator of the event A on the ith trial:

    Ai = 1 if A occurs, 0 otherwise, with P(Ai = 1) = p and P(Ai = 0) = 1 − p

Let n(A) be the number of trials on which A occurs. Then n(A)/n = (1/n) Σ_{i=1}^n Ai, so

    E(n(A)/n) = p,   D(n(A)/n) = p(1 − p)/n

and according to Chebyshev's inequality

    P(|n(A)/n − p| ≥ ε) ≤ p(1 − p)/(nε²) → 0, i.e., lim_{n→∞} P(|n(A)/n − p| < ε) = 1

[Figure: the relative frequency n(A)/n plotted against the number of experiments performed, settling toward p.]
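A simulation sketch of the Bernoulli law of large numbers (assumes NumPy), mirroring the convergence figure: the running relative frequency n(A)/n drifts toward p.

```python
import numpy as np

p, n = 0.3, 10_000
rng = np.random.default_rng(2)

trials = rng.random(n) < p                       # indicator of A on each trial
rel_freq = np.cumsum(trials) / np.arange(1, n + 1)

for m in (10, 100, 1_000, 10_000):
    print(m, rel_freq[m - 1])                    # approaches p = 0.3
```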
Homework (5.4)
Ex. 48, Ex. 51, Ex. 55, Ex. 56

5.5 The Distribution of a Linear Combination

Linear combination
Given a collection of n random variables X1, …, Xn and n numerical constants a1, …, an, the rv

    Y = a1X1 + … + anXn = Σ_{i=1}^n ai Xi

is called a linear combination of the Xi's.

Proposition
Let X1, X2, …, Xn have mean values μ1, …, μn, respectively, and variances σ1², …, σn², respectively.
1. Whether or not the Xi's are independent,

    E(Σ ai Xi) = Σ ai E(Xi) = Σ ai μi

2. If X1, X2, …, Xn are independent,

    V(Σ ai Xi) = Σ ai² V(Xi) = Σ ai² σi²,   σ_{a1X1 + … + anXn} = √(a1²σ1² + … + an²σn²)

3. For any X1, X2, …, Xn,

    V(Σ ai Xi) = Σ_{i=1}^n Σ_{j=1}^n ai aj Cov(Xi, Xj)

Proof (expected value)
Suppose the Xi's are continuous with joint pdf f(x1, …, xn). Then

    E(Σ ai Xi) = ∫…∫ (Σ ai xi) f(x1, …, xn) dx1…dxn
               = Σ ai ∫…∫ xi f(x1, …, xn) dx1…dxn
               = Σ ai ∫ xi f_{Xi}(xi) dxi = Σ ai E(Xi)

Proof (variance)

    V(Σ ai Xi) = E[ (Σ ai Xi − Σ ai μi)² ] = E[ (Σ ai (Xi − μi))² ]
               = E[ Σ_i Σ_j ai aj (Xi − μi)(Xj − μj) ]
               = Σ_i Σ_j ai aj E[(Xi − μi)(Xj − μj)] = Σ_i Σ_j ai aj Cov(Xi, Xj)

When the Xi's are independent, Cov(Xi, Xj) = 0 for i ≠ j, and the double sum reduces to Σ ai² Cov(Xi, Xi) = Σ ai² V(Xi).

Example 5.28
A gas station sells three grades of gasoline: regular unleaded, extra unleaded, and super unleaded, priced at $1.20, $1.35, and $1.50 per gallon, respectively. Let X1, X2, and X3 denote the amounts of these grades purchased (gallons) on a particular day. Suppose the Xi's are independent with μ1 = 1000, μ2 = 500, μ3 = 300, σ1 = 100, σ2 = 80, and σ3 = 50. The revenue from sales is Y = 1.2X1 + 1.35X2 + 1.5X3. Compute E(Y), V(Y), and σY:

    E(Y) = 1.2μ1 + 1.35μ2 + 1.5μ3 = $2325
    V(Y) = (1.2)²σ1² + (1.35)²σ2² + (1.5)²σ3² = 31,689
    σY = √31,689 = $178.01

Corollary (the difference between two rv's)
E(X1 − X2) = E(X1) − E(X2) and, if X1 and X2 are independent, V(X1 − X2) = V(X1) + V(X2).

Example 5.29
A certain automobile manufacturer equips a particular model with either a six-cylinder engine or a four-cylinder engine. Let X1 and X2 be fuel efficiencies for independently and randomly selected six-cylinder and four-cylinder cars, respectively. With μ1 = 22, μ2 = 26, σ1 = 1.2, and σ2 = 1.5,

    E(X1 − X2) = μ1 − μ2 = 22 − 26 = −4
    V(X1 − X2) = σ1² + σ2² = (1.2)² + (1.5)² = 3.69,   σ_{X1−X2} = √3.69 = 1.92

Proposition
If X1, X2, …, Xn are independent, normally distributed rv's (with possibly different means and/or variances), then any linear combination of the Xi's also has a normal distribution.

Example 5.30 (Ex. 5.28 Cont')
The total revenue from the sale of the three grades of gasoline on a particular day was Y = 1.2X1 + 1.35X2 + 1.5X3, with μY = 2325 and σY = 178.01. If the Xi's are normally distributed, the probability that the revenue exceeds 2500 is

    P(Y > 2500) = P( Z > (2500 − 2325)/178.01 ) = P(Z > 0.98) = 1 − Φ(0.98) = 0.1635

Homework (5.5)
Ex. 58, Ex. 70, Ex. 73
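A closing Monte Carlo sanity check of Examples 5.28 and 5.30 (assumes NumPy; the normality of the Xi's is the same assumption Example 5.30 makes):

```python
import numpy as np

rng = np.random.default_rng(3)
k = 1_000_000                          # number of simulated days

x1 = rng.normal(1000, 100, k)
x2 = rng.normal(500, 80, k)
x3 = rng.normal(300, 50, k)
y = 1.2 * x1 + 1.35 * x2 + 1.5 * x3    # daily revenue

print(y.mean())           # ~2325    (E(Y))
print(y.std(ddof=1))      # ~178.01  (sigma_Y)
print((y > 2500).mean())  # ~0.1635  (P(Y > 2500))
```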