Ch. 4 Review of Basic Probability and Statistics

4.1 Introduction

Probability and statistics are needed throughout a simulation study, to:
• model a probabilistic system
• choose the input probability distributions
• generate random samples from the input distributions
• validate the simulation model
• design the simulation experiments
• perform statistical analyses of the simulation output data

4.2 Random variables and their properties

• An experiment is a process whose outcome is not known with certainty.
• The sample space $S$ is the set of all possible outcomes of an experiment.
• The sample points are the outcomes themselves.
• A random variable ($X$, $Y$, $Z$) is a function that assigns a real number to each point in the sample space $S$; its realized values are denoted $x$, $y$, $z$.

Examples:
• flipping a coin: $S = \{H, T\}$
• tossing a die: $S = \{1, 2, \ldots, 6\}$
• flipping two coins: $S = \{(H,H), (H,T), (T,H), (T,T)\}$, with $X$ the number of heads that occur
• rolling a pair of dice: $S = \{(1,1), (1,2), \ldots, (6,6)\}$, with $X$ the sum of the two dice

Distribution (cumulative distribution) function

$F(x) = P(X \le x)$ for $-\infty < x < \infty$

where $P(X \le x)$ is the probability associated with the event $\{X \le x\}$.

Properties:
1. $0 \le F(x) \le 1$ for all $x$.
2. $F(x)$ is nondecreasing [i.e., if $x_1 < x_2$, then $F(x_1) \le F(x_2)$].
3. $\lim_{x \to \infty} F(x) = 1$ and $\lim_{x \to -\infty} F(x) = 0$.

Discrete random variables

A random variable $X$ is said to be discrete if it can take on at most a countable number of values $x_1, x_2, \ldots$. The probability that $X$ takes on the value $x_i$ is

$p(x_i) = P(X = x_i)$ for $i = 1, 2, \ldots$, with $\sum_{i=1}^{\infty} p(x_i) = 1$

$p(x)$ is called the probability mass function. For an interval $I = [a, b]$,

$P(X \in I) = \sum_{a \le x_i \le b} p(x_i)$

$F(x) = \sum_{x_i \le x} p(x_i)$ for all $x$

Example: for the demand-size random variable $X$, which takes on the values 1, 2, 3, 4 with probabilities 1/6, 1/3, 1/3, 1/6,

$P(2 \le X \le 3) = p(2) + p(3) = \tfrac{1}{3} + \tfrac{1}{3} = \tfrac{2}{3}$

Figures: $p(x)$ and $F(x)$ for the demand-size random variable $X$.

Continuous random variables

A random variable $X$ is said to be continuous if there exists a nonnegative function $f(x)$ such that for any set of real numbers $B$,

$P(X \in B) = \int_B f(x)\,dx$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$

$f(x)$ is called the probability density function. Then

$P(X = x) = P(X \in [x, x]) = \int_x^x f(y)\,dy = 0$

$P(X \in [x, x + \Delta x]) = \int_x^{x + \Delta x} f(y)\,dy \approx f(x)\,\Delta x$ for small $\Delta x$

$F(x) = P(X \in (-\infty, x]) = \int_{-\infty}^x f(y)\,dy$ for all $x$, so that $f(x) = F'(x)$

$P(X \in I) = \int_a^b f(y)\,dy = F(b) - F(a)$ for $I = [a, b]$

Figure: interpretation of the probability density function, comparing $P(X \in [x, x + \Delta x])$ with $P(X \in [x', x' + \Delta x])$.

Uniform random variable on the interval [0, 1]:

$f(x) = 1$ if $0 \le x \le 1$, and $f(x) = 0$ otherwise

If $0 \le x \le 1$, then

$F(x) = \int_0^x f(y)\,dy = \int_0^x 1\,dy = x$

Figures: $f(x)$ and $F(x)$ for a uniform random variable on [0, 1].

$P(X \in [x, x + \Delta x]) = \int_x^{x + \Delta x} f(y)\,dy = F(x + \Delta x) - F(x) = x + \Delta x - x = \Delta x$, where $0 \le x < x + \Delta x \le 1$

Exponential random variable:

Figures: $f(x)$ and $F(x)$ for an exponential random variable with mean $\beta$.

Joint probability mass function

If $X$ and $Y$ are discrete random variables, then let

$p(x, y) = P(X = x, Y = y)$ for all $x, y$

where $p(x, y)$ is called the joint probability mass function of $X$ and $Y$. $X$ and $Y$ are independent if

$p(x, y) = p_X(x)\,p_Y(y)$ for all $x, y$

where

$p_X(x) = \sum_{\text{all } y} p(x, y)$ and $p_Y(y) = \sum_{\text{all } x} p(x, y)$

are the (marginal) probability mass functions of $X$ and $Y$.

Example 4.9: Suppose that $X$ and $Y$ are jointly discrete random variables with

$p(x, y) = \dfrac{xy}{27}$ for $x = 1, 2$ and $y = 2, 3, 4$, and $p(x, y) = 0$ otherwise

Then

$p_X(x) = \sum_{y=2}^{4} \dfrac{xy}{27} = \dfrac{x}{3}$ for $x = 1, 2$

$p_Y(y) = \sum_{x=1}^{2} \dfrac{xy}{27} = \dfrac{y}{9}$ for $y = 2, 3, 4$

Since $p(x, y) = \dfrac{xy}{27} = p_X(x)\,p_Y(y)$ for all $x, y$, the random variables $X$ and $Y$ are independent.
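Example 4.9 is easy to verify mechanically. Here is a minimal sketch in Python using exact rational arithmetic (the container and variable names are my own choices, not from the text):

```python
from fractions import Fraction

# Joint pmf of Example 4.9: p(x, y) = xy/27 for x in {1, 2}, y in {2, 3, 4}.
xs, ys = [1, 2], [2, 3, 4]
p = {(x, y): Fraction(x * y, 27) for x in xs for y in ys}

# Marginal pmfs: p_X(x) = sum over y of p(x, y); p_Y(y) = sum over x.
p_X = {x: sum(p[x, y] for y in ys) for x in xs}   # works out to x/3
p_Y = {y: sum(p[x, y] for x in xs) for y in ys}   # works out to y/9

assert sum(p.values()) == 1                        # a pmf must sum to 1
# X and Y are independent iff p(x, y) = p_X(x) * p_Y(y) for every pair.
assert all(p[x, y] == p_X[x] * p_Y[y] for x in xs for y in ys)
print(p_X)  # {1: Fraction(1, 3), 2: Fraction(2, 3)}
print(p_Y)  # {2: Fraction(2, 9), 3: Fraction(1, 3), 4: Fraction(4, 9)}
```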
Joint probability density function

The random variables $X$ and $Y$ are jointly continuous if there exists a nonnegative function $f(x, y)$ such that for all sets of real numbers $A$ and $B$,

$P(X \in A,\, Y \in B) = \int_B \int_A f(x, y)\,dx\,dy$

$X$ and $Y$ are independent if $f(x, y) = f_X(x)\,f_Y(y)$ for all $x, y$, where

$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$

are the (marginal) probability density functions of $X$ and $Y$, respectively.

Example 4.11: Suppose that $X$ and $Y$ are jointly continuous random variables with

$f(x, y) = 24xy$ for $x \ge 0$, $y \ge 0$, and $x + y \le 1$, and $f(x, y) = 0$ otherwise

Then

$f_X(x) = \int_0^{1-x} 24xy\,dy = 12xy^2 \big|_0^{1-x} = 12x(1 - x)^2$ for $0 \le x \le 1$

$f_Y(y) = \int_0^{1-y} 24xy\,dx = 12yx^2 \big|_0^{1-y} = 12y(1 - y)^2$ for $0 \le y \le 1$

Since $f\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 6 \ne \left(\tfrac{3}{2}\right)^2 = f_X\left(\tfrac{1}{2}\right) f_Y\left(\tfrac{1}{2}\right)$, $X$ and $Y$ are not independent.

Mean or expected value

$\mu_i = E(X_i) = \sum_{j=1}^{\infty} x_j\,p_{X_i}(x_j)$ if $X_i$ is discrete

$\mu_i = E(X_i) = \int_{-\infty}^{\infty} x\,f_{X_i}(x)\,dx$ if $X_i$ is continuous

The mean is one measure of central tendency, in the sense that it is the center of gravity of the distribution.

Examples 4.12-4.13:

For the demand-size random variable, the mean is

$1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{3} + 3 \cdot \tfrac{1}{3} + 4 \cdot \tfrac{1}{6} = \tfrac{5}{2}$

For the uniform random variable on [0, 1], the mean is

$\int_0^1 x\,f(x)\,dx = \int_0^1 x\,dx = \tfrac{1}{2}$

Properties of means:
1. $E(cX) = c\,E(X)$
2. $E\left(\sum_{i=1}^{n} c_i X_i\right) = \sum_{i=1}^{n} c_i\,E(X_i)$, even if the $X_i$'s are dependent.

Median

The median $x_{0.5}$ of the random variable $X_i$ is defined to be the smallest value of $x$ such that $F_{X_i}(x) \ge 0.5$.

Figure: the median $x_{0.5}$ for a continuous random variable; the area under $f_{X_i}(x)$ to the left of $x_{0.5}$ is 0.5.

Example 4.14:
1. Consider a discrete random variable $X$ that takes on each of the values 1, 2, 3, 4, and 5 with probability 0.2. Clearly, the mean and the median of $X$ are both 3.
2. Now consider a random variable $Y$ that takes on each of the values 1, 2, 3, 4, and 100 with probability 0.2. The mean and the median of $Y$ are 22 and 3, respectively. Note that the median is insensitive to this change in the distribution; the median may therefore be a better measure of central tendency than the mean.

Variance

$\mathrm{Var}(X_i) = \sigma_i^2 = E[(X_i - \mu_i)^2] = E(X_i^2) - \mu_i^2 = E(X_i^2) - [E(X_i)]^2$

For the demand-size random variable,

$E(X^2) = 1^2 \cdot \tfrac{1}{6} + 2^2 \cdot \tfrac{1}{3} + 3^2 \cdot \tfrac{1}{3} + 4^2 \cdot \tfrac{1}{6} = \tfrac{43}{6}$

$\mathrm{Var}(X) = E(X^2) - \mu^2 = \tfrac{43}{6} - \left(\tfrac{5}{2}\right)^2 = \tfrac{11}{12}$

For the uniform random variable on [0, 1],

$E(X^2) = \int_0^1 x^2 f(x)\,dx = \int_0^1 x^2\,dx = \tfrac{1}{3}$

$\mathrm{Var}(X) = E(X^2) - \mu^2 = \tfrac{1}{3} - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{12}$

Figure: density functions for continuous random variables with large and small variances.

Properties of the variance:
1. $\mathrm{Var}(X) \ge 0$
2. $\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$
3. $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i)$ if the $X_i$'s are independent (or uncorrelated).

Standard deviation

$\sigma_i = \sqrt{\sigma_i^2}$. For a normal random variable, the probability that $X_i$ is between $\mu_i - 1.96\sigma_i$ and $\mu_i + 1.96\sigma_i$ is 0.95.

Covariance

$\mathrm{Cov}(X_i, X_j) = C_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)] = E(X_i X_j) - \mu_i \mu_j$

The covariance between the random variables $X_i$ and $X_j$ is a measure of their dependence. Note that $C_{ij} = C_{ji}$, and $C_{ij} = \sigma_i^2$ if $i = j$.

Example 4.17: For the jointly continuous random variables $X$ and $Y$ in Example 4.11,

$E(XY) = \int_0^1 \int_0^{1-x} xy\,f(x, y)\,dy\,dx = \int_0^1 \int_0^{1-x} 24x^2 y^2\,dy\,dx = \int_0^1 8x^2 (1 - x)^3\,dx = \tfrac{2}{15}$

$E(X) = \int_0^1 x\,f_X(x)\,dx = \int_0^1 12x^2 (1 - x)^2\,dx = \tfrac{2}{5}$

$E(Y) = \int_0^1 y\,f_Y(y)\,dy = \int_0^1 12y^2 (1 - y)^2\,dy = \tfrac{2}{5}$

$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = \tfrac{2}{15} - \tfrac{2}{5} \cdot \tfrac{2}{5} = -\tfrac{2}{75}$

If $X_i$ and $X_j$ are independent random variables, then $C_{ij} = 0$ and $X_i$ and $X_j$ are uncorrelated. In general, the converse is not true.

Correlated random variables:

If $C_{ij} > 0$, then $X_i$ and $X_j$ are said to be positively correlated: $X_i > \mu_i$ and $X_j > \mu_j$ tend to occur together, and $X_i < \mu_i$ and $X_j < \mu_j$ tend to occur together.

If $C_{ij} < 0$, then $X_i$ and $X_j$ are said to be negatively correlated: $X_i > \mu_i$ and $X_j < \mu_j$ tend to occur together, and $X_i < \mu_i$ and $X_j > \mu_j$ tend to occur together.

Correlation

$\rho_{ij} = \dfrac{C_{ij}}{\sqrt{\sigma_i^2\,\sigma_j^2}}$ for $i, j = 1, 2, \ldots, n$, with $-1 \le \rho_{ij} \le 1$

If $\rho_{ij}$ is close to +1, then $X_i$ and $X_j$ are highly positively correlated; if $\rho_{ij}$ is close to -1, then $X_i$ and $X_j$ are highly negatively correlated.
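Covariances like this are exactly the kind of quantity a simulation estimates, so a Monte Carlo check of Example 4.17 may be instructive. The sketch below samples from $f(x, y) = 24xy$ by rejection; the acceptance rule, sample size, and seed are my illustrative choices, and the exact values the estimates should approach are derived in the text immediately below.

```python
import math
import random

# Monte Carlo check of Example 4.17. Draw (X, Y) with density f(x, y) = 24xy on
# x, y >= 0, x + y <= 1 by rejection: propose uniformly on the unit square and
# accept w.p. 4xy, which is proportional to f and at most 1 on the triangle
# (the product xy peaks there at 1/4).
def sample_xy(rng):
    while True:
        x, y = rng.random(), rng.random()
        if x + y <= 1.0 and rng.random() <= 4.0 * x * y:
            return x, y

rng = random.Random(42)
n = 100_000
pts = [sample_xy(rng) for _ in range(n)]

ex = sum(x for x, _ in pts) / n                # E(X)  = 2/5
ey = sum(y for _, y in pts) / n                # E(Y)  = 2/5
exy = sum(x * y for x, y in pts) / n           # E(XY) = 2/15
vx = sum((x - ex) ** 2 for x, _ in pts) / n    # Var(X) = 1/25
vy = sum((y - ey) ** 2 for _, y in pts) / n    # Var(Y) = 1/25
cov = exy - ex * ey                            # Cov(X, Y) = -2/75 ~ -0.0267
print(cov, cov / math.sqrt(vx * vy))           # covariance and correlation ~ -2/3
```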
For the random variables $X$ and $Y$ in Example 4.11,

$\mathrm{Var}(X) = \mathrm{Var}(Y) = \tfrac{1}{25}$

so that

$\mathrm{Cor}(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \dfrac{-2/75}{1/25} = -\tfrac{2}{3}$

4.3 Simulation output data and stochastic processes

• A stochastic process is a collection of "similar" random variables ordered over time, which are all defined on a common sample space.
• The state space is the set of all possible values that these random variables can take on.
• Discrete-time stochastic process: $X_1, X_2, \ldots$
• Continuous-time stochastic process: $\{X(t), t \ge 0\}$

Example 4.19: Consider an M/M/1 queue with IID interarrival times $A_1, A_2, \ldots$, IID service times $S_1, S_2, \ldots$, and FIFO service. Define the discrete-time stochastic process of delays in queue $D_1, D_2, \ldots$ by

$D_1 = 0$ and $D_{i+1} = \max\{D_i + S_i - A_{i+1},\, 0\}$ for $i = 1, 2, \ldots$

The interarrival and service times are the input random variables, and the delays are the output stochastic process of the simulation; its state space is the set of nonnegative real numbers. $D_i$ and $D_{i+1}$ are positively correlated.

Example 4.20: For the queueing system of Example 4.19, let $Q(t)$ be the number of customers in the queue at time $t$. Then $\{Q(t), t \ge 0\}$ is a continuous-time stochastic process with state space $\{0, 1, 2, \ldots\}$.

Covariance-stationary processes

Assumptions about the stochastic process are necessary to draw inferences in practice. A discrete-time stochastic process $X_1, X_2, \ldots$ is said to be covariance-stationary if

$\mu_i = \mu$ for $i = 1, 2, \ldots$, where $-\infty < \mu < \infty$;
$\sigma_i^2 = \sigma^2$ for $i = 1, 2, \ldots$, where $\sigma^2 < \infty$; and
$C_{i,i+j} = \mathrm{Cov}(X_i, X_{i+j})$ is independent of $i$ for $j = 1, 2, \ldots$

For a covariance-stationary process, the mean and variance are stationary over time, and the covariance between two observations $X_i$ and $X_{i+j}$ depends only on the separation (lag) $j$ and not on the actual time values $i$ and $i + j$. We denote the covariance and correlation between $X_i$ and $X_{i+j}$ by $C_j$ and $\rho_j$, respectively, where

$\rho_j = \dfrac{C_{i,i+j}}{\sqrt{\sigma_i^2\,\sigma_{i+j}^2}} = \dfrac{C_j}{\sigma^2} = \dfrac{C_j}{C_0}$ for $j = 0, 1, 2, \ldots$

Example 4.22: Consider the output process of delays $D_1, D_2, \ldots$ for a covariance-stationary M/M/1 queue with utilization $\rho = \lambda/\omega < 1$.

Warmup period

In general, output processes for queueing systems are positively correlated. If $X_1, X_2, \ldots$ is a stochastic process beginning at time 0 in a simulation, then it is quite likely not to be covariance-stationary. However, for some simulations $X_{k+1}, X_{k+2}, \ldots$ will be approximately covariance-stationary if $k$ is large enough, where $k$ is the length of the warmup period.

4.4 Estimation of means, variances, and correlations

Suppose that $X_1, X_2, \ldots, X_n$ are IID random variables with finite population mean $\mu$ and finite population variance $\sigma^2$. Two unbiased estimators:

Sample mean: $\bar{X}(n) = \dfrac{\sum_{i=1}^{n} X_i}{n}$, with $E[\bar{X}(n)] = \mu$

Sample variance: $S^2(n) = \dfrac{\sum_{i=1}^{n} [X_i - \bar{X}(n)]^2}{n - 1}$, with $E[S^2(n)] = \sigma^2$

How close is $\bar{X}(n)$ to $\mu$? Different replications produce different observations of $\bar{X}(n)$, so to assess this closeness we construct a confidence interval.

Figure: density function for $\bar{X}(n)$, showing a first and a second observation of $\bar{X}(n)$ on either side of $\mu$.

$\mathrm{Var}[\bar{X}(n)] = \mathrm{Var}\left(\dfrac{1}{n}\sum_{i=1}^{n} X_i\right) = \dfrac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)$ (because the $X_i$'s are independent) $= \dfrac{n\sigma^2}{n^2} = \dfrac{\sigma^2}{n}$

An unbiased estimator of $\mathrm{Var}[\bar{X}(n)]$ is

$\dfrac{S^2(n)}{n} = \dfrac{\sum_{i=1}^{n}[X_i - \bar{X}(n)]^2}{n(n - 1)}$

Figure: distributions of $\bar{X}(n)$ for small and large $n$; the density is much more concentrated about $\mu$ when $n$ is large.

Estimating $\mathrm{Var}[\bar{X}(n)]$ by $S^2(n)/n$ requires the $X_i$'s to be independent, or at least uncorrelated ($\rho_j = 0$). However, simulation output data are almost always correlated, as the sketch below illustrates.
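As one concrete illustration (my own, not from the text), the sketch below generates the delay process of Example 4.19 with arrival rate $\lambda = 0.9$ and service rate $\omega = 1.0$, run length and seed chosen arbitrarily, and then estimates the lag-1 correlation using the estimator $\hat{\rho}_j = \hat{C}_j / S^2(n)$ defined later in this section. The estimate comes out strongly positive.

```python
import random

# Delay process of Example 4.19: D_1 = 0, D_{i+1} = max(D_i + S_i - A_{i+1}, 0),
# with exponential interarrival times (rate lam) and service times (rate omega).
def mm1_delays(n, lam=0.9, omega=1.0, seed=1):
    rng = random.Random(seed)
    d = [0.0]                              # D_1 = 0
    for _ in range(n - 1):
        s = rng.expovariate(omega)         # service time S_i
        a = rng.expovariate(lam)           # interarrival time A_{i+1}
        d.append(max(d[-1] + s - a, 0.0))  # D_{i+1} = max(D_i + S_i - A_{i+1}, 0)
    return d

# rho-hat_j = C-hat_j / S^2(n), the lag-j correlation estimator of Section 4.4.
def rho_hat(x, j):
    n, xbar = len(x), sum(x) / len(x)
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # S^2(n)
    cj = sum((x[i] - xbar) * (x[i + j] - xbar)
             for i in range(n - j)) / (n - j)          # C-hat_j
    return cj / s2

delays = mm1_delays(100_000)
print(rho_hat(delays, 1))  # strongly positive: successive delays are correlated
```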
Suppose now that $X_1, X_2, \ldots, X_n$ are from a covariance-stationary stochastic process. Then $\bar{X}(n)$ is still an unbiased estimator of $\mu$; however, $S^2(n)$ is no longer an unbiased estimator of $\sigma^2$, since

$E[S^2(n)] = \sigma^2\left[1 - \dfrac{2\sum_{j=1}^{n-1}(1 - j/n)\rho_j}{n - 1}\right] \qquad (1)$

so that $E[S^2(n)] < \sigma^2$ whenever $\rho_j > 0$. Moreover, for a covariance-stationary process,

$\mathrm{Var}[\bar{X}(n)] = \dfrac{\sigma^2}{n}\left[1 + 2\sum_{j=1}^{n-1}(1 - j/n)\rho_j\right] \qquad (2)$

If one estimates $\mathrm{Var}[\bar{X}(n)]$ from $S^2(n)/n$ (correct in the IID case), there are two sources of error:
• the bias in $S^2(n)$ as an estimator of $\sigma^2$, and
• the neglect of the correlation terms in Eq. (2).

Solution: combining Eq. (1) and Eq. (2) gives

$E\left[\dfrac{S^2(n)}{n}\right] = \dfrac{[n/a(n)] - 1}{n - 1}\,\mathrm{Var}[\bar{X}(n)] \qquad (3)$

where $a(n) = 1 + 2\sum_{j=1}^{n-1}(1 - j/n)\rho_j$. If $\rho_j > 0$, then $a(n) > 1$ and $E[S^2(n)/n] < \mathrm{Var}[\bar{X}(n)]$.

Example 4.24: Consider $D_1, D_2, \ldots, D_{10}$ from the process of delays for a covariance-stationary M/M/1 queue with $\rho = 0.9$. From Eqs. (1) and (3),

$E[S^2(10)] = 0.0328\sigma^2$ and $E\left[\dfrac{S^2(10)}{10}\right] = 0.0034\,\mathrm{Var}[\bar{D}(10)]$

where $S^2(10) = \sum_{i=1}^{10}[D_i - \bar{D}(10)]^2 / 9$ and $\sigma^2 = \mathrm{Var}(D_i)$. Thus, $S^2(10)/10$ is a gross underestimate of $\mathrm{Var}[\bar{D}(10)]$, and we are likely to be overly optimistic about the closeness of $\bar{D}(10)$ to $\mu = E(D_i)$.

Estimating the correlations: $\rho_j$ can be estimated by

$\hat{\rho}_j = \dfrac{\hat{C}_j}{S^2(n)}$, where $\hat{C}_j = \dfrac{\sum_{i=1}^{n-j}[X_i - \bar{X}(n)][X_{i+j} - \bar{X}(n)]}{n - j}$

In general, "good" estimates of the $\rho_j$'s will be difficult to obtain unless $n$ is very large and $j$ is small relative to $n$.

4.5.1 Confidence intervals for the mean

Let

$Z_n = \dfrac{\bar{X}(n) - \mu}{\sqrt{\sigma^2/n}}$ and $F_n(z) = P(Z_n \le z)$

Central limit theorem: $F_n(z) \to \Phi(z)$ as $n \to \infty$, where $\Phi(z)$, the distribution function of a normal random variable with $\mu = 0$ and $\sigma^2 = 1$, is given by

$\Phi(z) = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-y^2/2}\,dy$ for $-\infty < z < \infty$

If $n$ is "sufficiently large", the random variable $Z_n$ will be approximately distributed as a standard normal random variable, regardless of the underlying distribution of the $X_i$'s. Equivalently, for large $n$ the sample mean $\bar{X}(n)$ is approximately distributed as a normal random variable with mean $\mu$ and variance $\sigma^2/n$.

Since $\sigma^2$ is unknown, it is replaced by $S^2(n)$, giving the statistic

$t_n = \dfrac{\bar{X}(n) - \mu}{\sqrt{S^2(n)/n}}$

For large $n$,

$P\left(-z_{1-\alpha/2} \le \dfrac{\bar{X}(n) - \mu}{\sqrt{S^2(n)/n}} \le z_{1-\alpha/2}\right) = P\left(\bar{X}(n) - z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}} \le \mu \le \bar{X}(n) + z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}\right) \approx 1 - \alpha$

where $z_{1-\alpha/2}$ ($0 < \alpha < 1$) is the upper $1 - \alpha/2$ critical point for a standard normal random variable.

Figure: standard normal density $f(x)$ with shaded area $1 - \alpha$ between $-z_{1-\alpha/2}$ and $z_{1-\alpha/2}$.

If $n$ is sufficiently large, an approximate $100(1 - \alpha)$ percent confidence interval for $\mu$ is given by

$\bar{X}(n) \pm z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}$

Interpretation I: If one constructs a very large number of independent $100(1 - \alpha)$ percent confidence intervals, each based on $n$ observations, where $n$ is sufficiently large, the proportion of these confidence intervals that contain (cover) $\mu$ should be $1 - \alpha$.

Interpretation II: If the $X_i$'s are normal random variables, the random variable $t_n = [\bar{X}(n) - \mu]/\sqrt{S^2(n)/n}$ has a $t$ distribution with $n - 1$ degrees of freedom (df), and an exact (for any $n \ge 2$) $100(1 - \alpha)$ percent confidence interval for $\mu$ is given by

$\bar{X}(n) \pm t_{n-1,1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}$

where $t_{n-1,1-\alpha/2}$ is the upper $1 - \alpha/2$ critical point for the $t$ distribution with $n - 1$ df.

Figure 4.16: density functions for the $t$ distribution with 4 df and for the standard normal distribution.

Example 4.26: Suppose the 10 observations 1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09 are from a normal distribution. To construct a 90 percent confidence interval for $\mu$: $\bar{X}(10) = 1.34$, $S^2(10) = 0.17$, and

$\bar{X}(10) \pm t_{9,0.95}\sqrt{\dfrac{S^2(10)}{10}} = 1.34 \pm 1.83\sqrt{\dfrac{0.17}{10}} = 1.34 \pm 0.24$

Table 4.1 Estimated coverages of nominal 90 percent confidence intervals, based on 500 experiments

Distribution      Skewness $\nu$   n=5     n=10    n=20    n=40
Normal            0.00             0.910   0.902   0.898   0.900
Exponential       2.00             0.854   0.878   0.870   0.890
Chi-square        2.83             0.810   0.830   0.848   0.890
Lognormal         6.18             0.758   0.768   0.842   0.852
Hyperexponential  6.43             0.584   0.586   0.682   0.774

where the skewness is $\nu = E[(X - \mu)^3]/(\sigma^2)^{3/2}$.

4.5.2 Hypothesis tests for the mean

To test the null hypothesis $H_0$: $\mu = \mu_0$. If $|\bar{X}(n) - \mu_0|$ is large, $H_0$ is not likely to be true. If $H_0$ is true (and the $X_i$'s are normal), the statistic

$t_n = \dfrac{\bar{X}(n) - \mu_0}{\sqrt{S^2(n)/n}}$

has a $t$ distribution with $n - 1$ df. The two-tailed test at level $\alpha$: if $|t_n| > t_{n-1,1-\alpha/2}$, reject $H_0$; otherwise, "accept" $H_0$.

Example 4.27: For the data of Example 4.26, to test the null hypothesis $H_0$: $\mu = 1$ at level $\alpha = 0.10$,

$t_{10} = \dfrac{\bar{X}(10) - 1}{\sqrt{S^2(10)/10}} = \dfrac{0.34}{\sqrt{0.17/10}} = 2.65 > 1.83 = t_{9,0.95}$

so we reject $H_0$.
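The computations of Examples 4.26 and 4.27 can be reproduced in a few lines. In the sketch below (variable names are mine), the critical point $t_{9,0.95} = 1.833$ is read from a $t$ table rather than computed.

```python
import math

# Examples 4.26 and 4.27 by hand: 90% t confidence interval and two-tailed
# t test of H0: mu = 1 at level alpha = 0.10, for 10 normal observations.
data = [1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09]
n = len(data)
xbar = sum(data) / n                               # X-bar(10) ~= 1.34
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # S^2(10)   ~= 0.17
t_crit = 1.833                                     # t_{9,0.95}, from a t table

half = t_crit * math.sqrt(s2 / n)
print(f"90% CI for mu: {xbar:.2f} +/- {half:.2f}") # 1.34 +/- 0.24

t10 = (xbar - 1.0) / math.sqrt(s2 / n)             # ~= 2.65 > 1.833
print("reject H0" if abs(t10) > t_crit else '"accept" H0')
```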
4.6 The strong law of large numbers

Theorem 4.2: $\bar{X}(n) \to \mu$ w.p. 1 as $n \to \infty$.

Example 4.29: figure illustrating the convergence of $\bar{X}(n)$ to $\mu$ as $n$ grows.

Homework II: Probability and statistics theory

A.M. Law and W.D. Kelton, Simulation Modeling and Analysis, 3rd edition, pp. 261-263: Problems 4.1, 4.2, 4.4, 4.7, 4.9, 4.10, 4.13, 4.20, 4.21, 4.23, 4.24, 4.25, 4.26.