ENM 307 - SIMULATION
PROBABILITY AND STATISTICS REVIEW

Question 1
• The probability that the Red River will flood in any given year has been estimated from 200 years of historical data to be one in four. This means:
a) The Red River will flood every four years
b) In the next 100 years, the Red River will flood exactly 25 times
c) In the last 100 years, the Red River flooded exactly 25 times
d) In the next 100 years, the Red River will flood about 25 times
e) In the next 100 years, it is very likely that the Red River will flood exactly 25 times

Question 2
• A random variable X has the following probability distribution:

  x                    0     1     2     3
  P(X = x) = pX(x)     2k    3k    13k   2k

Then the probability that X is less than 2, i.e., P(X < 2), is equal to:
a) 0.90
b) 0.25
c) 0.65
d) 0.15
e) 1.00

Question 3
• Cans of soft drink cost $0.30 in a certain vending machine. What are the expected value and the variance of the daily revenue Y from the machine, if X, the number of cans sold per day, has E(X) = 125 and Var(X) = 50?
a) E(Y)=37.5, Var(Y)=50
b) E(Y)=37.5, Var(Y)=4.5
c) E(Y)=37.5, Var(Y)=15
d) E(Y)=125, Var(Y)=4.5
e) E(Y)=125, Var(Y)=15

Question 4 - i
• The average length of stay in a hospital is useful for planning purposes. Suppose that the following is the distribution of the length of stay in a hospital after a certain operation:

  Days           2      3      4      5      6
  Probability    0.05   0.20   0.40   0.20   ?

• What is the probability that the length of stay is 6?
a) 0.15
b) 0.17
c) 0.20
d) 0.25
e) 0.05

Question 4 - ii
• The average length of stay in a hospital is useful for planning purposes. Suppose that the following is the distribution of the length of stay in a hospital after a certain operation:

  Days           2      3      4      5      6
  Probability    0.05   0.20   0.40   0.20   0.15

• The average length of stay is:
a) 0.15
b) 0.17
c) 3.3
d) 4.0
e) 4.2

Question 5
• Many professional schools require applicants to take a standardized test. Suppose that 1000 students take the test, and you find that your mark of 63 (out of 100) was the 73rd percentile.
This means:
a) At least 73% of the students got 63 or better
b) At least 270 students got 73 or better
c) At least 270 students got 63 or better
d) At least 27% of the students got 73 or worse
e) At least 730 students got 73 or better

Reminder:
Type-I error: Reject H0 when H0 is true
Type-II error: Fail to reject H0 when H0 is not true

Question 6
• To determine the reliability of experts used in interpreting the results of polygraph examinations in criminal investigations, 280 cases were studied. The results were:

                              True status
  Examiner's decision    Innocent    Guilty
  Innocent                  131        15
  Guilty                      9       125

If the hypotheses were H: Suspect is innocent versus A: Suspect is guilty, then we could estimate the probability of making a Type-II error as:
a) 15/280
b) 9/280
c) 15/140
d) 9/140
e) 15/146

Question 7
• In a statistical test of hypothesis, what happens to the rejection region when α, the probability of Type-I error (the level of significance), is reduced?
a) The answer depends on the value of β (the probability of Type-II error)
b) The rejection region is reduced in size
c) The rejection region is increased in size
d) The rejection region is not changed
e) The answer depends on the form of the alternative hypothesis

Question 8
• Which of the following is not correct?
a) The probability of Type-I error is controlled by the selection of the significance level, α.
b) The probability of Type-II error is controlled only by the sample size.
c) The power of a test depends on the sample size and the distance between the null and alternative hypotheses.
d) The p-value measures the probability that the null hypothesis is true when H0 is rejected by the current data.
e) The rejection region is controlled by the α level and the alternative hypothesis.
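The Type I error reminder above can be checked by simulation: when H0 is true, a level-α test should reject about α of the time. A minimal sketch in Python (standard library only); the sample size n = 10, α = 0.05, the critical value 2.262 = t(9, 0.975), and the trial count are illustrative choices, not taken from the questions:

```python
# Monte Carlo estimate of the Type-I error rate of a two-sided t test.
# Data are generated WITH H0 true (mu = 10), so the long-run rejection
# rate should be close to alpha = 0.05.
import random
import statistics

random.seed(1)
alpha = 0.05
t_crit = 2.262                 # t critical value, 9 df, two-sided 5% test
n, trials = 10, 20000
rejections = 0
for _ in range(trials):
    x = [random.gauss(10.0, 2.0) for _ in range(n)]      # H0: mu = 10 holds
    t = (statistics.mean(x) - 10.0) / (statistics.stdev(x) / n ** 0.5)
    if abs(t) > t_crit:
        rejections += 1
print(rejections / trials)     # close to alpha = 0.05
```

Repeating the experiment with data generated under some alternative (e.g. mu = 12) would instead estimate the power, 1 − β.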
Question 9
• In a statistical test for the equality of a mean, such as H0: μ = 10, if α = 0.05:
a) We will make an incorrect inference 95% of the time
b) We will say that there is a real difference 5% of the time when there is no difference
c) We will say that there is no real difference 5% of the time when there is no difference
d) 95% of the time the null hypothesis will be correct
e) 5% of the time we will make a correct inference

Question 10
• Which of the following statements is correct?
a) An extremely small p-value indicates that the actual data differ markedly from those expected if the null hypothesis were true
b) The p-value measures the probability that the hypothesis is true
c) The p-value measures the probability of making a Type-II error
d) A large p-value indicates that the data are consistent with the alternative hypothesis
e) The larger the p-value, the stronger the evidence against the null hypothesis

Question 11
• In a test of H0: μ = 100 versus Ha: μ ≠ 100, a sample of size 10 produces a sample mean of 103 and a p-value of 0.08. Thus, at the significance level α = 0.05:
a) There is sufficient evidence to conclude that μ ≠ 100
b) There is sufficient evidence to conclude that μ = 100
c) There is insufficient evidence to conclude that μ = 100
d) There is insufficient evidence to conclude that μ ≠ 100
e) There is sufficient evidence to conclude that μ = 103

Formulas

Expectation and variance formulas:
1. $E[cX] = cE[X]$
2. $E[c_1 X_1 + c_2 X_2 + \cdots + c_n X_n] = c_1 E[X_1] + c_2 E[X_2] + \cdots + c_n E[X_n]$
3. $Var(X) \ge 0$
4. $Var(cX) = c^2 Var(X)$
5. $Var(c_1 X_1 + \cdots + c_n X_n) = c_1^2 Var(X_1) + \cdots + c_n^2 Var(X_n)$, if the $X_i$ are independent (or uncorrelated)

Covariance and correlation formulas:
1. $Cov(X_i, X_j) = C_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)] = E[X_i X_j] - \mu_i \mu_j$
2. $C_{ij} = C_{ji}$, and $C_{ii} = \sigma_i^2 = Var(X_i)$
3. $Cor(X_i, X_j) = \rho_{ij} = \dfrac{C_{ij}}{\sqrt{\sigma_i^2 \sigma_j^2}} = \dfrac{C_{ij}}{\sigma_i \sigma_j}$

Joint Distributions

Joint distributions of discrete random variables:
$p(x, y) = P(X = x, Y = y)$ (joint probability mass function)
$F(x, y) = \sum_{u \le x,\, v \le y} p(u, v)$ (joint cumulative distribution function)
$p(x, y) = P(X = x)\,P(Y = y) = p_X(x)\,p_Y(y)$ (if X and Y are independent)
$p_X(x) = \sum_{\text{all } y} p(x, y)$ and $p_Y(y) = \sum_{\text{all } x} p(x, y)$ (marginals)

Joint distributions of continuous random variables:
$F(x, y) = P(X \le x, Y \le y)$ (joint cumulative distribution function)
$f(x, y)\,dx\,dy = P(X \in dx, Y \in dy)$ (joint probability density function)
$F(x, y) = F_X(x)\,F_Y(y)$, or equivalently $f(x, y) = f_X(x)\,f_Y(y)$ (if X and Y are independent)
$f_X(x) = \int f(x, y)\,dy$ and $f_Y(y) = \int f(x, y)\,dx$ (marginals)

Example

$f(x, y) = 24xy$ for $x \ge 0$, $y \ge 0$, and $x + y \le 1$; $f(x, y) = 0$ otherwise.

$f_X(x) = \int_0^{1-x} 24xy\,dy = 12x(1-x)^2$, and by symmetry $f_Y(y) = 12y(1-y)^2$.

Since $f(x, y) = 24xy \ne 12x(1-x)^2 \cdot 12y(1-y)^2 = f_X(x)\,f_Y(y)$, X and Y are not independent.

Example (continued)

$E[X] = E[Y] = \int_0^1 x\,f_X(x)\,dx = \int_0^1 12x^2(1-x)^2\,dx = \frac{2}{5}$

$E[X^2] = \int_0^1 x^2 f_X(x)\,dx = \int_0^1 12x^3(1-x)^2\,dx = \frac{1}{5}$

$Var(X) = Var(Y) = E[X^2] - (E[X])^2 = \frac{1}{5} - \frac{4}{25} = \frac{1}{25}$

$E[XY] = \int_0^1 \int_0^{1-x} xy \cdot 24xy\,dy\,dx = \int_0^1 8x^2(1-x)^3\,dx = \frac{2}{15}$

$Cov(X, Y) = E[XY] - E[X]\,E[Y] = \frac{2}{15} - \frac{4}{25} = -\frac{2}{75}$

$Cor(X, Y) = \dfrac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}} = \dfrac{-2/75}{\sqrt{(1/25)(1/25)}} = -\frac{2}{3}$

Important Families of Distributions

Normal Distribution

$X \sim \text{Normal}(\mu, \sigma^2)$, with $E[X] = \mu$ and $Var(X) = \sigma^2$

$f_X(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

We can obtain almost all other important distributions in (parametric) statistics by the following transformations.
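The moments in the example above can be cross-checked numerically. A sketch in Python using a plain midpoint Riemann sum over the triangle (the grid resolution n = 1000 is an arbitrary choice, not from the slides); it recovers E[X] = 2/5, Cov(X, Y) = −2/75, and Cor(X, Y) = −2/3:

```python
# Midpoint Riemann sum over the triangle x, y >= 0, x + y <= 1
# for f(x, y) = 24xy; checks the moments derived in the example.
n = 1000                        # grid resolution (arbitrary choice)
h = 1.0 / n
total = EX = EX2 = EXY = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n - i):      # cells approximately inside x + y <= 1
        y = (j + 0.5) * h
        w = 24 * x * y * h * h  # f(x, y) times the cell area
        total += w
        EX += x * w
        EX2 += x * x * w
        EXY += x * y * w
EX, EX2, EXY = EX / total, EX2 / total, EXY / total   # normalize
var = EX2 - EX * EX             # Var(X) = Var(Y) by symmetry
cov = EXY - EX * EX             # uses E[Y] = E[X] by symmetry
corr = cov / var
print(EX, cov, corr)            # near 0.4, -2/75 = -0.0267, -2/3
```

The negative correlation is intuitive: the support x + y ≤ 1 forces Y to be small whenever X is large.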
Standard Normal Distribution

$Z = \dfrac{X - \mu}{\sigma} \sim \text{Normal}(0, 1)$, with $E[Z] = 0$ and $Var(Z) = 1$

$f_Z(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$

Standard Normal PDF: [figure]

Cumulative distribution function (CDF): [figure]

Chi-Square Distribution

If $Z_1, Z_2, \ldots, Z_n$ are independent random variables, each with the standard normal distribution, then

$Z_1^2 + Z_2^2 + \cdots + Z_n^2 \sim \chi_n^2$, with $E[\chi_n^2] = n$ and $Var(\chi_n^2) = 2n$

$f_{\chi_n^2}(x) = \dfrac{x^{n/2 - 1}\, e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}$

where $\Gamma(y) = \int_0^\infty e^{-x} x^{y-1}\,dx$ (the gamma function), with $\Gamma(n) = (n-1)!$ for integer n.

Chi-Square PDF: [figure]

F Distribution

$\dfrac{\chi_n^2 / n}{\chi_m^2 / m} \sim F_{n,m}$ (numerator and denominator independent), with

$E[F_{n,m}] = \dfrac{m}{m-2}$ for $m > 2$, and $Var(F_{n,m}) = \dfrac{2m^2(m + n - 2)}{n(m-2)^2(m-4)}$ for $m > 4$

$f_{F_{n,m}}(x) = \dfrac{\Gamma\!\left(\frac{n+m}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{m}{2}\right)} \left(\frac{n}{m}\right)^{n/2} x^{n/2 - 1} \left(1 + \frac{n}{m}x\right)^{-(n+m)/2}$, for $x > 0$

Student's t Distribution

$\dfrac{Z}{\sqrt{\chi_n^2 / n}} \sim t_n$ ($Z$ independent of $\chi_n^2$), with $E[t_n] = 0$ and $Var(t_n) = \dfrac{n}{n-2}$ for $n > 2$

$f_{t_n}(x) = \dfrac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}$

Student's t PDF: [figure]

Estimation of Means, Variances and Correlations

• Suppose that $X_1, X_2, \ldots, X_n$ are IID random variables (observations) with finite population mean μ and variance σ².

• The sample mean $\bar{X}(n) = \frac{1}{n}\sum_{i=1}^n X_i$ is an unbiased estimator of the population mean μ:

$\hat{\mu} = E[\bar{X}(n)] = \dfrac{\sum_{i=1}^n E[X_i]}{n} = \dfrac{n\mu}{n} = \mu$ (unbiased)

$Var(\bar{X}(n)) = \dfrac{\sum_{i=1}^n Var(X_i)}{n^2} = \dfrac{n\sigma^2}{n^2} = \dfrac{\sigma^2}{n} \to 0$ as $n \to \infty$ (minimum variance)

If the $X_i$ are normal, $\dfrac{\bar{X}(n) - \mu}{\sigma/\sqrt{n}} \sim \text{Normal}(0, 1)$, i.e., $\bar{X}(n) \sim \text{Normal}(\mu, \sigma^2/n)$.

The Central Limit Theorem

• The sample variance $S^2(n) = \dfrac{\sum_{i=1}^n \left(X_i - \bar{X}(n)\right)^2}{n - 1}$ is an unbiased estimator of the population variance σ²:

$\hat{\sigma}^2 = E[S^2(n)] = \sigma^2$ (unbiased), and $Var(S^2(n)) = \dfrac{2\sigma^4}{n-1}$ (minimum variance)

$\dfrac{(n-1)\,S^2(n)}{\sigma^2} \sim \chi_{n-1}^2$ (if the $X_i$ are normal)

• An unbiased estimator of $Var(\bar{X}(n))$ is

$\widehat{Var}(\bar{X}(n)) = \dfrac{S^2(n)}{n} = \dfrac{\sum_{i=1}^n \left(X_i - \bar{X}(n)\right)^2}{n(n-1)}$ (unbiased)

• The Central Limit Theorem:

$\lim_{n\to\infty} P\left\{\dfrac{\bar{X}(n) - \mu}{\sigma/\sqrt{n}} \le z\right\} = \Phi(z) = \int_{-\infty}^{z} \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx$ (the Normal(0, 1) CDF)

Confidence Intervals and Hypothesis Tests for the Mean

• Suppose that $X_1, X_2, \ldots, X_n$ are IID random variables (observations) with finite population mean μ and variance σ².
• We want to find a confidence interval $[l(n, \alpha),\, u(n, \alpha)]$ such that $P\{l(n, \alpha) \le \mu \le u(n, \alpha)\} = 1 - \alpha$:

$\bar{X}(n) \pm z_{1-\alpha/2} \sqrt{\dfrac{S^2(n)}{n}}$ (n large)

$\bar{X}(n) \pm t_{n-1,1-\alpha/2} \sqrt{\dfrac{S^2(n)}{n}}$ (if $X \sim \text{Normal}(\mu, \sigma^2)$)

• The confidence interval based on the t distribution is longer, since $t_{n-1,1-\alpha/2} \ge z_{1-\alpha/2}$, with $\lim_{n\to\infty} t_{n-1,1-\alpha/2} = z_{1-\alpha/2}$.

Skewness

• The actual coverage of the confidence interval depends on the sample size as well as the shape of the distribution, in which skewness (a measure of asymmetry) plays an important role:

$\nu = E\left[\left(\dfrac{X - \mu}{\sigma}\right)^3\right]$

Example 4.26

• Suppose that the 10 observations 1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, and 1.09 are from a normal distribution. Then a 90% confidence interval is found as follows:

$\bar{X}(10) = 1.34$ and $S^2(10) = 0.17$

$\bar{X}(10) \pm t_{9,0.95} \sqrt{\dfrac{S^2(10)}{10}} = 1.34 \pm 1.83\sqrt{\dfrac{0.17}{10}} = 1.34 \pm 0.24 = [1.10,\ 1.58]$

Hypothesis Testing

• In hypothesis testing, we must choose between two competing hypotheses:

$H_0: \mu = \mu_0$ (status quo) versus $H_1: \mu \ne \mu_0$ (claim)

$P\{\text{Reject } H_0 \mid H_0\} = \alpha$ (Type I error probability; α is the level of significance)
$P\{\text{Accept } H_0 \mid H_1\} = \beta$ (Type II error probability; $1 - \beta$ is the power of the test)

Testing Hypotheses on the Mean

• The decision rule with the maximum power (minimum Type II error probability), among tests whose Type I error probability is at most α, is the t test:

$t_n = \dfrac{\bar{X}(n) - \mu_0}{\sqrt{S^2(n)/n}}$

Reject $H_0$ if $|t_n| > t_{n-1,1-\alpha/2}$; accept $H_0$ if $|t_n| \le t_{n-1,1-\alpha/2}$.

• The acceptance region corresponds to a confidence interval: we accept $H_0$ exactly when

$|t_n| \le t_{n-1,1-\alpha/2} \iff \mu_0 \in \bar{X}(n) \pm t_{n-1,1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}$

Example 4.27

• For the data of Example 4.26, suppose that we want to test $H_0: \mu = 1$ at level $\alpha = 0.10$:

$t_{10} = \dfrac{\bar{X}(10) - 1}{\sqrt{S^2(10)/10}} = \dfrac{0.34}{\sqrt{0.17/10}} = 2.65 > 1.83 = t_{9,0.95} \Rightarrow \text{Reject } H_0$

The Strong Law of Large Numbers

• Theorem 4.2: Suppose that $X_1, X_2, \ldots$ are IID random variables with finite mean μ. Then

$P\left\{\lim_{n\to\infty} \bar{X}(n) = \mu\right\} = 1$

• Example 4.29: [figure]

Replacing a Random Variable by Its Mean

• In general, it is not good practice to replace random quantities by their means in a simulation study.
• Example 4.30: Suppose that the mean interarrival time is 1 minute and the mean service time is 0.99 minute in an M/M/1 queue.
If the simulation is driven by these means, then every delay is 0 and the queue is always empty. But in the actual M/M/1 model the utilization is $\rho = E[S]/E[A] = 0.99$, and the average delay in queue is

$W_q = \dfrac{\rho\, E[S]}{1 - \rho} = \dfrac{0.99(0.99)}{0.01} = 98.01$ minutes.
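The contrast above can be reproduced with a simulation of the M/M/1 queue based on the Lindley recurrence $D_{n+1} = \max(0, D_n + S_n - A_{n+1})$. A sketch in Python (standard library only); it uses ρ = 0.8 rather than the 0.99 of the example so that a short run settles near the analytical answer $W_q = \rho E[S]/(1-\rho) = 0.8(0.8)/0.2 = 3.2$ minutes, and the run length, warm-up period, and seed are arbitrary choices:

```python
# M/M/1 delay in queue via the Lindley recurrence
# D_{n+1} = max(0, D_n + S_n - A_{n+1}).
import random

random.seed(42)
mean_inter, mean_service = 1.0, 0.8       # rho = 0.8 (not the 0.99 above)
n_customers, warmup = 500_000, 10_000     # arbitrary run / warm-up lengths
d = total = 0.0
counted = 0
for i in range(n_customers):
    s = random.expovariate(1.0 / mean_service)   # service time
    a = random.expovariate(1.0 / mean_inter)     # next interarrival time
    d = max(0.0, d + s - a)                      # delay of the next customer
    if i >= warmup:
        total += d
        counted += 1
print(total / counted)                    # near Wq = 3.2

# Replacing the random times by their means makes s - a = -0.2 at every
# step, so every delay is 0 -- exactly the pitfall this slide warns about.
```

Pushing ρ back up to 0.99 makes the same recurrence converge far more slowly, which is itself a useful illustration of why heavily loaded queues need long simulation runs.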