Lectures 5 and 6

ENM 307 - SIMULATION
PROBABILITY AND STATISTICS
REVIEW
Question 1
• The probability that the Red River will flood in any given year has
been estimated from 200 years of historical data to be one in four.
This means:
a) The Red River will flood every four years
b) In the next 100 years, the Red River will flood exactly 25 times
c) In the last 100 years, the Red River flooded exactly 25 times
d) In the next 100 years, the Red River will flood about 25 times
e) In the next 100 years, it is very likely that the Red River will flood
exactly 25 times
Question 2
• A random variable X has a probability distribution as follows:

  x        0    1    2    3
  P(X=x)   2k   3k   13k  2k

Then the probability that X is less than 2, i.e., P(X<2), is equal to:
a) 0.90
b) 0.25
c) 0.65
d) 0.15
e) 1.00
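Since the probabilities must sum to 1, the constant k is fixed and P(X<2) follows; a minimal check in Python (variable names are illustrative):

```python
from fractions import Fraction

# P(X = x) in units of k, for x = 0, 1, 2, 3
coeffs = {0: 2, 1: 3, 2: 13, 3: 2}

# Probabilities must sum to 1: (2 + 3 + 13 + 2) k = 20k = 1
k = Fraction(1, sum(coeffs.values()))

# P(X < 2) = P(X = 0) + P(X = 1) = 2k + 3k = 5k
p_less_than_2 = (coeffs[0] + coeffs[1]) * k
print(float(p_less_than_2))  # 0.25, answer (b)
```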
Question 3
• Cans of soft drinks cost $0.30 in a certain vending machine. What is
the expected value and variance of daily revenue (Y) from the
machine if X, the number of cans sold per day has E(X)=125 and
Var(X)=50?
a) E(Y)=37.5, Var(Y)=50
b) E(Y)=37.5, Var(Y)=4.5
c) E(Y)=37.5, Var(Y)=15
d) E(Y)=125, Var(Y)=4.5
e) E(Y)=125, Var(Y)=15
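The answer follows from the rules E[cX] = cE[X] and Var(cX) = c²Var(X) with c = 0.30; a quick sketch:

```python
price = 0.30                    # revenue per can, so Y = price * X
mean_X, var_X = 125, 50

mean_Y = price * mean_X         # E[Y] = 0.30 * 125 = 37.5
var_Y = price ** 2 * var_X      # Var(Y) = 0.30**2 * 50 = 4.5
print(mean_Y, round(var_Y, 2))  # 37.5 4.5 -> answer (b)
```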
Question 4 - i
• The average length of stay in a hospital is useful for planning
purposes. Suppose that the following is the distribution of length of
stay in a hospital after a certain operation:
  Days           2     3     4     5     6
  Probabilities  0.05  0.20  0.40  0.20  ?

• What is the probability that the length of stay is 6?
a) 0.15
b) 0.17
c) 0.20
d) 0.25
e) 0.05
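Because the probabilities must sum to 1, the missing entry is the complement of the rest; a one-line check:

```python
# Known part of the length-of-stay distribution (day -> probability)
known = {2: 0.05, 3: 0.20, 4: 0.40, 5: 0.20}

p6 = 1.0 - sum(known.values())   # the five probabilities must sum to 1
print(round(p6, 2))              # 0.15 -> answer (a)
```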
Question 4 - ii
• The average length of stay in a hospital is useful for planning
purposes. Suppose that the following is the distribution of length of
stay in a hospital after a certain operation:
  Days           2     3     4     5     6
  Probabilities  0.05  0.20  0.40  0.20  0.15

• The average length of stay is:
a) 0.15
b) 0.17
c) 3.3
d) 4.0
e) 4.2
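The average is the expectation E[X] = Σ x·p(x) over the table; a short check:

```python
# Length-of-stay distribution (day -> probability)
pmf = {2: 0.05, 3: 0.20, 4: 0.40, 5: 0.20, 6: 0.15}

mean_stay = sum(x * p for x, p in pmf.items())  # E[X] = sum of x * p(x)
print(round(mean_stay, 1))                      # 4.2 -> answer (e)
```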
Question 5
• Many professional schools require applicants to take a
standardized test. Suppose that 1000 students take the
test, and you find that your mark of 63 (out of 100) was
the 73rd percentile. This means:
a) At least 73% of the students got 63 or better
b) At least 270 students got 73 or better
c) At least 270 students got 63 or better
d) At least 27% of the students got 73 or worse
e) At least 730 students got 73 or better.
Type-I: Reject H0 when H0 is true
Type-II: Fail to reject H0 when H0 is false
Question 6
• To determine the reliability of experts used in interpreting the results of polygraph examinations in criminal investigations, 280 cases were studied. The results were:

                         True status
  Examiner’s decision    Innocent   Guilty
  Innocent               131        15
  Guilty                 9          125

• If the hypotheses were H: Suspect is innocent versus A: Suspect is guilty, then we could estimate the probability of making a Type II error as:
a) 15/280
b) 9/280
c) 15/140
d) 9/140
e) 15/146
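A Type II error here means declaring a suspect innocent when the true status is guilty, so its probability is estimated from the 140 guilty cases; a sketch:

```python
# 2x2 table: (examiner's decision, true status) -> count
table = {("innocent", "innocent"): 131, ("innocent", "guilty"): 15,
         ("guilty", "innocent"): 9, ("guilty", "guilty"): 125}

# Type II error: fail to reject H (decide innocent) when the suspect is guilty
n_guilty = table[("innocent", "guilty")] + table[("guilty", "guilty")]  # 140
p_type2 = table[("innocent", "guilty")] / n_guilty                      # 15/140
print(round(p_type2, 3))  # 0.107 -> answer (c)
```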
Question 7
• In a statistical test of hypothesis, what happens to the rejection region when α, the probability of Type I error (the level of significance), is reduced?
a) The answer depends on the value of β (the probability of Type II error)
b) The rejection region is reduced in size
c) The rejection region is increased in size
d) The rejection region is not changed
e) The answer depends on the form of the alternative hypothesis
Question 8
• Which of the following is not correct?
a) The probability of Type I error is controlled by the selection of the significance level, α.
b) The probability of Type II error is controlled only by the sample size
c) The power of a test depends upon the sample size and the distance between the null and alternative hypotheses
d) The p-value measures the probability that the null hypothesis is true when H0 is rejected by the current data.
e) The rejection region is controlled by the α level and the alternative hypothesis.
Question 9
• In a statistical test for the equality of a mean, such as H0: μ = 10, if α = 0.05:
a) We will make an incorrect inference 95% of the time
b) We will say that there is a real difference 5% of the time when there is no difference
c) We will say that there is no real difference 5% of the time when there is no difference
d) 95% of the time the null hypothesis will be correct
e) 5% of the time we will make a correct inference
Question 10
• Which of the following statements is correct?
a) An extremely small p-value indicates that the actual data differs
markedly from that expected if the null hypothesis were true
b) The p-value measures the probability that the null hypothesis is true
c) The p-value measures the probability of making a Type-II error.
d) A large p-value indicates that the data is consistent with the
alternative hypothesis
e) The larger the p-value, the stronger the evidence against the
null hypothesis
Question 11
• In a test of H0: μ = 100 versus Ha: μ ≠ 100, a sample of size 10 produces a sample mean of 103 and a p-value of 0.08. Thus at the significance level α = 0.05:
a) There is sufficient evidence to conclude that μ ≠ 100
b) There is sufficient evidence to conclude that μ = 100
c) There is insufficient evidence to conclude that μ = 100
d) There is insufficient evidence to conclude that μ ≠ 100
e) There is sufficient evidence to conclude that μ = 103
Formulas
Expectation and variance formulas
1. E[cX] = cE[X]
2. E[c1X1 + c2X2 + … + cnXn] = c1E[X1] + c2E[X2] + … + cnE[Xn]
3. Var(X) ≥ 0
4. Var(cX) = c²Var(X)
5. Var(c1X1 + c2X2 + … + cnXn) = c1²Var(X1) + c2²Var(X2) + … + cn²Var(Xn) if the Xi's are independent (or uncorrelated)

Covariance and correlation formulas
1. Cov(Xi, Xj) = Cij = E[(Xi - μi)(Xj - μj)] = E[XiXj] - μiμj
2. Cij = Cji, Cii = σi² = Var(Xi)
3. Cor(Xi, Xj) = ρij = Cij / √(σi²σj²) = Cij / (σiσj)
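Formula 5 can be illustrated with a small Monte Carlo check (a sketch with arbitrary coefficients and sample size, not part of the original slides):

```python
import random
import statistics

random.seed(42)
n = 50_000
# Independent samples: Var(X1) = 1, Var(X2) = 4
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 2) for _ in range(n)]
c1, c2 = 3.0, -2.0

# Var(c1 X1 + c2 X2) should match c1^2 Var(X1) + c2^2 Var(X2) = 9 + 16 = 25
combo = [c1 * a + c2 * b for a, b in zip(x1, x2)]
lhs = statistics.variance(combo)
rhs = c1 ** 2 * statistics.variance(x1) + c2 ** 2 * statistics.variance(x2)
print(round(lhs, 1), round(rhs, 1))  # both close to 25
```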
Joint Distributions
Joint Distributions of Discrete Random Variables

  p(x, y) = P(X = x, Y = y)                        (joint probability mass function)
  F(x, y) = Σ_{u ≤ x, v ≤ y} p(u, v)               (joint cumulative distribution function)
  p(x, y) = P(X = x) P(Y = y) = pX(x) pY(y)        (if X and Y are independent)
  pX(x) = Σ_{all y} p(x, y)
  pY(y) = Σ_{all x} p(x, y)

Joint Distributions of Continuous Random Variables

  F(x, y) = P(X ≤ x, Y ≤ y)                        (joint cumulative distribution function)
  f(x, y) dx dy ≈ P(X ∈ dx, Y ∈ dy)                (joint probability density function)
  F(x, y) = FX(x) FY(y) or f(x, y) = fX(x) fY(y)   (if X and Y are independent)
  fX(x) = ∫_{-∞}^{∞} f(x, y) dy
  fY(y) = ∫_{-∞}^{∞} f(x, y) dx
Example
  f(x, y) = 24xy   for x ≥ 0, y ≥ 0 and x + y ≤ 1
  f(x, y) = 0      otherwise

  fX(x) = ∫_0^{1-x} 24xy dy = 12x y² |_{y=0}^{1-x} = 12x(1 - x)²

  fY(y) = ∫_0^{1-y} 24xy dx = 12y x² |_{x=0}^{1-y} = 12y(1 - y)²

  f(x, y) = 24xy ≠ 12x(1 - x)² · 12y(1 - y)² = fX(x) fY(y)  →  X and Y are not independent
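The marginal fX(x) = 12x(1 - x)² can be verified numerically by integrating f(x, y) over y with a simple midpoint rule (a sketch; the step count and test point are arbitrary):

```python
# Joint density on the triangle x >= 0, y >= 0, x + y <= 1
def f(x, y):
    return 24.0 * x * y if x >= 0 and y >= 0 and x + y <= 1 else 0.0

# Midpoint-rule approximation of fX(x) = integral of f(x, y) dy over [0, 1]
def marginal_x(x, steps=20_000):
    h = 1.0 / steps
    return sum(f(x, (j + 0.5) * h) * h for j in range(steps))

x = 0.3
print(round(marginal_x(x), 3))   # close to 12 * 0.3 * 0.7**2 = 1.764
```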
Example
  E[X] = E[Y] = ∫_0^1 x fX(x) dx = ∫_0^1 12x²(1 - x)² dx = 2/5

  E[X²] = E[Y²] = ∫_0^1 x² fX(x) dx = ∫_0^1 12x³(1 - x)² dx = 1/5

  Var(X) = Var(Y) = E[X²] - (E[X])² = 1/5 - (2/5)² = 1/25

  E[XY] = ∫_0^1 ∫_0^{1-x} xy f(x, y) dy dx = ∫_0^1 ( ∫_0^{1-x} 24x²y² dy ) dx = ∫_0^1 8x²(1 - x)³ dx = 2/15

  Cov(X, Y) = E[XY] - E[X]E[Y] = 2/15 - (2/5)(2/5) = -2/75

  Cor(X, Y) = Cov(X, Y) / √(Var(X)Var(Y)) = (-2/75) / √((1/25)(1/25)) = -2/3
Important Families of Distributions
Normal Distribution

  X ~ Normal(μ, σ²)  →  E[X] = μ, Var(X) = σ²

  fX(x) = (1/√(2πσ²)) exp( -(1/2)((x - μ)/σ)² )

We can obtain almost all other important distributions in (parametric) statistics by the following transformations.

Standard Normal Distribution

  Z = (X - μ)/σ ~ Normal(0, 1)  →  E[Z] = 0, Var(Z) = 1

  fZ(x) = (1/√(2π)) exp( -x²/2 )
Standard Normal PDF

[Figure: plot of the standard normal PDF]

Cumulative distribution function (CDF):

[Figure/table: the standard normal CDF Φ(z) and tabulated values]
Chi-Square Distribution
• If Z1, Z2, …, Zn are IID random variables with the standard normal distribution, then

  Z1² + Z2² + … + Zn² ~ χ²(n)  →  E[χ²(n)] = n, Var(χ²(n)) = 2n

  fχ²(n)(x) = x^(n/2 - 1) e^(-x/2) / (2^(n/2) Γ(n/2))

  Γ(y) = ∫_0^∞ e^(-x) x^(y-1) dx (Gamma function)  →  Γ(n) = (n - 1)!
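The mean and variance above can be checked by summing squared standard normals (a sketch; n and the number of replications are arbitrary):

```python
import random
import statistics

random.seed(7)
n, reps = 5, 50_000
# Each sample is Z1^2 + ... + Zn^2 with Zi ~ Normal(0, 1)
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(reps)]

print(round(statistics.mean(samples), 2))      # close to n = 5
print(round(statistics.variance(samples), 1))  # close to 2n = 10
```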
Chi-Square PDF

[Figure: chi-square PDFs for several degrees of freedom]
F Distribution

  (χ²(n)/n) / (χ²(m)/m) ~ F(n, m)  →  E[F(n, m)] = m/(m - 2), Var(F(n, m)) = 2m²(m + n - 2) / (n(m - 2)²(m - 4))

  fF(n,m)(x) = [ Γ((n + m)/2) / (Γ(n/2)Γ(m/2)) ] (n/m)^(n/2) x^(n/2 - 1) (1 + nx/m)^(-(n + m)/2)
Student’s t Distribution

  Z / √(χ²(n)/n) ~ t(n)  →  E[t(n)] = 0, Var(t(n)) = n/(n - 2)

  ft(n)(x) = [ Γ((n + 1)/2) / (√(nπ) Γ(n/2)) ] (1 + x²/n)^(-(n + 1)/2)
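The defining transformation can be simulated directly: draws of Z/√(χ²(n)/n) should have mean near 0 and variance near n/(n - 2) (a sketch with an arbitrary n and sample size):

```python
import random
import statistics

random.seed(9)
n, reps = 10, 40_000

def t_draw():
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(n))
    return z / (chi2 / n) ** 0.5

ts = [t_draw() for _ in range(reps)]
print(round(statistics.mean(ts), 2))      # close to 0
print(round(statistics.variance(ts), 2))  # close to n/(n - 2) = 1.25
```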
Student’s t PDF

[Figure: Student’s t PDFs compared with the standard normal]
Estimation of Means, Variances and
Correlations
• Suppose that X1, X2, …, Xn are IID random variables (observations) with finite population mean μ and variance σ².
• The sample mean is an unbiased estimator of the population mean μ:

  X̄(n) = Σ_{i=1}^n Xi / n

  E[X̄(n)] = Σ_{i=1}^n E[Xi] / n = nμ/n = μ  (unbiased)

  Var(X̄(n)) = Var(Σ_{i=1}^n Xi) / n² = Σ_{i=1}^n Var(Xi) / n² = nσ²/n² = σ²/n  (minimum variance)

  (X̄(n) - μ) / (σ/√n) ~ Normal(0, 1) as n → ∞, i.e., X̄(n) ~ Normal(μ, σ²/n) approximately
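Unbiasedness and the σ²/n variance of the sample mean show up in a quick Monte Carlo experiment (a sketch using Uniform(0, 1) draws, so μ = 1/2 and σ² = 1/12; sample sizes are arbitrary):

```python
import random
import statistics

random.seed(5)
n, reps = 25, 40_000
# reps independent sample means, each based on n Uniform(0, 1) draws
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

print(round(statistics.mean(means), 3))      # close to mu = 0.5
print(round(statistics.variance(means), 5))  # close to sigma^2/n = 1/300
```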
The Central Limit Theorem
• The sample variance is an unbiased estimator of the population variance σ²:

  S²(n) = Σ_{i=1}^n (Xi - X̄(n))² / (n - 1)

  E[S²(n)] = σ²  (unbiased)

  Var(S²(n)) = 2σ⁴/(n - 1)  (minimum variance)

  (n - 1)S²(n)/σ² ~ χ²(n - 1)  (if the Xi are normal)

• The natural estimator of Var(X̄(n)) is also unbiased:

  Var̂(X̄(n)) = S²(n)/n = Σ_{i=1}^n (Xi - X̄(n))² / (n(n - 1)),  E[Var̂(X̄(n))] = σ²/n  (unbiased)

• The Central Limit Theorem:

  lim_{n→∞} P{ (X̄(n) - μ) / (σ/√n) ≤ z } = Φ(z) = ∫_{-∞}^z (1/√(2π)) e^(-x²/2) dx  (Normal(0, 1) CDF)
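The CLT statement can be illustrated by standardizing means of Uniform(0, 1) samples (a sketch; σ² = 1/12 for the uniform, and n and the replication count are arbitrary):

```python
import random
import statistics

random.seed(3)
n, reps = 30, 20_000
# Standardized sample means: (Xbar - mu) / (sigma / sqrt(n))
z = [(statistics.fmean(random.random() for _ in range(n)) - 0.5)
     / (1.0 / (12 * n)) ** 0.5
     for _ in range(reps)]

print(round(statistics.mean(z), 2))   # close to 0
print(round(statistics.stdev(z), 2))  # close to 1
frac = sum(abs(v) <= 1.96 for v in z) / reps
print(round(frac, 2))                 # close to 0.95, as Normal(0, 1) predicts
```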
Confidence Intervals and Hypothesis Tests for the Mean
• Suppose that X1, X2, …, Xn are IID random variables (observations) with finite population mean μ and variance σ².
• We want to find a confidence interval [l(n, α), u(n, α)] so that

  P{ l(n, α) ≤ μ ≤ u(n, α) } = 1 - α

  X̄(n) ± z(1-α/2) √(S²(n)/n)         (n large!)

  X̄(n) ± t(n-1, 1-α/2) √(S²(n)/n)    (X ~ Normal(μ, σ²))

• The confidence interval is longer for the t distribution, since

  t(n-1, 1-α/2) > z(1-α/2)  and  lim_{n→∞} t(n-1, 1-α/2) = z(1-α/2)
Skewness
• Actual coverage of the confidence interval depends on the sample size as well as the shape of the distribution, in which skewness (a measure of asymmetry) plays an important role:

  ν = E[ ((X - μ)/σ)³ ]
Example 4.26
• Suppose that the 10 observations 1.20, 1.50, 1.68, 1.89,
0.95, 1.49, 1.58, 1.55, 0.50, and 1.09 are from a normal
distribution. Then, a 90% confidence interval is found as
follows
  X̄(10) = 1.34 and S²(10) = 0.17

  X̄(10) ± t(9, 0.95) √(S²(10)/10) = 1.34 ± 1.83 √(0.17/10) = 1.34 ± 0.24 = [1.10, 1.58]
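The interval can be recomputed from the raw observations of Example 4.26; the critical value t(9, 0.95) ≈ 1.833 is taken from a t table (a sketch using only the standard library):

```python
import statistics

data = [1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09]
n = len(data)
xbar = statistics.fmean(data)
s2 = statistics.variance(data)       # sample variance with n - 1 divisor
half = 1.833 * (s2 / n) ** 0.5       # t(9, 0.95) * sqrt(S^2(10)/10)

print(round(xbar, 2), round(s2, 2))  # 1.34 0.17
print(round(xbar - half, 2), round(xbar + half, 2))  # close to the slide's [1.10, 1.58]
```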
Hypothesis Testing
• In hypothesis testing, we need to choose between two competing hypotheses:

  H0: μ = μ0   (status quo)
  H1: μ ≠ μ0   (claim)

  P{Reject H0 | H0 true} = α   (Type I error)
  P{Accept H0 | H1 true} = β   (Type II error)

  α = level of significance
  1 - β = power of the test
Testing Hypothesis on the Mean
• The decision rule with the maximum power (minimum Type II error probability) for which the Type I error probability is at most α is given by the following t test:

  tn = (X̄(n) - μ0) / √(S²(n)/n)

  |tn| > t(n-1, 1-α/2)  →  Reject H0
  |tn| ≤ t(n-1, 1-α/2)  →  Accept H0

• Accepting H0 corresponds to μ0 lying inside a confidence interval:

  |tn| ≤ t(n-1, 1-α/2)  ⇔  X̄(n) - t(n-1, 1-α/2) √(S²(n)/n) ≤ μ0 ≤ X̄(n) + t(n-1, 1-α/2) √(S²(n)/n)
Example 4.27
• For the data of Example 4.26, suppose that we want to test H0: μ = 1 at level α = 0.10.

  t10 = (X̄(10) - 1) / √(S²(10)/10) = (1.34 - 1) / √(0.17/10) = 2.65 > 1.83 = t(9, 0.95)  →  Reject H0
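The same test can be run on the raw data of Example 4.26 (a sketch; the critical value t(9, 0.95) ≈ 1.833 comes from a t table):

```python
import statistics

data = [1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09]
n, mu0 = len(data), 1.0
xbar = statistics.fmean(data)
s2 = statistics.variance(data)

t_stat = (xbar - mu0) / (s2 / n) ** 0.5
print(round(t_stat, 2))   # 2.65
print("reject H0" if abs(t_stat) > 1.833 else "fail to reject H0")  # reject H0
```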
The Strong Law of Large Numbers
• Theorem 4.2: Suppose that X1, X2, … are IID random variables with finite mean μ. Then

  P{ lim_{n→∞} X̄(n) = μ } = 1

• Example 4.29:
Replacing a Random Variable by Its Mean
• In general, it is not good practice to replace random quantities by their means in a simulation study.
• Example 4.30: Suppose that the mean interarrival time is 1 minute and the mean service time is 0.99 minute in an M/M/1 queue. If the simulation is done with the means, then all delays are 0 and the queue is always empty. But in the actual M/M/1 model ρ = λ/μ = 0.99, and the average delay in queue is

  Wq = ρ/(μ(1 - ρ)) = (0.99)(0.99)/0.01 = 98.01 minutes.
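The numbers in Example 4.30 follow from the standard M/M/1 delay formula; a quick check (a sketch, with the rates derived from the stated mean times):

```python
lam = 1.0          # arrival rate: mean interarrival time is 1 minute
mu = 1.0 / 0.99    # service rate: mean service time is 0.99 minutes
rho = lam / mu     # utilization = 0.99

wq = rho / (mu * (1.0 - rho))   # M/M/1 mean delay in queue
print(round(wq, 2))             # 98.01 minutes
```

With the means instead of random draws, every customer's service finishes 0.01 minutes before the next arrival, so every delay is 0, which is what makes the deterministic replacement misleading.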