SC0x M2Unit2 ProbabilityDiscreteDist ANNOTATED

Characterizing Uncertainty
Characterizing a Distribution
• Several ways to characterize a distribution:
– Central Tendency: what is the “most likely” value?
– Spread: how much do the observations “differ”?
Weekly unit sales, weeks 1 to 50:
Week         1   2   3   4   5   6   7   8   9  10
Unit Sales   1   5   3   2   3   3   3   2   5   2

Week        11  12  13  14  15  16  17  18  19  20
Unit Sales   3   2   3   4   2   1   3   4   4   3

Week        21  22  23  24  25  26  27  28  29  30
Unit Sales   2   4   4   3   4   1   2   3   4   4

Week        31  32  33  34  35  36  37  38  39  40
Unit Sales   1   2   3   4   5   5   1   5   5   1

Week        41  42  43  44  45  46  47  48  49  50
Unit Sales   2   3   3   3   4   1   2   3   4   2
Central Tendency Metrics
• Mode – value that appears most frequently
• Median – value in the “middle” of a distribution
– value separating the lower from the higher half
• Mean – sum of values divided by the total number of observations (average)
– sum of values multiplied by their probability (expected value)
Mode = 3
Median = 3
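As a quick check of the two values above, the mode and median can be computed from the 50 weekly observations. This is a minimal sketch using Python's standard `statistics` module; the variable name `weekly_sales` is an illustrative choice, not something from the slides.

```python
import statistics

# Weekly unit sales for weeks 1-50, transcribed from the sales table.
weekly_sales = [
    1, 5, 3, 2, 3, 3, 3, 2, 5, 2,  # weeks 1-10
    3, 2, 3, 4, 2, 1, 3, 4, 4, 3,  # weeks 11-20
    2, 4, 4, 3, 4, 1, 2, 3, 4, 4,  # weeks 21-30
    1, 2, 3, 4, 5, 5, 1, 5, 5, 1,  # weeks 31-40
    2, 3, 3, 3, 4, 1, 2, 3, 4, 2,  # weeks 41-50
]

# Mode: the value that appears most frequently.
mode = statistics.mode(weekly_sales)

# Median: the middle of the sorted observations
# (the average of the two middle values when the count is even).
median = statistics.median(weekly_sales)

print(f"Mode = {mode}, Median = {median}")
```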
Central Tendency - Mean
• Sum sales and divide by number of observations (weeks)
Sum = 148 units sold
N = 50 weeks
Mean = Average = 148/50 = 2.96 units/week
[Figure: histogram of weekly unit sales. Share of weeks: 1 unit = 14%, 2 units = 22%, 3 units = 30%, 4 units = 22%, 5 units = 12%.]
• Expected Value
Notation
X = Discrete random variable
x_i = Possible values of X, i.e., x_1, x_2, x_3, ..., x_n
p_i = Corresponding probabilities, i.e., p_1, p_2, p_3, ..., p_n
Where P(X = x_i) = p_i and the probabilities sum to 1, i.e., p_1 + p_2 + p_3 + … + p_n = 1
The expected value of X, E[X], is equal to:
$$E[X] = \bar{x} = \mu = \sum_{i=1}^{n} p_i x_i$$
x_i    p_i    p_i x_i
1      .14    .14
2      .22    .44
3      .30    .90
4      .22    .88
5      .12    .60
              Σ = 2.96
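The expected-value sum can be reproduced in a few lines. The sketch below assumes the empirical probabilities read from the histogram (14%, 22%, 30%, 22%, 12%); the dictionary name `pmf` is hypothetical.

```python
import math

# Empirical PMF: maps each possible value x_i to its probability p_i.
pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}

# A valid PMF sums to 1 (allowing for floating-point rounding).
assert math.isclose(sum(pmf.values()), 1.0)

# E[X] is the sum of p_i * x_i over all possible values.
expected_value = sum(p * x for x, p in pmf.items())

print(f"E[X] = {expected_value:.2f} units/week")
```

The result agrees with the simple average 148/50, because each p_i is just the share of weeks in which x_i units were sold.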
Spread Metrics
• Range – maximum value minus minimum value
• Inner Quartiles – the 75th percentile value minus the 25th percentile value
• Variance – expectation of the squared deviation around the mean
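$$\mathrm{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$
[Figure: box plot of weekly unit sales marking Min, 25th Percentile, 75th Percentile, and Max.]
Range = Max – Min = 5 – 1 = 4
Inner Quartile = 75th Percentile – 25th Percentile = 4 – 2 = 2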
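The range and inner quartile spread can also be computed directly from the observations. This is a sketch under two assumptions: the sample is rebuilt from the observed frequency of each sales value (7, 11, 15, 11 and 6 weeks), and quartiles come from `statistics.quantiles` (Python 3.8+) with its default method. Other percentile conventions can give different cut points on other data sets.

```python
import statistics

# Number of weeks in which each sales value was observed (50 weeks total).
counts = {1: 7, 2: 11, 3: 15, 4: 11, 5: 6}

# Expand the counts back into a list of 50 individual observations.
sales = [value for value, weeks in counts.items() for _ in range(weeks)]

# Range: maximum value minus minimum value.
value_range = max(sales) - min(sales)

# quantiles(..., n=4) returns the three quartile cut points:
# 25th percentile, median, 75th percentile.
q1, q2, q3 = statistics.quantiles(sales, n=4)
inner_quartile = q3 - q1

print(f"Range = {value_range}")
print(f"25th pct = {q1}, 75th pct = {q3}, Inner Quartile = {inner_quartile}")
```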
Spread - Variance
• Variance – Expectation of the squared deviation around the mean
– Also called the Second Moment around the mean
$$\mathrm{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$
x_i    p_i    p_i x_i    x_i-μ    (x_i-μ)²    p_i(x_i-μ)²
1      .14    .14        -1.96    3.84        0.5376
2      .22    .44        -0.96    0.92        0.2024
3      .30    .90         0.04    0.0016      0.00048
4      .22    .88         1.04    1.08        0.2376
5      .12    .60         2.04    4.16        0.4992
              μ = 2.96                        σ² = 1.48
• Standard Deviation – Square root of the variance
- In same units as the mean!
σ = √1.48 ≈ 1.216 units/week
• Coefficient of Variation – Ratio of standard deviation to the mean
- Standardized (unitless) measure of variability
CV = σ/μ = 1.216 / 2.96 ≈ 0.411
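The three spread metrics chain together: variance first, then its square root, then the ratio to the mean. A minimal sketch, assuming the same empirical probabilities as the table (the name `pmf` is hypothetical). It carries unrounded intermediate values, so the last digit can differ slightly from a hand calculation that rounds each column.

```python
import math

# Empirical PMF: value x_i -> probability p_i.
pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}

# Mean (expected value): sum of p_i * x_i.
mean = sum(p * x for x, p in pmf.items())

# Variance: probability-weighted squared deviation around the mean.
variance = sum(p * (x - mean) ** 2 for x, p in pmf.items())

# Standard deviation: square root of the variance, in units/week.
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std_dev / mean

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
print(f"std dev = {std_dev:.3f}, CV = {cv:.3f}")
```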
Zippy Bright – Summary Statistics
Minimum = 1
25th Pct = 2
μ=Mean = 2.96
50th Pct = Median = 3
Mode = 3
75th Pct = 4
Maximum = 5
Range = 4
Inner Quartile = 2
σ² = Variance = 1.48
σ = Standard Deviation = 1.216
CV = Coefficient of Variation = 0.411
Discrete Probability Distributions
Probability Distributions
• Where do they come from?
– Empirical: based on actual data
– Theoretical: based on a mathematical form
• Which is better?
– It depends on what you are trying to accomplish
– Empirical distributions follow past history
– Theoretical distributions can allow for more robust modeling
– Typically, we look for the theoretical distribution that fits the data
Discrete Theoretical Distributions
• Discrete Uniform Distribution
– N possible values
– Each value has equal probability, i.e., p_i = 1/N
– Example: Rolling a die
• Poisson Distribution
– Probability of seeing x events within a certain time period
– Example: Random arrivals to a customer service desk
[Figure: PMFs of Theoretical Distributions. Series: Uniform [1,6] and Poisson (mean=1.5). x-axis: Random Variable X (0 to 10); y-axis: Probability of X (0% to 40%).]
Discrete Uniform Distribution
• Notation: U(a, b)
a = Minimum
b = Maximum
n = # of values = b – a + 1
• Metrics
Mean = (a + b) / 2
Median = (a + b) / 2
Mode N/A
Variance = ((b – a + 1)² – 1)/12
Probability Mass Function
$$P[X = x] = f(x \mid a, b) = \begin{cases} \dfrac{1}{n} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
[Figure: PMF Rolling One Die. x-axis: 1 to 6; y-axis: 0% to 20%.]
i      1     2     3     4     5     6
x_i    1     2     3     4     5     6
p_i    1/6   1/6   1/6   1/6   1/6   1/6
μ_X = 1/6*1 + 1/6*2 + 1/6*3 + 1/6*4 + 1/6*5 + 1/6*6 = 3.5 = (6 + 1)/2
σ_X² = 1/6*(1-3.5)² + 1/6*(2-3.5)² + 1/6*(3-3.5)² + 1/6*(4-3.5)² + 1/6*(5-3.5)² + 1/6*(6-3.5)² = 2.917 = ((6-1+1)² - 1)/12
σ_X = √(2.917) = 1.708
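The closed-form mean and variance of U(a, b) can be checked against a direct sum over the PMF. The sketch below uses exact fractions so the comparison is not blurred by rounding; the function name `discrete_uniform_stats` is hypothetical.

```python
import math
from fractions import Fraction


def discrete_uniform_stats(a, b):
    """Mean and variance of U(a, b), summed directly from the PMF."""
    n = b - a + 1               # number of possible values
    p = Fraction(1, n)          # each value has probability 1/n
    mean = sum(p * x for x in range(a, b + 1))
    variance = sum(p * (x - mean) ** 2 for x in range(a, b + 1))
    return mean, variance


# One fair die: a = 1, b = 6.
a, b = 1, 6
mean, variance = discrete_uniform_stats(a, b)

# Closed-form metrics for the discrete uniform distribution.
closed_form_mean = Fraction(a + b, 2)
closed_form_variance = Fraction((b - a + 1) ** 2 - 1, 12)

print(f"mean = {float(mean)}, closed form = {float(closed_form_mean)}")
print(f"variance = {float(variance):.3f}, closed form = {float(closed_form_variance):.3f}")
print(f"std dev = {math.sqrt(variance):.3f}")
```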
Poisson Distribution
• Widely used to model arrivals, slow-moving inventory, etc.
• Discrete distribution that cannot take negative values
• Notation: P(λ)
λ = mean = variance
Probability Mass Function
$$P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Recall:
e = Euler’s number 2.71828 . . .
λ = distribution parameter (mean)
x! = factorial of x, e.g., 5! = 5×4×3×2×1 = 120 and 0! = 1
Suppose λ = 0.5
P[X=0] = (e^{-0.5} λ^0)/(0!) = (0.607)(1)/1 = 0.61
P[X=1] = (e^{-0.5} λ^1)/(1!) = (0.607)(0.5)/1 = 0.30
P[X=2] = (e^{-0.5} λ^2)/(2!) = (0.607)(0.25)/2 = 0.08
P[X=3] = (e^{-0.5} λ^3)/(3!) = (0.607)(0.125)/6 = 0.01
P[X=4] = (e^{-0.5} λ^4)/(4!) = (0.607)(0.0625)/24 ≈ 0.002
P[X=5] = (e^{-0.5} λ^5)/(5!) = (0.607)(0.0312)/120 ≈ 0.0002
x_i    p_i
0      61%
1      30%
2      8%
3      1%
4      0.2%
5      0.02%
[Figure: bar chart of the Poisson PMF for λ = 0.5. x-axis: 0 to 5; y-axis: 0 to 0.70.]
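The hand calculations for λ = 0.5 follow one formula, so they translate directly into a small function. This is a minimal sketch using only the standard `math` module; the name `poisson_pmf` is hypothetical.

```python
import math


def poisson_pmf(x, lam):
    """P[X = x] for a Poisson random variable with mean lam."""
    if x < 0:
        return 0.0  # the distribution cannot take negative values
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 0.5
for x in range(6):
    print(f"P[X={x}] = {poisson_pmf(x, lam):.4f}")
```

Printing four decimals shows how quickly the probabilities fall off when λ is small: almost all of the mass sits on 0 and 1.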
Poisson Distribution – for different λ values
Note:
• As λ increases, the distribution becomes more symmetric and “bell shaped”
• Value is always an integer ≥ 0
• The value of λ does not need to be an integer
[Figure: Poisson PMFs for λ = 0.75, λ = 2, λ = 5, and λ = 10. x-axis: 0 to 20; y-axis: 0 to 0.50.]
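The note about symmetry can be checked numerically. The sketch below uses skewness (the standardized third central moment) as the measure of asymmetry; that choice is an assumption of this example, not something the slides specify. For a Poisson distribution the skewness is known to equal 1/√λ, so it shrinks toward 0 as λ grows. The function names and the truncation point `x_max` are hypothetical.

```python
import math


def poisson_pmf_values(lam, x_max):
    """PMF values for x = 0..x_max, via p(x+1) = p(x) * lam / (x + 1)."""
    p = math.exp(-lam)          # P[X = 0]
    values = [p]
    for x in range(x_max):
        p *= lam / (x + 1)      # step from P[X = x] to P[X = x + 1]
        values.append(p)
    return values


def poisson_skewness(lam, x_max=200):
    """Skewness computed numerically from the (truncated) PMF."""
    pmf = poisson_pmf_values(lam, x_max)
    mean = sum(x * p for x, p in enumerate(pmf))
    variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    third_moment = sum((x - mean) ** 3 * p for x, p in enumerate(pmf))
    return third_moment / variance ** 1.5


# The same λ values as in the figure.
for lam in (0.75, 2, 5, 10):
    print(f"lambda = {lam}: skewness = {poisson_skewness(lam):.3f}")
```

A non-integer λ such as 0.75 works without any special handling, which matches the last point in the note.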
Poisson Distribution
You are running the customer complaint center for Zippy Bright. Customer complaint calls arrive ~P(2.2) per minute, i.e., Poisson with λ = 2.2 calls per minute.
Probability Mass Function
$$P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Cumulative Distribution Function
$$P[X \le x] = \sum_{k=0}^{x} \frac{e^{-\lambda} \lambda^{k}}{k!}$$
1. What is the probability that no calls will come in over the next minute?
P[X=0] = (e^{-2.2} λ^0)/(0!) = (0.1108)(1)/1 = 0.11 or 11%
2. What is the probability that 2 or fewer calls will come in over the next minute?
P[X=0] = (e^{-2.2} λ^0)/(0!) = (0.1108)(1)/1 = 0.11 or 11%
P[X=1] = (e^{-2.2} λ^1)/(1!) = (0.1108)(2.2)/1 = 0.24 or 24%
P[X=2] = (e^{-2.2} λ^2)/(2!) = (0.1108)(4.84)/2 = 0.27 or 27%
P[X≤2] = 0.11 + 0.24 + 0.27 = 0.62 or 62%
3. What is the probability that at least 1 call will come in over the next minute?
P[X>0] = 1 – P[X=0] = 1 – 0.11 = 0.89 or 89%
Spreadsheet        Function                              Prob 1                      Prob 2
Microsoft Excel    =POISSON.DIST(x, mean, cumulative)    =POISSON.DIST(0, 2.2, 0)    =POISSON.DIST(2, 2.2, 1)
Google Sheets      =POISSON(x, mean, cumulative)         =POISSON(0, 2.2, 0)         =POISSON(2, 2.2, 1)
LibreOffice Calc   =POISSON(Number; Mean; C)             =POISSON(0; 2.2; 0)         =POISSON(2; 2.2; 1)
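The three call-center answers can also be cross-checked outside a spreadsheet. This is a minimal sketch using only Python's standard `math` module; the helper names `poisson_pmf` and `poisson_cdf` are hypothetical, and the cumulative probability is simply the sum of the PMF from 0 to x.

```python
import math


def poisson_pmf(x, lam):
    """P[X = x] for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cdf(x, lam):
    """P[X <= x]: sum of the PMF over k = 0, 1, ..., x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 2.2  # mean complaint calls per minute

p_no_calls = poisson_pmf(0, lam)       # question 1: P[X = 0]
p_two_or_fewer = poisson_cdf(2, lam)   # question 2: P[X <= 2]
p_at_least_one = 1 - p_no_calls        # question 3: P[X > 0]

print(f"P[X=0]  = {p_no_calls:.4f}")
print(f"P[X<=2] = {p_two_or_fewer:.4f}")
print(f"P[X>0]  = {p_at_least_one:.4f}")
```

Here `poisson_pmf(0, 2.2)` plays the role of the spreadsheet call with `cumulative` set to 0, and `poisson_cdf(2, 2.2)` the call with `cumulative` set to 1.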
Key Points
MIT Center for
Transportation & Logistics
• Characterize a distribution:
– Central Tendency
    - Mode – value that appears most frequently
    - Median – value in the “middle” of a distribution, separating the lower from the higher half
    - Mean (μ) – sum of values multiplied by their probability (expected value)
– Spread
    - Range – maximum value minus minimum value
    - Inner Quartiles – 75th percentile value minus the 25th percentile value
    - Variance (σ²) – expectation of the squared deviation around the mean
    - Standard Deviation (σ) – Square root of the variance
    - Coefficient of Variation (CV) – Standard deviation over the mean = σ/μ
$$E[X] = \bar{x} = \mu = \sum_{i=1}^{n} p_i x_i$$
$$\mathrm{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$
• Theoretical Distributions
– Discrete Uniform
Probability Mass Function
$$P[X = x] = f(x \mid a, b) = \begin{cases} \dfrac{1}{n} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
[Figure: PMF Rolling One Die. x-axis: 1 to 6; y-axis: 0% to 20%.]
– Poisson
Probability Mass Function
$$P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
[Figure: Poisson PMF. x-axis: 0 to 5; y-axis: 0 to 0.70.]
Questions, Comments, Suggestions?
Use the Discussion Forum!
“Dexter, Brody, and Wilson hoping that the probability of getting the treat is not zero.”
caplice@mit.edu
ctl.mit.edu