Characterizing Uncertainty: Data Distributions & Probability

Characterizing Uncertainty

Characterizing a Distribution
• Several ways to characterize a distribution:
    Central Tendency – what is the “most likely” value?
    Spread – how much do the observations “differ”?

Weekly unit sales, weeks 1 to 50:

    Week         1   2   3   4   5   6   7   8   9  10
    Unit Sales   1   5   3   2   3   3   3   2   5   2

    Week        11  12  13  14  15  16  17  18  19  20
    Unit Sales   3   2   3   4   2   1   3   4   4   3

    Week        21  22  23  24  25  26  27  28  29  30
    Unit Sales   2   4   4   3   4   1   2   3   4   4

    Week        31  32  33  34  35  36  37  38  39  40
    Unit Sales   1   2   3   4   5   5   1   5   5   1

    Week        41  42  43  44  45  46  47  48  49  50
    Unit Sales   2   3   3   3   4   1   2   3   4   2

Central Tendency Metrics
• Mode – value that appears most frequently
• Median – value in the “middle” of a distribution
    – value separating the lower from the higher half
• Mean – sum of values divided by the total number of observations (average)
    – sum of values multiplied by their probability (expected value)

For the unit sales data: Mode = 3, Median = 3

Central Tendency - Mean
• Sum sales and divide by number of observations (weeks)
    Sum = 148 units sold
    N = 50 weeks
    Mean = Average = 148/50 = 2.96 units/week
• Expected Value

[Figure: bar chart of the relative frequency of weekly unit sales: 14%, 22%, 30%, 22%, and 12% for sales of 1, 2, 3, 4, and 5 units]

Notation
    $X$ = discrete random variable
    $x_i$ = possible values of $X$, i.e., $x_1, x_2, x_3, \ldots, x_n$
    $p_i$ = corresponding probabilities, i.e., $p_1, p_2, p_3, \ldots, p_n$
    where $P(X = x_i) = p_i$ and the probabilities sum to 1, i.e., $p_1 + p_2 + p_3 + \cdots + p_n = 1$

The expected value of X, E[X], is equal to:

    $E[X] = \bar{x} = \mu = \sum_{i=1}^{n} p_i x_i$

    x_i        1      2      3      4      5
    p_i        0.14   0.22   0.30   0.22   0.12
    p_i x_i    0.14   0.44   0.90   0.88   0.60     Σ = 2.96

Spread Metrics
• Range – maximum value minus minimum value
• Inner Quartiles – the 75th percentile value minus the 25th percentile value (also called the interquartile range)
• Variance – expectation of the squared deviation around the mean

    $Var[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$

[Figure: box plot marking the Min, 25th Percentile, 75th Percentile, and Max]

    Range = Max – Min = 5 – 1 = 4
    Inner Quartile = 75th Percentile – 25th Percentile = 4 – 2 = 2

Spread - Variance
• Variance
    – Expectation of the squared deviation around the mean
    – Also called the Second Moment around the mean

    $Var[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$

    x_i                1        2        3        4        5
    p_i                0.14     0.22     0.30     0.22     0.12
    p_i x_i            0.14     0.44     0.90     0.88     0.60      μ = 2.96
    x_i – μ           -1.96    -0.96     0.04     1.04     2.04
    (x_i – μ)²         3.8416   0.9216   0.0016   1.0816   4.1616
    p_i (x_i – μ)²     0.5378   0.2028   0.0005   0.2380   0.4994    σ² = 1.4784 ≈ 1.48

• Standard Deviation
    – Square root of the variance
    – In same units as the mean!
    σ = √1.4784 ≈ 1.216 units/week
• Coefficient of Variation
    – Ratio of standard deviation to the mean
    – Standard measure of variability
    CV = σ/μ = 1.216 / 2.96 = 0.411

Zippy Bright – Summary Statistics

    Minimum = 1
    25th Pct = 2
    μ = Mean = 2.96
    50th Pct = Median = 3
    Mode = 3
    75th Pct = 4
    Maximum = 5
    Range = 4
    Inner Quartile = 2
    σ² = Variance = 1.48
    σ = Standard Deviation = 1.216
    CV = Coefficient of Variation = 0.411

Discrete Probability Distributions

Probability Distributions
• Where do they come from?
    Empirical – based on actual data
    Theoretical – based on a mathematical form
• Which is better?
    It depends on what you are trying to accomplish
    Empirical distributions follow past history
    Theoretical distributions can allow for more robust modeling
    Typically, we look for the theoretical distribution that fits the data

Discrete Theoretical Distributions
• Discrete Uniform Distribution
    N possible values
    Each value has equal probability, i.e., $p_i = 1/N$
    Ex: Rolling a die
• Poisson Distribution
    Probability of seeing x events within a certain time period
    Example: Random arrivals to a customer service desk

[Figure: PMFs of Theoretical Distributions, comparing Uniform [1,6] with Poisson (mean = 1.5); x-axis: Random Variable X, from 0 to 10; y-axis: Probability of X]

Discrete Uniform Distribution
• Notation: U(a, b)
    a = Minimum
    b = Maximum
    n = # of values = b – a + 1
• Metrics
    Mean = (a + b) / 2
    Median = (a + b) / 2
    Mode: N/A (every value is equally likely)
    Variance = ((b – a + 1)² – 1) / 12

Probability Mass Function

    $P[X = x] = f(x \mid a, b) = \begin{cases} \dfrac{1}{n} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$

[Figure: PMF Rolling One Die, six equal bars at 1/6 ≈ 16.7% for x = 1 to 6]

    i      1     2     3     4     5     6
    x_i    1     2     3     4     5     6
    p_i    1/6   1/6   1/6   1/6   1/6   1/6

    μ_X = 1/6×1 + 1/6×2 + 1/6×3 + 1/6×4 + 1/6×5 + 1/6×6 = 3.5 = (6 + 1)/2
    σ²_X = 1/6×(1–3.5)² + 1/6×(2–3.5)² + 1/6×(3–3.5)² + 1/6×(4–3.5)² + 1/6×(5–3.5)² + 1/6×(6–3.5)² = 2.917 = ((6–1+1)² – 1) / 12
    σ_X = √2.917 = 1.708

Poisson Distribution
• Widely used to model arrivals, slow moving inventory, etc.
• Discrete distribution that cannot take negative values
• Notation: P(λ)
    λ = mean = variance

Probability Mass Function

    $P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$

Recall:
    e = Euler’s number 2.71828 . . .
    λ = distribution parameter (mean)
    x! = factorial of x, e.g., 5! = 5×4×3×2×1 = 120 and 0! = 1

Suppose λ = 0.5, so that $e^{-0.5} \approx 0.607$:

    P[X=0] = $e^{-0.5} (0.5)^0 / 0!$ = (0.607)(1)/1 = 0.61
    P[X=1] = $e^{-0.5} (0.5)^1 / 1!$ = (0.607)(0.5)/1 = 0.30
    P[X=2] = $e^{-0.5} (0.5)^2 / 2!$ = (0.607)(0.25)/2 = 0.08
    P[X=3] = $e^{-0.5} (0.5)^3 / 3!$ = (0.607)(0.125)/6 = 0.01
    P[X=4] = $e^{-0.5} (0.5)^4 / 4!$ = (0.607)(0.0625)/24 ≈ 0.002
    P[X=5] = $e^{-0.5} (0.5)^5 / 5!$ = (0.607)(0.0312)/120 ≈ 0.0002

    x_i    0      1      2     3     4      5
    p_i    61%    30%    8%    1%    0.2%   0.02%

[Figure: PMF of the Poisson distribution with λ = 0.5 for x = 0 to 5]

Poisson Distribution – for different λ values

[Figure: Poisson PMFs for λ = 0.75, λ = 2, λ = 5, and λ = 10, plotted for x = 0 to 20]

Note:
• As λ increases, the distribution becomes more symmetric and “bell shaped”
• Value is always an integer ≥ 0
• The value of λ does not need to be an integer

Poisson Distribution

Probability Mass Function

    $P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$

Cumulative Distribution Function

    $P[X \le x] = \sum_{k=0}^{x} \dfrac{e^{-\lambda} \lambda^{k}}{k!}$

You are running the customer complaint center for Zippy Bright. Customer complaint calls come in ~P(2.2) per minute, so $e^{-2.2} \approx 0.1108$.

1. What is the probability that no calls will come in over the next minute?
    P[X=0] = $e^{-2.2} (2.2)^0 / 0!$ = (0.1108)(1)/1 = 0.11 or 11%

2. What is the probability that 2 or fewer calls will come in over the next minute?
    P[X=0] = $e^{-2.2} (2.2)^0 / 0!$ = (0.1108)(1)/1 = 0.11 or 11%
    P[X=1] = $e^{-2.2} (2.2)^1 / 1!$ = (0.1108)(2.2)/1 = 0.24 or 24%
    P[X=2] = $e^{-2.2} (2.2)^2 / 2!$ = (0.1108)(4.84)/2 = 0.27 or 27%
    P[X≤2] = 0.11 + 0.24 + 0.27 = 0.62 or 62%

3. What is the probability that at least 1 call will come in over the next minute?
    P[X>0] = 1 – P[X=0] = 1 – 0.11 = 0.89 or 89%

Spreadsheet functions for the Poisson distribution (Prob 1 and Prob 2 refer to questions 1 and 2 above):

    Spreadsheet         Function                              Prob 1                      Prob 2
    Microsoft Excel     =POISSON.DIST(x, mean, cumulative)    =POISSON.DIST(0, 2.2, 0)    =POISSON.DIST(2, 2.2, 1)
    Google Sheets       =POISSON(x, mean, cumulative)         =POISSON(0, 2.2, 0)         =POISSON(2, 2.2, 1)
    LibreOffice Calc    =POISSON(Number; Mean; C)             =POISSON(0; 2.2; 0)         =POISSON(2; 2.2; 1)

Key Points
MIT Center for Transportation & Logistics

Key Points
• Characterize a distribution:
    Central Tendency
        – Mode – value that appears most frequently
        – Median – value in the “middle” of a distribution, separating the lower from the higher half
        – Mean (μ) – sum of values multiplied by their probability (expected value)
    Spread
        – Range – maximum value minus minimum value
        – Inner Quartiles – 75th percentile value minus the 25th percentile value
        – Variance (σ²) – expectation of the squared deviation around the mean
        – Standard Deviation (σ) – square root of the variance
        – Coefficient of Variation (CV) – standard deviation over the mean = σ/μ

    $E[X] = \bar{x} = \mu = \sum_{i=1}^{n} p_i x_i$

    $Var[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$

Key Points
• Theoretical Distributions
    Discrete Uniform – Probability Mass Function

    $P[X = x] = f(x \mid a, b) = \begin{cases} \dfrac{1}{n} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$

    [Figure: PMF Rolling One Die, x = 1 to 6]

    Poisson – Probability Mass Function

    $P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$

    [Figure: Poisson PMF for x = 0 to 5]

Questions, Comments, Suggestions? Use the Discussion Forum!

[Photo caption: “Dexter, Brody, and Wilson hoping that the probability of getting the treat is not zero.”]

caplice@mit.edu
ctl.mit.edu
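As a cross-check on the Zippy Bright summary statistics, the sketch below recomputes them from the 50 weekly unit-sales observations using only the Python standard library. The variable names are illustrative and not part of the slides, and the quartiles use the `inclusive` interpolation method of `statistics.quantiles`, which is an assumption: other percentile conventions exist, although for this data set they lead to the same values.

```python
from collections import Counter
from math import sqrt
import statistics

# Weekly unit sales for weeks 1-50, copied from the table in the slides.
sales = [
    1, 5, 3, 2, 3, 3, 3, 2, 5, 2,
    3, 2, 3, 4, 2, 1, 3, 4, 4, 3,
    2, 4, 4, 3, 4, 1, 2, 3, 4, 4,
    1, 2, 3, 4, 5, 5, 1, 5, 5, 1,
    2, 3, 3, 3, 4, 1, 2, 3, 4, 2,
]

n_weeks = len(sales)

# Empirical PMF: relative frequency of each observed value.
counts = Counter(sales)
pmf = {x: counts[x] / n_weeks for x in sorted(counts)}

# Expected value and variance computed from the PMF, as in the slides.
mean = sum(p * x for x, p in pmf.items())
variance = sum(p * (x - mean) ** 2 for x, p in pmf.items())
std_dev = sqrt(variance)
cv = std_dev / mean

# Central tendency and spread computed directly from the raw observations.
mode = statistics.mode(sales)
median = statistics.median(sales)
q1, _, q3 = statistics.quantiles(sales, n=4, method="inclusive")
value_range = max(sales) - min(sales)
inner_quartile = q3 - q1

print(f"PMF: {pmf}")
print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.4f} std_dev={std_dev:.3f} cv={cv:.3f}")
print(f"range={value_range} inner_quartile={inner_quartile}")
```

Computing the mean as a probability-weighted sum rather than `sum(sales) / n_weeks` mirrors the expected-value formula; both give the same number because the probabilities here are just relative frequencies.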
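The two theoretical distributions can be checked the same way. The following is a minimal sketch, assuming plain Python with no third-party packages; `poisson_pmf` and `poisson_cdf` are hypothetical helper names chosen for this example, written directly from the Poisson PMF and cumulative-sum formulas. It reproduces the three customer complaint answers for λ = 2.2 and compares the die-roll variance against the closed-form discrete uniform expression.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P[X = x] for a Poisson random variable with mean lam."""
    if x < 0:
        return 0.0
    return exp(-lam) * lam ** x / factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P[X <= x], obtained by summing the PMF from 0 up to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


# Customer complaint calls arrive ~P(2.2) per minute.
lam = 2.2
p_no_calls = poisson_pmf(0, lam)           # question 1
p_two_or_fewer = poisson_cdf(2, lam)       # question 2
p_at_least_one = 1.0 - p_no_calls          # question 3

print(f"P[X=0]  = {p_no_calls:.4f}")
print(f"P[X<=2] = {p_two_or_fewer:.4f}")
print(f"P[X>0]  = {p_at_least_one:.4f}")

# Discrete uniform U(a, b) for one die: direct sums versus closed forms.
a, b = 1, 6
n = b - a + 1
values = range(a, b + 1)
die_mean = sum(x / n for x in values)
die_variance = sum((x - die_mean) ** 2 / n for x in values)
closed_form_mean = (a + b) / 2
closed_form_variance = (n ** 2 - 1) / 12

print(f"die mean     = {die_mean:.3f} (closed form {closed_form_mean:.3f})")
print(f"die variance = {die_variance:.3f} (closed form {closed_form_variance:.3f})")
```

The calls `poisson_pmf(0, 2.2)` and `poisson_cdf(2, 2.2)` play the same role as the spreadsheet formulas with the cumulative flag set to 0 and 1 respectively.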
