Common Parametric Distributions
Gentle Introduction to Modeling Uncertainty Series #6
Lonnie Chrisman, Ph.D.
Lumina Decision Systems
Analytica Users Group Webinar
10 June 2010
Copyright © 2010 Lumina Decision Systems, Inc.

Course Syllabus
Over the coming weeks:
• What is uncertainty? Probability.
• Probability Distributions
• Monte Carlo Sampling
• Measures of Risk and Utility
• Risk analysis for portfolios
• Common parametric distributions
• Assessment of Uncertainty
• Hypothesis testing

Today’s Topics
• Continuous vs. discrete.
• Non-parametric distributions.
• A handful of the most common distributions.
• The cases where each is useful.
• How to encode each in Analytica.
Lots of model building exercises…

Outline (Order of exercises)
• “Pre-test” questions
• Discrete non-parametric: Monty Hall game
• Continuous non-parametric: Data resampling
• Event counts
• Durations between events
• Uncertain percentages
• Bounded
• Bell shapes

Distribution Types
• Discrete
• Continuous

Custom (Non-parametric) Discrete
ChanceDist(P,A,I)
Parameters:
• P = Array of probabilities. Sum(P,I)=1
• A = Array of possible outcomes
• I = Index shared by P and A
Note: When A is the index, you can use: ChanceDist(P,A)

ChanceDist Exercise
An event occurs on one of the 7 days of the week.
• Each weekday: 8%
• Each day of the weekend: 30%
Create a chance variable named Day_of_event with this distribution.

ChanceDist Exercise 2: Monty Hall Game
You are a contestant on a game show. A prize is hidden behind 1 of 3 curtains. You select curtain 1.
“Before opening your curtain,” says the host, “let me reveal one of the unselected curtains that does not contain the prize… Curtain 2 is empty!
Would you now like to change curtains?”
Task: Build an Analytica model that computes the probability of winning the prize if you do or do not change curtains.

Monty Hall Steps
1. Chance: Start with the uncertain real location of the prize.
2. Model how the host decides which curtain to show you.
   • He will never reveal the prize or your selected curtain. Otherwise he picks randomly.
3. Decision: Change or not?
4. Objective: Probability that your final selection is the one with the prize.

Custom (Non-parametric) Continuous Distributions
• CumDist(p,x,i)
Parameters:
  p : Probabilities that value <= x
  x : Ascending set of values
  i : Index shared by p and x
When x is the index: CumDist(p,x,x) or just CumDist(p,x)

CumDist Exercise
• A geologist estimates the capacity of a recently discovered oil deposit. He expresses his assessments as follows:
  100% that 100K < capacity < 1B barrels
  90% that 5M < capacity < 500M barrels
  75% that 50M < capacity < 100M barrels
  Median estimate: 75M barrels
• Use CumDist to encode these estimates as a distribution for capacity.

Homework challenge: Using CumDist to Resample
• You have 143 measured values of a quantity. Define an uncertain variable with the same implied distribution (even though your Monte Carlo sample size doesn’t match the 143 data points).
• Here is your synthetic data:
  Index Data_i := 1..143
  Variable Data := ArcCos(Random(over: Data_i))
• Steps (the parameters to CumDist):
  Sort Data in ascending order: Sort(Data,Data_i)
  Compute p: equal probability steps along Data_i, starting at 0 and ending at 1.

The Most Commonly Used Parametric Distributions
• Discrete:
  Bernoulli
  Poisson
  Binomial
  Uniform integer
• Continuous:
  Normal
  LogNormal
  Uniform
  Triangular
  Exponential
  Gamma
  Beta

Why choose one distribution over another?
• Discrete or continuous?
• Bounded quantity or infinite tails?
  Continuous
    Bounded both sides: Uniform, Triangular, Beta
    One-sided tail: LogNormal, Gamma, Exponential
    Two-tailed: Normal, StudentT, Logistic
  Discrete
    Bounded both sides: Binomial, Uniform int
    One-sided tail: Poisson
• Convenience
  Some distributions are more “natural” for certain types of quantities.
  Ease of assessment.
• Analytical properties (of interest to mathematicians, not model builders)
• Correctness
  Beyond these broad properties, the sensitivity of computed results to the specific choice of distribution for an assessment is usually extremely low.

Distributions for Integer-valued Counts #1
• Poisson(mean)
  Count of events per unit time.
  # Earthquakes of magnitude > 6.0 in a given year
  # Vehicles that pass in a given hour
  # Alarms in a given month
  # Pelicans rescued from an oil spill today
When events occur at a constant average rate and the occurrence of each event is independent of the times of other events, the number of occurrences in any given window is Poisson distributed.

Distributions for Integer-valued Counts #2
• Binomial(n,p)
  Number of times an event occurs in n repeated independent trials, each having probability p.
  # oil well blowouts in the next 100 deep-water wells drilled
  # people who visit a store in its first month, out of the 10,000 residents of the town
  # positive test results in 50 samples tested

Exercise with event counts
In a certain region, malaria infections occur at an average rate of 500 infections per year, and 10% of infections are fatal. Build an Analytica model to compute the distribution of the number of people who die from a malaria infection in a given year.
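To sanity-check an Analytica answer to the malaria exercise, the same two-stage model (a Poisson count of infections, each fatal with probability 10%) can be simulated directly. This is a minimal sketch in Python rather than Analytica, using only the standard library; the helpers poisson, binomial and simulate_deaths are hypothetical names introduced here. Because independently thinning a Poisson count with 10% coin flips gives another Poisson count, the deaths should follow Poisson(500 × 0.1) = Poisson(50), so the sample mean and sample variance should both land near 50.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method (chosen here for simplicity):
    count how many uniforms can be multiplied before the product drops below e^-lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Number of successes in n independent trials, each with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_deaths(rate: float = 500.0, p_fatal: float = 0.1,
                    samples: int = 5000, seed: int = 1) -> list[int]:
    """Monte Carlo sample of yearly deaths: Binomial(Poisson(rate), p_fatal).
    Hypothetical helper; the parameters mirror the exercise (500 per year, 10% fatal)."""
    rng = random.Random(seed)
    return [binomial(poisson(rate, rng), p_fatal, rng) for _ in range(samples)]


if __name__ == "__main__":
    deaths = simulate_deaths()
    # Thinning theory predicts Poisson(50): mean and variance both near 50.
    print("mean:", statistics.mean(deaths))
    print("variance:", statistics.variance(deaths))
```

Knuth's method costs time proportional to the mean, so it is only reasonable for rates up to a few hundred; it is used here purely because it fits in a few lines.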
Duration between events
• Exponential(rate)
  When events occur independently at a given rate, this gives the time between successive events.
  Note: rate = 1 / meanArrivalTime (the mean time between arrivals)
• Gamma(a,1/rate)
  Time until a independent events have occurred, when successive events have a mean arrival time of 1/rate (the sum of a independent Exponential(rate) waiting times).

Arrival times exercise
• Cars arrive at a stoplight at a rate of 5 per minute. There is room for 10 cars before nearby freeway traffic is blocked.
• Graph the CDF for the amount of time until cars begin to block freeway traffic when the light is red.
• If the light stays red for 90 seconds, what fraction of red-light cycles will result in blocked traffic?

Uncertain Percentages
• Beta(a,b)
  Useful for modeling uncertainty about a probability or percentage.
  Beta(a,b) expresses uncertainty on a quantity bounded to [0,1].
  Suppose you’ve seen s true instances out of n observations, with no further information. You’d estimate the true proportion as p = s/n. The uncertainty in this estimate can be modeled as Beta(s+1, n-s+1).
• Exercise: Of 100 sampled voters, 55 supported Candidate A. Model the uncertainty on the true proportion.

Bounded Distributions
• Triangular(min,mode,max)
  Often very convenient and natural for expressing estimates when only the range and a best guess are available.
• Pert(min,mode,max)
  Same idea as Triangular. To use, include “Distribution Variations.ana”.
• Uniform(min,max)
  All values between min and max are equally likely.
• Uniform(min,max,integer:true)
  All integer values between min and max are equally likely.

Bounded comparisons
• Using: Min = 10, Mode = 25, Max = 40
• Compare distributions (on the same PDF & CDF plot):
  Triangular
  Pert
  Uniform
• Repeat for Mode = 15

Central Limit Theorem
• Suppose
  $y = x_1 \cdot x_2 \cdot x_3 \cdots x_N$
  $z = x_1 + x_2 + x_3 + \cdots + x_N$
  Each $x_i \sim P(\cdot)$, where $P(\cdot)$ can be almost any distribution (finite variance for the sum; for the product, positive values whose logarithm has finite variance).
  (the $x_i$ are mutually independent)
• Then as $N \to \infty$, the distribution of y becomes approximately LogNormal and the distribution of z becomes approximately Normal.

Sensitivity to Distribution Choice
• Load the TXC model (Example Models – Risk Analysis)
• Compare Total_cost for these Control_cost_factor distributions:
  LogNormal(mean:108.6M,stddev:45.96M)
  Gamma(5.58,19.45M)
  Uniform(29M,188M)
  Triangular(41M,60M,245M)
  Weibull(2.53,122.4M)
• Using the LogNormal:
  Compare Total_cost when the Control_cost_factor mean is increased or decreased by 10%.
  Compare when stddev is altered by 50%.

Summary
• Various parametric distributions are convenient for certain types of quantities.
• Choice of parametric distribution is usually driven by:
  Continuous vs. discrete
  Tails or bounded
  Broad shape
  Type of information easily estimated
• Results are usually fairly insensitive to the exact choice of distribution type.
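The Bounded comparisons exercise asks for PDF and CDF plots in Analytica; a quick numeric preview of the same comparison is possible outside it. This is a minimal Python sketch, not the Analytica functions: it assumes the common PERT parameterization (a Beta distribution rescaled to [min, max] with shape weight 4), which may differ from the Pert definition in “Distribution Variations.ana”, and pert, summarize and compare are hypothetical helpers. It prints the mean and the 10th, 50th and 90th percentiles instead of plots.

```python
import random
import statistics


def pert(rng: random.Random, lo: float, mode: float, hi: float) -> float:
    """Assumed PERT form: Beta(alpha, beta) rescaled to [lo, hi], shape weight 4,
    which gives mean (lo + 4*mode + hi) / 6."""
    span = hi - lo
    alpha = 1 + 4 * (mode - lo) / span
    beta = 1 + 4 * (hi - mode) / span
    return lo + span * rng.betavariate(alpha, beta)


def summarize(samples: list[float]) -> tuple[float, float, float, float]:
    """Mean plus the 10th, 50th and 90th percentiles of a sample."""
    deciles = statistics.quantiles(samples, n=10)  # 9 cut points: 10%, 20%, ..., 90%
    return statistics.mean(samples), deciles[0], deciles[4], deciles[8]


def compare(lo: float, mode: float, hi: float,
            n: int = 20000, seed: int = 7) -> dict[str, tuple[float, float, float, float]]:
    """Draw n samples from each bounded distribution and summarize them."""
    rng = random.Random(seed)
    draws = {
        # Note the argument order of the stdlib call: random.triangular(low, high, mode)
        "Triangular": [rng.triangular(lo, hi, mode) for _ in range(n)],
        "Pert": [pert(rng, lo, mode, hi) for _ in range(n)],
        "Uniform": [rng.uniform(lo, hi) for _ in range(n)],
    }
    return {name: summarize(xs) for name, xs in draws.items()}


if __name__ == "__main__":
    for mode in (25, 15):
        print(f"min=10, mode={mode}, max=40")
        for name, (mean, p10, p50, p90) in compare(10, mode, 40).items():
            print(f"  {name:<10} mean={mean:6.2f} p10={p10:6.2f} "
                  f"median={p50:6.2f} p90={p90:6.2f}")
```

With Mode = 25 all three distributions are symmetric and share the mean 25. With Mode = 15 the means separate: (10 + 15 + 40)/3 ≈ 21.7 for Triangular, (10 + 4·15 + 40)/6 ≈ 18.3 for the assumed PERT, and 25 for Uniform, which ignores the mode entirely.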