Modeling Uncertainty: Probability Distributions
Lonnie Chrisman, Ph.D.
Lumina Decision Systems
Analytica User Group Webinar Series
Session 2: 6 May 2010
Copyright © 2010 Lumina Decision Systems, Inc.

Today’s Topics
• Review
• How can we characterize uncertainty for continuous quantities?
• The Normal Distribution
  - Viewing & interpreting
• LogNormal Distribution
• Why include uncertainty

Course Syllabus (tentative)
Over the coming weeks:
• What is uncertainty? Probability.
• Probability Distributions (today)
• Monte Carlo Sampling
• Measures of Risk and Utility
• Common parametric distributions
• Assessment of Uncertainty
• Risk analysis for portfolios (risk management)
• Hypothesis testing

Review

What is Uncertainty?
• Uncertainty: the lack of perfect and complete knowledge.
• Applies to:
  - Future outcomes
  - Existing states or quantities
  - Physical measurements
  - The unknowable (quantum mechanics)
• Exercise: State something that you have perfect and complete knowledge of.

Related Concepts
• Randomness
  - Will my next coin toss be heads or tails?
• Variation
  - 75% of the people in this room have type A blood.
• Vagueness
  - How many people worldwide live in warm climates?
• Risk
  - You could die during the operation.
• Statistical Confidence/Significance
  - The study confirmed the hypothesis at a 95% confidence level.

Probability: A language for uncertainty
Probability: A measure, on a scale from 0 to 1, of how certain we are that a statement is true.
• P(A)=0: Assertion A is certainly false.
• P(A)=1: Assertion A is certainly true.
• P(A)=0.5: Equally likely to be true or false.
• P(A)=0.7: A is more likely true than false.

Assertions must be Crisp and Unambiguous
Probability of what?
• Must be a true/false assertion.
• Vagueness not allowed.
  ✘ “Gas prices will increase substantially in the short term.”
  ✔ “The average retail price for regular unleaded gas in the state of California, as reported by the U.S. Energy Information Administration, will increase by more than 20% from 26 Apr 2010 to 30 Aug 2010.”
• Truth theoretically knowable

Boolean Chance Variables in Analytica
• Characterized by a single probability – P(B=true).
• Examples:
  - Component fails
  - Dow drops by >1000 points
  - Civil war breaks out in Nigeria
  - Subject is male
• Use a Chance variable defined as Bernoulli(p)

“Subjective” Interpretation of Probability
• Probabilities measure:
  - how much we know,
  - not frequency of occurrence.
• Calibration: Over many probability assessments, the frequency of true assertions should match our subjective probabilities for the assertions.

Today’s New Topics

Continuous Quantities
• Most variables in quantitative models represent real-valued quantities. Examples:
  - Revenue
  - Infection rate
  - Oil well capacity
  - Megawatt power output
  - Unit sales (?)
• Saying “Probability of x”, or P(x), is nonsensical.
• We need something more…

Real-valued uncertainty example
At this time (6 May 2010), at what rate (in gallons per hour) is oil leaking into the Gulf of Mexico from the well in Louisiana that exploded on 22 Apr 2010?
• Does this pass the clarity test?
• How can we express our knowledge and degree of uncertainty regarding the true value?
Note: A CNN article gave an estimate of 8,300 gal/hr.

Ways of Expressing Uncertainty (Attendees’ ideas)
Rate of oil leak:
• Minimum & maximum values
• Standard deviation
• Mean + Median (if different)
• Distribution, e.g., triangular with 10% + 90% percentiles.
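
Several of the attendees’ ideas are summary statistics that can be computed from any set of candidate values. A minimal sketch, assuming a purely hypothetical list of leak-rate guesses (the numbers are illustrative only, not data from the webinar or from CNN) and a helper name `summarize` introduced here, shows how the minimum, maximum, mean, median, and standard deviation come out:

```python
import statistics

# Hypothetical guesses for the leak rate in gal/hr; illustrative values only.
guesses = [5_000, 8_000, 9_000, 10_000, 12_000, 15_000, 25_000]


def summarize(values):
    """Return the one-number summaries suggested on the slide."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }


summary = summarize(guesses)
for name, value in summary.items():
    print(f"{name:>6}: {value:,.0f} gal/hr")
```

In this made-up list the single large guess pulls the mean above the median, which is one reason to report both when they differ.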
Average Deviation
Suppose our “best guess” is: E[ oil_leak_rate ] = 10K gal/hr
• What is the expected error in our estimate?
  $= E[\,|10\mathrm{K} - \mathrm{trueValue}|\,]$
• Ave. dev. is a simple (intuitive?) one-number measure of how uncertain we are.
• Allows us to characterize our knowledge / uncertainty with just two numbers: Expected value + Expected deviation
• Aka: Expected deviation, (mean/average) absolute deviation.

Standard Deviation
• Other measures of uncertainty “dispersion”:
  - Variance (expected/average squared error): $E[(10\mathrm{K} - \mathrm{trueValue})^2]$
  - Standard deviation: $\sqrt{\mathrm{Variance}} = \sqrt{E[(10\mathrm{K} - \mathrm{trueValue})^2]}$
• Standard deviation has the same intuitive meaning as average (absolute) deviation. Both are a type of best guess for how much error our best guess has.
  - Nicer mathematical properties
  - More commonly used

Standard Deviation vs. Average Deviation
$\mathrm{sd} = \sqrt{E[(x - x^*)^2]}$  vs.  $\mathrm{ad} = E[\,|x - x^*|\,]$
• Both are always non-negative.
• Zero indicates absolute certainty.
• Both are measured in the same units as x.
• Q: Which measure gets larger when extreme errors are more likely?
• What is the typical ratio sd/ad?
  - Symmetric: sd ≈ 1.25 ad
  - One-sided tail: sd ≈ 1.35 ad
  - “Heavy” tails: (up to) 1.3 ad ≤ sd ≤ 2.5 ad

Expressing uncertainty for a real-valued quantity
• Expected value + dispersion measure, e.g.:
  - Expected value + average deviation
  - Expected value + standard deviation
• Exercise: Express your uncertainty for the oil well leak example in the above forms.
• There are no probabilities here. Why?

Visualization: Normal Distribution
[Figure: plot of a Normal distribution with EV=10K, AD=3K, SD=3.8K; the expected value, average deviation, and standard deviation are marked.]
This is called a probability density function (PDF) plot.
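
For a Normal distribution the link between the two dispersion measures is exact: ad = sd·√(2/π), so sd ≈ 1.25 ad. A minimal sketch using `statistics.NormalDist` from Python's standard library (the helper names `normal_sd_from_ad` and `prob_within` are introduced here for illustration) converts AD=3K into a standard deviation and computes how much probability lies within one average deviation and within one standard deviation of the expected value:

```python
import math
from statistics import NormalDist


def normal_sd_from_ad(ad):
    """Standard deviation of a Normal whose average (absolute) deviation is ad."""
    return ad * math.sqrt(math.pi / 2)


def prob_within(dist, half_width):
    """Probability that the quantity lies within +/- half_width of the mean."""
    return dist.cdf(dist.mean + half_width) - dist.cdf(dist.mean - half_width)


ev = 10_000.0                 # expected value, gal/hr
ad = 3_000.0                  # average deviation, gal/hr
sd = normal_sd_from_ad(ad)    # roughly 1.25 * ad

leak_rate = NormalDist(mu=ev, sigma=sd)
within_ad = prob_within(leak_rate, ad)
within_sd = prob_within(leak_rate, sd)

print(f"sd = {sd:,.0f} gal/hr")
print(f"P(within 1 ad) = {within_ad:.3f}")
print(f"P(within 1 sd) = {within_sd:.3f}")
```

The computed standard deviation is close to the 3.8K shown on the plot, and the two probabilities are the areas under the PDF between the corresponding pairs of marks.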
Visualization: Normal Distribution
[Figure: PDF plot with EV=10K, AD=3K, SD=3.8K; the region within +/- one average deviation is marked.]
58% of the area lies within 1 average deviation of the expected value. This is the connection to probability.

Visualization: Normal Distribution
[Figure: PDF plot with EV=10K, AD=3K, SD=3.8K; the region within +/- one standard deviation is marked.]
68% of the area lies within 1 standard deviation of the expected value.

Cumulative Probability Function (CDF)
• Easier to read than PDF.
• P(rate≤x)

Specifying the Normal Distribution in Analytica
• Define your real-valued variable as: Normal( mean, stddev )
• Take note: the second parameter is the standard deviation, not the expected/average deviation. If you estimated an average deviation, remember to increase it slightly (e.g., by 25%).

Exercise
A toy company must decide how many toys to manufacture for the Christmas season three months in advance.
Demand is: Normal(100K,25K)
It costs $5 to manufacture a toy. The company makes a $10 profit on each toy sold.
They order 100K toys. What is their expected profit?

Exercise (cont.)
Using the toy company example:
• Compare estimated profit when uncertainty is ignored (based on mean demand) to mean profit.
• Examine how mean profit varies with the number of toys ordered: Units_ordered := Sequence(70K,130K,1K)
• What size order should they place?
• What improvement in value results from including explicit uncertainty in the model?

Positive real-valued quantities
• Many real-valued quantities are positive-only, with no hard upper limit:
  - Oil leak rate
  - Demand
  - Population counts
  - Stock prices
  - Multiplier for positive quantity
  - Capacities
• Normal distribution allows negative values.

Nonsense negatives
Negative oil leak? Nearly impossible?

LogNormal Distribution
[Figure: LogNormal density with the mode, median, and mean marked.]
• Positive values only.
• Positive skew (most values to right of mode)
• Multiple possible “central” estimates.

Specifying a LogNormal
LogNormal(median,gsdev,mean,stddev)
• You specify any two of these:
  - Median: 50th percentile – “typical value”
  - Mean: Average value
  - Gsdev: geometric standard deviation
  - Stddev: (Arithmetic) standard deviation
• When using LogNormal, use named-parameter syntax, e.g.:
  LogNormal(mean:10K,stddev:3.8K)
  LogNormal(median:9350,mean:10K)

Exercise
A mining company obtains rights to extract a gold deposit during a one-week window next year, before a construction project starts on the site. Extracting the deposit will cost $900K.
The size of the deposit: LogNormal(Mean:1K,Stddev:300) oz.
The price of gold next year: LogNormal(Mean:$1K, stddev:$500)
What is the expected value of these mining rights? Compare to the result ignoring uncertainty.

How important is choice of distribution?
Exercise:
• Modify the mining example to use Normal instead of LogNormal, with the same mean & stddev.
• How much does this change the result?

Compare Normal to LogNormal
[Figure: Normal and LogNormal distributions plotted together.]
These have the same mean and same standard deviation.

The Flaw of Averages
Who is this guy?
A: Sam Savage, author of The Flaw of Averages, an entertaining account of the distortions caused by average-case analysis.

Why model uncertainty explicitly?
• Misleading results otherwise… “Flaw of averages”
• Explicit “precision” of results.
• Some decisions are about uncertainty. E.g.:
  - to gather more information
  - contingency planning
• Improved combining of information sources.
• Productivity: Probabilities & distributions can often be estimated more quickly than expected values (!)
• Sensitivity analyses
• Causal modeling & abduction (diagnostic reasoning)

What we covered
• Uncertainty about continuous quantities can be largely characterized by:
  - Central value (e.g., mean or median)
  - Dispersion measure (expected deviation, standard deviation, variance, geometric standard deviation)
• Normal distribution – unbounded quantities
• LogNormal distribution – positive quantities
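
Returning to the toy company exercise, the gap between profit at mean demand and mean profit can be sketched outside Analytica. The sketch below rests on two assumptions that are not stated in the slides: each toy sold earns $10 and each unsold toy loses its $5 manufacturing cost (one reading of the exercise), and the mean is computed with the closed-form expected overage of a Normal distribution rather than by Monte Carlo sampling. The function and constant names are illustrative.

```python
from statistics import NormalDist

MEAN_DEMAND = 100_000.0   # Normal(100K, 25K) from the exercise
SD_DEMAND = 25_000.0
PROFIT_PER_SOLD = 10.0    # assumption: net profit on each toy sold
LOSS_PER_UNSOLD = 5.0     # assumption: manufacturing cost lost per unsold toy


def profit(demand, ordered):
    """Profit for a single known demand value."""
    sold = min(demand, ordered)
    return PROFIT_PER_SOLD * sold - LOSS_PER_UNSOLD * (ordered - sold)


def expected_profit(ordered):
    """Mean profit over Normal demand, using the closed-form expected overage.

    The Normal's very small negative-demand tail is left in, as in the exercise.
    """
    std_normal = NormalDist()
    z = (ordered - MEAN_DEMAND) / SD_DEMAND
    # E[max(ordered - demand, 0)] when demand is Normal
    expected_unsold = SD_DEMAND * (std_normal.pdf(z) + z * std_normal.cdf(z))
    expected_sold = ordered - expected_unsold
    return PROFIT_PER_SOLD * expected_sold - LOSS_PER_UNSOLD * expected_unsold


naive = profit(MEAN_DEMAND, 100_000)      # uncertainty ignored
mean_profit = expected_profit(100_000)    # uncertainty included
orders = range(70_000, 131_000, 1_000)    # mirrors Sequence(70K,130K,1K)
best_order = max(orders, key=expected_profit)

print(f"profit at mean demand: ${naive:,.0f}")
print(f"mean profit:           ${mean_profit:,.0f}")
print(f"best order size:       {best_order:,} toys")
```

Under these assumptions the mean profit for a 100K order comes out near $850K rather than the $1,000K obtained from mean demand, and the best order on the 1K grid is somewhat above mean demand because a lost sale costs more than an unsold toy. A different reading of the $10 figure would change the numbers but not the direction of the gap.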