Risk Management Part 2 – Variability and Risk: Twin Sons of Different Mothers
by Donald E. Shannon, PMP, CFCM, CPCM, DML
The Contract Coach, Albuquerque, NM

Disclaimer
Information in this presentation makes reference to various software products. This should not be interpreted as a recommendation or endorsement by any of the sponsors of any one product. Individuals should conduct appropriate research to identify a product that best meets their specific needs. Where appropriate, credit has been given to the software OEM, especially where screen shots of their products have been used. A listing of the products commonly used to accomplish the simulation and scheduling functions described herein is included at the end of this presentation.

Where Are We Going?
- Quantitative (numerical or statistical) analysis
- What is variability
- The nature of random variables
- How they are depicted
- Confidence intervals
- Variability (uncertainty) in program / project management
- Why estimates never seem to be right
- Adding risk to uncertainty = real world variability
- Adding risk and uncertainty to our cost/schedule model

Variability
Why things are never exactly alike….

Trivia questions:
1. Who is the driver in the photo above? – Kimi Raikkonen
2. What year was the photo taken and how can you tell? – 2008. Because Kimi’s car is #1 and he won the F1 championship in 2007, giving him that number for the following year.
3. On average, how long does an F1 pit stop take in 2014? – 3 seconds

A 3-Second Pit Stop?
Let’s Fact Check That Claim

Pit Stop Times: 2014 F1 Grand Prix of Spain
- Is the average pit stop time actually 3.0 seconds?
- Data from 13 pit stops during the most recent F1 race
- Most consistent team was Williams, with 3 stops, each of which was 3.0 seconds
- Fastest team was Red Bull, with an average stop of 2.675 seconds
- Further analysis would likely show that the average time of 3.04 seconds in this sample is within the expected range, and there is not enough evidence to reject the supposed average of 3.0 seconds

Driver   Team   Stop 1   Stop 2   Average
VET      RB     3.1      2.3      2.7
RIC      RB     2.9      2.4      2.65
GRO      LOT    2.9      -        2.9
MAS      WIL    3.0      -        3.0
BOT      WIL    3.0      3.0      3.0
HAM      MER    3.8      4.3      4.05
ROS      MER    2.8      3.0      2.9
Average         2.68     2.98     3.04

Variability (Uncertainty)
- The output of any task, when repeated, will vary to some degree from its predecessors or successors
  - Size, weight, volume, etc.
  - Performance time
- Your morning commute is a good example
  - Depending on road conditions and traffic, it may vary significantly from day to day
  - Even when conditions are perfect, the time still varies by a few seconds or minutes

Random Variables
- We call these probabilistic events because the answer cannot be described by a single value.
- The value of these events (time, cost, etc.) is variable.
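As a quick numerical look at the pit stop table above, the following Python sketch summarizes the stop times and checks the "3-second" claim. It assumes only the 12 times that are legible in the table (the two dashes are skipped), uses the sample (n − 1) standard deviation, and computes a simple one-sample t statistic; the variable names are illustrative, not part of the original presentation.

```python
import math
import statistics

# Stop times in seconds as legible in the table; dashes are left out.
stops = {
    "VET": ("RB", [3.1, 2.3]),
    "RIC": ("RB", [2.9, 2.4]),
    "GRO": ("LOT", [2.9]),
    "MAS": ("WIL", [3.0]),
    "BOT": ("WIL", [3.0, 3.0]),
    "HAM": ("MER", [3.8, 4.3]),
    "ROS": ("MER", [2.8, 3.0]),
}

# Flatten every recorded stop into one sample.
times = [t for _, driver_times in stops.values() for t in driver_times]

mean = statistics.mean(times)
median = statistics.median(times)
mode = statistics.mode(times)
stdev = statistics.stdev(times)        # sample standard deviation (n - 1)
spread = max(times) - min(times)       # range, low to high

# Average of the Red Bull stops only.
red_bull = [t for team, ts in stops.values() if team == "RB" for t in ts]
red_bull_mean = statistics.mean(red_bull)

# One-sample t statistic for the claim "the true average stop is 3.0 s".
claimed = 3.0
t_stat = (mean - claimed) / (stdev / math.sqrt(len(times)))

print(f"n={len(times)} mean={mean:.2f} median={median:.2f} mode={mode:.1f}")
print(f"stdev={stdev:.2f} range={spread:.1f} Red Bull mean={red_bull_mean:.3f}")
print(f"t statistic vs 3.0 s: {t_stat:.2f}")
```

With 11 degrees of freedom, the two-sided 5% critical value of t is about 2.2, so a t statistic far below that is consistent with not rejecting the 3.0-second average.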
- So how do we describe variability?
  - We establish ranges of possible values
  - We define measures of central tendency (average, mean, median, mode)
  - We identify how the values are spread out (variance)
  - We quantify how far possible values are displaced from the center (standard deviation or z-score)

On Closer Examination
Reviewing some terms you probably know:
- The pit stop range (low to high) was 2 seconds (2.3 – 4.3)
- 3.0 was the most common (mode) time recorded
- 3.0 was the middle value (median)
- 3.04 was the arithmetic average (also called mean or μ) of all the times recorded
- The standard deviation (σ) was 0.54
- All data lies within ±3σ
[Figure: histogram of pit stop times; x-axis: time in seconds (bins from 2.3 to 4.3), y-axis: count (0 to 8); μ and the 1σ, 2σ, and 3σ points are marked]

Random Variables
- Random variables are either discrete (whole numbers: 1, 2, 3) or continuous
- There are separate rules for displaying the data for each
- Discrete RVs are typically shown in histograms (bar charts)
- Continuous RVs are typically shown in probability distribution functions (PDF) or cumulative distribution functions (CDF)

Histograms
- Data may be either continuous or discrete
- Discrete data is grouped into classes or “bins”
- Bin values are displayed on the x-axis
- Count or number in class is displayed on the y-axis
- A smoothed line enclosing the tips of each column may provide insight as to the underlying distribution (Normal, Lognormal, Beta, etc.)

Probability Distribution Function
- The total area under the curve is always equal to 1 (100%)
- The tallest point of the curve is the mode (most often occurring value)
- In symmetric distributions the mean and median are co-located with the mode
- If the distribution is not symmetric we say it is ‘skewed’, with the direction of the skew (left or right) being the side on which the longer tail lies; the preponderance of the values resides on the opposite side
- The shape of the distribution may give a clue to the underlying distribution, but beware – sometimes the shape is misleading
- Values on the x-axis may be actual or z (standard deviations).
- If z-scores are used, those to the left of the mean are negative and those to the right are positive.

Cumulative Distribution Function
- Values start at zero
- Values end at 1 (or 100 percent)
- The value on the y-axis (vertical) is the cumulative probability for the value on the x-axis, i.e., the sum of the probabilities of all values up to the selected x value
- The shape of the curve is an indication of the shape of the probability distribution function (PDF).

Confidence Intervals
- What is the likelihood of an event happening?
- A single point probability for a continuous variable is not meaningful (the probability of any one exact value is zero)
- Therefore we phrase the question as:
  - Probability that x will exceed some value
  - Probability that x is less than some value
  - Probability that x is between a and b

Using a PDF to Answer Questions
- Consider the normal distribution
  - Average IQ is 100
  - Standard deviation is 10 points
  - 68% lie between 90 – 110
  - 95.4% lie between 80 – 120
  - 99.7% lie between 70 – 130
- PDFs help us answer questions such as:
  - What is the probability of someone having an IQ between 110 and 120? (13.6%)
  - What is the probability of an IQ of 125? (Trick question. If IQ is a continuous random variable, the probability of one precise value is zero.)

Uncertainty in Program Management
Why things always take longer and cost more than you planned ….

Murphy’s Law
- Often stated as: “Anything that can go wrong, will go wrong”
- References to the principle date to at least 1877
- Famous corollaries include:
  - If anything just cannot go wrong, it will anyway …
  - Usually at the worst possible (or most inconvenient) time, in the worst possible way.
  - Things will be lost or damaged in inverse relationship to their value or need.
- But the real culprit in all of this is nature’s uncertainty.

Uncertainty
- Uncertainty is an admission that the outcome of any event is a random variable
- Therefore the data describing that event (cost, performance time, etc.) will be variable
- The forecast outcome of an uncertain event can only be stated probabilistically
- Probabilistic outcomes typically involve:
  - A percent likelihood the event will occur
  - Sometimes an indication of the underlying distribution (uniform, binomial, normal, etc.)
- When we plan for uncertain events we typically plan for the “most likely” outcome
  - “On average” it costs “x” or takes “y” days
  - Sometimes we do better
  - Sometimes we do worse

Please Fasten Your Seatbelts
Statistics Airline Flight 101 Now Departing
- We can’t talk about uncertainty and quantitative risk management without delving into statistics.
- Keep in mind that the goal is not to teach you the math but the underlying concepts and terminology
- We’ll let the computer do the math – all we care about is the output
- But to understand the output you have to know the concepts.

Uncertainty in Program Management
- The time to perform a task (e.g., a pit stop) is a random variable
- Because of this, when we express a task duration as a specific value (single point estimate) we ignore uncertainty
- Single point estimates are typically “most likely” values
- Better estimates are possible if we describe 3 points:
  - Minimum (Best Case)
  - Maximum (Worst Case)
  - Most Likely
- The three point estimate is especially useful when dealing with “expert opinion”
[Figure: triangular distribution of task performance time with minimum a = 6, most likely m = 9, maximum b = 15]

Triangular Distribution
- The triangular distribution is a continuous frequency distribution often used to model random variables (cost or performance time) in program management
  - Fast
  - Easy to use
  - Provides reasonable accuracy
  - Tends to be slightly optimistic, i.e., values returned tend to be a little less than what ends up being the case
- Formula
  - Mean = (a + m + b) ÷ 3 = 30 ÷ 3 = 10
  - Mode = m = 9
  - STDEV = σ = √((a² + m² + b² − am − ab − mb) ÷ 18) = √3.5 ≈ 1.87
[Figure: triangular distribution with minimum a = 6, most likely m = 9, maximum b = 15, μ = 10, σ ≈ 1.87]
Note: Don’t worry about doing the number crunching – that’s why we have software to do this for us!
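For readers who want to see what the software does with the three-point estimate, here is a minimal Python sketch of the standard closed-form mean and standard deviation of a triangular distribution, applied to the 6 / 9 / 15 example. The function names are illustrative assumptions, not taken from any particular scheduling tool.

```python
import math


def triangular_mean(a: float, m: float, b: float) -> float:
    """Mean of a triangular distribution with minimum a, mode m, maximum b."""
    return (a + m + b) / 3.0


def triangular_stdev(a: float, m: float, b: float) -> float:
    """Standard deviation: sqrt((a^2 + m^2 + b^2 - am - ab - mb) / 18)."""
    variance = (a * a + m * m + b * b - a * m - a * b - m * b) / 18.0
    return math.sqrt(variance)


# Three-point estimate from the task performance time example.
a, m, b = 6.0, 9.0, 15.0

mu = triangular_mean(a, m, b)
sigma = triangular_stdev(a, m, b)

print(f"mean  = {mu:.2f}")
print(f"mode  = {m:.2f}")
print(f"stdev = {sigma:.2f}")
```

For this estimate the variance works out to 63 ÷ 18 = 3.5, so the standard deviation is the square root of 3.5.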
Triangular Distribution
- The probability of a range of values of x is given by the area of interest under the curve.
- Example: What is the probability that x is less than 7.25?
  - f(x) = 2(x − a) ÷ ((b − a)(m − a))
  - f(7.25) = 2 × (7.25 − 6) ÷ ((15 − 6)(9 − 6)) = 2.5 ÷ 27 ≈ 0.0926
  - P(x < 7.25) = ½ × 0.0926 × 1.25 ≈ 0.058, or 5.8%
[Figure: triangular distribution with minimum a = 6, most likely m = 9, maximum b = 15, μ = 10, with x = 7.25 marked]

Special Use of the Triangular Distribution: The PERT Distribution
- Some scheduling applications use a “weighted” version of the triangle distribution called the “PERT” distribution
- PERT (Program Evaluation and Review Technique) adds (statistical) emphasis to the “Most Likely” value and weights it 4 times as heavily as the Minimum or Maximum
- Average (mean) is then (a + 4m + b) ÷ 6
- Criticism is that PERT tends to be optimistic
- Second criticism is that PERT ignores the likelihood of events outside the 3 points (closed interval)
[Figure: PERT distribution with minimum a = 6, most likely m = 9, maximum b = 15, μ = 9.5, σ = 1.50]

Other Distributions Used In Program Management
- The Beta distribution is a typically mound-shaped continuous distribution
- Very flexible and can take on a number of shapes depending on the parameters used
- May be either closed form (upper example) or open form (lower example)
- Is less optimistic than triangular

Other Distributions Used In Program Management
- The Lognormal distribution is a typically mound-shaped continuous distribution
- Very flexible and can take on a number of shapes depending on the parameters used
- Open ended toward the upper side, allowing for extreme values
- Is used to model the resultant distribution when two triangular distributions are combined.
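The area calculation for the triangular example and the PERT weighting described above can be sketched in a few lines of Python. This is an illustrative sketch, not the method of any specific tool: the function names are assumptions, and the PERT standard deviation uses the common (b − a) ÷ 6 approximation. For the 6 / 9 / 15 triangle the (m − a) term is 9 − 6 = 3.

```python
def triangular_pdf(x: float, a: float, m: float, b: float) -> float:
    """Height of the triangular density at x (minimum a, mode m, maximum b)."""
    if x < a or x > b:
        return 0.0
    if x < m:
        return 2.0 * (x - a) / ((b - a) * (m - a))
    if x > m:
        return 2.0 * (b - x) / ((b - a) * (b - m))
    return 2.0 / (b - a)  # peak of the triangle at the mode


def triangular_cdf(x: float, a: float, m: float, b: float) -> float:
    """Probability that a triangular(a, m, b) value is less than x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= m:
        return (x - a) ** 2 / ((b - a) * (m - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - m))


def pert_mean(a: float, m: float, b: float) -> float:
    """PERT mean: the most likely value is weighted four times."""
    return (a + 4.0 * m + b) / 6.0


def pert_stdev(a: float, b: float) -> float:
    """Common PERT approximation of the standard deviation."""
    return (b - a) / 6.0


a, m, b = 6.0, 9.0, 15.0

height = triangular_pdf(7.25, a, m, b)          # 2.5 / 27
p_less = triangular_cdf(7.25, a, m, b)          # area left of 7.25
area_check = 0.5 * (7.25 - a) * height          # half base times height

print(f"f(7.25)      = {height:.4f}")
print(f"P(x < 7.25)  = {p_less:.4f}")
print(f"PERT mean    = {pert_mean(a, m, b):.2f}")
print(f"PERT stdev   = {pert_stdev(a, b):.2f}")
```

The cumulative function gives the same answer as the half-base-times-height area of the small triangle to the left of 7.25, which is why either route can be used to read a probability off the curve.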
Why Estimates of Cost or Schedule End Up Being Wrong
Murphy was an optimist ……

Leading Factors
- Optimism bias
  - Everything takes longer than planned
  - Everything costs more than you thought
- Improper estimates
- Corrupt data
- Failure to properly consider risk

Optimism Bias
- Values predicted are “better” or “rosier” than real world experience would show
- Sources of optimism bias
  - Expert opinion on average tends to be as much as 25% optimistic for commonly performed tasks¹
  - Use of the triangle distribution when another (e.g., Beta) is more appropriate
  - The sequence of collecting estimates for Minimum, Maximum and Most Likely can introduce bias²
1. Bias in Memory Predicts Bias in Estimation of Future Task Durations, Roy 2007
2. Herding Cats: Why 3 Point Estimates Create False Optimism

Improper Estimates
- Estimates are commonly built on mathematical models
  - Models must be appropriate for the application selected
  - Models must be used within the range they were designed to predict
- Another estimating technique is to compare something new to something already done
  - Effort is often scaled by some percentage
  - The technique is only as good as the data on which it is based and the similarity of the two tasks/projects

Risk
- Risk adds an additional element to uncertainty
- Broadens the range of possible values, usually to:
  - Increase costs
  - Extend performance time
- Risk is an “event” such that it either takes place or does not take place
- Risk events have 3 parameters
  - Polarity (+/−)
  - Likelihood – their chance of occurring
  - Impact – the cost, delay, or opportunity associated with the risk

Risk as a Variable
- Risks are a two-step process
  - Step 1 … Does the risk occur?
  - Step 2 … What is the impact?
- Risk impacts are often modeled using the triangular distribution
  - Minimum impact (cost, delay, or both)
  - Most likely impact
  - Worst case impact
- Data comes from the risk register as completed in Part 1
[Figure: triangular distribution of the delay associated with a risk, with minimum a = 1, most likely m = 3, maximum b = 6]

When Worlds Collide
What happens if we combine a variable (cost or schedule) and risk ….

How Risk Changes an Event
- If the risk event does not happen, there is no change to the event
- If the risk event happens, then the task performance time will be the sum of two random variables
  - Task performance time (no risk)
  - Risk impact
- Summing probability distributions is a bit tricky
[Figure: task performance time triangle (a = 6, m = 9, b = 15) plus delay associated with risk triangle (a = 1, m = 3, b = 6)]

Combining Probability Distributions
Using Method of Moments to Combine PDFs
- Method of Moments Technique
  - Analytical technique
  - Used to calculate the “moments” of the combined distribution
- The resultant distribution from adding two triangular distributions is modeled as a lognormal distribution.¹ The moments of that are:
  - Mean = μ = μ1 + μ2 + … + μn
  - Variance = σ² = it depends²
  - Skewness¹ = ϑ =
  - Kurtosis = κ = 12/5 = 2.387
- The math needed to calculate these is outside the scope of this presentation
1. Analytic Method for Cost and Schedule Risk Analysis, Raymond P. Covert, NASA, 5 April 2013, pp. 34–37
2. Calculating variance for the sum of two distributions is complicated when the two distributions are correlated. Formula shown is for correlated data

Using Simulation to Combine PDFs
Monte Carlo Technique
- Generate a random number between 0 and 1
- Convert the random number to a duration based on the triangular distribution 6, 9, 15
- Generate a second random number between 0 and 1
- Convert the second random number to a duration based on the triangular distribution 1, 3, 6
- Add duration 1 to duration 2
- Record the value
- Repeat numerous times
- Compute statistics such as the mean, etc. from the collected data

Why Do I Favor Simulation?
Mostly because it provides equivalent results without having to be a math major.
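The Monte Carlo steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than what any commercial simulation package does internally: the uniform draw is converted with the inverse of the triangular cumulative distribution, the two draws are treated as independent, and the `risk_likelihood` parameter is a hypothetical addition that defaults to 1.0 so the risk delay is always added, as in the steps above.

```python
import math
import random
import statistics


def triangular_from_uniform(u: float, a: float, m: float, b: float) -> float:
    """Convert a uniform(0, 1) draw u into a triangular(a, m, b) value."""
    cut = (m - a) / (b - a)  # cumulative probability at the mode
    if u < cut:
        return a + math.sqrt(u * (b - a) * (m - a))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - m))


def simulate(trials: int = 20_000, risk_likelihood: float = 1.0, seed: int = 2014):
    """Return simulated totals of task duration plus risk delay."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(trials):
        # Steps 1-2: first random number -> task duration (6, 9, 15).
        task = triangular_from_uniform(rng.random(), 6.0, 9.0, 15.0)
        # Steps 3-4: second random number -> risk delay (1, 3, 6),
        # applied only when the (hypothetical) likelihood check passes.
        delay = 0.0
        if rng.random() < risk_likelihood:
            delay = triangular_from_uniform(rng.random(), 1.0, 3.0, 6.0)
        # Steps 5-6: add the two durations and record the value.
        totals.append(task + delay)
    return totals


# Steps 7-8: repeat many times, then compute statistics.
totals = simulate()
mean_total = statistics.mean(totals)
stdev_total = statistics.stdev(totals)
p80 = statistics.quantiles(totals, n=10)[7]  # 80th percentile

print(f"mean  = {mean_total:.2f}")
print(f"stdev = {stdev_total:.2f}")
print(f"80% of simulated totals fall below {p80:.2f}")
```

Setting `risk_likelihood` below 1.0 models a risk that only sometimes occurs, and the percentile line shows how a simulated distribution turns directly into a confidence statement about the completion time.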
- Praised for:
  - Ability to provide statistics of a simulated CDF or PDF formed by complex modeling of random variables
  - Ease of use
- Criticized for:
  - Non-uniform sampling
  - Inability to correlate two distributions using Pearson product-moment correlation coefficients
  - Not providing reasonable results with a small number of trials
[Figure: simulation output of 3,460 iterations. © Intavar Institute Risky Project 6.0]

Going Forward
Adding risk and variability to our cost and schedule estimates ….

Time to Build Our Model
- Our model of the effort (contract, project, etc.) must include:
  - Tasks to be performed
  - Resources to be used in accomplishing those tasks
    - Labor
    - Materials
    - Other (travel, subcontracts, etc.)
  - Cost data for each of the resources and tasks
  - Schedule data
  - Risk data

Tools to be Used
- Excel or Numbers spreadsheet
- Project, Primavera, FastTrack, or other scheduling application
- Simulation program such as:
  - @Risk
  - Risky Project
  - Primavera
  - Full Monty

What We Will Do
- Create Work Breakdown Structure
- Expand (decompose) WBS into various tasks
- Identify resources for each task
- Create risk register
- Map risks to each task
- Determine cost baseline with and without risk
- Create probabilistic estimates for cost at completion and completion date.
Summary
- The language of quantitative analysis is statistics – learn the language
- Schedule (task) duration and cost are random variables
- Nothing is certain until it occurs
- Risk is another random variable that modifies cost, schedule, or both
- You don’t need a PhD to do the math … let the software do that while you interpret the results

Produced by:
The Contract Coach
5338 La Colonia Dr NW
Albuquerque, NM 87120
(505) 259-8485
http://www.contract-coach.com

Monte Carlo Simulation Software
- MS Excel Add-in
  - @Risk
  - Oracle Crystal Ball
  - Risk Solver
- General Purpose Products
  - Analytica
  - GoldSim
  - Reno
  - Oracle Crystal Ball
  - SPSS
- MS Project Add-in
  - Full Monty
  - Risky Project (also supports Primavera P6)
- Enterprise Systems
  - Risk+ (Deltek)
  - Primavera EPPM & Primavera Risk (Oracle)