
Quantitative Risk Analysis

Week 3: Where Even The Quants Go Wrong:
Common Fundamental Errors in Quantitative Models
● The idea that the mere use of sophisticated-looking mathematical models must automatically be right has been called "crackpot rigor"; risk managers should be on guard against it.
● One way to realistically account for uncertainty is to generate a large number of possible scenarios with random inputs and, therefore, outcomes.
● Monte Carlo concepts
○ The Flaw of Averages
○ Estimates can be point estimates or ranges with confidence intervals (CIs) and normal distributions
○ A 90% CI means there is a 5% chance the value would be above the upper bound and a 5% chance it would be below the lower bound
● Histogram: a chart that shows the number of instances falling in each of a series of ranges of values
● Bins show how outcomes are distributed by counting the number of results that end up in each of a series of bins. A bin may be a range of outcomes, like $200,000-$400,000
● Y axis: the number of outcomes in that bin (frequency)
● X axis: the value of the bin ($ millions in the book's example)
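To make the Monte Carlo and histogram ideas above concrete, here is a minimal Python sketch with entirely hypothetical inputs (units sold, unit margin, fixed cost): each iteration draws random values for the uncertain inputs, computes one outcome, and the outcomes are then counted into histogram bins.

```python
# Minimal Monte Carlo sketch (hypothetical numbers): sample uncertain inputs,
# compute an outcome per scenario, and summarize the results as a histogram.
import numpy as np

rng = np.random.default_rng(42)
n_trials = 10_000

# Hypothetical uncertain inputs expressed as 90% CIs turned into normal
# distributions (a 90% CI spans about 3.29 standard deviations).
units = rng.normal(loc=50_000, scale=(60_000 - 40_000) / 3.29, size=n_trials)
margin = rng.normal(loc=12.0, scale=(15.0 - 9.0) / 3.29, size=n_trials)
fixed_cost = 450_000  # assumed known

profit = units * margin - fixed_cost  # one outcome per simulated scenario

# Histogram: count outcomes in each bin (x axis = profit, y axis = frequency)
counts, bin_edges = np.histogram(profit, bins=20)
for lo, hi, c in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"${lo/1e6:6.2f}M to ${hi/1e6:6.2f}M: {c}")

print("Mean outcome:", round(profit.mean()))
print("5th / 95th percentiles:", np.percentile(profit, [5, 95]).round())
```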
● Translating risk into useful terms: the cumulative distribution function (CDF)
● The availability of simulation tools makes Monte Carlo more practical, but because it is so available, it opens the technique up to a larger body of users who may not have the discipline required to use it well
● People often use subjective estimates without calibration training; people are typically overconfident, so models often understate risk
● People often don't validate their forecasts against reality
● People often do not use original empirical measurements (based on verifiable observation or experience rather than theory or logic) gathered specifically for their model. Even fewer conduct additional empirical measurement to reduce uncertainty where the model is most uncertain (sensitivity analysis)
The Risk Paradox
● The most sophisticated risk models are often applied to lower-level risks, whereas the biggest risks get softer methods or none at all
● Why use sophisticated risk methods? Large capital outlays with a lot of uncertainty about the return
The Measurement Inversion
● Opportunity Loss (OL) = the cost of making the wrong choice
○ Didn't lose money? OL is zero
● Expected Opportunity Loss (EOL) = each possible opportunity loss times the chance of that loss; the chance of being wrong times the cost of being wrong (see the worked sketch after this list)
○ Expected Opportunity Loss = Expected Value of Perfect Information (EVPI)
○ EVPI = the most you would reasonably be willing to pay if you could eliminate all uncertainty about the decision
○ It is almost impossible to get perfect information, but EVPI is useful as an upper bound
● Variables
○ Relatively few variables require further measurement, but there are almost always some
○ Measurement Inversion: the uncertain variables with the highest EVPI tend not to be measured, while the lowest-EVPI variables are measured
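A worked sketch of the OL/EOL/EVPI arithmetic for a single approve-or-reject decision; the probability and dollar amounts below are made up for illustration, not from the source.

```python
# Hypothetical two-way decision: approve or reject a project.
# Expected Opportunity Loss (EOL) = chance of being wrong x cost of being wrong.
p_success = 0.7                              # assumed probability the project succeeds
loss_if_approve_and_fail = 2_000_000         # capital lost if we approve a failure
gain_forgone_if_reject_success = 5_000_000   # upside missed if we reject a winner

eol_approve = (1 - p_success) * loss_if_approve_and_fail     # wrong only if it fails
eol_reject = p_success * gain_forgone_if_reject_success      # wrong only if it would have succeeded

best = min(("approve", eol_approve), ("reject", eol_reject), key=lambda kv: kv[1])
evpi = best[1]  # EVPI = EOL of the alternative we would pick today

print(f"EOL(approve) = ${eol_approve:,.0f}")
print(f"EOL(reject)  = ${eol_reject:,.0f}")
print(f"Best choice under uncertainty: {best[0]}; EVPI = ${evpi:,.0f}")
```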
Where’s the Science?
● SMEs are not calibrated; they are likely overconfident and have no idea of their past performance
● Models are rarely backtested (checking that the model fits historical reality)
● Only a small number of Monte Carlo models make use of new empirical measurements made specifically for the uncertain and sensitive variables in the model
● Opportunities for marginal uncertainty reduction are overlooked
● There is a pervasive lack of incentive to follow up on previous forecasts to see how well they did
● Mount St. Helens fallacy: the claim that if two systems are dissimilar in some way, they cannot be compared ("this event can't be predicted like previous events, it's not the same!"). Yes, it can.
Financial Models and the Shape of Disaster
● The normal (Gaussian) distribution is commonly used by Monte Carlo modelers
● The bell shape means that outcomes are more likely to be in the middle than in the tails
● The shape is described by the mean and standard deviation
○ The mean sits at the peak of the bell curve; if the distribution is symmetrical, the mean is dead center. If it is skewed, the peak of the bell is shifted left or right
○ The standard deviation is the unit of uncertainty around the mean
○ 68.3% of outcomes fall within 1 standard deviation
○ 95.4% within 2 standard deviations
○ 99.7% within 3 standard deviations
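These coverage figures can be checked directly from the standard normal CDF; a small sketch (assuming SciPy is available):

```python
# Quick check of the 68/95/99.7 rule using the standard normal CDF.
from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} standard deviation(s): {coverage:.1%}")
# within 1: 68.3%, within 2: 95.4%, within 3: 99.7%
```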
● Goodness of fit is often measured, typically with the Kolmogorov-Smirnov (K-S) test
○ The main concerns for risk analysis are at the tails of the distributions
○ The K-S test is very insensitive to how fat the tails are
○ It focuses on the main body of the distribution (see the sketch below)
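A small sketch of that insensitivity, using a hypothetical fat-tailed (Student-t) sample: the K-S distance from a fitted normal is driven by the body of the distribution, while the two models disagree by orders of magnitude about tail probabilities.

```python
# Sketch: the K-S statistic measures the biggest gap between CDFs, which is
# driven by the body of the distribution; tail differences barely move it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = stats.t.rvs(df=3, size=5000, random_state=rng)   # fat-tailed sample

mu, sigma = data.mean(), data.std(ddof=1)
ks_stat, _ = stats.kstest(data, "norm", args=(mu, sigma))
print(f"K-S distance from a fitted normal: {ks_stat:.3f}")   # a small-looking number

# Yet the two views disagree wildly about extreme outcomes:
threshold = mu + 4 * sigma
print(f"P(> 4 sigma) under the normal model : {stats.norm.sf(threshold, mu, sigma):.2e}")
print(f"P(> 4 sigma) in the fat-tailed data : {(data > threshold).mean():.2e}")
```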
● Power-law distribution: "a once-in-a-decade event is X times as big as a once-in-a-year event," where X is some ratio of relative severity (illustrated in the sketch after this list)
● Power-law distributed failures are far more common than normally distributed ones (coin flips, etc.)
○ We can see power-law distributed failures in volcanoes, major projects, the stock market, etc.
○ Characteristics of systems with power-law distributed failures:
■ They are typically systems in which the entire system can be stressed in a way that increases the chance of failure of all components
■ The failure of one component causes the failure of several others
■ The failure of those components starts a chain reaction: a cascade failure
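A rough illustration of the "once-in-a-decade versus once-in-a-year" ratio, assuming a Pareto-type power law with a made-up tail index and event frequency, compared with a normal distribution:

```python
# Sketch (hypothetical tail index): for a Pareto-type power law, the size of a
# 1-in-N event grows like N**(1/alpha), so rarer events are disproportionately big.
from scipy.stats import pareto, norm

alpha = 1.5            # assumed tail index; smaller alpha = fatter tail
events_per_year = 50   # assumed number of loss events per year

# "Once a year" event = (1 - 1/50) quantile; "once a decade" = (1 - 1/500) quantile.
once_a_year = pareto.ppf(1 - 1 / events_per_year, b=alpha)
once_a_decade = pareto.ppf(1 - 1 / (10 * events_per_year), b=alpha)
print(f"Power law: decade event / year event = {once_a_decade / once_a_year:.1f}x")

# For comparison, a normal distribution barely distinguishes the two:
ny = norm.ppf(1 - 1 / events_per_year)
nd = norm.ppf(1 - 1 / (10 * events_per_year))
print(f"Normal:    decade event / year event = {nd / ny:.2f}x")
```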
● Correlation
○ Two variables move up and down together in some way
○ Perfect storm: a random convergence of several independent factors
○ Expressed on a scale from -1 to +1
Too Uncertain
● Leaving out a variable because it is "too uncertain" makes no sense
Week 3: NPV and IRR Explained
● NPV (Net Present Value): the discounted cash flow method
○ Translate all future cash flows into today's money
○ Add up today's investment (cash outflow) and the present values of all future cash flows
○ If the NPV is positive, the project is worth pursuing; it creates value for the company
● Example project (WACC = 20%): Year 0: ($1,000), Years 1-4: $400 per year
○ PV of each year's cash flow = cash flow / (1 + WACC)^n, where n is the year
○ NPV = ($1,000) + $400/1.2 ($333) + $400/1.2^2 ($278) + $400/1.2^3 ($231) + $400/1.2^4 ($193) ≈ $35
● How do we calculate IRR? The Internal Rate of Return is the discount rate at which the NPV is 0
○ Solve ($1,000) + $400/(1+IRR) + $400/(1+IRR)^2 + $400/(1+IRR)^3 + $400/(1+IRR)^4 = 0
○ Use Excel's IRR formula to find that the IRR is about 22%
● The IRR should be above the WACC, and the NPV should be above 0 (see the sketch below)
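The same arithmetic as a small Python sketch, using plain functions rather than Excel's NPV/IRR formulas; the cash flows are the ones from the example above.

```python
# Sketch of the NPV/IRR arithmetic from the example above (WACC = 20%,
# cash flows: -$1,000 today, then $400 a year for four years).
def npv(rate, cash_flows):
    """Discount each year's cash flow back to today and add them up."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-7):
    """Find the discount rate where NPV = 0 by simple bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid          # NPV still positive: the rate must be higher
        else:
            hi = mid
    return (lo + hi) / 2

flows = [-1000, 400, 400, 400, 400]
print(f"NPV at 20% WACC: ${npv(0.20, flows):.0f}")   # about $35
print(f"IRR: {irr(flows):.1%}")                      # about 21.9%
```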
Chapter 2: Sections 2.1 thru 2.2
In order to plan a risk analysis properly, you'll need to answer a few questions:
● What do you want to know and why?
● What assumptions are acceptable?
● What is the timing?
● Who is going to do the risk analysis?
The purpose of a risk analysis is to provide information to help make better decisions in
an uncertain world
1. Rank the questions that need answering from “critical” down to “interesting”
2. Discuss with the risk analyst the form of the answer (type of currency, do you need %,
what stats do you need to what accuracy)
3. Explain what arguments will be based on these outputs.
It is better to explain the arguments (e.g. comparing with the distribution of another potential
project's extra revenue) that would be put forward and find out if the risk analyst agrees that
this is technically correct before you get started
4. Explain whether the risk analysis has to sit within a framework.
This could be a formal framework, like a regulatory requirement or a company policy,
or it could be informal, like building up a portfolio of risk analyses that can be
compared on the same footing. Ensure the maximum level of compatibility.
5. Explain the target audience.
6. Discuss any possible hostile reactions.
Assumptions are the primary Achilles' heel, as we can argue forever about
whether assumptions are right.
7. Figure out a timeline.
8. Figure out the priority level.
9. Decide on how regularly the decision-maker and risk analyst will meet
● Conservative assumptions are most useful as a sensitivity tool to demonstrate that one has not taken an unacceptable risk, but they are to be avoided whenever possible because they run counter to the principle of risk analysis, which is to give an unbiased report of uncertainty.
Chapter 3
● Sometimes a quantitative risk analysis may be of little value. There are several key areas where it can fall down:
1. It can't answer all the key questions.
2. There are going to be a lot of assumptions.
3. There are going to be one or more show-stopping assumptions.
4. There aren't enough good data or experts.
● Score each of these four areas, add the scores, and divide by 4 to get a total.
● You can also allow experts to weight their expertise in the particular variable in question.
● The first step in a risk analysis: what are the questions?
● Things change, so iterate your risk analysis.
Chapter 4: Sections 4.1 and 4.5
● The cardinal rule of risk analysis modelling is: "Every iteration of a risk analysis model must be a scenario that could physically occur."
Chapter 5: Sections 5.1 thru 5.2
● Model assumptions should always be presented in a risk analysis report
○ The front of the report is for results, an assessment of robustness, and conclusions
○ Reports need the following components:
■ Summary
■ Introduction to the problem
■ Decision questions addressed and those not addressed
■ Discussion of available data and its relation to the model choice
■ Assumptions
■ Critique of the model
■ Results
■ Possible improvements
■ Modelling strategy
■ Decision questions
■ Available data
■ Methods of addressing the decision questions with the available information
■ Assumptions inherent in the different modelling options
■ Explanation of the choice of model
■ Discussion of the model used
■ Overview of the model structure
■ Model validation
■ References and datasets
■ Technical appendices
■ Unusual equation derivations
■ A guide on how to interpret the results
■ And more
● Go easy on decimal places
● Don't include meaningless statistics
○ Keep the statistics to a minimum
○ Use graphs when appropriate
○ Include an explanation of the model's assumptions
○ Tailor the report to the audience and the problem
● Using the parameters of a distribution to explain how a model variable has been characterised will often be the most informative when explaining a model's logic.
● Graphical illustrations of quantitative assumptions are particularly useful when non-parametric distributions have been used.
Chapter 7: Sections 7.1, 7.3.4, 7.4
● When designing our risk model, we must put ourselves in the shoes of the decision maker
● Remain neutral
● It is hardly useful to tell the reader how many iterations the Monte Carlo or Latin hypercube simulation used
● Never do fewer than 300 iterations
● Too few iterations and you get inaccurate outputs and graphs
● Too many iterations take a long time to simulate and plot graphs
● Latin hypercube sampling will only offer a useful improvement when a model is linear or when there are very few distributions in the model
● Most common modelling errors (around 90% of all errors); error 3 is illustrated in the sketch after this list:
○ 1. Calculating means instead of simulating scenarios
■ Don't represent a risk by its single-point expected value; that removes the uncertainty. Instead, run a simulation
○ 2. Representing an uncertain variable more than once in a model
○ 3. Manipulating probability distributions as if they were fixed numbers
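A quick illustration of error 3 with hypothetical cost figures: adding the 90th percentiles of two uncertain items as if they were fixed numbers overstates the 90th percentile of their sum, which only a simulation gives correctly.

```python
# Quick illustration of error 3: percentiles of two uncertain cost items
# cannot simply be added as if they were fixed numbers.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical independent cost items (means and spreads are made up).
cost_a = rng.normal(100, 20, n)
cost_b = rng.normal(200, 50, n)

p90_a = np.percentile(cost_a, 90)
p90_b = np.percentile(cost_b, 90)
p90_total = np.percentile(cost_a + cost_b, 90)

print(f"P90(A) + P90(B)      = {p90_a + p90_b:,.0f}")    # treats distributions as numbers
print(f"P90(A + B) simulated = {p90_total:,.0f}")        # the correct, smaller figure
```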
Chapter 8
● Binomial Process:
○ Consists of n independent identical trials with probability of success p, resulting in s
successes
○ Three quantities (n, p, s) describe a binomial process
○ Distributions to estimate these quantities require knowledge of the other two
○ Example: Coin toss with "heads" as success (probability p=0.5 for a fair coin)
○ Each trial can return either 1 (success) with probability p or 0 (failure) with probability
(1-p), known as Bernoulli trial
● Estimations in the Binomial Process (see the sketch after this list):
○ Number of successes in n trials: Binomial distribution
○ Number of trials to achieve s successes: Negative binomial distribution
○ Estimating probability of success p: Beta distribution
○ Estimating number of trials n: Inverse sampling methods or maximum likelihood
estimation
○ Summarizing results: Use relevant distribution (e.g., binomial, negative binomial,
beta) based on known quantities
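A sketch of that bookkeeping with SciPy, using made-up values of n, p, and s. Note that SciPy's negative binomial counts the failures before the s-th success, so total trials = s + failures, and the Beta posterior shown assumes a uniform prior.

```python
# Sketch of the binomial-process distributions with scipy.stats
# (hypothetical numbers): n trials, probability of success p, s successes.
from scipy import stats

n, p, s = 20, 0.3, 8

# Successes in n trials ~ Binomial(n, p)
print("P(exactly s successes in n trials):", stats.binom.pmf(s, n, p))

# Trials needed to reach s successes: scipy's nbinom returns the extra
# failures, so total trials = s + failures.
failures = stats.nbinom.rvs(s, p, size=5)
print("Simulated trials to reach s successes:", failures + s)

# Uncertainty about p after seeing s successes in n trials (uniform prior):
# p ~ Beta(s + 1, n - s + 1)
p_posterior = stats.beta(s + 1, n - s + 1)
print("90% credible interval for p:", p_posterior.ppf([0.05, 0.95]))
```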
● Beta-binomial Process:
○ Extension of binomial process with probability p as a random variable
○ A Beta(α, β) distribution is used to model the variability of p
○ Models number of successes (beta-binomial distribution) and number of failures
(beta-negative binomial distribution) for s successes
● Multinomial Process:
○ Allows for multiple outcomes, compared to the binomial process with only two outcomes
○ Outcomes must be exhaustive and mutually exclusive (e.g., rolling a die with six possible outcomes)
● Poisson Process:
○ Continuous and constant opportunity for an event to occur, as opposed to discrete
opportunities in the binomial process
○ Examples: Lightning strikes during a storm, typographic errors in a book,
discontinuities in continuous manufacturing
○ Event can theoretically occur between zero and infinite times within a specific
amount of opportunity
○ Many systems are well approximated by a Poisson process despite not exactly
conforming to its assumptions
● Poisson and Binomial processes:
○ Both are "memoryless" processes
○ Key parameter in binomial process is p, the probability of occurrence
○ Key parameter in Poisson process is λ, the mean number of events per
unit of exposure, constant over exposure period
● Deriving Poisson distribution from binomial:
○ Binomial process with the number of trials tending to infinity, probability of
success tending to zero, and mean (np) remaining finite
○ Probability mass function for Poisson distribution derived from binomial
distribution
○ Example: Falls while wearing platform shoes can be modeled as either
Binomial(100, 2%) or Poisson(2)
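The platform-shoe example above can be checked numerically; the two probability mass functions are already close at n = 100, p = 2%.

```python
# The platform-shoe example: 100 wearings with a 2% chance of a fall each
# time, versus a Poisson with the same mean (lambda = 100 * 0.02 = 2).
from scipy import stats

n, p = 100, 0.02
lam = n * p

for falls in range(6):
    b = stats.binom.pmf(falls, n, p)
    po = stats.poisson.pmf(falls, lam)
    print(f"{falls} falls: binomial {b:.4f}  poisson {po:.4f}")
```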
● Poisson distribution:
○ Not only for rare events but also for continuous exposure situations where
event probability becomes small in small divisions
○ Number of events a in time t follows a Poisson distribution with mean
number of events per period (Poisson intensity) λ
● Time to observe a events in Poisson process:
○ Constant probability of event occurrence per time increment
○ Small element of time Δt with the probability of event occurrence as kΔt (k
is a constant)
● Estimate of Poisson intensity (mean number of events per period) λ:
○ Fundamental property of stochastic system, never observed or exactly
known
○ Bayesian inference helps in quantifying knowledge as data accumulates
○ Assumes uninformed prior and Poisson likelihood function for observing a
events in period t
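A minimal sketch of that update, assuming a uniform (uninformed) prior: after observing a events in exposure t, the posterior for λ is a Gamma distribution with shape a + 1 and scale 1/t (the event count and exposure below are made up).

```python
# Sketch of the Bayesian update for the Poisson intensity lambda: with a
# uniform (uninformed) prior and a events observed in exposure t, the
# posterior is Gamma(shape = a + 1, scale = 1 / t).
from scipy import stats

a_events, t_exposure = 7, 3.5   # hypothetical: 7 events in 3.5 periods

posterior = stats.gamma(a_events + 1, scale=1 / t_exposure)
print("Posterior mean of lambda:", posterior.mean())           # (a + 1) / t
print("90% credible interval:", posterior.ppf([0.05, 0.95]))
```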
8.4.3 Mean, variance, and covariance of the hypergeometric distribution
The mean, variance, and covariance of the hypergeometric distribution are given by:
Mean: E(X) = n * (D / M)
Variance: Var(X) = n * (D / M) * ((M - D) / M) * ((M - n) / (M - 1))
Covariance: Cov(X, Y) = -n * (D / M) * (1 - D / M) * ((M - n) / (M - 1))
where E(X) and Var(X) are the mean and variance of the hypergeometric distribution, Cov(X, Y) is the
covariance between X (the number of sampled items with the characteristic of interest) and Y (the
number without it), n is the sample size, D is the number of items in the population with the
characteristic of interest, and M is the population size.
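These formulas can be cross-checked against SciPy's hypergeometric implementation; the population, characteristic count, and sample size below are arbitrary.

```python
# Cross-check of the formulas above against scipy's hypergeometric distribution
# (hypothetical numbers: population M, D items of interest, sample size n).
from scipy.stats import hypergeom

M, D, n = 500, 60, 25
dist = hypergeom(M, D, n)   # scipy argument order: population, successes, draws

print("Formula mean     :", n * D / M)
print("scipy mean       :", dist.mean())
print("Formula variance :", n * (D / M) * ((M - D) / M) * ((M - n) / (M - 1)))
print("scipy variance   :", dist.var())
```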
8.4.4 Applications of the hypergeometric distribution
The hypergeometric distribution has various applications, particularly in scenarios where the
population size is finite, and sampling is done without replacement. Some common applications
include:
1. Quality control: In manufacturing, a batch of products is sampled without replacement to
determine the number of defective items.
2. Ecology: Researchers may use the hypergeometric distribution to estimate the population
size of a particular species in a given area by sampling without replacement.
3. Genetics: The hypergeometric distribution can be used to analyze genetic linkage, i.e., the
probability of inheriting certain genes or traits from parents.
4. Social sciences: Surveys that involve sampling without replacement from a finite population
can be analyzed using the hypergeometric distribution.
5. Sports: The hypergeometric distribution can be used to analyze drafts in professional sports
leagues, where teams select players without replacement from a pool of eligible players.
6. Lottery: The probability of winning a lottery, where a certain number of balls are drawn
without replacement from a set, can be calculated using the hypergeometric distribution.
Hypergeometric Distribution:
● Deals with sampling without replacement
● Used when population size is finite and known
● Applicable when considering a specific success within a fixed number of trials
● Application: Lottery or card game probabilities
Poisson Distribution:
● Deals with rare events occurring over a fixed interval (time, space, etc.)
● Assumes a constant rate of occurrence
● Events are independent of each other
● Application: Number of accidents at an intersection, call center traffic
Multinomial Distribution:
● Generalization of the binomial distribution to multiple categories
● Describes the probabilities of different outcomes in a fixed number of trials
● Application: Predicting election outcomes, analyzing consumer preferences
Binomial Distribution:
● Deals with binary outcomes (success/failure)
● Assumes a fixed number of trials with a constant probability of success
● Events are independent of each other
● Application: Tossing a coin, medical trials
Mixture Distributions:
● Combines multiple probability distributions
● Represents a population consisting of subpopulations with different
characteristics
● Application: Modeling heterogeneous populations, financial risk assessment
Renewal Process:
● Describes events occurring in a stochastic system at random time intervals
● Times between successive events are independent and identically distributed (the Poisson process is the special case with exponentially distributed intervals)
● Application: Predicting machine failure, studying customer purchase behavior
Central Limit Theorem:
● States that the sum or average of a large number of independent random variables will approach a normal distribution
● Applies regardless of the original distributions, provided they have finite variance
● Foundation for hypothesis testing and confidence intervals (see the sketch below)
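A quick demonstration: averages of a strongly skewed (exponential) variable lose their skewness, one of the signatures of approaching normality, as the sample size grows (the sample sizes are arbitrary).

```python
# Quick CLT demonstration: averages of skewed (exponential) samples look
# increasingly normal as the sample size grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for sample_size in (2, 10, 100):
    means = rng.exponential(scale=1.0, size=(10_000, sample_size)).mean(axis=1)
    skew = stats.skew(means)
    print(f"n = {sample_size:3d}: skewness of the sample means = {skew:.2f}")
# Skewness shrinks toward 0 (the normal distribution's skewness) as n grows.
```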
Estimation of Population and Subpopulation Sizes:
● Techniques for estimating the total or subset size within a population
● Can use sampling, capture-recapture, or other statistical methods
● Application: Wildlife population estimation, market sizing
Deriving Poisson Distribution from Binomial Distribution:
● Poisson is the limit of the binomial distribution when:
1. Number of trials (n) goes to infinity
2. Probability of success (p) goes to zero
3. The product of n and p remains constant (λ)
● Applicable when events are rare and the number of trials is large