Uploaded by Mikayla Trang Le

Individual Assignment 1 Cheatsheet (Trang) - FINAL!!

WEEK 1 & 2 & 3 MACHINE LEARNING BASICS
A. Fundamental trade-off when implementing/building quantitative risk
management models
• Model risk: a simple/tractable model may fail to capture phenomena
that are potentially important --> e.g. using the normal distribution because
it is convenient to work with, even though its thin right tail cannot model
real data accurately --> using the normal instead of a more advanced
(heavier-tailed) model can underestimate the risks significantly - UNDERFITTING
• Sample risk: a flexible/rich model can be vulnerable to noise in the data
--> a model that is too complicated or flexible tries to capture
everything in the real-world data --> hard to calibrate and
estimate from the data --> cannot obtain wisdom and understanding
from the data - OVERFITTING
B. Validation/Test versus Training prediction error illustrates a
variance/bias tradeoff:
• Bias: the error introduced by modeling a complicated problem
with a simpler model, i.e. the error between the average model prediction
and the ground truth; it reflects the capacity of the underlying
model to predict the values. More complexity (p) => less bias
• Variance: how much your estimate of f() would change if you had a
different training data set. More complexity (p) => more variance
Prediction power is evaluated on the validation/test data
1) Confusion Matrix
MLE: estimate the parameters of a probability distribution that best
describe a given dataset. The fundamental idea behind MLE is to find the
values of the parameters that maximize the likelihood of the observed
data, assuming that the data are generated by the specified distribution.
If the model is correct: MLE recovers the “true” parameter (“consistency”).
If the model is incorrect: MLE converges to the parameter that is “closest”
to the “true distribution” from which the data is generated
Nodes: Root/Parent --> Sub --> Terminal/Leaf
False +ve rate (fall-out) = FP/(FP+TN); False -ve (miss) rate = FN/(TP+FN)
2) Receiver Operating Characteristics (ROC) Curve & AUC Score
 ROC = FPR-TPR Plot
High Bias: overly simplified model, under-fitting, high error on both test
and train data
High Variance: overly complex model, over-fitting, low error on train
data and high error on test data, starts modelling the noise in the input
Lending club Problem:
• Credit risk arises from the possibility that borrowers may default
• If you could understand the credit risk of each individual borrower by
learning from their historical data, you may already be able to mitigate
or avoid a lot of those risks in the first place
• Suppose a lender has access to historical data of borrowers' profiles and
their loan outcomes
• The credit decision problem: given a (new) borrower profile (or a
returning borrower with a borrowing history) and the interest rate
offered to the lender --> should the lender lend the money to the
borrower or not?
• (Binary) classification: how to classify outcomes into “positive” and
“negative” outcomes using feature information
o In our example: “positive” = “paid in full” and “negative” =
“charged off”
o Q = probability of the positive group
o Decision criterion: classify the profile as positive if and only
if the classifier's output Q > Z
o i.e. lend to the borrower whenever the probability of
repaying the loan Q > probability threshold Z
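A minimal sketch of the Q > Z decision rule and the confusion-matrix rates it produces. The scores and labels are made up for illustration, and the function names are hypothetical (not from the course material). Sweeping Z and collecting (FPR, TPR) pairs gives the points of the ROC curve.

```python
def confusion_counts(q_scores, labels, z):
    """Count TP, FP, TN, FN when a profile is predicted positive iff Q > Z."""
    tp = fp = tn = fn = 0
    for q, y in zip(q_scores, labels):
        predicted_positive = q > z
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (TPR, FPR, FNR); a rate is 0.0 when its denominator is empty."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (tp + fn) if (tp + fn) else 0.0
    return tpr, fpr, fnr


def roc_points(q_scores, labels, thresholds):
    """One (FPR, TPR) point per threshold Z; plotted together they form the ROC curve."""
    points = []
    for z in thresholds:
        tpr, fpr, _ = rates(*confusion_counts(q_scores, labels, z))
        points.append((fpr, tpr))
    return points


# Made-up classifier outputs Q and true outcomes (1 = paid in full, 0 = charged off).
q_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(q_scores, labels, z=0.5)
tpr, fpr, fnr = rates(tp, fp, tn, fn)
```

A higher Z makes the lender stricter: fewer false positives (bad loans accepted) but more false negatives (good loans rejected).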
C. Logistic Regression
Overall likelihood: 0.905*0.516*0.691 = 0.323. Average loss = total loss/#data
Interpreting a coefficient: holding everything else fixed, home ownership
is typically associated with a lower default probability: positive b --> larger Y --> larger Q --> probability of repaying the loan is high
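The likelihood arithmetic above can be reproduced with a short sketch. It assumes the three numbers are the probabilities the model assigns to the outcomes actually observed, and that the average loss is the average negative log-likelihood (log-loss); the function names are illustrative only.

```python
import math


def sigmoid(y):
    """Map a linear score Y = a + b*x to a probability Q in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


def likelihood(probs_of_observed):
    """Overall likelihood: product of the probabilities given to the observed outcomes."""
    result = 1.0
    for p in probs_of_observed:
        result *= p
    return result


def average_log_loss(probs_of_observed):
    """Average negative log-likelihood per data point (assumed loss definition)."""
    return -sum(math.log(p) for p in probs_of_observed) / len(probs_of_observed)


probs = [0.905, 0.516, 0.691]  # per-observation probabilities from the example
overall = likelihood(probs)    # about 0.323
avg_loss = average_log_loss(probs)
```

Because `sigmoid` is increasing, a positive coefficient b raises Y and therefore Q, which is the chain stated above.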
D. Regularization for Logistic Regression: overcoming
overfitting/complicated model/multicollinearity issue
Dummy Variable / One-hot encoding: transforms verbal descriptions and
ordered variables into numeric ones; “blurs” structure within the values of a
variable because too many additional variables are created, and requires more
powerful prediction models (e.g., deep learning, random forest and neural
network) to pick up the (nonlinear) structure. Remedy: autoencoder/embedding,
i.e. encode a small number of features that capture nearly all the properties
of the data
Multicollinearity: arises when the features are highly correlated, where highly
correlated means redundant variables: if you know the values of certain
features, you almost certainly know the values of some other variable.
Does not affect prediction power but disrupts the interpretation of
parameters
LASSO REGRESSION: Can fit either a line, or polynomial minimizing the
sum of mean-squared error for each datapoint and the weighted L1 norm
of the function parameters beta.
RIDGE REGRESSION: Can fit either a line, or polynomial minimizing the
sum of mean-squared error for each datapoint and the weighted L2 norm
of the function parameters beta.
The larger the lambda, the stronger the penalty: the model becomes simpler, with less overfitting (too large a lambda leads to underfitting)
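To see how lambda acts, here is a sketch for the simplest possible case: one feature, no intercept, squared-error loss. The data are made up, and the single-feature closed forms are a simplification chosen for illustration, not the course's implementation. Ridge shrinks the coefficient smoothly; LASSO shrinks it and eventually sets it exactly to zero.

```python
def ridge_coefficient(xs, ys, lam):
    """Minimise sum((y - b*x)^2) + lam * b^2 for a single feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_coefficient(xs, ys, lam):
    """Minimise sum((y - b*x)^2) + lam * |b| for a single feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)  # soft-thresholding
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx


# Made-up data lying roughly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
ridge_path = [ridge_coefficient(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 100.0)]
lasso_path = [lasso_coefficient(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 200.0)]
```

Both paths start at the unpenalised least-squares coefficient (lambda = 0) and move toward zero as lambda grows, which is why a larger lambda means a simpler model.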
E. Model Evaluation Metrics

F. Decision Trees: easily interpreted, close to human decision making
Decision tree splitting criterion: Entropy (measure of disorder,
uncertainty, impurity in a node). Pure leaf nodes (all instances
belonging to ONLY ONE class) have entropy = 0; a half/half node has entropy = 1
Information Gain: the expected decrease in uncertainty; can be used to
measure the gain from adding a predictor (how informative each
predictor is at each node of the tree, i.e. how much information a
feature provides about a class)
Information gain = Entropy_parent – weighted average of Entropy_children
Probability of Pass/Fail at each node: (#Pass or #Fail)/#data in the node
Average entropy: (#subnode1/#parent) * Entropy_subnode1
+ (#subnode2/#parent) * Entropy_subnode2
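The entropy and information-gain formulas above can be checked with a small sketch. Base-2 logarithms are assumed, which is what makes a half/half node have entropy 1; the Pass/Fail counts are made up.

```python
import math


def entropy(n_pass, n_fail):
    """Binary entropy (base 2) of a node holding n_pass and n_fail instances."""
    total = n_pass + n_fail
    h = 0.0
    for count in (n_pass, n_fail):
        if count:  # skip empty classes: 0 * log(0) is treated as 0
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children.

    parent and each child are (n_pass, n_fail) tuples.
    """
    n_parent = sum(parent)
    average_child_entropy = sum(
        (sum(child) / n_parent) * entropy(*child) for child in children
    )
    return entropy(*parent) - average_child_entropy


# Made-up split: 5 Pass / 5 Fail in the parent, split into (4, 1) and (1, 4).
gain = information_gain((5, 5), [(4, 1), (1, 4)])
```

The tree picks, at each node, the predictor whose split gives the largest gain.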
Decision tree splits the target variable using predictors into different sub
groups which are more homogenous/pure (e.g. having either 1’s or 0’s)
Target variables: Pass/Fail, Non-default/Default, Positive/Negative Loan
Regularization: change max_depth, min_samples_split, min_samples_leaf
Ensemble: grow more trees & fit each tree with a different training data set
+ Bootstrap (artificially generate many training data sets by
randomly resampling the original training data set WITH REPLACEMENT)
WEEK 4: DERIVATIVES
Defn: value derived from the value of other variables; transfers risks and
rewards without transferring the underlying
Hedging: intended to offset potential losses/gains
• To take an offsetting position in an asset or investment that reduces
the price risk of an existing position --> reducing the risk of adverse
price movements in another asset. Hedge by taking the opposite position
in a related security or in a derivative security based on the asset to be
hedged. Derivatives can be effective hedges against their underlying
assets because the relationship between the two is more or less clearly
defined. Derivatives: options, swaps, futures, and forward contracts.
The underlying assets: stocks, bonds, commodities, currencies,
indexes, or interest rates. A hedge is a trading strategy in which a loss
on one investment is mitigated or offset by a gain in a comparable derivative.
Short: selling an asset that is not owned
Forwards: An agreement between two parties to buy (sell) something at a
pre-specified price on a pre-specified date
Forward rate: contract to deliver a fixed face value of t2-maturity
zero-coupon bonds on date t1, in exchange for a fixed amount of cash paid at t1
The party to buy the asset in the future is said to buy a forward, and has
a long position. The party to sell the asset in the future is said to sell a
forward and has a short position. Both long & short sum to ZERO
Short forward: Max gain: K when S(T) = 0; Max loss: Unlimited
Long forward: Max gain: Unlimited; Max loss: -K when S(T) = 0
Underlying asset price contingency: Always. No PREMIUM!
Options S - Price of underlying asset generally; S0- Price of underlying
asset today, ST- Price of underlying asset at option expiration date
K – Exercise or Strike Price of the option; r – Interest rate (risk-free rate)
C0 – The price of a call option today; CT- The price of a call option at the
option’s expiration date
Buy OPTIONS – Long position | Sell OPTIONS – short position
A CALL/PUT option is a security that gives its owner – the right (but not
the obligation) – to purchase/sell – a given asset (usually a stock) – on a
given date (or given the type of option, anytime before a given date) – for
a predetermined price (referred to as exercise or strike price)
In the money: if it expired today, the holder would exercise the option;
S>K for call, K>S for put. At the money: if it expired today, the holder
would be indifferent; S=K
Out of the money: the holder would give up the right to exercise the option;
S<K for call, K<S for put. *If the call is in the money, then the put is out!
Intrinsic value (IV) = payoff: the profit that could be made if the option were
exercised immediately. *IV is positive only when in the money (zero at the money)
IV Call: S – K | IV Put: K – S
Time value: the difference b/w the option's price and its IV (most of TV is
volatility value: paying a premium for the volatility!)
Factors affecting call option value:
Value of a call option increases as the stock price increases. For the same
stock price, the lower the K, the higher the value of the call. Value increases
with i/r, time to maturity, volatility and market price relative to exercise
price. Value increases if the expected dividend decreases
Long Call option payoff: max (ST – K, 0), asset price contingency: S > K
Price bounds today: max (S0 – K, 0) ≤ C0 ≤ S0
Max loss: - FV(Pc), Max Gain: Unlimited. Good when the asset price rises
Long Call option profit: max (ST – K, 0) – Fair value of Premium Paid
Short Call option payoff: - max (ST – K, 0)
Max loss: Unlimited, Max Gain: FV(Pc)
Short Call option profit: - max (ST – K, 0) + Fair value of Premium Received
Long Put option payoff: max (0, K – ST)
Max loss: - FV(Pp), Max Gain: K – FV(Pp). Good when the asset price decreases
Short wrt underlying, long wrt derivative
Long Put option profit: max (0, K – ST) – Fair value of Premium Paid
Short Put option payoff: - max (0, K – ST), asset price contingency: K > S
Max loss: - K + FV(Pp), Max Gain: FV(Pp)
Short Put option profit: - max (0, K – ST) + Fair value of Premium Received
*If the price bounds are not fulfilled = Arbitrage! (Mispricing check!)
Esp. when C0 < S0 – K
Value/Price cannot be < 0; the minimum is 0, i.e. “out of the money”
Payoff of a call at Maturity: Ct = Max (ST – K, 0)
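A short sketch of the payoff and profit rules for calls and puts at maturity. The strike of 50 and premium of 3 are hypothetical numbers, and the premium is assumed to be already expressed as a future value at expiration.

```python
def call_payoff(s_t, k):
    """Payoff of a long call at maturity: max(ST - K, 0)."""
    return max(s_t - k, 0.0)


def put_payoff(s_t, k):
    """Payoff of a long put at maturity: max(K - ST, 0)."""
    return max(k - s_t, 0.0)


def option_profit(payoff, premium_fv, position="long"):
    """Long: payoff minus premium paid. Short: premium received minus payoff."""
    if position == "long":
        return payoff - premium_fv
    if position == "short":
        return premium_fv - payoff
    raise ValueError("position must be 'long' or 'short'")


# Hypothetical contract: strike 50, premium (future value) 3.
k, premium = 50.0, 3.0
long_call_profits = {
    s_t: option_profit(call_payoff(s_t, k), premium, "long")
    for s_t in (40.0, 50.0, 53.0, 60.0)
}
worst_short_put = option_profit(put_payoff(0.0, k), premium, "short")
```

The long call breaks even at strike + premium, and the short put's worst case (stock at zero) is -K + FV(Pp).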
Fair value of forward = difference between current forward price and strike
price
Buyer Value of forward, 𝑉0 = ((𝐹0 − 𝐾) ∗ 𝐴) ∗ 𝑑𝑡
Seller Value of forward, 𝑉0 = −((𝐹0 − 𝐾) ∗ 𝐴) ∗ 𝑑𝑡
𝐹0 is the cash inflow (current exchange rate, closing forward rate, or
current forward price)
K is the cash outflow (agreed future rate or set fixed rate); A is the number
of units being bought/sold and 𝑑𝑡 is the discount factor. If L>R, profit is
made by buying the security and selling the forward. If L<R, profit is made by
doing the reverse
Put-Call parity theory. Investment 1: protective put (stock position and a
put option on that position). Investment 2: buy a call option on the same
stock and treasury bills with face value equal to the exercise price. Since
the two payoffs are identical, they must cost the same
C + X/(1+rf)^T = S0 + P; if the prices are not equal, arbitrage happens
Cost of 1st strategy = cost of 2nd strategy
Price of underlying stock (S) + Price of put (P) =
Price of call (C) + PV of Exercise price PV(X)
P = C + PV(X) – S. Given the price of 3, we can find the other!
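The parity relation can be turned into a quick mispricing check. This is a sketch with hypothetical prices; it uses annual compounding, PV(X) = X/(1+rf)^T, as in the formula above.

```python
def put_from_call(call, strike, spot, rf, t_years=1.0):
    """Put-call parity solved for the put: P = C + PV(X) - S."""
    return call + strike / (1.0 + rf) ** t_years - spot


def parity_gap(call, put, strike, spot, rf, t_years=1.0):
    """(C + PV(X)) - (S + P); a nonzero gap signals a potential arbitrage."""
    return (call + strike / (1.0 + rf) ** t_years) - (spot + put)


# Hypothetical inputs: call 10, strike 105, stock 100, risk-free rate 5%, 1 year.
put = put_from_call(call=10.0, strike=105.0, spot=100.0, rf=0.05)
gap_fair = parity_gap(10.0, put, 105.0, 100.0, 0.05)
gap_mispriced = parity_gap(10.0, put + 1.0, 105.0, 100.0, 0.05)
```

A negative gap means the protective-put side (S + P) is the expensive one, so the arbitrage sells it and buys the call plus T-bills; a positive gap reverses the trade.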
*1 contract = 100 options for 100 shares Assume 1 yr period if not stated
**Valuing/pricing options only at/in the money
To find value at maturity/terminal, no need to discount
To find value before maturity/terminal, must discount (the X price)
C0 = S0 – PV(X) | P0 = PV(X) - S0
Break even stock price (call) = strike price + premium
*Profit for buyer: payoff – option price/premium
*Profit for seller: payoff (can be –ve) + option price/premium
*BEP (Find the St): IV (X-ST) = Option Price/value
An option-pricing formula would be far easier to use than the tedious
algorithm involved in the binomial model.
Two more assumptions: that both the risk-free interest rate and stock
price volatility are constant over the life of the option. As the time to
expiration is divided into ever more subperiods, the distribution of the
stock price at expiration progressively approaches the lognormal
distribution --> derive the exact option-pricing formula
Frequent rebalancing, transaction-cost free, independent price changes
Tells you the PRESENT VALUE OF AN OPTION POSITION
Gives you the FAIR VALUE OF THE CALL OPTION PREMIUM
\(c = S_0 N(d_1) - K e^{-rT} N(d_2)\), where \(e^{-rT}\) = PV of $1 received at the end of time T
\(d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)\,T}{\sigma\sqrt{T}}\)
Hedge and arbitrage are two sides of the same coin
Hedge: reduce/eliminate risk exposure to an explicit market variable
(price, interest rate…)
- Usually comes with a cost (i.e., the price of the derivative) that offsets
the return premium
- If a return still persists after the risks are hedged away, we have an
arbitrage opportunity (happens when the offset is not exact --> some
benefits/profits are still left after all the downside possibilities/risks
are hedged away)
• The theoretical value of the derivative is derived from an idealized
environment where no arbitrage opportunities exist
- Requires only that there is at least one intelligent investor in the
economy
-When market value ≠ theoretical value, arbitrage opportunities emerge
Forward & Futures
[Step 1] Calculate the present value of the derivative
1. Check PV values
• If positive: take a long position (of the derivative)
• If negative: take a short position (of the derivative)
• If zero: no arbitrage opportunity (theoretical price matches the actual
price --> no need for additional transactions to neutralise --> no
mispricing in the market)
2. OR is the value delta neutral?
[Step 2] Delta hedging: pair the derivative with long/short position in the
underlying to reach delta neutrality
[Step 3] Create the “arbitrage table” to achieve/verify the arbitrage
condition
Long Position Hedging Table
\(d_2 = d_1 - \sigma\sqrt{T}\)
Gives you the FAIR VALUE OF THE PUT OPTION PREMIUM
\(p = K e^{-rT} N(-d_2) - S_0 N(-d_1)\)
where r is the continuously compounded APR, NOT the annually
compounded one (use EAR = e^APR – 1 to get the APR); 𝝈 is the annual
volatility or SD; S is the stock price
To get N(d), use a GC (normalcdf) or norm.cdf: set the lower bound to -1e99
and the upper bound to the d figure. \(K e^{-rT}\) is the PV of the exercise
price, discounted at the risk-free rate. A higher \(\sigma^2\) will make d1
higher and d2 lower, which give higher/lower cumulative probabilities
respectively and a higher call value.
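A sketch of the Black-Scholes-Merton formulas using only the Python standard library. N(d) is computed from math.erf rather than a calculator's normalcdf. The stock price of 49 and strike of 50 appear in this cheatsheet; the rate (5%), volatility (20%) and maturity (20 weeks) are assumptions added to make the example run.

```python
import math


def norm_cdf(d):
    """Standard normal cumulative probability N(d)."""
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


def bsm_prices(s0, k, r, sigma, t):
    """European call and put on a non-dividend-paying stock.

    r is the continuously compounded annual rate, sigma the annual
    volatility, t the time to expiration in years.
    """
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    pv_k = k * math.exp(-r * t)  # PV of the exercise price
    call = s0 * norm_cdf(d1) - pv_k * norm_cdf(d2)
    put = pv_k * norm_cdf(-d2) - s0 * norm_cdf(-d1)
    return call, put, d1, d2


def continuous_rate_from_ear(ear):
    """Invert EAR = e^APR - 1 to get the continuously compounded rate."""
    return math.log(1.0 + ear)


# S0 = 49 and K = 50 are from the cheatsheet; r, sigma and t are assumed.
call, put, d1, d2 = bsm_prices(s0=49.0, k=50.0, r=0.05, sigma=0.20, t=20 / 52)
```

Under these assumed inputs the call value comes out near the $2.4 quoted for a stock price of 49, and the put agrees with put-call parity, P = C + K e^(-rT) - S0.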
[Figure: option's value and option's payoff plotted against the underlying
asset price, e.g. stock price]
When the stock price = 49, the theoretical fair value of the option is $2.4
WEEK 5: DYNAMIC VOLATILITY MODELS
Implied volatility: the volatility that gives the market price of the option
under BSM
A. BASIC ESTIMATE
Mean \(E[X] = \sum_{i=1}^{n} p_i X_i\); \(\mathrm{Var}(X) = \sum_{i=1}^{n} p_i (X_i - E[X])^2\);
Annualized/Yearly Volatility: \(\sqrt{252}\,\sigma_i\)
Method 1 – Usual Formula
Method 2 – Simplified Formula
• Show the value: payoff or net cashflow
• Hedging = combining a bunch of different positions and transactions -->
locks in an arbitrage profit of 9.62, risk free as of today, without danger
of any loss
Short Position Hedging Table
- Present Value (P) of Long Position = S - K/(1 + r) = 1000 - 1060/(1+0.04)
= -19.23
- Present Value of Short Position = K/(1 + r) - S = 19.23
Hence, to satisfy the no-arbitrage condition, we should enter a short forward
contract.
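Step 1 of the arbitrage procedure (compute the present value, then pick the side) can be sketched with the numbers from this example: S = 1000, K = 1060, r = 4%. A one-year horizon with annual compounding is assumed, matching the formula S - K/(1 + r); the function names are illustrative.

```python
def long_forward_pv(spot, strike, r, t_years=1.0):
    """Present value of a long forward: S - K / (1 + r)^T."""
    return spot - strike / (1.0 + r) ** t_years


def position_to_take(pv_long, tol=1e-9):
    """Positive PV: go long. Negative PV: go short. Zero: no arbitrage."""
    if pv_long > tol:
        return "long"
    if pv_long < -tol:
        return "short"
    return "none"


pv_long = long_forward_pv(spot=1000.0, strike=1060.0, r=0.04)  # about -19.23
pv_short = -pv_long                                            # about 19.23
side = position_to_take(pv_long)                               # "short"
```

The short side's present value is the locked-in amount once the position is paired with the offsetting trade in the underlying.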
Method 1: In the money --> S > K of 50 --> exercise option. Cost of hedging
= $5,263,300 - ($50) * 100,000 = $263,300. Method 2: min{0, K – S(T)}
bank: call payoff is (50 – 57.25) * 100K (buy); shares' payoff = 57.25 * 100K
Out of the money --> not exercised. Cost of hedging: 256.6K – 0 = 256.6K
bank: call payoff = 0, shares' payoff = 0 (no selling)
Discounted to week 0: W20 cost/1.005^(20/52), BMS = 240K
\(\ln(S_0/K) + (r + \sigma^2/2)\,T\) (numerator of \(d_1\))
Options
Black-Scholes-Merton model ONLY for European Call & Put (underlying
asset is non-dividend paying stock)
While the binomial model is extremely flexible, a computer is needed for
it to be useful in actual trading. An option-pricing formula would be far
If the stock price is too low --> the option is almost never exercised, hence
the option value is 0. Delta is closely related
We can still use delta hedging to exploit the mispricing and lock in the
arbitrage profit
1) LINEAR: just need to hedge once: on the arbitrage table, combine the
portfolio with the right amount of the asset once ==> you are
guaranteed to be free from any possible risk of making losses - EASY
case
2) NON-LINEAR: combining once is not enough; the hedge must be
repeatedly rebalanced to preserve delta neutrality
Put-Call Parity: to derive the Put option fair value from the Call option fair value
\(P = C + (K e^{-rT} - S_0)\), where P and C are the values of European put and
call options. How: find d1 and d2, find N(d1) and N(d2), estimate C, estimate
Put
Sample Path Delta Hedging (to lock in arbitrage profit): delta is the rate of
change in total portfolio value with respect to the value of the market
variable (stock price); N(d1) is how much the option will change for every $1
move in the underlying. E.g. Option: $3, Delta: 0.50, underlying ↑ 50 to 51,
option ↑ $3 to $3.5
B. Expo Weighted Moving Avg (EWMA) Model – lambda
lambda = 0.94
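The EWMA formula itself did not survive extraction, so the sketch below assumes the standard form: next variance = lambda * previous variance + (1 - lambda) * previous squared return, with lambda = 0.94 as above. The seed variance and returns are made up.

```python
import math


def ewma_update(prev_variance, prev_return, lam=0.94):
    """One EWMA step (assumed standard form): lam * variance + (1 - lam) * return^2."""
    return lam * prev_variance + (1.0 - lam) * prev_return ** 2


def ewma_path(returns, initial_variance, lam=0.94):
    """Variance estimate for the next day after each return is observed in turn."""
    variance = initial_variance
    path = []
    for u in returns:
        variance = ewma_update(variance, u, lam)
        path.append(variance)
    return path


def annualized_volatility(daily_variance, trading_days=252):
    """Scale a daily variance to a yearly volatility: sqrt(252) * daily sigma."""
    return math.sqrt(trading_days * daily_variance)


# Made-up inputs: yesterday's daily volatility 1% (variance 0.0001), return 2%.
next_variance = ewma_update(prev_variance=0.0001, prev_return=0.02)
path = ewma_path([0.02, -0.01, 0.0], initial_variance=0.0001)
```

With lambda close to 1 the estimate reacts slowly to new returns; a smaller lambda puts more weight on the most recent squared return.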
C. Generalized AutoRegressive Conditional Heteroskedasticity (GARCH):
in GARCH(1,1), we combine the ideas of ARCH and EWMA
D. GARCH(1,1) – Variance Targeting
One way of implementing GARCH(1,1) that increases stability is by using
variance targeting
• The long-run average variance is set equal to the sample variance
• Only two other parameters (i.e., α and β) then have to be estimated
Parameters to estimate: Garch(1,1) – omega, α and β; Garch VT – α and β
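A sketch of variance targeting, assuming the standard GARCH(1,1) recursion: next variance = omega + α * previous squared return + β * previous variance, with omega = long-run variance * (1 - α - β). The recursion is not spelled out in this cheatsheet, and the α, β and returns below are made up.

```python
import math
import statistics


def garch_vt_omega(long_run_variance, alpha, beta):
    """Variance targeting: omega is implied by the target long-run variance."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    return long_run_variance * (1.0 - alpha - beta)


def garch_update(prev_variance, prev_return, omega, alpha, beta):
    """One GARCH(1,1) step (assumed standard form)."""
    return omega + alpha * prev_return ** 2 + beta * prev_variance


# Made-up daily returns; the target is their sample (population) variance.
returns = [0.01, -0.02, 0.015, -0.005, 0.0]
target = statistics.pvariance(returns)
alpha, beta = 0.10, 0.85
omega = garch_vt_omega(target, alpha, beta)

# Starting at the target with a "typical" squared return keeps the variance there.
steady = garch_update(target, math.sqrt(target), omega, alpha, beta)
```

Since omega is fixed by the sample variance, only α and β remain to be estimated (e.g. by MLE), which is what makes the fit more stable.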