WEEK 1 & 2 & 3: MACHINE LEARNING BASICS
A. Fundamental trade-off when building quantitative risk management models
• Model risk: a simple/tractable model may fail to capture phenomena that are potentially important. Example: the normal distribution is convenient to work with, but its tails cannot model real data accurately, so using the Normal (Gaussian) instead of a more advanced model can underestimate risk significantly. UNDERFITTING
• Sample risk: a flexible/rich model can be vulnerable to noise in the data. A model that is too complicated, or that tries to capture everything in the real-world data, is hard to calibrate and estimate, so no real understanding can be drawn from the data. OVERFITTING
B. Validation/test versus training prediction error illustrates a bias/variance trade-off:
• Bias: the error introduced by modeling a complicated problem with a simpler model, i.e. the error between the average model prediction and the ground truth (the capacity of the underlying model to predict the values). More complexity (p) => less bias.
• Variance: how much your estimate of f() would change if you had a different training data set. More complexity (p) => more variance.
Prediction power is evaluated on the validation/test data.
1) Confusion Matrix
MLE (maximum likelihood estimation): estimate the parameters of a probability distribution that best describe a given dataset. The fundamental idea is to find the parameter values that maximize the likelihood of the observed data, assuming that the data are generated by the specified distribution.
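A minimal sketch of the MLE idea, assuming a Bernoulli model for 0/1 outcomes. The data and the grid search are hypothetical illustrations and not part of the course material. It shows that the likelihood-maximizing parameter coincides with the sample frequency.

```python
import math

def bernoulli_log_likelihood(p, outcomes):
    """Log-likelihood of i.i.d. 0/1 outcomes under success probability p."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)

# Hypothetical data: 1 = positive outcome, 0 = negative outcome.
outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

# Crude grid search over candidate parameters 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, outcomes))

sample_mean = sum(outcomes) / len(outcomes)
print("MLE on grid:", p_hat, "sample frequency:", sample_mean)
```

In practice a numerical optimizer would replace the grid; the grid is used here only to keep the sketch dependency-free.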
If the model is correct, MLE recovers the "true" parameter ("consistency"); if it is incorrect, MLE converges to the parameter that is "closest" to the true distribution from which the data are generated.
High bias: overly simplified model, underfitting, high error on both test and training data.
High variance: overly complex model, low error on training data and high error on test data, starts modelling the noise in the input.
Lending Club problem:
• Credit risk arises from the possibility that borrowers may default. If you can understand the credit risk of each individual borrower by learning from historical data, you may be able to mitigate or avoid many of those risks in the first place.
• Suppose a lender has access to historical data of borrowers' profiles and their loan outcomes.
• The credit decision problem: given a (new) borrower profile, or a returning borrower with a borrowing history, and given the interest rate, should the lender lend the money or not?
(Binary) classification: how to classify outcomes into "positive" and "negative" using feature information
o In our example: "positive" = "paid in full" and "negative" = "charged off"
o Q = probability of the positive group
o Decision criterion: classify the profile as positive if and only if the classifier's output Q > Z
o Lend to the borrower whenever the probability of repaying the loan Q > probability threshold Z
C. Logistic Regression
Overall likelihood: 0.905*0.516*0.691 = 0.323. Average MLE loss = total loss/#data.
Interpreting a coefficient (everything else held equal): home ownership is typically associated with lower default probability: positive b --> larger Y --> larger Q --> probability of repaying the loan is high.
D. Regularization for Logistic Regression: overcoming overfitting / overly complicated model / multicollinearity
Dummy variable (one-hot encoding): transforms verbal descriptions and ordered variables into numeric ones; "blurs" structure within the values of a variable because too many additional variables are created; requires more powerful prediction models (e.g., deep learning, random forest and neural network) to pick up the (nonlinear) structure. Remedy: autoencoder/embedding, i.e. encode a small number of features that capture nearly all the properties of the data.
Multicollinearity: arises when features are highly correlated, i.e. some variables are redundant. If you know the values of certain features, you almost certainly know the values of some other variable. Does not affect prediction power but disrupts the interpretation of parameters.
LASSO REGRESSION: can fit either a line or a polynomial, minimizing the mean-squared error over the data points plus the weighted L1 norm of the function parameters beta.
RIDGE REGRESSION: can fit either a line or a polynomial, minimizing the mean-squared error over the data points plus the weighted (squared) L2 norm of the function parameters beta.
The larger the lambda, the stronger the penalty: the model becomes simpler and overfits less (too large a lambda leads to underfitting).
E. Model Evaluation Metrics
Confusion matrix rates: false positive rate (fall-out) = FP/(FP+TN); false negative (miss) rate = FN/(TP+FN)
2) Receiver Operating Characteristic (ROC) Curve & AUC Score: ROC = plot of TPR against FPR
F. Decision Trees: easily interpreted, close to human decision making
Nodes: root/parent, sub-node, terminal/leaf
Decision tree splitting criterion: entropy (a measure of disorder, uncertainty, impurity in a node). Leaf nodes whose instances all belong to ONLY ONE class have entropy = 0; a half/half node has entropy = 1.
Information gain: the expected decrease in uncertainty. It can be used to measure the gain from adding a predictor, i.e. how informative each predictor is at each node of the tree, or how much information a feature provides about a class.
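The entropy criterion above can be sketched as follows. This is a minimal illustration with hypothetical pass/fail labels, using base-2 logarithms so that a half/half node has entropy 1; the function names are illustrative only.

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node: 4 "pass" and 4 "fail", split into two sub-nodes.
parent = ["pass"] * 4 + ["fail"] * 4
left = ["pass"] * 3 + ["fail"] * 1
right = ["pass"] * 1 + ["fail"] * 3
pure_leaf = ["pass"] * 5

print("half/half node entropy:", entropy(parent))
print("pure leaf entropy:", entropy(pure_leaf))
print("information gain of the split:", information_gain(parent, [left, right]))
```

A tree-growing routine would evaluate this gain for every candidate split and keep the largest one.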
Information gain = Entropy_parent - weighted average of Entropy_children
Probability of Pass/Fail at each node: (#Pass or #Fail)/#data in the node
Average entropy: (#subnode1/#parent) * Entropy_subnode1 + (#subnode2/#parent) * Entropy_subnode2
A decision tree uses the predictors to split the data into subgroups that are more homogeneous/pure in the target variable (e.g. having either all 1's or all 0's).
Target variables: Pass/Fail, Non-default/Default, Positive/Negative Loan
Regularization: change max_depth, min_samples_split, min_samples_leaf
Ensemble: grow more trees and fit each tree on a different training data set + Bootstrap (artificially generate many training data sets by randomly resampling the original training data set WITH REPLACEMENT)

WEEK 4: DERIVATIVES
Definition: an instrument whose value is derived from the value of other variables; it transfers risks and rewards without transferring the underlying.
Hedging: intended to offset potential losses/gains.
• Taking an offsetting position in an asset or investment that reduces the price risk of an existing position, i.e. reduces the risk of adverse price movements in another asset.
• A hedge takes the opposite position in a related security, or in a derivative security based on the asset to be hedged.
• Derivatives can be effective hedges against their underlying assets because the relationship between the two is more or less clearly defined.
• A trading strategy in which a loss on one investment is mitigated or offset by a gain in a comparable derivative.
Derivatives: options, swaps, futures, and forward contracts.
Underlying assets: stocks, bonds, commodities, currencies, indexes, or interest rates.
Short: selling an asset that is not owned.
Forwards: an agreement between two parties to buy (sell) something at a pre-specified price on a pre-specified date.
Forward rate: contract to deliver a fixed face value of t2-maturity zero-coupon bonds on date t1, in exchange for a fixed amount of cash paid at t1.
The party to buy the asset in the future is said to buy a forward, and has a long position.
The party to sell the asset in the future is said to sell a forward and has a short position. The payoffs of the long and short positions sum to ZERO.
Short forward: max gain K when S(T) = 0; max loss unlimited.
Long forward: max gain unlimited; max loss -K when S(T) = 0.
Underlying asset price contingency: always. No PREMIUM!
Options
S: price of the underlying asset generally; S0: price of the underlying asset today; ST: price of the underlying asset at the option expiration date
K: exercise or strike price of the option; r: interest rate (risk-free rate)
C0: price of a call option today; CT: price of a call option at the option's expiration date
Buy options = long position | Sell options = short position
A CALL/PUT option is a security that gives its owner the right (but not the obligation) to purchase/sell a given asset (usually a stock) on a given date (or, depending on the type of option, any time before a given date) for a predetermined price (the exercise or strike price).
In the money: if the option expired today, the holder would exercise it: S > K for a call, K > S for a put.
At the money: if the option expired today, the holder would be indifferent: S = K.
Out of the money: the holder would give up the right to exercise: S < K for a call, K < S for a put.
*If the call is in the money, then the put (same strike) is out!
Intrinsic value (IV) = payoff: the profit that could be made if the option were exercised immediately. *IV is positive only when in the money (zero at the money). IV call: S - K | IV put: K - S
Time value (TV): the difference between the option's price and its IV (most of TV is volatility value: paying a premium for the volatility!)
Factors affecting call option value: the value of a call increases as the stock price increases. For the same stock price, the lower K, the higher the value of the call. Value increases with the interest rate, time to maturity, volatility, and the market price relative to the exercise price. Value increases if the expected dividend decreases.
Long call payoff: max(ST - K, 0); price bound: max(S0 - K, 0) ≤ C0 ≤ S0; asset price contingency: S > K. Max loss: -FV(Pc). Max gain: unlimited.
Good when the asset price rises.
Long call profit: max(ST - K, 0) - fair value of premium paid
Short call payoff: -max(ST - K, 0). Max loss: unlimited. Max gain: FV(Pc).
Short call profit: -max(ST - K, 0) + fair value of premium received
Long put payoff: max(0, K - ST). Max loss: -FV(Pp). Max gain: K - FV(Pp). Good when the asset price decreases. Short with respect to the underlying, long with respect to the derivative.
Long put profit: max(0, K - ST) - fair value of premium paid
Short put payoff: -max(0, K - ST); asset price contingency: K > S. Max loss: -K + FV(Pp). Max gain: FV(Pp).
Short put profit: -max(0, K - ST) + fair value of premium received
*If the price bound max(S0 - K, 0) ≤ C0 ≤ S0 is not fulfilled = arbitrage! (Mispricing check!) Especially when C0 < S0 - K.
Value/price cannot be < 0; the minimum is 0, i.e. "out of the money".
Payoff of a call at maturity: CT = max(ST - K, 0)
Fair value of a forward = difference between the current forward price and the strike price
Buyer's value of forward: V0 = ((F0 - K) * A) * dt
Seller's value of forward: V0 = -((F0 - K) * A) * dt
F0 is the cash inflow (current exchange rate, closing forward rate, or current forward price); K is the cash outflow (agreed future rate or set fixed rate); A is the number of units bought/sold; dt is the discount factor.
If L > R, profit is made by buying the security and selling the forward. If L < R, profit is made by doing the reverse.
Put-call parity theory
Investment 1: protective put (a stock position and a put option on that position).
Investment 2: buy a call option on the same stock plus treasury bills with face value equal to the exercise price.
Since the two payoffs are identical, they must cost the same: C + X/(1 + rf)^T = S0 + P. If the prices are not equal, arbitrage happens.
Cost of 1st strategy = cost of 2nd strategy: price of underlying stock (S) + price of put (P) = price of call (C) + PV of exercise price PV(X)
P = C + PV(X) - S. Given the prices of three, we can find the fourth!
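A minimal sketch of the put-call parity argument above, assuming annual compounding as in C + X/(1 + rf)^T = S0 + P. All numeric inputs and function names are hypothetical. It checks that the two investments have identical payoffs at expiration and backs out the put price from the other three prices.

```python
def present_value(x, rf, t=1.0):
    """PV of the exercise price X, discounted with annual compounding."""
    return x / (1.0 + rf) ** t

def put_from_parity(call, s0, x, rf, t=1.0):
    """P = C + PV(X) - S0."""
    return call + present_value(x, rf, t) - s0

def protective_put_payoff(s_t, x):
    """Investment 1: stock plus a put struck at X."""
    return s_t + max(x - s_t, 0.0)

def call_plus_bills_payoff(s_t, x):
    """Investment 2: call struck at X plus T-bills with face value X."""
    return max(s_t - x, 0.0) + x

# Hypothetical inputs (not from the notes).
s0, x, rf, call = 100.0, 105.0, 0.05, 6.0
put = put_from_parity(call, s0, x, rf)
print("put price implied by parity:", round(put, 4))

# The two investments pay the same in every terminal-price scenario.
for s_t in [60.0, 90.0, 105.0, 120.0, 150.0]:
    print(s_t, protective_put_payoff(s_t, x), call_plus_bills_payoff(s_t, x))
```

If a quoted put price differed from the implied one, buying the cheaper investment and selling the dearer one would be the arbitrage the notes refer to.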
*1 contract = 100 options for 100 shares. Assume a 1-year period if not stated.
**Valuing/pricing options only at/in the money
To find the value at maturity/terminal date, no need to discount. To find the value before maturity, must discount (the exercise price X): C0 = S0 - PV(X) | P0 = PV(X) - S0
Break-even stock price (call) = strike price + premium
*Profit for buyer: payoff - option price/premium
*Profit for seller: payoff (can be negative) + option price/premium
*Break-even point for a put (find ST): IV (X - ST) = option price/value

Hedge and arbitrage are two sides of the same coin
Hedge: reduce/eliminate risk exposure to an explicit market variable (price, interest rate...)
- Usually comes with a cost (i.e., the price of the derivative) that offsets the return premium
- If a return still persists after the risks are hedged away, we have an arbitrage opportunity (happens when the offset is not exact --> some profit is left after all the downside risks are hedged away)
• The theoretical value of the derivative is derived from an idealized environment where no arbitrage opportunities exist
- Requires only that there is at least one intelligent investor in the economy
- When market value ≠ theoretical value, arbitrage opportunities emerge

Forward & Futures
[Step 1] Calculate the present value of the derivative
1. Check the PV:
• If positive: take a long position (in the derivative)
• If negative: take a short position (in the derivative)
• If zero: no arbitrage opportunity (the theoretical price matches the actual price --> no need for additional transactions to neutralise --> no mispricing in the market)
2. Or: is the position delta neutral?
[Step 2] Delta hedging: pair the derivative with a long/short position in the underlying to reach delta neutrality
[Step 3] Create the "arbitrage table" to achieve/verify the arbitrage condition
Long Position Hedging Table / Short Position Hedging Table: show the value (payoff or net cash flow) of each position.
Hedging = combine a bunch of different positions and transactions --> lock in an arbitrage profit of 9.62, risk free as of today, without danger of any loss.
- Present value of long position = S - K/(1 + r) = 1000 - 1060/(1 + 0.04) = -19.23
- Present value of short position = K/(1 + r) - S = 19.23
Hence, since the long position has a negative PV, we should enter a short forward contract.

Cost of hedging (strike 50, 100,000 shares):
Method 1, in the money (S > K = 50), option exercised: cost of hedging = $5,263,300 - $50 * 100,000 = $263,300.
Method 2, min{0, K - S(T)}: bank: call payoff = (50 - 57.25) * 100K; shares' payoff = 57.25 * 100K.
Out of the money, option not exercised: cost of hedging = 256.6K - 0 = 256.6K; bank: call payoff = 0, shares' payoff = 0 (no selling).
Discounted to week 0: week-20 cost / 1.005^(20/52); BSM value = 240K.

WEEK 5: DYNAMIC VOLATILITY MODELS
Implied volatility: the volatility that gives the market price of the option under BSM
A. BASIC ESTIMATE
Mean: $E[X] = \sum_{i=1}^{n} p_i X_i$; Variance: $\mathrm{Var}(X) = \sum_{i=1}^{n} p_i (X_i - E[X])^2$; Annualized/yearly volatility: $\sqrt{252}\,\sigma_i$ (with $\sigma_i$ the daily volatility)
Method 1: Usual Formula | Method 2: Simplified Formula

Options: Black-Scholes-Merton (BSM) model
ONLY for European calls and puts (underlying asset is a non-dividend-paying stock).
While the binomial model is extremely flexible, a computer is needed for it to be useful in actual trading. An option-pricing formula would be far easier to use than the tedious algorithm involved in the binomial model.
Two more assumptions: both the risk-free interest rate and the stock price volatility are constant over the life of the option. As the time to expiration is divided into ever more subperiods, the distribution of the stock price at expiration progressively approaches the lognormal distribution, from which the exact option-pricing formula is derived. Further assumptions: frequent rebalancing, no transaction costs, independent price changes.
Tells you the PRESENT VALUE OF AN OPTION POSITION.
Gives you the FAIR VALUE OF THE CALL OPTION PREMIUM:
$c = S_0 N(d_1) - K e^{-rT} N(d_2)$, where $e^{-rT}$ is the PV of $1 received at the end of time T,
$d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)\,T}{\sigma\sqrt{T}}$ and $d_2 = d_1 - \sigma\sqrt{T}$
Gives you the FAIR VALUE OF THE PUT OPTION PREMIUM:
$p = K e^{-rT} N(-d_2) - S_0 N(-d_1)$
where r is the continuously compounded APR, NOT the annually compounded rate (use EAR = e^APR - 1, i.e. APR = ln(1 + EAR), to convert); $\sigma$ is the annual volatility (SD); $S_0$ is the stock price.
To get N(d), use a graphing calculator's normalcdf or norm.cdf: lower bound -1e99, upper bound the d figure.
$K e^{-rT}$ is the PV of the exercise price, discounted at the risk-free rate.
A higher $\sigma^2$ makes d1 higher and d2 lower, which give a higher/lower cumulative probability respectively and a higher call value.
[Figure: option value and option payoff plotted against the underlying asset price (e.g. stock price)]
When the stock price = 49, the theoretical fair value of the option is $2.4. If the stock price is too low, the option is almost never exercised, hence the option value is 0. Delta is closely related.
We can still use delta hedging to exploit the mispricing and lock in the arbitrage profit:
1) LINEAR: need to hedge only once. In the arbitrage table, combine the portfolio with the right amount of the asset once ==> you are guaranteed to be free from any possible risk of making losses. EASY case.
2) NON-LINEAR: combining once is not enough; the hedge must be repeatedly rebalanced to preserve delta neutrality.
Put-call parity: to derive the put option's fair value from the call option's fair value: $P = C + (K e^{-rT} - S_0)$, where P and C are the values of the European put and call options.
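A minimal sketch of the Black-Scholes-Merton call and put values for a European option on a non-dividend-paying stock. The normal CDF is built from `math.erf`, which plays the same role as `norm.cdf`. S0 = 49 and K = 50 come from the notes' example; r = 5%, sigma = 20% and T = 20/52 are assumed here for illustration and are not stated in the notes.

```python
import math

def norm_cdf(x):
    """Standard normal CDF N(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bsm_prices(s0, k, r, sigma, t):
    """Return (d1, d2, call, put); r is continuously compounded, sigma annual."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    call = s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)
    put = k * math.exp(-r * t) * norm_cdf(-d2) - s0 * norm_cdf(-d1)
    return d1, d2, call, put

# S0 and K from the notes; r, sigma and T are assumptions for illustration.
s0, k, r, sigma, t = 49.0, 50.0, 0.05, 0.20, 20.0 / 52.0
d1, d2, call, put = bsm_prices(s0, k, r, sigma, t)

# Put-call parity: the put can also be derived from the call.
put_from_parity = call + (k * math.exp(-r * t) - s0)

print("d1, d2:", round(d1, 4), round(d2, 4))
print("call:", round(call, 4), "put:", round(put, 4))
print("put via parity:", round(put_from_parity, 4))
print("delta of the call, N(d1):", round(norm_cdf(d1), 4))
```

Under these assumed inputs the call value should land close to the $2.4 quoted in the notes, and the directly computed put should match the parity-derived one.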
How: find $d_1$ and $d_2$, find $N(d_1)$ and $N(d_2)$, estimate the call C, then estimate the put.
Sample-path delta hedging (to lock in the arbitrage profit): delta is the rate of change of the total portfolio value with respect to the value of the market variable (stock price). For a call, delta = N(d1): how much the option price changes for every $1 move in the underlying. Example: option at $3 with delta 0.50; the underlying rises from 50 to 51, so the option rises from $3 to $3.5.
B. Exponentially Weighted Moving Average (EWMA) model. Parameter: lambda (lambda = 0.94)
C. Generalized AutoRegressive Conditional Heteroskedasticity, GARCH(1,1): combines the ideas of ARCH and EWMA. Parameters: omega, α and β
D. GARCH(1,1) with variance targeting. Parameters: α and β
One way of implementing GARCH(1,1) that increases stability is variance targeting:
• The long-run average variance is set equal to the sample variance
• Only two other parameters (i.e., α and β) then have to be estimated
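A minimal sketch of the EWMA and GARCH(1,1) variance updates, assuming the standard textbook recursions (EWMA: new variance = lambda * old variance + (1 - lambda) * squared return; GARCH(1,1): new variance = omega + α * squared return + β * old variance) and, for variance targeting, omega = long-run variance * (1 - α - β). The return series and the α, β values are hypothetical.

```python
import math

def ewma_update(prev_var, prev_return, lam=0.94):
    """EWMA: weight lam on yesterday's variance, 1 - lam on the squared return."""
    return lam * prev_var + (1.0 - lam) * prev_return ** 2

def garch_update(prev_var, prev_return, omega, alpha, beta):
    """GARCH(1,1) one-step variance update."""
    return omega + alpha * prev_return ** 2 + beta * prev_var

def omega_from_target(long_run_var, alpha, beta):
    """Variance targeting: omega is implied by the long-run variance."""
    return long_run_var * (1.0 - alpha - beta)

# Hypothetical daily returns; zero mean is assumed for the sample variance.
daily_returns = [0.010, -0.020, 0.015, -0.005, 0.012]
sample_var = sum(u ** 2 for u in daily_returns) / len(daily_returns)

alpha, beta = 0.08, 0.90  # hypothetical values, not estimated from data
omega = omega_from_target(sample_var, alpha, beta)

# Start both recursions from the sample variance and roll through the data.
ewma_var = garch_var = sample_var
for u in daily_returns:
    ewma_var = ewma_update(ewma_var, u)
    garch_var = garch_update(garch_var, u, omega, alpha, beta)

print("EWMA annualized vol:", math.sqrt(252 * ewma_var))
print("GARCH(1,1) VT annualized vol:", math.sqrt(252 * garch_var))
```

In a real calibration α and β would be chosen by maximum likelihood; with variance targeting, omega is never estimated directly, which is where the stability gain comes from.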