How Wrong is your Model? Efficient Quantification of Model Risk
Advanced Statistical Methods in Credit Risk, Royal Statistical Society
Alan Forrest, RBS Group
London, 13th June 2013
Information Classification – PUBLIC

Thanks and Disclaimer

With many thanks to the Credit Research Centre, University of Edinburgh Business School, for supporting me as a Visiting Scholar in November and December 2012; and to the Royal Bank of Scotland Group for granting me Special Leave to visit the CRC.

Disclaimer: The opinions expressed in this document are solely the author's and do not necessarily reflect those of The Royal Bank of Scotland Group or any of its subsidiaries. Any graphs or tables shown are based on mock data and are for illustrative purposes only.

Overview

Model Risk Principles and Background
– Model risk is an emerging and powerful idea in bank regulation and credit risk model management.
– Model risk assessment needs quick and quantified sensitivity analysis.

Geometry and Information
– Model specification and model sensitivity analysis can be presented geometrically within a classical, mathematically rich theory.

Efficient Sensitivity Analysis
– Insight into the modelling process.
– Practical strategies for sensitivity analysis and managing model risk.

Model Risk Background

The US Regulator (Fed / OCC): "The use of models invariably presents model risk, which is the potential for adverse consequences from decisions based on incorrect or misused model outputs and reports."

The Statisticians (George Box, Norman Draper): "Essentially, all models are wrong, but some are useful."

The Logicians (attributed to John Maynard Keynes, but in fact Carveth Read): "It is better to be roughly right than precisely wrong."

Model Risk Background

FSA – Turner Review, March 2009: "Misplaced reliance on sophisticated maths"
– Too many pricing and lending decisions did not take into account the assumptions and limitations of the models.
BoE – The Dog and the Frisbee – Haldane, August 2012: "… opacity and complexity… It is close to impossible to tell whether results from [internal risk models] are prudent."
– If we cannot say why we trust a model, are we right to use it?

Fed / OCC 2011-12a: "Model Risk should be managed like other types of risk."

Model Risk Background

Sensitivity Analysis: how different would the model be if …?
– The missing data were filled in in a different way
– A different time-period was used to develop the model
– We enter a downturn or stressed period
– An expert panel decision happened to turn out differently
– Etc.

This is the key to quantitative model risk, but it is hard work:
– Then combinations of these sensitivities…
– Model fitting is a multi-stage process…
– Sensitivity impact (eg capital, provisions) is complex and multi-dimensional…

Is this work out of proportion to the benefits? Can we do sensitivities quickly, without refitting models?

Geometry and Information

Describing Data

A model is a description of the development data.
– Model developers choose one of these descriptions for use in a decision.
– Model risk considers the degree to which this choice could influence the decision.

This talk considers frequency histogram or contingency table descriptions of models and data (Kullback, Chentsov).
– Discrete buckets with sample frequencies (proportions, not counts).
– This is not a practical limitation: data elements are often classified and are always of finite precision.
– We should not restrict too early the breadth of models that could be chosen.

This defines a high- but finite-dimensional space whose coordinate values are the frequencies in each bucket.
– The observed data is a single point in this space.
– Models are also "data" points: points that are preferred for use or descriptive convenience.
– The dimension is the number of buckets, with one constraint: the frequencies sum to 1.
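The histogram description above can be made concrete in a few lines. This is a minimal sketch with made-up bucket labels and counts: a dataset summarised into discrete buckets becomes a single point in the simplex of frequencies.

```python
# Hypothetical buckets and counts, for illustration only.
counts = {"low": 620, "medium": 290, "high": 90}   # raw bucket counts
N = sum(counts.values())                            # sample size

# The data point: frequencies (proportions), not counts.
d = {w: c / N for w, c in counts.items()}

# The one constraint that defines the simplex: frequencies sum to 1.
assert abs(sum(d.values()) - 1.0) < 1e-12
```

The dimension of the space is the number of buckets, less one for the sum-to-1 constraint; with three buckets this is the 2-dimensional simplex illustrated next.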
Geometry and Information

Illustration of a 3-cell model space: a 2-dimensional simplex. [figure]

Geometry and Information

Data, model subspace and model choice. [figure: the data point, the model subspace, and the model chosen "closest to the data"]

Geometry and Information

Sensitivity, model space flatness, over-sensitivity and over-fitting. [figure panels: "just right"; over-sensitive / discontinuous ("type 3 (or type 0) error"); right answer to the wrong question; over-fitting]

Geometry and Information

Fitting a model to data: a log-linear example, m_w = c e^{aw}. [figure: prior, MLE fit, data]

Geometry and Information

Likelihood maximisation and information

Some notation:
– d is the data point: a vector (d_w) of frequencies, indexed by bucket labels w, with Σ_w d_w = 1.
– m is the model point (m_w).
– N is the sample size.

The log-likelihood of d, given m, is

  log L = -N ( I(d,m) + H(d) )

where

  I(d,m) = Σ_w d_w log(d_w / m_w)   (the Kullback-Leibler divergence),
  H(d)   = -Σ_w d_w log(d_w)        (the Shannon entropy of the data).

– Maximising likelihood is the same as minimising KL divergence,
– but KL divergence is not a metric.

Following this line gives principles for practical inference, model selection, etc.
– Akaike's Information Criterion follows from this point of view.
– The Bayes Information Criterion also follows, assuming different priors and sample asymptotics.

Geometry and Information

Dual optimisation principles

For exponential families of models:
– The blue spaces are linear.
– The model space is generated "tropically".

Principle of Maximum Likelihood
– The model that maximises likelihood is where the data's blue space hits the red model space.

Principle of Maximum Entropy (uniform prior)
– Within each blue space, the red point maximises entropy, H(x).

Principles of Inference
– If m is the MLE fit to data d, and m' is another model from the model space, then I(d,m') = I(d,m) + I(m,m').
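The decomposition log L = -N ( I(d,m) + H(d) ) can be checked numerically. This is a minimal sketch with made-up frequencies; the log-likelihood here omits the multinomial combinatorial constant, which does not depend on the model.

```python
from math import log

d = [0.5, 0.3, 0.2]   # observed frequencies (hypothetical), summing to 1
m = [0.4, 0.4, 0.2]   # model frequencies (hypothetical), summing to 1
N = 1000              # sample size

# Kullback-Leibler divergence I(d,m) and Shannon entropy H(d).
I = sum(dw * log(dw / mw) for dw, mw in zip(d, m))
H = -sum(dw * log(dw) for dw in d)

# Direct multinomial log-likelihood, up to the combinatorial constant:
#   log L = N * sum_w d_w log(m_w)
logL = sum(N * dw * log(mw) for dw, mw in zip(d, m))

# The slide's identity: log L = -N * ( I(d,m) + H(d) ).
assert abs(logL - (-N * (I + H))) < 1e-9
```

Since H(d) does not depend on m, maximising the likelihood over m is exactly minimising I(d,m), as the slide states.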
Geometry and Information

Two ways of moving distributions – taxic and tropic – shifts and tilts

Taxic movement: f(x) -> f(x - t)
– Uses a group action on the state space.

Tropic movement: f(x) -> g(x) f(x), renormalising after each action.
– A group of functions acting multiplicatively on probability distributions.

Tropic movements generated by groups characterise exponential families of models.

Geometry and Information

Dual Foliations (Amari, Critchley et al.)

Assume a tropic structure.
– Equivalently, decide what direction to give the blue subspaces.

Each model space is generated tropically from a prior m_0.
– The maximum likelihood principle still applies.

Modified principle of maximum entropy:
– Within each blue space, red points minimise I(x, m_0).

The principle of inference still holds.

The Euclidean metric ds² = Σ_w dx_w² is not the natural geometry.

Geometry and Information

Spherical Geometry

Another metric, ds² = Σ_w dx_w² / x_w, appears more natural.
– Locally equivalent to KL divergence: ds² = 2 I(x, x+dx) = 2 I(x+dx, x), up to third-order terms.
– "Local chi-squared".
– "Boot-strapping geometry".

The model fitting foliations are orthogonal in this metric.
– The model space curvature is true.

The space is isometric to a portion of a sphere.
– The substitution 2x_w = u_w² connects this space isometrically with Euclidean space.

Efficient Sensitivity Analysis

Data shifts give rise to model shifts. [figure: data -> data shifted; model -> model shifted]

Seek a distance constraint: ds²(model) <= Const * ds²(data).

Seek understanding of the map d(data) -> d(model):
– d(SQRT(data)) -> d(SQRT(model))
– d(LOG(data)) -> d(LOG(model))
– Linear, singular.
– Expansive or contractive directions?
– Twists?

Efficient Sensitivity Analysis

More generally, seek conditions for contractive sensitivities.
– These are the "flat" directions in the model space – good for model risk.
– Some model families contract for all data-shifts, eg a model that groups two classes.
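The local equivalence between the spherical metric and KL divergence is easy to verify numerically. This is a minimal sketch with a made-up distribution x and small shift dx: ds² = Σ_w dx_w²/x_w should agree with 2 I(x+dx, x) up to third-order terms in dx.

```python
from math import log

x  = [0.5, 0.3, 0.2]          # a hypothetical distribution on 3 buckets
dx = [1e-4, -4e-5, -6e-5]     # a small shift; components sum to 0

y = [xw + dxw for xw, dxw in zip(x, dx)]   # the shifted distribution

# "Local chi-squared" metric on the simplex.
ds2 = sum(dxw**2 / xw for xw, dxw in zip(x, dx))

# KL divergence I(x+dx, x) = sum_w y_w log(y_w / x_w).
kl = sum(yw * log(yw / xw) for xw, yw in zip(x, y))

# Agreement up to third-order terms: the discrepancy is O(|dx|^3),
# far below ds2 itself (which is O(|dx|^2)).
assert ds2 > 0
assert abs(ds2 - 2 * kl) < 1e-11
```

Shrinking dx by a factor of 10 shrinks ds² by 100 but the discrepancy by roughly 1000, which is the sense in which the metric is "locally equivalent" to twice the divergence.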
In the context of logistic models, situations where sensitivities contract include:
– Full interaction descriptions, eg population shifts (i.e. independent of the default event);
– Class amalgamation in dummy factors.

Efficient Sensitivity Analysis

Some general results are possible. For an exponential family of models generated by dummy factors from a uniform prior: if m is the MLE model fitted to data d, then

  ds²(model) <= e^{I(d,m)} ds²(data).

Outline of proof:
– A sequence of MLE models takes us from d to m, whose nested model spaces drop one dimension at each step: d = m_0, …, m_t = m.
– The conditions assumed allow us to arrange that each step in the model sequence removes one dummy factor at a time.
– The simplified 2-by-K context allows the direct calculation

    ds²(m_{i+1}) <= e^{I(m_i, m_{i+1})} ds²(m_i).

– However, inference and the MLE conditions give us

    I(d,m) = I(d,m_1) + I(m_1,m_2) + … + I(m_{t-1},m).

– QED.

Conjecture: this is true for general exponential families and general priors.

Efficient Sensitivity Analysis

A sensitivity principle implied by boot-strapping

For large development samples, the standard error ellipsoid is sufficient to describe model sensitivity to data shifts.

[figure: the data bootstrapping ellipsoid is a ds-ball, scale = chi-sq(df = dimension) / (2N); the model standard error ellipsoid is the image of the data ds-ball; the model prediction error ellipsoid is a ds-ball]

Implications for Model Risk

Managing Sensitivities

Sensitivity analysis asks "what if the data were different?" Now we can connect model changes to data movements geometrically.

Sensitivities can be prioritised by the (most generous) metric distance of their specified data change, ds²(data).
– Sometimes it is simpler to prioritise on the divergence, which is numerically a good approximation to the distance.

Establish a limit to the size of data sensitivity, below which material changes of the model clearly will not happen.
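The "model that groups two classes" mentioned above gives the simplest contractive example, and it can be sketched directly. In this minimal illustration (frequencies made up), the model family constrains two buckets to share one fitted frequency; the MLE under that constraint pools the two buckets, so a data shift that only moves mass between them leaves the fitted model unchanged – a fully flat direction.

```python
def fit_grouped(d):
    """MLE for the family where buckets 0 and 1 share a common frequency:
    the constrained maximum-likelihood fit pools (averages) the two."""
    pooled = (d[0] + d[1]) / 2
    return [pooled, pooled, d[2]]

d       = [0.30, 0.20, 0.50]               # hypothetical data frequencies
eps     = 0.02
d_shift = [0.30 + eps, 0.20 - eps, 0.50]   # shift mass within the grouped pair

m, m_shift = fit_grouped(d), fit_grouped(d_shift)

# Local chi-squared distances for the data shift and induced model shift.
ds2_data  = sum((b - a) ** 2 / a for a, b in zip(d, d_shift))
ds2_model = sum((b - a) ** 2 / a for a, b in zip(m, m_shift))

assert ds2_data > 0
assert ds2_model == 0   # the grouping direction is flat: full contraction
```

Shifts that also move mass in or out of the third bucket do move the model, but by construction never through the pooled direction, which is what makes such families well-behaved for model risk.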
Convert model variation, measured by ds²(model), into a simple business measure, as a communicable measure of model specification risk.
– For example: conservatism equivalent, swap-set size, effect on Gini, provisions impact or RWA impact.

Implications for Model Risk

Example

A model risk with default identification has been identified for a logistic scorecard model. A sensitivity analysis has been proposed: how different could the model be if the default counts were undercounted? Assume a recorded 5% default rate, with possible additional defaults up to 5.5%.

A general upper bound for data ds² in this case is about 0.005.
– A sharper upper bound could be achieved knowing more about the factor distributions.

This prioritises this sensitivity analysis in the model's management.
– All things being equal, it will come after a check on missing values accuracy (est. ds² = 0.2) and before a check on the robustness of classing another factor (est. ds² = 0.003).

With a sample size of 10,000 and a data space of dimension 100, this gives a 95%-ile bootstrap ds² radius of about 0.01 (chi-sq(df = 100) / 20,000).

Therefore the sensitivity proposed is about 1/2 the strength of a standard-error 95% significance movement.

Conclusion: assuming this model is built on robust use of standard errors, this sensitivity should cause no change to the significance of the factors selected, but could change the factor selection and the size of some of the parameters.

Conclusions

Model Risk Principles

Model risk is an important and growing area of banks' risk management.

The key to quantitative model risk assessment is sensitivity analysis, and the key to practical sensitivity analysis is a quick, effective method to gauge model variation without having to rebuild models.
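The bootstrap-radius arithmetic in the example can be reproduced directly. This is a minimal sketch of the slide's calculation chi-sq(df = dimension) / (2N); the chi-squared quantile is obtained here via the Wilson-Hilferty approximation so the example needs no external libraries (with scipy available, `scipy.stats.chi2.ppf(0.95, 100)` gives the same figure).

```python
def chi2_quantile_wh(z_p, df):
    """Wilson-Hilferty approximation to the chi-squared quantile;
    z_p is the corresponding standard normal quantile."""
    return df * (1 - 2 / (9 * df) + z_p * (2 / (9 * df)) ** 0.5) ** 3

z95 = 1.6449      # standard normal 95th percentile
N   = 10_000      # development sample size
dim = 100         # data space dimension

# 95%-ile bootstrap ds^2 radius = chi-sq quantile / (2N).
radius = chi2_quantile_wh(z95, dim) / (2 * N)

# The quantile is ~124.3, so the radius is ~0.006 - the slide quotes this
# order of magnitude as "about 0.01".
assert 0.006 < radius < 0.0065

# The proposed default-undercount shift, ds^2 about 0.005, sits below this
# radius, i.e. within a standard-error-sized movement.
ds2_sensitivity = 0.005
assert ds2_sensitivity < radius
```

This is the comparison behind the slide's conclusion: a data shift smaller than the bootstrap radius should not overturn factor significances established with robust standard errors, though it can still move parameter sizes.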
Conclusions

Efficient Sensitivity Analysis and Model Risk Management

Classical ideas of statistical geometry and information theory add insight to the process of model fitting.

Sensitivity analysis is framed as a differential data-shift problem.

This approach to sensitivity analysis develops fully general principles, and practical methods that cut resource requirements and bypass the need to refit models.