Backtesting Lambda Value at Risk

Jacopo Corbetta
CERMICS, École des Ponts, UPE, Champs sur Marne, France

Ilaria Peri
Department of Finance, Business School, University of Greenwich, London, England

arXiv:1602.07599v2 [q-fin.RM] 12 Jun 2016

June 14, 2016

Abstract

A new risk measure, the Lambda Value at Risk (ΛVaR), has recently been proposed as a generalization of the Value at Risk (VaR). The ΛVaR appears attractive for its potential ability to solve several problems of VaR. The aim of this paper is to provide the first study on the backtesting of ΛVaR. We propose three nonparametric tests which exploit different features. Two of these tests directly assess the correctness of the level of coverage predicted by the model. One is bilateral and provides an asymptotic result. A third test assesses the accuracy of ΛVaR that depends on the choice of the P&L distribution. Finally, we perform a backtesting exercise that confirms the higher performance of ΛVaR, especially when the tail behaviour of the distribution is considered.

Keywords: hypothesis test, estimation risk, risk management

1. Introduction

Risk measurement and its backtesting are matters of primary concern to the financial industry. The Value at Risk (VaR) measure has become the best practice. Despite its popularity, after the recent financial crisis VaR has been extensively criticized by academics and risk managers. Among these criticisms, we recall the inability to capture tail risk and the lack of reactivity to market fluctuations. Thus, the suggestion of the Basel Committee, in the consultative document Fundamental review of the trading book (2013), is to consider alternative risk measures that can overcome the weaknesses of VaR. A new risk measure, the Lambda Value at Risk (ΛVaR), has been introduced from a theoretical point of view by Frittelli et al. (2014). The ΛVaR is a generalization of the VaR at confidence level λ.
Specifically, the ΛVaR considers a function Λ instead of a constant confidence level λ, where Λ is a function of the losses. Formally, given a monotone and right-continuous function $\Lambda : \mathbb{R} \to (0, 1)$, the ΛVaR of the asset return $X$ is a map that associates to its cumulative distribution function $F(x) = P(X \le x)$ the number

$$\Lambda VaR = -\inf\{x \in \mathbb{R} \mid F(x) > \Lambda(x)\}. \tag{1}$$

This new risk measure appears attractive for its potential ability to solve several problems of VaR. First of all, it seems flexible enough to discriminate the risk among return distributions with different tail behaviors, by assigning more risk to heavy-tailed return distributions and less in the opposite case. In addition, ΛVaR may allow a rapid change of the confidence level when market conditions change.

Recently, Hitaj et al. (2015) proposed a methodology for computing ΛVaR. In that study, a first attempt at backtesting was also performed and compared with VaR. Their proposal is based on the hypothesis testing framework of Kupiec (1995). There, the accuracy of the ΛVaR model is evaluated by considering the following null hypothesis: the relative frequency of exceptions over the backtesting time window does not surpass the maximum of the Λ function. However, the actual level of coverage provided by the ΛVaR model is not constant over time and, thus, this method fails to properly evaluate the ΛVaR performance.

The objective of this paper is to propose the first theoretical framework for the backtesting of the ΛVaR. We propose three backtesting methodologies which exploit different features. The first two tests evaluate whether the ΛVaR provides an accurate level of coverage. Here, we check whether the probability that a violation occurs ex post actually coincides with the one predicted by the model. Both tests are based on test statistics whose distribution is obtained by applying results of probability theory.
The first test is unilateral and provides more precise results for the usual backtesting time window (e.g. 250 observations). The second test is bilateral and provides an asymptotic result. Thus, the second test is more suitable for larger samples of observations. With respect to the hypothesis test proposed in Hitaj et al. (2015), we consider a null hypothesis which better evaluates the ΛVaR performance and, thus, the advantages introduced by its flexibility.

We propose a third test that is inspired by the approach used by Acerbi and Szekely (2014) for Expected Shortfall backtesting. This test focuses on another aspect: it evaluates whether the correct coverage of risk derives from the fact that the model has been estimated with the correct distribution of the returns. Here, the null and alternative hypotheses change. We propose a test statistic whose distribution is obtained by simulation.

Hence, the first kind of test does not directly question whether the model has been estimated using the correct distribution function of the asset returns, but verifies whether the Λ function has been correctly computed and allows for an actual coverage of the risk. On the other hand, the third test considers Λ as correct and questions the impact of the estimation of the P&L distribution on the coverage capacity of ΛVaR.

Finally, we conduct an empirical analysis based on the backtesting of the ΛVaR, calibrated using the same dynamic benchmark approach proposed by Hitaj et al. (2015). The backtesting exercise has been performed over six different time windows throughout the global financial crisis (2006-2011).

The paper is structured as follows: Section 2 introduces the backtesting models; Section 3 describes and shows the results of the empirical analysis.

2. Model

2.1. Notations and definitions

Let us consider a probability space $(\Omega, (\mathcal{F}_t)_T, P_t)$, where the sigma algebra $\mathcal{F}_t$ represents the information at time $t$.
We assume that $X$ is the random variable of the returns of an asset distributed according to a real (unknowable) distribution $F_t$, i.e. $F_t(x) := P_t(X_t < x)$, and that it is forecasted by a model predictive distribution $P_t$ conditional on previous information, i.e. $P_t(x) = P_t(X_t \le x \mid \mathcal{F}_{t-1})$. We can measure the risk of the asset return $X$ using the classical VaR, by attributing to $X$ at time $t$ the following value:

$$VaR_t = -\inf\{x \in \mathbb{R} \mid P_t(x) > \lambda\}. \tag{2}$$

The alternative risk measure proposed by Frittelli et al. (2014), the ΛVaR, attributes to $X$ at time $t$ the following value:

$$\Lambda VaR_t = -\inf\{x \in \mathbb{R} \mid P_t(x) > \Lambda_t(x)\}, \tag{3}$$

where $\Lambda_t$ is a monotone function that maps $x \in \mathbb{R}$ into $(\lambda_m, \lambda_M)$ with $\lambda_m > 0$ and $\lambda_M < 1$.

Hitaj et al. (2015) proposed a method to estimate the Λ function, called the dynamic benchmark approach. The Λ function represents a proxy of the tails of the market's P&L distributions. This approach is called dynamic since Λ is re-estimated at each time $t$ according to the information available at $t - 1$. This feature allows the ΛVaR to be:

1. sensitive to tail risk: ΛVaR can discriminate the risk of assets with different tail behavior;
2. reactive to market fluctuations: the probability level given by the Λ function changes according to the different reactions of the asset to market fluctuations.

The authors proposed six different models to estimate Λ, but we focus on the linear ΛVaR versions. These models are obtained by linear interpolation of $n$ points $(\pi_i, \lambda_i)$, $i = 1, 2, \dots, n$, for any $\pi_1 \le x < \pi_n$, fixing a lower (upper) bound at $\Lambda(x) = \lambda_1$ for any $x \le \pi_1$ and an upper (lower) bound at $\Lambda(x) = \lambda_n$ for $x \ge \pi_n$ in the increasing (decreasing) case. Hitaj et al. (2015) chose 4 points ($n = 4$). On the probability axis, they fixed the Λ minimum $\lambda_m = 0.001$, the maximum $\lambda_M = 0.01$ and the other values $\lambda_i$, with $i = 2, 3$, by an equipartition of the interval $(0, \lambda_M]$.
On the losses axis, they fixed the 4 points $\pi_i$ on the basis of $n$ order statistics of the P&L distribution of some selected market benchmarks. Specifically, $\pi_1$ is equal to the minimum of all the benchmark returns, $\pi_1 = \min x_{t,j}$, where $x_{t,j}$ is the realized return of the $j$-th benchmark, with $j = 1, \dots, B$, $B$ the number of benchmarks, $t = 1, \dots, T$, and $T$ the time horizon (i.e. the number of days in the rolling window); $\pi_2$, $\pi_3$ and $\pi_4$ are equal to the maximum, mean and minimum of the benchmarks' λ%-VaR, respectively.

2.2. Backtesting models

Let us denote by $x_t$ the realization of the asset return $X$ at time $t$. In order to perform the backtesting of a risk measure, we need to construct the sequence of random variables representing the violations, $\{I_t\}_{t=1}^{T}$, across $T$ days, as follows:

$$I_t = \begin{cases} 1 & \text{if } x_t < y_t \\ 0 & \text{otherwise} \end{cases}$$

where $y_t$ is the return forecasted by the risk measure. The hit sequence is equal to 1 on day $t$ if the realized return on that day, $x_t$, is smaller than the value $y_t$ predicted by the risk measure (i.e. $\Lambda VaR_t$ or $VaR_t$) at time $t - 1$ for day $t$. If $y_t$ is not exceeded (or violated), the hit sequence returns 0. We assume that the violations $I_t$ occur independently. We observe that $I_t$ is a random variable that follows a Bernoulli distribution, that is:

$$I_t \sim B(\lambda_t) \tag{4}$$

where $\lambda_t$ is the probability of an exception at time $t$.

The first test proposed for the backtesting of VaR is due to Kupiec (1995), where the author considers the following null and alternative hypotheses:

$$H_0^K : \lambda \le (=)\, \lambda_0, \qquad H_1^K : \lambda > \lambda_0. \tag{5}$$

Hitaj et al. (2015) proposed a backtesting method by adapting the classical Kupiec test for VaR to the ΛVaR. They consider the following null and alternative hypotheses:

$$H_0^K : \lambda \le \max(\Lambda), \qquad H_1^K : \lambda > \max(\Lambda). \tag{6}$$

In substance, the ΛVaR is accepted if the relative frequency of the $n$ exceptions over the time horizon $T$, $\lambda := n/T$, is less than or equal to the maximum of the Λ function, $\max(\Lambda)$.
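The definition of ΛVaR in (3), the linear Λ and the hit sequence can be illustrated with a minimal sketch. It assumes an empirical predictive distribution and a non-decreasing piecewise-linear Λ, in which case the infimum in (3) is attained at a sample point; the decreasing case, where the crossing can fall between sample points, is not handled. The function names, the interpolation points and the choice $y_t = -\Lambda VaR_t$ are illustrative assumptions, not taken from Hitaj et al. (2015).

```python
from bisect import bisect_right


def linear_lambda(points):
    """Piecewise-linear Lambda through (pi_i, lambda_i), flat outside [pi_1, pi_n].

    `points` must be sorted by strictly increasing pi_i.
    """
    def lam(x):
        if x <= points[0][0]:
            return points[0][1]
        if x >= points[-1][0]:
            return points[-1][1]
        for (p0, l0), (p1, l1) in zip(points, points[1:]):
            if p0 <= x <= p1:
                return l0 + (l1 - l0) * (x - p0) / (p1 - p0)
    return lam


def lambda_var_empirical(returns, lam):
    """-inf{x : P(x) > Lambda(x)} with P the empirical CDF of `returns`.

    Assumes a non-decreasing Lambda, so the infimum is a sample point.
    """
    xs = sorted(returns)
    n = len(xs)
    for x in xs:
        ecdf = bisect_right(xs, x) / n  # P(X <= x) under the empirical law
        if ecdf > lam(x):
            return -x
    raise ValueError("Lambda must take values strictly below 1")


def hit_sequence(realized, thresholds):
    """I_t = 1 if x_t < y_t, else 0 (y_t = -LambdaVaR_t is an assumption here)."""
    return [1 if x < y else 0 for x, y in zip(realized, thresholds)]


# Illustrative data only.
returns = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05]
lam = linear_lambda([(-0.05, 0.05), (-0.01, 0.25)])
lvar = lambda_var_empirical(returns, lam)
hits = hit_sequence([-0.06, 0.01, -0.05], [-lvar] * 3)
print(lvar, hits, sum(hits) / len(hits))  # risk figure, hits, exception frequency
```

Passing a constant function such as `lambda x: 0.01` for `lam` recovers the empirical VaR of (2), which makes the comparison between the two measures straightforward in this sketch.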
The Kupiec-type test of Hitaj et al. (2015) is a unilateral hypothesis test that can be conducted using the same log-likelihood ratio and critical value as the VaR test. This approach allows for testing whether the objective of having less than 1% of violations has been reached; however, it does not allow for a proper test of the accuracy of the true ΛVaR. Indeed, if the ΛVaR model is correct, at time $t$ we should expect the hit sequence to take value 1 with probability

$$\lambda_t^0 = P_t(X_t < -\Lambda VaR_t) \tag{7}$$

and 0 with probability $1 - \lambda_t^0$. In particular, if $X_t$ admits a density under $P_t$ and both the function Λ and $P_t$ are continuous, then $\lambda_t^0 = \Lambda_t(-\Lambda VaR_t)$. In this case, the probability of a violation depends on the function $\Lambda_t$. From these considerations, it follows that the random variables $I_t$ of the violations are not identically distributed, which implies that the usual likelihood backtesting framework (POF by Kupiec (1995), TUFF by Christoffersen (2010), etc.) cannot be directly applied. Hence, if the ΛVaR is correct, that is, the model probability is correct, under $H_0$ we have:

$$H_0 : \lambda_t = \lambda_t^0 \text{ for any } t. \tag{8}$$

In the case of a bilateral test, the alternative hypothesis should be set as follows:

$$H_1 : \lambda_t \ne \lambda_t^0 \text{ for some } t. \tag{9}$$

A unilateral test, instead, should be conducted by setting the following alternative hypothesis:

$$H_1 : \lambda_t > \lambda_t^0 \text{ for some } t \text{ and equal otherwise}, \tag{10}$$

where $H_1$ is chosen to be only in the direction of risk under-estimation.

In order to test the accuracy of the ΛVaR model, we propose two hypothesis tests of unconditional coverage of ΛVaR. Using results of probability theory, we can evaluate with a sufficient level of precision whether the ΛVaR guarantees the level of coverage predicted by the parameter $\lambda_t^0$. In particular, the second test provides an asymptotic result, hence it gives the best results for large samples (e.g. a time horizon larger than 500). In this way, we are able to better detect the correctness of the ΛVaR than Hitaj et al. (2015).
Notice that a rejection of their null hypothesis implies a rejection of ours.

We also propose a third test, which is useful to check whether the ΛVaR allows for an appropriate coverage of risk by having been estimated with the correct distribution function, $P_t$. This test does not ask whether the level of coverage predicted by the model has been reached; thus, it does not question the correctness of the Λ function. Hence, in this case, the null and alternative hypotheses are:

$$H_0 : \Lambda VaR(F_t) = \Lambda VaR(P_t) \text{ for every } t, \qquad H_1 : \Lambda VaR(F_t) > \Lambda VaR(P_t) \text{ for some } t \text{ and equal otherwise}. \tag{11}$$

Here, the correctness of the null hypothesis is evaluated by a simulation exercise.

2.2.1 Test 1: Test of coverage

We set the null and the alternative hypotheses as in (8) and (10), respectively. We construct this first test by defining the test statistic $Z_1$ equal to the number of violations over the time horizon $T$, as follows:

$$Z_1 := \sum_{t=1}^{T} I_t. \tag{12}$$

The distribution of $Z_1$ is obtained by applying classical results of probability theory. Since the violations $I_t$ occur independently and the sum of independent Bernoulli variables with different means $\lambda_t$ follows a Poisson Binomial distribution, under $H_0$ we have $Z_1 \sim \text{Poiss.Bin}(\{\lambda_t^0\})$. This test is in principle a bilateral test, with critical region

$$C = \left\{z_1 : z_1 < q_{Z_1}\!\left(\tfrac{\alpha}{2}\right)\right\} \cup \left\{z_1 : z_1 \ge q_{Z_1}\!\left(1 - \tfrac{\alpha}{2}\right)\right\}.$$

However, when $T$ corresponds to the usual time horizon (e.g. 250 days), the probability that $Z_1 < q_{Z_1}(\alpha/2)$ is zero. In backtesting practice, we propose to treat this test as unilateral, where the critical region is given by

$$C_{Z_1} = \{z_1 : z_1 \ge q_{Z_1}(1 - \alpha)\} = \{z_1 : P_{Z_1}(z_1) > 1 - \alpha\},$$

$\alpha$ denotes the significance level of the test (i.e. the type I error probability) and $q_{Z_1}$ denotes the quantile of the distribution $P_{Z_1}$ of $Z_1$ under $H_0$. In our empirical analysis we fix $\alpha = 10\%$ and we compare the result with VaR. For the VaR model, under $H_0$ we have $Z_1 \sim B(T, \lambda_0)$.
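Test 1 as described above can be sketched in a few lines. The Poisson Binomial distribution of $Z_1$ under $H_0$ is built by a standard dynamic-programming convolution, and $H_0$ is rejected when $P_{Z_1}(z_1) > 1 - \alpha$, the second form of the critical region $C_{Z_1}$. The function names and the input values are illustrative assumptions, not data from the paper.

```python
def poisson_binomial_pmf(probs):
    """PMF of a sum of independent Bernoulli(p_t), by dynamic programming."""
    pmf = [1.0]  # distribution of the empty sum
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # no violation on this day
            nxt[k + 1] += mass * p       # violation on this day
        pmf = nxt
    return pmf


def coverage_test(hits, probs, alpha=0.10):
    """Unilateral Test 1: reject H0 when P_{Z1}(z1) > 1 - alpha.

    `probs` holds the model probabilities lambda_t^0, one per day.
    Returns (z1, CDF of Z1 at z1 under H0, reject flag).
    """
    z1 = sum(hits)
    pmf = poisson_binomial_pmf(probs)
    cdf_at_z1 = sum(pmf[: z1 + 1])
    return z1, cdf_at_z1, cdf_at_z1 > 1.0 - alpha


# Illustrative inputs: 250 days with lambda_t^0 alternating between 0.5% and 1%.
probs = [0.005 if t % 2 else 0.01 for t in range(250)]
hits = [1] * 6 + [0] * 244
print(coverage_test(hits, probs))
```

With a constant `probs` list the same function reproduces the Binomial case $Z_1 \sim B(T, \lambda_0)$ used for VaR, so the two models can be compared with one routine.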
For VaR, this test corresponds to the Basel traffic light approach with two bands instead of three.

2.2.2 Test 2: Asymptotic test of coverage

We propose a second test that is founded on a result of probability theory known as the Lyapunov theorem. This theorem, which we recall hereafter, is an application of the central limit theorem to random variables that are independent but not identically distributed (see Lyapunov (1954)).

Theorem 1 (Lyapunov) Suppose $X_1, X_2, \dots$ is a sequence of independent random variables, each with finite expected value $\mu_t$ and variance $\sigma_t^2$. Define

$$s_T^2 = \sum_{t=1}^{T} \sigma_t^2.$$

If for some $\delta > 0$ the "Lyapunov condition"

$$\lim_{T \to \infty} \frac{1}{s_T^{2+\delta}} \sum_{t=1}^{T} E\left[|X_t - \mu_t|^{2+\delta}\right] = 0$$

is satisfied, then the following convergence in distribution holds as $T$ goes to infinity:

$$\frac{1}{s_T} \sum_{t=1}^{T} (X_t - \mu_t) \xrightarrow{d} N(0, 1).$$

In the following lemma we show that the Lyapunov condition is satisfied when $s_T^2 = \sum_{t=1}^{T} \lambda_t (1 - \lambda_t)$ and $\mu_t = \lambda_t$.

Lemma 2 If $\{I_t\}$ is a sequence of independent random variables distributed as Bernoulli with parameters $\{\lambda_t\}_t$ and $\inf_t \lambda_t = \lambda_m > 0$, then

$$\lim_{T \to \infty} \frac{1}{s_T^{2+\delta}} \sum_{t=1}^{T} E\left[|I_t - \lambda_t|^{2+\delta}\right] = 0$$

with $s_T^2 = \sum_{t=1}^{T} \lambda_t (1 - \lambda_t)$.

Proof. We observe that

$$E\left[|I_t - \lambda_t|^{2+\delta}\right] = (1 - \lambda_t)\lambda_t^{2+\delta} + \lambda_t (1 - \lambda_t)^{2+\delta} = \lambda_t (1 - \lambda_t)\left[\lambda_t^{1+\delta} + (1 - \lambda_t)^{1+\delta}\right] \le \lambda_t (1 - \lambda_t) \le \frac{1}{4}.$$

On the other hand, we have

$$s_T^{2+\delta} = \left(\sum_{t=1}^{T} \lambda_t (1 - \lambda_t)\right)^{1+\frac{\delta}{2}} \ge \left(\sum_{t=1}^{T} \lambda_m (1 - \lambda_m)\right)^{1+\frac{\delta}{2}} = \left(T \lambda_m (1 - \lambda_m)\right)^{1+\frac{\delta}{2}}.$$

We can thus conclude that

$$\frac{\sum_{t=1}^{T} E\left[|I_t - \lambda_t|^{2+\delta}\right]}{s_T^{2+\delta}} \le \frac{T}{4 \left(T \lambda_m (1 - \lambda_m)\right)^{1+\frac{\delta}{2}}} \to 0$$

as $T \to \infty$.

We set the null and the alternative hypotheses as in (8) and (9), respectively. Thus, we can build the following test statistic, which under $H_0$ is defined as

$$Z_2 := \frac{\sum_{t=1}^{T} (I_t - \lambda_t^0)}{\sqrt{\sum_{t=1}^{T} \lambda_t^0 (1 - \lambda_t^0)}}$$

and is asymptotically distributed as a standard Normal. Formally:

$$Z_2 \xrightarrow{d} N(0, 1).$$

This result follows from the application of Lemma 2 and Lyapunov's theorem. We remark that this is a bilateral test.
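The statistic $Z_2$ and its bilateral test can be sketched as follows, using the symmetry of the standard Normal so that rejection amounts to $|z_2|$ exceeding the $1 - \alpha/2$ quantile. The function name and the inputs are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def asymptotic_coverage_test(hits, probs, alpha=0.10):
    """Bilateral Test 2 based on the normalised sum of (I_t - lambda_t^0).

    `probs` holds the model probabilities lambda_t^0, one per day.
    Returns (z2, reject flag).
    """
    numerator = sum(i - p for i, p in zip(hits, probs))
    s_T = sqrt(sum(p * (1.0 - p) for p in probs))
    z2 = numerator / s_T
    # By symmetry of N(0, 1), q(alpha/2) = -q(1 - alpha/2).
    q_upper = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z2, abs(z2) > q_upper


# Illustrative inputs: 250 days at lambda_t^0 = 1%, 7 observed violations.
probs = [0.01] * 250
hits = [1] * 7 + [0] * 243
print(asymptotic_coverage_test(hits, probs))
```

Because the result is asymptotic, a sketch like this is best read as an approximation when $T$ is small; the exact Poisson Binomial distribution of Test 1 is the natural cross-check in that case.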
We therefore reject the hypothesis $H_0$ of the bilateral Test 2 if the realization $z_2$ of the test statistic lies in the following critical region:

$$C_{Z_2} := \left\{z_2 : z_2(x) < q_{Z_2}\!\left(\tfrac{\alpha}{2}\right)\right\} \cup \left\{z_2 : z_2(x) > q_{Z_2}\!\left(1 - \tfrac{\alpha}{2}\right)\right\}$$

where $\alpha$ is the significance level of the test, and $q_{Z_2}$ is the quantile function of the standard Normal distribution $P_{Z_2}$.

2.2.3 Test 3: Test of P&L correct estimation

The third test is inspired by Acerbi and Szekely (2014) and focuses on another aspect. Here, we do not only test whether the probability $\lambda$ that a violation occurs is the one provided by the model, $\lambda^0$, since we consider the Λ function as correct. The objective of this test is to verify whether the ΛVaR guarantees the correct coverage of the risk because it has been estimated under the correct assumption on the distribution $P_t$ of the returns. Hence, the accuracy of the model can be checked by setting the null and alternative hypotheses as in (11). Moreover, under $H_0$ the distribution of $X_t$ should be equal to $P_t$; hence these hypotheses imply that a unilateral test can also be conducted by testing the correctness of the assumption on the asset return distribution, with the following hypotheses:

$$H_0' : F_t = P_t \text{ for every } t, \qquad H_1' : F_t > P_t \text{ for some } t \text{ and equal otherwise}. \tag{13}$$

The model must be rejected if it is computed under a distribution $P_t$ that under-estimates the correct distribution $F_t$. In the empirical exercise, we have chosen the hypotheses in (13) because the weaker hypotheses in (11) would not have been sufficient to simulate the test statistic and compute the p-value.

We define the test statistic $Z_3$, which under $H_0$ is given by:

$$Z_3 := \frac{1}{T} \sum_{t=1}^{T} (\lambda_t^0 - I_t) = \frac{1}{T} \sum_{t=1}^{T} \lambda_t^0 - \frac{1}{T} \sum_{t=1}^{T} I_t. \tag{14}$$

We observe that under $H_0$ we have $E[Z_3(X)] = 0$, while under $H_1$, $E[Z_3(X)] < 0$ for the ΛVaR. So, the realized value $Z_3(x)$ is expected to be zero, and it signals that the model estimation does not allow for covering the risk when it is negative.

Proposition 3 Under the test hypotheses $H_0'$ and $H_1'$ we have:

1. $E_{H_0'}[Z_3] = 0$
2. $E_{H_1'}[Z_3] < 0$.

Proof. It is enough to notice that under $H_0'$, $I_t \sim B(\lambda_t^0)$, so that $E_{H_0'}[I_t - \lambda_t^0] = 0$, which implies

$$E_{H_0'}[Z_3] = \frac{1}{T} \sum_{t=1}^{T} E_{H_0'}[\lambda_t^0 - I_t] = 0.$$

In a similar way, under $H_1'$, since $I_t \sim B(\lambda_t)$ with $\lambda_t > \lambda_t^0$, we obtain $E_{H_1'}[Z_3] < 0$.

Notice that the violations $I_t$ depend on $X_t$; hence, under $H_0$ the distribution of $Z_3$ depends on the assumption made for the distribution $P_t$ of the asset returns. Therefore, in order to perform the test, it is necessary to simulate $M$ scenarios from the distribution $P_t$ of the returns at each time $t$, with $t = 1, \dots, T$. In this way, we obtain at time $T$ the distribution $P_{Z_3}$ of the test statistic under $H_0$.

In order to construct the critical region, we need to study the behavior of the distribution of $Z_3$ when the distribution of the returns changes from $P$ to $F$. Let us compute $P_{Z_3}$:

$$P_{Z_3}(z) = P(Z_3 \le z) = P\left(\frac{1}{T} \sum_{t=1}^{T} (\lambda_t^0 - I_t) \le z\right) = P\left(\sum_{t=1}^{T} (-I_t) \le zT - \sum_{t=1}^{T} \lambda_t^0\right) = P\left(\sum_{t=1}^{T} I_t \ge -zT + \sum_{t=1}^{T} \lambda_t^0\right)$$

where $\sum_{t=1}^{T} I_t$ follows a Poisson Binomial distribution with parameters $\{\lambda_t\}$. We observe that $P_{Z_3}$ is an increasing function of $\{\lambda_t\}$ (i.e. the CDF of $Z_3$ shifts to the left when $\lambda$ increases). As a consequence, given a significance level $\alpha$, we reject when the p-value $p = P_{Z_3}(z)$ is smaller than $\alpha$.

In the empirical analysis we conduct $M = 10000$ simulations using the same assumptions on the returns' distributions as for the computation of the risk measures. We set the significance level of the test, $\alpha$, at 10%.

This test makes it possible to verify how the choice of the P&L distribution function influences the level of risk coverage of the ΛVaR, which is not directly assessed by Test 1 and Test 2. Hence, the best use of Test 3 is to compare the results between the same kind of ΛVaR models estimated under different assumptions on the P&L distribution (e.g. historical, Monte Carlo Normal, GARCH, etc.).
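The simulation step of Test 3 can be sketched as follows: draw $M$ scenarios of the returns from the predictive distributions $P_t$, recompute $Z_3$ on each scenario, and take as p-value the share of simulated values not exceeding the realized one. The samplers, thresholds and function names below are illustrative assumptions (a standard Normal $P_t$ on every day, with $y_t$ at its 1% quantile), not the setup of the empirical analysis.

```python
import random
from statistics import NormalDist


def z3_statistic(hits, lam0):
    """Z3 = (1/T) * sum_t lambda_t^0 - (1/T) * sum_t I_t."""
    T = len(hits)
    return sum(lam0) / T - sum(hits) / T


def simulated_p_value(z3_obs, samplers, thresholds, lam0, M=10000, seed=0):
    """Share of simulated Z3 values <= z3_obs when X_t is drawn from P_t.

    `samplers[t](rng)` draws one return from P_t; `thresholds[t]` is y_t.
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(M):
        hits = [1 if draw(rng) < y else 0 for draw, y in zip(samplers, thresholds)]
        if z3_statistic(hits, lam0) <= z3_obs:
            below += 1
    return below / M


# Illustrative setup: P_t standard Normal for every t, so lambda_t^0 = 1%.
T = 250
samplers = [lambda rng: rng.gauss(0.0, 1.0)] * T
thresholds = [NormalDist().inv_cdf(0.01)] * T
lam0 = [0.01] * T

observed_hits = [1] * 10 + [0] * (T - 10)   # 10 violations in 250 days
z3_obs = z3_statistic(observed_hits, lam0)
p = simulated_p_value(z3_obs, samplers, thresholds, lam0, M=2000)
print(z3_obs, p, p < 0.10)                  # reject H0 when p < alpha
```

In a real application each `samplers[t]` would wrap the stored predictive distribution for day $t$, which is where the storage requirement of this test comes from.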
The limitation of Test 3 is that it requires massive storage of information, since at time $T$ we need all the predictive distributions $P_t$ of the returns for $t = 1, \dots, T$.

3. Empirical analysis

In this section, we provide an empirical analysis of the backtesting methods for the ΛVaR that we have defined in Section 2.2. We applied our tests to a slightly different version of the 1% ΛVaR models proposed in Hitaj et al. (2015) and to the 1% VaR model. We compare our backtesting results with the Kupiec-type test proposed in Hitaj et al. (2015) for the ΛVaR and with the classical Kupiec test for VaR.

We refer to the same dataset as in Hitaj et al. (2015), consisting of daily data on 12 stocks quoted in different countries over different time windows throughout the global financial crisis (specifically, from January 2005 to December 2011). These comprise the stocks of Citigroup Inc. (C UN Equity) and Microsoft Corporation (MSFT UW Equity) for the United States, Royal Bank of Scotland Group PLC (RBS LN Equity) and Unilever PLC (ULVR LN Equity) for the United Kingdom, Volkswagen AG (VOW3 GY Equity) and Deutsche Bank AG (DBK GY Equity) for Germany, Total SA (FP FP Equity) and BNP Paribas SA (BNP FP Equity) for France, Banco Santander SA (SAN SQ Equity) and Telefonica SA (TEF SQ Equity) for Spain, and Intesa Sanpaolo SPA (ISP IM Equity) and Enel SPA (ENEL IM Equity) for Italy.

The computation of the risk measures is based on the assumption of a historical and a Normal distribution of the asset returns. In order to add robustness to the analysis, we also implement GARCH models with Student's t increments. The estimation of the parameters is based on 250 days of observations for the historical and Normal assumptions, while 500 days are considered for the GARCH model. The backtesting exercise is conducted by comparing the realized ex-post daily P&L with the daily VaR and ΛVaR estimates of the 12 stocks over a time period of 1 year.
In particular, we split the analysis into six different 2-year rolling windows (250 days for the risk measure computation and 1 year for the backtesting).

3.1. Results

3.1.1 The violations and Kupiec test

We first report the results of the violations and the Kupiec test for the VaR model and the Kupiec-type test adapted by Hitaj et al. (2015) for the ΛVaR model. We compute the average number of violations and the acceptance rate over all the assets and the different time horizons $T$. The results presented here are under the assumption of a historical distribution of the asset returns.

Average number of violations

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 3.42 | 5.33 | 11.58 | 0.75 | 3.08 | 6.83 |
| ΛVaR 1% (decr) | (VaR 5%) | 2.25 | 3.67 | 7 | 0.67 | 2 | 4.25 |
| ΛVaR 1% (decr) | (VaR 1%) | 2.17 | 2.33 | 5.75 | 0.67 | 1.58 | 4 |
| ΛVaR 1% (decr) | aggregate | 2.21 | 3.00 | 6.38 | 0.67 | 1.79 | 4.13 |
| ΛVaR 1% (incr) | (VaR 5%) | 1.17 | 1 | 3.92 | 0.42 | 0.92 | 2.75 |
| ΛVaR 1% (incr) | (VaR 1%) | 1.17 | 1.08 | 3.92 | 0.42 | 1 | 2.75 |
| ΛVaR 1% (incr) | aggregate | 1.17 | 1.04 | 3.92 | 0.42 | 0.96 | 2.75 |

Kupiec test (acceptance rate)

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 100% | 83% | 0% | 100% | 92% | 50% |
| ΛVaR 1% (decr) | (VaR 5%) | 100% | 83% | 42% | 100% | 100% | 83% |
| ΛVaR 1% (decr) | (VaR 1%) | 100% | 83% | 67% | 100% | 100% | 83% |
| ΛVaR 1% (decr) | aggregate | 100% | 83% | 54% | 100% | 100% | 83% |
| ΛVaR 1% (incr) | (VaR 5%) | 100% | 100% | 100% | 100% | 100% | 100% |
| ΛVaR 1% (incr) | (VaR 1%) | 100% | 100% | 100% | 100% | 100% | 100% |
| ΛVaR 1% (incr) | aggregate | 100% | 100% | 100% | 100% | 100% | 100% |

Table 1. Time evolution of the average number of violations and the Kupiec test under the historical distribution assumption. The table shows the evolution over the global financial crisis of the average number of violations and the percentage of Kupiec acceptance, aggregated at the level of the 1% VaR, as well as the increasing and decreasing ΛVaR models.

As expected, and as already pointed out in Hitaj et al. (2015), the average number of violations of the 1% VaR is larger than that of the ΛVaR, in particular if we compare the increasing models. In fact, the 1% VaR model shows a drastic increase in the average number of violations, moving from 3.42 in 2006 to 11.58 in 2008.
On the other hand, the increasing ΛVaR models register an average number of violations of around 1.17 during 2006 and keep the number at around 3.92 in the 2008 crisis. This result was expected since the Λ function has been built with $\max_x \Lambda_t(x) = 0.01$, which implies that the ΛVaR is always greater than or equal to the 1% VaR, so that a loss not covered by the former is also not covered by the latter. This implies that the ΛVaR always performs better than the 1% VaR under a unilateral Kupiec-type test, since this kind of test does not capture the variability of the Λ function, which is the essential feature of the ΛVaR.

The number of infractions can be seen as an index of how fast the different models respond to external events. During 2009, both the VaR and the ΛVaR models quickly incorporate the effects of the crisis, significantly decreasing the number of violations. The trend in the violations is the same under the other two distributional assumptions examined, as shown in Table 2.

Gaussian

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 4.58 | 7.08 | 14.92 | 1.75 | 4.17 | 9.42 |
| ΛVaR 1% (decr) | (VaR 5%) | 4.42 | 6.75 | 14.25 | 1.58 | 3.75 | 9.17 |
| ΛVaR 1% (decr) | (VaR 1%) | 4.25 | 5.83 | 13.08 | 1.42 | 3.42 | 8.58 |
| ΛVaR 1% (decr) | aggregate | 4.33 | 6.29 | 13.67 | 1.50 | 3.58 | 8.88 |
| ΛVaR 1% (incr) | (VaR 5%) | 3.33 | 4.75 | 10.83 | 0.92 | 2.75 | 6.67 |
| ΛVaR 1% (incr) | (VaR 1%) | 3.33 | 5.08 | 11.67 | 1.17 | 3.00 | 7.00 |
| ΛVaR 1% (incr) | aggregate | 3.33 | 4.92 | 11.25 | 1.04 | 2.88 | 6.83 |

GARCH

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 3.17 | 6.83 | 8.25 | 0.33 | 0.75 | 4.33 |
| ΛVaR 1% (decr) | (VaR 5%) | 3.08 | 5.83 | 7.33 | 0.33 | 0.42 | 4.25 |
| ΛVaR 1% (decr) | (VaR 1%) | 2.75 | 4.75 | 6.42 | 0.25 | 0.33 | 3.92 |
| ΛVaR 1% (decr) | aggregate | 2.92 | 5.29 | 6.88 | 0.29 | 0.38 | 4.08 |
| ΛVaR 1% (incr) | (VaR 5%) | 1.25 | 2.67 | 3.58 | 0.00 | 0.17 | 1.42 |
| ΛVaR 1% (incr) | (VaR 1%) | 1.25 | 2.83 | 3.50 | 0.00 | 0.33 | 1.42 |
| ΛVaR 1% (incr) | aggregate | 1.25 | 2.75 | 3.54 | 0.00 | 0.25 | 1.42 |

Table 2. Time evolution of the average number of violations under the Gaussian and GARCH models. The table shows the evolution over the global financial crisis of the average number of violations aggregated at the level of the 1% VaR, as well as the increasing and decreasing ΛVaR models.
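3.1.2 Test 1 and Test 2: comparison of the level of coverage among VaR and ΛVaR

In Tables 3 and 4 we show the results of the tests of coverage that we have proposed in Section 2.2 for the ΛVaR model. The results presented here are under the assumption of a historical, Gaussian or GARCH distribution of the asset returns.

Historical

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 100% | 58% | 0% | 100% | 75% | 25% |
| ΛVaR 1% (decr) | (VaR 5%) | 100% | 75% | 8% | 100% | 92% | 67% |
| ΛVaR 1% (decr) | (VaR 1%) | 92% | 83% | 25% | 100% | 100% | 67% |
| ΛVaR 1% (decr) | aggregate | 96% | 79% | 17% | 100% | 96% | 67% |
| ΛVaR 1% (incr) | (VaR 5%) | 75% | 83% | 0% | 100% | 83% | 17% |
| ΛVaR 1% (incr) | (VaR 1%) | 75% | 83% | 0% | 100% | 75% | 17% |
| ΛVaR 1% (incr) | aggregate | 75% | 83% | 0% | 100% | 79% | 17% |

Gaussian

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 58% | 33% | 0% | 92% | 50% | 8% |
| ΛVaR 1% (decr) | (VaR 5%) | 42% | 8% | 0% | 83% | 50% | 8% |
| ΛVaR 1% (decr) | (VaR 1%) | 33% | 25% | 0% | 92% | 42% | 8% |
| ΛVaR 1% (decr) | aggregate | 38% | 17% | 0% | 88% | 46% | 8% |
| ΛVaR 1% (incr) | (VaR 5%) | 0% | 0% | 0% | 42% | 33% | 8% |
| ΛVaR 1% (incr) | (VaR 1%) | 8% | 8% | 0% | 42% | 42% | 8% |
| ΛVaR 1% (incr) | aggregate | 4% | 4% | 0% | 42% | 38% | 8% |

GARCH

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 75% | 50% | 33% | 100% | 100% | 67% |
| ΛVaR 1% (decr) | (VaR 5%) | 75% | 50% | 33% | 100% | 100% | 67% |
| ΛVaR 1% (decr) | (VaR 1%) | 67% | 67% | 33% | 100% | 100% | 67% |
| ΛVaR 1% (decr) | aggregate | 71% | 58% | 33% | 100% | 100% | 67% |
| ΛVaR 1% (incr) | (VaR 5%) | 67% | 58% | 25% | 100% | 92% | 58% |
| ΛVaR 1% (incr) | (VaR 1%) | 75% | 50% | 25% | 100% | 92% | 58% |
| ΛVaR 1% (incr) | aggregate | 71% | 54% | 25% | 100% | 92% | 58% |

Table 3. Time evolution of Test 1 for the ΛVaR models under different assumptions on the P&L distribution. The table shows the evolution over the global financial crisis of the acceptance rates, aggregated at the level of the ΛVaR models ($\min_x \Lambda(x) = 0.5\%$), calculated using the historical, Normal and GARCH assumptions on the P&L distribution.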
Historical

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 100% | 75% | 0% | 100% | 92% | 42% |
| ΛVaR 1% (decr) | (VaR 5%) | 100% | 83% | 17% | 100% | 100% | 75% |
| ΛVaR 1% (decr) | (VaR 1%) | 100% | 83% | 42% | 100% | 100% | 83% |
| ΛVaR 1% (decr) | aggregate | 100% | 83% | 29% | 100% | 100% | 79% |
| ΛVaR 1% (incr) | (VaR 5%) | 100% | 100% | 17% | 100% | 92% | 42% |
| ΛVaR 1% (incr) | (VaR 1%) | 100% | 100% | 17% | 100% | 92% | 42% |
| ΛVaR 1% (incr) | aggregate | 100% | 100% | 17% | 100% | 92% | 42% |

Gaussian

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 58% | 42% | 0% | 100% | 67% | 25% |
| ΛVaR 1% (decr) | (VaR 5%) | 58% | 42% | 0% | 100% | 67% | 17% |
| ΛVaR 1% (decr) | (VaR 1%) | 50% | 50% | 0% | 100% | 67% | 17% |
| ΛVaR 1% (decr) | aggregate | 54% | 46% | 0% | 100% | 67% | 17% |
| ΛVaR 1% (incr) | (VaR 5%) | 17% | 25% | 0% | 92% | 50% | 8% |
| ΛVaR 1% (incr) | (VaR 1%) | 25% | 33% | 0% | 83% | 58% | 25% |
| ΛVaR 1% (incr) | aggregate | 21% | 29% | 0% | 88% | 54% | 17% |

GARCH

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| VaR 1% | | 83% | 58% | 42% | 100% | 100% | 67% |
| ΛVaR 1% (decr) | (VaR 5%) | 83% | 58% | 33% | 100% | 100% | 67% |
| ΛVaR 1% (decr) | (VaR 1%) | 92% | 75% | 42% | 100% | 100% | 75% |
| ΛVaR 1% (decr) | aggregate | 88% | 67% | 38% | 100% | 100% | 71% |
| ΛVaR 1% (incr) | (VaR 5%) | 92% | 75% | 67% | 100% | 100% | 83% |
| ΛVaR 1% (incr) | (VaR 1%) | 92% | 67% | 67% | 100% | 92% | 83% |
| ΛVaR 1% (incr) | aggregate | 92% | 71% | 67% | 100% | 96% | 83% |

Table 4. Time evolution of Test 2 for the ΛVaR models under different assumptions on the P&L distribution. The table shows the evolution over the global financial crisis of the acceptance rates, aggregated at the level of the ΛVaR models ($\min_x \Lambda(x) = 0.5\%$), calculated using the historical, Normal and GARCH assumptions on the P&L distribution.

We first notice that the acceptance rate of the tests we propose is lower than that of the unilateral Kupiec-POF test in Hitaj et al. (2015). This is due to the particular construction of the Kupiec-POF test. By imposing a fixed parameter $\lambda_0$ equal to $\max(\Lambda)$, this test is not able to capture the daily variation of the coverage level $\lambda_t^0 = P_t(X_t < -\Lambda VaR_t)$ given by the ΛVaR model. For this reason, this test is useful to assess whether the ΛVaR model guarantees a maximum level of accepted coverage, but it cannot be used to check the accuracy of the real coverage level $\lambda_t^0$ offered by the ΛVaR model. On the other hand, the coverage tests that we propose are able to better evaluate whether the flexibility introduced by the Λ function helps to detect adverse scenarios and to set aside a more adequate amount of capital.
For all the models, the asymptotic coverage test (Test 2) provides higher acceptance rates than the coverage test (Test 1). This is due to the fact that the coverage test provides more precise results with a smaller number of observations.

In general, the ΛVaR models turn out to be more accurate than the 1% VaR, confirming the outcomes in Hitaj et al. (2015). This means that the greater flexibility of the ΛVaR contributes to better coverage. On the other hand, in our tests the decreasing ΛVaR models seem to be more accurate, in contrast with the results of the Kupiec test in Hitaj et al. (2015). Even though the number of infractions of the increasing ΛVaR models is the smallest, these models lose accuracy, especially during the crisis periods (2008, 2011).

Our coverage tests point out an estimation issue in the ΛVaR models proposed by Hitaj et al. (2015). The choice of the Λ minimum, $\min_x \Lambda(x)$, which seemed to be irrelevant in Hitaj et al. (2015), is here decisive, as we discuss below.

3.1.3 The choice of the Λ minimum

The results in Tables 3 and 4 have been computed by fixing $\min_x \Lambda(x) = 0.5\%$. In fact, we notice that, using our coverage tests, the increasing ΛVaR models computed as in Hitaj et al. (2015) present a higher rejection rate (see Table 5), while presenting the smallest number of infractions.

Test 1: Coverage Test

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| ΛVaR 1% (decr) | (VaR 5%) | 100% | 75% | 8% | 100% | 92% | 67% |
| ΛVaR 1% (decr) | (VaR 1%) | 92% | 83% | 25% | 100% | 100% | 67% |
| ΛVaR 1% (decr) | aggregate | 96% | 79% | 17% | 100% | 96% | 67% |
| ΛVaR 1% (incr) | (VaR 5%) | 8% | 17% | 0% | 58% | 42% | 8% |
| ΛVaR 1% (incr) | (VaR 1%) | 8% | 17% | 0% | 58% | 42% | 8% |
| ΛVaR 1% (incr) | aggregate | 8% | 17% | 0% | 58% | 42% | 8% |

Test 2: Asymptotic Coverage Test

| Model | Variant | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|
| ΛVaR 1% (decr) | (VaR 5%) | 100% | 83% | 17% | 100% | 100% | 75% |
| ΛVaR 1% (decr) | (VaR 1%) | 100% | 83% | 42% | 100% | 100% | 83% |
| ΛVaR 1% (decr) | aggregate | 100% | 83% | 29% | 100% | 100% | 79% |
| ΛVaR 1% (incr) | (VaR 5%) | 75% | 83% | 0% | 100% | 75% | 17% |
| ΛVaR 1% (incr) | (VaR 1%) | 75% | 83% | 0% | 100% | 83% | 25% |
| ΛVaR 1% (incr) | aggregate | 75% | 83% | 0% | 100% | 79% | 21% |

Table 5. Time evolution of Test 1 and Test 2 for the ΛVaR models with $\min_x \Lambda(x) = 0.1\%$ under the historical distribution assumption. The table shows the evolution over the global financial crisis of the acceptance rates, aggregated at the level of the ΛVaR models with $\min_x \Lambda(x) = 0.1\%$.

We have therefore studied how the probability of infraction $\lambda_t$ evolves, and we have observed that in most cases it attains the minimal value 0.1%. This happens especially during crisis periods, when the cumulative distribution function of the assets shifts to the left and intersects the Λ function at its minimum level. For this reason, we propose to compute the ΛVaR models by fixing the Λ minimum equal to 0.5%, i.e. $\min_x \Lambda(x) = 0.005$ instead of the $\min_x \Lambda(x) = 0.001$ used by Hitaj et al. (2015). Although the authors did not specify a criterion for this choice, we consider it a relevant and critical issue.

From our point of view, the Λ minimum should provide the probability of losing more than the worst-case event (i.e. the benchmarks' minimum, $\pi_1 = \min x_{t,j}$) over the observations in the time window (250 in our case). If we consider all the events equally probable, the Λ minimum should be greater than $1/T$ over $T$ observations. Thus, in our models, we set $\min_x \Lambda(x) = 0.5\%$, since the probability of one event over 250 past realizations is $1/250 = 0.4\%$.

Using these new estimates, we have observed that the number of infractions does not change in any of the periods under consideration, while the acceptance rate of the increasing ΛVaR models increases drastically (see Tables 3 and 4), validating our choice. Clearly, this new setting does not affect the decreasing ΛVaR models. The choice of the Λ minimum can be refined by considering a more precise evaluation of the probability of a worst-case event, but this is beyond the scope of this paper.
3.1.4 Test 3: comparison of ΛVaRs with different P&L estimations

As anticipated in Section (2.2), the best use of the third test is the comparison of the level of coverage among different estimations of the ΛVaR. We computed the time evolution of the acceptance rate aggregated at the level of the increasing and decreasing ΛVaR models. We repeat the analysis changing the assumption on the P&L distribution: specifically, historical, Monte Carlo Normal and GARCH simulations. The results are presented in Table (6).

                    Historical                             Gaussian                               GARCH
                    2006  2007  2008  2009  2010  2011     2006  2007  2008  2009  2010  2011     2006  2007  2008  2009  2010  2011
VaR
  1%                 50%   33%    0%  100%   58%   25%      58%   33%    0%   92%   50%    8%      75%   58%   33%  100%  100%   67%
  aggregate          50%   33%    0%  100%   58%   25%      58%   33%    0%   92%   50%    8%      75%   58%   33%  100%  100%   67%
ΛVaR 1% (decr)
  (VaR 5%)           50%   33%    0%  100%   67%   17%      58%   42%    0%   92%   58%   17%      75%   58%   33%  100%  100%   67%
  (VaR 1%)           58%   50%    8%  100%   67%    8%      50%   33%    0%   92%   58%   25%      92%   67%   33%  100%  100%   75%
  aggregate          54%   42%    4%  100%   67%   13%      54%   38%    0%   92%   58%   21%      83%   63%   33%  100%  100%   71%
ΛVaR 1% (incr)
  (VaR 5%)            8%   17%    0%   58%   42%    0%      17%   17%    0%   92%   50%    8%      83%   67%   67%  100%  100%   83%
  (VaR 1%)            8%   17%    0%   58%   42%    8%      33%    8%    0%   83%   50%   17%      83%   58%   67%  100%   92%   83%
  aggregate           8%   17%    0%   58%   42%    4%      25%   13%    0%   88%   50%   13%      83%   63%   67%  100%   96%   83%

Table 6. Time evolution of Test 3 for the ΛVaR models under different assumptions on the P&L distribution. The table shows the evolution over the global financial crisis of the acceptance rates, aggregated at the level of the ΛVaR models (min_x Λ(x) = 0.5%) calculated using the historical, normal and GARCH assumptions on the P&L distribution.

The results show that the GARCH assumption on the returns yields the highest coverage. Moreover, we notice that in this test the Historical estimator frequently underperforms the Gaussian one, in contrast with the previous tests.
This is because the Historical estimator takes values only from a finite sample; in particular, if the realized return x_t at time t is lower than all the returns of the previous year, on which the historical estimator is built, then not only is the estimated probability of observing it 0, but also P_t(x_t) = 0. Obviously, this problem does not occur if we assume a normal distribution of the returns. Test 3 is based on the tail behaviour of the distribution and, in particular, compares the realized number of violations with the number predicted by the model. For this reason, a distribution with thin tails (such as the historical one) will perform poorly on this test. These observations can explain why this test is sometimes more punitive in the historical case than in the normal one. Such a preference for the normal distribution is, on the other hand, completely reversed by the other tests, which favour the Historical distribution by relying (almost) only on the number of infractions and not on the full shape of the distribution.

4. Conclusions

A new risk measure, the ΛVaR, has recently been introduced. No dedicated study of its backtesting exists in the literature. The issue is that, in the ΛVaR model, the probability of a violation is not constant but depends on the function Λ. A first backtesting proposal is provided by Hitaj et al. (2015). However, this methodology does not take into account the actual predictive capacity of the ΛVaR as introduced by the Λ function.

We propose three backtesting methodologies and we assess the accuracy of the new risk measure from different points of view. Test 1 and Test 2 evaluate whether the ΛVaR provides an accurate level of coverage, namely the one predicted by the model. These tests are better suited to comparing the performance of ΛVaR with that of VaR.
In particular, they assess the additional value introduced by the Λ function and whether Λ has been correctly estimated, thus allowing a better coverage of the risk. Test 1 is unilateral and provides more precise results with a small sample of observations (e.g. 250). Test 2 is bilateral and provides an asymptotic result; for this reason, it is preferable for larger samples of observations.

Test 3 focuses on another aspect. Here, the correctness of the Λ function is not in question. This test evaluates whether the correct coverage of the risk derives from the fact that the ΛVaR has been estimated with the correct distribution of the returns. Hence, the best use of Test 3 is to compare the results of the same kind of ΛVaR models estimated under different assumptions on the P&L distribution (i.e. historical, Monte Carlo Normal and GARCH). This test, being based on simulations, requires massive storage of information and may provide less accurate results when the distribution of the returns has thin tails.

Finally, we conduct an empirical analysis. Both Test 1 and Test 2 show that the ΛVaR models perform better than the 1% VaR, confirming the results in Hitaj et al. (2015). Test 1 provides more precise results than Test 2, implying a higher rejection rate. Test 3 shows that the ΛVaR computed with the GARCH model of the returns has the highest level of coverage.

Acknowledgements

This research benefited from the support of the “Chaire Risques Financiers”, Fondation du Risque.

References

Acerbi, C., and Szekely, B. (2014), “Back-testing expected shortfall,” Risk, 27(11).

Basel Committee on Banking Supervision (1996), “Supervisory Framework for the Use of Backtesting in Conjunction with the Internal Models Approach to Market Risk Capital Requirements,” Bank for International Settlements.

Christoffersen, P. (2010), “Encyclopedia of Quantitative Finance - Backtesting,” John Wiley and Sons.
Basel Committee on Banking Supervision (2013), “Fundamental review of the trading book,” Second consultative document, Bank for International Settlements.

Frittelli, M., Maggis, M., and Peri, I. (2014), “Risk Measures on P(R) and Value at Risk with Probability/Loss Function,” Mathematical Finance, 24, 442–463.

Hitaj, A., Mateus, C., and Peri, I. (2015), “Lambda value at risk and regulatory capital: a dynamic approach to tail risk,” Working paper.

Kerkhof, J., and Melenberg, B. (2004), “Backtesting for Risk-Based Regulatory Capital,” Journal of Banking & Finance, 28(8), 1845–1865.

Kupiec, P. (1995), “Techniques for Verifying the Accuracy of Risk Measurement Models,” Journal of Derivatives, 3, 73–84.

Lyapunov, A. M. (1954), Collected works, Vol 1.