Links for referenced papers:
https://dl.acm.org/doi/pdf/10.1145/2661631
https://www.researchgate.net/publication/322488817_Conditional_Monte_Carlo_for_sums_with_applications_to_insurance_and_finance
https://www.tandfonline.com/doi/pdf/10.1080/10920277.2011.10597627?needAccess=true

Reinforcement learning is an area of machine learning. It is about taking suitable actions to maximize reward in a particular situation. It is employed by various software systems and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that supervised training data comes with the answer key, so the model is trained on the correct answers, whereas in reinforcement learning there is no answer key and the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its own experience.

Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three problem classes [1]: optimization, numerical integration, and generating draws from a probability distribution.

Abstract

Value-at-risk (VaR) and conditional value-at-risk (CVaR) are two widely used risk measures of large losses and are employed in the financial industry for risk management purposes. In practice, loss distributions typically do not have closed-form expressions, but they can often be simulated (i.e., random observations of the loss distribution may be obtained by running a computer program). Therefore, Monte Carlo methods that design simulation experiments and utilize simulated observations are often employed in estimation, sensitivity analysis, and optimization of VaRs and CVaRs. In this article, we review some of the recent developments in these methods, provide a unified framework to understand them, and discuss their applications in financial risk management.

INTRODUCTION

Risk is a fundamental attribute of financial activities. When investors make financial decisions, they consider not only potential returns but also potential risks. There are various kinds of risks in the financial industry. For instance, an investment bank may hold a portfolio of stocks for a period of time, and the value of the portfolio may evolve at random during the period. Then, the bank faces the market risk that the value of the portfolio may fall below the initial value. Similarly, a commercial bank may hold a portfolio of loans lent to different obligors. Then, the bank faces the credit risk that some of the obligors may default. Because of the importance and ubiquity of financial risks, individual financial institutions often want to identify and understand the risks in their activities, based on which they can then control or manage the risks. Furthermore, because financial institutions are closely interconnected, risks of one institution can easily spread to other institutions or even to the entire financial system, resulting in the so-called systemic risk. Such systemic risk may even affect the entire economic and social system.
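To make the Monte Carlo idea above concrete before turning to VaR and CVaR, the following minimal Python sketch (not taken from the referenced papers; the lognormal loss model and all parameter values are illustrative assumptions) estimates a tail probability of a sum of losses by nothing more than repeated random sampling.

# Minimal sketch of plain Monte Carlo: estimate P(L > x) for a loss
# L = X_1 + ... + X_k with lognormal terms (an assumed toy model; the
# quantity has no simple closed form) by repeated random sampling.
import numpy as np

rng = np.random.default_rng(seed=0)

def tail_prob_crude_mc(x, n_terms=10, n_samples=100_000):
    losses = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, n_terms)).sum(axis=1)
    hits = losses > x
    estimate = hits.mean()                                # sample proportion of exceedances
    std_error = hits.std(ddof=1) / np.sqrt(n_samples)
    return estimate, std_error

est, se = tail_prob_crude_mc(x=40.0)
print(est, se)

The estimate converges to the true probability as the number of samples grows, with error of order one over the square root of the sample size, which is the basic accuracy guarantee behind all the estimators reviewed below.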
Therefore, a consensus has been reached that regulations on financial systems and financial markets are necessary.

There have been numerous risk measures introduced and employed in the financial industry. Value-at-risk (VaR) and conditional value-at-risk (CVaR, also known as expected shortfall or tail conditional expectation), which we review in this article, are among the most well-known and widely used ones and play dominant roles in practice. For any α ∈ (0, 1), the α-VaR of a random loss L is the α quantile of L, while the α-CVaR is the average of all β-VaRs for β ∈ (α, 1). As we are typically interested in the risk of large losses in practice, α is typically quite close to 1, for example, α = 0.9, 0.95, 0.99. As pointed out by Hong and Liu [2009], if we define the large losses to be the losses in the upper (1 − α)-tail of the loss distribution, then the α-VaR is the lower bound of the large losses and the α-CVaR is the mean of the large losses. They provide information on potential large losses that an investor may suffer.

Even though VaR has been widely adopted in financial practice, there is also criticism of its use as a risk measure. Artzner et al. [1999] defined four axioms and called a risk measure that satisfies these axioms a coherent risk measure. One of these axioms is the subadditivity axiom, which basically means that "a merger does not create extra risk." They further showed that VaR does not always satisfy the subadditivity axiom and is therefore not a coherent risk measure. Rockafellar and Uryasev [2002], on the other hand, showed that CVaR satisfies all four axioms and is therefore a coherent risk measure (see also the study of Acerbi and Tasche [2002]). Kou et al. [2013], however, argued that the subadditivity axiom is not necessary and suggested replacing it with the comonotonic subadditivity axiom, which only requires subadditivity to hold for random variables moving in the same direction. They showed that both VaR and CVaR satisfy the comonotonic subadditivity axiom. However, they argued that, compared to CVaR, VaR is often more robust to the tail behavior of the loss distribution, which is in general difficult to characterize in practice, and is therefore more suitable for regulatory purposes.

2. ESTIMATIONS OF VAR AND CVAR

As a starting point, we define VaR and CVaR and explore their inherent connections. Let L be the random loss of interest and F(y) = Pr{L ≤ y} be the cumulative distribution function (CDF) of L. Then, the inverse CDF of L can be defined as $F^{-1}(\gamma) = \inf\{y : F(y) \ge \gamma\}$. Following the definitions of Trindade et al. [2007], for any α ∈ (0, 1), we define the α-VaR of L as $v_\alpha = F^{-1}(\alpha)$, and define the α-CVaR of L as
$$c_\alpha = \frac{1}{1-\alpha} \int_\alpha^1 v_\beta \, d\beta. \quad (1)$$
Pflug [2000] showed that $c_\alpha$ is also the optimal value of the stochastic program
$$c_\alpha = \inf_{t \in \mathbb{R}} \left\{ t + \frac{1}{1-\alpha}\, \mathrm{E}[L - t]^+ \right\}, \quad (2)$$
where $[a]^+ = \max\{0, a\}$. Let T be the set of optimal solutions to the stochastic program defined in Equation (2). Then it can be shown that $T = [v_\alpha, u_\alpha]$, where $u_\alpha = \sup\{t : F(t) \le \alpha\}$ (see, e.g., Rockafellar and Uryasev [2002] and Trindade et al. [2007]). In particular, note that $v_\alpha \in T$. Therefore,
$$c_\alpha = v_\alpha + \frac{1}{1-\alpha}\, \mathrm{E}[L - v_\alpha]^+. \quad (3)$$
When L has a positive density in a neighborhood of $v_\alpha$, then $v_\alpha = u_\alpha$. Therefore, the stochastic program defined in (2) has a unique solution, and
$$c_\alpha = \mathrm{E}[L \mid L \ge v_\alpha], \quad (4)$$
where the right-hand side of Equation (4) is also known as the expected shortfall or the tail conditional expectation.
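The equivalence between the optimization formulation (2)–(3) and the tail conditional expectation (4) can be checked numerically. The following is a minimal sketch assuming a standard normal loss and α = 0.95 purely for illustration (it is not code from the reviewed paper); for a standard normal loss the exact CVaR is φ(v_α)/(1 − α) ≈ 2.063.

# Check numerically that formulations (3) and (4) give the same CVaR
# for an assumed standard normal loss at alpha = 0.95.
import numpy as np
from scipy.stats import norm

alpha = 0.95
v_alpha = norm.ppf(alpha)                          # alpha-VaR of N(0,1)
c_closed_form = norm.pdf(v_alpha) / (1.0 - alpha)  # exact CVaR for N(0,1)

rng = np.random.default_rng(1)
L = rng.standard_normal(1_000_000)

# Formulation (3): c_alpha = v_alpha + E[L - v_alpha]^+ / (1 - alpha)
c_from_eq3 = v_alpha + np.mean(np.maximum(L - v_alpha, 0.0)) / (1.0 - alpha)

# Formulation (4): c_alpha = E[L | L >= v_alpha]
c_from_eq4 = L[L >= v_alpha].mean()

print(c_closed_form, c_from_eq3, c_from_eq4)       # all approximately 2.063

Both Monte Carlo values agree with the closed form up to simulation noise, which illustrates why Equation (6) below can be used in place of solving the optimization in Equation (5).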
To be meaningful, we assume that $c_\alpha$ is finite for all discussions related to CVaR in this article.

2.1. Crude Monte Carlo Estimation

Suppose that $L_1, L_2, \ldots, L_n$ are n independent and identically distributed (i.i.d.) observations of the loss L. Then, the α-VaR of L can be estimated by $\hat{v}^{\,n}_\alpha = L_{\lceil n\alpha \rceil : n}$, where $\lceil a \rceil$ denotes the smallest integer larger than or equal to a, and $L_{i:n}$ is the ith order statistic of the n observations. Trindade et al. [2007] suggested using the estimator
$$\hat{c}^{\,n}_\alpha = \inf_{t \in \mathbb{R}} \left\{ t + \frac{1}{n(1-\alpha)} \sum_{i=1}^n [L_i - t]^+ \right\} \quad (5)$$
to estimate the α-CVaR of L. Let
$$F_n(y) = \frac{1}{n} \sum_{i=1}^n 1_{\{L_i \le y\}}$$
be the empirical CDF constructed from $L_1, L_2, \ldots, L_n$, where $1_{\{\cdot\}}$ is the indicator function. Then
$$\hat{c}^{\,n}_\alpha = \inf_{t \in \mathbb{R}} \left\{ t + \frac{1}{1-\alpha}\, \mathrm{E}[\tilde{L} - t]^+ \right\},$$
where the CDF of $\tilde{L}$ is $F_n$. Since $\hat{v}^{\,n}_\alpha = F_n^{-1}(\alpha)$, by Equation (3) we have
$$\hat{c}^{\,n}_\alpha = \hat{v}^{\,n}_\alpha + \frac{1}{n(1-\alpha)} \sum_{i=1}^n [L_i - \hat{v}^{\,n}_\alpha]^+. \quad (6)$$
Therefore, we can apply Equation (6) to estimate $c_\alpha$ directly instead of solving the stochastic program in Equation (5).

Consistency and asymptotic normality of the estimators $\hat{v}^{\,n}_\alpha$ and $\hat{c}^{\,n}_\alpha$ have been studied extensively in the literature (see, e.g., Serfling [1980] and Trindade et al. [2007]). Regarding the asymptotic properties, an even sharper result is the Bahadur representation [Bahadur 1966]. As a unified view, we present the asymptotic properties of $\hat{v}^{\,n}_\alpha$ and $\hat{c}^{\,n}_\alpha$ using Bahadur representations. To this end, we first make the following assumption.

ASSUMPTION 1. There exists an $\epsilon > 0$ such that L has a positive and continuously differentiable density f(x) for any $x \in (v_\alpha - \epsilon, v_\alpha + \epsilon)$.

Assumption 1 requires that L has a positive and differentiable density in a neighborhood of $v_\alpha$. It implies that $F(v_\alpha) = \alpha$ and $c_\alpha = \mathrm{E}[L \mid L \ge v_\alpha]$. Bahadur representations of $\hat{v}^{\,n}_\alpha$ and $\hat{c}^{\,n}_\alpha$ are summarized in the following theorem, whose proof can be found in Sun and Hong [2010].

THEOREM 2.1. For a fixed α ∈ (0, 1), suppose that Assumption 1 is satisfied. Then
$$\hat{v}^{\,n}_\alpha = v_\alpha + \frac{1}{f(v_\alpha)} \left( \alpha - \frac{1}{n} \sum_{i=1}^n 1_{\{L_i \le v_\alpha\}} \right) + A_n,$$
and
$$\hat{c}^{\,n}_\alpha = c_\alpha + \left( \frac{1}{n} \sum_{i=1}^n \left[ v_\alpha + \frac{1}{1-\alpha} (L_i - v_\alpha)^+ \right] - c_\alpha \right) + B_n,$$
where $A_n = O_{a.s.}(n^{-3/4} (\log n)^{3/4})$, $B_n = O_{a.s.}(n^{-1} \log n)$, and the statement $Y_n = O_{a.s.}(g(n))$ means that $Y_n / g(n)$ is bounded by a constant almost surely.

Consistency and asymptotic normality of $\hat{v}^{\,n}_\alpha$ and $\hat{c}^{\,n}_\alpha$ follow straightforwardly from Theorem 2.1. Specifically, if Assumption 1 is satisfied, then $\hat{v}^{\,n}_\alpha \to v_\alpha$ and $\hat{c}^{\,n}_\alpha \to c_\alpha$ with probability 1 (w.p.1) as n → ∞, and
$$\sqrt{n}\,(\hat{v}^{\,n}_\alpha - v_\alpha) \Rightarrow \frac{\sqrt{\alpha(1-\alpha)}}{f(v_\alpha)}\, N(0,1), \quad \text{as } n \to \infty, \quad (7)$$
where "⇒" denotes convergence in distribution and N(0, 1) represents a standard normal random variable. If, in addition, $\mathrm{E}[(L - v_\alpha)^2\, 1_{\{L \ge v_\alpha\}}] < \infty$, then
$$\sqrt{n}\,(\hat{c}^{\,n}_\alpha - c_\alpha) \Rightarrow \sigma_\infty \cdot N(0,1), \quad \text{as } n \to \infty, \quad (8)$$
where
$$\sigma^2_\infty = \lim_{n \to \infty} n\, \mathrm{Var}\!\left(\hat{c}^{\,n}_\alpha\right) = \frac{1}{(1-\alpha)^2}\, \mathrm{Var}\!\left([L - v_\alpha]^+\right).$$
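The crude Monte Carlo estimators of Section 2.1 are straightforward to implement. The following minimal sketch (the lognormal loss and the parameter choices are illustrative assumptions, not taken from the paper) computes the order-statistic VaR estimator, the CVaR estimator in Equation (6), and a normal-approximation confidence interval for CVaR based on the asymptotic variance in Equation (8), with Var([L − v_α]^+) replaced by its sample analogue.

# Crude Monte Carlo estimators of VaR and CVaR, with a 95% confidence
# interval for CVaR from the asymptotic normality result (8).
import numpy as np

def crude_mc_var_cvar(losses, alpha=0.99):
    n = losses.size
    sorted_losses = np.sort(losses)
    var_hat = sorted_losses[int(np.ceil(n * alpha)) - 1]         # L_{ceil(n*alpha):n}
    excess = np.maximum(losses - var_hat, 0.0)
    cvar_hat = var_hat + excess.sum() / (n * (1.0 - alpha))      # Equation (6)
    sigma_hat = np.sqrt(np.var(excess, ddof=1)) / (1.0 - alpha)  # plug-in sigma_infinity
    half_width = 1.96 * sigma_hat / np.sqrt(n)                   # 95% CI half-width
    return var_hat, cvar_hat, (cvar_hat - half_width, cvar_hat + half_width)

rng = np.random.default_rng(2)
L = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)             # assumed toy loss model
print(crude_mc_var_cvar(L, alpha=0.99))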
2.2. Variance Reduction

In the simulation literature, there has been a significant amount of work on variance reduction for VaR estimation. For instance, Hsu and Nelson [1990] and Hesterberg and Nelson [1998] studied the use of control variates. Avramidis and Wilson [1998] employed correlation-induction techniques for variance reduction in quantile estimation. Glynn [1996] considered the use of importance sampling (IS) and discussed its asymptotic properties. The problem of estimating portfolio VaR has been studied in Glasserman et al. [2000] and Glasserman et al. [2002], where IS and stratified sampling are employed.

Among the various variance reduction methods proposed in the literature, IS is particularly attractive, given the rare-event features of many practical problems. It has proven to be a very effective variance reduction technique in this context, and much work has been done on this issue. In what follows, we discuss a general IS method for estimating VaR and CVaR, with a focus on the asymptotic properties of the IS estimators.

Specifically, suppose that L is simulated under another CDF G(·), where F is absolutely continuous with respect to G on $[v_\alpha - \epsilon, \infty)$, with $\epsilon > 0$ being a fixed constant; that is, F(dx) = 0 if G(dx) = 0 for any $x \in [v_\alpha - \epsilon, \infty)$. We refer to G as the IS distribution and let $l(x) = F(dx)/G(dx)$ denote the likelihood ratio function (also called the score function) associated with the change of measure. Note that for $x \in [v_\alpha - \epsilon, \infty)$,
$$F(x) = \mathrm{E}_F\!\left[1_{\{L \le x\}}\right] = \mathrm{E}_G\!\left[1_{\{L \le x\}}\, l(L)\right],$$
where $\mathrm{E}_F$ and $\mathrm{E}_G$ denote expectations taken with respect to F and G, respectively. Then we may estimate F(x) by
$$F_{n,IS}(x) = \frac{1}{n} \sum_{i=1}^n 1_{\{L_i \le x\}}\, l(L_i).$$
Then the IS estimators of $v_\alpha$ and $c_\alpha$, denoted by $\hat{v}^{\,n,IS}_\alpha$ and $\hat{c}^{\,n,IS}_\alpha$, can be defined as follows:
$$\hat{v}^{\,n,IS}_\alpha = F_{n,IS}^{-1}(\alpha) = \inf\{x : F_{n,IS}(x) \ge \alpha\},$$
and
$$\hat{c}^{\,n,IS}_\alpha = \hat{v}^{\,n,IS}_\alpha + \frac{1}{n(1-\alpha)} \sum_{i=1}^n (L_i - \hat{v}^{\,n,IS}_\alpha)^+\, l(L_i).$$
Recently, Sun and Hong [2010] and Chu and Nakayama [2012] independently studied the Bahadur representations of the IS estimators. To present this result, we follow the framework of Sun and Hong [2010] and make a further assumption.

ASSUMPTION 2. There exist $\epsilon > 0$ and C > 0 such that $l(x) \le C$ for any $x \in (v_\alpha - \epsilon, v_\alpha + \epsilon)$, and there exists p > 2 such that $\mathrm{E}_G[l^p(L)] < \infty$.

Assumption 2 requires that the likelihood ratio is bounded from above in a neighborhood of $v_\alpha$ and has a finite moment of order p > 2 on the right tail of the loss. The Bahadur representations of the IS estimators of $v_\alpha$ and $c_\alpha$ are summarized in the following theorem. Interested readers may refer to Sun and Hong [2010] for its proof.

THEOREM 2.2. For a fixed α ∈ (0, 1), suppose that Assumptions 1 and 2 are satisfied. Then,
$$\hat{v}^{\,n,IS}_\alpha = v_\alpha + \frac{1}{f(v_\alpha)} \left( \alpha - \frac{1}{n} \sum_{i=1}^n 1_{\{L_i \le v_\alpha\}}\, l(L_i) \right) + C_n,$$
and
$$\hat{c}^{\,n,IS}_\alpha = c_\alpha + \left( \frac{1}{n} \sum_{i=1}^n \left[ v_\alpha + \frac{1}{1-\alpha} (L_i - v_\alpha)^+\, l(L_i) \right] - c_\alpha \right) + D_n,$$
where $C_n = O_{a.s.}(\max\{n^{-1+2/p+\delta}, n^{-3/4+1/(2p)+\delta}\})$ and $D_n = O_{a.s.}(n^{-1+2/p+\delta})$ for any δ > 0.

Asymptotic normality of the estimators follows immediately from Theorem 2.2. In particular, under Assumptions 1 and 2,
$$\sqrt{n}\,(\hat{v}^{\,n,IS}_\alpha - v_\alpha) \Rightarrow \frac{\sqrt{\mathrm{Var}_G\!\left[1_{\{L \ge v_\alpha\}}\, l(L)\right]}}{f(v_\alpha)}\, N(0,1), \quad \text{as } n \to \infty.$$
If, in addition, $\mathrm{E}_G[(L - v_\alpha)^2\, l^2(L)\, 1_{\{L \ge v_\alpha\}}] < \infty$, then
$$\sqrt{n}\,(\hat{c}^{\,n,IS}_\alpha - c_\alpha) \Rightarrow \frac{\sqrt{\mathrm{Var}_G\!\left[(L - v_\alpha)^+\, l(L)\right]}}{1-\alpha}\, N(0,1), \quad \text{as } n \to \infty.$$
If $l(x) \le 1$ for all $x \ge v_\alpha$, then it can be easily verified that $\mathrm{Var}_G[1_{\{L \ge v_\alpha\}} l(L)] \le \alpha(1-\alpha)$ and $\mathrm{Var}_G[(L - v_\alpha)^+ l(L)] \le \mathrm{Var}[(L - v_\alpha)^+]$. Then, compared to Equations (7) and (8), it can be seen that the asymptotic variances of the IS estimators are smaller than those of the estimators without IS, given that $l(x) \le 1$ for all $x \ge v_\alpha$. In practice, an effective IS distribution (with density function g) often satisfies $g(x) \ge f(x)$ for $x \ge v_\alpha$. This provides a guideline for selecting an appropriate IS distribution in practical implementation.
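As a simple illustration of the IS estimators above, the following sketch assumes a standard normal loss, takes the IS distribution G to be a normal with the mean shifted to the right (the shift is a tuning choice, not prescribed by the paper), and uses α = 0.95 so that the estimates can be compared with the exact values v_α ≈ 1.645 and c_α ≈ 2.063. Note that for this mean shift, l(x) ≤ 1 for all x ≥ v_α, consistent with the guideline above.

# Importance sampling estimators of VaR and CVaR for an assumed N(0,1) loss,
# with IS distribution G = N(mu, 1) and likelihood ratio l(x) = f(x)/g(x).
import numpy as np
from scipy.stats import norm

def is_var_cvar(n=100_000, alpha=0.95, mu=1.5, seed=3):
    rng = np.random.default_rng(seed)
    L = rng.normal(loc=mu, scale=1.0, size=n)             # sample under G = N(mu, 1)
    lr = norm.pdf(L) / norm.pdf(L, loc=mu)                # l(L_i) = f(L_i) / g(L_i)

    # Build F_{n,IS} on the sorted sample and invert it to get the VaR estimator.
    order = np.argsort(L)
    L_sorted, lr_sorted = L[order], lr[order]
    F_is = np.cumsum(lr_sorted) / n
    idx = min(np.searchsorted(F_is, alpha), n - 1)        # guard: F_{n,IS} may top out below alpha
    var_hat = L_sorted[idx]                               # inf{x : F_{n,IS}(x) >= alpha}

    cvar_hat = var_hat + np.sum(np.maximum(L - var_hat, 0.0) * lr) / (n * (1.0 - alpha))
    return var_hat, cvar_hat

print(is_var_cvar())   # compare with the exact values 1.6449 and 2.0627

Because the shifted distribution places many more samples in the region {L ≥ v_α}, the IS estimators typically show visibly smaller variance than the crude estimators of Section 2.1 at the same sample size.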
Large Sample Behavior of the CTE and VaR Estimators under Importance Sampling (Jae Youn Ahn and Nariankadu D. Shyamalkumar, North American Actuarial Journal, Vol. 15, No. 3)

The α-level value at risk (VaR) and the α-level conditional tail expectation (CTE) of a continuous random variable X are defined as its α-level quantile (denoted by $q_\alpha$) and its conditional expectation given the event $\{X \ge q_\alpha\}$, respectively. VaR is a popular risk measure in the banking sector, for both external and internal reporting purposes, while the CTE has recently become the risk measure of choice for insurance regulation in North America. Estimation of the CTE for company assets and liabilities is becoming an important actuarial exercise, and the size and complexity of these liabilities make inference procedures with good small-sample performance very desirable. A common situation is one in which the CTE of the portfolio loss is estimated using simulated values, and in such situations the use of variance reduction techniques such as importance sampling has proved to be fruitful. Construction of confidence intervals for the CTE relies on the availability of the asymptotic distribution of the normalized CTE estimator, and although such a result has been available to actuaries, it has so far been supported only by heuristics. The main goal of this paper is to provide an honest theorem establishing the convergence of the normalized CTE estimator under importance sampling to a normal distribution. In the process, we also provide a similar result for the VaR estimator under importance sampling, which improves upon an earlier result. Also, through examples we motivate the practical need for such theoretical results and include simulation studies to lend insight into the sample sizes at which these asymptotic results become meaningful.

1. INTRODUCTION (similar in structure to the Review of Literature from the example BTP)

The α-level value at risk (VaR) and the α-level conditional tail expectation (CTE) of a continuous random variable X are defined as its α-level quantile (denoted by $q_\alpha$) and its conditional expectation given the event $\{X \ge q_\alpha\}$, respectively. Although both of these risk measures are popular, it is noteworthy that the requirements on a risk measure for it to be coherent, as laid out in Artzner et al. (1999), are satisfied by the CTE but not by the VaR. VaR is popular in the banking sector and is used for both external and internal reporting purposes. On the insurance side, for variable annuities the adoption of the C-3 Phase II revision to the regulatory risk-based capital model in 2005, and the implementation of the analogous principles-based reserving methodology (AG VACARVM) in 2009 by the National Association of Insurance Commissioners (NAIC), have together made the CTE the key risk measure. Now, with the Life Reserves Work Group and the Life Capital Work Group (C3WG) of the American Academy of Actuaries working on an analogous reserve and capital methodology for life insurance products, and the possibility of principles-based reserves (PBRs) being made effective in 2014, the CTE is well poised to become the risk measure of choice for the whole of the life industry in the United States.
The above-described changes in insurance regulation require estimation of the CTE for company assets and liabilities, and the size and complexity of these liabilities make inference procedures with good small-sample performance very desirable. There are two common situations requiring inference procedures for the CTE. In the first, the distribution of the loss random variable is unknown, and the actuary has only a random sample from this unknown distribution at his or her disposal. In the second, the loss random variable is a known function of some economic variable(s) with a known distribution for the latter. The complexity in this situation arises from the huge computational cost involved in calculating the loss random variable as a function of the economic variable(s). So although the distribution can in theory be ascertained with certainty, the computational complexity of the task renders it practically unknown, and once again the actuary has to make do with a sample from the portfolio loss distribution. Although both of these situations have much in common, it is the availability of variance reduction techniques such as importance sampling in the second situation that makes them different. We refer to Glasserman (2004) for a self-contained treatment of the use of Monte Carlo methods in finance, in particular the use of variance reduction techniques such as importance sampling.

In response to this need for inference procedures for the CTE, and for better understanding of their performance, there has been a surge in the actuarial literature of papers dealing with statistical inference of the CTE and related risk measures; see, for example, Jones and Zitikis (2003), Manistre and Hancock (2005), Kaiser and Brazauskas (2006), Kim and Hardy (2007), Brazauskas et al. (2008), Ko et al. (2009), Russo and Shyamalkumar (2010), Necir et al. (2010), and Ahn and Shyamalkumar (2010). Nevertheless, only Manistre and Hancock (2005) discuss the use of variance reduction techniques for estimation of the CTE, and this is the area of focus for our paper. Our interest in establishing asymptotic convergence results for the empirical CTE and quantile under importance sampling arose mainly because we see importance sampling as one of the potent practical strategies to get not only better point estimators but also confidence intervals; see, for example, Manistre and Hancock (2005) and Glasserman et al. (2000).

The main contribution of the paper is that we establish asymptotic normality of the CTE and VaR estimators under importance sampling. In the case of VaR, as discussed later, our result improves upon an earlier result of Glynn (1996), whereas there is no published result for the case of the CTE. However, we note that our results have been suggested and supported by heuristics derived from the use of influence functions in Manistre and Hancock (2005), one of the earlier articles on the estimation of the CTE in the actuarial literature. Although a theoretical result justifying the use of a methodology is undoubtedly of interest, its practical value is amplified if it prevents the use of the methodology in cases where, against expectations, the methodology fails. Through the first two examples, which for expository ease deal with the case of ordinary sampling, we motivate the practical need for theoretical results establishing asymptotic normality of the CTE and VaR estimators under importance sampling.
The following nonpathological ordinary sampling example shows that the existence of an influence function in the case of the CTE falls short of establishing convergence of the empirical CTE to normality, and also that the formula derived for the asymptotic variance through the use of the influence function could be misleading. The use of the influence function for VaR is not similarly prone to misuse, because the asymptotic variance formula for the VaR is proportional to the reciprocal of the density evaluated at the quantile, and known results for weak convergence of empirical quantiles under ordinary sampling (for example, see Reiss 1989) require only that the density evaluated at the quantile be positive. We refer the reader to Manistre and Hancock (2005), especially to Sections 2 and 5 therein, for an introduction to influence functions.

The following section is from Asmussen's paper on conditional Monte Carlo for sums (second link above). There, $S_n = X_1 + \cdots + X_n$ is a sum of i.i.d. terms with common CDF F, density f and tail $\bar{F} = 1 - F$; $F_n$ and $\bar{F}_n$ denote the CDF and tail of $S_n$, and $f_{n-1}$ denotes the density of $S_{n-1}$. CrMC stands for crude Monte Carlo and CdMC for conditional Monte Carlo, in which the estimator is the conditional expectation of the crude estimator given $S_{n-1}$.

3. Variance Reduction for the c.d.f.

CdMC always gives variance reduction, but, as argued, the reduction needs to be substantial for the procedure to be worthwhile. Further, in many applications the right and/or left tail is of particular interest, so one may pay particular attention to the behaviour there.

Remark 3.1. That CdMC gives variance reduction in the tails can be seen intuitively by the following direct argument, without reference to Rao–Blackwellization. The CrMC and CdMC estimators of $\bar{F}_n(x)$ are $1_{\{S_n > x\}}$ and $\bar{F}(x - S_{n-1})$, with second moments
$$\mathrm{E}\!\left[1_{\{S_n > x\}}^2\right] = \mathrm{E}\!\left[1_{\{S_n > x\}}\right] = \int_{-\infty}^{\infty} f_{n-1}(y)\, \bar{F}(x - y)\, dy \quad (3.1)$$
$$= \mathrm{P}(S_{n-1} > x) + \int_0^x f_{n-1}(y)\, \bar{F}(x - y)\, dy \quad \text{when } X \ge 0, \quad (3.2)$$
$$\mathrm{E}\!\left[\bar{F}(x - S_{n-1})^2\right] = \int_{-\infty}^{\infty} f_{n-1}(y)\, \bar{F}(x - y)^2\, dy \quad (3.3)$$
$$= \mathrm{P}(S_{n-1} > x) + \int_0^x f_{n-1}(y)\, \bar{F}(x - y)^2\, dy \quad \text{when } X \ge 0. \quad (3.4)$$
In the right tail (say), these second moments can be interpreted as the tails of the r.v.'s $S_{n-1} + X$ and $S_{n-1} + X^*$, where X and X* are independent of $S_{n-1}$ and have tails $\bar{F}$ and $\bar{F}^2$. Since $\bar{F}^2(x)$ is of smaller order than $\bar{F}(x)$ in the right tail, the tail of $S_{n-1} + X^*$ should be of smaller order than that of $S_{n-1} + X$, implying the same ordering of the second moments. However, as n becomes large one also expects the tail of $S_{n-1}$ to more and more dominate the tails of X and X*, so that the difference should become less and less marked. The analysis to follow will confirm these guesses.

A measure of performance which we consider is the ratio $r_n(x)$ of the CdMC variance to the CrMC variance:
$$r_n(x) = \frac{\mathrm{Var}\!\left[\bar{F}(x - S_{n-1})\right]}{F_n(x)\,\bar{F}_n(x)} = \frac{\mathrm{Var}\!\left[F(x - S_{n-1})\right]}{F_n(x)\,\bar{F}_n(x)} \quad (3.5)$$
(note that the two alternative expressions reflect that the variance reduction is the same whether CdMC is performed for F itself or for the tail $\bar{F}$).

To provide some initial insight, we examine in Figure 3 the ratio $r_n(x_{n,z})$ as a function of z, where $x_{n,z}$ is the z-quantile of $S_n$. In Figure 3(a), the underlying F is Pareto with tail $\bar{F}(x) = 1/(1 + x)^{3/2}$, and in Figure 3(b) it is standard normal. Both figures consider the cases of a sum of n = 2, 5, or 10 terms and use R = 250,000 replications of the vector $(Y_1, \ldots, Y_{n-1})$ (variances are more difficult to estimate than means, hence the high value of R). The dotted line for AK (the Asmussen–Kroese estimator; see Section 4 of the paper) may be ignored for the moment. The argument z on the horizontal axis is on a log10 scale, and $x_{n,z}$ was taken as the exact value for the normal case and as the CdMC estimate for the Pareto case.
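The ratio r_n(x) in (3.5) is easy to estimate by simulation. The following minimal sketch reproduces the kind of comparison shown in Figure 3(a) for the Pareto case (the specific values of n, x and R are illustrative choices only); the denominator of (3.5) is approximated by the sample variance of the crude indicator.

# Estimate the variance ratio r_n(x): CdMC estimator Fbar(x - S_{n-1})
# versus the crude indicator I(S_n > x), for Pareto terms with tail
# Fbar(x) = 1/(1+x)^{3/2} as in Figure 3(a).
import numpy as np

def pareto_tail(x):
    return np.where(x < 0.0, 1.0, (1.0 + np.maximum(x, 0.0)) ** (-1.5))

def sample_pareto(rng, size):
    # Inverse transform: if U ~ Uniform(0,1), then U**(-2/3) - 1 has tail (1+x)^(-3/2)
    return rng.random(size) ** (-2.0 / 3.0) - 1.0

def variance_ratio(n=5, x=20.0, R=250_000, seed=4):
    rng = np.random.default_rng(seed)
    X = sample_pareto(rng, (R, n))
    S_n = X.sum(axis=1)
    S_nm1 = S_n - X[:, -1]                      # S_{n-1}: leave out the last term
    crude = (S_n > x).astype(float)             # CrMC estimator I(S_n > x)
    cdmc = pareto_tail(x - S_nm1)               # CdMC estimator Fbar(x - S_{n-1})
    return cdmc.var(ddof=1) / crude.var(ddof=1)

print(variance_ratio())                         # a value below 1 indicates variance reduction

For heavy-tailed F such as this Pareto distribution, Proposition 3.4 below says the ratio tends to 1 − 1/n far out in the right tail, so the reduction there is only moderate.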
Figure 3. The ratio $r_n(x_{n,z})$ in (3.5) as a function of the quantile level z, with F Pareto in (a) and standard normal in (b); curves for n = 2, 5, 10 for both the CdMC and the AK estimators. [Figure not reproduced here.]

For the Pareto case in Figure 3(a), it seems that the variance reduction is decreasing in both x and n, yet in fact it is only substantial in the left tail. For the normal case, note that there should be symmetry around x = 0, corresponding to z(x) = 1/2 with base-10 logarithm −0.30. This is confirmed by the figure (though the feature is of course somewhat disguised by the logarithmic scale). In contrast to the Pareto case, it seems that the variance reduction is very big in the right (and therefore also left) tail, but also that it decreases as n increases.

We proceed to a number of theoretical results supporting these empirical findings. They all use formulas (3.3) and (3.4) for the second moments of the CdMC estimators. For the exponential distribution, the calculations are particularly simple:

Example 3.2. Assume $\bar{F}(x) = e^{-x}$ and n = 2. Then $\mathrm{P}(X_1 + X_2 > x) = x e^{-x} + e^{-x}$, and (3.4) takes the form
$$\bar{F}(x) + \int_0^x e^{-y}\, e^{-2(x-y)}\, dy = e^{-x} + e^{-2x}\left(e^{x} - 1\right) = 2e^{-x} - e^{-2x},$$
and so, for the right tail,
$$r_2(x) = \frac{2e^{-x} - e^{-2x} - \left(x e^{-x} + e^{-x}\right)^2}{\left(x e^{-x} + e^{-x}\right)\left(1 - x e^{-x} - e^{-x}\right)}.$$
For x → ∞, this gives
$$r_2(x) = \frac{2e^{-x} + o(e^{-x})}{x e^{-x} + o(x e^{-x})} = \frac{2}{x}\,(1 + o(1)) \to 0.$$
In the left tail x → 0, Taylor expansion gives that, up to third-order terms, $2e^{-x} - e^{-2x} \approx 1 - x^2 + x^3$ and $x e^{-x} + e^{-x} \approx 1 - x^2/2 + x^3/3$, and so
$$r_2(x) \approx \frac{(1 - x^2 + x^3) - (1 - x^2/2 + x^3/3)^2}{(1 - x^2/2 + x^3/3)(x^2/2 - x^3/3)} \approx \frac{x^3/3}{x^2/2} = \frac{2x}{3} \to 0. \;\diamond$$

The relation $r_n(x) \to 0$ in the left tail (i.e., as x → 0) in the exponential example is in fact essentially a consequence of the support being bounded to the left:

Proposition 3.3. Assume X > 0 and that the density f(x) satisfies $f(x) \sim c x^{p}$ as x → 0 for some c > 0 and some p > −1. Then $r_n(x) \sim d\, x^{p+1}$ as x → 0 for some 0 < d = d(n) < ∞.

The following result explains the right-tail behaviour in the Pareto example and shows that this extends to other standard heavy-tailed distributions like the lognormal or the Weibull with decreasing failure rate (for subexponential distributions, see, e.g., Embrechts et al., 1997):

Proposition 3.4. Assume X > 0 is subexponential. Then $r_n(x) \to 1 - 1/n$ as x → ∞.

For light tails, Example 3.2 features a different behaviour in the right tail, namely $r_n(x) \to 0$. Here is one more such light-tailed example:

Proposition 3.5. If X is standard normal, then $r_n(x) \to 0$ as x → ∞. More precisely,
$$r_n(x) \sim \frac{1}{x}\sqrt{\frac{2n-1}{n\pi}}\; e^{-x^2/[2n(2n-1)]}.$$

The proofs of Propositions 3.3–3.5 are in the Appendix of the paper.
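The closed-form expressions in Example 3.2 make it easy to check the stated asymptotics numerically. The following minimal sketch evaluates the exact r_2(x) for exponential(1) terms and compares it with the left-tail asymptote 2x/3 and the right-tail asymptote 2/x; everything here follows directly from the formulas above.

# Numerical check of Example 3.2: exact r_2(x) for exponential(1) terms
# versus its asymptotes 2x/3 (as x -> 0) and 2/x (as x -> infinity).
import numpy as np

def r2_exact(x):
    tail = (1.0 + x) * np.exp(-x)                          # P(X1 + X2 > x)
    second_moment = 2.0 * np.exp(-x) - np.exp(-2.0 * x)    # E[Fbar(x - S_1)^2]
    return (second_moment - tail ** 2) / (tail * (1.0 - tail))

for x in (0.05, 0.1, 5.0, 10.0, 20.0):
    asymptote = 2.0 * x / 3.0 if x < 1.0 else 2.0 / x
    print(x, r2_exact(x), asymptote)

The exact ratio approaches the asymptotes at both ends, confirming that for exponential terms CdMC is very effective in both tails, in contrast with the subexponential case covered by Proposition 3.4.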
To formulate a result of the type $r_n(x) \to 0$ as x → ∞ for a sufficiently broad class of light-tailed F encounters the difficulty that the general results giving the asymptotics of $\mathrm{P}(S_n > x)$ as x → ∞ with n fixed are somewhat involved (the standard light-tailed asymptotics is for $\mathrm{P}(S_n > bn)$ as n → ∞ with b fixed; cf., e.g., Jensen, 1995). It is possible to obtain more general versions of Example 3.2 for close-to-exponential tails by using results of Cline (1986), and of Proposition 3.5 for thinner tails by involving Balkema et al. (1993). However, the adaptation of Balkema et al. (1993) is rather technical and can be found in Asmussen et al. (2017).

One may note that the variance reduction is so moderate in the range of z considered in Figure 3(b) that CdMC may hardly be worthwhile for light tails, except possibly for very small n. If variance reduction is a major concern, the obvious alternative is the standard IS algorithm based on an exponential change of measure (ECM). The r.v.'s $X_1, \ldots, X_n$ are here generated from the exponentially twisted distribution with density $f_\theta(x) = e^{\theta x} f(x) / \mathrm{E}\, e^{\theta X}$, where θ should be chosen such that $\mathrm{E}_\theta S_n = x$. The estimator of $\mathrm{P}(S_n > x)$ is
$$e^{-\theta S_n} \left(\mathrm{E}\, e^{\theta X}\right)^{n} 1_{\{S_n > x\}}; \quad (3.6)$$
see Asmussen & Glynn (2007: 167–169) for more detail. Further variance reduction would be obtained by applying CdMC to (3.6), as implemented in the following example.

Example 3.6. To illustrate the potential of the IS-ECM algorithm, we consider the sum of n = 10 r.v.'s which are Gamma(3, 1), at the z = 0.95, 0.99 quantiles $x_z$. The exponentially twisted distribution is Gamma(3, 1 − θ), and $\mathrm{E}_\theta S_n = x$ means 3n/(1 − θ) = x, i.e., θ = 1 − 3/(x/n). With R = 100,000 replications, we obtained the values of $r_n(x)$ at the z quantiles for z = 0.95, 0.99; they are given in Table 1 of the paper. It is seen that IS-ECM indeed performs much better than CdMC, but that CdMC is also moderately useful for providing some further variance reduction. ◊

A further financially relevant implementation of the IS-ECM algorithm is in Asmussen et al. (2016) for lognormal sums. It is unconventional because it deals with the left tail (which is light) rather than the right tail (which is heavy), and because the ECM is not explicit but is done in an approximately efficient way. Another IS algorithm for the left lognormal sum tail is in Gulisashvili & Tankov (2016), but the numerical evidence of Asmussen et al. (2016) makes its efficiency somewhat doubtful.
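The IS-ECM estimator (3.6) is simple to implement for the Gamma example. The following minimal sketch mirrors the setting of Example 3.6, with the 0.99 quantile of S_n computed exactly from the Gamma(30, 1) distribution; it does not include the additional CdMC layer mentioned in the example. It estimates P(S_n > x) and compares its variance with that of crude Monte Carlo.

# IS-ECM estimator (3.6) for the right tail of a sum of n = 10 Gamma(3,1) r.v.'s.
# The twisted density is Gamma(3, 1 - theta), and theta solves E_theta[S_n] = x.
import numpy as np
from scipy.stats import gamma

n, shape, R = 10, 3.0, 100_000
x = gamma.ppf(0.99, a=n * shape)                 # S_n ~ Gamma(30, 1); its 0.99 quantile
theta = 1.0 - n * shape / x                      # from n*shape/(1 - theta) = x

rng = np.random.default_rng(5)
X = rng.gamma(shape=shape, scale=1.0 / (1.0 - theta), size=(R, n))   # twisted samples
S = X.sum(axis=1)
mgf = (1.0 - theta) ** (-shape)                  # E[e^{theta X}] for Gamma(3, 1)
estimator = np.exp(-theta * S) * mgf ** n * (S > x)                  # estimator (3.6)

print(estimator.mean(), 0.01)                    # IS-ECM estimate versus the true value
print(estimator.var(ddof=1) / (0.01 * 0.99))     # variance relative to crude Monte Carlo

The variance ratio printed on the last line comes out well below 1, which is the sense in which IS-ECM performs much better than crude sampling (and than CdMC alone) for this light-tailed example.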
CONCLUSIONS AND FURTHER DISCUSSIONS

This article provides a unified view of the simulation of VaR, CVaR, and their sensitivities. It also gives a brief review of VaR and CVaR optimization. These topics are inherently related and are important components of financial risk management. We believe the methodologies and techniques covered in this article are very important for financial risk management practice. However, the content of this article is far from sufficient for the practice of risk management.

In this article, we have mainly focused on research dealing with VaR and CVaR. We did not study in depth the properties of the VaR and CVaR risk measures. Every risk measure has its own properties, advantages, and disadvantages. Understanding these properties is important and could be beneficial from a risk management perspective. For instance, one important feature of VaR optimization is that the model may result in a very skewed loss distribution, and consequently the risk may hide in the tail of the distribution (see, e.g., Natarajan et al. [2008]). This issue is very important for risk management practice. Similarly, the CVaR optimization model may also bring in important issues. For instance, Lim et al. [2011] showed that CVaR is fragile in portfolio optimization; that is, estimation errors in CVaR may affect optimization results, and thus decisions, significantly. Also, we did not include any empirical study on VaR and CVaR, which is very important. It would be of great value to analyze VaR/CVaR-based models and to study the pros and cons of these models in practice using the data and information available.

Another important theoretical question is the specification of the distributions of the random variables in risk management models. In the context of this article, we have assumed that an input distribution is predetermined and given to the modeler. However, in practice, it is often difficult to specify the input distribution precisely. A considerable amount of research has been devoted to the issue of uncertainty in VaR/CVaR models (see, e.g., El Ghaoui et al. [2003], Zymler et al. [2013], Hu and Hong [2012], Hu et al. [2013a], and many others). However, it is far from sufficient, and more study of input uncertainty is necessary in the context of financial risk management. Modeling input uncertainty should incorporate the information available and should reflect practice.