Portfolio Selection with a Bayesian Approach Bachelor’s thesis, 15 ECTS Author: Martin de Caprétz Supervisor: Stepan Mazur Spring 2015 Abstract This paper deals with a traditional method for creating portfolios of financial assets known as the mean-variance portfolio theory and especially a specific case of the theory known as the global minimum variance portfolio. A disadvantage with many of the models that exist within portfolio theory is that they do often only consider a few quantified variables in the calculations and this could cause problems in areas such as finance where a lot of the information regarding investments can be difficult to quantify into numbers. In the paper Bayesian statistical methods are used to try to implement the beliefs of an investor and by testing and evaluate different approaches examine if it is possible to find a method that take both calculated variables and an investor’s beliefs into account. Table of Contents 1. Introduction ................................................................................................................................. 3 2. Modern Portfolio Theory............................................................................................................. 4 2.1 Mean –Variance Portfolio Theory ............................................................................................. 4 2.2 The portfolio’s characteristics ................................................................................................... 5 2.3 Global Minimum Variance Portfolio (GMVP) ............................................................................ 6 2.4 Diversification ............................................................................................................................ 7 2.5 Estimating expected return and standard deviation................................................................. 8 3. Data ............................................................................................................................................. 9 4. Bayesian Statistics ..................................................................................................................... 10 4.1 Bayes’ Theorem ....................................................................................................................... 10 4.2 Informative and non-informative priors ................................................................................. 11 4.3 Prior distributions .................................................................................................................... 11 4.4 Posterior distributions ............................................................................................................. 13 5. Results ....................................................................................................................................... 16 5.1 Global Minimum Variance Portfolios ...................................................................................... 16 5.2 Distribution of the weights ...................................................................................................... 18 6. Summary.................................................................................................................................... 23 References ......................................................................................................................................... 24 Appendix............................................................................................................................................ 25 Matrices ......................................................................................................................................... 25 2 1. Introduction The financial markets have grown extremely fast over the last couple of decades, the liquidity of the markets has drastically increased and the number of financial instruments has virtually exploded. This has led to a need and demand for models that try to determine the optimal financial decisions to make in any given situation. One of the earliest and still most popular methods to use when constructing a portfolio is the mean-variance theory; it is a fairly simple way of finding the optimal weights for your assets under relatively simple assumptions. Most theoretical models generally have one thing in common and that is that they are often very strict and impersonal in that they doesn’t take all the factors that a human investor would into consideration. It could be specific personal beliefs in the future of a market, specific conditions that might occur or rumors about future events that aren’t possible to quantify. The purpose of this paper is to see if a model can be created for which both the theoretical model and the subjective beliefs of an investor are taken into consideration while constructing the portfolio. This paper will use one special case of the mean-variance portfolio theory known as the global minimum variance portfolio; it is a method to find the combination of the available assets with the lowest possible risk. In the paper five different indices from Morgan Stanley Capital International (MSCI) will be used as assets and the indices themselves will not be of important in any other way other than that they will be used to demonstrate the results from the calculations. In the second part we will use a Bayesian approach to try and implement the subjective beliefs into the models. Four different ways will be examined and analyzed to see if they manage to take the investor’s beliefs into account and create a portfolio which utilizes both theoretical data and personal beliefs. All the formulas in the paper will be expressed in the form of matrices and a short explanation of the matrix operations used in the paper are in the appendix; matrices are throughout the paper denoted by bold letters. 3 2. Modern Portfolio Theory The foundation to modern portfolio theory is considered to be laid with an article by Harry Markowitz (1952). The idea in the article that made it catch on was that Markowitz introduced variance as a measurement of risk and that it should complement the expected return as the main criteria for portfolio construction. The variance of a portfolio of assets depends on the covariance between the assets and that is what made the method plausible. Markowitz later received the Nobel Prize in Economy in 1990 for his contributions to the portfolio theory. The article led to the development of the mean-variance portfolio theory. It is a method were the calculations of the weights for the assets in the portfolio are only based on two factors, the expected return and variance of each asset. Due to the strict relationship between the variance and the standard deviation the latter is often used as a risk measurement as well. 2.1 Mean –Variance Portfolio Theory Mean-variance is in its simplest form a very easily calculated method and the purpose of the method is to make more efficient investment decisions. The fundamental task with the method is to find an efficient relation between the expected return on your capital and the risks that you have to take to get it. Figure 1 A B C D Expected Return Standard deviation The basic idea of the theory can be shown in a simple figure with four different assets represented by their expected return and standard deviation. The optimal scenario would be to have a high expected return with a low standard deviation thus placing us in the top left corner. Of the four assets in the figure asset A is the one that would be considered the best one since it has got the same expected return as B but with lower standard deviation and it has got the same standard deviation as C but a 4 higher expected return. The same arguments could be used to argue for that D is the worst asset of the four. It is however not possible to make any conclusions on if B or C is to be preferred, that decision depends on the investor’s personal preferences when it comes to risk. Asset B is the more risky asset but it is compensated with a higher expected return and if you are not risk averse you would consider B to be better. 2.2 The portfolio’s characteristics The characteristics for a portfolio with n assets are calculated through two common statistical formulas. The expected return for the portfolio is calculated by multiplying the weights invested in each asset with that assets expected return and sum the factors ๐ (1) ๐๐ = ∑ ๐ค๐ ๐๐ , ๐=1 where ๐ ∑ ๐ค๐ = 1. ๐=1 The variable ๐๐ is the expected returns and ๐ค๐ represents the percentage weights, they weights have to add up to one to represent a full portfolio. It is possible for the ๐ค๐ to be negative or larger than one if short sales are allowed. Short sales are a negative position in an asset which in reality means that an investor is borrowing and selling the asset and this will give a profit for the investor if the assets price falls since eventually the asset has to be bought back to return to the lender. The variance of the portfolio is calculated by multiplying the squared weights of an asset with the variance of that asset and this is done for all assets. In a portfolio of multiple assets the correlation between the assets has to be taken into account and this is done by multiplying the weights of two assets with their covariance ๐๐๐ . These calculations can be expressed in an equation as ๐ ๐๐2 ๐ = ∑ ∑ ๐ค๐ ๐ค๐ ๐๐๐ . ๐=1 ๐=1 5 (2) Equation 1 and 2 can also be expressed by matrices as ๐๐ = ๐๐ป ๐น, (3) ๐๐2 = ๐๐ป ๐ฎ๐, (4) where ๏ท w is a vector of the weights; ๏ท R is a vector of the expected returns; ๏ท Σ is the covariance matrix. 2.3 Global Minimum Variance Portfolio (GMVP) The GMVP is the specific portfolio of a number of assets where the variance is minimized ๐บ๐๐๐ = min(๐๐ป ๐ฎ๐) ๐ ๐ข๐โ ๐กโ๐๐ก ๐๐ ๐ = 1. (5) It is easy to calculate by using modern computer software and if short sales are allowed it can also be found through the expression ๐๐ฎ๐ด๐ฝ = ๐ฎ−๐ ๐ . ๐๐ป ๐ฎ−๐ ๐ (6) The true covariance matrix Σ is in reality never known so an estimate S will be used instead and it is calculated by the equation ๐ 1 ฬ )(๐ฟ๐ − ๐ฟ ฬ )๐ ๐บ= ∑(๐ฟ๐ − ๐ฟ ๐−1 ๐=1 with ๐ ฬ = ๐ฟ 1 ∑ ๐ฟ๐ , ๐−1 ๐=1 where ๐ฟ๐ is the independent observations of daily return for the assets. 6 (7) 2.4 Diversification Diversification is the reason to why it is possible to create a GMVP and the essence behind it is equivalent to the saying “don’t put all your eggs in the same basket” There are different types of risks associated with an investment in different market and in specific asset. There are risks that originate from the general economy such as interest rates, exchange rates, inflation and business cycles. These are all macroeconomic factors and none of them can be predicted with certainty but they affect all companies and commodities in some way. Besides these risks for the broader economy there are risks associated with specific assets, a mining company might be exposed to risks in the price on minerals, a farmer on the other hand might consider the weather the biggest risk for their business. Just holding one asset in a portfolio makes you very exposed for the risks associated with that specific asset; if you instead include two assets in the portfolio from companies with very different businesses you would be able to reduce the overall risk for your portfolio. There is of course no reason to stop with just two stocks; it would be possible to continue adding assets to the portfolio and thus spreading the asset-specific risks. There is however no way to reduce risk all together, even with a very large number of assets there would still be an exposure to risk of the general market. There is no way to diversify the portfolio in such a way so that it is made risk neutral to all the risks associated with the general economy. If there were such an portfolio the expected return of that portfolio would due to arbitrage theory be equal to the risk free rate in the economy. When common sources of risk are connected with all assets in the portfolio, risk cannot be reduces altogether even with extensive diversification. The risk that remains after diversification is called market risk or systematic risk. The risk that we can diversify away is called asset-specific risk or nonsystematic risk. This could be expressed mathematically, if all covariances ๐๐๐ are assumed to be positive and an equal 1 amount ๐ is invested in the n different assets then ๐ ๐๐2 ๐ ๐ ๐ ๐ ๐๐๐ 1 ๐−1 1 ๐−1 ๐−1 1 = ∑( )2 ๐๐2 + ∑ ∑ (๐1 )(๐1 )๐๐๐๐−1= ๐ฬ 2 + ∑∑ = ๐ฬ 2 + ๐ฬ ๐๐ , ๐ ๐ ๐ ๐(๐ − 1) ๐ ๐ ๐=1 ๐=1 ๐ฝ=1 ๐≠๐ (8) ๐=1 ๐=1 ๐≠๐ where ๐ฬ is the mean of the variances and ๐ฬ ๐๐ is the mean of the covariances between the assets. In the case when n→ ∞ it holds that ๐๐2 → ๐ฬ ๐๐ . The variance of the portfolio would equal the average of the covariances and this can be illustrated through figure 2. 7 Portfolio Variance Diversification 1 0.8 0.6 0.4 0.2 0 1 6 11 16 21 26 31 36 41 46 Number of Assets Figure 2 2.5 Estimating expected return and standard deviation The main question that hasn’t been answered so far and the constant problem with predictions for the future are how the expected returns and standard deviations should be estimated. It is difficult to predict the expected return on an asset for the future since there are many factors that could affect it and very often asset prices are considered to be stochastic and thus making it impossible to predict the expected return. If an investor is well informed and follows the market closely then that person might have a personal belief for what the future holds for different companies and those beliefs could be used as estimates. There are also companies that make opinion polls regarding expectations in the market and they usually asks investors about their beliefs of the future stock prices and later derives statistics from the polls. There are theoretical methods for calculating the markets expected return on assets and the most prominently is probably the Black-Litterman model. The easiest and often the easiest approach is to base the predictions of the future on the past; it is a unsophisticated approach since there is no guarantee that a company or an asset that has been producing a high return in the past will continue to produce a high return. The focus of this paper is to find a way to combine a traditional portfolio model with personal beliefs of an investor, the expected returns and variances will therefore just be used to illustrate the calculations. This means that the method used to estimate the expected returns and variances will not be of any important to the results in the paper and thus the simplest approach to estimate these variables will be used and that is to estimate them from the historical data. 8 3. Data The data that will be used in this paper are from the MSCI (Morgan Stanley Capital International). They are one of the world’s leading providers of support tools for investment decisions and they compute indices for different capital markets around the world which are frequently used as benchmarks or as tracks for fund managers. The time series for the indices are, as is standard with financial time series, converted into log-returns according to the formula ๐๐ก = ln ( ๐๐ก ), ๐๐ก−1 (9) where St is the index value for the day t and St-1 is the index value for day t-1. The indices used in this paper have been chosen because they are major economies in the world or because they were in another way deemed interesting. The five indices are for Germany, Japan, the Nordic countries, the United Kingdom and the United States and the observations are from all business days in the period between January 1, 2010 and December 31, 2013. The data will be split into two subsets, one set contains data for the years 2010 and 2011 and the second set is for the years 2012 and 2013. The reason to why the data is split into two samples is that the first period will serve as a simulation of the investor’s beliefs while the second period will be considered as the data for which the mean-variance analysis would have been done in January of 2014. 9 4. Bayesian Statistics The most common approach in traditional statistical theory is that there is a parameter θ that one wishes to estimate, give confidence intervals or do hypothesis tests on. This parameter is usually considered to be fixed but unknown, within the Bayesian field of statistics this parameter will instead be a stochastic variable with some distribution, known as the prior distribution. The prior distribution can be based on previous estimates of θ or could be a subjective belief about the likelihood of different values on θ. It is also possible for it to be uniformly distributed and thus show no or very limited information about the distribution of the parameter θ. The traditional way of formulating a statistical model for n observations (x1,….,xn) is to treat them as turn-outs of stochastic variables (X1,….,Xn) with a distribution that depends on the parameter θ. With a continuous distribution it would be expressed as ๐๐1 ,…,๐๐ (๐ฅ1 , … , ๐ฅ๐ ; θ) (๐ฅ1 , … , ๐ฅ๐ ) ∈ ๐ ๐ . The probabilities for getting the data we have got is usually calculated by the likelihood function ๐ฟ(๐ฅ1 , … , ๐ฅ๐ ; θ), in short the Maximum Likelihood method means that an estimate θฬ is the value that maximizes ๐ฟ(๐ฅ1 , … , ๐ฅ๐ ; θ). 4.1 Bayes’ Theorem The fundamental part of Bayesian statistics is Bayes’ theorem which can be expressed as ๐(๐ด|๐ต) = ๐(๐ต|๐ด)๐(๐ด) , ๐(๐ต) (10) where ๏ท P(A|B) is the conditional probability and is the beliefs in A when we take B into account; ๏ท P(A) is the prior probability, what are the beliefs for that A happens; ๏ท P(B|A)/P(B) is a quotient that represents the support B gives to A. Often in Bayesian inference the event B is fixed and the effects of varying A are what one is interested in. We can from Bayes theorem show that the posterior probabilities are proportional to the numerator, this leads to that the posterior is proportional to the prior times the likelihood and it can mathematically be expressed as ๐(๐ด|๐ต) ∝ ๐(๐ด) โ ๐(๐ต|๐ด) 10 or in words as ๐๐๐ ๐ก๐๐๐๐๐ ∝ ๐๐๐๐๐ โ ๐๐๐๐๐๐โ๐๐๐. 4.2 Informative and non-informative priors One aspect about Bayesian inference that makes it special is that you have to guess of assume the distribution of the parameters you are estimating and there is many ways in which this can be done. You could for example have some experience about the area you are investigating and thus have a fairly good belief of how these parameters vary; it would then be possible for you to choose a specific distribution to go with your beliefs. If you on the other hand don’t have much information about the distribution of the parameters you estimate then it would be possible to use a more diffuse distribution where you keep the options open about the distribution and base your assumption solely on the gathered observations. The distributions that you choose for the parameters are known as prior distributions or simply priors and this is because you choose them before you actually have any data to work with. It is usually possible to divide the prior distribution into two categories depending on the amount of information you include in your prior, the priors are either informative or non-informative. In an informative prior we express specific information about the parameters that we might know or have a strong belief in. The non-informative prior on the other hand are vaguer and expresses very limited information about our parameters. 4.3 Prior distributions Four priors will be considered in this paper and they are all from a paper by Bodnar et al. (2015) and a summary of them will be issued here. We will consider the linear transformations of the GMVP weights θ ๐ฝ = ๐ณ๐๐บ๐๐ , where L is an arbitrary p × k matrix of non-zero constants with p < k. 11 4.3.1 Priors for μ and Σ The first two priors focus on statistical models for the average returns μ and the covariance matrix Σ. The first prior is a standard diffuse prior, applied in portfolio theory by Barry(1974), Brown(1976) and Klein and Bawa(1976). This is a non-informative prior and its densities is given by ๐๐ (๐, ๐ฎ) ∝ |๐ฎ|− (11) ๐+1 2 . The second prior is a conjugate prior proposed by Frost and Savarino(1986), the conjugate is an informative prior with a normal prior for μ(conditional on Σ) and an inverse Wishart prior for Σ, the joint prior for these two is ๐๐ (๐, ๐ฎ) ∝ |๐ฎ|− ๐๐ +1 ๐ ๐ 2 exp {− 1 (๐ − ๐๐ )๐ ๐ฎ−1 (๐ − ๐๐ ) − ๐ก๐[๐บ๐ ๐ฎ−1 ]}, 2 2 (12) where ๏ท ๐๐ is the prior mean; ๏ท ๐ ๐ is a parameter representing the precision of ๐๐ ; ๏ท ๐๐ is a parameter representing the precision of Σ; ๏ท ๐บ๐ is a known prior matrix of Σ. 4.3.2 Priors for the weights The next two priors make statements directly about the portfolio weights. This can tend to make more sense from an investors perspective since it is natural for them to have preferences about the composition of their portfolios rather than about average returns or covariance matrices. The Jefferys non-informative prior which is given by ๐ ๐ ๐๐ (๐ฝ, ๐ณ, ฯ) ∝ ฯ2−1 |๐ณ|−2−1 . Both Ψ and ฯ represents linear transformations of the covariance matrix but they will not be considered in greater detail since they will not affect the later calculations. The last prior is an informative prior similar to the one developed by Tunaru (2002): 1 ๐ฝ~๐๐ (๐๐ผ , ๐ณ−1 ), ฯ 12 (13) ๐ณ~๐๐ (๐๐ผ , ๐บ๐ผ ), ฯ~๐บ๐๐๐๐(๐ฟ1 , 2๐ฟ2 ), where ๏ท ๐๐ผ is the prior mean; ๏ท ๐๐ผ is a parameter representing the precision of Ψ; ๏ท ๐บ๐ผ is a known prior matrix of Σ; ๏ท ๐ฟ1 ๐๐๐ ๐ฟ2 are prior constants. 4.4 Posterior distributions The posterior distributions are calculated from the priors by multiplying the prior distribution with the likelihood function for the data and subsequently integrate out unwanted parameters. The theoretical calculations are done by Bodnar et al. (2015) and the results from those calculations will be used in this paper. For all of the posteriors expressed below Xi is considered to be independent and identically distributed with Xi ~ Nk(μ,Σ).. 4.4.1 Models based on μ and Σ The posterior for θ under the diffuse prior is a multivariate t-distribution expressed as ฬ; ๐ฝ|๐ฟ1 … ๐ฟ๐ ~ ๐ก๐ (๐ − 1; ๐ฝ 1 ๐ณ๐น๐ ๐ณ๐ ), ๐ − 1 ๐๐ ๐บ−1 ๐ (14) where ฬ= ๐ฝ ๐ณ๐บ−1 ๐ , ๐๐ ๐บ−1 ๐ ๐น๐ = ๐บ−1 − ๐บ−1 ๐๐๐ ๐บ−1 . ๐๐ ๐บ−1 ๐ The t-distribution is expressed through three parameters, the first is the degrees of freedom, the second is the mean vector and the third is dispersion matrix which has got a similar function as the covariance matrix in the normal distribution. The posterior for θ under the conjugate prior also results in a multivariate t-distribution 13 ๐ฝ|๐ฟ1 … ๐ฟ๐ ~ ๐ก๐ (๐๐ + ๐ − ๐ − 1; ๐ณ๐ฝ−1 1 ๐ณ๐น๐ ๐ณ๐ ๐ ๐ ; ), ๐ −1 ๐๐ ๐ฝ−1 ๐ ๐ ๐๐ + ๐ − ๐ − 1 ๐ ๐ฝ๐ ๐ (15) where ๐๐ = ฬ + ๐ ๐ ๐๐ ๐๐ฟ , ๐ + ๐ ๐ ฬ ๐ฟ ฬ ๐ + ๐ ๐ ๐๐ ๐๐๐ , ๐ฝ๐ = (๐ − 1)๐บ + ๐บ๐ + (๐ + ๐ ๐ )๐๐ ๐๐๐ + ๐๐ฟ ๐น๐ = ๐ฝ−1 ๐ ๐ −1 ๐ฝ−1 ๐ ๐๐ ๐ฝ๐ − . ๐๐ ๐ฝ−1 ๐ ๐ 4.4.2 Models based on the weights The posterior that have been derived under the Jeffreys non-informative prior for the GMVP weights θ are ฬ; ๐|๐ฟ1 … ๐ฟ๐ ~ ๐ก๐ (๐ − ๐ + ๐; ๐ (16) 1 ๐ณ๐น๐ ๐ณ๐ ). ๐ − ๐ + ๐ ๐๐ ๐บ−1 ๐ The posterior is a multivariate t-distribution just as we have had on previous two posterior distributions; the only difference to the posterior under the diffuse prior is the degrees of freedom. The posterior for the informative prior is in contrast to the others not a t-distribution and it is expressed as (๐−๐+2๐+2๐ฟ1 ) 2 ๐ −1 −1 ๐๐ผ (๐|๐ฟ1 … ๐ฟ๐ ) ∝ [(๐ฝ − ๐๐ผ )๐ (๐บ−1 ๐ผ + (๐ − 1)(๐ณ๐น๐ ๐ณ ) ) (๐ฝ − ๐๐ผ )] (17) (๐ − ๐ + 2๐ + 2๐ฟ1 ) (๐ + 2๐ฟ1 − ๐๐ผ + 1) ∗๐( ; ; ๐(๐ฝ)), 2 2 where U(โ;โ;โ) is an confluent hypergeometric function expressed by Abramowitz and Stegun(1972) and 14 −1 ฬ )๐ (๐ณ๐น๐ ๐ณ๐ )−1 (๐ฝ − ๐ฝ ฬ ) + (๐๐ ๐บ−1 ๐)−1 ) + ๐ฟ2 ๐ − 1 ((๐ฝ − ๐ฝ ๐ − 1. ๐(๐ฝ) = ๐ )−1 )−1 (๐ฝ − ๐ ) 2 (๐ฝ − ๐๐ผ )๐ (๐บ−1 (๐ + − 1)(๐ณ๐น ๐ณ ๐ ๐ผ ๐ผ This expression is difficult to compute so a stochastic representation for ๐ฝ will be used and it can be expressed as 1 1 ๐ฝ = ๐๐ผ + ฯ−2 (๐ฝ๐ผ )2 ๐0 , (18) where ๐0 ~๐(๐๐ , ๐ฐ๐ ), ๐ − ๐ + 2๐ + 2๐ฟ1 2 ฯ ~ ๐บ๐๐๐๐ ( , ), 2 โ๐ผ ๐ ~ ๐บ๐๐๐๐ ( ๐ − ๐ + ๐ + ๐๐ผ − 1 , 2), 2 with ๐ −1 −1 ๐ท1 = (๐บ−1 ๐ผ + (๐ − 1)(๐ณ๐น๐ ๐ณ ) ) , ๐ท2 = (๐ − 1)(๐ณ๐น๐ ๐ณ๐ )−1 , ๐ = ๐ฟ2−1 + (๐ − 1)(๐๐ ๐บ−1 ๐)−1 , ๐ฝ๐ผ = (๐๐ท1 + ๐ท2 )−1 , ฬ ), ๐๐ผ = (๐๐ท1 + ๐ท2 )−1 (๐๐ท1 ๐๐ผ + ๐ท2 ๐ฝ ฬ ๐ ๐ท2 ๐ฝ ฬ − ๐๐ผ ๐ ๐ฝ๐ผ −1 ๐๐ผ . ๐๐ผ = ๐ + ๐๐๐๐ผ ๐ท1 ๐๐ผ + ๐ฝ For the posteriors there are some assumptions that have to be considered and they will be the same as in the paper by Bodnar et al. (2015) ๐ฟ1 = 1; ๐ฟ2 = 0,5; ๐๐ = ๐ ๐ = ๐๐ผ = ๐. L is set equal to the basis vector ei for each dimension which means that for the first dimension it equals ๐1 = [1 0 0 0 0]. This will allow for the distributions to be plotted individually as independent t-distribution. 15 5. Results 5.1 Global Minimum Variance Portfolios The first results are from the mean-variance analysis of the historic data, the data were split into two different samples. The first one consists of the period from 1st of January 2010 to 31st of December 2011 and the second is from 1st of January 2012 to 31st of December 2013, both periods contain 522 observations each. All calculations and simulations of the graphs in this part have been done in R. The average daily log-return for the first period is displayed in table 1. Table 1 Germany Japan Average log-return(×10-4) -1,253 Nordic UK -4,646 0,019 US 0,491 2,356 The most noticeable aspect of this result is that the results differ a lot for different indices; both Germany and Japan have had a negative return for the period while the US have been doing a lot better. All of the average daily returns are very close to zero and it is not uncommon in some financial calculations to assume that the average daily return are zero for short time periods. The covariance matrix for the period is summarized in table 2. Table 2 Germ. Japan Nordic UK US Germany 2,275 0,499 1,894 1,617 1,416 Japan 0,499 1,564 0,452 0,405 0,238 Nordic 1,894 0,452 1,945 1,479 1,305 UK 1,617 0,405 1,479 1,438 1,090 US 1,416 0,238 1,305 1,090 1,675 These two results are enough to calculate the GMVP based on the first periods results. This is done by finding the portfolio that minimizes the portfolio variance and the weights of that portfolio is presented in table 3. Table 3 Germ. Japan Nordic UK GMVP weights -0,394 0,427 -0,030 US 0,639 0,358 16 The portfolio has a large negative weight in the German index and just a slight negative weight in the Nordic index. The remaining three indices have solid positive weights which suggest positive investments in those markets. This result is a bit peculiar in that the result stipulates that a lot of the investments should be in the asset with the lowest average return for the period. This is due to the fact that in the definition of the GMVP the only criteria used are that the portfolio variance should be minimized and no consideration is taken to the expected return. The main reason to the large weight in the Japanese index is due to the fact that it has got a weaker correlation with the other markets and including it in the portfolio results in a larger diversification effect. The fact is that if the expected return for the portfolio with the weights above were to be calculated it would actually be a negative expected return which might not be what an investor is looking for. The same calculations are carried out for the second period and they gave following results: Table 4 Germ. Japan Average log-return(×10-4) 8,227 Nordic UK 11,312 6,072 US 3,622 7,434 The average returns are presented in table 4 and they are a lot higher for this period and the two indices with the worst return for the previous period now have the two highest returns. The weak results for the previous period can largely be explained by the euro crisis in 2011 when a number of countries within the Euro-zone were in severe financial problems and that caused concerns on the financial markets. Table 5 Germ. Japan Nordic UK US Germany 1,065 0,256 0,806 0,694 0,472 Japan 0,256 1,622 0,280 0,221 0,121 Nordic 0,806 0,280 0,819 0,600 0,393 UK 0,694 0,221 0,600 0,653 0,359 US 0,472 0,121 0,393 0,359 0,547 The covariance matrix is presented in table 5 and it can be noted that it is weaker covariances and variances in this matrix compared to the one for the previous period and this is also an effect of the more stable developments on the markets during the second period. It is a well-established fact within finance that markets tend to have higher correlation in times of trouble. 17 The calculations of the GMVP for this are presented in table 6. Table 6 Germ. Japan Nordic UK GMVP weights -0,310 0,167 0,136 US 0,442 0,565 It is still a negative weight for the German index which means that short selling it is the best option under these conditions. All of the other indices have positive weights and most of the investments should be placed in the UK and US indices. There are quite clear differences between the two periods which is interesting for the future calculations, the first period will now serve as the prior information and that is the beliefs of the investor while the second period will be the GMVP for the start of 2014 based on the last two years of data. 5.2 Distribution of the weights For the diffuse prior the distribution for the weights are multivariate t-distributed with 521 degrees of freedom, the means will be identical to the weights calculated for the GMVP. A dispersion matrix is also calculated and it indicates how the distribution varies around its mean. The means are presented in table 7 and the dispersions in table 8. Table 7 Germ. Means Table 8 Japan Nordic UK -0,310 0,167 0,136 US 0,442 0,565 Germ. Japan Nordic UK Dispersions (×10−3 ) 3,364 0,446 4,244 US 4,094 1,736 Figure 3 18 The distribution for the Japanese weight has the lowest dispersion and thus results in a thin and tall distribution; the distribution for the US weight is also slightly thinner and taller than the remaining three indices which are all quite similar in shape. This posterior distribution is based on a noninformative prior and does not take the investor’s beliefs into account; it is instead solely based on the GMVP results. The second prior introduced was the conjugate prior which is an informative prior; the posterior distribution is a multivariate t-distribution just as in the case of the diffuse prior. The results are a bit different since this posterior doesn’t use the GMVP weights as the means of the posterior distributions, the degrees of freedom and the dispersions are also different compared to the diffuse prior. The means for the posterior distributions are presented in table 9 and the dispersions in table 10. Table 9 Germ. Means Table 10 Japan Nordic UK -0,320 0,165 0,137 US 0,461 0,557 Germ. Japan Nordic UK Dispersions (×10−3 ) 1,707 0,227 2,166 US 2,069 0,882 Figure 4 19 In a comparision between the diffuse and conjugate the means differs slightly for the distributions but the main difference is that the conjugate prior produces distributions with a lower dispersion for all five distributions, the conjugates dispersions are about half of the ones for the diffuse prior. The similaritites in the results mean that the conjugate have failed to incorporate the prior information in a satisfying way since the beliefs of the investor is not visible in the results from the posterior distribution. The first two distribution were derived from prior assumptions about the distribution of the mean and covariance matrix, the following two distributions are derived directly from prior assumptions about the GMVP weights. This distribution was derived from the Jeffreys non-informative prior and as the previous two it is also expressed through a multivariate t-distribution. The result for this distribution is very similar to the one for the diffuse prior, the only difference is the number of degrees of freedom. The means and dispersions are presented in table 11 and 12 Table 11 Germ. Means Table 12 Japan Nordic UK -0,310 0,167 0,136 US 0,442 0,565 Germ. Japan Nordic UK Dispersions (×10−3 ) 3,338 0,448 4,269 US 4,118 1,746 Figure 5 20 This result is almost identical with the diffuse priors in that the distribution for the Japanese index is tall and thin, the distribution for the US index is a bit taller than the remaining three are all similar to each other. This is not surprising since they are both based on non-informative priors and are thus only based on the distribution for the actual observations. The last posterior distribution is for the informative prior with respect to the GMVP weights, the characteristics of the distribution is based on 25.000 simulations from the stochastic representation in equation 17 and the results from the simulations are presented in table 13 and 14. Table 13 Germ. Means Table 14 Japan Nordic UK -0,380 0,247 -0,005 US 0,599 0,431 Germ. Japan Nordic UK Variances (×10−2 ) 5,433 2,940 6,421 US 8,089 5,775 Figure 6 It is clearly noticeable that the distributions here are significantly wider and not as spiky, the prior information has been taken into account and the center for the distributions are located approximately halfway between the investor’s beliefs and the weights for the GMVP. Table 15 shows the comparison between the investor’s beliefs, the weights from the GMVP calculations and the means of the posterior distribution with the informative prior. 21 Table 15 Investor’s beliefs GMVP weights Posterior weights Scaled weights Germany - 0,394 - 0,310 - 0,380 -0,426 Japan 0,427 0,167 0,247 0,277 Nordic - 0,030 0,136 - 0,005 -0,006 UK 0,639 0,442 0,599 0,672 US 0,358 0,565 0,431 0,483 Sum 1 1 0,892 1 It seems as the results are in general closer to the investor’s beliefs but there is a problem in that the weights from the simulation of the posterior distribution do not sum up to one as is necessary to have a full portfolio. The weights would have to be adjusted in a way so that the sum will be one and the easiest way of doing so is to divide all the posterior weights with 0,892, this will result in the scaled weights presented in the table above. It is possible to consider different methods for the scaling of the weights since the method used here give that the weight in the German index is actually smaller than both the investor’s belief and the GMVP weight, the opposite is true for the UK index were the scaled weight is larger than both the investor’s belief and the GMVP weight. 22 6. Summary The time series for the indices were divided into two periods and a mean-variance analysis was conducted for both periods. The result for the first period served as a representation for what could have been an investor’s beliefs for the future while the second period served as the historic data for which the weights of a portfolio based on traditional portfolio theory is derived from. Four different prior distributions were introduced; two priors were based on the μ and Σ with one informative and one non-informative prior. The posterior distributions for these priors were calculated and they turned out to be very similar which wasn’t in line with the purpose since the informative prior failed to incorporate the information that represented the investor’s beliefs. The remaining two priors were instead based directly on the GMVP weights and just as with the previous two there was one informative and one non-informative prior. The non-informative prior posed a result very similar to the first two priors while the informative prior instead used the prior information to create distributions which considered both the investors beliefs and the result from the mean-variance estimations. There is a flaw with the use of the informative prior and that is that the weights of the simulated distributions don’t sum up to one, it is easy to scale the weights so that they do sum up to one but this causes different problems. A better method for the scaling can probably be used and it could be and interesting continuation of this paper. We can make the conclusion that the informative prior was the only prior that created a result that was consistent with the goal of the paper and could be a possible alternative in creating more personalized portfolios. 23 References Abramowitz M. and Stegun I. A. (1972). “Handbook of Mathematical Functions With Formulas, Graphs and Mathematical Tables”. New York, Dover. Barry C.B. (1974). “Portfolio Analysis under uncertain means, variances, and covariances”. Journal of Finance 29, 515-522 Bodnar T., Mazur S. and Okhrin Y. (2015). “ Bayesian Estimation of the Global Minimum Variance Portfolio”, Working Paper. Brown S. J. (1976). Optimal Portfolio Choice under Uncertainty: A Bayesian Approach. Ph.D. Diss., University of Chicago Frost P.A. and Savarino J.E. (1986). “An empirical Bayes approach to efficient portfolio selection”. Journal of Financial and Quantitative Analysis 21, 293-305. Klein R.W. and Bawa V.S. (1976). “The effect of estimation risk on optimal portfolio choice”. Journal of Financial Economics 3, 215-231 Markowitz H.M. (1952). “Mean-variance analysis in portfolio choice and capital markets”. Journal of Finance 7, 77-91. MSCI. ”MSCI Index Performance”. [2014-10-15]. <http://www.msci.com/products/indexes/country_and_regional/dm/performance.html> Tunaru R. (2002). “Hierarchial Bayesian models for multiple count data”. Austrian Journal of Statistics 31, 221-229 24 Appendix Matrices The majority of formulas used in the paper are expressed in the form of matrices and they generally simplify the calculations when we use computers to perform our calculations. A few basic definitions and applications of matrix calculations will be introduced below and matrices will throughout the paper be defined with a bold letters. A matrix is a rectangular array consisting of a number of rows and columns, a matrix with three rows and three columns would look like this: ๐ ๐ ๐ด=[ ๐ ๐ ๐ โ ๐ ๐] ๐ A matrix with the same number of rows as columns is known as a square matrix. A vector is a matrix that only consists of one column (or row) and thus could look like ๐ ๐ฝ = [๐ ] ๐ A transpose of a matrix is when it is tipped over so that the first row becomes the first column, the transpose of the matrix M is popularly noted with a superscript T as MT and it would be ๐ ๐ด๐ป = [ ๐ ๐ ๐ ๐ ๐ ๐ โ] ๐ ๐ฝ๐ป = [ ๐ ๐ ๐] The transpose of the vector V is Two matrices of the same size can be added together by simply adding the corresponding elements and the result will be a matrix of the same size as the first two. ๐จ+๐ฉ = [ ๐ ๐ ๐ ๐ ]+[ ๐ ๐ 25 ๐ ๐+๐ ]=[ โ ๐+๐ ๐+๐ ] ๐+โ Multiplication of matrices is only possible if the number of columns in the left matrix equals the number of rows in the right matrix. The values in the matrix product are given by the formula ๐ [๐จ๐ฉ]๐,๐ = ∑ ๐ด๐,๐ ๐ต๐,๐ ๐=1 where A have got n columns and B have n rows, the i and j represents the row and column of the product matrix. Two 2x2-matrices would give calculations like these: [ ๐ ๐ ๐ ๐ ][ ๐ ๐ ๐ ๐๐ + ๐๐ ]=[ โ ๐๐ + ๐๐ ๐๐ + ๐โ ] ๐๐ + ๐โ It is worth noticing that matrix multiplication is not commutative which means that AB ≠ BA which is demonstrated by the result below [ ๐ ๐ ๐ ๐ ][ โ ๐ ๐๐ + ๐๐ ๐ ]=[ ๐๐ + โ๐ ๐ ๐๐ + ๐๐ ] ๐๐ + โ๐ The identity matrix is a square matrix that consists of ones on the main diagonal (the upper left to the bottom right) and zeros everywhere else. It is denoted as ๐ฐ๐ . 1 ๐ฐ3 = [0 0 0 0 1 0] 0 1 The inverse of a matrix is a corresponding matrix that if it is multiplied with the original gives the identity matrix. The inverse of A is denoted with the superscript -1 as A-1. ๐จ๐จ−1 = ๐ฐ๐ The determinant of a matrix can be defined through the Leibniz formula ๐ det(๐จ) = |๐จ| = ∑ ๐ ๐๐(๐ฅ) ∏ ๐๐,๐ฅ๐ ๐ฅ∈๐๐ ๐=1 Where the sum is computed over all permutations of the set {1,2,…,n} and the set of all such permutations is Sn, the x:s are the permutations within Sn. Sgn(x) is a function that either takes the value +1 or -1, it is +1 whenever the reordering given by x can be achieved by successively interchanging two entries an even number of times, and −1 whenever it can be achieved by an odd number of such interchanges 26 The trace of a matrix is the sum of the elements on the main diagonal. ๐ ๐ก๐(๐จ) = ๐ก๐ [ ๐ ๐ ๐ ๐ โ 27 ๐ ๐]=๐+๐+๐ ๐