Portfolio Selection with a Bayesian Approach
Bachelor’s thesis, 15 ECTS
Author: Martin de Caprétz
Supervisor: Stepan Mazur
Spring 2015
Abstract
This paper deals with mean-variance portfolio theory, a traditional method for constructing portfolios
of financial assets, and in particular with a special case of the theory known as the global minimum
variance portfolio. A disadvantage of many models in portfolio theory is that they often consider only
a few quantified variables in their calculations. This can cause problems in areas such as finance,
where much of the information regarding investments is difficult to quantify.
In the paper, Bayesian statistical methods are used to incorporate the beliefs of an investor. By
testing and evaluating different approaches, the paper examines whether it is possible to find a method
that takes both calculated variables and an investor’s beliefs into account.
Table of Contents
1. Introduction
2. Modern Portfolio Theory
2.1 Mean-Variance Portfolio Theory
2.2 The portfolio’s characteristics
2.3 Global Minimum Variance Portfolio (GMVP)
2.4 Diversification
2.5 Estimating expected return and standard deviation
3. Data
4. Bayesian Statistics
4.1 Bayes’ Theorem
4.2 Informative and non-informative priors
4.3 Prior distributions
4.4 Posterior distributions
5. Results
5.1 Global Minimum Variance Portfolios
5.2 Distribution of the weights
6. Summary
References
Appendix
Matrices
1. Introduction
The financial markets have grown extremely fast over the last couple of decades: the liquidity of the
markets has increased drastically and the number of financial instruments has virtually exploded. This
has led to a demand for models that try to determine the optimal financial decision in any given
situation. One of the earliest and still most popular methods for constructing a portfolio is
mean-variance theory; it is a fairly simple way of finding the optimal weights for the assets under
relatively simple assumptions.
Most theoretical models have one thing in common: they are strict and impersonal, in that they do not
take into consideration all the factors that a human investor would. These could be personal beliefs
about the future of a market, specific conditions that might occur, or rumors about future events that
cannot be quantified. The purpose of this paper is to see whether a model can be created in which both
the theoretical model and the subjective beliefs of an investor are taken into consideration when
constructing the portfolio.
This paper uses a special case of mean-variance portfolio theory known as the global minimum variance
portfolio, which is the combination of the available assets with the lowest possible risk.
Five indices from Morgan Stanley Capital International (MSCI) are used as assets. The indices
themselves are of no importance other than to demonstrate the results of the calculations.
In the second part we use a Bayesian approach to incorporate the subjective beliefs into the models.
Four different approaches are examined and analyzed to see whether they manage to take the investor’s
beliefs into account and create a portfolio that utilizes both theoretical data and personal beliefs.
All formulas in the paper are expressed in matrix form, and a short explanation of the matrix
operations used is given in the appendix; matrices are denoted by bold letters throughout the paper.
2. Modern Portfolio Theory
The foundation of modern portfolio theory is considered to have been laid by an article by Harry
Markowitz (1952). The idea that made the article catch on was that Markowitz introduced variance as a
measure of risk and argued that it should complement the expected return as a main criterion for
portfolio construction. The variance of a portfolio depends on the covariances between the assets, and
that is what made the method plausible. Markowitz received the Nobel Memorial Prize in Economic
Sciences in 1990 for his contributions to portfolio theory.
The article led to the development of mean-variance portfolio theory. It is a method where the weights
of the assets in the portfolio are calculated from only two factors, the expected return and the
variance of each asset. Since the standard deviation is simply the square root of the variance, it is
often used as a risk measure as well.
2.1 Mean-Variance Portfolio Theory
In its simplest form the mean-variance method is easy to compute, and its purpose is to support more
efficient investment decisions. The fundamental task is to find an efficient trade-off between the
expected return on the capital and the risk that has to be taken to obtain it.
Figure 1: Assets A, B, C and D plotted by expected return (vertical axis) against standard deviation
(horizontal axis).
The basic idea of the theory can be shown in a simple figure with four different assets represented by
their expected return and standard deviation. The ideal is a high expected return with a low standard
deviation, which corresponds to the top left corner of the figure. Of the four assets, A would be
considered the best: it has the same expected return as B but a lower standard deviation, and the same
standard deviation as C but a higher expected return. By the same argument, D is the worst of the
four. It is, however, not possible to draw any conclusion about whether B or C is to be preferred;
that decision depends on the investor’s personal preferences when it comes to risk. Asset B is the
riskier asset, but this is compensated by a higher expected return, so an investor who is not risk
averse would consider B to be the better choice.
2.2 The portfolio’s characteristics
The characteristics of a portfolio with n assets are calculated with two common statistical formulas.
The expected return of the portfolio is the sum, over all assets, of the weight invested in each asset
multiplied by that asset’s expected return:
$$ r_p = \sum_{i=1}^{n} w_i r_i, \tag{1} $$
where
$$ \sum_{i=1}^{n} w_i = 1. $$
The variable $r_i$ is the expected return of asset i and $w_i$ is the fraction of the portfolio
invested in it; the weights have to add up to one to represent a full portfolio. If short sales are
allowed, a weight $w_i$ can be negative or larger than one. A short sale is a negative position in an
asset: the investor borrows the asset and sells it, and makes a profit if the asset’s price falls,
since the asset eventually has to be bought back and returned to the lender.
The variance of the portfolio has two parts. The first is the squared weight of each asset multiplied
by the variance of that asset, summed over all assets. The second accounts for the co-movement between
the assets: for every pair of distinct assets, the product of the two weights is multiplied by their
covariance $\sigma_{ij}$. With $\sigma_{ii} = \sigma_i^2$ denoting the variance of asset i, both parts
are captured by the equation
$$ \sigma_p^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j \sigma_{ij}. \tag{2} $$
Equations 1 and 2 can also be expressed in matrix form as
$$ r_p = \mathbf{w}^T \mathbf{R}, \tag{3} $$
$$ \sigma_p^2 = \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}, \tag{4} $$
where
• w is a vector of the weights;
• R is a vector of the expected returns;
• Σ is the covariance matrix.
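As an illustration of equations 3 and 4, the following minimal Python sketch evaluates the expected return and variance of a two-asset portfolio. The function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the data used in this paper.

```python
def portfolio_return(weights, expected_returns):
    """Equation 3: r_p = w^T R."""
    return sum(w * r for w, r in zip(weights, expected_returns))


def portfolio_variance(weights, cov):
    """Equation 4: sigma_p^2 = w^T Sigma w, written out as a double sum."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))


# Hypothetical two-asset example (illustrative numbers only).
w = [0.6, 0.4]                      # weights, summing to one
R = [0.05, 0.10]                    # expected returns
Sigma = [[0.04, 0.01],
         [0.01, 0.09]]              # covariance matrix

r_p = portfolio_return(w, R)
var_p = portfolio_variance(w, Sigma)
print(r_p, var_p)
```

The off-diagonal covariance enters the double sum twice, once as (i, j) and once as (j, i), which is why the cross term of a two-asset portfolio is usually written with a factor of two.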
2.3 Global Minimum Variance Portfolio (GMVP)
The GMVP is the portfolio of a given set of assets for which the variance is minimized:
$$ \mathrm{GMVP} = \min_{\mathbf{w}} \left(\mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}\right) \quad \text{such that} \quad \mathbf{w}^T \mathbf{1} = 1, \tag{5} $$
where 1 is a vector of ones. The GMVP is easy to compute numerically with modern software, and if
short sales are allowed it can also be found through the closed-form expression
$$ \mathbf{w}_{GMV} = \frac{\boldsymbol{\Sigma}^{-1} \mathbf{1}}{\mathbf{1}^T \boldsymbol{\Sigma}^{-1} \mathbf{1}}. \tag{6} $$
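The closed-form expression in equation 6 can be obtained with a Lagrange multiplier. The short derivation below is a standard argument added for completeness; the multiplier λ is notation introduced only for this sketch, and Σ is assumed to be positive definite so that its inverse exists.

```latex
\begin{align*}
\mathcal{L}(\mathbf{w}, \lambda)
  &= \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} - \lambda\,(\mathbf{w}^T \mathbf{1} - 1), \\
\frac{\partial \mathcal{L}}{\partial \mathbf{w}}
  = 2\boldsymbol{\Sigma}\mathbf{w} - \lambda \mathbf{1} = \mathbf{0}
  &\;\Rightarrow\; \mathbf{w} = \tfrac{\lambda}{2}\, \boldsymbol{\Sigma}^{-1} \mathbf{1}, \\
\mathbf{1}^T \mathbf{w} = 1
  &\;\Rightarrow\; \tfrac{\lambda}{2} = \frac{1}{\mathbf{1}^T \boldsymbol{\Sigma}^{-1} \mathbf{1}}, \\
\mathbf{w}_{GMV}
  &= \frac{\boldsymbol{\Sigma}^{-1} \mathbf{1}}{\mathbf{1}^T \boldsymbol{\Sigma}^{-1} \mathbf{1}}.
\end{align*}
```

Nothing in this derivation restricts the sign of the individual weights, so the solution may contain negative weights, which is why the expression applies when short sales are allowed.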
The true covariance matrix Σ is in reality never known, so an estimate S is used instead. It is
calculated by the equation
$$ \mathbf{S} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})^T \tag{7} $$
with
$$ \bar{\mathbf{X}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_i, $$
where the $\mathbf{X}_i$ are independent observations of the vector of daily returns of the assets and
n here denotes the number of observations.
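The estimator in equation 7 can be sketched in a few lines of plain Python. The helper names and the three two-dimensional observations are hypothetical; in practice a statistics package would be used, and this sketch only spells out the arithmetic.

```python
def sample_mean(obs):
    """Component-wise mean of a list of observation vectors."""
    n = len(obs)
    k = len(obs[0])
    return [sum(x[j] for x in obs) / n for j in range(k)]


def sample_covariance(obs):
    """Equation 7: S = 1/(n-1) * sum (X_i - mean)(X_i - mean)^T."""
    n = len(obs)
    k = len(obs[0])
    mean = sample_mean(obs)
    S = [[0.0] * k for _ in range(k)]
    for x in obs:
        d = [x[j] - mean[j] for j in range(k)]   # deviation from the mean
        for a in range(k):
            for b in range(k):
                S[a][b] += d[a] * d[b]           # outer product, accumulated
    return [[S[a][b] / (n - 1) for b in range(k)] for a in range(k)]


# Hypothetical observations of two assets (illustrative numbers only).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]]
S = sample_covariance(X)
print(sample_mean(X), S)
```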
2.4 Diversification
Diversification is the reason why it is possible to create a GMVP, and the idea behind it is captured
by the saying “don’t put all your eggs in the same basket”.
There are different types of risk associated with investments in different markets and in specific
assets. Some risks originate from the general economy, such as interest rates, exchange rates,
inflation and business cycles. These are all macroeconomic factors; none of them can be predicted
with certainty, but they affect all companies and commodities in some way. Besides these risks for the
broader economy there are risks associated with specific assets: a mining company might be exposed
to the price of minerals, while a farmer might consider the weather the biggest risk for the business.
Holding just one asset in a portfolio leaves the investor very exposed to the risks associated with
that specific asset. If the portfolio instead includes two assets from companies with very different
businesses, the overall risk of the portfolio can be reduced. There is of course no reason to stop at
two assets; more assets can be added to the portfolio, spreading the asset-specific risks further.
There is, however, no way to eliminate risk altogether: even with a very large number of assets there
would still be exposure to the risk of the general market. The portfolio cannot be diversified in such
a way that it becomes neutral to all the risks associated with the general economy. If there were such
a portfolio, its expected return would, by arbitrage theory, be equal to the risk-free rate in the
economy.
When common sources of risk affect all assets in the portfolio, risk cannot be eliminated even with
extensive diversification. The risk that remains after diversification is called market risk or
systematic risk. The risk that can be diversified away is called asset-specific risk or nonsystematic
risk.
This can be expressed mathematically. If all covariances $\sigma_{ij}$ are assumed to be positive and
an equal share $\frac{1}{n}$ is invested in each of the n assets, then
$$ \sigma_p^2 = \sum_{i=1}^{n} \left(\frac{1}{n}\right)^2 \sigma_i^2 + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{1}{n} \cdot \frac{1}{n}\, \sigma_{ij} = \frac{1}{n} \bar{\sigma}^2 + \frac{n-1}{n} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{\sigma_{ij}}{n(n-1)} = \frac{1}{n} \bar{\sigma}^2 + \frac{n-1}{n} \bar{\sigma}_{ij}, \tag{8} $$
where $\bar{\sigma}^2$ is the mean of the variances and $\bar{\sigma}_{ij}$ is the mean of the
covariances between the assets.
As n → ∞ it holds that $\sigma_p^2 \to \bar{\sigma}_{ij}$.
The variance of the portfolio would equal the average of the covariances and this can be illustrated
through figure 2.
Figure 2: Diversification. Portfolio variance (vertical axis, 0 to 1) against the number of assets
(horizontal axis, 1 to 46).
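The behaviour described by equation 8 can be reproduced with a short Python sketch. The average variance of 1.0 and the average covariance of 0.2 are hypothetical inputs chosen for illustration; they are not the values behind figure 2.

```python
def equal_weight_variance(n, mean_variance, mean_covariance):
    """Equation 8 for an equally weighted portfolio of n assets."""
    return mean_variance / n + (n - 1) / n * mean_covariance


# Hypothetical inputs: average variance 1.0, average covariance 0.2.
for n in (1, 6, 11, 46, 1000):
    print(n, round(equal_weight_variance(n, 1.0, 0.2), 4))
```

The first term shrinks like 1/n while the second approaches the average covariance, so under these assumptions the portfolio variance levels off at 0.2 no matter how many assets are added.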
2.5 Estimating expected return and standard deviation
The main question that has not been answered so far, and a constant problem with predictions about the
future, is how the expected returns and standard deviations should be estimated.
It is difficult to predict the future expected return on an asset, since many factors can affect it;
asset prices are very often considered to be stochastic, which makes it impossible to predict the
expected return. An investor who is well informed and follows the market closely might have a personal
belief about what the future holds for different companies, and those beliefs could be used as
estimates. There are also companies that conduct opinion polls on market expectations; they usually
ask investors about their beliefs regarding future stock prices and derive statistics from the
answers.
There are also theoretical methods for calculating the market’s expected return on assets, of which
the most prominent is probably the Black-Litterman model.
The easiest approach is to base the predictions of the future on the past. It is an unsophisticated
approach, since there is no guarantee that a company or an asset that has produced a high return in
the past will continue to do so. The focus of this paper is to find a way to combine a traditional
portfolio model with the personal beliefs of an investor, so the expected returns and variances are
only used to illustrate the calculations. The method used to estimate them is therefore of no
importance to the results of the paper, and the simplest approach is used: they are estimated from
historical data.
3. Data
The data used in this paper come from MSCI (Morgan Stanley Capital International). MSCI is one of the
world’s leading providers of support tools for investment decisions and computes indices for capital
markets around the world, which are frequently used as benchmarks or tracked by fund managers.
The time series for the indices are, as is standard for financial time series, converted into
log-returns according to the formula
$$ r_t = \ln\left(\frac{S_t}{S_{t-1}}\right), \tag{9} $$
where $S_t$ is the index value on day t and $S_{t-1}$ is the index value on day t−1.
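Equation 9 translates directly into code. The sketch below is in Python and uses a short, made-up series of index values; the function name is hypothetical.

```python
import math


def log_returns(index_values):
    """Equation 9: r_t = ln(S_t / S_{t-1}) for consecutive index values."""
    return [math.log(s_t / s_prev)
            for s_prev, s_t in zip(index_values, index_values[1:])]


# Made-up index values for four consecutive business days.
prices = [100.0, 101.0, 99.0, 99.0]
returns = log_returns(prices)
print(returns)
```

A practical property of log-returns is that they add up over time: the sum of the daily log-returns equals the log-return over the whole period.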
The indices used in this paper have been chosen because they represent major economies in the world or
because they were otherwise deemed interesting. The five indices cover Germany, Japan, the Nordic
countries, the United Kingdom and the United States, and the observations are from all business days
between January 1, 2010 and December 31, 2013.
The data are split into two subsets: one contains the data for the years 2010 and 2011, and the other
the data for 2012 and 2013. The reason for the split is that the first period serves as a simulation
of the investor’s beliefs, while the second period is treated as the data on which the mean-variance
analysis would have been based in January 2014.
4. Bayesian Statistics
In traditional statistical theory there is a parameter θ that one wishes to estimate, construct
confidence intervals for or test hypotheses about. This parameter is usually considered to be fixed
but unknown. In Bayesian statistics the parameter is instead a stochastic variable with some
distribution, known as the prior distribution. The prior distribution can be based on previous
estimates of θ or on a subjective belief about the likelihood of different values of θ. It can also be
uniformly distributed and thus carry no or very limited information about θ.
The traditional way of formulating a statistical model for n observations (x1,…,xn) is to treat them
as realizations of stochastic variables (X1,…,Xn) with a distribution that depends on the parameter θ.
For a continuous distribution the joint density is written
$$ f_{X_1,\dots,X_n}(x_1,\dots,x_n;\theta), \qquad (x_1,\dots,x_n) \in \mathbb{R}^n. $$
Viewed as a function of θ for the observed data, this density is the likelihood function
$L(x_1,\dots,x_n;\theta)$, which measures how well each value of θ explains the data. In short, the
maximum likelihood method takes as its estimate $\hat{\theta}$ the value of θ that maximizes
$L(x_1,\dots,x_n;\theta)$.
4.1 Bayes’ Theorem
The fundamental part of Bayesian statistics is Bayes’ theorem, which can be expressed as
$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)}, \tag{10} $$
where
• P(A|B) is the posterior probability, the belief in A once B has been taken into account;
• P(A) is the prior probability, the belief in A before B is taken into account;
• P(B|A)/P(B) is a quotient that represents the support B gives to A.
In Bayesian inference the event B is often fixed and one is interested in the effect of varying A.
Since P(B) is then a constant, Bayes’ theorem shows that the posterior probability is proportional to
the numerator, that is, to the prior times the likelihood:
$$ P(A|B) \propto P(A) \cdot P(B|A), $$
or in words
$$ \text{posterior} \propto \text{prior} \cdot \text{likelihood}. $$
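The proportionality can be made concrete with a small discrete example in Python. The two hypotheses and all probabilities below are hypothetical; the point is only that multiplying prior by likelihood and then normalizing reproduces equation 10.

```python
def posterior(prior, likelihood):
    """Bayes' theorem for a finite set of hypotheses.

    prior[h] is P(h) and likelihood[h] is P(B | h) for the observed event B.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())        # this is P(B)
    return {h: v / evidence for h, v in unnormalized.items()}


# Hypothetical numbers: prior belief in A is 0.3, and B is four times
# as likely under A as under its complement.
prior = {"A": 0.3, "not A": 0.7}
likelihood = {"A": 0.8, "not A": 0.2}
post = posterior(prior, likelihood)
print(post)
```

In this example the data favour A strongly enough to raise its probability from the prior value 0.3 to 0.24/0.38, or roughly 0.63.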
4.2 Informative and non-informative priors
One aspect that makes Bayesian inference special is that one has to guess or assume the distribution
of the parameters being estimated, and there are many ways in which this can be done. With some
experience of the area under investigation one may have a fairly good idea of how the parameters vary,
and can then choose a specific distribution that matches those beliefs. With little information about
the distribution of the parameters, one can instead use a more diffuse distribution that keeps the
options open and lets the result depend almost entirely on the gathered observations. The
distributions chosen for the parameters are known as prior distributions, or simply priors, because
they are chosen before any data are taken into account.
Prior distributions can usually be divided into two categories depending on the amount of information
they contain: they are either informative or non-informative.
An informative prior expresses specific information about the parameters that is known or strongly
believed. A non-informative prior, on the other hand, is vaguer and expresses very limited information
about the parameters.
4.3 Prior distributions
Four priors are considered in this paper. They are all taken from Bodnar et al. (2015) and are
summarized here.
We consider linear transformations θ of the GMVP weights,
$$ \boldsymbol{\theta} = \mathbf{L} \mathbf{w}_{GMV}, $$
where L is an arbitrary p × k matrix of constants with p < k, and k is the number of assets.
4.3.1 Priors for μ and Σ
The first two priors are placed on the parameters of the statistical model, the mean vector μ of the
returns and the covariance matrix Σ.
The first prior is a standard diffuse prior, applied in portfolio theory by Barry (1974), Brown (1976)
and Klein and Bawa (1976). It is a non-informative prior and its density is given by
$$ p_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \propto |\boldsymbol{\Sigma}|^{-\frac{k+1}{2}}. \tag{11} $$
The second prior is a conjugate prior proposed by Frost and Savarino (1986). It is an informative
prior with a normal prior for μ (conditional on Σ) and an inverse Wishart prior for Σ. The joint prior
for the two is
$$ p_c(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \propto |\boldsymbol{\Sigma}|^{-\frac{\nu_c+1}{2}} \exp\left\{ -\frac{\kappa_c}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}_c)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}_c) - \frac{1}{2} \mathrm{tr}\left[\mathbf{S}_c \boldsymbol{\Sigma}^{-1}\right] \right\}, \tag{12} $$
where
• $\boldsymbol{\mu}_c$ is the prior mean;
• $\kappa_c$ is a parameter representing the precision of $\boldsymbol{\mu}_c$;
• $\nu_c$ is a parameter representing the precision of Σ;
• $\mathbf{S}_c$ is a known prior matrix of Σ.
4.3.2 Priors for the weights
The next two priors make statements directly about the portfolio weights. This tends to make more
sense from an investor’s perspective, since it is more natural to have preferences about the
composition of a portfolio than about average returns or covariance matrices.
The first of them is the Jeffreys non-informative prior, which is given by
$$ p_n(\boldsymbol{\theta}, \boldsymbol{\Psi}, \zeta) \propto \zeta^{\frac{p}{2}-1} |\boldsymbol{\Psi}|^{-\frac{p}{2}-1}. $$
Both Ψ and ζ are transformations of the covariance matrix, but they are not considered in greater
detail since they do not affect the later calculations.
The last prior is an informative prior similar to the one developed by Tunaru (2002):
$$ \begin{aligned} \boldsymbol{\theta} &\sim N_p\left(\mathbf{w}_I, \tfrac{1}{\zeta} \boldsymbol{\Psi}^{-1}\right), \\ \boldsymbol{\Psi} &\sim W_p(\nu_I, \mathbf{S}_I), \\ \zeta &\sim Gamma(\delta_1, 2\delta_2), \end{aligned} \tag{13} $$
where
• $\mathbf{w}_I$ is the prior mean;
• $\nu_I$ is a parameter representing the precision of Ψ;
• $\mathbf{S}_I$ is a known prior matrix of Σ;
• $\delta_1$ and $\delta_2$ are prior constants.
4.4 Posterior distributions
The posterior distributions are obtained by multiplying the prior distribution with the likelihood
function of the data and then integrating out the parameters that are not of interest. The theoretical
derivations are given by Bodnar et al. (2015), and their results are used in this paper.
For all posteriors below, the observations $\mathbf{X}_i$ are assumed to be independent and
identically distributed with $\mathbf{X}_i \sim N_k(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
4.4.1 Models based on μ and Σ
The posterior for θ under the diffuse prior is a multivariate t-distribution,
$$ \boldsymbol{\theta} \,|\, \mathbf{X}_1, \dots, \mathbf{X}_n \sim t_p\left(n-1;\ \hat{\boldsymbol{\theta}};\ \frac{1}{n-1} \frac{\mathbf{L} \mathbf{R}_d \mathbf{L}^T}{\mathbf{1}^T \mathbf{S}^{-1} \mathbf{1}}\right), \tag{14} $$
where
$$ \hat{\boldsymbol{\theta}} = \frac{\mathbf{L} \mathbf{S}^{-1} \mathbf{1}}{\mathbf{1}^T \mathbf{S}^{-1} \mathbf{1}}, \qquad \mathbf{R}_d = \mathbf{S}^{-1} - \frac{\mathbf{S}^{-1} \mathbf{1} \mathbf{1}^T \mathbf{S}^{-1}}{\mathbf{1}^T \mathbf{S}^{-1} \mathbf{1}}. $$
The t-distribution is described by three parameters: the first is the degrees of freedom, the second
is the mean vector and the third is the dispersion matrix, which plays a role similar to that of the
covariance matrix in the normal distribution.
The posterior for θ under the conjugate prior is also a multivariate t-distribution,
$$ \boldsymbol{\theta} \,|\, \mathbf{X}_1, \dots, \mathbf{X}_n \sim t_p\left(\nu_c + n - k - 1;\ \frac{\mathbf{L} \mathbf{V}_c^{-1} \mathbf{1}}{\mathbf{1}^T \mathbf{V}_c^{-1} \mathbf{1}};\ \frac{1}{\nu_c + n - k - 1} \frac{\mathbf{L} \mathbf{R}_c \mathbf{L}^T}{\mathbf{1}^T \mathbf{V}_c^{-1} \mathbf{1}}\right), \tag{15} $$
where
$$ \mathbf{r}_c = \frac{n \bar{\mathbf{X}} + \kappa_c \boldsymbol{\mu}_c}{n + \kappa_c}, $$
$$ \mathbf{V}_c = (n-1)\mathbf{S} + \mathbf{S}_c + (n + \kappa_c)\, \mathbf{r}_c \mathbf{r}_c^T + n \bar{\mathbf{X}} \bar{\mathbf{X}}^T + \kappa_c \boldsymbol{\mu}_c \boldsymbol{\mu}_c^T, $$
$$ \mathbf{R}_c = \mathbf{V}_c^{-1} - \frac{\mathbf{V}_c^{-1} \mathbf{1} \mathbf{1}^T \mathbf{V}_c^{-1}}{\mathbf{1}^T \mathbf{V}_c^{-1} \mathbf{1}}. $$
4.4.2 Models based on the weights
The posterior for the GMVP weights θ derived under the Jeffreys non-informative prior is
$$ \boldsymbol{\theta} \,|\, \mathbf{X}_1, \dots, \mathbf{X}_n \sim t_p\left(n - k + p;\ \hat{\boldsymbol{\theta}};\ \frac{1}{n - k + p} \frac{\mathbf{L} \mathbf{R}_d \mathbf{L}^T}{\mathbf{1}^T \mathbf{S}^{-1} \mathbf{1}}\right). \tag{16} $$
The posterior is a multivariate t-distribution, just like the previous two posterior distributions;
the only difference from the posterior under the diffuse prior is the number of degrees of freedom.
The posterior for the informative prior is, in contrast to the others, not a t-distribution. It is
expressed as
$$ p_I(\boldsymbol{\theta} \,|\, \mathbf{X}_1, \dots, \mathbf{X}_n) \propto \left[ (\boldsymbol{\theta} - \mathbf{w}_I)^T \left( \mathbf{S}_I^{-1} + (n-1)(\mathbf{L} \mathbf{R}_d \mathbf{L}^T)^{-1} \right)^{-1} (\boldsymbol{\theta} - \mathbf{w}_I) \right]^{\frac{n-k+2p+2\delta_1}{2}} \times U\left( \frac{n-k+2p+2\delta_1}{2};\ \frac{p+2\delta_1-\nu_I+1}{2};\ g(\boldsymbol{\theta}) \right), \tag{17} $$
where U(∙;∙;∙) is a confluent hypergeometric function (Abramowitz and Stegun, 1972) and
$$ g(\boldsymbol{\theta}) = \frac{(n-1)\left( (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^T (\mathbf{L} \mathbf{R}_d \mathbf{L}^T)^{-1} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}) + (\mathbf{1}^T \mathbf{S}^{-1} \mathbf{1})^{-1} \right) + \delta_2^{-1}}{2\, (\boldsymbol{\theta} - \mathbf{w}_I)^T \left( \mathbf{S}_I^{-1} + (n-1)(\mathbf{L} \mathbf{R}_d \mathbf{L}^T)^{-1} \right)^{-1} (\boldsymbol{\theta} - \mathbf{w}_I)}. $$
This expression is difficult to evaluate, so a stochastic representation of θ is used instead. It can
be expressed as
$$ \boldsymbol{\theta} = \mathbf{r}_I + \zeta^{-\frac{1}{2}} (\mathbf{V}_I)^{\frac{1}{2}} \mathbf{z}_0, \tag{18} $$
where
$$ \mathbf{z}_0 \sim N(\mathbf{0}_p, \mathbf{I}_p), $$
$$ \zeta \sim Gamma\left( \frac{n-k+2p+2\delta_1}{2}, \frac{2}{h_I} \right), $$
$$ \tau \sim Gamma\left( \frac{n-k+p+\nu_I-1}{2}, 2 \right), $$
with
$$ \mathbf{P}_1 = \left( \mathbf{S}_I^{-1} + (n-1)(\mathbf{L} \mathbf{R}_d \mathbf{L}^T)^{-1} \right)^{-1}, $$
$$ \mathbf{P}_2 = (n-1)(\mathbf{L} \mathbf{R}_d \mathbf{L}^T)^{-1}, $$
$$ r = \delta_2^{-1} + (n-1)(\mathbf{1}^T \mathbf{S}^{-1} \mathbf{1})^{-1}, $$
$$ \mathbf{V}_I = (\tau \mathbf{P}_1 + \mathbf{P}_2)^{-1}, $$
$$ \mathbf{r}_I = (\tau \mathbf{P}_1 + \mathbf{P}_2)^{-1} (\tau \mathbf{P}_1 \mathbf{w}_I + \mathbf{P}_2 \hat{\boldsymbol{\theta}}), $$
$$ h_I = r + \tau \mathbf{w}_I^T \mathbf{P}_1 \mathbf{w}_I + \hat{\boldsymbol{\theta}}^T \mathbf{P}_2 \hat{\boldsymbol{\theta}} - \mathbf{r}_I^T \mathbf{V}_I^{-1} \mathbf{r}_I. $$
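To show how the stochastic representation in equation 18 can be used for simulation, here is a minimal Python sketch for a single weight (p = 1), where all matrices reduce to scalars. It rests on several assumptions that are not stated in the text: Gamma(a, b) is read as shape a and scale b, the argument d stands for the scalar L R_d L^T, c stands for 1^T S^{-1} 1, and the values used for d, c and s_prior are purely hypothetical. The calculations in this paper were done in R, so this is an illustration and not the code behind the results.

```python
import math
import random


def draw_theta(theta_hat, w_prior, s_prior, d, c, n, k, nu_prior,
               delta1=1.0, delta2=0.5, rng=random):
    """One draw of a single weight (p = 1) following equation 18.

    d is the scalar L R_d L^T and c is 1^T S^{-1} 1 (both assumed given).
    Gamma(a, b) is interpreted as shape a and scale b (an assumption).
    """
    p = 1
    P1 = 1.0 / (1.0 / s_prior + (n - 1) / d)
    P2 = (n - 1) / d
    r = 1.0 / delta2 + (n - 1) / c
    tau = rng.gammavariate((n - k + p + nu_prior - 1) / 2.0, 2.0)
    V = 1.0 / (tau * P1 + P2)
    r_I = V * (tau * P1 * w_prior + P2 * theta_hat)
    h = r + tau * P1 * w_prior ** 2 + P2 * theta_hat ** 2 - r_I ** 2 / V
    zeta = rng.gammavariate((n - k + 2 * p + 2 * delta1) / 2.0, 2.0 / h)
    z0 = rng.gauss(0.0, 1.0)
    return r_I + math.sqrt(V / zeta) * z0


# theta_hat and w_prior are the German GMVP weights from tables 6 and 3;
# s_prior, d and c are hypothetical placeholders.
rng = random.Random(2015)
draws = [draw_theta(theta_hat=-0.310, w_prior=-0.394, s_prior=1.0,
                    d=4.0, c=2.5, n=522, k=5, nu_prior=522, rng=rng)
         for _ in range(2000)]
mean_draw = sum(draws) / len(draws)
print(mean_draw)
```

For p = 1 the location r_I is a weighted average of the prior mean and the sample estimate, so the simulated mean should fall between the two, with the relative sizes of τP1 and P2 deciding which of them dominates.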
For the posteriors some prior constants have to be chosen, and they are set to the same values as in
the paper by Bodnar et al. (2015):
$$ \delta_1 = 1; \quad \delta_2 = 0.5; \quad \nu_c = \kappa_c = \nu_I = n. $$
L is set equal to the basis vector $e_i$ for each dimension in turn, which means that for the first
dimension it equals
$$ e_1 = [1 \ \ 0 \ \ 0 \ \ 0 \ \ 0]. $$
This allows the distribution of each weight to be plotted individually as a univariate distribution.
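To make the choice of L as a basis vector concrete, the following worked special case writes out equation 14 for p = 1. The subscript notation for the i-th element and the i-th diagonal entry is introduced here only for illustration.

```latex
\begin{align*}
\mathbf{L} = e_i \;\Rightarrow\;
\theta &= w_{GMV,i}, \qquad
\hat{\theta} = \frac{(\mathbf{S}^{-1}\mathbf{1})_i}{\mathbf{1}^T \mathbf{S}^{-1} \mathbf{1}}, \qquad
\mathbf{L}\mathbf{R}_d\mathbf{L}^T = (\mathbf{R}_d)_{ii}, \\
w_{GMV,i} \,|\, \mathbf{X}_1, \dots, \mathbf{X}_n
 &\sim t_1\!\left(n-1;\ \hat{\theta};\ \frac{1}{n-1}\,
   \frac{(\mathbf{R}_d)_{ii}}{\mathbf{1}^T \mathbf{S}^{-1} \mathbf{1}}\right).
\end{align*}
```

In other words, each weight gets a univariate t-distribution centered at the corresponding estimated GMVP weight, with a dispersion determined by one diagonal element of R_d.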
5. Results
5.1 Global Minimum Variance Portfolios
The first results are from the mean-variance analysis of the historical data, which were split into
two samples. The first covers the period from 1 January 2010 to 31 December 2011 and the second the
period from 1 January 2012 to 31 December 2013; each period contains 522 observations. All
calculations and graphs in this part have been produced in R.
The average daily log-returns for the first period are displayed in table 1.
Table 1
                              Germany   Japan    Nordic   UK      US
Average log-return (×10⁻⁴)    -1.253    -4.646   0.019    0.491   2.356
The most noticeable aspect of this result is that the returns differ a lot between the indices: both
Germany and Japan had a negative return for the period, while the US did a lot better. All of the
average daily returns are very close to zero, and in some financial calculations it is not uncommon to
assume that the average daily return is zero over short time periods.
The covariance matrix for the period is summarized in table 2.
Table 2
          Germany   Japan   Nordic   UK      US
Germany   2.275     0.499   1.894    1.617   1.416
Japan     0.499     1.564   0.452    0.405   0.238
Nordic    1.894     0.452   1.945    1.479   1.305
UK        1.617     0.405   1.479    1.438   1.090
US        1.416     0.238   1.305    1.090   1.675
These two results are enough to calculate the GMVP for the first period. This is done by finding the
portfolio that minimizes the portfolio variance, and the weights of that portfolio are presented in
table 3.
Table 3
               Germany   Japan   Nordic   UK      US
GMVP weights   -0.394    0.427   -0.030   0.639   0.358
The portfolio has a large negative weight in the German index and a slight negative weight in the
Nordic index. The remaining three indices have solid positive weights, which suggests positive
investments in those markets. The result is a bit peculiar in that a large share of the investment is
placed in the asset with the lowest average return for the period. This is because the only criterion
in the definition of the GMVP is that the portfolio variance should be minimized; no consideration is
given to the expected return. The main reason for the large weight in the Japanese index is that it is
more weakly correlated with the other markets, so including it in the portfolio gives a larger
diversification effect.
In fact, the expected return of the portfolio with the weights above is negative, which might not be
what an investor is looking for.
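The calculation behind table 3 and the sign of the expected return can be retraced with a short Python sketch that applies equations 6 and 3 to the rounded values printed in tables 1 and 2. The helper names are hypothetical, and because the inputs are rounded to three decimals, the computed weights should land close to table 3 but need not match it in every digit. The paper's own calculations were done in R; this is only an independent illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gmvp_weights(cov):
    """Equation 6: w = S^{-1} 1 / (1^T S^{-1} 1)."""
    z = solve(cov, [1.0] * len(cov))                   # z = S^{-1} 1
    total = sum(z)                                     # 1^T S^{-1} 1
    return [v / total for v in z]


# Table 2 (first period); order: Germany, Japan, Nordic, UK, US.
S1 = [[2.275, 0.499, 1.894, 1.617, 1.416],
      [0.499, 1.564, 0.452, 0.405, 0.238],
      [1.894, 0.452, 1.945, 1.479, 1.305],
      [1.617, 0.405, 1.479, 1.438, 1.090],
      [1.416, 0.238, 1.305, 1.090, 1.675]]
# Table 1: average daily log-returns (x 10^-4).
r1 = [-1.253, -4.646, 0.019, 0.491, 2.356]

w1 = gmvp_weights(S1)
ret1 = sum(w * r for w, r in zip(w1, r1))              # equation 3
print([round(w, 3) for w in w1], ret1)
```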
The same calculations were carried out for the second period and gave the following results:
Table 4
                              Germany   Japan    Nordic   UK      US
Average log-return (×10⁻⁴)    8.227     11.312   6.072    3.622   7.434
The average returns are presented in table 4. They are a lot higher for this period, and the two
indices with the worst returns in the previous period now have the two highest returns. The weak
results for the previous period can largely be explained by the euro crisis in 2011, when a number of
countries within the euro zone were in severe financial trouble, which caused concern on the financial
markets.
Table 5
          Germany   Japan   Nordic   UK      US
Germany   1.065     0.256   0.806    0.694   0.472
Japan     0.256     1.622   0.280    0.221   0.121
Nordic    0.806     0.280   0.819    0.600   0.393
UK        0.694     0.221   0.600    0.653   0.359
US        0.472     0.121   0.393    0.359   0.547
The covariance matrix is presented in table 5. The variances and covariances are generally smaller
than in the previous period, which is also an effect of the more stable development of the markets
during the second period. It is a well-established fact within finance that markets tend to be more
highly correlated in times of trouble.
The GMVP weights for this period are presented in table 6.
Table 6
               Germany   Japan   Nordic   UK      US
GMVP weights   -0.310    0.167   0.136    0.442   0.565
The weight for the German index is still negative, which means that short selling it is the best
option under these conditions. All of the other indices have positive weights, and most of the
investment should be placed in the UK and US indices.
There are quite clear differences between the two periods, which is interesting for the calculations
that follow. The first period now serves as the prior information, that is, the beliefs of the
investor, while the second period gives the GMVP for the start of 2014 based on the last two years of
data.
5.2 Distribution of the weights
For the diffuse prior the posterior distribution of the weights is a multivariate t-distribution with
521 degrees of freedom, and its means are identical to the GMVP weights. A dispersion matrix is also
calculated; it indicates how the distribution varies around its mean.
The means are presented in table 7 and the dispersions in table 8.
Table 7
        Germany   Japan   Nordic   UK      US
Means   -0.310    0.167   0.136    0.442   0.565
Table 8
                      Germany   Japan   Nordic   UK      US
Dispersions (×10⁻³)   3.364     0.446   4.244    4.094   1.736
Figure 3
The distribution for the Japanese weight has the lowest dispersion and is therefore thin and tall; the
distribution for the US weight is also somewhat thinner and taller than those of the remaining three
indices, which are all quite similar in shape. This posterior distribution is based on a
non-informative prior and does not take the investor's beliefs into account; it is instead based solely
on the GMVP results.
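The link between dispersion and the height of each curve can be checked numerically. The sketch below assumes that each weight has a univariate location-scale t marginal with 521 degrees of freedom and that the table 8 dispersions are the squared scale parameters; the function name `t_peak_height` is hypothetical and not taken from the thesis.

```python
import math


def t_peak_height(scale, df):
    """Density of a location-scale Student t-distribution at its mode."""
    # Log of the normalising constant Gamma((df+1)/2) / (Gamma(df/2) * sqrt(df*pi)).
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) / scale


df = 521
# Dispersions from table 8, converted from units of 10^-3.
dispersions = {"Germ.": 3.364e-3, "Japan": 0.446e-3, "Nordic": 4.244e-3,
               "UK": 4.094e-3, "US": 1.736e-3}
# Assumption: the scale parameter is the square root of the dispersion.
peaks = {name: t_peak_height(math.sqrt(d), df) for name, d in dispersions.items()}

for name, height in sorted(peaks.items(), key=lambda kv: -kv[1]):
    print(f"{name:7s} peak height ~ {height:.1f}")
```

Under these assumptions the Japanese weight gets the tallest curve and the US weight the second tallest, which agrees with the description of figure 3.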
The second prior introduced was the conjugate prior, which is an informative prior; the posterior
distribution is a multivariate t-distribution, just as in the case of the diffuse prior. The results
are somewhat different, since this posterior does not use the GMVP weights as the means of the
posterior distributions; the degrees of freedom and the dispersions also differ from those of the
diffuse prior.
The means for the posterior distributions are presented in table 9 and the dispersions in table 10.
Table 9
                       Germ.    Japan    Nordic   UK       US
Means                  -0,320   0,165    0,137    0,461    0,557

Table 10
                       Germ.    Japan    Nordic   UK       US
Dispersions (×10^-3)   1,707    0,227    2,166    2,069    0,882
Figure 4
In a comparison between the diffuse and the conjugate prior, the means of the distributions differ
slightly, but the main difference is that the conjugate prior produces a lower dispersion for all five
distributions; the dispersions under the conjugate prior are about half of those under the diffuse
prior. The similarity of the means indicates that the conjugate prior has failed to incorporate the
prior information in a satisfying way, since the beliefs of the investor are not visible in the results
from the posterior distribution.
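The comparison can be made explicit with a few lines of arithmetic on the values in tables 7 to 10. The following is only a small check of the figures reported above; the variable names are chosen for illustration.

```python
names = ["Germ.", "Japan", "Nordic", "UK", "US"]

# Posterior means (tables 7 and 9) and dispersions in units of 10^-3 (tables 8 and 10).
diffuse_means = [-0.310, 0.167, 0.136, 0.442, 0.565]
conjugate_means = [-0.320, 0.165, 0.137, 0.461, 0.557]
diffuse_disp = [3.364, 0.446, 4.244, 4.094, 1.736]
conjugate_disp = [1.707, 0.227, 2.166, 2.069, 0.882]

# Ratio of conjugate to diffuse dispersion for each index.
ratios = [c / d for c, d in zip(conjugate_disp, diffuse_disp)]
# Largest absolute difference between the two sets of posterior means.
max_mean_diff = max(abs(c - d) for c, d in zip(conjugate_means, diffuse_means))

for name, ratio in zip(names, ratios):
    print(f"{name:7s} dispersion ratio = {ratio:.3f}")
print(f"largest difference in means = {max_mean_diff:.3f}")
```

All five ratios lie just above one half, while no mean moves by more than about 0,02.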
The first two distributions were derived from prior assumptions about the distribution of the mean
vector and the covariance matrix; the following two distributions are derived directly from prior
assumptions about the GMVP weights.
The third distribution was derived from the Jeffreys non-informative prior and, like the previous two,
it is expressed through a multivariate t-distribution. The result for this distribution is very similar
to the one for the diffuse prior; the main difference is the number of degrees of freedom.
The means and dispersions are presented in tables 11 and 12.
Table 11
                       Germ.    Japan    Nordic   UK       US
Means                  -0,310   0,167    0,136    0,442    0,565

Table 12
                       Germ.    Japan    Nordic   UK       US
Dispersions (×10^-3)   3,338    0,448    4,269    4,118    1,746
Figure 5
This result is almost identical to that for the diffuse prior: the distribution for the Japanese index
is tall and thin, the distribution for the US index is a bit taller than the remaining three, and those
three are all similar to each other. This is not surprising, since both are based on non-informative
priors and thus only on the distribution of the actual observations.
The last posterior distribution is for the informative prior with respect to the GMVP weights. The
characteristics of the distribution are based on 25 000 simulations from the stochastic representation
in equation 17, and the results from the simulations are presented in tables 13 and 14.
Table 13
                       Germ.    Japan    Nordic   UK       US
Means                  -0,380   0,247    -0,005   0,599    0,431

Table 14
                       Germ.    Japan    Nordic   UK       US
Variances (×10^-2)     5,433    2,940    6,421    8,089    5,775
Figure 6
It is clearly noticeable that the distributions here are considerably wider and less peaked. The prior
information has been taken into account, and the centres of the distributions are located between the
investor's beliefs and the weights for the GMVP.
Table 15 shows the comparison between the investor's beliefs, the weights from the GMVP
calculations and the means of the posterior distribution with the informative prior.
Table 15
          Investor's beliefs   GMVP weights   Posterior weights   Scaled weights
Germany   -0,394               -0,310         -0,380              -0,426
Japan     0,427                0,167          0,247               0,277
Nordic    -0,030               0,136          -0,005              -0,006
UK        0,639                0,442          0,599               0,672
US        0,358                0,565          0,431               0,483
Sum       1                    1              0,892               1
It seems that the results are in general closer to the investor's beliefs, but there is a problem: the
weights from the simulation of the posterior distribution do not sum to one, as is necessary for a
full portfolio. The weights have to be adjusted so that their sum is one, and the easiest way of doing
so is to divide all the posterior weights by 0,892, which gives the scaled weights presented in the
table above. Other scaling methods could be considered, since the method used here makes the weight
for the German index smaller than both the investor's belief and the GMVP weight, while the opposite
is true for the UK index, where the scaled weight is larger than both the investor's belief and the
GMVP weight.
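The scaling step and its side effect can be reproduced from the values in table 15. The sketch below divides the posterior weights by their sum and then lists the indices whose scaled weight falls outside the interval spanned by the investor's belief and the GMVP weight; the variable names are chosen for illustration.

```python
names = ["Germany", "Japan", "Nordic", "UK", "US"]
beliefs = [-0.394, 0.427, -0.030, 0.639, 0.358]
gmvp = [-0.310, 0.167, 0.136, 0.442, 0.565]
posterior = [-0.380, 0.247, -0.005, 0.599, 0.431]

# Divide every posterior weight by the sum so that the scaled weights sum to one.
total = sum(posterior)
scaled = [w / total for w in posterior]

# Indices whose scaled weight lies outside the range [min(belief, GMVP), max(belief, GMVP)].
outside = [name for name, b, g, s in zip(names, beliefs, gmvp, scaled)
           if not min(b, g) <= s <= max(b, g)]

print("sum of posterior weights:", round(total, 3))
print("scaled weights:", [round(s, 3) for s in scaled])
print("outside the belief/GMVP range:", outside)
```

Because every weight is divided by a number below one, all weights move away from zero, which is what pushes the German and UK weights beyond both reference values.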
6. Summary
The time series for the indices were divided into two periods, and a mean-variance analysis was
conducted for both. The result for the first period served as a representation of what could have
been an investor's beliefs about the future, while the second period served as the historical data
from which the weights of a portfolio based on traditional portfolio theory are derived.
Four different prior distributions were introduced. Two priors were placed on μ and Σ, one
informative and one non-informative. The posterior distributions for these priors were calculated
and turned out to be very similar, which was not in line with the purpose, since the informative prior
failed to incorporate the information that represented the investor's beliefs.
The remaining two priors were instead placed directly on the GMVP weights and, just as with the
previous two, one was informative and one non-informative. The non-informative prior gave a result
very similar to those of the first two priors, while the informative prior used the prior information
to create distributions that took both the investor's beliefs and the results of the mean-variance
estimation into account. The informative prior has a flaw: the weights of the simulated distributions
do not sum to one. It is easy to scale the weights so that they do, but this causes other problems. A
better scaling method can probably be found, and this could be an interesting continuation of this
paper. We conclude that the informative prior on the GMVP weights was the only prior that produced a
result consistent with the goal of the paper, and that it could be a possible alternative for creating
more personalized portfolios.
References
Abramowitz M. and Stegun I. A. (1972). “Handbook of Mathematical Functions With Formulas,
Graphs and Mathematical Tables”. New York, Dover.
Barry C.B. (1974). “Portfolio Analysis under uncertain means, variances, and covariances”. Journal of
Finance 29, 515-522.
Bodnar T., Mazur S. and Okhrin Y. (2015). “Bayesian Estimation of the Global Minimum Variance
Portfolio”, Working Paper.
Brown S. J. (1976). Optimal Portfolio Choice under Uncertainty: A Bayesian Approach. Ph.D. Diss.,
University of Chicago.
Frost P.A. and Savarino J.E. (1986). “An empirical Bayes approach to efficient portfolio selection”.
Journal of Financial and Quantitative Analysis 21, 293-305.
Klein R.W. and Bawa V.S. (1976). “The effect of estimation risk on optimal portfolio choice”.
Journal of Financial Economics 3, 215-231.
Markowitz H.M. (1952). “Portfolio Selection”. Journal of
Finance 7, 77-91.
MSCI. “MSCI Index Performance”. [2014-10-15].
<http://www.msci.com/products/indexes/country_and_regional/dm/performance.html>
Tunaru R. (2002). “Hierarchical Bayesian models for multiple count data”. Austrian Journal of
Statistics 31, 221-229.
Appendix
Matrices
The majority of the formulas used in the paper are expressed in matrix form, which generally
simplifies the calculations when they are performed on a computer. A few basic definitions and
applications of matrix calculations are introduced below. Throughout the paper, matrices are denoted
with bold letters.
A matrix is a rectangular array consisting of a number of rows and columns. A matrix with three rows
and three columns would look like this:
\[
\mathbf{M} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\]
A matrix with the same number of rows as columns is known as a square matrix.
A vector is a matrix that only consists of one column (or row) and thus could look like
\[
\mathbf{V} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}
\]
The transpose of a matrix is obtained by flipping it so that the first row becomes the first column,
the second row becomes the second column, and so on. The transpose of the matrix M is commonly
denoted with a superscript T as $\mathbf{M}^T$, and it would be
\[
\mathbf{M}^T = \begin{bmatrix} a & d & g \\ b & e & h \\ c & f & i \end{bmatrix}
\]
The transpose of the vector V is
\[
\mathbf{V}^T = \begin{bmatrix} a & b & c \end{bmatrix}
\]
Two matrices of the same size can be added together by simply adding the corresponding elements
and the result will be a matrix of the same size as the first two.
๐‘จ+๐‘ฉ = [
๐‘Ž
๐‘
๐‘’
๐‘
]+[
๐‘”
๐‘‘
25
๐‘“
๐‘Ž+๐‘’
]=[
โ„Ž
๐‘+๐‘”
๐‘+๐‘“
]
๐‘‘+โ„Ž
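The transpose and the sum are straightforward to compute when a matrix is stored as a list of rows. The following is a minimal sketch with made-up numbers; the function names `transpose` and `add` are chosen for illustration.

```python
def transpose(m):
    """Return the transpose: row i of the result is column i of m."""
    return [list(col) for col in zip(*m)]


def add(a, b):
    """Add two matrices of the same size element by element."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same size")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
V = [[1], [2], [3]]          # column vector
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

print(transpose(M))
print(transpose(V))
print(add(A, B))
```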
Multiplication of matrices is only possible if the number of columns in the left matrix equals the
number of rows in the right matrix. The values in the matrix product are given by the formula
๐‘›
[๐‘จ๐‘ฉ]๐‘–,๐‘— = ∑ ๐ด๐‘–,๐‘Ÿ ๐ต๐‘Ÿ,๐‘—
๐‘Ÿ=1
where A has n columns and B has n rows, and i and j denote the row and column of the
product matrix. Two 2x2 matrices would give calculations like these:
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}
\]
It is worth noticing that matrix multiplication is not commutative, which means that in general
AB ≠ BA. This is demonstrated by the result below:
\[
\begin{bmatrix} e & f \\ g & h \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} ea + fc & eb + fd \\ ga + hc & gb + hd \end{bmatrix}
\]
The identity matrix is a square matrix that consists of ones on the main diagonal (the upper left to the
bottom right) and zeros everywhere else. It is denoted as $\mathbf{I}_n$.
\[
\mathbf{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
The inverse of a square matrix is the matrix that gives the identity matrix when it is multiplied with
the original. The inverse of A is denoted with the superscript -1 as $\mathbf{A}^{-1}$.
\[
\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}_n
\]
The determinant of a matrix can be defined through the Leibniz formula
๐‘›
det(๐‘จ) = |๐‘จ| = ∑ ๐‘ ๐‘”๐‘›(๐‘ฅ) ∏ ๐‘Ž๐‘–,๐‘ฅ๐‘–
๐‘ฅ∈๐‘†๐‘›
๐‘–=1
where the sum is computed over all permutations x of the set {1,2,…,n}, and $S_n$ denotes the set of
all such permutations. The function sgn(x) takes either the value +1 or −1: it is +1 whenever the
reordering given by x can be achieved by successively interchanging two entries an even number of
times, and −1 whenever it can be achieved by an odd number of such interchanges.
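The Leibniz formula translates almost directly into code. The sketch below is meant only as an illustration with a made-up matrix: it determines the sign of a permutation by counting inversions, which has the same parity as the number of interchanges, and it uses indices starting at 0 instead of 1. Since the sum has n! terms, this approach is only practical for small matrices.

```python
from itertools import permutations


def sign(perm):
    """Return +1 for an even permutation and -1 for an odd one."""
    # The parity of the number of inversions equals the parity of the
    # number of pairwise interchanges needed to produce the permutation.
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det_leibniz(a):
    """Determinant as the sum over permutations x of sgn(x) * prod_i a[i][x[i]]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        product = 1
        for i in range(n):
            product *= a[i][perm[i]]
        total += sign(perm) * product
    return total


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
print(det_leibniz(A))
```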
The trace of a matrix is the sum of the elements on the main diagonal.
\[
\operatorname{tr}(\mathbf{A}) = \operatorname{tr} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = a + e + i
\]