icec-97 - People - University of Idaho

The Efficient Set GA for Stock Portfolios
Jacqueline Shoaf and James A. Foster
Dept. of Computer Science, University of Idaho, Moscow, Idaho
email: jackies@alaska.net, foster@cs.uidaho.edu
Abstract
The genetic algorithm (GA) for the efficient set
portfolio problem based on the Markowitz model
introduced by Shoaf and Foster [4] offers
significant benefits over the quadratic
programming approach. These benefits include
simultaneous optimization of risk and return.
The efficient set GA uses an indirect
representation style in order to avoid infeasible
solutions and penalty functions. This
representation is generally applicable to
problems that seek an optimal partition of a
given amount of some resource, where allocations
may be both negative and positive. Efficient
set GA evolution scales well and is O(n log n)
with a small constant for portfolios containing
up to n=100 stocks. Using demes further
improves the quality of solution and the run time
for this GA.
1. Introduction
The efficient set portfolio problem is to find
the allocation of investments for a given set of
securities with minimum risk for any given rate
of return. Markowitz’s [2] approach to solving
this problem uses the covariance matrix derived
from historical rates of return to predict the
variance, or “risk factor” of any allocation of
resources. The Markowitz model accommodates
both long and short positions. A long position
represents an allocation for purchase of
securities, whereas a short position represents an
allocation from the sale of borrowed securities.
In the Markowitz model, the weighted sum of
the values in the rates of return covariance matrix
represents the overall variance, σ²(rp), of a
portfolio. Let n be the number of stocks in the
portfolio, xi be the proportion of resources
allocated for stock i (negative for short
positions), E(r*p) be the given expected rate of
return for the portfolio, and E(rj) be the expected
rate of return for each security j. The objective
equation for the efficient set portfolio problem is:
$$\min\ \sigma^2(r_p) = \sum_{j=1}^{n} \sum_{i=1}^{n} x_i\, x_j\, \mathrm{Cov}(r_i, r_j)$$
with the following constraints:
1) $E(r_p^*) = \sum_{j=1}^{n} x_j\, E(r_j)$
2) $1.0 = \sum_{j=1}^{n} x_j$
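As a minimal sketch of the objective and the two constraints, the following Python evaluates the variance, the expected return, and the allocation sum for one candidate allocation. The three-stock covariance matrix, the expected returns, and the allocation are made-up illustrative values, and the function names are ours rather than the paper's.

```python
def portfolio_variance(x, cov):
    """Objective: sum over i and j of x_i * x_j * Cov(r_i, r_j)."""
    n = len(x)
    return sum(x[i] * x[j] * cov[i][j] for i in range(n) for j in range(n))


def expected_return(x, mean_returns):
    """Constraint 1: E(r_p) = sum over j of x_j * E(r_j)."""
    return sum(xj * rj for xj, rj in zip(x, mean_returns))


# Hypothetical data (not from the paper): a symmetric covariance matrix,
# expected returns, and an allocation with one short (negative) position.
cov = [[0.50, 0.10, 0.00],
       [0.10, 0.40, 0.05],
       [0.00, 0.05, 0.30]]
mean_returns = [0.20, 0.10, 0.05]
x = [0.60, 0.70, -0.30]

print("variance:", portfolio_variance(x, cov))
print("expected return:", expected_return(x, mean_returns))
print("allocation sum:", sum(x))  # constraint 2 requires this to be 1.0
```

The short position enters the variance through negative cross terms, which is how the model lets a short sale offset the risk of a correlated long holding.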
The quadratic programming approach to this
problem described by Haugen [1, Appendix 3]
requires the objective equation to be rewritten in
Lagrangian form. For minimization, the partial
derivative of each variable is taken and set to 0.
This leaves a set of linear simultaneous equations
which can be solved for the coefficients of
allocation for the minimum variance portfolio.
This portfolio will have the given expected rate
of return, E(r*p), which is specified as a
constraint. Note that E(r*p) is required as input,
so the quadratic programming approach can only
solve the efficient set problem for one portfolio
rate of return at a time. Also, the algorithm for
solving a set of linear simultaneous equations has
time complexity between O(n^2) and O(n^3),
according to Smith [5].
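The Lagrangian procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Haugen's own formulation: the two multipliers, the helper names, and the three-stock data are hypothetical, and the linear system is solved with a plain Gaussian elimination whose triple loop is the O(n^3) cost mentioned in the text.

```python
def solve_linear(a, b):
    """Solve a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def min_variance_allocation(cov, mean_returns, target_return):
    """Set each partial derivative of the Lagrangian to zero and solve.

    Unknowns are x_1..x_n followed by one multiplier per constraint.
    """
    n = len(mean_returns)
    a, b = [], []
    for i in range(n):
        # d/dx_i: 2 * sum_j Cov_ij * x_j - lam1 * E(r_i) - lam2 = 0
        a.append([2.0 * cov[i][j] for j in range(n)] + [-mean_returns[i], -1.0])
        b.append(0.0)
    a.append(list(mean_returns) + [0.0, 0.0])  # sum_j x_j * E(r_j) = E(r*_p)
    b.append(target_return)
    a.append([1.0] * n + [0.0, 0.0])           # sum_j x_j = 1
    b.append(1.0)
    return solve_linear(a, b)[:n]


# Hypothetical data (not from the paper).
cov = [[0.50, 0.10, 0.00],
       [0.10, 0.40, 0.05],
       [0.00, 0.05, 0.30]]
mean_returns = [0.20, 0.10, 0.05]
allocation = min_variance_allocation(cov, mean_returns, target_return=0.15)
print(allocation, sum(allocation))
```

Each call solves for exactly one target return, so tracing the whole efficient set this way means repeating the solve for every return level of interest.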
2. The Efficient Set GA
The GA solution to the efficient set problem
(Shoaf and Foster [4]) alters the problem slightly
to solve for an efficient set portfolio over the
entire range of potential expected portfolio
returns. Each member of the GA population
represents an allocation of resources for the
portfolio. The user selects a desirable balance
between risk and return using adjustable
constants in the GA fitness function:
[Fitness function: the expression is not recoverable from the source text. Its surviving terms are the portfolio variance σ²(rp), E(rp), and E(r*p).]
where the three adjustable constants are set by
the user, E(rp) represents
the expected rate of return of the portfolio
represented by the population member, and
E(r*p) represents the user’s target expected rate
of portfolio return.
Because the efficient set portfolio problem is
an allocation problem, a direct representation of
resource allocation by each population member
in the GA will not work well. This type of
representation will result in predominantly
infeasible solutions in every generation, where
the allocations do not sum to 1.0.
Our representation has a single field of k+1
bits for each security. The first bit indicates
whether the position on that stock will be long
(one) or short (zero). The remaining k bits are an
unsigned index onto an “allocation wheel”.
Conceptually, this is a wheel representing the
resources to be allocated. It is divided into 2^k
equal sections, each indexed by a k-bit binary
value. The distance between an index and the
index of the next long position, plus any enclosed
short position wedges, is the percentage of the
total resource allocation for the security with that
index. (See Figure 1.)
More precisely, suppose that i1,…,in are the
indices for n securities, in non-decreasing order,
mod 2^k (so that 000, for example, follows 111).
Now, let S be the set of indexes of securities with
short positions. Let L(j) be the next index of a
long position on the allocation wheel. Now, let
dj be:
$$d_j = \frac{a\,\big[(i_{L(j)} - i_j) \bmod 2^k\big] + b \sum_{m \in S,\; j < m < L(j)} \big[(i_{L(m)} - i_m) \bmod 2^k\big]}{2^k}$$
with a=-1 and b=0 when j∈S, and a=1 and b=1
otherwise.
Pictorially, dj is the length of the arc between
the index ij and the next long security, with any
subtended short position arcs added in. Figure 1
is an example of this representation for k=3 and
n=5.
An important benefit of this representation is
that $\sum_{j \notin S} d_j + \sum_{j \in S} d_j = 1$, for any set of
short positions S and any chromosome. That is,
the total investment is always one hundred
percent of the available resources. This makes it
impossible to create an infeasible solution. So,
every member of the population in each
generation can contribute to the next generation
and no valuable schemata are discarded. This
also avoids any unpredictable influence on
evolution by penalty functions in the fitness
function, since they are not necessary. In general
this representation makes the GA efficient. This
indirect style of representation should work for
any optimization problem where allocation
proportions may be either positive or negative
and must sum to a given value.
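The decoding step can be sketched in Python as follows. This is our reading of the allocation wheel, not the authors' code: the function name and the handling of colliding indices (a short sharing an index with a long receives zero, and when every long shares one index the last of them takes the whole wheel) are assumptions. The example chromosome is the one from Figure 1 (k=3, n=5).

```python
def decode(chromosome, k):
    """Turn a bit string of (k+1)-bit fields into signed allocations."""
    size, width = 2 ** k, k + 1
    fields = [chromosome[p:p + width] for p in range(0, len(chromosome), width)]
    is_long = [f[0] == "1" for f in fields]   # first bit: 1 = long, 0 = short
    index = [int(f[1:], 2) for f in fields]   # remaining k bits: wheel index
    # Long positions in wheel order (stable sort keeps stock order on ties).
    longs = sorted((j for j in range(len(fields)) if is_long[j]),
                   key=lambda j: index[j])
    if not longs:
        raise ValueError("this sketch assumes at least one long position")

    wedges = [0] * len(fields)
    # Plain arc from each long index clockwise to the next long index.
    for p, j in enumerate(longs):
        nxt = longs[(p + 1) % len(longs)]
        arc = (index[nxt] - index[j]) % size
        if p == len(longs) - 1 and arc == 0:
            arc = size  # all longs share one index: the last takes the wheel
        wedges[j] = arc
    # Each short takes the (negative) arc to its next long, and the long
    # whose arc encloses it gets the same wedge added back.
    for j in range(len(fields)):
        if is_long[j]:
            continue
        p = min(range(len(longs)),
                key=lambda q: (index[longs[q]] - index[j]) % size)
        arc = (index[longs[p]] - index[j]) % size
        wedges[j] = -arc
        wedges[longs[p - 1]] += arc  # cyclic predecessor of the next long
    return [w / size for w in wedges]


figure_1 = "1110" + "0000" + "0101" + "1111" + "1010"
allocations = decode(figure_1, k=3)
print(allocations)       # [0.125, -0.25, -0.125, 0.625, 0.625]
print(sum(allocations))  # 1.0
```

Because every short wedge is subtracted once and added back once, the total always equals the sum of the plain long arcs, which is the full wheel.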
Stock      0       1       2       3       4
Field      1 110   0 000   0 101   1 111   1 010
Position   LONG    SHORT   SHORT   LONG    LONG
[Allocation wheel diagram: eight wedges indexed 000 through 111, with Stock 1 (short) at 000, Stock 4 at 010, Stock 2 (short) at 101, Stock 0 at 110, and Stock 3 at 111.]
Sum of allocations for Stocks 0-4 (in order):
.125 + -.25 + -.125 + .625 + .625 = 1.00
Figure 1. Allocation based on solution
representation
One of the less obvious effects of this
representation style, however, is the sensitivity of
the efficient set GA to increases in mutation and
crossover rates. A change in the index of one
security can affect one to two other security
allocations.
Experiments using a small set of 5 stocks [3]
compared the efficient set GA to
the quadratic programming technique and
demonstrated the benefits of simultaneously
optimizing for return and risk. These
experiments showed that the GA could find
portfolio allocations with similar risk and higher
rates of return than the risk-constrained quadratic
programming solution.
The data for these experiments was derived
from end-of-week closing data accumulated over
an eleven month period beginning October 3,
1994. The covariance matrix for stock rates of
return is shown in Table 1 and the averaged
annualized rates of return are shown in Table 2.
Table 1. Stock covariance matrix
        CYBE    ISLI    NBL     ORLY
CYBE
ISLI    2.45
NBL     1.36    2.76
ORLY    0.07   -0.10    0.99
RGIS    0.55    0.44   -0.02    0.68
Table 2. Average Rate of Return
CYBE    ISLI     NBL     ORLY    RGIS
0.589   -1.573   1.219   0.159   -0.094
The allocation proportions, obtained using the
quadratic programming technique with a given
expected return rate of .15, are shown in Table 3.
This solution yields a minimum risk, σ²(rp), of
.405.
Table 3. Allocation by quadratic programming
CYBE    ISLI    NBL     ORLY    RGIS
-0.10   0.15    0.30    0.55    0.10
Five randomly-seeded GA runs were
conducted using the same data [3]. The length of
each chromosome was 35 bits, with individual
fields of 7 bits. Each run lasted 200 generations
using a population size of 300.
The allocation proportions obtained using the
GA are shown in Table 4. This solution is the
best over all five runs. These allocations yield an
expected rate of return of .52 with an associated
minimum risk, σ²(rp), of .384. In this case the
efficient set GA was clearly able to find a better
solution than quadratic programming.
Table 4. Allocation by GA
CYBE    ISLI    NBL     ORLY    RGIS
0.0     0.047   0.422   0.516   0.016
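The reported expected returns can be checked against Tables 2 through 4 with a short calculation. The sketch below uses only values printed in those tables; the variable and function names are ours.

```python
stocks = ["CYBE", "ISLI", "NBL", "ORLY", "RGIS"]
mean_return = [0.589, -1.573, 1.219, 0.159, -0.094]  # Table 2
qp_alloc = [-0.10, 0.15, 0.30, 0.55, 0.10]           # Table 3
ga_alloc = [0.0, 0.047, 0.422, 0.516, 0.016]         # Table 4


def expected_return(alloc):
    """E(r_p) = sum over j of x_j * E(r_j)."""
    return sum(x * r for x, r in zip(alloc, mean_return))


for name, alloc in (("QP", qp_alloc), ("GA", ga_alloc)):
    print(f"{name}: sum={sum(alloc):.3f}, "
          f"expected return={expected_return(alloc):.3f}")

# With 7-bit fields (k=6) the wheel has 64 wedges, so each GA
# allocation should be close to a multiple of 1/64.
print([round(x * 64) for x in ga_alloc])
```

The quadratic programming allocation reproduces the target of .15 to within rounding (.149), and the GA allocation gives .521, matching the reported .52. The GA allocations correspond to 0, 3, 27, 33, and 1 wedges out of 64, which explains why the printed three-decimal values sum to 1.001 rather than exactly 1.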
[Figure 2. 5 Stock Portfolio GA: scaled fitness (Best of Gen and Ave of Gen) versus generation, 0 to 200.]
The convergence graph of the averaged GA
solution, which plots average generation fitness
and best-of-generation fitness against generation,
is shown in Figure 2. The plot demonstrates the
expected exponential increase in fitness values.
3. Complexity of the Efficient Set GA
We also ran experiments to determine the expected
time complexity of the GA as a function of n, the
number of securities in the portfolio. Since n is
a variable that is local only to the objective
function of the GA, this would confirm that the
time complexity of the fitness function dominates
the GA as n increases. An efficient algorithm in
terms of n implies that the GA can be used for
portfolio allocations involving large numbers of
securities. Notice that the size of the
chromosomes, and therefore the complexity of
the GA, depends on both the number of
securities and the number of slices on the
allocation wheel, so it is not obvious a priori that
the number of securities is the critical
performance parameter.
Each chromosome contains n fields,
representing an investment position (long or
short) and the index used for allocation for each
security. Thus the product of n and the field size
determines the total length of each chromosome.
However, the fitness function does a number of
transformations based on the values in each field,
in order to convert the chromosome into a
portfolio allocation. These transformations
affect the algorithmic time complexity of the GA.
The indirect representation method for allocation
determination described earlier requires that the
fields of the chromosome be sorted, an operation
of average expected complexity O(n log n)
using a quicksort (the O(n^2) worst case behavior
rarely shows up in practice). We anticipated that
sorting would dominate the time complexity of
the fitness function and the GA.
The experiments consisted of 5 sets of 25 GA
runs, one set for each of 5 different values of n:
8, 16, 32, 64 and 100. Time was clocked on
either side of the evolution step in the GA in
order to bypass time required by chromosome
initialization, which is assumed to be linear in n
with a small constant. Between the sets, all other
GA constants remained the same. However,
since the absolute length of the chromosome is
also affected by the change in n, a separate set of
experiments was conducted to determine which
type of change dominated time complexity of the
GA.
A fixed population size of 100 chromosomes
and runs lasting 100 generations were used.
These and other basic GA parameters are noted
in Table 5.
Table 5. Basic GA parameters for complexity experiments
GA Type           Simple
Crossover         2-Pt
Selection         Roulette-Wheel
Field Size        7
Population Size   100
Crossover Rate    .6
Mutation Rate     .001
Generations       100
The averaged experimental results are
summarized on the chart in Figure 3. Let t be the
time, in seconds, required to evolve this
population 100 generations. The algorithmic
complexity of the fitness function based on the
data used here appears to fit the equation
t=F(n)=c*n log2 n very well. In this case the
constant c is between .1 and .2.
[Figure 3. Sec/100 Generations versus n = Stock Set Size (Number of Stocks): averaged data plotted with the curves Time = c * n log2 n for c=0.1 and c=0.2.]
In order to determine whether the GA run
time was dominated by changes in n or by
changes in overall chromosome length, we ran an
additional set of experiments in which n, the
number of securities, remains constant but the
chromosome length changes based on the field
size.
The number of bits in a field determines the
minimum allocation proportion for any stock in
the portfolio and the maximum number of
securities in the portfolio with allocations greater
than 0. Increasing the number of securities in the
portfolio also makes it desirable to increase the
field size to reduce the likelihood of index
collision. So, in practice these two values should
be correlated. But for our experiments, we used
a constant number of fields (securities), n=32,
and varied the size of each field from 3 to 7,
effectively changing the chromosome length
from 96 to 224 bits. With the exception of field
size, all other parameters from Table 5 remained
the same.
The averaged results from sets of 25
experiments in each configuration, summarized
in the chart in Figure 4, show that increasing the
number of bits in each field has a more gradual
effect on time complexity than increasing n, the
number of fields. The graph shows
linear growth in time with increasing
chromosome length. For this population,
t=a+c*(bits per field), with a=14.3 and c=0.49.
The effect of increasing the number of bits in the
chromosome, without increasing n, may be to
increase the time required for the mutation
operation, which is dependent only on the total
number of bits.
[Figure 4. Sec./100 Generations versus FIELDSIZE (Bits per Field), 3 to 7: averaged data plotted with the line Time = a + c * FIELDSIZE, c=0.49, a=14.3.]
Therefore, the empirical data from these
experiments confirm that overall algorithmic
time complexity for this GA application is
affected mainly by changes in the number of
securities in the portfolio, n, and that the order of
expected time complexity for the fitness function
and the GA is O(n log n) with small constants.
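To make the two fitted timing models concrete, the sketch below evaluates them at the experimental settings. It assumes the forms given in the Figure 3 and Figure 4 legends, Time = c * n log2 n and Time = a + c * FIELDSIZE; the function names are ours, and the outputs are model predictions rather than measured data.

```python
import math


def nlogn_time(n, c):
    """Figure 3 model: seconds per 100 generations for n stocks."""
    return c * n * math.log2(n)


def field_size_time(bits, a=14.3, c=0.49):
    """Figure 4 model: seconds per 100 generations at n=32."""
    return a + c * bits


print("n    c=0.1    c=0.2")
for n in (8, 16, 32, 64, 100):
    print(f"{n:<4} {nlogn_time(n, 0.1):7.1f}  {nlogn_time(n, 0.2):7.1f}")

print("bits  chromosome length  seconds")
for bits in range(3, 8):
    print(f"{bits:<5} {32 * bits:<18} {field_size_time(bits):.2f}")
```

Under these models, going from n=8 to n=100 multiplies the predicted time by roughly 28, while more than doubling the chromosome length from 96 to 224 bits adds only about 2 seconds, which is the contrast the section draws.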
4. The Deme Modification
A natural modification for improving time
efficiency for the efficient set GA over multiple
single population runs is the use of a deme
model. In a deme model, subpopulations evolve
independently and migrate their most highly fit
members periodically. The deme model was
designed to allow parallel evolution on a
multiprocessor system. In addition to improving
time efficiency the deme model may improve the
capability of the efficient set GA. For the
efficient set GA the deme model appears to be
better than multiple single runs because it
provides for alternating periods of local
hill-climbing and global competition between
improved local optima.
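The alternation between independent subpopulation evolution and periodic migration can be sketched as below. This is a toy illustration, not the GAlib-based implementation used for the experiments: the bit-counting fitness, the ring migration topology, the one-child-per-step replacement, and all names and parameter values are assumptions chosen to keep the example short. Demes are evolved one after another, as on a single processor.

```python
import random


def fitness(bits):
    """Toy stand-in objective (count of 1 bits), not the portfolio fitness."""
    return sum(bits)


def steady_state_step(deme, rng, mutation_rate=0.01):
    """Breed one child and let it replace the deme's worst member."""
    weights = [fitness(m) + 1 for m in deme]            # roulette-wheel selection
    mom, dad = rng.choices(deme, weights=weights, k=2)
    a, b = sorted(rng.sample(range(len(mom) + 1), 2))   # two-point crossover
    child = mom[:a] + dad[a:b] + mom[b:]
    child = [bit ^ (rng.random() < mutation_rate) for bit in child]
    worst = min(range(len(deme)), key=lambda i: fitness(deme[i]))
    deme[worst] = child


def migrate(demes):
    """Copy each deme's best member over the worst member of the next deme."""
    best = [max(d, key=fitness) for d in demes]
    for i, deme in enumerate(demes):
        worst = min(range(len(deme)), key=lambda j: fitness(deme[j]))
        deme[worst] = list(best[i - 1])  # ring topology (assumed)


def run_demes(num_demes=5, deme_size=20, length=32,
              epochs=10, gens_per_epoch=20, seed=0):
    rng = random.Random(seed)
    demes = [[[rng.randint(0, 1) for _ in range(length)]
              for _ in range(deme_size)] for _ in range(num_demes)]
    history = []
    for _ in range(epochs):
        for deme in demes:                  # local phase: demes evolve alone
            for _ in range(gens_per_epoch):
                steady_state_step(deme, rng)
        migrate(demes)                      # global phase: best members move
        history.append(max(fitness(m) for d in demes for m in d))
    return history


print(run_demes())  # best fitness across all demes after each epoch
```

Replacing only the worst member in each step keeps every deme's best individual alive, so the best fitness recorded per epoch never decreases in this sketch.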
The single population and deme models are
compared here based on the number of
generation steps rather than absolute GA
runtime. There are several reasons for this. The
deme modification that was implemented for
these experiments requires a steady state GA
framework, one in which only a small proportion
of deme members are replaced every generation.
The deme model has a migration step, which is
absent from the single population model. And,
although the deme model can be run on a
multiprocessor system, our implementation was
designed for a single processor system with deme
evolution implemented sequentially. Because of
the disparity in the models, this comparison is
based on the quality of optimal results between
the two models after 2000 generation steps rather
than on absolute runtime. A generation step for
the deme model includes one generation in each
of the subpopulations, since the deme model can
potentially allow evolution to proceed
simultaneously for each subpopulation on a
multiprocessor system.
The GA parameters are shown in Table 6.
Two sets of experiments were run for the single
population model. In the first set, mutation and
crossover rates were the highest possible (within
.001 and .1 increments, respectively) that would
still allow convergence and guarantee a final
local hill-climbing phase. The second set was
run at a much higher mutation rate with no
convergence (there were fewer than 3 matched
population members in any final generation
from set #2) to see whether a more random
search could find a better solution. In general,
higher mutation rates
provide the longest possible solution space
exploration phase in the efficient set GA, which
may be a result of the potentially highly
multimodal solution space.
Table 6.
Experiment Set   1 (10 runs)   2 (10 runs)   3 (10 runs)
GA Type          Simple        Simple        Steady State/Deme
Populations      1             1             5
Gens             2000          2000          100
Epochs           N/A           N/A           20 Gens/Epoch
Replace/Gen      All           All           5
PopSize          100           100           100
Xover            2-Pt          2-Pt          2-Pt
Select           Roulette      Roulette      Roulette
Fields           6             6             6
Stock Set        10            10            10
Xover Rate       .6            .6            .7
Mut Rate         .001          .007          .007
The fitness profile (Figure 5) from one of the
deme runs in set #3 demonstrates the alternating
influences of local improvement with total
population competition during evolution. The
results of the three sets of experiments, in terms
of the statistics for the generated portfolios, are
shown in Table 7. While the empirical data is
very limited, it serves to illustrate how well the
deme model works for this GA. The portfolio
statistics, which reflect optimal fitness statistics,
show that the deme model has the potential to
produce results comparable to and better than the
single population GA for the same time resource
or number of generations. Assuming the time for
the migration step is minimal, the time resource
required for multiple GA runs can be cut
geometrically by the use of the deme model with
no loss of capability.
[Figure 5. Deme GA: scaled fitness (Best of Gen and Ave of Gen) versus generations.]
Table 7. Portfolio statistics
       Return               Risk
Set    Mean    Std. Dev.    Mean    Std. Dev.
1      .325    .069         .485    .122
2      .366    .074         .426    .022
3      .398    .063         .424    .017
Conclusions
We compared a simple GA approach to
solving the efficient set problem to the more
traditional quadratic programming approach
using covariances. The GA can simultaneously
minimize risk and maximize expected return,
whereas the quadratic programming approach
must fix the expected return in advance. This flexibility
allows the GA to discover portfolio opportunities
that the more traditional approach misses.
We also examined the expected time
complexity of the GA solution. Our experiments
show that when the GA is run with portfolios
smaller than n=100 stocks, the expected time
complexity of the genetic algorithm is O(n log n)
with a very small constant. This is greatly
superior to the time complexity of quadratic
programming. Moreover, the GA complexity
can be primarily attributed to the fitness function,
which produces a portfolio allocation from an
indirect solution representation. The
representation style is advantageous because it
eliminates the possibility of infeasible solutions
and the need for penalty functions. Additional
experiments demonstrate that the O(n log n)
complexity of the fitness function overshadows
the linear relationship between overall length of
the chromosome and GA runtime.
Finally, we demonstrated the effectiveness of
using demes for this GA. This modification was
shown to have the potential of finding solutions
comparable and possibly superior to those gained
from multiple single population runs. A
contributing factor to the success of this type of
modification may be the highly multimodal
character of the potential solution space in the
efficient set problem.
Acknowledgements
The software for this work used the GAlib
genetic algorithm package, written by Matthew
Wall at the Massachusetts Institute of
Technology.
References
[1] Haugen, R.A., Modern Investment Theory,
Prentice Hall Inc., Englewood Cliffs, N.J., 1993.
[2] Markowitz, H.M., Portfolio Selection, Basil
Blackwell, Inc., Cambridge, MA, 1991.
[3] Shoaf, J.S. and Foster, J.A., “A Genetic Algorithm
Solution to the Efficient Set Problem: A Technique for
Portfolio Selection Based on the Markowitz Model”,
Tech Report, Dept. of Computer Science, Univ. of
Idaho, Moscow, ID, 1995.
[4] Shoaf, J.S. and Foster, J.A., “A Genetic Algorithm
Solution to the Efficient Set Problem: A Technique for
Portfolio Selection Based on the Markowitz Model”,
Proc. 1996 Annual Meeting, Vol. 2, Decision
Sciences Institute, Orlando, FL, 1996, pp. 571-573.
[5] Smith, H.A., Data Structures: Form and Function,
Harcourt Brace Jovanovich, Inc., San Diego, CA,
1987.