On-Line Portfolio Selection Using Multiplicative Updates
Written by David P. Helmbold (UC Santa Cruz), Robert E. Schapire (AT&T), Yoram Singer (AT&T) and Manfred K. Warmuth (UC Santa Cruz)
Presented by Ryan M. McCabe
Goal

Within a menu of a fixed number of stocks, we want to make as much money as possible without relying too much on luck

We'll compare our results against the best single stock, another on-line learner (Cover's universal portfolio) and a batch learner (BCRP)
Context



Remember, this is on-line learning
Unlike batch learning, the data is coming to us
in a stream, and we learn from each example
Still, we do not want to completely ignore what
we have learned from history
More Context





We have a bunch of stocks
We have some wealth
Every day we get a report on the stocks
Every day we update our current wealth, based
on their performance yesterday
Every day we re-allocate our wealth over the
stocks
Preliminaries


We have N stocks
w is a vector of weights over the N stocks
 The weights w_i, i = 1 to N, sum to 1
 Every w_i >= 0

There are T trading days in total; a superscript t denotes a specific day
Preliminaries

w^t is the vector of weights at time t

x^t is the vector of relative performance of all the stocks over the course of day t

w^t is chosen at the beginning of day t
x_i^t = closing price of stock i on day t / opening price of stock i on day t
Over day t, wealth is multiplied by w^t · x^t
We change w^t every day in some way
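As a rough illustration of the bookkeeping above, the sketch below multiplies the daily factors w^t · x^t together to track wealth. The function name, the starting wealth of 1 and the price relatives are made up for the example; none of them come from the paper.

```python
import numpy as np

def cumulative_wealth(weights, price_relatives, initial_wealth=1.0):
    """Wealth after each day, given one portfolio and one price-relative vector per day.

    weights[t] plays the role of w^t (non-negative, sums to 1),
    price_relatives[t] plays the role of x^t.
    """
    weights = np.asarray(weights, dtype=float)
    price_relatives = np.asarray(price_relatives, dtype=float)
    # Daily growth factor w^t . x^t, one number per day.
    daily_factors = np.sum(weights * price_relatives, axis=1)
    return initial_wealth * np.cumprod(daily_factors)

# Two hypothetical stocks over three days (illustrative numbers only).
x = np.array([[1.10, 0.95],
              [0.90, 1.05],
              [1.20, 1.00]])
w = np.full_like(x, 0.5)  # keep half of the wealth in each stock every day
wealth = cumulative_wealth(w, x)
print(wealth)
```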
Follow-Ups

If we have time at the end of this presentation,
we’ll talk about some things of practical
importance
Transaction costs
 Side information
 Implementation details

Four Types of Portfolio Managers




(Best) Constant-Rebalanced Portfolio
Cover Universal Portfolio
Exact Exponentiated Gradient (ExactEG(h))
Approximate Exponentiated Gradient (EG(h))
Constant-Rebalanced Portfolios




In the best CRP, the weight vector is learned over all T days by looking back over the data (this is our batch method)
Although the wealth is redistributed every day over the N stocks, w^t stays the same for t = 1…T
w* denotes the weight vector that maximizes wealth over the given x^t, t = 1…T
w* defines the Best Constant-Rebalanced Portfolio (BCRP)
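A minimal sketch of the batch step for N = 2, assuming a plain grid search over the first stock's weight is good enough. The grid search, the function names and the toy data are assumptions for illustration, not the procedure used in the paper.

```python
import numpy as np

def crp_wealth(w, price_relatives):
    """Final wealth (starting from 1) of a constant-rebalanced portfolio w."""
    x = np.asarray(price_relatives, dtype=float)
    return float(np.prod(x @ np.asarray(w, dtype=float)))

def best_crp_two_stocks(price_relatives, grid_size=1001):
    """Grid-search approximation to w* when there are exactly two stocks."""
    best_w, best_wealth = None, -np.inf
    for a in np.linspace(0.0, 1.0, grid_size):
        w = np.array([a, 1.0 - a])
        wealth = crp_wealth(w, price_relatives)
        if wealth > best_wealth:
            best_w, best_wealth = w, wealth
    return best_w, best_wealth

# Toy data: stock 1 is flat, stock 2 alternately doubles and halves.
x = np.array([[1.0, 2.0], [1.0, 0.5]] * 10)
w_star, wealth_star = best_crp_two_stocks(x)
print(w_star, wealth_star)
```

On this toy sequence either stock held alone ends where it started, while rebalancing to an even split every day grows the wealth, which is why a CRP can beat the best single stock.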
Cover Universal Portfolio

Another on-line method
w^t is updated every day
 w^t is a performance-weighted average over all feasible constant-rebalanced portfolios

Guarantees the same asymptotic growth rate as BCRP for any given sequence of x^t
Running time is exponential in N
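To make the "weighted average over portfolios" idea concrete, here is a sketch for N = 2 that replaces the average over all portfolios with an average over a finite grid of them. The grid approximation, the names and the toy data are assumptions made for this example. With a grid of g points per dimension the number of candidates grows like g^(N-1), which hints at where the exponential cost in N comes from.

```python
import numpy as np

def universal_portfolio_two_stocks(price_relatives, grid_size=201):
    """Grid approximation of a universal-portfolio style update for two stocks."""
    x = np.asarray(price_relatives, dtype=float)
    a = np.linspace(0.0, 1.0, grid_size)
    candidates = np.column_stack([a, 1.0 - a])  # each row is one CRP
    crp_wealth = np.ones(grid_size)              # wealth so far of each candidate
    total_wealth = 1.0
    portfolios = []
    for x_t in x:
        # Today's portfolio: candidates averaged with weights equal to their wealth so far.
        w_t = crp_wealth @ candidates / crp_wealth.sum()
        portfolios.append(w_t)
        total_wealth *= float(w_t @ x_t)
        # Each candidate CRP earns its own daily factor.
        crp_wealth *= candidates @ x_t
    return np.array(portfolios), total_wealth

# Toy data: stock 1 is flat, stock 2 alternately doubles and halves.
x = np.array([[1.0, 2.0], [1.0, 0.5]] * 10)
portfolios, total_wealth = universal_portfolio_two_stocks(x)
print(portfolios[0], total_wealth)
```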
Exact Exponentiated Gradient

Remember on-line regression?

F(w^{t+1}) = h log(w^{t+1} · x^t) - d(w^{t+1}, w^t)
Maximize F(w^{t+1}) over w^{t+1}, given w^t and x^t
 log(w^{t+1} · x^t): rewards wealth, assuming tomorrow's x looks like x^t
 d(w^{t+1}, w^t): penalizes moving too far from w^t
 h: the learning rate, shifts importance between the two terms


But F(wt+1) is difficult to maximize
How do we learn w^t?
So we use an approximation
Using a first-order Taylor approximation of the
first term at wt+1 = wt and a relative entropy
distance measure for the second penalty term,
waving some hands, we get the EG(h) update:
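The formula itself did not survive in this copy of the slides. For reference, the standard form of the exponentiated gradient update, written with the symbols used here (h for the learning rate, relative entropy as the distance d), is:

```latex
d(\mathbf{u}, \mathbf{v}) = \sum_{i=1}^{N} u_i \log \frac{u_i}{v_i},
\qquad
w_i^{t+1} =
  \frac{w_i^{t} \exp\!\left( h \, x_i^{t} / (\mathbf{w}^{t} \cdot \mathbf{x}^{t}) \right)}
       {\sum_{j=1}^{N} w_j^{t} \exp\!\left( h \, x_j^{t} / (\mathbf{w}^{t} \cdot \mathbf{x}^{t}) \right)}
```

The term x_i^t / (w^t · x^t) is the i-th component of the gradient of log(w · x^t) at w = w^t, and the denominator renormalizes the weights so they sum to 1.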
Exponentiated Gradient Update


This approximate version performs almost indistinguishably from ExactEG(h), which maximizes
F(w^{t+1}) = h log(w^{t+1} · x^t) - d(w^{t+1}, w^t) exactly
Its cost per update is only linear in N
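A minimal sketch of the multiplicative update in code, assuming a uniform starting portfolio and a placeholder learning rate of 0.05. Both choices, and the function names, are assumptions for illustration. Each update touches every stock once, which is the linear cost in N.

```python
import numpy as np

def eg_update(w, x, h=0.05):
    """One exponentiated gradient step: reweight multiplicatively, then renormalize."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    gradient = x / np.dot(w, x)              # gradient of log(w . x) evaluated at w
    unnormalized = w * np.exp(h * gradient)  # multiplicative update
    return unnormalized / unnormalized.sum()

def run_eg(price_relatives, h=0.05):
    """Run the update over a whole sequence; returns the final portfolio and wealth."""
    x = np.asarray(price_relatives, dtype=float)
    n = x.shape[1]
    w = np.full(n, 1.0 / n)  # assumed starting point: uniform portfolio
    wealth = 1.0
    for x_t in x:
        wealth *= float(np.dot(w, x_t))  # earn today's factor with the current portfolio
        w = eg_update(w, x_t, h)         # then learn from today's price relatives
    return w, wealth

w_next = eg_update([0.5, 0.5], [1.2, 0.9])
print(w_next)  # weight shifts toward the stock that did better
```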
Quick ReCap

So now we have defined our four methods
Best Constant-Rebalanced Portfolio (BCRP)
 Cover's Universal Portfolio
 Exact EG(h)
 Approximate EG(h)

Let’s see how they perform under pressure…
The Experiments



22 years of NYSE data (T > 5,000)
36 equities (N = {2, 3,…,36})
Usually 2- or 3-stock subsets were used
Reproduced each Cover experiment
 Stocks chosen for volatility reasons



Found BCRP, then ran w* through from the
beginning
Ran EG(h), ExactEG(h) through from the
beginning
Commercial Metals and Kin Ark
(Figure 5.1)
IBM and Coca Cola (Figure 5.2)
Gulf, HP, and Schlumberger (Figure 5.3)
Volatility Elasticity (Table 5.5)
Results Analysis Summary



EG(h) and ExactEG(h) always finished within about 1% of each other, with EG(h) running much faster
BCRP always did best
EG(h) always outperformed Cover's Universal Portfolio, despite Cover's superior analytical worst-case bound
Talking Points




“[S]urprisingly, the wealth achieved by the
EG(h) update was larger than the wealth
achieved by the universal portfolio algorithm.
This outcome is contrary to the superior worst-case bounds proved for the universal portfolio
algorithm.”
Worst-case gap in average daily log-wealth relative to BCRP:
 Cover = O((N log T)/T)
 EG(h) = O(√((log N)/T))
Any ideas why?
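As a starting point for the discussion, the two rates can be put side by side numerically. This sketch drops all constants hidden by the O(.) notation, so the numbers only indicate how the bounds scale. They say nothing about actual performance, and the function names are made up for the example.

```python
import math

def cover_rate(n_stocks, n_days):
    """(N log T) / T with constants dropped."""
    return n_stocks * math.log(n_days) / n_days

def eg_rate(n_stocks, n_days):
    """sqrt((log N) / T) with constants dropped."""
    return math.sqrt(math.log(n_stocks) / n_days)

for n_days in (100, 5000, 1_000_000):
    print(n_days, cover_rate(2, n_days), eg_rate(2, n_days))
```

For two stocks the universal portfolio rate is the larger of the two on short horizons and the smaller one once T is in the thousands, as in these experiments.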
Talking Points

So, the size of N affected relative running times,
but how did stock volatility affect relative overall
wealth?

Would running time matter in this domain if the
algorithms were applied? Why did it matter so
much to the authors?
Follow Up

Transaction Costs
Scottrade.com charges $7 per transaction
 Would you update every stock every day?


Side Information
A finite number K of side-information states, available to the algorithm
 Computationally the same as running K versions in parallel, so no big deal and may increase wealth
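One way to read "K parallel versions" is to keep one portfolio per side-information state and, each day, use and update only the portfolio that matches that day's state. The sketch below follows that reading. The function names, the integer encoding of states as 0..K-1, the uniform starting portfolios and the learning rate are all assumptions for illustration.

```python
import numpy as np

def eg_update(w, x, h=0.05):
    """One exponentiated gradient step (multiplicative update, then renormalize)."""
    unnormalized = w * np.exp(h * x / np.dot(w, x))
    return unnormalized / unnormalized.sum()

def run_eg_with_side_info(price_relatives, side_states, n_states, h=0.05):
    """Keep one EG portfolio per side-information state; use the one matching today's state."""
    x = np.asarray(price_relatives, dtype=float)
    n = x.shape[1]
    portfolios = np.full((n_states, n), 1.0 / n)  # every state starts from a uniform portfolio
    wealth = 1.0
    for x_t, state in zip(x, side_states):
        w = portfolios[state]                      # the state is seen before investing
        wealth *= float(np.dot(w, x_t))
        portfolios[state] = eg_update(w, x_t, h)   # only this state's portfolio learns today
    return portfolios, wealth

# Toy run: two stocks, two states, made-up data.
x = np.array([[1.2, 0.9], [0.8, 1.1], [1.2, 0.9], [0.8, 1.1]])
states = [0, 1, 0, 1]
portfolios, wealth = run_eg_with_side_info(x, states, n_states=2)
print(portfolios, wealth)
```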


Implementation Details
How do we pick h?
 How do we pick w^1?
