On-Line Portfolio Selection Using Multiplicative Updates
Written by David P. Helmbold (Cal), Robert E. Schapire (Cal), Yoram Singer (AT&T) and Manfred K. Warmuth (Cal)
Presented by Ryan M. McCabe

Goal
Given a menu of a fixed number of stocks, we want to make as much money as possible without relying too much on luck.
We'll compare our results with the performance of three benchmarks: the best single stock, another on-line learner (Cover's Universal Portfolio), and a batch learner (the BCRP).

Context
Remember, this is on-line learning.
Unlike in batch learning, the data comes to us as a stream, and we learn from each example as it arrives.
Still, we do not want to completely ignore what we have learned from history.

More Context
We have a set of stocks.
We have some wealth.
Every day we get a report on the stocks.
Every day we update our current wealth based on the stocks' performance the previous day.
Every day we reallocate our wealth over the stocks.

Preliminaries
We have N stocks.
$w$ is a vector of weights over the N stocks: $w_i \ge 0$ for every $i$, and $\sum_{i=1}^{N} w_i = 1$.
There are T days in total; the superscript $t$ denotes a specific day.

Preliminaries
$w^t$ is the vector of weights at time $t$; it is chosen at the beginning of day $t$.
$x^t$ is the vector of relative performance (price relatives) of all the stocks over day $t$: $x_i^t$ = closing price of stock $i$ on day $t$ / opening price of stock $i$ on day $t$.
On day $t$ our wealth is multiplied by $w^t \cdot x^t$. Starting from wealth 1, the wealth after T days is therefore $\prod_{t=1}^{T} w^t \cdot x^t$.
We change $w^t$ every day according to some update rule.

Follow-Ups
If we have time at the end of this presentation, we'll talk about some things of practical importance: transaction costs, side information, and implementation details.

Four Types of Portfolio Managers
(Best) Constant-Rebalanced Portfolio
Cover's Universal Portfolio
Exact Exponentiated Gradient (ExactEG(η))
Approximate Exponentiated Gradient (EG(η))

Constant-Rebalanced Portfolios
A CRP uses the same weights every day. The wealth is redistributed over the N stocks each day, but $w^t$ stays the same for $t = 1, \dots, T$.
The best CRP is found by looking back over all T days of data; this is our batch method.
$w^*$ denotes the weight vector that maximizes the final wealth $\prod_{t=1}^{T} w \cdot x^t$ over the given sequence $x^1, \dots, x^T$.
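A CRP's final wealth, and the BCRP itself, can be computed directly from the definitions above. The sketch below makes a few assumptions:

- NumPy is available.
- The price relatives are stored as a T × N array, with one row per day.
- The function names are hypothetical.
- The BCRP is found by brute-force grid search over two stocks. This is a simple stand-in, not the optimization method used in the paper.

```python
import numpy as np


def crp_wealth(w, X):
    """Final wealth of a constant-rebalanced portfolio, starting from wealth 1.

    w: length-N weight vector (nonnegative, sums to 1), used every day.
    X: T x N array whose row t holds the price relatives x^t.
    """
    # Rebalancing back to w each day means wealth is multiplied by w . x^t.
    return float(np.prod(X @ w))


def best_crp_two_stocks(X, grid=1001):
    """Approximate the BCRP w* for N = 2 by grid search over w_1 in [0, 1].

    Brute force in hindsight: it needs all T days of X, like any batch method.
    """
    best_w, best_wealth = None, -np.inf
    for a in np.linspace(0.0, 1.0, grid):
        w = np.array([a, 1.0 - a])
        wealth = crp_wealth(w, X)
        if wealth > best_wealth:
            best_w, best_wealth = w, wealth
    return best_w, best_wealth


# Hypothetical data: asset 1 is cash, asset 2 alternately doubles and halves.
X = np.array([[1.0, 2.0], [1.0, 0.5]] * 10)
print(crp_wealth(np.array([0.0, 1.0]), X))  # asset 2 alone: prints 1.0
print(best_crp_two_stocks(X))
```

On this made-up pair, each asset held alone ends exactly where it started. The CRP that keeps half the wealth in each asset does better: it multiplies wealth by 1.5 × 0.75 = 1.125 every two days. This is the kind of gain that daily rebalancing captures and that a buy-and-hold single stock cannot.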
$w^*$ is associated with the Best Constant-Rebalanced Portfolio (BCRP).

Cover's Universal Portfolio
Another on-line method: $w^t$ is updated every day.
$w^t$ is a weighted average over all feasible portfolios. Each constant-rebalanced portfolio is weighted by the wealth it would have achieved so far.
Guarantees the same asymptotic growth rate as the BCRP for any sequence of $x^t$.
Exponential complexity in N.

Exact Exponentiated Gradient
Remember on-line regression?
$F(w^{t+1}) = \eta \log(w^{t+1} \cdot x^t) - d(w^{t+1}, w^t)$
Maximize $F(w^{t+1})$ over $w^{t+1}$, given $w^t$ and $x^t$.
$\log(w^{t+1} \cdot x^t)$ rewards wealth: it is the log wealth $w^{t+1}$ would earn if the price relatives stayed at $x^t$.
$d(w^{t+1}, w^t)$ penalizes moving too far from $w^t$.
$\eta$, the learning rate, shifts the relative importance of the two terms.
But $F(w^{t+1})$ is difficult to maximize, so how do we learn $w^{t+1}$?
We use an approximation, built from two choices and some hand-waving:
- a first-order Taylor approximation of the first term around $w^{t+1} = w^t$;
- the relative entropy $d(u, v) = \sum_{i} u_i \log(u_i / v_i)$ as the distance in the penalty term.
This gives the EG(η) update.

Exponentiated Gradient Update
$w_i^{t+1} = \dfrac{w_i^t \exp\big(\eta x_i^t / (w^t \cdot x^t)\big)}{\sum_{j=1}^{N} w_j^t \exp\big(\eta x_j^t / (w^t \cdot x^t)\big)}$
This approximate version performs almost indistinguishably from ExactEG(η), which maximizes $F(w^{t+1})$ exactly.
Each update takes only time linear in N.

Quick Recap
So now we have defined our four methods:
Best Constant-Rebalanced Portfolio (BCRP)
Cover's Universal On-Line Portfolio
ExactEG(η)
EG(η) (the approximate update)
Let's see how they perform under pressure…

The Experiments
22 years of NYSE data (T > 5,000 trading days)
36 equities (N could be any of 2, 3, …, 36); usually 2- or 3-stock subsets were used.
Reproduced each of Cover's experiments.
Stocks were chosen for their volatility.
Found the BCRP, then ran $w^*$ from the beginning.
Ran EG(η) and ExactEG(η) from the beginning.
Commercial Metals and Kin Ark (Figure 5.1)
IBM and Coca Cola (Figure 5.2)
Gulf, HP, and Schlum (Figure 5.3)
Volatility Elasticity (Table 5.5)

Results Analysis Summary
EG(η) and ExactEG(η) always finished within about 1% of each other, with EG(η) running much faster.
The BCRP always did the best.
EG(η) always outperformed Cover's Universal Portfolio, despite the Universal Portfolio's superior worst-case bound.
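The EG(η) update and Cover's Universal Portfolio described earlier can be compared on the same data. The sketch below rests on several assumptions:

- NumPy is available, and price relatives are stored as a T × N array.
- The function names are hypothetical.
- The starting portfolio $w^1$ is uniform. The slides leave the choice of $w^1$ open.
- The learning rate is an arbitrary illustrative choice.
- Cover's method is approximated by averaging over a finite grid of two-stock CRPs instead of integrating over all feasible portfolios. This is a simplification, not the paper's implementation.

```python
import numpy as np


def eg_weights(w, x, eta):
    """One EG(eta) step: w_i <- w_i * exp(eta * x_i / (w . x)), then renormalize."""
    w_new = w * np.exp(eta * x / np.dot(w, x))
    return w_new / w_new.sum()


def run_eg(X, eta, w1=None):
    """Run EG(eta) over a T x N array of price relatives.

    Returns the final wealth (starting from 1) and the last weight vector.
    The uniform default for w^1 is an assumption.
    """
    _, N = X.shape
    w = np.full(N, 1.0 / N) if w1 is None else np.asarray(w1, dtype=float)
    wealth = 1.0
    for x in X:
        wealth *= float(np.dot(w, x))  # trade day t with w^t, then observe x^t
        w = eg_weights(w, x, eta)  # choose w^{t+1}; O(N) work per day
    return wealth, w


def run_universal_two_stocks(X, grid=201):
    """Cover's Universal Portfolio for N = 2, approximated on a grid of CRPs.

    Each day's portfolio averages the grid CRPs, weighting each one by the
    wealth it has accumulated so far (uniform prior over the grid points).
    """
    a = np.linspace(0.0, 1.0, grid)
    crps = np.stack([a, 1.0 - a], axis=1)  # grid x 2, one CRP per row
    crp_wealth = np.ones(grid)  # wealth of each CRP so far
    wealth = 1.0
    for x in X:
        w = crp_wealth @ crps / crp_wealth.sum()  # wealth-weighted average
        wealth *= float(np.dot(w, x))
        crp_wealth *= crps @ x
    return wealth


X = np.array([[1.0, 2.0], [1.0, 0.5]] * 10)  # hypothetical toy data
print(run_eg(X, eta=0.05)[0])  # eta = 0.05 is an arbitrary choice
print(run_universal_two_stocks(X))
```

On this toy sequence, both methods end with more wealth than either single asset but less than the best CRP. The grid version also shows why the exact method scales badly. Covering the set of portfolios at a fixed resolution needs a number of grid points that grows exponentially with N, while each EG(η) step touches each of the N weights once.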
Talking Points
"[S]urprisingly, the wealth achieved by the EG(η) update was larger than the wealth achieved by the universal portfolio algorithm. This outcome is contrary to the superior worst-case bounds proved for the universal portfolio algorithm."
These are bounds on how far each algorithm's average daily log growth can fall below the BCRP's:
Cover: $O\big((N \log T)/T\big)$
EG(η): $O\big(\sqrt{(\log N)/T}\big)$
Any ideas why?

Talking Points
So, the size of N affected the relative running times, but how did stock volatility affect the relative overall wealth?
Would running time matter in this domain if the algorithms were used in practice?
Why did it matter so much to the authors?

Follow-Up
Transaction Costs
Scottrade.com charges 7 dollars per transaction.
Would you update every stock every day?
Side Information
Side information takes one of K finite states and is available to the algorithm.
Computationally this is the same as running K versions in parallel. So it is no big deal, and it may increase wealth.
Implementation Details
How do we pick η?
How do we pick $w^1$?

Done
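The follow-up remark that side information costs about the same as running K versions in parallel can be illustrated with EG(η). This sketch follows one reading of that remark:

- Keep a separate EG(η) weight vector for each of the K states.
- On each day, invest with, and update, only the vector for the observed state.

The function names, the uniform starting weights, the toy data, and the learning rate are assumptions, not the paper's settings.

```python
import numpy as np


def eg_step(w, x, eta):
    """EG(eta) update: w_i <- w_i * exp(eta * x_i / (w . x)), renormalized."""
    w_new = w * np.exp(eta * x / np.dot(w, x))
    return w_new / w_new.sum()


def run_eg_with_side_info(X, states, K, eta):
    """Run K independent EG(eta) learners, one per side-information state.

    X: T x N price relatives; states[t] in {0, ..., K-1} is the state observed
    before trading on day t. Only that state's learner invests and is updated.
    """
    _, N = X.shape
    W = np.full((K, N), 1.0 / N)  # one weight vector per state, uniform start
    wealth = 1.0
    for x, s in zip(X, states):
        wealth *= float(np.dot(W[s], x))  # trade with the current state's weights
        W[s] = eg_step(W[s], x, eta)  # update only that state's learner
    return wealth, W


# Hypothetical data: the state reveals whether asset 2 doubles (0) or halves (1).
X = np.array([[1.0, 2.0], [1.0, 0.5]] * 10)
with_info, W = run_eg_with_side_info(X, [0, 1] * 10, K=2, eta=0.05)
without_info, _ = run_eg_with_side_info(X, [0] * 20, K=1, eta=0.05)
print(with_info, without_info)
```

Only one weight vector is touched per day, so each day still costs a single EG(η) update of N weights. The extra cost is memory for K weight vectors. On this toy data, the informed run learns to lean toward asset 2 in state 0 and toward cash in state 1, and so ends with more wealth than the uninformed run.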