Understanding the Kelly Criterion

Column overall title: A Mathematician on Wall Street Column 23 Understanding The Kelly Criterion by Edward O. Thorp Copyright 2008 In January 1961 I spoke at the annual meeting of the American Mathematical Society on “Fortune’s Formula: The Game of Blackjack.” This announced the discovery of favorable card counting systems for blackjack. My 1962 book Beat the Dealer explained the detailed theory and practice. The “optimal” way to bet in favorable situations was an important feature. In Beat the Dealer I called this, naturally enough, “The Kelly gambling system,” since I learned about it from the 1956 paper by John L. Kelly. (Claude Shannon, who refereed the Kelly paper, brought it to my attention in November of 1960.) I’ve continued to use it successfully in gambling and in investing. Since 1966 I’ve called it “the Kelly criterion” in my articles. The rising tide of theory about and practical use of the Kelly Criterion by several leading money managers received further impetus from William Poundstone’s readable book about the Kelly Criterion, Fortune’s Formula. (As this title came from that of my 1961 talk, I was asked 3/6/2016 1 to approve the use of the title.) At a value investor’s conference held in Los Angeles in May, 2007, my son reported that “everyone” said they were using the Kelly Criterion. The Kelly Criterion is simple: bet or invest so as to maximize (after each bet) the expected growth rate of capital, which is equivalent to maximizing the expected value of the logarithm of wealth. But the details can be mathematically subtle. Since they’re not covered in Poundstone (2005) you may wish to refer to my article “The Kelly Criterion in Blackjack, Sports Betting and the Stock Market,” Handbook of Asset and Liability Management, Volume I, Zenios and Ziemba editors, Elsvier 2006 (also available on my website www.EdwardOThorp.com). Hedge fund manager Mohnish Pabrai, in his new book The Dhandho Investor, gives examples of the use of the Kelly Criterion for investment situations. (Pabrai won the bidding for this year’s lunch with Warren Buffett, paying over $600,000.) Consider his investment in Stewart Enterprises (pages 108-115). His analysis gave what he believed to be a list of worst case scenarios and payoffs over the next 24 months which I summarize in Table 1. Table 1 Stewart Enterprises, Payoff Within 24 Months 3/6/2016 2 Probability Return p1  0.80 R1  100% p2  0.19 R2  0% p3  0.01 R3  100% ______________________________ Sum=1.00 The expected growth rate of capital g  f  if we bet a fraction f of our net worth is (1) 3 g  f    pi ln 1  Ri f  i 1 where ln means the logarithm to the base e. When we use Table 1 to replace the pi by their values and the Ri by their lower bounds this gives the conservative estimate for g  f (2)  in equation (2): g  f   0.80 ln 1  f   0.01ln 1  f  Setting g ‵ f   0 and solving gives the optimal Kelly fraction f *  0.975 noted by Pabrai. Not having heard of the Kelly Criterion back in 2000, Pabrai only bet 10% of his fund on Stewart. Would he have bet more, or less, if he had then known about 3/6/2016 3 Kelly’s Criterion? Would I have? Not necessarily. Here are some of the many reasons why. (1) Opportunity costs. A simplistic example illustrates the idea. Suppose Pabrai’s portfolio already had one investment which was statistically independent of Stewart and with the same payoff probabilities. Then, by symmetry, an optimal strategy is to invest in both equally. Call the optimal Kelly fraction for each f *. Then 2 f * < 1 since 2 f *  1 has a positive probability of total loss, which Kelly always avoids. So f *  0.50. The same reasoning for n such investments gives f *  1/ n. Hence we need to know the other investments currently in the portfolio, any candidates for new investments, and their (joint) properties, in order to find the Kelly optimal fraction for each new investment, along with possible revisions for existing investments. Pabrai’s discussion (e.g. pp. 78-81) of Buffett’s concentrated bets gives considerable evidence that Buffet thinks like a Kelly investor, citing Buffett bets of 25% to 40% of his net worth on single situations. Since f *  1 is necessary to avoid total loss, Buffett must be betting more than .25 to .40 of f * in these cases. The opportunity cost principle suggests it must be higher, perhaps much higher. Here’s 3/6/2016 4 what Buffett himself says, as reported in http://undergroundvalue.blogspot.com/2008/02/notes-from-buffett-meeting2152008_23.html, notes from a Q & A session with business students. Emory: With the popularity of “Fortune’s Formula” and the Kelly Criterion, there seems to be a lot of debate in the value community regarding diversification vs. concentration. I know where you side in that discussion, but was curious if you could tell us more about your process for position sizing or averaging down. Buffett: I have 2 views on diversification. If you are a professional and have confidence, then I would advocate lots of concentration. For everyone else, if it’s not your game, participate in total diversification. If it’s your game, diversification doesn’t make sense. It’s crazy to put money in your 20th choice rather than your 1st choice. “LeBron James” analogy. If you have LeBron James on your team, don’t take him out of the game just to make room for some else. 3/6/2016 5 Charlie and I operated mostly with 5 positions. If I were running 50, 100, 200 million, I would have 80% in 5 positions, with 25% for the largest. In 1964 I found a position I was willing to go heavier into, up to 40%. I told investors they could pull their money out. None did. The position was American Express after the Salad Oil Scandal. In 1951 I put the bulk of my net worth into GEICO. With the spread between the on-the-run versus off-the-run 30 year Treasury bonds, I would have been willing to put 75% of my portfolio into it. There were various times I would have gone up to 75%, even in the past few years. If it’s your game and you really know your business, you can load up. This supports the assertion in Rachel and (fellow Wilmott columnist) Bill Ziemba’s new book, Scenarios for Risk Management and Global Investment Strategies, that Buffett thinks like a Kelly investor when choosing the size of an investment. They discuss Kelly and investment scenarios at length. Computing f * without considering the available alternative investments is one of the most common oversights I’ve seen in the use of the Kelly Criterion. It is a dangerous error because it generally overestimates f * . 3/6/2016 6 (2) Risk tolerance. As discussed at length in Thorp (2006, op. cit.), “full Kelly” is too risky for the tastes of many, perhaps most, investors and using instead an f  cf *, with fraction c where 0  c  1, or “fractional Kelly” is much more to their liking. Full Kelly is characterized by drawdowns which are too large for the comfort of many investors. (3) The “true” scenario is worse than the supposedly conservative lower bound estimate. Then we are inadvertently betting more than f * and, as discussed in Thorp (2006, op. cit.), we get more risk and less return, a strongly suboptimal result. Betting f  cf *, 0  c  1, gives some protection against this. (4) Black swans. As fellow Wilmott columnist Nassim Nicholas Taleb has pointed out so eloquently in his new bestseller The Black Swan, humans tend not to appreciate the effect of relatively infrequent unexpected high impact events. Failing to allow for these “black swans,” scenarios often don’t adequately consider the probabilities of large losses. These large loss probabilities may substantially reduce f *. (5) The “long run.” The Kelly Criterion’s superior properties are asymptotic, appearing with increasing probability as time increases. For instance: 3/6/2016 7 As time t tends to infinity the Kelly bettor’s fortune will, with probability tending to 1, permanently surpass that of any bettor following an “essentially different” strategy. The notion of “essentially different” has confounded some well known quants so I’ll take time here to explore some of its subtleties. Consider for simplicity repeated tosses of a favorable coin. The outcome of the n th trial is  n where P  n  1  p  12 and P  n  1 is 1  p  q  0. The n  are independent identically distributed random variables. The Kelly fraction is f   p  q  E  n   0. The Kelly strategy is to bet a fraction f n  f  at each trial n  1, 2, . Now consider a strategy which bets g n , n  1, 2, at each trial with gn  f  for some n  N and gn  f  thereafter. So the gn  strategy differs from Kelly on at least one of the first N trials but copies it thereafter. There is a positive probability that gn  is ahead of Kelly at time N , hence ahead for all n  N. For example consider the sequence of the first N outcomes such that n  1 if gn  f  and  n  1 if g n  f  . Then for this 3/6/2016 8 specific sequence, which has probability  q N , gn  gains more than Kelly for each n  N where g n  f , hence exceeds Kelly for all n  N. What if instead in this coin tossing example we require that gn  f  for infinitely many n ? This question arose indirectly about 15 years ago in the newsletter Blackjack Forum when a well known anti Kellyite, John Leib, challenged a well known blackjack expert with (approximately) this proposition bet: Leib would produce a strategy which differed from Kelly at every trial but would (with probability as close to 1 as you wish), after a finite number of trials, get ahead of Kelly and stay ahead forever. When I read the challenge I immediately saw how Leib could win the bet. Leib’s Paradox: Assuming capital is infinitely divisible, then given   0 there is an N  0 and a sequence   fn  with  f n  f  for all n, such that n n i 1 i 1   P Vn*  Vn for all n  N  1   where Vn   1  fi i  and Vn*   1  f *i .   Furthermore there is a b  1 such that P Vn / Vn*  b, n  N  1   and 3/6/2016 9   P Vn  Vn*    1   . That is, for some N there is a non Kelly sequence that beats Kelly “infinitely badly” with probability 1   for all n  N. Note: The infinite divisibility of capital can be dealt with as needed in examples where there is a minimum monetary unit by choosing a sufficiently large starting capital. PROOF. The proof has two parts. First we want to establish the assertion for n  N. Second we show that once we have an  f n , n  N  that is ahead of Kelly at n  N , we can construct f n  f *, n  N  to stay ahead. To see the second part, suppose VN  VN* . Then VN  a  bVN* for some a  0, b  1. For instance VN  VN*  c  0 since there are only a finite number of sequences of outcomes in the first N trials, hence only a finite number with VN  VN* . So VN  c  VN*  c / 2   c / 2   VN*   c / 2   dMaxVN*  VN*   c / 2   d  1VN* where dMaxVN*  c / 2 defines d  0 and MaxVN* is over all sequences of the first N trials such that VN  VN* . Setting c / 2  a  0 and d  1  b  1 suffices. 3/6/2016 10 Once we have VN  a  bVN* we can, for bookkeeping purposes, partition our capital into two parts: a and bVN* . For n  N we bet f n  f * from bVN* and an additional amount a / 2 n from the a part, for a total which is generally unequal to f * of our capital. If by chance for some n the total equals f * of our total capital we simply revise a / 2 n to a / 3n for that n. The portion bVN* will become bVN* for n  N and the portion a will never be exhausted so we have   Vn  bVn* for all n  N. Hence, since P Vn*    1, we have   P Vn / Vn*  b  1 from which it follows that P Vn  Vn*     1. To prove the first part, we show how to get ahead of Kelly with probability 1   within a finite number of trials. The idea is to begin by betting less than Kelly by a very small amount. If the first outcome is a loss, then we have more than Kelly and use the strategy from the proof of the second part to stay ahead. If the first outcome is a win, we’re behind Kelly and now underbet on the second trial by enough so that a loss on the second trial will put us ahead of Kelly. We continue this strategy until either there is a loss and we are ahead 3/6/2016 11 of Kelly or until even betting 0 is not enough to surpass Kelly after a loss. Given any N , if our initial underbet is small enough, we can continue this strategy for up to N trials. The probability of the strategy failing is p N , 1 2  p  1. Hence, given   0, we can choose N such that p N   and the strategy therefore succeeds on or before trial N with probability 1  p N  1   . More precisely: suppose the first n trials are wins and we have bet a fraction f *  ai with ai  0, i  1,  1  f *  a1 Vn  Vn* 1 f *  , n, on the i th trial. Then  1  f  a   1  f  * n *  a   1  1 *   1 f   an   1  a1  1  *   1 f  1  an   1   a1   an  where the last inequality is proven easily by induction. Letting a1   an  a, so Vn / Vn*  1  a, what betting fraction f *  b will put us ahead of Kelly if the next trial is a loss? A sufficient condition is   *  Vn 1 Vn 1  f  b a b    1  a  1   1  a 1  b   1 or b  * * * * 1 a Vn 1 Vn 1  f  1 f    provided b  f * and 0  a  1. If a  3/6/2016 1 2 then b  2a suffices. Proceeding 12 recursively, we have these conditions on the ai : choose a1  0. Then an1  2  a1   an  , n  1, 2, f  x   a1 x  a2 x2  provided all the an  1 . 2 Letting we get the equation  f  x   a1 x  2 xf  x  1  x  x 2    2 xf  x  / 1  x   whose solution is f  x   a1  x  2   3 n2 n2  x n  from which an  2a1 3n  2 if n  2.  Then given   0 and an N such that p N   it suffices to choose a1 so that   aN  2a1 3N  2  min f * , 12 . Q.E.D. Although Leib didn’t have the mathematical background to give such a proof he understood the idea and indicated this sort of procedure. So far we’ve seen that all sequences which differ from Kelly for only a finite number of trials, and some sequences which differ infinitely often (even always), are not essentially different. How can we tell, then, if a betting sequence is essentially different than Kelly? Going to a more general setting 3/6/2016 13 than coin tossing, assume now for simplicity that the payoff random variables X i are independent and bounded below but not necessarily identically distributed. At this point we come to an important distinction. In financial applications one commonly assumes that the f i are constants that are dependent only on the current period payoff random variable (or variables). Such “myopic strategies” might arise for instance, by selecting a utility function and maximizing expected utility to determine the amount to bet. However, for gambling systems the amount depend on previous outcomes, i.e. f n  f n  X1 , X 2 , , X n1  , just as it does in the Leib example. As professor Stewart Ethier pointed out, our discussion of “essentially different” is for the constant f i case. For a more general case, including the Leib example and many of the classical gambling systems, I recommend Ethier’s forthcoming book on the mathematics of gambling, The Doctrine of Chances, Springer-Verlag, Berlin (2008 or 2009). 3/6/2016 14 We assume E  X i   0 for all X i from which it follows that f i *  0 for all n n i 1 i 1  i. As before, Vn   1  fi X i  and Vn*   1  fi* X i  from which ln Vn   ln 1  f i X i  and ln Vn*   ln 1  f i * X i  . Note from the definition f * n n i 1 i 1   that E ln 1  f i * X i  E ln 1  f i X i  , where E denotes the expected value, with equality if and only if fi *  fi . Hence   E ln Vn* / Vn   E  ln 1  fi * X i   ln 1  fi X i    ai n i 1 n i 1 where ai  0 and ai  0 if and only if fi *  fi . This series of non-negative terms either increases to infinity or increases to a positive limit M . We say is essentially different from increases. Otherwise,  fi   f  if and only if  a n * i i 1 i  fi  tends to infinity as n is not essentially different from  f . * i The basic idea here can be applied to more general settings. (6) Given a large fixed goal, e.g. to multiply your capital by 100, or 1000, the expected time for the Kelly investor to get there tends to be least. 3/6/2016 15 Is a wealth multiple of 100 or 1000 realistic? Indeed. In the 51½ years from 1956 to mid 2007, Warren Buffett has increased his wealth to about $5x1010. If he had $2.5x104 in 1956, that’s a multiple of 2x106. We know he had about $2.5x107 in 1969 so his multiple over the last 38 years is about 2x103. Even my own efforts, as a late starter on a much smaller scale, have multiplied capital by more than 2x104 over the 41 years from 1967 to early 2007. I know many investors and hedge fund managers who have achieved such multiples. The caveat here is that an investor or bettor many not choose to make, or be able to make, enough Kelly bets for the probability to be “high enough” for these asymptotic properties to prevail, i.e. he doesn’t have enough opportunities to make it into this “long run.” In a subsequent article we’ll explore for which investors Kelly or fractional Kelly may be a more or less appropriate approach. An important consideration will be the investor’s expected future wealth multiple. 3/6/2016 16

Understanding the Kelly Criterion

Related documents

Study collections

Products

Support

Understanding the Kelly Criterion

Related documents

Study collections

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib