Testing multi-signal strategies
Robert Novy-Marx
University of Rochester and NBER

- Performance evaluation when stocks selected using multiple signals?
  - Generally use t-statistics (significance)
  - Relative to some asset pricing model
    - I.e., using the information ratio
- Backtests horribly biased!
  - Completely obvious once you know why

Common in industry
- Fundamental indices
  - “Smart beta”
    - RAFI: weight on sales, CF, BE, dividends
- Factor indices
  - E.g., MSCI Quality Index
    - High ROE, low ROE vol., low leverage
- Select/weight stocks on composite score
  - E.g., combined z-score or rank-sum

E.g., Russell Stability Indexes

- Increasingly common in academia
  - E.g., Piotroski’s F-score (2000)
    - 9 signals
    - Well cited (~700 on GS)
  - Asness et al. Quality Score (2014)
    - 21 signals
    - 2013 NBER Asset Pricing meetings

Issues
- Combine things that backtest well ⇒ get even better backtests
  - Not surprising!
- What do the backtests mean?
  - Biased? Why? What biases?
  - If so, by how much? (Quantify)
- Other intuitions?

Example
- Stocks w/ SIC codes high within 1st digit
  - Benchmarked against those low in 1st digit:
    - E.g., pens (3950) vs tires and inner tubes (3011)
    - RE investors (6799) vs commercial banks (6021)
    - Health clubs (7997) vs hotels (7000)
  - SR: 0.14
    - Not significant
  - Because probably (almost certainly) nothing there

Example
- Stocks w/ even prices
  - Versus those with odd prices
    - SR: 0.23 (not significant)
- High intra-industry SICs and even prices?
  - Versus low and odd?
    - SR: 0.30
    - Significant!
  - Do you think there really is anything there?
  - Have I cheated, and if so how?

Thought experiment?

- What if you diversify across the lucky monkeys’ recommendations?
  - Clearly “snooping”
    - Uses in-sample aspect of data to form strategy
  - Could also use all the recs…
    - … but bet on the lucky monkeys…
    - … and against the unlucky monkeys

- Get the average return ⇒ $E[r \mid r > 0]$
- Diversify across their risks ⇒ $\sigma_p \approx \sigma / \sqrt{N}$
- Yields a high t-statistic ⇒ $E[t\text{-stat.}] \approx \sqrt{2N/\pi}$
  - Expected IR $\approx 0.8 \times \sqrt{N/T}$
    - E.g., 10 monkeys, 10 years ⇒ $E[IR] \approx 0.8$
    - 5 monkeys, 20 years ⇒ $E[IR] \approx 0.4$
- Why should we care about all this?

- It’s what multi-signal strategies do!
  - Sign each “signal” to give positive returns
  - Standard statistics account for this if N = 1
    - But only if N = 1!
- E.g., why low leverage for high returns?
  - ⇒ Positive in-sample alpha!

What our statistics assume
- t-statistics (IRs) normally distributed
  - Centered at zero
    - Under “null hypothesis” of uninformative signals
[Figure: distribution of IRs; real returns, uninformative (random) signals; x-axis: IRs]

What they account for
- That signals are used (signed) to predict positive returns
  - In sample
[Figure: distribution of IRs; x-axis: IRs]

- Critical values (e.g., 5%) account for this
  - Slightly underestimate true likelihoods
    - Real world fat tails (jumps, heteroskedasticity)
[Figure: distribution of IRs; x-axis: IRs]

- True for any combination of signals…
  - … chosen ex ante!
  - I.e., choose signals you want to combine at the start of sample
- At end can flip combined signal
  - But combined signal only!
    - Not OK to flip only some of them!
    - I.e., can’t sign signals individually at end

Don’t account for
- Signing individual sub-signals!
  - Distributions?
    - One, two, three signals
[Figure: distributions of IRs; curves labeled "Two signed signals" and "Three signed signals"; x-axis: IRs]

Other issues?
- Over-sampling (selection or MT) bias
  - Look at many things, show only best
    - Well understood, standard fix (Bonferroni)
[Figure: distributions of IRs; curves labeled "Single best of two signals" and "Single best of three signals"; x-axis: IRs]

Two biases interact!
- Best k-of-n strategies
  - Look at n strategies, combine k best
    - Representative of real processes?
[Figure: distribution of IRs; curve labeled "Best 3-of-20"; x-axis: IRs]

Summary statistic?
- Whole distributions are hard
- So look at critical values
  - E.g., how big an IR can you expect at least 5% of the time?
    - Or t-statistic (accounts for sample length)
- Again, in real data!
  - Real returns
  - Random “signals”

Critical values
Best k-of-n strategies
- Special cases
  - n > k = 1 ⇒ pure selection bias
    - Well understood
  - k = n ⇒ pure overfitting bias
    - How do you put the signals together?
- General case
  - n > k > 1 ⇒ both biases

Special cases
[Figure: critical values for the special cases; annotations: IR ≈ 1, IR ≈ 0.43]

General cases
[Figure: critical values for the general cases; annotations in extracted order: "Profession as a whole?", "Individuals, unintentionally?", "Even worse than this?", IR ≈ 1.35, "Individual researchers?", IR ≈ 1, IR ≈ 0.67]

Theory? Model
- Simple assumptions
  - Normal returns
    - Homoscedastic
    - Uncorrelated
- Distributions?
- Intuitions?

t-statistic distributions

$$t^{MV}_{n,k} = \frac{\sum_{i=1}^{k} t_{(n+1-i)}}{\sqrt{k}} \quad \text{(equal-weighted)}$$

$$t^{MVE}_{n,k} = \sqrt{\sum_{i=1}^{k} t^{2}_{(n+1-i)}} \quad \text{(signal-weighted)}$$

- Convolutions of conditional random variables
  - Agree closely with those observed in real data

Critical value approximation
- Analytic, from normal approximation
  - Exact in special cases (k = 1; MVE k = n)

Not very illuminating
[Equations: the approximation formulas, with auxiliary definitions introduced by "where"]

But work well in practice!
- Also help explain some observed features

Alternative quantification
- Pure selection bias equivalence
  - How many single signals would you have to look at to get same bias?
    - Critical value approximation useful here
- That is, find $n^*$ s.t. $t^*_{n^*,1,p} = t^*_{n,k,p}$
  - Here $t^*_{n,k,p}$ denotes the critical t-statistic for a best k-of-n strategy

Approximate Power Law
$n^* = O(n^k)$

Conclusion
- View multi-signal claims skeptically
  - Multiple good signals ⇒ better combined performance
  - Good backtested performance does NOT ⇒ any good signals
- “High tech” solution: use different tests
- “Low tech”: evaluate signals individually
  - Marginal power of each (using Bonferroni!)
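
The lucky-monkeys arithmetic and the best k-of-n critical values can be checked with a small Monte Carlo. What follows is a minimal sketch, not the procedure behind the figures in these slides (those use real returns with random signals). It assumes that under the null each candidate signal's t-statistic is an independent standard normal draw, that every signal is signed in sample (so |t| is used), and that the t_(j) are the order statistics of those absolute values. The names `simulate_best_k_of_n` and `critical_value` are hypothetical, chosen for illustration.

```python
import numpy as np


def simulate_best_k_of_n(n, k, n_sims=100_000, seed=0):
    """Simulate null t-statistics of best k-of-n strategies.

    Assumptions (illustrative, not taken from the slides' data):
    individual t-statistics are i.i.d. standard normal, and each
    signal is signed in sample, so its t-statistic becomes |t|.
    Returns (equal-weighted, signal-weighted) arrays of length n_sims.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    rng = np.random.default_rng(seed)
    abs_t = np.abs(rng.standard_normal((n_sims, n)))
    # Sort each row ascending and keep the k largest signed t-statistics.
    top_k = np.sort(abs_t, axis=1)[:, n - k:]
    t_mv = top_k.sum(axis=1) / np.sqrt(k)       # equal-weighted combination
    t_mve = np.sqrt((top_k ** 2).sum(axis=1))   # signal-weighted combination
    return t_mv, t_mve


def critical_value(t_stats, p=0.05):
    """Empirical level exceeded a fraction p of the time under the null."""
    return float(np.quantile(t_stats, 1.0 - p))


if __name__ == "__main__":
    # Lucky monkeys: N signed signals, all combined (k = n), over T years.
    N, T = 10, 10
    t_mv, _ = simulate_best_k_of_n(n=N, k=N)
    print("mean t-stat  :", t_mv.mean())
    print("sqrt(2N/pi)  :", np.sqrt(2 * N / np.pi))
    print("mean IR      :", t_mv.mean() / np.sqrt(T))  # IR = t / sqrt(T)

    # 5% critical t-statistics for a few best k-of-n designs.
    for n, k in [(1, 1), (20, 1), (3, 3), (20, 3)]:
        t_mv, t_mve = simulate_best_k_of_n(n, k)
        print(f"best {k}-of-{n}: equal-weighted {critical_value(t_mv):.2f}, "
              f"signal-weighted {critical_value(t_mve):.2f}")
```

Under these assumptions the k = n equal-weighted mean should sit near $\sqrt{2N/\pi}$, and the n = k = 1 critical value near the familiar 1.96. Any best k-of-n design with n > 1 should need a visibly larger t-statistic to clear the same 5% bar.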