Testing multi-signal strategies
Robert Novy-Marx
University of Rochester and NBER

n  Performance
evaluation when stocks
selected using multiple signals?
q 
Generally use t-statistics (significance)
n 
q 
Relative to some asset pricing model
I.e., using the information ratio
n  Backtests
q 
horribly biased!
Completely obvious once you know why
2
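As a minimal sketch of the evaluation described above, the following regresses a strategy's returns on factor returns and reports the alpha, its t-statistic, and the annualized information ratio. The function name `alpha_stats`, the monthly frequency, and the plain OLS standard errors are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def alpha_stats(strategy, factors, periods_per_year=12):
    """OLS alpha, its t-statistic, and the annualized information ratio.

    Hypothetical helper: `strategy` is a 1-D array of excess returns and
    `factors` holds the pricing-model factor returns (one column per factor).
    """
    y = np.asarray(strategy, dtype=float)
    f = np.asarray(factors, dtype=float).reshape(len(y), -1)
    X = np.column_stack([np.ones(len(y)), f])      # intercept = alpha
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # homoskedastic OLS covariance
    alpha = beta[0]
    t_alpha = alpha / np.sqrt(cov[0, 0])
    info_ratio = alpha / np.sqrt(sigma2) * np.sqrt(periods_per_year)
    return alpha, t_alpha, info_ratio
```

With T years of data the two statistics are tied together by roughly t ≈ IR × √T, which is why the slides treat them interchangeably.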
Common in industry
n  Fundamental
q 
“Smart beta”
n 
RAFI: weight on sales, CF, BE, dividends
n  Factor
q 
indices
E.g., MSCI Quality Index
n 
High ROE, low ROE vol., low leverage
n  Select/weight
q 
indices
stocks on composite score
E.g., combined z-score or rank-sum
3
E.g., Russell Stability Indexes
n  Increasingly
q 
E.g., Piotroski’s F-score (2000)
n 
n 
q 
common in academia
9 signals
Well cited (~700 on GS)
Asness et. al. Quality Score (2014)
n 
n 
21 signals
2013 NBER Asset Pricing meetings
5
Issues
n  Combine
things that backtest well
è Get even better backtests
n  Not surprising!
q 
What do the backtests mean?
n  Biased?
Why? What biases?
q  If so, by how much? (Quantify)
q 
n 
Other intuitions?
6
Example
n  Stocks
q 
Benchmarked against those low in 1st digit:
n 
n 
n 
q 
w/ SIC codes high within 1st digit
E.g., pens (3950) vs tires and inner tubes (3011)
RE investors (6799) vs commercial banks (6021)
Health clubs (7997) vs hotels (7000)
SR: 0.14
n 
Not significant
q 
Because probably (almost certainly) nothing there
7
Example
n  Stocks
q 
Versus those with odd prices
n 
SR: 0.23 (not significant)
n  High
q 
w/ even prices
intra-industry SICs and even prices?
Versus low and odd?
n 
n 
SR: 0.30
Significant!
q 
q 
Do you think there really is anything there?
Have I cheated, and if so how?
8
Thought experiment?
n  What
if you diversify across the lucky
monkeys’ recommendations?
q 
Clearly “snooping”
n 
Uses in-sample aspect of data to form strategy
n  Could
also use all the recs…
… but bet on the lucky monkeys…
q  … and against the unlucky monkeys
q 
10
n  Get
the average return è E [r | r > 0]
n  Diversify
n  Yields
q 
across their risks è σ p ≈ σ / N
a high t-statistic è E [ t-stat.] ≈ 2 N / π
Expected IR ≈ 0.8 × N / T
n 
E.g., 10 monkeys, 10 years è E [ IR ] ≈ 0.8
n 
5 monkeys, 20 years è E [ IR ] ≈ 0.4
n  Why
should we care about all this?
11
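The monkey arithmetic can be checked with a small simulation. This is a sketch under assumptions that are not in the slides: monthly returns, i.i.d. standard-normal noise with a true mean of zero, and equal weighting of the signed strategies. The function names are hypothetical. The closed form it compares against is E[IR] ≈ √(2/π) × √(N/T) ≈ 0.8 × √(N/T).

```python
import numpy as np

def lucky_monkey_ir(n_signals, n_years, n_trials=2000, seed=0):
    """Mean in-sample IR from signing and equal-weighting pure-noise strategies."""
    rng = np.random.default_rng(seed)
    n_months = 12 * n_years
    # Pure noise: trials x months x strategies, true expected return zero.
    r = rng.standard_normal((n_trials, n_months, n_signals))
    # "Snoop": bet on each strategy in the direction of its in-sample mean.
    signs = np.sign(r.mean(axis=1, keepdims=True))
    portfolio = (signs * r).mean(axis=2)           # equal-weighted combination
    # Annualized in-sample information ratio of the combined strategy.
    ir = portfolio.mean(axis=1) / portfolio.std(axis=1, ddof=1) * np.sqrt(12)
    return ir.mean()

def approx_ir(n_signals, n_years):
    """Closed-form approximation: sqrt(2/pi) * sqrt(N/T)."""
    return np.sqrt(2 / np.pi) * np.sqrt(n_signals / n_years)
```

Calling `lucky_monkey_ir(10, 10)` and `approx_ir(10, 10)` side by side shows how much apparent performance comes from in-sample signing alone, with no information in any signal.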
n  It’s
what multi-signal strategies do!
Sign each “signal” to gives positive returns
q  Standard statistics account for this if N = 1
q 
n 
But only if N = 1!
n  E.g.,
why low leverage
for high returns? è
q 
Positive in-sample alpha!
12
What our statistics assume
n  t-statistics
q 
(IRs) normally distributed
Centered at zero
n 
Under “null hypothesis” of uninformative signals
real returns,
uninformative
(random) signals
IRs è
13
What they account for
n  That
signals are used (signed) to predict
positive returns
q 
IRs è
In sample
14
n  Critical
q 
Slightly underestimate true likelihoods
n 
IRs è
values (e.g., 5%) account for this
Real world fat tails (jumps, heteroskedasticity)
15
n  True
for any combination of signals…
… chosen ex ante!
q  I.e., choose signals you want to combine at
the start of sample
q 
n  At
q 
end can flip combined signal
But combined signal only!
n 
n 
Not OK to flip only some of them!
I.e., can’t sign signals individual at end
16
Don’t account for
n  Signing
q 
individual sub-signals!
Distributions?
n 
One, two, three signals
Three signed
signals
Two signed
signals
IRs è
17
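The difference between flipping only the combined signal and signing each sub-signal can be illustrated by simulating both under the null. This sketch assumes independent standard-normal t-statistics and equal weighting, which are simplifications chosen here for illustration; the function name is hypothetical.

```python
import numpy as np

def signed_combo_critical_t(n_signals, level=0.05, n_trials=200_000, seed=0):
    """Simulated critical t-statistics for two ways of signing a combination.

    Returns (combined_flip, individual_flip) critical values at `level`.
    """
    rng = np.random.default_rng(seed)
    t = rng.standard_normal((n_trials, n_signals))  # null: uninformative signals
    # Allowed: choose signals ex ante, flip only the combined signal at the end.
    combined_flip = np.abs(t.sum(axis=1)) / np.sqrt(n_signals)
    # Not accounted for: sign every sub-signal individually in sample.
    individual_flip = np.abs(t).sum(axis=1) / np.sqrt(n_signals)
    q = 1 - level
    return np.quantile(combined_flip, q), np.quantile(individual_flip, q)
```

For a single signal the two critical values coincide near the usual 1.96. For two or three signals the first stays there while the second moves well to the right, which is the shift the figure shows.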
Other issues?
n  Over-sampling
q 
(selection or MT) bias
Look at many things, show only best
n 
Well understood, standard fix (Bonferroni)
Single best of
Single
of
threebest
signals
two signals
IRs è
18
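The Bonferroni fix mentioned above amounts to testing each of n candidate signals at level 0.05/n instead of 0.05. A minimal sketch, assuming a two-sided test and a normal approximation to the t-distribution (the function name is hypothetical):

```python
from statistics import NormalDist

def bonferroni_critical_t(n_signals, level=0.05):
    """Two-sided critical |t| when each of n_signals is tested at level / n_signals."""
    return NormalDist().inv_cdf(1 - level / (2 * n_signals))
```

For one signal this gives the familiar 1.96; for 20 candidate signals the hurdle rises to a little over 3.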
Two biases interact!
n  Best
q 
k-of-n strategies
Look at n strategies, combine k best
n 
Representative of real processes?
Best
3-of-20
IRs è
19
Summary statistic?
n  Whole
distributions are hard
n  So look at critical values
q 
E.g., how big an IR can you expect at least
5% of the time?
n 
Or t-statistic (accounts for sample length)
n  Again,
in real data!
Real returns
q  Random “signals”
q 
20
Critical values
Best k-of-n strategies
n  Special cases
q 
n > k = 1 è pure selection bias
n 
q 
Well understood
k=n
n 
How do you put the signals together?
n  General
q 
è pure overfitting bias
case
n > k > 1 è both bias
21
Special cases
IR ≈ 1
IR ≈ 0.43
General cases
Profession as a whole?
Individuals, unintentionally?
Even worse than this?
IR ≈ 1.35
Individual researchers?
IR ≈ 1
IR ≈ 0.67
Theory?
Model
n  Simple assumptions
q 
Normal returns
n 
n 
Homoscedastic
Uncorrelated
n  Distributions?
n  Intuitions?
24
t-statistic distributions
$$t^{MV}_{n,k} = \frac{\sum_{i=1}^{k} t_{(n+1-i)}}{\sqrt{k}} \quad \text{(equal-weighted)}$$

$$t^{MVE}_{n,k} = \sqrt{\sum_{i=1}^{k} t_{(n+1-i)}^{2}} \quad \text{(signal-weighted)}$$

- Convolutions of conditional random variables
  - Agree closely with those observed in real data

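The two best k-of-n statistics can be simulated directly under the model's simple assumptions. In this sketch the order statistics are taken over the absolute values of independent standard-normal t-statistics, on the assumption that each signal is signed to have a positive in-sample return; that reading, and the function names, are illustrative assumptions.

```python
import numpy as np

def best_k_of_n_t(n, k, n_trials=100_000, seed=0):
    """Simulate equal-weighted (MV) and signal-weighted (MVE) best k-of-n t-stats."""
    rng = np.random.default_rng(seed)
    # Signed sub-signal t-statistics under the null (assumed i.i.d. |N(0, 1)|).
    abs_t = np.abs(rng.standard_normal((n_trials, n)))
    top_k = np.sort(abs_t, axis=1)[:, n - k:]       # the k largest of the n
    t_mv = top_k.sum(axis=1) / np.sqrt(k)           # equal-weighted combination
    t_mve = np.sqrt((top_k ** 2).sum(axis=1))       # signal-weighted combination
    return t_mv, t_mve

def critical_value(t_values, level=0.05):
    """Empirical critical value exceeded `level` of the time under the null."""
    return np.quantile(t_values, 1 - level)
```

For example, `critical_value(best_k_of_n_t(20, 3)[0])` estimates the 5% hurdle for an equal-weighted best 3-of-20 strategy. The signal-weighted statistic is never smaller than the equal-weighted one, by the Cauchy-Schwarz inequality.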
Critical value approximation
- Analytic, from normal approximation
  - Exact in special cases (k = 1; MVE k = n)

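The two special cases can be worked out directly. The derivation below is a sketch that assumes independent standard-normal sub-signal t-statistics under the null, each signed to be positive in sample; $\Phi$ is the standard normal CDF, $p$ the test size, and $F_{\chi^2_n}$ the chi-squared CDF with $n$ degrees of freedom. It is offered as an illustration and is not necessarily the form used in the talk.

```latex
% Case k = 1 (pure selection bias): the best of n signed t-statistics.
\Pr\Bigl(\max_{i \le n} |t_i| \le c\Bigr) = \bigl(2\Phi(c) - 1\bigr)^{n} = 1 - p
\quad\Longrightarrow\quad
t^{*}_{n,1,p} = \Phi^{-1}\!\left(\frac{1 + (1 - p)^{1/n}}{2}\right)

% Case k = n, signal-weighted (pure overfitting bias):
\bigl(t^{MVE}_{n,n}\bigr)^{2} = \sum_{i=1}^{n} t_i^{2} \sim \chi^{2}_{n}
\quad\Longrightarrow\quad
t^{*}_{n,n,p} = \sqrt{F^{-1}_{\chi^{2}_{n}}(1 - p)}
```

With $n = 1$ both expressions reduce to the usual two-sided critical value, about 1.96 at $p = 0.05$.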
Not very illuminating
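[Equation: analytic critical-value approximation with its "where" definitions; not recoverable from this extraction]
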
But work well in practice!
- Also help explain some observed features

Alternative quantification
n  Pure
q 
How many single signals would you have to
look at to get same bias?
n 
Critical value approximation useful here
n  That
q 
selection bias equivalence
is, find n* s.t. t
*
n* ,1, p
= t
*
n,k , p
*
n,k , p
Here t
denotes the critical t-statistic for
a best k-of-n strategy
30
31
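Given a critical t-statistic for some best k-of-n strategy, the equivalent number of single signals n* can be backed out from the best-of-n case. This sketch assumes independent standard-normal t-statistics under the null and uses the exact best-of-n critical value in place of the talk's approximation; both function names are hypothetical.

```python
from math import log
from statistics import NormalDist

def best_of_n_critical_t(n, level=0.05):
    """Critical |t| for the single best of n independent null signals."""
    return NormalDist().inv_cdf((1 + (1 - level) ** (1 / n)) / 2)

def equivalent_single_signals(critical_t, level=0.05):
    """n* such that the best-of-n* critical value equals critical_t."""
    p_one_below = 2 * NormalDist().cdf(critical_t) - 1   # P(|t| <= critical_t)
    return log(1 - level) / log(p_one_below)
```

Passing in a simulated or approximated critical value for a best k-of-n strategy returns the number of single signals a researcher would have had to search over to create the same bias.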
Approximate Power Law
$$n^{*} = o(n^{k})$$

Conclusion
n  View
multi-signal claims skeptically
Multiple good signals è better combined
performance
q  Good backtested performance does NOT è
any good signals
q 
n  “High
tech” solution: use different tests
n  “Low tech”: evaluate signals individually
q 
Marginal power of each (using Bonferroni!)