Notes: FACTSET Variable Code Selection and Alpha Tester

FACTSET NOTES
VARIABLE/PARAMETER CODE SELECTION AND ALPHA TESTING CONVENTIONS
Disclaimer:
This document is a collection of raw notes, intended to aid the work of Duke Investment
Analytics and Fuqua Investment Analytics. Little time has been spent on editing or
organization for ease of external use. This document is shared only because it may still
be of some use in subsequent FactSet-based research even in its present, raw form.
Duke Investment Analytics (DIA), Global Asset Allocation and Stock Selection: Claudio
Aritomi, Sam Ding, Mak Pitke, Marcus Shaw, Brian Wachob
Fuqua Investment Analytics (FIA), Quantitative Stock Selection: Stefan Gertsch, Brian
Wachob.
ALPHA TESTING CONVENTIONS
In-sample date range: 1/31/1987 – 11/31/2001
Out-of-sample date range: 12/31/2001 – 12/31/2004
(Note that using 31 as the last day of the month when specifying the date range is necessary—even when there is no 31st day of the
specified month. If not used in this way, lagged variables (parameters) may return values with unintended time alignments
(misalignments) in Alpha Tester.)
(Also note that FactSet’s convention is to associate the returns from the following month with the data, and portfolio selected, for the
final day of a given month. For example, the first in-sample portfolio is formed on Jan. 31, 1987 based on data available on Jan. 31,
1987 and the first return recorded in-sample is the return realized in February 1987 – this return is associated with the date Jan. 31,
1987.)
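The date convention above can be sketched in Python (used here for illustration only; this is not FactSet code). The helper names formation_date and return_month are hypothetical. The sketch assumes only what the note states: portfolios are formed on the last day of a month, and the return recorded against that date is the one realized in the following month.

```python
import calendar
from datetime import date


def formation_date(year: int, month: int) -> date:
    """Actual last calendar day of the month (the FactSet input is still typed with day 31)."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def return_month(year: int, month: int) -> tuple[int, int]:
    """(year, month) whose realized return is recorded against the given formation month."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


# First in-sample portfolio: formed Jan. 31, 1987; its recorded return is February 1987's.
print(formation_date(1987, 1), return_month(1987, 1))
```

Any return series exported from Alpha Tester therefore needs to be shifted forward by one month before it is compared with calendar-month returns from another source.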
In Alpha Testing, always select low fractile = low values. This convention will maintain
consistency across our screens and alpha tests. (This convention was used in the work of
Duke Investment Analytics. Fuqua Investment Analytics has chosen to associate the low
numbered fractiles with high factor values—which is consistent with FactSet default
conventions for UQUINTILE, UDECILE, UPERCENTILE, and related functions.)
In Alpha Testing, resolve ties in ranking factor by FactSet option: choose midpoint.
STANDARD UNIVERSE SCREENS
US-listed stocks only
P_COUNTRY_ISO = “US”
***This should be double-checked. Does use of this variable introduce problems? For example, do there
exist companies that historically traded on US exchanges, but have since withdrawn from US exchanges in
favor of trading exclusively on foreign exchanges? Some firms talk about this possibility in order to avoid
cumbersome SEC regulations. Do such firms exist, and does this element of the universe specification
exclude them from the historical dataset? Perhaps this parameter, like so many others, does not
update historically… ***
Exclude bad returns data
Because of what appear to be bad Compustat returns data, we also completely excluded
AU, CXW, and TKC from our universe (for Duke Investment Analytics). Fuqua
Investment Analytics’ work discerned how to exclude these stocks only on the months
(one month for each stock) for which their returns data appeared to be erroneous. Find
more details towards the end of this document.
This screen was added by FIA
Exclude ETFs and other investment funds.
We are trying to model factors that affect firm returns, not index or investment fund
returns. Additionally, to include these equities effectively double-counts component
companies.
Find more info later in this document pertaining to implementation of this screen.
This screen was added by FIA
Minimum daily dollar volume
Average daily dollar volume over the past month must be greater than $500,000 in 2005.
(S&P averaged a 10.4% CAGR from year-end 1984 through year-end 2004. Thus, we
have chosen to reduce our dollar volume screening threshold by 10.4% each year we go
back into time. It is $500K for 2005 and $84K for 1987.)
(This threshold is scaled downward in earlier periods because a portfolio manager would have had lesser dollar-liquidity requirements
because he would have been managing a lesser number of nominal dollars. Likening a minimum dollar-volume threshold to growth in
the S&P500 may not be the best method of specifying this screen- perhaps a better scaling rate than 10.4% time-constant threshold
growth can be determined.)
(CM_VOL(0)/(P_TRADING_DAYS(-1M,0M)-1))*((CM_PH(0)+CM_PL(0))/2) >
0.5*POWER(1.104,(INT(CM_DNC(0)/100)-2005))
Find more info later in this document pertaining to weaknesses in methods for estimating average daily dollar volume and market cap.
This screen was added by FIA, replacing a different market cap limit methodology used
by DIA
Minimum market cap
Market cap must be greater than $200M in 2005.
We reduced our market cap screening threshold by 7% each year we go back into time. It
is $200M for 2005 and $59M for 1987.
(This threshold is scaled downward in earlier periods because of currency inflation and growth in the stock market over time. Again,
future work may determine a better growth rate for this threshold than 7%.)
MSHS(0)*MP(0) > 200*POWER(1.07,(INT(CM_DNC(0)/100)-2005))
Find more info later in this document pertaining to weaknesses in methods for estimating average daily dollar volume and market cap.
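As a check on the arithmetic behind both scaled thresholds, here is a small Python sketch (illustration only, not FactSet code). The function name scaled_threshold is hypothetical, and the sketch assumes that CM_DNC(0) returns a YYYYMM date code, as its use elsewhere in these notes suggests.

```python
def scaled_threshold(base_2005: float, annual_growth: float, yyyymm: int) -> float:
    """Threshold in $ millions for a YYYYMM date code, scaled back from its 2005 level."""
    year = yyyymm // 100  # mirrors INT(CM_DNC(0)/100)
    return base_2005 * (1.0 + annual_growth) ** (year - 2005)


# Dollar-volume screen: $0.5M in 2005, scaled at 10.4% per year.
dollar_volume_1987 = scaled_threshold(0.5, 0.104, 198701)
# Market-cap screen: $200M in 2005, scaled at 7% per year.
market_cap_1987 = scaled_threshold(200.0, 0.07, 198701)
print(round(dollar_volume_1987 * 1000), round(market_cap_1987))
```

Run for January 1987, this reproduces the roughly $84K and $59M figures quoted above. Because the year is truncated from the date code, the threshold steps once per calendar year rather than changing smoothly month to month.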
FIA code used (a) to select US-traded stocks; (b) to exclude ETFs and other investment
funds; and (c) to exclude data points with questionable returns data:
(((CA_CUSIP<>"90011120" OR CM_DNC(0)<>200109) AND (CA_CUSIP<>"03512820"
OR CM_DNC(0)<>199806) AND (CA_CUSIP<>"22025Y40" OR CM_DNC(0)<>200008))
AND ((SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE), 0)<>6722 AND
SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE),0)<>6726)=1) AND
(P_COUNTRY_ISO="US"))=1
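The boolean logic of the FIA screen above may be easier to audit when written out step by step. The following Python sketch is an illustration only; the function name in_universe and its arguments are hypothetical stand-ins for the FactSet parameters, and a missing SIC code is assumed to arrive as 0 (as SUM(AVAIL(...),0) would produce).

```python
# (CUSIP, YYYYMM) pairs whose monthly return looked erroneous (taken from the screen above).
BAD_RETURN_MONTHS = {("90011120", 200109), ("03512820", 199806), ("22025Y40", 200008)}
# SIC codes excluded by the screen above as investment funds.
FUND_SIC_CODES = {6722, 6726}


def in_universe(cusip: str, yyyymm: int, sic_code: int, country_iso: str) -> bool:
    """True when a stock-month passes all three exclusions."""
    if (cusip, yyyymm) in BAD_RETURN_MONTHS:
        return False  # drop only the single questionable month for that stock
    if sic_code in FUND_SIC_CODES:
        return False  # drop ETFs and other investment funds
    return country_iso == "US"
```

Note that a stock on the bad-returns list is excluded only in its one flagged month and remains in the universe otherwise, which is the FIA refinement described earlier.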
This screen was used by DIA, but weaknesses were discovered by the work of FIA. Thus,
this screen was abandoned.
NYSE, NASDAQ, AMEX only
Top 60% of these by market cap.
(Note that using this universe definition yields roughly 1100 firms in January 1987 and roughly 3500 firms by November 2001. The
market cap thresholds correspond to roughly $50mil in January 1987 and $200mil in February 2005.)
*** This parameter’s implementation, as used for this screen, has proven to be faulty. The values reported for
G_EXCHANGE_NAME do not update historically. Thus, companies such as Enron who are presently delisted, but
were on the NYSE historically are wrongly excluded from all backtests! ***
Also note that we investigated defining our universe by S&P or Russell index constituents, but it looks as if
such historical data is not presently available with our subscriptions.
STANDARD FORMULAS AND PARAMETER SELECTIONS TO USE FOR
COMMON VARIABLES (AND UNIVERSE-LIMITING CRITERIA)
Exchange:
I could find no parameter that would give me historically updated
exchange listing information. FactSet help gave me the following
response:
Dear Brian,
This is George from FactSet. I have just found the problem, and it is two fold. First, we don't have historical constituents
for the NYSE. Second, an additional subscription is required for historical constituents for the NASDAQ and AMEX.
Because of this, you will be unable to see the list as it was back in, say, 1990. Even with the additional subscription,
AMEX historical constituents only goes back until 1998.
Sorry for the bad news. Let me know if you have any other questions.
Sincerely,
George T. Hogan | Consultant | FactSet Research Systems
Considered— G_EXCHANGE_NAME, IB_EXCHANGE, CA_EXCHANGE_NAME, CA_EXCHANGEN, IH_EXCHANGE
G_EXCHANGE_NAME gives only record of current exchange listing. IH_EXCHANGE appears to
come the closest to working, but also has many errors and missing data among its records.
Exclusion of ETFs and other Funds:
(SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE),0)<>6722 AND
SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE),0)<>6726)=1
I might have alternately used PTYPEN, but in looking at the selection
of companies screened out/in, I thought this method did a better job
(it was not completely clear which was the better method though).
Price per share:
MP(0)
This checks out ok on splits (all historical prices are revised when a split occurs;
thus, they align with current price).
Also considered— CM_P, P(0), @AVAIL(P(0),MP(0))
P(0) gives all or almost all NAs
Shares Outstanding:
MSHS(0)
This checks out ok on splits (all historical values are revised when a split occurs;
thus, they align with current shares outstanding).
Though this reflects as-of-fiscal-period-end share counts that were not
necessarily available at the time (i.e. not as-reported), it eliminates other problems with
non-contemporaneous lagged data being used in the current period. For example, companies for
which shares outstanding should have been NA were finding an old (lagged 45 days) value
that, multiplied by their current-day price, pushed them into the market cap
consideration set of companies. This caused some 80000% returns to enter our data, which is problematic!
Also considered— CM_SHS(0), CM_SHS(0 L45D)
For future investigation- IH_SHRS_OUT
Perhaps this variable could be combined with a non-split adjusted price (perhaps
available via CM_P(xxx)?).
Market Capitalization:
MP(0) * MSHS(0)
Also considered— RI_MKTCAP(0)/1000, CM_MKT_VALUE(0), @AVAIL(P(0),MP(0)) * CM_SHS(0 L45D)
Average Daily Dollar Volume Over the Past Month:
(CM_VOL(0)/(P_TRADING_DAYS(-1M,0M)-1))*((CM_PH(0)+CM_PL(0))/2)
This checks out ok on splits (all historical prices and volumes are revised when a split
occurs; thus, they align with current price and shares outstanding; volume and price
metrics here are consistent for estimation of dollar volume).
This expression could still use a lot of improvement, but it’s the best
we have so far (ideally, we would probably calculate the median daily
dollar volume for the preceding 21 trading days—- which may now be
possible with P_VOLUME and P_PRICE parameters because the Calendar
setting problem has been rectified).
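The estimate can be restated in Python to make its pieces explicit (illustration only, not FactSet code). The function and argument names are hypothetical. The sketch assumes the midpoint of the monthly high and low is used as the price, as in the universe screen, and that the reported trading-day count overstates the true count by one in Alpha Testing, as these notes observe.

```python
def avg_daily_dollar_volume(
    monthly_volume: float,
    reported_trading_days: int,
    month_high: float,
    month_low: float,
) -> float:
    """Monthly share volume per trading day, valued at the high/low midpoint price."""
    trading_days = reported_trading_days - 1  # P_TRADING_DAYS appeared to overcount by one
    mid_price = (month_high + month_low) / 2.0
    return monthly_volume / trading_days * mid_price


# Hypothetical month: 21 units of volume, 22 reported trading days, high 12, low 8.
print(avg_daily_dollar_volume(21.0, 22, 12.0, 8.0))
```

The result carries whatever units the volume series uses, so the comparison against the 0.5 threshold is only meaningful if the two are in the same units.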
The parameter P_TRADING_DAYS(-1M,0M) was found to work (well enough) in
Alpha Testing but not in Universal Screening Reports. It appears to
give one more than the actual number of trading days in Alpha Testing.
It appears to give one more than the actual number of calendar days (!)
in Universal Screening Reports. Perhaps other parameters are similarly
functional in Alpha Testing but not in Universal Screening Reports?
Note that this calendar day versus trading day problem was later
resolved—one must take care to ensure that under Options-> Calendar,
the Other: US calendar is selected (not 7-day calendar for example).
???
SUM20(P_PRICE(0)*P_VOLUME(0))
???
Problem with P_VOLUME(-1), P_VOLUME(-2), etc. is that it appears to
increment backwards by calendar days, not trading days as desired.
Same problem with P_VOLUME(-1,0,1000), P_VOLUME(-2,0,1000), etc. These
problems appear to occur in the Universal Screening Report output, but
not necessarily in the Excel environment.
***I think that this was a result of the same problem:
{Note that this calendar day versus trading day problem was later
resolved—one must take care to ensure that under Options-> Calendar,
the Other: US calendar is selected (not 7-day calendar for example).}
The following works in Excel, but not in Universal Screening:
AVG(P_PRICE(-1M,0D,D)*P_VOLUME(-1M,0D,D))
Note that -1M bounds the date range at the last day of the preceding month. -1AM would
have picked out the same day as the day referenced by 0D, but a month earlier. See
FactSet Help PageID 1964.
I tested this code with regard to stock splits for one stock in a recent time period and
found that historical P_PRICE and P_VOLUME numbers had indeed been accurately revised in
the P_ database to align with the post-split share levels. …at least this was my initial
conclusion when comparing historical volume quotes for NYSE:SM in Yahoo! Finance with
those outputted from FactSet to an Excel spreadsheet. Note that, at least in the case of
the ERICY reverse split (and also the recent SM split I believe), it is evident that the
historical volumes listed in Yahoo! Finance ARE NOT changed to reflect the new number of
shares outstanding. This matter requires further attention/review.
See FactSet Help PageID 614 for info on handling of Dividends, Stock Splits, and
Spinoffs.
IB_VOL_1D does not appear to increment backwards through time, at least
not in the Universal Screening Report (i.e. IB_VOL_1D gives the same value as
IB_VOL_1D(-1) or IB_VOL_1D(-100)).
Note that median dollar volume would be preferable to mean dollar volume so that our
estimated liquidity of a given stock is not thrown off by a one day spike in volume that
enters the trailing window sample period.
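A small Python example (illustration only, with made-up numbers) shows how much a single spike moves the mean but not the median over a 21-day window.

```python
from statistics import mean, median

# Hypothetical daily dollar volumes ($ millions) for 21 trading days, with one spike.
daily_dollar_volume = [1.0] * 20 + [50.0]

print(mean(daily_dollar_volume))    # pulled up by the single spike
print(median(daily_dollar_volume))  # unaffected by the spike
```

With these inputs the mean is more than three times the median, so a stock that usually trades $1M a day would look far more liquid under the mean-based estimate.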
Minimum Threshold for Average Daily Dollar Volume Over the Past Month:
0.5*POWER(1.104,(INT(CM_DNC(0)/100)-2005))
Note that using the same threshold for both NASDAQ and NYSE stocks is
not ideal. Due to double-counting of dealer trades, volume for NASDAQ
stocks is inflated relative to NYSE and AMEX stocks. Thus, a better
screen would use different minimum dollar-volume thresholds for NASDAQ
stocks than for NYSE and AMEX stocks.
(See “Standard Universe Screens” section for more.)
Book value per share:
AVAIL(CM_BK(0 L2M), G_BOOK_PS_USD(0 L2M))
An improvement?:
IF((SUM(0,IHLQEPSDNC(0))=CM_DNC(-1) OR SUM(0,IHLQEPSDNC(0)) = CM_DNC(-2)), AVAIL(CM_BK(0), G_BOOK_PS_USD(0)), AVAIL(CM_BK(0 L2M),
G_BOOK_PS_USD(0 L2M)))
Note that because the data series we have chosen are predominantly updated on a monthly
basis, I believe that lagging by 45 days (L45D) will return data that is no different
from that obtained when lagging by 2 months (L2M). Thus, it may be preferable to
explicitly lag by 2 months.
Also note that this method of seeking to access the most recent (least stale) data may
induce some lookahead bias in instances where firms shift their reporting schedules—
because it relies on the timing of last year’s quarterly report to estimate whether this
year’s quarterly report has occurred yet (and thus, whether we can look at the newest
data point).
This checks out ok on splits (all historical book values per share are revised when a
split occurs; thus, they align with current shares outstanding).
CM_BK(0) appears to update quarterly as-of-fiscal-period-end
CM_BK(0) appears to have less missing data (NAs) than G_BOOK_PS_USD(0)
G_BOOK_PS_USD appears to update annually as-of-fiscal-period-end
After finding problems with the G_ dividend variables using current rather than
historical values in Alpha Testing, I re-checked these G_ variables and fortunately did
not find this same problem.
Book Value-to-Price:
AVAIL(CM_BK(0 L2M), G_BOOK_PS_USD(0 L2M))/MP(0)
See immediately above for potential improvement
This checks out ok on splits (all historical values are revised when a split occurs;
thus, they align with current price and are consistent for the ratio calculation).
It looks as if CM_BK and G_BOOK_PS_USD may only update annually in the 88-89 timeframe.
Ideally, we would find a variable that updates quarterly.
Also considered— CM_PBK(0), G_PBK(0)
CM_PBK and G_PBK appear to contain the same data series
Trailing EPS:
AVAIL(IH_EPS_ACT_LTM(0), CM_EPS(0 L2M))
Use method shown for Book Value for improved timeliness of data?
This checks out ok on splits (all historical EPSs are revised when a split occurs; thus,
they align with current shares outstanding).
Note that there are significant differences between these two options.
IH_EPS_ACT_LTM is evaluated based on analyst consensus methods of
adjustment to earnings (to align with IBES consensus forecast
earnings).
CM_EPS matches the sum of the numbers reported in the firm’s 4 most
recent 10-Q’s. I believe these are basic EPS, not diluted.
Perhaps in a future analysis, I will use diluted EPS- this would
probably be preferable.
Forward EPS Estimate:
(A) IH_MEAN_NTM(0)
Alternates:
(B) AVAIL(IH_MEAN_NTM(0),
    AVAIL((G_IBES_FY1_MEAN_USD(0)+IH_MEAN_NTMYR(0))/2,
    AVAIL(IH_MEAN_NTMYR(0), G_IBES_FY1_MEAN_USD(0))))
(C) AVAIL(IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0))
(D) IH_MEDIAN_FY2(0)
(F) Use definition C, except for negative FwdEPS. For these, a
    combination/transform of IH_MEDIAN_FY2(0), IH_MEDIAN_FY3(0), and
    CQ_SALES_PS_LTM(0 L2M)
(G) AVAIL(IH_MEDIAN_NTM(0), IH_MEDIAN_NTM(-1), IH_MEDIAN_NTM(-2),
    IH_MEDIAN_NTM(-3), IH_MEDIAN_NTM(-4), IH_MEDIAN_NTM(-5),
    IH_MEDIAN_NTM(-6), IH_MEDIAN_NTM(-7), IH_MEDIAN_NTM(-8),
    IH_MEDIAN_NTM(-9), IH_MEDIAN_NTM(-10), IH_MEDIAN_NTM(-11),
    IH_MEDIAN_NTM(-12), G_IBES_FY1_MED_USD(0))
(H) AVAIL(IH_MED_EPS_NTMA(0), IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0))
(J) AVAIL(IH_MED_EPS_NTMA(0), IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0))
Note that in Definition J, I added an additional condition (screen) that excluded any
stocks with forward earnings yield estimates greater than 1. There are only a couple of
stocks for which this is an issue (e.g. CUSIP 81600630 SEIBELS BRUCE GROUP INC in the
late 80s). This clearly seems to be some sort of data error, because a stock cannot be
expected to earn more in the coming year than it is worth!
Preliminary study suggests that there is no lookahead bias in G_IBES_FY1_MED_USD. I am
always concerned with G_ parameters because they have proven mostly unreliable in the
past. As it is, mixing a FY1 forecast with an NTM forecast is less than ideal.
Definition G was created because close inspection of the historical IH_MEDIAN_NTM data
series revealed that there were often NAs entering the time series surrounded by what
appear to be real earnings forecasts. Thus, if the data is NA for a given month, perhaps
it is best to look at the previous month’s entry for screening purposes.
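The fallback behaviour that Definition G relies on can be sketched in Python (illustration only, not FactSet code). The helper first_available is a hypothetical stand-in for a multi-argument AVAIL, assumed here to return its first non-NA argument; the estimate values are made up.

```python
from typing import Optional, Sequence


def first_available(values: Sequence[Optional[float]]) -> Optional[float]:
    """Return the first non-missing value, mimicking the fallback order of AVAIL."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical NTM median estimates for months 0, -1, -2 (None marks an NA),
# followed by the FY1 median as the last resort.
ntm_history = [None, 1.42, 1.40]
fy1_fallback = 1.35
print(first_available(ntm_history + [fy1_fallback]))
```

The trade-off is staleness: with twelve lags in the chain, an estimate up to a year old can stand in for the current month, so it may be worth tracking how often the deeper lags are actually used.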
Most of these check out ok on splits (all historical earnings forecasts are revised when
a split occurs; thus, they align with current shares outstanding). The only one I have
not checked (because it often returns “NA”) is IH_MEAN_NTMYR. All others have been
checked out and appear sound with respect to stock splits.
After finding problems with the G_ dividend variables using current rather than
historical values in alpha tester, I re-checked these G_ variables and fortunately did
not find this same problem.
I next recommend looking more closely into variables like IH_MEAN_EPS_NTMA and
IH_MEAN_EPS_STMA. Also IH_MED_EPS_NTMA, etc.
Definition J is believed to be the best found thus far, but I know that there remains
room for improvement to this variable/factor definition.
Also considered— These don’t look like they offer any additional beneficial information
over those already considered above: IH_MEAN_FY1, IH_MEAN_FY1R,
G_PRICE_USD(0)/G_PE_IBES_EST_FY1, G_PRICE_USD(0)/G_PE_IBES_EST_FY1D,
G_PRICE_USD(0)/G_PE_IBES_EST_NTM, G_PRICE_USD(0)/G_PE_IBES_EST_NTMD,
CM_P(0)/CM_PE_IBES_FY1, CM_P(0)/CM_PE_IBES_FY1R
Trailing Sales Per Share:
CQ_SALES_PS_LTM(0 L2M)
This checks out ok on splits (all historical values are revised when a split occurs;
thus, they align with current shares outstanding).
An improvement?:
IF((SUM(0,IHLQEPSDNC(0))=CM_DNC(-1) OR SUM(0,IHLQEPSDNC(0)) = CM_DNC(-2)), CQ_SALES_PS_LTM(0), CQ_SALES_PS_LTM(0 L2M))
Dividend Yield:
CM_DIV_YLD(0)
Also considered- IH_DIV(0)/MP(0)*100
The adjustment of MP(0) and IH_DIV(0) for splits is not consistent. Thus, we have
discarded what previously was our secondary methodology of calculating dividend yield.
This was the approach used by DIA (before IH_DIV(0) weaknesses were discovered) and
subsequently discarded by FIA:
AVAIL(CM_DIV_YLD(0), IH_DIV(0)/MP(0)*100)
CM_DIV_YLD has fewer NAs than IH_DIV. Many IH_DIV NAs are 0’s in CM_DIV_YLD. CM_DIV_YLD
also appears to contain better data for ADRs than IH_DIV. Still, there are rare
instances where CM_DIV_YLD has NA when IH_DIV has a value. Thus, AVAIL function is used.
Also considered—
G_DIVS_PS_USD, G_COM_DIVS_TOTAL, G_DIV_YLD appeared to return current value only, not
historical values as desired in Alpha Testing. Dangerous! Looks like IC_ variables in
this respect.
CM_DIV, CQ_DIVS_PS_PDATE and CQ_DIVS_PS may report double the actual quarterly dividend
amount because if two dividend ex-dates or payments happen to occur in the same fiscal
quarter, it will report the sum of them rather than one or the other.
Implied Cost of Capital(ICC):
(A)
Uses Forward EPS Estimate method C (median estimate NTM or FY1)
for E1 in approximation of implied cost of capital.
(D)
Uses Forward EPS Estimate method D (median estimate FY2) for E1
in approximation of implied cost of capital.
See FactSet code directly for implementation w/quadratic equation.
Interest Expense:
???
CA_INT_EXP_LTD(0) too many NAs.
I_INTEREST_EXP(USD) returns NAs.
SUE:
IH_SUE_Q(0)
In the Excel Plug-In, it looks like IH_SUE_Q is recorded concurrent
with each fiscal quarter end (with NAs in months that do not correspond
to a fiscal quarter end). The SUE that is reported corresponds to the
surprise of that quarter’s new earnings report. However, in Alpha
Tester, it looks like IH_SUE_Q(0) does contain the SUE of the most
recently reported earnings (not backfilled SUE of the earnings for the
most recently ended fiscal quarter). This is good. Also note that
trying IH_SUE_Q(-1) in Alpha Tester appears to always give the same
result as IH_SUE_Q(0)- i.e. it does not look back to find the previous
SUE, but instead still gives the most recently available/reported SUE.
Note that it appears SUE data is not present in database until 8/31/89.
Thus, backtest periods cannot begin any earlier than this date.
Finally, note that I recall noticing in passing that there may have
been an abundance of 1.0 and -1.0 values returned by IH_SUE_Q in more
recent time periods. This is an area of concern and should be
investigated more thoroughly as it raises questions regarding the
reliability of the SUE database. Alternately, my recollection of
potentially flawed data may be what is flawed! This requires further
attention.
SUE Universe (A):
Seeks to limit universe to firms who, in the previous year, reported
quarterly earnings during the coming month (and presumably, are likely
to report again this month- thus, hopefully exhibiting the greatest SUE
impact upon returns).
IHLQEPSDNC(0)
This data series updates whenever new earnings are actually reported.
Thus, it can be used to find whether a company reported earnings in a
relevant past month (i.e. the same month as the present one, but a year
ago).
*** Look into improving this screen by using IH_SURPR_QDATE. I looked
at this briefly in the Excel Plug-In for ticker:CTAS and it looks like
it works! ***
Turnover:
See FactSet code directly for final implementation.
Abandoned:
CM_VOL(0)/(P_TRADING_DAYS(-1M,0M)-1)/MSHS(0)
This screen still needs work because it appears that CM_VOL data is not
available for dates preceding around 12/31/1986. Actually, CM_VOL may be
available but P_TRADING_DAYS may not (ending sometime in 1984?).
Abnormal Turnover is specified by performing the calculation shown
above and subtracting the mean of 60 values computed when the above
calculation is performed for each of the preceding 60 months for a
given firm.
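The abnormal turnover calculation described above can be sketched in Python (illustration only; the final FactSet implementation may differ). The function name and the ordering of the history, oldest to newest, are assumptions.

```python
from statistics import mean
from typing import Sequence


def abnormal_turnover(current: float, prior_months: Sequence[float], window: int = 60) -> float:
    """Current turnover minus the arithmetic mean of the preceding `window` monthly values."""
    if len(prior_months) < window:
        raise ValueError("not enough history for the trailing window")
    return current - mean(prior_months[-window:])


# Hypothetical firm: turnover of 0.05 this month against a flat 0.02 over the prior 60 months.
print(abnormal_turnover(0.05, [0.02] * 60))
```

Raising an error on short histories is one choice; returning NA for such firms would instead drop them from the ranking, which is closer to what a screen would do.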
Note that though computation of geometric means (of historical volume
metrics) might have improved performance in this metric, I could find
no function in Universal Screening that computes geometric means.
Performing it by multiplying 60 terms and raising to the 1/60th power did
not work either- it gave NAs- I think because FactSet can’t handle the
precision that was necessary to compute the product of so many numbers
most of which are <0.1.
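One common way around the tiny-product problem is to average logarithms and exponentiate, so the product is never formed. The Python sketch below illustrates the idea only; whether equivalent log and exponential functions are available in Universal Screening was not checked in these notes.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean computed as exp(mean(log(x))); all values must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical 60 monthly turnover values, all well below 0.1.
print(geometric_mean([0.05] * 60))
```

A month with zero volume would make the logarithm undefined, so such months need to be excluded or floored before this is applied.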
Based on previous findings, splits should not cause data problems with this method.
However, I recognize that MSHS(0) is not precisely known at the time it is updated in the
historical time series (though the number of shares outstanding does not change very
rapidly). Using a lag of this variable would help, but I wonder whether that would
introduce split-related errors (as has been observed before with lagged shares
outstanding variables to calculate market cap). This matter warrants further
investigation.
Revision Ratio:
(SUM(IH_UP_FY1(0),IH_UP_FY1(-1),IH_UP_FY1(-2))-SUM(IH_DOWN_FY1(0),IH_DOWN_FY1(-1),IH_DOWN_FY1(-2)))/SUM(IH_NEST_FY1(0),IH_NEST_FY1(-1),IH_NEST_FY1(-2))
I checked that these parameters do indeed appear to report data
correctly in Alpha Tester (i.e. unlike IH_SUE_Q(0) and IH_SUE_Q(-1),
IH_UP_FY1(0) is indeed different than IH_UP_FY1(-1) when the historical
record is so).
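For reference, the revision ratio can be written out in Python (illustration only, not FactSet code). The sketch assumes the numerator is up-revisions minus down-revisions, each summed over the current and two preceding months, divided by the summed number of estimates over the same months; the function name and inputs are hypothetical.

```python
from typing import Sequence


def revision_ratio(up: Sequence[int], down: Sequence[int], n_est: Sequence[int]) -> float:
    """(sum of up revisions - sum of down revisions) / sum of estimate counts."""
    total_estimates = sum(n_est)
    if total_estimates == 0:
        return float("nan")  # no analyst coverage in the window
    return (sum(up) - sum(down)) / total_estimates


# Hypothetical three months: 6 up revisions, 2 down revisions, 10 estimates each month.
print(revision_ratio([3, 2, 1], [1, 0, 1], [10, 10, 10]))
```

By construction the ratio lies between -1 and 1 as long as revisions never outnumber estimates in the window, which gives a quick sanity check on the raw FactSet output.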
Note that the observed power of this signal in Alpha Testing backtests is so strong in
the 1987-2000 period (though not 2001), that I am suspicious of possible look-ahead bias.
Maybe revision ratio really is a profoundly powerful signal (I know it is one of the most
widely-known and researched factors), but the observed strength of the signal in our
backtests leads me to seek further auditing of the data source. This matter warrants
further investigation.
Momentum:
(CM_P(-1)-CM_P(-13))/CM_P(-13)
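In Python terms (illustration only), the momentum formula is the price return from 13 months ago to 1 month ago, which skips the most recent month. The sketch assumes CM_P(-n) steps back n months in a month-end price series; the function name and price list are hypothetical.

```python
from typing import Sequence


def momentum(month_end_prices: Sequence[float]) -> float:
    """Return from 13 months ago to 1 month ago; the last element is the current month (lag 0)."""
    if len(month_end_prices) < 14:
        raise ValueError("need at least 14 month-end prices (lags 0 through -13)")
    p_lag1 = month_end_prices[-2]    # price one month ago
    p_lag13 = month_end_prices[-14]  # price thirteen months ago
    return (p_lag1 - p_lag13) / p_lag13


# Hypothetical series: 100 thirteen months ago, 120 one month ago, current price ignored.
prices = [100.0] + [110.0] * 11 + [120.0, 999.0]
print(momentum(prices))
```

Because this is a price-only return, dividends are not included, and the result depends on the price series being split-adjusted consistently across the two dates.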
A market index:
Best result so far that may require further evaluation and scrutiny:
VALUE(SP50, MP(0))
I discovered this command very late in model development, so we did not
have time to incorporate it into our screens. Upon first inspection,
it does look as if it gives reliable data in Alpha Testing. It (or
something similar) might be used to help dynamically scale minimum
dollar volume or market cap thresholds, etc.
No good:
SP_VALUE_D_IDX(0), SP_VALUE_D_IDX(-30),
SP_PRICE_IDX(0), SP_PRICE_IDX(-30),
SP_VALUE_IDX(0), SP_VALUE_IDX(-30),
SP_DIVIDEND_IDX(0), SP_DIVIDEND_IDX(-30),
SP_PRICE(0), SP_PRICE(-30)
Most give all NAs. SP_PRICE(0) sporadically returns values that are
equal to the per share price of the stock itself (not the index).
MISCELLANEOUS FACTSET PROBLEMS
Calendar:
Make sure that in Universal Screening, the Calendar option is set as
follows: Go to Options -> Calendar ; select Other and US.
Somehow, this got set to “Seven Day” at one point in our work and
caused a lot of confusion with regard to some of the parameters.
Also ensure that the US Calendar is selected in Alpha Testing. This
selection is available via a “Select” button on the Time Series tab of
the menu called up by the “Inputs” button under the heading “Model”.
CompuStat Quarterly data (CQ_):
A mysterious problem came and went (by creating a brand new screen and
manually re-entering all of the code). CQ_ parameters were returning
all NAs in Universal Screening and in Alpha Testing. The origin of
this problem was never determined. Beware.
COMMON PROBLEMS WITH PARAMETER CODE SELECTION
Historical time-alignment of data
- Some parameters do not have historically updated time series. They return the same single value across
all time (which is the current value). This introduces extreme lookahead bias into the backtest.
- Many parameters are backfilled such that new data appears immediately at the end of a fiscal quarter
(even though the company does not publicly report the data until perhaps 3 to 8 weeks later). Other
parameters update only as the new data became publicly available.
Inconsistent split adjustment
Beware of using parameters that do not historically adjust for splits in a consistent fashion (i.e. problems
were encountered when using a lag of shares outstanding {MSHS(0 L45D)} in conjunction with a current
price {P(0)}—these may or may not have been related to split adjustments).
Survivorship bias
Beware of setting a screening criterion or even just selecting a parameter for sorting/ranking (or screening)
that may exclude defunct companies or cause all of their values to return as NAs. Of course, always select
Universe->Use Research (Inactive) Companies.
Missing or bad data
Some parameters have very poor historical data (i.e. IH_EXCHANGE). Also, some parameters that do not
appear to work in the Universal Screening Report do work very nicely in Alpha Testing. And vice versa.
Beware. Some parameters with data that is only slightly in error can be fixed with the proper formula (i.e.
(P_TRADING_DAYS(-1M,0M)-1) ). Note that lagged parameters using the “L2M”- or “L45D”-type syntax
do not return lagged values in the Universal Screening Report, but do typically work properly in Alpha
Testing.
Extreme Compustat returns data
A robust universe definition should exclude most stocks with questionable returns data from backtests.
However, I find that some questionable returns data (very high monthly returns that differ from those
recorded in Yahoo! Finance) seem to persist (see more on this topic near the end of this document). I
recommend scrutinizing the max and min returns obtained in a backtest. Returns in excess of around 300%
should perhaps be double-checked against another database. I tried alternately selecting Worldscope and
other returns sources (instead of CompuStat returns data) in Alpha Testing, but it appears that we may not
subscribe to those other databases.
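The recommended scrutiny of extreme returns can be automated on exported backtest data. The Python sketch below is an illustration only; the function name, the data layout, and the default 300% cutoff (taken from the rule of thumb above) are assumptions, and the sample values come from the tables later in this document.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (ticker or CUSIP, YYYYMMDD)


def flag_extreme_returns(returns_pct: Dict[Key, float], limit_pct: float = 300.0) -> List[Key]:
    """Keys whose monthly return exceeds the limit, for manual checking against another source."""
    return sorted(key for key, r in returns_pct.items() if r > limit_pct)


sample = {("TKC", 20010928): 1536.0, ("ARXX", 20000131): 283.0, ("IDCC", 19991130): 641.0}
print(flag_extreme_returns(sample))
```

A flag is only a prompt to check: some returns of this size turned out to be real, so flagged points should be verified rather than dropped automatically.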
MISCELLANEOUS NOTES
For Help with I/B/E/S parameters:
FactSet Online Assistant Page IDs: 214, 4728, 10490
FactSet Alpha Calculation:
FactSet Online Assistant Page IDs: 582
Note that FactSet does not use the risk free rate in estimating alphas.
It regresses the fractile returns against benchmark returns (not
fractile returns minus riskfree returns against benchmarks returns
minus riskfree returns).
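To make the distinction concrete, here is a Python sketch of a regression on raw returns (illustration only; this is not FactSet's code, and the function name is hypothetical). It fits ordinary least squares of fractile returns on benchmark returns and reports the intercept as alpha, with no risk-free rate subtracted from either series.

```python
from typing import Sequence, Tuple


def alpha_beta(
    fractile_returns: Sequence[float], benchmark_returns: Sequence[float]
) -> Tuple[float, float]:
    """OLS intercept (alpha) and slope (beta) of raw fractile returns on raw benchmark returns."""
    n = len(fractile_returns)
    if n != len(benchmark_returns) or n < 2:
        raise ValueError("need two equal-length series with at least two observations")
    mean_y = sum(fractile_returns) / n
    mean_x = sum(benchmark_returns) / n
    sxx = sum((x - mean_x) ** 2 for x in benchmark_returns)
    if sxx == 0:
        raise ValueError("benchmark returns have no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(benchmark_returns, fractile_returns))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical monthly returns constructed with alpha 0.002 and beta 1.5.
bench = [0.01, -0.02, 0.03, 0.00]
frac = [0.002 + 1.5 * x for x in bench]
print(alpha_beta(frac, bench))
```

With a constant risk-free rate rf, the raw-return intercept differs from the excess-return intercept by rf*(1 - beta), so the two agree only when beta is close to 1.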
UPERCENTILE Function (and similar ones):
Beware that these functions (also UGPERCENTILE, UQUINTILE, UDECILEX,
etc.) probably don’t work the way you intend them to work if your
universe is smaller than 100 (smaller than 10 for UDECILE, etc.) We
have developed code that overcomes this shortcoming of these FactSet
functions. (Find it in our FactSet Universal Screening code.)
Large anomalous-appearing returns that checked out real (in Yahoo!
Finance or at BigCharts.com or nasdaq.com):
IDCC                               19991130   641%
OCCF                               19960430   376%
REGN                               20000130   359%
ARXX                               20000131   283%
EDIG                               19991231   353%
CYTO                               20000131   260%
92908B30 VSOURCE INC               20000131   308%   (at BigCharts.com)
64122D50 NETWORK PLUS CORP         20001229   313%   (at BigCharts.com)
45769740 INNOVATION HOLDINGS IVHN  20041029   -99.8% (at nasdaq.com)
Large anomalous-appearing returns that checked out bad or questionable
according to Yahoo! Finance (and were eliminated from our backtesting
universe):
Cusip     Ticker  Date      Assessment    Return
90011120  TKC     20010928  bad           1536%
03512820  AU      19980630  questionable  442%
22025Y40  CXW     20000831  bad           389%
Filenames of FIA’s most robust Universal Screening Code:
(As noted elsewhere, the [only] ones-digit significance of much of the historical SUE data by
IH_SUE_Q(0) raises concerns regarding the reliability/historical accuracy of this data series. More
auditing of this data series is warranted. Further refinement of abnormal dollar volume, abnormal
turnover, and change in net accruals definitions are other areas that warrant particular attention in the
future.)
Universe Definition:
!FIA – UniverseDefinition v004
Abnormal Dollar Volume (in biggest 30% of universe):
!FIA – AbnDV_B_BIG30_v006
Abnormal Dollar Volume (in next biggest (mid) 20% of universe):
!FIA – AbnDV_B_MID20_v006
Abnormal Dollar Volume (in smallest 50% of universe):
!FIA – AbnDV_B_SMALL_v006
Abnormal Turnover:
!FIA – AbnTO_B_v006
Book-To-Price:
!FIA – BtoP_v001
Change in Net Accruals:
!FIA – ChgAccruals_v007
Change in Shares Outstanding (over trailing 3, 6, 12, and 18 months):
!FIA – ChgShsOut_3_6_12_18M_v005
Dividend Yield:
!FIA – DY_v001
Forward Earnings Yield:
!FIA – FEYJ_v015
Forward Earnings Yield (including various industry-normalization schemes):
!FIA – FEYF_AllinOne_v0013
Trailing Earnings Yield:
!FIA – TEY_v001
Momentum:
!FIA – Momentum_v005
Revenue-to-Price:
!FIA – RtoP_v001
Reversal:
!FIA – Reversal_v004
Revision Ratio:
!FIA – RR_A_v003
Size (same as simple Universe Definition—use market cap to rank):
!FIA – Size_v003
Standardized Unexpected Earnings:
!FIA – SUEA_v013
Standardized Unexpected Earnings (in current month’s expected non-reporting universe):
!FIA – SUEANR_v010
Standardized Unexpected Earnings (in current month’s expected reporting universe):
!FIA – SUEAR_v012
FIA’s Multivariate Screen (with static factor weights):
!FIA – All_Screen_v005