FACTSET NOTES VARIABLE/PARAMETER CODE SELECTION AND ALPHA TESTING CONVENTIONS Disclaimer: This document is a collection of raw notes, intended to aid the work of Duke Investment Analytics and Fuqua Investment Analytics. Little time has been spent on editing or organization for ease of external use. This document is shared only because it may still be of some use in subsequent FactSet-based research even in its present, raw form. Duke Investment Analytics (DIA), Global Asset Allocation and Stock Selection: Claudio Aritomi, Sam Ding, Mak Pitke, Marcus Shaw, Brian Wachob Fuqua Investment Analytics (FIA), Quantitative Stock Selection: Stefan Gertsch, Brian Wachob. ALPHA TESTING CONVENTIONS In-sample date range: 1/31/1987 – 11/31/2001 Out-of-sample date range: 12/31/2001 – 12/31/2004 (Note that using 31 as the last day of the month when specifying the date range is necessary—even when there is no 31st day of the specified month. If not used in this way, lagged variables (parameters) may return values with unintended time alignments (misalignments) in Alpha Tester.) (Also note that FactSet’s convention is to associate the returns from the following month with the data, and portfolio selected, for the final day of a given month. For example, the first in-sample portfolio is formed on Jan. 31, 1987 based on data available on Jan. 31, 1987 and the first return recorded in-sample is the return realized in February 1987 – this return is associated with the date Jan. 31, 1987.) In Alpha Testing, always select low fractile = low values. This convention will maintain consistency across our screens and alpha tests. (This convention was used in the work of Duke Investment Analytics. Fuqua Investment Analytics has chosen to associate the low numbered fractiles with high factor values—which is consistent with FactSet default conventions for UQUINTILE, UDECILE, UPERCENTILE, and related functions.) In Alpha Testing, resolve ties in ranking factor by FactSet option: choose midpoint. 
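The return-dating convention described above (next month's return filed under the prior month-end formation date) can be illustrated outside FactSet. This is a minimal sketch, not FactSet's implementation: the function names are hypothetical, and it uses true calendar month-ends rather than FactSet's "always use 31" input convention.

```python
import calendar
from datetime import date


def month_end(year: int, month: int) -> date:
    """Last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def formation_date_for_return_month(year: int, month: int) -> date:
    """Month-end date under which the return realized in (year, month) is recorded.

    Hypothetical helper: the return earned during a month is associated with
    the final day of the preceding month, when the portfolio was formed.
    """
    if month == 1:
        return month_end(year - 1, 12)
    return month_end(year, month - 1)


# The February 1987 return is associated with the Jan. 31, 1987 portfolio.
print(formation_date_for_return_month(1987, 2))  # 1987-01-31
```

When merging Alpha Tester output with returns from another source, shifting the outside returns by one month in this way keeps the two series aligned.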
STANDARD UNIVERSE SCREENS

US-listed stocks only
P_COUNTRY_ISO = “US”
***This should be double-checked. Does use of this variable introduce problems? For example, do there exist companies that historically traded on US exchanges, but have since withdrawn from US exchanges in favor of trading exclusively on foreign exchanges? Some firms talk about this possibility in order to avoid cumbersome SEC regulations. Do there exist such firms and does this element of universe specification exclude these companies from the historical dataset? Perhaps this parameter, like so many others, does not update historically… ***

Exclude bad returns data
Because of what appear to be bad Compustat returns data, we also completely excluded AU, CXW, and TKC from our universe (for Duke Investment Analytics). Fuqua Investment Analytics’ work discerned how to exclude these stocks only in the months (one month for each stock) for which their returns data appeared to be erroneous. Find more details towards the end of this document.

This screen was added by FIA
Exclude ETFs and other investment funds
We are trying to model factors that affect firm returns, not index or investment fund returns. Additionally, including these equities effectively double-counts component companies. Find more info later in this document pertaining to implementation of this screen.

This screen was added by FIA
Minimum daily dollar volume
Average daily dollar volume over the past month must be greater than $500,000 in 2005. (S&P averaged a 10.4% CAGR from year-end 1984 through year-end 2004. Thus, we have chosen to reduce our dollar volume screening threshold by 10.4% each year we go back into time. It is $500K for 2005 and $84K for 1987.)
(This threshold is scaled downward in earlier periods because a portfolio manager would have had lesser dollar-liquidity requirements because he would have been managing a lesser number of nominal dollars.
Likening a minimum dollar-volume threshold to growth in the S&P500 may not be the best method of specifying this screen- perhaps a better scaling rate than 10.4% time-constant threshold growth can be determined.) (CM_VOL(0)/(P_TRADING_DAYS(-1M,0M)-1))*((CM_PH(0)+CM_PL(0))/2) > 0.5*POWER(1.104,(INT(CM_DNC(0)/100)-2005)) Find more info later in this document pertaining to weaknesses in methods for estimating average daily dollar volume and market cap. This screen was added by FIA, replacing a different market cap limit methodology used by DIA Minimum market cap Market cap must be greater than $200M in 2005. We reduced our market cap screening threshold by 7% each year we go back into time. It is $200M for 2005 and $59M for 1987. (This threshold is scaled downward in earlier periods because of currency inflation and growth in the stock market over time. Again, future work may determine a better growth rate for this threshold than 7%.) MSHS(0)*MP(0) > 200*POWER(1.07,(INT(CM_DNC(0)/100)-2005)) Find more info later in this document pertaining to weaknesses in methods for estimating average daily dollar volume and market cap. FIA code used (a) to select US-traded stocks; (b) to exclude ETFs and other investment funds; and (c) to exclude data points with questionable returns data: (((CA_CUSIP<>"90011120" OR CM_DNC(0)<>200109) AND (CA_CUSIP<>"03512820" OR CM_DNC(0)<>199806) AND (CA_CUSIP<>"22025Y40" OR CM_DNC(0)<>200008)) AND ((SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE), 0)<>6722 AND SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE),0)<>6726)=1) AND (P_COUNTRY_ISO="US"))=1 This screen was used by DIA, but weaknesses were discovered by the work of FIA. Thus, this screen was abandoned. NYSE, NASDAQ, AMEX only Top 60% of these by market cap. (Note that using this universe definition yields roughly 1100 firms in January 1987 and roughly 3500 firms by November 2001. The market cap thresholds correspond to roughly $50mil in January 1987 and $200mil in February 2005.) 
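The year-scaled thresholds in the dollar-volume and market-cap screens above can be sanity-checked outside FactSet. The sketch below assumes that CM_DNC(0) is a YYYYMM integer (as the comparisons such as CM_DNC(0)<>200109 suggest) and that both thresholds are expressed in millions of dollars; the function names are hypothetical.

```python
def year_from_yyyymm(yyyymm: int) -> int:
    """Mirror INT(CM_DNC(0)/100), assuming the date code is YYYYMM."""
    return yyyymm // 100


def min_dollar_volume_musd(yyyymm: int) -> float:
    """Minimum average daily dollar volume in $ millions: 0.5 in 2005, scaled by 10.4%/yr."""
    return 0.5 * 1.104 ** (year_from_yyyymm(yyyymm) - 2005)


def min_market_cap_musd(yyyymm: int) -> float:
    """Minimum market cap in $ millions: 200 in 2005, scaled by 7%/yr."""
    return 200 * 1.07 ** (year_from_yyyymm(yyyymm) - 2005)


# Reproduce the 1987 figures quoted in the notes (about $84K and $59M).
print(round(min_dollar_volume_musd(198701) * 1000))  # 84
print(round(min_market_cap_musd(198701)))            # 59
```

Because the exponent uses only the year, the threshold steps once per calendar year rather than changing smoothly month to month.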
*** This parameter’s implementation, as used for this screen, has proven to be faulty. The values reported for G_EXCHANGE_NAME do not update historically. Thus, companies such as Enron who are presently delisted, but were on the NYSE historically are wrongly excluded from all backtests! *** Also note that we investigated defining our universe by S&P or Russell index constituents, but it looks as if such historical data is not presently available with our subscriptions. STANDARD FORMULAS AND PARAMETER SELECTIONS TO USE FOR COMMON VARIABLES (AND UNIVERSE-LIMITING CRITERIA) Exchange: I could find no parameter that would give me historically updated exchange listing information. FactSet help gave me the following response: Dear Brian, This is George from FactSet. I have just found the problem, and it is two fold. First, we don't have historical constituents for the NYSE. Second, an additional subscription is required for historical constituents for the NASDAQ and AMEX. Because of this, you will be unable to see the list as it was back in, say, 1990. Even with the additional subscription, AMEX historical constituents only goes back until 1998. Sorry for the bad news. Let me know if you have any other questions. Sincerely, George T. Hogan George T. Hogan | Consultant | FactSet Research Systems Considered— G_EXCHANGE_NAME, IB_EXCHANGE, CA_EXCHANGE_NAME, CA_EXCHANGEN, IH_EXCHANGE G_EXCHANGE_NAME gives only record of current exchange listing. IH_EXCHANGE appears to come the closest to working, but also has many errors and missing data among its records. Exclusion of ETFs and other Funds: (SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE),0)<>6722 AND SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE),0)<>6726)=1 I might have alternately used PTYPEN, but in looking at the selection of companies screened out/in, I thought this method did a better job (it was not completely clear which was the better method though). 
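The logic of the fund-exclusion expression above can be sketched in Python. This is an interpretation, not FactSet's documented behavior: it assumes AVAIL returns its first non-NA argument and that wrapping the result in SUM(..., 0) turns an NA into 0 so that the comparison still evaluates. The helper names are hypothetical.

```python
FUND_SIC_CODES = {6722, 6726}  # the two SIC codes excluded in the screen


def avail(*values):
    """Return the first argument that is not None (assumed AVAIL behavior)."""
    for value in values:
        if value is not None:
            return value
    return None


def passes_fund_exclusion(sic_hist, sic_current) -> bool:
    """True if the stock is kept, False if it is screened out as a fund.

    sic_hist plays the role of CA_SIC_CODE_HIST(0) and sic_current the role
    of G_SIC_CODE; None stands in for NA.
    """
    code = avail(sic_hist, sic_current)
    code = 0 if code is None else code  # assumed role of SUM(..., 0)
    return code not in FUND_SIC_CODES


print(passes_fund_exclusion(6726, 3571))  # False
print(passes_fund_exclusion(None, None))  # True
```

Note that under this reading a stock with no SIC code at all passes the screen, and the historical code takes priority over the current one whenever it exists.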
Price per share:
MP(0)
This checks out ok on splits (all historical prices are revised when a split occurs; thus, they align with current price).
Also considered: CM_P, P(0), @AVAIL(P(0),MP(0))
P(0) gives all or almost all NAs.

Shares Outstanding:
MSHS(0)
This checks out ok on splits (all historical values are revised when a split occurs; thus, they align with current shares outstanding). Though this reflects as-of-fiscal-period-end share counts that were not necessarily available (i.e. not as-reported), it eliminates other problems with noncontemporaneous lagged data being used in the current period. For example, companies for which shares outstanding should have been NA were finding an old (lagged 45 days) value that, multiplied with their current day price, pushed them into the market cap consideration set of companies. This caused some 80000% returns to enter our data (problematic!).
Also considered: CM_SHS(0), CM_SHS(0 L45D)
For future investigation: IH_SHRS_OUT. Perhaps this variable could be combined with a non-split adjusted price (perhaps available via CM_P(xxx)?).

Market Capitalization:
MP(0) * MSHS(0)
Also considered: RI_MKTCAP(0)/1000, CM_MKT_VALUE(0), @AVAIL(P(0),MP(0)) * CM_SHS(0 L45D)

Average Daily Dollar Volume Over the Past Month:
(CM_VOL(0)/(P_TRADING_DAYS(-1M,0M)-1))*((CM_PH(0)+CM_PL(0))/2)
This checks out ok on splits (all historical prices and volumes are revised when a split occurs; thus, they align with current price and shares outstanding; volume and price metrics here are consistent for estimation of dollar volume).
This expression could still use a lot of improvement, but it’s the best we have so far (ideally, we would probably calculate the median daily dollar volume for the preceding 21 trading days, which may now be possible with P_VOLUME and P_PRICE parameters because the Calendar setting problem has been rectified).
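To make the trade-off between these dollar-volume estimates concrete, here is a small Python sketch. It is not FactSet code: the function names are hypothetical and the price and volume figures are made-up toy data, chosen only to show how a single one-day volume spike moves the mean but not the median.

```python
from statistics import mean, median


def approx_avg_dollar_volume(monthly_volume, trading_days, month_high, month_low):
    """Monthly volume per trading day times the midpoint of the month's high and low."""
    return monthly_volume / trading_days * (month_high + month_low) / 2


def daily_dollar_volumes(prices, volumes):
    """Price times volume for each trading day in the window."""
    if len(prices) != len(volumes):
        raise ValueError("prices and volumes must cover the same days")
    return [p * v for p, v in zip(prices, volumes)]


# Toy 21-day window: flat price, flat volume, plus one large spike day.
prices = [10.0] * 21
volumes = [100] * 20 + [10_000]
dv = daily_dollar_volumes(prices, volumes)

print(median(dv))          # 1000.0, unaffected by the spike
print(mean(dv) > 5000)     # True, the spike dominates the mean
print(approx_avg_dollar_volume(2100, 21, 12.0, 8.0))  # 1000.0
```

The median stays at the typical day's dollar volume, which is the behavior one would want from a liquidity screen.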
The parameter P_TRADING_DAYS(-1M,0M) was found to work (well enough) in Alpha Testing but not in Universal Screening Reports. It appears to give one more than the actual number of trading days in Alpha Testing. It appears to give one more than the actual number of calendar days (!) in Universal Screening Reports. Perhaps other parameters are similarly functional in Alpha Testing but not in Universal Screening Reports? Note that this calendar day versus trading day problem was later resolved: one must take care to ensure that under Options -> Calendar, the Other: US calendar is selected (not the 7-day calendar, for example).

??? SUM20(P_PRICE(0)*P_VOLUME(0)) ???
The problem with P_VOLUME(-1), P_VOLUME(-2), etc. is that it appears to increment backwards by calendar days, not trading days as desired. Same problem with P_VOLUME(-1,0,1000), P_VOLUME(-2,0,1000), etc. These problems appear to occur in the Universal Screening Report output, but not necessarily in the Excel environment.
***I think that this was a result of the same problem: {Note that this calendar day versus trading day problem was later resolved: one must take care to ensure that under Options -> Calendar, the Other: US calendar is selected (not the 7-day calendar, for example).}

The following works in Excel, but not in Universal Screening:
AVG(P_PRICE(-1M,0D,D)*P_VOLUME(-1M,0D,D))
Note that -1M bounds the date range at the last day of the preceding month. -1AM would have picked out the same day as the day referenced by 0D, but a month earlier. See FactSet Help PageID 1964.
I tested this code with regard to stock splits for one stock in a recent time period and found that historical P_PRICE and P_VOLUME numbers had indeed been accurately revised in the P_ database to align with the post-split share levels. At least, this was my initial conclusion when comparing historical volume quotes for NYSE:SM in Yahoo! Finance with those output from FactSet to an Excel spreadsheet.
Note that, at least in the case of the ERICY reverse split (and also the recent SM split, I believe), it is evident that the historical volumes listed in Yahoo! Finance ARE NOT changed to reflect the new number of shares outstanding. This matter requires further attention/review. See FactSet Help PageID 614 for info on handling of Dividends, Stock Splits, and Spinoffs.
IB_VOL_1D does not appear to increment backwards through time, at least not in the Universal Screening Report (i.e. IB_VOL_1D gives the same value as IB_VOL_1D(-1) or IB_VOL_1D(-100)).
Note that median dollar volume would be preferable to mean dollar volume so that our estimated liquidity of a given stock is not thrown off by a one-day spike in volume that enters the trailing window sample period.

Minimum Threshold for Average Daily Dollar Volume Over the Past Month:
0.5*POWER(1.104,(INT(CM_DNC(0)/100)-2005))
Note that using the same threshold for both NASDAQ and NYSE stocks is not ideal. Due to double-counting of dealer trades, volume for NASDAQ stocks is inflated relative to NYSE and AMEX stocks. Thus, a better screen would use different minimum dollar-volume thresholds for NASDAQ stocks than for NYSE and AMEX stocks.
(See “Standard Universe Screens” section for more.)

Book value per share:
AVAIL(CM_BK(0 L2M), G_BOOK_PS_USD(0 L2M))
An improvement?:
IF((SUM(0,IHLQEPSDNC(0))=CM_DNC(-1) OR SUM(0,IHLQEPSDNC(0)) = CM_DNC(2)), AVAIL(CM_BK(0), G_BOOK_PS_USD(0)), AVAIL(CM_BK(0 L2M), G_BOOK_PS_USD(0 L2M)))
Note that because the data series we have chosen are predominantly updated on a monthly basis, I believe that lagging by 45 days (L45D) will return data that is no different from that obtained when lagging by 2 months (L2M). Thus, it may be preferable to explicitly lag by 2 months.
Also note that this method of seeking to access the most recent (least stale) data may induce some lookahead bias in instances where firms shift their reporting schedules— because it relies on the timing of last year’s quarterly report to estimate whether this year’s quarterly report has occurred yet (and thus, whether we can look at the newest data point). This checks out ok on splits (all historical book values per share are revised when a split occurs; thus, they align with current shares outstanding). CM_BK(0) appears to update quarterly as-of-fiscal-period-end CM_BK(0) appears to have less missing data (NAs) than G_BOOK_PS_USD(0) G_BOOK_PS_USD appears to update annually as-of-fiscal-period-end After finding problems with the G_ dividend variables using current rather than historical values in Alpha Testing, I re-checked these G_ variables and fortunately did not find this same problem. Book Value-to-Price: AVAIL(CM_BK(0 L2M), G_BOOK_PS_USD(0 L2M))/MP(0) See immediately above for potential improvement This checks out ok on splits (all historical values are revised when a split occurs. thus, they align with current price and are consistent for the ratio calculation). It looks as if CM_BK and G_BOOK_PS_USD may only update annually in the 88-89 timeframe. Ideally, we would find a variable that updates quarterly. Also considered— CM_PBK(0), G_PBK(0) CM_PBK and G_PBK appear to contain the same data series Trailing EPS: AVAIL(IH_EPS_ACT_LTM(0), CM_EPS(0 L2M)) Use method shown for Book Value for improved timeliness of data? This checks out ok on splits (all historical EPSs are revised when a split occurs; thus, they align with current shares outstanding). Note that there are significant differences between these two options. IH_EPS_ACT_LTM is evaluated based on analyst consensus methods of adjustment to earnings (to align with IBES consensus forecast earnings). CM_EPS matches the sum of the numbers reported in the firm’s 4 most recent 10-Q’s. 
I believe these are basic EPS, not diluted. Perhaps in a future analysis, I will use diluted EPS, which would probably be preferable.

Forward EPS Estimate:
(A) IH_MEAN_NTM(0)
Alternates:
(B) AVAIL(IH_MEAN_NTM(0), AVAIL((G_IBES_FY1_MEAN_USD(0)+IH_MEAN_NTMYR(0))/2, AVAIL(IH_MEAN_NTMYR(0), G_IBES_FY1_MEAN_USD(0))))
(C) AVAIL(IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0))
(D) IH_MEDIAN_FY2(0)
(F) Use definition C, except for negative FwdEPS. For these, a combination/transform of IH_MEDIAN_FY2(0), IH_MEDIAN_FY3(0), and CQ_SALES_PS_LTM(0 L2M)
(G) AVAIL(IH_MEDIAN_NTM(0), IH_MEDIAN_NTM(-1), IH_MEDIAN_NTM(-2), IH_MEDIAN_NTM(-3), IH_MEDIAN_NTM(-4), IH_MEDIAN_NTM(-5), IH_MEDIAN_NTM(-6), IH_MEDIAN_NTM(-7), IH_MEDIAN_NTM(-8), IH_MEDIAN_NTM(-9), IH_MEDIAN_NTM(-10), IH_MEDIAN_NTM(-11), IH_MEDIAN_NTM(-12), G_IBES_FY1_MED_USD(0))
(H) AVAIL(IH_MED_EPS_NTMA(0), IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0))
(J) AVAIL(IH_MED_EPS_NTMA(0), IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0))
Note that in Definition J, I added an additional condition (screen) that excluded any stocks with forward earnings yield estimates greater than 1. There are only a couple of stocks for which this is an issue (e.g. CUSIP 81600630 SEIBELS BRUCE GROUP INC in the late 80s). Clearly this seems to be some sort of data error because a stock cannot be expected to earn more in the coming year than it is worth!
Preliminary study suggests that there is no lookahead bias in G_IBES_FY1_MED_USD. I am always concerned with G_ parameters because they have proven mostly unreliable in the past. As it is, mixing a FY1 forecast with an NTM forecast is less than ideal.
Definition G was created because close inspection of the historical IH_MEDIAN_NTM data series revealed that there were often NAs entering the time series surrounded by what appear to be real earnings forecasts. Thus, if the data is NA for a given month, perhaps it is best to look at the previous month’s entry for screening purposes.
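The fallback idea behind Definition G (use the most recent non-NA NTM median from the last 12 months, else the FY1 median) can be sketched in Python. This is only an illustration of the lookup order, assuming AVAIL returns its first non-NA argument; the function names and the list-based history are hypothetical, and None stands in for NA.

```python
def avail(*values):
    """Return the first argument that is not None (assumed AVAIL behavior)."""
    for value in values:
        if value is not None:
            return value
    return None


def forward_eps_definition_g(ntm_history, fy1_median, max_lookback=12):
    """Most recent available NTM median within the lookback, else the FY1 median.

    ntm_history[0] is the current month's value, ntm_history[k] the value
    k months ago. Values older than max_lookback months are ignored.
    """
    recent = ntm_history[: max_lookback + 1]
    return avail(*recent, fy1_median)


# Current and prior month are NA, so the value from two months ago is used.
print(forward_eps_definition_g([None, None, 1.25], 0.9))  # 1.25
# Nothing in the last 12 months: fall back to the FY1 median.
print(forward_eps_definition_g([None] * 13 + [2.0], 0.9))  # 0.9
```

One consequence worth keeping in mind is that a stale forecast of up to a year old can enter the screen, which trades some timeliness for coverage.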
Most of these check out ok on splits (all historical earnings forecasts are revised when a split occurs; thus, they align with current shares outstanding). The only one I have not checked (because it often returns “NA”) is IH_MEAN_NTMYR. All others have been checked out and appear sound with respect to stock splits. After finding problems with the G_ dividend variables using current rather than historical values in alpha tester, I re-checked these G_ variables and fortunately did not find this same problem. I next recommend looking more closely into variables like IH_MEAN_EPS_NTMA and IH_MEAN_EPS_STMA. Also IH_MED_EPS_NTMA, etc. Definition J is believed to be the best found thus far, but I know that there remains room for improvement to this variable/factor definition. Also considered— These don’t look like they offer any additional beneficial information over those already considered above: IH_MEAN_FY1, IH_MEAN_FY1R, G_PRICE_USD(0)/G_PE_IBES_EST_FY1, G_PRICE_USD(0)/G_PE_IBES_EST_FY1D, G_PRICE_USD(0)/G_PE_IBES_EST_NTM, G_PRICE_USD(0)/G_PE_IBES_EST_NTMD, CM_P(0)/CM_PE_IBES_FY1, CM_P(0)/CM_PE_IBES_FY1R Trailing Sales Per Share: CQ_SALES_PS_LTM(0 L2M) This checks out ok on splits (all historical values are revised when a split occurs; thus, they align with current shares outstanding). An improvement?: IF((SUM(0,IHLQEPSDNC(0))=CM_DNC(-1) OR SUM(0,IHLQEPSDNC(0)) = CM_DNC(2)), CQ_SALES_PS_LTM(0), CQ_SALES_PS_LTM(0 L2M)) Dividend Yield: CM_DIV_YLD(0) Also considered- IH_DIV(0)/MP(0)*100 The adjustment of MP(0) and IH_DIV(0) for splits is not consistent. Thus, we have discarded what previously was our secondary methodology of calculating dividend yield. This was the approach used by DIA (before IH_DIV(0) weaknesses were discovered) and subsequently discarded by FIA: AVAIL(CM_DIV_YLD(0), IH_DIV(0)/MP(0)*100) CM_DIV_YLD has fewer NAs than IH_DIV. Many IH_DIV NAs are 0’s in CM_DIV_YLD. CM_DIV_YLD also appears to contain better data for ADRs than IH_DIV. 
Still, there are rare instances where CM_DIV_YLD has NA when IH_DIV has a value. Thus, AVAIL function is used. Also considered— G_DIVS_PS_USD, G_COM_DIVS_TOTAL, G_DIV_YLD appeared to return current value only, not historical values as desired in Alpha Testing. Dangerous! Looks like IC_ variables in this respect. CM_DIV, CQ_DIVS_PS_PDATE and CQ_DIVS_PS may report double the actual quarterly dividend amount because if two dividend ex-dates or payments happen to occur in the same fiscal quarter, it will report the sum of them rather than one or the other. Implied Cost of Capital(ICC): (A) Uses Forward EPS Estimate method C (median estimate NTM or FY1) for E1 in approximation of implied cost of capital. (D) Uses Forward EPS Estimate method D (median estimate FY2) for E1 in approximation of implied cost of capital. See FactSet code directly for implementation w/quadratic equation. Interest Expense: ??? CA_INT_EXP_LTD(0) too many NAs. I_INTEREST_EXP(USD) returns NAs. SUE: IH_SUE_Q(0) In the Excel Plug-In, it looks like IH_SUE_Q is recorded concurrent with each fiscal quarter end (with NAs in months that do not correspond to a fiscal quarter end). The SUE that is reported corresponds to the surprise of that quarter’s new earnings report. However, in Alpha Tester, it looks like IH_SUE_Q(0) does contain the SUE of the most recently reported earnings (not backfilled SUE of the earnings for the most recently ended fiscal quarter). This is good. Also note that trying IH_SUE_Q(-1) in Alpha Tester appears to always give the same result as IH_SUE_Q(0)- i.e. it does not look back to find the previous SUE, but instead still gives the most recently available/reported SUE. Note that it appears SUE data is not present in database until 8/31/89. Thus, backtest periods cannot begin any earlier than this date. Finally, note that I recall noticing in passing that there may have been an abundance of 1.0 and -1.0 values returned by IH_SUE_Q in more recent time periods. 
This is an area of concern and should be investigated more thoroughly as it raises questions regarding the reliability of the SUE database. Alternately, my recollection of potentially flawed data may be what is flawed! This requires further attention. SUE Universe (A): Seeks to limit universe to firms who, in the previous year, reported quarterly earnings during the coming month (and presumably, are likely to report again this month- thus, hopefully exhibiting the greatest SUE impact upon returns). IHLQEPSDNC(0) This data series updates whenever new earnings are actually reported. Thus, it can be used to find whether a company reported earnings in a relevant past month (i.e. the same month as the present one, but a year ago). *** Look into improving this screen by using IH_SURPR_QDATE. I looked at this briefly in the Excel Plug-In for ticker:CTAS and it looks like it works! *** Turnover: See FactSet code directly for final implementation. Abandoned: CM_VOL(0)/(P_TRADING_DAYS(-1M,0M)-1)/MSHS(0) This screen still needs work because it appears that CM_VOL data is not available for dates preceding around 12/31/1986. Actually, CM_VOL may be available but P_TRADING_DAYS may not (ending sometime in 1984?). Abnormal Turnover is specified by performing the calculation shown above and subtracting the mean of 60 values computed when the above calculation is performed for each of the preceding 60 months for a given firm. Note that though computation of geometric means (of historical volume metrics) might have improved performance in this metric, I could find no function in Universal Screening that computes geometric means. Performing it by multiplying 60 terms and raising to the 1/60th power did not work either- it gave NAs- I think because FactSet can’t handle the precision that was necessary to compute the product of so many numbers most of which are <0.1. Based on previous findings, splits should not cause data problems with this method. 
However, I recognize that MSHS(0) is not precisely known at the time it is updated in the historical time series (though the number of shares outstanding does not change very rapidly). Using a lag of this variable would help, but I wonder whether that would introduce split-related errors (as has been observed before with lagged shares outstanding variables to calculate market cap). This matter warrants further investigation.

Revision Ratio:
(SUM(IH_UP_FY1(0),IH_UP_FY1(-1),IH_UP_FY1(-2))-SUM(IH_DOWN_FY1(0),IH_DOWN_FY1(-1),IH_DOWN_FY1(-2)))/SUM(IH_NEST_FY1(0),IH_NEST_FY1(-1),IH_NEST_FY1(-2))
I checked that these parameters do indeed appear to report data correctly in Alpha Tester (i.e. unlike IH_SUE_Q(0) and IH_SUE_Q(-1), IH_UP_FY1(0) is indeed different than IH_UP_FY1(-1) when the historical record is so).
Note that the observed power of this signal in Alpha Testing backtests is so strong in the 1987-2000 period (though not 2001), that I am suspicious of possible look-ahead bias. Maybe revision ratio really is a profoundly powerful signal (I know it is one of the most widely-known and researched factors), but the observed strength of the signal in our backtests leads me to seek further auditing of the data source. This matter warrants further investigation.

Momentum:
(CM_P(-1)-CM_P(-13))/CM_P(-13)

A market index:
Best result so far that may require further evaluation and scrutiny:
VALUE(SP50, MP(0))
I discovered this command very late in model development, so we did not have time to incorporate it into our screens. Upon first inspection, it does look as if it gives reliable data in Alpha Testing. It (or something similar) might be used to help dynamically scale minimum dollar volume or market cap thresholds, etc.
No good: SP_VALUE_D_IDX(0), SP_VALUE_D_IDX(-30), SP_PRICE_IDX(0), SP_PRICE_IDX(-30), SP_VALUE_IDX(0), SP_VALUE_IDX(-30), SP_DIVIDEND_IDX(0), SP_DIVIDEND_IDX(-30), SP_PRICE(0), SP_PRICE(-30)
Most give all NAs.
SP_PRICE(0) sporadically returns values that are equal to the per share price of the stock itself (not the index).

MISCELLANEOUS FACTSET PROBLEMS

Calendar:
Make sure that in Universal Screening, the Calendar option is set as follows: Go to Options -> Calendar; select Other and US. Somehow, this got set to “Seven Day” at one point in our work and caused a lot of confusion with regard to some of the parameters.
Also ensure that the US Calendar is selected in Alpha Testing. This selection is available via a “Select” button on the Time Series tab of the menu called up by the “Inputs” button under the heading “Model”.

CompuStat Quarterly data (CQ_):
A mysterious problem came and went (it was cleared by creating a brand new screen and manually re-entering all of the code). CQ_ parameters were returning all NAs in Universal Screening and in Alpha Testing. The origin of this problem was never determined. Beware.

COMMON PROBLEMS WITH PARAMETER CODE SELECTION

Historical time-alignment of data
- Some parameters do not have historically updated time series. They return the same single value across all time (which is the current value). This introduces extreme lookahead bias into the backtest.
- Many parameters are backfilled such that new data appears immediately at the end of a fiscal quarter (even though the company does not publicly report the data until perhaps 3 to 8 weeks later). Other parameters do update only as the new data became publicly available.

Inconsistent split adjustment
Beware of using parameters that do not historically adjust for splits in a consistent fashion (e.g. problems were encountered when using a lag of shares outstanding {MSHS(0 L45D)} in conjunction with a current price {P(0)}; these may or may not have been related to split adjustments).

Survivorship bias
Beware of setting a screening criterion or even just selecting a parameter for sorting/ranking (or screening) that may exclude defunct companies or cause all of their values to return as NAs.
Of course, always select Universe->Use Research (Inactive) Companies.

Missing or bad data
Some parameters have very poor historical data (e.g. IH_EXCHANGE). Also, some parameters that do not appear to work in the Universal Screening Report do work very nicely in Alpha Testing. And vice versa. Beware. Some parameters with data that is only slightly in error can be fixed with the proper formula (e.g. (P_TRADING_DAYS(-1M,0M)-1) ).
Note that lagged parameters using the “L2M”- or “L45D”-type syntax do not return lagged values in the Universal Screening Report, but do typically work properly in Alpha Testing.

Extreme Compustat returns data
A robust universe definition should exclude most stocks with questionable returns data from backtests. However, I find that some questionable returns data (very high monthly returns that differ from those recorded in Yahoo! Finance) seem to persist (see more on this topic near the end of this document). I recommend scrutinizing the max and min returns obtained in a backtest. Returns in excess of around 300% should perhaps be double-checked against another database.
I tried alternately selecting Worldscope and other returns sources (instead of CompuStat returns data) in Alpha Testing, but it appears that we may not subscribe to those other databases.

MISCELLANEOUS NOTES

For Help with I/B/E/S parameters:
FactSet Online Assistant Page IDs: 214, 4728, 10490

FactSet Alpha Calculation:
FactSet Online Assistant Page IDs: 582
Note that FactSet does not use the risk free rate in estimating alphas. It regresses the fractile returns against benchmark returns (not fractile returns minus riskfree returns against benchmark returns minus riskfree returns).

UPERCENTILE Function (and similar ones):
Beware that these functions (also UGPERCENTILE, UQUINTILE, UDECILEX, etc.) probably don’t work the way you intend them to work if your universe is smaller than 100 (smaller than 10 for UDECILE, etc.)
We have developed code that overcomes this shortcoming of these FactSet functions. (Find it in our FactSet Universal Screening code.)

Large anomalous-appearing returns that checked out real (in Yahoo! Finance or at BigCharts.com or nasdaq.com):
IDCC                          19991130  641%
OCCF                          19960430  376%
REGN                          20000130  359%
ARXX                          20000131  283%
EDIG                          19991231  353%
CYTO                          20000131  260%
92908B30 VSOURCE INC          20000131  308%   (at BigCharts.com)
64122D50 NETWORK PLUS CORP    20001229  313%   (at BigCharts.com)
45769740 INNOVATION HOLDINGS IVHN  20041029  -99.8%  (at nasdaq.com)

Large anomalous-appearing returns that checked out bad or questionable according to Yahoo! Finance (and were eliminated from our backtesting universe):
Cusip     Ticker  Date      Assessment    Return
90011120  TKC     20010928  bad           1536%
03512820  AU      19980630  questionable  442%
22025Y40  CXW     20000831  bad           389%

Filenames of FIA’s most robust Universal Screening Code:
(As noted elsewhere, the [only] ones-digit significance of much of the historical SUE data returned by IH_SUE_Q(0) raises concerns regarding the reliability/historical accuracy of this data series. More auditing of this data series is warranted. Further refinement of the abnormal dollar volume, abnormal turnover, and change in net accruals definitions are other areas that warrant particular attention in the future.)
Universe Definition: !FIA – UniverseDefinition v004
Abnormal Dollar Volume (in biggest 30% of universe): !FIA – AbnDV_B_BIG30_v006
Abnormal Dollar Volume (in next biggest (mid) 20% of universe): !FIA – AbnDV_B_MID20_v006
Abnormal Dollar Volume (in smallest 50% of universe): !FIA – AbnDV_B_SMALL_v006
Abnormal Turnover: !FIA – AbnTO_B_v006
Book-To-Price: !FIA – BtoP_v001
Change in Net Accruals: !FIA – ChgAccruals_v007
Change in Shares Outstanding (over trailing 3, 6, 12, and 18 months): !FIA – ChgShsOut_3_6_12_18M_v005
Dividend Yield: !FIA – DY_v001
Forward Earnings Yield: !FIA – FEYJ_v015
Forward Earnings Yield (including various industry-normalization schemes): !FIA – FEYF_AllinOne_v0013
Trailing Earnings Yield: !FIA – TEY_v001
Momentum: !FIA – Momentum_v005
Revenue-to-Price: !FIA – RtoP_v001
Reversal: !FIA – Reversal_v004
Revision Ratio: !FIA – RR_A_v003
Size (same as simple Universe Definition; use market cap to rank): !FIA – Size_v003
Standardized Unexpected Earnings: !FIA – SUEA_v013
Standardized Unexpected Earnings (in current month’s expected non-reporting universe): !FIA – SUEANR_v010
Standardized Unexpected Earnings (in current month’s expected reporting universe): !FIA – SUEAR_v012
FIA’s Multivariate Screen (with static factor weights): !FIA – All_Screen_v005