What's My Line? A Comparison of Industry Classification Schemes

advertisement
What’s My Line? A Comparison of Industry
Classification Schemes for Capital Market Research
Sanjeev Bhojraj
Charles M. C. Lee
Derek K. Oler**
Johnson Graduate School of Management
Cornell University, Ithaca, NY 14853
First Draft: November 1st, 2002
Current Draft: May 19th, 2003
Comments Welcomed
**
Correspondences can be directed to either Sanjeev Bhojraj ([email protected]), Charles M. C. Lee
([email protected]), or Derek K. Oler ([email protected]). We thank Richard Leftwich (editor) and an
anonymous referee for helpful comments, David Blitzer and Maureen Maitland of Standard & Poor’s for
their patient tutelage on the Global Industry Classification Standard (GICS)SM system, and Thomson
Financial Services Inc. for providing earnings per share forecast data, available through the Institutional
Brokers Estimate System (I/B/E/S). The I/B/E/S data have been provided as part of a broad academic
program to encourage earnings expectation research.
What’s My Line? A Comparison of Industry
Classification Schemes for Capital Market Research
Abstract
This study compares four broadly available industry classification
schemes in a variety of applications common to capital market research.
SIC codes have been available since 1939, but are being replaced by
NAICS codes. The Global Industry Classifications Standard (GICS)SM
system, jointly developed by S&P and MSCI, is gaining popularity among
financial practitioners, while the Fama and French (1997; FF)) algorithm
is devised by academics. Our results show that GICS classifications are
significantly better at explaining stock return co-movements, as well as
cross-sectional variations in valuation-multiples, forecasted and realized
growth rates, R&D expenditures, and various key financial ratios. The
GICS advantage is consistent from year-to-year and is most pronounced
among large firms. The other three methods differ little from each other in
most applications.
1. Introduction
Capital market research often calls for firms to be divided into more homogenous groups,
and the most common method for achieving this end is through industry classifications.
In recent studies, academic researchers have used industry groupings to limit the scope of
their investigation, identify control firms, secure performance benchmarks, and provide
descriptive statistics on sample firms.1 In all these applications, an industry classification
scheme is used to separate firms into finer partitions, with the expectation that these
partitions will then offer a better context for financial and economic analysis.
Industry classification is a long-standing problem in financial research. Although
Standardized Industry Classification (SIC) codes have been available since 1939, they
are in the process of being replaced by North American Industry Classification System
(NAICS) codes. At the same time, the Global Industry Classifications Standard
(GICS)SM system, jointly developed by Standard and Poor’s (S&P) and Morgan Stanley
Capital International (MSCI), is also becoming widely accepted, particularly among
financial practitioners. As a result of these changes, major data vendors, such as
Standard and Poor’s Compustat, now carry all three sets of industry codes. In addition,
financial researchers have also sought their own solution to the industry classification
problem (e.g., the FF algorithm developed by Fama and French (1997)).
In this study, we evaluate each of these four industry classification schemes in a variety
of applications common to capital market research. Our main objective is to document
the relative merits of each scheme in settings that are most commonly encountered by
financial researchers. Specifically, we examine their usefulness and limitations in
explaining cross-sectional variations in firm-level stock returns, as well as in marketbased valuation-multiples, forecasted and realized growth rates, R&D expenditures, and
other key performance ratios extracted from firms’ financial statements.
1
In a survey of the seven major accounting and finance journals over the 2000 and 2001 period, we
identified 116 studies that used some sort of industry classification scheme in their research design, some
for multiple purposes. More than half of these studies used industry classifications to identify control
firms.
1
Despite its well-documented problems, most past researchers have used SIC codes to
form their industry partitions.2 We suspect that the continuing popularity of SIC codes is
attributable to the absence of a superior, and widely available, alternative. While
financial analysts and professional asset managers use many proprietary industry
classification schemes, most of these are not available to academic researchers at a
reasonable cost.3 Each of the four algorithms we examine is general enough to
encompass the universe of actively-traded firms, and is widely available at low cost.
Our analysis shows a high degree of correspondence between SIC, NAICS, and FF
classifications. However, GICS classifications are much more likely to disagree with the
other three. Specifically, we find that when firms grouped by two-digit SIC codes are
mapped into their “primary equivalent” by NAICS codes, 80% of these mappings
resulted in a one-to-one correspondence.4 Similarly, two-digit SIC groupings agree with
their primary FF industry groupings for 84% of these firms. The primary GICS industry
groupings, on the other hand, agreed with the SIC groupings only 56% of the time.
One measure of the “economic relatedness” of firms is the extent to which their stock
returns are contemporaneously correlated. We form equal-weighted industry portfolios
using each classification scheme, and compare the ability of these industry portfolios to
explain contemporary monthly firm-level stock returns. Our results show that industry
portfolios formed using GICS, consistently explain a significantly greater proportion of
the variation in cross-sectional firm-level returns. This result is extremely robust. GICS
outperforms SIC and FF in each of the 8 years in our sample (1994 to 2001). GICS also
outperforms NAICS in 7 out of 8 sample years.
2
Among the firms that used a general industry classification scheme, we found that more than 90% used
SIC codes. Studies that have discussed problems and limitations of SIC codes include Clarke (1989),
Kahle and Walking (1996), Guenther and Rosman (1994), and Fan and Lang (2000).
3
For example, proprietary industry classification schemes are available from analytical software providers
such as BARRATM and FactSetTM.
4
We define an industry group in a given classification scheme as a “primary equivalent” if its member
firms have the highest level of correspondence with the member firms in a particular two-digit SIC group.
2
Another measure of homogeneity across firms is the extent to which the market ascribes
similar valuation multiples to their key accounting measures, such as earnings, book
value of equity, and sales revenue. We evaluate the four classification schemes by
forming industry portfolios using each scheme, and comparing the ability of the mean
industry multiple to explain firm-level multiples. Using annual data, we show that
industry means based on GICS classifications explain a much greater proportion of the
variation in firm-level price-to-book (pb), enterprise-value-to-sales (evs), and price-toearnings (pe) ratios than the other three methods. The margin of victory varies across
time and accounting constructs. On average, we achieve a 10 to 30 percent increase in
the adjusted r-square when using GICS rather than one of the other three classification
schemes.
Financial researchers are also often interested in identifying firms with similar operating
characteristics, for comparison and control purposes. We form industry portfolios using
each scheme and compare the ability of the mean industry ratio to explain key firm-level
ratios. The financial ratios we consider include: (1) return-on-net-operating-assets
(rnoa), (2) return-on-equity (roe), (3) asset-turnover-ratio (at)5, (4) net profit margin
(pm), and (5) debt-to-book-equity (lev) . We find that industry means computed using
GICS consistently outperform industry means computed using the other three methods in
terms of their ability to explain cross-sectional variations in firm-level rnoa, roe, at, and
pm. For lev, we find that SIC and NAICS classifications perform better than GICS,
perhaps because SIC and NAICS are production-technology based algorithms that better
capture to the amount of debt firms tend to assume.
A variable of increasing importance in financial research is a firm’s forecasted five-year
earnings growth rate as supplied by sell-side analysts (ltgrowth). Prior researchers have
used this variable in equity valuation (e.g., Frankel and Lee (1998), Lee, Myers, and
Swaminathan (1999)), tests of market efficiency (e.g., La Porta (1996)), cost-of-capital
estimations (e.g, Gebhardt et al. (2001), Claus and Thomas (2001)) and identification of
peer firms (Bhojraj and Lee (2002)). To the extent that an industry classification scheme
5
Specifically, we use the inverse of the asset turnover ratio (i.e., total assets over total sales).
3
groups firms into more homogenous units by their expected growth, the mean forecasted
growth for each industry should explain a greater proportion of the firm-level variations.
Our results show that GICS industry groupings produce the best results on for this
variable. Specifically, mean industry forecasted growth rates under GICS explain, on
average, 41.9% of the cross-sectional firm-level variation. The closest competing scheme
(NAICS) explains only 33.7%.
Finally, we examine the effectiveness of the various classification schemes in grouping
firms by their one-year-ahead realized sales growth (sales growth) and R&D
expenditures scaled by sales (R&D). Because both ltgrowth and GICS classifications
are affected by analyst perceptions, one concern with the forecasted growth results is that
it reflects analyst bias rather than economic realities. The sales growth and R&D
variables avoid this problem because they measure economic phenomena over which
analysts have no control.
Our results show that GICS is again significantly better than the other three classification
schemes in grouping firms based on these two measures. In the case of sales growth,
mean industry growth rates under GICS explain, on average, 16.1% of the cross-sectional
firm-level variation, while the nearest competing scheme (NAICS) explains only 13.2%.
In the case of R&D, GICS industry means explain a remarkable 64.2% of the firm-level
variation, while the nearest competing scheme (FF) explain only 52.7%.
Further analysis shows that the GICS advantage has little to do with differences in the
size and number of industry categories used by the four schemes. The four classification
schemes do not divide firms into the same number of industry categories. For example,
two-digit SIC codes result in 54 functional categories, NAICS result in 56 categories,
GICS results in 51 categories, and FF results in 40 categories.6 To ensure our results are
not due to these differences, we conduct Monte Carlo simulations that neutralize the
advantage a scheme derives from having a greater number of industry categories. These
tests have little effect on our findings.
6
We define a functional category as one that contains five or more member firms.
4
Finally, we show that the GICS advantage is most pronounced among the largest firms.
For firms in the S&P 500 index, we find that GICS industry means dominate the industry
means from the other three methods in terms of their ability to explaining firm-level
returns and valuation-multiples in each of our sample years. The GICS margin of
victory, while still significant, is lower for mid-cap (S&P MidCap 400) and small-cap
(S&P SmallCap 600) stocks.
In sum, we find that in most research applications encountered by financial academics,
the GICS classification system provides a better technique for identifying industrial
peers. Given the increased availability of GICS information at relatively low cost, and its
wide acceptance by financial practitioners, we believe our findings provide a strong case
for its wider adoption by academic researchers in projects that involve industry
classifications.
The next section discusses prior related research. Section 3 provides background
information on each of the four competing classification algorithms. Section 4 describes
our research method and sample. Section 5 presents the empirical results; and Section 6
concludes.
2. Related Research
Despite the widespread use of industry classification schemes by academic researchers,
few studies have directly tested their efficacy. Two studies that addressed this issue
focused on differences in the SIC codes reported by Compustat and the Center for
Research in Stock Prices (CRSP) database. This problem arises because although the
SIC industry categories are established by the Federal Census Bureau, the responsibility
for assigning the primary industry code to a specific firm falls to the data vendor.
Frequently, this assignment is not made on a consistent basis across data vendors.
For example, Guenther and Rosman (1994) compared the SIC codes across Compustat
and CRSP and found that the primary two-digit SIC code from these two sources
5
disagreed, on average, 38 percent of the time. Moreover, they showed that Compustat
SIC codes yield higher intra-industry correlations in stock returns, and lower intraindustry variances in financial ratios. Kahle and Walkling (1996) confirmed the
Guenther and Rosman results, and further showed that Compustat-matched samples are
more likely to detect abnormal performance than CRSP-matched samples. Perhaps due
to these findings, most recent papers that feature SIC codes obtain them from Compustat
rather than from CRSP. We also use Compustat SIC codes.
Two other studies from the industrial organization literature highlight the shortcomings
of SIC codes in producing homogeneous industries. Clarke (1989) examines whether
firms in the same SIC category exhibit more similar sales changes, profit rates, or stock
price changes. He concluded that SIC is not successful at identifying firms with such
similar characteristic variables. Fan and Lang (2000) use commodity flow data from
input-output (IO) tables to construct alternative measures of economic relatedness. They
show that, at the industry level, the two IO-based measures they construct provide a
richer description of firms’ relatedness than traditional SIC-based measures. However,
their technique involves data that are not widely available, and is more suited to interindustry analysis than the firm-level applications common in capital market research.
Although the issue of industry classification is not its main focus, Ramnath (2001)
discusses an analyst-based definition of industry groups. Specifically, in Section 4.1 of
that study, the author defines an industry as a group of firms having at least five analysts
in common with every other firm in the group. For example, if firms A, B, C and Z all
have at least five analysts who also follow each of the other firms, then A, B, C and Z are
deemed to be in the same industry. He finds that industrial delineations defined in this
fashion differs sharply from those based on SIC codes. Although this approach to
industry partitioning has appeal for financial analysis, it is applicable only to firms with
five or more analysts and will leave many firms unclassified. Finally, Krishnan and Press
(2002) investigate the implications of the NAICS for accounting research. Using the
Guenther and Rosman (1994) methodology, these authors show that NAICS offers some
6
improvement over the SIC system in defining manufacturing, transportation, and service
industries.
In sum, most prior studies have documented problems with the SIC system without
nominating a decidedly superior alternative, and none have examined the efficacy of
general industry classification schemes beyond SIC and NAICS.
3. Background Information
In this section we provide information on the historical development, intent, and basic
philosophy behind each of the four competing classification algorithms.
3.1 SIC Codes
The oldest of the four, the Standardized Industry Classification (SIC) system, was
established in the 1930s by an Interdepartmental Committee on Industrial Classification
operating under the jurisdiction of the Central Statistical Board. The goal of this
committee was “to develop a plan of classification of various types of statistical data by
industries and to promote the general adoption of such classification as the standard
classification of the Federal Government.” (Pearce (1957)). True to this mandate, SIC
has become the primary algorithm for delineating industrial activities in the U.S., and is
widely used not only by governmental agencies, but also by marketers and financial
economists. This system has been periodically revised to reflect the economy’s changing
industrial composition and organization, but recent revisions have been deferred in
anticipation of the NAICS. The last revision of the SIC was in 1987.
3.2 NAICS Codes
In response to rapid changes in both the U.S. and world economies, governmental
statistical agencies in Canada, Mexico, and the U.S. undertook the joint development of a
uniform classification system. In 1999, the three countries announced the introduction of
the North American Industry Classification System (NAICS). The aim of NAICS is to
improve the SIC “by using a production-based framework throughout to eliminate
definitional differences; identifying new industries and reorganizing industry groups to
7
better reflect the dynamics of our economy; and allowing first-ever industry
comparability across North America.” (Saunders (1999)).7
Eventually, the NAICS is expected to replace SIC codes in the reporting of all
governmental statistics. However, during the current transition period, data vendors
generally carry both codes in their databases. For example, beginning February 2000,
NAICS codes have been available in the master files of the Compustat database for both
firms in both the Current and Research files. For firms in the research file, Compustat
reports the NAICS code that is applicable prior to delisting. Since NAICS categories
have only been available since 1997, the assignments are made on a retroactive basis for
firms that were in existence prior to 1997.
SIC and NAICS share many commonalities. Both emanate from similar sources (i.e.,
governmental agencies interested in collecting broad industrial statistics) and have a
common hierarchical lineage (i.e., they are “erected on a production-oriented or supplybased conceptual framework in that establishments are grouped into industries according
to similarity in the process used to produce goods or services.”)8 Neither system is
designed with the specific concerns of the finance community in mind. Therefore, it is
perhaps not surprising that their performances are fairly similar in most financial
applications.
3.3 Fama-French Industry Classifications
In contrast, the Fama-French industry classifications were developed by financial
academics. In their study of industrial costs-of-capital, Fama and French (1997) devised
an algorithm that reclassified existing SIC codes into 48 industry groupings – see that
paper’s Appendix for a concordance of these industries and their corresponding SIC
codes. Their aim was to address some of the more glaring problems with the SIC codes
by forming industry groups that are more likely to share common risk characteristics.
7
As quoted by Krishnan and Press (2002). Krishnan and Press also provide an excellent summary of the
differences between SIC and NAICS groupings.
8
Quote from the NAICS Manual (see OMB (1998; page 11)).
8
While these industry groupings have been used by other researchers (e.g., Gebhardt et al.
(2001); Lee et al. (1999)), their efficacy has never been directly tested.
3.4 GICS Codes
The GICS structure was also developed with the financial community in mind. This
system is the result of collaboration between Morgan Stanley Capital International
(MSCI) and Standard and Poor’s (S&P). As leading providers of stock indices and
benchmark-related products and services, both companies have a particular interest in
serving the needs of financial professionals. In fact, the GICS Guide Book describes the
system as a product that “aim(s) to enhance the investment research and asset
management process for financial professionals worldwide.”9
According to the GICS Guide Book, companies are classified on the basis of their
principal business activity. In making these assignments, S&P and MSCI analysts are
guided by information from annual reports and financial statements, as well as
investment research reports and other industry information. Specifically, a company’s
sources of revenue and earnings play important roles, as does market perception as
revealed by investment research reports. This approach represents a sharp departure from
SIC and NAICS, both of which rely on a production-oriented, supply-based approach in
delineating industry categories.
The GICS Guide Book offers further guidelines for companies that do not fall neatly into
a single category. A company significantly diversified across three or more sectors, none
of which contributes the majority of revenues or earnings, is classified under either the
Industrial Conglomerates sub-industry (Industrial Sector) or the Multi-Sector Holdings
sub-industry (Financial Sector). When a company is engaged in two or more
substantially different business activities, none of which contributes 60% or more of
revenues, it is classified in the sub-industry that provides the majority of both the
company’s revenues and earnings. When no sub-industry provides the majority of both
9
GICS, S&P and MSCI (2002; page 3).
9
the company’s revenues and earnings, the classification is determined by more extensive
analysis.
3.5 Availability of GICS Codes
Although widespread availability of GICS information is a recent phenomena, the
algorithm has deep historical roots. The current GICS system is a refinement of the S&P
industry classification system, which has been in existence for over 30 years. The
company introduced GICS in an August 1999 press release, and officially switched its
flagship index products to GICS in January 2001. Going forward, GICS is expected to be
the de facto standard in the company’s domestic as well as foreign products.
S&P supplies GICS information in two forms. The first, current gics (mnemonic:
spgicx) is the most recent GICS code for a given firm. For inactive firms, this variable
represents the GICS applicable for the firm immediately prior to delisting. The second,
historical gics (mnemonic: spgicm) is a monthly history of GICS. In theory, historical
GICS (spgicm) provides the most accurate measure of the company’s industry grouping
as of a given historical date. Therefore, our tests are based on historical GICS codes.
Occasionally, a firm’s historical GICS, NAICS, or SIC code was not available for a given
year. In these instances, we substituted the current industry classification.10
Beginning in 2002, researchers will be able to obtain GICS codes for U.S. firms in three
S&P products: GICS History, Research Insight, and Compustat PDE Files. Figure 1
summarizes the availability of historical GICS codes (spgicm) in each product. GICS
History has the most comprehensive historical coverage, going back to 1985 and
embracing over 20,000 active and inactive firms. However, this product only became
available in 2003 and at present it must be purchased separately.
10
In practice, industry classifications for a given firm rarely changed from year to year. Our analysis of
S&P1500 firms from 1994 to 2001 indicates that, on average, 2.8% of these firms had a change in their
GICS code assignment each year. This percentage is slightly higher for SIC and NAICS (3.4% and 3.5%,
respectively). Our results would not be significantly different if we delete observations with missing
spgicm rather than substituting the current industry classification. Our results are also unaffected if we
replace missing historical codes with the nearest adjacent historical code (from subsequent or prior years)
rather than the current code.
10
[Insert Figure 1 Here]
For this study, we used GICS codes from Research Insight, a PC-based product
developed by S&P. The February 29, 2002 release of Research Insight that we used
contains historical GICS for all members of the S&P Super Composite 1500 back to
1994. As of January 2002, S&P has also began providing GICS codes in its Compustat
PDE (price, dividend, and earnings) files. Our use of Research Insight data is due to data
problems we encountered with the GICS information contained in the current Compustat
release. Some of these problems have not yet been corrected as of the date of this
writing.11 We expect that in the near future, researchers will be able to use GICS codes
obtained directly from Compustat.
4. Data and Sample Description
In conducting our analyses, we focus primarily on S&P 1500 firms. We draw our
information on GICS, SIC and NAICS, from S&P Research Insight (the February 29,
2002 release), using S&P 500 (large), 400 (midcap), and 600 (small) membership lists as
of the end of December of the prior year. Our decision to focus on S&P 1500 firms is
driven largely by data availability. As indicated in Figure 1, historical GICS information
on Research Insight is available for S&P 1500 firms starting in 1994, but coverage for
non-S&P 1500 firms did not begin until 1999. Our main analyses are based on a sample
consisting of both active and inactive firms. The exclusion of the inactive firms does not
significantly change our results.
We use two-digit SIC codes as our primary definition of industry because these
groupings have appeared extensively in prior research. Conveniently, the Fama-French
(1997) classification scheme produces a similar number of industry groupings as twodigit SIC codes. To ensure that the number of partitions using GICS and NAICS are also
11
Specifically, we found significant errors in the historical gics data (mnenmonic: spgicm) in Compustat’s
research file, which contains all inactive stocks. We confirmed these problems with S&P, who plan to
release a corrected version soon.
11
comparable, we used the first three digits of the NAICS code and the first six digits of the
GICS code.
Table 1, Panel A provides a breakdown of SIC, NAICS, FF, and GICS categories at
various levels. Although the official number of categories for SIC and NAICS
(especially at the four and five digit level) is large, many of these categories are not used
by data vendors that make firm-level assignments. Moreover, a large number of these
industries have fewer than 5 member firms. To ensure the industry groupings are
meaningful, we define an industry category as “functional” if it encompasses at least 5
member firms in any given year. Industries with fewer than 5 firms are excluded from
our analyses. Panel A shows that, for our purposes, SIC has 54 functional categories,
NAICS has 56, FF has 40, and GICS has 51.
Panel B of Table 1 provides statistics on the number of firms per industry, according to
the above definitions. A key advantage of using GICS over other industry definitions is
immediately evident: GICS provides the most even distribution of firms across its
industry categories. The median number of firms per GICS industry category is 21, with
a mean of 26 firms per industry. In contrast, NAICS provides the most skewed
distribution (a median of 9 firms, with a mean of 20 firms per industry). The maximum
number of firms in an industry based on GICS is 87, as compared to 137, 173 and 140
based on SIC, NAICS and FF respectively.
[Insert Table 1 Here]
We obtain returns, share prices and shares outstanding information from the 2001 Center
for Research in Securities Prices (CRSP) monthly database. Share prices and shares
outstanding are as at the last trading day of December of each year. Financial statement
information is from the 2000 merged (active and research) Compustat database. For each
sample year, we use most recent median I/B/E/S consensus analyst long-term growth
forecasts as of December. To evaluate the efficacy of various classification schemes, we
12
also compute a number of market multiples and financial ratios. The various measures
used and their definitions are provided in Appendix B.
In carrying out our analyses (other than the returns regressions), we needed to impose
additional data availability requirements. Specifically, we drop all firms with missing
total assets (D6), total long term debt (D9), net income before extraordinary items (D18),
debt in current liabilities (D34), and operating income after depreciation (D178). To
reduce the effect of outliers, we also required that the share price on the last day of
December be more than $3, net sales (D12) be more than $100 million, and both total
common equity and total shareholders’ equity (D60 and D216) be positive.
In addition to these general requirements, we also imposed certain restrictions for specific
regressions. For regressions involving price-to-earnings (pe) multiples, we impose a
constraint of positive net income before extraordinary items (D18), and in regressions
involving rnoa, we require non-missing values for current assets (D4), current liabilities
(D5), and property, plant, and equipment (D8). Finally, to further reduce the influence of
outliers, we delete the top and bottom 1% of observations, sorted by the variable of
interest. These restrictions, coupled with the need to match cusip with permno to obtain
share trading information, resulted in less than 1,500 annual observations per year. The
actual observations used in various tests vary depending on data availability.
5. Empirical Results
Table 2 provides a concordance between SIC and the other classification codes. For each
two-digit SIC code, we report the number of firms within that category as of December
2001 (e.g., for SIC industry 20, we have 38 firms in the S&P1500). We then show the
corresponding NAICS, FF, and GICS industry that contains the highest number of those
firms (the “primary equivalent”).
In some cases, we find a perfect match (e.g., SIC industry 17 has three firms, and all three
firms are found within NAICS industry 235 and within FF industry “Construction”). In
other cases, the match is poor (e.g., SIC industry 50 has 30 firms, only 5 of which are
13
found in GICS industry 452030, which is the largest number of those 30 firms found in
any single GICS industry). The bottom row of this table, labeled “sum,” shows that for a
given S&P1500 firm, if you select a NAICS classification based only on that firm’s SIC
industry, you will (on average) be correct 80% of the time. If you select a FF industry
based on that firm’s SIC industry, you will be correct 84% of the time.12 But if you select
the firm’s GICS industry based on that firm’s SIC industry, you will be correct only 56%
of the time.
[Insert Table 2 Here]
5.1 Correlation in Monthly Returns
Next, we form equal-weighted industry portfolios using each classification scheme, and
compare the ability of these industry portfolios to explain contemporaneous monthly
firm-level stock returns. The result of this analysis is shown in Table 3. Panel A,
provides year-by-year results of monthly return regressions using the industry mean from
each of the four industry specifications. We find that industry definitions based on SIC
and FF result in adjusted r-squares that are quite similar to each other (22.9 and 22.6
percent, respectively). NAICS performs somewhat better, yielding an average adjusted rsquare of 24.2%. But the best result is achieved using GICS industries (average adjusted
r-square of 26.3).
Panel B shows that GICS explains more firm-level returns than SIC and FF in all 8 years.
It does better than NAICS in 7 of the 8 years studied. On average GICS outperforms
SIC, NAICS and FF by 3.4%, 2.1% and 3.7% respectively, which are significant at the
5% level or better. Further, the gap between GICS and the others is widening over time,
with the best results occurring in the last 3 years. For example GICS outperformed SIC,
NAICS and FF by 1.7%, 0.5% and 1.9% in 1994, and outperformed them by 7.8%, 6.1%
and 6.7% in 2000.
12
FF industries are a reclassification of 4-digit SIC codes. Therefore, using all four SIC digits, one would
achieve a 100% correspondence between FF and SIC.
14
[Insert Table 3 Here]
Figure 2 plots the average yearly adjusted r-square for each classification scheme for both
S&P 500 and 1500 firms. As discussed earlier, GICS outperforms the other classification
schemes, with the separation getting larger in recent years.13 The results also suggest that
the performance of GICS becomes stronger when the analysis is restricted to S&P 500
firms.
[Insert Figure 2 Here]
5.2 Valuation Multiples and Financial Ratios
While return association is a telling metric of economic relatedness, other measures are
also important. Table 4 reports results for a variety of other measures. With only one
exception (leverage), GICS provides a higher average r-square than competing
classifications. In most cases the difference is statistically significant. For example,
Panel B provides evidence on valuation multiples. We find that variations in pe
multiples are the most difficult to explain using industry membership, and variations in
evs multiples are the easiest. More importantly, GICS outperforms the other systems
across all three multiples (pb, evs, pe). Improvements in explaining pb range from 4.2%
to 5.1% while improvements for evs range from 3.8% to 5.9%. These represent
proportional increases of 10% to 30%. Improvements relating to the pe multiple are
about 1% (vs. SIC and NAICS) and 2.6% (vs FF), representing proportional
improvements of, once again, 10% and 30%.
Panel C and D provide evidence on the effect of the classification schemes on financial
ratios and other financial information, including realized sales growth, analyst forecasts
and R&D. The strongest results are for R&D, where GICS beats its closest competitor
(Fama-French) by an average margin of 11.5% (proportionately, an increase of 21.8%).
The only metric on which GICS under-performs the other schemes is leverage. This
13
This trend is evident only for stock return co-movements, and is not significant for most of the other
variables.
15
result may be attributable in part to the fact that SIC and NAICS are production-based
classification systems, and leverage is highly associated with production technology, with
capital intensive industries having higher leverage. Overall, the closest competitor to
GICS is probably NAICS. GICS performs better than NAICS in most categories, but the
different is not statistically significant for pe, rnoa, roe and pm.
[Insert Table 4 Here]
5.3 Monte Carlo Simulations
Although we have attempted to ensure that the number of industry groups are similar
across the various classification schemes, differences remain. Because we require at least
5 firms per industry, one possible concern is that one industry definition might have a
mechanical advantage over others. For example, a classification system with fewer
industry partitions has an inherent disadvantage when the total number of firms is held
constant. If one scheme has 10 functional industries consisting of 6 member firms each,
and another scheme has 3 functional industries consisting of 20 members each, we can
expect the first scheme to achieve a higher r-square than the second, even if the actual
allocation of firms for both schemes was random.
To ensure our results are not due to these differences, we conduct Monte Carlo
simulations that neutralize the advantage a scheme derives from having a greater number
of industry categories. To conduct the simulation, we randomly assign S&P1500 firms
into the same number of industry categories, with the same number of firms per industry,
as a given scheme. We then conduct the regression on the simulated data, generating a
“simulated” r-square. We then repeat the procedure 500 times, producing an average
simulated r-square for each classification scheme. Each simulated r-square serves as a
performance benchmark for the scheme in question.
For example, Panel A shows that, for the returns regressions, random assignment into the
54 2-digit SIC industries resulted in an average simulated r-square of 14.5%. In other
words, even if SIC codes have no ability at all to partition firms into more homogeneous
16
groups, we would still observe an average r-square of 14.5% in this regression.14 We also
compute a “Revised” r-square (the difference between the actual r-square achieved by the
scheme and its average simulated r-square). In the case of the returns regression, the SIC
classifications produced a revised r-square of 8.4%, compared to 11.9% for the GICS
classifications.
Table 5 indicates that mechanical differences between classification standards have little
effect on our results. Although the simulated r-squares do differ largely in the direction
we expected, the revised r-squares show that GICS continues to outperform its
competitors. In fact, the GICS advantage over SIC and NAICS generally widens after we
make this adjustment. For example, in panel A, the GICS outperforms SIC, NAICS and
FF by 3.5%, 2.2% and 3.8% respectively. This indicates an improvement of about 0.1%
compared to Panel A in Table 4.
Panels B, C and D describe the simulation adjusted performance of GICS for valuation
multiples, financial ratios and other financial information. In these panels the simulated
r-squares suggest that purely mechanical differences should cause GICS to perform about
the same as SIC, outperform FF by approximately 0.8% (3.9%-3.1%), and underperform
NAICS by about 0.5% (3.9%-4.4%). However, as seen from the revised r-square
numbers, GICS actually significantly outperforms both SIC and NAICS. In general,
GICS also continues to outperform FF with significance levels similar to Table 4 (except
for minor erosion in the statistical significance of a few results). Thus, in evaluating
performance relating to market multiples and financial ratios, the results in Table 5 are
quite similar to those in Table 4.
[Insert Table 5 Here]
5.4 The Effect of Firm Size
Finally, we also study the effect of firm size on the performance of the industry
classification schemes by examining S&P500, 400 (midcap), and 600 (smallcap) firms
14
The Monte Carlo R-sq can also be partially attributed to global (inter-industry) commonalities.
17
separately. Panel A of Table 6 shows that for the S&P500, GICS dominates the other
classification systems in all dimensions evaluated. The results for the S&P500 are much
stronger than the results based on the S&P 1500 (see Table 4).
The results using the S&P 400 (midcap) firms are weaker, with GICS dominating in 8 of
the 10 metrics used (see Panel B). However, in those 8 metrics, GICS outperforms the
others by a larger margin than when considering the entire 1500 firms. For example,
when considering returns, GICS outperforms SIC, NAICS and FF by 7.5%, 6.4%, and
6.6% respectively as compared with 3.4%, 2.1% and 3.7% respectively (Table 4). The
weakest results relate to the S&P 600 (small cap) firms (see panel C). However, even
among small firms, GICS outperforms the others in 8 of the 12 categories.15
[Insert Table 6 Here]
5.5 Analyst Perception versus Economic Reality
Because analyst perceptions play a role in GICS classifications, one concern with our
results is that it reflects analyst bias rather than economic realities. In other words, these
results may be induced by the fact that GICS classifications and analyst perceptions are
endogenous, thus rendering the findings less interpretable.
While it is impossible to rule out this concern entirely, we believe analyst perceptions are
unlikely to explain our findings for two reasons. First, we have conducted detailed
interviews with top S&P personnel in charge of GICS classifications to determine the
extent to which analyst perceptions play a role in GICS assignments. The clear
impression from these interviews is that while analyst perceptions influence the original
formulation of industry categories, analysts play no role in the assignment of individual
firms to the specific categories. Second, while analyst perception might affect some of
15
In the course of our extensive interviews with the top-ranking S&P personnel in charge of the GICS
project, two possible explanations for these findings were suggested. One explanation is that smaller firms
tend to have a single line of business, and SIC/NAICS tend to do a reasonable job of classifying these
firms. Another explanation is that, because of the nature of indexing, GICS industry definitions were
formulated primarily on the basis of the businesses that larger firms operate in. The smaller firms with
18
our test variables (e.g., ltgrowth, and possibly the valuation multiples), most of the
variables we test are beyond analyst control (e.g., sales growth, R&D, the fundamental
ratios, and stock returns). The fact that GICS outperforms the other classification
schemes using variables such as like R&D, realized sales growth, and stock returns,
suggests that analysts’ perceptions are unlikely to account for most of our results.
6. Conclusion
This study evaluates alternative industry classification schemes in various applications
common to capital market research. Specifically, we compare: (1) the Standardized
Industry Classification System (SIC); (2) the North American Industry Classification
System (NAICS); (3) Fama-French (1997) industry groupings (FF); and (4) the Global
Industry Classifications Standard (GICS).
We find that GICS classifications are significantly better at explaining stock return comovements, as well as cross-sectional variations in valuation-multiples, forecasted
growth rates, and key financial ratios. The GICS advantage is consistent from year-toyear and is most pronounced among large firms. For most of these applications, the
performance of the other three methods differ little from each other.
We believe that the superior performance of GICS is attributable to two main factors: (1)
the financial-oriented nature of the industry categories themselves, and (2) the
consistency of the firm assignment process. Unlike SIC and NAICS, GICS industry
groupings are established to meet the needs of investment professionals, and are not
primarily shaped by firms’ production-technology.16 In addition, the assignment of GICS
codes to individual firms is done by a team of specialists at S&P and MSCI, and is not
left to the discretion of the data vendor. This centralized assignment process also appears
to improve the consistency and usefulness of the industry groupings.
unusual business lines have to be fitted into the closest existing industry group, rendering a generally
poorer fit.
19
As we mentioned in the introduction, industry classification is of vital importance to
financial academics and practitioners. Indeed, the usefulness of industry groupings
affect, to varying degrees, virtually every significant area of empirical research. The
relative importance of the improvements we document will vary across specific research
settings. However, the fact that GICS performs as well as, or better than, each of the
other methods in virtually all our tests suggests that it should be the preferred method to
group firms by industry in most research settings.
For example, studies of earnings management often use a cross-sectional approach
to compute abnormal or discretionary accruals (e.g. Subramanyam (1996); Hribar and
Collins (2002); Kothari, Leone and Wasley (2003)). These studies generally feature
industry-based control samples, with the expectation that the relation between
operating accruals and changes in sales or property, plant, and equipment are constant
within an industry. The power of these tests, in turn, depends on our ability to effectively
group firms with similar operating characteristics. We expect GICS codes to provide
more powerful tests than SIC codes in these settings. More broadly, the GICS algorithm
offers an advantage whenever the research objective involves identifying (or quantifying)
unusual or abnormal operating activities.
Industry classification is also critical in fundamental analysis and valuation studies. For
example, valuation models that compute terminal values by reverting firm ROE or ROA
to industry mean/medians (e.g. Frankel and Lee (1998) and Lee, Myers, and
Swaminathan (1999)) should benefit from improved industry classifications. Industry
membership is crucial in determining a firm’s cost of capital (e.g., Fama and French
(1997); Gebhardt, Lee, and Swaminathan (2001)), estimating valuation multiples (Alford
(1992); Liu, Nissim, and Thomas (2002)), and identifying better comparable firms
(Bhojraj and Lee (2002)). More generally, we expect GICS codes to provide superior
industry classifications for most fundamental analysis and valuation studies that call for
industry-based control samples.
16
Even though FF classifications are also intended for a financial audience, these industry groupings are
based on a re-classification of SIC codes and do not reflect a fundamental shift away from the production-
20
Finally, we expect GICS industry classifications to be particularly useful in studies of
analyst behavior. In these studies, grouping firms by analyst perception is generally an
advantage rather than a drawback, thus accentuating the GICS advantage. For example, a
recent paper by Boni and Womack (2003) uses the GICS classification scheme to
develop an industry-based strategy of buying stocks net upgraded in each industry and
shorting stocks net downgraded in each industry. The authors argue that the contribution
of analysts is most evident only after we properly control for industry membership. In
this context, they argue specifically for the superiority of the GICS approach.
In summary, our evidence suggests that in most research applications encountered by
financial academics, the GICS classification system will provide a better technique for
identifying industrial peers. Historical GICS codes are already available at relatively low
cost for most active, as well as inactive, U. S. firms. When historical GICS codes are not
available, our results suggest that the current GICS code is a close substitute. When
neither current nor historical GICS code is available, the NAICS code appears to be the
next best solution.
In any event, the case for wider adoption of GICS industry classification codes by
financial academics is compelling. Given the increased availability this information at
low costs, its wide acceptance by practitioners, and the uniformity of its industry
definition across different countries, we believe financial researchers should seriously
consider using the GICS system in future projects that involve industry classifications.
orientation of SIC and NAICS.
21
Appendix A
Background Information on Four Industry Classification Schemes
Official Name
Developer of
Categories &
Definitions
Stated Basis for
Categories &
Definitions
Assigner of
category code to
each firm
Stated Criteria
for Assignment
of Firms to Each
Category
Current
Availability
SIC
Standardized
Industry
Classification
Codes
NAICS
North
American
Industry
Classification
Codes
FF
Fama-French
Industry
Classification
Codes
GICS
Global
Industry
Classification
Standard
U. S. Census
Bureau*
U. S. Census
Bureau*
Fama and
French (1997)
S&P and
MSCI**
Production &
Technology
Oriented
Vendorspecific
Production &
Technology
Oriented
Vendorspecific
Not Specified
Fama-French
Not Specified
Not Specified
Not Specified
From both
Compustat
and CRSP
From
Compustat
(since 2001)
See FF(1997)
Appendix
Principal
Business
Activity
S&P and
MSCI
Revenue,
Earnings, and
Market
Perception
GICS
History and
other S&P
and MSCI
products and
services
(including
Research
Insight &
Computstat)♥
*
The NAICS is jointed developed with Canada and Mexico, with the goal of establishing comparable
statistics among the three countries.
**
S&P and MSCI are Standard and Poor’s and Morgan Stanley Capital International, respectively.
♥
See Figure 1.
22
Appendix B
Variable Descriptions
All share price and shares outstanding data, used to calculate market capitalization
(market cap) are taken from the CRSP monthly dataset. Where annual information is
used, CRSP data is taken as at the end of December of each year. All financial statement
information is from the combined Compustat Active and Research files. Compustat data
item number is reported in parentheses. Analyst forecast long-term growth is the latest
median I/B/E/S consensus forecast available in December for each sample year.
Variable
Returns
Description
Monthly share price returns
pb
Price to book
market cap / total common equity (D60)
evs
Enterprise value to sales
(market cap + long term debt (D9) + debt
in current liabilities (D34)) / net sales
(D12)
pe
Price to earnings
stock price / net income before
extraordinary items (D18)
rnoa
Return on net operating assets
Net operating income after depreciation
(D178) / (property, plant, and equipment
(D8) + current assets (D4) – current
liabilities (D5))
roe
Return on equity
Net income before extraordinary items
(D18) / total common equity (D60)
at
Asset turnover
Total assets (D6) / net sales (D12)17
pm
Profit margin
Net operating income after depreciation
(D178) / net sales (D12)
lev
Leverage
Total liabilities (D9) / total stockholders’
equity (D216)
Median analyst long term growth
forecast
sales growth One year ahead realized sales
growth
Calculation
ltgrowth
R&D
17
Scaled research and development
expense
(net sales 1 year in the future – current year
net sales) / current year net sales (D12)
R&D Expense (D46) / Net Sales (D12)
This is the inverse of the standard asset turnover ratio. We find that the inverse ratio has fewer outliers.
23
Reference
ALFORD, A. W. “The Effect of the Set of Comparable Firms on the Accuracy of the
Price Earnings Valuation Method.” Journal of Accounting Research 30 (1992):94108.
BHOJRAJ, S., AND C. M. C. LEE. “Who is My Peer? A Valuation-based Approach to
the Selection of Comparable Firms.” Journal of Accounting Research 40 (2002): 407439.
BONI, L., K.L. WOMACK “Analysts, Industries, and Price Momentum.” Working
Paper, University of New Mexico and Dartmouth College (2003).
CLARKE, R. N. “SICs as Delineators of Economic Markets.” Journal of Business 62
(1989): 17-31.
CLAUS, J. AND J. THOMAS. “Equity Premia as Low as Three Percent? Evidence from
Analysts’ Earnings Forecasts for Domestic and International Stocks.” Journal of
Finance 56 (2001): 1629-1666.
FAMA, E. F., AND K. R. FRENCH. “Industry costs of equity.” Journal of Financial
Economics 43 (1997), 153-193.
FAN, J. P. H., AND L. H. P. LANG. “The Measurement of Relatedness: An Application
to Corporate Diversification.” Journal of Business 73 (2000): 629-660.
FRANKEL, R., AND C.M.C. LEE. “Accounting valuation, market expectation, and
cross-sectional stock returns.” Journal of Accounting and Economics 25 (1998):
283-319.
GEBHARDT, W. R.; C. M. C. LEE, AND B. SWAMNATHAN. “Toward an Implied
Cost-of-capital.” Journal of Accounting Research 39 (2001): 135-176.
GUENTHER, D. A., AND A. J. ROSMAN. “Differences between COMPUSTAT and
CRSP SIC Codes and Related Effects on Research.” Journal of Accounting and
Economics 18 (1994): 115-128.
HRIBAR P., AND D. W. COLLINS. “Errors in estimating accruals: Implications for
empirical research.” Journal of Accounting Research 40 (2002):105-134.
KAHLE, K. M., AND R. A. WALKLING. “The Impact of Industry Classifications on
Financial Research.” The Journal of Financial and Quantitative Analysis 31 (1996):
309-335.
24
KRISHNAN J., AND E. PRESS. “The North American Industry Classification System
and Its Implications for Accounting Research.” Working paper, Temple University
(July 2002).
KOTHARI, S. P.; A. LEONE.; AND C. WASLEY “Performance matched discretionary
accrual measures.” Working Paper, Massachusetts Institute of Technology and
University of Rochester (2003).
LAPORTA, R. “Expectations and the Cross-section of Stock Returns.” Journal of
Finance 51 (1996): 1715-1742.
LEE, C. M. C.; J. MYERS; AND B. SWAMINATHAN. “What is the Intrinsic Value of
the Dow?” Journal of Finance 54 (1999): 1693-1741.
LIU, J.; D. NISSIM; AND J. THOMAS. “Equity Valuation Using Multiples.” Journal of
Accounting Research 40 (2002): 135-172.
PEARCE, E. “History of the Standard Industrial Classification.” Executive Office of the
President, Office of Statistical Standards. U.S. Bureau of the Budget. Washington,
D.C. (July 1957) (mimeograph).
RAMNATH, S. “Investor and Analysts Reactions to Earnings Announcements of
Related Firms: An Empirical Analysis.” Working Paper Georgetown University
(2001).
SAUNDERS, N. C. “The North American Industry Classification System: Change on
the Horizon.” Occupational Outlook Quarterly (Fall, 1999): 34-37.
STANDARD AND POOR’S AND MORGAN STANLEY CAPITAL
INTERNATIONAL. Global Industry Classification Standard – A Guide to the GICS
Methodology (2002).
SUBRAMANYAM, K. R. “The pricing of discretionary accruals.” Journal of Accounting
& Economics 22 (1996):249-281
OFFICE OF MANAGEMENT AND BUDGET (OMB). North American Industry
Classification System: United States, 1997. Bernan Press: Lanham, ND, (1998).
25
Figure 1
Availability of historical GICS codes in various Standard and Poor’s Data Products
GICS information is provided in two forms. The first, current gics (mnemonic: spgicx) is the most recent GICS code for a given firm. For inactive
firms, this variable represents the GICS applicable for the firm immediately prior to delisting. The second, historical gics (mnemonic: spgicm) is a
monthly history of GICS codes for a given firm. Current GICS data (spgicx) is broadly available for firms in the Compustat Universe, but historical
GICS codes (spgicm) is available on a more limited basis. This figure depicts data availability for historical GICS codes (spgicm) in the three
commercial products currently available from Standard and Poor’s. GICS History is a new product that contains comprehensive GICS information
on over 20,000 active and inactive U.S. companies, Canadian companies, and ADRs. At present, GICS History must be purchased separately.
Historical GICS codes are also available on a more limited basis in the Research Insight and Compustat’s PDE (Price, Dividend, and Earnings) files.
For firms in the S&P 1500 Super Composite Index (that is, the S&P 500, the Midcap 400, and the Smallcap 600), historical GICS data is available
from 1994. For non-S&P 1500 firms, historical GICS data is only available from 1999.
2002
1999
1994
GICS History
S&P 1500 Firms
Non S&P 1500 Firms
Research Insight
S&P 1500 Firms
Non S&P 1500 Firms
1
1
1
1
1
1
1
1
1
Compustat PDE Files
S&P 1500 Firms
Non S&P 1500 Firms
1
1
1
1
1
1985
Figure 2: Adjusted R-squares of returns regressions for S&P 1500 and S&P 500 firms by year
Returns - S&P 1500
0.4
0.35
Adjusted R-square
0.3
0.25
GICS
SIC
0.2
NAICS
Fama French
0.15
0.1
0.05
0
1993
1994
1995
1996
1997
Year
1998
1999
2000
2001
2002
Returns - S&P 500
0.6
0.5
0.4
GICS
SIC
0.3
NAICS
Fama French
0.2
0.1
0
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
Table 1
Univariate Statistics for SIC, NAICS, Fama French, and GICS
This table reports univariate statistics for each classification level for SIC (Standard Industrial Classification), NAICS
(North American Industry Classification System), FF (Fama French), and GICS (Global Industry Classification
Standard), using S&P 1500 firms as of December 2001. Fama French refers to the industry classification system they
develop in their paper “Industry Costs of Equity,” (1997). Panel A reports the number of classification levels, the
official number of categories, and the functional number of categories for each classification level for each level of
classification. A category is defined as functional if it has at least 5 members. Although some research uses the first
digit is SIC as the broadest level, SIC codes are officially broken into 11 major divisions, labeled A through K. The 6 th
digit of the NAICS code is an additional level of detail specific to each country. For comparison purposes, the
categories in the 5th and 6 th digit levels are combined in this table, consistent with the 1997 NAICS manual. The level
of industry we use for our analysis is shaded. Panel B reports univariate statistics for each of the above classification
systems, using S&P 1500 firms as of December 2001, for the corresponding shaded level from panel A
Panel A: Official and Functional Categories (using S&P 1500 firms as of December 31, 2001)
Title
Division
Major Group
Industry Group
Industry (b)
Official
Categories
11
81
416
1004
Digits
first digit
first 2 digits
first 3 digits
all 4 digits
Functional
Categories
9
54
87
98
20
96
311
1170
first 2 digits
first 3 digits
first 4 digits
first 5 digits
17
56
89
75
SIC
Level 1 (broadest)
Level 2
Level 3
Level 4 (narrowest)
NAICS
Level 1 (broadest)
Level 2
Level 3
Level 4 (narrowest)
Sector
Subsector
Industry Group
Industry
FF
Level 1 (broadest)
-
48
-
40
GICS
Level 1 (broadest)
Level 2
Level 3
Level 4 (narrowest)
Sector
Industry Group
Industry
Sub-industry
10
23
59
123
first 2 digits
first 4 digits
first 6 digits
all 8 digits
10
23
51
89
Panel B: Univariate Statistics of Number of Firms per Industry
Minimum number of firms per industry
Maximum number of firms per industry
Median number of firms per industry
Mean number of firms per industry
Standard deviation
Skewness
Kurtosis
SIC (2 digit)
1
137
12
24
31
2.2
4.3
NAICS (3 digit)
1
173
9
20
29
3.1
11.5
FF
1
140
24
31
32
1.6
2.5
GICS (6 digit)
2
87
21
26
21
1.0
0.5
Table 2
Bridging Between SIC and NAICS, Fama French, and GICS
This table reports the degree of correspondence between SIC, Fama French (FF), NAICS, and GICS for the December 2001 S&P 1500 firms by showing the level of agreement between SIC
and the other three classifications. Fama French refers to the industry classification system they develop in their paper “Industry Costs of Equity,” (1997). See their Appendix A for a
description and definition of their industry names. We show the primary equivalent (ie, the other system’s category that has the highest level of correspondence) measured by the total number
of firms for each 2 digit SIC code. Only industry classifications that actually have member firms are considered. For example, the S&P 1500 has 38 firms in SIC industry 20 (Food and
Kindred Products). The NAICS classification system classifies 30 of these firms in subsector #311 (Food Manufacturing) for a 79% correspondence. The FF classification system classifies 29
of these firms in their category of “Food,” for a 76% correspondence. Finally, the GICS classification system classifies 25 of these firms in industry #302020 (Food Products), for a 66%
correspondence. For brevity, only the category with the highest level of correspondence is shown. Note that the FF correspondence is slightly misleading, since there is an explicit mapping
from SIC into FF using all four SIC digits. However, for comparative purposes, we use only 2 digit SIC here.
Two-digit SIC
Group
Total Firms
01
10
12
13
14
15
16
17
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
2
6
2
43
3
10
6
3
38
3
5
13
10
11
24
28
98
14
10
6
5
32
26
91
128
40
73
12
Primary
Equivalent
111
212
212
211
212
233
234
235
311
312
313
315
321
337
322
511
325
324
326
316
327
331
332
333
334
336
334
339
NAICS (3 digit)
Firms
Proportion of
Total
Primary
Equivalent
2
6
2
26
3
10
6
3
30
3
2
13
8
8
22
17
98
14
8
6
5
28
25
58
102
40
41
12
100%
100%
100%
60%
100%
100%
100%
100%
79%
100%
40%
100%
80%
73%
92%
61%
100%
100%
80%
100%
100%
88%
96%
64%
80%
100%
56%
100%
Agric
Gold
Coal
Enrgy
Mines
Cnstr
Cnstr
Cnstr
Food
Smoke
Txtls
Clths
BldMt
Hshld
Paper
Books
Drugs
Enrgy
Not Classified
Clths
BldMt
Steel
BldMt
Mach
Chips
Autos
LabEq
Toys
Fama French
Firms
Proportion of
Total
Primary
Equivalent
2
3
2
43
3
10
6
3
29
3
5
13
10
6
21
20
51
13
4
6
2
32
17
59
100
24
33
7
100%
50%
100%
100%
100%
100%
100%
100%
76%
100%
100%
100%
100%
55%
88%
71%
52%
93%
40%
100%
40%
100%
65%
65%
78%
60%
45%
58%
302020
151040
151040
101020
151020
252010
201030
201030
302020
302030
252010
252030
151050
252010
151050
254010
151010
101020
251010
252030
151020
151040
201060
201060
452050
201010
351010
252020
GICS (6 digit)
Firms
Proportion of
Total
2
6
2
26
2
10
6
2
25
3
2
11
5
5
10
16
36
12
3
6
2
19
8
29
44
12
35
6
100%
100%
100%
67%
100%
100%
100%
67%
66%
100%
40%
85%
50%
45%
42%
57%
37%
86%
30%
100%
40%
59%
31%
32%
34%
30%
48%
50%
Two-digit SIC
Group
Total Firms
40
42
44
45
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
67
70
72
73
75
78
79
80
82
87
99
Sum
5
9
6
15
5
27
99
30
21
5
19
7
6
25
9
24
22
92
13
21
61
9
15
6
5
140
2
1
10
22
6
15
6
1,500
Primary
Equivalent
482
484
483
481
488
513
221
421
422
444
452
445
441
448
442
722
451
522
522
523
524
524
533
721
812
511
532
512
713
621
611
541
999
NAICS (3 digit)
Firms
Proportion of
Total
Primary
Equivalent
5
9
5
12
5
27
86
29
21
4
19
7
5
25
5
24
6
92
13
21
60
9
8
6
3
55
1
1
4
13
6
13
6
1203
100%
100%
83%
80%
100%
100%
87%
97%
100%
80%
100%
100%
83%
100%
56%
100%
27%
100%
100%
100%
98%
100%
53%
100%
60%
39%
50%
100%
40%
59%
100%
87%
100%
80%
Trans
Trans
Trans
Trans
Trans
Telcm
Util
Whlsl
Whlsl
Rtail
Rtail
Rtail
Rtail
Rtail
Rtail
Meals
Rtail
Banks
Banks
Fin
Insur
Insur
Fin
Meals
PerSrv
BusSv
BusSv
Fun
Fun
Hlth
PerSrv
BusSv
Misc
Fama French
Firms
Proportion of
Total
Primary
Equivalent
5
9
6
15
5
27
99
30
21
5
19
7
6
25
9
24
22
92
13
21
61
9
15
6
5
123
1
1
10
22
6
15
6
1267
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
100%
88%
50%
100%
100%
100%
100%
100%
100%
84%
203040
203040
203030
203020
203010
501010
551010
452030
351020
255040
255030
301010
255040
255040
255040
253010
255040
401010
402010
402010
403010
403010
253010
253010
202010
451030
203040
254010
253010
351020
202010
202010
201050
GICS (6 digit)
Firms
Proportion of
Total
5
9
3
10
3
12
54
5
6
4
19
7
5
22
9
24
11
87
11
19
47
4
2
6
4
59
1
1
9
20
6
7
6
842
100%
100%
50%
67%
60%
44%
55%
17%
29%
80%
100%
100%
83%
88%
100%
100%
50%
95%
85%
90%
77%
44%
13%
100%
80%
42%
50%
100%
90%
91%
100%
47%
100%
56%
Table 3
Comparison of Adjusted R 2 between SIC, NAICS, Fama French, and GICS for Returns
Ri,t = αt + βRind, t + εi,t
This table reports the firm-months and adjusted R 2 for the above monthly OLS regression. The dependent variable, R, is the monthly return for
firm i within industry ind at month t, from the CRSP monthly database. The independent variable, Rind, is the monthly average return for all firms
in that industry classification. Industries are defined by either the first 2 digits of the firm’s SIC code, the first 3 digits of the firm’s NAICS code,
the firm’s Fama French classification (FF), or the first 6 digits of the firm’s GICS code. Each industry included in these regressions must have at
least 5 members. Because classifications differ between SIC, FF, NAICS, and GICS, there will be differences in the number of firm-months
reported for each regression. We use all firms from the S&P index as at December for each year (from Research Insight) for which we are able to
find a permno from CRSP by matching based on cusip.
Panel A: Adjusted R-sq for all S&P Stocks
1994
1995
1996
1997
1998
1999
2000
2001
Average
SIC
Firm-Months
Adj R-Sq
17,119
18.8%
17,091
16.4%
17,066
20.7%
17,204
25.1%
17,370
30.1%
17,185
20.5%
17,303
20.0%
17,471
31.7%
22.9%
NAICS
Firm-Months
Adj R-Sq
17,002
20.0%
17,084
17.5%
16,838
21.4%
16,976
25.5%
16,994
31.9%
17,143
22.2%
17,075
21.7%
17,123
33.3%
24.2%
Fama French
Firm-Months
Adj R-Sq
17,191
18.6%
17,132
15.7%
17,126
19.6%
17,307
24.5%
17,294
29.5%
17,303
20.6%
17,339
21.1%
17,351
31.5%
22.6%
Panel B: Difference between Other Classification and GICS
GICS vs. SIC
1994
1995
1996
1997
1998
1999
2000
2001
Average
1.7%
1.7%
1.8%
2.1%
1.1%
5.6%
7.8%
5.7%
3.4%***
GICS vs. NAICS
0.5%
0.6%
1.1%
1.7%
-0.7%
3.9%
6.1%
4.1%
2.1%**
GICS vs. FF
1.9%
2.4%
2.9%
2.7%
1.7%
5.5%
6.7%
5.9%
3.7%***
*** Significant at the 1% level (two-tailed t-test)
** Significant at the 5% level (two-tailed t-test)
GICS
Firm-Months
Adj R-Sq
17,191
20.5%
17,276
18.1%
17,222
22.5%
17,379
27.2%
17,462
31.2%
17,347
26.1%
17,323
27.8%
17,399
37.4%
26.3%
Table 4
Comparison of Adjusted R2 for S&P 1500 firms between SIC, NAICS, Fama French, and GICS Industries for Returns, Financial Ratios
and Other Financial Information
vblei, t = αt + βvbleind, t + εi, t
2
This table reports the average adjusted R for S&P 1500 firms, from Research Insight for the above OLS regression. Returns are from CRSP’s
monthly database. Share prices and shares outstanding are drawn from CRSP as at December 31 of each year. Financial statement information
is from Compustat, for the fiscal year ended in that year. Analyst long-term growth forecasts are the most recent December consensus forecast
for that year, from I/B/E/S. The dependent variable, vble, is one of the following: Returns, price-to-book (pb, market cap divided by total
common equity), enterprise value-to-sales (evs, the sum of market cap and long-term debt divided by net sales), price-to-earnings (pe, market
cap divided by net income before extraordinary items), return on net operating assets (rnoa, net operating income after depreciation divided by
the sum of property, plant, and equipment and current assets, less current liabilities), return on equity (roe, net income before extraordinary items
divided by total common equity), asset turnover (at, total assets divided by net sales), profit margin (pm, net operating income after depreciation
divided by net sales), leverage (lev, total liabilities divided by total stockholders’ equity), long term analyst growth forecast (ltgrowth), one year
ahead realized sales growth (sales growth), and scaled research and development expense (R&D, research and development expense divided by
net sales) for firm i within industry ind at year t. The independent variable, vbleind, is the yearly average for that variable for all firms in that
industry classification. Industries are defined by either the first 2 digits of the firm’s SIC code, the firm’s Fama French classification (FF), the
first 3 digits of the firm’s NAICS code, or the first 6 digits of the firm’s GICS code. Each industry included in these regressions must have at
least 5 members. For each variable, the highest adjusted R2 is shaded. We perform a two-tailed t-test on the difference between GICS and other
classifications based on the time series of differences from 1994 to 2000 (2001 for returns). Panel A shows our results for returns, panel B shows
our results for valuation multiples, panel C shows our results for financial ratios, and panel D shows our results for other financial information.
*** signifies significance (from 0) at the 1% level, ** at the 5% level, and * at the 10% level.
SIC
NAICS
Fama French
GICS
22.6%
3.7%***
26.3%
Panel A: Returns
Returns
1994 to 2001
GICS vs. Competitor
22.9%
3.4%***
24.2%
2.1%**
Panel B: Valuation Multiples
pb
1994 to 2000
GICS vs. Competitor
18.2%
5.1%***
19.1%
4.2%***
18.8%
4.5%***
23.3%
evs
1994 to 2000
GICS vs. Competitor
31.5%
5.9%**
32.7%
4.7%**
33.6%
3.8%*
37.4%
pe
1994 to 2000
GICS vs. Competitor
9.5%
1.1%
9.9%
0.7%
8.0%
2.6%**
10.6%
Panel C: Financial Statement Ratios
rnoa
1994 to 2000
GICS vs. Competitor
18.3%
2.6%**
19.8%
1.1%
20.0%
0.9%
20.9%
roe
1994 to 2000
GICS vs. Competitor
9.7%
1.7%**
10.8%
0.6%
8.6%
2.8%***
11.4%
at
1994 to 2000
GICS vs. Competitor
86.2%
1.0%***
81.7%
5.5%***
82.9%
4.3%***
87.2%
pm
1994 to 2000
GICS vs. Competitor
40.9%
1.8%
41.9%
0.8%
40.5%
2.2%**
42.7%
lev
1994 to 2000
GICS vs. Competitor
19.6%
-1.8%**
21.3%
-3.5%***
17.9%
-0.1%
17.8%
Panel D: Other Financial Information
ltgrowth
1994 to 2000
GICS vs. Competitor
32.8%
9.1%***
33.7%
8.2%***
33.5%
8.4%***
41.9%
sales growth
1994 to 2000
GICS vs. Competitor
12.4%
3.7**
13.2%
2.9%**
12.0%
4.1%***
16.1%
R&D
1994 to 2000
GICS vs. Competitor
42.5%
21.7%***
51.9%
12.3%***
52.7%
11.5%***
64.2%
Table 5
Monte Carlo Simulated Adjusted R 2
This table compares the average adjusted R2 from the S&P1500 firms from table 4 to simulated Adjusted R 2’s created by randomly allocating firms to
industry classifications. We repeated each simulation 500 times and present the average adjusted R2. The “Adjusted” column is created by
subtracting the actual average R2 from the average of the simulations. Returns are from CRSP’s monthly database. Share prices and shares
outstanding are drawn from CRSP as at December 31 of each year. Financial statement information is from Compustat, for the fiscal year ended in
that year. Analyst long-term growth forecasts are the most recent December consensus forecast for that year, from I/B/E/S. The dependent variable is
one of the following: Returns, price-to-book (pb, market cap divided by total common equity), enterprise value-to-sales (evs, the sum of market cap
and long-term debt divided by net sales), price-to-earnings (pe, market cap divided by net income before extraordinary items), return on net operating
assets (rnoa, net operating income after depreciation divided by the sum of property, plant, and equipment and current assets, less current liabilities),
return on equity (roe, net income before extraordinary items divided by total common equity), asset turnover (at, total assets divided by net sales),
profit margin (pm, net operating income after depreciation divided by net sales), leverage (lev, total liabilities divided by total stockholders’ equity),
and long term analyst growth forecast (ltgrowth), one year ahead realized sales growth (sales growth), and scaled research and development expense
(R&D, research and development expense divided by net sales) for firm i within industry ind at year t. Panel A shows our results for returns, panel B
shows our results for valuation multiples, panel C shows our results for financial ratios, and panel D shows our results for other financial information.
Industries are defined by either the first 2 digits of the firm’s SIC code, the firm’s Fama French classification (FF), the first 3 digits of the firm’s
NAICS code, or the first 6 digits of the firm’s GICS code. Each industry included in these regressions must have at least 5 members. We perform a
two-tailed t-test on the adjusted difference between GICS and other classifications based on the time series of differences from 1994 to 2000 (2001 for
returns). We treat the average simulated adjusted R2 as a constant for these tests. *** signifies significance (from 0) at the 1% level, ** at the 5%
level, and * at the 10% level.
Actual
Simulated
Revised
GICS vs
Competitor
Actual
Panel A: Returns
Returns
SIC
NAICS
FF
GICS
22.9%
24.2%
22.6%
26.3%
14.5%
14.5%
14.5%
14.4%
Revised
GICS vs
Competitor
Panel B: Valuation Multiples
8.4%
9.7%
8.1%
11.9%
3.5%***
2.2%**
3.8%***
pb
SIC
NAICS
FF
GICS
18.2%
19.1%
18.8%
23.3%
3.9%
4.4%
3.1%
3.9%
14.3%
14.7%
15.7%
19.4%
5.1%***
4.7%***
3.7%***
evs
SIC
NAICS
FF
GICS
31.5%
32.7%
33.6%
37.4%
3.9%
4.4%
3.1%
3.9%
27.6%
28.3%
30.5%
33.5%
5.9%**
5.2%**
3.0%
pe
SIC
NAICS
FF
GICS
9.5%
9.9%
8.0%
10.6%
4.2%
4.7%
3.4%
4.2%
5.3%
5.2%
4.6%
6.4%
1.1%
1.2%
1.8%*
Panel C: Financial Statement Ratios
rnoa
Simulated
SIC
NAICS
FF
GICS
18.3%
19.8%
20.0%
20.9%
4.3%
4.8%
3.5%
4.4%
14.0%
15.0%
16.5%
16.5%
2.5%**
1.5%
0.0%
SIC
NAICS
FF
GICS
9.7%
10.8%
8.6%
11.4%
3.9%
4.4%
3.1%
3.9%
5.8%
6.4%
5.5%
7.5%
1.7%*
1.1%
2.0%**
at
SIC
NAICS
FF
GICS
86.2%
81.7%
82.9%
87.2%
3.9%
4.4%
3.1%
3.9%
82.3%
77.3%
79.8%
83.3%
1.0%***
6.0%***
3.5%***
ltgrowth
SIC
NAICS
FF
GICS
32.8%
33.7%
33.5%
41.9%
3.9%
4.4%
3.1%
3.9%
28.9%
29.3%
30.4%
38.0%
9.1%***
8.7%***
7.6%***
pm
SIC
NAICS
FF
GICS
40.9%
41.9%
40.5%
42.7%
3.9%
4.4%
3.1%
3.9%
37.0%
37.5%
37.4%
38.8%
1.8%
1.3%
1.4%
sales growth SIC
NAICS
FF
GICS
12.4%
13.2%
12.0%
16.1%
4.0%
4.5%
3.2%
4.0%
8.4%
8.7%
8.8%
12.1%
3.7%**
3.4%***
3.3%***
lev
SIC
NAICS
FF
GICS
19.6%
21.3%
17.9%
17.8%
3.9%
4.4%
3.1%
3.9%
15.7%
16.9%
14.8%
13.9%
R&D
42.5%
51.9%
52.7%
64.2%
5.8%
6.1%
5.2%
6.4%
36.7%
45.8%
47.5%
57.8%
roe
Panel D: Other Financial Information
-1.8%**
-3.0%**
-0.9%
SIC
NAICS
FF
GICS
21.1%***
12.0%***
10.3%***
Table 6
Comparison of Adjusted R2 between SIC, NAICS, Fama French, and GICS Industries for Returns, Financial Ratios, and other
Financial Information
vblei,t = αt + βvbleind,t + εi,t
2
This table reports the average adjusted R for firms from 1994 to 2000 (2001 for returns), from Research Insight for the above OLS
regression. Returns are from CRSP’s monthly database. Share prices and shares outstanding are drawn from CRSP as at December 31 of
each year. Financial statement information is from Compustat, for the fiscal year ended in that year. Analyst long-term growth forecasts
are the most recent December consensus forecast for that year, from I/B/E/S. The dependent variable, vble, is one of the following:
Returns, price-to-book (pb, market cap divided by total common equity), enterprise value-to-sales (evs, the sum of market cap and longterm debt divided by net sales), price-to-earnings (pe, market cap divided by net income before extraordinary items, for firms with positive
net income only), return on net operating assets (rnoa, net operating income after depreciation divided by the sum of property, plant, and
equipment and current assets, less current liabilities), return on equity (roe, net income before extraordinary items divided by total common
equity), asset turnover (at, total assets divided by net sales), profit margin (pm, net operating income after depreciation divided by net
sales), leverage (lev, total liabilities divided by total stockholders’ equity), and long term analyst growth forecast (ltgrowth), one year
ahead realized sales growth (sales growth), and scaled research and development expense (R&D, research and development expense
divided by net sales) for firm i within industry ind at year t. The independent variable, vbleind, is the yearly average (monthly average for
returns) for that variable for all firms in that industry classification. Industries are defined by either the first 2 digits of the firm’s SIC code,
the firm’s Fama French classification (FF), the first 3 digits of the firm’s NAICS code, or the first 6 digits of the firm’s GICS code. Each
industry included in these regressions must have at least 5 members. For each variable, the highest average adjusted R2 is shaded. We
perform a two-tailed t-test on the difference between GICS and other classifications based on the time series of differences. Panel A shows
our results for S&P 500 firms, panel B shows our results for S&P 400 (midcap) firms, and panel C shows our results for S&P 600
(smallcap) firms. *** signifies significance (from 0) at the 1% level, ** at the 5% level, and * at the 10% level.
SIC
NAICS
Fama French
GICS
Returns
1994 to 2001
GICS vs Competitor
Panel A: S&P 500 Firms
34.5%
35.4%
6.1%***
5.2%***
35.9%
4.7%***
40.6%
pb
1994 to 2000
GICS vs Competitor
24.1%
12.9%***
27.9%
9.1%***
30.1%
6.9%***
37.0%
evs
1994 to 2000
GICS vs Competitor
31.5%
13.1%***
32.0%
12.6%***
40.6%
4.0%***
44.6%
pe
1994 to 2000
GICS vs Competitor
10.8%
8.8%***
13.6%
7.0%**
10.4%
9.2%***
19.6%
rnoa
1994 to 2000
GICS vs Competitor
30.2%
6.2%**
30.3%
6.1%**
34.5%
1.9%*
36.4%
roe
1994 to 2000
GICS vs Competitor
20.2%
8.4%***
22.2%
6.4%***
23.8%
4.8%***
28.6%
at
1994 to 2000
GICS vs Competitor
83.5%
1.8%***
80.5%
4.8%***
80.1%
5.2%***
85.3%
pm
1994 to 2000
GICS vs Competitor
45.3%
11.5%***
45.1%
11.7%***
48.8%
8.0%***
56.8%
lev
1994 to 2000
GICS vs Competitor
24.4%
5.0%***
27.7%
1.7%
25.8%
3.6%**
29.4%
ltgrowth
1994 to 2000
GICS vs Competitor
27.9%
19.9%***
32.8%
15.0%***
31.1%
16.7%***
47.8%
sales growth
1994 to 2000
GICS vs Competitor
17.4%
5.7%***
18.8%
4.3%***
17.8%
5.3%***
23.1%
R&D
1994 to 2000
GICS vs Competitor
37.0%
32.5%***
44.9%
27.6%***
60.3%
12.2%***
72.5%
SIC
NAICS
Fama French
GICS
Returns
1994 to 2001
GICS vs Competitor
Panel B: S&P 400 Firms (Midcap)
28.8%
29.9%
7.5%***
6.4%***
29.7%
6.6%***
36.3%
pb
1994 to 2000
GICS vs Competitor
21.8%
7.8%***
20.6%
9.0%***
23.0%
6.6%***
29.6%
evs
1994 to 2000
GICS vs Competitor
27.5%
13.1%**
24.9%
15.7%***
34.5%
6.1%*
40.6%
pe
1994 to 2000
GICS vs Competitor
15.9%
3.6%
15.7%
3.8%***
15.8%
3.7%**
19.5%
rnoa
1994 to 2000
GICS vs Competitor
22.5%
-1.5%
22.7%
-1.7%
20.6%
0.4%
21.0%
roe
1994 to 2000
GICS vs Competitor
12.0%
3.7%**
9.8%
5.9%***
11.5%
4.2%**
15.7%
at
1994 to 2000
GICS vs Competitor
89.9%
0.6%
87.6%
2.9%
87.4%
3.1%**
90.5%
pm
1994 to 2000
GICS vs Competitor
43.9%
5.0%**
46.6%
2.3%
42.7%
6.2%**
48.9%
lev
1994 to 2000
GICS vs Competitor
19.7%
2.0%**
22.2%
-0.5%
24.8%
-3.1%**
21.7%
ltgrowth
1994 to 2000
GICS vs Competitor
43.4%
8.4%***
41.3%
10.5%***
47.9%
3.9%*
51.8%
sales growth
1994 to 2000
GICS vs Competitor
13.2%
10.0%***
12.8%
10.3%***
14.0%
8.2%**
23.2%
R&D
1994 to 2000
GICS vs Competitor
43.7%
16.0***
54.3%
5.4%***
53.2%
6.5%***
59.7%
SIC
NAICS
Fama French
GICS
Returns
1994 to 2001
GICS vs Competitor
Panel C: S&P 600 Firms (Smallcap)
22.5%
23.8%
2.6%**
1.3%
22.0%
3.1%***
25.1%
pb
1994 to 2000
GICS vs Competitor
20.6%
1.2%
20.6%
1.2%
17.9%
3.9%**
21.8%
evs
1994 to 2000
GICS vs Competitor
38.0%
-0.4%
41.6%
-4.0%**
36.9%
0.7%
37.6%
pe
1994 to 2000
GICS vs Competitor
13.7%
1.4%
13.2%
1.9%
12.2%
2.9%**
15.1%
rnoa
1994 to 2000
GICS vs Competitor
12.8%
5.3%**
13.5%
4.6%**
15.1%
3.0%**
18.1%
roe
1994 to 2000
GICS vs Competitor
9.8%
5.4%**
11.2%
4.0%*
9.3%
5.9%**
15.2%
at
1994 to 2000
GICS vs Competitor
89.9%
-0.2%
82.9%
6.8%***
87.2%
2.5%***
89.7%
pm
1994 to 2000
GICS vs Competitor
41.8%
0.9%
42.7%
0.0%
41.6%
1.1%
42.7%
lev
1994 to 2000
GICS vs Competitor
22.4%
-1.9%
24.5%
-4.0%**
19.5%
1.0%
20.5%
ltgrowth
1994 to 2000
GICS vs Competitor
36.4%
4.0%***
38.5%
1.9%
36.9%
3.5%***
40.4%
sales growth
1994 to 2000
GICS vs Competitor
14.3%
5.0%**
15.4%
3.9%***
14.1%
5.2%***
19.3%
R&D
1994 to 2000
GICS vs Competitor
29.2%
44.8%***
36.1%
37.9%***
45.9%
28.1%***
74.0%
Download