Do Compustat Financial Statement Data Articulate?

Do Compustat Financial Statement Data Articulate?*
Ryan Casey
University of Denver
ryan.casey@du.edu
Feng Gao
University of Illinois at Chicago
gaof1@uic.edu
Michael Kirschenheiter
University of Illinois at Chicago
mkirsche@uic.edu
Siyi Li
University of Illinois at Chicago
siyili@uic.edu
Shailendra Pandit
University of Illinois at Chicago
shail@uic.edu
This Draft: May 29, 2015
*
We wish to thank the Center for Education and Research in Financial Reporting Quality (CERFRQ) of the University
of Illinois at Chicago for financial assistance. We also gratefully acknowledge helpful comments and suggestions
from the Compustat team at S&P Capital IQ, Mark Evans, Hong Xie (AAA discussant), and workshop participants at
the University of Illinois at Chicago, the First UIC CERFRQ Accounting Research Conference, the 37th Annual
Congress of the European Accounting Association, and the 2014 AAA Annual Meeting.
ABSTRACT
Using the Financial Statement Balancing Model (FSBM) from Compustat, we examine
whether financial statement data articulate for 10,681 U.S. non-financial firms for 24 years, a total
of 92,951 firm years. We accomplish three research goals. First, we build the first formal model of
financial statement articulation, providing a benchmark for subsequent discussions of articulation.
Second, we show how to handle missing data to ensure articulation, by either filling in zeros or
inferring the missing data using other variables in the equations. Third, we produce modified
variables that resolve exceptions in the articulating equations, so that these variables form relations
that are consistent across time and firms. We then compare the “modified database” (MDB) using
these updated variables with the original Compustat data, and find significant differences in many
commonly used financial variables, such as Altman’s z-score. We believe that our MDB has the
potential to help researchers increase sample size and data quality in empirical studies.
Key Words: financial statement articulation; data integrity; Compustat.
I. INTRODUCTION
A key attribute of financial statements (F/S) is that they articulate, that is, the stock
documents, the beginning and ending balance sheets (B/S), connect to each other through the three
flow documents, the cash flow, income, and changes in owners’ equity statements (CF/S, I/S and
OE/S, respectively). The purpose of this study is to formally define what articulation of accounting
data means and examine whether articulation holds in Compustat, one of the most important
databases for financial statement information used by accounting and finance researchers.1
Ensuring articulation in Compustat financial statement data is important for two reasons.
First, articulation drives the construction of financial statements. F/S articulation is a direct
outcome of the double-entry bookkeeping system in which accounting variables are codetermined
through the resolution of multiple accounting identities (Christodoulou and McLeay 2014,
hereafter CM). Our model of articulation shares the spirit of CM, in which relationships among
accounting variables are required to hold contemporaneously, across time, and across financial
statements. While we know that theoretically the amounts in F/S should articulate, we do not know
whether the data that researchers actually use also articulate. As CM show, an empirical research
design that ignores the articulating nature of accounting data can lead to incorrect or incomplete
inferences. We rely on an articulating model to modify Compustat data and produce a fully
“articulated” dataset, which allows researchers to utilize articulation in their research design.
Second, F/S articulation can help empirical researchers manage issues in data quality. The
1
Compustat is prepared and marketed by Standard & Poor’s (S&P) Capital IQ division using information from firms’
financial disclosures. Such information undergoes a standardization process before being coded into the Compustat
database. Also, as discussed in sections II and III below, the academic community appears to lack agreement on
whether or not F/S data used in research will articulate or even what “articulation” means.
first common data quality issue is the existence of “missing” variables.2 Researchers use a variety
of approaches to address the issue of missing data. Some exclude them from the sample (e.g., Lee,
Pandit and Willis 2013); some set them to zero (e.g., Richardson et al. 2005) and some apply both
approaches to different variables (Bloomfield et al. 2015). Koh and Reeb (2014) find that
indiscriminately treating missing R&D values as zero introduces bias into the analysis. While
indiscriminate replacement of missing values with zeros is inappropriate, dropping such
observations leads to a reduction in the sample size, potentially reducing the power of the
empirical tests. Using our model of articulation, researchers can restore the missing data in a
systematic manner by replacing missing data, either with zeros or with summary amounts, only for
observations that meet our articulating criteria.
Another common data issue concerns general data accuracy when variables are not missing;
articulation relations can provide a way to resolve this issue. For example, we find some instances
where articulating relationships fail to hold even when data are not missing. Simply filling zeros or
dropping the observations cannot address such cases. Rather, we rely on our articulation model to
create modified Compustat variables to incorporate the fully articulated model in such instances.
Our approach of systematic replacement of specific missing values with zeros and modification of
variables to ensure articulation yields a significant increase in sample size along with a significant
change in the observed values for several commonly used variables and ratios.
We construct a version of the Compustat Financial Statement Balancing Model, or FSBM,
2
Compustat indicates a missing value for a variable using a null value in place of a number for that variable. According
to Compustat, such “nulls” can indicate cases where either the firm does not report the amount or Compustat analysts
are unable to assign the amount to a specific variable field, potentially due to deficient company reporting. There are
also cases where Compustat uses nulls in combination with data codes to convey certain issues in the supplied financial
statement data. See Section IV for more discussion. Also, researchers generally analyze Compustat data using
software tools such as SAS, which typically convert null values of numeric data fields to ‘dots’. Chen et al. (2014)
implicitly refer to non-reporting of variable values by Compustat when they develop their disclosure quality measure
based on a count of ‘dots’. We sometimes refer to null fields (‘dots’) as missing data.
which specifies 20 equations for the B/S, I/S, and CF/S.3 Based on the FSBM, we build a Modified
Financial Statement Balancing Model, denoted as FSBMm, with five additional equations
representing the OE/S, which is notably absent in Compustat’s FSBM. We present all 25
equations, representing articulation among line items on the four F/S, and then empirically test
these relations to examine whether articulation holds in the original Compustat database.
Using Compustat data for U.S. non-financial firms for the time period 1988-2011, we
construct two databases called RDB and MDB (for ‘raw’ and ‘modified’ databases, respectively).
We start building the RDB by selecting the variables used in the 20 FSBM equations from
Compustat. Applying these equations that represent the B/S, I/S and CF/S relations yields the RDB
that contains 1,847,444 firm year equation observations, or FYEOs. Our initial application of the
equations to the data produces 560,684 exceptions, an exception rate of 30%. We follow a
three-step process to resolve the exceptions. We first replace all null variable values with zeros,
which resolves roughly 80% of the exceptions and leaves 96,937 exceptions. We then
replace missing or zero values in total assets, total liabilities, total current assets, and total current
liabilities with the sum of their respective components, which resolves another 10,342 exceptions.
Finally, we resolve another 80,814 exceptions based on changes to GAAP over the 24-year sample
period. In total, we resolve about 99% of the exceptions and produce the MDB by modifying the
Compustat variables and applying the FSBMm equations to ensure articulation in the four F/S.4
The remaining 1% of exceptions are not resolved due to other issues such as inconsistencies between
changes in cash on the B/S versus the CF/S or when companies do not report financial statements
3
Three equations, Equations 1E, 2E, and 3D.ii, are not included in the Compustat version of FSBM, but are included
in our testing. Equation 1E says total assets equal total liabilities plus total equities; we find it always holds. Equation
2E, for comprehensive income, is not included in the Compustat version of FSBM, but is shown to hold elsewhere in
the Compustat manual. Equation 3D.ii says change in cash on the B/S equals change in cash on the CF/S; we find
numerous exceptions to this equation, as discussed below. The primary reason appears to be restatements, but
additional work will be needed to resolve these exceptions more fully.
4
SAS code to generate the MDB is available upon request.
that articulate.5
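The exception counts reported in this paragraph can be tied together with a few lines of arithmetic. The short Python sketch below only restates the figures given in the text; the variable names are ours and purely illustrative, and the code is not part of the SAS programs used to build the MDB.

```python
# Exception counts as reported in the text (names are illustrative only).
total_fyeo = 1_847_444        # firm-year equation observations in the RDB
initial_exceptions = 560_684  # exceptions before any adjustment
after_zero_fill = 96_937      # exceptions left after step 1 (nulls set to zero)
resolved_by_sums = 10_342     # resolved in step 2 (totals rebuilt from components)
resolved_by_gaap = 80_814     # resolved in step 3 (GAAP-change adjustments)

# Exceptions still unresolved after all three steps.
remaining = after_zero_fill - resolved_by_sums - resolved_by_gaap

initial_rate = initial_exceptions / total_fyeo
step1_share = (initial_exceptions - after_zero_fill) / initial_exceptions
resolved_share = 1 - remaining / initial_exceptions

print(remaining)
print(f"{initial_rate:.1%} {step1_share:.1%} {resolved_share:.1%}")
```

The residual of 5,781 exceptions corresponds to the roughly 1% of exceptions that remain unresolved.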
Our paper contributes to the literature in the following four ways. First, we are the first to use
a comprehensive financial statement balancing model to evaluate the articulation of Compustat
data. Our study complements CM, which assumes that accounting data articulate as a result of double-entry
accounting and provides a structural model of estimation to mitigate the endogeneity issue.
Our study specifically addresses cases when data do not articulate, which enables researchers to
design and conduct studies that take advantage of the articulating nature of F/S, and thus mitigate
research design issues stemming from non-articulation of data.
Second, we provide a systematic approach to the issue of missing data in Compustat data. As
discussed above, empirical researchers differ in how they address missing values in data.
Excluding such observations can result in a significant reduction in sample size, while setting all
missing values to zero can introduce biases into the data, since zero may not be the actual value for
the variable in question.6 Our system of equations in the FSBMm offers researchers a way to treat
missing variables – we set the missing values to zero only when the changes ensure that the FSBM
equations hold. This approach effectively expands the sample and increases the power of empirical
tests without introducing bias. In cases where there are no missing data but the articulation
equations do not hold, we replace existing Compustat variables with modified variables to ensure
articulation of the data. We show that modifying the data has an impact on values of variables and
ratios commonly used in the literature, such as inventory, selling, general and administrative
(SGA) expenses, current ratio, working capital, and Altman’s Z-score.
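To make the conditional zero-filling rule concrete, the following Python sketch fills missing components with zero only when doing so makes an aggregation identity hold. It is a minimal illustration under our own assumptions: the function name, the dictionary layout, and the rounding tolerance `tol` are hypothetical, and LCT is used as the reported counterpart of the total in Equation 1B.i.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]  # None stands for a Compustat null


def zero_fill_if_articulates(rec: Record, components: List[str],
                             total: str, tol: float = 0.5) -> Record:
    """Return a copy of rec in which missing components are set to zero,
    but only if the components then sum to the reported total.
    The tolerance is an assumed allowance for rounding."""
    out = dict(rec)
    if rec.get(total) is None:
        return out  # no reported total to test the identity against
    filled = {c: 0.0 if rec.get(c) is None else rec[c] for c in components}
    if abs(sum(filled.values()) - rec[total]) <= tol:
        out.update(filled)  # identity holds, so the zeros are accepted
    return out


# Current liabilities: DLC + AP + TXP + LCO = LCT, with TXP missing.
parts = ["DLC", "AP", "TXP", "LCO"]
holds = zero_fill_if_articulates(
    {"DLC": 10.0, "AP": 25.0, "TXP": None, "LCO": 5.0, "LCT": 40.0}, parts, "LCT")
fails = zero_fill_if_articulates(
    {"DLC": 10.0, "AP": 25.0, "TXP": None, "LCO": 5.0, "LCT": 50.0}, parts, "LCT")
print(holds["TXP"], fails["TXP"])
```

In the first record the missing TXP is replaced by zero because the identity then holds; in the second it is left missing, since a zero would not reconcile the components to the reported total.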
5
More specifically, other than the inconsistencies between changes in cash on the B/S versus the CF/S in Equation
3D.ii, only 0.23% of the original population of exceptions remain unresolved.
6
Missing variable values can have a significant impact on the sample size, thus potentially affecting the power of
empirical tests. For example, Lee et al. (2013) report that they lose 9,059 (23.4%) observations out of an initial sample
of 38,697 firm-year observations due to missing values of the ESUB variable (‘Equity in earnings – unconsolidated
subsidiaries’) in Compustat. Nonmissing data requirements for control variables further reduce their sample by 8,302
(21.5%) observations, leaving a final sample of 21,336 observations, which represents 55.1% of the original sample.
Also see Koh and Reeb (2014) for an analysis of missing research and development data.
Third, our study adds to an emerging literature that examines the count and distributional
properties of reported accounting variables to infer certain firm attributes. For example, Li (2008)
and Lundholm, Rogo and Zhang (2014) use the number of non-missing items from Compustat annual
data as a proxy for financial complexity. A recent study by Chen, Miao and Shevlin (2014) proposes
a measure of disclosure quality by aggregating the number of non-missing variables in Compustat
line items to draw inferences on the level of detail in firms’ annual reports. To examine the impact
of our approach on the literature, we compare the percentile rankings of Chen et al.’s (2014)
disclosure quality measured using the original Compustat data and our modified database.7
Interestingly, the original Compustat data produce disclosure quality measures with an upward bias
for large firms and a downward bias for smaller firms. We believe that sophisticated investors may
choose to follow our approach and replace certain missing values to yield a more accurate
disclosure quality measure.
Fourth, we add to the literature that focuses on the accuracy of different data aggregators or
redistributors. Recent studies such as Boritz and No (2013) and Eilifsen and Messier (2013) add to
a long literature that investigates issues in commonly used data sources. We contribute to this
literature by proposing a systematic method based on accounting data articulation to address
missing or inconsistent Compustat data due to incomplete firm disclosure or GAAP changes.
The remainder of the paper proceeds as follows. Section II reviews related literature. In
Section III, we present the development of the FSBMm, the Modified Financial Statement
Balancing Model. Section IV provides the empirical analysis. Section V concludes.
II. LITERATURE REVIEW
This study builds on and adds to three strands of accounting literature. The first line of
7
See the online appendix D for details of the analysis.
research provides our study with theoretical motivation. F/S articulation, which derives naturally
from the double-entry bookkeeping system, is often utilized in accounting models. For instance,
the clean surplus relation, one of the summary articulation relations, is employed in valuation
studies such as Ohlson (1995) to help derive the residual income valuation model. Christodoulou
and McLeay (2014), in an effort to fully exploit the articulation relations among financial
statement variables, develop a generalized structural system where the deterministic relationships
governing articulation are clearly defined. They apply the framework to develop fully identified
models that are consistent with the duality of the accounting data generating process and show that
the new models, when applied to specific research settings, such as equity pricing and investment
sensitivity to operating cash flows, can yield more precise estimates. By the same token, ignoring
the constraints imposed by articulation relations may lead to erroneous or incomplete inferences.
Our articulation model shares the spirit of CM. Moreover, for empirical researchers who want to
use the CM framework, a fully articulated dataset, like the one we will produce, would be required.
The second strand of related literature defines articulated F/S as those for which changes in
B/S working capital accounts equal the working capital amounts reported on the CF/S and
considers all other F/S as failing to exhibit articulation. Under this definition, events such as
reclassifications, accounting changes, foreign currency translations and acquisitions would
destroy articulation. Studies that adopt this approach include Dritina and Largay (1985), Huefner
et al. (1989), Bahnson et al. (1996), Revsine et al. (2002), Wilkins and Loudder (2000), Hribar and
Collins (2002), and Shi and Zhang (2011). Our study complements the work in this area of
research as we provide a competing perspective on handling these “non-articulating” events. Our
study extends the above literature by developing a comprehensive framework of accounting data
articulation, which formalizes the relationships among accounting variables across financial
statements and across time.
The third strand of related literature focuses on potential data omissions and the impact on
research. Bennin (1980) compares monthly returns reported from Compustat and CRSP between
1962 and 1978 and finds an error rate of 0.25%, about one third of what Rosenberg and Houglet
(1974) document for an earlier period. Boritz and No (2013) study 150 XBRL filings of a random
sample of 75 firms that filed their interactive data on EDGAR between 2009 and 2012. They find
that data aggregation and redistribution from commonly used data aggregators have an omission
rate of 50% for items reported in the interactive data. For items that are reported by data
aggregators, about 8% do not match the information from interactive data and more than 50% of
such mismatches are material based on conventional standards (Eilifsen and Messier, 2013). Our
system of F/S equations allows us to examine whether the data reflect articulation. Our approach
to resolving omitted values in data is similar in spirit to that of Koh and Reeb (2014) who detail a
systematic approach to deal with missing R&D values. They report that panel data can benefit
from a hybrid approach to replacing missing data with either industry or historical firm average
R&D.
III. DEVELOPING THE FSBMm
In this section we present the Modified Financial Statement Balancing Model, abbreviated as
FSBMm, where the subscript indicates we have modified the Compustat FSBM. We start by
describing what we mean by “articulation” since this term has been defined in different ways in the
literature. We provide the 25 equations that constitute the FSBMm, after which we describe how
we constructed the model.
Overview of Articulation
Compustat is the most widely used database of financial statement information for accounting
and finance research. According to S&P Capital IQ, a subsidiary of McGraw Hill Financial and the
provider of Compustat databases, data included in Compustat go through an extensive process to
collect and standardize fundamental company data.8 Despite the extensive application in
accounting and finance research, to our knowledge there has not been a systematic examination of
whether Compustat F/S data articulate.
Most definitions of articulation refer to how F/S relate to one another, but no single
authoritative definition of articulation exists. Often, accountants informally refer to articulation as
the flow line items (on the I/S, CF/S and OE/S) connecting or relating to the stock line items (on
the B/S).9 Hribar and Collins (2002) consider how non-operating events contribute to
“non-articulation” between financial statements, but define articulation differently than we do in
this paper. In an early explicit use of the term articulation, Mann (1984) showed how I/S and CF/S
information can be used to reconcile B/S accounts. In another early explicit use of articulation,
Black (1993) uses articulation as being equivalent to “clean surplus” accounting. More recently, in
commenting on the Financial Accounting Standards Board’s (FASB’s) discussion on Financial
Statement Presentation, Moehrle et al. (2010) use articulation to mean linkage of F/S, and they
connect such articulation to the FASB’s cohesiveness objective. While Moehrle et al. (2010) do
8
The four-step process includes: 1) Alignment of data according to Financial Accounting Standards Board (FASB),
Securities & Exchange Commission (SEC) and GAAP (including IFRS) guidelines and principles; 2) Examination by
staff analysts to ensure the quality of data integrated from third-party partners; 3) Extraction of information from the
financial statement notes; 4) Completion of comprehensive data reviews including over 14,000 system-based validity
checks. The description of the Compustat standardization process was found through the following URL:
http://www.compustat.com/Compustat_Standardization/. It has since been removed following S&P Capital IQ’s
website reorganization.
9
If we define a statement based on the types of line items included, there is a single flow statement (the I/S), as both
the CF/S and OE/S include both flow and stock items. Also, while the information reported in the OE/S is required to
be reported, it may be reported in a separate OE/S, in the notes or in the other statements. Finally, while the
Comprehensive Income Statement (CI/S) is now a fifth document forming the set of F/S, we incorporate the CI/S
information as I/S equations in our FSBMm.
not explicitly state that they are interpreting the term articulation in the broader sense, it is closer to
our definition of articulation.
Our Definition of Articulation and the FSBMm
We believe that a full model of articulation needs to describe how line items on each F/S
aggregate as well as how individual line items connect the different F/S. To formally define
articulation, we build on the Financial Statements Balancing Model (FSBM) for North American
Companies constructed by Compustat, which describes the inter-relations among standardized
data items from the I/S, B/S, and CF/S.10 More specifically, articulation in this paper is defined to
hold for a set of F/S if all the equations constituting the FSBMm hold.
Our definition has three advantages: (1) it is explicit, (2) it can be used for comparison
purposes, and (3) it covers all the transactions required to reconcile B/S amounts. More
specifically, we specify explicitly how the different F/S aggregate and then use the information
from the flow documents to reconcile beginning and ending B/S amounts. Hence, our model is
more general; for example, we can describe the articulating activity per the Hribar and Collins
(2002) definition as a subset of the articulating activity included in our model.
We complete the creation of the FSBMm by introducing an OE/S to the FSBM, after having
created new variables to complete the articulation of the F/S data. The OE/S relations add five new
equations, so that the FSBMm is composed of 25 equations in total, featuring eight, seven, five and
five equations representing the relations for the B/S, I/S, CF/S and OE/S, respectively.11
10
We download the file named “Financial Statement Balancing Model – North American Companies.xls” which
contains Compustat’s FSBM from the Compustat website at Wharton Research Data Services as of July 1, 2012. We
show how we construct our version of Compustat’s FSBM and the expanded model, the FSBMm, in detail in an online
appendix.
11
An important caveat to our study is that our FSBMm does not develop the articulating relations for all line items in
the F/S. It represents a complete set of aggregation relations on all F/S, articulating relations between the I/S and CF/S,
and articulating relations for some, but not all, line items on the B/S. Specifically, we develop articulating relations for
the cash account and for the equity accounts on the B/S, but not for the non-cash asset or the liability accounts. We
defer the development of a more articulating model to future work; see Casey et al. (2015).
We present the FSBMm in four parts, covering the equations relating to the balance sheet
(B/S), income statement (I/S), cash flow statement (CF/S) and owners’ equity statement (OE/S)
relations, respectively.
B/S (Eight Equations: 1Am – 1Em)
Assets – Current: CHm + IVST + RECTR + TXR + RECCO + INVT + ACO = ACTm (1A.im)
Assets – Total: ACTm + PPEGT – DPACT + IVAEQ + IVAO + INTAN + AO = ATm (1A.iim)
Liabilities – Current: DLC + AP + TXP + LCO = LCTm (1B.im)
Liabilities – Total: LCTm + DLTT + TXDB + ITCB + LO = LTm (1B.iim)
Equity – Retained earnings: REUNAm + AOCIa = REm (1C.im)
Equity – Total: STKNa + REm = SEQ (1C.iim)
Liabilities and equities: LTm + SEQ + MIBTa = LSE (1Dm)
B/S Totals: LSE = ATm (1Em)
I/S (Seven Equations: 2A – 2F)
Operating income: sale – cogs – xsga – dp = oiadp (2A)
Pre-tax income: oiadp – xint + nopi + spi = pi (2B)
Net income (NI): pi – txt – mii = ib (2C)
NI equivalence – CF/S: ib = ibcm – mii (2D.im)
NI equivalence – OE/S: ib + xido = cibegnim – mii (2D.iim)
Comprehensive income: cibegnim + ocia – cimii = citotalm (2Em)
Extraordinary Income (see footnote 12): ib – dvp + cstke + xido = niadj (2F)
CF/S (Five Equations: 3Am – 3D.iim)
Operating: oancf = ibcm + dpc + xidoc + txdc + esubc + sppiv + fopo + recch + invch + apalch + txach + aoloch (3Am)
Investing: ivncf = –ivch + siv + ivstch – capx + sppe – aqc + ivaco (3B)
Financing: fincf = sstk + txbcof – prstkc – dv + dltis – dltr + dlcch + fiao (3C)
CF/S checks: oancf + ivncf + fincf + exre = chech (3D.i)
and (CHm,t – CHm,t-1) = chech (3D.iim)
OE/S (Five Equations: 4Am – 4Em)
Capital Stock: STKNa,t – STKNa,t-1 = sstk – prstkc + STKNaplug (4Am)
Retained Earnings: REUNAm,t – REUNAm,t-1 = ibcm – mii – dvt + REUNAmplug (4Bm)
AOCI: AOCIa,t – AOCIa,t-1 = citotalm – cibegnim + AOCIaplug (4Cm)
OE/S: SEQt – SEQt-1 = citotalm + sstk – prstkc – dvt + STKNaplug + REUNAmplug + AOCIaplug (4Dm)
Non-controlling interest: MIBTa,t – MIBTa,t-1 = mii + cimii + MIBTmplug (4Em)
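As an illustration of how equations of this kind can be tested against a firm-year record, the following Python sketch encodes the eight B/S equations as data and reports which of them fail. It is a minimal sketch under our own assumptions rather than the authors' SAS implementation: the function name, the record layout, the rounding tolerance, and the choice to count an equation with any missing variable as an exception are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (label, [(sign, variable), ...], total) for the eight B/S equations.
Equation = Tuple[str, List[Tuple[int, str]], str]
BS_EQUATIONS: List[Equation] = [
    ("1A.i", [(1, "CHm"), (1, "IVST"), (1, "RECTR"), (1, "TXR"),
              (1, "RECCO"), (1, "INVT"), (1, "ACO")], "ACTm"),
    ("1A.ii", [(1, "ACTm"), (1, "PPEGT"), (-1, "DPACT"), (1, "IVAEQ"),
               (1, "IVAO"), (1, "INTAN"), (1, "AO")], "ATm"),
    ("1B.i", [(1, "DLC"), (1, "AP"), (1, "TXP"), (1, "LCO")], "LCTm"),
    ("1B.ii", [(1, "LCTm"), (1, "DLTT"), (1, "TXDB"), (1, "ITCB"),
               (1, "LO")], "LTm"),
    ("1C.i", [(1, "REUNAm"), (1, "AOCIa")], "REm"),
    ("1C.ii", [(1, "STKNa"), (1, "REm")], "SEQ"),
    ("1D", [(1, "LTm"), (1, "SEQ"), (1, "MIBTa")], "LSE"),
    ("1E", [(1, "LSE")], "ATm"),
]


def find_exceptions(rec: Dict[str, Optional[float]],
                    equations: List[Equation], tol: float = 0.5) -> List[str]:
    """Return labels of equations that fail for this record. A missing
    variable counts as an exception here (an assumption of this sketch)."""
    failed = []
    for label, terms, total in equations:
        names = [var for _, var in terms] + [total]
        if any(rec.get(n) is None for n in names):
            failed.append(label)
            continue
        lhs = sum(sign * rec[var] for sign, var in terms)
        if abs(lhs - rec[total]) > tol:
            failed.append(label)
    return failed


# Hypothetical firm-year (amounts in $ millions) that articulates.
firm = {"CHm": 10.0, "IVST": 0.0, "RECTR": 20.0, "TXR": 0.0, "RECCO": 0.0,
        "INVT": 30.0, "ACO": 5.0, "ACTm": 65.0, "PPEGT": 100.0,
        "DPACT": 40.0, "IVAEQ": 0.0, "IVAO": 0.0, "INTAN": 10.0, "AO": 5.0,
        "ATm": 140.0, "DLC": 5.0, "AP": 20.0, "TXP": 0.0, "LCO": 10.0,
        "LCTm": 35.0, "DLTT": 30.0, "TXDB": 0.0, "ITCB": 0.0, "LO": 5.0,
        "LTm": 70.0, "REUNAm": 40.0, "AOCIa": -2.0, "REm": 38.0,
        "STKNa": 30.0, "SEQ": 68.0, "MIBTa": 2.0, "LSE": 140.0}
clean = find_exceptions(firm, BS_EQUATIONS)
broken = find_exceptions(dict(firm, INVT=25.0), BS_EQUATIONS)
print(clean, broken)
```

Storing the equations as data keeps the test logic in one place, so the I/S, CF/S and OE/S relations could be added as further entries of the same form.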
A formal definition of articulation with respect to F/S data requires that we specify exactly
how the specific F/S data variables aggregate within each F/S as well as between F/S’s. For
12
The Compustat FSBM includes more variables covering extraordinary income relations. We choose the simpler
structure of a single equation as we decided not to build additional framework on these other income relations.
example, we specify how the B/S asset line items aggregate into total assets as well as how the line
items from the flow documents affect the B/S. As mentioned, Compustat provides articulating
relations in equations shown in their FSBM; however multiple possible variations of the FSBM
exist that are consistent with the equations that Compustat provides. More specifically, building
the FSBM requires researchers to choose among different variables and exercise discretion when
constructing the FSBM equations.13 For example, we use total inventory, INVT, rather than the
individual inventory amounts, because these latter amounts are occasionally missing even when
INVT is non-zero.14 The discretion in building the FSBM extends to the FSBMm, but before
discussing the FSBMm in more detail, we first clarify expositional issues on variable construction
and notation.
First, all variables listed in the equations in this study are either Compustat variables or are
based on Compustat variables. For example, we construct both aggregated and modified variables
to make the FSBM consistent over time, using “a” or “m” in the variable subscript to indicate an
aggregated or modified variable, respectively. We construct our five aggregated variables by
simply summing existing Compustat variables.15
We have ten modified variables, seven for the B/S and three for the I/S.16 We “back fill”, or
use existing data, to create the modified variables. We use two types of procedures, back-filling
13
Again, see the online appendix for more details.
14
For example, in 2004 the Compustat record for General Electric (ticker: GE) displays total inventory (INVT) of
$16,279 million while each of the inventory components (raw material, work-in-process, finished goods and other
inventory, denoted as INVRM, INVWIP, INVFG, and INVO, respectively) displays a null to indicate missing data. Our
S&P Capital IQ contact informed the authors that the null was used in this case because the reported inventory was not
classifiable into its components.
15
The aggregated variables, denoted as MIBTa, PSTKa, STKNa, AOCIa, and ocia, measure, respectively,
non-controlling interest, preferred equity, common equity, accumulated other comprehensive income on the B/S and
other comprehensive income on the CI/S. See the online appendix for the equations for these variables.
16
The modified variables for the B/S, denoted as ACTm, ATm, CHm, LCTm, LTm, REm, and REUNAm, measure,
respectively, current and total assets, cash, current and total liabilities, and retained earnings including and excluding
accumulated other comprehensive income. The modified variables for the I/S, denoted as cibegnim, citotalm, and ibcm,
measure, respectively, comprehensive income beginning net income, parent comprehensive income, and income
before extraordinary items on the CF/S.
based on identities or based on changes in GAAP. We create six of the B/S variables by filling in
missing variables using identities. So, for example, when variable ACT is missing but its current
asset components are not missing,17 we set ACTm equal to the sum of the components; in analogous
fashion we generate variables ATm, CHm, LCTm, LTm, and REm. We carry out such backfilling only
when the subtotals are missing; stated differently, if the subtotal values are available we do not
change them since this would imply fundamental changes to the data (also see Example 1 in
Appendix B).
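The identity-based backfilling rule for subtotals can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the authors' SAS code: the function name and record layout are hypothetical, and the component list follows Equation 1A.i.

```python
from typing import Dict, List, Optional


def backfill_total(rec: Dict[str, Optional[float]], total: str,
                   components: List[str]) -> Optional[float]:
    """Return the modified total: the reported value when it exists,
    otherwise the sum of the components when all of them are present."""
    reported = rec.get(total)
    if reported is not None:
        return reported  # reported subtotals are never overwritten
    values = [rec.get(c) for c in components]
    if any(v is None for v in values):
        return None  # cannot infer the subtotal from incomplete components
    return sum(values)


current_assets = ["CHm", "IVST", "RECTR", "TXR", "RECCO", "INVT", "ACO"]
base = {"CHm": 10.0, "IVST": 0.0, "RECTR": 20.0, "TXR": 0.0,
        "RECCO": 0.0, "INVT": 30.0, "ACO": 5.0}

act_m_missing = backfill_total(dict(base, ACT=None), "ACT", current_assets)
act_m_reported = backfill_total(dict(base, ACT=70.0), "ACT", current_assets)
print(act_m_missing, act_m_reported)
```

When ACT is null the sketch returns the component sum of 65; when ACT is reported as 70 it returns 70 unchanged, even though the components sum to a different amount, which mirrors the rule that available subtotals are not altered.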
We create the remaining B/S and all three I/S variables based on changes in GAAP. The two
GAAP changes that we use are the reporting of other comprehensive income and of
non-controlling interest; these changes affect Compustat amounts beginning in fiscal years 2001
and 2009, respectively. For example, cibegni is not reported prior to 2001 and is reported without
mii (income to non-controlling interest) between 2001 and 2009. We create the modified variable
cibegnim to represent income before extraordinary items, including mii, for the entire period.18 We
generate variables REUNAm, citotalm, and ibcm using a similar approach.
Second, Compustat variables are not case sensitive, but we use case to denote whether a
variable is a stock or a flow variable. As these equations show, we use upper case letters to denote
B/S variables and use lower case letters to denote flow statement variables, in other words,
variables from the I/S, the CF/S or the OE/S. Hence, the first four aggregated variables, AOCIa,
MIBTa, PSTKa, and STKNa, are all B/S line items while the fifth variable, ocia, denoting the
amount of other comprehensive income for the period, is a flow document line item from the CI/S.
Next, we discuss the 25 equations that constitute our version of Compustat’s FSBM. The 20
17
According to S&P Capital IQ, examples of situations when a “Null” value may be assigned to ACT can include cases
when a firm has unclassified balance sheet and does not report total current assets.
18
Our method treats income to minority interests and to non-controlling interests as the same item. While not, strictly
speaking, accurate, we felt that it was more appropriate than excluding all firm years with this income non-zero. It
cannot be ignored, since then the F/S would not articulate. Also, we use GAAP effective as of 2011 as our benchmark.
equations, Equations 1A.i – 3D.ii, form a set of equations representing the B/S, I/S and CF/S
relations that should hold, by definition, over the entire Compustat database population. Sixteen of
these equations are identities that aggregate components into subtotals and totals (e.g., Equation
2A); all of these equations also belong to the Compustat FSBM. Four equations relate the F/S to
each other. Equation 1Em ensures that assets equal liabilities plus equities; while considered an
accounting identity, it is not part of the FSBM. Equations 2D.i and 2D.ii connect the I/S to the
CF/S (relating ib to ibc) and connect the I/S to the CI/S (relating ib to cibegni). The fourth equation in
this group, 3D.iim, connects the change in cash on the CF/S to the change on the B/S.
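The two cash checks differ in kind: Equation 3D.i aggregates within the CF/S, whereas Equation 3D.ii links the CF/S to two consecutive balance sheets. The Python sketch below evaluates both for a single firm-year; the function name, argument layout, and rounding tolerance are our own assumptions, and the amounts are invented for illustration.

```python
from typing import Dict


def cash_checks(flows: Dict[str, float], ch_now: float, ch_prev: float,
                tol: float = 0.5) -> Dict[str, bool]:
    """Evaluate Equations 3D.i and 3D.ii for one firm-year.
    ch_now and ch_prev are the ending and beginning B/S cash balances."""
    cfs_total = (flows["oancf"] + flows["ivncf"]
                 + flows["fincf"] + flows["exre"])
    return {
        "3D.i": abs(cfs_total - flows["chech"]) <= tol,
        "3D.ii": abs((ch_now - ch_prev) - flows["chech"]) <= tol,
    }


flows = {"oancf": 120.0, "ivncf": -80.0, "fincf": -25.0,
         "exre": -1.0, "chech": 14.0}

# Beginning cash as originally reported versus a hypothetical restated amount.
consistent = cash_checks(flows, ch_now=64.0, ch_prev=50.0)
restated = cash_checks(flows, ch_now=64.0, ch_prev=47.0)
print(consistent, restated)
```

In the second call the CF/S still sums correctly, but the change in B/S cash no longer matches chech, which is the pattern one would expect if the prior-year balance had been restated.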
The last five equations represent the relations of the equity accounts in the OE/S. Equations
4Am – 4Cm and 4Em represent the B/S equity variables, capital stock, retained earnings,
accumulated other comprehensive income and non-controlling interest. Equation 4Dm shows that
change in the total parent equity B/S variable, SEQ, is equal to the sum of the changes in the three
component parent equity B/S variables, STKNa, REUNAm and AOCIa. These equations are
qualitatively different from the first 20 equations that are based on the B/S, I/S and CF/S relations
as they are not part of the FSBM published by Compustat. While accountants may be interested in
the changes to owners’ equity due to operating and non-operating activities, not all Compustat
customers are similarly interested.19 Since these relations do not exist in Compustat’s FSBM, we
expect there to be situations where these equations do not hold. Hence, we define plug variables,
denoted with “plug” at the end of the variable name, to capture these differences.
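As a minimal sketch of how such a plug could be computed, the snippet below takes Equation 4Am and assumes that the plug is the residual that closes the relation between the change in STKNa and the net CF/S stock flows. The sign convention, the function name `stkna_plug` and the input figures are illustrative assumptions and are not taken from Compustat.

```python
def stkna_plug(stkna_t, stkna_prev, sstk, prstkc):
    """Residual that closes Equation 4Am under an ASSUMED sign convention:
    STKNa(t) - STKNa(t-1) = sstk - prstkc + plug."""
    change_in_equity = stkna_t - stkna_prev
    net_stock_flows = sstk - prstkc
    return change_in_equity - net_stock_flows


# Hypothetical firm-year, in $ millions (not real data)
plug = stkna_plug(stkna_t=1200.0, stkna_prev=1000.0, sstk=150.0, prstkc=30.0)
print(plug)
```

A plug of zero would indicate that the CF/S proxies fully explain the change in the equity account; a large plug would indicate that they do not.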
19
S&P Capital IQ indicated that the lack of OE/S data probably represented lack of interest from potential customers.
However, some academic accounting researchers find these data potentially valuable. Penman (2012), for instance,
suggests that financial statement analysis should begin with the OE/S. Although we think these equations are
necessary for any complete version of a FSBM, there are clearly issues with these constructs. For example, consider
Equation 4Am. Since sale and purchase of common and preferred shares (sstk and prstkc, respectively) are both CF/S
variables, for most FY’s, these amounts will not reflect the amounts flowing through the equity accounts. However,
we do not have any OE/S variables, so we used the CF/S variables as proxies for the missing OE/S variables. Insofar as
these are poor proxies, the associated plug variables should be large.
IV. EMPIRICAL ANALYSIS
In this section we use our version of Compustat’s FSBM to ascertain the extent to which
exceptions exist. We then describe how to resolve these exceptions through three steps and create
a new modified database, denoted as the MDB. The MDB comprises all initial Compustat
variables as well as the new variables described in Section III above. After resolving the
exceptions, we present descriptive statistics to gauge the impact our modifications have on the
original Compustat data in general and some commonly used variables and ratios in particular.
Exceptions and their resolution
Table 1 details how the initial sample of FYEOs and exceptions is generated. We use 1988 –
2011 as our sample period to ensure the availability of CF/S variables and obtain a set of 266,711
firm-year observations corresponding to 28,209 unique firms. For each firm-year observation, we
download the data for 95 Compustat variables, including 38, 26, 31 and 1 variables from the B/S,
I/S, CF/S and CI/S or OE/S, respectively.20 We delete 10,032 firms in the financial, insurance or
utilities sectors, leaving 18,177 firms and 165,563 firm years. We also remove non-US firms, firms
with sales or total assets less than $1 million, and each firm year for which there is insufficient
CRSP data. This process of elimination results in 10,681 unique firms and 92,951 firm-year
observations remaining. Requiring firms to have sales and total assets of at least $1 million ensures
that we do not have firm-year observations where all B/S or I/S data fields are nulls or zeros. These
firm years form the basis of our initial database, which we refer to as the “raw database” or RDB.21
[Insert Table 1 about here]
20
See online appendix B for a list of all Compustat variables in the database. Also, for variables used in multiple
statements (e.g., ibc is in the I/S and the CF/S), we counted the variable on the statement that shows up first in the FSBMm. So,
for example, we counted ibc as being on the I/S, not the CF/S.
21
Changes in fiscal year end and accounting restatements are likely to affect the number of exceptions. There are 750
unique firms and 785 firm year observations for which there is a change in the month of their fiscal year-end. Our
sample also contains 5,646 firm-year observations (6.07% of the sample) that were subject to a financial
restatement. These restatements originate from 1,883 (17.62%) unique firms.
Applying the 20 equations based directly on Compustat’s FSBM to the 92,951 firm-years, we
obtain a total of 1,847,444 firm-year-equation-observations (FYEOs) in the initial database.22 Among
these FYEOs, we initially find 560,684 exceptions. As Figure 1 shows, most of the
exceptions are due to missing, or null, values (548,902 out of 560,684).23 Of the 92,951
firm-years in the sample, 99% have at least one exception, 98% have exceptions in more than one
equation, and less than 1% have no exceptions.
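The exception test can be sketched as follows, using the $1 million threshold that the paper states for identifying exceptions. The function name `classify`, the three labels it returns, and the sample numbers are hypothetical and serve only to illustrate the distinction between null-driven and numeric exceptions.

```python
import math

THRESHOLD = 1.0  # $ millions


def classify(lhs_terms, rhs, threshold=THRESHOLD):
    """Classify one firm-year-equation observation (FYEO).

    lhs_terms: signed left-hand-side terms; None or NaN marks a null field.
    rhs: reported total on the right-hand side.
    """
    values = list(lhs_terms) + [rhs]
    # A null in any field means the equation cannot be evaluated as reported.
    if any(v is None or math.isnan(v) for v in values):
        return "null_exception"
    difference = sum(lhs_terms) - rhs
    # Differences below the threshold are treated as rounding, not exceptions.
    if abs(difference) >= threshold:
        return "numeric_exception"
    return "holds"


# Hypothetical observations, in $ millions
print(classify([10.0, 5.0], 15.4))   # small difference
print(classify([10.0, 5.0], 16.0))   # difference at the threshold
print(classify([None, 5.0], 5.0))    # null component
```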
The first step in resolving exceptions relates to how Compustat codes the data obtained from
the firms’ financial statements. Compustat indicates missing information for a variable using a null
value in place of a number for that variable. According to Compustat, such “nulls” can include
cases where either the firms did not report the amount or Compustat analysts were unable to assign
the amount to a specific variable field due to imperfect company reporting. As an example, for a
given firm-year observation the amount of total inventories, INVT, may be available but inventory
components such as work-in-process (WIP) could be nulls since Compustat was unable to
determine how much of total INVT could be assigned to WIP. In other cases, Compustat uses null
values in conjunction with data codes to indicate that a company mentions having that data but
does not report a reasonable value for Compustat to report. This is often due to two different data
points getting reported on a combined basis by the company, such as ‘prepaid expenses and other
current assets’ with no additional break out provided.24 Regardless of their origin, such null fields
represent missing data to end users and pose a challenge to their research design.
[Insert Table 2 about here]
22
For 92,951 firm-year observations with 20 equations for each firm-year, there are 1,859,020 FYEOs at the
maximum. Eliminating 11,576 observations without consecutive data items to calculate Equation 3D.ii, we are left
with 1,847,444 total FYEOs.
23
Empirically, we identify an exception when the absolute value of the difference between the left-hand side and the
right-hand side of an equation is greater than or equal to $1 million. This threshold excludes possible rounding
errors from the exceptions.
24
We thank S&P Capital IQ for explaining this practice to us.
As Table 2 and Figure 1 show, most of the exceptions (463,747/560,684 = 82.7%) are
resolved by replacing each null value for variables in our system of equations with a zero; that is,
the equation holds once we insert the zeros. For example, 7,335 of the 92,951 firm years
(roughly 8%) have at least one of the eight variables in Equation 1A.im with a null value instead
of a number. Replacing nulls with zeros resolves 4,281 exceptions, about 58.4% of the exceptions
in this equation. Therefore, a researcher who excludes the null or missing value observations in
these variables would lose about 8% of these firm years; one who chooses to replace these missing
data with zeros would be justified about 58.4% of the time. For a more specific example, Verizon
has a null value for xsga (SG&A expense) in 2002. Once we replace the null value with a zero,
Equation 2A holds and the exception is resolved.25 Using the FSBMm equations to determine
when to fill in missing values avoids the unnecessary loss of data in the first case and, in the
second, gives a reliable guideline for when a null variable should be replaced with the number zero.
We believe that our FSBMm offers empirical researchers a method of using accounting equations
as an internally consistent way to expand the sample size in their research.
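The first step can be illustrated with the Verizon figures reported in Appendix B, Example 3. The sketch below is only one possible way to express the rule: the helper name `holds_after_zero_fill` and the use of `None` for a Compustat null are assumptions made for illustration.

```python
def holds_after_zero_fill(signed_terms, total, threshold=1.0):
    """Zero-fill null terms, then test whether the equation holds.

    signed_terms: left-hand-side terms with signs applied; None marks a null.
    Returns (holds, difference), where difference = sum(terms) - total.
    """
    filled = [0.0 if term is None else term for term in signed_terms]
    difference = sum(filled) - total
    return abs(difference) < threshold, difference


# Equation 2A: sale - cogs - xsga - dp = oiadp, with xsga null for Verizon in 2002
verizon_2002 = [67625.0, -38664.0, None, -13423.0]
holds, difference = holds_after_zero_fill(verizon_2002, total=15538.0)
print(holds, difference)
```

When the zero-filled equation holds, as it does here, treating the null as a zero is consistent with the reported subtotal; when it does not hold, the null cannot safely be assumed to be zero.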
Although replacing nulls with zeros resolves most exceptions, many exceptions remain after
step 1. In the third column in Table 2, we report the remaining exceptions by equation. Our second
step is to replace subtotals with the sum of components for total current assets, total current
liabilities, total assets, and total liabilities.26 As part of step 2, we also use CHm in place of CH.27
We discovered that a significant number of these subtotals have null values. Summing
the components of these subtotals resolves 10,342 of these exceptions,
leaving 86,595 exceptions after step 2.
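The second step can be sketched with the Berkshire Hathaway figures from Appendix B, Example 1. The function below follows the stated rule that a subtotal is replaced by the sum of its components when the equation fails and the zero-filled subtotal equals zero. How to treat a failing equation with a non-zero reported subtotal is not covered by that rule, so leaving the reported value unchanged in that case is an assumption, as is the name `modified_subtotal`.

```python
def modified_subtotal(components, subtotal, threshold=1.0):
    """Return the component sum when the zero-filled subtotal is zero and the
    aggregation equation fails; otherwise keep the reported subtotal."""
    component_sum = sum(0.0 if c is None else c for c in components)
    reported = 0.0 if subtotal is None else subtotal
    equation_fails = abs(component_sum - reported) >= threshold
    if equation_fails and reported == 0.0:
        return component_sum
    # Assumption: a non-zero reported subtotal is never overwritten.
    return reported


# Equation 1A.i components for BRK.B in 2011: CH, IVST, RECTR, TXR, RECCO, INVT, ACO
brk_components = [37299.0, 7063.0, 0.0, 0.0, 32946.0, 8975.0, 0.0]
act_m = modified_subtotal(brk_components, subtotal=None)
print(act_m)
```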
25
See more details in Appendix B, Example 3.
26
For an example of this procedure, see Appendix B, Example 1, which details our procedure for Berkshire Hathaway
for 2011.
27
Details are explained in the online appendix A.
The final stage in resolving the exceptions involves understanding how changes in GAAP
apply to the remaining exceptions. Many of these arise because our equations are based on GAAP
as of 2011 and hence do not apply to all years in the sample. For instance, Compustat changed
reporting methods in 2001 for the GAAP comprehensive income requirements. In particular, our
Equation 1C.i is a B/S equation for the equity accounts and includes the accumulated other
comprehensive income balance, ACOMINC, as part of the aggregated variable AOCIa. The line
item ACOMINC was first required on the B/S in 1998.28 While Compustat reports ACOMINC for
some firms in 2000 and earlier, the variable is missing from most firm years prior to
2001. As exceptions arise in our equations over time, due to changes in GAAP and the resulting
changes in Compustat’s reporting policies, we modify variables in our version of the FSBM.
Additionally, Equations 2D.i and 2D.ii are affected by SFAS #160, effective in 2009, which
changed the reporting of minority interest to the reporting of non-controlling interest. We modify
ibc to ibcm and cibegni to cibegnim to resolve the exceptions in these two equations. In total, these
changes to the variables resolve 80,814 of the 86,595 remaining exceptions, more than
93% of them. For a specific example of this procedure, see Appendix B,
Example 4, where we detail the procedure we utilize to resolve the exception in Equation 2D.ii for
GE in 2003.
Overall, we are able to resolve all but 5,781, or 1%, of the 560,684 exceptions identified via
the 20 articulation equations.29 With the additional five OE/S equations (Equations 4Am – 4Em)
and the new plug variables from them, we generate a new database, which we refer to as the
“modified database” or MDB.
28
Statement of Financial Accounting Standard (SFAS) #130, “Reporting Comprehensive Income” (ASC Topic 220),
was issued in June 1997 and applied to fiscal years beginning after December 15, 1997. However, Compustat seems to
have included the information on a regular basis only after 2001.
29
Of the 5,781 unresolved exceptions, the majority (4,477) relate to Equation 3D.ii, which requires the change in the
cash on the B/S to equal the change in cash on the CF/S. Excluding the exceptions from Equation 3D.ii, this means
only 1,304 out of the 560,684 original population of exceptions (0.233%) remain unresolved. Restated F/S would
likely cause an exception in Equation 3D.ii. While we do not know how many of the 4,477 exceptions are due to
restated F/S, this is likely to be a primary source of these exceptions. We forwarded a copy of our list of unresolved
exceptions to personnel at S&P Capital IQ and they are updating their data accordingly.
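The counts reported for the three resolution steps can be reconciled with a short tally. The sketch below uses only the figures stated in the text; the step labels and variable names are chosen here for illustration.

```python
initial_exceptions = 560_684

# Exceptions resolved at each step, as reported in the text
resolved_by_step = {
    "step 1: replace nulls with zeros": 463_747,
    "step 2: sum components for null subtotals": 10_342,
    "step 3: modify variables for GAAP changes": 80_814,
}

remaining = initial_exceptions
for step, resolved in resolved_by_step.items():
    remaining -= resolved
    print(f"{step}: resolved {resolved:,}, remaining {remaining:,}")

unresolved_share = remaining / initial_exceptions
print(f"unresolved: {remaining:,} ({unresolved_share:.2%} of initial exceptions)")
```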
[Insert Table 3 about here]
Table 3 reports the descriptive statistics for the remaining exceptions after step 1 where we
replaced the null values with zeros. For example, in Panel A of Table 3, 3,054 exceptions remain
after step 1. Despite the relatively low frequency of this type of exception (3.3% of the total
number of the firm-year observations), the average magnitude of the exception is $5,389 million,
with a minimum of -$3,698 million and a maximum of $471,520 million. Many of the exceptions
related to changes in GAAP could also be resolved by filling in zeros. For example, only 21% of
the exceptions (12,763 out of 59,799) remain in Equation 1C.i after step 1, but it has an average
that is both statistically different from zero and economically significant ($17.19 million).
To further illustrate the magnitudes of the exceptions remaining after step 1, we deflate the
variables by total assets and present the summary statistics in Panel B. Nine of the twenty
equations still contain exceptions with averages that are statistically significantly different from
zero at the 1% level. Of those, five equations, namely E1A.i, E1A.ii, E1B.i, E1C.ii, and
E1D, have exceptions with an average greater than 20% of total assets. We believe these
exceptions are material enough to warrant the attention of researchers using Compustat data.
Table 4 presents descriptive statistics for the remaining exceptions after all modifications.
Panel A of Table 4 displays the raw values of the exceptions that remain after our three-step
process, while Panel B displays these values scaled by total assets. Analyzing these panels
simultaneously reveals that many of the remaining exceptions are indeed economically significant.
[Insert Table 4 about here]
Table 4 also presents descriptive statistics for our OE/S plug variables. As mentioned above,
these “plug” variables capture the exceptions in the OE/S equations, Equations 4Am – 4Cm and
4Em, that is, cases where these equations do not hold. The large and significant STKNaplug
(with a mean of $52.99 million) indicates that many observations in our final dataset suffer from
incomplete data flowing through the equity accounts. The significantly negative value for
REUNAmplug (with a mean of -$16.20 million) suggests that inconsistencies in reporting minority
interest have a significant impact on the equity measurement. Lastly, we expected MIBTaplug
to capture much of the missing information in equity. It is less economically significant than
the other plug variables (with a mean of -$4.81 million).
Impact of resolutions on Financial Statement Ratios and Modified Variables
Table 5 displays descriptive statistics for three types of data: variables that are significantly
different in the MDB than the original data, some accrual variables, and financial statement ratios.
The first six rows represent some commonly used variables that are significantly different between
the original Compustat data and our modified data. For example, two subtotal accounts, current
assets (ACT) and current liabilities (LCT) are significantly different (690.65 versus 523.14, t=7.69;
476.63 versus 373.75, t=6.25), which agrees with earlier analysis on the effect of null values
present in these subtotals. Another interesting finding is that two of the comprehensive income items
(cibegni, citotal) are significantly different between the two sets of data (73.78 versus 182.19,
t=-14.72; 70.55 versus 168.44, t=-13.08). This suggests that researchers may want to employ our
model when utilizing data involving ACT, LCT and comprehensive income.
[Insert Table 5 about here]
The next seven rows in Table 5 present differences in accrual variables constructed using both
our modified data (MDB) and the original Compustat data. The accrual variable definitions are
based on Richardson et al. (2005). Significant differences exist between the two datasets for six of
the seven accrual variables (all but the raw working capital accruals). Finally, the last three rows in
Table 5 display differences between the two datasets in some financial ratios often used by
researchers. Both Altman’s Z-score and the “quick ratio” are found to have significantly
different distributions at the 1% level (Z=8.30; Z=5.50). Taken collectively, these differences
suggest that our process of modifying the raw data may have an impact on various activities where
these data are used, both in academic research and in practice. That is, if one of these variables is the
variable of interest, researchers need to be aware of the potential effects that different sample and
variable treatment methods may have on their inferences.
V. CONCLUSIONS
We build a version of the Financial Statement Balancing Model (FSBM) from Compustat and
expand it to include stockholders’ equity statement relations to obtain a fully articulated modified
FSBM, or FSBMm. We then verify these relations using historical data on a sample of
Compustat non-financial firms between 1988 and 2011. We find exceptions to these equations in
30.3% (560,684/1,847,444) of the cases. We next investigate the causes of
exceptions, and make necessary modifications that are consistent with the articulation of the
FSBM. We are able to resolve about 99% of the exceptions.
Our paper is unique in being the first to use a fully articulated model to examine the
Compustat data. Whether financial statements articulate in the databases we use in our research
has implications on two fronts. First, in the face of missing values or other non-articulation
situations, forming a sample or defining variables of interest without addressing these issues in a
systematic fashion might complicate or even bias inferences, due to issues such as lower statistical
power or omitted correlated variables. Second, the articulating nature of financial statement data
has been under-appreciated and under-utilized in research. By presenting a framework to produce
a fully articulated dataset using our articulation model, we facilitate research efforts taking
advantage of the articulation relations in the spirit of Christodoulou and McLeay (2014). Through
this study we hope to shed light on these two issues and we present evidence that the process has
the potential to make an impact on the sample size and the accuracy of data in empirical research.
Our focus is on selected equations of the FSBM; therefore, only a subset of variables in
Compustat is examined. However, the equations we present and the methods we use to address the
exceptions can apply more broadly. Following our methodology, future researchers interested in
data integrity for empirical research in accounting and finance could extend the analysis beyond
our current sample of non-financial US companies, or even extend our FSBM to include their
variables of interest.
REFERENCES
Bahnson, P., P. B. Miller, and B. P. Budge. 1996. Nonarticulation in cash flow statements and
implications for education, research and practice. Accounting Horizons 10(4): 1–15.
Bennin, R. 1980. Error rates in CRSP and Compustat: A second look. Journal of Finance 35 (5):
1267–1271.
Black, F. 1993. Choosing accounting rules. Accounting Horizons 7 (4): 1–17.
Bloomfield, M. J., J. Gerakos, and A. Kovrijnykh. 2015. Accrual reversals and cash conversion.
Working paper, University of Chicago. Available at SSRN: http://ssrn.com/abstract=2495610
or http://dx.doi.org/10.2139/ssrn.2495610.
Boritz, J. E., and W. G. No. 2013. The quality of interactive data: XBRL versus Compustat, Yahoo
Finance, and Google Finance. Available at SSRN: http://ssrn.com/abstract=2253638 or
http://dx.doi.org/10.2139/ssrn.2253638.
Casey, R., F. Gao, M. Kirschenheiter, S. Li, and S. Pandit. 2015. Articulation based accruals.
Working paper, University of Illinois at Chicago.
Chen, S., B. Miao, and T. Shevlin. 2014. A new measure of disclosure quality. Working paper,
University of Texas at Austin.
Christodoulou, D., and S. McLeay. 2014. The double entry constraint, structural modeling and
econometric estimation. Contemporary Accounting Research 7 (4): 1–20.
Dritina, R., and J. A. Largay. 1985. Pitfalls in calculating cash flows from operations. The
Accounting Review 60 (2): 314–326.
Eilifsen, A., and W. F. Messier. 2013. Materiality guidance of the major auditing firms. Working
paper, Norwegian School of Economics and University of Nevada Las Vegas.
Hribar, P., and D. W. Collins. 2002. Errors in estimating accruals: Implications for empirical
research. Journal of Accounting Research 40 (1): 105–134.
Huefner, R. J., J. E. Ketz, and J. A. Largay. 1989. Foreign currency translation and the cash flow
statement. Accounting Horizons 3 (2): 66–75.
Koh, P., and D. M. Reeb. 2014. Missing R&D. Working paper, Hong Kong University of Science
and Technology and National University of Singapore.
Lee, S., S. Pandit, and R.H. Willis. 2013. Equity method investments and sell–side analysts’
information environment. The Accounting Review 88 (6): 2089–2115.
Li, F. 2008. Annual report readability, current earnings, and earnings persistence. Journal of
Accounting and Economics 45: 221–247.
Lundholm, R. J., R. Rogo, and J. L. Zhang. 2014. Restoring the tower of Babel: How foreign firms
communicate with U.S. investors. The Accounting Review 89: 1453–1485.
Mann, H. 1984. A worksheet for demonstrating the articulation of financial statements. The
Accounting Review 59 (4): 669–673.
Moehrle, S., T. Stober, K. Jamal (Chairman), R. Bloomfield, T. E. Christensen, R. H. Colson, J.
Ohlson, S. Penman, S. Sunder, and R. L. Watts. 2010. Response to the Financial Accounting
Standards Board’s and the International Accounting Standard Board’s joint discussion paper
entitled “Preliminary views on financial statement presentation”. Accounting Horizons 24 (1):
149–158.
Ohlson, J. A. 1995. Earnings, book values, and dividends in equity valuation. Contemporary
Accounting Research 11 (2): 661–687.
Penman, S. H. 2012. Financial statement analysis and security valuation, 5th Edition, McGraw–
Hill/Irwin.
Revsine, L., D. W. Collins, and W. B. Johnson. 2002. Financial reporting and analysis, 2nd
Edition, Upper Saddle River, NJ. Prentice–Hall.
Richardson, S. A., R. G. Sloan, M. T. Soliman, and I. Tuna. 2005. Accrual reliability, earnings
persistence and stock prices. Journal of Accounting and Economics 39: 437–485.
Rosenberg, B., and M. Houglet. 1974. Error rates in CRSP and Compustat databases and their
implications. Journal of Finance 29 (4): 1303–1310.
Shi, L., and H. Zhang. 2011. On alternative measures of accruals. Accounting Horizons 25 (4):
811–836.
Wilkins, M. S., and M. L. Loudder. 2000. Articulation in cash flows statements: a resource for
financial accounting courses. Journal of Accounting Education 18: 115–126.
APPENDIX A. FSBMm Variables and Equations
In this appendix, we first list the new variables that we created for this study and second, we
provide the detailed equations for the Financial Statement Balancing Model or FSBM. We created
the new variables either by simply aggregating existing Compustat variables or by modifying
Compustat variables; we denote the aggregated or modified variables with an “a” or an “m” in the
subscript, respectively. These variables are then used to build the modified FSBM, denoted as
FSBMm. We start with the aggregated and then follow with the modified variables, each set
divided between B/S and I/S variables.
Aggregated variables used in FSBMm
B/S equations affected are 1C.im and 1C.iim:
AOCIa = ACOMINC + SEQO.
MIBTa = MIB + MIBN
PSTKa = PSTKR + PSTKN
STKNa = CSTK + CAPS – TSTK + PSTK.
I/S equation affected is 2Em:
ocia = cicurr + cidergl + cisecgl + ciother + cipen.
Modified variables used in FSBMm
B/S equations affected are 1C.im and 1C.iim:
The following definitions are useful for expressing the modified variables. Let the current and
non-current asset and current and non-current liability sums, denoted as Σ(ACTm), Σ(ANT),
Σ(LCT), and Σ(LNT), respectively, be defined as follows.
Σ(ACTm) ≡ CHm + IVST + RECTR + TXR + RECCO + INVT + ACO,
Σ(ANT) ≡ PPEGT − DPACT + IVAEQ + IVAO + INTAN + AO,
Σ(LCT) ≡ DLC + AP + TXP + LCO, and
Σ(LNT) ≡ DLTT + TXDB + ITCB + LO.
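Then the following definitions hold.
\[
\mathrm{ACT}_m =
\begin{cases}
\Sigma(\mathrm{ACT}_m) & \text{if 1A.i} \neq 0 \text{ and } \mathrm{ACT} = 0 \\
\mathrm{ACT} & \text{if 1A.i} = 0
\end{cases}
\]
\[
\mathrm{AT}_m =
\begin{cases}
\mathrm{ACT}_m + \Sigma(\mathrm{ANT}) & \text{if 1A.ii} \neq 0 \text{ and } \mathrm{AT} = 0 \\
\mathrm{AT} & \text{if 1A.ii} = 0
\end{cases}
\]
\[
\mathrm{CH}_m =
\begin{cases}
\mathrm{CHE} & \text{if } \mathrm{CH} = \mathrm{IVST} = 0 \text{, but } \mathrm{CHE} > 0 \\
\mathrm{CH} & \text{if } \mathrm{CH} > 0 \text{ and } \mathrm{CHE} \neq 0
\end{cases}
\]
\[
\mathrm{LCT}_m =
\begin{cases}
\Sigma(\mathrm{LCT}) & \text{if 1B.i} \neq 0 \text{ and } \mathrm{LCT} = 0 \\
\mathrm{LCT} & \text{if 1B.i} = 0
\end{cases}
\]
\[
\mathrm{LT}_m =
\begin{cases}
\mathrm{LCT}_m + \Sigma(\mathrm{LNT}) & \text{if 1B.ii} \neq 0 \text{ and } \mathrm{LT} = 0 \\
\mathrm{LT} & \text{if 1B.ii} = 0
\end{cases}
\]
\[
\mathrm{RE}_m =
\begin{cases}
\mathrm{AOCI}_a & \text{if 1C.ii} \neq 0 \text{ and if } \mathrm{RE} = \mathrm{REUNA}_m = 0 \\
\mathrm{REUNA}_m & \text{if 1C.ii} \neq 0 \text{ and if } \mathrm{RE} = 0 \text{ and } \mathrm{REUNA} \neq 0 \\
\mathrm{RE} & \text{if 1C.ii} = 0 \text{ or if } \mathrm{RE} \neq 0
\end{cases}
\]
\[
\mathrm{REUNA}_m =
\begin{cases}
\mathrm{RE} & \text{if 1C.i} \neq 0 \text{ or if 1C.ii} \neq 0 \text{ and if year} < 2001 \text{ and } \mathrm{AOCI}_a = 0 \\
\mathrm{REUNA} & \text{if 1C.i} = \text{1C.ii} = 0 \text{ or if year} \geq 2001 \text{ or } \mathrm{AOCI}_a \neq 0
\end{cases}
\]
Variables used in I/S equations 2C, 2D.i, 2D.ii, 2E and 2F:
\[
\mathrm{cibegni}_m =
\begin{cases}
\mathrm{ib} & \text{if 2D.ii} \neq 0 \text{ and if year} < 2001 \text{ and } \mathrm{oci}_a = 0 \\
\mathrm{cibegni} + \mathrm{mii} & \text{if 2D.ii} \neq 0 \text{ and if } 2009 > \text{year} \geq 2001 \text{ or } \mathrm{oci}_a \neq 0 \\
\mathrm{cibegni} & \text{if 2D.ii} = 0 \text{ or if year} \geq 2009
\end{cases}
\]
\[
\mathrm{citotal}_m =
\begin{cases}
\mathrm{ib} & \text{if 2D.ii} \neq 0 \text{ or if 2E} \neq 0 \text{ and if year} < 2001 \text{ and } \mathrm{oci}_a = 0 \\
\mathrm{citotal} + \mathrm{mii} & \text{if 2D.ii} \neq 0 \text{ or if 2E} \neq 0 \text{ and if } 2009 > \text{year} \geq 2001 \text{ or } \mathrm{oci}_a \neq 0 \\
\mathrm{citotal} & \text{if 2D.ii} = \text{2E} = 0 \text{ or if year} \geq 2009
\end{cases}
\]
\[
\mathrm{ibc}_m =
\begin{cases}
\mathrm{ibc} + \mathrm{mii} & \text{if year} < 2009 \\
\mathrm{ibc} + \mathrm{mii} & \text{if year} \geq 2009 \text{ and } \mathrm{ibc} \neq \mathrm{pi} - \mathrm{txt} \\
\mathrm{ibc} & \text{if 2D.i} = \text{3A} = 0 \text{ or if year} \geq 2009 \text{ and } \mathrm{ibc} = \mathrm{pi} - \mathrm{txt}
\end{cases}
\]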
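To make the case logic concrete, the sketch below implements the REUNAm definition given above. It rests on two assumptions that the definition itself does not spell out: the condition is read as "(1C.i ≠ 0 or 1C.ii ≠ 0) and year < 2001 and AOCIa = 0", and an equation counts as "≠ 0" when its difference is at least $1 million in absolute value. The function and argument names are hypothetical.

```python
def reuna_m(re, reuna, aoci_a, year, diff_1c_i, diff_1c_ii, threshold=1.0):
    """Modified unadjusted retained earnings (REUNAm), under the assumed
    reading of the conditions described in the lead-in.

    diff_1c_i, diff_1c_ii: left-hand side minus right-hand side of
    Equations 1C.i and 1C.ii for the firm-year, in $ millions.
    """
    has_exception = abs(diff_1c_i) >= threshold or abs(diff_1c_ii) >= threshold
    if has_exception and year < 2001 and aoci_a == 0:
        # Before 2001 with no AOCI recorded, RE stands in for REUNA.
        return re
    return reuna


# Hypothetical firm-years, in $ millions (not real data)
print(reuna_m(re=500.0, reuna=0.0, aoci_a=0.0, year=1999, diff_1c_i=500.0, diff_1c_ii=0.0))
print(reuna_m(re=500.0, reuna=480.0, aoci_a=20.0, year=2005, diff_1c_i=0.0, diff_1c_ii=0.0))
```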
We next show equations in the form used in older Compustat manuals to help readers better
visualize how the numbers add up. First, we show the 20 Compustat equations that form the
FSBM. These cover the B/S, I/S and CF/S (8, 7 and 5 equations, respectively). We then follow
with five equations from the OE/S, which, when added to the FSBM, form the FSBMm. Variables
in parentheses indicate subtraction.
FSBMm Equations
B/S (Eight Equations: 1A.im – 1Em)
Equation (1A.im): Assets – Current
    Cash                                                        CHm
    Short-Term Investments                                      IVST
    Receivables – Trade                                         RECTR
    Income Tax Refund                                           TXR
    Receivables – Current – Other                               RECCO
    Inventories – Total                                         INVT
    Current Assets – Other                                      ACO
    Current Assets – Total                                      ACTm
Equation (1A.iim): Assets – Total
    Current Assets – Total                                      ACTm
    Property Plant and Equipment – Total (Gross)                PPEGT
    Depreciation, Depletion and Amortization (Accumulated)      (DPACT)
    Investment and Advances – Equity                            IVAEQ
    Investment and Advances – Other                             IVAO
    Intangible Assets – Total                                   INTAN
    Assets – Other                                              AO
    Assets – Total                                              ATm
Equation (1B.im): Liabilities – Current
    Debt in Current Liabilities                                 DLC
    Account Payable/Creditors – Trade                           AP
    Income Taxes Payable                                        TXP
    Current Liabilities – Other                                 LCO
    Current Liabilities – Total                                 LCTm
Equation (1B.iim): Liabilities – Total
    Current Liabilities – Total                                 LCTm
    Long-Term Debt – Total                                      DLTT
    Deferred Taxes – Balance Sheet                              TXDB
    Investment Tax Credit – Balance Sheet                       ITCB
    Liabilities – Other                                         LO
    Liabilities – Total                                         LTm
Equation (1C.im): Equity – Retained earnings
    Retained Earnings – unadjusted                              REUNAm
    Accumulated Other Comprehensive Income                      AOCIa
    Retained Earnings                                           REm
Equation (1C.iim): Equity – Total
    Stockholders’ equity accounts                               STKNa
    Retained Earnings                                           REm
    Stockholders Equity – Parent – Total                        SEQ
Equation (1Dm): Liabilities and equities
    Liabilities – Total                                         LTm
    Stockholders Equity – Parent – Total                        SEQ
    Noncontrolling Interest – Total                             MIBTa
    Liabilities and Stockholders’ Equity – Total                LSE
Equation (1E): B/S Totals
    Liabilities and Stockholders’ Equity – Total                LSE
    Assets – Total                                              ATm
I/S (Seven Equations: 2A – 2F)
Equation (2A): Operating income
    Sales/Turnover (Net)                                        sale
    Cost of Goods Sold                                          (cogs)
    Selling, General and Administrative Expense                 (xsga)
    Depreciation and Amortization                               (dp)
    Operating Income After Depreciation                         oiadp
Equation (2B): Pre-tax income
    Operating Income After Depreciation                         oiadp
    Interest and Related Expense                                (xint)
    Nonoperating Income (Expense) – Total                       (nopi)
    Special Items                                               (spi)
    Pretax Income                                               pi
Equation (2C): Net income
    Pretax Income                                               pi
    Income Taxes – Total                                        (txt)
    Noncontrolling Interest – Income Account                    (mii)
    Income Before Extraordinary Items                           ib
Equation (2D.im): NI equivalence – CF/S
    Income Before Extraordinary Items                           ib
    Income Before Extraord. Items and Noncontrolling Interest   ibcm
    Noncontrolling Interest – Income Account                    (mii)
Equation (2D.iim): NI equivalence – OE/S
    Income Before Extraordinary Items                           ib
    Extraordinary Items and Discontinued Operations             xido
    Comprehensive Income Beginning Net Income                   cibegnim
    Noncontrolling Interest (or NCI) – Income Account           (mii)
Equation (2Em): Comprehensive income
    Comprehensive Income Beginning Net Income                   cibegnim
    Other Comprehensive Income                                  ocia
    Comprehensive Income – NCI                                  (cimii)
    Comprehensive Income – Parent                               citotalm
Equation (2F): Extraordinary Income
    Income Before Extraordinary Items                           ib
    Dividends – Preferred/Preference                            (dvp)
    Common Stock Equivalents – Dollar Savings                   cstke
    Extraordinary Items and Discontinued Operations             xido
    Net Income (Loss)                                           niadj
CF/S (Five Equations: 3Am – 3D.iim)
Equation (3Am): Operating
    Income Before Extraord. Items and NCI                       ibcm
    Depreciation and Amortization                               dpc
    Extraordinary Items and Discontinued Operations             xidoc
    Deferred Taxes                                              txdc
    Equity in Net Loss (Earnings)                               esubc
    Sale of PP&E and Investments – (Gain) Loss                  sppiv
    Funds from Operations – Other                               fopo
    Accounts Receivable – Decrease (Increase)                   recch
    Inventory – Decrease (Increase)                             invch
    Accounts Payable and Accrued Liabilities – Incr (Decr)      apalch
    Income Taxes – Accrued – Increase (Decrease)                txach
    Assets and Liabilities – Other (Net Change)                 aoloch
    Operating Activities – Net Cash Flow                        oancf
Equation (3B): Investing
    Increase in Investments                                     (ivch)
    Sale of Investments                                         siv
    Short-Term Investments – Change                             ivstch
    Capital Expenditures                                        (capx)
    Sale of Property, Plant & Equipment                         sppe
    Acquisitions                                                (aqc)
    Investing Activities – Other                                ivaco
    Investing Activities – Net Cash Flow                        ivncf
Equation (3C): Financing
    Sale of Common and Preferred Stock                          sstk
    Excess Tax Benefit of Stock Options – Cash Flow Fin.        txbcof
    Purchase of Common and Preferred Stock                      (prstkc)
    Cash Dividends                                              (dv)
    Long-Term Debt – Issuance                                   dltis
    Long-Term Debt – Reduction                                  (dltr)
    Changes in Current Debt                                     dlcch
    Financing Activities – Other                                fiao
    Financing Activities – Net Cash Flow                        fincf
Equation (3D.i): CF/S checks
    Operating Activities – Net Cash Flow                        oancf
    Investing Activities – Net Cash Flow                        ivncf
    Financing Activities – Net Cash Flow                        fincf
    Exchange Rate Effect                                        exre
    Cash and Cash Equivalents – Increase (Decrease)             chech
Equation (3D.iim): CF/S checks
    Cash (t)                                                    CHm,t
    Cash (t-1)                                                  (CHm,t-1)
    Cash and Cash Equivalents – Increase (Decrease) (t)         chech_t
OE/S (Five Equations: 4Am – 4Em)
Equation (4Am): Capital Stock
    Stockholders’ equity accounts (t)                           STKNa,t
    Stockholders’ equity accounts (t-1)                         (STKNa,t-1)
    Sale of Common and Preferred Stock                          sstk
    Purchase of Common and Preferred Stock                      (prstkc)
    Stockholders’ equity accounts Plug                          STKNaplug
Equation (4Bm): Retained Earnings
    Retained Earnings – unadjusted (t)                          REUNAm,t
    Retained Earnings – unadjusted (t-1)                        (REUNAm,t-1)
    Income Before Extraord. Items and NCI                       ibcm
    Noncontrolling Interest – Income Account                    (mii)
    Dividends – Total                                           (dvt)
    Retained Earnings – unadjusted Plug                         REUNAmplug
Equation (4Cm): AOCI
    Accumulated Other Comprehensive Income (t)                  AOCIa,t
    Accumulated Other Comprehensive Income (t-1)                (AOCIa,t-1)
    Comprehensive Income – Parent                               citotalm
    Comprehensive Income Beginning Net Income                   (cibegnim)
    Accumulated Other Comprehensive Income Plug                 AOCIaplug
Equation (4Dm): OE/S
    Stockholders Equity – Parent – Total (t)                    SEQ_t
    Stockholders Equity – Parent – Total (t-1)                  (SEQ_t-1)
    Comprehensive Income – Parent                               citotalm
    Sale of Common and Preferred Stock                          sstk
    Purchase of Common and Preferred Stock                      (prstkc)
    Dividends – Total                                           (dvt)
    Stockholders’ equity accounts Plug                          STKNaplug
    Retained Earnings – unadjusted Plug                         REUNAmplug
    Accumulated Other Comprehensive Income Plug                 AOCIaplug
Equation (4Em): Non-controlling interest
    Noncontrolling Interest – Total (t)                         MIBTa,t
    Noncontrolling Interest – Total (t-1)                       (MIBTa,t-1)
    Noncontrolling Interest – Income Account                    mii
    Comprehensive Income – NCI                                  cimii
    Noncontrolling Interest – Total Plug                        MIBTaplug
APPENDIX B. Examples of Resolution of Exceptions
This appendix illustrates several examples of how we resolve the exceptions. Some are
resolved after step 1, while others require three steps before resolution.
Example 1 – Equation 1A.im for Berkshire Hathaway (Ticker = “BRK.B”) in 2011
Step 1 (raw data) and step 2 (replace with zero):
(1A.i) CH + IVST + RECTR + TXR + RECCO + INVT + ACO = ACT
BRK.B 2011: 37,299 + 7,063 + 0 + 0 + 32,946 + 8,975 + 0 = 0; yields an exception of 86,283
Step 3 (use sum of components in place of null subtotal):
(1A.i) CHm + IVST + RECTR + TXR + RECCO + INVT + ACO = ACTm
BRK.B 2011: 37,299 + 7,063 + 0 + 0 + 32,946 + 8,975 + 0 = 86,283; yields an exception of 0
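The zero-fill and subtotal-replacement steps in Example 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names `raw_exception` and `resolve_subtotal` are hypothetical, `None` stands in for a Compustat null, and the exception is assumed to be the sum of the components minus the reported subtotal, which matches the sign of the 86,283 reported above.

```python
from typing import Optional, Sequence, Tuple


def raw_exception(components: Sequence[Optional[float]],
                  subtotal: Optional[float]) -> Optional[float]:
    """Exception on raw data; null (None) whenever any input is null."""
    if subtotal is None or any(c is None for c in components):
        return None
    return sum(components) - subtotal


def resolve_subtotal(components: Sequence[Optional[float]],
                     subtotal: Optional[float]) -> Tuple[float, float]:
    """Return the exception after zero fill and after subtotal replacement."""
    # Zero fill: treat every null component as zero.
    filled = [0 if c is None else c for c in components]
    total = sum(filled)
    after_zero_fill = total - (0 if subtotal is None else subtotal)
    # Subtotal replacement: a null subtotal is replaced by the component sum.
    modified_subtotal = total if subtotal is None else subtotal
    after_replacement = total - modified_subtotal
    return after_zero_fill, after_replacement


# Component values for BRK.B 2011 as listed in Example 1; ACT is null.
brk_2011 = [37_299, 7_063, 0, 0, 32_946, 8_975, 0]
print(raw_exception(brk_2011, None))
print(resolve_subtotal(brk_2011, None))
```

A reported (non-null) subtotal is left untouched by this sketch, so a genuine mismatch between components and subtotal survives as a remaining exception.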
Example 2 – Equation 1C.iim for Ford (Ticker = “F”) in 2000
Step 1 (raw data):
(1C.ii) STKNa + RE = SEQ
Ford (2000) step 1: “.” + 14,452 = 18,610; yields an exception of “.” or null
Step 2 (replace with zeros):
(1C.ii) STKNa + RE = SEQ
Ford (2000) step 2: 0 + 14,452 = 18,610; yields an exception of -4,158
Step 3 (GAAP changes):
(1C.ii) STKNa + REm = SEQ
Ford (2000) step 3: 4,158 + 14,452 = 18,610; yields an exception of 0
Note: Ford’s 10-K reports 4,158 under STKNa, but we do not know why the STKNa field in Compustat has a null value.
Example 3 – Equation 2A for Verizon (Ticker = “VZ”) in 2002
Step 1 (raw data):
(2A) sale – cogs – xsga – dp = oiadp
Verizon (2002) step 1: 67,625 – 38,664 – “.” – 13,423 = 15,538; yields an exception of “.” or null
Step 2 (replace with zeros):
(2A) sale – cogs – xsga – dp = oiadp
Verizon (2002) step 2: 67,625 – 38,664 – 0 – 13,423 = 15,538; yields an exception of 0
Example 4 – Equation 2D.iim for GE (Ticker = “GE”) in 2003
Step 1 (raw data):
(2D.ii) ib + xido = cibegni – mii
GE (2003) step 1: 15,589 + -587 = “.” - 290; yields an exception of “.” or null
Step 2 (replace with zeros):
(2D.ii) ib + xido = cibegni – mii
GE (2003) step 2: 15,589 + -587 = 0 - 290; yields an exception of -15,292
Step 3 (GAAP changes):
(2D.iim) ib + xido = cibegnim – mii
GE (2003) step 3: 15,589 + -587 = 15,292 - 290; yields an exception of 0
Note: The 15,292 under cibegni cannot be found in GE’s 2003 income statement because GE did not report comprehensive income in 2003 (it started reporting it in 2004).
Example 5a – Equation 3D.ii for Walmart (Ticker = “WMT”) in 2007
Step 3 (after all changes have been implemented)
(3D.ii) (CHm,t – CHm,t-1) = checht
Walmart 2007: 5,569 – 7,373 = -2,198; yields an exception of 394
Example 5b – Equation 3D.ii for General Electric (Ticker = “GE”) in 2003-2008
Step 3 (after all changes have been implemented)
(3D.ii) (CHm,t – CHm,t-1) = checht
Cash Amounts for General Electric Corporation
| Fy | CHt | CHt – CHt-1 | checht | 3D.iim | CF/S Beg | CF/S End | Discont. Cash |
|---|---|---|---|---|---|---|---|
| 2003 | 12,664* | | * | | 8,910 | 12,664 | 0 |
| 2004 | 15,328* | 2,664 | 2,664 | 0 | 12,664 | 15,328 | 3,267 |
| 2005 | 9,011 | -6,317 | -3,527* | -2,790 | 15,328 | 11,801 | 2,976 |
| 2006 | 14,275* | 5,264 | 2,474* | 2,790 | 11,801 | 14,275 | 0 |
| 2007 | 15,747 | 1,472 | 1,755* | 283 | 14,276 | 16,031 | 300 |
| 2008 | 48,187* | 32,440 | 32,336* | 104 | 16,031 | 48,367 | 180 |
As our source for the CF/S amounts in the three final columns, we used the two most recent years from
the three annual reports for fiscal years 2004, 2006 and 2008. For each amount marked with a “*”, we trace
the CH amount to the B/S and other amounts to the CF/S per the 2004, 2006 and 2008 annual reports.
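The check in Equation 3D.ii can be replayed on the cash amounts from Examples 5a and 5b. The sketch below is illustrative only: the function name `cash_change_exceptions` is hypothetical, and the exception is assumed to be (CHt – CHt-1) minus checht, the convention that reproduces Walmart's 394.

```python
from typing import Dict


def cash_change_exceptions(cash: Dict[int, float],
                           reported_change: Dict[int, float]) -> Dict[int, float]:
    """Exception per year: (CH_t - CH_{t-1}) - checht, where both years exist."""
    exceptions = {}
    for year in sorted(reported_change):
        if year in cash and (year - 1) in cash:
            computed_change = cash[year] - cash[year - 1]
            exceptions[year] = computed_change - reported_change[year]
    return exceptions


# Walmart, Example 5a (cash for 2006 and 2007, checht for 2007).
walmart = cash_change_exceptions({2006: 7_373, 2007: 5_569}, {2007: -2_198})

# GE, Example 5b (CHt and checht columns).
ge_cash = {2003: 12_664, 2004: 15_328, 2005: 9_011,
           2006: 14_275, 2007: 15_747, 2008: 48_187}
ge_checht = {2004: 2_664, 2005: -3_527, 2006: 2_474,
             2007: 1_755, 2008: 32_336}
ge = cash_change_exceptions(ge_cash, ge_checht)

print(walmart)
print(ge)
```

Under this convention the GE values for 2004, 2005, 2006, and 2008 agree with the 3D.iim column. For 2007 the computation gives -283, whereas the column shows 283, so the printed sign may follow a different convention.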
Figure 1. Resolution of Exceptions
Total number of exceptions: 560,684
  Exceptions with Nulls: 548,902
  Exceptions with Non-Nulls: 11,782
Step 1: replace nulls with zeros. Number of exceptions after Step 1: 96,937
Step 2: replace sums/components. Number of exceptions after Step 2: 86,595
Step 3: GAAP changes. Number of exceptions after Step 3: 5,781
This chart depicts the three steps we take to resolve the exceptions.
Step 1: Replacing null with zero;
Step 2: Replacing subtotals with the sum of components for total current assets, total current liabilities, total assets, and total liabilities;
Step 3: Adjusting variables in reflection of GAAP changes.
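The counts in Figure 1 imply how many exceptions each step resolves. The short Python sketch below derives those numbers from the remaining-exception counts; the variable names are illustrative and the figures are taken directly from the chart.

```python
# Remaining exceptions at the start and after each step, from Figure 1.
remaining = [
    ("Start", 560_684),
    ("Step 1", 96_937),
    ("Step 2", 86_595),
    ("Step 3", 5_781),
]

total = remaining[0][1]
resolved = {}
for (_, before), (label, after) in zip(remaining, remaining[1:]):
    resolved[label] = before - after  # exceptions cleared by this step

for label, count in resolved.items():
    share = count / total
    print(f"{label}: resolved {count:,} ({share:.1%} of all exceptions)")
```

The per-step differences should agree with the Total row of Table 2.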
Table 1. Number of Firms, Firm-years (FYs), Firm-Year-Equation-Observations (FYEOs) and FYEO Exceptions in Sample
| Description | Firms | Firm Years |
|---|---|---|
| Initial sample | 28,209 | 266,711 |
| Less: Financial, utilities, or insurance firms/FY's | 18,177 | 165,563 |
| Less: Non-U.S. firms/FY's | 14,377 | 135,967 |
| Less: Firms/FY's with sales < $1 million or assets < $1 million | 13,120 | 120,092 |
| Less: Firms/FY's with insufficient CRSP data | 10,681 | 92,951 |
| Number of FYEOs in final sample | | 1,847,444 |
This table summarizes how we generate the initial database of FYEOs. We start with all firms in the Compustat
Fundamental Annual database as of January 17, 2014 over the 24 year period, 1988-2011. Our initial database has
28,209 unique firms and 266,711 FY’s. We delete the financial, utilities, and insurance firms and FY’s from the
sample, leaving 18,177 firms and 165,563 FY’s. Next, we delete 29,596 FY’s corresponding to 3,800 firms that are not
domiciled or headquartered in the US. We also delete 15,875 FY’s corresponding to 1,257 firms having assets or sales
below $1 million, and a further 27,141 FY’s (2,439 firms) with insufficient data on CRSP, yielding a final sample of
10,681 unique firms with 92,951 FY’s. Last, we run the 20 equations on the 92,951 FY’s to obtain 1,859,020 FYEOs,
and after adjusting for 11,576 firm-year observations where we do not have consecutive annual data items to calculate
Equation 3D.ii, we are left with a total of 1,847,444 FYEOs.
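The sample attrition described in this note can be checked with simple arithmetic. The Python sketch below starts from the sample that remains after removing financial, utilities, and insurance firms and applies the deletions stated in the note; the variable names are illustrative.

```python
# Sample after removing financial, utilities, and insurance firms.
firms, firm_years = 18_177, 165_563

# (reason, firms deleted, firm-years deleted), as stated in the table note.
deletions = [
    ("Non-U.S. firms", 3_800, 29_596),
    ("Sales or assets below $1 million", 1_257, 15_875),
    ("Insufficient CRSP data", 2_439, 27_141),
]

for reason, firms_deleted, fys_deleted in deletions:
    firms -= firms_deleted
    firm_years -= fys_deleted
    print(f"After removing {reason}: {firms:,} firms, {firm_years:,} FYs")

# 20 equations per firm-year, less the observations that lack the
# consecutive annual data needed for Equation 3D.ii.
fyeos = 20 * firm_years - 11_576
print(f"FYEOs in final sample: {fyeos:,}")
```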
Table 2. Resolution of Exceptions
| Equation | Total # of exceptions | # of exceptions resolved in Step 1 | # remaining exceptions after Step 1 | # of exceptions resolved in Step 2 | # remaining exceptions after Step 2 | # of exceptions resolved in Step 3 | # remaining exceptions after Step 3 |
|---|---|---|---|---|---|---|---|
| E1A.i | 7,335 | 4,281 | 3,054 | 3,041 | 13 | 0 | 13 |
| E1A.ii | 23,057 | 20,491 | 2,566 | 2,350 | 216 | 0 | 216 |
| E1B.i | 3,082 | 995 | 2,087 | 2,077 | 10 | 0 | 10 |
| E1B.ii | 5,965 | 3,679 | 2,286 | 2,284 | 2 | 0 | 2 |
| E1C.i | 57,999 | 45,236 | 12,763 | 0 | 12,763 | 12,636 | 127 |
| E1C.ii | 1,219 | 3 | 1,216 | 0 | 1,216 | 905 | 311 |
| E1D | 85,639 | 85,430 | 209 | 0 | 209 | 0 | 209 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 7,600 | 7,598 | 2 | 0 | 2 | 0 | 2 |
| E2B | 8,703 | 8,699 | 4 | 0 | 4 | 0 | 4 |
| E2C | 10,927 | 10,926 | 1 | 0 | 1 | 0 | 1 |
| E2D.i | 16,716 | 10,935 | 5,781 | 0 | 5,781 | 5,706 | 75 |
| E2D.ii | 74,164 | 12,428 | 61,736 | 0 | 61,736 | 61,550 | 186 |
| E2E | 85,356 | 85,336 | 20 | 0 | 20 | 17 | 3 |
| E2F | 2 | 1 | 1 | 0 | 1 | 0 | 1 |
| E3A | 49,038 | 48,987 | 51 | 0 | 51 | 0 | 51 |
| E3B | 33,639 | 33,608 | 31 | 0 | 31 | 0 | 31 |
| E3C | 83,826 | 83,791 | 35 | 0 | 35 | 0 | 35 |
| E3D.i | 310 | 283 | 27 | 0 | 27 | 0 | 27 |
| E3D.ii | 6,107 | 1,040 | 5,067 | 590 | 4,477 | 0 | 4,477 |
| Total | 560,684 | 463,747 | 96,937 | 10,342 | 86,595 | 80,814 | 5,781 |
This table summarizes the frequencies at which exceptions are resolved using the three different steps we employ. All equations are defined in the appendices.
Step 1: Replacing null with zero;
Step 2: Replacing subtotals with the sum of components for total current assets, total current liabilities, total assets, and total liabilities;
Step 3: Adjusting variables in reflection of GAAP changes.
Table 3. Descriptive Statistics of Remaining Exceptions after Step 1
Panel A. Raw Values
| Exception | N | MEAN | STD DEV | MAX | Q3 | MEDIAN | Q1 | MIN |
|---|---|---|---|---|---|---|---|---|
| E1A.i | 3,054 | 5,389.74 | 30,673.30 | 471,520.00 | 688.00 | 99.50 | 6.00 | -3,698.00 |
| E1A.ii | 2,566 | -6,597.35 | 33,811.27 | 5.00 | -38.00 | -233.00 | -1,032.00 | -471,520.00 |
| E1B.i | 2,087 | 4,967.45 | 25,849.33 | 331,312.00 | 471.00 | 101.00 | 22.00 | -148.00 |
| E1B.ii | 2,286 | -4,516.56 | 24,741.54 | 2,420.00 | -14.00 | -74.50 | -398.00 | -331,312.00 |
| E1C.i | 12,763 | 17.19 | 248.37 | 5,765.00 | 11.00 | 2.00 | -2.00 | -6,979.00 |
| E1C.ii | 1,216 | -826.77 | 3,119.39 | 2,322.00 | -50.00 | -188.50 | -540.50 | -55,617.00 |
| E1D | 209 | -204.64 | 411.75 | -1.00 | -16.00 | -56.00 | -158.00 | -2,420.00 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 2 | 10.00 | 7.07 | 15.00 | 15.00 | 10.00 | 5.00 | 5.00 |
| E2B | 4 | 16.00 | 12.52 | 34.00 | 24.50 | 11.50 | 7.50 | 7.00 |
| E2C | 1 | 68.00 | . | 68.00 | 68.00 | 68.00 | 68.00 | 68.00 |
| E2D.i | 5,781 | 22.71 | 176.41 | 6,155.00 | 15.00 | 4.00 | 1.00 | -4,120.00 |
| E2D.ii | 61,736 | 43.85 | 701.98 | 26,305.00 | 20.00 | 3.00 | -4.00 | -98,418.00 |
| E2E | 20 | 542.05 | 1,153.79 | 4,300.00 | 615.50 | -4.00 | -12.50 | -264.00 |
| E2F | 1 | 1.00 | . | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| E3A | 51 | 70.51 | 235.73 | 834.00 | 14.00 | 2.00 | -3.00 | -185.00 |
| E3B | 31 | 87.48 | 180.55 | 799.00 | 106.00 | 9.00 | 1.00 | -69.00 |
| E3C | 35 | -53.86 | 169.80 | 293.00 | 1.00 | -9.00 | -76.00 | -655.00 |
| E3D.i | 27 | -3.04 | 41.46 | 124.00 | 11.00 | -2.00 | -15.00 | -83.00 |
| E3D.ii | 5,067 | -14.33 | 647.91 | 35,622.00 | 6.00 | -1.00 | -12.00 | -12,234.00 |
Significance markers printed beside MEAN, top to bottom (unmarked rows not shown): ***, ***, ***, ***, ***, ***, ***, *, ***, ***, **, **, **, *, ***
Significance markers printed beside MEDIAN, top to bottom (unmarked rows not shown): ***, ***, ***, ***, ***, ***, ***, ***, ***, **, *, ***
Panel B. Values Scaled by Total Assets
| Exception | N | MEAN | STD DEV | MAX | Q3 | MEDIAN | Q1 | MIN |
|---|---|---|---|---|---|---|---|---|
| E1A.i | 3,054 | 0.36 | 0.39 | 1.00 | 0.75 | 0.30 | 0.05 | -0.91 |
| E1A.ii | 2,566 | -0.49 | 0.32 | 0.00 | -0.18 | -0.50 | -0.81 | -1.00 |
| E1B.i | 2,087 | 0.23 | 0.23 | 6.26 | 0.32 | 0.16 | 0.09 | -0.16 |
| E1B.ii | 2,286 | -0.18 | 0.28 | 1.14 | -0.08 | -0.15 | -0.30 | -6.26 |
| E1C.i | 12,763 | 0.01 | 0.15 | 11.20 | 0.02 | 0.00 | 0.00 | -1.39 |
| E1C.ii | 1,216 | -0.40 | 0.44 | 6.26 | -0.22 | -0.39 | -0.57 | -6.57 |
| E1D | 209 | -0.32 | 0.19 | -0.04 | -0.18 | -0.27 | -0.44 | -1.14 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 2 | 0.17 | 0.21 | 0.33 | 0.33 | 0.17 | 0.02 | 0.02 |
| E2B | 4 | 0.07 | 0.12 | 0.24 | 0.13 | 0.02 | 0.01 | 0.01 |
| E2C | 1 | 0.08 | . | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 |
| E2D.i | 5,781 | 0.00 | 0.06 | 0.46 | 0.01 | 0.00 | 0.00 | -3.14 |
| E2D.ii | 61,736 | -0.06 | 0.45 | 21.79 | 0.07 | 0.03 | -0.07 | -49.83 |
| E2E | 20 | 0.00 | 0.11 | 0.24 | 0.07 | -0.01 | -0.08 | -0.15 |
| E2F | 1 | 0.01 | . | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| E3A | 51 | 0.09 | 0.18 | 0.63 | 0.21 | 0.01 | -0.05 | -0.12 |
| E3B | 31 | 0.14 | 0.29 | 0.64 | 0.27 | 0.15 | 0.04 | -1.10 |
| E3C | 35 | -0.24 | 0.38 | 0.85 | 0.01 | -0.18 | -0.43 | -1.16 |
| E3D.i | 27 | 0.00 | 0.38 | 0.94 | 0.08 | -0.01 | -0.15 | -0.72 |
| E3D.ii | 5,067 | -0.01 | 0.14 | 1.79 | 0.01 | 0.00 | -0.03 | -2.99 |
Significance markers printed beside MEAN, top to bottom (unmarked rows not shown): ***, ***, ***, ***, ***, ***, ***, *, ***, **, **, *, ***
Significance markers printed beside MEDIAN, top to bottom (unmarked rows not shown): ***, ***, ***, ***, ***, ***, ***, ***, ***, **, *, ***
This table provides descriptive statistics for exceptions remaining after the resolutions in Step 1, in which we replace nulls with zeros. Panel A is for raw values of
the exceptions, while Panel B shows the values of the exceptions scaled by concurrent total assets. *, **, *** next to the mean and median columns indicate that the
means and medians are statistically different from 0 at significance levels of 10%, 5%, and 1%.
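The note does not say which tests produce the significance markers. As one possible reading for the mean, the sketch below computes a one-sample t-statistic against zero from the summary statistics alone. The function name `one_sample_t` is hypothetical, and the choice of a t-test is an assumption rather than the authors' stated method.

```python
import math


def one_sample_t(mean: float, std_dev: float, n: int) -> float:
    """t-statistic for H0: population mean = 0, from summary statistics."""
    if n < 2:
        raise ValueError("need at least two observations")
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    standard_error = std_dev / math.sqrt(n)
    return mean / standard_error


# Panel A inputs (N, MEAN, STD DEV) for two rows of Table 3.
t_e2b = one_sample_t(mean=16.00, std_dev=12.52, n=4)
t_e1ai = one_sample_t(mean=5_389.74, std_dev=30_673.30, n=3_054)
print(round(t_e2b, 2), round(t_e1ai, 2))
```

Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which the standard library does not provide; a statistics package would be needed for that step.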
Table 4. Descriptive Statistics of Remaining Exceptions after Step 3
Panel A. Raw Values
| Exception | N | MEAN | STD DEV | MAX | Q3 | MEDIAN | Q1 | MIN |
|---|---|---|---|---|---|---|---|---|
| E1A.i | 13 | -63.23 | 86.47 | -2.00 | -5.00 | -23.00 | -100.00 | -269.00 |
| E1A.ii | 216 | -254.68 | 817.17 | 155.00 | -8.00 | -24.00 | -112.50 | -6,260.00 |
| E1B.i | 10 | -45.90 | 56.42 | -1.00 | -1.00 | -23.00 | -101.00 | -148.00 |
| E1B.ii | 2 | -12.00 | 18.39 | 1.00 | 1.00 | -12.00 | -25.00 | -25.00 |
| E1C.i | 127 | 24.56 | 118.25 | 894.00 | 17.00 | 4.00 | 1.00 | -587.00 |
| E1C.ii | 311 | 9.20 | 196.01 | 2,317.00 | 15.00 | -1.00 | -14.00 | -941.00 |
| E1D | 209 | -204.64 | 411.75 | -1.00 | -16.00 | -56.00 | -158.00 | -2,420.00 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 2 | 10.00 | 7.07 | 15.00 | 15.00 | 10.00 | 5.00 | 5.00 |
| E2B | 4 | 16.00 | 12.52 | 34.00 | 24.50 | 11.50 | 7.50 | 7.00 |
| E2C | 1 | 68.00 | . | 68.00 | 68.00 | 68.00 | 68.00 | 68.00 |
| E2D.i | 75 | 59.59 | 217.56 | 855.00 | 14.00 | -1.00 | -7.00 | -360.00 |
| E2D.ii | 186 | 13.13 | 213.97 | 1,703.00 | 9.00 | 1.00 | -2.00 | -1,773.00 |
| E2E | 3 | 87.33 | 158.20 | 270.00 | 270.00 | -3.00 | -5.00 | -5.00 |
| E2F | 1 | 1.00 | . | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| E3A | 51 | 70.51 | 235.74 | 834.00 | 14.00 | 2.00 | -3.00 | -185.00 |
| E3B | 31 | 87.48 | 180.55 | 799.00 | 106.00 | 9.00 | 1.00 | -69.00 |
| E3C | 35 | -53.86 | 169.80 | 293.00 | 1.00 | -9.00 | -76.00 | -655.00 |
| E3D.i | 27 | -3.04 | 41.47 | 124.00 | 11.00 | -2.00 | -15.00 | -83.00 |
| E3D.ii | 4,477 | -14.96 | 299.68 | 4,984.00 | 5.00 | -1.00 | -11.00 | -11,137.00 |
| E4Am: STKNaplug | 76,370 | 52.99 | 979.62 | 152,984.00 | 8.00 | 1.00 | 0.02 | -52,346.00 |
| E4Bm: REUNAmplug | 73,712 | -16.20 | 381.56 | 17,481.00 | 0.02 | 0.00 | -0.24 | -28,809.00 |
| E4Cm: AOCIaplug | 16,621 | -9.97 | 384.41 | 22,226.00 | 0.23 | 0.00 | -0.40 | -17,489.00 |
| E4Em: MIBTmplug | 15,123 | -4.81 | 345.81 | 20,955.00 | 0.60 | -0.08 | -3.00 | -13,994.00 |
Significance markers printed beside MEAN, top to bottom (unmarked rows not shown): ***, ***, ***, ***, ***, *, ***, **, **, *, ***, ***, ***, ***, *
Significance markers printed beside MEDIAN, top to bottom (unmarked rows not shown): ***, ***, ***, ***, ***, ***, ***, *, ***, ***, **, **, **, *, ***, ***, ***, ***, ***
Panel B. Values Scaled by Total Assets
| Exception | N | MEAN | STD DEV | MAX | Q3 | MEDIAN | Q1 | MIN |
|---|---|---|---|---|---|---|---|---|
| E1A.i | 13 | -0.20 | 0.25 | 0.00 | -0.02 | -0.06 | -0.33 | -0.73 |
| E1A.ii | 216 | -0.19 | 0.27 | 0.16 | -0.01 | -0.05 | -0.21 | -0.93 |
| E1B.i | 10 | -0.06 | 0.06 | 0.00 | -0.03 | -0.05 | -0.10 | -0.16 |
| E1B.ii | 2 | -0.01 | 0.03 | 0.01 | 0.01 | -0.01 | -0.03 | -0.03 |
| E1C.i | 127 | 0.02 | 0.06 | 0.17 | 0.03 | 0.01 | 0.00 | -0.40 |
| E1C.ii | 311 | 0.07 | 0.42 | 6.26 | 0.03 | 0.00 | -0.01 | -1.39 |
| E1D | 209 | -0.32 | 0.19 | -0.04 | -0.18 | -0.27 | -0.44 | -1.14 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 2 | 0.17 | 0.21 | 0.33 | 0.33 | 0.17 | 0.02 | 0.02 |
| E2B | 4 | 0.07 | 0.12 | 0.24 | 0.13 | 0.02 | 0.01 | 0.01 |
| E2C | 1 | 0.08 | . | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 |
| E2D.i | 75 | -0.13 | 0.47 | 0.46 | 0.04 | -0.01 | -0.09 | -3.14 |
| E2D.ii | 186 | -0.01 | 0.21 | 1.61 | 0.01 | 0.00 | 0.00 | -1.52 |
| E2E | 3 | -0.01 | 0.09 | 0.08 | 0.08 | -0.01 | -0.10 | -0.10 |
| E2F | 1 | 0.01 | . | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| E3A | 51 | 0.09 | 0.18 | 0.63 | 0.21 | 0.01 | -0.05 | -0.12 |
| E3B | 31 | 0.14 | 0.29 | 0.64 | 0.27 | 0.15 | 0.04 | -1.10 |
| E3C | 35 | -0.24 | 0.38 | 0.85 | 0.01 | -0.18 | -0.43 | -1.16 |
| E3D.i | 27 | 0.00 | 0.38 | 0.94 | 0.08 | -0.01 | -0.15 | -0.72 |
| E3D.ii | 4,477 | -0.02 | 0.13 | 1.08 | 0.01 | 0.00 | -0.03 | -2.99 |
| E4Am: STKNaplug | 76,370 | 0.03 | 0.24 | 12.51 | 0.02 | 0.00 | 0.00 | -25.09 |
| E4Bm: REUNAmplug | 73,712 | 0.00 | 0.20 | 32.79 | 0.00 | 0.00 | 0.00 | -3.80 |
| E4Cm: AOCIaplug | 16,621 | 0.00 | 0.05 | 2.51 | 0.00 | 0.00 | 0.00 | -1.56 |
| E4Em: MIBTmplug | 15,123 | 0.00 | 0.09 | 1.62 | 0.00 | 0.00 | 0.00 | -7.90 |
Significance markers printed beside MEAN, top to bottom (unmarked rows not shown): ***, ***, ***, ***, ***, ***, **, **, *, ***, ***, ***, ***, **
Significance markers printed beside MEDIAN, top to bottom (unmarked rows not shown): ***, ***, ***, ***, ***, ***, **, ***, ***, ***, ***, ***, ***
This table provides descriptive statistics for exceptions remaining after the resolutions through Step 1, 2, and 3. Panel A is for raw values of the exceptions, while
Panel B shows the values of the exceptions scaled by concurrent total assets. *, **, *** next to the mean and median columns indicate that the means and medians
are statistically different from 0 at significance levels of 10%, 5%, and 1%.
Table 5. Descriptive Statistics for Financial Statement Ratios and Modified Variables That Are Significantly Different between the Original Compustat Dataset and the Modified Dataset
MDB = Based on Modified Data (MDB); Orig. = Based on Original Compustat Data.
| Variable | MDB N | MDB Mean | MDB Median | MDB Std Dev | Orig. N | Orig. Mean | Orig. Median | Orig. Std Dev | T Test tValue | T Test Probt | Wilcoxon Test Z | Wilcoxon Test PROBZ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| INVT | 92,951 | 158.96 | 8.81 | 876.02 | 92,092 | 160.44 | 9.11 | 879.96 | -0.36 | 0.72 | 2.84 | 0.00 |
| xsga | 92,951 | 276.15 | 29.15 | 1,464.85 | 85,477 | 300.30 | 35.65 | 1,525.18 | -3.41 | 0.00 | 29.38 | 0.00 |
| ACT | 92,951 | 690.65 | 70.66 | 6,140.76 | 90,457 | 523.14 | 69.04 | 2,307.38 | 7.69 | 0.00 | -2.69 | 0.01 |
| LCT | 92,951 | 476.63 | 30.48 | 4,477.30 | 90,796 | 373.75 | 29.79 | 2,167.07 | 6.25 | 0.00 | -2.29 | 0.02 |
| cibegni | 92,951 | 73.78 | 2.39 | 887.98 | 22,759 | 182.19 | 9.73 | 1,349.11 | -14.72 | 0.00 | 28.06 | 0.00 |
| citotal | 92,951 | 70.55 | 2.33 | 898.86 | 22,774 | 168.44 | 8.67 | 1,382.52 | -13.08 | 0.00 | 24.77 | 0.00 |
| ACCR_BS_RAW | 81,375 | -85.20 | -4.40 | 1,119.26 | 79,045 | -77.92 | -4.33 | 509.39 | -1.67 | 0.10 | 0.38 | 0.70 |
| WCACCR_CF_RAW | 92,951 | -23.37 | -1.86 | 287.97 | 50,442 | -21.20 | -1.74 | 232.44 | -1.45 | 0.15 | 0.70 | 0.48 |
| WCACCR_CF | 81,375 | -0.03 | -0.02 | 0.13 | 43,229 | -0.04 | -0.02 | 0.15 | 6.93 | 0.00 | -10.06 | 0.00 |
| ACCR_CF_RAW | 92,951 | -107.90 | -9.11 | 645.70 | 50,067 | -68.10 | -6.38 | 394.39 | -12.58 | 0.00 | 22.91 | 0.00 |
| ACCR_CF | 81,375 | -0.09 | -0.07 | 0.15 | 42,901 | -0.09 | -0.08 | 0.16 | 7.35 | 0.00 | -11.40 | 0.00 |
| WC_RAW | 92,951 | 175.40 | 10.39 | 3,618.00 | 90,427 | 77.08 | 10.10 | 543.20 | 8.08 | 0.00 | -2.73 | 0.01 |
| WC_SCALED | 81,375 | 574.11 | 48.80 | 5,615.41 | 79,219 | 413.09 | 47.46 | 1,736.93 | 7.72 | 0.00 | -2.88 | 0.00 |
| ALTZ | 92,951 | 4.72 | 3.15 | 12.83 | 82,560 | 4.74 | 3.26 | 11.96 | -0.25 | 0.80 | 8.30 | 0.00 |
| CURRENT_RATIO | 92,871 | 2.98 | 2.05 | 4.06 | 90,438 | 2.95 | 2.05 | 3.76 | 1.68 | 0.09 | 0.29 | 0.77 |
| QUICK_RATIO | 92,871 | 2.09 | 1.21 | 3.65 | 88,982 | 2.10 | 1.23 | 3.48 | -0.65 | 0.52 | 5.50 | 0.00 |
This table presents descriptive statistics for ratios and modified variables that are significantly different when calculated using the original Compustat dataset and the modified dataset. The modified dataset is constructed after resolutions of exceptions through Steps 1, 2, and 3. Variables INVT, xsga, ACT, LCT, cibegni, and citotal are defined in the online appendix B. The other accruals variables and ratios are defined as follows:
ACCR_BS_RAW: raw operating accruals, defined as ∆(current assets – cash and cash equivalents) – ∆(current liabilities – debt in current liabilities) – depreciation & amortization, calculated using balance sheet data;
WCACCR_CF_RAW: raw working capital accruals, defined as change in accounts receivable + change in inventory – change in accounts payable and accrued liabilities – change in taxes payable – change in other assets and liabilities, calculated using statement of cash flows data;
WCACCR_CF: working capital accruals, calculated as WCACCR_CF_RAW scaled by average total assets;
ACCR_CF_RAW: raw operating accruals, defined as change in accounts receivable + change in inventory – change in accounts payable and accrued liabilities – change in taxes payable – change in other assets and liabilities – depreciation & amortization, calculated using statement of cash flows data;
ACCR_CF: operating accruals, calculated as ACCR_CF_RAW scaled by average total assets;
ALTZ: Altman's Z Score, calculated as 1.2 x working capital/total assets + 1.4 x retained earnings/total assets + 3.3 x operating income after depreciation & amortization/total assets + 0.6 x market value of equity/total liabilities + sales/total assets;
WC_RAW: raw working capital, defined as ∆(current assets – cash and cash equivalents) – ∆(current liabilities – debt in current liabilities), calculated using balance sheet data;
WC_SCALED: working capital, calculated as WC_RAW scaled by average total assets;
CURRENT_RATIO: current ratio, calculated as total current assets/total current liabilities;
QUICK_RATIO: quick ratio, calculated as (cash and cash equivalents + trade receivable)/total current liabilities.
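The ratio definitions above translate directly into code. The Python sketch below implements ALTZ, CURRENT_RATIO, and QUICK_RATIO as worded in this note; the function and parameter names are illustrative, and the input values are hypothetical and not drawn from the sample.

```python
def altman_z(working_capital: float, retained_earnings: float,
             operating_income: float, market_value_equity: float,
             sales: float, total_assets: float,
             total_liabilities: float) -> float:
    """Altman's Z Score with the coefficients given in the Table 5 note."""
    return (1.2 * working_capital / total_assets
            + 1.4 * retained_earnings / total_assets
            + 3.3 * operating_income / total_assets
            + 0.6 * market_value_equity / total_liabilities
            + sales / total_assets)


def current_ratio(current_assets: float, current_liabilities: float) -> float:
    """Total current assets divided by total current liabilities."""
    return current_assets / current_liabilities


def quick_ratio(cash: float, trade_receivables: float,
                current_liabilities: float) -> float:
    """(Cash and cash equivalents + trade receivable) / current liabilities."""
    return (cash + trade_receivables) / current_liabilities


# Hypothetical firm-year, in $ millions.
z = altman_z(working_capital=200, retained_earnings=300, operating_income=150,
             market_value_equity=1_200, sales=1_100,
             total_assets=1_000, total_liabilities=600)
cr = current_ratio(current_assets=500, current_liabilities=250)
qr = quick_ratio(cash=100, trade_receivables=150, current_liabilities=250)
print(round(z, 3), cr, qr)
```

Because every term divides by a reported total, a null or zero denominator has to be handled before calling these functions, which is one place where the treatment of missing data can change the resulting ratio.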