Do Compustat Financial Statement Data Articulate?*

Ryan Casey, University of Denver, ryan.casey@du.edu
Feng Gao, University of Illinois at Chicago, gaof1@uic.edu
Michael Kirschenheiter, University of Illinois at Chicago, mkirsche@uic.edu
Siyi Li, University of Illinois at Chicago, siyili@uic.edu
Shailendra Pandit, University of Illinois at Chicago, shail@uic.edu

This Draft: May 29, 2015

* We wish to thank the Center for Education and Research in Financial Reporting Quality (CERFRQ) of the University of Illinois at Chicago for financial assistance. We also gratefully acknowledge helpful comments and suggestions from the Compustat team at S&P Capital IQ, Mark Evans, Hong Xie (AAA discussant), and workshop participants at the University of Illinois at Chicago, the First UIC CERFRQ Accounting Research Conference, the 37th Annual Congress of the European Accounting Association, and the 2014 AAA Annual Meeting.

ABSTRACT

Using the Financial Statement Balancing Model (FSBM) from Compustat, we examine whether financial statement data articulate for 10,681 U.S. non-financial firms over 24 years, a total of 92,951 firm years. We accomplish three research goals. First, we build the first formal model of financial statement articulation, providing a benchmark for subsequent discussions of articulation. Second, we show how to handle missing data to ensure articulation, by either filling in zeros or inferring the missing data using other variables in the equations. Third, we produce modified variables that resolve exceptions in the articulating equations, so that these variables form relations that are consistent across time and firms. We then compare the “modified database” (MDB) using these updated variables with the original Compustat data, and find significant differences in many commonly used financial variables, such as Altman’s z-score. We believe that our MDB has the potential to help researchers increase sample size and data quality in empirical studies.

Key Words: financial statement articulation; data integrity; Compustat.

I. INTRODUCTION

A key attribute of financial statements (F/S) is that they articulate: the stock documents (the beginning and ending balance sheets, B/S) connect to each other through the three flow documents (the cash flow, income, and changes in owners’ equity statements; CF/S, I/S and OE/S, respectively). The purpose of this study is to formally define what articulation of accounting data means and examine whether articulation holds in Compustat, one of the most important databases for financial statement information used by accounting and finance researchers.1

Ensuring articulation in Compustat financial statement data is important for two reasons. First, articulation drives the construction of financial statements. F/S articulation is a direct outcome of the double-entry bookkeeping system, in which accounting variables are codetermined through the resolution of multiple accounting identities (Christodoulou and McLeay 2014, hereafter CM). Our model of articulation shares the spirit of CM, in which relationships among accounting variables are required to hold contemporaneously, across time, and across financial statements. While we know that theoretically the amounts in F/S should articulate, we do not know whether the data that researchers actually use also articulate. As CM show, an empirical research design that ignores the articulating nature of accounting data can lead to incorrect or incomplete inferences. We rely on an articulating model to modify Compustat data and produce a fully “articulated” dataset, which allows researchers to utilize articulation in their research design.

Second, F/S articulation can help empirical researchers manage issues in data quality.
The first common data quality issue is the existence of “missing” variables.2 Researchers use a variety of approaches to address the issue of missing data. Some exclude them from the sample (e.g., Lee, Pandit and Willis 2013); some set them to zero (e.g., Richardson et al. 2005); and some apply both approaches to different variables (Bloomfield et al. 2015). Koh and Reeb (2014) find that indiscriminately treating missing R&D values as zero introduces bias into the analysis. While indiscriminate replacement of missing values with zeros is inappropriate, dropping such observations leads to a reduction in the sample size, potentially reducing the power of the empirical tests. Using our model of articulation, researchers can restore the missing data in a systematic manner by replacing missing data, either with zeros or with summary amounts, only for observations that meet our articulating criteria.

Another common data issue concerns general data accuracy when variables are not missing; articulation relations can provide a way to resolve this issue. For example, we find some instances where articulating relationships fail to hold even when data are not missing. Simply filling zeros or dropping the observations cannot address such cases. Rather, we rely on our articulation model to create modified Compustat variables to incorporate the fully articulated model in such instances.

1 Compustat is prepared and marketed by Standard & Poor’s (S&P) Capital IQ division using information from firms’ financial disclosures. Such information undergoes a standardization process before being coded into the Compustat database. Also, as discussed in Sections II and III below, the academic community appears to lack agreement on whether or not F/S data used in research will articulate or even what “articulation” means.
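The conditional replacement of missing values described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build the MDB: the function name, the rounding tolerance, and the example firm years are hypothetical, and the identity used (DLC + AP + TXP + LCO = LCT) is the current-liabilities aggregation from the FSBM presented in Section III.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

# Assumed rounding tolerance in reporting units; not taken from the paper.
TOLERANCE = 0.001


def zero_fill_if_articulates(row: Row, components: List[str], total: str,
                             tol: float = TOLERANCE) -> Row:
    """Set null components to zero only if the identity then holds.

    The identity tested is sum(components) == total. If the total is null,
    or the identity still fails after zero-filling, an unchanged copy of
    the row is returned.
    """
    if row.get(total) is None:
        return dict(row)
    filled = dict(row)
    for name in components:
        if filled.get(name) is None:
            filled[name] = 0.0
    if abs(sum(filled[name] for name in components) - filled[total]) <= tol:
        return filled
    return dict(row)


# Hypothetical firm years in $ millions; None marks a Compustat null.
CURRENT_LIABILITIES = ["DLC", "AP", "TXP", "LCO"]
holds = {"DLC": 10.0, "AP": 25.0, "TXP": None, "LCO": 15.0, "LCT": 50.0}
fails = {"DLC": 10.0, "AP": 25.0, "TXP": None, "LCO": 15.0, "LCT": 60.0}

filled_holds = zero_fill_if_articulates(holds, CURRENT_LIABILITIES, "LCT")
filled_fails = zero_fill_if_articulates(fails, CURRENT_LIABILITIES, "LCT")
```

In the first hypothetical firm year the null TXP becomes zero because the identity then balances; in the second it stays null, since a zero would not make the components sum to the reported total.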
Our approach of systematic replacement of specific missing values with zeros and modification of variables to ensure articulation yields a significant increase in sample size along with a significant change in the observed values for several commonly used variables and ratios.

We construct a version of the Compustat Financial Statement Balancing Model, or FSBM, which specifies 20 equations for the B/S, I/S, and CF/S.3 Based on the FSBM, we build a Modified Financial Statement Balancing Model, denoted as FSBMm, with five additional equations representing the OE/S, which is notably absent in Compustat’s FSBM. We present all 25 equations, representing articulation among line items on the four F/S, and then empirically test these relations to examine whether articulation holds in the original Compustat database.

Using Compustat data for U.S. non-financial firms for the time period 1988-2011, we construct two databases called RDB and MDB (for ‘raw’ and ‘modified’ databases, respectively).

2 Compustat indicates a missing value for a variable using a null value in place of a number for that variable. According to Compustat, such “nulls” can indicate cases where either the firm does not report the amount or Compustat analysts are unable to assign the amount to a specific variable field, potentially due to deficient company reporting. There are also cases where Compustat uses nulls in combination with data codes to convey certain issues in the supplied financial statement data. See Section IV for more discussion. Also, researchers generally analyze Compustat data using software tools such as SAS, which typically convert null values of numeric data fields to ‘dots’. Chen et al. (2014) implicitly refer to non-reporting of variable values by Compustat when they develop their disclosure quality measure based on a count of ‘dots’. We sometimes refer to null fields (‘dots’) as missing data.
We start building the RDB by selecting the variables used in the 20 FSBM equations from Compustat. Applying these equations, which represent the B/S, I/S and CF/S relations, yields an RDB that contains 1,847,444 firm-year-equation observations, or FYEOs. Our initial application of the equations to the data produces 560,684 exceptions, an exception rate of 30%. We follow a three-step process to resolve the exceptions. We first replace all null variable values with zeros, which resolves roughly 80% of the exceptions and leaves 96,937 exceptions. We then replace missing or zero values in total assets, total liabilities, total current assets, and total current liabilities with the sum of their respective components, which resolves another 10,342 exceptions. Finally, we resolve another 80,814 exceptions based on changes to GAAP over the 24-year sample period. In total, we resolve about 99% of the exceptions and produce the MDB by modifying the Compustat variables and applying the FSBMm equations to ensure articulation in the four F/S.4 The remaining 1% of exceptions are not resolved due to other issues, such as inconsistencies between changes in cash on the B/S versus the CF/S, or cases where companies do not report financial statements that articulate.5

Our paper contributes to the literature in the following four ways. First, we are the first to use a comprehensive financial statement balancing model to evaluate the articulation of Compustat data. Our study complements CM, which assumes that accounting data articulate from double-entry accounting and provides a structural model of estimation to mitigate the endogeneity issue. Our study specifically addresses cases when data do not articulate, which enables researchers to design and conduct studies that take advantage of the articulating nature of F/S, and thus mitigate research design issues stemming from non-articulation of data.

Second, we provide a systematic approach to the issue of missing data in Compustat. As discussed above, empirical researchers differ in how they address missing values in data. Excluding such observations can result in a significant reduction in sample size, while setting all missing values to zero can introduce biases into the data, since zero may not be the actual value for the variable in question.6 Our system of equations in the FSBMm offers researchers a way to treat missing variables: we set the missing values to zero only when the changes ensure that the FSBM equations hold. This approach effectively expands the sample and increases the power of empirical tests without introducing bias. In cases where there are no missing data but the articulation equations do not hold, we replace existing Compustat variables with modified variables to ensure articulation of the data. We show that modifying the data has an impact on values of variables and ratios commonly used in the literature, such as inventory, selling, general and administrative (SGA) expenses, current ratio, working capital, and Altman’s Z-score.

3 Three equations, Equations 1E, 2E, and 3D.ii, are not included in the Compustat version of FSBM, but are included in our testing. Equation 1E says total assets equal total liabilities plus total equities; we find it always holds. Equation 2E, for comprehensive income, is not included in the Compustat version of FSBM, but is shown to hold elsewhere in the Compustat manual. Equation 3D.ii says change in cash on the B/S equals change in cash on the CF/S; we find numerous exceptions to this equation, as discussed below. The primary reason appears to be restatements, but additional work will be needed to resolve these exceptions more fully.

4 SAS code to generate the MDB is available upon request.

5 More specifically, other than the inconsistencies between changes in cash on the B/S versus the CF/S in Equation 3D.ii, only 0.23% of the original population of exceptions remain unresolved.
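The exception counts reported for the three-step resolution process can be tallied with a few lines of arithmetic. The sketch below uses only the figures stated in this section; the variable names are illustrative, and the per-step shares are derived here rather than quoted from the paper.

```python
# Figures as reported in the text.
TOTAL_FYEOS = 1_847_444          # firm-year-equation observations in the RDB
INITIAL_EXCEPTIONS = 560_684     # exceptions before any resolution
LEFT_AFTER_ZERO_FILL = 96_937    # exceptions remaining after step 1
RESOLVED_BY_COMPONENT_SUMS = 10_342   # step 2
RESOLVED_BY_GAAP_CHANGES = 80_814     # step 3

# Derived quantities (computed here, not quoted from the paper).
resolved_by_zero_fill = INITIAL_EXCEPTIONS - LEFT_AFTER_ZERO_FILL
unresolved = (LEFT_AFTER_ZERO_FILL
              - RESOLVED_BY_COMPONENT_SUMS
              - RESOLVED_BY_GAAP_CHANGES)

exception_rate = INITIAL_EXCEPTIONS / TOTAL_FYEOS
share_resolved_step1 = resolved_by_zero_fill / INITIAL_EXCEPTIONS
share_resolved_total = (INITIAL_EXCEPTIONS - unresolved) / INITIAL_EXCEPTIONS

print(f"initial exception rate: {exception_rate:.1%}")
print(f"resolved by zero-filling: {share_resolved_step1:.1%}")
print(f"resolved in total: {share_resolved_total:.1%}")
print(f"unresolved exceptions: {unresolved:,}")
```

Under these figures, the implied number of unresolved exceptions is 5,781, consistent with the statement that about 1% of the initial exceptions remain.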
6 Missing variable values can have a significant impact on the sample size, thus potentially affecting the power of empirical tests. For example, Lee et al. (2013) report that they lose 9,059 (23.4%) observations out of an initial sample of 38,697 firm-year observations due to missing values of the ESUB variable (‘Equity in earnings – unconsolidated subsidiaries’) in Compustat. Nonmissing data requirements for control variables further reduce their sample by 8,302 (21.5%) observations, leaving a final sample of 21,336 observations, which represents 55.1% of the original sample. Also see Koh and Reeb (2014) for an analysis of missing research and development data.

Third, our study adds to an emerging literature that examines the count and distributional properties of reported accounting variables to infer certain firm attributes. For example, Li (2008) and Lundholm, Rogo and Zhang (2014) use the number of non-missing items from Compustat annual data as a proxy for financial complexity. A recent study by Chen, Miao and Shevlin (2014) proposes a measure of disclosure quality by aggregating the number of non-missing variables in Compustat line items to draw inferences on the level of detail in firms’ annual reports. To examine the impact of our approach on the literature, we compare the percentile rankings of Chen et al.’s (2014) disclosure quality measured using the original Compustat data and our modified database.7 Interestingly, the original Compustat data produce disclosure quality measures with an upward bias for large firms and a downward bias for smaller firms. We believe that sophisticated investors may choose to follow our approach and replace certain missing values to yield a more accurate disclosure quality measure.

Fourth, we add to the literature that focuses on the accuracy of different data aggregators or redistributors.
Recent studies such as Boritz and No (2013) and Eilifsen and Messier (2013) add to a long literature that investigates issues in commonly used data sources. We contribute to this literature by proposing a systematic method based on accounting data articulation to address missing or inconsistent Compustat data due to incomplete firm disclosure or GAAP changes.

The remainder of the paper proceeds as follows. Section II reviews related literature. In Section III, we present the development of the FSBMm, the Modified Financial Statement Balancing Model. Section IV provides the empirical analysis. Section V concludes.

7 See online Appendix D for details of the analysis.

II. LITERATURE REVIEW

This study builds on and adds to three strands of accounting literature. The first line of research provides our study with theoretical motivation. F/S articulation, which derives naturally from the double-entry bookkeeping system, is often utilized in accounting models. For instance, the clean surplus relation, one of the summary articulation relations, is employed in valuation studies such as Ohlson (1995) to help derive the residual income valuation model. Christodoulou and McLeay (2014), in an effort to fully exploit the articulation relations among financial statement variables, develop a generalized structural system where the deterministic relationships governing articulation are clearly defined. They apply the framework to develop fully identified models that are consistent with the duality of the accounting data generating process and show that the new models, when applied to specific research settings, such as equity pricing and investment sensitivity to operating cash flows, can yield more precise estimates. By the same token, ignoring the constraints imposed by articulation relations may lead to erroneous or incomplete inferences. Our articulation model shares the spirit of CM.
Moreover, for empirical researchers who want to use the CM framework, a fully articulated dataset, like the one we will produce, would be required.

The second strand of related literature defines articulated F/S as those for which changes in B/S working capital accounts equal the working capital amounts reported on the CF/S and considers all other F/S as failing to exhibit articulation. Under this definition, events such as reclassifications, accounting changes, foreign currency translations and acquisitions would destroy articulation. Studies that adopt this approach include Dritina and Largay (1985), Huefner et al. (1989), Bahnson et al. (1996), Revsine et al. (2002), Wilkins and Loudder (2000), Hribar and Collins (2002), and Shi and Zhang (2011). Our study complements the work in this area of research as we provide a competing perspective on handling these “non-articulating” events. Our study extends the above literature by developing a comprehensive framework of accounting data articulation, which formalizes the relationships among accounting variables across financial statements and across time.

The third strand of related literature focuses on potential data omissions and the impact on research. Bennin (1980) compares monthly returns reported from Compustat and CRSP between 1962 and 1978 and finds an error rate of 0.25%, about one third of what Rosenberg and Houglet (1974) document for an earlier period. Boritz and No (2013) study 150 XBRL filings of a random sample of 75 firms that filed their interactive data on EDGAR between 2009 and 2012. They find that data aggregation and redistribution from commonly used data aggregators have an omission rate of 50% for items reported in the interactive data. For items that are reported by data aggregators, about 8% do not match the information from interactive data and more than 50% of such mismatches are material based on conventional standards (Eilifsen and Messier, 2013).
Our system of F/S equations allows us to examine whether the data reflect articulation. Our approach to resolving omitted values in data is similar in spirit to that of Koh and Reeb (2014), who detail a systematic approach to deal with missing R&D values. They report that panel data can benefit from a hybrid approach to replacing missing data with either industry or historical firm average R&D.

III. DEVELOPING THE FSBMm

In this section we present the Modified Financial Statement Balancing Model, abbreviated as FSBMm, where the subscript indicates we have modified the Compustat FSBM. We start by describing what we mean by “articulation” since this term has been defined in different ways in the literature. We provide the 25 equations that constitute the FSBMm, after which we describe how we constructed the model.

Overview of Articulation

Compustat is the most widely used database of financial statement information for accounting and finance research. According to S&P Capital IQ, a subsidiary of McGraw Hill Financial and the provider of Compustat databases, data included in Compustat go through an extensive process to collect and standardize fundamental company data.8 Despite the extensive application in accounting and finance research, to our knowledge there has not been a systematic examination of whether Compustat F/S data articulate.

Most definitions of articulation refer to how F/S relate to one another, but no single authoritative definition of articulation exists. Often, accountants informally refer to articulation as the flow line items (on the I/S, CF/S and OE/S) connecting or relating to the stock line items (on the B/S).9 Hribar and Collins (2002) consider how non-operating events contribute to “non-articulation” between financial statements, but define articulation differently than we do in this paper. In an early explicit use of the term articulation, Mann (1984) showed how I/S and CF/S information can be used to reconcile B/S accounts.
In another early explicit use of articulation, Black (1993) treats articulation as equivalent to “clean surplus” accounting. More recently, in commenting on the Financial Accounting Standards Board’s (FASB’s) discussion on Financial Statement Presentation, Moehrle et al. (2010) use articulation to mean linkage of F/S, and they connect such articulation to the FASB’s cohesiveness objective. While Moehrle et al. (2010) do not explicitly state that they are interpreting the term articulation in the broader sense, their usage is closer to our definition of articulation.

8 The four-step process includes: 1) Alignment of data according to Financial Accounting Standards Board (FASB), Securities & Exchange Commission (SEC) and GAAP (including IFRS) guidelines and principles; 2) Examination by staff analysts to ensure the quality of data integrated from third-party partners; 3) Extraction of information from the financial statement notes; 4) Completion of comprehensive data reviews including over 14,000 system-based validity checks. The description of the Compustat standardization process was found through the following URL: http://www.compustat.com/Compustat_Standardization/. It has since been removed following S&P Capital IQ’s website reorganization.

9 If we define a statement based on the types of line items included, there is a single flow statement (the I/S), as both the CF/S and OE/S include both flow and stock items. Also, while the information reported in the OE/S is required to be reported, it may be reported in a separate OE/S, in the notes or in the other statements. Finally, while the Comprehensive Income Statement (CI/S) is now a fifth document forming the set of F/S, we incorporate the CI/S information as I/S equations in our FSBMm.

Our Definition of Articulation and the FSBMm

We believe that a full model of articulation needs to describe how line items on each F/S aggregate as well as how individual line items connect the different F/S.
To formally define articulation, we build on the Financial Statements Balancing Model (FSBM) for North American Companies constructed by Compustat, which describes the inter-relations among standardized data items from the I/S, B/S, and CF/S.10 More specifically, articulation in this paper is defined to hold for a set of F/S if all the equations constituting the FSBMm hold. Our definition has three advantages: (1) it is explicit, (2) it can be used for comparison purposes, and (3) it covers all the transactions required to reconcile B/S amounts. More specifically, we specify explicitly how the different F/S aggregate and then use the information from the flow documents to reconcile beginning and ending B/S amounts. Hence our model is more general; for example, we can describe the articulating activity per the Hribar and Collins (2002) definition as a subset of the articulating activity included in our model.

We complete the creation of the FSBMm by introducing an OE/S to the FSBM, after having created new variables to complete the articulation of the F/S data. The OE/S relations add five new equations, so that the FSBMm is composed of 25 equations in total, featuring eight, seven, five and five equations representing the relations for the B/S, I/S, CF/S and OE/S, respectively.11

10 We download the file named “Financial Statement Balancing Model – North American Companies.xls”, which contains Compustat’s FSBM, from the Compustat website at Wharton Research Data Services as of July 1, 2012. We show how we construct our version of Compustat’s FSBM and the expanded model, the FSBMm, in detail in an online appendix.

11 An important caveat to our study is that our FSBMm does not develop the articulating relations for all line items in the F/S. It represents a complete set of aggregation relations on all F/S, articulating relations between the I/S and CF/S, and articulating relations for some, but not all, line items on the B/S.
Specifically, we develop articulating relations for the cash account and for the equity accounts on the B/S, but not for the non-cash asset or the liability accounts. We defer the development of a more articulating model to future work; see Casey et al. (2015).

We present the FSBMm in four parts, covering the equations relating to the balance sheet (B/S), income statement (I/S), cash flow statement (CF/S) and owners’ equity statement (OE/S) relations, respectively.

B/S (Eight Equations: 1Am – 1Em)
  Assets – Current:              CHm + IVST + RECTR + TXR + RECCO + INVT + ACO = ACTm   (1A.im)
  Assets – Total:                ACTm + PPEGT – DPACT + IVAEQ + IVAO + INTAN + AO = ATm   (1A.iim)
  Liabilities – Current:         DLC + AP + TXP + LCO = LCTm   (1B.im)
  Liabilities – Total:           LCTm + DLTT + TXDB + ITCB + LO = LTm   (1B.iim)
  Equity – Retained earnings:    REUNAm + AOCIa = REm   (1C.im)
  Equity – Total:                STKNa + REm = SEQ   (1C.iim)
  Liabilities and equities:      LTm + SEQ + MIBTa = LSE   (1Dm)
  B/S totals:                    LSE = ATm   (1Em)

I/S (Seven Equations: 2A – 2F)
  Operating income:              sale – cogs – xsga – dp = oiadp   (2A)
  Pre-tax income:                oiadp – xint + nopi + spi = pi   (2B)
  Net income (NI):               pi – txt – mii = ib   (2C)
  NI equivalence – CF/S:         ib = ibcm – mii   (2D.im)
  NI equivalence – OE/S:         ib + xido = cibegnim – mii   (2D.iim)
  Comprehensive income:          cibegnim + ocia – cimii = citotalm   (2Em)
  Extraordinary income:          ib – dvp + cstke + xido = niadj   (2F) 12

CF/S (Five Equations: 3Am – 3D.iim)
  Operating:                     oancf = ibcm + dpc + xidoc + txdc + esubc + sppiv + fopo + recch + invch + apalch + txach + aoloch   (3Am)
  Investing:                     ivncf = –ivch + siv + ivstch – capx + sppe – aqc + ivaco   (3B)
  Financing:                     fincf = sstk + txbcof – prstkc – dv + dltis – dltr + dlcch + fiao   (3C)
  CF/S checks:                   oancf + ivncf + fincf + exre = chech   (3D.i)
                                 (CHm,t – CHm,t-1) = chech   (3D.iim)

OE/S (Five Equations: 4Am – 4Em)
  Capital stock:                 STKNa,t – STKNa,t-1 = sstk – prstkc + STKNaplug   (4Am)
  Retained earnings:             REUNAm,t – REUNAm,t-1 = ibcm – mii – dvt + REUNAmplug   (4Bm)
  AOCI:                          AOCIa,t – AOCIa,t-1 = citotalm – cibegnim + AOCIaplug   (4Cm)
  OE/S:                          SEQt – SEQt-1 = citotalm + sstk – prstkc – dvt + STKNaplug + REUNAmplug + AOCIaplug   (4Dm)
  Non-controlling interest:      MIBTa,t – MIBTa,t-1 = mii + cimii + MIBTmplug   (4Em)

12 The Compustat FSBM includes more variables covering extraordinary income relations. We choose the simpler structure of a single equation as we decided not to build additional framework on these other income relations.

A formal definition of articulation with respect to F/S data requires that we specify exactly how the specific F/S data variables aggregate within each F/S as well as across F/S. For example, we specify how the B/S asset line items aggregate into total assets as well as how the line items from the flow documents affect the B/S. As mentioned, Compustat provides articulating relations in equations shown in their FSBM; however, multiple possible variations of the FSBM exist that are consistent with the equations that Compustat provides. More specifically, building the FSBM requires researchers to choose among different variables and exercise discretion when constructing the FSBM equations.13 For example, we use total inventory, INVT, rather than the individual inventory amounts, because these latter amounts are occasionally missing even when INVT is non-zero.14

The discretion in building the FSBM extends to the FSBMm, but before discussing the FSBMm in more detail, we first clarify expositional issues on variable construction and notation. First, all variables listed in the equations in this study are either Compustat variables or are based on Compustat variables. For example, we construct both aggregated and modified variables to make the FSBM consistent over time, using “a” or “m” in the variable subscript to indicate an aggregated or modified variable, respectively. We construct our five aggregated variables by simply summing existing Compustat variables.15 We have ten modified variables, seven for the B/S and three for the I/S.16 We “back fill”, or use existing data, to create the modified variables.
13 Again, see the online appendix for more details.

14 For example, in 2004 the Compustat record for General Electric (ticker: GE) displays total inventory (INVT) of $16,279 million while each of the inventory components (raw material, work-in-process, finished goods and other inventory, denoted as INVRM, INVWIP, INVFG, and INVO, respectively) displays a null to indicate missing data. Our S&P Capital IQ contact informed the authors that the null was used in this case because the reported inventory was not classifiable into its components.

15 The aggregated variables, denoted as MIBTa, PSTKa, STKNa, AOCIa, and ocia, measure, respectively, non-controlling interest, preferred equity, common equity, accumulated other comprehensive income on the B/S and other comprehensive income on the CI/S. See the online appendix for the equations for these variables.

16 The modified variables for the B/S, denoted as ACTm, ATm, CHm, LCTm, LTm, REm, and REUNAm, measure, respectively, current and total assets, cash, current and total liabilities, and retained earnings including and excluding accumulated other comprehensive income. The modified variables for the I/S, denoted as cibegnim, citotalm, and ibcm, measure, respectively, comprehensive income beginning net income, parent comprehensive income, and income before extraordinary items on the CF/S.

We use two types of procedures, back-filling based on identities or based on changes in GAAP. We create six of the B/S variables by filling in missing variables using identities. So, for example, when variable ACT is missing but its current asset components are not missing,17 we set ACTm equal to the sum of the components; in analogous fashion we generate variables ATm, CHm, LCTm, LTm, and REm. We carry out such backfilling only when the subtotals are missing; stated differently, if the subtotal values are available we do not change them since this would imply fundamental changes to the data (also see Example 1 in Appendix B).
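The identity-based back-filling rule described above can be sketched as follows. This is a minimal illustration, not the SAS code used to build the MDB: the function name and the example values are hypothetical, and the component list follows the current-asset aggregation in Equation 1A.i, using the raw cash variable CH as an assumption.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def backfill_subtotal(row: Row, subtotal: str,
                      components: List[str]) -> Optional[float]:
    """Return the value to use for a modified subtotal such as ACTm.

    A reported subtotal is never changed. A missing subtotal is replaced
    by the sum of its components, but only when every component is
    available; otherwise it stays missing (None).
    """
    reported = row.get(subtotal)
    if reported is not None:
        return reported
    values = [row.get(name) for name in components]
    if any(value is None for value in values):
        return None
    return sum(values)


# Components of current assets per Equation 1A.i (CH assumed for cash).
CURRENT_ASSETS = ["CH", "IVST", "RECTR", "TXR", "RECCO", "INVT", "ACO"]

# Hypothetical firm years in $ millions; None marks a Compustat null.
base = {"CH": 10.0, "IVST": 5.0, "RECTR": 20.0, "TXR": 0.0,
        "RECCO": 3.0, "INVT": 30.0, "ACO": 2.0}
unclassified = {**base, "ACT": None}    # subtotal missing, components present
reported = {**base, "ACT": 100.0}       # subtotal reported, left untouched
incomplete = {**base, "ACT": None, "INVT": None}  # a component is missing

act_m_unclassified = backfill_subtotal(unclassified, "ACT", CURRENT_ASSETS)
act_m_reported = backfill_subtotal(reported, "ACT", CURRENT_ASSETS)
act_m_incomplete = backfill_subtotal(incomplete, "ACT", CURRENT_ASSETS)
```

Returning the reported value unchanged, even when it differs from the component sum, mirrors the stated choice not to alter available subtotals.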
We create the remaining B/S and all three I/S variables based on changes in GAAP. The two GAAP changes that we use are the reporting of other comprehensive income and of non-controlling interest; these changes affect Compustat amounts beginning in fiscal years 2001 and 2009, respectively. For example, cibegni is not reported prior to 2001 and is reported without mii (income to non-controlling interest) between 2001 and 2009. We create the modified variable cibegnim to represent income before extraordinary items, including mii, for the entire period.18 We generate variables REUNAm, citotalm, and ibcm using a similar approach.

Second, Compustat variables are not case sensitive, but we use case to denote whether a variable is a stock or a flow variable. As these equations show, we use upper case letters to denote B/S variables and lower case letters to denote flow statement variables, in other words, variables from the I/S, the CF/S or the OE/S. Hence, the first four aggregated variables, AOCIa, MIBTa, PSTKa, and STKNa, are all B/S line items, while the fifth variable, ocia, denoting the amount of other comprehensive income for the period, is a flow document line item from the CI/S.

Next, we discuss the 25 equations that constitute our version of Compustat’s FSBM. The first 20 equations, Equations 1A.i – 3D.ii, form a set of equations representing the B/S, I/S and CF/S relations that should hold, by definition, over the entire Compustat database population. Sixteen of these equations are identities that aggregate components into subtotals and totals (e.g., Equation 2A); all of these equations also belong to the Compustat FSBM. Four equations relate the F/S to each other. Equation 1Em ensures that assets equal liabilities plus equities; while considered an accounting identity, it is not part of the FSBM. Equations 2D.i and 2D.ii connect the I/S to the CF/S (relating ib to ibc) and connect the I/S to the CI/S (relating ib to cibegni). The fourth equation in this group, 3D.iim, connects the change in cash on the CF/S to the change on the B/S.

The last five equations represent the relations of the equity accounts in the OE/S. Equations 4Am – 4Cm and 4Em represent the B/S equity variables: capital stock, retained earnings, accumulated other comprehensive income and non-controlling interest. Equation 4Dm shows that the change in the total parent equity B/S variable, SEQ, is equal to the sum of the changes in the three component parent equity B/S variables, STKNa, REUNAm and AOCIa. These equations are qualitatively different from the first 20 equations that are based on the B/S, I/S and CF/S relations, as they are not part of the FSBM published by Compustat. While accountants may be interested in the changes to owners’ equity due to operating and non-operating activities, not all Compustat customers are similarly interested.19 Since these relations do not exist in Compustat’s FSBM, we expect there to be situations where these equations do not hold. Hence, we define plug variables, denoted with “plug” at the end of the variable name, to capture these differences.

17 According to S&P Capital IQ, examples of situations when a “Null” value may be assigned to ACT can include cases when a firm has an unclassified balance sheet and does not report total current assets.

18 Our method treats income to minority interests and to non-controlling interests as the same item. While not, strictly speaking, accurate, we felt that it was more appropriate than excluding all firm years with this income non-zero. It cannot be ignored, since then the F/S would not articulate. Also, we use GAAP effective as of 2011 as our benchmark.

19 S&P Capital IQ indicated that the lack of OE/S data probably represented lack of interest from potential customers.
However, some academic accounting researchers find these data potentially valuable. Penman (2012), for instance, suggests that financial statement analysis should begin with the OE/S.

Although we think these equations are necessary for any complete version of an FSBM, there are clearly issues with these constructs. For example, consider Equation 4Am. Since the sale and purchase of common and preferred shares (sstk and prstkc, respectively) are both CF/S variables, for most FY’s these amounts will not reflect the amounts flowing through the equity accounts. However, we do not have any OE/S variables, so we use the CF/S variables as proxies for the missing OE/S variables. Insofar as these are poor proxies, the associated plug variables should be large.

IV. EMPIRICAL ANALYSIS

In this section we use our version of Compustat’s FSBM to ascertain the extent to which exceptions exist. We then describe how to resolve these exceptions through three steps and create a new modified database, denoted as the MDB. The MDB comprises all initial Compustat variables as well as the new variables described in Section III above. After resolving the exceptions, we present descriptive statistics to gauge the impact our modifications have on the original Compustat data in general and on some commonly used variables and ratios in particular.

Exceptions and their resolution

Table 1 details how the initial sample of FYEOs and exceptions is generated. We use 1988 – 2011 as our sample period to ensure the availability of CF/S variables and obtain a set of 266,711 firm-year observations corresponding to 28,209 unique firms. For each firm-year observation, we download the data for 95 Compustat variables, including 38, 26, 31 and 1 variables from the B/S, I/S, CF/S and CI/S or OE/S, respectively.20 We delete 10,032 firms in the financial, insurance or utilities sectors, leaving 18,177 firms and 165,563 firm years.
We also remove non-US firms, firms with sales or total assets less than $1 million, and each firm year for which there is insufficient CRSP data. This process of elimination leaves 10,681 unique firms and 92,951 firm-year observations. Requiring firms to have sales and total assets of at least $1 million ensures that we do not have firm-year observations where all B/S or I/S data fields are nulls or zeros. These firm years form the basis of our initial database, which we refer to as the “raw database” or RDB.21

[Insert Table 1 about here]

20 See the online appendix B for a list of all Compustat variables in the database. Also, for variables used in multiple statements (e.g., ibc is in the I/S and the CF/S), we counted the variable on the statement that shows up first in the FSBMm. So, for example, we counted ibc as being on the I/S, not the CF/S.
21 Changes in fiscal year end and accounting restatements are likely to affect the number of exceptions. There are 750 unique firms and 785 firm-year observations for which there is a change in the month of the fiscal year-end. Our sample also contains 5,646 firm-year observations (6.07% of the sample) that were subject to a financial restatement. These restatements originate from 1,883 (17.62%) unique firms.

Applying the 20 equations based directly on Compustat’s FSBM to the 92,951 firm-years, we obtain a total of 1,847,444 firm-year-equation-observations (FYEOs) in the initial database.22 Among these FYEOs, we initially find 560,684 exceptions. As is shown in Figure 1, most of the exceptions are due to missing values, or null values (548,902 out of 560,684).23 Of the 92,951 firm-years in the sample, 99% have at least one exception, 98% have more than one exception from different equations, and less than 1% have no exceptions.

The first step in resolving exceptions relates to how Compustat codes the data obtained from the firms’ financial statements.
Compustat indicates missing information for a variable using a null value in place of a number for that variable. According to Compustat, such “nulls” can include cases where either the firms did not report the amount or Compustat analysts were unable to assign the amount to a specific variable field due to imperfect company reporting. As an example, for a given firm-year observation the amount of total inventories, INVT, may be available, but inventory components such as work-in-process (WIP) could be nulls since Compustat was unable to determine how much of total INVT could be assigned to WIP. In other cases, Compustat uses null values in conjunction with data codes to indicate that a company mentions having that data but does not report a reasonable value for Compustat to report. This is often due to two different data points being reported on a combined basis by the company, such as ‘prepaid expenses and other current assets’ with no additional breakout provided.24 Regardless of their origin, such null fields represent missing data to end users and pose a challenge to their research design.

[Insert Table 2 about here]

22 For 92,951 firm-year observations with 20 equations for each firm-year, there are 1,859,020 FYEOs at the maximum. Eliminating 11,576 observations without consecutive data items to calculate Equation 3D.ii, we are left with 1,847,444 total FYEOs.
23 Empirically, we identify an exception when the absolute value of the difference between the left-hand side and the right-hand side of an equation is greater than or equal to $1 million. We use this threshold to exclude possible rounding errors from exceptions.
24 We thank S&P Capital IQ for explaining this practice to us.
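The exception rule in footnote 23 and the FYEO count in footnote 22 can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually run on the sample: the function name `check_equation`, the three result labels, and the convention of passing subtracted components as negative numbers are assumptions made for the example. The Verizon and Walmart figures are taken from Appendix B, Examples 3 and 5a.

```python
from typing import Optional, Sequence

THRESHOLD = 1.0  # $ millions; the cutoff stated in footnote 23


def check_equation(signed_terms: Sequence[Optional[float]],
                   total: Optional[float]) -> str:
    """Classify one firm-year-equation observation (hypothetical helper).

    signed_terms holds the components with subtracted items already negated;
    None stands for a Compustat null.
    """
    if total is None or any(term is None for term in signed_terms):
        return "null exception"
    difference = sum(signed_terms) - total
    if abs(difference) >= THRESHOLD:
        return "non-null exception"
    return "holds"


# Verizon 2002, Equation 2A: sale - cogs - xsga - dp = oiadp, with xsga null.
verizon = check_equation([67_625, -38_664, None, -13_423], 15_538)

# Walmart 2007, Equation 3D.ii: CH(t) - CH(t-1) = chech(t).
walmart = check_equation([5_569, -7_373], -2_198)

# FYEO count: 20 equations per firm-year, less the observations that lack
# consecutive-year data for Equation 3D.ii.
max_fyeos = 92_951 * 20
fyeos = max_fyeos - 11_576
```

Treating a null as its own exception type mirrors the split in Figure 1 between exceptions with nulls and exceptions with non-nulls.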
As Table 2 and Figure 1 show, most of the exceptions (463,747/560,684 = 82.7%) are resolved by replacing each null value for variables in our system of equations with a zero; in other words, the equation holds once we insert the zeros. For example, 7,335 of the 92,951 firm years (roughly 8%) have at least one of the eight variables in Equation 1A.im with a null value instead of a number. Replacing nulls with zeros resolves 4,281 exceptions, about 58.4% of the exceptions in this equation. Therefore, a researcher who excludes the null or missing value observations in these variables would lose about 8% of these firm years; those who choose to replace these missing data with zeros would be justified about 58.4% of the time. For a more specific example, Verizon has a null value for xsga (SG&A expense) in 2002. Once we replace the null value with a zero, Equation 2A holds and the exception is resolved.25 Using the FSBMm equations to determine when to fill in missing values avoids the unnecessary loss of data in the first case and, in the second, gives a reliable guideline as to when a null variable should be replaced with the number zero. We believe that our FSBMm offers empirical researchers a method of using accounting equations as an internally consistent way to expand the sample size in their research.

Although replacing nulls with zeros resolves most exceptions, many exceptions remain after step 1. In the third column in Table 2, we report the remaining exceptions by equation. Our second step is to replace subtotals with the sum of components for total current assets, total current liabilities, total assets, and total liabilities.26 As part of step 2, we also use CHm in place of CH.27 We discovered that a significant number of these subtotals have null values. Summing the components of these subtotals resolves 10,342 of these exceptions, leaving 86,595 exceptions after step 2.
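Steps 1 and 2 can be illustrated with the Verizon and Berkshire Hathaway figures from Appendix B (Examples 3 and 1). The sketch below is an assumption-laden illustration, not the code used to build the MDB: the helper names are hypothetical, nulls are represented by `None`, and because Example 1 shows the Berkshire amounts only after zero-filling, which fields were null in the raw data is assumed here.

```python
from typing import Dict, List, Optional, Tuple

THRESHOLD = 1.0  # $ millions; the paper's cutoff for an exception

Terms = List[Tuple[str, int]]  # (variable name, +1 to add or -1 to subtract)


def fill_nulls(row: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Step 1: replace every null (None) with zero."""
    return {name: 0.0 if value is None else float(value)
            for name, value in row.items()}


def gap(row: Dict[str, float], terms: Terms, total: str) -> float:
    """Signed sum of the components minus the reported total."""
    return sum(sign * row[name] for name, sign in terms) - row[total]


def sum_components_if_needed(row: Dict[str, float], terms: Terms,
                             total: str) -> Dict[str, float]:
    """Step 2: if the equation still fails and the subtotal is zero,
    use the sum of the components as the subtotal."""
    fixed = dict(row)
    if abs(gap(row, terms, total)) >= THRESHOLD and row[total] == 0:
        fixed[total] = sum(sign * row[name] for name, sign in terms)
    return fixed


# Equation 2A, Verizon 2002: xsga is null in the raw data.
EQ_2A: Terms = [("sale", 1), ("cogs", -1), ("xsga", -1), ("dp", -1)]
verizon = fill_nulls({"sale": 67_625, "cogs": 38_664, "xsga": None,
                      "dp": 13_423, "oiadp": 15_538})
verizon_gap = gap(verizon, EQ_2A, "oiadp")

# Equation 1A.i, Berkshire Hathaway 2011 (null fields are an assumption).
EQ_1A_I: Terms = [(name, 1) for name in
                  ("CH", "IVST", "RECTR", "TXR", "RECCO", "INVT", "ACO")]
brk_step1 = fill_nulls({"CH": 37_299, "IVST": 7_063, "RECTR": None,
                        "TXR": None, "RECCO": 32_946, "INVT": 8_975,
                        "ACO": None, "ACT": None})
brk_gap_step1 = gap(brk_step1, EQ_1A_I, "ACT")
brk_step2 = sum_components_if_needed(brk_step1, EQ_1A_I, "ACT")
brk_gap_step2 = gap(brk_step2, EQ_1A_I, "ACT")
```

For Verizon the gap is zero after step 1, so the zero fill is justified by the equation itself. For Berkshire the zero fill leaves a gap equal to the full sum of components, which is the case step 2 is meant to catch.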
25 See more details in Appendix B, Example 3.
26 For an example of this procedure, see Appendix B, Example 1, which details our procedure for Berkshire Hathaway for 2011.
27 Details are explained in the online appendix A.

The final stage in resolving the exceptions involves understanding how changes in GAAP apply to the remaining exceptions. Many of these arise because our equations are based on GAAP as of 2011; hence, they do not apply to all years in the sample. For instance, Compustat changed reporting methods in 2001 for the GAAP comprehensive income requirements. In particular, our Equation 1C.i is a B/S equation for the equity accounts and includes the accumulated other comprehensive income balance, ACOMINC, as part of the aggregated variable AOCIa. The line item ACOMINC was first required on the B/S in 1998.28 While Compustat reports ACOMINC for some firms in 2000 and earlier, the variable is missing from most firm years prior to 2001. As exceptions arise in our equations over time, due to changes in GAAP and the resulting changes in Compustat’s reporting policies, we modify variables in our version of the FSBM. Additionally, Equations 2D.i and 2D.ii are affected by SFAS #160, effective in 2009, which changed the reporting of minority interest to the reporting of non-controlling interest. We modify ibc to ibcm and cibegni to cibegnim to resolve the exceptions in these two equations. In total, these changes to the variables resolved 80,814 of the 86,595 remaining exceptions, more than 93% of the remaining exceptions. For a specific example of this procedure, see Appendix B, Example 4, where we detail the procedure we use to resolve the exception in Equation 2D.ii for GE in 2003.
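The counts reported for the three steps reconcile as a simple running tally. The short Python sketch below only replays the arithmetic using the figures given in the text and Figure 1; the variable names are illustrative.

```python
total_exceptions = 560_684

# Exceptions resolved at each step, as reported in the text.
resolved_by_step = {
    "step 1: replace nulls with zeros": 463_747,
    "step 2: replace subtotals with sums of components": 10_342,
    "step 3: adjust variables for GAAP changes": 80_814,
}

remaining = total_exceptions
remaining_after = []  # exceptions still open after each step
for step, resolved in resolved_by_step.items():
    remaining -= resolved
    remaining_after.append(remaining)

share_step1 = resolved_by_step["step 1: replace nulls with zeros"] / total_exceptions
share_step3 = (resolved_by_step["step 3: adjust variables for GAAP changes"]
               / remaining_after[1])
share_unresolved = remaining / total_exceptions
```

The tally reproduces the counts after each step shown in Figure 1 and the roughly 1% of exceptions that stay unresolved.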
Overall, we are able to resolve all but 5,781, or 1%, of the 560,684 exceptions identified via the 20 articulation equations.29 With the additional five OE/S equations (Equations 4Am – 4Em) and the new plug variables from them, we generate a new database, which we refer to as the “modified database” or MDB.

28 Statement of Financial Accounting Standards (SFAS) #130, “Reporting Comprehensive Income” (ASC Topic 220), was issued in June 1997 and applied to fiscal years beginning after December 15, 1997. However, Compustat seems to have included the information on a regular basis only after 2001.
29 Of the 5,781 unresolved exceptions, the majority (4,477) relate to Equation 3D.ii, which requires the change in cash on the B/S to equal the change in cash on the CF/S. Excluding the exceptions from Equation 3D.ii, only 1,304 of the original 560,684 exceptions (0.233%) remain unresolved. Restated F/S would likely cause an exception in Equation 3D.ii. While we do not know how many of the 4,477 exceptions are due to restated F/S, this is likely to be a primary source of these exceptions. We forwarded a copy of our list of unresolved exceptions to personnel at S&P Capital IQ and they are updating their data accordingly.

[Insert Table 3 about here]

Table 3 reports the descriptive statistics for the remaining exceptions after step 1, where we replaced the null values with zeros. For example, in Panel A of Table 3, 3,054 exceptions remain in Equation 1A.i after step 1. Despite the relatively low frequency of this type of exception (3.3% of the total number of firm-year observations), the average magnitude of the exception is $5,389 million, with a minimum of -$3,698 million and a maximum of $471,520 million. Many of the exceptions related to changes in GAAP could also be resolved by filling in zeros.
For example, only 21% of the exceptions (12,763 out of 59,799) remain in Equation 1C.i after step 1, but the remaining exceptions have an average that is both statistically different from zero and economically significant ($17.19 million). To further illustrate the magnitudes of the exceptions remaining after step 1, we deflate the variables by total assets and present the summary statistics in Panel B. Nine of the twenty equations still contain exceptions with averages that are statistically significantly different from zero at the 1% level. Of those, five equations, namely Equations 1A.i, 1A.ii, 1B.i, 1C.ii, and 1D, have exceptions with an average of greater than 20% of total assets. We believe these exceptions are material enough to warrant the attention of researchers using Compustat data.

Table 4 presents descriptive statistics for the remaining exceptions after all modifications. Panel A of Table 4 displays the raw values of the exceptions that remain after our three-step process, while Panel B displays these values scaled by total assets. Analyzing these panels together reveals that many of the remaining exceptions are indeed economically significant.

[Insert Table 4 about here]

Table 4 also presents descriptive statistics for our OE/S plug variables. As mentioned above, these “plug” variables capture the exceptions in the OE/S equations, Equations 4Am – 4Cm and 4Em, in other words, cases where these equations do not hold. The large and significant STKNaplug (with a mean of $52.99 million) indicates that many observations in our final dataset suffer from incomplete data flowing through the equity accounts. The significantly negative value for the REUNAmplug (with a mean of -$16.20 million) suggests that inconsistencies in reporting minority interest have a significant impact on the equity measurement. Lastly, we expected the MIBTaplug to capture much of the missing information in equity.
It is less economically significant than the other plug variables (with a mean of -$4.81 million).

Impact of resolutions on Financial Statement Ratios and Modified Variables

Table 5 displays descriptive statistics for three types of data: variables that are significantly different in the MDB from the original data, some accrual variables, and financial statement ratios. The first six rows represent some commonly used variables that are significantly different between the original Compustat data and our modified data. For example, two subtotal accounts, current assets (ACT) and current liabilities (LCT), are significantly different (690.65 versus 523.14, t=7.69; 476.63 versus 373.75, t=6.25), which agrees with the earlier analysis of the effect of null values present in these subtotals. Another interesting finding is that two of the comprehensive income items (cibegni, citotal) are significantly different between the two sets of data (73.78 versus 182.19, t=-14.72; 70.55 versus 168.44, t=-13.08). This suggests that researchers may want to employ our model when using data involving ACT, LCT and comprehensive income.

[Insert Table 5 about here]

The next seven rows in Table 5 present differences in accrual variables constructed using both our modified data (MDB) and the original Compustat data. The accrual variable definitions are based on Richardson et al. (2005). Significant differences exist between the two datasets for six of the seven accrual variables (all but the raw working capital accruals). Finally, the last three rows in Table 5 display differences between the two datasets in some financial ratios often used by researchers. Both Altman’s Z-score and the “quick ratio” are found to have significantly different distributions at the 1% level (Z=8.30; Z=5.50).
Taken collectively, these differences suggest that our process of modifying the raw data may have an impact on various activities where these data are used, both in academic research and in practice. That is, if one of these variables is the variable of interest, researchers need to be aware of the potential effects that different sample and variable treatment methods may have on their inferences.

V. CONCLUSIONS

We build a version of the Financial Statement Balancing Model (FSBM) from Compustat and expand it to include stockholders’ equity statement relations to obtain a fully articulated modified FSBM, or FSBMm. We then verify these relations using historical data on a sample of Compustat non-financial firms between 1988 and 2011. We find exceptions to these equations in 30.3% (560,684/1,847,444) of the cases. We next investigate the causes of exceptions and make the necessary modifications that are consistent with the articulation of the FSBM. We are able to resolve about 99% of the exceptions.

Our paper is unique in being the first to use a fully articulated model to examine the Compustat data. Whether financial statements articulate in the databases we use in our research has implications on two fronts. First, in the face of missing values or other non-articulation situations, forming a sample or defining variables of interest without addressing these issues in a systematic fashion might complicate or even bias inferences, due to issues such as lower statistical power or omitted correlated variables. Second, the articulating nature of financial statement data has been under-appreciated and under-utilized in research. By presenting a framework to produce a fully articulated dataset using our articulation model, we facilitate research efforts taking advantage of the articulation relations in the spirit of Christodoulou and McLeay (2014).
Through this study we hope to shed light on these two issues, and we present evidence that the process has the potential to make an impact on the sample size and the accuracy of data in empirical research. Our focus is on selected equations of the FSBM; therefore, only a subset of variables in Compustat is examined. However, the equations we present and the methods we use to address the exceptions can apply more broadly. Following our methodology, future researchers interested in data integrity for empirical research in accounting and finance could extend the analysis beyond our current sample of non-financial US companies, or even extend our FSBM to include their variables of interest.

REFERENCES

Bahnson, P., P. B. Miller, and B. P. Budge. 1996. Nonarticulation in cash flow statements and implications for education, research and practice. Accounting Horizons 10 (4): 1–15.
Bennin, R. 1980. Error rates in CRSP and Compustat: A second look. Journal of Finance 35 (5): 1267–1271.
Black, F. 1993. Choosing accounting rules. Accounting Horizons 7 (4): 1–17.
Bloomfield, M. J., J. Gerakos, and A. Kovrijnykh. 2015. Accrual reversals and cash conversion. Working paper, University of Chicago. Available at SSRN: http://ssrn.com/abstract=2495610 or http://dx.doi.org/10.2139/ssrn.2495610.
Boritz, J. E., and W. G. No. 2013. The quality of interactive data: XBRL versus Compustat, Yahoo Finance, and Google Finance. Available at SSRN: http://ssrn.com/abstract=2253638 or http://dx.doi.org/10.2139/ssrn.2253638.
Casey, R., F. Gao, M. Kirschenheiter, S. Li, and S. Pandit. 2015. Articulation based accruals. Working paper, University of Illinois at Chicago.
Chen, S., B. Miao, and T. Shevlin. 2014. A new measure of disclosure quality. Working paper, University of Texas at Austin.
Christodoulou, D., and S. McLeay. 2014. The double entry constraint, structural modeling and econometric estimation. Contemporary Accounting Research 7 (4): 1–20.
Drtina, R., and J. A. Largay. 1985. Pitfalls in calculating cash flows from operations. The Accounting Review 60 (2): 314–326.
Eilifsen, A., and W. F. Messier. 2013. Materiality guidance of the major auditing firms. Working paper, Norwegian School of Economics and University of Nevada Las Vegas.
Hribar, P., and D. W. Collins. 2002. Errors in estimating accruals: Implications for empirical research. Journal of Accounting Research 40 (1): 105–134.
Huefner, R. J., J. E. Ketz, and J. A. Largay. 1989. Foreign currency translation and the cash flow statement. Accounting Horizons 3 (2): 66–75.
Koh, P., and D. M. Reeb. 2014. Missing R&D. Working paper, Hong Kong University of Science and Technology and National University of Singapore.
Lee, S., S. Pandit, and R. H. Willis. 2013. Equity method investments and sell–side analysts’ information environment. The Accounting Review 88 (6): 2089–2115.
Li, F. 2008. Annual report readability, current earnings, and earnings persistence. Journal of Accounting and Economics 45: 221–247.
Lundholm, R. J., R. Rogo, and J. L. Zhang. 2014. Restoring the tower of babel: how foreign firms communicate with U.S. investors. The Accounting Review 89: 1453–1485.
Mann, H. 1984. A worksheet for demonstrating the articulation of financial statements. The Accounting Review 59 (4): 669–673.
Moehrle, S., T. Stober, K. Jamal (Chairman), R. Bloomfield, T. E. Christensen, R. H. Colson, J. Ohlson, S. Penman, S. Sunder, and R. L. Watts. 2010. Response to the Financial Accounting Standards Board’s and the International Accounting Standard Board’s joint discussion paper entitled “Preliminary views on financial statement presentation”. Accounting Horizons 24 (1): 149–158.
Ohlson, J. A. 1995. Earnings, book values, and dividends in equity valuation. Contemporary Accounting Research 11 (2): 661–687.
Penman, S. H. 2012. Financial statement analysis and security valuation, 5th Edition, McGraw–Hill/Irwin.
Revsine, L., D. W. Collins, and W. B. Johnson. 2002. Financial reporting and analysis, 2nd Edition, Upper Saddle River, NJ. Prentice–Hall.
Richardson, S. A., R. G. Sloan, M. T. Soliman, and I. Tuna. 2005. Accrual reliability, earnings persistence and stock prices. Journal of Accounting and Economics 39: 437–485.
Rosenberg, B., and M. Houglet. 1974. Error rates in CRSP and Compustat databases and their implications. Journal of Finance 29 (4): 1303–1310.
Shi, L., and H. Zhang. 2011. On alternative measures of accruals. Accounting Horizons 25 (4): 811–836.
Wilkins, M. S., and M. L. Loudder. 2000. Articulation in cash flows statements: a resource for financial accounting courses. Journal of Accounting Education 18: 115–126.

APPENDIX A. FSBMm Variables and Equations

In this appendix, we first list the new variables that we created for this study and second, we provide the detailed equations for the Financial Statement Balancing Model or FSBM. We created the new variables either by simply aggregating existing Compustat variables or by modifying Compustat variables; we denote the aggregated or modified variables with an “a” or an “m” in the subscript, respectively. These variables are then used to build the modified FSBM, denoted as FSBMm. We start with the aggregated and then follow with the modified variables, each set divided between B/S and I/S variables.

Aggregated variables used in FSBMm

B/S equations affected are 1C.im and 1C.iim:
AOCIa = ACOMINC + SEQO
MIBTa = MIB + MIBN
PSTKa = PSTKR + PSTKN
STKNa = CSTK + CAPS – TSTK + PSTK

I/S equation affected is 2Em:
ocia = cicurr + cidergl + cisecgl + ciother + cipen

Modified variables used in FSBMm

B/S equations affected are 1C.im and 1C.iim:
The following definitions are useful for expressing the modified variables. Let the current and non-current asset and current and non-current liability sums, denoted as Σ(ACTm), Σ(ANT), Σ(LCT), and Σ(LNT), respectively, be defined as follows.
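\[
\begin{aligned}
\Sigma(\mathrm{ACT}_m) &\equiv \mathrm{CH}_m + \mathrm{IVST} + \mathrm{RECTR} + \mathrm{TXR} + \mathrm{RECCO} + \mathrm{INVT} + \mathrm{ACO},\\
\Sigma(\mathrm{ANT}) &\equiv \mathrm{PPEGT} - \mathrm{DPACT} + \mathrm{IVAEQ} + \mathrm{IVAO} + \mathrm{INTAN} + \mathrm{AO},\\
\Sigma(\mathrm{LCT}) &\equiv \mathrm{DLC} + \mathrm{AP} + \mathrm{TXP} + \mathrm{LCO},\\
\Sigma(\mathrm{LNT}) &\equiv \mathrm{DLTT} + \mathrm{TXDB} + \mathrm{ITCB} + \mathrm{LO}.
\end{aligned}
\]

Then the following definitions hold.

\[
\mathrm{ACT}_m =
\begin{cases}
\Sigma(\mathrm{ACT}_m) & \text{if 1A.i} \neq 0 \text{ and } \mathrm{ACT} = 0\\
\mathrm{ACT} & \text{if 1A.i} = 0
\end{cases}
\qquad
\mathrm{AT}_m =
\begin{cases}
\mathrm{ACT}_m + \Sigma(\mathrm{ANT}) & \text{if 1A.ii} \neq 0 \text{ and } \mathrm{AT} = 0\\
\mathrm{AT} & \text{if 1A.ii} = 0
\end{cases}
\]

\[
\mathrm{CH}_m =
\begin{cases}
\mathrm{CHE} & \text{if } \mathrm{CH} = \mathrm{IVST} = 0 \text{, but } \mathrm{CHE} > 0\\
\mathrm{CH} & \text{if } \mathrm{CH} > 0 \text{ and } \mathrm{CHE} \neq 0
\end{cases}
\]

\[
\mathrm{LCT}_m =
\begin{cases}
\Sigma(\mathrm{LCT}) & \text{if 1B.i} \neq 0 \text{ and } \mathrm{LCT} = 0\\
\mathrm{LCT} & \text{if 1B.i} = 0
\end{cases}
\qquad
\mathrm{LT}_m =
\begin{cases}
\mathrm{LCT}_m + \Sigma(\mathrm{LNT}) & \text{if 1B.ii} \neq 0 \text{ and } \mathrm{LT} = 0\\
\mathrm{LT} & \text{if 1B.ii} = 0
\end{cases}
\]

\[
\mathrm{RE}_m =
\begin{cases}
\mathrm{AOCI}_a & \text{if 1C.ii} \neq 0 \text{ and if } \mathrm{RE} = \mathrm{REUNA}_m = 0\\
\mathrm{REUNA}_m & \text{if 1C.ii} \neq 0 \text{ and if } \mathrm{RE} = 0 \text{ and } \mathrm{REUNA} \neq 0\\
\mathrm{RE} & \text{if 1C.ii} = 0 \text{ or if } \mathrm{RE} \neq 0
\end{cases}
\]

\[
\mathrm{REUNA}_m =
\begin{cases}
\mathrm{RE} & \text{if 1C.i} \neq 0 \text{ or if 1C.ii} \neq 0 \text{ and if year} < 2001 \text{ and } \mathrm{AOCI}_a = 0\\
\mathrm{REUNA} & \text{if 1C.i} = \text{1C.ii} = 0 \text{ or if year} \geq 2001 \text{ or } \mathrm{AOCI}_a \neq 0
\end{cases}
\]

Variables used in I/S equations 2C, 2D.i, 2D.ii, 2E and 2F:

\[
\mathrm{cibegni}_m =
\begin{cases}
\mathrm{ib} & \text{if 2D.ii} \neq 0 \text{ and if year} < 2001 \text{ and } \mathrm{oci}_a = 0\\
\mathrm{cibegni} + \mathrm{mii} & \text{if 2D.ii} \neq 0 \text{ and if } 2009 > \text{year} \geq 2001 \text{ or } \mathrm{oci}_a \neq 0\\
\mathrm{cibegni} & \text{if 2D.ii} = 0 \text{ or if year} \geq 2009
\end{cases}
\]

\[
\mathrm{citotal}_m =
\begin{cases}
\mathrm{ib} & \text{if 2D.ii} \neq 0 \text{ or if 2E} \neq 0 \text{ and if year} < 2001 \text{ and } \mathrm{oci}_a = 0\\
\mathrm{citotal} + \mathrm{mii} & \text{if 2D.ii} \neq 0 \text{ or if 2E} \neq 0 \text{ and if } 2009 > \text{year} \geq 2001 \text{ or } \mathrm{oci}_a \neq 0\\
\mathrm{citotal} & \text{if 2D.ii} = \text{2E} = 0 \text{ or if year} \geq 2009
\end{cases}
\]

\[
\mathrm{ibc}_m =
\begin{cases}
\mathrm{ibc} + \mathrm{mii} & \text{if year} < 2009\\
\mathrm{ibc} + \mathrm{mii} & \text{if year} \geq 2009 \text{ and } \mathrm{ibc} \neq \mathrm{pi} - \mathrm{txt}\\
\mathrm{ibc} & \text{if 2D.i} = \text{3A} = 0 \text{ or if year} \geq 2009 \text{ and } \mathrm{ibc} = \mathrm{pi} - \mathrm{txt}
\end{cases}
\]

We next show equations in the form used in older Compustat manuals to help readers better visualize how the numbers add up. First, we show the 20 Compustat equations that form the FSBM. These cover the B/S, I/S and CF/S (8, 7 and 5 equations, respectively). We then follow with five equations from the OE/S, which, when added to the FSBM, form the FSBMm. Variables in parentheses indicate subtraction.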
FSBMm Equations

B/S (Eight Equations: 1A.im – 1Em)

Equation (1A.im): Assets – Current
Cash | CHm
Short-Term Investments | IVST
Receivables – Trade | RECTR
Income Tax Refund | TXR
Receivables – Current – Other | RECCO
Inventories – Total | INVT
Current Assets – Other | ACO
Current Assets – Total | ACTm

Equation (1A.iim): Assets – Total
Current Assets – Total | ACTm
Property Plant and Equipment – Total (Gross) | PPEGT
Depreciation, Depletion and Amortization (Accumulated) | (DPACT)
Investment and Advances – Equity | IVAEQ
Investment and Advances – Other | IVAO
Intangible Assets – Total | INTAN
Assets – Other | AO
Assets – Total | ATm

Equation (1B.im): Liabilities – Current
Debt in Current Liabilities | DLC
Account Payable/Creditors – Trade | AP
Income Taxes Payable | TXP
Current Liabilities – Other | LCO
Current Liabilities – Total | LCTm

Equation (1B.iim): Liabilities – Total
Current Liabilities – Total | LCTm
Long-Term Debt – Total | DLTT
Deferred Taxes – Balance Sheet | TXDB
Investment Tax Credit – Balance Sheet | ITCB
Liabilities – Other | LO
Liabilities – Total | LTm

Equation (1C.im): Equity – Retained earnings
Retained Earnings – unadjusted | REUNAm
Accumulated Other Comprehensive Income | AOCIa
Retained Earnings | REm

Equation (1C.iim): Equity – Total
Stockholders’ equity accounts | STKNa
Retained Earnings | REm
Stockholders Equity – Parent – Total | SEQ

Equation (1Dm): Liabilities and equities
Liabilities – Total | LTm
Stockholders Equity – Parent – Total | SEQ
Noncontrolling Interest – Total | MIBTa
Liabilities and Stockholders’ Equity – Total | LSE

Equation (1E): B/S Totals
Liabilities and Stockholders’ Equity – Total | LSE
Assets – Total | ATm

I/S (Seven Equations: 2A – 2F)

Equation (2A): Operating income
Sales/Turnover (Net) | sale
Cost of Goods Sold | (cogs)
Selling, General and Administrative Expense | (xsga)
Depreciation and Amortization | (dp)
Operating Income After Depreciation | oiadp

Equation (2B): Pre-tax income
Operating Income After Depreciation | oiadp
Interest and Related Expense | (xint)
Nonoperating Income (Expense) – Total | (nopi)
Special Items | (spi)
Pretax Income | pi

Equation (2C): Net income
Pretax Income | pi
Income Taxes – Total | (txt)
Noncontrolling Interest – Income Account | (mii)
Income Before Extraordinary Items | ib

Equation (2D.im): NI equivalence – CF/S
Income Before Extraordinary Items | ib
Income Before Extraord. Items and Noncontrolling Interest | ibcm
Noncontrolling Interest – Income Account | (mii)

Equation (2D.iim): NI equivalence – OE/S
Income Before Extraordinary Items | ib
Extraordinary Items and Discontinued Operations | xido
Comprehensive Income Beginning Net Income | cibegnim
Noncontrolling Interest (or NCI) – Income Account | (mii)

Equation (2Em): Comprehensive income
Comprehensive Income Beginning Net Income | cibegnim
Other Comprehensive Income | ocia
Comprehensive Income – NCI | (cimii)
Comprehensive Income – Parent | citotalm

Equation (2F): Extraordinary Income
Income Before Extraordinary Items | ib
Dividends – Preferred/Preference | (dvp)
Common Stock Equivalents – Dollar Savings | cstke
Extraordinary Items and Discontinued Operations | xido
Net Income (Loss) | niadj

CF/S (Five Equations: 3Am – 3D.iim)

Equation (3Am): Operating
Income Before Extraord. Items and NCI | ibcm
Depreciation and Amortization | dpc
Extraordinary Items and Discontinued Operations | xidoc
Deferred Taxes | txdc
Equity in Net Loss (Earnings) | esubc
Sale of PP&E and Investments – (Gain) Loss | sppiv
Funds from Operations – Other | fopo
Accounts Receivable – Decrease (Increase) | recch
Inventory – Decrease (Increase) | invch
Accounts Payable and Accrued Liabilities – Incr (Decr) | apalch
Income Taxes – Accrued – Increase (Decrease) | txach
Assets and Liabilities – Other (Net Change) | aoloch
Operating Activities – Net Cash Flow | oancf

Equation (3B): Investing
Increase in Investments | (ivch)
Sale of Investments | siv
Short-Term Investments – Change | ivstch
Capital Expenditures | (capx)
Sale of Property, Plant & Equipment | sppe
Acquisitions | (aqc)
Investing Activities – Other | ivaco
Investing Activities – Net Cash Flow | ivncf

Equation (3C): Financing
Sale of Common and Preferred Stock | sstk
Excess Tax Benefit of Stock Options – Cash Flow Fin. | txbcof
Purchase of Common and Preferred Stock | (prstkc)
Cash Dividends | (dv)
Long-Term Debt – Issuance | dltis
Long-Term Debt – Reduction | (dltr)
Changes in Current Debt | dlcch
Financing Activities – Other | fiao
Financing Activities – Net Cash Flow | fincf

Equation (3D.i): CF/S checks
Operating Activities – Net Cash Flow | oancf
Investing Activities – Net Cash Flow | ivncf
Financing Activities – Net Cash Flow | fincf
Exchange Rate Effect | exre
Cash and Cash Equivalents – Increase (Decrease) | chech

Equation (3D.iim): CF/S checks
Cash (t) | CHm,t
Cash (t-1) | (CHm,t-1)
Cash and Cash Equivalents – Increase (Decrease) (t) | checht

OE/S (Five Equations: 4Am – 4Em)

Equation (4Am): Capital Stock
Stockholders’ equity accounts (t) | STKNa,t
Stockholders’ equity accounts (t-1) | (STKNa,t-1)
Sale of Common and Preferred Stock | sstk
Purchase of Common and Preferred Stock | (prstkc)
Stockholders’ equity accounts Plug | STKNaplug

Equation (4Bm): Retained Earnings
Retained Earnings – unadjusted (t) | REUNAm,t
Retained Earnings – unadjusted (t-1) | (REUNAm,t-1)
Income Before Extraord. Items and NCI | ibcm
Noncontrolling Interest – Income Account | (mii)
Dividends – Total | (dvt)
Retained Earnings – unadjusted Plug | REUNAmplug

Equation (4Cm): AOCI
Accumulated Other Comprehensive Income (t) | AOCIa,t
Accumulated Other Comprehensive Income (t-1) | (AOCIa,t-1)
Comprehensive Income – Parent | citotalm
Comprehensive Income Beginning Net Income | (cibegnim)
Accumulated Other Comprehensive Income Plug | AOCIaplug

Equation (4Dm): OE/S
Stockholders Equity – Parent – Total (t) | SEQt
Stockholders Equity – Parent – Total (t-1) | (SEQt-1)
Comprehensive Income – Parent | citotalm
Sale of Common and Preferred Stock | sstk
Purchase of Common and Preferred Stock | (prstkc)
Dividends – Total | (dvt)
Stockholders’ equity accounts Plug | STKNaplug
Retained Earnings – unadjusted Plug | REUNAmplug
Accumulated Other Comprehensive Income Plug | AOCIaplug

Equation (4Em): Non-controlling interest
Noncontrolling Interest – Total (t) | MIBTa,t
Noncontrolling Interest – Total (t-1) | (MIBTa,t-1)
Noncontrolling Interest – Income Account | mii
Comprehensive Income – NCI | cimii
Noncontrolling Interest – Total Plug | MIBTaplug

APPENDIX B. Examples of Resolution of Exceptions

This appendix illustrates several examples of how we resolve the exceptions. Some are resolved after step 1, while others require three steps before resolution.
Example 1 – Equation 1A.im for Berkshire Hathaway (Ticker = “BRK.B”) in 2011
Step 1 (raw data) and step 2 (replace with zero):
(1A.i) CH + IVST + RECTR + TXR + RECCO + INVT + ACO = ACT
BRK.B 2011: 37,299 + 7,063 + 0 + 0 + 32,946 + 8,975 + 0 = 0; yields an exception of 86,283
Step 3 (use sum of components in place of null subtotal):
(1A.i) CHm + IVST + RECTR + TXR + RECCO + INVT + ACO = ACTm
BRK.B 2011: 37,299 + 7,063 + 0 + 0 + 32,946 + 8,975 + 0 = 86,283; yields an exception of 0

Example 2 – Equation 1C.iim for Ford (Ticker = “F”) in 2000
Step 1 (raw data):
(1C.ii) STKNa + RE = SEQ
Ford (2000) step 1: “.” + 14,452 = 18,610; yields an exception of “.” or null
Step 2 (replace with zeros):
(1C.ii) STKNa + RE = SEQ
Ford (2000) step 2: 0 + 14,452 = 18,610; yields an exception of -4,158
Step 3 (GAAP changes):
(1C.ii) STKNa + REm = SEQ
Ford (2000) step 3: 4,158 + 14,452 = 18,610; yields an exception of 0
Note: The 10-K for Ford has 4,158 under STKNa but we do not know why the STKNa field in Compustat has a null value.

Example 3 – Equation 2A for Verizon (Ticker = “VZ”) in 2002
Step 1 (raw data):
(2A) sale – cogs – xsga – dp = oiadp
Verizon (2002): 67,625 – 38,664 – “.” – 13,423 = 15,538; yields an exception of “.”
Step 2 (replace with zeros):
(2A) sale – cogs – xsga – dp = oiadp
Verizon (2002): 67,625 – 38,664 – 0 – 13,423 = 15,538; yields an exception of 0

Example 4 – Equation 2D.iim for GE (Ticker “GE”) in 2003
Step 1 (raw data):
(2D.ii) ib + xido = cibegni – mii
GE (2003) step 1: 15,589 + -587 = “.” - 290; yields an exception of “.” or null
Step 2 (replace with zeros):
(2D.ii) ib + xido = cibegni – mii
GE (2003) step 2: 15,589 + -587 = 0 - 290; yields an exception of -15,292
Step 3 (GAAP changes):
(2D.iim) ib + xido = cibegnim – mii
GE (2003) step 3: 15,589 + -587 = 15,292 - 290; yields an exception of 0
Note: For GE it would not be possible to find 15,292 under cibegni in their income statement because GE did not report comprehensive income in 2003 (they started reporting it in 2004).
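In Examples 2 and 4 the value that clears the exception is the one implied by the other variables in the same equation. A minimal Python sketch of that inference follows. It is a generic illustration and not the piecewise variable definitions of Appendix A: the function `infer_missing` and the coefficient-dictionary representation (each equation written so that its terms sum to zero) are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple


def infer_missing(coeffs: Dict[str, int],
                  values: Dict[str, Optional[float]]) -> Tuple[str, float]:
    """Solve sum(coeff * value) = 0 for the single null variable.

    coeffs maps each variable to +1 or -1 so that the identity sums to zero;
    exactly one entry in values must be None.
    """
    missing = [name for name in coeffs if values.get(name) is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one missing variable")
    target = missing[0]
    known_sum = sum(coeffs[name] * values[name]
                    for name in coeffs if name != target)
    return target, -known_sum / coeffs[target]


# Example 2, Ford 2000: STKNa + RE - SEQ = 0, with STKNa null.
ford_name, ford_value = infer_missing(
    {"STKNa": 1, "RE": 1, "SEQ": -1},
    {"STKNa": None, "RE": 14_452, "SEQ": 18_610},
)

# Example 4, GE 2003: ib + xido - cibegni + mii = 0, with cibegni null.
ge_name, ge_value = infer_missing(
    {"ib": 1, "xido": 1, "cibegni": -1, "mii": 1},
    {"ib": 15_589, "xido": -587, "cibegni": None, "mii": 290},
)
```

The inferred amounts match the figures used in step 3 of the two examples. The helper refuses to guess when more than one variable is null, because a single identity cannot pin down two unknowns.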
Example 5a – Equation 3D.ii for Walmart (Ticker = “WMT”) in 2007

Step 3 (after all changes have been implemented):
(3D.ii) (CHm,t – CHm,t-1) = checht
Walmart 2007: 5,569 – 7,373 = -2,198; yields an exception of 394

Example 5b – Equation 3D.ii for General Electric (Ticker = “GE”) in 2003-2008

Step 3 (after all changes have been implemented):
(3D.ii) (CHm,t – CHm,t-1) = checht

Cash Amounts for General Electric Corporation

| GE FY | CHt | CHt – CHt-1 | checht | 3D.iim | CF/S Beg | CF/S End | Discont. Cash |
|---|---|---|---|---|---|---|---|
| 2003 | 12,664* | | * | | 8,910 | 12,664 | 0 |
| 2004 | 15,328* | 2,664 | 2,664 | 0 | 12,664 | 15,328 | 3,267 |
| 2005 | 9,011 | -6,317 | -3,527* | -2,790 | 15,328 | 11,801 | 2,976 |
| 2006 | 14,275* | 5,264 | 2,474* | 2,790 | 11,801 | 14,275 | 0 |
| 2007 | 15,747 | 1,472 | 1,755* | 283 | 14,276 | 16,031 | 300 |
| 2008 | 48,187* | 32,440 | 32,336* | 104 | 16,031 | 48,367 | 180 |

As our source for the CF/S amounts in the three final columns, we used the two most recent years from the three annual reports for fiscal years 2004, 2006 and 2008. For each amount marked with a “*”, we trace the CH amount to the B/S and other amounts to the CF/S per the 2004, 2006 and 2008 annual reports.

Figure 1. Resolution of Exceptions

Total number of exceptions: 560,684
Exceptions with Nulls: 548,902
Exceptions with Non-Nulls: 11,782
Step 1: replace nulls with zeros. Number of exceptions after Step 1: 96,937
Step 2: replace sums/components. Number of exceptions after Step 2: 86,595
Step 3: GAAP changes. Number of exceptions after Step 3: 5,781

This chart depicts the three steps we take to resolve the exceptions. Step 1: Replacing null with zero; Step 2: Replacing subtotals with the sum of components for total current assets, total current liabilities, total assets, and total liabilities; Step 3: Adjusting variables in reflection of GAAP changes.

Table 1. Number of Firms, Firm-years (FYs), Firm-Year-Equation-Observations (FYEOs) and FYEO Exceptions in Sample

| Description | Firms | Firm Years |
|---|---|---|
| Initial sample | 28,209 | 266,711 |
| Less: Financial, utilities, or insurance firms/FY's | 18,177 | 165,563 |
| Less: Non-U.S. firms/FY's | 14,377 | 135,967 |
| Less: Firms/FY's with sales < $1 million or assets < $1 million | 13,120 | 120,092 |
| Less: Firms/FY's with insufficient CRSP data | 10,681 | 92,951 |
| Number of FYEOs in final sample | | 1,847,444 |

This table summarizes how we generate the initial database of FYEOs. We start with all firms in the Compustat Fundamental Annual database as of January 17, 2014 over the 24-year period, 1988-2011. Our initial database has 28,209 unique firms and 266,711 FY’s. We delete the financial, utilities, and insurance firms and FY’s from the sample, leaving 18,177 firms and 165,563 FY’s. Next, we delete 29,596 FY’s corresponding to 3,800 firms that are not domiciled or headquartered in the US. We also delete 15,875 FY’s corresponding to 1,257 firms having assets or sales below $1 million, and a further 27,141 FY’s (2,439 firms) with insufficient data on CRSP, yielding a final sample of 10,681 unique firms with 92,951 FY’s. Last, we run the 20 equations on the 92,951 FY’s to obtain 1,859,020 FYEOs, and after adjusting for 11,576 firm-year observations where we do not have consecutive annual data items to calculate Equation 3D.ii, we are left with a total of 1,847,444 FYEOs.

Table 2.
Resolution of Exceptions

| Equation | Total # of exceptions | # of exceptions resolved in Step 1 | # remaining exceptions after Step 1 | # of exceptions resolved in Step 2 | # remaining exceptions after Step 2 | # of exceptions resolved in Step 3 | # remaining exceptions after Step 3 |
|---|---|---|---|---|---|---|---|
| E1A.i | 7,335 | 4,281 | 3,054 | 3,041 | 13 | 0 | 13 |
| E1A.ii | 23,057 | 20,491 | 2,566 | 2,350 | 216 | 0 | 216 |
| E1B.i | 3,082 | 995 | 2,087 | 2,077 | 10 | 0 | 10 |
| E1B.ii | 5,965 | 3,679 | 2,286 | 2,284 | 2 | 0 | 2 |
| E1C.i | 57,999 | 45,236 | 12,763 | 0 | 12,763 | 12,636 | 127 |
| E1C.ii | 1,219 | 3 | 1,216 | 0 | 1,216 | 905 | 311 |
| E1D | 85,639 | 85,430 | 209 | 0 | 209 | 0 | 209 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 7,600 | 7,598 | 2 | 0 | 2 | 0 | 2 |
| E2B | 8,703 | 8,699 | 4 | 0 | 4 | 0 | 4 |
| E2C | 10,927 | 10,926 | 1 | 0 | 1 | 0 | 1 |
| E2D.i | 16,716 | 10,935 | 5,781 | 0 | 5,781 | 5,706 | 75 |
| E2D.ii | 74,164 | 12,428 | 61,736 | 0 | 61,736 | 61,550 | 186 |
| E2E | 85,356 | 85,336 | 20 | 0 | 20 | 17 | 3 |
| E2F | 2 | 1 | 1 | 0 | 1 | 0 | 1 |
| E3A | 49,038 | 48,987 | 51 | 0 | 51 | 0 | 51 |
| E3B | 33,639 | 33,608 | 31 | 0 | 31 | 0 | 31 |
| E3C | 83,826 | 83,791 | 35 | 0 | 35 | 0 | 35 |
| E3D.i | 310 | 283 | 27 | 0 | 27 | 0 | 27 |
| E3D.ii | 6,107 | 1,040 | 5,067 | 590 | 4,477 | 0 | 4,477 |
| Total | 560,684 | 463,747 | 96,937 | 10,342 | 86,595 | 80,814 | 5,781 |

This table summarizes the frequencies at which exceptions are resolved using the three different steps we employ. All equations are defined in the appendices. Step 1: Replacing null with zero; Step 2: Replacing subtotals with the sum of components for total current assets, total current liabilities, total assets, and total liabilities; Step 3: Adjusting variables in reflection of GAAP changes.

Table 3. Descriptive Statistics of Remaining Exceptions after Step 1

Panel A. Raw Values

| Exception | N | MEAN | STD DEV | MAX | Q3 | MEDIAN | Q1 | MIN |
|---|---|---|---|---|---|---|---|---|
| E1A.i | 3,054 | 5,389.74 | 30,673.30 | 471,520.00 | 688.00 | 99.50 | 6.00 | -3,698.00 |
| E1A.ii | 2,566 | -6,597.35 | 33,811.27 | 5.00 | -38.00 | -233.00 | -1,032.00 | -471,520.00 |
| E1B.i | 2,087 | 4,967.45 | 25,849.33 | 331,312.00 | 471.00 | 101.00 | 22.00 | -148.00 |
| E1B.ii | 2,286 | -4,516.56 | 24,741.54 | 2,420.00 | -14.00 | -74.50 | -398.00 | -331,312.00 |
| E1C.i | 12,763 | 17.19 | 248.37 | 5,765.00 | 11.00 | 2.00 | -2.00 | -6,979.00 |
| E1C.ii | 1,216 | -826.77 | 3,119.39 | 2,322.00 | -50.00 | -188.50 | -540.50 | -55,617.00 |
| E1D | 209 | -204.64 | 411.75 | -1.00 | -16.00 | -56.00 | -158.00 | -2,420.00 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 2 | 10.00 | 7.07 | 15.00 | 15.00 | 10.00 | 5.00 | 5.00 |
| E2B | 4 | 16.00 | 12.52 | 34.00 | 24.50 | 11.50 | 7.50 | 7.00 |
| E2C | 1 | 68.00 | . | 68.00 | 68.00 | 68.00 | 68.00 | 68.00 |
| E2D.i | 5,781 | 22.71 | 176.41 | 6,155.00 | 15.00 | 4.00 | 1.00 | -4,120.00 |
| E2D.ii | 61,736 | 43.85 | 701.98 | 26,305.00 | 20.00 | 3.00 | -4.00 | -98,418.00 |
| E2E | 20 | 542.05 | 1,153.79 | 4,300.00 | 615.50 | -4.00 | -12.50 | -264.00 |
| E2F | 1 | 1.00 | . | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| E3A | 51 | 70.51 | 235.73 | 834.00 | 14.00 | 2.00 | -3.00 | -185.00 |
| E3B | 31 | 87.48 | 180.55 | 799.00 | 106.00 | 9.00 | 1.00 | -69.00 |
| E3C | 35 | -53.86 | 169.80 | 293.00 | 1.00 | -9.00 | -76.00 | -655.00 |
| E3D.i | 27 | -3.04 | 41.46 | 124.00 | 11.00 | -2.00 | -15.00 | -83.00 |
| E3D.ii | 5,067 | -14.33 | 647.91 | 35,622.00 | 6.00 | -1.00 | -12.00 | -12,234.00 |

Significance markers in source order (row assignment not recoverable). MEAN: *** *** *** *** *** *** *** * *** *** ** ** ** * ***. MEDIAN: *** *** *** *** *** *** *** *** *** ** * ***.

Panel B. Values Scaled by Total Assets

| Exception | N | MEAN | STD DEV | MAX | Q3 | MEDIAN | Q1 | MIN |
|---|---|---|---|---|---|---|---|---|
| E1A.i | 3,054 | 0.36 | 0.39 | 1.00 | 0.75 | 0.30 | 0.05 | -0.91 |
| E1A.ii | 2,566 | -0.49 | 0.32 | 0.00 | -0.18 | -0.50 | -0.81 | -1.00 |
| E1B.i | 2,087 | 0.23 | 0.23 | 6.26 | 0.32 | 0.16 | 0.09 | -0.16 |
| E1B.ii | 2,286 | -0.18 | 0.28 | 1.14 | -0.08 | -0.15 | -0.30 | -6.26 |
| E1C.i | 12,763 | 0.01 | 0.15 | 11.20 | 0.02 | 0.00 | 0.00 | -1.39 |
| E1C.ii | 1,216 | -0.40 | 0.44 | 6.26 | -0.22 | -0.39 | -0.57 | -6.57 |
| E1D | 209 | -0.32 | 0.19 | -0.04 | -0.18 | -0.27 | -0.44 | -1.14 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 2 | 0.17 | 0.21 | 0.33 | 0.33 | 0.17 | 0.02 | 0.02 |
| E2B | 4 | 0.07 | 0.12 | 0.24 | 0.13 | 0.02 | 0.01 | 0.01 |
| E2C | 1 | 0.08 | . | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 |
| E2D.i | 5,781 | 0.00 | 0.06 | 0.46 | 0.01 | 0.00 | 0.00 | -3.14 |
| E2D.ii | 61,736 | -0.06 | 0.45 | 21.79 | 0.07 | 0.03 | -0.07 | -49.83 |
| E2E | 20 | 0.00 | 0.11 | 0.24 | 0.07 | -0.01 | -0.08 | -0.15 |
| E2F | 1 | 0.01 | . | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| E3A | 51 | 0.09 | 0.18 | 0.63 | 0.21 | 0.01 | -0.05 | -0.12 |
| E3B | 31 | 0.14 | 0.29 | 0.64 | 0.27 | 0.15 | 0.04 | -1.10 |
| E3C | 35 | -0.24 | 0.38 | 0.85 | 0.01 | -0.18 | -0.43 | -1.16 |
| E3D.i | 27 | 0.00 | 0.38 | 0.94 | 0.08 | -0.01 | -0.15 | -0.72 |
| E3D.ii | 5,067 | -0.01 | 0.14 | 1.79 | 0.01 | 0.00 | -0.03 | -2.99 |

Significance markers in source order (row assignment not recoverable). MEAN: *** *** *** *** *** *** *** * *** ** ** * ***. MEDIAN: *** *** *** *** *** *** *** *** *** ** * ***.

This table provides descriptive statistics for exceptions remaining after the resolutions in Step 1, in which we replace nulls with zeros. Panel A is for raw values of the exceptions, while Panel B shows the values of the exceptions scaled by concurrent total assets. *, **, *** next to the mean and median columns indicate that the means and medians are statistically different from 0 at significance levels of 10%, 5%, and 1%.

Table 4. Descriptive Statistics of Remaining Exceptions after Step 3

Panel A. Raw Values

| Exception | N | MEAN | STD DEV | MAX | Q3 | MEDIAN | Q1 | MIN |
|---|---|---|---|---|---|---|---|---|
| E1A.i | 13 | -63.23 | 86.47 | -2.00 | -5.00 | -23.00 | -100.00 | -269.00 |
| E1A.ii | 216 | -254.68 | 817.17 | 155.00 | -8.00 | -24.00 | -112.50 | -6,260.00 |
| E1B.i | 10 | -45.90 | 56.42 | -1.00 | -1.00 | -23.00 | -101.00 | -148.00 |
| E1B.ii | 2 | -12.00 | 18.39 | 1.00 | 1.00 | -12.00 | -25.00 | -25.00 |
| E1C.i | 127 | 24.56 | 118.25 | 894.00 | 17.00 | 4.00 | 1.00 | -587.00 |
| E1C.ii | 311 | 9.20 | 196.01 | 2,317.00 | 15.00 | -1.00 | -14.00 | -941.00 |
| E1D | 209 | -204.64 | 411.75 | -1.00 | -16.00 | -56.00 | -158.00 | -2,420.00 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 2 | 10.00 | 7.07 | 15.00 | 15.00 | 10.00 | 5.00 | 5.00 |
| E2B | 4 | 16.00 | 12.52 | 34.00 | 24.50 | 11.50 | 7.50 | 7.00 |
| E2C | 1 | 68.00 | . | 68.00 | 68.00 | 68.00 | 68.00 | 68.00 |
| E2D.i | 75 | 59.59 | 217.56 | 855.00 | 14.00 | -1.00 | -7.00 | -360.00 |
| E2D.ii | 186 | 13.13 | 213.97 | 1,703.00 | 9.00 | 1.00 | -2.00 | -1,773.00 |
| E2E | 3 | 87.33 | 158.20 | 270.00 | 270.00 | -3.00 | -5.00 | -5.00 |
| E2F | 1 | 1.00 | . | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| E3A | 51 | 70.51 | 235.74 | 834.00 | 14.00 | 2.00 | -3.00 | -185.00 |
| E3B | 31 | 87.48 | 180.55 | 799.00 | 106.00 | 9.00 | 1.00 | -69.00 |
| E3C | 35 | -53.86 | 169.80 | 293.00 | 1.00 | -9.00 | -76.00 | -655.00 |
| E3D.i | 27 | -3.04 | 41.47 | 124.00 | 11.00 | -2.00 | -15.00 | -83.00 |
| E3D.ii | 4,477 | -14.96 | 299.68 | 4,984.00 | 5.00 | -1.00 | -11.00 | -11,137.00 |
| E4Am: STKNaplug | 76,370 | 52.99 | 979.62 | 152,984.00 | 8.00 | 1.00 | 0.02 | -52,346.00 |
| E4Bm: REUNAmplug | 73,712 | -16.20 | 381.56 | 17,481.00 | 0.02 | 0.00 | -0.24 | -28,809.00 |
| E4Cm: AOCIaplug | 16,621 | -9.97 | 384.41 | 22,226.00 | 0.23 | 0.00 | -0.40 | -17,489.00 |
| E4Em: MIBTmplug | 15,123 | -4.81 | 345.81 | 20,955.00 | 0.60 | -0.08 | -3.00 | -13,994.00 |

Significance markers in source order (row assignment not recoverable). MEAN: *** *** *** *** *** * *** ** ** * *** *** *** *** *. MEDIAN: *** *** *** *** *** *** *** * *** *** ** ** ** * *** *** *** *** ***.

Panel B. Values Scaled by Total Assets

| Exception | N | MEAN | STD DEV | MAX | Q3 | MEDIAN | Q1 | MIN |
|---|---|---|---|---|---|---|---|---|
| E1A.i | 13 | -0.20 | 0.25 | 0.00 | -0.02 | -0.06 | -0.33 | -0.73 |
| E1A.ii | 216 | -0.19 | 0.27 | 0.16 | -0.01 | -0.05 | -0.21 | -0.93 |
| E1B.i | 10 | -0.06 | 0.06 | 0.00 | -0.03 | -0.05 | -0.10 | -0.16 |
| E1B.ii | 2 | -0.01 | 0.03 | 0.01 | 0.01 | -0.01 | -0.03 | -0.03 |
| E1C.i | 127 | 0.02 | 0.06 | 0.17 | 0.03 | 0.01 | 0.00 | -0.40 |
| E1C.ii | 311 | 0.07 | 0.42 | 6.26 | 0.03 | 0.00 | -0.01 | -1.39 |
| E1D | 209 | -0.32 | 0.19 | -0.04 | -0.18 | -0.27 | -0.44 | -1.14 |
| E1E | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A | 2 | 0.17 | 0.21 | 0.33 | 0.33 | 0.17 | 0.02 | 0.02 |
| E2B | 4 | 0.07 | 0.12 | 0.24 | 0.13 | 0.02 | 0.01 | 0.01 |
| E2C | 1 | 0.08 | . | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 |
| E2D.i | 75 | -0.13 | 0.47 | 0.46 | 0.04 | -0.01 | -0.09 | -3.14 |
| E2D.ii | 186 | -0.01 | 0.21 | 1.61 | 0.01 | 0.00 | 0.00 | -1.52 |
| E2E | 3 | -0.01 | 0.09 | 0.08 | 0.08 | -0.01 | -0.10 | -0.10 |
| E2F | 1 | 0.01 | . | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| E3A | 51 | 0.09 | 0.18 | 0.63 | 0.21 | 0.01 | -0.05 | -0.12 |
| E3B | 31 | 0.14 | 0.29 | 0.64 | 0.27 | 0.15 | 0.04 | -1.10 |
| E3C | 35 | -0.24 | 0.38 | 0.85 | 0.01 | -0.18 | -0.43 | -1.16 |
| E3D.i | 27 | 0.00 | 0.38 | 0.94 | 0.08 | -0.01 | -0.15 | -0.72 |
| E3D.ii | 4,477 | -0.02 | 0.13 | 1.08 | 0.01 | 0.00 | -0.03 | -2.99 |
| E4Am: STKNaplug | 76,370 | 0.03 | 0.24 | 12.51 | 0.02 | 0.00 | 0.00 | -25.09 |
| E4Bm: REUNAmplug | 73,712 | 0.00 | 0.20 | 32.79 | 0.00 | 0.00 | 0.00 | -3.80 |
| E4Cm: AOCIaplug | 16,621 | 0.00 | 0.05 | 2.51 | 0.00 | 0.00 | 0.00 | -1.56 |
| E4Em: MIBTmplug | 15,123 | 0.00 | 0.09 | 1.62 | 0.00 | 0.00 | 0.00 | -7.90 |

Significance markers in source order (row assignment not recoverable). MEAN: *** *** *** *** *** *** ** ** * *** *** *** *** **. MEDIAN: *** *** *** *** *** *** ** *** *** *** *** *** ***.

This table provides descriptive statistics for exceptions remaining after the resolutions through Step 1, 2, and 3. Panel A is for raw values of the exceptions, while Panel B shows the values of the exceptions scaled by concurrent total assets. *, **, *** next to the mean and median columns indicate that the means and medians are statistically different from 0 at significance levels of 10%, 5%, and 1%.

Table 5.
Descriptive Statistics for Financial Statement Ratios and Modified Variables That Are Significantly Different between the Original Compustat Dataset and the Modified Dataset

Columns 2-5 are based on the modified data (MDB); columns 6-9 are based on the original Compustat data; the remaining columns report the T test and the Wilcoxon test.

| Variable | MDB N | MDB Mean | MDB Median | MDB Std Dev | Orig. N | Orig. Mean | Orig. Median | Orig. Std Dev | tValue | Probt | Z | PROBZ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| INVT | 92,951 | 158.96 | 8.81 | 876.02 | 92,092 | 160.44 | 9.11 | 879.96 | -0.36 | 0.72 | 2.84 | 0.00 |
| xsga | 92,951 | 276.15 | 29.15 | 1,464.85 | 85,477 | 300.30 | 35.65 | 1,525.18 | -3.41 | 0.00 | 29.38 | 0.00 |
| ACT | 92,951 | 690.65 | 70.66 | 6,140.76 | 90,457 | 523.14 | 69.04 | 2,307.38 | 7.69 | 0.00 | -2.69 | 0.01 |
| LCT | 92,951 | 476.63 | 30.48 | 4,477.30 | 90,796 | 373.75 | 29.79 | 2,167.07 | 6.25 | 0.00 | -2.29 | 0.02 |
| cibegni | 92,951 | 73.78 | 2.39 | 887.98 | 22,759 | 182.19 | 9.73 | 1,349.11 | -14.72 | 0.00 | 28.06 | 0.00 |
| citotal | 92,951 | 70.55 | 2.33 | 898.86 | 22,774 | 168.44 | 8.67 | 1,382.52 | -13.08 | 0.00 | 24.77 | 0.00 |
| ACCR_BS_RAW | 81,375 | -85.20 | -4.40 | 1,119.26 | 79,045 | -77.92 | -4.33 | 509.39 | -1.67 | 0.10 | 0.38 | 0.70 |
| WCACCR_CF_RAW | 92,951 | -23.37 | -1.86 | 287.97 | 50,442 | -21.20 | -1.74 | 232.44 | -1.45 | 0.15 | 0.70 | 0.48 |
| WCACCR_CF | 81,375 | -0.03 | -0.02 | 0.13 | 43,229 | -0.04 | -0.02 | 0.15 | 6.93 | 0.00 | -10.06 | 0.00 |
| ACCR_CF_RAW | 92,951 | -107.90 | -9.11 | 645.70 | 50,067 | -68.10 | -6.38 | 394.39 | -12.58 | 0.00 | 22.91 | 0.00 |
| ACCR_CF | 81,375 | -0.09 | -0.07 | 0.15 | 42,901 | -0.09 | -0.08 | 0.16 | 7.35 | 0.00 | -11.40 | 0.00 |
| WC_RAW | 92,951 | 175.40 | 10.39 | 3,618.00 | 90,427 | 77.08 | 10.10 | 543.20 | 8.08 | 0.00 | -2.73 | 0.01 |
| WC_SCALED | 81,375 | 574.11 | 48.80 | 5,615.41 | 79,219 | 413.09 | 47.46 | 1,736.93 | 7.72 | 0.00 | -2.88 | 0.00 |
| ALTZ | 92,951 | 4.72 | 3.15 | 12.83 | 82,560 | 4.74 | 3.26 | 11.96 | -0.25 | 0.80 | 8.30 | 0.00 |
| CURRENT_RATIO | 92,871 | 2.98 | 2.05 | 4.06 | 90,438 | 2.95 | 2.05 | 3.76 | 1.68 | 0.09 | 0.29 | 0.77 |
| QUICK_RATIO | 92,871 | 2.09 | 1.21 | 3.65 | 88,982 | 2.10 | 1.23 | 3.48 | -0.65 | 0.52 | 5.50 | 0.00 |

This table presents descriptive statistics for ratios and modified variables that are significantly different when calculated using the original Compustat dataset and the modified dataset. The modified dataset is constructed after resolutions of exceptions through Step 1, 2, and 3.
Variables INVT, xsga, ACT, LCT, cibegni, and citotal are defined in online Appendix B. The other accruals variables and ratios are defined as follows:

ACCR_BS_RAW: raw operating accruals, defined as ∆(current assets – cash and cash equivalents) – ∆(current liabilities – debt in current liabilities) – depreciation & amortization, calculated using balance sheet data;

WCACCR_CF_RAW: raw working capital accruals, defined as change in accounts receivable + change in inventory – change in accounts payable and accrued liabilities – change in taxes payable – change in other assets and liabilities, calculated using statement of cash flows data;

WCACCR_CF: working capital accruals, calculated as WCACCR_CF_RAW scaled by average total assets;

ACCR_CF_RAW: raw operating accruals, defined as change in accounts receivable + change in inventory – change in accounts payable and accrued liabilities – change in taxes payable – change in other assets and liabilities – depreciation & amortization, calculated using statement of cash flows data;

ACCR_CF: operating accruals, calculated as ACCR_CF_RAW scaled by average total assets;

ALTZ: Altman's Z Score, calculated as 1.2 × working capital/total assets + 1.4 × retained earnings/total assets + 3.3 × operating income after depreciation & amortization/total assets + 0.6 × market value of equity/total liabilities + sales/total assets;

WC_RAW: raw working capital, defined as ∆(current assets – cash and cash equivalents) – ∆(current liabilities – debt in current liabilities), calculated using balance sheet data;

WC_SCALED: working capital, calculated as WC_RAW scaled by average total assets;

CURRENT_RATIO: current ratio, calculated as total current assets/total current liabilities;

QUICK_RATIO: quick ratio, calculated as (cash and cash equivalents + trade receivable)/total current liabilities.
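The ratio definitions above translate directly into code. The following Python sketch is illustrative only: the function and parameter names are hypothetical, nulls are modeled as `None`, and returning `None` for a null input or a zero denominator is an assumption of this sketch rather than a rule stated in the paper. It shows why a single null input leaves a ratio undefined for that firm-year.

```python
from typing import Optional


def _ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """numerator/denominator, or None if an input is null or the denominator is 0."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def current_ratio(act: Optional[float], lct: Optional[float]) -> Optional[float]:
    """Total current assets / total current liabilities."""
    return _ratio(act, lct)


def quick_ratio(cash: Optional[float], trade_receivable: Optional[float],
                lct: Optional[float]) -> Optional[float]:
    """(Cash and cash equivalents + trade receivable) / total current liabilities."""
    if cash is None or trade_receivable is None:
        return None
    return _ratio(cash + trade_receivable, lct)


def altman_z(working_capital, retained_earnings, oiadp, market_equity,
             total_liabilities, sales, total_assets) -> Optional[float]:
    """Altman's Z Score with the weights given in the ALTZ definition."""
    parts = [
        _ratio(working_capital, total_assets),
        _ratio(retained_earnings, total_assets),
        _ratio(oiadp, total_assets),
        _ratio(market_equity, total_liabilities),
        _ratio(sales, total_assets),
    ]
    if any(p is None for p in parts):
        return None
    weights = [1.2, 1.4, 3.3, 0.6, 1.0]
    return sum(w * p for w, p in zip(weights, parts))


# Hypothetical inputs, not taken from the paper's data.
z = altman_z(working_capital=20, retained_earnings=30, oiadp=10,
             market_equity=150, total_liabilities=50, sales=120, total_assets=100)
z_missing = altman_z(None, 30, 10, 150, 50, 120, 100)  # null working capital
print(z, z_missing, current_ratio(60, 30), quick_ratio(10, 20, 30))
```

Under this convention, any firm-year with a null input drops out of the ratio's sample, which is consistent with the smaller N reported for the original Compustat data than for the MDB in Table 5.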