Estimating Firms’ Research Quotient (RQ)
Anne Marie Knott

The Harvard Business Review article, “The Trillion Dollar R&D Fix,” describes a measure of firms’ R&D effectiveness called RQ (Research Quotient) and explains how firms can use it to determine the optimal level of R&D investment, R*. At the request of the HBR editors, the article also includes an Excel example capturing the intuition behind RQ. This is a very simplified approach to estimating RQ, and it leads to substantial errors. In fact, the raw RQ for the fictitious firm in the Excel example is high by a factor of two. I included a caution in the article comparing the two estimates, but it didn’t survive the final edit.

I have received a number of inquiries from firms indicating they want to estimate RQ internally on a regular basis. Because erroneous estimates of RQ could create worse problems than the current problem of not knowing RQ, I have created this tutorial. The tutorial describes how firms can estimate RQ on a regular basis. In addition to generating top-level (company-wide) estimates of RQ using “real numbers”, the advantage of internal estimation is the ability (in some cases) to generate division-level estimates of RQ and R*. Such division-level estimates 1) inform how R&D should be allocated across divisions, and 2) support internal benchmarking (by identifying which divisions have higher or lower R&D effectiveness).

The tutorial has five sections. Section 1 describes the link to theory; Section 2 defines the fully specified approach to estimation; Section 3 describes the Excel exercise and explains why it leads to erroneous estimates; Section 4 identifies a commercial means to obtain firm RQs; Section 5 provides links to other RQ resources.

1. Theory
The formal link between firms’ R&D spending and growth comes from endogenous growth theory, e.g., Romer 1990. These models rely on a construct of R&D elasticity to define the R&D spending that maximizes firm market value.
The principal limitation of growth theory from the firm perspective is that it is concerned with macroeconomics and thus typically treats firms as identical. Accordingly, when the models are tested, R&D elasticity is estimated at the industry level or higher. More recent growth models, e.g., Lentz and Mortensen 2008, have accommodated firm heterogeneity, but there hasn’t been a firm-level measure of R&D elasticity to test them directly. RQ is the missing measure.

Raw RQ is the “firm-specific output elasticity of R&D”. It is the exponent on R&D, γ, in the firm’s final goods production function (equation 1). Gamma is interpreted as the percentage increase in revenues associated with a 1% increase in R&D.

$$ Y = K^{\alpha} L^{\beta} R^{\gamma} S^{\delta} A^{\varepsilon} \qquad (1) $$

where:
Y = output
K = capital
L = labor
R = R&D
S = spillovers (R&D available for free-riding)
A = advertising

Because γ is normally distributed, firm RQ resembles individual IQ. Both capture problem-solving capability. For individuals, IQ is captured as the speed and accuracy of solving problems of increasing difficulty: within any given time constraint, individuals with higher IQ solve more problems correctly than those with lower IQ. For firms, IQ is efficiency in solving new problems. For any given level of R&D spending, high-IQ firms will generate more innovations, or for any given innovation, high-IQ firms will invest less in developing it. Accordingly, the raw values of γ are mapped onto the IQ scale (mean = 100, standard deviation = 15) to support intuition.

2. Fully-specified estimation
2.1 Model
We derive RQ by estimating the production function (equation 1) with a random coefficients model that allows for heterogeneity in the output elasticity of R&D (as well as of all other inputs). A random coefficients model is a general functional form that treats coefficients as non-fixed (across members of a cross-section or over time) and potentially correlated with the error term.
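As a brief aside on why γ in equation 1 reads as an elasticity, the short derivation below takes logs of equation 1 and differentiates with respect to ln R. It uses only the symbols defined for equation 1 and, as a simplification, leaves out the error term and the firm-specific components that the estimated model adds.

```latex
\begin{align*}
\ln Y &= \alpha \ln K + \beta \ln L + \gamma \ln R + \delta \ln S + \varepsilon \ln A \\
\frac{\partial \ln Y}{\partial \ln R} &= \gamma
\quad\Longrightarrow\quad
\frac{\Delta Y}{Y} \approx \gamma \, \frac{\Delta R}{R}
\end{align*}
```

Holding the other inputs fixed, a 1% increase in R&D is therefore associated with roughly a γ% increase in output. For instance, a raw γ of 0.21 would correspond to about a 0.21% increase.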
Random coefficient models are those in which each coefficient has two components: 1) the direct effect of the explanatory variable and 2) a random component that proxies for the effects of omitted variables. The empirical model (equation 2) is a log transform of equation 1 that models output (value added, Y) for firm i in year t with random coefficients for all inputs (capital, K; labor, L; R&D, R; spillovers, S; and advertising, A) as well as the intercept:

$$ \ln Y_{it} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) \ln K_{it} + (\beta_2 + \beta_{2i}) \ln L_{it} + (\beta_3 + \beta_{3i}) \ln R_{it} + (\beta_4 + \beta_{4i}) \ln S_{it} + (\beta_5 + \beta_{5i}) \ln A_{it} + \varepsilon_{it} \qquad (2) $$

In this notation, the firm-specific R&D elasticity (γ in equation 1) corresponds to β3 + β3i.

We estimate equation 2 using the Stata command xtmixed. xtmixed fits linear mixed models (models with both fixed effects and random effects) using maximum likelihood estimation. The random effects, β_i, are not directly estimated, but we form best linear unbiased predictions (BLUPs) of them (and their standard errors) using xtmixed postestimation.

2.2 Data and variables
We estimate firm RQ using moving 7-year panels of all US publicly traded firms engaged in R&D. Data for the study come from the Compustat industrial annual file. Firm-level data items include (in $MM unless otherwise stated): revenues (Yit), capital as net property, plant and equipment (Kit), labor as full-time equivalent employees in thousands (Lit), advertising (Ait), and R&D (Rit). From these primary data, we derive a secondary measure, firm-specific spillovers (Sit), computed as the sum of the differences in knowledge between focal firm i and rival firm j over all firms in the respective industry (2-digit SIC) with more knowledge than firm i. Spillovers represent the knowledge firms free-ride on in generating their products and processes. Failure to include them in the estimation leads to substantial upward bias in gamma.

2.3 Getting by without the full Compustat dataset
In principle, RQ estimation does not require the full set of firms in Compustat.
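Data on rival firms still matter, in part because of the spillover measure defined in Section 2.2. A minimal sketch of that calculation follows. It assumes (the text does not specify this) that each firm's knowledge is summarized by a single number, such as an R&D-based stock; the function name, firm labels, and values are all hypothetical.

```python
def spillover_pool(knowledge):
    """Return each firm's spillover pool within one industry-year.

    `knowledge` maps firm -> knowledge level (a hypothetical proxy, e.g. an
    R&D-based stock). A firm's pool is the sum of (rival - own) over all
    rivals that have more knowledge than the firm itself.
    """
    return {
        firm: sum((rival - own for rival in knowledge.values() if rival > own), 0.0)
        for firm, own in knowledge.items()
    }


# Hypothetical firms in a single 2-digit SIC industry for one year.
industry = {"A": 10.0, "B": 30.0, "C": 60.0}
pools = spillover_pool(industry)
print(pools)
```

In this toy industry, firm C is the knowledge leader and has an empty pool, while firm A can draw on both rivals (20 + 50 = 70).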
Non-focal firms play three roles. First, they “bootstrap” the production function (the estimation exploits information from all firms to better gauge what the elasticities should be for each input). Second, they help control for year-to-year changes in economic conditions. Third, they provide data on the spillover pool. There should be a subset of firms (most likely the set of firms in the two-digit industry) sufficient to support reliable estimation. If so, firms could obtain 10-K data for these firms directly from EDGAR rather than subscribing to Compustat. Since the reliability of these subset estimates varies across firms and industries, we recommend validating results against the full dataset (either on your own or by providing subset estimates to Berkeley Research Group for comparison with the master set of estimates). Note, however, that subset estimation still requires the use of random coefficients to obtain the firm-specific (or division-specific) coefficients.

3. Spreadsheet estimation (use for intuition only)
As mentioned in the introduction, spreadsheet estimation should only be used for developing intuition. It does not generate reliable estimates. I estimate RQ by combining data from all US publicly traded firms reporting R&D. This generates precise estimates that control for spillovers between firms and for economy-wide effects like recessions. To generate the spreadsheet estimate, you need several years’ data on revenues and on annual expenditures on inputs: PP&E (property, plant and equipment), labor, R&D and advertising. We show these for a fictitious firm in Table 1 (columns 1-5).
Table 1

        (1)      (2)    (3)       (4)          (5)   (6)      (7)      (8)      (9)      (10)
Year    Revenue  PP&E   Employee  Advertising  R&D   ln(Rev)  ln(PPE)  ln(Emp)  ln(Adv)  ln(R&D)
1990    1484     518    6         219          39    7.30     6.25     1.70     5.39     3.67
1991    1646     522    6         244          47    7.41     6.26     1.81     5.50     3.86
1992    1717     581    6         270          45    7.45     6.37     1.76     5.60     3.80
1993    1642     538    5         243          42    7.40     6.29     1.55     5.49     3.75
1994    1846     533    5         287          45    7.52     6.28     1.58     5.66     3.80
1995    1991     525    5         272          45    7.60     6.26     1.55     5.60     3.80
1996    2225     551    5         285          46    7.71     6.31     1.67     5.65     3.82
1997    2541     571    6         349          50    7.84     6.35     1.70     5.85     3.92
1998    2753     596    7         362          56    7.92     6.39     1.89     5.89     4.03
1999    4010     1054   11        474          62    8.30     6.96     2.40     6.16     4.13
2000    4089     1079   11        465          63    8.32     6.98     2.40     6.14     4.14
2001    3903     1046   11        352          67    8.27     6.95     2.40     5.86     4.20
2002    4061     922    10        397          67    8.31     6.83     2.25     5.98     4.20
2003    4144     1072   9         456          76    8.33     6.98     2.19     6.12     4.33
2004    4324     1052   9         429          84    8.37     6.96     2.15     6.06     4.43
2005    4388     999    8         435          88    8.39     6.91     2.03     6.08     4.48
2006    4644     1004   8         450          99    8.44     6.91     2.03     6.11     4.60
2007    4847     976    8         474          108   8.49     6.88     2.05     6.16     4.68
2008    5273     960    8         486          111   8.57     6.87     2.12     6.19     4.71
2009    5450     955    8         499          114   8.60     6.86     2.12     6.21     4.74
2010    5534     979    8         518          119   8.62     6.89     2.12     6.25     4.78

Next, transform each variable into log form (columns 6-10 of the table). To perform the analysis in Excel, choose “Regression” from the Data Analysis tab (see footnote 1). Designate column 6 as your dependent variable by highlighting rows 1-21 of that column. Designate columns 7-10 as your independent variables. Then choose “Labels” to indicate that the first row includes the variable name. When you have run the analysis, Excel will open a new worksheet with the regression results. These results for the fictitious firm are shown in Table 2. Column 2, marked “Coefficients”, contains the elasticity for each variable in column 1. The elasticities for PP&E, employees, advertising and R&D for the sample firm are 0.23, 0.15, 0.74 and 0.43, respectively. These coefficients are not accurate for the reasons discussed previously.
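As a cross-check on the spreadsheet steps, the same ordinary least squares fit can be sketched in plain Python. This is an illustration only, not the author's procedure. It uses the rounded log values printed in columns 6-10 of Table 1, so its coefficients should be close to, but not identical to, the Excel output in Table 2. Like the spreadsheet, it ignores spillovers and other firms, so it is for intuition only. The helper names `solve` and `ols` are hypothetical.

```python
# Rounded log values from Table 1 (columns 6-10), years 1990-2010.
ln_rev = [7.30, 7.41, 7.45, 7.40, 7.52, 7.60, 7.71, 7.84, 7.92, 8.30, 8.32,
          8.27, 8.31, 8.33, 8.37, 8.39, 8.44, 8.49, 8.57, 8.60, 8.62]
ln_ppe = [6.25, 6.26, 6.37, 6.29, 6.28, 6.26, 6.31, 6.35, 6.39, 6.96, 6.98,
          6.95, 6.83, 6.98, 6.96, 6.91, 6.91, 6.88, 6.87, 6.86, 6.89]
ln_emp = [1.70, 1.81, 1.76, 1.55, 1.58, 1.55, 1.67, 1.70, 1.89, 2.40, 2.40,
          2.40, 2.25, 2.19, 2.15, 2.03, 2.03, 2.05, 2.12, 2.12, 2.12]
ln_adv = [5.39, 5.50, 5.60, 5.49, 5.66, 5.60, 5.65, 5.85, 5.89, 6.16, 6.14,
          5.86, 5.98, 6.12, 6.06, 6.08, 6.11, 6.16, 6.19, 6.21, 6.25]
ln_rd = [3.67, 3.86, 3.80, 3.75, 3.80, 3.80, 3.82, 3.92, 4.03, 4.13, 4.14,
         4.20, 4.20, 4.33, 4.43, 4.48, 4.60, 4.68, 4.71, 4.74, 4.78]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(y, columns):
    """OLS with an intercept via the normal equations (X'X) b = X'y.

    Returns (coefficients, R squared); coefficients[0] is the intercept.
    """
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


coef, r_squared = ols(ln_rev, [ln_ppe, ln_emp, ln_adv, ln_rd])
names = ["Intercept", "ln(PP&E)", "ln(Employee)", "ln(Advertising)", "ln(R&D)"]
for name, value in zip(names, coef):
    print(f"{name:16s} {value: .4f}")
print(f"R Square         {r_squared: .4f}")
```

The last entry of `coef` is the single-firm R&D elasticity that the spreadsheet reports. It carries the same caveats as the Excel number.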
(When the fictitious firm’s data are combined with all publicly traded firms in a data set that also includes knowledge spillovers, its coefficients are 0.13, 0.51, 0.18 and 0.21, respectively.) Thus the spreadsheet coefficients on capital and R&D are high by roughly a factor of 2, the advertising coefficient is high by a factor of 4, and the labor coefficient is about 70% below the fully specified estimate.

1 You need to have loaded the “Analysis ToolPak” to get the Data Analysis tab.

Table 2. SUMMARY OUTPUT

Regression Statistics
Multiple R           0.990523
R Square             0.981137
Adjusted R Square    0.976421
Standard Error       0.070398
Observations         21

ANOVA
              df    SS         MS         F          Significance F
Regression    4     4.12428    1.03107    208.0504   1.42E-13
Residual      16    0.079294   0.004956
Total         20    4.203573

                  Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept         0.0705        0.8407          0.0838   0.9342   -1.7117    1.8527     -1.7117      1.8527
ln(PP&E)          0.2322        0.2055          1.1299   0.2752   -0.2035    0.6679     -0.2035      0.6679
ln(Employee)      0.1499        0.1633          0.9176   0.3724   -0.1963    0.4961     -0.1963      0.4961
ln(Advertising)   0.7356        0.1653          4.4498   0.0004   0.3852     1.0861     0.3852       1.0861
ln(R&D)           0.4303        0.1089          3.9509   0.0011   0.1994     0.6612     0.1994       0.6612

4. Alternatives to internal estimation
If you have no need for divisional estimates, then it probably doesn’t make sense to invest in Compustat and the labor needed to generate RQ annually. Instead, you can obtain RQ and R* on a regular basis via an RQ data subscription.

5. Other RQ resources
5.1 Academic articles
R&D/Returns Causality: Absorptive Capacity or Organizational IQ
RQ and endogenous firm growth
5.2 NSF grants
0965147: Firm IQ: A Universal, Uniform and Reliable Measure of R&D Effectiveness
The Impact of R&D Practices on R&D effectiveness (RQ)
5.3 HBR article
5.4 Consulting