The Employer Universe: The Business Register and the Longitudinal Business Database
Javier Miranda, U.S. Census Bureau
Census Research Data Center Network

Overview
• Employer universe business data at the US Census Bureau
• Research
• Public use versions

The Business Register: The Census Bureau’s Business Master List
• Universe coverage of employers in the U.S. with IRS filings
• Transaction list of administrative records (income, payroll)
• Enhanced with Census Collections to provide detail
• Origin and Use
  • Enumeration list for census and frame for surveys
  • Central storage of admin data for statistical products
  • Source data for Census products (CBP, LBD, BDS, BITS…)
• Structure:
  • Annual snapshots back to 1974, Single/Multi unit files
• Statistical Units:
  • EIN (the admin unit), Establishments and firms

The Business Register: The Census Bureau’s Business Master List
• Data in the BR
  • Industry, Geography, Employment, Payroll, LFO, Sales, Name and Address…
• Data often require substantial value added to be utilized for research.
• Solution: The LBD

Longitudinal Business Database (LBD)
• Longitudinal Universe Database of US Employer Business Establishments
• Uses Census Business Register longitudinal linkages of both firms and establishments
  – Census uniquely tracks firms and establishments through Company Organization Survey and Economic Censuses (and other surveys)
• All employers in the U.S.
  – Complete sectoral coverage
  – Detailed geography and industry
  – Basic backbone to which all other Census business data can be linked
• Long time series 1976-2008
• Firm and establishment characteristics
  • Including size and age. Age is critical to understanding dynamics and entrepreneurship.

LBD: Large vs Small vs Young
[Figure. Legend: Small (1-500); Large (500+)]
Important to put job creation and destruction in context…

LBD: Life cycle dynamics of businesses and who creates jobs
[Figure: Net Employment Growth for Continuing Firms by Firm Age. X-axis: Firm Age Class (1 through 16+). Series: Age Only; With Base Year Size Controls; With Current Year Size Controls]

LBD: Life cycle dynamics of businesses and who destroys jobs
[Figure: Job Destruction from Firm Exit by Firm Age. X-axis: Firm Age Class. Series: Age Only; With Base Year Size Controls; With Current Year Size Controls]

Census Data: Productivity Growth
[Figure: Productivity Relative to Mature Surviving Incumbents. Categories: Young Exits; Mature Exits; Young Survivors; Young Survivors Five Years Later]

LBD: The effect of business cycle dynamics and credit conditions on firms and job creation
Forms of financing differ for small and large firms…
[Figure: Net Job Creation- Effects of Business Cycle* by Firm Size, 1981-2008. Series: Urate (by itself); Urate (ind*yr fixed effects)]
[Figure: Net Job Creation- Effects of FHFA* Prices by Firm Size, 1981-2008. Series: FHFA (by itself); FHFA (ind*yr fixed effects)]

Private equity
Large growth of private equity since 1980’s
[Figure: Employment under Buyout Targets: By Year and as a Percent of Economy, 1980-2005. Series: Percent of LBD Employment; Buyout Employment (right axes). Axis label: Thousands]
Net loss of jobs but consistent with restructuring and creative destruction

LBD: Entrepreneurial activity over time and across states
Entrepreneurial activity differs across states…

Available Data
• Confidential Microdata only through the RDCs

LBD: Public Use Products
• Business Dynamics Statistics
  – Basic data by firm size and age across sectors, states and time. Expansions to detailed ind and geography.
  – Data visualizations available
  – http://www.ces.census.gov/index.php/bds/bds_home
• Synthetic LBD (ver. 1) – public use microdata
  – To be deployed this coming year via the Cornell Virtual RDC (http://www.vrdc.cornell.edu/news/data/lbdsynthetic-data/)
• Sister program: ILBD

Summary
• Very rich data by itself and when linked to other products
• A NAS study “Understanding Business Dynamics” discusses the importance of these data for accurate and timely measurement of critical economic and social concepts
• Lots of research opportunities…

More information about LBD and BDS can be found at Center for Economic Studies http://www.ces.census.gov
You can email me at Javier.miranda@census.gov

An Overview of Data from the Economic Directorate
Shawn D. Klimek
U.S. Census Bureau

The Business Register
• The primary frame for establishment and firm level surveys
• Identifying Business Units
  – Employer Identification Number (EIN)
  – Survey Unit Identifier (SURVUID, CFN)
  – Firm Identifier (ALPHA, FIRMID)
  – Social Security Number (SSN, PIK)

Hierarchy of BR Identifiers
ENTERPRISE (1234560000)
  SURVUID1 (EIN1)
  SURVUID2 (EIN2)
  SURVUID3 (EIN2)

Structure of Identifiers
• Census File Numbers (CFN, pre-2001)
  – Single Units (0+EIN)
  – Multi-units (ALPHA+PLANT #, e.g. 123456 0001)
• Survey Unit ID (2002 and later)
  – 2XXXXXXXXX (Survey Unit)
  – 8XXXXXXXXX (Alternate Reporting Unit)
• Firm ID
  – SU (0+EIN)
  – MU (ALPHA+0000)

Business Register Data Sources
• NAICS Industry Codes
  – Economic Census and Surveys
  – Bureau of Labor Statistics
  – Internal Revenue Service
  – Social Security Administration

Business Register Data Sources
• Firm Ownership & Control
  – Annual Survey of Manufactures (~50,000 estabs)
  – Company Organization Survey
    • Annual
    • Firms >250 employees (40,000 firms)
    • List of establishments, basic frame information
  – Economic Census
    • Every 5 years
    • All MU establishments (~1.6 million)
    • Sample of SU firms (~2.9 million)
      – Long/short forms (1.9 million)
      – Classification forms (1 million)

Business Register Data Sources
• Geography
  – Address
    • Census Physical Address
      – Company Organization Survey
      – Economic Census
      – Other surveys
    • IRS Mailing Address
    • BLS Mailing Address

2007 Economic Census
• Mailed 4.5 out of 7 million establishments
  – All MU establishments
  – Sample of SU establishments
  – 86% response rate overall
• Roughly 600 forms designed
• Roughly 1200 NAICS industries

“Division” of Labor
• Manufacturing Construction Division (MCD)
  – Manufacturing
  – Mining
  – Construction
• Service Sector Statistics Division (SSSD)
  – Retail
  – Wholesale
  – Services
  – Communications, Utilities & Transportation

2007 Economic Census Timeline
• Collection Activities
  – October 2007 to October 2008
• Publications
  – Advance Report – early 2009
  – Industry Series – December 2009
  – Geographic Areas Series – December 2010
  – Miscellaneous Subject Series – June 2011

The most detailed snapshot of the economy
• 20,000+ items collected
• Basic Data Items
  – e.g. Payroll, Employment, Revenue
• Industry
  – Six-digit NAICS (~1,100)
• Geography
  – State & County (~3,100)
  – MSAs (~900)
  – Census Places (~5,000 out of 18,000)
• Products & Revenue Lines
• Special Inquiries

2012 Economic Census Changes
• Proposed expansion of geography
  – publishing as many Census Places as feasible
• North American Product Classification System
  – changes to manufacturing, retail, and wholesale product detail.
• NAICS 2012
  – significant reduction in the number of manufacturing industries (~260 down from ~470)
• Manufacturing “type of operations” may be coming
  – Integrated Manufacturer
  – Contract Manufacturer
  – Factoryless Goods Producers

2012 Enterprise Statistics Program
• Intellectual Property Revenue
  – All Multi-unit firms
  – Sample of Single-unit firms (100,000)
  – Different types of revenue
    • Royalties
    • Licensing Fees
    • Franchising
• Manufacturing Activities
  – Outsourcing
  – Offshoring

Business Sample Revision (BSR)
• Derived from the Business Register
• Frame for Services Sector Statistics Division (SSSD)
  – Services: Quarterly & Annual, Expenses
  – Retail: Monthly & Annual, Expenses
  – Wholesale: Monthly & Annual, Expenses
• Sampling Units
  – Firm Level, Industry Units
  – EIN Level, Industry Units

Complications – e.g. productivity
• Outside of manufacturing we collect input data in a number of programs
  – Capital
    • Annual Capital Expenditures Survey (ACES)
    • Firm Level
  – Employment & Payroll
    • Business Register, Census, Annuals
  – Other inputs (e.g. inventories, benefits)
    • Annuals
    • BSR Units
• Relatively few projects request these data, but replacement of the Assets and Expenditures Survey (1992) and Business Expenses Survey (BES) with the Annuals means demand should be increasing.
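
A minimal sketch of the identifier conventions listed on the “Structure of Identifiers” slide earlier in this talk: a pre-2001 CFN is either 0+EIN (single unit) or ALPHA+PLANT # (multi-unit), and the firm ID is 0+EIN or ALPHA+0000. The function name, the 10-character length check, and the use of a leading "0" to recognize single units are assumptions made for illustration; this is not Census Bureau code.

```python
def firm_id_from_cfn(cfn: str) -> str:
    """Derive a firm ID from a pre-2001 Census File Number (CFN).

    Assumed layout (10 characters once spaces are removed):
      single unit: "0" + 9-digit EIN         -> firm ID is 0+EIN (the CFN itself)
      multi-unit:  6-char ALPHA + 4-digit plant number -> firm ID is ALPHA+0000
    Treating a leading "0" as the single-unit marker is an assumption.
    """
    cfn = cfn.replace(" ", "")
    if len(cfn) != 10:
        raise ValueError(f"expected 10 characters, got {len(cfn)}: {cfn!r}")
    if cfn.startswith("0"):
        # Single unit: the firm and the establishment share one identifier.
        return cfn
    # Multi-unit: keep ALPHA and zero out the plant number.
    return cfn[:6] + "0000"


if __name__ == "__main__":
    # The multi-unit example from the slide maps to the ENTERPRISE ID shown
    # on the "Hierarchy of BR Identifiers" slide.
    print(firm_id_from_cfn("123456 0001"))  # 1234560000
    # A made-up single-unit CFN (0 + a fictional 9-digit EIN).
    print(firm_id_from_cfn("0123456789"))   # 0123456789
```

Collapsing the plant number in this way groups every establishment of a multi-unit firm under one key, which is the role the slides assign to the firm identifier.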

Concluding Remarks
• Core Programs for research
  – Business Register
  – Economic Census
  – Annual Survey of Manufactures
• Many other programs…
  – Company Organization Survey
  – Indicators (M3, Retail, Wholesale)

Longitudinal Employer-Household Dynamics (LEHD) Program
A Dynamic Data Source for the 21st Century
Erika McEntarfer
LEHD Economic Research Group
Center for Economic Studies, U.S. Census Bureau
Disclaimer: All data examples are fictional and do not reflect any individual or firm data. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau.

What is LEHD?
• At its core, LEHD is a National Longitudinal Job Frame
  – Based on UI-Wage and other administrative data sources
• Primary Products
  – Public use products: QWI, OnTheMap
  – Rich micro data for research in the RDCs

Where does LEHD fit within the Census Bureau’s data infrastructure?
• The Census Bureau maintains national frames of household and business establishments
• Household Frame: Master Address File
  • Decennial Census, ACS, CPS, SIPP, etc.
• Establishment Frame: Business Register
  • Economic Census, Monthly and Annual Surveys, Longitudinal Business Database, County Business Patterns, etc.

LEHD is a national jobs frame
• Jobs are the unit of analysis in LEHD data
  – Jobs are an employer-employee pair for a given time period
• Integrate with
  – Person and Household Data via “employee” information
  – Establishment and Firm data via “employer” information
• Integration permits:
  – Improved Public Use Products
  – Richer Microdata for Research (via the Research Data Centers)

The Concept – Data Integration
Longitudinal National Frame of Jobs
• Leverage existing data
• Create new data and products
• Make valid detailed data available while protecting confidentiality
• Cost-effective
• No respondent burden
New data and products

Local Employment Dynamics
• A voluntary partnership between the states and the U.S. Census Bureau
• States supply quarterly worker (UI wage) and business (QCEW) records
• Census Bureau merges the state records with other data to produce new data and products about jobs, workers, industries and your local economy

LEHD microdata available for research in the RDCs
Employment History File (EHF)

PIK      SEIN    Q1    Q2    Q3    Q4    Q5
Person1  Firm A  7000  7000  3000  0     0
Person1  Firm B  0     0     4000  8000  8000
Person2  Firm A  500   0     0     0     0
Person2  Firm D  0     1000  1000  0     0
Person2  Firm F  0     0     3000  4000  4000

Callout: Changes jobs in Q3
Unit of observation is a job
Universe is jobs covered by State UI

LEHD microdata available for research in the RDCs
Employer Characteristics File (ECF)
Universe is employers reporting QCEW data

SEIN   SEINUNIT    Qtr  Industry  M1size  M2size  M3size
FirmA  Unit1       1    333333    302     335     330
FirmA  Unit2       1    666111    4030    4032    4031
FirmA  Unit3       1    444222    20      23      21
FirmB  Singleunit  1    771111    1       1       0
FirmC  Singleunit  1    666622    5       7       7

Unit of observation is a State UI taxpayer ID

LEHD microdata available for research in the RDCs
Individual Characteristics File (ICF)

PIK      DOB         Sex  Race
Person1  MM/DD/YYYY  M/F  Race1
Person2  MM/DD/YYYY  M/F  Race4
Person3  MM/DD/YYYY  M/F  Race1

Demographic information from Census surveys and SSA administrative data.
Unit of observation is a Person ID (PIK)

Linking the data for analysis
Geo-coded Address List: Person and Firm address data

ICF
PIK      DOB     Sex  Race
Person1  1/3/73  F    White
Person2  3/1/37  F    Asian

EHF
PIK      SEIN    Q1    Q2    Q3
Person1  Firm A  7000  7000  3000
Person1  Firm B  0     0     4000
Person2  Firm A  500   0     0
Person2  Firm D  0     1000  1000
Person2  Firm F  0     0     3000

ECF
SEIN   SEINUNIT  Qtr  Industry  M1size  M2size  M3size
FirmA  Unit1     1    333333    302     335     330
FirmA  Unit2     1    666111    4030    4032    4031
FirmA  Unit3     1    444222    20      23      21

U2W: imputes PIK -> SEINUNIT

LEHD microdata available for research in the RDCs
• Employment History Files
  • PIK-level file, wage and employment history
• Employer Characteristics Files
  • SEIN-level file, information on employers
• Individual Characteristics File
  • Worker characteristics
• Geo-coded Address List
  • SEIN and PIK addresses
• Unit-to-Worker Imputation File
  • Impute from SEIN to establishment
• Business Register Bridge

Questions for Research: Example
Business Formation and Innovation
• Business formation is critical for job and productivity growth
• New firms are often small, sole proprietors and an important fraction start as micro-enterprises (non-employer firms)
• By integrating LEHD microdata with business microdata, researchers can track business startups.
  – Where did the entrepreneur come from?
    • What type of firm was the entrepreneur working at?
    • Are some business types and locations especially effective incubators of new firms?
  – What kinds of jobs do start-ups create?
    • What kind of job paths are there at successful startups?
    • Do workers at startups come from the community or are the workers migrants?

Questions for Research: Example
Displaced worker outcomes
• What happens to the workers at establishments that have mass layoff events?
  – LEHD data allow researchers to follow these workers to their subsequent jobs
  – Can examine their wage outcomes and the characteristics of the businesses that reemploy them.
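
As a rough illustration of following workers to their subsequent jobs, the sketch below uses the fictional EHF rows shown in the slides above. Treating the highest-paying SEIN in a quarter as the worker's main employer is an assumption made here for illustration, and the function names are hypothetical; this is not LEHD code.

```python
# Fictional EHF rows copied from the slide: (PIK, SEIN, earnings for Q1..Q5).
EHF = [
    ("Person1", "Firm A", [7000, 7000, 3000, 0, 0]),
    ("Person1", "Firm B", [0, 0, 4000, 8000, 8000]),
    ("Person2", "Firm A", [500, 0, 0, 0, 0]),
    ("Person2", "Firm D", [0, 1000, 1000, 0, 0]),
    ("Person2", "Firm F", [0, 0, 3000, 4000, 4000]),
]


def main_employer_by_quarter(ehf, pik):
    """Per quarter, return the SEIN paying this PIK the most (None if no earnings)."""
    n_quarters = len(ehf[0][2])
    result = []
    for q in range(n_quarters):
        jobs = [(earn[q], sein) for p, sein, earn in ehf if p == pik and earn[q] > 0]
        result.append(max(jobs)[1] if jobs else None)
    return result


def job_changes(ehf, pik):
    """List (quarter, previous SEIN, new SEIN) each time the main employer changes.

    Quarters with no earnings are skipped, so a worker who is re-employed
    after a gap still shows up as a change.
    """
    changes = []
    last = None
    for q, sein in enumerate(main_employer_by_quarter(ehf, pik)):
        if sein is None:
            continue
        if last is not None and sein != last:
            changes.append((f"Q{q + 1}", last, sein))
        last = sein
    return changes


if __name__ == "__main__":
    print(job_changes(EHF, "Person1"))  # [('Q3', 'Firm A', 'Firm B')]
    print(job_changes(EHF, "Person2"))  # [('Q2', 'Firm A', 'Firm D'), ('Q3', 'Firm D', 'Firm F')]
```

With the real files, the SEIN of the new employer would then be matched to the ECF to describe the business that re-employs the worker.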
• Tracking employment outcomes for workers who are displaced
  • How long does it take to become re-employed?
  • What types of jobs are they hired into (location, industry)?
  • What are the earnings outcomes?

Summary: Research using LEHD data in RDCs
• LEHD microdata offer many unique advantages for economic research:
  • Longitudinal linked employer-employee data
  • Follow employment histories of workers
  • Can identify nascent firms and follow them over time
  • Can identify co-workers
• Ability to link at the micro (individual, household, establishment, firm) level records from different census, survey and administrative programs, as well as researcher provided data.
  – Dramatically increases the analytical power of the data.

Newly Discovered Microdata on U.S. Manufacturing Plants from the 1950s and 1960s
Randy A. Becker and Cheryl A. Grim
Center for Economic Studies
U.S. Bureau of the Census
January 2011

Disclaimer
The opinions and conclusions expressed here today are our own and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed.

[Images: Unisys Clearpath IX 4400; UNIVAC I and UNIVAC 1105]

Historical Microdata Recovery from Unisys Mainframe
Challenges
– Arcane, proprietary file format (CENIO)
– Data completely unstructured
  • And record layout may no longer exist.
– Employed one or more (now) esoteric character sets (not ASCII)
  • E.g., FIELDATA, Excess-3, EBCDIC, Binary integer numeric
Recovered before decommissioning in Spring 2010:
– Over 2,500 tapes containing more than 7,000 files
– Business microdata
  • Covering nearly all sectors of the economy, including manufacturing, mining, retail, wholesale, services, construction, transportation, and agriculture.
  • From as early as 1953 (and perhaps 1947)

Making the Data “Usable”
For technical reasons, the data were downloaded from the Unisys in two forms:
– Assuming the data were all Excess-3
– Assuming the data were all FIELDATA
If the data are in another character set (e.g., Binary integer numeric), the “gibberish” in the above FIELDATA file must be converted to ASCII using the implied mapping.
Challenges in creating ASCII and SAS datasets:
– Data in a record might employ multiple character sets (e.g., both Excess-3 & Binary, depending on variable).
– Record layout may no longer exist (explaining variables, their lengths, and character set employed), or it may be wrong.
  • An electronic file containing the first 100 records using 5 different assumptions of character set can help reveal variable length & character set.
  • Existing microdata (and published data) can potentially help in determining what variables are.

Getting Access to these Data
If research does NOT hinge critically on recovered data
– Submit CES proposal as usual, requesting recovered data
  • Development of these data can be cited as a benefit
  • Within 6 months, the researcher provides internal technical note concerning the data quality, cleaned-up datasets, programs, and documentation
If research DOES hinge critically on recovered data
– Feasibility access, provide a brief description of intended research
  • Researcher will obtain SSS clearance
  • Access only to historical data of interest and (if requested) analogous data for later years for quality assurance purposes
  • RDC lab fees are waived
  • The project has no public output
  • Within 6 months, the researcher provides internal technical note concerning the data quality, cleaned-up datasets, programs, and documentation
– Research access, submit CES proposal as usual
  • Feasibility work can be cited as a benefit

Current U.S. Manufacturing Microdata
Annual Survey of Manufactures (ASM)
– 1972 to present
Census of Manufactures (CM)
– 1963, 1967, and every 5 years thereafter
Longitudinally-consistent establishment identifiers
– Plant entry, exit, growth, change
Cross-sectional links to data from other surveys of manufactures
– R&D, PACE, MECS, PCU, SMT

Expansion of These Data Now Possible
Individual ASMs from 1954-1964 and 1966-1971.
Longitudinally-linked, plant-level ASM data for:
– 25 selected 4-digit SIC industries
– 1954-1961
– Perhaps through 1963 and perhaps back to 1947.
Hundreds of rolls of 16 mm microfilm containing images of completed survey forms from:
– 1954-1958 ASM
– 1958 CM
Decades-old research datasets by Richard & Nancy Ruggles, Zvi Griliches, and Lawrence Klein.

1954-1958 ASM Shuttle Form
Years
Data items
Number of observations

Match rate: 48%
Weighted match rate: 66%

Match rate: 83%
Weighted match rate: 93%
Same SIC & county: 91%

Match rate: 99.9%
Weighted match rate: forthcoming
Year-to-year match rate: 78% to 98%

Match rate: 91%
Weighted match rate: 97%
Same SIC & county: ~100%

Conclusion
Much more work needs to be done
– Constructing linkages
– Differentiating between multiple versions of same
– “Proving in” the data (e.g., tab to published totals)
Why is this worthwhile?
– Data over a few more business cycles
– New baselines, for example:
  • The “heyday” of U.S. manufacturing
  • Before 1970s energy crisis
  • Before environmental regulation
Start to make plans to use these data!

Questions?
randy.a.becker@census.gov
cheryl.ann.grim@census.gov
www.census.gov/ces/

A Guide to the Proposal Process and Using an RDC
James C. Davis
Boston Census Research Data Center
Center for Economic Studies
US Bureau of the Census
Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed.
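
Agenda
• Process for accessing restricted-use data
  – Research Data Center (RDC)
  – Using an RDC
  – Proposal Process
  – Research Examples

Research Data Center (RDC)
• Census Bureau – university partnerships
  – RDC fees
• Secure access to confidential microdata
  – Thin client access to Census Linux servers
  – Census Bureau and other Federal statistical data
• Authorized researchers on approved projects
  – Proposal
  – RDC analysis
  – Statistical estimates disclosure

Why Restrict Microdata Access?
Titles 13 (Census) / 26 (IRS) U.S.C. and CIPSEA protect confidentiality
– respondent cannot be identified
– only Census employees and temporary staff can access microdata
– use limited to statistical purpose
– access must potentially provide legitimate benefits to Census Bureau programs

Proposal Process
• Preliminary proposal
  – www.ces.census.gov
• Proposal development
  – Involve RDC staff
• Census Review
  – Feasibility
  – Requirement of benefits to Census
  – Scientific merit
  – Statistical purpose
  – Need for non-public data
  – Risk of disclosure
  – Availability of resources
• Other Agency Review
• Special Sworn Status application

Example Proposal Outline
• Overview
• Benefits to Census
• Methodology
  – Estimating equations
• Required Data
• Expected Output
• Duration and Funding

9 Criteria for Benefits
• Understanding/improving the quality of data
• Leading to new or improved methodology to collect, measure, or tabulate
• Enhancing the data collected (e.g. improving imputations for non-response, developing links across time or entities)
• Identifying limitations/improving the Business Register
• Documenting new data collection needs
• Constructing, verifying, improving sampling frames
• Preparing estimates/characteristics of population
• Developing methodology for estimating nonresponse
• Developing statistical weights for a survey

Data Availability
Census Bureau Data
– Economic Data
  • establishment or firm level
– Demographic Data
  • household or individual level
– Combined Econ/Demo Data
  • Longitudinal Employer-Household Dynamics (LEHD)
Other Agency Data
– National Center for Health Statistics (NCHS)
– Agency for Healthcare Research and Quality (AHRQ)

RDC Economic Data Advantages
• No publicly-available microdata
  – Internal data at establishment and firm level
  – Universal scope
  – Detailed industry and geography
• Linking Data
  – Consistent identifiers
  – Business register
• External data

Economic Research Examples
• Bernard, Redding, Schott
  – (2010), “Multiple-Product Firms and Product Switching,” American Economic Review
  – (forthcoming), “Multi-Product Firms and Trade Liberalization,” Quarterly Journal of Economics
• Census of Manufactures, Longitudinal Business Database, Business Register
• One half of firms alter their mix of products every five years
• Firms exporting many products also serve many export destinations and export more of a given product to a given destination

Economic Research Examples
• Ellison, Glaeser, Kerr (2010), “What Causes Industry Agglomeration? Evidence from Coagglomeration Patterns,” American Economic Review
  – Economic Census and LBD
  – Construct pairwise coagglomeration indices for US manufacturing industries
  – Relate coagglomeration levels to the degree to which industry pairs share goods, labor, or ideas

Economic Research Examples
• Greenstone, Hornbeck, Moretti (2010), “Identifying Agglomeration Spillovers: Evidence from Winners and Losers of Large Plant Openings,” Journal of Political Economy
  – Economic Census and LBD
  – Winning and losing counties have similar trends in incumbents’ TFP prior to a large new plant opening.
  – Five years after the opening, incumbent plants’ TFP is 12 percent higher in winning counties.

Economic Research Examples
• Chemmanur, He, Nandy (2010), “The Going Public Decision and the Product Market,” Review of Financial Studies
  – Longitudinal Business Database (LBD), Census of Manufactures, Annual Survey of Manufactures
  – A private firm’s characteristics (e.g. TFP, sales growth) significantly affect its likelihood of going public after controlling for its access to private financing
  – IPOs of firms occur at the peak of their productivity cycle

Conclusions
• Start the process early
• Use standard data sets if time-constrained
• Write proposals geared towards multiple papers
• Use proposal development as research time
  – Understand the data & data limitations
  – Read on-line documentation
    • CES Working Papers
    • Sampling Methodology/Survey Forms
    • History of the Economic Census
• Time and data requests are crucial components
  – adding data and/or time is difficult for Census projects once underway
• Remember that the Predominant Purpose is to benefit Census
• www.ces.census.gov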
Evidence from Coagglomeration Patterns,” American Economic Review – Economic Census and LBD – Construct pairwise coagglomeration indices for US manufacturing industries – Relate coagglomeration levels to the degree to which industry pairs share goods, labor, or ideas 81 Economic Research Examples • Greenstone, Hornbeck, Moretti (2010), “Identifying Agglomeration Spillovers: Evidence from Winners and Losers of Large Plant Openings,” Journal of Political Economy – Economic Census and LBD – Winning and losing counties have similar trends in incumbents’ TFP prior to a large new plant opening. – Five years after the opening, incumbent plants’ TFP is 12 percent higher in winning counties. 82 Economic Research Examples • Chemmanur, He, Nandy (2010), “The Going Public Decision and the Product Market,” Review of Financial Studies – Longitudinal Business Database (LBD), Census of Manufacturers, Annual Survey of Manufacturers – A private firm’s characteristics (e.g. TFP, sales growth) significantly affect its likelihood of going public after controlling for its access to private financing – IPOs of firms occur at the peak of their productivity cycle 83 Conclusions • • • • Start the process early Use standard data sets if time-constrained Write proposals geared towards multiple papers Use proposal development as research time – Understand the data & data limitations – Read on-line documentation • CES Working Papers • Sampling Methodology/Survey Forms • History of the Economic Census • Time and data requests are crucial components – adding data and/or time is difficult for Census projects once underway • Remember that the Predominant Purpose is to benefit Census • www.ces.census.gov 84