firm - APDU: The Association of Public Data Users

The Employer Universe:
The Business Register and the
Longitudinal Business Database
Javier Miranda, U.S. Census Bureau
Census Research Data Center Network
Overview
• Employer universe business data at the
US Census Bureau
• Research
• Public use versions
The Business Register: The Census
Bureau’s Business Master List
• Universe coverage of employers in the U.S. with IRS
filings
• Transaction list of administrative records (income, payroll)
• Enhanced with Census Collections to provide detail
• Origin and Use
• Enumeration list for census and frame for surveys
• Central storage of admin data for statistical products
• Source data for Census products (CBP, LBD, BDS, BITS…)
• Structure:
• Annual snapshots back to 1974, Single/Multi unit files
• Statistical Units:
• EIN (the admin unit), Establishments and firms
The Business Register: The Census
Bureau’s Business Master List
• Data in the BR
• Industry, Geography, Employment, Payroll, LFO, Sales,
Name and Address…
• Data often require substantial value added to be
utilized for research.
• Solution: The LBD
Longitudinal Business Database (LBD)
• Longitudinal Universe Database of US Employer
Business Establishments
• Uses Census Business Register longitudinal
linkages of both firms and establishments
– Census uniquely tracks firms and establishments through
Company Organization Survey and Economic Censuses
(and other surveys)
• All employers in the U.S.
– Complete sectoral coverage
– Detailed geography and industry
– Basic backbone to which all other Census business data
can be linked
• Long time series 1976-2008
• Firm and establishment characteristics
• Including size and age. Age is critical to understanding dynamics
and entrepreneurship.
LBD: Large vs Small vs Young
Small (1-500)
Large (500+)
Important to put job creation and destruction in context…
LBD: Life cycle dynamics of
businesses and who creates jobs
[Figure: Net Employment Growth for Continuing Firms by Firm Age. Horizontal axis: Firm Age Class (1 through 16+); vertical axis: 0 to 0.2. Series: Age Only; With Base Year Size Controls; With Current Year Size Controls.]
LBD: Life cycle dynamics of
businesses and who destroys jobs
[Figure: Job Destruction from Firm Exit by Firm Age. Horizontal axis: Firm Age Class; vertical axis: 0 to 0.15. Series: Age Only; With Base Year Size Controls; With Current Year Size Controls.]
Census Data: Productivity Growth
[Figure: Productivity Relative to Mature Surviving Incumbents. Categories: Young Exits; Mature Exits; Young Survivors; Young Survivors Five Years Later. Vertical axis: -35% to 10%. Bar labels surviving in the extracted text: 5%, 3%, -27%, -32%.]
LBD: The effect of business cycle
dynamics and credit conditions on
firms and job creation
[Figure: Net Job Creation: Effects of Business Cycle* by Firm Size, 1981-2008. Vertical axis: -4.000 to 0.000. Series: Urate (ind*yr fixed effects); Urate; Urate (by itself).]
[Figure: Net Job Creation: Effects of FHFA* Prices by Firm Size, 1981-2008. Vertical axis: -0.050 to 0.200. Series: FHFA (ind*yr fixed effects); FHFA; FHFA (by itself).]
Forms of financing differ for small and large firms…
Private equity
[Figure: Employment under Buyout Targets: By Year and as a Percent of Economy, 1980-2005. Left axis: Percent of LBD Employment (0 to 1.6); right axis: Buyout Employment, in thousands (0 to 1800).]
Large growth of private equity since the 1980s
Net loss of jobs but consistent with restructuring and creative destruction
LBD: Entrepreneurial activity over time
and across states
Entrepreneurial
activity differs
across states…
Available Data
• Confidential Microdata only through the
RDCs
LBD: Public Use Products
• Business Dynamics Statistics
– Basic data by firm size and age across sectors,
states and time. Expansions to detailed ind and
geography.
– Data visualizations available
– http://www.ces.census.gov/index.php/bds/bds_home
• Synthetic LBD (ver. 1) – public use microdata
– To be deployed this coming year via the Cornell
Virtual RDC
(http://www.vrdc.cornell.edu/news/data/lbdsynthetic-data/)
• Sister program: ILBD
Summary
• Very rich data by itself and when linked to other
products
• A NAS study “Understanding Business Dynamics”
discusses the importance of these data for accurate
and timely measurement of critical economic and
social concepts
• Lots of research opportunities…
More information about LBD and
BDS can be found at
Center for Economic Studies
http://www.ces.census.gov
You can email me at
Javier.miranda@census.gov
An Overview of Data from the
Economic Directorate
Shawn D. Klimek
U.S. Census Bureau
The Business Register
• The primary frame for establishment
and firm level surveys
• Identifying Business Units
– Employer Identification Number (EIN)
– Survey Unit Identifier (SURVUID, CFN)
– Firm Identifier (ALPHA, FIRMID)
– Social Security Number (SSN, PIK)
Hierarchy of BR Identifiers
ENTERPRISE
(1234560000)
SURVUID1
(EIN1)
SURVUID2
(EIN2)
SURVUID3
(EIN2)
Structure of Identifiers
• Census File Numbers (CFN, pre-2001)
– Single Units (0+EIN)
– Multi-units (ALPHA+PLANT #, e.g. 123456 0001)
• Survey Unit ID (2002 and later)
– 2XXXXXXXXX (Survey Unit)
– 8XXXXXXXXX (Alternate Reporting Unit)
• Firm ID
– SU (0+EIN)
– MU (ALPHA+0000)
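The identifier layouts above can be illustrated with a minimal Python sketch that derives a Firm ID from a pre-2001 CFN. It assumes a 10-character CFN, a 6-character ALPHA followed by a 4-digit plant number for multi-units, and that a leading "0" is enough to mark a single unit (which presumes no ALPHA starts with 0). The function names are hypothetical and are not part of any Census software.

```python
def is_multi_unit(cfn: str) -> bool:
    """Assumption: single-unit CFNs are '0' + EIN, so a leading '0' marks a single unit."""
    return not cfn.replace(" ", "").startswith("0")


def firm_id_from_cfn(cfn: str) -> str:
    """Derive a Firm ID from a 10-character pre-2001 CFN.

    Assumed layout, following the slide:
      single unit -> '0' + 9-digit EIN; the Firm ID is the same string
      multi-unit  -> 6-character ALPHA + 4-digit plant number;
                     the Firm ID is ALPHA + '0000'
    """
    cfn = cfn.replace(" ", "")  # the slide prints multi-unit CFNs with a space
    if len(cfn) != 10:
        raise ValueError(f"expected 10 characters, got {len(cfn)}: {cfn!r}")
    if not is_multi_unit(cfn):
        return cfn
    return cfn[:6] + "0000"


print(firm_id_from_cfn("123456 0001"))  # multi-unit example from the slide
print(firm_id_from_cfn("0123456789"))   # made-up single-unit CFN
```

Under these assumptions, every plant of a multi-unit firm maps to the same ALPHA+0000 value, which is the enterprise-level identifier shown in the hierarchy slide.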
Business Register Data Sources
• NAICS Industry Codes
– Economic Census and Surveys
– Bureau of Labor Statistics
– Internal Revenue Service
– Social Security Administration
Business Register Data Sources
• Firm Ownership & Control
– Annual Survey of Manufactures (~50,000 estabs)
– Company Organization Survey
• Annual
• Firms >250 employees (40,000 firms)
• List of establishments, basic frame information
– Economic Census
• Every 5 years
• All MU establishments (~1.6 million)
• Sample of SU firms (~2.9 million)
– Long/short forms (1.9 million)
– Classification forms (1 million)
Business Register Data Sources
• Geography
– Address
• Census Physical Address
– Company Organization Survey
– Economic Census
– Other surveys
• IRS Mailing Address
• BLS Mailing Address
2007 Economic Census
• Mailed 4.5 out of 7 million
establishments
– All MU establishments
– Sample of SU establishments
– 86% response rate overall
• Roughly 600 forms designed
• Roughly 1200 NAICS industries
“Division” of Labor
• Manufacturing Construction Division (MCD)
– Manufacturing
– Mining
– Construction
• Service Sector Statistics Division (SSSD)
– Retail
– Wholesale
– Services
– Communications, Utilities & Transportation
2007 Economic Census Timeline
• Collection Activities
– October 2007 to October 2008
• Publications
– Advance Report – early 2009
– Industry Series – December 2009
– Geographic Areas Series – December
2010
– Miscellaneous Subject Series – June 2011
The most detailed snapshot of the
economy
• 20,000+ items collected
• Basic Data Items – e.g. Payroll, Employment,
Revenue
• Industry
– Six-digit NAICS (~1,100)
• Geography
– State & County (~3,100)
– MSAs (~900)
– Census Places (~5,000 out of 18,000)
• Products & Revenue Lines
• Special Inquiries
2012 Economic Census Changes
• Proposed expansion of geography – publishing as
many Census Places as feasible
• North American Product Classification System –
changes to manufacturing, retail, and wholesale
product detail.
• NAICS 2012 – significant reduction in the number of
manufacturing industries (~260 down from ~470)
• Manufacturing “type of operations” may be coming
– Integrated Manufacturer
– Contract Manufacturer
– Factoryless Goods Producers
2012 Enterprise Statistics Program
• Intellectual Property Revenue
– All Multi-unit firms
– Sample of Single-unit firms (100,000)
– Different types of revenue
• Royalties
• Licensing Fees
• Franchising
• Manufacturing Activities
– Outsourcing
– Offshoring
Business Sample Revision (BSR)
• Derived from the Business Register
• Frame for Services Sector Statistics Division
(SSSD)
– Services: Quarterly & Annual, Expenses
– Retail: Monthly & Annual, Expenses
– Wholesale: Monthly & Annual, Expenses
• Sampling Units
– Firm Level, Industry Units
– EIN Level, Industry Units
Complications – e.g. productivity
• Outside of manufacturing we collect input data in a number of
programs
– Capital
• Annual Capital Expenditures Survey (ACES)
• Firm Level
– Employment & Payroll
• Business Register, Census, Annuals
– Other inputs (e.g. inventories, benefits)
• Annuals
• BSR Units
• Relatively few projects request these data, but replacement of
the Assets and Expenditures Survey (1992) and Business
Expenses Survey (BES) with the Annuals means demand
should be increasing.
Concluding Remarks
• Core Programs for research
– Business Register
– Economic Census
– Annual Survey of Manufactures
• Many other programs…
– Company Organization Survey
– Indicators (M3, Retail, Wholesale)
Longitudinal Employer-Household
Dynamics (LEHD) Program
A Dynamic Data Source for the 21st Century
Erika McEntarfer
LEHD Economic Research Group
Center for Economic Studies, U.S. Census Bureau
Disclaimer: All data examples are fictional and do not reflect any individual or firm
data. Any opinions and conclusions expressed herein are those of the authors and
do not necessarily represent the views of the U.S. Census Bureau.
What is LEHD?
• At its core, LEHD is a National
Longitudinal Job Frame
– Based on UI-Wage and other
administrative data sources
• Primary Products
– Public use products: QWI, OnTheMap
– Rich micro data for research in the RDCs
Where does LEHD fit within the
Census Bureau’s data infrastructure?
• The Census Bureau maintains national
frames of household and business
establishments
• Household Frame: Master Address File
• Decennial Census, ACS, CPS, SIPP, etc.
• Establishment Frame: Business Register
• Economic Census, Monthly and Annual
Surveys, Longitudinal Business Database,
County Business Patterns, etc.
LEHD is a national jobs frame
• Jobs are the unit of analysis in LEHD data
– Jobs are an employer – employee pair for a given time
period
• Integrate with
– Person and Household Data via “employee” information
– Establishment and Firm data via “employer” information
• Integration permits:
– Improved Public Use Products
– Richer Microdata for Research (via the Research Data
Centers)
The Concept – Data Integration
Longitudinal National
Frame of Jobs
• Leverage existing
data
• Create new data and
products
• Make valid detailed
data available while
protecting confidentiality
• Cost-effective
• No respondent
burden
New data and products
Local Employment Dynamics
• A voluntary partnership between the states
and the U.S. Census Bureau
• States supply quarterly worker (UI wage)
and business (QCEW) records
• Census Bureau merges the state records
with other data to produce new data and
products about jobs, workers, industries and
your local economy
LEHD microdata available for
research in the RDCs
Employment History File (EHF)
PIK      SEIN    Q1    Q2    Q3    Q4    Q5
Person1  Firm A  7000  7000  3000  0     0
Person1  Firm B  0     0     4000  8000  8000
Person2  Firm A  500   0     0     0     0
Person2  Firm D  0     1000  1000  0     0
Person2  Firm F  0     0     3000  4000  4000
Callout: Changes jobs in Q3
Unit of observation is a job
Universe is jobs covered by State UI
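As a rough illustration of how a job-level file like the EHF can be used, the Python sketch below finds the quarters in which a worker's main employer changes. It uses the fictional rows from the slide. "Main employer" here means the SEIN paying the most in a quarter, which is an assumption made for this example and not an LEHD definition; the function names are hypothetical.

```python
# Fictional EHF rows from the slide: (PIK, SEIN) -> earnings in Q1..Q5.
ehf = {
    ("Person1", "Firm A"): [7000, 7000, 3000, 0, 0],
    ("Person1", "Firm B"): [0, 0, 4000, 8000, 8000],
    ("Person2", "Firm A"): [500, 0, 0, 0, 0],
    ("Person2", "Firm D"): [0, 1000, 1000, 0, 0],
    ("Person2", "Firm F"): [0, 0, 3000, 4000, 4000],
}


def dominant_employer_by_quarter(ehf, pik):
    """Highest-paying SEIN in each quarter for one person (None if no earnings)."""
    jobs = {sein: pay for (p, sein), pay in ehf.items() if p == pik}
    n_quarters = len(next(iter(jobs.values())))
    result = []
    for q in range(n_quarters):
        best = max(jobs, key=lambda sein: jobs[sein][q])
        result.append(best if jobs[best][q] > 0 else None)
    return result


def job_change_quarters(ehf, pik):
    """1-indexed quarters where the dominant employer differs from the prior quarter."""
    dom = dominant_employer_by_quarter(ehf, pik)
    return [q + 1 for q in range(1, len(dom)) if dom[q] != dom[q - 1]]


print(job_change_quarters(ehf, "Person1"))
print(job_change_quarters(ehf, "Person2"))
```

Because the unit of observation is the job (a PIK-SEIN pair), a worker with overlapping jobs appears in several rows for the same quarter, so any analysis of transitions needs a rule like the one assumed here for picking the main job.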
LEHD microdata available for
research in the RDCs
Employer Characteristics File (ECF)
SEIN   SEINUNIT    Qtr  Industry  M1size  M2size  M3size
FirmA  Unit1       1    333333    302     335     330
FirmA  Unit2       1    666111    4030    4032    4031
FirmA  Unit3       1    444222    20      23      21
FirmB  Singleunit  1    771111    1       1       0
FirmC  Singleunit  1    666622    5       7       7
Universe is employers reporting QCEW data
Unit of observation is a State UI taxpayer ID
LEHD microdata available for
research in the RDCs
Individual Characteristics File (ICF)
PIK      DOB         Sex  Race
Person1  MM/DD/YYYY  M/F  Race1
Person2  MM/DD/YYYY  M/F  Race4
Person3  MM/DD/YYYY  M/F  Race1
Demographic information from Census surveys and SSA administrative data.
Unit of observation is a Person ID (PIK)
Linking the data for analysis
Geo-coded Address List: Person and Firm address data

ICF
PIK      DOB     Sex  Race
Person1  1/3/73  F    White
Person2  3/1/37  F    Asian

EHF
PIK      SEIN    Q1    Q2    Q3
Person1  Firm A  7000  7000  3000
Person1  Firm B  0     0     4000
Person2  Firm A  500   0     0
Person2  Firm D  0     1000  1000
Person2  Firm F  0     0     3000

ECF
SEIN   SEINUNIT  Qtr  Industry  M1size  M2size  M3size
FirmA  Unit1     1    333333    302     335     330
FirmA  Unit2     1    666111    4030    4032    4031
FirmA  Unit3     1    444222    20      23      21

U2W: imputes PIK -> SEINUNIT
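The linkage on this slide can be sketched in Python as two key lookups: PIK joins a job to the ICF, and SEIN joins it to the ECF. The rows are the fictional ones from the slide. The function names, the choice of month-1 employment as the employer size measure, and the decision to sum over units are assumptions made for illustration, not a description of LEHD processing.

```python
# Fictional rows transcribed from the slide. The slide writes "Firm A" in the
# EHF and "FirmA" in the ECF; one spelling is used here so the keys match.
icf = {
    "Person1": {"dob": "1/3/73", "sex": "F", "race": "White"},
    "Person2": {"dob": "3/1/37", "sex": "F", "race": "Asian"},
}
ehf = [  # (PIK, SEIN, earnings for Q1..Q3)
    ("Person1", "FirmA", [7000, 7000, 3000]),
    ("Person1", "FirmB", [0, 0, 4000]),
    ("Person2", "FirmA", [500, 0, 0]),
    ("Person2", "FirmD", [0, 1000, 1000]),
    ("Person2", "FirmF", [0, 0, 3000]),
]
ecf = [  # (SEIN, SEINUNIT, quarter, industry, month-1 employment)
    ("FirmA", "Unit1", 1, "333333", 302),
    ("FirmA", "Unit2", 1, "666111", 4030),
    ("FirmA", "Unit3", 1, "444222", 20),
]


def sein_employment(ecf, sein, qtr):
    """Sum month-1 employment over all units of a SEIN; None if the SEIN is absent."""
    sizes = [m1 for s, _unit, q, _ind, m1 in ecf if s == sein and q == qtr]
    return sum(sizes) if sizes else None


def link_jobs(ehf, icf, ecf, qtr):
    """Attach worker and employer characteristics to each job active in a quarter."""
    linked = []
    for pik, sein, earnings in ehf:
        pay = earnings[qtr - 1]
        if pay <= 0:
            continue  # no earnings, so no job in this quarter
        person = icf.get(pik, {})
        linked.append({
            "pik": pik,
            "sein": sein,
            "earnings": pay,
            "sex": person.get("sex"),
            "race": person.get("race"),
            "employer_m1size": sein_employment(ecf, sein, qtr),
        })
    return linked


q1 = link_jobs(ehf, icf, ecf, qtr=1)
for row in q1:
    print(row)
```

The join stops at the SEIN level because the EHF carries no SEINUNIT. Placing a worker at a specific establishment of a multi-unit employer is the job of the U2W imputation, which this sketch does not attempt.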
LEHD microdata available for research in
the RDCs
• Employment History Files
• PIK-level file, wage and employment history
• Employer Characteristics Files
• SEIN-level file, information on employers
• Individual Characteristics File
• Worker characteristics
• Geo-coded Address List
• SEIN and PIK addresses
• Unit-to-Worker Imputation File
• Impute from SEIN to establishment
• Business Register Bridge
Questions for Research: Example
Business Formation and Innovation
• Business formation is critical for job and productivity growth
• New firms are often small, sole proprietors and an important
fraction start as micro-enterprises (non-employer firms)
• By integrating LEHD microdata with business microdata,
researchers can track business startups.
– Where did the entrepreneur come from?
• What type of firm was the entrepreneur working at?
• Are some business types and locations especially effective incubators
of new firms?
– What kinds of jobs do start-ups create?
• What kind of job paths are there at successful startups?
• Do workers at startups come from the community or are the workers
migrants?
Questions for Research: Example
Displaced worker outcomes
• What happens to the workers at establishments that have mass
layoff events?
– LEHD data allow researchers to follow these workers to their
subsequent jobs
– Can examine their wage outcomes and the characteristics of the
businesses that reemploy them.
• Tracking employment outcomes for workers who are displaced
• How long does it take to become re-employed?
• What types of jobs are they hired into (location,
industry)?
• What are the earnings outcomes?
Summary: Research using LEHD
data in RDCs
• LEHD microdata offer many unique advantages for
economic research:
• Longitudinal linked employer-employee data
• Follow employment histories of workers
• Can identify nascent firms and follow them over time
• Can identify co-workers
• Ability to link at the micro (individual, household,
establishment, firm) level records from different
census, survey and administrative programs, as well
as researcher provided data.
– Dramatically increases the analytical power of the
data.
Newly Discovered Microdata on
U.S. Manufacturing Plants from
the 1950s and 1960s
Randy A. Becker and Cheryl A. Grim
Center for Economic Studies
U.S. Bureau of the Census
January 2011
Disclaimer
The opinions and conclusions expressed
here today are our own and do not
necessarily represent the views of the
U.S. Census Bureau. All results have
been reviewed to ensure that no
confidential information is disclosed.
Unisys Clearpath IX 4400
UNIVAC I and UNIVAC 1105
Historical Microdata Recovery from
Unisys Mainframe
Challenges
– Arcane, proprietary file format (CENIO)
– Data completely unstructured
• And record layout may no longer exist.
– Employed one or more (now) esoteric character sets (not ASCII)
• E.g., FIELDATA, Excess-3, EBCDIC, Binary integer numeric
Recovered before decommissioning in Spring 2010:
– Over 2,500 tapes containing more than 7,000 files
– Business microdata
• Covering nearly all sectors of the economy, including
manufacturing, mining, retail, wholesale, services, construction,
transportation, and agriculture.
• From as early as 1953 (and perhaps 1947)
Making the Data “Usable”
For technical reasons, the data were downloaded from the Unisys in
two forms:
– Assuming the data were all Excess-3
– Assuming the data were all FIELDATA
If the data are in another character set (e.g., Binary integer numeric), the
“gibberish” in the above FIELDATA file must be converted to ASCII
using the implied mapping.
Challenges in creating ASCII and SAS datasets:
– Data in a record might employ multiple character sets (e.g.,
both Excess-3 & Binary, depending on variable).
– Record layout (explaining variables, their lengths, and character
set employed) may no longer exist, or it may be wrong.
• An electronic file containing the first 100 records under 5 different
character-set assumptions can help reveal variable length & character
set.
• Existing microdata (and published data) can potentially help in determining
what the variables are.
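The idea of viewing the same records under several character-set assumptions can be sketched in Python as follows. This is a simplified illustration, not the recovery procedure used at CES. It assumes byte-aligned data, reads Excess-3 in its textbook 4-bit form (stored value = digit + 3), and uses Python's built-in `cp037` codec as a stand-in for EBCDIC. FIELDATA is omitted because its code table is not given in the slides, and the actual tape files may use different widths or packing. The function names are hypothetical.

```python
def decode_excess3_digits(data: bytes) -> str:
    """Read each 4-bit nibble as an Excess-3 digit; '?' marks invalid nibbles."""
    out = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            digit = nibble - 3
            out.append(str(digit) if 0 <= digit <= 9 else "?")
    return "".join(out)


def candidate_views(record: bytes) -> dict:
    """Render one record under several character-set assumptions, side by side."""
    return {
        "ebcdic": record.decode("cp037", errors="replace"),
        "excess3_digits": decode_excess3_digits(record),
        "binary_integer": str(int.from_bytes(record, byteorder="big")),
        "hex": record.hex(),
    }


# Made-up bytes, for illustration only.
for name, text in candidate_views(b"\xF1\xF2\xF3").items():
    print(f"{name:>15}: {text}")
```

Printing the views for the first records of a file and looking for the one that yields plausible values (digits where a count is expected, letters where a name is expected) is one way to guess field boundaries when the record layout is missing.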
Getting Access to these Data
If research does NOT hinge critically on recovered data
– Submit CES proposal as usual, requesting recovered data
• Development of these data can be cited as a benefit
• Within 6 months, the researcher provides an internal technical note concerning
the data quality, cleaned-up datasets, programs, and documentation
If research DOES hinge critically on recovered data
– Feasibility access, provide a brief description of intended research
• Researcher will obtain SSS clearance
• Access only to historical data of interest and (if requested) analogous data
for later years for quality assurance purposes
• RDC lab fees are waived
• The project has no public output
• Within 6 months, the researcher provides an internal technical note concerning
the data quality, cleaned-up datasets, programs, and documentation
– Research access, submit CES proposal as usual
• Feasibility work can be cited as a benefit
Current U.S. Manufacturing
Microdata
Annual Survey of Manufactures (ASM)
– 1972 to present
Census of Manufactures (CM)
– 1963, 1967, and every 5 years thereafter
Longitudinally-consistent establishment
identifiers
– Plant entry, exit, growth, change
Cross-sectional links to data from other surveys
of manufactures
– R&D, PACE, MECS, PCU, SMT
Expansion of These Data Now
Possible
Individual ASMs from 1954-1964 and 1966-1971.
Longitudinally-linked, plant-level ASM data for:
– 25 selected 4-digit SIC industries
– 1954-1961
– Perhaps through 1963 and perhaps back to 1947.
Hundreds of rolls of 16 mm microfilm containing images of
completed survey forms from:
– 1954-1958 ASM
– 1958 CM
Decades-old research datasets by Richard & Nancy
Ruggles, Zvi Griliches, and Lawrence Klein.
1954-1958 ASM Shuttle Form
[Table columns: Years; Data items; Number of observations]

Match rate: ≈ 48%
Weighted match rate: ≈ 66%

Match rate: ≈ 83%
Weighted match rate: 93%
Same SIC & county: 91%

Match rate: ≈ 99.9%
Weighted match rate: forthcoming
Year-to-year match rate: 78% to 98%

Match rate: ≈ 91%
Weighted match rate: ≈ 97%
Same SIC & county: ~ 100%
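The slides report both unweighted and weighted match rates without stating what the weight is. The Python sketch below shows the arithmetic under the assumption that each record carries some size measure (for example plant employment) as its weight. The function name and the sample records are made up for illustration.

```python
def match_rates(records):
    """Return (unweighted, weighted) match rates.

    records: iterable of (matched, weight) pairs, where weight is an
    assumed size measure such as plant employment.
    """
    records = list(records)
    total_weight = sum(weight for _matched, weight in records)
    matched_count = sum(1 for matched, _weight in records if matched)
    matched_weight = sum(weight for matched, weight in records if matched)
    return matched_count / len(records), matched_weight / total_weight


# Made-up example: two of four plants match, but they hold most of the weight.
sample = [(True, 900), (False, 50), (True, 40), (False, 10)]
unweighted, weighted = match_rates(sample)
print(f"match rate: {unweighted:.0%}, weighted match rate: {weighted:.0%}")
```

A weighted rate above the unweighted rate, as in each pair reported above, would be consistent with larger units matching more often than smaller ones, if the weight is indeed a size measure.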
Conclusion
Much more work needs to be done
– Constructing linkages
– Differentiating between multiple versions of same
– “Proving in” the data (e.g., tabulating to published totals)
Why is this worthwhile?
– Data over a few more business cycles
– New baselines, for example:
• The “heyday” of U.S. manufacturing
• Before 1970s energy crisis
• Before environmental regulation
Start to make plans to use these data!
Questions?
randy.a.becker@census.gov
cheryl.ann.grim@census.gov
www.census.gov/ces/
A Guide to the Proposal
Process and Using an RDC
James C. Davis
Boston Census Research Data Center
Center for Economic Studies
US Bureau of the Census
Any opinions and conclusions expressed herein are those of the authors and do not
necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to
ensure that no confidential information is disclosed.
Agenda
• Process for accessing restricted-use
data
– Research Data Center (RDC)
– Using an RDC
– Proposal Process
– Research Examples
Research Data Center (RDC)
• Census Bureau – university partnerships
– RDC fees
• Secure access to confidential microdata
– Thin client access to Census linux servers
– Census Bureau and other Federal statistical data
• Authorized researchers on approved projects
– Proposal
– RDC analysis
– Statistical estimates disclosure
Why Restrict Microdata Access?
Titles 13 (Census) /26 (IRS) U.S.C. and
CIPSEA protect confidentiality
– respondent cannot be identified
– only Census employees and temporary staff
can access microdata
– use limited to statistical purpose
– access must potentially provide legitimate
benefits to Census Bureau programs
Proposal Process
• Preliminary proposal
– www.ces.census.gov
• Proposal development
– Involve RDC staff
• Census Review
– Feasibility
– Requirement of benefits to Census
– Scientific merit
– Statistical purpose
– Need for non-public data
– Risk of disclosure
– Availability of resources
• Other Agency Review
• Special Sworn Status application
Example Proposal Outline
• Overview
• Benefits to Census
• Methodology
– Estimating equations
• Required Data
• Expected Output
• Duration and Funding
9 Criteria for Benefits
• Understanding/improving the quality of data
• Leading to new or improved methodology to collect,
measure, or tabulate
• Enhancing the data collected (e.g. improving
imputations for non-response, developing links
across time or entities)
• Identifying limitations/improving the Business
Register
• Documenting new data collection needs
• Constructing, verifying, improving sampling frames
• Preparing estimates/characteristics of population
• Developing methodology for estimating nonresponse
• Developing statistical weights for a survey
Data Availability
Census Bureau Data
– Economic Data
• establishment or firm level
– Demographic Data
• household or individual level
– Combined Econ/Demo Data
• Longitudinal Employer-Household Dynamics (LEHD)
Other Agency Data
– National Center for Health Statistics (NCHS)
– Agency for Healthcare Research and Quality
(AHRQ)
RDC Economic Data Advantages
• No publicly-available microdata
– Internal data at establishment and firm level
– Universal scope
– Detailed industry and geography
• Linking Data
– Consistent identifiers
– Business register
• External data
Economic Research Examples
• Bernard, Redding, Schott
– (2010), “Multiple-Product Firms and Product
Switching,” American Economic Review
– (forthcoming), “Multi-Product Firms and Trade
Liberalization,” Quarterly Journal of Economics
• Census of Manufactures, Longitudinal Business
Database, Business Register
• One half of firms alter their mix of products every five
years
• Firms exporting many products also serve many export
destinations and export more of a given product to a
given destination
Economic Research Examples
• Ellison, Glaeser, Kerr (2010), “What
Causes Industry Agglomeration?
Evidence from Coagglomeration
Patterns,” American Economic Review
– Economic Census and LBD
– Construct pairwise coagglomeration
indices for US manufacturing industries
– Relate coagglomeration levels to the
degree to which industry pairs share
goods, labor, or ideas
Economic Research Examples
• Greenstone, Hornbeck, Moretti (2010),
“Identifying Agglomeration Spillovers:
Evidence from Winners and Losers of Large
Plant Openings,” Journal of Political Economy
– Economic Census and LBD
– Winning and losing counties have similar trends in
incumbents’ TFP prior to a large new plant
opening.
– Five years after the opening, incumbent plants’
TFP is 12 percent higher in winning counties.
Economic Research Examples
• Chemmanur, He, Nandy (2010), “The Going
Public Decision and the Product Market,”
Review of Financial Studies
– Longitudinal Business Database (LBD), Census of
Manufactures, Annual Survey of Manufactures
– A private firm’s characteristics (e.g. TFP, sales
growth) significantly affect its likelihood of going
public after controlling for its access to private
financing
– IPOs of firms occur at the peak of their productivity
cycle
Conclusions
• Start the process early
• Use standard data sets if time-constrained
• Write proposals geared towards multiple papers
• Use proposal development as research time
– Understand the data & data limitations
– Read on-line documentation
• CES Working Papers
• Sampling Methodology/Survey Forms
• History of the Economic Census
• Time and data requests are crucial components –
adding data and/or time is difficult for Census
projects once underway
• Remember that the Predominant Purpose is to
benefit Census
• www.ces.census.gov