Who’s Employed? An In-Depth Comparison of Employment Data Sources
Gregory Giaimo, PE
Samuel Granato, PE
Andrew Hurst
The Ohio Department of Transportation, Division of Planning
Presented at the 14th Transportation Planning Applications Conference, May 6, 2013

Overview
• Motivation
• Macro View: QCEW vs. BEA Control Totals for Data Expansion
• Micro View: QCEW vs. Purchased Data for Possible Replacement

Motivation
• For travel modeling, we want employment data with:
  • Accuracy (correct employment/employers)
  • Completeness (all employment/employers)
  • Spatial precision (geocodable address of individual employers at the actual place of business activity)
  • Temporal consistency (no defunct businesses; contains new businesses extant on the supposed date of the dataset)
  • Categorization (correct NAICS or similar)
  • Disaggregation (individual employer records allow data checking and finer TAZ disaggregation; future travel demand models, particularly freight, will include disaggregate attraction-end modeling, including business synthesizers similar to current household synthesizers)
• There are a number of potential employment data sources

Motivation
• QCEW (Quarterly Census of Employment and Wages)
  • Regulatory dataset for Federal unemployment insurance
  • Pros: cheap; regulatory basis implies it is complete and temporally consistent for covered sectors
  • Cons: confidentiality restrictions; uncovered sectors for those exempt from Federal unemployment insurance laws (sole proprietors, small farms, railroads, military, small non-profits, student workers, elected officials, etc.); sub-county location must be geocoded by the user from mailing addresses (regulations only require the correct county and the ability to mail a bill); single-site reporting for multi-site businesses; government data particularly poor
• BEA (Bureau of Economic Analysis)
  • Dataset maintained by the Federal Government for macro-economic analysis
  • Pros: based on QCEW but enhanced with other administrative sources such as income tax data to
    provide complete and temporally consistent data
  • Cons: only aggregate county-level data available

Motivation
• LEHD (Longitudinal Employer-Household Dynamics)
  • Census Bureau product based on QCEW and linked with ACS data
  • Pros: same pros as other QCEW-based sources; no confidentiality restrictions or costs; in addition, the dataset provides linkages between employee residences and employer locations
  • Cons: same cons as other QCEW-based sources, plus no employer records (only aggregate employment) and Census Bureau masking; a PUMS-like product for employment would alleviate some of this constraint
• Private sources (Infogroup’s InfoUSA/ReferenceUSA, Dun & Bradstreet’s Global Commercial Database, etc.)
  • Several firms assemble employment data, primarily for resale for business marketing purposes; they use phone directories and other publicly available sources, then enhance and verify the data with their staff
  • Pros: good spatial precision; few of the multi-site problems in QCEW; reasonably complete
  • Cons: cost; lack of regulatory basis means incompleteness is ill-defined; temporal consistency is poor because the primary purpose of the dataset makes it more likely that defunct businesses are retained

Motivation
• Since 2000, ODOT has used QCEW as its primary source of employment data. Confidentiality requirements mean model employment data can’t be given out freely, creating some logistical issues with the models and consultant contracts; the latest confidentiality agreement also includes stricter personal liability, making some hesitant to sign
• The Ohio library system has a license for Infogroup’s ReferenceUSA, allowing state agencies to query 50 records at a time. Based on this data, ODOT also received a small-area sample of the InfoUSA database for this study
• ODOT Economic Development and Planning Offices also recently purchased two separate versions of the Dun & Bradstreet database for their own purposes (largely due to QCEW confidentiality limits)
• Taken with the public
  availability of LEHD and BEA data, this provided an opportunity and need for ODOT to compare and contrast data sources

Macro-View
• Macro-View will focus on QCEW vs. BEA
• Expand QCEW to BEA to account for:
  1. Ungeocoded QCEW (records do travel modelers no good if not located)
  2. Uncovered employment sectors
  3. Sole proprietors (most important)
  4. Difference between 1st-quarter QCEW and annual-average BEA
• Important to expand by county and industry, as will be shown

Total Employment
                 Employees  Percent
QCEW Geocoded      4765940      74%
QCEW Total         4909538      76%
BEA Wage           5199216      81%
BEA Total          6451236     100%

[Figure: Ohio Employment Sources. Employees, stacked by: Geocoded, Ungeocoded, Extra BEA Wage, BEA Proprietors]

QCEW vs. BEA
Industry Level QCEW vs. BEA

                    QCEW Employers                   QCEW Employees                   BEA County
INDUSTRY            Geocoded  Ungeocoded  %Geocoded  Geocoded  Ungeocoded  %Geocoded    Total  Allocated  %Allocated  %QCEW of BEA
AG/FISH/FOREST          1150          47        96%     11770         128        99%    91078      84038         92%           13%
MINING                   709          83        90%      9885         462        96%    27895      19410         70%           37%
UTILITIES                894          86        91%     29659        1946        94%    20765      17853         86%          152%
CONSTRUCTION           22411        2235        91%    150915        6822        96%   296852     291608         98%           53%
MANUFACTURING          16008         524        97%    608488        2580       100%   648564     647290        100%           94%
WHOLESALE              15815        7228        69%    193657       21674        90%   236906     226113         95%           91%
RETAIL                 35467        1080        97%    536292        4922        99%   671615     671615        100%           81%
TRANS/WAREHOUSE         8000         763        91%    183774        3288        98%   215452     196664         91%           87%
INFORMATION             3730         913        80%     86949        5673        94%    93023      92724        100%          100%
FINANCE/INS            16390        1292        93%    203054        6198        97%   331883     331377        100%           63%
REAL ESTATE/RENT        9642         696        93%     55617        1679        97%   234520     233849        100%           24%
PROF/TECH SERVICES     24846        4983        83%    227422       16112        93%   367974     355874         97%           66%
MGMT SERVICES           1531         215        88%    106652        1344        99%   113014     110997         98%           96%
ADMIN/SUPPORT SRV      13990        2470        85%    248063       17312        93%   387132     383296         99%           69%
EDUCATION               6419         324        95%    456385        5389        99%   147691     137663         93%          313%
HEALTH CARE/SOCIAL     26928         858        97%    805857       14069        98%   830432     778222         94%           99%
ARTS/REC                3739         300        93%     56763        2282        96%   119530     119412        100%           49%
ACCOMMODATION/FOOD     22412         529        98%    413534        3468        99%   443910     443303        100%           94%
OTHER SERVICES         22661        1390        94%    146197        3370        98%   338268     337561        100%           44%
PUBLIC ADMIN            6850        1153        86%    234043       24569        90%   834732     834732        100%           31%
UNCLASSIFIED             547         309        64%       964         311        76%        0          0           0
Total                 260139       27478        90%   4765940      143598        97%  6451236    6451236        100%           76%

QCEW vs. BEA
• There are significant differences, so it’s worth delving a bit deeper

QCEW Geocoding
• Mostly automated, but with manual passes on large employers (hence only 90% of employers are geocoded, but 97% of employment)
• Geocoding is not even across industry categories or counties
• ODOT spent a lot of time fixing multi-site employers, especially school districts, which now appear in Ohio’s official file

[Figure: QCEW Geocoding Percentages. Percent of employers and of employees geocoded, by industry]

BEA Characteristics
• While BEA industry and county marginal totals add up, the joint distribution values do not, due to limitations in the sources BEA uses to fill in QCEW gaps
• Hence, if you are expanding to industry/county totals, you need to use an Iterative Proportional Fitting routine (e.g., Fratar) to account for the unallocated employment (not all industries/counties are equal in this regard)
• BEA data has a different (and much higher) sole proprietor rate for farm than for other types

[Figure: BEA Percent Allocated to Counties]

BEA Proprietor Rates
Farm         83%
Private      21%
Government    0%

Comparing QCEW/BEA
• BEA adds many commission-only employees in the NAICS 50s categories, particularly real estate, so you should expect high expansion factors here
• ODOT uses Q1 QCEW, so we get high expansion factors in seasonal industries (construction and arts/recreation)

[Figure: percent by industry category, axis from 0% to 350%]

Percent Total QCEW to Total BEA
• Note similarity to previous map

Comparing QCEW/BEA
• Tiny representation of agriculture in QCEW renders direct expansion sub-optimal
• ODOT allocates the BEA farm proprietors based on agricultural acreage instead

[Figure: Agricultural Employment From ES202 vs. Distributed Proportionally to Ag. Acreage. Series: es202, farm]

Comparing QCEW/BEA
• While of minor importance, we decided to allocate some of the missing transportation employment to rail terminals prior to expansion

Macro-View Wrap Up
• As mentioned previously, ODOT evaluated other sources beyond QCEW
• At a macro level, there are significant differences
• These are more difficult to understand at this level, so ODOT conducted some micro analysis at several locations

Micro-View
• This presentation will focus on one location for clarity
• A relatively recent and growing commercial/industrial area in the western suburbs of Columbus
• Contains a diverse mix of employment types
• However, due to the small study area, results shown here should not be generalized; consider them illustrative only

Micro-View
• The same area looks a bit different depending on the source
• RefUSA data only obtained for a subarea
• D&B data only obtained for 4+ employee employers

Comparison Methodology
• Obtained data for (mostly) the same area
• Compared the employment records by address, since there is no other common unique identifier
• Combined this with detailed local knowledge and aerial imagery (study areas were selected based on analyst knowledge)
• Necessary to determine when duplicate addresses are valid (office parks, suites, corporate vs.
  franchise and subsidiaries often have employees at the same address) or when multiple occupants from different years are in the data
• Theoretical maximum employment for an address taken as the maximum valid employment from any of the sources (this is not necessarily the true value, since that source may have overstated the number)
• LEHD not included in most comparisons since it is aggregate data

Comparison Methodology
• Purchased data sources contain many duplicate businesses, which need to be removed prior to comparison
• More problematic for smaller employers

Comparisons
• After removal of duplicates, RefUSA and QCEW performed similarly for large employers; RefUSA had better coverage of small employers (includes some sole proprietors and commission employees not in QCEW)
• D&B didn’t perform as well in this study area
• Harris, one of the two versions of the D&B data purchased by ODOT, only had 20+ employee employers

Combining Datasets
• Employers included in purchased data and QCEW were nearly statistically independent
• Given the 75% and 92% employer coverage in QCEW and ReferenceUSA, one would expect 98% coverage by combining the sources (the analyst could not identify any missing employers, which implies 100% was obtained, but there is certainly some margin of error)

[Figure: Number of Employers (4+ employees) by Source. Number of employers if only these sources are used: QCEW, QCEW/REFUSA, QCEW/D&B, REFUSA, D&B. Stacked by which sources contain the employer: Q, QR, QD, QRD, R, RD, D]

Categorization
• Categorization by industry was similar (89% same for same employers)

Future Direction
• Given these results and the desire to produce model datasets not subject to confidentiality constraints, ODOT will purchase employment data and develop a process to:
  1. Geocode
  2. Remove duplicates
  3. Cross match with previous year’s data
  4. Cross match with QCEW
  5. Develop an employment estimate for employers identified by QCEW rather than using the value directly
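
The 98% figure on the Combining Datasets slide is what one would get by treating the two sources as independent: an employer is missed only if both sources miss it. A minimal sketch of that arithmetic follows; the function name `combined_coverage` is made up for illustration and is not part of the ODOT process.

```python
def combined_coverage(*coverages):
    """Expected share of employers found in at least one source,
    ASSUMING the sources miss employers independently of each other."""
    missed_by_all = 1.0
    for c in coverages:
        if not 0.0 <= c <= 1.0:
            raise ValueError("each coverage must be between 0 and 1")
        missed_by_all *= 1.0 - c  # chance this source also misses the employer
    return 1.0 - missed_by_all


# Coverage rates quoted on the slide: QCEW 75%, ReferenceUSA 92%
print(f"{combined_coverage(0.75, 0.92):.0%}")  # prints 98%
```

If the sources were positively correlated (both tending to miss the same small employers), the combined coverage would be lower than this estimate, which is why the near-independence finding matters.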
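
The BEA Characteristics slide calls for an Iterative Proportional Fitting routine when expanding to industry and county totals. The sketch below shows the general technique on a tiny industry-by-county table. It is not ODOT's implementation: the function name `ipf`, the tolerance, and every number in the example are hypothetical, and it assumes the row and column targets sum to the same grand total.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a seed table (rows = industries, columns = counties)
    until its row and column sums match the target marginals."""
    table = [list(row) for row in seed]
    for _ in range(max_iter):
        # Step 1: scale each row to its industry total
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Step 2: scale each column to its county total
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once rows also match
        gap = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if gap < tol:
            break
    return table


# Hypothetical seed: geocoded employment for 2 industries in 2 counties
seed = [[40.0, 10.0],
        [20.0, 30.0]]
industry_totals = [60.0, 60.0]  # hypothetical control totals by industry
county_totals = [70.0, 50.0]    # hypothetical control totals by county

fitted = ipf(seed, industry_totals, county_totals)
for row in fitted:
    print([round(v, 2) for v in row])
```

Cells that are zero in the seed stay zero, so an industry with no geocoded records in a county receives no expanded employment there; that is one reason a separate allocation rule (such as the acreage-based farm allocation) can be needed for thinly covered sectors.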