Empirical Appendix
The Supply Side of Innovation: H-1B Visa Reforms and US Ethnic Invention
William R. Kerr, Harvard Business School, Boston, MA
William F. Lincoln, University of Michigan, Ann Arbor, MI

Ethnic Inventor Contributions by City
Within each share group, the four values are for 1975-1984, 1985-1994, 1995-2004, and 2001-2006 (A).

City | Total Patenting Share | Non-English Ethnic Patenting Share | Indian and Chinese Patenting Share
Atlanta, GA | 0.6% 1.0% 1.3% 1.5% | 0.3% 0.7% 1.0% 1.1% | 0.3% 0.7% 1.0% 1.2%
Austin, TX | 0.4% 0.9% 1.8% 2.0% | 0.5% 1.2% 1.9% 2.0% | 0.4% 1.6% 2.3% 2.3%
Baltimore, MD | 0.8% 0.8% 0.7% 0.7% | 0.7% 0.7% 0.6% 0.5% | 0.4% 0.5% 0.6% 0.5%
Boston, MA | 3.6% 3.8% 3.9% 4.6% | 3.9% 4.2% 4.1% 4.8% | 4.0% 4.0% 3.6% 4.3%
Buffalo, NY | 0.6% 0.5% 0.4% 0.3% | 0.8% 0.6% 0.4% 0.3% | 1.1% 0.7% 0.4% 0.3%
Charlotte, NC | 0.3% 0.3% 0.3% 0.3% | 0.2% 0.2% 0.2% 0.2% | 0.1% 0.2% 0.1% 0.2%
Chicago, IL | 6.0% 4.6% 3.5% 3.2% | 6.9% 5.0% 3.5% 3.0% | 5.6% 3.9% 2.9% 2.8%
Cincinnati, OH | 1.0% 1.1% 1.0% 1.0% | 0.9% 0.9% 0.7% 0.7% | 0.7% 1.0% 0.6% 0.6%
Cleveland, OH | 2.3% 1.7% 1.3% 1.1% | 2.5% 1.5% 1.0% 0.8% | 2.5% 1.4% 0.9% 0.6%
Columbus, OH | 0.7% 0.5% 0.5% 0.4% | 0.6% 0.6% 0.4% 0.3% | 0.8% 0.7% 0.3% 0.3%
Dallas-Fort Worth, TX | 1.6% 2.0% 2.3% 2.1% | 1.1% 1.9% 2.3% 2.2% | 1.5% 2.4% 2.9% 2.8%
Denver, CO | 1.0% 1.2% 1.3% 1.3% | 0.8% 1.0% 0.9% 0.8% | 0.8% 1.0% 0.6% 0.5%
Detroit, MI | 3.1% 3.3% 2.9% 2.8% | 3.1% 3.1% 2.6% 2.6% | 3.2% 2.8% 2.5% 2.5%
Greensboro-W.S., NC | 0.2% 0.3% 0.3% 0.2% | 0.1% 0.2% 0.2% 0.1% | 0.2% 0.2% 0.1% 0.1%
Hartford, CT | 0.9% 0.9% 0.6% 0.6% | 1.0% 0.8% 0.5% 0.5% | 0.8% 0.6% 0.3% 0.4%
Houston, TX | 2.3% 2.5% 1.9% 2.0% | 1.8% 2.3% 1.8% 1.9% | 2.2% 2.8% 1.8% 1.9%
Indianapolis, IN | 0.8% 0.7% 0.7% 0.5% | 0.6% 0.4% 0.4% 0.3% | 0.7% 0.5% 0.4% 0.3%
Jacksonville, NC | 0.1% 0.1% 0.1% 0.1% | 0.1% 0.1% 0.1% 0.1% | 0.1% 0.1% 0.1% 0.1%
Kansas City, MO | 0.4% 0.3% 0.4% 0.3% | 0.2% 0.2% 0.2% 0.2% | 0.2% 0.1% 0.2% 0.2%
Las Vegas, NV | 0.1% 0.1% 0.2% 0.3% | 0.1% 0.1% 0.2% 0.2% | 0.0% 0.1% 0.1% 0.1%
Los Angeles, CA | 6.6% 6.1% 6.0% 5.7% | 7.2% 7.2% 7.9% 7.3% | 6.7% 6.9% 7.5% 7.0%
Memphis, TN | 0.1% 0.2% 0.2% 0.3% | 0.1% 0.1% 0.1% 0.2% | 0.1% 0.1% 0.1% 0.1%
Miami, FL | 0.8% 0.9% 0.7% 0.7% | 1.0% 1.3% 1.0% 0.9% | 0.5% 0.6% 0.5% 0.4%
Milwaukee, WI | 1.0% 0.9% 0.8% 0.7% | 0.8% 0.8% 0.6% 0.5% | 0.5% 0.4% 0.5% 0.4%
Minneap.-St. Paul, MN | 1.9% 2.4% 2.7% 2.8% | 1.6% 2.0% 2.0% 2.0% | 1.5% 1.7% 1.7% 1.8%
Nashville, TN | 0.1% 0.2% 0.2% 0.2% | 0.0% 0.1% 0.1% 0.1% | 0.1% 0.1% 0.1% 0.1%
New Orleans, LA | 0.3% 0.2% 0.2% 0.1% | 0.3% 0.3% 0.1% 0.1% | 0.2% 0.2% 0.0% 0.0%
New York, NY | 11.5% 8.9% 7.3% 6.9% | 16.6% 13.1% 10.1% 8.9% | 16.6% 13.3% 9.7% 9.0%
Norfolk-VA Beach, VA | 0.2% 0.2% 0.2% 0.1% | 0.1% 0.1% 0.1% 0.1% | 0.1% 0.1% 0.1% 0.1%
Orlando, FL | 0.2% 0.3% 0.3% 0.3% | 0.1% 0.2% 0.3% 0.3% | 0.1% 0.2% 0.3% 0.3%
Philadelphia, PA | 4.6% 4.0% 2.7% 2.8% | 5.6% 4.9% 2.8% 2.9% | 6.2% 5.8% 2.8% 3.0%
Phoenix, AZ | 1.0% 1.2% 1.4% 1.3% | 0.6% 1.1% 1.3% 1.2% | 0.4% 1.0% 1.4% 1.3%
Pittsburgh, PA | 2.0% 1.3% 0.8% 0.7% | 2.2% 1.4% 0.6% 0.5% | 2.2% 1.3% 0.5% 0.5%
Portland, OR | 0.5% 0.8% 1.4% 1.6% | 0.3% 0.6% 1.4% 1.6% | 0.2% 0.6% 1.7% 2.0%
Providence, RI | 0.3% 0.3% 0.3% 0.2% | 0.3% 0.4% 0.3% 0.2% | 0.2% 0.3% 0.2% 0.2%
Raleigh-Durham, NC | 0.3% 0.6% 1.1% 1.5% | 0.3% 0.6% 1.0% 1.3% | 0.3% 0.8% 1.0% 1.2%
Richmond, VA | 0.3% 0.3% 0.2% 0.2% | 0.3% 0.3% 0.2% 0.2% | 0.3% 0.4% 0.2% 0.2%
Sacramento, CA | 0.2% 0.4% 0.5% 0.5% | 0.2% 0.4% 0.5% 0.5% | 0.2% 0.3% 0.5% 0.5%
Salt Lake City, UT | 0.4% 0.5% 0.6% 0.6% | 0.2% 0.4% 0.3% 0.3% | 0.2% 0.3% 0.3% 0.3%
San Antonio, TX | 0.1% 0.2% 0.2% 0.2% | 0.1% 0.2% 0.2% 0.2% | 0.2% 0.1% 0.1% 0.1%
San Diego, CA | 1.1% 1.6% 2.2% 2.8% | 1.1% 1.6% 2.6% 3.6% | 0.8% 1.4% 2.4% 3.9%
San Francisco, CA | 4.8% 6.6% 12.1% 13.2% | 6.2% 9.3% 19.3% 19.9% | 8.4% 13.0% 25.4% 24.0%
Seattle, WA | 0.9% 1.3% 1.9% 3.4% | 0.8% 1.1% 1.8% 3.5% | 0.6% 1.0% 1.8% 3.7%
St. Louis, MO | 1.0% 0.9% 0.8% 0.8% | 0.9% 0.8% 0.8% 0.7% | 1.0% 0.8% 0.4% 0.4%
Tallahassee, FL | 0.4% 0.5% 0.4% 0.4% | 0.3% 0.4% 0.3% 0.3% | 0.2% 0.2% 0.2% 0.2%
Washington, DC | 1.5% 1.5% 1.4% 1.6% | 1.6% 1.6% 1.5% 1.7% | 1.6% 1.7% 1.5% 1.7%
West Palm Beach, FL | 0.3% 0.5% 0.4% 0.4% | 0.3% 0.5% 0.4% 0.4% | 0.3% 0.3% 0.2% 0.2%
Other 234 Major Cities | 21.8% 22.3% 20.7% 18.4% | 18.1% 18.1% 15.6% 13.6% | 19.7% 18.2% 14.6% 12.7%
Not in a Major City | 9.0% 8.2% 6.6% 6.2% | 6.3% 5.4% 3.7% 4.1% | 5.2% 3.8% 2.5% 2.7%
Notes: See Table 1. The first three columns of each grouping are for granted patents. The fourth column, marked with (A), is for published patent applications.

Univariate Regressions for Table 2
This table documents 50 univariate regressions with separate ethnic patenting measures.
Dependent variable: Log English Patenting in columns (1)-(5); Log Total Patenting in columns (6)-(10).
Columns: (1) City & Year Fixed Effects; (2) (1) plus Expected Patenting Weights; (3) (2) plus State-Yr Fixed Effects; (4) (2) plus Population Trends; (5) (2) plus Dropping Largest 20%; (6) City & Year Fixed Effects; (7) (6) plus Expected Patenting Weights; (8) (7) plus State-Yr Fixed Effects; (9) (7) plus Population Trends; (10) (7) plus Dropping Largest 20%.

Regressor | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10)
Log Indian Patenting | 0.105 (0.021) | 0.078 (0.017) | 0.050 (0.019) | 0.098 (0.021) | 0.085 (0.020) | 0.155 (0.021) | 0.128 (0.017) | 0.103 (0.019) | 0.150 (0.024) | 0.141 (0.019)
Log Chinese Patenting | 0.109 (0.023) | 0.084 (0.019) | 0.072 (0.021) | 0.100 (0.020) | 0.083 (0.022) | 0.163 (0.021) | 0.138 (0.018) | 0.135 (0.019) | 0.155 (0.020) | 0.144 (0.020)
Log European Patenting | 0.151 (0.027) | 0.107 (0.023) | 0.076 (0.021) | 0.141 (0.024) | 0.097 (0.025) | 0.229 (0.023) | 0.185 (0.019) | 0.163 (0.019) | 0.212 (0.022) | 0.180 (0.021)
Log Hispanic Patenting | 0.133 (0.023) | 0.102 (0.017) | 0.076 (0.019) | 0.116 (0.021) | 0.107 (0.019) | 0.192 (0.021) | 0.160 (0.016) | 0.142 (0.018) | 0.171 (0.023) | 0.173 (0.018)
Log Russian Patenting | 0.080 (0.024) | 0.074 (0.018) | 0.065 (0.020) | 0.080 (0.021) | 0.078 (0.022) | 0.112 (0.024) | 0.106 (0.019) | 0.100 (0.020) | 0.118 (0.023) | 0.114 (0.023)
Notes: See Table 2.
Further Robustness Checks on Table 2
City & Year Fixed Effects; regressor: Log Indian and Chinese Patenting.
Dependent variable: Log English Patenting in columns (1)-(5); Log Total Patenting in columns (6)-(10).
Columns: (1) and (6) base specification; (2) and (7) (1) with Grants Only, 1995-2001; (3) and (8) (1) plus Appl. Only, 2001-2006; (4) and (9) (1) plus Dropping Zero Counts; (5) and (10) (1) plus Dropping West Coast.

Regressor | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10)
Log Indian and Chinese Patenting | 0.137 (0.024) | 0.149 (0.029) | 0.106 (0.025) | 0.180 (0.025) | 0.139 (0.025) | 0.211 (0.022) | 0.237 (0.025) | 0.196 (0.021) | 0.260 (0.022) | 0.211 (0.023)
Observations | 3372 | 2248 | 1686 | 2022 | 3036 | 3372 | 2248 | 1686 | 2022 | 3036
Notes: See Table 2.

First-Differenced Regressions of Table 2
Year Fixed Effects; regressor: Δ Log Indian and Chinese Patenting.
Dependent variable: Δ Log English Patenting in columns (1)-(5); Δ Log Total Patenting in columns (6)-(10).
Columns: (1) Year Fixed Effects; (2) (1) plus Expected Patenting Weights; (3) (2) plus State-Yr Fixed Effects; (4) (2) plus Population Trends; (5) (2) plus Dropping Largest 20%; (6) Year Fixed Effects; (7) (6) plus Expected Patenting Weights; (8) (7) plus State-Yr Fixed Effects; (9) (7) plus Population Trends; (10) (7) plus Dropping Largest 20%.

Regressor | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10)
Δ Log Indian and Chinese Patenting | 0.079 (0.015) | 0.056 (0.015) | 0.049 (0.017) | 0.050 (0.012) | 0.057 (0.017) | 0.167 (0.015) | 0.144 (0.014) | 0.139 (0.016) | 0.128 (0.012) | 0.149 (0.016)
Notes: See Table 2.

Dependencies on H-1B Program for Major Patenting Cities
(1) LCA-Based Dependency: City's 2001-2002 LCA Filings for H-1B Visas per Capita (x1000)
(2) Census-Based Dependency: City's 1990 Non-Citizen Immigrant SE Workforce per Capita (x1000)

Rank | (1) LCA-Based Dependency | (2) Census-Based Dependency
1 | San Francisco, CA 8.323 | San Francisco, CA 5.096
2 | Miami, FL 5.502 | Washington, DC 3.168
3 | Washington, DC 5.430 | Boston, MA 3.129
4 | Raleigh-Durham, NC 5.220 | Raleigh-Durham, NC 2.723
5 | Boston, MA 5.149 | Los Angeles, CA 2.288
6 | Austin, TX 4.897 | New York, NY 2.185
7 | New York, NY 4.777 | Houston, TX 2.156
8 | Atlanta, GA 4.116 | San Diego, CA 2.040
9 | Dallas-Fort Worth, TX 3.943 | Austin, TX 1.770
10 | Houston, TX 3.712 | Detroit, MI 1.545
11 | Seattle, WA 3.393 | Miami, FL 1.517
12 | San Diego, CA 3.021 | Dallas-Fort Worth, TX 1.442
13 | Los Angeles, CA 2.811 | Philadelphia, PA 1.423
14 | West Palm Beach, FL 2.744 | Columbus, OH 1.411
15 | Detroit, MI 2.729 | Seattle, WA 1.340
16 | Denver, CO 2.407 | Hartford, CT 1.212
17 | Chicago, IL 2.372 | Atlanta, GA 1.185
18 | Orlando, FL 2.343 | West Palm Beach, FL 1.147
19 | Columbus, OH 2.116 | Chicago, IL 1.145
20 | Philadelphia, PA 2.112 | Sacramento, CA 1.107
21 | Richmond, VA 2.108 | Salt Lake City, UT 1.021
22 | Hartford, CT 2.010 | Portland, OR 0.983
23 | Minneapolis-St. Paul, MN 1.852 | Phoenix, AZ 0.975
24 | Portland, OR 1.708 | Pittsburgh, PA 0.904
25 | Kansas City, MO 1.697 | Richmond, VA 0.887
26 | Charlotte, SC 1.649 | Cleveland, OH 0.860
27 | Indianapolis, IN 1.620 | Denver, CO 0.791
28 | Baltimore, MD 1.612 | Buffalo, NY 0.770
29 | Phoenix, AZ 1.580 | Orlando, FL 0.757
30 | Memphis, TN 1.561 | New Orleans, LA 0.751
31 | Sacramento, CA 1.490 | Charlotte, SC 0.749
32 | Las Vegas, NV 1.462 | Milwaukee, WI 0.741
33 | Pittsburgh, PA 1.438 | Cincinnati, OH 0.722
34 | Jacksonville, NC 1.266 | Baltimore, MD 0.700
35 | Cincinnati, OH 1.224 | Memphis, TN 0.615
36 | Tallahassee, FL 1.211 | Indianapolis, IN 0.600
37 | St. Louis, MO 1.203 | Minneapolis-St. Paul, MN 0.600
38 | Milwaukee, WI 1.170 | St. Louis, MO 0.541
39 | Providence, RI 1.158 | Greensboro-W. Salem, NC 0.496
40 | Nashville, TN 1.136 | Nashville, TN 0.495
41 | Cleveland, OH 1.134 | Kansas City, MO 0.489
42 | Salt Lake City, UT 1.058 | Norfolk-VA Beach, VA 0.356
43 | New Orleans, LA 0.977 | Tallahassee, FL 0.326
44 | San Antonio, TX 0.877 | Providence, RI 0.272
45 | Greensboro-W. Salem, NC 0.859 | San Antonio, TX 0.264
46 | Buffalo, NY 0.703 | Jacksonville, NC 0.242
47 | Norfolk-VA Beach, VA 0.536 | Las Vegas, NV 0.154
Notes: See Table 3. Table presents dependency values on the H-1B program for major patenting cities, ranked from most to least dependent.

City-Year Regressions with LCA-Based Dependency
Regressor | Log Indian Patenting | Log Chinese Patenting | Log Other Patenting | Log English Patenting | Log Total Patenting

A. Base Regression with City and Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.313 (0.087) | 0.311 (0.095) | 0.305 (0.106) | -0.010 (0.101) | 0.037 (0.107)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.623 (0.090) | 0.741 (0.108) | 0.461 (0.096) | 0.050 (0.087) | 0.078 (0.083)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.982 (0.078) | 1.179 (0.091) | 0.593 (0.092) | 0.109 (0.086) | 0.172 (0.086)

B. Substituting Six-Year Cap Summation for H-1B Population
Log H-1B Cap Summation x Third Dependency Quintile [LCA] | 0.361 (0.098) | 0.318 (0.102) | 0.304 (0.111) | -0.026 (0.107) | 0.023 (0.111)
Log H-1B Cap Summation x Second Dependency Quintile [LCA] | 0.661 (0.095) | 0.810 (0.115) | 0.480 (0.105) | 0.039 (0.091) | 0.072 (0.089)
Log H-1B Cap Summation x Most Dependent Quintile [LCA] | 1.057 (0.085) | 1.198 (0.099) | 0.630 (0.100) | 0.092 (0.091) | 0.163 (0.091)

C. Including State-Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.295 (0.094) | 0.267 (0.110) | 0.182 (0.104) | -0.057 (0.090) | -0.006 (0.094)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.606 (0.107) | 0.661 (0.118) | 0.353 (0.092) | -0.040 (0.079) | 0.002 (0.081)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.949 (0.085) | 1.176 (0.105) | 0.479 (0.093) | 0.036 (0.085) | 0.113 (0.083)

D. Dropping West Coast (Census Region 9)
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.275 (0.088) | 0.318 (0.102) | 0.330 (0.110) | -0.004 (0.110) | 0.049 (0.116)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.632 (0.095) | 0.733 (0.113) | 0.489 (0.095) | 0.050 (0.088) | 0.092 (0.086)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.917 (0.080) | 1.143 (0.098) | 0.619 (0.098) | 0.126 (0.094) | 0.185 (0.094)

E. Dropping the 20 Most Dependent Cities
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.313 (0.087) | 0.311 (0.095) | 0.305 (0.106) | -0.010 (0.101) | 0.037 (0.107)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.623 (0.090) | 0.741 (0.108) | 0.461 (0.096) | 0.050 (0.087) | 0.078 (0.083)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.967 (0.094) | 1.206 (0.106) | 0.615 (0.105) | 0.140 (0.097) | 0.201 (0.096)
Notes: See Table 4A.

City-Year Regressions with LCA-Based Dependency and Ethnic Tech Trends
Regressor | Log Indian Patenting | Log Chinese Patenting | Log Other Patenting | Log English Patenting | Log Total Patenting

A. Base Regression with City and Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.172 (0.088) | 0.154 (0.086) | 0.210 (0.102) | 0.043 (0.074) | 0.103 (0.077)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.362 (0.089) | 0.455 (0.113) | 0.285 (0.097) | 0.105 (0.077) | 0.162 (0.075)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.552 (0.098) | 0.714 (0.106) | 0.338 (0.098) | 0.164 (0.081) | 0.266 (0.079)

B. Substituting Six-Year Cap Summation for H-1B Population
Log H-1B Cap Summation x Third Dependency Quintile [LCA] | 0.219 (0.097) | 0.152 (0.092) | 0.210 (0.107) | 0.022 (0.080) | 0.082 (0.081)
Log H-1B Cap Summation x Second Dependency Quintile [LCA] | 0.390 (0.093) | 0.498 (0.118) | 0.295 (0.106) | 0.087 (0.082) | 0.149 (0.083)
Log H-1B Cap Summation x Most Dependent Quintile [LCA] | 0.608 (0.102) | 0.686 (0.109) | 0.351 (0.102) | 0.128 (0.084) | 0.237 (0.082)

C. Including State-Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.142 (0.092) | 0.138 (0.107) | 0.128 (0.104) | 0.022 (0.074) | 0.077 (0.075)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.322 (0.100) | 0.421 (0.119) | 0.201 (0.099) | 0.054 (0.080) | 0.114 (0.084)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.468 (0.100) | 0.785 (0.117) | 0.274 (0.096) | 0.121 (0.088) | 0.218 (0.084)

D. Dropping West Coast (Census Region 9)
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.123 (0.084) | 0.154 (0.092) | 0.223 (0.104) | 0.041 (0.081) | 0.111 (0.084)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.360 (0.094) | 0.442 (0.117) | 0.317 (0.099) | 0.108 (0.082) | 0.180 (0.081)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.476 (0.101) | 0.673 (0.112) | 0.357 (0.101) | 0.162 (0.087) | 0.264 (0.084)

E. Dropping the 20 Most Dependent Cities
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.172 (0.088) | 0.155 (0.086) | 0.211 (0.102) | 0.044 (0.074) | 0.104 (0.077)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.361 (0.089) | 0.456 (0.113) | 0.286 (0.097) | 0.106 (0.078) | 0.164 (0.076)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.550 (0.109) | 0.756 (0.115) | 0.357 (0.107) | 0.180 (0.086) | 0.281 (0.083)
Notes: See Table 4B. Regressions include unreported ethnic-specific technology trends.

City-Year Regressions with Census-Based Dependency
Regressor | Log Indian Patenting | Log Chinese Patenting | Log Other Patenting | Log English Patenting | Log Total Patenting

A. Base Regression with City and Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [Census] | 0.207 (0.104) | 0.569 (0.123) | 0.134 (0.109) | 0.048 (0.097) | 0.064 (0.099)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.398 (0.096) | 0.489 (0.115) | 0.285 (0.103) | 0.064 (0.100) | 0.080 (0.098)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.550 (0.097) | 0.718 (0.109) | 0.215 (0.101) | -0.019 (0.081) | 0.029 (0.083)

B. Substituting Six-Year Cap Summation for H-1B Population
Log H-1B Cap Summation x Third Dependency Quintile [Census] | 0.240 (0.115) | 0.610 (0.133) | 0.103 (0.117) | 0.019 (0.101) | 0.038 (0.102)
Log H-1B Cap Summation x Second Dependency Quintile [Census] | 0.418 (0.102) | 0.495 (0.118) | 0.266 (0.114) | 0.031 (0.105) | 0.043 (0.103)
Log H-1B Cap Summation x Most Dependent Quintile [Census] | 0.593 (0.106) | 0.782 (0.115) | 0.211 (0.106) | -0.037 (0.086) | 0.013 (0.090)

C. Including State-Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [Census] | 0.203 (0.114) | 0.498 (0.130) | 0.081 (0.115) | 0.076 (0.088) | 0.090 (0.087)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.393 (0.111) | 0.422 (0.135) | 0.155 (0.109) | 0.052 (0.084) | 0.080 (0.087)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.619 (0.104) | 0.733 (0.119) | 0.184 (0.095) | -0.008 (0.079) | 0.048 (0.080)

D. Dropping West Coast (Census Region 9)
Log National H-1B Population x Third Dependency Quintile [Census] | 0.209 (0.111) | 0.564 (0.132) | 0.132 (0.112) | 0.051 (0.104) | 0.071 (0.106)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.406 (0.099) | 0.486 (0.121) | 0.319 (0.103) | 0.063 (0.102) | 0.098 (0.102)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.495 (0.097) | 0.668 (0.117) | 0.225 (0.109) | -0.029 (0.089) | 0.019 (0.092)

E. Dropping the 20 Most Dependent Cities
Log National H-1B Population x Third Dependency Quintile [Census] | 0.207 (0.104) | 0.569 (0.123) | 0.134 (0.109) | 0.048 (0.097) | 0.064 (0.099)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.398 (0.096) | 0.489 (0.115) | 0.285 (0.103) | 0.064 (0.100) | 0.080 (0.098)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.461 (0.112) | 0.647 (0.129) | 0.176 (0.115) | -0.019 (0.092) | 0.026 (0.095)
Notes: See Table 4A.

City-Year Regressions with Census-Based Dependency and Ethnic Tech Trends
Regressor | Log Indian Patenting | Log Chinese Patenting | Log Other Patenting | Log English Patenting | Log Total Patenting

A. Base Regression with City and Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [Census] | 0.067 (0.087) | 0.412 (0.106) | 0.049 (0.098) | 0.014 (0.081) | 0.027 (0.082)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.105 (0.085) | 0.162 (0.101) | 0.112 (0.095) | 0.056 (0.076) | 0.079 (0.072)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.139 (0.080) | 0.256 (0.089) | -0.014 (0.092) | 0.006 (0.072) | 0.059 (0.073)

B. Substituting Six-Year Cap Summation for H-1B Population
Log H-1B Cap Summation x Third Dependency Quintile [Census] | 0.095 (0.096) | 0.447 (0.119) | 0.016 (0.106) | -0.004 (0.085) | 0.013 (0.085)
Log H-1B Cap Summation x Second Dependency Quintile [Census] | 0.133 (0.092) | 0.176 (0.104) | 0.101 (0.107) | 0.049 (0.081) | 0.068 (0.078)
Log H-1B Cap Summation x Most Dependent Quintile [Census] | 0.180 (0.087) | 0.318 (0.092) | -0.024 (0.095) | -0.002 (0.075) | 0.053 (0.078)

C. Including State-Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [Census] | 0.074 (0.102) | 0.372 (0.122) | 0.020 (0.110) | 0.037 (0.073) | 0.043 (0.073)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.150 (0.092) | 0.182 (0.118) | 0.051 (0.102) | 0.046 (0.075) | 0.069 (0.075)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.215 (0.090) | 0.331 (0.105) | 0.013 (0.094) | 0.037 (0.074) | 0.085 (0.073)

D. Dropping West Coast (Census Region 9)
Log National H-1B Population x Third Dependency Quintile [Census] | 0.083 (0.092) | 0.421 (0.111) | 0.057 (0.099) | 0.020 (0.088) | 0.040 (0.088)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.115 (0.089) | 0.157 (0.107) | 0.155 (0.098) | 0.055 (0.080) | 0.097 (0.076)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.101 (0.081) | 0.220 (0.095) | 0.002 (0.097) | -0.006 (0.077) | 0.051 (0.079)

E. Dropping the 20 Most Dependent Cities
Log National H-1B Population x Third Dependency Quintile [Census] | 0.068 (0.087) | 0.413 (0.106) | 0.050 (0.098) | 0.014 (0.082) | 0.029 (0.082)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.108 (0.086) | 0.163 (0.102) | 0.113 (0.096) | 0.057 (0.076) | 0.081 (0.073)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.103 (0.086) | 0.244 (0.098) | -0.028 (0.100) | 0.003 (0.077) | 0.053 (0.079)
Notes: See Table 4B. Regressions include unreported ethnic-specific technology trends.

City-Year Regressions in First-Differenced Specifications
Regressor | Δ Log Indian Patenting | Δ Log Chinese Patenting | Δ Log Other Patenting | Δ Log English Patenting | Δ Log Total Patenting

A. LCA-Based Dependency
Δ Log National H-1B Population x Third Dependency Quintile [LCA] | 0.007 (0.080) | 0.141 (0.102) | 0.171 (0.142) | 0.122 (0.128) | 0.174 (0.136)
Δ Log National H-1B Population x Second Dependency Quintile [LCA] | 0.549 (0.100) | 0.437 (0.114) | 0.237 (0.130) | 0.106 (0.104) | 0.123 (0.097)
Δ Log National H-1B Population x Most Dependent Quintile [LCA] | 0.511 (0.100) | 0.810 (0.102) | 0.301 (0.116) | 0.149 (0.107) | 0.171 (0.099)

B. LCA-Based Dependency and Ethnic-Specific Technology Trends
Δ Log National H-1B Population x Third Dependency Quintile [LCA] | -0.098 (0.088) | 0.060 (0.102) | 0.083 (0.138) | 0.136 (0.096) | 0.194 (0.097)
Δ Log National H-1B Population x Second Dependency Quintile [LCA] | 0.360 (0.102) | 0.282 (0.111) | 0.081 (0.129) | 0.152 (0.092) | 0.187 (0.084)
Δ Log National H-1B Population x Most Dependent Quintile [LCA] | 0.227 (0.118) | 0.586 (0.110) | 0.095 (0.122) | 0.204 (0.096) | 0.251 (0.086)

C. Census-Based Dependency
Δ Log National H-1B Population x Third Dependency Quintile [Census] | -0.017 (0.083) | 0.258 (0.111) | 0.047 (0.149) | 0.025 (0.117) | 0.020 (0.114)
Δ Log National H-1B Population x Second Dependency Quintile [Census] | 0.305 (0.102) | 0.336 (0.119) | 0.082 (0.127) | 0.236 (0.124) | 0.189 (0.123)
Δ Log National H-1B Population x Most Dependent Quintile [Census] | 0.320 (0.108) | 0.390 (0.110) | 0.124 (0.116) | -0.002 (0.099) | 0.021 (0.097)

D. Census-Based Dependency and Ethnic-Specific Technology Trends
Δ Log National H-1B Population x Third Dependency Quintile [Census] | -0.092 (0.085) | 0.201 (0.108) | 0.009 (0.143) | 0.065 (0.097) | 0.064 (0.092)
Δ Log National H-1B Population x Second Dependency Quintile [Census] | 0.137 (0.103) | 0.204 (0.113) | -0.021 (0.126) | 0.219 (0.093) | 0.182 (0.086)
Δ Log National H-1B Population x Most Dependent Quintile [Census] | 0.108 (0.114) | 0.223 (0.107) | 0.005 (0.113) | 0.046 (0.087) | 0.083 (0.082)
Notes: See Tables 4A and 4B. Regressions in Panels B and D include unreported ethnic-specific technology trends.

City-Year Regressions Comparing Granted Patents and Patent Applications
Regressor | Log Indian Patenting | Log Chinese Patenting | Log Other Patenting | Log English Patenting | Log Total Patenting

A. Base Regression, 1995-2006
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.313 (0.087) | 0.311 (0.095) | 0.305 (0.106) | -0.010 (0.101) | 0.037 (0.107)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.623 (0.090) | 0.741 (0.108) | 0.461 (0.096) | 0.050 (0.087) | 0.078 (0.083)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.982 (0.078) | 1.179 (0.091) | 0.593 (0.092) | 0.109 (0.086) | 0.172 (0.086)

B. Employing Granted Patents Only, 1995-2002
Log National H-1B Population x Third Dependency Quintile [LCA] | -0.057 (0.084) | 0.173 (0.093) | 0.164 (0.131) | 0.003 (0.114) | 0.070 (0.116)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.252 (0.099) | 0.191 (0.110) | 0.191 (0.113) | 0.111 (0.105) | 0.130 (0.098)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.391 (0.086) | 0.694 (0.108) | 0.227 (0.090) | 0.135 (0.097) | 0.191 (0.095)

C. Employing Granted Patents + Non-Overlapping Applications After 2004
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.217 (0.080) | 0.233 (0.086) | 0.222 (0.100) | -0.012 (0.095) | 0.040 (0.100)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.471 (0.078) | 0.568 (0.096) | 0.338 (0.088) | 0.067 (0.084) | 0.091 (0.080)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.754 (0.073) | 0.942 (0.085) | 0.422 (0.086) | 0.094 (0.079) | 0.156 (0.078)
Notes: See Table 4A. Panels show LCA-based regression results for different data cuts. The core results employ granted patents from 1995-2006 and non-overlapping applications from 2001-2006. Panel B uses only granted patents from 1995-2002. Panel C uses granted patents plus non-overlapping applications from 2004-2006, the years in which granted-patent counts are weakest due to review lags. Similar patterns are evident in all panels. We have also confirmed results similar to Panel B when dropping computer-related patent grants. The greater explanatory power for Indian and Chinese patents in Panel A is partly due to modeling both the rise and the decline in H-1B population growth and caps exhibited in Figure 4, which requires well-measured data spanning 1995-2006. Splitting the 1995-2006 sample period yields more monotonic H-1B trends that are less separable from aggregate effects.

City-Year Regressions Removing 307 Top Patenting Firms
Regressor | Log Indian Patenting | Log Chinese Patenting | Log Other Patenting | Log English Patenting | Log Total Patenting

A. LCA-Based Dependency
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.244 (0.077) | 0.335 (0.099) | 0.379 (0.116) | -0.016 (0.100) | 0.033 (0.104)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.719 (0.111) | 0.851 (0.126) | 0.618 (0.114) | 0.072 (0.094) | 0.109 (0.092)
Log National H-1B Population x Most Dependent Quintile [LCA] | 1.377 (0.119) | 1.518 (0.115) | 0.878 (0.107) | 0.305 (0.090) | 0.365 (0.091)

B. LCA-Based Dependency and Ethnic-Specific Technology Trends
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.222 (0.075) | 0.310 (0.097) | 0.341 (0.114) | -0.004 (0.101) | 0.045 (0.105)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.670 (0.107) | 0.792 (0.121) | 0.572 (0.115) | 0.087 (0.095) | 0.126 (0.093)
Log National H-1B Population x Most Dependent Quintile [LCA] | 1.286 (0.121) | 1.390 (0.123) | 0.806 (0.114) | 0.309 (0.094) | 0.380 (0.097)

C. Census-Based Dependency
Log National H-1B Population x Third Dependency Quintile [Census] | 0.196 (0.102) | 0.451 (0.120) | 0.172 (0.123) | 0.002 (0.095) | 0.030 (0.096)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.600 (0.134) | 0.713 (0.151) | 0.468 (0.121) | 0.166 (0.106) | 0.185 (0.105)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.898 (0.134) | 0.993 (0.138) | 0.517 (0.119) | 0.204 (0.088) | 0.245 (0.092)

D. Census-Based Dependency and Ethnic-Specific Technology Trends
Log National H-1B Population x Third Dependency Quintile [Census] | 0.171 (0.093) | 0.424 (0.108) | 0.149 (0.115) | 0.006 (0.095) | 0.033 (0.096)
Log National H-1B Population x Second Dependency Quintile [Census] | 0.532 (0.128) | 0.638 (0.144) | 0.413 (0.119) | 0.170 (0.107) | 0.189 (0.106)
Log National H-1B Population x Most Dependent Quintile [Census] | 0.777 (0.128) | 0.850 (0.132) | 0.441 (0.116) | 0.194 (0.089) | 0.237 (0.094)
Notes: See Tables 4A and 4B. Regressions in Panels B and D include unreported ethnic-specific technology trends. These estimations exclude patents associated with the top firm panel. This panel comprises the most dependent LCA firms and the largest US patenters.

City-Year Regressions with Population Quintiles or Citizen Immigrant Trends
Regressor | Log Indian Patenting | Log Chinese Patenting | Log Other Patenting | Log English Patenting | Log Total Patenting

A. Testing Against Population Quintiles
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.252 (0.085) | 0.255 (0.091) | 0.255 (0.105) | 0.008 (0.103) | 0.064 (0.109)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.436 (0.087) | 0.607 (0.123) | 0.385 (0.102) | 0.099 (0.088) | 0.144 (0.085)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.715 (0.103) | 1.033 (0.120) | 0.564 (0.124) | 0.169 (0.110) | 0.246 (0.107)
Log National H-1B Population x Third Population Quintile | 0.110 (0.076) | 0.147 (0.101) | 0.190 (0.096) | 0.104 (0.103) | 0.069 (0.105)
Log National H-1B Population x Second Population Quintile | 0.133 (0.076) | 0.234 (0.107) | 0.299 (0.108) | -0.109 (0.086) | -0.155 (0.088)
Log National H-1B Population x Largest Population Quintile | 0.532 (0.113) | 0.337 (0.117) | 0.147 (0.114) | -0.098 (0.100) | -0.140 (0.097)

B. Testing Against US Citizen Immigrant SEs in the US
Log National H-1B Population x Third Dependency Quintile [LCA] | 0.305 (0.097) | 0.313 (0.120) | 0.296 (0.125) | -0.084 (0.121) | -0.018 (0.125)
Log National H-1B Population x Second Dependency Quintile [LCA] | 0.580 (0.101) | 0.757 (0.112) | 0.481 (0.128) | -0.029 (0.099) | 0.017 (0.101)
Log National H-1B Population x Most Dependent Quintile [LCA] | 0.855 (0.090) | 1.005 (0.108) | 0.605 (0.113) | 0.033 (0.102) | 0.105 (0.101)
Log US Citizen Immigrant SEs x Third Dependency Quintile [LCA] | 0.018 (0.104) | -0.003 (0.135) | 0.021 (0.161) | 0.179 (0.168) | 0.133 (0.166)
Log US Citizen Immigrant SEs x Second Dependency Quintile [LCA] | 0.105 (0.141) | -0.041 (0.165) | -0.049 (0.160) | 0.194 (0.145) | 0.149 (0.145)
Log US Citizen Immigrant SEs x Most Dependent Quintile [LCA] | 0.310 (0.128) | 0.425 (0.141) | -0.030 (0.136) | 0.186 (0.145) | 0.162 (0.139)
Notes: See Tables 4A and 4B.
Data Appendix
The Supply Side of Innovation: H-1B Visa Reforms and US Ethnic Invention
William R. Kerr, Harvard Business School, Boston, MA
William F. Lincoln, University of Michigan, Ann Arbor, MI

1 Introduction

This Data Appendix gives an extensive description of the different sources of our data and how we combined them to perform our analyses at both the city and firm levels. We focus our discussion on data that are less commonly used or are particular to the H-1B program. We refer readers to other sources for the commonly used data sets that we utilize, such as Compustat. Section 2 details the Canadian data that we use and lays out how we identified MSA-like metropolitan areas in Canada. In Section 3 we review the L-1 and TN visas, which are the most likely substitutes for the H-1B. In Section 4 we describe the LCA data that we used in both the labor market and firm-level analyses. Section 5 details the general methodology by which we constructed our firm panel. The final section provides more detail about particular decisions that were made with regard to individual firms in the process of constructing the firm panel. For details regarding the construction of the ethnic patenting data set, we refer readers to Kerr (2007).

2 Canadian Analysis

To conduct our international analyses accurately, we needed to identify metropolitan areas in Canada in the same way in which we identified them in the United States. Fortunately, the way that Canada classifies metropolitan areas is quite similar to the way in which the United States classifies them. In Canada, however, these areas are split into two types: Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs). CMAs consist of population centers with at least 100,000 people in what is called the "urban core." CAs are defined similarly but have urban core populations of only 10,000 to 99,999.
Metropolitan Statistical Areas in the United States are defined similarly, except that there is only one threshold: an urban core of 50,000. To reconcile this discrepancy, we first matched the city listed on each patent to its appropriate CMA or CA. We then looked up the urban core population of each CMA and CA and classified those with urban core populations of at least 50,000 as Canadian MSAs. We used the 2006 Canadian Census from Statistics Canada to determine these population thresholds. Altogether, Canada has 33 CMAs and 111 CAs, 49 (33+16) of which qualify as MSA equivalents.

We matched the Canadian cities listed on each patent to the corresponding CMA or CA by hand, looking up every city name that had more than 5 applications associated with it from 1990 to 2007. This matches approximately 95% of Canadian patents. One important issue with this matching is that the patent data do not record the province in which the city is located. While most city names uniquely identify cities irrespective of province, several cities share the same name across provinces (in the US context, this is like Springfield, MA and Springfield, IL). Among the observations that we match, approximately 10% (10,602/107,362) of patents fall into this case. Since we cannot determine which city these names refer to, we drop these patents from the analysis. The composition of the dropped observations is quite similar to that of the overall sample.

Our general statistics on immigration to Canada come from a 2008 publication by Citizenship and Immigration Canada entitled "Immigration Overview: Permanent and Temporary Residents 2007." This publication has information on Canadian immigration for permanent residents and temporary workers broken down by several characteristics. The data extend back to 1983.
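The classification and matching rules in this section can be sketched in Python. The sketch assumes a hand-built lookup of (city, province, CMA/CA) rows and a table of urban-core populations; the example city names and populations are invented, and function names such as `classify_patent_cities` are hypothetical. It applies the 50,000 urban-core threshold and drops any patent whose city name maps to more than one province.

```python
from collections import defaultdict

MSA_CORE_THRESHOLD = 50_000  # urban-core threshold borrowed from the US MSA definition


def is_msa_equivalent(urban_core_population):
    """A CMA or CA qualifies as an MSA equivalent if its urban core has >= 50,000 people."""
    return urban_core_population >= MSA_CORE_THRESHOLD


def ambiguous_names(city_lookup):
    """City names that appear in more than one province.

    city_lookup: iterable of (city_name, province, area_name) rows; a hypothetical
    stand-in for the hand-built city-to-CMA/CA concordance.
    """
    provinces = defaultdict(set)
    for city, province, _area in city_lookup:
        provinces[city].add(province)
    return {city for city, provs in provinces.items() if len(provs) > 1}


def classify_patent_cities(patent_cities, city_lookup, core_population):
    """Label each patent's city with its MSA-equivalent area or a reason it is excluded."""
    ambiguous = ambiguous_names(city_lookup)
    area_of = {city: area for city, _prov, area in city_lookup if city not in ambiguous}
    labels = []
    for city in patent_cities:
        if city in ambiguous:
            labels.append("ambiguous")  # province unknown, so the patent is dropped
        elif city not in area_of:
            labels.append("unmatched")  # not among the hand-matched names
        elif is_msa_equivalent(core_population[area_of[city]]):
            labels.append(area_of[city])
        else:
            labels.append("non-MSA")    # CA with an urban core below 50,000


    return labels


# Hypothetical example: names and populations are invented for illustration.
CITY_LOOKUP = [
    ("Harbour City", "BC", "Harbour City CMA"),
    ("Riverton", "AB", "Riverton CA"),
    ("Lakeview", "ON", "Lakeview CMA"),
    ("Lakeview", "NS", "Lakeview CA"),
]
CORE_POPULATION = {"Harbour City CMA": 400_000, "Riverton CA": 30_000,
                   "Lakeview CMA": 120_000, "Lakeview CA": 12_000}
patents = ["Harbour City", "Lakeview", "Riverton", "Harbour City", "Unknown Town"]
print(classify_patent_cities(patents, CITY_LOOKUP, CORE_POPULATION))
print(f"Ambiguous share reported in the text: {10_602 / 107_362:.1%}")
```

In the example, "Lakeview" is dropped because it maps to two provinces, "Riverton" falls below the 50,000 threshold, and "Unknown Town" is absent from the lookup. The last line shows that 10,602/107,362 is about 9.9%, the "approximately 10%" cited above.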
3 L-1 and TN Visa Data

3.1 L-1 Visa

The L-1 visa is intended to enable multinational firms to transfer employees from their foreign offices into the United States. It is split into two distinct categories: the L-1A is intended for managers and executives, and the L-1B is for workers with “specialized knowledge.” The L-1A is initially valid for 3 years and can be extended to a maximum stay of 7 years. The initial length of the L-1B is also 3 years, although it can only be extended to a maximum stay of 5 years. Transferred employees have generally been required to have worked for the firm abroad for at least one continuous year out of the last three years. An exception to this restriction was the blanket L-1 visa, under which firms of sufficient size and with a sufficient history of sponsoring L-1s could apply for a special status allowing them to transfer workers after having employed them for only 6 months. One of the major changes to the program during our period of analysis, the L-1 Visa Reform Act of 2004, concerned this rule: under the Act, all new L-1 workers were required to have worked for the firm abroad for at least a year, regardless of the firm’s status. These restrictions on foreign work history were intended to prevent firms from hiring abroad to fill domestic worker needs (Department of Homeland Security, 2006). Unlike the H-1B visa, the L-1 carries no wage restrictions, except that compensation has to be high enough to prevent the worker from becoming a public charge.

Our data on L-1 visas come from several sources. L-1 issuances come from Kirkegaard (2005), who originally obtained the (theretofore unpublished) statistics from the Department of State’s Office of Public and Diplomatic Liaison. Our data on the border crossings of L-1 holders come from the USCIS Yearbook of Immigration Statistics. Figure 1 plots these three series; we have not been able to obtain data on L-1 border crossings for the year 1997. We divide L-1 border crossings by five only for reasons of scale; with this modification, the figure demonstrates how closely border crossings track new L-1 issuances. The evidence suggests that the restrictions on the use of the L-1 visa prevent it from being widely used as a substitute for the H-1B. Indeed, the regulations on the use of the visa appear to have had their intended effect. As shown in Figure 1, use of the L-1 grew steadily over time, leveling off after 2001. If substitution were happening on a large scale, one would expect much larger increases in the number of L-1 visas issued after 2003, when the H-1B cap became binding and demand for foreign workers far outpaced supply. One would also expect a decrease in the number of L-1 visas issued when H-1B admission levels were significantly increased in the late 1990s. Neither pattern is evident in either L-1 issuances or border crossings.

A 2006 study by the Department of Homeland Security investigated abuses of the L-1 visa, in particular whether firms were inappropriately using the L-1 to circumvent the H-1B cap. The report found little evidence to substantiate these concerns. Its conclusions are quite relevant for our work and merit quoting at length: "While many of the claims that appear in the media about L-1 workers displacing American workers and testimony may have merit, they do not seem to represent a significant national trend. While L-1 visa issuance has generally increased in the decades since the category was created, issuance has abated in recent years. And while it is possible for the L-1B program to be used by some individuals who are also eligible for H-1B program, we could not establish how often this occurs. In 2004, only 1,975 applicants applied for both the L-1 and H-1B. Adjudicators pointed out to us that it sometimes occurs that a foreign student about to graduate might receive multiple legitimate job offers and be the beneficiary of two or more petitions filed during the same period. Such an event does not indicate that either of the petitioners, or the beneficiary, is trying to take advantage of the system. Another possible indication that L-1s are not widely used as alternatives to the H-1B is that in fiscal year 2004 the congressional numerical limit on H-1B status was significantly reduced, but no increase in L receipts or approvals was observed."

They go on to write: "Most of the discussion of the job losses American workers have experienced as a result of L visas is focused on L-1B specialized knowledge workers, not L-1A managers and executives... The great majority of the new foreign IT employees entered the United States using the H-1B temporary worker visa, not the L-1. There is considerable room for overlapping of the two categories, but the most important distinction is that H-1B workers are petitioned for directly by U.S. companies, and are usually new hires, whereas L-1s are being transferred from a foreign company. The H-1B visa is so popular that Congress has placed explicit limits on the number of petitions that can be issued in any one year... L-1 foreign IT workers represented only a small component of a much larger wave of foreign IT workers that came to the United States on temporary worker visas. The busiest year for L-1B visas, fiscal year 2000, saw more than ten H-1B workers for every one L-1B worker. In FY 2002, the ratio was twenty to one. Foreign IT workers may indeed have affected employment opportunities for American IT workers, but the L-1B visa would appear to be only a very small element of the problem." Our own work leads us to believe that these findings are accurate.
As such, we do not think that large-scale substitution between L-1 and H-1B visas presents problems for our estimations that are not solved by the panel effects.

3.2 TN Visa

The TN visa was created under NAFTA in 1994. It allows workers from Canada and Mexico employed in a set of high-skilled occupations, generally narrower than those covered by the H-1B visa, to come to work in the US. The number of TN workers from Canada has been unlimited since the visa’s inception. Visas given to Mexican nationals, however, were limited to 5,500 a year until 2004, when the cap was lifted. Prior to 2008, workers on TN status were allowed to stay for a maximum of one year, at the end of which they were required to apply to USCIS for an extension of their stay. This rule was recently changed to extend the period to a maximum of three years before such paperwork needs to be filed. Although there is no limit on the number of permitted renewals, the TN visa cannot be used as a substitute for immigrating to the United States; whether to deny renewals over this concern is at the discretion of US immigration officials. Our data on TN visas are more limited and also come from the USCIS Yearbook of Immigration Statistics. They consist of border crossings by workers with TN visas. Figure 1 shows that these crossings have not seen large fluctuations over time, particularly since 2002. As with L-1 crossings, TN crossings are not available for the year 1997. We have unfortunately been unable to collect any data on new TN visa issuances or on the number of workers in the US on TN status. The number of crossings each year, however, suggests that the TN population is quite small relative to that of the H-1B. Also, unless the average number of border crossings per TN worker has changed significantly over time, these data suggest that the number of TN visa holders in the US has not changed dramatically over time.
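The step from border crossings to the size of the TN population can be made explicit with an accounting identity. The notation is ours and purely illustrative: $C_t$ is the number of TN border crossings recorded in year $t$, $N_t$ the unobserved number of TN workers, and $\bar{c}_t$ the average number of crossings per worker.

```latex
C_t = N_t\,\bar{c}_t
\quad\Longrightarrow\quad
\frac{N_t}{N_s} = \frac{C_t}{C_s}\cdot\frac{\bar{c}_s}{\bar{c}_t},
\qquad\text{so}\qquad
\bar{c}_t = \bar{c}_s \;\Longrightarrow\; \frac{N_t}{N_s} = \frac{C_t}{C_s}.
```

Under a stable crossing rate, a flat crossings series translates directly into a flat TN population; a shift in how often each worker crosses would break this link.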
The TN population shows little sign of decreasing when the H-1B cap was significantly raised, or of increasing when the cap was lowered. This is reassuring for the interpretation of our results, as it suggests that the TN visa is not being used in large numbers as a substitute for the H-1B.

4 Labor Condition Application Data

LCA data were obtained from the United States Department of Labor website (http://www.flcdatacenter.com/CaseH1B.aspx). The contents of the data are detailed in Data Appendix Table 1. MSAs are identified from the primary work location that the employer proposes on the LCA, using city lists collected from the Office of Social and Economic Data Analysis at the University of Missouri. We obtain an initial match rate of 93%; manual coding raises the match rate to 98%. A more extensive description of the characteristics of these data is found in Data Appendix Table 2. There, we see that the number of applications grew steadily over 2001-2006. At the same time, the rankings of the top firms and MSAs applying for visas remained remarkably steady over time. Our core panel of 76 firms has typically accounted for slightly more than 4% of all LCAs, with the larger panel covering about 12%. These shares are relative to all institutions, including universities.

5 Construction of Firm Panel

To identify the firms most at risk from changes in high-skilled immigration, we pool information from several sources. This section details the main data used to construct the firm panel, including the characteristics of firms that were not eventually included in the analysis. We begin with two lists of the top employers of H-1B visa holders, shown in Data Appendix Tables 3 and 4. The first was published by the United States Immigration and Naturalization Service in June of 2000. It details the companies with the largest number of approved H-1B visas authorized from October 1, 1999 to February 29, 2000. It contains 102 companies, each of which had more than 60 approved visas. Their individual totals add up to 13,940 visas, which account for just above 17% of the total number of petitions approved during this time period. The second list comes from BusinessWeek magazine and contains the names of the companies receiving the most H-1B visas in fiscal year 2006. This list identifies 200 companies, each of which had over 141 visas approved. It also shows the structural shift, noted in the main text, in the type of firms using the visa: four of the top five firms in the 2006 Top 200 list are Indian, a significant change from the 1999 Top 100 list.

We next created two lists of companies from the patent data: one from the data set of patents granted from January 1975 to April 2007 and the other from the data set of patent applications published by the USPTO since 2000. One issue to note with this process is that company names are frequently recorded differently across patents. As one of many examples, the same company might have the identifier "Corporation" after its name in one observation but not in another. In the granted patents data, we first sorted the company names by the number of US industry patents associated with each. All company names that accounted for at least 0.05% of the total number of US industry patents over 2001-2004 were included in the list. We then included the next 10 top patenting firms, for a total of 233 firms. This process yielded very similar results to one that considered the top patenting firms over the 1995-2006 period instead of 2001-2004. A similar methodology, with the same 0.05% threshold and the same 2001-2004 window, was used for the patent applications data. This process identified 210 unique firms.
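The list-building rule (every name at or above a share threshold of the total, plus the next 10 names by count) can be sketched as below. It is a hedged illustration rather than the authors' code: `select_top_names` and the toy counts are hypothetical, ties are broken alphabetically by assumption, and the `strict` flag is our addition so the same helper can express a "more than" threshold as well as an "at least" one.

```python
def select_top_names(counts, share_threshold, extra=10, strict=False):
    """Pick names whose share of all counts clears a threshold, plus `extra` more.

    counts: dict mapping a company name to its number of patents (or LCAs)
        in the reference window, e.g. US industry patents over 2001-2004.
    share_threshold: fraction of the total, e.g. 0.0005 for 0.05%.
    extra: how many of the next-largest names to append.
    strict: if True, require a share strictly above the threshold
        ("more than"); otherwise "at least".
    """
    total = sum(counts.values())
    # Rank by count (descending); break ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))

    def clears(name):
        share = counts[name] / total
        return share > share_threshold if strict else share >= share_threshold

    above = [name for name in ranked if clears(name)]
    below = [name for name in ranked if not clears(name)]
    return above + below[:extra]


toy_counts = {"A": 500, "B": 300, "C": 150, "D": 40, "E": 10}
print(select_top_names(toy_counts, 0.10, extra=1))  # ['A', 'B', 'C', 'D']
```

Because the names are ranked before splitting, the `extra` names appended are always the largest ones that missed the cutoff, which is what "the next top 10" implies.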
We employ a similar methodology to create a list of the most at-risk firms from the LCA data, although with a different threshold: we include in the initial list all firm names that accounted for more than 0.03% of all LCA applications over 2001-2006. With the addition of the next 10 top firms outside of this threshold, this identifies a list of 221 firms. A comprehensive list of firms combining all of these sources was then created. The initial list included 592 unique firms, 307 of which had at least one patent over 1995-2006. We obtain a list of 76 firms when we further restrict the sample to US-based firms that had at least 5 patents per year and at least one inventor of each relevant ethnicity (Indian, Chinese, English, Other) on one of their patents in every year 1995-2006. We subsequently refer to the firms that we follow as firms "on our lists." The list of firms included in our analyses and the reason each was included are detailed in Data Appendix Table 5. Tables 8 and 9 in the main text show that this list includes a wide range of firms: those from different industries and regions of the country, those of different sizes, and, to some extent, those with different degrees of reliance on the H-1B visa. Compustat data on each firm’s sales, number of employees, assets, and research and development expenditures in each year were merged onto this master file. The analyses that use these data are limited, however, as Compustat only includes information about publicly traded firms. Since firm names on patents and LCAs are often entered differently across observations, we had to manually match the names to our lists of the top employers of H-1B visas, the top patenting firms, and the top LCA filers. The same procedure was used to match companies in all data sets.
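The final restriction to the 76-firm core panel, described above, can be written as a filter over firm-year patent records. This is a minimal sketch under stated assumptions: each patent is represented as a hypothetical `(year, set_of_inventor_ethnicities)` pair, "at least 5 patents per year" is read as applying to every year 1995-2006, and whether years are application or grant years is left to the caller.

```python
from collections import defaultdict

PANEL_YEARS = range(1995, 2007)  # 1995-2006 inclusive
REQUIRED_ETHNICITIES = frozenset({"Indian", "Chinese", "English", "Other"})


def in_core_panel(us_based, patents, min_patents=5):
    """Return True if a firm meets the (assumed) core-panel criteria.

    us_based: whether the firm is based in the US.
    patents: iterable of (year, ethnicities) pairs, one per patent, where
        `ethnicities` is the set of inventor ethnicities on that patent.
    """
    if not us_based:
        return False
    n_patents = defaultdict(int)
    seen = defaultdict(set)
    for year, ethnicities in patents:
        n_patents[year] += 1
        seen[year].update(ethnicities)
    # Every panel year needs enough patents and every ethnicity on some patent.
    return all(
        n_patents[year] >= min_patents and REQUIRED_ETHNICITIES <= seen[year]
        for year in PANEL_YEARS
    )


def toy_year(year):
    """Five made-up patents for one year that jointly cover all four groups."""
    return [(year, {"Indian", "Chinese"}), (year, {"English"}), (year, {"Other"}),
            (year, set()), (year, set())]


firm = [p for y in PANEL_YEARS for p in toy_year(y)]
print(in_core_panel(True, firm))   # True
print(in_core_panel(False, firm))  # False
```

A single year with too few patents, or a single year missing one ethnicity, is enough to exclude the firm, which is why the restriction shrinks the 307 patenting firms so sharply.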
For each company, a list of all patents and LCA applications that could potentially refer to it was first obtained by searching on the company's name. From this master list, matches to the relevant company were then determined. In the end, 8,967 different company names were found for the 307 companies in our initial list. If a listed name was a subsidiary of a larger company (e.g., Verizon Data Services), we matched on the parent company as well. While this process required significant research into each company, it is an issue that all studies analyzing patenting by firms have had to face (see, for example, Hall and Ziedonis (2001)). In the patents data, we identified all company names that were potential matches to the firms in our sample. In the LCA data, we employed a more limited matching algorithm: we first created a rough list of potential name matches and then omitted company names that account for a relatively small number of LCA applications. This was done in several steps. For a particular firm on our lists, if at least one name that accounted for over 100 applications was a clear match, we limited the rest of our search to company names that accounted for more than 5 applications. Similarly, if at least one name with 50-100 applications was a clear match, we limited the rest of our search to company names that had more than 3 applications. Otherwise, we considered all company names that were potential matches. For most firms, 2-3 company names were listed on the vast majority of potentially matching applications, with a large number of remaining company names that accounted for 1-2 applications each (from misspellings, etc.). Our procedure consequently accounts for 95% of all observations that we believe could be potential matches to firms in our sample. Our firm matching work also takes into account large mergers and acquisitions.
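The LCA pruning rule described above lends itself to a short sketch. It assumes, hypothetically, that each firm's candidate spellings arrive as a dictionary of application counts and that the spellings already judged to be clear matches are known; deciding what counts as a clear match remained a manual judgment in the procedure described in the text, and the spellings below are invented.

```python
def names_left_to_review(candidates, clear_matches):
    """Apply the LCA search-pruning rule to one firm's candidate names.

    candidates: dict mapping each candidate employer spelling to its number
        of LCA applications.
    clear_matches: spellings already judged to be unambiguous matches.

    Returns the remaining spellings that still need manual review.
    """
    largest_clear = max((candidates[name] for name in clear_matches), default=0)
    if largest_clear > 100:
        min_apps = 5   # clear match with over 100 LCAs: review only names with more than 5
    elif largest_clear >= 50:
        min_apps = 3   # clear match with 50-100 LCAs: review only names with more than 3
    else:
        min_apps = 0   # otherwise review every potential match
    return {
        name
        for name, n_apps in candidates.items()
        if name not in clear_matches and n_apps > min_apps
    }


spellings = {"ACME CORP": 420, "ACME CORPORATION": 12, "ACME CORP.": 4, "ACMEE CORP": 1}
print(names_left_to_review(spellings, {"ACME CORP"}))  # {'ACME CORPORATION'}
```

With a 420-application clear match, only the 12-application variant is left for review; the 4- and 1-application variants fall below the cutoff. That is the trade-off behind covering about 95% rather than all potential matches.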
We first obtained Compustat data for each firm on our lists and then identified years in which there was a 50% decrease in employees, a 100% increase in employees, or a sudden stop in the data. We then searched the company history of each of these firms to determine whether there was a merger, acquisition, or divestiture in that year that we had to account for. We documented all such cases, matched on the additional companies in the patents and LCA data, and obtained their data from Compustat. Composite firms were then created; for example, we treated two firms that merged as one firm prior to the merger (e.g., Lockheed Corporation and Martin Marietta are treated as one firm before their merger in 1995). The company name matching in the patents and LCA data was also updated to reflect these changes. Firms that went through so much corporate restructuring that a coherent composite firm could not feasibly be constructed were dropped from the sample. More details about this process are found in the following sections of this appendix. We additionally accounted for joint ventures in both the patent and LCA data. If a firm was part of any joint venture that produced an application, we counted that application for that firm. In the small number of cases where the joint venture was between firms that were both in our sample, we counted the application for both firms. As a double check on our work in matching firms in the patents data, we went through a more limited patent-firm match data set maintained by Bronwyn Hall. All of the company names that were matched to the firms in our sample in this data set were also incorporated into our work.[4]

6 Firm-Specific Details

In this section we include an extensive description of the details surrounding particular firms that came to our attention in the process of constructing our panel data set.
This includes information about mergers and acquisitions, divestitures, notable subsidiary relationships, name changes, joint ventures, and an accounting of the firms that had to be dropped from the analysis. Since most of the companies on our lists went through at least some restructuring over our sample period, and most of this activity was small relative to the size of the firm, we only account for large changes in the structure of each company. As it is often useful to know whether a company that we refer to was one of the companies we were searching for or was instead a related company, we mark each company as (e) if it is an entry on our lists and (ne) if it is not. Many of these details did not ultimately have a significant effect on the composition of the panel used for our main firm-level estimations; we nonetheless had to work through them, as only then could the firm sample be properly restricted to the types of firms we intended to consider. The details noted below restrict our analyses to the 177-firm sample considered in Tables 8 and 9 in the main text. We obtain our sample of 76 firms used for the main estimations when we further restrict the sample to US-based firms that had at least 5 patents per year and at least one inventor of each relevant ethnicity (Indian, Chinese, English, Other) on one of their patents in every year 1995-2006. A list of the companies included in our final panel of 76 firms is found in Data Appendix Table 5.

6.1 Mergers and Acquisitions

Several of the firms on our lists went through mergers and acquisitions. The exact process by which we chose which mergers and acquisitions to account for was laid out in Section 5. Here we detail the corporate restructurings for which we did account.
In most of these cases, we created a composite firm, in which the firms that went through mergers or acquisitions are treated as the same firm prior to their joining together. The patents, LCAs, and Compustat data are all matched together for the composite firm. Further details for each case are listed below.

[4] This data set is publicly available online at: http://elsa.berkeley.edu/~bhhall/pat/namematch.html

1. American Cyanamid (ne) and its subsidiary Lederle Laboratories (ne) were acquired by American Home Products (e) in 1994. American Home Products then changed its name to Wyeth (e) in 2002. Both of the names “American Home Products” and “Wyeth” are on our lists. We treat American Cyanamid, Lederle Laboratories, and American Home Products (Wyeth) as a composite firm and match on all of these names in the patents and LCA data.
2. Andrew Corporation (e) acquired Comsearch (ne) and Allen Telecom (ne) in 2003. We found records for Allen Telecom in Compustat but none for Comsearch, since Comsearch was never public. Since the Comsearch acquisition was not very significant by itself, we keep Andrew without adding Comsearch's pre-acquisition sales and other data.
3. Donnelly (e) was acquired by Magna International (ne) in 2002. We match on both company names together under Magna International.
4. Engelhard (e) was acquired by BASF (e) in 2006, and BASF began renaming Engelhard only in August of that year. We treat these companies together as a composite firm.
5. Exxon Mobil (e) is the parent of Esso (ne), Mobil (ne), and ExxonMobil (e) companies. Exxon (ne) and Mobil (ne) merged in 1999 to form Exxon Mobil. We match on all of these names together and treat them as a composite firm.
6. Gillette (e) was acquired by Procter & Gamble (e) in 2005. We treat these companies together as a composite firm.
7. Hewlett Packard (e) and Compaq (e) merged in 2002. We treat these companies together as a composite firm.
8. Hughes Electronics (e) used to be a part of General Motors (which acquired it in 1985) but was sold to News Corp in 2003. We consider it part of General Motors for the whole period of our analyses.
9. Immunex (e) was acquired by Amgen (e) in 2001. We treat these companies as a composite firm.
10. Chase Manhattan Corporation (ne) acquired JP Morgan (ne) to form JP Morgan Chase (e) in 2000. The company then merged with Bank One (ne) in 2004 but kept the name JP Morgan Chase. We treat these companies as a composite firm.
11. KLA-Tencor (e) was formed in May of 1997 through the merger of KLA Instruments (ne) and Tencor Instruments (ne). We treat these companies as a composite firm.
12. Kraft Foods (e) was a subsidiary of Philip Morris (ne) from 1988 to 2007. In 2000 Philip Morris acquired Nabisco. In 2003 Philip Morris Companies Inc. changed its name to Altria Group (ne). We treat these companies as a composite firm.
13. Lockheed Martin (e) was formed in 1995 from the merger of Lockheed Corporation (ne) and Martin Marietta (ne). We treat these companies as a composite firm.
14. Lucent (e) and Alcatel (e) merged in late 2006 to form Alcatel-Lucent (ne). We treat these two companies as the same company prior to this merger.
15. In May of 2006, Maxtor Corporation (e) was acquired by Seagate Technology (e). We treat these companies as a composite firm.
16. McDonnell Douglas (ne) and Boeing (e) merged in 1997. We treat these companies as a composite firm.
17. In 2005 Oracle (e) acquired Siebel Systems (e). We treat these companies as a composite firm.
18. Pacesetter, Inc. (e) is a part of St. Jude (ne), which acquired Pacesetter in 1994. We treat these companies as a composite firm and refer to it as St. Jude hereafter.
19. Pioneer Hi-Bred (e) was acquired by DuPont (e) in 1999. We treat these companies as a composite firm.
20. Bayer (e) acquired Schering (e) in late 2006. Although the two companies were separate for the majority of the period of our analyses, we still treat them as a composite company.
21. When Sprint Corporation (e) purchased Nextel Communications (e) in 2005, Sprint Nextel (e) was created. We treat these companies as a composite firm.
22. SPX (e) merged with General Signal Corporation (ne) in 1998. We treat these companies as a composite firm.
23. Storage Technology Corporation (e) was acquired by Sun Microsystems (e) in 2005. We treat these companies as a composite firm.
24. Synopsys (e) acquired Numerical Technologies (e) in 2003. We treat these companies as a composite firm.
25. United Technologies (e) acquired Chubb plc (ne) in 2003. We treat these companies as a composite firm.
26. In April of 2006, Whirlpool (e) acquired Maytag (e). We treat these companies as a composite firm.

6.2 Divestitures

A few listings in our firm sample were originally part of other companies. As these divestitures occurred during our period of analysis, we count these firms as part of their original parent companies. Composite firms were created just as they were for companies that went through mergers and acquisitions.

1. Delphi Technologies (e) was created from a General Motors (e) spin-off in 1998. We continue to treat it as if it were part of General Motors.
2. Freescale Semiconductor (e) was spun off from Motorola (e) in 2004 and has essentially retained its identity since then. Since this was a large part of Motorola (and is a top H-1B sponsor), we continue to treat it as if it were part of Motorola.
3. Infineon (e) used to be part of Siemens (e) but was spun off in 1999. We treat these companies as a composite firm.

6.3 Notable Subsidiaries

Several of the company names in the lists that created the initial firm sample were actually subsidiaries of other companies. In these cases, we matched on the names of both the subsidiary and the parent company in the patents and LCA data.
We then used Compustat data for the parent company and refer to the parent as the firm on which we match. This is not an exhaustive list of these relationships but rather a partial listing of those that would not be obvious to an observer without extensive background in the histories of these companies.

1. Ethicon (e) is a subsidiary of Johnson & Johnson (ne). We match on both Johnson & Johnson and Ethicon together and refer to the composite company as Johnson & Johnson hereafter.
2. Marvell Semiconductor (e) is a subsidiary of Marvell Technology Group (ne). We match on Marvell Technology Group as well as its subsidiary Marvell Semiconductor, Inc.
3. Palo Alto Research Center (e) is a subsidiary of Xerox Corporation (e). We count the patents and LCAs of Palo Alto Research Center toward Xerox.
4. Weatherford/Lamb (e) is a subsidiary of Weatherford International (ne). We find all assignee names that match Weatherford International as well as its subsidiary Weatherford/Lamb.

6.4 Name Issues

Several firms in our sample changed their names or go by different names in different contexts. This was particularly relevant for matching these companies to the assignee names in the patents and LCA data. Here, we document these cases. In each case, we search for all of the company's names in the patents and LCA data.

1. Advanced Micro Devices (e) is also known as AMD (ne).
2. Advanced Technology Materials (e) is also known as ATMI (ne).
3. American Home Products (e) is an entry in our lists. In 2002 it changed its name to Wyeth (e).
4. Atlantic Duncans International (e) changed its name to Optimos Inc (ne) in 2000.
5. Hon Hai (e) is an entry in one of our lists, but its trade name in the United States is “Foxconn” (ne).
6. Incyte Genomics (e) and Incyte Pharmaceuticals (e) refer to the same company, Incyte.
7. Koninklijke Philips Electronics NV (e) and Philips Electronics of North America (e) are the same company for our purposes.
8. Matsushita (e) goes by the trade name Panasonic (ne) in North America.
9. Mastech (e) and iGate Mastech (e) are two different entries on our lists. They are the same company, however, and we match on both of them together.
10. STMicroelectronics (e) was formed in June 1987 by the merger of the semiconductor companies Thomson Semiconducteurs (ne), a part of the French company Thomson (ne), and SGS Microelettronica (ne). At the time of the merger the company was known as SGS-Thomson (ne), but it took its current name in May 1998 following the withdrawal of Thomson SA as an owner. We search for both company names, SGS-Thomson and STMicroelectronics, in the patents and LCA data.

6.5 Joint Ventures

We accounted for a number of joint ventures. The collaborations that we account for, however, are not comprehensive; doing a full match on joint ventures would have required a far more extensive company name matching procedure. We primarily identified them using (1) the fact that some of the company names in our lists were joint ventures and (2) the fact that searching for the names of the firms in our sample naturally brought certain joint ventures to our attention. As long as the company that we were searching for was one of the names on the application (that is, one of the firms in the joint venture), we included the application. This did not present an issue for most of the joint ventures. However, in several collaborations both of the companies listed on the applications were companies for which we were searching. The joint venture Dow Corning provides an example, since both Dow Chemical and Corning are on our lists. In these cases we have attributed the patent or LCA to both companies. Below, we describe four of the major joint ventures that we identified in the data.
It is worth noting, however, that joint ventures between two or more firms on our lists were quite small in number compared to the overall activity of our firms in both the patents and LCA data.

1. HRL Labs (Hughes Research Laboratories) (e) has been a subsidiary of General Motors (e), Raytheon (e), and Boeing (e) during the period of our analyses. It was under General Motors’ control until 1997, a joint venture between Raytheon and Boeing over 1997-2000, a joint venture between Raytheon, Boeing, and General Motors over 2001-2005, and a joint venture between Boeing and General Motors from 2006 to today. The patents and LCAs are attributed to the owners according to this timeline.
2. Dow Corning was created as a joint venture between Corning Glass Works (now Corning, Incorporated) (e) and Dow Chemical Company (e).
3. “Fuji Xerox” (e) is a joint venture between Fuji Film (ne) and Xerox Corporation (e). We have only counted observations with this company name toward the applications for Xerox and have not included Fuji Film in our analyses.
4. UOP LLC (e) is an entry on one of our lists, as are Honeywell International Inc. (e) and Dow Chemical (e). UOP was a joint venture between Honeywell and Dow until 2005, at which time Honeywell took over. Applications in the patents and LCA data are counted for Dow through 2005. As noted below, we drop Honeywell (e).

6.6 Dropped From The Analysis

Several companies in our initial lists could not be included in our analyses, typically due to a large degree of merger and acquisition activity. Here we document these cases. Note that this section does not describe firms that we could not include because they lack the needed records in Compustat. There was no preset algorithm for determining which companies to drop; each decision was based on our judgment of the particular circumstances surrounding each company.

1. Acushnet (e) is a subsidiary of Fortune Brands (ne) and has been since the 1970s. Moen (e) is a subsidiary as well. The problem is that patents and LCAs are listed under the names of Fortune’s subsidiaries, while the Compustat data cover only the parent company Fortune Brands. Following all of Fortune’s subsidiaries would be too difficult, and thus we drop Acushnet and Moen from the analyses.
2. Allied Signal (ne) and Honeywell (e) merged in 1999. We were not able to get Compustat records for Allied Signal and thus have to drop Honeywell. This means that the joint venture UOP (e) is dropped for Honeywell as well.
3. Applera (e) and PE Corporation (e) relate to a common company. However, the preceding company, Perkin-Elmer (ne), split in 1999 into two companies that each had their own restructurings. It is thus not possible to construct a composite company, and we drop Applera (e) and PE Corporation (e) from the analysis.
4. AT&T (e) is an entry on our lists. Southwestern Bell Corporation (SBC) (e) acquired Pacific Telesis (ne) and Southern New England Telecommunications (ne) in 1997. SBC (e) and Ameritech (e) merged in 1999. SBC purchased AT&T Corp. (e) in 2005. Following all of these firms (which also had their fair share of mergers and acquisitions) together would be quite difficult, and we thus drop AT&T from our analyses. AT&T (e) also acquired BellSouth (e) in late 2006. BellSouth itself went through several mergers and acquisitions (including a large acquisition of a part of AT&T in 2004). We thus drop BellSouth as well.
5. What is now known as Bank of America (e) went through several significant mergers and acquisitions in the late 1980s and the 1990s, particularly in 1989, 1991, and 1998. Following all of these firms is quite difficult, and we thus drop Bank of America.
6. BAE Systems (e) had a very large jump in employment in 1999, when it acquired a part of General Electric (e). Since it is impossible to trace this part of General Electric separately, we drop BAE Systems from the analysis.
7. In a move that significantly changed the size of the company, Boston Scientific (e) acquired Guidant Corporation (ne) in 2006; it had also undertaken four significant mergers and acquisitions in 1995. Following all of these firms would be quite difficult, and we thus drop Boston Scientific from the analysis (see item 8 for a related case).
8. Cardiac Pacemakers (e) and Advanced Cardiovascular Systems (e) were spun off from Eli Lilly (e) in 1995 (among other divisions) to form Guidant (ne). Guidant then made several acquisitions during 1995-2006. Since obtaining pre-period sales for either Cardiac Pacemakers or Advanced Cardiovascular Systems would not be possible, and since we drop Boston Scientific, we drop Cardiac Pacemakers and Advanced Cardiovascular Systems as well.
9. Chevron (e) and Texaco (ne) merged to form ChevronTexaco in 2001. The name was then changed to Chevron in 2005. The company also acquired Unocal Corporation (ne) in 2005. “Chevron Chemical Company” is an entry on our lists. In 2000, Chevron Corporation and Phillips Petroleum Company formed Chevron Phillips Chemical Company. Chevron and ConocoPhillips (e) each currently own 50 percent of this joint venture. Conoco (e) and ConocoPhillips (e) are also both entries on our lists. Conoco Inc merged with Phillips Petroleum Company in 2002 to form ConocoPhillips. As “Chevron Chemical Company” is the only company name in our lists that refers to Chevron, as Chevron has gone through a number of mergers and acquisitions, and as we drop ConocoPhillips (see below), we drop Chevron.
10. Chiron (e) acquired Cetus (ne) in 1991, was partially bought by Novartis (ne) in 2005, and was then fully bought by it in 2006. Novartis, for its part, has gone through many mergers and acquisitions over the years, and so we drop Chiron.
11. Citigroup (e) went through several large mergers and acquisitions in 1993, 1998, and 2001 and has not even been known as Citigroup for the whole period of our analyses. Since these changes were so large, we drop Citigroup.
12. CNH America (e) stands for Case New Holland America. It was created in 1999 through the merger of New Holland N.V. (ne) and Case Corporation (ne). The histories of New Holland and Case Corporation are also replete with merger and acquisition activity, and so we choose to drop CNH.
13. Conoco (e) and ConocoPhillips (e) are both entries on our lists. Conoco Inc merged with Phillips Petroleum Company (ne) in 2002 to form ConocoPhillips. Conoco, in turn, was created as a spin-off from DuPont in 1997. We thus cannot get a measure of pre-period sales for Conoco, and so we drop ConocoPhillips as well. Note that we still keep DuPont, however, since this spin-off was not extremely large relative to DuPont's size.
14. Corixa (e) was sold to GlaxoSmithKline (e) in 2005. Since tracking all of the firms that make up GlaxoSmithKline was too difficult, we drop Corixa (see the discussion of GlaxoSmithKline below).
15. CVS Pharmacy (e) is an entry on one of our lists. In 1997 it acquired Revco (ne), and in 2006 it acquired Minute Clinic (ne). We have not been able to find records for many of these companies in Compustat, and following them would be difficult. We drop CVS from the analysis.
16. DaimlerChrysler (e) was formed from the merger of Daimler-Benz (ne) and Chrysler Corporation (ne) in 1998. Treating all of these firms together as a composite firm would be too difficult, and we consequently drop DaimlerChrysler.
17. EMC (e) went through a large number of acquisitions in the early 2000s, and many of the acquired companies were private. It would thus be very difficult to construct a composite company. We consequently drop EMC from the analysis.
18. Ernst & Young (e) sold its consultancy group to Cap Gemini (ne) in 2000. Since Ernst & Young’s consultancy likely filed many LCAs (as the whole company filed a large number) and Cap Gemini does not have any patents, we drop Ernst & Young from the analysis.
19. Federal Mogul (e) experienced a large increase in employment in 1998. In that year it acquired Turner & Newall (ne), Cooper Automotive (ne) from Cooper Industries (ne), and Fel-Pro (ne), all of which were large acquisitions. Following all of these firms would be quite difficult, and so we drop Federal Mogul.
20. It could be argued that we “dropped” Fuji Film. The relevant company name in our lists was "Fuji Xerox," which is a joint venture between Fuji Film and Xerox (e). We have matched this name only with Xerox, which is another company on our lists.
21. In 1997 General Instrument (e) split into three companies: General Semiconductor (ne), CommScope (ne), and NextLevel Systems (ne). It does not make sense to track all of these companies as one, and so we drop General Instrument.
22. In 2000 i2 Technologies (e) acquired Aspect Development (ne). Aspect Development was a private company and consequently does not have a record in Compustat. Since this was a large acquisition, we cannot track i2 Technologies as one company and have to drop it.
23. In 2000 Incyte (e) acquired Proteome Inc (ne), which significantly expanded the size of the company. Proteome Inc was private and does not have a record in Compustat. We thus cannot track Incyte as one company and have to drop it.
24. JDS Uniphase (e) was formed when JDS FITEL (ne) and Uniphase Corporation (ne) merged in 1999. It rebranded itself as JDSU (ne) in 2005. As both JDS FITEL and Uniphase Corporation went through several significant mergers, acquisitions, and divestitures of their own prior to merging, we drop JDS Uniphase from the analysis.
25. Masco (e) acquired Zenith Products Corporation (e) in 1994. We have not found a record for Zenith Products in Compustat, and so we drop both Zenith Products Corporation (e) and Masco Corporation (e) from our analyses.
26.
Monsanto (e) went through very large corporate restructurings in the late 1990s, to the point where the current Monsanto is a di¤erent legal entity than the “Monsanto” operating before 2000 and is in a signi…cantly di¤erent line of business. We therefore drop it from the analysis. 27. During 1991-3 NCR Corporation (e) was a subsidiary of AT&T (e) and has gone through several restructurings since. We therefore drop it from the analysis. 28. In 2001, Northrop Grumman (e) acquired Litton (ne) and TRW (e). In that year, its employment more than doubled. It subsequently sold o¤ a major part of TRW (e). Since tracking this all together would be di¢ cult, we drop Northrop Grumman. As TRW (e) is also an entry on our lists, we drop this as well. 29. Nortel Networks (e) as it is today was created from a spin o¤ of BCE (Bell Canada Enterprises). BCE (ne) is a public company but from its company history we know that it also went through plenty of other mergers, acquisitions and divestitures. We consequently drop Nortel Networks. 30. SanDisk (e) acquired Matrix Semiconductor (e) in 2005. Matrix Semiconductor was a private …rm and thus we do not have any Compustat records for it. The acquisition caused a very large change in the size of the company and thus we drop Sandisk. 31. Semiconductor Components Industries (e) is a subsidiary of ON Semiconductor (ne). ON Semiconductor was spun o¤ from Motorola (e) in 1999. It then proceeded to make several acquisitions, signi…cantly increasing the size of the company. It would be very di¢ cult to follow all of these companies together and thus we drop ON Semiconductor from the analysis. 18 Note, however, that the spin o¤ of ON Semiconductor from Motorola was not large relative to Motorola’s size and thus we keep Motorola in the analysis. 32. Sugen (e) merged with Pharmacia & Upjohn Inc. (ne) in 1999. Following these two companies together would be di¢ cult and thus we drop Sugen. 33. 
T-Mobile (e) was previously known as VoiceStream Wireless (ne) and Powertel (ne). VoiceStream was acquired by Deutsche Telekom (ne) in 2001, and in changed its name to T-Mobile in 2002. We drop it because of the signi…cant identity changes and fact that we could not …nd a record for it in Compustat. 34. SmithKline Beckman (ne) and The Beecham Group (ne) merged to form SmithKline Beecham (e) in 1989. In 1995 Glaxo (ne) and Wellcome (ne) merged to form Glaxo Wellcome (ne). Also in 1995, Glaxo Wellcome acquired A¤ymax (ne). Glaxo Wellcome and SmithKline Beecham merged in 2001 to form GlaxoSmithKline (ne). GlaxoSmithKline (e) subsequently bought Corixa (e) in 2005. Since there were many mergers and acquisitions for this company and many of these companies were foreign (and thus do not have records in Compustat) this …rm was dropped from the analysis. 35. Verizon (e) was formed through a series of mergers and acquisitions that made it and its antecedents impossible to follow as a composite company. Bell Atlantic (e) merged with NYNEX (ne) in 1997. GTE (ne) then merged with Bell Atlantic (e) in 2000 to form Verizon. Verizon acquired MCI (ne), which was formerly WorldCom (e) in 2005. Because following all of these companies together as a composite company would not make sense, we have to drop Verizon, Bell Atlantic, and WorldCom from our analyses. 
Data Appendix Table 1: LCA Data
Variable | Description
Submitted_Date | Date and time the application was submitted
Case_No | Case number
Name | Employer's name
Address | Employer's address
Address2 | Employer's address2
City | Employer's city
State | Employer's state
Postal_Code | Employer's postal code
Nbr_Immigrants | Number of job openings
Begin_Date | Proposed begin date
End_Date | Proposed end date
Job_Title | Job title
Dol_Decision_Date | Date certified or denied
Certified_Begin_Date | Certification start date
Certified_End_Date | Certification end date
Job_Code | Three-digit occupational group
Approval_Status | Approval status (certified or denied)
Wage_Rate_1 | Employer's proposed wage rate
Rate_Per_1 | Unit of pay for proposed wage rate
Max_Rate_1 | Maximum proposed wage rate
Part_Time_1 | Y = Part time; N = Full time position
City_1 | Work city (location of the job opening)
State_1 | Work state (location of the job opening)
Prevailing_Wage_1 | Prevailing wage rate
Wage_Source_1 | Collective bargaining; SESA; Other
Yr_Source_Pub_1 | Year that the prevailing wage data was published
Other_Wage_Source_1 | Description of the Other wage source
Wage_Rate_2 | Employer's proposed wage rate - second location
Rate_Per_2 | Unit of pay for proposed wage - second location
Max_Rate_2 | Maximum proposed wage rate - second location
Part_Time_2 | Y = Part time; N = Full time position
City_2 | Work city - second location
State_2 | Work state - second location
Prevailing_Wage_2 | Prevailing wage rate - second location
Wage_Source_2 | Collective bargaining; SESA; Other
Yr_Source_Pub_2 | Year that the prevailing wage data was published
Other_Wage_Source_2 | Description of the Other wage source
Notes: LCA data are kept by the Department of Labor and are publicly available at http://www.flcdatacenter.com/CaseH1B.aspx.
Data Appendix Table 2: LCA Summary Statistics
Year | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | Overall
Number of Applications | 239,123 | 244,759 | 257,199 | 330,111 | 312,741 | 374,463 | 1,758,396
Most Common MSA | NY | NY | NY | NY | NY | NY | NY
Second Most Common MSA | SF | LA | SF | SF | SF | SF | SF
Third Most Common MSA | LA | SF | LA | LA | LA | LA | LA
LCAs By 76 Firm Panel | 4.6% | 3.3% | 4.9% | 4.3% | 4.3% | 4.9% | 4.4%
LCAs By 307 Firm Panel | 12.3% | 8.6% | 11.6% | 11.5% | 12.7% | 13.4% | 11.8%
Most Common Firm, Second Most Common Firm, Third Most Common Firm: Oracle Microsoft Microsoft Microsoft Microsoft Microsoft Microsoft IBM IBM Microsoft Cisco Intel IBM IBM IBM IBM Oracle IBM Oracle Oracle Oracle
Notes: The firm rankings of the largest LCA applicants are computed only with respect to the companies in our panel of 76 firms. Acronyms: NY, New York City; SF, San Francisco; LA, Los Angeles; MSA, Metropolitan Statistical Area. Shares are relative to all institutions, including universities.

Data Appendix Table 3: Top 100 List
Rank | Company | Number of H-1B Visas
1 | Motorola Inc | 618
2 | Oracle Corp | 455
3 | Cisco Systems Inc | 398
4 | Mastech | 389
5 | Intel Corp | 367
6 | Microsoft Corp | 362
7 | Rapidigm | 357
8 | Syntel Inc | 337
9 | Wipro LTD | 327
10 | Tata Consultancy Serv | 320
11 | PriceWaterhouseCoopers LLP | 272
12 | People Com Consultants Inc | 261
13 | Lucent Technologies | 255
14 | Infosys Technologies LTD | 239
15 | Nortel Networks Inc | 234
16 | Tekedge Corp | 219
17 | Data Conversion | 195
18 | Tata Infotech | 185
19 | Cotelligent USA Inc | 183
20 | Sun Microsystems Inc | 182
21 | Compuware Corp | 179
22 | KPMG LLP | 177
23 | Intelligroup | 161
24 | Hi Tech Consultants Inc | 157
25 | Group Ipex Inc | 151
26 | Ace Technologies Inc | 149
26 | Hewlett Packard Co | 149
28 | Everest Consulting GR | 147
29 | Bell Atlantic Network Serv | 141
30 | Ernst Young LLP | 137
31 | Agilent Technologies Inc | 136
32 | Deloitte Touche LLP | 130
33 | Birlasoft | 128
33 | Global Consultants | 128
35 | IBM | 124
35 | R Systems Inc | 124
35 | Sprint United Mgt | 124
35 | Wireless Facilities | 124
39 | Cognizant Technology Solutions | 123
39 | Satyam Computer Serv | 123
41 | Keane | 114
42 | University of Washington | 113
43 | Analysts Intl Corp | 110
44 | Capital One Serv | 109
45 | Apar Infotech | 108
45 | Modis Inc | 108
47 | L & T Technology LTD | 107
48 | Complete Business Solutions Inc | 105
49 | Techspan | 101
50 | CMOS Soft Inc | 100
51 | Renaissance Worldwide | 99
52 | University of PA | 97
53 | Conexant Systems Inc | 96
53 | I2 Technologies Inc | 96
55 | AT T | 93
56 | Jean Martin | 91
57 | EMC | 90
58 | Atlantic Duncans Intl | 87
58 | Merrill Lynch | 87
60 | Unique Computing | 86
61 | Computer Intl | 85
61 | Indotronix Intl | 85
61 | Nationwide Insurance | 85
64 | Interim Technology Consulting | 84
65 | Compaq Computer | 80
65 | GE MSI | 80
65 | Majesco Software Inc | 80
68 | Data Core Systems | 78
69 | IT Solutions Inc | 77
70 | Allied Informatics Inc | 76
71 | Ciber Inc | 75
71 | Deloitte Consulting LLC | 75
71 | Goldman Sachs | 75
74 | Baton Rouge Intl | 74
75 | Cyberthink | 73
75 | Stanford University | 73
77 | Cap Gemini America | 72
77 | Infogain Corp | 72
79 | Ajilon Serv | 71
79 | Allsoft Technologies Inc | 71
79 | Morgan Stanley Dean Witter | 71
82 | Ericsson Inc | 70
82 | Harvard University | 70
82 | Sabre Inc | 70
82 | Yash Technologies Inc | 70
86 | Pyramid Consulting Inc | 69
87 | MSX Intl Inc | 68
88 | Softplus Inc | 67
89 | Baylor College Of Medicine | 65
89 | Microstrategy | 65
89 | University of Minnesota | 65
89 | Universal Software | 65
93 | Computer Horizons | 64
94 | Ramco Systems | 63
94 | Siebel Systems Inc | 63
96 | Insight Solutions Inc | 62
96 | Synopsys Inc | 62
96 | Texas Instruments Inc | 62
99 | Infosynergy | 61
99 | Lason Systems Inc | 61
99 | Vanguard GR | 61
99 | Yale University | 61
Notes: This list was originally published by the United States Immigration and Naturalization Service in 2000 as "Leading Employers of Specialty Occupation Workers (H-1B): October 1999 to February 2000."

Data Appendix Table 4: Top 200 List
Rank | Company | Number of H-1B Visas
1 | Infosys Technologies | 4,908
2 | Wipro | 4,002
3 | Microsoft | 3,117
4 | Tata Consultancy Services | 3,046
5 | Satyam Computer Services | 2,880
6 | Cognizant Tech Solutions U.S. | 2,226
7 | Patni Computer Systems | 1,391
8 | IBM | 1,130
9 | Oracle | 1,022
10 | Larsen & Toubro Infotech | 947
11 | HCL America | 910
12 | Deloitte & Touche | 890
13 | Cisco Systems | 828
14 | Intel | 828
15 | I-Flex Solutions | 817
16 | Ernst & Young | 774
17 | Tech Mahindra Americas | 770
18 | Motorola | 760
19 | MphasiS | 751
20 | Deloitte Consulting | 665
21 | LanceSoft | 645
22 | New York City Public Schools | 642
23 | Accenture | 637
24 | JP Morgan Chase | 632
25 | Polaris Software Lab India | 611
26 | Covansys | 611
27 | PricewaterhouseCoopers | 591
28 | Qualcomm | 533
29 | Goldman Sachs | 529
30 | KPMG | 476
31 | Marlabs | 475
32 | University of Michigan | 437
33 | Univ. of Illinois at Chicago | 434
34 | University of Pennsylvania | 432
35 | The Johns Hopkins University | 432
36 | Syntel Consulting | 416
37 | Citigroup Global Markets | 413
38 | BearingPoint | 413
39 | University of Maryland | 404
40 | Keane | 386
41 | HTC Global Services | 382
42 | iGate Mastech | 378
43 | Hexaware Technologies | 362
44 | Capital One Services | 362
45 | Columbia University | 355
46 | Lehman Brothers | 352
47 | Yahoo! | 347
48 | U.S. Technology Resources | 339
49 | Intelligroup | 336
50 | Hewlett-Packard | 333
51 | Rapidigm | 330
52 | Merrill Lynch | 329
53 | Google | 328
54 | Citibank | 322
55 | Dis National Insts of Health DHHS | 322
56 | Yale University | 316
57 | Nokia | 314
58 | Texas Instruments | 313
59 | Capgemini | 309
60 | Harvard University | 308
61 | EMC | 305
62 | Sun Microsystems | 303
63 | Rite Aid | 301
64 | Bloomberg | 298
65 | General Electric | 292
66 | Amgen | 289
67 | McKinsey U.S. | 286
68 | Morgan Stanley | 285
69 | Stanford University | 279
70 | Washington Univ. in St. Louis | 278
71 | Verizon Data Services | 276
72 | NYC-HHC Harlem Hospital Center | 276
73 | University of Pittsburgh | 275
74 | Indiana University | 273
75 | Ohio State | 271
76 | Everest Consulting Group | 269
77 | Univ. of Minnesota | 269
78 | Amtex Systems | 268
79 | Univ. of Wisconsin at Madison | 268
80 | SUNY-Stony Brook | 262
81 | Amazon Global Resources | 262
82 | Cleveland Clinic Foundation | 256
83 | Dallas Independent School District | 255
84 | Univ. of Calif. at Davis | 254
85 | Northwestern | 251
86 | Syntel | 250
87 | Univ. of Missouri at Columbia | 247
88 | GlobalCynex | 247
89 | Kanbay | 246
90 | American Solutions | 242
91 | Univ. of Florida Intl. Center | 240
92 | UCLA | 239
93 | Duke Univ. Medical Center | 238
94 | Mount Sinai Medical Center | 236
95 | Bank of America | 236
96 | Software Research Group | 234
97 | Baylor College of Medicine | 234
98 | Massachusetts General Hospital | 232
99 | Ciber | 232
100 | Verinon Technology Solutions | 230
101 | Everest Business Solutions | 226
102 | Volt Technical Resources | 224
103 | Oklahoma State University | 223
104 | Compunnel Software Group | 222
105 | U.S. Tech Solutions | 221
106 | Symantec | 220
107 | JSMN International | 218
108 | UBS | 216
109 | CVS Pharmacy | 213
110 | The Pennsylvania State University | 213
111 | University of Washington | 213
112 | Nortel Networks | 212
113 | Univ. of Calif. at San Francisco | 211
114 | University of Mass. Medical School | 210
115 | Sprint/United Management | 209
116 | Houston Independent School District | 209
117 | Purdue | 208
118 | Global Consultants | 207
119 | Emory University | 207
120 | UT Health Science Center | 207
121 | Univ. of Colorado | 207
122 | Vanderbilt University | 205
123 | ObjectWin Technology | 205
124 | Diaspark | 204
125 | HSBC Bank USA | 203
126 | eBusiness Application Solutions | 203
127 | Broadcom | 203
128 | Prince Georges County (Md.) Public Schs | 203
129 | Micron Technology | 202
130 | Countrywide Home Loans | 198
131 | Texas A&M | 198
132 | Applied Materials | 195
133 | Schlumberger Technology | 194
134 | University of Iowa | 194
135 | IBM Global Svcs. India | 194
136 | Deloitte Tax | 194
137 | Cummins | 193
138 | iTech U.S. | 191
139 | Compuware | 189
140 | Intl. Students And Scholars Office | 186
141 | Univ. of Calif. at San Diego | 185
142 | Walgreen's | 184
143 | Howard Hughes Medical Institute | 184
144 | USC | 183
145 | Vision Systems Group | 182
146 | T Mobile USA | 180
147 | Multivision | 178
148 | Electronic Data Systems | 177
149 | Massachusetts Institute of Technology | 175
150 | California Institute of Technology | 174
151 | Case Western Reserve Univ. | 173
152 | UNC at Chapel Hill | 173
153 | Univ. of Alabama at Birmingham | 172
154 | Deutsche Bank | 170
155 | Caterpillar | 170
156 | Hallmark Global Technologies | 169
157 | cyberThink | 169
158 | Corporate Computer Services | 167
159 | Advanced Micro Devices | 167
160 | Megasoft Consultants | 166
161 | Enterprise Solutions | 165
162 | Freescale Semiconductor | 163
163 | UT Southwestern Medical Center | 163
164 | First Tek Technologies | 161
165 | Michigan State | 161
166 | Research Fdn of the State Univ Of | 160
167 | COMSYS Services | 160
168 | Virginia Tech | 160
169 | Juniper Networks | 160
170 | University of Arizona | 158
171 | Iowa State University | 157
172 | University of Virginia | 157
173 | FedEx Corporate Services | 157
174 | Credit Suisse First Boston | 156
175 | Bristol-Myers Squibb | 156
176 | Verizon Services | 156
177 | Ebay | 155
178 | Ajilon Consulting | 154
179 | General Motors | 153
180 | Camo Technologies | 152
181 | Marvell Semiconductor | 151
182 | CMC Americas | 150
183 | UT M.D. Anderson Cancer Center | 149
184 | NVIDIA | 149
185 | AT&T Services | 147
186 | Weill Medical College of Cornell | 146
187 | Axiom Systems | 146
188 | Wayne State University | 146
189 | Mayo Clinic Rochester | 146
190 | North Carolina State | 146
191 | Genentech | 146
192 | Makro Technologies | 145
193 | SVAM International | 144
194 | Memorial Sloan-Kettering Cancer | 143
195 | Nutech Information Systems | 143
196 | Xpedite Technologies | 143
197 | Automatic Data Processing | 143
198 | Louisiana State | 142
199 | Fannie Mae | 141
200 | MindTree Consulting | 141
Notes: This list is published by BusinessWeek magazine and can be found at http://www.businessweek.com/table/0518_h1btable.htm.

Data Appendix Table 5: Data Source of Each Firm
Firms: Abbott Laboratories; Air Products and Chemicals Inc; Allergan Inc; Altera Corporation; Advanced Micro Devices; Altria; Amgen Inc; Apple Computer Inc; Applied Materials Inc; Baker Hughes Inc; Baxter International; Becton, Dickinson and Company; Black and Decker Inc; Boeing Company; Bristol-Myers Squibb Company; Caterpillar Inc; Cirrus Logic Inc; Cisco Systems Inc; Corning Inc; Cummins; Cypress Semiconductor Corporation; Dell; Dow Chemical Company; E I Du Pont De Nemours and Company; Eastman Kodak Company; Eaton Corporation; Emerson Electric Company; ExxonMobil; Ford; General Electric Company; General Instrument Corporation; General Motors Corporation; Goodyear Tire and Rubber Company; Halliburton Company
1999 Top 100 List / 2006 Top 200 List / LCA Data: Yes 159 66 132 65 Yes 13 Yes 137 Yes 65 179 Yes Yes Yes Yes Yes Yes Yes 175 155 3
Patent Grants: Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Patent Applications: Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Notes: This table lists the 76 firms that make up our firm panel and the reasons why they were included. If a firm was listed in the 1999 Top 100 or 2006 Top 200 lists, it is denoted as such, along with the rank it received in that list. If the firm was one of the top LCA filers (at least 0.03% of LCA applications) or one of the top patentors in either the grants or the applications data (at least 0.05% in either data set), it is marked with a "Yes" designation. The average annual number of LCAs for a given firm was 171. Similarly, the average annual number of patent applications for a given firm was 197; the minimum annual average was 25 patent applications.

Data Appendix Table 5: Data Source of Each Firm (Continued)
Firms: Hewlett Packard-Compaq; Human Genome Sciences Inc; IBM Corporation; Illinois Tool Works Inc; Intel Corporation; International Rectifier Corporation; Isis Pharmaceuticals; Johnson & Johnson; Kimberly Clark Worldwide Inc; Lam Research Corporation; Lexmark International Inc; Lockheed Martin Corporation; LSI Logic Corporation; Medtronic Inc; Merck and Company; Micron Technology; Microsoft Corporation; Molex Inc; Motorola Inc; National Semiconductor Corporation; Oracle Corporation; Pfizer Inc; Pitney Bowes Inc; PPG Industries; Procter and Gamble Company; Qualcomm Inc; Rambus Inc; Raytheon Company; Rockwell Automation Technologies Inc; Schlumberger Technology Corporation; St. Jude; Sun Microsystems Inc; Symbol Technologies Inc; Synopsys Inc; Texas Instruments Inc; 3Com; 3M; Unisys Corporation; United Technologies Corporation; Wyeth; Xerox Corporation; Xilinx Inc
1999 Top 100 List / 2006 Top 200 List / LCA Data: 26, 65 50 Yes 35 8 Yes 5 14 Yes 6 129 3 Yes Yes 1 18 Yes 2 9 Yes 28 Yes 133 Yes 20 62 Yes 96 96 58 Yes
Patent Grants / Patent Applications: Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes

Data Appendix Figure 1: Other Relevant Visas (annual counts, 1990-2006, of TN Border Crossings, L1 Border Crossings divided by 5, and New L1 Issuances; vertical axis from 0 to 200,000)

The Ethnic Composition of US Inventors
William R. Kerr
Harvard Business School
Boston MA
HBS Working Paper 08-006 (revised)
December 2008

Abstract
The ethnic composition of US scientists and engineers is undergoing a significant transformation. This study applies an ethnic-name database to individual patent records granted by the United States Patent and Trademark Office to document these trends in greater detail than previously available. Most notably, the contributions of Chinese and Indian scientists to US technology formation increased dramatically in the 1990s, before noticeably leveling off after 2000 and declining in the case of Indian researchers. Growth in ethnic innovation is concentrated in high-tech sectors; the institutional and geographic dimensions are further characterized.
JEL Classification: F15, F22, J44, J61, O31.
Key Words: Innovation, Research and Development, Patents, Scientists, Engineers, Inventors, Ethnicity, Immigration.
Comments are appreciated and can be sent to wkerr@hbs.edu. This permanent working paper is continually updated as additional patenting data are collected.
The first version is included in Kerr (2005). I am grateful to William Lincoln and Debbie Strumsky for data assistance. This research is supported by the National Science Foundation, HBS Research, the Innovation Policy and the Economy Group, and the MIT George Schultz Fund.

1 Introduction
The contributions of immigrants to US technology formation are staggering: while the foreign-born account for just over 10% of the US working population, they represent 25% of the US science and engineering (SE) workforce and nearly 50% of those with doctorates. Even within the Ph.D. level, ethnic researchers make an exceptional contribution to science as measured by Nobel Prizes, election to the National Academy of Sciences, patent citation counts, and so on.1 Moreover, ethnic entrepreneurs are very active in commercializing new technologies, especially in the high-tech sectors (e.g., Saxenian 2002a). The magnitude of these ethnic contributions raises many research and policy questions: debates regarding the appropriate quota for H-1B temporary visas, the possible crowding out of native students from SE fields, the brain-drain or brain-circulation effect on sending countries, and the future prospects for US technology leadership are just four examples.2
Econometric studies quantifying the role of ethnic scientists and engineers in technology formation and diffusion are often hampered, however, by data constraints. It is very difficult to assemble sufficient cross-sectional and longitudinal variation for large-scale panel exercises.3 This paper describes a new approach for quantifying the ethnic composition of US inventors with previously unavailable detail. The technique exploits the inventor names contained on the micro-records for all patents granted by the United States Patent and Trademark Office (USPTO) from January 1975 to May 2008.4 Each patent record lists one or more inventors, with 8 million inventor names associated with the 4.5 million patents.
The USPTO grants patents to inventors living both within and outside of the US, with each group accounting for about half of patents over the 1975-2008 period. This study maps an ethnic-name database, typically used for commercial applications, onto these inventor names.5 This approach exploits the idea that inventors with the surnames Chang or Wang are likely of Chinese ethnicity, those with the surnames Rodriguez or Martinez of Hispanic ethnicity, and so on. The match rates range from 92% to 98% for US domestic inventor records, depending upon the procedure employed, and the process affords the distinction of nine ethnicities: Chinese, English, European, Hispanic/Filipino, Indian/Hindi, Japanese, Korean, Russian, and Vietnamese. Moreover, because the matching is done at the micro-level, greater detail on the ethnic composition of inventors is available annually on multiple dimensions: technologies, cities, companies, etc.6
The next section details the ethnic-name matching strategy, outlines the strengths and weaknesses of the database selected, and offers some validation exercises using patent records filed by foreign inventors with the USPTO. Section 3 then documents the growing contribution of ethnic inventors to US technology formation. The rapid increase during the 1990s in the percentage of high-tech patents granted to Chinese and Indian inventors is particularly striking, as is the leveling off in these trends after 2000. The relative contributions from scientists of European ethnicity, however, decline somewhat from their levels in 1975. The institutional and geographic dimensions of ethnic innovation are further delineated. Section 4 concludes.
1 For example, Stephan and Levin (2001), Burton and Wang (1999), Johnson (1998, 2001), and Streeter (1997).
2 Representative papers are Lowell (2000), Borjas (2004), Saxenian (2002b), and Freeman (2005), respectively.
3 While the decennial Census provides detailed cross-sectional descriptions, its longitudinal variation is necessarily limited. On the other hand, the annual Current Population Survey provides poor cross-sectional detail and does not ask about immigrant status until 1994. The SESTAT data offer a better trade-off between the two dimensions but suffer important sampling biases with respect to immigrants (Kannankutty and Wilkinson 1999).
4 The project initially employed the NBER Patent Data File, compiled by Hall et al. (2001), which includes patents granted by the USPTO from January 1975 to December 1999. The current version employs an extended version developed by HBS Research that includes patents granted through mid-2008. Some of the descriptive calculations have not been updated from their 1975-1999 values (noted in text).
5 The database is constructed by the Melissa Data Corporation for the design of direct-mail advertisements. I am grateful to the MIT George Schultz Fund for financial assistance in its purchase.

2 Ethnic-Name Matching Technique
This section describes the ethnic-name matching strategy employed with the inventor names contained in the NBER Patent Data File. To begin, two common liabilities associated with using ethnic-name databases are identified. Addressing these limitations guides the selection of the Melissa database and the design of the name-matching strategy, which is described in detail. Descriptive statistics are then provided from a quality-assurance exercise that applies the ethnic-name strategy to inventors residing outside of the US who file patent applications with the USPTO. The section concludes with a further discussion of the advantages and disadvantages of the resulting dataset for empirical estimations.

2.1 Melissa Ethnic-Name Database and Name-Matching Technique
Ethnic-name databases suffer from two inherent limitations: not all ethnicities are covered, and included ethnicities usually receive unequal treatment.
The strength of the ethnic-name database obtained from the Melissa Data Corporation is the identification of Asian ethnicities, especially Chinese, Indian/Hindi, Japanese, Korean, Russian, and Vietnamese names. The database is comparatively weaker for looking within continental Europe. For example, Dutch surnames are collected without first names, while the opposite is true for French names. The Asian comparative advantage and overall cost effectiveness led to the selection of the Melissa database, as well as to the European amalgamation employed in the matching technique. In total, nine ethnicities are distinguished: Chinese, English, European, Hispanic/Filipino, Indian/Hindi, Japanese, Korean, Russian, and Vietnamese. The largest ethnicity in the US SE workforce absent from the ethnic-name database is Iranian, which accounted for 0.7% of bachelor-level SEs in the 1990 Census.7
6 This ethnic patenting database is employed by Kerr (2005, 2008a-c), Kerr and Lincoln (2008), and Foley and Kerr (2008) to study the role of ethnic scientists and entrepreneurs in technology formation and diffusion.
The second limitation is that commercial databases vary in the number of names they contain for each ethnicity. These differences reflect both uneven coverage and the fact that some ethnicities are more homogeneous in their naming conventions. For example, the 1975 to 1999 Herfindahl indices for Korean (470) and Vietnamese (1121) surnames are significantly higher than those for Japanese (132) and English (164), due to frequent Korean surnames like Kim (16%) and Park (12%) and Vietnamese surnames like Nguyen (29%) and Tran (12%). Two polar matching strategies are employed to ensure that coverage differences do not overly influence ethnicity assignments.
Full Matching: This procedure utilizes all of the name assignments in the Melissa database and manually codes any unmatched surname or first name associated with 100 or more inventor records.
This technique further exploits the international distribution of inventor names within the patent database to provide superior results.8 The match rate for this procedure is 98% (98% US, 98% foreign). This rate should be less than 100% with the Melissa database, as not all ethnicities are included.

Restricted Matching: A second strategy employs a uniform name database using only the 3000 most common surnames and the 200 most common first names for each ethnicity. These thresholds are the lowest common denominators across the major ethnicities studied. The match rate for this restricted procedure is 89% (92% US, 86% foreign).

For matching, names in both the patent and ethnic-name databases are capitalized and truncated to ten characters. Approximately 88% of the patent name records have a unique surname, first name, or middle name match in the Full Matching procedure (77% in the Restricted Matching procedure). These records receive a single ethnicity determination, with priority given to surname matches. For inventors residing in the US, representative probabilities are assigned to non-unique matches using the master's-level SE communities in Metropolitan Statistical Areas (MSAs). Ethnic probabilities for the remaining 3% of records (mostly foreign) are calculated as equal shares. MSA ethnic compositions are averages of the 1980 and 1990 US 5% Census files and are kept constant through the sample period. The sample considers civilians aged 22-54 who list Engineers, Mathematical and Computer Scientists, or Natural Scientists as their occupations. The master's degree cut-off reflects the higher average education level of patenting scientists within the scientific community (e.g., Kannankutty and Wilkinson 1999). Country of birth is used to assign ethnicities into broad categories that match the name records.

To illustrate, take the San Francisco scientific community to be 12.1% Chinese, 66.1% English, and 4.6% European (with other ethnicities omitted). A San Francisco-based record matching to Chinese, English, and European surnames would be assigned a probabilistic ethnicity of 14.6% Chinese, 79.8% English, and 5.6% European (summing to 100%). Each community share is divided by the 82.8% that the three matched ethnicities jointly represent (e.g., 12.1/82.8 = 14.6%). A China-based record matching all three ethnicities would be assigned a 33.3% probability for each.

7 The ethnic groups employed are: Chinese, English, European (including Dutch, French, German, Italian, and Polish names), Hispanic/Filipino (including Latino and Filipino/Tagalog names), Indian/Hindi (including Bangladeshi and Pakistani names), Japanese, Korean, Russian (including Armenian and Carpatho-Rusyn names), and Vietnamese. The final matching procedure employs a joint Hispanic/Filipino ethnicity, while earlier work keeps the two separate. These two ethnic groups are combined due to extensive name overlaps (e.g., the common surnames Martinez and Ramirez are in both ethnic lists), but this choice is not a first-order concern. The Bangladeshi and Pakistani name counts are extremely small (8 and 15, respectively) and are not distinct from the Indian/Hindi names. Their assignment does not materially affect the Indian/Hindi outcome, which in some ways represents a South Asian identifier. Jewish ethnic names overlap extensively with other ethnic groupings and are not treated separately. A handful of names classified as Arab, Burmese, and Malay are also discarded.

8 A simple rule is applied to take advantage of the information embedded in the patent database itself. If over 90% of the USPTO records associated with a name are concentrated in a non-English ethnicity country or region, the name is assigned that ethnicity. The test includes domestic US inventors, who comprise over 50% of all inventors. This makes the technique very stringent, and it mainly bolsters European ethnic matching (the comparative weakness of the Melissa database). The rule is not applied to names with fewer than ten occurrences during 1975 to 1999.
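The tie-breaking rules just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's code. The function names and the input format (a list of candidate ethnicities per record, plus an optional dictionary of MSA community shares) are hypothetical. The surname-priority rule, the manual coding, and the footnote 8 geographic rule are omitted. The sketch reproduces the San Francisco and China-based examples from the text.

```python
from typing import Dict, Iterable, Optional


def normalize_name(name: str) -> str:
    """Capitalize a name and truncate it to ten characters, as described in the text."""
    return name.strip().upper()[:10]


def assign_ethnicity(
    candidates: Iterable[str],
    community_shares: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Split one inventor record across the ethnicities its name matched.

    candidates: ethnicities matched by the record's name (hypothetical input format).
    community_shares: ethnic composition of the MSA's master's-level SE community
        for US-based records; None for foreign records.
    """
    matched = sorted(set(candidates))
    if not matched:
        return {}  # unmatched record: no ethnicity assigned
    if len(matched) == 1:
        return {matched[0]: 1.0}  # unique match: a single ethnicity determination
    if community_shares is not None:
        weights = {e: community_shares.get(e, 0.0) for e in matched}
        total = sum(weights.values())
        if total > 0:
            # Rescale the community shares of the matched ethnicities to sum to one.
            return {e: w / total for e, w in weights.items()}
    # Foreign records (or an MSA with zero weight on every candidate): equal shares.
    return {e: 1.0 / len(matched) for e in matched}


# Illustrative San Francisco shares from the text (other ethnicities omitted).
sf_shares = {"Chinese": 0.121, "English": 0.661, "European": 0.046}
us_record = assign_ethnicity(["Chinese", "English", "European"], sf_shares)
foreign_record = assign_ethnicity(["Chinese", "English", "European"])

for label, probs in [("San Francisco", us_record), ("China", foreign_record)]:
    print(label, {e: round(100 * p, 1) for e, p in probs.items()})
```

The San Francisco shares sum to 100% because the sketch renormalizes over only the matched ethnicities, even though those three groups cover just 82.8% of that community.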
2.2 Inventors Residing in Foreign Countries and Regions

Applying the ethnic-name database to inventors residing outside the US provides a natural quality-assurance exercise for the technique. Inventions originating outside the US account for just under half of USPTO patents, and applications from Japan comprise about half of this foreign total. The top panel of Table 1 summarizes the results. Its rows present the matched characteristics for countries and regions grouped by the ethnicities identifiable with the database.

The results are very encouraging. First, the Full Matching procedure assigns ethnicities to a large percentage of foreign records, with match rates of at least 93% for all countries and regions. In the Restricted Matching procedure, a match rate of at least 74% holds for all regions. Second, the estimated inventor compositions are reasonable. The own-ethnicity shares are summarized in the fourth and fifth columns. Their observation-weighted average is 86% in the Full Matching procedure, and own-ethnicity contributions exceed 80% in the UK, China, India, Japan, Korea, and Russia regardless of the matching procedure employed. As in the US, own-ethnicity contributions should be less than 100% due to foreign researchers. The high success rate of the Restricted Matching procedure indicates that the ethnic-name database performs well without exploiting the international distribution of names, although power is lost with Europe. Likewise, uneven coverage in the Melissa database is not driving the ethnic composition trends.

The bottom panel of Table 1 presents the complete ethnic compositions estimated for the foreign countries. Many of the positive off-diagonals are to be expected, due to foreign expatriates (UK, Vietnam), small sample sizes (Vietnam), or overlaps of common names. Two prominent examples of common names are the surname Lee (Chinese, English, and Korean) and the first name Igor (Hispanic and Russian).
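The 86% weighted average quoted above can be checked against the top panel of Table 1. The sketch below assumes observation weights and uses the rounded shares printed in the table. It recomputes the Full Matching average and lists the regions whose own-ethnicity share exceeds 80% under both procedures. The dictionary layout and function name are illustrative. Because the inputs are rounded, the recomputed average is only approximate.

```python
# Table 1, top panel: (observations, own-ethnicity share under Full Matching,
# own-ethnicity share under Restricted Matching), rounded as printed in the table.
TABLE1 = {
    "United Kingdom": (187_266, 0.85, 0.83),
    "China, Singapore": (167_370, 0.88, 0.89),
    "Western Europe": (1_210_231, 0.66, 0.46),
    "Hispanic Nations": (27_298, 0.74, 0.69),
    "India": (13_582, 0.88, 0.88),
    "Japan": (1_822_253, 1.00, 0.96),
    "South Korea": (127_975, 0.84, 0.83),
    "Russia": (33_237, 0.81, 0.84),
    "Vietnam": (41, 0.36, 0.43),
}


def weighted_own_share(table: dict, column: int) -> float:
    """Observation-weighted mean of a share column (1 = Full, 2 = Restricted)."""
    total_obs = sum(row[0] for row in table.values())
    return sum(row[0] * row[column] for row in table.values()) / total_obs


full_avg = weighted_own_share(TABLE1, 1)
above_80_both = [
    region
    for region, (_, full, restricted) in TABLE1.items()
    if full > 0.80 and restricted > 0.80
]

print(f"Weighted own-ethnicity share, Full Matching: {full_avg:.1%}")
print("Above 80% under both procedures:", "; ".join(above_80_both))
```

Japan's 1.8 million records carry about half of the weight, which is why the average sits well above the Western European figure.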
The most frequent name overlap occurs between the European and Hispanic ethnicities.9 For inventors residing in the US, the matching technique can use the Census to assign probabilistic estimates for overlapping names. Foreign records are only assigned equal shares. The last two columns of Table 1's top panel indicate the percentage of foreign inventors assigned at least partially to their own ethnicity. This study does not make the strong assumption that ties should go to the country's own ethnicity. These columns nonetheless illustrate the additional power that the US Census provides for breaking domestic ties.

2.3 Advantages and Disadvantages of the Name-Matching Technique

Visual inspection of the top 1000 surnames and first names in the USPTO records confirms that the name-matching technique works well. Table A1 in the appendix lists the 100 most common surnames of US-based inventors for each ethnicity, along with their relative contributions. These counts sum the ethnic contributions of inventors with each surname, including partial or split assignments. Moreover, they are not necessarily direct or exclusive matches (e.g., the ethnic match may have occurred through the first name). While some inventors are certainly misclassified, the measurement error in aggregate trends built from the micro-data is minor. The Full Matching procedure is the preferred technique and underlies the trends presented in the next section. Most applications, however, find negligible differences when the Restricted Matching dataset is employed instead.

The matched records describe the ethnic composition of US SEs with previously unavailable detail. They incorporate the major ethnicities working in the US SE community, separate out detailed technologies and manufacturing industries, and provide city, state, and annual statistics.
Moreover, the assignment of patents to corporations and institutions affords firm-level and university-level characterizations (e.g., the ethnic composition of IBM's inventors filing computer patents from San Francisco in 1985). Detailed econometrics require this level of cross-sectional and longitudinal variation, and the next section provides graphical descriptions along these dimensions. These descriptive statistics highlight the advantages of name matching through individual patent records.

9 The main US SE ethnicity missing from the database is Iranian. Running the ethnic-name database on the few patents from Iran yields a 55%-77% match rate. Iran's predicted composition does not favor any of the nine ethnicities studied; the largest overlap is with the English ethnicity, at 52%. Ongoing work is attempting to improve coverage of Iranian names.

The ethnic-name procedure does, however, have two potential limitations for empirical work. First, the approach does not distinguish foreign-born ethnic researchers in the US from later generations working as SEs. The procedure can only estimate total ethnic SE populations, and these levels are to some extent measured with time-invariant error due to the name-matching approach. The resulting data are nonetheless very powerful for panel econometrics that use changes in these ethnic SE populations for identification, since time-invariant measurement error largely drops out of such changes. Moreover, Census and INS records confirm that these changes are primarily due to new SE immigration during this period, which substantially weakens this overall concern.

Second, the name-matching technique does not distinguish finer divisions within the nine major ethnic groupings. For ethnic network analyses, it would be advantageous to separate Mexican from Chilean scientists within the Hispanic ethnicity, to distinguish Chinese engineers with ethnic ties to Taipei versus Beijing versus Shanghai, and so on.
These distinctions are not possible with the Melissa database. Researchers should understand that measurement error from the broader ethnic divisions may bias their estimated coefficients downward, depending upon the application.10 Nevertheless, Section 3 demonstrates how the deep variation available in the ethnic patenting data provides a much richer description of US ethnic invention than was previously available.

10 When mapping the ethnic patenting data to country-level data for international diffusion estimations, researchers will also need to cluster their standard errors to reflect the multiple country-to-ethnicity mappings.

3 Ethnic Composition of US Inventors

Table 2 describes the ethnic composition of US inventors for 1975-2004.11 The trends demonstrate a growing ethnic contribution to US technology development, especially among Chinese and Indian scientists. Ethnic inventors are more concentrated in high-tech industries like computers and pharmaceuticals. They are also more concentrated in gateway cities relatively close to their home countries (e.g., Chinese in San Francisco, European in New York, and Hispanics in Miami). The final three rows demonstrate a close correspondence between the estimated ethnic composition and the country-of-birth composition of the US SE workforce in the 1990 Census.12 The next four subsections examine each dimension of these data more closely.

11 The current patent data incorporate all patents granted by May 2008. The application years of patents, however, provide the best description of when innovative research is being undertaken, due to the substantial and uneven lags in USPTO reviews. Accordingly, the annual descriptions employed in this study are organized by application year. Unfortunately, this approach leads to significant attrition in the last two years. Patents are only included in the database if they have been granted, and a smaller share of applications filed close to the cut-off have completed the review cycle. Raw patent counts should be treated with caution. Changes in the personnel resources and review policies of the USPTO influence the number of patents granted over time (e.g., Griliches 1990). The explosive climb in patent grants over the last two decades is also difficult to interpret (e.g., Kortum and Lerner 2000, Kim and Marschke 2004, Hall 2005, Jaffe and Lerner 2005, and Branstetter and Ogura 2005). Accordingly, this study considers patent shares, which avoids these interpretation concerns. Studies seeking to quantify the number of ethnic researchers in the US should supplement these data with immigration records or demographic surveys (with an unfortunate loss of detail). Trajtenberg (2005) and HBS Research are working on algorithms to identify individual scientists with the USPTO data.

3.1 Contributions by Year

Figure 1 illustrates the evolving ethnic composition of US inventors from 1975 to 2004. The English share, which is omitted from the figure, declines from 83% to 70% during this period. Looking across all technology categories, the European ethnicity is initially the largest foreign contributor to US technology development. Like the English ethnicity, however, the European share of US domestic inventors declines steadily, from 8% in 1975 to 6% in 2004. This declining share is partly due to the exceptional growth of the Chinese and Indian ethnicities over the thirty years, which increase from under 2% to over 8% and 5%, respectively. As shown below, this Chinese and Indian growth is concentrated in high-tech sectors, where Chinese inventors supplant European researchers as the largest ethnic contributor to US technology formation. The Indian ethnic contribution declines somewhat after 2000, mostly due to changes within the computer technology sector, as seen below. Among the other ethnicities, the Hispanic contribution grows from 3% to 4% from 1975 to 2004.
The level of this series is likely mismeasured due to the extensive overlap of Hispanic and European names. The positive growth, however, is consistent with stronger Latino and Filipino scientific contributions in Florida and California. The Korean share increases dramatically over the thirty years, from 0.3% to 1.1%, while the Russian share climbs from 1.2% to 2.2%. Although difficult to see with Figure 1's scaling, much of the Russian increase occurs in the 1990s following the dissolution of the Soviet Union. The Japanese share steadily increases from 0.6% to 1.0%. Finally, while the Vietnamese contribution is the lowest throughout the sample, it exhibits the strongest relative growth, from 0.1% to 0.6%.

3.2 Contributions by Technology

Figure 2 documents the total ethnic contribution by the six broad technology groups into which patents are often classified: Chemicals, Computers and Communications, Drugs and Medical, Electrical and Electronic, Mechanical, and Others. The miscellaneous group includes patents for agriculture, textiles, furniture, and the like. Growth in ethnic patenting is clearly stronger in high-tech sectors than in more traditional industries. Figures 3-8 provide the ethnic contributions within each technology category. The growing ethnic contribution in high-tech sectors is easily traced to the Chinese and Indian ethnicities. Moreover, these two ethnicities exhibit the most interesting and economically meaningful variation across technologies, as summarized in Figures 9 and 10.13

12 Because the name-based approach also captures later generations, the estimated European contribution in Table 2 is naturally higher than the immigrant contribution measured by country of birth.

3.3 Contributions by Institution

Figure 11 demonstrates that intriguing differences in ethnic scientific contributions also exist by institution type. Over the 1975-2004 period, ethnic inventors are more concentrated in government and university research labs and in publicly listed companies than in private companies or as unaffiliated inventors.
Part of this level difference is certainly due to immigration visa sponsorships by larger institutions. Growth in ethnic shares is initially stronger in the government and university labs, but publicly listed companies appear to close the gap by 2004. The other interesting trend in Figure 11 is for private companies, where the ethnic contribution increases sharply in the 1990s. This rise coincides with the strong growth in ethnic entrepreneurship in high-tech sectors.14

3.4 Contributions by Geography

This paper closes its descriptive statistics by examining the 1975-2004 ethnic inventor contributions of major cities in Table 3. Cities are defined through 281 Metropolitan Statistical Areas.15 Not surprisingly, total patenting shares are highly correlated with city size. The three largest shares of US domestic patenting for 1995-2004 are found in San Francisco (12%), New York (7%), and Los Angeles (6%). More interestingly, non-English patenting is more concentrated than general innovation. The 1995-2004 non-English patent shares of San Francisco, New York, and Los Angeles are 19%, 10%, and 8%, respectively. Similarly, 81% of non-English invention occurs in the top 47 patenting cities listed in Table 3, compared to 73% of total patenting. Indian and Chinese invention is even more agglomerated. San Francisco's share of total US Indian and Chinese patenting grows exceptionally, from 8% in 1975-1984 to 25% in 1995-2004, while the combined share of New York and Chicago declines from 22% to 13%. Agrawal et al. (2007a,b) and Kerr (2008c) further describe ethnic inventor agglomeration in the US using the ethnic-name approach.

Ethnic scientists are disproportionately concentrated in major cities. In addition, growth in a city's share of ethnic patenting is highly correlated with growth in its share of total US patenting. Across the whole sample and including all of the intervening years, a 1% increase in a city's ethnic patenting share correlates with a 0.6% increase in the city's total invention share. This coefficient is remarkably high because the ethnic share of total invention during this period is only around 20%. If a city gained ethnic inventors with no change in its other inventors, a 1% increase in its ethnic patenting share would raise its total invention share by only about 0.2%. Shifts in the concentration of ethnic inventors therefore appear to facilitate changes in the geographic composition of US innovation.16

13 The USPTO issues patents by technology categories rather than by industries. Combining the work of Johnson (1999), Silverman (1999), and Kerr (2008a), concordances can be developed to map the USPTO classification scheme to the three-digit industries in which new inventions are manufactured or used. Scherer (1984) and Keller (2002) further discuss the importance of inter-industry R&D flows.

14 Publicly listed companies are identified from a 1989 mapping developed by Hall et al. (2001). This company list is not updated for delistings or new public offerings. This approach maintains a constant public grouping for reference. It also weakens, however, how well the public and private groupings represent current companies at the sample extremes. Industry patents account for 72% of patents granted from 1980-1997. Public companies account for 59% of industry patents during the period and are identified through Compustat records. Government and university institutions are identified through institution names and account for about 4% of patents granted. Federally funded research and development centers (FFRDCs) are included in both the industry and government groups. Unassigned patents account for about 26% of patents granted.

15 MSAs are identified from inventors' city names using city lists collected from the Office of Social and Economic Data Analysis at the University of Missouri, with a matching rate of 99%. Manual coding further ensures that all patents with more than 100 citations and all city names with more than 100 patents are identified.

4 Conclusion

Ethnic scientists and engineers are an important and growing contributor to US technology development.
The Chinese and Indian ethnicities, in particular, are now an integral part of US invention in high-tech sectors. This paper describes how the probable ethnicities of US researchers can be determined at the micro-level through their names in USPTO patent records. The ethnic-name database this study employs distinguishes nine ethnic groups. The matched database describes the ethnic composition of US inventors with previously unavailable cross-sectional and longitudinal detail. This richer variation can support more detailed and informative empirical analyses than would otherwise be feasible.

16 The ethnic-name approach does not distinguish ethnic inventor shifts due to new immigration, domestic migration, or occupational changes. It is likewise beyond the scope of this descriptive note to explore issues of causality or effects on native workers. See Kerr and Lincoln (2008) for recent work in this area.

References

[1] Agrawal, Ajay, Devesh Kapur, and John McHale, "Birds of a Feather – Better Together? Exploring the Optimal Spatial Distribution of Ethnic Inventors", NBER Working Paper 12823 (2007a).
[2] Agrawal, Ajay, Devesh Kapur, and John McHale, "Brain Drain or Brain Bank? The Impact of Skilled Emigration on Poor-Country Innovation", Working Paper (2007b).
[3] Borjas, George, "Do Foreign Students Crowd Out Native Students from Graduate Programs?", NBER Working Paper 10349 (2004).
[4] Branstetter, Lee, and Yoshiaki Ogura, "Is Academic Science Driving a Surge in Industrial Innovation? Evidence from Patent Citations", NBER Working Paper 11561 (2005).
[5] Burton, Lawrence, and Jack Wang, "How Much Does the U.S. Rely on Immigrant Engineers?", NSF SRS Issue Brief (1999).
[6] Foley, C. Fritz, and William Kerr, "US Ethnic Scientists and Foreign Direct Investment Placement", Working Paper (2008).
[7] Freeman, Richard, "Does Globalization of the Scientific/Engineering Workforce Threaten U.S. Economic Leadership?", NBER Working Paper 11457 (2005).
[8] Griliches, Zvi, "Patent Statistics as Economic Indicators: A Survey", Journal of Economic Literature 28:4 (1990), 1661-1707.
[9] Hall, Bronwyn, "Exploring the Patent Explosion", Journal of Technology Transfer 30 (2005), 35-48.
[10] Hall, Bronwyn, Adam Jaffe, and Manuel Trajtenberg, "The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools", NBER Working Paper 8498 (2001).
[11] Jaffe, Adam, and Joshua Lerner, Innovation and Its Discontents (Boston, MA: Harvard Business School Press, 2005).
[12] Johnson, Daniel, "150 Years of American Invention: Methodology and a First Geographic Application", Wellesley College Economics Working Paper 99-01 (1999). Data currently reside at http://faculty1.coloradocollege.edu/~djohnson/uships.html.
[13] Johnson, Jean, "Statistical Profiles of Foreign Doctoral Recipients in Science and Engineering: Plans to Stay in the United States", NSF SRS Report (1998).
[14] Johnson, Jean, "Human Resource Contribution to U.S. Science and Engineering From China", NSF SRS Issue Brief (2001).
[15] Kannankutty, Nirmala, and R. Keith Wilkinson, "SESTAT: A Tool for Studying Scientists and Engineers in the United States", NSF SRS Report (1999).
[16] Keller, Wolfgang, "Trade and the Transmission of Technology", Journal of Economic Growth 7 (2002), 5-24.
[17] Kerr, William, "Ethnic Scientific Communities and International Technology Diffusion", Review of Economics and Statistics 90:3 (2008a), 518-537.
[18] Kerr, William, "Heterogeneous Technology Diffusion and Ricardian Trade Patterns", Working Paper (2008b).
[19] Kerr, William, "The Agglomeration of US Ethnic Inventors", HBS Working Paper (2008c).
[20] Kerr, William, "The Role of Immigrant Scientists and Entrepreneurs in International Technology Transfer", MIT Ph.D. Dissertation (2005).
[21] Kerr, William, and William Lincoln, "The Supply Side of Innovation: H-1B Visa Reforms and US Ethnic Invention", HBS Working Paper 09-005 (2008).
[22] Kim, Jinyoung, and Gerald Marschke, "Accounting for the Recent Surge in U.S. Patenting: Changes in R&D Expenditures, Patent Yields, and the High Tech Sector", Economics of Innovation and New Technology 13:6 (2004), 543-558.
[23] Kortum, Samuel, and Joshua Lerner, "Assessing the Contribution of Venture Capital to Innovation", RAND Journal of Economics 31:4 (2000), 674-692.
[24] Lowell, B. Lindsay, "H1-B Temporary Workers: Estimating the Population", The Center for Comparative Immigration Studies Working Paper 12 (2000).
[25] Saxenian, AnnaLee, with Yasuyuki Motoyama and Xiaohong Quan, Local and Global Networks of Immigrant Professionals in Silicon Valley (San Francisco, CA: Public Policy Institute of California, 2002a).
[26] Saxenian, AnnaLee, "Silicon Valley's New Immigrant High-Growth Entrepreneurs", Economic Development Quarterly 16:1 (2002b), 20-31.
[27] Scherer, Frederic, "Using Linked Patent Data and R&D Data to Measure Technology Flows", in Griliches, Zvi (ed.) R & D, Patents and Productivity (Chicago, IL: University of Chicago Press, 1984).
[28] Silverman, Brian, "Technological Resources and the Direction of Corporate Diversification: Toward an Integration of the Resource-Based View and Transaction Cost Economics", Management Science 45:8 (1999), 1109-1124.
[29] Stephan, Paula, and Sharon Levin, "Exceptional Contributions to US Science by the Foreign-Born and Foreign-Educated", Population Research and Policy Review 20:1 (2001), 59-79.
[30] Streeter, Joanne, "Major Declines in Admissions of Immigrant Scientists and Engineers in Fiscal Year 1994", NSF SRS Issue Brief (1997).
[31] Trajtenberg, Manuel, "The Mobility of Inventors and the Productivity of Research", Working Paper (2005).
[32] Wadhwa, Vivek, AnnaLee Saxenian, Ben Rissing, and Gary Gereffi, "America's New Immigrant Entrepreneurs I", Working Paper (2007).

Fig. 1: Ethnic Share of US Domestic Patents (percentage of patent applications by year; series: Chinese, European, Hispanic, Indian, Japanese, Korean, Russian, Vietnamese)
Fig. 2: Total US Ethnic Share by Technology (percentage of patent applications by year; series: Chemicals, Computers, Drugs, Electrical, Mechanical, Other)
Fig. 3: US Ethnic Patenting - Chemicals (percentage of patent applications by year; series by ethnicity as in Fig. 1)
Fig. 4: US Ethnic Patenting - Computers (percentage of patent applications by year; series by ethnicity as in Fig. 1)
Fig. 5: US Ethnic Patenting - Drugs (percentage of patent applications by year; series by ethnicity as in Fig. 1)
Fig. 6: US Ethnic Patenting - Electrical (percentage of patent applications by year; series by ethnicity as in Fig. 1)
Fig. 7: US Ethnic Patenting - Mechanical (percentage of patent applications by year; series by ethnicity as in Fig. 1)
Fig. 8: US Ethnic Patenting - Other (percentage of patent applications by year; series by ethnicity as in Fig. 1)
Fig. 9: Chinese Contribution by Technology (percentage of patent applications by year; series by technology as in Fig. 2)
Fig. 10: Indian Contribution by Technology (percentage of patent applications by year; series by technology as in Fig. 2)
Fig. 11: Total US Ethnic Share by Institution (percentage of patent applications by year; series by institution type: Public Industry, Private Industry, Government/University, Unassigned)

Table 1: Descriptive Statistics for Inventors Residing in Foreign Countries and Regions

Summary Statistics for Full and Restricted Matching Procedures
Matched: Percentage of Region's Inventors Matched with Ethnic Database. Own: Percentage of Region's Inventors Assigned Ethnicity of Their Region. Own (Partial): Percentage of Region's Inventors Assigned Ethnicity of Region (Partial).

                              Matched          Own              Own (Partial)
Region                  Obs.   Full Restrict.   Full Restrict.   Full Restrict.
United Kingdom       187,266    99%       95%    85%       83%    92%       91%
China, Singapore     167,370   100%       98%    88%       89%    91%       91%
Western Europe     1,210,231    98%       79%    66%       46%    73%       58%
Hispanic Nations      27,298    99%       74%    74%       69%    93%       93%
India                 13,582    93%       76%    88%       88%    90%       89%
Japan              1,822,253   100%       89%   100%       96%   100%       96%
South Korea          127,975   100%      100%    84%       83%    89%       88%
Russia                33,237    94%       78%    81%       84%    93%       94%
Vietnam                   41   100%       98%    36%       43%    44%       43%

Complete Ethnic Composition of Region's Inventors (Full Matching)
                    English  Chinese European Hispanic   Indian Japanese   Korean  Russian Vietnam.
United Kingdom          85%       2%       5%       3%       2%       0%       0%       2%       0%
China, Singapore         3%      88%       1%       1%       1%       1%       4%       1%       1%
Western Europe          21%       1%      66%       8%       1%       0%       0%       3%       0%
Hispanic Nations        11%       1%      10%      74%       0%       1%       0%       2%       0%
India                    3%       1%       1%       5%      88%       0%       0%       2%       0%
Japan                    0%       0%       0%       0%       0%     100%       0%       0%       0%
South Korea              2%      11%       0%       1%       0%       1%      84%       1%       0%
Russia                   5%       1%       3%       9%       0%       0%       0%      81%       0%
Vietnam                 17%      21%      12%       0%       0%      10%       2%       2%      36%

Notes: Matching is undertaken at the inventor level using the Full and Restricted Matching procedures outlined in the text. The middle columns of the top panel summarize the share of each region's inventors assigned the ethnicity of that region; the complete composition for the Full Matching procedure is detailed in the bottom panel. The right-hand columns in the top panel document the percentage of the region's inventors assigned at least partially to their region's ethnicity.
Greater China includes Mainland China, Hong Kong, Macao, and Taiwan. Western Europe includes Austria, Belgium, Denmark, Finland, France, Germany, Italy, Luxembourg, Netherlands, Norway, Poland, Sweden, and Switzerland. Hispanic Nations includes Argentina, Belize, Brazil, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Philippines, Portugal, Spain, Uruguay, and Venezuela. Russia includes former Soviet Union countries.

Table 2: Descriptive Statistics for Inventors Residing in US

Ethnicity of Inventor
                  English  Chinese European Hispanic   Indian Japanese   Korean  Russian Vietnam.

A. Ethnic Inventor Shares Estimated from US Inventor Records, 1975-2004
1975-1979           82.5%     2.2%     8.3%     2.9%     1.9%     0.6%     0.3%     1.2%     0.1%
1980-1984           81.1%     2.9%     7.9%     3.0%     2.4%     0.7%     0.5%     1.3%     0.1%
1985-1989           79.8%     3.6%     7.5%     3.2%     2.9%     0.8%     0.6%     1.4%     0.2%
1990-1994           77.6%     4.6%     7.2%     3.5%     3.6%     0.9%     0.7%     1.5%     0.4%
1995-1999           73.9%     6.5%     6.8%     3.9%     4.8%     0.9%     0.8%     1.8%     0.5%
2000-2004           70.4%     8.5%     6.4%     4.2%     5.4%     1.0%     1.1%     2.2%     0.6%

Chemicals           73.4%     7.2%     7.5%     3.6%     4.5%     1.0%     0.8%     1.7%     0.3%
Computers           70.1%     8.2%     6.3%     3.8%     6.9%     1.1%     0.9%     2.1%     0.7%
Pharmaceuticals     72.9%     7.1%     7.4%     4.3%     4.2%     1.1%     0.9%     1.8%     0.4%
Electrical          71.6%     8.0%     6.8%     3.7%     4.9%     1.1%     1.1%     2.1%     0.7%
Mechanical          80.4%     3.2%     7.1%     3.5%     2.6%     0.7%     0.6%     1.6%     0.2%
Miscellaneous       81.3%     2.9%     7.0%     3.8%     2.1%     0.6%     0.6%     1.4%     0.3%

Top Cities as a Percentage of City's Patents:
English: KC (89), WS (88), NAS (88)
Chinese: SF (13), LA (8), AUS (6)
European: NOR (12), STL (11), NYC (11)
Hispanic: MIA (16), SA (9), WPB (7)
Indian: SF (7), AUS (7), PRT (6)
Japanese: SD (2), SF (2), LA (2)
Korean: BAL (2), LA (2), SF (1)
Russian: BOS (3), NYC (3), SF (3)
Vietnamese: AUS (2), SF (1), LA (1)

B. Ethnic Scientist and Engineer Shares Estimated from 1990 US Census Records
                  English  Chinese European Hispanic   Indian Japanese   Korean  Russian Vietnam.
Bachelors Share     87.6%     2.7%     2.3%     2.4%     2.3%     0.6%     0.5%     0.4%     1.2%
Masters Share       78.9%     6.7%     3.4%     2.2%     5.4%     0.9%     0.7%     0.8%     1.0%
Doctorate Share     71.2%    13.2%     4.0%     1.7%     6.5%     0.9%     1.5%     0.5%     0.4%

Notes: Panel A presents descriptive statistics for inventors residing in the US at the time of patent application. Inventor ethnicities are estimated through inventors' names using techniques described in the text. Patents are grouped by application years and major technology fields. Cities, defined through Metropolitan Statistical Areas, include AUS (Austin), BAL (Baltimore), BOS (Boston), KC (Kansas City), LA (Los Angeles), MIA (Miami), NAS (Nashville), NOR (New Orleans), NYC (New York City), PRT (Portland), SA (San Antonio), SD (San Diego), SF (San Francisco), STL (St. Louis), WPB (West Palm Beach), and WS (Winston-Salem). Cities are identified from inventors' city names using city lists collected from the Office of Social and Economic Data Analysis at the University of Missouri, with a matching rate of 99%. Manual recoding further ensures all patents with more than 100 citations and all city names with more than 100 patents are identified. Panel B presents comparable statistics calculated from the 1990 Census using country of birth for scientists and engineers. Country groupings follow Table 1; English provides a residual in the Census statistics.

Table 3: Ethnic Inventor Contributions by City

                       | Total                       | Non-English Ethnic          | Indian and Chinese
                       | Patenting Share             | Patenting Share             | Patenting Share
City                   |  75-84  85-94  95-04 01-06A |  75-84  85-94  95-04 01-06A |  75-84  85-94  95-04 01-06A
Atlanta, GA            |   0.6%   1.0%   1.3%   1.5% |   0.3%   0.7%   1.0%   1.1% |   0.3%   0.7%   1.0%   1.2%
Austin, TX             |   0.4%   0.9%   1.8%   2.0% |   0.5%   1.2%   1.9%   2.0% |   0.4%   1.6%   2.3%   2.3%
Baltimore, MD          |   0.8%   0.8%   0.7%   0.7% |   0.7%   0.7%   0.6%   0.5% |   0.4%   0.5%   0.6%   0.5%
Boston, MA             |   3.6%   3.8%   3.9%   4.6% |   3.9%   4.2%   4.1%   4.8% |   4.0%   4.0%   3.6%   4.3%
Buffalo, NY            |   0.6%   0.5%   0.4%   0.3% |   0.8%   0.6%   0.4%   0.3% |   1.1%   0.7%   0.4%   0.3%
Charlotte, NC          |   0.3%   0.3%   0.3%   0.3% |   0.2%   0.2%   0.2%   0.2% |   0.1%   0.2%   0.1%   0.2%
Chicago, IL            |   6.0%   4.6%   3.5%   3.2% |   6.9%   5.0%   3.5%   3.0% |   5.6%   3.9%   2.9%   2.8%
Cincinnati, OH         |   1.0%   1.1%   1.0%   1.0% |   0.9%   0.9%   0.7%   0.7% |   0.7%   1.0%   0.6%   0.6%
Cleveland, OH          |   2.3%   1.7%   1.3%   1.1% |   2.5%   1.5%   1.0%   0.8% |   2.5%   1.4%   0.9%   0.6%
Columbus, OH           |   0.7%   0.5%   0.5%   0.4% |   0.6%   0.6%   0.4%   0.3% |   0.8%   0.7%   0.3%   0.3%
Dallas-Fort Worth, TX  |   1.6%   2.0%   2.3%   2.1% |   1.1%   1.9%   2.3%   2.2% |   1.5%   2.4%   2.9%   2.8%
Denver, CO             |   1.0%   1.2%   1.3%   1.3% |   0.8%   1.0%   0.9%   0.8% |   0.8%   1.0%   0.6%   0.5%
Detroit, MI            |   3.1%   3.3%   2.9%   2.8% |   3.1%   3.1%   2.6%   2.6% |   3.2%   2.8%   2.5%   2.5%
Greensboro-W.S., NC    |   0.2%   0.3%   0.3%   0.2% |   0.1%   0.2%   0.2%   0.1% |   0.2%   0.2%   0.1%   0.1%
Hartford, CT           |   0.9%   0.9%   0.6%   0.6% |   1.0%   0.8%   0.5%   0.5% |   0.8%   0.6%   0.3%   0.4%
Houston, TX            |   2.3%   2.5%   1.9%   2.0% |   1.8%   2.3%   1.8%   1.9% |   2.2%   2.8%   1.8%   1.9%
Indianapolis, IN       |   0.8%   0.7%   0.7%   0.5% |   0.6%   0.4%   0.4%   0.3% |   0.7%   0.5%   0.4%   0.3%
Jacksonville, NC       |   0.1%   0.1%   0.1%   0.1% |   0.1%   0.1%   0.1%   0.1% |   0.1%   0.1%   0.1%   0.1%
Kansas City, MO        |   0.4%   0.3%   0.4%   0.3% |   0.2%   0.2%   0.2%   0.2% |   0.2%   0.1%   0.2%   0.2%
Las Vegas, NV          |   0.1%   0.1%   0.2%   0.3% |   0.1%   0.1%   0.2%   0.2% |   0.0%   0.1%   0.1%   0.1%
Los Angeles, CA        |   6.6%   6.1%   6.0%   5.7% |   7.2%   7.2%   7.9%   7.3% |   6.7%   6.9%   7.5%   7.0%
Memphis, TN            |   0.1%   0.2%   0.2%   0.3% |   0.1%   0.1%   0.1%   0.2% |   0.1%   0.1%   0.1%   0.1%
Miami, FL              |   0.8%   0.9%   0.7%   0.7% |   1.0%   1.3%   1.0%   0.9% |   0.5%   0.6%   0.5%   0.4%
Milwaukee, WI          |   1.0%   0.9%   0.8%   0.7% |   0.8%   0.8%   0.6%   0.5% |   0.5%   0.4%   0.5%   0.4%
Minneap.-St. Paul, MN  |   1.9%   2.4%   2.7%   2.8% |   1.6%   2.0%   2.0%   2.0% |   1.5%   1.7%   1.7%   1.8%
Nashville, TN          |   0.1%   0.2%   0.2%   0.2% |   0.0%   0.1%   0.1%   0.1% |   0.1%   0.1%   0.1%   0.1%
New Orleans, LA        |   0.3%   0.2%   0.2%   0.1% |   0.3%   0.3%   0.1%   0.1% |   0.2%   0.2%   0.0%   0.0%
New York, NY           |  11.5%   8.9%   7.3%   6.9% |  16.6%  13.1%  10.1%   8.9% |  16.6%  13.3%   9.7%   9.0%
Norfolk-VA Beach, VA   |   0.2%   0.2%   0.2%   0.1% |   0.1%   0.1%   0.1%   0.1% |   0.1%   0.1%   0.1%   0.1%
Orlando, FL            |   0.2%   0.3%   0.3%   0.3% |   0.1%   0.2%   0.3%   0.3% |   0.1%   0.2%   0.3%   0.3%
Philadelphia, PA       |   4.6%   4.0%   2.7%   2.8% |   5.6%   4.9%   2.8%   2.9% |   6.2%   5.8%   2.8%   3.0%
Phoenix, AZ            |   1.0%   1.2%   1.4%   1.3% |   0.6%   1.1%   1.3%   1.2% |   0.4%   1.0%   1.4%   1.3%
Pittsburgh, PA         |   2.0%   1.3%   0.8%   0.7% |   2.2%   1.4%   0.6%   0.5% |   2.2%   1.3%   0.5%   0.5%
Portland, OR           |   0.5%   0.8%   1.4%   1.6% |   0.3%   0.6%   1.4%   1.6% |   0.2%   0.6%   1.7%   2.0%
Providence, RI         |   0.3%   0.3%   0.3%   0.2% |   0.3%   0.4%   0.3%   0.2% |   0.2%   0.3%   0.2%   0.2%
Raleigh-Durham, NC     |   0.3%   0.6%   1.1%   1.5% |   0.3%   0.6%   1.0%   1.3% |   0.3%   0.8%   1.0%   1.2%
Richmond, VA           |   0.3%   0.3%   0.2%   0.2% |   0.3%   0.3%   0.2%   0.2% |   0.3%   0.4%   0.2%   0.2%
Sacramento, CA         |   0.2%   0.4%   0.5%   0.5% |   0.2%   0.4%   0.5%   0.5% |   0.2%   0.3%   0.5%   0.5%
Salt Lake City, UT     |   0.4%   0.5%   0.6%   0.6% |   0.2%   0.4%   0.3%   0.3% |   0.2%   0.3%   0.3%   0.3%
San Antonio, TX        |   0.1%   0.2%   0.2%   0.2% |   0.1%   0.2%   0.2%   0.2% |   0.2%   0.1%   0.1%   0.1%
San Diego, CA          |   1.1%   1.6%   2.2%   2.8% |   1.1%   1.6%   2.6%   3.6% |   0.8%   1.4%   2.4%   3.9%
San Francisco, CA      |   4.8%   6.6%  12.1%  13.2% |   6.2%   9.3%  19.3%  19.9% |   8.4%  13.0%  25.4%  24.0%
Seattle, WA            |   0.9%   1.3%   1.9%   3.4% |   0.8%   1.1%   1.8%   3.5% |   0.6%   1.0%   1.8%   3.7%
St. Louis, MO          |   1.0%   0.9%   0.8%   0.8% |   0.9%   0.8%   0.8%   0.7% |   1.0%   0.8%   0.4%   0.4%
Tallahassee, FL        |   0.4%   0.5%   0.4%   0.4% |   0.3%   0.4%   0.3%   0.3% |   0.2%   0.2%   0.2%   0.2%
Washington, DC         |   1.5%   1.5%   1.4%   1.6% |   1.6%   1.6%   1.5%   1.7% |   1.6%   1.7%   1.5%   1.7%
West Palm Beach, FL    |   0.3%   0.5%   0.4%   0.4% |   0.3%   0.5%   0.4%   0.4% |   0.3%   0.3%   0.2%   0.2%
Other 234 Major Cities |  21.8%  22.3%  20.7%  18.4% |  18.1%  18.1%  15.6%  13.6% |  19.7%  18.2%  14.6%  12.7%
Not in a Major City    |   9.0%   8.2%   6.6%   6.2% |   6.3%   5.4%   3.7%   4.1% |   5.2%   3.8%   2.5%   2.7%

Notes: See Table 1. The first three columns of each grouping are for granted patents. The fourth column, marked with (A), is for published patent applications. Period columns: 75-84 = 1975-1984, 85-94 = 1985-1994, 95-04 = 1995-2004, 01-06A = 2001-2006 (A).
Table A1: Most Common Ethnic Surnames for Inventors Residing in the US

Chinese: CAI 585; CAO 657; CHAN 3,096; CHANG 3,842; CHAO 796; CHAU 486; CHEN 12,860; CHENG 2,648; CHEUNG 950; CHIANG 1,112; CHIEN 429; CHIN 423; CHIU 924; CHOU 1,144; CHOW 1,139; CHU 2,353; DENG 439; DING 589; DONG 492; FAN 1,036; FANG 846; FENG 658; FONG 727; FU 767; FUNG 455; GAO 785; GUO 921; HAN 777; HE 1,159; HO 1,282; HSIEH 980; HSU 3,034; HU 1,695; HUANG 4,605; HUI 451; HUNG 562; HWANG 800; JIANG 1,399; KAO 714; KUO 1,157; LAI 1,134; LAM 1,336; LAU 1,320; LEE 4,006; LEUNG 1,165; LEW 460; LI 6,863; LIANG 1,173; LIAO 553; LIM 485; LIN 5,770; LING 521; LIU 6,406; LO 1,053; LU 2,289; LUO 815; MA 1,708; MAO 545; NG 1,132; ONG 473; PAN 1,435; PENG 530; SHEN 1,480; SHI 964; SHIH 938; SONG 636; SU 1,025; SUN 2,521; TAI 463; TAM 589; TAN 1,105; TANG 2,277; TENG 437; TONG 677; TSAI 1,244; TSANG 499; TSENG 538; TUNG 565; WANG 11,905; WEI 1,317; WEN 455; WONG 4,811; WOO 710; WU 5,521; XIE 609; XU 2,249; YAN 826; YANG 4,584; YAO 699; YE 525; YEE 729; YEH 928; YEN 467; YIN 617; YU 2,293; YUAN 825; ZHANG 4,532; ZHAO 1,337; ZHENG 1,037; ZHOU 1,517; ZHU 1,749

English: ADAMS 4,490; ALLEN 5,074; ANDERSON 10,719; BAILEY 2,431; BAKER 4,671; BELL 2,738; BENNETT 2,734; BROOKS 2,015; BROWN 11,662; BURNS 2,098; CAMPBELL 3,959; CARLSON 2,745; CARTER 2,658; CHANG 2,032; CLARK 5,493; COHEN 2,626; COLE 2,143; COLLINS 2,992; COOK 3,556; COOPER 3,045; COX 2,407; DAVIS 8,848; EDWARDS 3,375; EVANS 4,082; FISCHER 2,081; FISHER 2,748; FOSTER 2,616; FOX 1,990; GARDNER 2,412; GORDON 2,315; GRAHAM 2,042; GRAY 2,626; GREEN 3,540; HALL 4,907; HAMILTON 1,991; HANSON 2,148; HARRIS 4,793; HAYES 2,031; HILL 3,590; HOFFMAN 2,387; HOWARD 2,160; HUGHES 2,198; JACKSON 3,980; JENSEN 2,361; JOHNSON 17,960; JONES 10,630; KELLER 2,041; KELLY 2,775; KENNEDY 2,208; KING 4,686; KLEIN 2,347; LARSON 2,537; LEE 9,490; LEWIS 4,732; LONG 2,392; MARSHALL 2,088; MARTIN 6,773; MILLER 14,942; MITCHELL 3,075; MOORE 6,459; MORGAN 2,824; MORRIS 3,223; MURPHY 3,609; MURRAY 2,207; MYERS 2,625; NELSON 6,444; OLSON 3,140; PARKER 3,181; PETERSON 4,912; PHILLIPS 3,875; PRICE 2,062; REED 2,645; RICHARDSON 2,114; ROBERTS 4,352; ROBINSON 3,741; ROGERS 2,974; ROSS 2,377; RUSSELL 2,611; RYAN 2,404; SCOTT 3,583; SHAW 2,369; SIMPSON 2,014; SMITH 24,173; SNYDER 2,335; STEVENS 2,221; STEWART 2,924; SULLIVAN 2,933; TAYLOR 6,659; THOMAS 5,312; THOMPSON 6,424; TURNER 2,855; WALKER 4,887; WALLACE 1,963; WARD 2,913; WATSON 2,139; WHITE 6,190; WILLIAMS 10,442; WILSON 7,677; WOOD 4,525; WRIGHT 4,521; YOUNG 5,957

European: ABEL 269; ALBRECHT 564; ANTOS 230; AUERBACH 193; BAER 422; BAERLOCHER 252; BAUER 1,470; BECHTEL 179; BECK 1,712; BENDER 650; BERG 1,465; BERGER 1,304; BOEHM 256; BOUTAGHOU 266; CARON 290; CERAMI 172; CHANDRARATNA 229; CHEVALLIER 204; DIETRICH 312; DIETZ 496; EBERHARDT 192; EHRLICH 311; ERRICO 190; FARKAS 169; FERRARI 177; FISCHELL 280; FUCHS 394; GAISER 193; GELARDI 176; GRILLIOT 201; GUEGLER 179; GUNTER 177; GUNTHER 247; HAAS 843; HAMPEL 187; HANSEN 2,947; HARTMAN 1,214; HARTMANN 385; HAUSE 266; HECHT 245; HEINZ 168; HORODYSKY 230; HORVATH 387; IACOVELLI 287; JACOBS 1,962; KARR 196; KASPER 227; KEMPF 228; KNAPP 833; KNIFTON 206; KOENIG 521; KRESGE 179; LANGE 757; LASKARIS 192; LEMELSON 324; LIOTTA 171; LORENZ 341; LUDWIG 500; LUTZ 679; MAIER 492; MARTIN 223; MAYER 1,097; MEYER 3,004; MOLNAR 335; MORIN 320; MUELLER 2,242; MULLER 985; NAGEL 383; NATHAN 171; NILSSEN 234; NOVAK 788; PAGANO 177; PALERMO 177; PASTOR 238; POPP 202; RAO 343; REITZ 248; ROHRBACH 246; ROMAN 362; ROSTOKER 245; SCHMIDT 3,753; SCHNEIDER 2,246; SCHULTZ 2,273; SCHULZ 921; SCHWARTZ 2,394; SCHWARZ 633; SPERANZA 215; SPIEGEL 177; STRAETER 454; THEEUWES 247; TROKHAN 167; VOCK 423; WACHTER 199; WAGNER 2,499; WEBER 3,003; WEDER 1,067; WEISS 1,533; WOLF 1,604; WRISTERS 185; ZIMMERMAN 1,542; ZIMMERMANN 226

Hispanic / Filipino: ACOSTA 171; AGUILAR 138; ALVAREZ 446; ANDREAS 128; AYER 166; AYRES 180; BALES 240; BLANCO 141; BOLANOS 130; BOLES 118; CABRAL 154; CABRERA 163; CALDERON 124; CASTANEDA 116; CASTILLO 124; CASTRO 119; CHAVEZ 194; CONTRERAS 137; CRUZ 319; CUEVAS 123; DAS 213; DELGADO 216; DIAS 174; DIAZ 584; DOMINGUEZ 195; DURAN 142; ELIAS 230; ESTRADA 142; FERNANDES 152; FERNANDEZ 546; FIGUEROA 146; FLORES 191; FREITAS 132; GAGNON 265; GARCIA 1,310; GARZA 167; GOMES 199; GOMEZ 413; GONSALVES 141; GONZALES 281; GONZALEZ 1,055; GUTIERREZ 601; GUZMAN 139; HALASA 202; HERNANDEZ 703; HERRERA 171; HERRON 450; HIDALGO 186; JIMENEZ 246; LEE 237; LOPEZ 738; MACHADO 135; MARIN 177; MARQUEZ 117; MARTIN 183; MARTINEZ 1,112; MATIS 249; MEDINA 192; MENARD 149; MENDOZA 173; MIRANDA 140; MOLINA 129; MORALES 146; MORENO 128; MUNOZ 177; NUNEZ 207; ORTEGA 206; ORTIZ 362; PADILLA 116; PAZ DE ARAUJO 148; PEREIRA 280; PEREZ 675; QUINTANA 126; RAMIREZ 345; RAMOS 226; REGNIER 137; REIS 168; REYES 150; RIVERA 489; RODRIGUES 188; RODRIGUEZ 1,314; ROMERO 292; RUIZ 297; SALAZAR 179; SANCHEZ 717; SANTIAGO 158; SERRANO 172; SILVA 457; SOTO 158; SOUZA 145; SUAREZ 150; TORRES 352; VALDEZ 127; VARGA 130; VASQUEZ 153; VAZQUEZ 260; VELAZQUEZ 134; VINALS 220; YU 140; ZAMORA 120; ZUNIGA 128

Indian / Hindi: ACHARYA 338; AGARWAL 580; AGGARWAL 282; AGRAWAL 797; AHMAD 355; AHMED 652; AKRAM 640; ALI 559; ARIMILLI 432; ARORA 214; ASH 290; BALAKRISHNAN 228; BANERJEE 371; BASU 233; BHAT 224; BHATIA 411; BHATT 242; BHATTACHARYA 216; BHATTACHARYYA 265; BOSE 238; CHANDRA 221; CHATTERJEE 647; DAOUD 305; DAS 522; DATTA 424; DE 234; DESAI 974; DIXIT 256; DUTTA 338; GANDHI 228; GARG 345; GHOSH 661; GOEL 279; GUPTA 1,935; HASSAN 217; HUSSAIN 233; HUSSAINI 299; ISLAM 266; IYER 601; JAIN 912; JOSHI 886; KAMATH 219; KAPOOR 222; KHANNA 378; KRISHNAMURTHY 369; KRISHNAN 512; KULKARNI 299; KUMAR 2,005; LAL 366; MALIK 532; MATHUR 306; MEHROTRA 265; MEHTA 925; MENON 325; MISHRA 348; MISRA 282; MOOKHERJEE 272; MUKHERJEE 327; MURTHY 236; NAGARAJAN 270; NAIR 560; NARASIMHAN 225; NARAYAN 312; NARAYANAN 419; NATARAJAN 301; PAREKH 301; PARIKH 286; PATEL 3,879; PATIL 352; PRAKASH 326; PRASAD 549; PURI 233; RAGHAVAN 378; RAHMAN 367; RAJAGOPALAN 396; RAMACHANDRAN 388; RAMAKRISHNAN 270; RAMAN 222; RAMASWAMY 244; RAMESH 364; RANGARAJAN 244; RAO 1,196; REDDY 459; ROY 279; SANDHU 878; SAXENA 213; SHAH 2,467; SHARMA 1,249; SINGH 2,412; SINGHAL 245; SINHA 463; SIRCAR 225; SRINIVASAN 876; SRIVASTAVA 498; SUBRAMANIAN 702; THAKUR 381; TRIVEDI 383; VENKATESAN 281; VERMA 262; VISWANATHAN 218; VORA 223

Japanese: AOKI 141; AOYAMA 66; ASATO 73; CHEN 88; DOI 90; FUJII 92; FUJIMOTO 98; FUKUDA 84; FURUKAWA 218; HANAWA 69; HARADA 90; HASEGAWA 171; HASHIMOTO 110; HAYASHI 148; HEY 75; HIGASHI 98; HIGUCHI 81; HONDA 102; IDE 136; IKEDA 98; IMAI 129; INOUE 90; IRICK 86; ISHIDA 93; ISHII 82; ISHIKAWA 208; ITO 260; IWAMOTO 78; KANEKO 157; KATO 113; KAUTZ 87; KAWAMURA 87; KAWASAKI 104; KAYA 78; KIMURA 108; KINO 74; KINOSHITA 93; KIRIHATA 107; KISHI 65; KIWALA 132; KOBAYASHI 296; LI 75; LIU 84; MAKI 167; MATSUMOTO 147; MIYANO 70; MIZUHARA 87; MORI 128; MORITA 64; MOSLEHI 165; MOTOYAMA 130; MURAKAMI 67; NAJJAR 81; NAKAGAWA 125; NAKAJIMA 99; NAKAMURA 187; NAKANISHI 64; NAKANO 104; NEMOTO 70; NISHIBORI 88; NISHIMURA 131; NODA 107; OGAWA 74; OGURA 209; OHARA 269; OHKAWA 89; OKADA 87; OKAMOTO 103; ONO 148; OVSHINSKY 314; SAITO 136; SAKAI 79; SASAKI 209; SATO 231; SETO 73; SHIMIZU 103; SUZUKI 306; TAKAHASHI 245; TAKEUCHI 242; TAMURA 83; TANAKA 328; THOR 66; TSUJI 92; TSUKAMOTO 89; UCHIDA 72; UEDA 72; WADA 153; WANG 81; WATANABE 416; WU 67; YAMADA 180; YAMAGUCHI 102; YAMAMOTO 432; YAMASAKI 67; YAMASHITA 105; YAMAZAKI 91; YANG 65; YASUDA 75; YOSHIDA 178; YUAN 112; ZHAO 81

Korean: AHN 610; BAE 122; BAEK 77; BAK 68; BANG 91; BARK 39; BYUN 87; CHA 45; CHAE 33; CHANG 289; CHIN 33; CHO 977; CHOE 193; CHOI 1,081; CHON 33; CHOO 94; CHUN 330; CHUNG 1,499; DROZD 45; EYUBOGLU 36; GANG 34; GU 533; HAHM 42; HAHN 1,016; HAM 45; HAN 145; HANSELL 39; HOGLE 43; HONE 78; HONG 907; HOSKING 63; HUH 32; HWANG 108; HYUN 54; IM 80; JANG 46; JEON 134; JEONG 122; JI 268; JIN 673; JO 41; JOO 68; JU 55; JUNG 582; KANG 809; KIANI 74; KIM 5,455; KO 595; KOO 214; KUN 63; KWAK 96; KWON 298; LEE 1,032; LIM 135; MENNIE 96; MIN 242; NA 34; NAM 68; NEVINS 42; NYCE 56; OH 461; PAEK 41; PAIK 144; PAK 116; PARK 2,145; QUAY 107; RHEE 191; RIM 57; RYANG 38; RYU 99; SAHM 45; SAHOO 58; SEO 47; SHIM 162; SHIN 399; SHINN 96; SIN 62; SJOSTROM 39; SO 332; SOHN 78; SON 147; SONG 105; SUE 64; SUH 311; SUK 75; SUNG 41; SUR 38; TOOHEY 33; UM 36; WHANG 175; WON 108; YI 237; YIM 145; YOHN 32; YOO 290; YOON 614; YOUN 38; YU 198; YUH 96; YUM 78; YUN 222

Russian: AGHAJANIAN 77; ALPEROVICH 64; ALTSHULER 71; ANDREEV 94; ANSCHER 95; BABICH 79; BABLER 73; BARINAGA 72; BARNA 96; BELOPOLSKY 71; BERCHENKO 94; BLASKO 79; BLONDER 82; BONIN 97; CODILIAN 90; COMISKEY 74; DAMADIAN 118; DANKO 69; DAYAN 143; DERDERIAN 169; DOMBROSKI 66; ELKO 81; FETCENKO 62; FISHKIN 82; FOMENKOV 73; FRENKEL 71; FRIDMAN 67; FROLOV 68; GARABEDIAN 104; GELFAND 139; GINZBURG 73; GITLIN 73; GLUSCHENKOV 73; GORALSKI 69; GORDIN 65; GORIN 99; GRINBERG 104; GROCHOWSKI 77; GUREVICH 107; GURSKY 89; GUZIK 79; HABA 96; HYNECEK 82; IBRAHIM 229; IVANOV 165; IVERS 66; JOVANOVIC 65; JU 126; JUHASZ 71; KAHLE 173; KAMINSKI 393; KAMINSKY 150; KANEVSKY 114; KAPLINSKY 69; KAPOSI 72; KHAN 104; KHANDROS 161; KHOVAYLO 69; KOLMANOVSKY 70; KORSUNSKY 153; KOWAL 74; LAPIDUS 63; LEE 113; LOPATA 113; MESSING 74; METLITSKY 95; MIKHAIL 115; MIRKIN 66; MOGHADAM 72; NADELSON 65; NAZARIAN 75; NEMIROVSKY 73; NIE 72; OGG 125; PAPADOPOULOS 132; PAPATHOMAS 67; PETROV 102; PINARBASI 131; PINCHUK 123; POPOV 81; PROKOP 86; RABER 78; RABINOVICH 123; ROBICHAUX 65; RUBSAMEN 69; SAHATJIAN 66; SARKISIAN 65; SARRAF 82; SCHREIER 62; SCHWAN 81; SIMKO 77; SMETANA 69; SOFRANKO 66; SOKOLOV 91; SORKIN 111; TABAK 85; TEPMAN 80; TERZIAN 87; VASHCHENKO 96; WASILEWSKI 80; ZEMEL 126

Vietnamese: ABOU-GHARBIA 22; BAHN 15; BANH 21; BI 158; BICH 18; BIEN 91; BUI 309; CAN 19; CONG 41; DANG 23; DIEM 24; DIEP 52; DINH 232; DIP 11; DO 13; DOAN 616; DOMINH 33; DONLAN 21; DOVAN 26; DUAN 241; DUE 20; DUONG 153; DUONG-VAN 13; ESKEW 12; GRAN 20; HAC 20; HAUGAN 16; HO 35; HOANG 277; HOPPING 15; HUYNH 317; HUYNH-BA 19; KHA 13; KHAW 20; KHIEU 35; KHU 13; KHUC 15; LAHUE 17; LAURSEN 72; LAVAN 18; LE 1,263; LE ROY 29; LEEN 75; LEMINH 17; LUONG 107; LY 118; MINH 41; NELLUMS 17; NGO 735; NGUY 12; NGUYEN 4,720; NHO 12; NIEH 69; NIM 14; PHAM 901; PHAN 27; PHANG 11; PHY 19; POSTMAN 12; QUACH 95; QUI 11; QUY 13; ROCH 26; TA 91; TAKACH 30; TAU 23; THACH 33; THAI 86; THAO 21; THI 13; THIEN 15; THUT 28; TIEDT 14; TIEP 12; TIETJEN 59; TO 76; TON-THAT 16; TRAN 2,050; TRANDAI 14; TRANG 34; TRANK 11; TRIEU 49; TRONG 12; TRUC 27; TU 545; TUTEN 23; TUY 16; TY 27; VAN 58; VAN CLEVE 40; VAN DAM 20; VAN LE 17; VAN NGUYEN 29; VAN PHAN 26; VAN TRAN 15; VIET 11; VO 269; VO-DINH 32; VOVAN 20; VU 502; VUONG 107