Empirical Appendix
The Supply Side of Innovation:
H-1B Visa Reforms and US Ethnic
Invention
William R. Kerr
Harvard Business School
Boston MA
William F. Lincoln
University of Michigan
Ann Arbor MI
Ethnic Inventor Contributions by City

                        Total Patenting Share                           Non-English Ethnic Patenting Share              Indian and Chinese Patenting Share
City                    1975-1984  1985-1994  1995-2004  2001-2006 (A)  1975-1984  1985-1994  1995-2004  2001-2006 (A)  1975-1984  1985-1994  1995-2004  2001-2006 (A)
Atlanta, GA             0.6%       1.0%       1.3%       1.5%           0.3%       0.7%       1.0%       1.1%           0.3%       0.7%       1.0%       1.2%
Austin, TX              0.4%       0.9%       1.8%       2.0%           0.5%       1.2%       1.9%       2.0%           0.4%       1.6%       2.3%       2.3%
Baltimore, MD           0.8%       0.8%       0.7%       0.7%           0.7%       0.7%       0.6%       0.5%           0.4%       0.5%       0.6%       0.5%
Boston, MA              3.6%       3.8%       3.9%       4.6%           3.9%       4.2%       4.1%       4.8%           4.0%       4.0%       3.6%       4.3%
Buffalo, NY             0.6%       0.5%       0.4%       0.3%           0.8%       0.6%       0.4%       0.3%           1.1%       0.7%       0.4%       0.3%
Charlotte, NC           0.3%       0.3%       0.3%       0.3%           0.2%       0.2%       0.2%       0.2%           0.1%       0.2%       0.1%       0.2%
Chicago, IL             6.0%       4.6%       3.5%       3.2%           6.9%       5.0%       3.5%       3.0%           5.6%       3.9%       2.9%       2.8%
Cincinnati, OH          1.0%       1.1%       1.0%       1.0%           0.9%       0.9%       0.7%       0.7%           0.7%       1.0%       0.6%       0.6%
Cleveland, OH           2.3%       1.7%       1.3%       1.1%           2.5%       1.5%       1.0%       0.8%           2.5%       1.4%       0.9%       0.6%
Columbus, OH            0.7%       0.5%       0.5%       0.4%           0.6%       0.6%       0.4%       0.3%           0.8%       0.7%       0.3%       0.3%
Dallas-Fort Worth, TX   1.6%       2.0%       2.3%       2.1%           1.1%       1.9%       2.3%       2.2%           1.5%       2.4%       2.9%       2.8%
Denver, CO              1.0%       1.2%       1.3%       1.3%           0.8%       1.0%       0.9%       0.8%           0.8%       1.0%       0.6%       0.5%
Detroit, MI             3.1%       3.3%       2.9%       2.8%           3.1%       3.1%       2.6%       2.6%           3.2%       2.8%       2.5%       2.5%
Greensboro-W.S., NC     0.2%       0.3%       0.3%       0.2%           0.1%       0.2%       0.2%       0.1%           0.2%       0.2%       0.1%       0.1%
Hartford, CT            0.9%       0.9%       0.6%       0.6%           1.0%       0.8%       0.5%       0.5%           0.8%       0.6%       0.3%       0.4%
Houston, TX             2.3%       2.5%       1.9%       2.0%           1.8%       2.3%       1.8%       1.9%           2.2%       2.8%       1.8%       1.9%
Indianapolis, IN        0.8%       0.7%       0.7%       0.5%           0.6%       0.4%       0.4%       0.3%           0.7%       0.5%       0.4%       0.3%
Jacksonville, NC        0.1%       0.1%       0.1%       0.1%           0.1%       0.1%       0.1%       0.1%           0.1%       0.1%       0.1%       0.1%
Kansas City, MO         0.4%       0.3%       0.4%       0.3%           0.2%       0.2%       0.2%       0.2%           0.2%       0.1%       0.2%       0.2%
Las Vegas, NV           0.1%       0.1%       0.2%       0.3%           0.1%       0.1%       0.2%       0.2%           0.0%       0.1%       0.1%       0.1%
Los Angeles, CA         6.6%       6.1%       6.0%       5.7%           7.2%       7.2%       7.9%       7.3%           6.7%       6.9%       7.5%       7.0%
Memphis, TN             0.1%       0.2%       0.2%       0.3%           0.1%       0.1%       0.1%       0.2%           0.1%       0.1%       0.1%       0.1%
Miami, FL               0.8%       0.9%       0.7%       0.7%           1.0%       1.3%       1.0%       0.9%           0.5%       0.6%       0.5%       0.4%
Milwaukee, WI           1.0%       0.9%       0.8%       0.7%           0.8%       0.8%       0.6%       0.5%           0.5%       0.4%       0.5%       0.4%
Minneap.-St. Paul, MN   1.9%       2.4%       2.7%       2.8%           1.6%       2.0%       2.0%       2.0%           1.5%       1.7%       1.7%       1.8%
Ethnic Inventor Contributions by City, continued

                        Total Patenting Share                           Non-English Ethnic Patenting Share              Indian and Chinese Patenting Share
City                    1975-1984  1985-1994  1995-2004  2001-2006 (A)  1975-1984  1985-1994  1995-2004  2001-2006 (A)  1975-1984  1985-1994  1995-2004  2001-2006 (A)
Nashville, TN           0.1%       0.2%       0.2%       0.2%           0.0%       0.1%       0.1%       0.1%           0.1%       0.1%       0.1%       0.1%
New Orleans, LA         0.3%       0.2%       0.2%       0.1%           0.3%       0.3%       0.1%       0.1%           0.2%       0.2%       0.0%       0.0%
New York, NY            11.5%      8.9%       7.3%       6.9%           16.6%      13.1%      10.1%      8.9%           16.6%      13.3%      9.7%       9.0%
Norfolk-VA Beach, VA    0.2%       0.2%       0.2%       0.1%           0.1%       0.1%       0.1%       0.1%           0.1%       0.1%       0.1%       0.1%
Orlando, FL             0.2%       0.3%       0.3%       0.3%           0.1%       0.2%       0.3%       0.3%           0.1%       0.2%       0.3%       0.3%
Philadelphia, PA        4.6%       4.0%       2.7%       2.8%           5.6%       4.9%       2.8%       2.9%           6.2%       5.8%       2.8%       3.0%
Phoenix, AZ             1.0%       1.2%       1.4%       1.3%           0.6%       1.1%       1.3%       1.2%           0.4%       1.0%       1.4%       1.3%
Pittsburgh, PA          2.0%       1.3%       0.8%       0.7%           2.2%       1.4%       0.6%       0.5%           2.2%       1.3%       0.5%       0.5%
Portland, OR            0.5%       0.8%       1.4%       1.6%           0.3%       0.6%       1.4%       1.6%           0.2%       0.6%       1.7%       2.0%
Providence, RI          0.3%       0.3%       0.3%       0.2%           0.3%       0.4%       0.3%       0.2%           0.2%       0.3%       0.2%       0.2%
Raleigh-Durham, NC      0.3%       0.6%       1.1%       1.5%           0.3%       0.6%       1.0%       1.3%           0.3%       0.8%       1.0%       1.2%
Richmond, VA            0.3%       0.3%       0.2%       0.2%           0.3%       0.3%       0.2%       0.2%           0.3%       0.4%       0.2%       0.2%
Sacramento, CA          0.2%       0.4%       0.5%       0.5%           0.2%       0.4%       0.5%       0.5%           0.2%       0.3%       0.5%       0.5%
Salt Lake City, UT      0.4%       0.5%       0.6%       0.6%           0.2%       0.4%       0.3%       0.3%           0.2%       0.3%       0.3%       0.3%
San Antonio, TX         0.1%       0.2%       0.2%       0.2%           0.1%       0.2%       0.2%       0.2%           0.2%       0.1%       0.1%       0.1%
San Diego, CA           1.1%       1.6%       2.2%       2.8%           1.1%       1.6%       2.6%       3.6%           0.8%       1.4%       2.4%       3.9%
San Francisco, CA       4.8%       6.6%       12.1%      13.2%          6.2%       9.3%       19.3%      19.9%          8.4%       13.0%      25.4%      24.0%
Seattle, WA             0.9%       1.3%       1.9%       3.4%           0.8%       1.1%       1.8%       3.5%           0.6%       1.0%       1.8%       3.7%
St. Louis, MO           1.0%       0.9%       0.8%       0.8%           0.9%       0.8%       0.8%       0.7%           1.0%       0.8%       0.4%       0.4%
Tallahassee, FL         0.4%       0.5%       0.4%       0.4%           0.3%       0.4%       0.3%       0.3%           0.2%       0.2%       0.2%       0.2%
Washington, DC          1.5%       1.5%       1.4%       1.6%           1.6%       1.6%       1.5%       1.7%           1.6%       1.7%       1.5%       1.7%
West Palm Beach, FL     0.3%       0.5%       0.4%       0.4%           0.3%       0.5%       0.4%       0.4%           0.3%       0.3%       0.2%       0.2%

Other 234 Major Cities  21.8%      22.3%      20.7%      18.4%          18.1%      18.1%      15.6%      13.6%          19.7%      18.2%      14.6%      12.7%
Not in a Major City     9.0%       8.2%       6.6%       6.2%           6.3%       5.4%       3.7%       4.1%           5.2%       3.8%       2.5%       2.7%

Notes: See Table 1. The first three columns of each grouping are for granted patents. The fourth column, marked with (A), is for published patent applications.
Univariate Regressions for Table 2

Table Documents 50 Univariate Regressions with Separate Ethnic Patenting Measures

Column specifications:
(1), (6)   City & Year Fixed Effects
(2), (7)   (1) or (6) plus Expected Patenting Trends
(3), (8)   (2) or (7) plus State-Yr Fixed Effects
(4), (9)   (2) or (7) plus Population Weights
(5), (10)  (2) or (7) plus Dropping Largest 20%

Dependent variable: Log English Patenting
                                    (1)              (2)              (3)              (4)              (5)
Log Indian Patenting                0.105 (0.021)    0.078 (0.017)    0.050 (0.019)    0.098 (0.021)    0.085 (0.020)
Log Chinese Patenting               0.109 (0.023)    0.084 (0.019)    0.072 (0.021)    0.100 (0.020)    0.083 (0.022)
Log European Patenting              0.151 (0.027)    0.107 (0.023)    0.076 (0.021)    0.141 (0.024)    0.097 (0.025)
Log Hispanic Patenting              0.133 (0.023)    0.102 (0.017)    0.076 (0.019)    0.116 (0.021)    0.107 (0.019)
Log Russian Patenting               0.080 (0.024)    0.074 (0.018)    0.065 (0.020)    0.080 (0.021)    0.078 (0.022)

Dependent variable: Log Total Patenting
                                    (6)              (7)              (8)              (9)              (10)
Log Indian Patenting                0.155 (0.021)    0.128 (0.017)    0.103 (0.019)    0.150 (0.024)    0.141 (0.019)
Log Chinese Patenting               0.163 (0.021)    0.138 (0.018)    0.135 (0.019)    0.155 (0.020)    0.144 (0.020)
Log European Patenting              0.229 (0.023)    0.185 (0.019)    0.163 (0.019)    0.212 (0.022)    0.180 (0.021)
Log Hispanic Patenting              0.192 (0.021)    0.160 (0.016)    0.142 (0.018)    0.171 (0.023)    0.173 (0.018)
Log Russian Patenting               0.112 (0.024)    0.106 (0.019)    0.100 (0.020)    0.118 (0.023)    0.114 (0.023)

Notes: See Table 2.
Further Robustness Checks on Table 2

Column specifications:
(1), (6)   City & Year Fixed Effects
(2), (7)   (1) with Grants Only, 1995-2001
(3), (8)   (1) plus Appl. Only, 2001-2006
(4), (9)   (1) plus Dropping Zero Counts
(5), (10)  (1) plus Dropping West Coast

Dependent variable: Log English Patenting
                                    (1)              (2)              (3)              (4)              (5)
Log Indian and Chinese Patenting    0.137 (0.024)    0.149 (0.029)    0.106 (0.025)    0.180 (0.025)    0.139 (0.025)
Observations                        3372             2248             1686             2022             3036

Dependent variable: Log Total Patenting
                                    (6)              (7)              (8)              (9)              (10)
Log Indian and Chinese Patenting    0.211 (0.022)    0.237 (0.025)    0.196 (0.021)    0.260 (0.022)    0.211 (0.023)
Observations                        3372             2248             1686             2022             3036

Notes: See Table 2.
First-Differenced Regressions of Table 2

Column specifications:
(1), (6)   Year Fixed Effects
(2), (7)   (1) or (6) plus Expected Patenting Trends
(3), (8)   (2) or (7) plus State-Yr Fixed Effects
(4), (9)   (2) or (7) plus Population Weights
(5), (10)  (2) or (7) plus Dropping Largest 20%

Dependent variable: Δ Log English Patenting
                                    (1)              (2)              (3)              (4)              (5)
Δ Log Indian and Chinese Patenting  0.079 (0.015)    0.056 (0.015)    0.049 (0.017)    0.050 (0.012)    0.057 (0.017)

Dependent variable: Δ Log Total Patenting
                                    (6)              (7)              (8)              (9)              (10)
Δ Log Indian and Chinese Patenting  0.167 (0.015)    0.144 (0.014)    0.139 (0.016)    0.128 (0.012)    0.149 (0.016)

Notes: See Table 2.
Dependencies on H-1B Program for Major Patenting Cities

(1) LCA-Based Dependency: City's 2001-2002 LCA Filings for H-1B Visas Per Capita (x1000)
(2) Census-Based Dependency: City's 1990 Non-Citizen Immigrant SE Workforce per Capita (x1000)

Each list is ordered by its own dependency value, so the two halves of a row need not refer to the same city.

(1) LCA-Based Dependency             (2) Census-Based Dependency
City                       Value     City                       Value
San Francisco, CA          8.323     San Francisco, CA          5.096
Miami, FL                  5.502     Washington, DC             3.168
Washington, DC             5.430     Boston, MA                 3.129
Raleigh-Durham, NC         5.220     Raleigh-Durham, NC         2.723
Boston, MA                 5.149     Los Angeles, CA            2.288
Austin, TX                 4.897     New York, NY               2.185
New York, NY               4.777     Houston, TX                2.156
Atlanta, GA                4.116     San Diego, CA              2.040
Dallas-Fort Worth, TX      3.943     Austin, TX                 1.770
Houston, TX                3.712     Detroit, MI                1.545
Seattle, WA                3.393     Miami, FL                  1.517
San Diego, CA              3.021     Dallas-Fort Worth, TX      1.442
Los Angeles, CA            2.811     Philadelphia, PA           1.423
West Palm Beach, FL        2.744     Columbus, OH               1.411
Detroit, MI                2.729     Seattle, WA                1.340
Denver, CO                 2.407     Hartford, CT               1.212
Chicago, IL                2.372     Atlanta, GA                1.185
Orlando, FL                2.343     West Palm Beach, FL        1.147
Columbus, OH               2.116     Chicago, IL                1.145
Philadelphia, PA           2.112     Sacramento, CA             1.107
Richmond, VA               2.108     Salt Lake City, UT         1.021
Hartford, CT               2.010     Portland, OR               0.983
Minneapolis-St. Paul, MN   1.852     Phoenix, AZ                0.975
Portland, OR               1.708     Pittsburgh, PA             0.904
Kansas City, MO            1.697     Richmond, VA               0.887
Charlotte, SC              1.649     Cleveland, OH              0.860
Indianapolis, IN           1.620     Denver, CO                 0.791
Baltimore, MD              1.612     Buffalo, NY                0.770
Phoenix, AZ                1.580     Orlando, FL                0.757
Memphis, TN                1.561     New Orleans, LA            0.751
Sacramento, CA             1.490     Charlotte, SC              0.749
Las Vegas, NV              1.462     Milwaukee, WI              0.741
Pittsburgh, PA             1.438     Cincinnati, OH             0.722
Jacksonville, NC           1.266     Baltimore, MD              0.700
Cincinnati, OH             1.224     Memphis, TN                0.615
Tallahassee, FL            1.211     Indianapolis, IN           0.600
St. Louis, MO              1.203     Minneapolis-St. Paul, MN   0.600
Milwaukee, WI              1.170     St. Louis, MO              0.541
Providence, RI             1.158     Greensboro-W. Salem, NC    0.496
Nashville, TN              1.136     Nashville, TN              0.495
Cleveland, OH              1.134     Kansas City, MO            0.489
Salt Lake City, UT         1.058     Norfolk-VA Beach, VA       0.356
New Orleans, LA            0.977     Tallahassee, FL            0.326
San Antonio, TX            0.877     Providence, RI             0.272
Greensboro-W. Salem, NC    0.859     San Antonio, TX            0.264
Buffalo, NY                0.703     Jacksonville, NC           0.242
Norfolk-VA Beach, VA       0.536     Las Vegas, NV              0.154

Notes: See Table 3. Table presents largest dependency values on the H-1B program among major patenting cities.
City-Year Regressions with LCA-Based Dependency

    Log Indian       Log Chinese      Log Other        Log English      Log Total
    Patenting        Patenting        Patenting        Patenting        Patenting

A. Base Regression with City and Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.313 (0.087)    0.311 (0.095)    0.305 (0.106)   -0.010 (0.101)    0.037 (0.107)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.623 (0.090)    0.741 (0.108)    0.461 (0.096)    0.050 (0.087)    0.078 (0.083)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.982 (0.078)    1.179 (0.091)    0.593 (0.092)    0.109 (0.086)    0.172 (0.086)

B. Substituting Six-Year Cap Summation for H-1B Population
Log H-1B Cap Summation x Third Dependency Quintile [LCA]
    0.361 (0.098)    0.318 (0.102)    0.304 (0.111)   -0.026 (0.107)    0.023 (0.111)
Log H-1B Cap Summation x Second Dependency Quintile [LCA]
    0.661 (0.095)    0.810 (0.115)    0.480 (0.105)    0.039 (0.091)    0.072 (0.089)
Log H-1B Cap Summation x Most Dependent Quintile [LCA]
    1.057 (0.085)    1.198 (0.099)    0.630 (0.100)    0.092 (0.091)    0.163 (0.091)

C. Including State-Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.295 (0.094)    0.267 (0.110)    0.182 (0.104)   -0.057 (0.090)   -0.006 (0.094)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.606 (0.107)    0.661 (0.118)    0.353 (0.092)   -0.040 (0.079)    0.002 (0.081)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.949 (0.085)    1.176 (0.105)    0.479 (0.093)    0.036 (0.085)    0.113 (0.083)

D. Dropping West Coast (Census Region 9)
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.275 (0.088)    0.318 (0.102)    0.330 (0.110)   -0.004 (0.110)    0.049 (0.116)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.632 (0.095)    0.733 (0.113)    0.489 (0.095)    0.050 (0.088)    0.092 (0.086)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.917 (0.080)    1.143 (0.098)    0.619 (0.098)    0.126 (0.094)    0.185 (0.094)

E. Dropping The 20 Most Dependent Cities
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.313 (0.087)    0.311 (0.095)    0.305 (0.106)   -0.010 (0.101)    0.037 (0.107)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.623 (0.090)    0.741 (0.108)    0.461 (0.096)    0.050 (0.087)    0.078 (0.083)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.967 (0.094)    1.206 (0.106)    0.615 (0.105)    0.140 (0.097)    0.201 (0.096)

Notes: See Table 4A.
City-Year Regressions with LCA-Based Dependency and Ethnic Tech Trends

    Log Indian       Log Chinese      Log Other        Log English      Log Total
    Patenting        Patenting        Patenting        Patenting        Patenting

A. Base Regression with City and Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.172 (0.088)    0.154 (0.086)    0.210 (0.102)    0.043 (0.074)    0.103 (0.077)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.362 (0.089)    0.455 (0.113)    0.285 (0.097)    0.105 (0.077)    0.162 (0.075)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.552 (0.098)    0.714 (0.106)    0.338 (0.098)    0.164 (0.081)    0.266 (0.079)

B. Substituting Six-Year Cap Summation for H-1B Population
Log H-1B Cap Summation x Third Dependency Quintile [LCA]
    0.219 (0.097)    0.152 (0.092)    0.210 (0.107)    0.022 (0.080)    0.082 (0.081)
Log H-1B Cap Summation x Second Dependency Quintile [LCA]
    0.390 (0.093)    0.498 (0.118)    0.295 (0.106)    0.087 (0.082)    0.149 (0.083)
Log H-1B Cap Summation x Most Dependent Quintile [LCA]
    0.608 (0.102)    0.686 (0.109)    0.351 (0.102)    0.128 (0.084)    0.237 (0.082)

C. Including State-Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.142 (0.092)    0.138 (0.107)    0.128 (0.104)    0.022 (0.074)    0.077 (0.075)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.322 (0.100)    0.421 (0.119)    0.201 (0.099)    0.054 (0.080)    0.114 (0.084)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.468 (0.100)    0.785 (0.117)    0.274 (0.096)    0.121 (0.088)    0.218 (0.084)

D. Dropping West Coast (Census Region 9)
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.123 (0.084)    0.154 (0.092)    0.223 (0.104)    0.041 (0.081)    0.111 (0.084)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.360 (0.094)    0.442 (0.117)    0.317 (0.099)    0.108 (0.082)    0.180 (0.081)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.476 (0.101)    0.673 (0.112)    0.357 (0.101)    0.162 (0.087)    0.264 (0.084)

E. Dropping The 20 Most Dependent Cities
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.172 (0.088)    0.155 (0.086)    0.211 (0.102)    0.044 (0.074)    0.104 (0.077)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.361 (0.089)    0.456 (0.113)    0.286 (0.097)    0.106 (0.078)    0.164 (0.076)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.550 (0.109)    0.756 (0.115)    0.357 (0.107)    0.180 (0.086)    0.281 (0.083)

Notes: See Table 4B. Regressions include unreported ethnic-specific technology trends.
City-Year Regressions with Census-Based Dependency

    Log Indian       Log Chinese      Log Other        Log English      Log Total
    Patenting        Patenting        Patenting        Patenting        Patenting

A. Base Regression with City and Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [Census]
    0.207 (0.104)    0.569 (0.123)    0.134 (0.109)    0.048 (0.097)    0.064 (0.099)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.398 (0.096)    0.489 (0.115)    0.285 (0.103)    0.064 (0.100)    0.080 (0.098)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.550 (0.097)    0.718 (0.109)    0.215 (0.101)   -0.019 (0.081)    0.029 (0.083)

B. Substituting Six-Year Cap Summation for H-1B Population
Log H-1B Cap Summation x Third Dependency Quintile [Census]
    0.240 (0.115)    0.610 (0.133)    0.103 (0.117)    0.019 (0.101)    0.038 (0.102)
Log H-1B Cap Summation x Second Dependency Quintile [Census]
    0.418 (0.102)    0.495 (0.118)    0.266 (0.114)    0.031 (0.105)    0.043 (0.103)
Log H-1B Cap Summation x Most Dependent Quintile [Census]
    0.593 (0.106)    0.782 (0.115)    0.211 (0.106)   -0.037 (0.086)    0.013 (0.090)

C. Including State-Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [Census]
    0.203 (0.114)    0.498 (0.130)    0.081 (0.115)    0.076 (0.088)    0.090 (0.087)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.393 (0.111)    0.422 (0.135)    0.155 (0.109)    0.052 (0.084)    0.080 (0.087)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.619 (0.104)    0.733 (0.119)    0.184 (0.095)   -0.008 (0.079)    0.048 (0.080)

D. Dropping West Coast (Census Region 9)
Log National H-1B Population x Third Dependency Quintile [Census]
    0.209 (0.111)    0.564 (0.132)    0.132 (0.112)    0.051 (0.104)    0.071 (0.106)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.406 (0.099)    0.486 (0.121)    0.319 (0.103)    0.063 (0.102)    0.098 (0.102)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.495 (0.097)    0.668 (0.117)    0.225 (0.109)   -0.029 (0.089)    0.019 (0.092)

E. Dropping The 20 Most Dependent Cities
Log National H-1B Population x Third Dependency Quintile [Census]
    0.207 (0.104)    0.569 (0.123)    0.134 (0.109)    0.048 (0.097)    0.064 (0.099)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.398 (0.096)    0.489 (0.115)    0.285 (0.103)    0.064 (0.100)    0.080 (0.098)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.461 (0.112)    0.647 (0.129)    0.176 (0.115)   -0.019 (0.092)    0.026 (0.095)

Notes: See Table 4A.
City-Year Regressions with Census-Based Dependency and Ethnic Tech Trends

    Log Indian       Log Chinese      Log Other        Log English      Log Total
    Patenting        Patenting        Patenting        Patenting        Patenting

A. Base Regression with City and Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [Census]
    0.067 (0.087)    0.412 (0.106)    0.049 (0.098)    0.014 (0.081)    0.027 (0.082)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.105 (0.085)    0.162 (0.101)    0.112 (0.095)    0.056 (0.076)    0.079 (0.072)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.139 (0.080)    0.256 (0.089)   -0.014 (0.092)    0.006 (0.072)    0.059 (0.073)

B. Substituting Six-Year Cap Summation for H-1B Population
Log H-1B Cap Summation x Third Dependency Quintile [Census]
    0.095 (0.096)    0.447 (0.119)    0.016 (0.106)   -0.004 (0.085)    0.013 (0.085)
Log H-1B Cap Summation x Second Dependency Quintile [Census]
    0.133 (0.092)    0.176 (0.104)    0.101 (0.107)    0.049 (0.081)    0.068 (0.078)
Log H-1B Cap Summation x Most Dependent Quintile [Census]
    0.180 (0.087)    0.318 (0.092)   -0.024 (0.095)   -0.002 (0.075)    0.053 (0.078)

C. Including State-Year Fixed Effects
Log National H-1B Population x Third Dependency Quintile [Census]
    0.074 (0.102)    0.372 (0.122)    0.020 (0.110)    0.037 (0.073)    0.043 (0.073)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.150 (0.092)    0.182 (0.118)    0.051 (0.102)    0.046 (0.075)    0.069 (0.075)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.215 (0.090)    0.331 (0.105)    0.013 (0.094)    0.037 (0.074)    0.085 (0.073)

D. Dropping West Coast (Census Region 9)
Log National H-1B Population x Third Dependency Quintile [Census]
    0.083 (0.092)    0.421 (0.111)    0.057 (0.099)    0.020 (0.088)    0.040 (0.088)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.115 (0.089)    0.157 (0.107)    0.155 (0.098)    0.055 (0.080)    0.097 (0.076)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.101 (0.081)    0.220 (0.095)    0.002 (0.097)   -0.006 (0.077)    0.051 (0.079)

E. Dropping The 20 Most Dependent Cities
Log National H-1B Population x Third Dependency Quintile [Census]
    0.068 (0.087)    0.413 (0.106)    0.050 (0.098)    0.014 (0.082)    0.029 (0.082)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.108 (0.086)    0.163 (0.102)    0.113 (0.096)    0.057 (0.076)    0.081 (0.073)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.103 (0.086)    0.244 (0.098)   -0.028 (0.100)    0.003 (0.077)    0.053 (0.079)

Notes: See Table 4B. Regressions include unreported ethnic-specific technology trends.
City-Year Regressions in First-Differenced Specifications

    Δ Log Indian     Δ Log Chinese    Δ Log Other      Δ Log English    Δ Log Total
    Patenting        Patenting        Patenting        Patenting        Patenting

A. LCA-Based Dependency
Δ Log National H-1B Population x Third Dependency Quintile [LCA]
    0.007 (0.080)    0.141 (0.102)    0.171 (0.142)    0.122 (0.128)    0.174 (0.136)
Δ Log National H-1B Population x Second Dependency Quintile [LCA]
    0.549 (0.100)    0.437 (0.114)    0.237 (0.130)    0.106 (0.104)    0.123 (0.097)
Δ Log National H-1B Population x Most Dependent Quintile [LCA]
    0.511 (0.100)    0.810 (0.102)    0.301 (0.116)    0.149 (0.107)    0.171 (0.099)

B. LCA-Based Dependency and Ethnic-Specific Technology Trends
Δ Log National H-1B Population x Third Dependency Quintile [LCA]
   -0.098 (0.088)    0.060 (0.102)    0.083 (0.138)    0.136 (0.096)    0.194 (0.097)
Δ Log National H-1B Population x Second Dependency Quintile [LCA]
    0.360 (0.102)    0.282 (0.111)    0.081 (0.129)    0.152 (0.092)    0.187 (0.084)
Δ Log National H-1B Population x Most Dependent Quintile [LCA]
    0.227 (0.118)    0.586 (0.110)    0.095 (0.122)    0.204 (0.096)    0.251 (0.086)

C. Census-Based Dependency
Δ Log National H-1B Population x Third Dependency Quintile [Census]
   -0.017 (0.083)    0.258 (0.111)    0.047 (0.149)    0.025 (0.117)    0.020 (0.114)
Δ Log National H-1B Population x Second Dependency Quintile [Census]
    0.305 (0.102)    0.336 (0.119)    0.082 (0.127)    0.236 (0.124)    0.189 (0.123)
Δ Log National H-1B Population x Most Dependent Quintile [Census]
    0.320 (0.108)    0.390 (0.110)    0.124 (0.116)   -0.002 (0.099)    0.021 (0.097)

D. Census-Based Dependency and Ethnic-Specific Technology Trends
Δ Log National H-1B Population x Third Dependency Quintile [Census]
   -0.092 (0.085)    0.201 (0.108)    0.009 (0.143)    0.065 (0.097)    0.064 (0.092)
Δ Log National H-1B Population x Second Dependency Quintile [Census]
    0.137 (0.103)    0.204 (0.113)   -0.021 (0.126)    0.219 (0.093)    0.182 (0.086)
Δ Log National H-1B Population x Most Dependent Quintile [Census]
    0.108 (0.114)    0.223 (0.107)    0.005 (0.113)    0.046 (0.087)    0.083 (0.082)

Notes: See Tables 4A and 4B. Regressions in Panels B and D include unreported ethnic-specific technology trends.
City-Year Regressions Comparing Granted Patents and Patent Applications

    Log Indian       Log Chinese      Log Other        Log English      Log Total
    Patenting        Patenting        Patenting        Patenting        Patenting

A. Base Regression, 1995-2006
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.313 (0.087)    0.311 (0.095)    0.305 (0.106)   -0.010 (0.101)    0.037 (0.107)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.623 (0.090)    0.741 (0.108)    0.461 (0.096)    0.050 (0.087)    0.078 (0.083)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.982 (0.078)    1.179 (0.091)    0.593 (0.092)    0.109 (0.086)    0.172 (0.086)

B. Employing Granted Patents Only, 1995-2002
Log National H-1B Population x Third Dependency Quintile [LCA]
   -0.057 (0.084)    0.173 (0.093)    0.164 (0.131)    0.003 (0.114)    0.070 (0.116)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.252 (0.099)    0.191 (0.110)    0.191 (0.113)    0.111 (0.105)    0.130 (0.098)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.391 (0.086)    0.694 (0.108)    0.227 (0.090)    0.135 (0.097)    0.191 (0.095)

C. Employing Granted Patents + non-Overlapping Applications After 2004
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.217 (0.080)    0.233 (0.086)    0.222 (0.100)   -0.012 (0.095)    0.040 (0.100)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.471 (0.078)    0.568 (0.096)    0.338 (0.088)    0.067 (0.084)    0.091 (0.080)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.754 (0.073)    0.942 (0.085)    0.422 (0.086)    0.094 (0.079)    0.156 (0.078)

Notes: See Table 4A. Rows show LCA regression results with different data cuts. The core results employ granted patents from 1995-2006 and non-overlapping patents from 2001-2006. Panel B uses only granted patents from 1995-2002. Panel C uses granted patents plus non-overlapping patents from 2004-2006, years in which granted patents are weakest due to review lags. Similar patterns are evident in all panels. We have also confirmed similar results to Panel B when dropping computer-related patent grants. The greater explanatory power for Indian and Chinese patents in Panel A is partly due to modeling both the rise and decline in H-1B population growth and caps exhibited in Figure 4. This requires well-measured data extending 1995-2006. Breaking the 1995-2006 sample period results in more monotonic H-1B trends that are less separable from aggregate effects.
City-Year Regressions Removing 307 Top Patenting Firms

    Log Indian       Log Chinese      Log Other        Log English      Log Total
    Patenting        Patenting        Patenting        Patenting        Patenting

A. LCA-Based Dependency
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.244 (0.077)    0.335 (0.099)    0.379 (0.116)   -0.016 (0.100)    0.033 (0.104)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.719 (0.111)    0.851 (0.126)    0.618 (0.114)    0.072 (0.094)    0.109 (0.092)
Log National H-1B Population x Most Dependent Quintile [LCA]
    1.377 (0.119)    1.518 (0.115)    0.878 (0.107)    0.305 (0.090)    0.365 (0.091)

B. LCA-Based Dependency and Ethnic-Specific Technology Trends
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.222 (0.075)    0.310 (0.097)    0.341 (0.114)   -0.004 (0.101)    0.045 (0.105)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.670 (0.107)    0.792 (0.121)    0.572 (0.115)    0.087 (0.095)    0.126 (0.093)
Log National H-1B Population x Most Dependent Quintile [LCA]
    1.286 (0.121)    1.390 (0.123)    0.806 (0.114)    0.309 (0.094)    0.380 (0.097)

C. Census-Based Dependency
Log National H-1B Population x Third Dependency Quintile [Census]
    0.196 (0.102)    0.451 (0.120)    0.172 (0.123)    0.002 (0.095)    0.030 (0.096)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.600 (0.134)    0.713 (0.151)    0.468 (0.121)    0.166 (0.106)    0.185 (0.105)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.898 (0.134)    0.993 (0.138)    0.517 (0.119)    0.204 (0.088)    0.245 (0.092)

D. Census-Based Dependency and Ethnic-Specific Technology Trends
Log National H-1B Population x Third Dependency Quintile [Census]
    0.171 (0.093)    0.424 (0.108)    0.149 (0.115)    0.006 (0.095)    0.033 (0.096)
Log National H-1B Population x Second Dependency Quintile [Census]
    0.532 (0.128)    0.638 (0.144)    0.413 (0.119)    0.170 (0.107)    0.189 (0.106)
Log National H-1B Population x Most Dependent Quintile [Census]
    0.777 (0.128)    0.850 (0.132)    0.441 (0.116)    0.194 (0.089)    0.237 (0.094)

Notes: See Tables 4A and 4B. Regressions in Panels B and D include unreported ethnic-specific technology trends. These estimations exclude patents associated with the top firm panel. This panel is comprised of the most dependent LCA firms and the largest US patenters.
City-Year Regressions with Population Quintiles or Citizen Immigrant Trends

    Log Indian       Log Chinese      Log Other        Log English      Log Total
    Patenting        Patenting        Patenting        Patenting        Patenting

A. Testing Against Population Quintiles
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.252 (0.085)    0.255 (0.091)    0.255 (0.105)    0.008 (0.103)    0.064 (0.109)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.436 (0.087)    0.607 (0.123)    0.385 (0.102)    0.099 (0.088)    0.144 (0.085)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.715 (0.103)    1.033 (0.120)    0.564 (0.124)    0.169 (0.110)    0.246 (0.107)
Log National H-1B Population x Third Population Quintile
    0.110 (0.076)    0.147 (0.101)    0.190 (0.096)    0.104 (0.103)    0.069 (0.105)
Log National H-1B Population x Second Population Quintile
    0.133 (0.076)    0.234 (0.107)    0.299 (0.108)   -0.109 (0.086)   -0.155 (0.088)
Log National H-1B Population x Largest Population Quintile
    0.532 (0.113)    0.337 (0.117)    0.147 (0.114)   -0.098 (0.100)   -0.140 (0.097)

B. Test Against US Citizen Immigrant SEs in US
Log National H-1B Population x Third Dependency Quintile [LCA]
    0.305 (0.097)    0.313 (0.120)    0.296 (0.125)   -0.084 (0.121)   -0.018 (0.125)
Log National H-1B Population x Second Dependency Quintile [LCA]
    0.580 (0.101)    0.757 (0.112)    0.481 (0.128)   -0.029 (0.099)    0.017 (0.101)
Log National H-1B Population x Most Dependent Quintile [LCA]
    0.855 (0.090)    1.005 (0.108)    0.605 (0.113)    0.033 (0.102)    0.105 (0.101)
Log US Citizen Immigrant SEs x Third Dependency Quintile [LCA]
    0.018 (0.104)   -0.003 (0.135)    0.021 (0.161)    0.179 (0.168)    0.133 (0.166)
Log US Citizen Immigrant SEs x Second Dependency Quintile [LCA]
    0.105 (0.141)   -0.041 (0.165)   -0.049 (0.160)    0.194 (0.145)    0.149 (0.145)
Log US Citizen Immigrant SEs x Most Dependent Quintile [LCA]
    0.310 (0.128)    0.425 (0.141)   -0.030 (0.136)    0.186 (0.145)    0.162 (0.139)

Notes: See Tables 4A and 4B.
Data Appendix
1 Introduction

This Data Appendix describes in detail the different sources of our data and how we combined them to perform our analyses at both the city and firm levels. We focus our discussion on data that are less commonly used or are particular to the H-1B program. We refer readers to other sources for the commonly used data sets that we utilize, such as Compustat. Section 2 details the data on Canada that we use and lays out how we identified MSA-like metropolitan areas in Canada. In Section 3 we give an overview of the L-1 and TN visas, which are the most likely substitutes for the H-1B. In Section 4 we describe the LCA data that we used in both the labor market and firm-level analyses. Section 5 details the general methodology by which we constructed our firm panel. The final section provides more detail about particular decisions that were made with regard to individual firms in the process of constructing the firm panel. For details regarding the construction of the ethnic patenting data set, we refer readers to Kerr (2007).
2 Canadian Analysis

To conduct our international analyses accurately, we needed to identify metropolitan areas in Canada in the same way that we identified them in the United States. Fortunately, Canada classifies metropolitan areas much as the United States does. In Canada, however, these areas are split into two types: Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs). CMAs consist of population centers with at least 100,000 people in what is called the “urban core.” CAs are defined similarly but have urban core populations of only 10,000 to 99,999. Metropolitan Statistical Areas in the United States are defined similarly, except that there is only one threshold: an urban core of 50,000. To reconcile this discrepancy, we first matched the city listed on each patent to its appropriate CMA or CA. We then looked up the urban core population of each CMA and CA and identified those with urban core populations of at least 50,000 as Canadian MSAs. We used the 2006 Canadian Census from Statistics Canada to determine these population thresholds. Altogether, Canada has 33 CMAs and 111 CAs, 49 (33+16) of which qualify as MSA equivalents.
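The threshold rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the `Area` record, the `is_msa_equivalent` helper, and the sample populations are hypothetical, and only the 50,000 urban-core cutoff and the CMA/CA definitions come from the text.

```python
from dataclasses import dataclass

# US MSA urban-core cutoff cited in the text, applied here to Canadian areas.
MSA_CORE_THRESHOLD = 50_000


@dataclass(frozen=True)
class Area:
    name: str        # hypothetical label, not a real Statistics Canada code
    kind: str        # "CMA" (core >= 100,000) or "CA" (core 10,000 to 99,999)
    urban_core: int  # urban core population


def is_msa_equivalent(area: Area) -> bool:
    """Return True when the urban core meets the 50,000 threshold."""
    return area.urban_core >= MSA_CORE_THRESHOLD


# Made-up populations, for illustration only.
areas = [
    Area("Area A", "CMA", 250_000),  # any CMA clears the cutoff by definition
    Area("Area B", "CA", 62_000),    # a large CA qualifies
    Area("Area C", "CA", 18_000),    # a small CA does not
]

msa_equivalents = [a.name for a in areas if is_msa_equivalent(a)]
print(msa_equivalents)
```

Under this rule every CMA qualifies automatically, so the only real screening happens among CAs, which is consistent with the 33 CMAs plus 16 CAs reported above.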
We matched the Canadian cities listed on each patent to the corresponding CMA or CA by hand. We looked up all of the city names that had more than 5 applications associated with them from 1990 to 2007. This procedure matches approximately 95% of Canadian patents. One important issue with this matching is that the patent data do not record the province in which each city is located. While most city names uniquely identify cities, irrespective of province, several cities share a name across different provinces (in the US context, this is like Springfield, MA and Springfield, IL). Among the observations that we match, approximately 10% of patents (10,602/107,362) fall into this category. Since we cannot determine which city these names refer to, we drop these patents from the analysis. The composition of the dropped observations is quite similar to that of the overall sample.
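The drop rule for ambiguous city names can be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: the lookup table, the city names, the area codes, and the `split_by_ambiguity` helper are hypothetical stand-ins for the hand-built match described above. Only the two counts in the final lines come from the text.

```python
# Hypothetical hand-built lookup: city name -> candidate CMA/CA codes.
# A name with more than one candidate exists in several provinces.
CITY_TO_AREAS = {
    "Northfield": ["CMA-01"],
    "Lakeview": ["CMA-02", "CA-17"],  # ambiguous: same name, two provinces
    "Riverton": ["CA-05"],
}

# Made-up patent records; the patent data carry a city but no province.
patents = [
    {"id": 1, "city": "Northfield"},
    {"id": 2, "city": "Lakeview"},
    {"id": 3, "city": "Riverton"},
    {"id": 4, "city": "Lakeview"},
]


def split_by_ambiguity(records, lookup):
    """Keep records whose city maps to exactly one area; drop ambiguous ones."""
    kept, dropped = [], []
    for rec in records:
        candidates = lookup.get(rec["city"], [])
        if len(candidates) == 1:
            kept.append({**rec, "area": candidates[0]})
        elif len(candidates) > 1:
            dropped.append(rec)
        # Names missing from the lookup are unmatched and left out entirely.
    return kept, dropped


kept, dropped = split_by_ambiguity(patents, CITY_TO_AREAS)

# Share of matched patents dropped, using the counts reported in the text.
reported_share = 10_602 / 107_362
print(f"{reported_share:.1%}")  # 9.9%
```

The reported counts work out to just under 9.9%, which the text rounds to approximately 10%.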
Our general statistics on immigration to Canada come from a 2008 publication by Citizenship and Immigration Canada entitled "Immigration Overview: Permanent and Temporary Residents 2007." This publication has information on Canadian immigration for permanent residents and temporary workers broken down by several different characteristics. The data extend back as far as 1983.
3 L-1 and TN Visa Data

3.1 L-1 Visa

The L-1 visa is intended to enable multinational firms to transfer employees who work in foreign offices into the United States. It is split into two distinct categories: the L-1A is intended for managers and executives, and the L-1B is for workers with “specialized knowledge.” The L-1A is valid for 3 years and can be extended to a maximum stay of 7 years. The initial length of the L-1B is also 3 years, although it can only be extended to a maximum stay of 5 years. Transferred employees have generally been required to have worked for the firm abroad for at least one continuous year out of the last three years. An exception to this restriction was the blanket L-1 visa. This allowed firms of sufficient size and history in sponsoring L-1s to apply for a special status in which they were allowed to transfer workers after having employed them for 6 months. One of the major changes to the program during our period of analysis, the L-1 Visa Reform Act of 2004, concerned this rule. Under the Act, all new L-1 workers were required to have worked for the firm abroad for at least a year, regardless of the firm’s status. These restrictions on foreign work history were intended to prevent firms from hiring abroad to fill domestic worker needs.[1] Unlike the H-1B visa, there are no wage restrictions for L-1 workers, with the exception that compensation has to be high enough to prevent the worker from becoming a public charge.
Our data on L-1 visas come from several sources. L-1 issuances come from
Kirkegaard (2005), who originally obtained the (theretofore unpublished) statistics from the Department of State’s Office of Public and Diplomatic Liaison.
Our data on the border crossings of L-1 holders come from the USCIS Yearbook
of Immigration Statistics. Figure 1 plots these three series². We divide L-1
border crossings by five only for reasons of scale; with this modification, the
figure demonstrates how closely border crossings track new L-1 issuances.
1 Department of Homeland Security (2006)
The evidence suggests that the restrictions on the use of the L-1 visa are
such that it is not being widely used as a substitute for the H-1B. Indeed, the
regulations on the use of the visa appear to have had their intended effect.
As shown in Figure 1, the use of the L-1 has grown steadily over time, with
a leveling off after 2001. If substitution were happening on a large scale, one
would expect to see much larger increases in the number of L-1 visas issued
after 2003 when the H-1B cap became binding and demand for foreign workers
far outpaced supply. One would also expect to see a decrease in the number
of L-1 visas issued when H-1B admissions levels were significantly increased in
the late 1990s. Neither of these patterns is evident in either L-1 issuances or
border crossings.
A 2006 study by the Department of Homeland Security investigated abuses
of the L-1 visa, in particular whether firms were inappropriately using the L-1
to circumvent the H-1B cap. This report found little evidence to substantiate
these concerns. Its conclusions are quite relevant for our work and merit
quoting at length:
"While many of the claims that appear in the media about L-1
workers displacing American workers and testimony may have merit,
they do not seem to represent a significant national trend. While L-1
visa issuance has generally increased in the decades since the category was created, issuance has abated in recent years. And while it
is possible for the L-1B program to be used by some individuals who
are also eligible for H-1B program, we could not establish how often
this occurs. In 2004, only 1,975 applicants applied for both the L-1
and H-1B. Adjudicators pointed out to us that it sometimes occurs
that a foreign student about to graduate might receive multiple legitimate job offers and be the beneficiary of two or more petitions
filed during the same period. Such an event does not indicate that
either of the petitioners, or the beneficiary, is trying to take advantage of the system. Another possible indication that L-1s are not
widely used as alternatives to the H-1B is that in fiscal year 2004
the congressional numerical limit on H-1B status was significantly
reduced, but no increase in L receipts or approvals was observed."
They go on to write:
"Most of the discussion of the job losses American workers have
experienced as a result of L visas is focused on L-1B specialized
knowledge workers, not L-1A managers and executives... The great
majority of the new foreign IT employees entered the United States
using the H-1B temporary worker visa, not the L-1. There is considerable room for overlapping of the two categories, but the most
important distinction is that H-1B workers are petitioned for directly
by U.S. companies, and are usually new hires, whereas L-1s are being transferred from a foreign company. The H-1B visa is so popular
that Congress has placed explicit limits on the number of petitions
that can be issued in any one year... L-1 foreign IT workers represented only a small component of a much larger wave of foreign IT
workers that came to the United States on temporary worker visas.
The busiest year for L-1B visas, fiscal year 2000, saw more than ten
H-1B workers for every one L-1B worker. In FY 2002, the ratio was
twenty to one. Foreign IT workers may indeed have affected employment opportunities for American IT workers, but the L-1B visa
would appear to be only a very small element of the problem."
2 We unfortunately have not been able to obtain data on L-1 border crossings for the year 1997.
We believe from our own work that these findings are accurate. As such, we
do not think that large-scale substitution between L-1 and H-1B visas presents
problems for our estimations that are not solved by the panel effects.
3.2 TN Visa
The TN visa was created under NAFTA in 1994. It allows workers from Canada
and Mexico employed in a set of high-skilled occupations that are generally
narrower than those covered by the H-1B visa to come to work in the US.
The number of TN workers from Canada has been unlimited since the visa’s
inception. Visas given to Mexican nationals, however, were limited to 5,500 a
year until 2004, at which time the cap was lifted. Prior
to 2008, workers on TN status were allowed to stay for a maximum of one year.
At the end of the year, they were required to apply to USCIS for an extension of
their stay. This rule was recently changed to extend the period to a maximum of
three years before such paperwork needs to be filed. Although there is no limit
on the number of permitted renewals, the TN visa cannot be used as a substitute
for immigrating to the United States. The decision of whether to deny renewals
over this concern is left to the discretion of US immigration officials.
Our data on TN visas are more limited and also come from the USCIS Yearbook of Immigration Statistics. They consist of border crossings by workers with
TN visas. Figure 1 shows that these crossings have not seen large fluctuations
over time, particularly since 2002. Like data on L-1 crossings, TN crossings are
not available for the year 1997. We have unfortunately been unable to collect
any data on new TN visa issuances or the number of workers in the US on TN
visa status. The number of crossings each year, however, suggests that the TN
population is quite small relative to that of the H-1B. Also, unless the average
number of border crossings per TN worker has changed significantly over time,
these data suggest that the number of TN visa holders in the US has not shown
dramatic changes over time. The population shows little sign of decreasing when
the H-1B cap was significantly raised or increasing when the H-1B cap was lowered.
This is reassuring for the interpretation of our results, as it suggests that the
TN visa is not being used as a substitute for the H-1B in large numbers.
4 Labor Condition Application Data
LCA data were obtained from the United States Department of Labor website³.
The contents of the data are detailed in Data Appendix Table 1. MSAs are
identified from the primary work location that the employer proposes on the
LCA using city lists collected from the Office of Social and Economic Data
Analysis at the University of Missouri. We obtain an initial match rate of 93%.
Manual coding raises the match rate to 98%.
A more extensive description of the characteristics of these data is found in
Data Appendix Table 2. Here, we see that the number of applications has
grown steadily over 2001-2006. At the same time, the rankings of the top firms
and MSAs applying for visas have remained remarkably steady over time. Our
core panel of 76 firms has typically accounted for slightly more than 4% of all
LCAs, with the larger panel covering about 12%. These shares are relative to
all institutions, including universities.
5 Construction of Firm Panel
To identify firms that are the most at-risk to changes in high skilled immigration,
we pool information from several sources. This section details the main data
that were used to construct the firm panel, including the characteristics of firms
that were not eventually included in the analysis. We begin with two lists of the
top employers of H-1B visa holders, shown in Data Appendix Tables 3 and 4.
The first was published by the United States Immigration and Naturalization
Service in June of 2000. It details the companies with the largest number of
approved H-1B visas that were authorized from October 1, 1999 to February
29, 2000. It contains 102 companies, all of which have more than 60 approved
visas. Their individual totals add up to 13,940 visas, which account for just
above 17% of the total number of petitions approved during this time period.
The second list comes from BusinessWeek magazine and contains the names
of companies receiving the most H-1B visas in fiscal year 2006. This list
identifies 200 companies, each of which had over 141 visas approved. It also
shows the structural shift in the type of firms using the visa that was noted in
the main text. Four of the top five firms in the 2006 Top 200 list are Indian, a
significant change from the 1999 Top 100 list.
3 http://www.flcdatacenter.com/CaseH1B.aspx
We next created two lists of companies from the patent data, one from
the data set of patents granted from January 1975 to April 2007 and the other
from the data set of patent applications published by the USPTO since 2000.
One issue to note with this process is that the names of the companies on
patents are frequently recorded differently across different patents. As one
of many examples of this issue, the same company might have the identifier
"Corporation" after its name in one observation and not have the identifier in
another observation. In the granted patents data, we first sorted the company
names by the number of US industry patents associated with each name in the
data. All company names that accounted for at least 0.05% of the total number
of US industry patents over 2001-2004 were included in the list. We then
included the next top 10 patenting firms, for a total of 233 firms altogether.
This process yielded very similar results to the one that considered the top
patentors over the 1995-2006 period (instead of the 2001-2004 period). A similar
methodology was used for the patent applications data, using the same threshold
of 0.05%. This was also done over the period 2001-2004 in the applications data.
This process ended up identifying 210 unique firms.
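The share-threshold selection described above can be sketched as below. This is an illustration under assumptions, not the authors' procedure: the function name and the toy counts are hypothetical, and the toy call uses a 5% threshold and one extra firm purely so the example stays small (the text uses 0.05% and the next 10 firms).

```python
def top_patenting_names(patent_counts, share_threshold=0.0005, extra=10):
    """Names at or above the share threshold, plus the next `extra` names."""
    total = sum(patent_counts.values())
    if total == 0:
        return []
    # Rank by patent count (descending), breaking ties alphabetically.
    ranked = sorted(patent_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    above = [name for name, n in ranked if n / total >= share_threshold]
    below = [name for name, n in ranked if n / total < share_threshold]
    return above + below[:extra]


# Toy counts, invented for illustration.
counts = {"A": 600, "B": 300, "C": 60, "D": 30, "E": 10}
chosen = top_patenting_names(counts, share_threshold=0.05, extra=1)
```

The same function could be reused for the LCA list by passing application counts and the 0.03% threshold, with the caveat that the text states that cutoff as "more than" rather than "at least".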
We employ a similar methodology for creating a list of the most at-risk firms
from the LCA data. We use a different threshold, however, including all firm
names in the initial list that had more than 0.03% of all LCA applications over 2001-2006. With the addition of the next top 10 firms outside of this threshold, this
identifies a list of 221 firms.
A comprehensive list of firms containing information from all of these sources
was then created. The initial list included 592 unique firms. 307 of these had
at least one patent over 1995-2006. We get a list of 76 firms when we further restrict
the sample to US-based firms that had at least 5 patents per year and that had
at least one inventor of each relevant ethnicity (Indian, Chinese, English, Other)
on one of its patents in every year 1995-2006. We subsequently refer to these
firms that we follow as firms "on our lists." The list of firms that are included
in our analyses and the reason each was included are detailed in Data Appendix
Table 5. Tables 8 and 9 in the main text show that this list includes a wide
range of types of firms, including those from different industries and regions of
the country as well as those of different size and, to some extent, reliance on the
H-1B visa. Data from Compustat were merged onto this master file, containing
information about each firm’s sales, number of employees, assets, and research
and development expenditures in each year. The analyses that use these data
are limited, however, as Compustat only includes information about publicly
traded firms.
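The restriction that yields the 76-firm panel can be sketched as a predicate on firm-year records. This is a minimal sketch under assumptions: the record layout and names are hypothetical, and "at least 5 patents per year" is read here as a requirement in every year 1995-2006, which the text does not state explicitly.

```python
ETHNICITIES = ("Indian", "Chinese", "English", "Other")
YEARS = range(1995, 2007)  # 1995-2006 inclusive


def in_core_panel(firm):
    """True if a firm meets the (assumed) per-year panel criteria."""
    if not firm["us_based"]:
        return False
    for year in YEARS:
        record = firm["by_year"].get(year)
        # Assumption: the 5-patent minimum must hold in each year.
        if record is None or record["patents"] < 5:
            return False
        # At least one inventor of each ethnicity on a patent that year.
        if any(record["inventors"].get(e, 0) < 1 for e in ETHNICITIES):
            return False
    return True


def make_firm(us_based, patents, inventors):
    """Build a toy firm with identical records in every year."""
    by_year = {y: {"patents": patents, "inventors": dict(inventors)} for y in YEARS}
    return {"us_based": us_based, "by_year": by_year}


all_present = {e: 1 for e in ETHNICITIES}
qualifying = make_firm(True, 5, all_present)
too_few_patents = make_firm(True, 4, all_present)
missing_ethnicity = make_firm(True, 10, {"Indian": 1, "Chinese": 0, "English": 3, "Other": 2})
not_us_based = make_firm(False, 10, all_present)
```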
Since firm names on patents and LCAs are often entered differently across
observations, we had to manually match the names to our lists of the top employers of H-1B visa holders, names of the top patentors, and names of the top LCA
filers. The same procedure was used to match companies in all data sets. A
list of all of the patents/LCA applications that could potentially refer
to the company in question was first obtained by searching by the name of the
company. From this master list, matches to the relevant company were then
determined. In the end, 8967 different company names were found for the 307
companies in our initial list. If a listed name was a subsidiary of a larger company (e.g. Verizon Data Services), we match on the parent company as well.
While this process required significant research into each company, it is an issue
that all studies that have analyzed patenting in firms have had to face (see, for
example, Hall and Ziedonis (2001)).
In the patents data, we identified all company names that were potential
matches to the firms in our sample. In the LCA data, we employed a more
limited matching algorithm. We first created a rough list of potential name
matches and then omitted matching on the company names that account for
a relatively small number of LCA applications. This was done based on an
algorithm that involved several steps. For a particular firm in our lists, if there
was a clear match for at least one name that accounted for over 100 applications, we then limited the rest of our search to company names that accounted
for more than 5 applications. Similarly, if there was a clear match for at least
one name with 50-100 applications, we then limited the rest of our search to
company names that had more than 3 applications. Otherwise, we considered
all company names that were potential matches. For most firms, 2-3 company names were listed on the vast majority of applications that were potential
matches, with a large number of remaining company names that accounted for
1-2 applications (from misspellings etc.). Our procedure consequently ended
up accounting for 95% of all observations that we believe could be potential
matches to firms in our sample.
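The tiered search rule for LCA names can be sketched as follows. This is an illustrative reading of the steps above, not the authors' code: the function names and the example firm are hypothetical, and the identification of "clear matches" is assumed to have been done by hand beforehand.

```python
def min_applications_for_search(clear_match_counts):
    """Application count a remaining candidate name must exceed."""
    largest = max(clear_match_counts, default=0)
    if largest > 100:
        return 5   # a clear match with over 100 applications
    if 50 <= largest <= 100:
        return 3   # a clear match with 50-100 applications
    return 0       # otherwise every potential match is reviewed


def candidates_to_review(candidate_counts, clear_match_counts):
    """Candidate names that still need manual review for one firm."""
    floor = min_applications_for_search(clear_match_counts)
    return {name: n for name, n in candidate_counts.items() if n > floor}


# Hypothetical firm with one clear match and a tail of rare spellings.
candidates = {"Acme Corp": 240, "Acme Corporation": 12, "ACME Inc": 6, "Acme Crop": 1}
to_review = candidates_to_review(candidates, clear_match_counts=[240])
```

In this toy case the single-application misspelling falls below the cutoff and is not reviewed, which is the source of the small share of potential matches the procedure leaves out.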
Our firm matching work also takes into account large mergers and acquisitions. We first obtained Compustat data for each firm on our lists and then
identified years in which there was a 50% decrease in employees, a 100% increase in employees, or a sudden stop in the data. From this, we searched
the company history of each of these firms and determined whether there was
a merger, acquisition, or divestiture in that year for which we had to account.
We documented all of the cases in which there was and then matched on these
additional companies in the patents and LCA data, as well as obtaining data for
them in Compustat. Composite firms were then created where, for example,
we treated two firms that merged together as one firm prior to the merger (e.g.
Lockheed Corporation and Martin Marietta are treated as one firm
before their merger in 1995). The company name matching in the patents and
LCA data was also updated to reflect these changes. Firms that went through
so many corporate restructurings as to make a coherent composite
firm infeasible to construct were dropped from the sample. More details about
this process are found in the following sections of this appendix.
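The screening step on employment data can be sketched as below. This is a sketch under assumptions: the function name, the input layout (a mapping from year to employee count), and the toy series are hypothetical; changes are computed between adjacent observed years, and a "sudden stop" is taken to mean that the series ends before the last year of the sample.

```python
def flag_restructuring_years(employees_by_year, last_year=2006):
    """Flag years with large employment jumps or an early end of data."""
    flags = {}
    years = sorted(employees_by_year)
    for prev, curr in zip(years, years[1:]):
        before, after = employees_by_year[prev], employees_by_year[curr]
        if before <= 0:
            continue  # a relative change is undefined
        change = (after - before) / before
        if change <= -0.5:
            flags[curr] = "employment fell by 50% or more"
        elif change >= 1.0:
            flags[curr] = "employment at least doubled"
    if years and years[-1] < last_year:
        flags[years[-1] + 1] = "data stop"
    return flags


# Invented series for illustration only.
employees = {1995: 1000, 1996: 1050, 1997: 2300, 1998: 2250, 1999: 1000}
flags = flag_restructuring_years(employees)
```

Each flagged year is only a prompt for checking the company history by hand; the flag itself does not establish that a merger, acquisition, or divestiture took place.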
We additionally accounted for joint ventures in both the patent and LCA
data. If a firm was a part of any joint venture that produced an application, we
counted that application for that firm. In the small number of cases where the
joint venture was between firms that were both in our sample, we counted the
application for both firms.
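The attribution rule for joint ventures can be sketched as a simple count. This is a minimal illustration: the function name and input layout are hypothetical, and the example owners are the Dow Corning and Fuji Xerox ventures discussed later in this appendix.

```python
from collections import Counter


def attribute_applications(applications, sample_firms):
    """Count each application once for every sample firm among its owners."""
    counts = Counter()
    for owners in applications:
        for firm in set(owners) & sample_firms:
            counts[firm] += 1
    return counts


sample_firms = {"Dow Chemical", "Corning", "Xerox"}
applications = [
    ["Dow Chemical", "Corning"],  # joint venture between two sample firms
    ["Xerox", "Fuji Film"],       # only one partner is in the sample
    ["Xerox"],                    # ordinary single-owner application
]
counts = attribute_applications(applications, sample_firms)
```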
As a double check on our work in matching firms in the patents data, we
went through a more limited patent-firm match data set maintained by Bronwyn
Hall. All of the company names that were matched to the firms in our sample
in this data set were also incorporated into our work⁴.
6 Firm-Specific Details
In this section we include an extensive description of the details surrounding
particular firms that came to our attention in the process of constructing our
panel data set. This includes information about mergers and acquisitions,
divestitures, notable subsidiary relationships, name changes, joint ventures, and
an accounting of the firms that had to be dropped from the analysis. Since
most of the companies on our lists went through at least some restructuring
over our sample period and most of this activity was small relative to the size
of the firm, we only account for large changes in the structure of each company.
As it is often useful to know whether a company that we refer to was one of
the companies that we were searching for, or was instead a related company, we
mark each company as (e) for being an entry on our lists or (ne) for not being an entry
on our lists. It should be noted that many of these details did not eventually
significantly affect the composition of the panel that we use for our main firm-level estimations, as it was only once we had gone through such details that the
firm sample could be properly restricted to the types of firms that we intended to
consider. The details noted below restrict our analyses to the 177 firm sample
considered in Tables 8 and 9 in the main text. We get our sample of 76 firms
used for the main estimations when we restrict the sample to firms based in the
US that had at least 5 patents per year and that had at least one inventor of
each relevant ethnicity (Indian, Chinese, English, Other) on one of its patents
in every year 1995-2006. A list of the companies that were included in our final
panel of 76 firms is found in Data Appendix Table 5.
6.1 Mergers and Acquisitions
Several of the firms on our lists went through mergers and acquisitions. A
description of the exact process by which we chose which mergers and acquisitions to account for was laid out in Section 5. Here we detail the corporate
restructurings for which we did account. In most of these cases, we created a
composite firm, where the firms that went through mergers or acquisitions are
treated as the same firm prior to their joining together. The patents, LCAs,
and Compustat data are all matched together for the composite firm. Further
details for each case are listed below.
4 This data set is publicly available online at: http://elsa.berkeley.edu/~bhhall/pat/namematch.html
1. American Cyanamid (ne) and its subsidiary Lederle Laboratories (ne) were
acquired by American Home Products (e) in 1994. American Home Products then changed its name to Wyeth (e) in 2002. Both of the names
“American Home Products” and “Wyeth” are on our lists. We treat American Cyanamid, Lederle Laboratories, and American Home Products (Wyeth)
as a composite firm and match on all of these names in the patents and
LCA data.
2. Andrew Corporation (e) acquired Comsearch (ne) and Allen Telecom (ne)
in 2003. We found records for Allen Telecom in Compustat but no such
records for Comsearch, since Comsearch was never public. Since the Comsearch acquisition was not that significant by itself, we keep Andrew without updating it for Comsearch’s earlier sales and other records.
3. Donnelly (e) was acquired by Magna International (ne) in 2002. We match
on both company names together under Magna International.
4. Engelhard (e) was acquired by BASF (e) in 2006, and BASF only began to
rename Engelhard in August of that year. We treat these companies together as a
composite firm.
5. Exxon Mobil (e) is the parent of Esso (ne), Mobil (ne) and ExxonMobil
(e) companies. Exxon (ne) and Mobil (ne) companies merged in 1999 to
form Exxon Mobil. We match on all of these names together and treat
them as a composite firm.
6. Gillette (e) was acquired by Procter & Gamble (e) in 2005. We treat
these companies together as a composite firm.
7. Hewlett Packard (e) and Compaq (e) merged in 2002. We treat these
companies together as a composite firm.
8. Hughes Electronics (e) used to be a part of General Motors (it was acquired
in 1985) but was sold to NewsCorp in 2003. We consider it a part of
General Motors for the whole of the period of our analyses.
9. Immunex (e) was acquired by Amgen (e) in 2001. We treat these companies as a composite firm.
10. Chase Manhattan Corporation (ne) acquired JP Morgan (ne) to form JP
Morgan Chase (e) in 2000. The company then merged with Bank One
(ne) in 2004 but still kept the name JP Morgan Chase. We treat these
companies as a composite firm.
11. KLA-Tencor (e) was formed in May of 1997 through the merger of KLA
Instruments (ne) and Tencor Instruments (ne). We treat these companies
as a composite firm.
12. Kraft Foods (e) was a subsidiary of Philip Morris (ne) from 1988 to 2007.
In 2000 Philip Morris acquired Nabisco. In 2003 Philip Morris Companies
Inc. changed its name to Altria Group (ne). We treat these companies
as a composite firm.
13. Lockheed Martin (e) was formed in 1995 from the merger of Lockheed
Corporation (ne) and Martin Marietta (ne). We treat these companies as
a composite firm.
14. Lucent (e) and Alcatel (e) merged in late 2006 to form Alcatel-Lucent
(ne). We consider these two companies as the same company prior to this
merger.
15. In May of 2006, Maxtor Corporation (e) was acquired by Seagate Technology (e). We treat these companies as a composite firm.
16. McDonnell Douglas (ne) and Boeing (e) merged in 1997. We treat these
companies as a composite firm.
17. In 2005 Oracle (e) acquired Siebel Systems (e). We treat these companies
as a composite firm.
18. Pacesetter, Inc (e) is a part of St Jude (ne). St Jude acquired Pacesetter
in 1994. We treat these companies as a composite firm and refer to it as
St Jude for future reference.
19. Pioneer Hi-Bred (e) was acquired by DuPont (e) in 1999. We treat these
companies as a composite firm.
20. Bayer (e) acquired Schering (e) in late 2006. Although they were separate
for the majority of the time in our analyses, we still consider them as a
composite company.
21. When Sprint Corporation (e) purchased Nextel Communications (e) in
2005, Sprint Nextel (e) was created. We treat these companies as a composite firm.
22. SPX (e) merged with General Signal Corporation (ne) in 1998. We treat
these companies as a composite firm.
23. Storage Technology Corporation (e) was acquired by Sun Microsystems
(e) in 2005. We treat these companies as a composite firm.
24. Synopsys (e) acquired Numerical Technologies (e) in 2003. We treat these
companies as a composite firm.
25. United Technologies (e) acquired Chubb plc (ne) in 2003. We treat these
companies as a composite firm.
26. In April of 2006, Whirlpool (e) acquired Maytag (e). We treat these
companies as a composite firm.
6.2 Divestitures
There were a couple of listings in our firm sample that were originally a part of
other companies. As the events occurred during our period of analysis, we count
these as a part of their original parent company. Composite firms were created
just as they were with companies that went through mergers and acquisitions.
1. Delphi Technologies (e) was created from a General Motors (e) spin off in
1998. We continue to treat it as if it were a part of General Motors.
2. Freescale Semiconductor (e) was spun off from Motorola (e) in 2004 and
has largely retained its identity since then. Since this was a large
part of Motorola (and is a top H-1B sponsor), we continue to treat it as
if it were a part of Motorola.
3. Infineon (e) used to be a part of Siemens (e) but was spun off in 1999. We
treat these companies as a composite firm.
6.3 Notable Subsidiaries
Several of the company names in our lists that created the initial firm sample
were actually subsidiaries of other companies. In these cases, we matched on the
names for the subsidiary as well as for the parent company in the patents and
LCA data. We then used Compustat data for the parent company and refer
to the parent as the firm on which we match. This is not an exhaustive list
of these relationships but rather a partial listing of those that would not be
obvious to an observer without extensive background in the histories of these
companies.
1. Ethicon (e) is a subsidiary of Johnson & Johnson (ne). We match on
both Johnson & Johnson and Ethicon together and refer to the composite
company as Johnson & Johnson for future reference.
2. Marvell Semiconductor (e) is a subsidiary of Marvell Technology Group
(ne). We match on Marvell Technology Group as well as its subsidiary
Marvell Semiconductor, Inc.
3. Palo Alto Research Center (e) is a subsidiary of Xerox Corporation (e).
We count the patents and LCAs for Palo Alto Research Center for Xerox.
4. Weatherford/Lamb (e) is a subsidiary of Weatherford International (ne).
We find all assignee names that match to Weatherford International as
well as its subsidiary Weatherford/Lamb.
6.4 Name Issues
There were several firms in our sample that changed their names or go by different names in different contexts. This was particularly relevant for matching
these companies to the assignee names in the patents and LCA data. Here, we
document these cases. In each case, we search for all company names in the
patents and LCA data.
1. Advanced Micro Devices (e) is also known as AMD (ne).
2. Advanced Technology Materials (e) is also known as ATMI (ne).
3. American Home Products (e) is an entry in our lists. In 2002 it changed
its name to Wyeth (e).
4. Atlantic Duncans International (e) changed its name to Optimos Inc (ne)
in 2000.
5. Hon Hai (e) is an entry in one of our lists but its trade name in the United
States is “Foxconn” (ne).
6. Incyte Genomics (e) and Incyte Pharmaceuticals (e) refer to the same
company, Incyte.
7. Koninklijke Philips Electronics NV (e) and Philips Electronics of North
America (e) are the same company for our purposes.
8. Matsushita (e) goes by the trade name Panasonic (ne) in North America.
9. Mastech (e) and iGate Mastech (e) are two different entries on our lists.
They are the same company, however, and we match on both of them
together.
10. STMicroelectronics (e) was formed in June 1987 by the merger of semiconductor companies Thomson Semiconducteurs (ne), a part of the French
company Thomson (ne), and SGS Microelettronica (ne). At the time of
the merger the company was known as SGS-Thomson (ne) but took its
current name in May 1998 following the withdrawal of Thomson SA as an
owner. We search for both company names - SGS-Thomson and STMicroelectronics - in the patents and LCA data.
6.5 Joint Ventures
There were a number of joint ventures for which we accounted. The collaborations that we account for, however, are not comprehensive; doing a full match on
joint ventures would have required a far more extensive company name matching procedure. We primarily identified them using (1) the fact that some of the
company names in our lists were joint ventures and (2) the fact that searching
for the names of the firms in our sample naturally brought certain joint ventures
to our attention. As long as the company that we were searching for was one
of the names on the application (that is, one of the firms in the joint venture),
we included it. This did not present an issue for most of the joint ventures.
However, there were several such collaborations where both of the companies
that were listed on the applications were companies for which we were searching.
The joint venture Dow Corning provides an example, since both Dow Chemical
and Corning are on our lists. In these cases we have attributed the patent or
LCA to both companies. Below, we have described the details about four of the
major joint ventures that we identified in the data. It is worth noting, however,
that joint ventures between two or more firms on our lists were quite small in
number compared to the overall activity for our firms both in the patents and
LCA data.
1. HRL Labs (Hughes Research Laboratories) (e) has been a subsidiary of
General Motors (e), Raytheon (e), and Boeing (e) during the period of
our analyses. It was under General Motors’ control until 1997, a joint
venture between Raytheon and Boeing 1997-2000, a joint venture between
Raytheon, Boeing and General Motors 2001-2005, and a joint venture
between Boeing and General Motors 2006-today. The patents and LCAs
are attributed to the owners according to this timeline.
2. Dow Corning was created as a joint venture between Corning Glass Works
(now Corning, Incorporated) (e) and Dow Chemical Company (e).
3. “Fuji Xerox” (e) is a joint venture between Fuji Film (ne) and Xerox
Corporation (e). We have only counted observations with this company
name towards the applications for Xerox and have not included Fuji Film
in our analyses.
4. UOP LLC (e) is an entry on one of our lists, as are Honeywell International
Inc. (e) and Dow Chemical (e). UOP was a joint venture between Honeywell and Dow until 2005, at which time Honeywell took over. Applications
in the patents and LCA data are counted for Dow through 2005. As noted
below, we drop Honeywell (e).
6.6 Dropped From The Analysis
There were several companies in our initial lists that were not possible to include
in our analyses, typically due to a large degree of merger and acquisition activity.
Here we document these cases. Note that this section does not describe firms
that we have not been able to include due to the fact that they do not have the
needed records in Compustat. There was no preset algorithm for determining
which companies to drop; dropping each firm was based on our judgment of
the particular circumstances surrounding each company.
1. Acushnet (e) is a subsidiary of Fortune Brands (ne) and has been since
the 1970s. Moen (e) is a subsidiary as well. The problem arises in that
patents and LCAs are listed under Fortune’s subsidiaries’ names, while
the Compustat data are only for the parent company Fortune Brands. Following all of Fortune’s subsidiaries would be too difficult and thus we have
to drop Acushnet and Moen from the analyses.
2. Allied Signal (ne) and Honeywell (e) merged in 1999. We were not able to
get Compustat records for Allied Signal and thus have to drop Honeywell.
This means that the joint venture UOP (e) will be dropped for Honeywell
as well.
3. Applera (e) and PE Corporation (e) relate to a common company. However, the preceding company, Perkin-Elmer (ne), split into two companies
in 1999 that had their own restructurings. It is thus not possible to construct a composite company and we drop Applera (e) and PE Corporation
(e) from the analysis.
4. AT&T (e) is an entry on our lists. Southwestern Bell Corporation (SBC)
(e) acquired Pacific Telesis (ne) and Southern New England Telecommunications (ne) in 1997. SBC (e) and Ameritech (e) merged in 1999. SBC
purchased AT&T Corp. (e) in 2005. Following all of these firms (which
also had their fair share of mergers and acquisitions) together would be
quite difficult and we thus drop AT&T from our analyses. AT&T (e) also
acquired BellSouth (e) in late 2006. BellSouth itself has gone through
several mergers and acquisitions (including a large acquisition of a part of
AT&T in 2004). We thus drop BellSouth as well.
5. What is now known as Bank of America (e) went through several significant mergers and acquisitions in the 1990s, particularly in 1989, 1991, and
1998. Following all of these firms is quite difficult and we thus drop Bank
of America.
6. BAE Systems (e) has a very large jump in employment in 1999, when
it acquired a part of General Electric (e). Since it is impossible to trace
this part of General Electric separately, we drop BAE Systems from the
analysis.
7. In a move that signi…cantly changed the size of the company, Boston Scienti…c (e) acquired Guidant Corporation (ne) in 2006 and had four significant mergers and acquisitions in 1995. Following all of these …rms would
be quite di¢ cult and we thus drop Boston Scienti…c from the analysis (see
point #8 for related reference).
8. Cardiac Pacemakers (e) and Advance Cardiovascular Systems (e) were
spun o¤ from Eli Lilly (e) in 1995 (among other divisions) to form Guidant
(ne). Guidant then made several acquisitions during 1995-2006. Since
getting preperiod sales for either Cardiac Pacemakers or Advance Cardiovascular Systems would not be possible and we drop Boston Scienti…c, we
drop Cardiac Pacemakers and Advance Cardiovascular Systems as well.
9. Chevron (e) and Texaco (ne) merged to form ChevronTexaco in 2001.
The name was then changed to Chevron in 2005. The company also acquired Unocal Corporation (ne) in 2005. “Chevron Chemical Company”
is an entry on our lists. In 2000, Chevron Corporation and Phillips Petroleum Company formed Chevron Phillips Chemical Company. Chevron
and ConocoPhillips (e) each currently own 50 percent of this joint venture. Conoco (e) and ConocoPhillips (e) are also both entries on our
lists. Conoco Inc merged with Phillips Petroleum Company in 2000 to
form ConocoPhillips. As this is the only company name in our lists that
refers to Chevron, as Chevron has gone through a number of mergers and
acquisitions, and as we drop ConocoPhillips (see below), we drop Chevron.
10. Chiron (e) was acquired by Cetus (ne) in 1991, was partially bought by
Novartis (ne) in 2005 and then fully bought by it in 2006. Novartis, on
the other hand, has gone through many mergers and acquisitions over the
years and so we drop Chiron.
11. Citigroup (e) went through several large mergers and acquisitions in 1993,
1998, 2001 and has not even been known as Citigroup for the whole period
of our analyses. Since these changes were so large, we drop Citigroup.
12. CNH America (e) stands for Case New Holland America. It was created in
1999 through the merger of New Holland N.V. (ne) and Case Corporation
(ne). The history behind New Holland and Case Corporation is also
replete with merger and acquisition activity and so we choose to drop
CNH.
13. Conoco (e) and ConocoPhillips (e) are both entries on our lists. Conoco
Inc merged with Phillips Petroleum Company (ne) in 2002 to form
ConocoPhillips. Conoco, in turn, was created as a spin off from Dupont in
1997. We thus cannot get a measure of preperiod sales for Conoco, and
we drop ConocoPhillips as well. Note that we still keep Dupont,
however, since the spin off was not large relative to Dupont’s size.
14. Corixa (e) was sold to GlaxoSmithKline (e) in 2005. Since tracking all of
the firms that made up GlaxoSmithKline was too difficult, we drop Corixa
(see the discussion of GlaxoSmithKline below).
15. CVS Pharmacy (e) is an entry on one of our lists. In 1997 it acquired
Revco (ne) and in 2006 it acquired Minute Clinic (ne). We have not
been able to find records for many of these companies in Compustat and
following them would be difficult. We drop CVS from the analysis.
16. DaimlerChrysler (e) was formed from the merger of Daimler-Benz (ne) and
Chrysler Corporation (ne) in 1998. Treating all of these firms together as
a composite firm would be too difficult and we consequently have to drop
DaimlerChrysler.
17. EMC (e) went through a large number of acquisitions in the early 2000s
and many of the companies were private. It would thus be very difficult
to construct a composite company. We consequently drop EMC from the
analysis.
18. Ernst & Young (e) sold its consultancy group to Cap Gemini (ne) in 2000.
Since Ernst & Young’s consultancy likely used plenty of LCAs (as the
whole company used a large number) and Cap Gemini does not have any
patents, we drop Ernst & Young from the analysis.
19. Federal Mogul (e) experienced a large increase in employment in 1998.
In that year it acquired Turner & Newall (ne), Cooper Automotive (ne)
from Cooper Industries (ne), and Fel-Pro (ne), all of which were large
acquisitions. Following all of these firms would be quite difficult and so
we drop Federal Mogul.
20. It could be argued that we “dropped” Fuji Film. The relevant company
name that was in our lists was "Fuji Xerox," which is a joint venture
between Fuji Film and Xerox (e). We have simply matched this entry with Xerox,
which is another company on our lists.
21. In 1997 General Instrument (e) split into three companies: General
Semiconductor (ne), CommScope (ne) and NextLevel Systems (ne). It does not
make sense to track all of these companies as one, and so we drop General
Instrument.
22. In 2000 i2 Technologies (e) acquired Aspect Development (ne). Aspect
Development was a private company and consequently does not have a
record in Compustat. Since this was a large acquisition, we cannot track
i2 Technologies as one company and we have to drop it.
23. In 2000 Incyte (e) acquired Proteome Inc (ne), which significantly
expanded the size of the company. Proteome Inc was private and does
not have a record in Compustat. We thus cannot track Incyte as one
company and we have to drop it.
24. JDS Uniphase (e) was formed when JDS FITEL (ne) and Uniphase
Corporation (ne) merged in 1999. It rebranded itself as JDSU (ne)
in 2005. As both JDS FITEL and Uniphase Corporation went through
several significant mergers, acquisitions, and divestitures themselves prior
to merging, we drop JDS Uniphase from the analysis.
25. Masco (e) acquired Zenith Products Corporation (e) in 1994. We have
not found a record for Zenith Products in Compustat and so we drop
both Zenith Products Corporation (e) and Masco Corporation (e) from
our analyses.
26. Monsanto (e) went through very large corporate restructurings in the late
1990s, to the point where the current Monsanto is a different legal
entity than the “Monsanto” operating before 2000 and is in a significantly
different line of business. We therefore drop it from the analysis.
27. During 1991-1993 NCR Corporation (e) was a subsidiary of AT&T (e) and
has gone through several restructurings since. We therefore drop it from
the analysis.
28. In 2001, Northrop Grumman (e) acquired Litton (ne) and TRW (e). In
that year, its employment more than doubled. It subsequently sold off a
major part of TRW (e). Since tracking this all together would be difficult,
we drop Northrop Grumman. As TRW (e) is also an entry on our lists,
we drop this as well.
29. Nortel Networks (e) as it is today was created from a spin off of BCE (Bell
Canada Enterprises). BCE (ne) is a public company, but from its company
history we know that it also went through plenty of other mergers,
acquisitions and divestitures. We consequently drop Nortel Networks.
30. SanDisk (e) acquired Matrix Semiconductor (e) in 2005. Matrix
Semiconductor was a private firm and thus we do not have any Compustat
records for it. The acquisition caused a very large change in the size of
the company and thus we drop SanDisk.
31. Semiconductor Components Industries (e) is a subsidiary of ON
Semiconductor (ne). ON Semiconductor was spun off from Motorola (e) in 1999.
It then proceeded to make several acquisitions, significantly increasing the
size of the company. It would be very difficult to follow all of these
companies together and thus we drop ON Semiconductor from the analysis.
Note, however, that the spin off of ON Semiconductor from Motorola was
not large relative to Motorola’s size and thus we keep Motorola in the
analysis.
32. Sugen (e) merged with Pharmacia & Upjohn Inc. (ne) in 1999. Following
these two companies together would be difficult and thus we drop Sugen.
33. T-Mobile (e) was previously known as VoiceStream Wireless (ne) and
Powertel (ne). VoiceStream was acquired by Deutsche Telekom (ne) in
2001 and changed its name to T-Mobile in 2002. We drop it because
of the significant identity changes and the fact that we could not find a record
for it in Compustat.
34. SmithKline Beckman (ne) and The Beecham Group (ne) merged to form
SmithKline Beecham (e) in 1989. In 1995 Glaxo (ne) and Wellcome (ne)
merged to form Glaxo Wellcome (ne). Also in 1995, Glaxo Wellcome
acquired Affymax (ne). Glaxo Wellcome and SmithKline Beecham merged
in 2001 to form GlaxoSmithKline (e). GlaxoSmithKline (e) subsequently
bought Corixa (e) in 2005. Since there were many mergers and
acquisitions for this company and many of these companies were foreign (and
thus do not have records in Compustat), this firm was dropped from the
analysis.
35. Verizon (e) was formed through a series of mergers and acquisitions that
made it and its antecedents impossible to follow as a composite
company. Bell Atlantic (e) merged with NYNEX (ne) in 1997. GTE (ne)
then merged with Bell Atlantic (e) in 2000 to form Verizon. Verizon
acquired MCI (ne), which was formerly WorldCom (e), in 2005. Because
following all of these companies together as a composite company would
not make sense, we have to drop Verizon, Bell Atlantic, and WorldCom
from our analyses.
Data Appendix Table 1: LCA Data
Variable | Description
Submitted_Date | Date and time the application was submitted
Case_No | Case number
Name | Employer's name
Address | Employer's address
Address2 | Employer's address2
City | Employer's city
State | Employer's state
Postal_Code | Employer's postal code
Nbr_Immigrants | Number of job openings
Begin_Date | Proposed begin date
End_Date | Proposed end date
Job_Title | Job title
Dol_Decision_Date | Date certified or denied
Certified_Begin_Date | Certification start date
Certified_End_Date | Certification end date
Job_Code | Three digit occupational group
Approval_Status | Approval status - certified or denied
Wage_Rate_1 | Employer's proposed wage rate
Rate_Per_1 | Unit of pay for proposed wage rate
Max_Rate_1 | Maximum proposed wage rate
Part_Time_1 | Y = Part time; N = Full time position
City_1 | Work city (location of the job opening)
State_1 | Work state (location of the job opening)
Prevailing_Wage_1 | Prevailing wage rate
Wage_Source_1 | Collective bargaining; SESA; Other
Yr_Source_Pub_1 | Year that the prevailing wage data was published
Other_Wage_Source_1 | Description of the Other wage source
Wage_Rate_2 | Employer's proposed wage rate - second location
Rate_Per_2 | Unit of pay for proposed wage - second location
Max_Rate_2 | Maximum proposed wage rate - second location
Part_Time_2 | Y = Part time; N = Full time position
City_2 | Work city - second location
State_2 | Work state - second location
Prevailing_Wage_2 | Prevailing wage rate - second location
Wage_Source_2 | Collective bargaining; SESA; Other
Yr_Source_Pub_2 | Year that the prevailing wage data was published
Other_Wage_Source_2 | Description of the Other wage source
Notes: LCA data are kept by the Department of Labor and are publicly available at
http://www.flcdatacenter.com/CaseH1B.aspx.
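As an illustration of how records with the fields in Data Appendix Table 1 might be aggregated to the employer level, the following is a minimal Python sketch. It is not the procedure used in this appendix: the CSV layout, the spelling of the status value ("Certified"), the upper-casing of employer names, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from collections import Counter


def openings_by_employer(rows):
    """Sum Nbr_Immigrants over certified applications, keyed by employer Name."""
    totals = Counter()
    for row in rows:
        # Assumption: certified applications have a status starting with "certified".
        status = row["Approval_Status"].strip().lower()
        if not status.startswith("certified"):
            continue
        # Assumption: upper-casing is enough to group spelling variants here.
        employer = row["Name"].strip().upper()
        totals[employer] += int(row["Nbr_Immigrants"])
    return totals


# Made-up sample rows using three of the Table 1 field names.
SAMPLE = """Name,Nbr_Immigrants,Approval_Status
Example Corp,3,Certified
example corp,2,Certified
Other Inc,5,Denied
"""

totals = openings_by_employer(csv.DictReader(io.StringIO(SAMPLE)))
print(dict(totals))
```

In practice, employer names in such records would need far more careful cleaning than upper-casing before they could be linked to firm-level data.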
Data Appendix Table 2: LCA Summary Statistics
Number of Applications
Most Common MSA
Second Most Common MSA
Third Most Common MSA
LCAs By 76 Firm Panel
LCAs By 307 Firm Panel
Most Common Firm
Second Most Common Firm
Third Most Common Firm
2001
2002
2003
2004
2005
2006
Overall
239,123
244,759
257,199
330,111
312,741
374,463
1,758,396
NY
SF
LA
NY
LA
SF
NY
SF
LA
NY
SF
LA
NY
SF
LA
NY
SF
LA
NY
SF
LA
4.6%
12.3%
3.3%
8.6%
4.9%
11.6%
4.3%
11.5%
4.3%
12.7%
4.9%
13.4%
4.4%
11.8%
Oracle Microsoft Microsoft Microsoft Microsoft Microsoft Microsoft
IBM
IBM
Microsoft
Cisco
Intel
IBM
IBM
IBM
IBM
Oracle
IBM
Oracle
Oracle
Oracle
Notes: The firm rankings of the largest LCA applicants are only done with respect to the companies in our panel of 76 firms.
Acronyms stand for (i) NY: New York City (ii) SF: San Francisco (iii) LA: Los Angeles (iv) MSA: Metropolitan Statistical
Area. Shares are relative to all institutions, including universities.
Data Appendix Table 3: Top 100 List
Rank | Company | Number of H-1B Visas
1 | Motorola Inc | 618
2 | Oracle Corp | 455
3 | Cisco Systems Inc | 398
4 | Mastech | 389
5 | Intel Corp | 367
6 | Microsoft Corp | 362
7 | Rapidigm | 357
8 | Syntel Inc | 337
9 | Wipro LTD | 327
10 | Tata Consultancy Serv | 320
11 | PriceWaterhouseCoopers LLP | 272
12 | People Com Consultants Inc | 261
13 | Lucent Technologies | 255
14 | Infosys Technologies LTD | 239
15 | Nortel Networks Inc | 234
16 | Tekedge Corp | 219
17 | Data Conversion | 195
18 | Tata Infotech | 185
19 | Cotelligent USA Inc | 183
20 | Sun Microsystems Inc | 182
21 | Compuware Corp | 179
22 | KPMG LLP | 177
23 | Intelligroup | 161
24 | Hi Tech Consultants Inc | 157
25 | Group Ipex Inc | 151
26 | Ace Technologies Inc | 149
26 | Hewlett Packard Co | 149
28 | Everest Consulting GR | 147
29 | Bell Atlantic Network Serv | 141
30 | Ernst Young LLP | 137
31 | Agilent Technologies Inc | 136
32 | Deloitte Touche LLP | 130
33 | Birlasoft | 128
33 | Global Consultants | 128
35 | IBM | 124
35 | R Systems Inc | 124
35 | Sprint United Mgt | 124
35 | Wireless Facilities | 124
Notes: This list was originally published by the United States Immigration and Naturalization Service in 2000 as
"Leading Employers of Specialty Occupation Workers (H-1B): October 1999 to February 2000."
Data Appendix Table 3: Top 100 List (Continued)
Rank | Company | Number of H-1B Visas
39 | Cognizant Technology Solutions | 123
39 | Satyam Computer Serv | 123
41 | Keane | 114
42 | University of Washington | 113
43 | Analysts Intl Corp | 110
44 | Capital One Serv | 109
45 | Apar Infotech | 108
45 | Modis Inc | 108
47 | L & T Technology LTD | 107
48 | Complete Business Solutions Inc | 105
49 | Techspan | 101
50 | CMOS Soft Inc | 100
51 | Renaissance Worldwide | 99
52 | University of PA | 97
53 | Conexant Systems Inc | 96
53 | I2 Technologies Inc | 96
55 | AT T | 93
56 | Jean Martin | 91
57 | EMC | 90
58 | Atlantic Duncans Intl | 87
58 | Merrill Lynch | 87
60 | Unique Computing | 86
61 | Computer Intl | 85
61 | Indotronix Intl | 85
61 | Nationwide Insurance | 85
64 | Interim Technology Consulting | 84
65 | Compaq Computer | 80
65 | GE | 80
65 | MSI Majesco Software Inc | 80
68 | Data Core Systems | 78
69 | IT Solutions Inc | 77
70 | Allied Informatics Inc | 76
71 | Ciber Inc | 75
71 | Deloitte Consulting LLC | 75
71 | Goldman Sachs | 75
74 | Baton Rouge Intl | 74
75 | Cyberthink | 73
75 | Stanford University | 73
Data Appendix Table 3: Top 100 List (Continued)
Rank | Company | Number of H-1B Visas
77 | Cap Gemini America | 72
77 | Infogain Corp | 72
79 | Ajilon Serv | 71
79 | Allsoft Technologies Inc | 71
79 | Morgan Stanley Dean Witter | 71
82 | Ericsson Inc | 70
82 | Harvard University | 70
82 | Sabre Inc | 70
82 | Yash Technologies Inc | 70
86 | Pyramid Consulting Inc | 69
87 | MSX Intl Inc | 68
88 | Softplus Inc | 67
89 | Baylor College Of Medicine | 65
89 | Microstrategy | 65
89 | University of Minnesota | 65
89 | Universal Software | 65
93 | Computer Horizons | 64
94 | Ramco Systems | 63
94 | Siebel Systems Inc | 63
96 | Insight Solutions Inc | 62
96 | Synopsys Inc | 62
96 | Texas Instruments Inc | 62
99 | Infosynergy | 61
99 | Lason Systems Inc | 61
99 | Vanguard GR | 61
99 | Yale University | 61
Data Appendix Table 4: Top 200 List
Rank | Company | Number of H-1B Visas
1 | Infosys Technologies | 4,908
2 | Wipro | 4,002
3 | Microsoft | 3,117
4 | Tata Consultancy Services | 3,046
5 | Satyam Computer Services | 2,880
6 | Cognizant Tech Solutions U.S. | 2,226
7 | Patni Computer Systems | 1,391
8 | IBM | 1,130
9 | Oracle | 1,022
10 | Larsen & Toubro Infotech | 947
11 | HCL America | 910
12 | Deloitte & Touche | 890
13 | Cisco Systems | 828
14 | Intel | 828
15 | I-Flex Solutions | 817
16 | Ernst & Young | 774
17 | Tech Mahindra Americas | 770
18 | Motorola | 760
19 | MphasiS | 751
20 | Deloitte Consulting | 665
21 | LanceSoft | 645
22 | New York City Public Schools | 642
23 | Accenture | 637
24 | JP Morgan Chase | 632
25 | Polaris Software Lab India | 611
26 | Covansys | 611
27 | PricewaterhouseCoopers | 591
28 | Qualcomm | 533
29 | Goldman Sachs | 529
30 | KPMG | 476
31 | Marlabs | 475
32 | University of Michigan | 437
33 | Univ. of Illinois at Chicago | 434
34 | University of Pennsylvania | 432
35 | The Johns Hopkins University | 432
36 | Syntel Consulting | 416
37 | Citigroup Global Markets | 413
38 | BearingPoint | 413
39 | University of Maryland | 404
40 | Keane | 386
Notes: This list is published by BusinessWeek magazine and can be found at
http://www.businessweek.com/table/0518_h1btable.htm.
Data Appendix Table 4: Top 200 List (Continued)
Rank | Company | Number of H-1B Visas
41 | HTC Global Services | 382
42 | iGate Mastech | 378
43 | Hexaware Technologies | 362
44 | Capital One Services | 362
45 | Columbia University | 355
46 | Lehman Brothers | 352
47 | Yahoo! | 347
48 | U.S. Technology Resources | 339
49 | Intelligroup | 336
50 | Hewlett-Packard | 333
51 | Rapidigm | 330
52 | Merrill Lynch | 329
53 | Google | 328
54 | Citibank | 322
55 | Dis National Insts of Health DHHS | 322
56 | Yale University | 316
57 | Nokia | 314
58 | Texas Instruments | 313
59 | Capgemini | 309
60 | Harvard University | 308
61 | EMC | 305
62 | Sun Microsystems | 303
63 | Rite Aid | 301
64 | Bloomberg | 298
65 | General Electric | 292
66 | Amgen | 289
67 | McKinsey U.S. | 286
68 | Morgan Stanley | 285
69 | Stanford University | 279
70 | Washington Univ. in St. Louis | 278
71 | Verizon Data Services | 276
72 | NYC-HHC Harlem Hospital Center | 276
73 | University of Pittsburgh | 275
74 | Indiana University | 273
75 | Ohio State | 271
76 | Everest Consulting Group | 269
77 | Univ. of Minnesota | 269
78 | Amtex Systems | 268
79 | Univ. of Wisconsin at Madison | 268
80 | SUNY-Stony Brook | 262
Data Appendix Table 4: Top 200 List (Continued)
Rank | Company | Number of H-1B Visas
81 | Amazon Global Resources | 262
82 | Cleveland Clinic Foundation | 256
83 | Dallas Independent School District | 255
84 | Univ. of Calif. at Davis | 254
85 | Northwestern | 251
86 | Syntel | 250
87 | Univ. of Missouri at Columbia | 247
88 | GlobalCynex | 247
89 | Kanbay | 246
90 | American Solutions | 242
91 | Univ. of Florida Intl. Center | 240
92 | UCLA | 239
93 | Duke Univ. Medical Center | 238
94 | Mount Sinai Medical Center | 236
95 | Bank of America | 236
96 | Software Research Group | 234
97 | Baylor College of Medicine | 234
98 | Massachusetts General Hospital | 232
99 | Ciber | 232
100 | Verinon Technology Solutions | 230
101 | Everest Business Solutions | 226
102 | Volt Technical Resources | 224
103 | Oklahoma State University | 223
104 | Compunnel Software Group | 222
105 | U.S. Tech Solutions | 221
106 | Symantec | 220
107 | JSMN International | 218
108 | UBS | 216
109 | CVS Pharmacy | 213
110 | The Pennsylvania State University | 213
111 | University of Washington | 213
112 | Nortel Networks | 212
113 | Univ. of Calif. at San Francisco | 211
114 | University of Mass. Medical School | 210
115 | Sprint/United Management | 209
116 | Houston Independent School District | 209
117 | Purdue | 208
118 | Global Consultants | 207
119 | Emory University | 207
120 | UT Health Science Center | 207
Data Appendix Table 4: Top 200 List (Continued)
Rank | Company | Number of H-1B Visas
121 | Univ. of Colorado | 207
122 | Vanderbilt University | 205
123 | ObjectWin Technology | 205
124 | Diaspark | 204
125 | HSBC Bank USA | 203
126 | eBusiness Application Solutions | 203
127 | Broadcom | 203
128 | Prince Georges County (Md.) Public Schs | 203
129 | Micron Technology | 202
130 | Countrywide Home Loans | 198
131 | Texas A&M | 198
132 | Applied Materials | 195
133 | Schlumberger Technology | 194
134 | University of Iowa | 194
135 | IBM Global Svcs. India | 194
136 | Deloitte Tax | 194
137 | Cummins | 193
138 | iTech U.S. | 191
139 | Compuware | 189
140 | Intl. Students And Scholars Office | 186
141 | Univ. of Calif. at San Diego | 185
142 | Walgreen's | 184
143 | Howard Hughes Medical Institute | 184
144 | USC | 183
145 | Vision Systems Group | 182
146 | T Mobile USA | 180
147 | Multivision | 178
148 | Electronic Data Systems | 177
149 | Massachusetts Institute of Technology | 175
150 | California Institute of Technology | 174
151 | Case Western Reserve Univ. | 173
152 | UNC at Chapel Hill | 173
153 | Univ. of Alabama at Birmingham | 172
154 | Deutsche Bank | 170
155 | Caterpillar | 170
156 | Hallmark Global Technologies | 169
157 | cyberThink | 169
158 | Corporate Computer Services | 167
159 | Advanced Micro Devices | 167
160 | Megasoft Consultants | 166
Data Appendix Table 4: Top 200 List (Continued)
Rank | Company | Number of H-1B Visas
161 | Enterprise Solutions | 165
162 | Freescale Semiconductor | 163
163 | UT Southwestern Medical Center | 163
164 | First Tek Technologies | 161
165 | Michigan State | 161
166 | Research Fdn of the State Univ Of | 160
167 | COMSYS Services | 160
168 | Virginia Tech | 160
169 | Juniper Networks | 160
170 | University of Arizona | 158
171 | Iowa State University | 157
172 | University of Virginia | 157
173 | FedEx Corporate Services | 157
174 | Credit Suisse First Boston | 156
175 | Bristol-Myers Squibb | 156
176 | Verizon Services | 156
177 | Ebay | 155
178 | Ajilon Consulting | 154
179 | General Motors | 153
180 | Camo Technologies | 152
181 | Marvell Semiconductor | 151
182 | CMC Americas | 150
183 | UT M.D. Anderson Cancer Center | 149
184 | NVIDIA | 149
185 | AT&T Services | 147
186 | Weill Medical College of Cornell | 146
187 | Axiom Systems | 146
188 | Wayne State University | 146
189 | Mayo Clinic Rochester | 146
190 | North Carolina State | 146
191 | Genentech | 146
192 | Makro Technologies | 145
193 | SVAM International | 144
194 | Memorial Sloan-Kettering Cancer | 143
195 | Nutech Information Systems | 143
196 | Xpedite Technologies | 143
197 | Automatic Data Processing | 143
198 | Louisiana State | 142
199 | Fannie Mae | 141
200 | MindTree Consulting | 141
Data Appendix Table 5: Data Source of Each Firm
Firm
Abbott Laboratories
Air Products and Chemicals Inc
Allergan Inc
Altera Corporation
Advanced Micro Devices
Altria
Amgen Inc
Apple Computer Inc
Applied Materials Inc
Baker Hughes Inc
Baxter International
Becton, Dickinson and Company
Black and Decker Inc
Boeing Company
Bristol-Myers Squibb Company
Caterpillar Inc
Cirrus Logic Inc
Cisco Systems Inc
Corning Inc
Cummins
Cypress Semiconductor Corporation
Dell
Dow Chemical Company
E I Du Pont De Nemours and Company
Eastman Kodak Company
Eaton Corporation
Emerson Electric Company
ExxonMobil
Ford
General Electric Company
General Instrument Corporation
General Motors Corporation
Goodyear Tire and Rubber Company
Halliburton Company
1999 Top
100 List
2006 Top
200 List
LCA Data
Yes
159
66
132
65
Yes
13
Yes
137
Yes
65
179
Yes
Yes
Yes
Yes
Yes
Yes
Yes
175
155
3
Patent
Grants
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Patent
Applications
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Notes: This table lists the 76 firms that make up our firm panel and the reasons why they were included. If a firm appeared in
the 1999 Top 100 or 2006 Top 200 list, its ranking in that list is reported. If the
firm was one of the top LCA filers (>=.03% of LCA applications) or top patenting firms in either the grants or applications data
(>=.05% in either data set), it is marked with a "Yes" designation. The average annual number of LCAs for a given firm
was 171. Similarly, the average annual number of patent applications for a given firm was 197; the minimum annual average
was 25 patent applications.
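The share thresholds in these notes can be illustrated with a short Python sketch. It assumes that a firm's share is its count divided by the total count across all filers, expressed in percent; the function name, firm names, and counts below are hypothetical and are not taken from the data.

```python
def flag_top_firms(firm_counts, total_count, min_share_pct):
    """Return the firms whose share of total_count is at least min_share_pct percent."""
    return {
        firm
        for firm, count in firm_counts.items()
        if 100.0 * count / total_count >= min_share_pct
    }


# Hypothetical LCA counts for two firms and a hypothetical total over all filers.
lca_counts = {"Firm A": 900, "Firm B": 200}
total_lcas = 1_000_000

# 0.03% threshold for top LCA filers, as stated in the table notes.
top_lca_filers = flag_top_firms(lca_counts, total_lcas, 0.03)
print(top_lca_filers)
```

The same function could be applied with a 0.05 threshold to patent grant or application counts.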
Data Appendix Table 5: Data Source of Each Firm (Continued)
Firm
Hewlett Packard-Compaq
Human Genome Sciences Inc
IBM Corporation
Illinois Tool Works Inc
Intel Corporation
International Rectifier Corporation
Isis Pharmaceuticals
Johnson & Johnson
Kimberly Clark Worldwide Inc
Lam Research Corporation
Lexmark International Inc
Lockheed Martin Corporation
LSI Logic Corporation
Medtronic Inc
Merck and Company
Micron Technology
Microsoft Corporation
Molex Inc
Motorola Inc
National Semiconductor Corporation
Oracle Corporation
Pfizer Inc
Pitney Bowes Inc
PPG Industries
Procter and Gamble Company
Qualcomm Inc
Rambus Inc
Raytheon Company
Rockwell Automation Technologies Inc
Schlumberger Technology Corporation
St. Jude
Sun Microsystems Inc
Symbol Technologies Inc
Synopsys Inc
Texas Instruments Inc
3Com
3M
Unisys Corporation
United Technologies Corporation
Wyeth
Xerox Corporation
Xilinx Inc
1999 Top
100 List
2006 Top
200 List
LCA Data
26, 65
50
Yes
35
8
Yes
5
14
Yes
6
129
3
Yes
Yes
1
18
Yes
2
9
Yes
28
Yes
133
Yes
20
62
Yes
96
96
58
Yes
Patent
Grants
Patent
Applications
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Data Appendix Figure 1: Other Relevant Visas
[Figure: chart of annual counts, 1990 to 2006, on a vertical axis running from 0 to 200,000. Series plotted: TN Border Crossings, (L1 Border Crossings)/5, and New L1 Issuances.]
The Ethnic Composition of US Inventors
William R. Kerr
Harvard Business School
Boston MA
HBS Working Paper 08-006 (revised)
December 2008
Abstract
The ethnic composition of US scientists and engineers is undergoing a significant
transformation. This study applies an ethnic-name database to individual patent records
granted by the United States Patent and Trademark Office to document these trends with
greater detail than previously available. Most notably, the contributions of Chinese and
Indian scientists to US technology formation increased dramatically in the 1990s, before
noticeably leveling off after 2000 and declining in the case of Indian researchers. Growth
in ethnic innovation is concentrated in high-tech sectors; the institutional and geographic
dimensions are further characterized.
JEL Classification: F15, F22, J44, J61, O31.
Key Words: Innovation, Research and Development, Patents, Scientists, Engineers,
Inventors, Ethnicity, Immigration.
Comments are appreciated and can be sent to wkerr@hbs.edu. This permanent working paper is continually
updated as additional patenting data are collected. The first version is included in Kerr (2005). I am grateful to
William Lincoln and Debbie Strumsky for data assistance. This research is supported by the National Science
Foundation, HBS Research, the Innovation Policy and the Economy Group, and the MIT George Schultz Fund.
1 Introduction
The contributions of immigrants to US technology formation are staggering: while the foreign-born
account for just over 10% of the US working population, they represent 25% of the US science
and engineering (SE) workforce and nearly 50% of those with doctorates. Even looking within
the Ph.D. level, ethnic researchers make an exceptional contribution to science as measured by
Nobel Prizes, election to the National Academy of Sciences, patent citation counts, and so on.1
Moreover, ethnic entrepreneurs are very active in commercializing new technologies, especially
in the high-tech sectors (e.g., Saxenian 2002a). The magnitude of these ethnic contributions
raises many research and policy questions: debates regarding the appropriate quota for H-1B
temporary visas, the possible crowding out of native students from SE fields, the brain-drain
or brain-circulation effect on sending countries, and the future prospects for US technology
leadership are just four examples.2
Econometric studies quantifying the role of ethnic scientists and engineers for technology
formation and diffusion are often hampered, however, by data constraints. It is very difficult
to assemble sufficient cross-sectional and longitudinal variation for large-scale panel exercises.3
This paper describes a new approach for quantifying the ethnic composition of US inventors
with previously unavailable detail. The technique exploits the inventor names contained on
the micro-records for all patents granted by the United States Patent and Trademark Office
(USPTO) from January 1975 to May 2008.4 Each patent record lists one or more inventors,
with 8 million inventor names associated with the 4.5 million patents. The USPTO grants
patents to inventors living within and outside of the US, with each group accounting for about
half of patents over the 1975-2008 period.
This study matches these inventor names against an ethnic-name database typically used for
commercial applications.5 This approach exploits the idea that inventors with the surnames Chang
or Wang are likely of Chinese ethnicity, those with surnames Rodriguez or Martinez of Hispanic
ethnicity, and so on. The match rates range from 92% to 98% for US domestic inventor records,
depending upon the procedure employed, and the process affords the distinction of nine
ethnicities: Chinese, English, European, Hispanic/Filipino, Indian/Hindi, Japanese, Korean, Russian,
and Vietnamese. Moreover, because the matching is done at the micro-level, greater detail on
the ethnic composition of inventors is available annually on multiple dimensions: technologies,
cities, companies, etc.6
1 For example, Stephan and Levin (2001), Burton and Wang (1999), Johnson (1998, 2001), and Streeter (1997).
2 Representative papers are Lowell (2000), Borjas (2004), Saxenian (2002b), and Freeman (2005) respectively.
3 While the decennial Census provides detailed cross-sectional descriptions, its longitudinal variation is
necessarily limited. On the other hand, the annual Current Population Survey provides poor cross-sectional detail and
does not ask immigrant status until 1994. The SESTAT data offer a better trade-off between the two dimensions
but suffer important sampling biases with respect to immigrants (Kannankutty and Wilkinson 1999).
4 The project initially employed the NBER Patent Data File, compiled by Hall et al. (2001), that includes
patents granted by the USPTO from January 1975 to December 1999. The current version now employs an
extended version developed by HBS Research that includes patents granted through mid 2008. Some of the
descriptive calculations have not been updated from their 1975-1999 values (noted in text).
5 The database is constructed by the Melissa Data Corporation for the design of direct-mail advertisements.
I am grateful to the MIT George Schultz Fund for financial assistance in its purchase.
The next section details the ethnic-name matching strategy, outlines the strengths and
weaknesses of the database selected, and offers some validation exercises using patent records filed by
foreign inventors with the USPTO. Section 3 then documents the growing contribution of ethnic
inventors to US technology formation. The rapid increase during the 1990s in the percentage
of high-tech patents granted to Chinese and Indian inventors is particularly striking, as is the
leveling off in these trends after 2000. The relative contributions from scientists of European
ethnicity, however, decline somewhat from their levels in 1975. The institutional and geographic
dimensions of ethnic innovation are further delineated. Section 4 concludes.
2 Ethnic-Name Matching Technique
This section describes the ethnic-name matching strategy employed with the inventor names
contained in the NBER Patent Data File. To begin, two common liabilities associated with
using ethnic-name databases are identified. Addressing these limitations guides the selection
of the Melissa database and the design of the name-matching strategy, which is described in
detail. Descriptive statistics are then provided from a quality-assurance exercise of applying the
ethnic-name strategy to inventors residing outside of the US who file patent applications with the
USPTO. The section concludes with a further discussion of the advantages and disadvantages
for empirical estimations of the resulting dataset.
2.1 Melissa Ethnic-Name Database and Name-Matching Technique
Ethnic-name databases suffer from two inherent limitations: not all ethnicities are covered,
and included ethnicities usually receive unequal treatment. The strength of the ethnic-name
database obtained from the Melissa Data Corporation is the identification of Asian ethnicities,
especially Chinese, Indian/Hindi, Japanese, Korean, Russian, and Vietnamese names. The
database is comparatively weaker for looking within continental Europe. For example, Dutch
surnames are collected without first names, while the opposite is true for French names. The
Asian comparative advantage and overall cost effectiveness led to the selection of the Melissa
database, as well as the European amalgamation employed in the matching technique. In total,
nine ethnicities are distinguished: Chinese, English, European, Hispanic/Filipino, Indian/Hindi,
Japanese, Korean, Russian, and Vietnamese. The largest ethnicity in the US SE workforce
absent from the ethnic-name database is Iranian, which accounted for 0.7% of bachelor-level SEs
in the 1990 Census.7
6 This ethnic patenting database is employed by Kerr (2005, 2008a-c), Kerr and Lincoln (2008), and Foley and
Kerr (2008) to study the role of ethnic scientists and entrepreneurs in technology formation and diffusion.
The second limitation is that commercial databases vary in the number of names they contain
for each ethnicity. These differences reflect both uneven coverage and the fact that some
ethnicities are more homogeneous in their naming conventions. For example, the 1975 to 1999
Herfindahl indices for Korean (470) and Vietnamese (1121) surnames are significantly higher than
those for Japanese (132) and English (164) surnames, due to frequent Korean surnames like Kim (16%)
and Park (12%) and Vietnamese surnames like Nguyen (29%) and Tran (12%).
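To make the concentration measure concrete, the following minimal sketch computes a Herfindahl index from name shares. The 0-to-10,000 scale (squared percentage shares) is an assumption inferred from the magnitudes reported above, and the function name is hypothetical. Only the two most frequent surnames quoted in the text are used, so the results are lower bounds on the reported indices rather than reproductions of them.

```python
def herfindahl(shares):
    """Herfindahl index on a 0-10,000 scale from fractional shares (assumed scale)."""
    return 10_000 * sum(s * s for s in shares)


# Shares of the two most frequent surnames quoted in the text.
korean_top_two = [0.16, 0.12]      # Kim, Park
vietnamese_top_two = [0.29, 0.12]  # Nguyen, Tran

print(round(herfindahl(korean_top_two)))      # contribution of Kim and Park alone
print(round(herfindahl(vietnamese_top_two)))  # contribution of Nguyen and Tran alone
```

Under this assumed scale, the two most common surnames alone account for most of the reported Korean (470) and Vietnamese (1121) indices, which is consistent with the explanation given in the text.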
Two polar matching strategies are employed to ensure coverage differences do not overly
influence ethnicity assignments.
Full Matching: This procedure utilizes all of the name assignments in the Melissa
database and manually codes any unmatched surname or first name associated with
100 or more inventor records. This technique further exploits the international
distribution of inventor names within the patent database to provide superior results.8
The match rate for this procedure is 98% (98% US, 98% foreign). This rate should
be less than 100% with the Melissa database as not all ethnicities are included.
Restricted Matching: A second strategy employs a uniform name database using
only the 3000 most common surnames and the 200 most common first names for each
ethnicity. These numerical bars are the lowest common denominators across the
major ethnicities studied. The match rate for this restricted procedure is 89% (92%
US, 86% foreign).
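The rule by which the Full Matching procedure exploits the international distribution of names (described in footnote 8) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the data layout, and the example counts are hypothetical, and the region-to-ethnicity grouping is taken from the rows of Table 1.

```python
from typing import Dict, Optional

# Region-to-ethnicity grouping, following the rows of Table 1.
REGION_ETHNICITY: Dict[str, str] = {
    "United Kingdom": "English",
    "China, Singapore": "Chinese",
    "Western Europe": "European",
    "Hispanic Nations": "Hispanic",
    "India": "Indian",
    "Japan": "Japanese",
    "South Korea": "Korean",
    "Russia": "Russian",
    "Vietnam": "Vietnamese",
}


def concentration_ethnicity(
    counts_by_region: Dict[str, int],
    threshold: float = 0.90,
    min_records: int = 10,
) -> Optional[str]:
    """Assign an ethnicity if over 90% of a name's records sit in one non-English region.

    counts_by_region holds USPTO record counts for a single name, including
    US records, so the US share counts against every foreign region.
    """
    total = sum(counts_by_region.values())
    if total < min_records:
        return None  # the rule is not applied to names with fewer than ten records
    for region, n in counts_by_region.items():
        ethnicity = REGION_ETHNICITY.get(region)
        if ethnicity not in (None, "English") and n / total > threshold:
            return ethnicity
    return None


# Hypothetical counts for one unmatched name.
print(concentration_ethnicity({"Western Europe": 95, "United States": 5}))
```

Because US records enter the denominator, a name must be rare among domestic inventors before the rule assigns it, which is why the footnote describes the technique as very stringent.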
7 The ethnic groups employed: Chinese, English, European (including Dutch, French, German, Italian,
and Polish names), Hispanic/Filipino (including Latino and Filipino/Tagalog names), Indian/Hindi
(including Bangladeshi and Pakistani names), Japanese, Korean, Russian (including Armenian and
Carpatho-Rusyn names), and Vietnamese.
The final matching procedure employs a joint Hispanic/Filipino ethnicity, while in earlier work they are kept
separate. These two ethnic groups are combined due to extensive name overlaps (e.g., the common surnames
Martinez and Ramirez are in both ethnic lists), but this choice is not a first-order concern.
The Bangladeshi and Pakistani name counts are extremely small (8 and 15 respectively) and are not distinct
from the Indian/Hindi names. Their assignment does not materially affect the Indian/Hindi outcome, which
represents in some ways a South Asian identifier.
Jewish ethnic names overlap extensively with other ethnic groupings and are not separately treated. A handful
of names classified as Arab, Burmese, and Malay are also discarded.
8 A simple rule is applied to take advantage of the information embedded in the patent database itself. If
over 90% of the USPTO records associated with a name are concentrated in a non-English ethnicity country or
region, the name is assigned that ethnicity. As the test includes the domestic US inventors, comprising over 50%
of all inventors, this technique is very stringent and mainly bolsters European ethnic matching (the comparative
weakness of the Melissa database). The rule is not applied to names with fewer than ten occurrences during
1975 to 1999.
For matching, names in both the patent and ethnic-name databases are capitalized and truncated
to ten characters. Approximately 88% of the patent name records have a unique surname, first
name, or middle name match in the Full Matching procedure (77% in the Restricted Matching),
affording a single ethnicity determination with priority given to surname matches.
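The normalization and surname-priority steps just described can be sketched as follows. This is a minimal illustration under stated assumptions: the toy dictionaries, the function names, and the handling of non-unique matches (pooling all candidate ethnicities) are hypothetical and are not drawn from the Melissa database itself. The name overlaps used for Lee and Igor are the examples given elsewhere in this appendix.

```python
from typing import Dict, Set

MAX_LEN = 10  # the text truncates names to ten characters


def normalize(name: str) -> str:
    """Capitalize and truncate a name before lookup."""
    return name.strip().upper()[:MAX_LEN]


# Hypothetical toy dictionaries (normalized name -> candidate ethnicities).
SURNAMES: Dict[str, Set[str]] = {
    "KIM": {"Korean"},
    "NGUYEN": {"Vietnamese"},
    "LEE": {"Chinese", "English", "Korean"},
}
FIRST_NAMES: Dict[str, Set[str]] = {
    "IGOR": {"Hispanic", "Russian"},
}


def match(surname: str, first: str = "", middle: str = "") -> Set[str]:
    """Return candidate ethnicities, checking surname, then first, then middle name.

    A unique match on an earlier name part wins (surname priority). If no
    part matches uniquely, all candidates are pooled, which is an assumption
    of this sketch.
    """
    pooled: Set[str] = set()
    lookups = ((surname, SURNAMES), (first, FIRST_NAMES), (middle, FIRST_NAMES))
    for name, table in lookups:
        found = table.get(normalize(name), set())
        if len(found) == 1:
            return set(found)
        pooled |= found
    return pooled


print(normalize("Kannankutty"))  # an eleven-letter name is cut to ten characters
print(match("Nguyen"))           # unique surname match
print(sorted(match("Lee")))      # non-unique match, left for probabilistic assignment
```

Records that return more than one candidate are the ones that require the probabilistic treatment described next.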
For inventors residing in the US, representative probabilities are assigned to non-unique
matches using the masters-level SE communities in Metropolitan Statistical Areas (MSAs).
Ethnic probabilities for the remaining 3% of records (mostly foreign) are calculated as equal
shares. MSA ethnic compositions are averages of the 1980 and 1990 US 5% Census files; they
are kept constant through the sample period. The sample considers civilians aged 22-54 listing
Engineers, Mathematical and Computer Scientists, or Natural Scientists as their occupations.
The master's degree cut-off reflects the higher average education level of patenting scientists
within the scientific community (e.g., Kannankutty and Wilkinson 1999). Country of birth is
used to assign ethnicities into broad categories that match the name records.
To illustrate, take the San Francisco scientific community to be 12.1% Chinese, 66.1% English,
and 4.6% European (with other ethnicities omitted). A San Francisco-based record matching
to Chinese, English, and European surnames would be assigned a probabilistic ethnicity of
14.6% Chinese, 79.8% English, and 5.6% European (summing to 100%). A China-based record
matching all three ethnicities would be assigned a 33.3% probability for each.
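The arithmetic of this example can be reproduced with a short sketch. The function name and data layout are hypothetical; the logic simply renormalizes the local community shares over the matched ethnicities and falls back to equal shares when no local composition is used, as described for foreign records.

```python
from typing import Dict, Iterable, Optional


def assign_probabilities(
    candidates: Iterable[str],
    local_shares: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Split a non-unique match across its candidate ethnicities.

    With local_shares (e.g., an MSA's SE composition), probabilities are the
    shares renormalized over the candidates. Without it, equal shares apply.
    """
    candidates = list(candidates)
    if local_shares is None:
        return {e: 1.0 / len(candidates) for e in candidates}
    total = sum(local_shares[e] for e in candidates)
    return {e: local_shares[e] / total for e in candidates}


# San Francisco composition from the text (other ethnicities omitted).
san_francisco = {"Chinese": 12.1, "English": 66.1, "European": 4.6}
matched = ["Chinese", "English", "European"]

us_record = assign_probabilities(matched, san_francisco)
foreign_record = assign_probabilities(matched)

for ethnicity in matched:
    print(ethnicity, f"{us_record[ethnicity]:.1%}", f"{foreign_record[ethnicity]:.1%}")
```

The US-based probabilities are 12.1, 66.1, and 4.6 each divided by their sum of 82.8, which gives the 14.6%, 79.8%, and 5.6% reported above.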
2.2 Inventors Residing in Foreign Countries and Regions
The application of the ethnic-name database to the inventors residing outside of the US provides
a natural quality-assurance exercise for the technique. Inventions originating outside the US
account for just under half of USPTO patents, with applications from Japan comprising about half
of this foreign total. The top panel of Table 1 summarizes the results, with the rows presenting
the matched characteristics for countries and regions grouped to the ethnicities identifiable with
the database. The results are very encouraging. First, the Full Matching procedure assigns
ethnicities to a large percentage of foreign records, with the match rates greater than 93% for
all countries. In the Restricted Matching procedure, a matching rate of greater than 74% holds
for all regions.
Second, the estimated inventor compositions are reasonable. The own-ethnicity shares are
summarized in the fourth and fifth columns. The weighted average is 86% in the Full Matching
procedure, and own-ethnicity contributions are greater than 80% in the UK, China, India, Japan,
Korea, and Russia regardless of the matching procedure employed. As in the US, own-ethnicity
contributions should be less than 100% due to foreign researchers. The high success rate
using the Restricted Matching procedure indicates that the ethnic-name database performs well
without exploiting the international distribution of names, although power is lost with Europe.
Likewise, uneven coverage in the Melissa database is not driving the ethnic composition trends.
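As a check, the weighted average quoted above can be recomputed from the observation counts and Full Matching own-ethnicity shares reported in Table 1. Weighting by the number of inventor records is an assumption of this sketch, since the text does not state the weights, and the inputs are the rounded percentages shown in the table.

```python
# (observations, Full Matching own-ethnicity share) from the top panel of Table 1.
table1_full = {
    "United Kingdom": (187_266, 0.85),
    "China, Singapore": (167_370, 0.88),
    "Western Europe": (1_210_231, 0.66),
    "Hispanic Nations": (27_298, 0.74),
    "India": (13_582, 0.88),
    "Japan": (1_822_253, 1.00),
    "South Korea": (127_975, 0.84),
    "Russia": (33_237, 0.81),
    "Vietnam": (41, 0.36),
}

total_obs = sum(n for n, _ in table1_full.values())
weighted_own_share = sum(n * s for n, s in table1_full.values()) / total_obs

print(f"{weighted_own_share:.0%}")
```

Under this weighting the result rounds to 86%, with Japan and Western Europe dominating the average because they contribute most of the foreign records.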
The bottom panel of Table 1 presents the complete ethnic compositions estimated for the
foreign countries. Many of the positive off-diagonals are to be expected, either due to foreign
expatriates (UK, Vietnam), small sample sizes (Vietnam), or overlaps of common names. Two
prominent examples of common names are the surname Lee (Chinese, English, and Korean) and
the first name Igor (Hispanic and Russian). The most frequent name overlap occurs between
the European and Hispanic ethnicities.9
One advantage the matching technique possesses for inventors residing in the US is the ability
to use the Census to assign probabilistic estimates for overlapping names; foreign records are only
assigned as equal shares. The last two columns of Table 1’s top panel indicate the percentage of
the foreign inventors assigned at least partially to their own-ethnicity. While this study does not
make the strong assumption that ties should go to the country’s own-ethnicity, the additional
power provided by using the US Census for breaking domestic ties is illustrated.
2.3 Advantages and Disadvantages of Name-Matching Technique
Visual inspection of the top 1000 surnames and first names in the USPTO records confirms
that the name-matching technique works well. Table A1 in the appendix lists the 100 most common
surnames of US-based inventors for each ethnicity, along with their relative contributions. These
counts sum the ethnic contribution from inventors with each surname, including
partial or split assignments. Moreover, they are not necessarily direct or exclusive matches (e.g.,
the ethnic match may have occurred through the first name). While some inventors are certainly
misclassified, the measurement error in aggregate trends building from the micro-data is minor.
The Full Matching procedure is the preferred technique and underlies the trends presented in
the next section, but most applications find negligible differences when the Restricted Matching
dataset is employed instead.
The matched records describe the ethnic composition of US SEs with previously unavailable
detail: incorporating the major ethnicities working in the US SE community; separating out
detailed technologies and manufacturing industries; providing city and state statistics; and
providing annual metrics. Moreover, the assignment of patents to corporations and institutions
affords firm-level and university-level characterizations (e.g., the ethnic composition of IBM's
inventors filing computer patents from San Francisco in 1985). Detailed econometrics require this
level of cross-sectional and longitudinal variation, and the next section provides graphical
descriptions along these various dimensions. These descriptive statistics highlight the advantages
of name matching through individual patent records.
9 The main US SE ethnicity missing from the database is Iranian. Running the ethnic-name database on the
few patents from Iran yields a 55%-77% match rate. Iran's predicted composition does not favor any of the nine
ethnicities studied, with the largest overlap being the English ethnicity at 52%. Ongoing work is attempting to
improve the identification of Iranian names.
The ethnic-name procedure does, however, have two potential limitations for empirical work
that should be highlighted. First, the approach does not distinguish foreign-born ethnic
researchers in the US from later generations working as SEs. The procedure can only estimate
total ethnic SE populations, and these levels are to some extent measured with time-invariant
error due to the name-matching approach. The resulting data are very powerful, however, for
panel econometrics that employ changes in these ethnic SE populations for identification. Moreover,
Census and INS records confirm these changes are primarily due to new SE immigration for this
period, substantially weakening this overall concern.
The name-matching technique also does not distinguish finer divisions within the nine major
ethnic groupings. For ethnic network analyses, it would be advantageous to separate Mexican
from Chilean scientists within the Hispanic ethnicity, to distinguish Chinese engineers with ethnic
ties to Taipei versus Beijing versus Shanghai, and so on. These distinctions are not possible
with the Melissa database, and researchers should understand that measurement error from the
broader ethnic divisions may bias their estimated coefficients downward depending upon the
application.10 Nevertheless, Section 3 demonstrates how the deep variation available with the
ethnic patenting data provides a much richer description of US ethnic invention than previously
available.
3 Ethnic Composition of US Inventors
Table 2 describes the ethnic composition of US inventors for 1975-2004.11 The trends
demonstrate a growing ethnic contribution to US technology development, especially among Chinese
and Indian scientists. Ethnic inventors are more concentrated in high-tech industries like
computers and pharmaceuticals and in gateway cities relatively closer to their home countries (e.g.,
Chinese in San Francisco, European in New York, and Hispanics in Miami). The final three rows
demonstrate a close correspondence of the estimated ethnic composition to the country-of-birth
composition of the US SE workforce in the 1990 Census.12 The next four subsections more
closely examine each dimension of this data.
10 When mapping the ethnic patenting data to country-level data for international diffusion estimations,
researchers will also need to cluster their standard errors to reflect the multiple country-to-ethnicity mappings.
11 The current patent data incorporate all patents granted by May 2008. The application years of patents,
however, provide the best description of when innovative research is being undertaken, due to the substantial and
uneven lags in the USPTO reviews. Accordingly, the annual descriptions employed in this study are undertaken
by application years. Unfortunately, this approach leads to significant attrition in the last two years: patents
are only included in the database if they have been granted, and fewer applications close to the
cut-off have completed the review cycle.
Raw patent counts should be treated with caution. Changes in the personnel resources and review policies of
the USPTO influence the number of patents granted over time (e.g., Griliches 1990), and the explosive climb in
patent grants over the last two decades is difficult to interpret (e.g., Kortum and Lerner 2000, Kim and Marschke
2004, Hall 2005, Jaffe and Lerner 2005, and Branstetter and Ogura 2005). Accordingly, this study considers
patent shares, which avoids these interpretation concerns.
Studies seeking to quantify the number of ethnic researchers in the US should supplement this data with
immigration records or demographic surveys (with an unfortunate loss of detail). Trajtenberg (2005) and HBS
Research are working on algorithms to identify individual scientists with the USPTO data.
3.1 Contributions by Year
Figure 1 illustrates the evolving ethnic composition of US inventors from 1975-2004. The omitted
English share declines from 83% to 70% during this period. Looking across all technology
categories, the European ethnicity is initially the largest foreign contributor to US technology
development. Like the English ethnicity, however, the European share of US domestic inventors
declines steadily from 8% in 1975 to 6% in 2004. This declining share is partly due to the
exceptional growth over the thirty years of the Chinese and Indian ethnicities, which increase
from under 2% to over 8% and 5%, respectively. As shown below, this Chinese and Indian growth
is concentrated in high-tech sectors, where Chinese inventors supplant European researchers as
the largest ethnic contributor to US technology formation. The Indian ethnic contribution
declines somewhat after 2000, mostly due to changes within the computer technology sector as
seen below.
Among the other ethnicities, the Hispanic contribution grows from 3% to 4% from 1975 to
2004. The level of this series is likely mismeasured due to the extensive overlap of Hispanic
and European names, but the positive growth is consistent with stronger Latino and Filipino
scientific contributions in Florida and California. The Korean share increases dramatically
from 0.3% to 1.1% over the thirty years, while the Russian climbs from 1.2% to 2.2%. Although
difficult to see with Figure 1's scaling, much of the Russian increase occurs in the 1990s following
the dissolution of the Soviet Union. The Japanese share steadily increases from 0.6% to 1.0%.
Finally, while the Vietnamese contribution is the lowest throughout the sample, it does exhibit
the strongest relative growth from 0.1% to 0.6%.
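The annual shares discussed in this subsection are built by summing inventors' probabilistic ethnicity assignments within each application year and dividing by the year's total. A minimal sketch of that aggregation follows. The records, the function name, and the data layout are hypothetical and serve only to show the mechanics; they are not values from the patent data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical inventor records: (application year, {ethnicity: probability}).
records = [
    (1975, {"English": 1.0}),
    (1975, {"English": 0.8, "Chinese": 0.2}),
    (2004, {"English": 1.0}),
    (2004, {"Chinese": 1.0}),
]


def annual_shares(
    rows: Iterable[Tuple[int, Dict[str, float]]]
) -> Dict[int, Dict[str, float]]:
    """Sum fractional ethnicity assignments by application year and convert to shares."""
    totals: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for year, probabilities in rows:
        for ethnicity, p in probabilities.items():
            totals[year][ethnicity] += p
    shares = {}
    for year, by_ethnicity in totals.items():
        year_total = sum(by_ethnicity.values())
        shares[year] = {e: v / year_total for e, v in by_ethnicity.items()}
    return shares


shares = annual_shares(records)
for year in sorted(shares):
    print(year, {e: round(s, 3) for e, s in shares[year].items()})
```

Working with shares rather than raw counts follows the choice described in footnote 11, since changes in USPTO grant volumes affect numerator and denominator alike.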
3.2 Contributions by Technology
Figure 2 documents the total ethnic contribution by the six broad technology groups into which
patents are often classified: Chemicals, Computers and Communications, Drugs and Medical,
Electrical and Electronic, Mechanical, and Others. The miscellaneous group includes patents
for agriculture, textiles, furniture, and the like. Growth in ethnic patenting is clearly stronger
in high-tech sectors than in more traditional industries. Figures 3-8 provide the ethnic
contributions within each technology category. The growing ethnic contribution in high-tech sectors
is easily traced to the Chinese and Indian ethnicities. Moreover, these two ethnicities exhibit
the most interesting and economically meaningful variation across technologies, as summarized
in Figures 9 and 10.13
12 The estimated European contribution in Table 2 is naturally higher than the immigrant contribution
measured by foreign born.
3.3 Contributions by Institution
Figure 11 demonstrates that intriguing differences in ethnic scientific contributions also exist by
institution type. Over the 1975-2004 period, ethnic inventors are more concentrated in
government and university research labs and in publicly listed companies than in private companies or
as unaffiliated inventors. Part of this levels difference is certainly due to immigration visa
sponsorships by larger institutions. Growth in ethnic shares is initially stronger in the government
and university labs, but publicly listed companies appear to close the gap by 2004. The other
interesting trend in Figure 11 is for private companies, where the ethnic contribution sharply
increases in the 1990s. This rise coincides with the strong growth in ethnic entrepreneurship in
high-tech sectors.14
13 The USPTO issues patents by technology categories rather than by industries. Combining the work of
Johnson (1999), Silverman (1999), and Kerr (2008a), concordances can be developed to map the USPTO
classification scheme to the three-digit industries in which new inventions are manufactured or used. Scherer (1984)
and Keller (2002) further discuss the importance of inter-industry R&D flows.
14 Publicly listed companies are identified from a 1989 mapping developed by Hall et al. (2001). This company
list is not updated for delistings or new public offerings. This approach maintains a constant public grouping for
reference, but it also weakens the representativeness of the public and private company groupings at the sample
extremes for current companies.
Industry patents account for 72% of patents granted from 1980-1997. Public companies account for 59% of
industry patents during the period and are identified through Compustat records. Government and university
institutions are identified through institution names and account for about 4% of patents granted. Federally
funded research and development centers (FFRDCs) are included in both industry and government groups.
Unassigned patents account for about 26% of patents granted.
3.4 Contributions by Geography
This paper closes its descriptive statistics with an examination of the 1975-2004 ethnic inventor
contributions by major cities in Table 3. Cities are defined through 281 Metropolitan
Statistical Areas.15 Not surprisingly, total patenting shares are highly correlated with city size,
with the three largest shares of US domestic patenting for 1995-2004 found in San Francisco
(12%), New York (7%), and Los Angeles (6%). More interestingly, non-English patenting is
more concentrated than general innovation. The 1995-2004 non-English patent shares of San
Francisco, New York, and Los Angeles are 19%, 10%, and 8%, respectively. Similarly, 81% of
non-English invention occurs in the top 47 patenting cities listed in Table 3, compared to 73%
of total patenting. Indian and Chinese invention is even further agglomerated. San Francisco
shows exceptional growth from an 8% share of total US Indian and Chinese patenting in
1975-1984 to 25% in 1995-2004, while the combined shares of New York and Chicago decline
from 22% to 13%. Agrawal et al. (2007a,b) and Kerr (2008c) further describe ethnic inventor
agglomeration in the US using the ethnic name approach.
Not only are ethnic scientists disproportionately concentrated in major cities, but growth
in a city's share of ethnic patenting is highly correlated with growth in its share of total US
patenting. Across the whole sample and including all of the intervening years, an increase of
1% in a city's ethnic patenting share correlates with a 0.6% increase in the city's total invention
share. This coefficient is remarkably high, as the ethnic share of total invention during this
period is around 20%. Shifts in the concentration of ethnic inventors appear to facilitate changes
in the geographic composition of US innovation.16
15 MSAs are identified from inventors' city names using city lists collected from the Office of Social and Economic
Data Analysis at the University of Missouri, with a matching rate of 99%. Manual coding further ensures all
patents with more than 100 citations and all city names with more than 100 patents are identified.
4 Conclusion
Ethnic scientists and engineers are an important and growing contributor to US technology
development. The Chinese and Indian ethnicities, in particular, are now an integral part of
US invention in high-tech sectors. This paper describes how the probable ethnicities of US
researchers can be determined at the micro-level through their names available with USPTO
patent records. The ethnic-name database this study employs distinguishes nine ethnic groups,
and the matched database describes the ethnic composition of US inventors with previously
unavailable cross-sectional and longitudinal detail. This richer variation can support more
detailed and informative empirical analyses than would be feasible otherwise.
16 The ethnic-name approach does not distinguish ethnic inventor shifts due to new immigration, domestic
migration, or occupational changes. It is likewise beyond the scope of this descriptive note to explore issues of
causality or effects on native workers. See Kerr and Lincoln (2008) for recent work in this area.
References
[1] Agrawal, Ajay, Devesh Kapur, and John McHale, "Birds of a Feather – Better Together?
Exploring the Optimal Spatial Distribution of Ethnic Inventors", NBER Working Paper
12823 (2007a).
[2] Agrawal, Ajay, Devesh Kapur, and John McHale, "Brain Drain or Brain Bank? The Impact
of Skilled Emigration on Poor-Country Innovation", Working Paper (2007b).
[3] Borjas, George, "Do Foreign Students Crowd Out Native Students from Graduate Programs?", NBER Working Paper 10349 (2004).
[4] Branstetter, Lee, and Yoshiaki Ogura, "Is Academic Science Driving a Surge in Industrial
Innovation? Evidence from Patent Citations", NBER Working Paper 11561 (2005).
[5] Burton, Lawrence, and Jack Wang, "How Much Does the U.S. Rely on Immigrant Engineers?", NSF SRS Issue Brief (1999).
[6] Foley, C. Fritz, and William Kerr, "US Ethnic Scientists and Foreign Direct Investment
Placement", Working Paper (2008).
[7] Freeman, Richard, "Does Globalization of the Scientific/Engineering Workforce Threaten
U.S. Economic Leadership?", NBER Working Paper 11457 (2005).
[8] Griliches, Zvi, "Patent Statistics as Economic Indicators: A Survey", Journal of Economic
Literature 28:4 (1990), 1661-1707.
[9] Hall, Bronwyn, "Exploring the Patent Explosion", Journal of Technology Transfer 30
(2005), 35-48.
[10] Hall, Bronwyn, Adam Jaffe, and Manuel Trajtenberg, "The NBER Patent Citation Data
File: Lessons, Insights and Methodological Tools", NBER Working Paper 8498 (2001).
[11] Jaffe, Adam, and Joshua Lerner, Innovation and Its Discontents (Boston, MA: Harvard
Business School Press, 2005).
[12] Johnson, Daniel, "150 Years of American Invention: Methodology and a First Geographic
Application", Wellesley College Economics Working Paper 99-01 (1999). Data currently
reside at http://faculty1.coloradocollege.edu/~djohnson/uships.html.
[13] Johnson, Jean, "Statistical Profiles of Foreign Doctoral Recipients in Science and Engineering: Plans to Stay in the United States", NSF SRS Report (1998).
[14] Johnson, Jean, "Human Resource Contribution to U.S. Science and Engineering From
China", NSF SRS Issue Brief (2001).
[15] Kannankutty, Nirmala, and R. Keith Wilkinson, "SESTAT: A Tool for Studying Scientists
and Engineers in the United States", NSF SRS Report (1999).
[16] Keller, Wolfgang, "Trade and the Transmission of Technology", Journal of Economic
Growth 7 (2002), 5-24.
[17] Kerr, William, "Ethnic Scientific Communities and International Technology Diffusion",
Review of Economics and Statistics 90:3 (2008a), 518-537.
[18] Kerr, William, "Heterogeneous Technology Diffusion and Ricardian Trade Patterns", Working Paper (2008b).
[19] Kerr, William, "The Agglomeration of US Ethnic Inventors", HBS Working Paper (2008c).
[20] Kerr, William, "The Role of Immigrant Scientists and Entrepreneurs in International Technology Transfer", MIT Ph.D. Dissertation (2005).
[21] Kerr, William, and William Lincoln, "The Supply Side of Innovation: H-1B Visa Reforms
and US Ethnic Invention", HBS Working Paper 09-005 (2008).
[22] Kim, Jinyoung, and Gerald Marschke, "Accounting for the Recent Surge in U.S. Patenting:
Changes in R&D Expenditures, Patent Yields, and the High Tech Sector", Economics of
Innovation and New Technologies 13:6 (2004), 543-558.
[23] Kortum, Samuel, and Joshua Lerner, "Assessing the Contribution of Venture Capital to
Innovation", RAND Journal of Economics 31:4 (2000), 674-692.
[24] Lowell, B. Lindsay, "H1-B Temporary Workers: Estimating the Population", The Center
for Comparative Immigration Studies Working Paper 12 (2000).
[25] Saxenian, AnnaLee, with Yasuyuki Motoyama and Xiaohong Quan, Local and Global Networks of Immigrant Professionals in Silicon Valley (San Francisco, CA: Public Policy Institute of California, 2002a).
[26] Saxenian, AnnaLee, "Silicon Valley’s New Immigrant High-Growth Entrepreneurs", Economic Development Quarterly 16:1 (2002b), 20-31.
[27] Scherer, Frederic, "Using Linked Patent Data and R&D Data to Measure Technology
Flows", in Griliches, Zvi (ed.) R & D, Patents and Productivity (Chicago, IL: University of
Chicago Press, 1984).
[28] Silverman, Brian, "Technological Resources and the Direction of Corporate Diversification:
Toward an Integration of the Resource-Based View and Transaction Cost Economics", Management Science 45:8 (1999), 1109-1124.
[29] Stephan, Paula, and Sharon Levin, "Exceptional Contributions to US Science by the
Foreign-Born and Foreign-Educated", Population Research and Policy Review 20:1 (2001),
59-79.
[30] Streeter, Joanne, "Major Declines in Admissions of Immigrant Scientists and Engineers in
Fiscal Year 1994", NSF SRS Issue Brief (1997).
[31] Trajtenberg, Manuel, "The Mobility of Inventors and the Productivity of Research", Working Paper (2005).
[32] Wadhwa, Vivek, AnnaLee Saxenian, Ben Rissing, and Gary Gereffi, "America’s New Immigrant Entrepreneurs I", Working Paper (2007).
Fig. 1: Ethnic Share of US Domestic Patents. [Line chart. Vertical axis: percentage of patent applications, 0% to 10%. Horizontal axis: years, labeled 1975 to 2000. Series: Chinese, European, Hispanic, Indian, Japanese, Korean, Russian, Vietnamese.]
Fig. 2: Total US Ethnic Share by Technology. [Line chart. Vertical axis: percentage of patent applications, 14% to 39%. Horizontal axis: years, labeled 1975 to 2000. Series: Chemicals, Computers, Drugs, Electrical, Mechanical, Other.]
Fig. 3: US Ethnic Patenting - Chemicals. [Line chart. Vertical axis: percentage of patent applications, 0% to 10%. Horizontal axis: years, labeled 1975 to 2000. Series: Chinese, European, Hispanic, Indian, Japanese, Korean, Russian, Vietnamese.]
Fig. 4: US Ethnic Patenting - Computers. [Line chart. Vertical axis: percentage of patent applications, 0% to 14%. Horizontal axis: years, labeled 1975 to 2000. Series: Chinese, European, Hispanic, Indian, Japanese, Korean, Russian, Vietnamese.]
Fig. 5: US Ethnic Patenting - Drugs. [Line chart. Vertical axis: percentage of patent applications, 0% to 12%. Horizontal axis: years, labeled 1975 to 2000. Series: Chinese, European, Hispanic, Indian, Japanese, Korean, Russian, Vietnamese.]
Fig. 6: US Ethnic Patenting - Electrical. [Line chart. Vertical axis: percentage of patent applications, 0% to 14%. Horizontal axis: years, labeled 1975 to 2000. Series: Chinese, European, Hispanic, Indian, Japanese, Korean, Russian, Vietnamese.]
Fig. 7: US Ethnic Patenting - Mechanical. [Line chart. Vertical axis: percentage of patent applications, 0% to 10%. Horizontal axis: years, labeled 1975 to 2000. Series: Chinese, European, Hispanic, Indian, Japanese, Korean, Russian, Vietnamese.]
Fig. 8: US Ethnic Patenting - Other. [Line chart. Vertical axis: percentage of patent applications, 0% to 10%. Horizontal axis: years, labeled 1975 to 2000. Series: Chinese, European, Hispanic, Indian, Japanese, Korean, Russian, Vietnamese.]
Fig. 9: Chinese Contribution by Technology. [Line chart. Vertical axis: percentage of patent applications, 0% to 14%. Horizontal axis: years, labeled 1975 to 2000. Series: Chemicals, Computers, Drugs, Electrical, Mechanical, Other.]
Fig. 10: Indian Contribution by Technology. [Line chart. Vertical axis: percentage of patent applications, 0% to 10%. Horizontal axis: years, labeled 1975 to 2000. Series: Chemicals, Computers, Drugs, Electrical, Mechanical, Other.]
Fig. 11: Total US Ethnic Share by Institution. [Line chart. Vertical axis: percentage of patent applications, 10% to 40%. Horizontal axis: years, labeled 1975 to 2000. Series: Total, Public Industry, Private Industry, Government/University, Unassigned.]
Table 1: Descriptive Statistics for Inventors Residing in Foreign Countries and Regions

Summary Statistics for Full and Restricted Matching Procedures
(percentages of each region's inventors)

                                Matched with          Assigned Ethnicity    Assigned Ethnicity
                                Ethnic Database       of Their Region       of Region (Partial)
Region              Obs.        Full      Restrict.   Full      Restrict.   Full      Restrict.
United Kingdom      187,266     99%       95%         85%       83%         92%       91%
China, Singapore    167,370     100%      98%         88%       89%         91%       91%
Western Europe      1,210,231   98%       79%         66%       46%         73%       58%
Hispanic Nations    27,298      99%       74%         74%       69%         93%       93%
India               13,582      93%       76%         88%       88%         90%       89%
Japan               1,822,253   100%      89%         100%      96%         100%      96%
South Korea         127,975     100%      100%        84%       83%         89%       88%
Russia              33,237      94%       78%         81%       84%         93%       94%
Vietnam             41          100%      98%         36%       43%         44%       43%

Complete Ethnic Composition of Region's Inventors (Full Matching)

Region              English   Chinese   European  Hispanic  Indian    Japanese  Korean    Russian   Vietnam.
United Kingdom      85%       2%        5%        3%        2%        0%        0%        2%        0%
China, Singapore    3%        88%       1%        1%        1%        1%        4%        1%        1%
Western Europe      21%       1%        66%       8%        1%        0%        0%        3%        0%
Hispanic Nations    11%       1%        10%       74%       0%        1%        0%        2%        0%
India               3%        1%        1%        5%        88%       0%        0%        2%        0%
Japan               0%        0%        0%        0%        0%        100%      0%        0%        0%
South Korea         2%        11%       0%        1%        0%        1%        84%       1%        0%
Russia              5%        1%        3%        9%        0%        0%        0%        81%       0%
Vietnam             17%       21%       12%       0%        0%        10%       2%        2%        36%

Notes: Matching is undertaken at inventor level using the Full and Restricted Matching procedures outlined in the text. The middle
columns of the top panel summarize the share of each region's inventors assigned the ethnicity of that region; the complete
composition for the Full Matching procedure is detailed in the bottom panel. The right-hand columns in the top panel document the
percentage of the region's inventors assigned at least partially to their region's ethnicity.
Greater China includes Mainland China, Hong Kong, Macao, and Taiwan. Western Europe includes Austria, Belgium, Denmark,
Finland, France, Germany, Italy, Luxembourg, Netherlands, Norway, Poland, Sweden, and Switzerland. Hispanic Nations includes
Argentina, Belize, Brazil, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras,
Mexico, Nicaragua, Panama, Paraguay, Peru, Philippines, Portugal, Spain, Uruguay, and Venezuela. Russia includes former Soviet
Union countries.
Table 2: Descriptive Statistics for Inventors Residing in US

                                              Ethnicity of Inventor
                  English   Chinese   European  Hispanic  Indian    Japanese  Korean    Russian   Vietnam.

A. Ethnic Inventor Shares Estimated from US Inventor Records, 1975-2004
1975-1979         82.5%     2.2%      8.3%      2.9%      1.9%      0.6%      0.3%      1.2%      0.1%
1980-1984         81.1%     2.9%      7.9%      3.0%      2.4%      0.7%      0.5%      1.3%      0.1%
1985-1989         79.8%     3.6%      7.5%      3.2%      2.9%      0.8%      0.6%      1.4%      0.2%
1990-1994         77.6%     4.6%      7.2%      3.5%      3.6%      0.9%      0.7%      1.5%      0.4%
1995-1999         73.9%     6.5%      6.8%      3.9%      4.8%      0.9%      0.8%      1.8%      0.5%
2000-2004         70.4%     8.5%      6.4%      4.2%      5.4%      1.0%      1.1%      2.2%      0.6%

Chemicals         73.4%     7.2%      7.5%      3.6%      4.5%      1.0%      0.8%      1.7%      0.3%
Computers         70.1%     8.2%      6.3%      3.8%      6.9%      1.1%      0.9%      2.1%      0.7%
Pharmaceuticals   72.9%     7.1%      7.4%      4.3%      4.2%      1.1%      0.9%      1.8%      0.4%
Electrical        71.6%     8.0%      6.8%      3.7%      4.9%      1.1%      1.1%      2.1%      0.7%
Mechanical        80.4%     3.2%      7.1%      3.5%      2.6%      0.7%      0.6%      1.6%      0.2%
Miscellaneous     81.3%     2.9%      7.0%      3.8%      2.1%      0.6%      0.6%      1.4%      0.3%

Top Cities as a Percentage of City's Patents
                  KC (89)   SF (13)   NOR (12)  MIA (16)  SF (7)    SD (2)    BAL (2)   BOS (3)   AUS (2)
                  WS (88)   LA (8)    STL (11)  SA (9)    AUS (7)   SF (2)    LA (2)    NYC (3)   SF (1)
                  NAS (88)  AUS (6)   NYC (11)  WPB (7)   PRT (6)   LA (2)    SF (1)    SF (3)    LA (1)

B. Ethnic Scientist and Engineer Shares Estimated from 1990 US Census Records
Bachelors Share   87.6%     2.7%      2.3%      2.4%      2.3%      0.6%      0.5%      0.4%      1.2%
Masters Share     78.9%     6.7%      3.4%      2.2%      5.4%      0.9%      0.7%      0.8%      1.0%
Doctorate Share   71.2%     13.2%     4.0%      1.7%      6.5%      0.9%      1.5%      0.5%      0.4%

Notes: Panel A presents descriptive statistics for inventors residing in the US at the time of patent application. Inventor ethnicities are estimated through inventors'
names using techniques described in the text. Patents are grouped by application years and major technology fields. Cities, defined through Metropolitan Statistical
Areas, include AUS (Austin), BAL (Baltimore), BOS (Boston), KC (Kansas City), LA (Los Angeles), MIA (Miami), NAS (Nashville), NOR (New Orleans), NYC (New
York City), PRT (Portland), SA (San Antonio), SD (San Diego), SF (San Francisco), STL (St. Louis), WPB (West Palm Beach), and WS (Winston-Salem). Cities are
identified from inventors' city names using city lists collected from the Office of Social and Economic Data Analysis at the University of Missouri, with a matching rate of
99%. Manual recoding further ensures all patents with more than 100 citations and all city names with more than 100 patents are identified. Panel B presents comparable
statistics calculated from the 1990 Census using country of birth for scientists and engineers. Country groupings follow Table 1; English provides a residual in the Census
statistics.
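
The Panel A shares are, in essence, each ethnicity's fraction of inventor records within an application-year bin. A minimal Python sketch of that tabulation follows, assuming each record already carries a single assigned ethnicity (the name-matching technique itself is described in the main text and is not reproduced here, and it may well assign weights rather than single labels). The names `period_of` and `ethnic_shares` and the toy records are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter, defaultdict


def period_of(year, start=1975, width=5):
    """Return the 'YYYY-YYYY' bin (five years by default) containing an application year."""
    lo = start + ((year - start) // width) * width
    return f"{lo}-{lo + width - 1}"


def ethnic_shares(records):
    """Map each period to {ethnicity: share of inventor records in that period}.

    `records` is an iterable of (application_year, ethnicity) pairs; the
    ethnicity label is assumed to have been assigned upstream.
    """
    counts = defaultdict(Counter)
    for year, ethnicity in records:
        counts[period_of(year)][ethnicity] += 1
    shares = {}
    for period, by_ethnicity in counts.items():
        total = sum(by_ethnicity.values())
        shares[period] = {eth: n / total for eth, n in by_ethnicity.items()}
    return shares


# Made-up toy records, for illustration only (not taken from the patent data).
toy_records = [
    (1976, "English"), (1977, "English"), (1978, "Chinese"), (1979, "English"),
    (2001, "English"), (2002, "Indian"), (2003, "Chinese"), (2004, "English"),
]
shares = ethnic_shares(toy_records)
```

Grouping the same records by technology field or by city instead of by period would only require swapping the key passed to `counts`.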
Table 3: Ethnic Inventor Contributions by City

                        Total Patenting Share                           Non-English Ethnic Patenting Share              Indian and Chinese Patenting Share
City                    1975-1984  1985-1994  1995-2004  2001-2006 (A)  1975-1984  1985-1994  1995-2004  2001-2006 (A)  1975-1984  1985-1994  1995-2004  2001-2006 (A)
Atlanta, GA             0.6%       1.0%       1.3%       1.5%           0.3%       0.7%       1.0%       1.1%           0.3%       0.7%       1.0%       1.2%
Austin, TX              0.4%       0.9%       1.8%       2.0%           0.5%       1.2%       1.9%       2.0%           0.4%       1.6%       2.3%       2.3%
Baltimore, MD           0.8%       0.8%       0.7%       0.7%           0.7%       0.7%       0.6%       0.5%           0.4%       0.5%       0.6%       0.5%
Boston, MA              3.6%       3.8%       3.9%       4.6%           3.9%       4.2%       4.1%       4.8%           4.0%       4.0%       3.6%       4.3%
Buffalo, NY             0.6%       0.5%       0.4%       0.3%           0.8%       0.6%       0.4%       0.3%           1.1%       0.7%       0.4%       0.3%
Charlotte, NC           0.3%       0.3%       0.3%       0.3%           0.2%       0.2%       0.2%       0.2%           0.1%       0.2%       0.1%       0.2%
Chicago, IL             6.0%       4.6%       3.5%       3.2%           6.9%       5.0%       3.5%       3.0%           5.6%       3.9%       2.9%       2.8%
Cincinnati, OH          1.0%       1.1%       1.0%       1.0%           0.9%       0.9%       0.7%       0.7%           0.7%       1.0%       0.6%       0.6%
Cleveland, OH           2.3%       1.7%       1.3%       1.1%           2.5%       1.5%       1.0%       0.8%           2.5%       1.4%       0.9%       0.6%
Columbus, OH            0.7%       0.5%       0.5%       0.4%           0.6%       0.6%       0.4%       0.3%           0.8%       0.7%       0.3%       0.3%
Dallas-Fort Worth, TX   1.6%       2.0%       2.3%       2.1%           1.1%       1.9%       2.3%       2.2%           1.5%       2.4%       2.9%       2.8%
Denver, CO              1.0%       1.2%       1.3%       1.3%           0.8%       1.0%       0.9%       0.8%           0.8%       1.0%       0.6%       0.5%
Detroit, MI             3.1%       3.3%       2.9%       2.8%           3.1%       3.1%       2.6%       2.6%           3.2%       2.8%       2.5%       2.5%
Greensboro-W.S., NC     0.2%       0.3%       0.3%       0.2%           0.1%       0.2%       0.2%       0.1%           0.2%       0.2%       0.1%       0.1%
Hartford, CT            0.9%       0.9%       0.6%       0.6%           1.0%       0.8%       0.5%       0.5%           0.8%       0.6%       0.3%       0.4%
Houston, TX             2.3%       2.5%       1.9%       2.0%           1.8%       2.3%       1.8%       1.9%           2.2%       2.8%       1.8%       1.9%
Indianapolis, IN        0.8%       0.7%       0.7%       0.5%           0.6%       0.4%       0.4%       0.3%           0.7%       0.5%       0.4%       0.3%
Jacksonville, NC        0.1%       0.1%       0.1%       0.1%           0.1%       0.1%       0.1%       0.1%           0.1%       0.1%       0.1%       0.1%
Kansas City, MO         0.4%       0.3%       0.4%       0.3%           0.2%       0.2%       0.2%       0.2%           0.2%       0.1%       0.2%       0.2%
Las Vegas, NV           0.1%       0.1%       0.2%       0.3%           0.1%       0.1%       0.2%       0.2%           0.0%       0.1%       0.1%       0.1%
Los Angeles, CA         6.6%       6.1%       6.0%       5.7%           7.2%       7.2%       7.9%       7.3%           6.7%       6.9%       7.5%       7.0%
Memphis, TN             0.1%       0.2%       0.2%       0.3%           0.1%       0.1%       0.1%       0.2%           0.1%       0.1%       0.1%       0.1%
Miami, FL               0.8%       0.9%       0.7%       0.7%           1.0%       1.3%       1.0%       0.9%           0.5%       0.6%       0.5%       0.4%
Milwaukee, WI           1.0%       0.9%       0.8%       0.7%           0.8%       0.8%       0.6%       0.5%           0.5%       0.4%       0.5%       0.4%
Minneap.-St. Paul, MN   1.9%       2.4%       2.7%       2.8%           1.6%       2.0%       2.0%       2.0%           1.5%       1.7%       1.7%       1.8%

Table 3: Ethnic Inventor Contributions by City, continued

                        Total Patenting Share                           Non-English Ethnic Patenting Share              Indian and Chinese Patenting Share
City                    1975-1984  1985-1994  1995-2004  2001-2006 (A)  1975-1984  1985-1994  1995-2004  2001-2006 (A)  1975-1984  1985-1994  1995-2004  2001-2006 (A)
Nashville, TN           0.1%       0.2%       0.2%       0.2%           0.0%       0.1%       0.1%       0.1%           0.1%       0.1%       0.1%       0.1%
New Orleans, LA         0.3%       0.2%       0.2%       0.1%           0.3%       0.3%       0.1%       0.1%           0.2%       0.2%       0.0%       0.0%
New York, NY            11.5%      8.9%       7.3%       6.9%           16.6%      13.1%      10.1%      8.9%           16.6%      13.3%      9.7%       9.0%
Norfolk-VA Beach, VA    0.2%       0.2%       0.2%       0.1%           0.1%       0.1%       0.1%       0.1%           0.1%       0.1%       0.1%       0.1%
Orlando, FL             0.2%       0.3%       0.3%       0.3%           0.1%       0.2%       0.3%       0.3%           0.1%       0.2%       0.3%       0.3%
Philadelphia, PA        4.6%       4.0%       2.7%       2.8%           5.6%       4.9%       2.8%       2.9%           6.2%       5.8%       2.8%       3.0%
Phoenix, AZ             1.0%       1.2%       1.4%       1.3%           0.6%       1.1%       1.3%       1.2%           0.4%       1.0%       1.4%       1.3%
Pittsburgh, PA          2.0%       1.3%       0.8%       0.7%           2.2%       1.4%       0.6%       0.5%           2.2%       1.3%       0.5%       0.5%
Portland, OR            0.5%       0.8%       1.4%       1.6%           0.3%       0.6%       1.4%       1.6%           0.2%       0.6%       1.7%       2.0%
Providence, RI          0.3%       0.3%       0.3%       0.2%           0.3%       0.4%       0.3%       0.2%           0.2%       0.3%       0.2%       0.2%
Raleigh-Durham, NC      0.3%       0.6%       1.1%       1.5%           0.3%       0.6%       1.0%       1.3%           0.3%       0.8%       1.0%       1.2%
Richmond, VA            0.3%       0.3%       0.2%       0.2%           0.3%       0.3%       0.2%       0.2%           0.3%       0.4%       0.2%       0.2%
Sacramento, CA          0.2%       0.4%       0.5%       0.5%           0.2%       0.4%       0.5%       0.5%           0.2%       0.3%       0.5%       0.5%
Salt Lake City, UT      0.4%       0.5%       0.6%       0.6%           0.2%       0.4%       0.3%       0.3%           0.2%       0.3%       0.3%       0.3%
San Antonio, TX         0.1%       0.2%       0.2%       0.2%           0.1%       0.2%       0.2%       0.2%           0.2%       0.1%       0.1%       0.1%
San Diego, CA           1.1%       1.6%       2.2%       2.8%           1.1%       1.6%       2.6%       3.6%           0.8%       1.4%       2.4%       3.9%
San Francisco, CA       4.8%       6.6%       12.1%      13.2%          6.2%       9.3%       19.3%      19.9%          8.4%       13.0%      25.4%      24.0%
Seattle, WA             0.9%       1.3%       1.9%       3.4%           0.8%       1.1%       1.8%       3.5%           0.6%       1.0%       1.8%       3.7%
St. Louis, MO           1.0%       0.9%       0.8%       0.8%           0.9%       0.8%       0.8%       0.7%           1.0%       0.8%       0.4%       0.4%
Tallahassee, FL         0.4%       0.5%       0.4%       0.4%           0.3%       0.4%       0.3%       0.3%           0.2%       0.2%       0.2%       0.2%
Washington, DC          1.5%       1.5%       1.4%       1.6%           1.6%       1.6%       1.5%       1.7%           1.6%       1.7%       1.5%       1.7%
West Palm Beach, FL     0.3%       0.5%       0.4%       0.4%           0.3%       0.5%       0.4%       0.4%           0.3%       0.3%       0.2%       0.2%

Other 234 Major Cities  21.8%      22.3%      20.7%      18.4%          18.1%      18.1%      15.6%      13.6%          19.7%      18.2%      14.6%      12.7%
Not in a Major City     9.0%       8.2%       6.6%       6.2%           6.3%       5.4%       3.7%       4.1%           5.2%       3.8%       2.5%       2.7%

Notes: See Table 1. The first three columns of each grouping are for granted patents. The fourth column, marked with (A), is for published patent applications.
Table A1: Most Common Ethnic Surnames for Inventors Residing in the US

Chinese
CAI             585
CAO             657
CHAN            3,096
CHANG           3,842
CHAO            796
CHAU            486
CHEN            12,860
CHENG           2,648
CHEUNG          950
CHIANG          1,112
CHIEN           429
CHIN            423
CHIU            924
CHOU            1,144
CHOW            1,139
CHU             2,353
DENG            439
DING            589
DONG            492
FAN             1,036
FANG            846
FENG            658
FONG            727
FU              767
FUNG            455
GAO             785
GUO             921
HAN             777
HE              1,159
HO              1,282
HSIEH           980
HSU             3,034
HU              1,695
HUANG           4,605
HUI             451
HUNG            562
HWANG           800
JIANG           1,399
KAO             714
KUO             1,157
LAI             1,134
LAM             1,336
LAU             1,320
LEE             4,006
LEUNG           1,165
LEW             460
LI              6,863
LIANG           1,173
LIAO            553
LIM             485
LIN             5,770
LING            521

English
ADAMS           4,490
ALLEN           5,074
ANDERSON        10,719
BAILEY          2,431
BAKER           4,671
BELL            2,738
BENNETT         2,734
BROOKS          2,015
BROWN           11,662
BURNS           2,098
CAMPBELL        3,959
CARLSON         2,745
CARTER          2,658
CHANG           2,032
CLARK           5,493
COHEN           2,626
COLE            2,143
COLLINS         2,992
COOK            3,556
COOPER          3,045
COX             2,407
DAVIS           8,848
EDWARDS         3,375
EVANS           4,082
FISCHER         2,081
FISHER          2,748
FOSTER          2,616
FOX             1,990
GARDNER         2,412
GORDON          2,315
GRAHAM          2,042
GRAY            2,626
GREEN           3,540
HALL            4,907
HAMILTON        1,991
HANSON          2,148
HARRIS          4,793
HAYES           2,031
HILL            3,590
HOFFMAN         2,387
HOWARD          2,160
HUGHES          2,198
JACKSON         3,980
JENSEN          2,361
JOHNSON         17,960
JONES           10,630
KELLER          2,041
KELLY           2,775
KENNEDY         2,208
KING            4,686
KLEIN           2,347
LARSON          2,537

European
ABEL            269
ALBRECHT        564
ANTOS           230
AUERBACH        193
BAER            422
BAERLOCHER      252
BAUER           1,470
BECHTEL         179
BECK            1,712
BENDER          650
BERG            1,465
BERGER          1,304
BOEHM           256
BOUTAGHOU       266
CARON           290
CERAMI          172
CHANDRARATNA    229
CHEVALLIER      204
DIETRICH        312
DIETZ           496
EBERHARDT       192
EHRLICH         311
ERRICO          190
FARKAS          169
FERRARI         177
FISCHELL        280
FUCHS           394
GAISER          193
GELARDI         176
GRILLIOT        201
GUEGLER         179
GUNTER          177
GUNTHER         247
HAAS            843
HAMPEL          187
HANSEN          2,947
HARTMAN         1,214
HARTMANN        385
HAUSE           266
HECHT           245
HEINZ           168
HORODYSKY       230
HORVATH         387
IACOVELLI       287
JACOBS          1,962
KARR            196
KASPER          227
KEMPF           228
KNAPP           833
KNIFTON         206
KOENIG          521
KRESGE          179

Hispanic / Filipino
ACOSTA          171
AGUILAR         138
ALVAREZ         446
ANDREAS         128
AYER            166
AYRES           180
BALES           240
BLANCO          141
BOLANOS         130
BOLES           118
CABRAL          154
CABRERA         163
CALDERON        124
CASTANEDA       116
CASTILLO        124
CASTRO          119
CHAVEZ          194
CONTRERAS       137
CRUZ            319
CUEVAS          123
DAS             213
DELGADO         216
DIAS            174
DIAZ            584
DOMINGUEZ       195
DURAN           142
ELIAS           230
ESTRADA         142
FERNANDES       152
FERNANDEZ       546
FIGUEROA        146
FLORES          191
FREITAS         132
GAGNON          265
GARCIA          1,310
GARZA           167
GOMES           199
GOMEZ           413
GONSALVES       141
GONZALES        281
GONZALEZ        1,055
GUTIERREZ       601
GUZMAN          139
HALASA          202
HERNANDEZ       703
HERRERA         171
HERRON          450
HIDALGO         186
JIMENEZ         246
LEE             237
LOPEZ           738
MACHADO         135

Indian / Hindi
ACHARYA         338
AGARWAL         580
AGGARWAL        282
AGRAWAL         797
AHMAD           355
AHMED           652
AKRAM           640
ALI             559
ARIMILLI        432
ARORA           214
ASH             290
BALAKRISHNAN    228
BANERJEE        371
BASU            233
BHAT            224
BHATIA          411
BHATT           242
BHATTACHARYA    216
BHATTACHARYYA   265
BOSE            238
CHANDRA         221
CHATTERJEE      647
DAOUD           305
DAS             522
DATTA           424
DE              234
DESAI           974
DIXIT           256
DUTTA           338
GANDHI          228
GARG            345
GHOSH           661
GOEL            279
GUPTA           1,935
HASSAN          217
HUSSAIN         233
HUSSAINI        299
ISLAM           266
IYER            601
JAIN            912
JOSHI           886
KAMATH          219
KAPOOR          222
KHANNA          378
KRISHNAMURTHY   369
KRISHNAN        512
KULKARNI        299
KUMAR           2,005
LAL             366
MALIK           532
MATHUR          306
MEHROTRA        265

Table A1: Most Common US Ethnic Surnames (continued)

Chinese
LIU             6,406
LO              1,053
LU              2,289
LUO             815
MA              1,708
MAO             545
NG              1,132
ONG             473
PAN             1,435
PENG            530
SHEN            1,480
SHI             964
SHIH            938
SONG            636
SU              1,025
SUN             2,521
TAI             463
TAM             589
TAN             1,105
TANG            2,277
TENG            437
TONG            677
TSAI            1,244
TSANG           499
TSENG           538
TUNG            565
WANG            11,905
WEI             1,317
WEN             455
WONG            4,811
WOO             710
WU              5,521
XIE             609
XU              2,249
YAN             826
YANG            4,584
YAO             699
YE              525
YEE             729
YEH             928
YEN             467
YIN             617
YU              2,293
YUAN            825
ZHANG           4,532
ZHAO            1,337
ZHENG           1,037
ZHOU            1,517
ZHU             1,749

English
LEE             9,490
LEWIS           4,732
LONG            2,392
MARSHALL        2,088
MARTIN          6,773
MILLER          14,942
MITCHELL        3,075
MOORE           6,459
MORGAN          2,824
MORRIS          3,223
MURPHY          3,609
MURRAY          2,207
MYERS           2,625
NELSON          6,444
OLSON           3,140
PARKER          3,181
PETERSON        4,912
PHILLIPS        3,875
PRICE           2,062
REED            2,645
RICHARDSON      2,114
ROBERTS         4,352
ROBINSON        3,741
ROGERS          2,974
ROSS            2,377
RUSSELL         2,611
RYAN            2,404
SCOTT           3,583
SHAW            2,369
SIMPSON         2,014
SMITH           24,173
SNYDER          2,335
STEVENS         2,221
STEWART         2,924
SULLIVAN        2,933
TAYLOR          6,659
THOMAS          5,312
THOMPSON        6,424
TURNER          2,855
WALKER          4,887
WALLACE         1,963
WARD            2,913
WATSON          2,139
WHITE           6,190
WILLIAMS        10,442
WILSON          7,677
WOOD            4,525
WRIGHT          4,521
YOUNG           5,957

European
LANGE           757
LASKARIS        192
LEMELSON        324
LIOTTA          171
LORENZ          341
LUDWIG          500
LUTZ            679
MAIER           492
MARTIN          223
MAYER           1,097
MEYER           3,004
MOLNAR          335
MORIN           320
MUELLER         2,242
MULLER          985
NAGEL           383
NATHAN          171
NILSSEN         234
NOVAK           788
PAGANO          177
PALERMO         177
PASTOR          238
POPP            202
RAO             343
REITZ           248
ROHRBACH        246
ROMAN           362
ROSTOKER        245
SCHMIDT         3,753
SCHNEIDER       2,246
SCHULTZ         2,273
SCHULZ          921
SCHWARTZ        2,394
SCHWARZ         633
SPERANZA        215
SPIEGEL         177
STRAETER        454
THEEUWES        247
TROKHAN         167
VOCK            423
WACHTER         199
WAGNER          2,499
WEBER           3,003
WEDER           1,067
WEISS           1,533
WOLF            1,604
WRISTERS        185
ZIMMERMAN       1,542
ZIMMERMANN      226

Hispanic / Filipino
MARIN           177
MARQUEZ         117
MARTIN          183
MARTINEZ        1,112
MATIS           249
MEDINA          192
MENARD          149
MENDOZA         173
MIRANDA         140
MOLINA          129
MORALES         146
MORENO          128
MUNOZ           177
NUNEZ           207
ORTEGA          206
ORTIZ           362
PADILLA         116
PAZ DE ARAUJO   148
PEREIRA         280
PEREZ           675
QUINTANA        126
RAMIREZ         345
RAMOS           226
REGNIER         137
REIS            168
REYES           150
RIVERA          489
RODRIGUES       188
RODRIGUEZ       1,314
ROMERO          292
RUIZ            297
SALAZAR         179
SANCHEZ         717
SANTIAGO        158
SERRANO         172
SILVA           457
SOTO            158
SOUZA           145
SUAREZ          150
TORRES          352
VALDEZ          127
VARGA           130
VASQUEZ         153
VAZQUEZ         260
VELAZQUEZ       134
VINALS          220
YU              140
ZAMORA          120
ZUNIGA          128

Indian / Hindi
MEHTA           925
MENON           325
MISHRA          348
MISRA           282
MOOKHERJEE      272
MUKHERJEE       327
MURTHY          236
NAGARAJAN       270
NAIR            560
NARASIMHAN      225
NARAYAN         312
NARAYANAN       419
NATARAJAN       301
PAREKH          301
PARIKH          286
PATEL           3,879
PATIL           352
PRAKASH         326
PRASAD          549
PURI            233
RAGHAVAN        378
RAHMAN          367
RAJAGOPALAN     396
RAMACHANDRAN    388
RAMAKRISHNAN    270
RAMAN           222
RAMASWAMY       244
RAMESH          364
RANGARAJAN      244
RAO             1,196
REDDY           459
ROY             279
SANDHU          878
SAXENA          213
SHAH            2,467
SHARMA          1,249
SINGH           2,412
SINGHAL         245
SINHA           463
SIRCAR          225
SRINIVASAN      876
SRIVASTAVA      498
SUBRAMANIAN     702
THAKUR          381
TRIVEDI         383
VENKATESAN      281
VERMA           262
VISWANATHAN     218
VORA            223

Table A1: Most Common US Ethnic Surnames (continued)

Japanese
AOKI            141
AOYAMA          66
ASATO           73
CHEN            88
DOI             90
FUJII           92
FUJIMOTO        98
FUKUDA          84
FURUKAWA        218
HANAWA          69
HARADA          90
HASEGAWA        171
HASHIMOTO       110
HAYASHI         148
HEY             75
HIGASHI         98
HIGUCHI         81
HONDA           102
IDE             136
IKEDA           98
IMAI            129
INOUE           90
IRICK           86
ISHIDA          93
ISHII           82
ISHIKAWA        208
ITO             260
IWAMOTO         78
KANEKO          157
KATO            113
KAUTZ           87
KAWAMURA        87
KAWASAKI        104
KAYA            78
KIMURA          108
KINO            74
KINOSHITA       93
KIRIHATA        107
KISHI           65
KIWALA          132
KOBAYASHI       296
LI              75
LIU             84
MAKI            167
MATSUMOTO       147
MIYANO          70
MIZUHARA        87
MORI            128
MORITA          64
MOSLEHI         165
MOTOYAMA        130
MURAKAMI        67

Korean
AHN             610
BAE             122
BAEK            77
BAK             68
BANG            91
BARK            39
BYUN            87
CHA             45
CHAE            33
CHANG           289
CHIN            33
CHO             977
CHOE            193
CHOI            1,081
CHON            33
CHOO            94
CHUN            330
CHUNG           1,499
DROZD           45
EYUBOGLU        36
GANG            34
GU              533
HAHM            42
HAHN            1,016
HAM             45
HAN             145
HANSELL         39
HOGLE           43
HONE            78
HONG            907
HOSKING         63
HUH             32
HWANG           108
HYUN            54
IM              80
JANG            46
JEON            134
JEONG           122
JI              268
JIN             673
JO              41
JOO             68
JU              55
JUNG            582
KANG            809
KIANI           74
KIM             5,455
KO              595
KOO             214
KUN             63
KWAK            96
KWON            298

Russian
AGHAJANIAN      77
ALPEROVICH      64
ALTSHULER       71
ANDREEV         94
ANSCHER         95
BABICH          79
BABLER          73
BARINAGA        72
BARNA           96
BELOPOLSKY      71
BERCHENKO       94
BLASKO          79
BLONDER         82
BONIN           97
CODILIAN        90
COMISKEY        74
DAMADIAN        118
DANKO           69
DAYAN           143
DERDERIAN       169
DOMBROSKI       66
ELKO            81
FETCENKO        62
FISHKIN         82
FOMENKOV        73
FRENKEL         71
FRIDMAN         67
FROLOV          68
GARABEDIAN      104
GELFAND         139
GINZBURG        73
GITLIN          73
GLUSCHENKOV     73
GORALSKI        69
GORDIN          65
GORIN           99
GRINBERG        104
GROCHOWSKI      77
GUREVICH        107
GURSKY          89
GUZIK           79
HABA            96
HYNECEK         82
IBRAHIM         229
IVANOV          165
IVERS           66
JOVANOVIC       65
JU              126
JUHASZ          71
KAHLE           173
KAMINSKI        393
KAMINSKY        150

Vietnamese
ABOU-GHARBIA    22
BAHN            15
BANH            21
BI              158
BICH            18
BIEN            91
BUI             309
CAN             19
CONG            41
DANG            23
DIEM            24
DIEP            52
DINH            232
DIP             11
DO              13
DOAN            616
DOMINH          33
DONLAN          21
DOVAN           26
DUAN            241
DUE             20
DUONG           153
DUONG-VAN       13
ESKEW           12
GRAN            20
HAC             20
HAUGAN          16
HO              35
HOANG           277
HOPPING         15
HUYNH           317
HUYNH-BA        19
KHA             13
KHAW            20
KHIEU           35
KHU             13
KHUC            15
LAHUE           17
LAURSEN         72
LAVAN           18
LE              1,263
LE ROY          29
LEEN            75
LEMINH          17
LUONG           107
LY              118
MINH            41
NELLUMS         17
NGO             735
NGUY            12
NGUYEN          4,720
NHO             12

Table A1: Most Common US Ethnic Surnames (continued)

Japanese
NAJJAR          81
NAKAGAWA        125
NAKAJIMA        99
NAKAMURA        187
NAKANISHI       64
NAKANO          104
NEMOTO          70
NISHIBORI       88
NISHIMURA       131
NODA            107
OGAWA           74
OGURA           209
OHARA           269
OHKAWA          89
OKADA           87
OKAMOTO         103
ONO             148
OVSHINSKY       314
SAITO           136
SAKAI           79
SASAKI          209
SATO            231
SETO            73
SHIMIZU         103
SUZUKI          306
TAKAHASHI       245
TAKEUCHI        242
TAMURA          83
TANAKA          328
THOR            66
TSUJI           92
TSUKAMOTO       89
UCHIDA          72
UEDA            72
WADA            153
WANG            81
WATANABE        416
WU              67
YAMADA          180
YAMAGUCHI       102
YAMAMOTO        432
YAMASAKI        67
YAMASHITA       105
YAMAZAKI        91
YANG            65
YASUDA          75
YOSHIDA         178
YUAN            112
ZHAO            81

Korean
LEE             1,032
LIM             135
MENNIE          96
MIN             242
NA              34
NAM             68
NEVINS          42
NYCE            56
OH              461
PAEK            41
PAIK            144
PAK             116
PARK            2,145
QUAY            107
RHEE            191
RIM             57
RYANG           38
RYU             99
SAHM            45
SAHOO           58
SEO             47
SHIM            162
SHIN            399
SHINN           96
SIN             62
SJOSTROM        39
SO              332
SOHN            78
SON             147
SONG            105
SUE             64
SUH             311
SUK             75
SUNG            41
SUR             38
TOOHEY          33
UM              36
WHANG           175
WON             108
YI              237
YIM             145
YOHN            32
YOO             290
YOON            614
YOUN            38
YU              198
YUH             96
YUM             78
YUN             222

Russian
KANEVSKY        114
KAPLINSKY       69
KAPOSI          72
KHAN            104
KHANDROS        161
KHOVAYLO        69
KOLMANOVSKY     70
KORSUNSKY       153
KOWAL           74
LAPIDUS         63
LEE             113
LOPATA          113
MESSING         74
METLITSKY       95
MIKHAIL         115
MIRKIN          66
MOGHADAM        72
NADELSON        65
NAZARIAN        75
NEMIROVSKY      73
NIE             72
OGG             125
PAPADOPOULOS    132
PAPATHOMAS      67
PETROV          102
PINARBASI       131
PINCHUK         123
POPOV           81
PROKOP          86
RABER           78
RABINOVICH      123
ROBICHAUX       65
RUBSAMEN        69
SAHATJIAN       66
SARKISIAN       65
SARRAF          82
SCHREIER        62
SCHWAN          81
SIMKO           77
SMETANA         69
SOFRANKO        66
SOKOLOV         91
SORKIN          111
TABAK           85
TEPMAN          80
TERZIAN         87
VASHCHENKO      96
WASILEWSKI      80
ZEMEL           126

Vietnamese
NIEH            69
NIM             14
PHAM            901
PHAN            27
PHANG           11
PHY             19
POSTMAN         12
QUACH           95
QUI             11
QUY             13
ROCH            26
TA              91
TAKACH          30
TAU             23
THACH           33
THAI            86
THAO            21
THI             13
THIEN           15
THUT            28
TIEDT           14
TIEP            12
TIETJEN         59
TO              76
TON-THAT        16
TRAN            2,050
TRANDAI         14
TRANG           34
TRANK           11
TRIEU           49
TRONG           12
TRUC            27
TU              545
TUTEN           23
TUY             16
TY              27
VAN             58
VAN CLEVE       40
VAN DAM         20
VAN LE          17
VAN NGUYEN      29
VAN PHAN        26
VAN TRAN        15
VIET            11
VO              269
VO-DINH         32
VOVAN           20
VU              502
VUONG           107