The building blocks of economic complexity
César A. Hidalgo¹ and Ricardo Hausmann
Center for International Development and Harvard Kennedy School, Harvard University, Cambridge, MA 02138
Edited by Partha Sarathi Dasgupta, University of Cambridge, Cambridge, United Kingdom, and approved May 1, 2009 (received for review January 28, 2009)
PNAS | June 30, 2009 | vol. 106 | no. 26 | 10570–10575
For Adam Smith, wealth was related to the division of labor. As
people and firms specialize in different activities, economic efficiency increases, suggesting that development is associated with
an increase in the number of individual activities and with the
complexity that emerges from the interactions between them.
Here we develop a view of economic growth and development
that gives a central role to the complexity of a country’s economy
by interpreting trade data as a bipartite network in which countries
are connected to the products they export, and show that it is
possible to quantify the complexity of a country’s economy by
characterizing the structure of this network. Furthermore, we
show that the measures of complexity we derive are correlated
with a country’s level of income, and that deviations from this
relationship are predictive of future growth. This suggests that
countries tend to converge to the level of income dictated by the
complexity of their productive structures, indicating that development efforts should focus on generating the conditions that would
allow complexity to emerge to generate sustained growth and
prosperity.
economic development | networks
For Adam Smith, the secret to the wealth of nations was related
to the division of labor. As people and firms specialize in
different activities, economic efficiency increases. This division of
labor, however, is limited by the extent of the market: The bigger
the market, the more its participants can specialize and the deeper
the division of labor that can be achieved. This suggests that wealth
and development are related to the complexity that emerges from
the interactions between the increasing number of individual
activities that make up an economy (1–3).
Now, if all countries are connected to each other through a global
market for inputs and outputs so that they can exploit a division of
labor at the global scale, why have differences in Gross Domestic
Product (GDP) per capita exploded over the past 2 centuries? (4,
5, *) One possible answer is that some of the individual activities
that arise from the division of labor described above cannot be
imported, such as property rights, regulation, infrastructure, specific labor skills, etc., and so countries need to have them locally
available to produce. Hence, the productivity of a country resides
in the diversity of its available nontradable “capabilities,” and
therefore, cross-country differences in income can be explained by
differences in economic complexity, as measured by the diversity of
capabilities present in a country and their interactions.
During the last 20 years, models of economic growth have often
included the assumption that the variety of inputs that go into the
production of the goods produced by a country affects that country’s overall productivity (3, 6). There have been very few attempts,
however, to bring this intuition to the data. In fact, the most
frequently cited surveys of the empirical literature do not incorporate a single reference to any measure of diversity of inputs or
complexity (7).
We can create indirect measures of the capabilities available in
a country by thinking of each capability as a building block or Lego
piece. In this analogy, a product is equivalent to a Lego model, and
a country is equivalent to a bucket of Legos. Countries will be able
to make products for which they have all of the necessary capabilities, just like a child is able to produce a Lego model if the child’s
bucket contains all of the necessary Lego pieces. Using this analogy,
the question of economic complexity is equivalent to asking
whether we can infer properties such as the diversity and exclusivity
of the Lego pieces inside a child’s bucket by looking only at the
models that a group of children, each with a different bucket of
Legos, can make. Here we show that this is possible if we interpret
data connecting countries to the products they export as a bipartite
network and assume that this network is the result of a larger,
tripartite network, connecting countries to the capabilities they
have and products to the capabilities they require (Fig. 1A). Hence,
connections between countries and products signal the availability
of capabilities in a country just like the creation of a model by a child
signals the availability of a specific set of Lego pieces.
Note that this interpretation says nothing of the processes
whereby countries accumulate capabilities and the characteristics of
an economy that might affect them. It just attempts to develop
measures of the complexity of a country’s economy at a point in
time. However, the approach presented here can be seen as a
building block of a theory that accounts for the process by which
countries accumulate capabilities. A detailed analysis of capability
accumulation is beyond the scope of this article but the implications
of our approach will be discussed briefly in Discussion.
In this article we develop a method to characterize the structure
of bipartite networks, which we call the Method of Reflections, and
apply it to trade data to illustrate how it can be used to extract
relevant information about the availability of capabilities in a
country. We interpret the variables produced by the Method of
Reflections as indicators of economic complexity and show that the
complexity of a country’s economy is correlated with income and
that deviations from this relationship are predictive of future
growth, suggesting that countries tend to approach the level of
income associated with the capability set available in them. We
validate our measures of the capabilities available in a country by
introducing a model and by showing empirically that our metrics are
strongly correlated with the diversity of the labor inputs used in the
production of a country’s goods, approximated by using data on the
use of labor inputs in the United States. Finally, we show that the
level of complexity of a country’s economy predicts the types of
products that countries will be able to develop in the future,
suggesting that the new products that a country develops depend
substantially on the capabilities already available in that country.
Author contributions: C.A.H. and R.H. designed research, performed research, contributed
new reagents/analytic tools, analyzed data, and wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
¹To whom correspondence should be addressed. E-mail: cesar_hidalgo@ksg.harvard.edu.
*In ref. 4, Maddison presents GDP per capita measures for 60 countries since 1820. In that
year, the ratio of the 95th to the 5th percentile was 3.18 but it increased to 17.82 by the
year 2000. Today, the U.S. GDP per capita is >60 times higher than Malawi’s.
This article contains supporting information online at www.pnas.org/cgi/content/full/0900943106/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.0900943106

[Fig. 1 graphic. (A) Tripartite network linking countries (c1–c3) to capabilities (a1–a3) and capabilities to products (p1–p3), shown next to the resulting bipartite country–product network. (B) Country–product network for MYS, PAK, JPN, and PHL; node color gives the SITC-4 category: 0–999 Food & live animals; 1000–1999 Beverages & tobacco; 2000–2999 Raw materials; 3000–3999 Mineral fuels, lubricants & related materials; 4000–4999 Animal & vegetable oils, fats & waxes; 5000–5999 Chemicals; 6000–6999 Manufactured goods by material; 7000–7999 Machinery & transport equipment; 8000–8999 Miscellaneous manufactured articles; 9000–9999 Miscellaneous. (C) Scatter plot of kc,1 against kc,0 with 4 labeled quadrants: Non-Diversified Countries Producing Standard Products, Diversified Countries Producing Standard Products, Non-Diversified Countries Producing Exclusive Products, and Diversified Countries Producing Exclusive Products.]

Methods
We look at country–product associations by using international
trade data with products disaggregated according to 3 alternative
data sources and classifications: first, the Standard International
Trade Classification (SITC) revision 4 at the 4-digit level (see ref.
8; the data are available at www.nber.org/data,
http://cid.econ.udavis.edu/data/undata/undata.html, and
www.chidalgo.com/productspace/data.html); second, the COMTRADE Harmonized
System at the 4-digit level; and third, the North American Industry
Classification System (NAICS) at the 6-digit level (SI Appendix,
Section 1). We interpret these data as bipartite networks in which
countries are connected to the products they export (Fig. 1B).
Mathematically, we represent this network using the adjacency
matrix Mcp, where Mcp = 1 if country c is a significant exporter of
product p and 0 otherwise. We consider country c to be a significant
exporter of product p if its Revealed Comparative Advantage
(RCA) (the ratio of the share of product p in the export basket of
country c to the share of product p in world trade) is at or above some
threshold value, which we take as 1 in this exercise (RCAcp ≥ 1) (see SI
Appendix, Section 2).
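The RCA threshold described in Methods can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names `rca_matrix` and `adjacency_from_exports`, the toy export values, and the choice to set RCA to 0 where a country or product has no recorded exports are all assumptions made here.

```python
import numpy as np

def rca_matrix(exports):
    """Revealed Comparative Advantage for a (countries x products) array of export values."""
    X = np.asarray(exports, dtype=float)
    country_totals = X.sum(axis=1, keepdims=True)   # total exports of each country
    product_totals = X.sum(axis=0, keepdims=True)   # world exports of each product
    world_total = X.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        share_in_basket = X / country_totals            # share of p in country c's basket
        share_in_world = product_totals / world_total   # share of p in world trade
        rca = share_in_basket / share_in_world
    # Assumption: undefined ratios (empty rows or columns) are treated as RCA = 0.
    return np.nan_to_num(rca, nan=0.0, posinf=0.0)

def adjacency_from_exports(exports, threshold=1.0):
    """M[c, p] = 1 if RCA[c, p] >= threshold, else 0."""
    return (rca_matrix(exports) >= threshold).astype(int)

# Toy data (hypothetical values): 3 countries x 3 products.
exports = np.array([
    [100.0,  0.0, 50.0],
    [ 10.0, 40.0,  0.0],
    [  5.0,  5.0, 90.0],
])
M = adjacency_from_exports(exports)
print(M)
```

In this toy example each country is a significant exporter of exactly one product, even though the first country also exports a sizeable amount of the third product: its share of that product in its own basket (1/3) is below the product's share in world trade (140/300).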
Method of Reflections. We characterize countries and products by
introducing a family of variables capturing the structure of the
network defined by Mcp (SI Appendix, Section 3). Because of the
symmetry of the bipartite network, we refer to this technique as the
‘‘Method of Reflections,’’ as the method produces a symmetric set
of variables for the 2 types of nodes in the network (countries and
products).
The Method of Reflections consists of iteratively calculating the
average value of the previous-level properties of a node’s neighbors
and is defined as the set of observables:
$$k_{c,N} = \frac{1}{k_{c,0}} \sum_{p} M_{cp}\, k_{p,N-1}, \tag{1}$$

$$k_{p,N} = \frac{1}{k_{p,0}} \sum_{c} M_{cp}\, k_{c,N-1}, \tag{2}$$

for N ≥ 1, with initial conditions given by the degree, or number
of links, of countries and products:

$$k_{c,0} = \sum_{p} M_{cp}, \tag{3}$$

$$k_{p,0} = \sum_{c} M_{cp}. \tag{4}$$

Fig. 1. Quantifying countries’ economic complexity.
(A) A country will be able to produce a product if it has
all of the required capabilities; hence the bipartite
network connecting countries to products is a result of
the tripartite network connecting countries to their
available capabilities and products to the capabilities
they require. (B) Network visualization of a subset of
Mcp in which we show Malaysia (MYS), Pakistan (PAK),
Philippines (PHL), Japan (JPN), and all of the products
exported by them in the year 2000 (colored circles),
illustrating how countries and products are connected
in Mcp. (C) kc,0–kc,1 diagram divided into 4 quadrants
defined by the empirically observed averages ⟨kc,0⟩ and
⟨kc,1⟩.

kc,0 and kp,0 represent, respectively, the observed levels of diversification of a country (the number of products exported by that
country), and the ubiquity of a product (the number of countries
exporting that product). Hence, we characterize each country
through the vector $\vec{k}_c = (k_{c,0}, k_{c,1}, k_{c,2}, \ldots, k_{c,N})$ and each product by
the vector $\vec{k}_p = (k_{p,0}, k_{p,1}, k_{p,2}, \ldots, k_{p,N})$.
For countries, even variables (kc,0,kc,2,kc,4, . . . ) are generalized
measures of diversification, whereas odd variables (kc,1,kc,3,kc,5, . . . )
are generalized measures of the ubiquity of their exports. For
products, even variables are related to their ubiquity and the
ubiquity of other related products, whereas odd variables are
related to the diversification of countries exporting those products.
In network terms, kc,1 and kp,1 are known as the average nearest
neighbor degree (9, 10). Higher order variables, however, (N > 1)
can be interpreted as a linear combination of the properties of all
of the nodes in the network with coefficients given by the probability that a random walker that started at a given node ends up at
another node after N steps (see SI Appendix, Section 4).
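Eqs. 1–4 translate directly into matrix products. The sketch below is one possible NumPy implementation, not the authors' code; the function name `method_of_reflections` and the toy matrix are hypothetical, and it assumes every country and every product has at least one link so that no division by zero occurs.

```python
import numpy as np

def method_of_reflections(M, n_iter):
    """Return (kc, kp): column N of kc holds k_{c,N}, column N of kp holds k_{p,N}."""
    M = np.asarray(M, dtype=float)
    kc0 = M.sum(axis=1)   # Eq. 3: diversification of each country
    kp0 = M.sum(axis=0)   # Eq. 4: ubiquity of each product
    kc, kp = [kc0], [kp0]
    for _ in range(n_iter):
        # Both updates use the previous level of the *other* node type.
        kc_next = (M @ kp[-1]) / kc0     # Eq. 1
        kp_next = (M.T @ kc[-1]) / kp0   # Eq. 2
        kc.append(kc_next)
        kp.append(kp_next)
    return np.column_stack(kc), np.column_stack(kp)

# Toy network (hypothetical): country 0 exports all 3 products, country 2 only the first.
M = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
])
kc, kp = method_of_reflections(M, n_iter=2)
print(kc)
```

In this toy network kc,0 = (3, 2, 1) and kc,1 = (2, 2.5, 3): the most diversified country exports, on average, the least ubiquitous products, which is the direction of the relationship discussed in Results.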
Results
We can begin understanding the type of information about countries captured by the Method of Reflections by looking at where
countries are located in the space defined by the first two sets of
variables produced by our method: kc,0 and kc,1. Fig. 1C shows that
there is a strong negative correlation between kc,0 and kc,1 (10, 11),
meaning that diversified countries tend to export less ubiquitous
products. Deviations from this behavior, however, are informative.
For example, whereas Malaysia and Pakistan export the same
number of products, the products exported by Malaysia (kMYS,0 =
104, kMYS,1 = 18) are exported by fewer countries than those
exported by Pakistan (kPAK,0 = 104, kPAK,1 = 27.5). Combining this
fact with our third level of analysis, we see that Malaysian products
are exported by more diversified countries than the exports of
Pakistan (kMYS,2 = 163, kPAK,2 = 142; SI Appendix, Section 8). This
suggests that the productive structure of Malaysia is more complex
than that of Pakistan, due, as we will show shortly, to a larger
number of capabilities available in Malaysia than in Pakistan.

[Fig. 2 graphic. (A) Schematic of the matrices Cca (countries × capabilities), Πpa (products × capabilities), and Mcp (countries × products). (B) kc,1 vs. kc,0 for 4 model runs; panel labels include r = 0.7, q = 0.05; r = 0.55, q = 0.1; Na = 50; and Na = 200. (C) kc,0 and kc,1 vs. number of capabilities. (D) Average number of labor inputs vs. kc,0, kc,1, and kc,2, with countries labeled by 3-letter code.]
Fig. 2. Capabilities and bipartite network structure. (A) We model the structure of Mcp by taking 2 random matrices representing the availability of capabilities
in a country and the requirement of capabilities by products and consider that countries are able to produce products if they have all of the required capabilities.
(B) The kc,0–kc,1 diagrams that emerge from 4 implementations of the model described in A. (C) kc,0 and kc,1 as a function of the number of capabilities (Nc) available
in countries for 2 implementations of the model. (D) Average number of labor inputs required by products produced in a country as a function of the first 3
components of $\vec{k}_c$.

In SI Appendix we show that the negative relationship presented
in the kc,0–kc,1 diagram is not a consequence of variations in the level
of diversification of countries and in the ubiquity of products. We
prove this by creating 4 null models (11) that control, with increasing stringency, for the diversification of countries and the ubiquity
of products and show that these distributions, per se, are not
responsible for the negative relationship observed in the data (see
SI Appendix, Section 6).
Minimalistic Model. We show that the location of countries in the
kc,0–kc,1 diagram is informative about the capabilities available in a
country by introducing a simple model based on the assumption
that country c will be able to produce product p if it has all of the
required capabilities (Fig. 2A).
We implement this model by considering a fixed number of
capabilities in each country and represent this by using a matrix Cca,
that is equal to 1 if country c has capability a and 0 otherwise. We
represent the relationship between capabilities and the products
that require them by a matrix Πpa whose elements are equal to 1 if
product p requires capability a and 0 otherwise.
Using the notation introduced above, together with our only
assumption, we can model the structure of the Mcp matrix as:
$$M_{cp} = \begin{cases} 1 & \text{if } \sum_{a} \Pi_{pa} = \sum_{a} \Pi_{pa} C_{ca}, \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$
The simplest implementation of this model is to consider Cca = 1
with probability r and 0 with probability 1 − r and Πpa = 1 with
probability q and 0 with probability 1 − q. An emergent property
of the matrix resulting from this model is that the average ubiquity
of a country’s products tends to decrease with its level of diversification for a wide range of parameters (Fig. 2B). We interpret this
negative relationship by considering that countries with many
capabilities will be more diversified, because they can produce a
wider set of products, and that because they can make products
requiring many capabilities, few other countries will have all of the
requisite capabilities to make them, hence diversified countries will
be able to make less ubiquitous products.
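The link between capabilities and diversification in this simplest implementation can also be made explicit with a short calculation. The derivation below is a sketch that assumes all entries of Cca and Πpa are drawn independently; the symbols N_a (number of capabilities), N_p (number of products), and n_c (number of capabilities held by country c) are notation introduced here for illustration.

```latex
% Eq. 5 holds exactly when no capability is both required by p and missing in c:
M_{cp} = 1 \iff \Pi_{pa}\,(1 - C_{ca}) = 0 \quad \text{for all } a .

% A capability is required with probability q and missing with probability 1 - r, so
\Pr(M_{cp} = 1) = \bigl(1 - q\,(1 - r)\bigr)^{N_a} .

% For a country that holds n_c of the N_a capabilities, none of the
% N_a - n_c missing ones may be required by the product:
\Pr(M_{cp} = 1 \mid n_c) = (1 - q)^{N_a - n_c},
\qquad
\mathbb{E}\bigl[k_{c,0} \mid n_c\bigr] = N_p\,(1 - q)^{N_a - n_c} .
```

Under these assumptions expected diversification grows geometrically with the number of capabilities a country holds: with q = 0.05, each additional capability multiplies it by 1/(1 − q), or about 1.05.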
The model allows us to test directly whether given this set of
assumptions we should expect countries with more capabilities to be
more diversified and produce less ubiquitous products. Fig. 2C
shows that, in the model, the diversity of a country increases with
the number of capabilities it possesses, whereas the ubiquity of a
country’s products is a decreasing function of the number of
capabilities available in that country, providing further theoretical
evidence that $\vec{k}_c$ captures information on the availability of capabilities in a country, and therefore about the complexity of its
economy.
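The model is simple enough to simulate directly. The following sketch is an illustration under stated assumptions, not the authors' implementation: the function name `simulate_model` is hypothetical, the values r = 0.7, q = 0.05, and 50 capabilities are taken from the labels of Fig. 2, and the numbers of countries (100) and products (500) and the random seed are arbitrary choices.

```python
import numpy as np

def simulate_model(n_countries, n_products, n_capabilities, r, q, seed=0):
    """Draw random C (countries x capabilities) and Pi (products x capabilities), build M via Eq. 5."""
    rng = np.random.default_rng(seed)
    C = (rng.random((n_countries, n_capabilities)) < r).astype(int)
    Pi = (rng.random((n_products, n_capabilities)) < q).astype(int)
    required = Pi.sum(axis=1)   # sum_a Pi_pa: capabilities each product needs
    available = C @ Pi.T        # sum_a Pi_pa * C_ca: how many of them each country has
    M = (available == required[np.newaxis, :]).astype(int)
    return C, Pi, M

C, Pi, M = simulate_model(100, 500, 50, r=0.7, q=0.05)

n_cap = C.sum(axis=1)                   # capabilities held by each country
kc0 = M.sum(axis=1)                     # diversification
kp0 = M.sum(axis=0)                     # ubiquity
kc1 = (M @ kp0) / np.maximum(kc0, 1)    # average ubiquity of a country's products

print("corr(capabilities, kc0):", np.corrcoef(n_cap, kc0)[0, 1])
print("corr(capabilities, kc1):", np.corrcoef(n_cap, kc1)[0, 1])
```

The two printed correlations correspond to the two relationships plotted in Fig. 2C. The `np.maximum(kc0, 1)` guard only matters for parameter choices in which some country ends up producing nothing.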
Fig. 3. Bipartite network structure and income (all GDPs have been adjusted by purchasing power parity, PPP). A–E were constructed with data from the year
2000. (A–C) GDP per capita adjusted by PPP as a function of our first 3 measures of diversification (kc,0, kc,2, kc,4), normalized by subtracting
their respective means (⟨kc,N⟩) and dividing them by their standard deviations (stdev(kc,N)). (A) kc,0. (B) kc,2. (C) kc,4. (D) Comparison between the ranking of
countries based on successive measures of diversification (kc,2N). (E) Absolute value of the Pearson correlation between the log GDP per capita at PPP of countries
and their local network structure characterized by kc,N. (F) Growth in GDP per capita at PPP observed between 1985 and 2005 as a function of growth predicted
from kc,18 and kc,19 measured in 1985 and controlling for GDP per capita at PPP in 1985.
Direct Measurement of a Subset of Capabilities. We provide empirical
evidence that the Method of Reflections extracts information
that is related to the capabilities available in a country by looking
at a measurable subset of the capabilities required by products. Fig.
2D shows the average number of different employment categories
required by products exported by countries versus kc,0, kc,1, and kc,2.
We measure the number of employment categories that go into a
product by using the data of the U.S. Bureau of Labor Statistics (see
SI Appendix, Section 1). These data should work against us, because
we are disregarding the fact that other countries may use different
technologies to produce goods that are similarly classified†. Despite
this, we find a strong positive correlation between the average
number of employment categories going into the export basket of
countries and our family of measures of diversification
(kc,0, kc,2, kc,4, . . . , kc,2N). We also find a negative correlation between the average number of employment categories and measures
of the ubiquity of products made by a country
(kc,1, kc,3, kc,5, . . . , kc,2N+1) (Fig. 2D). This shows that more diversified countries indeed produce more complex products, in the sense
that they require a wider combination of human capabilities, and
that $\vec{k}_c$ is able to capture this information.

†Indeed, it is common for poorer countries to exchange labor for capital. For example,
building a road in the US is done by a relatively small team of workers, each of them
specialized to operate a different machine or technique, whereas more modest economies
will tend to use more workers, yet less specialized ones, because the relative cost of
machines to labor is larger in poorer economies. Hence we should expect poor countries
to use fewer labor inputs in the production of products than what would be reported from
U.S. labor data, accentuating the effect presented in Fig. 2D.

[Fig. 4 graphic. Four scatter plots of countries (3-letter codes) with linear fits. x axes: kc,0 (panels A and C) or kc,1 (panels B and D). y axes: ⟨kp,0⟩ of new exports (panels A and B) or ⟨kp,1⟩ of new exports (panels C and D). Annotated Pearson correlations: −0.73 (A), 0.63 (B), 0.59 (C), and −0.54 (D).]
Fig. 4. Path dependent development. Average network properties (⟨kp,0⟩, ⟨kp,1⟩;
measured in 1992) of the new exports developed by a country between 1992 and 2000 as
a function of the diversification of a country
kc,0 and the average ubiquity of its products
kc,1 measured in 1992. (A) kc,0 vs. ⟨kp,0⟩. (B) kc,1
vs. ⟨kp,0⟩. (C) kc,0 vs. ⟨kp,1⟩. (D) kc,1 vs. ⟨kp,1⟩.

Complexity of the Productive Structure, Income and Growth. We show
that the information extracted by the Method of Reflections is
connected to income by looking at the first 3 measures of diversification of a country (kc,0, kc,2, kc,4) versus GDP per capita adjusted
for Purchasing Power Parity (PPP) (Fig. 3 A–C). To make these 3
different measures comparable we have normalized them by subtracting their respective means (⟨kN⟩) and dividing them by their
respective standard deviations (stdev(kN)). As we iterate the
method the relative ranking of countries defined by these variables
shifts (Fig. 3D and SI Appendix, Fig. S14), making our measures of
diversification and ubiquity increasingly correlated with income (Fig. 3E and SI Appendix, Section 11). This can be illustrated
by looking at the position, in the kc,N–GDP diagrams, of 3 countries
that exported a similar number of products in the year 2000 despite
large differences in income: Pakistan (PAK), Chile (CHL),
and Singapore (SGP) (Fig. 3 A–C). Higher reflections of our method
are able to correctly differentiate the income level of these countries
because they incorporate information about the ubiquity of the
products they export and about the diversification of other countries connected indirectly to them in Mcp, altering their relative
rankings (Fig. 3D and SI Appendix, Fig. S14). For example, kc,2 is
able to correctly separate Singapore, Chile and Pakistan, because
it considers that in the bipartite network Singapore is connected to
diversified countries mainly through nonubiquitous products, signaling the availability in Singapore of capabilities that are required
to produce goods in diversified countries. In contrast, Pakistan is
connected mostly to poorly diversified countries, and most of its
connections are through ubiquitous products, indicating that Pakistan has capabilities that are available in most countries and that its
relatively high level of diversification is probably due to its relatively
large population, rather than to the complexity of its productive
structure. Indeed, we find the method of reflections to be an
accurate way to control for a country’s population, as correlations
between $\vec{k}_c$ and population decrease rapidly as we iterate the
method (see SI Appendix, Section 11), whereas correlations between $\vec{k}_c$ and GDP increase as we iterate the method. This is another
piece of evidence suggesting that the information captured by our
method is related to factors that affect the ability to generate per
capita income.
Deviations from the correlation between $\vec{k}_c$ and income are good
predictors of future growth, indicating that countries tend to
approach the levels of income that correspond to their measured
complexity. We show this by regressing the rate of growth of income
per capita on successive generations of our measures of economic
complexity (i.e., kc,0,kc,1 or kc,10,kc,11) and on a country’s initial level
of income
$$\log\left(\frac{\mathrm{GDP}(t+\Delta t)}{\mathrm{GDP}(t)}\right) = a + b_1\,\mathrm{GDP}(t) + b_2\,k_{c,N}(t) + b_3\,k_{c,N+1}(t),$$
finding that successive generations of the variables constructed in
the previous section are increasingly good predictors of growth. In
SI Appendix, Section 13, we present regression tables showing that
these results are valid for a 20-year period (1985–2005), two 10-year
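
The growth regression above can be sketched as an ordinary least squares fit. This is an illustration only: the text does not state the estimator in this section, so OLS via `numpy.linalg.lstsq` is an assumption, the function name `growth_regression` is hypothetical, and the data below are synthetic values generated for the example, not the paper's data.

```python
import numpy as np

def growth_regression(gdp_start, gdp_end, k_n, k_n_plus_1):
    """OLS fit of log(GDP(t+dt)/GDP(t)) = a + b1*GDP(t) + b2*k_{c,N}(t) + b3*k_{c,N+1}(t)."""
    gdp_start = np.asarray(gdp_start, dtype=float)
    gdp_end = np.asarray(gdp_end, dtype=float)
    y = np.log(gdp_end / gdp_start)
    X = np.column_stack([np.ones_like(y), gdp_start, k_n, k_n_plus_1])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # coef = (a, b1, b2, b3)
    residuals = y - X @ coef
    r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return coef, r_squared

# Synthetic example (all values made up): 60 countries, GDP in thousands of dollars,
# complexity measures already normalized to zero mean and unit variance.
rng = np.random.default_rng(0)
n = 60
gdp_start = rng.uniform(1.0, 30.0, size=n)
k_n = rng.normal(size=n)
k_n_plus_1 = rng.normal(size=n)
true_coef = np.array([0.2, -0.01, 0.15, -0.05])
log_growth = (true_coef[0] + true_coef[1] * gdp_start + true_coef[2] * k_n
              + true_coef[3] * k_n_plus_1 + rng.normal(scale=0.05, size=n))
gdp_end = gdp_start * np.exp(log_growth)

coef, r2 = growth_regression(gdp_start, gdp_end, k_n, k_n_plus_1)
print(coef, r2)
```

Repeating the fit for successive pairs (kc,N, kc,N+1) and comparing the resulting fits is one way to examine whether higher generations of the variables predict growth better.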
ACKNOWLEDGMENTS. We thank M. Andrews, A.-L. Barabási, B. Klinger, M.
Kremer, N. Nunn, L. Pritchett, R. Rigobon, D. Rodrik, M. Yildirim, R. Zeckhauser,
participants at the Center for International Development’s Seminar on Economic
Policy and the Harvard Kennedy School Faculty Seminar, members of the Center
for Complex Network Research at Northeastern University, and the Ratatouille
Seminar Series. We acknowledge support from the Growth Lab and the Empowerment Lab at the Center for International Development.

1. Smith A (1776) An Inquiry into the Nature and Causes of the Wealth of Nations (W. Strahan and T. Cadell, London).
2. Romer P (1990) Endogenous technological change. J Pol Econ 98:S71–S102.
3. Grossman GM, Helpman E (1991) Quality ladders in the theory of growth. Rev Econ Stud 58:43–61.
4. Maddison A (2001) The World Economy: A Millennial Perspective (Development Centre of the OECD, Paris).
5. Pritchett L (1997) Divergence, big time. J Econ Perspec 11:3–18.
6. Aghion P, Howitt PW (1998) Endogenous Growth Theory (MIT Press, Cambridge, MA).
7. Barro RJ, Sala-i-Martin X (2003) Economic Growth (MIT Press, Cambridge, MA).
8. Feenstra RC, Lipsey RE, Deng H, Ma AC, Ma H (2005) World Trade Flows: 1962–2000. NBER Working Paper 11040. Available at www.nber.org/papers/w11040.
9. Pastor-Satorras R, Vazquez A, Vespignani A (2001) Dynamical and correlation properties of the internet. Phys Rev Lett 87:258701.
10. Maslov S, Sneppen K (2002) Specificity and stability in topology of protein networks. Science 296:910–913.
11. Newman MEJ (2002) Assortative mixing in networks. Phys Rev Lett 89:208701.
12. Hirschman AO (1945) National power and structure of foreign trade (University of California Press, Berkeley, CA).
13. Herfindahl OC (1950) Concentration in the steel industry (PhD Dissertation, Columbia University, New York).
14. Saviotti PP, Frenken K (2008) Export variety and the economic performance of countries. J Evol Econ 18:201–218.
15. Hidalgo CA, Klinger B, Barabási A-L, Hausmann R (2007) The product space conditions the development of nations. Science 317:482–487.
16. Hausmann R, Klinger B (2006) The structure of the product space and the evolution of comparative advantage. CID Working Paper No. 128. Available at www.cid.harvard.edu/cidwp/128.htm.
17. Hidalgo CA, Hausmann R (2008) A network view of economic development. Developing Alternatives 12(1):5–10.
18. Hirschman AO (1958) The Strategy of Economic Development (Yale Univ Press, New Haven, CT).

Discussion
Understanding the increasingly large gaps in income per capita
across countries is one of the eternal puzzles of development
economics. Our view is that complexity is at the root of the
explanation, as argued by both Adam Smith (1) and the recent
endogenous growth theories (2, 3), yet empirical research has not
advanced along these dimensions because of the absence of adequate measures of complexity. Instead, it has emphasized the
accumulation of a few highly aggregated factors of production, such
as physical and human capital or general institutional measures,
such as rule of law, disregarding their specificity and complementarity. In this article we have presented a technique that uses
available economic data to develop measures of the complexity of
products and of countries, and showed that (i) these measures
capture information about the complexity of the set of capabilities
available in a country; (ii) are strongly correlated with income per
capita; (iii) are predictive of future growth; and (iv) are predictive
of the complexity of a country’s future exports, making a strong
empirical case that the level of development is indeed associated with
the complexity of a country’s economy.
This article has not emphasized the process through which
countries accumulate capabilities, but has instead focused on their
measurement and consequences. However, the results presented
here suggest that changes in a country’s productive structure can be
understood as a combination of 2 processes, (i) that by which
countries find new products as yet unexplored combinations of the
capabilities they already have, and (ii) the process by which countries accumulate new capabilities and combine them with other
previously available capabilities to develop yet more products.
A possible explanation for the connection between economic
complexity and growth is that countries that are below the income
expected from their capability endowment have yet to develop all of
the products that are feasible with their existing capabilities. We can
expect such countries to be able to grow more quickly, relative to
those countries that can only grow by accumulating new capabilities.
This perspective also suggests that the incentive to accumulate
capabilities would depend, among other things, on the expected
demand that new capabilities would face, and this would depend on
how new capabilities can complement existing ones to create new
products. This opens up an avenue for further research on the
dynamics of product and capability accumulation.
Development economics has tended to disregard the search for
detailed capabilities and their patterns of complementarity, hoping
that aggregate measures of physical capital (e.g., measured in
dollars) or human capital (e.g., measured in years of schooling)
would provide enough guidance for policy. Our line of research
would justify and provide guidance to development strategies that
look to promote products (or capabilities) as a way to create
incentives to accumulate capabilities (or develop new products) that
could themselves encourage the further coevolution of new products and capabilities, echoing ideas put forward by Albert Hirschman (18) more than 50 years ago, but adding the capacity to
analyze them in practice.
periods or four 5-year periods, and that it is robust to the inclusion
of other control variables such as individual country dummies (to
capture any time-invariant country characteristic) and outperforms
other indicators used to measure the productive structure of a
country such as the Hirschman-Herfindahl (12, 13) index and
entropy measures (14). A graphical example of this relationship is
presented in Fig. 3f, which compares the growth predicted from the
linear regression described by Eq. 6 and that observed empirically
for the 1985–2005 period and N = 18.
Finally, we show that the evolution of Mcp exhibits strong path
dependence, meaning that we can anticipate some of the properties
of a country’s future new exports based on its current productive
structure. This observation is consistent with the existence of an
unobservable capability space that evolves gradually, because the
ability of a country to produce a new product is limited to
combinations of the capabilities it initially possesses plus any new
capabilities it will accumulate. Countries with many capabilities will
be able to combine new capabilities with a wide set of existing
capabilities, resulting in new products of higher complexity than
those of countries with few capabilities, which will be limited by this
fact.
We show this using data collected between 1992 and 2000 (we
choose 1992 as our starting point because the end of the Soviet
Union and the unification of Germany introduce large discontinuities in the number and identity of countries) and consider as a
country’s new exports those items for which that country had an
RCAcp < 0.1 in the year 1992 and an RCAcp ≥ 1 by the year 2000.
Fig. 4 shows that the level of diversification (kc,0) of a country and
the ubiquity of its exports (kc,1) predict the average ubiquity
(⟨kp,0⟩) of a country’s new exports and the average level of diversification (⟨kp,1⟩) of the countries that were hitherto exporting those
products.
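The selection of new exports described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `new_export_profile`, the toy RCA matrices, and the choice to measure ubiquity and diversification in the initial year are ours, not taken from the paper.

```python
import numpy as np

def new_export_profile(rca_start, rca_end, low=0.1, high=1.0):
    """Average ubiquity and exporter diversification of each country's new exports.

    rca_start, rca_end: hypothetical (countries x products) RCA matrices for the
    first and last year of the window (1992 and 2000 in the text).
    """
    rca_start = np.asarray(rca_start, dtype=float)
    rca_end = np.asarray(rca_end, dtype=float)

    # Country-product network in the initial year (Mcp = 1 when RCAcp >= high).
    M = (rca_start >= high).astype(float)
    kc0 = M.sum(axis=1)  # diversification of countries
    kp0 = M.sum(axis=0)  # ubiquity of products
    # Average diversification of each product's initial exporters. Products with
    # no initial exporter get 0 here, which is a simplification of this sketch.
    kp1 = np.divide(M.T @ kc0, kp0, out=np.zeros_like(kp0), where=kp0 > 0)

    # New exports: RCA below `low` at the start and at least `high` at the end.
    new = (rca_start < low) & (rca_end >= high)
    n_new = new.sum(axis=1)

    # Per-country averages over new exports; NaN for countries with none.
    avg_kp0 = np.divide(new @ kp0, n_new,
                        out=np.full(len(n_new), np.nan), where=n_new > 0)
    avg_kp1 = np.divide(new @ kp1, n_new,
                        out=np.full(len(n_new), np.nan), where=n_new > 0)
    return avg_kp0, avg_kp1

# Toy data: 3 countries, 4 products (made up for illustration only).
rca_1992 = [[1.5, 1.2, 0.05, 0.0],
            [0.02, 1.1, 0.5, 0.0],
            [2.0, 1.3, 1.4, 0.05]]
rca_2000 = [[1.6, 1.1, 1.2, 0.0],
            [1.3, 1.0, 0.6, 0.0],
            [2.1, 1.2, 1.5, 0.3]]

avg_kp0, avg_kp1 = new_export_profile(rca_1992, rca_2000)
print(avg_kp0, avg_kp1)
```

In the toy data the third country gains no new export, so its averages are left undefined rather than set to zero.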
This result is related to the idea that the productive structure of
countries evolves by spreading to ‘‘nearby’’ products in The Product
Space (15–17), which is a projection of the bipartite network studied
here in which pairs of products are connected based on the
probability that they are exported by the same countries. This last
set of results suggests that the proximity between products in
The Product Space is related to the similarity of the requisite
capabilities that go into a product, because countries tend to jump
into products that require capabilities that are similar to those
required by the products they already export.
SUPPLEMENTARY MATERIAL FOR:
THE BUILDING BLOCKS OF ECONOMIC COMPLEXITY
Cesar A. Hidalgo, Ricardo Hausmann
Center for International Development and Harvard Kennedy School, Harvard University
TABLE OF CONTENTS
SECTION 1: SOURCE DATA
SECTION 2: REVEALED COMPARATIVE ADVANTAGE (RCA)
SECTION 3: THE COUNTRY-PRODUCT NETWORK
SECTION 4: BIPARTITE NETWORK ANALYSIS
SECTION 5: BIPARTITE NETWORK STRUCTURE MEASURED IN OTHER DATASETS
SECTION 6: RANDOMIZING A BIPARTITE NETWORK
SECTION 7: THE KP,0-KP,1 DIAGRAM
SECTION 8: A THIRD REFLECTION VIEW OF THE STRUCTURE OF THE COUNTRY-PRODUCT NETWORK
SECTION 9: NULL MODELS AND GDP
SECTION 10: THE METHOD OF REFLECTIONS AND COUNTRY RANKINGS (YEAR 2000)
SECTION 11: THE METHOD OF REFLECTIONS AND POPULATION
SECTION 12: SHARES OF PRODUCTS IN THE WORLD
SECTION 13: NETWORK STRUCTURE, INCOME AND GROWTH
SECTION 14: ADDITIONAL RESULTS
REFERENCES
SECTION 1: SOURCE DATA
All of the figures presented in the main text of this paper were constructed using
international trade data taken from Feenstra, Lipsey, Deng, Ma and Mo's "World Trade Flows:
1962-2000" dataset. This dataset consists of imports and exports both by country of origin and
by destination, with products disaggregated to the SITC revision 4, four-digit level. The authors
built this dataset using the United Nations COMTRADE database. The authors cleaned that
dataset by calculating exports using the records of the importing country, when available,
assuming that data on imports is more accurate than data from exporters. This is likely, as
imports are more tightly controlled in order to enforce safety standards and collect customs
fees. In addition, the authors correct the UN data for flows to and from the United States, Hong
Kong, and China. We focus only on export data and do not disaggregate by country of
destination. More information on this dataset can be found in NBER Working Paper #11040,
and the dataset itself is available at www.nber.org/data and
http://cid.econ.ucdavis.edu/data/undata/undata.html.
We checked the validity of our results by using two additional datasets: COMTRADE
classified according to the Harmonized System at the 4-digit level (1241 products, 103
countries) and the North American Industry Classification System (NAICS) (318 products, 150
countries). We found that our results are not affected by the use of data at these different
levels of aggregation. We chose to work with the Feenstra dataset because, of the three
datasets available, it is the only one that has been cleaned and checked thoroughly as part
of a dedicated research project.
The labor data used to construct figure 2d was downloaded from the US Bureau of
Labor Statistics at http://www.bls.gov/data/
SECTION 2: REVEALED COMPARATIVE ADVANTAGE (RCA)
One way to empirically estimate whether a country is a significant exporter of a product is
to calculate the Revealed Comparative Advantage (RCA) that the country has in that
product. RCA is a measure constructed to indicate whether a country’s share of a product’s
world market is larger or smaller than the product’s share of the entire world market.
Mathematically, we can restate the above sentence by introducing Scp as the share that
country c has of the world market for product p, and Tp as the total share of product p of the
world market. Using this notation, RCA can be written as
$$\mathrm{RCA}_{cp} = S_{cp} / T_p \tag{1}$$
(2)
where
RCA CUTOFFS, EXPORTS AND COUNTRIES’ LEVEL OF DIVERSIFICATION
The natural cutoff used to determine whether a country has revealed comparative
advantage in a product is RCA≥1. At this point the country’s share of that product’s market is
equal or larger than the product’s share of the world market. The benchmark here is a world in
which countries export an amount of each product equal to the share of that product in the
world market times the size of its economy.
From an empirical perspective, we can study the number of products (kc,0) for which a
country has RCA as a function of the RCA cutoff. By performing this exercise we find that the
RCAcp=1 cutoff lies on the phase transition of a softened step function (Figure S1).
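The cutoff exercise can be illustrated with a short sketch. It assumes the standard Balassa form of RCA (the share of product p in country c's exports divided by the share of product p in world exports), which may differ in presentation from the definition used here. The function names and the toy export matrix are hypothetical.

```python
import numpy as np

def rca_matrix(exports):
    """Balassa RCA for a (countries x products) matrix of export values.

    Assumes every country and every product has nonzero total exports.
    """
    X = np.asarray(exports, dtype=float)
    share_in_country = X / X.sum(axis=1, keepdims=True)   # product share in each country's basket
    share_in_world = X.sum(axis=0, keepdims=True) / X.sum()  # product share in world trade
    return share_in_country / share_in_world

def diversification(rca, cutoff=1.0):
    """kc,0 as a function of the RCA cutoff: products with RCA >= cutoff, per country."""
    return (np.asarray(rca) >= cutoff).sum(axis=1)

# Toy export values: 3 countries, 3 products (made up for illustration only).
exports = np.array([[10.0, 0.0, 5.0],
                    [2.0, 8.0, 0.0],
                    [1.0, 1.0, 20.0]])
rca = rca_matrix(exports)

# Lowering the cutoff can only add products to a country's count.
for cutoff in (0.01, 0.1, 1.0):
    print(cutoff, diversification(rca, cutoff))
```

Sweeping the cutoff over a fine grid and plotting the counts for each country would give a curve of the kind shown in Fig S 1.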
Fig S 1 Diversification (kc,0) as a function of the RCA cutoff for all countries in the study
What is interesting about looking at kc,0(RCA) from this empirical perspective is that we
can see that there are a few countries that had exports in almost all of the 772 products
exported in the year 2000. For example, Germany exported 758 products with an RCA≥0.01,
and 707 products with RCA≥0.1, a profile similar to that of other industrialized countries like the
U.K., U.S.A and Italy. Hence lowering the RCA threshold shows that industrialized countries
manufacture and export products in almost all of the SITC-4 categories, and that specialization
patterns are empirically driven by the lack of diversification of less developed countries, rather
than by the absence of more productive economies in comparatively less sophisticated sectors.
SECTION 3: THE COUNTRY-PRODUCT NETWORK
Fig S 2 shows a simple visualization of the country product network for the year 2000 in
which countries are located at the center of the figure and products are grouped into root SITC-4 categories along the edges of the image. This network consists of 129 countries, 772 products
and 13,470 links connecting countries and products when RCAcp≥1. The large number of links in
the network limits our ability to create a useful visualization of the entire set of connections.
Fig S 2 Visualization of the country product network in which all exports with an RCA≥1 are shown.
SECTION 4: BIPARTITE NETWORK ANALYSIS
A bipartite graph or network is a set of nodes and links in which nodes can be separated
into two groups, or partitions, such that links only connect nodes in different partitions. While
in principle many networks can be separated into different partitions (for example every tree is
a bipartite graph), here we concentrate on examples that are bipartite by definition, rather
than as a property. One example of a naturally occurring bipartite network is the publication
network, where nodes are researchers and papers, and links connect researchers to the
papers they have authored. Another example is the movie-actor network, in which nodes are
actors and movies, and links connect actors to the movies in which they have starred.
With the exception of a few studies [1,2,3,4], bipartite networks have mostly been
investigated by projecting the network into one of its partitions [5,6,7,8,9,10,11,12,13,14], typically by
considering nodes to be connected if they share a neighbor in the opposite partition
[5,6,7,8,9,10,11,12,13,14]. For example, co-authorship networks link scientists that have co-authored
one or more papers [8,9,10,11], whereas movie-actor networks connect actors that have appeared
together in one or more movies.
While valuable information can be obtained from these projections, there is important
information that is left out by reducing the bipartite network into either one of its partitions,
regardless of the sophistication of the projection method. Here we present a method to
characterize the structure of a bipartite network by iteratively considering the properties of
neighboring nodes.
THE METHOD OF REFLECTIONS
In this section we explain in detail the method of reflections as a general technique to
study the structure of bipartite networks. To shorten the math we adopt a different notation
than the one used for the particular example of countries and products. Going forward, we
indicate all variables that are related to nodes in each partition by either Latin or Greek
characters.
Consider a bipartite network M described by the adjacency matrix Maα, where Maα = 1 if
node a is connected to node α and zero otherwise.
We define the method of reflections as the recursive set of observables
$$k_{a,N} = \frac{1}{k_{a,0}} \sum_{\alpha} M_{a\alpha}\, \kappa_{\alpha,N-1} \tag{3}$$
$$\kappa_{\alpha,N} = \frac{1}{\kappa_{\alpha,0}} \sum_{a} M_{a\alpha}\, k_{a,N-1} \tag{4}$$
for N > 0, with
$$k_{a,0} = \sum_{\alpha} M_{a\alpha} \tag{5}$$
$$\kappa_{\alpha,0} = \sum_{a} M_{a\alpha} \tag{6}$$
Following these definitions, the degree of nodes in the bipartite network is given by ka,0 and
κα,0 (in this notation we can drop the a and α indices when referring to the general concept
described by the variable, as the alphabet already indicates whether the variable refers to one
partition or the other, countries or products). In the example of the main text these variables
are the diversification (ka,0) of countries and the ubiquity (κα,0, written kp,0 in the main text)
of products. Following from (3) and (4), the average ubiquity of a country’s exports is given by
ka,1, whereas the average diversification of a product’s exporters is given by κα,1. The recursive
nature of the method of reflections allows us to characterize the structure of the bipartite
network by defining N variables for each one of its partitions. For example, continuing the
characterization of the country-product network into a third layer of analysis, in which ka,2, the
average κ1 of a country’s exports, and κα,2, the average k1 of a product’s exporters, are
considered, allows us to characterize countries and products through three dimensional phase
spaces spanned by (ka,0, ka,1, ka,2) and (κα,0, κα,1, κα,2).
by N variables. The method of reflections can be generalized by choosing different values for k0
and κ0 and iterating over them using (3) and (4). In fact, the measure of product sophistication
PRODY [15] can be seen as a special case of the method of reflections in which ka,0 is the
GDP(PPP) of a country and Maα is a matrix of RCAs. In such a case PRODY=κα,1. When these
variables were constructed, however, the authors were not aware that their methods were
combining income information with the structure of a bipartite network.
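As a minimal sketch of the recursion in (3)-(6), the function below iterates both partitions together. The name `method_of_reflections`, the NumPy representation, and the toy matrix are our assumptions, and the sketch assumes that no node is isolated, so that every degree is nonzero.

```python
import numpy as np

def method_of_reflections(M, n_max):
    """Return arrays k[N, a] and kappa[N, alpha] for N = 0..n_max.

    M is a binary adjacency matrix with rows indexing one partition (a)
    and columns indexing the other (alpha). Assumes no isolated nodes.
    """
    M = np.asarray(M, dtype=float)
    k0 = M.sum(axis=1)        # eq. (5): degree of each row node
    kappa0 = M.sum(axis=0)    # eq. (6): degree of each column node
    k, kappa = [k0], [kappa0]
    for _ in range(n_max):
        k_next = (M @ kappa[-1]) / k0            # eq. (3): average over neighbors
        kappa_next = (M.T @ k[-1]) / kappa0      # eq. (4): average over neighbors
        k.append(k_next)
        kappa.append(kappa_next)
    return np.array(k), np.array(kappa)

# Toy 3 x 3 network (made up for illustration only).
M = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0]])
k, kappa = method_of_reflections(M, n_max=2)
print(k)
print(kappa)
```

Both updates in one pass use the values from the previous level, so the order of the two assignments inside the loop does not matter.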
THE VARIABLES FOR THE FIRST THREE LEVELS
Table S 1 shows how we interpret the first three pairs of variables describing the
country-product network through the method of reflections:
Definition | Working Name | Description: Short summary | Description: Question form
ka,0 | Diversification | Number of products exported by country a. | How many products are exported by country a?
κα,0 | Ubiquity | Number of countries exporting product α. | How many countries export product α?
ka,1 | ka,1 | Average ubiquity of the products exported by country a. | How common are the products exported by country a?
κα,1 | κα,1 | Average diversification of the countries exporting product α. | How diversified are the countries that export product α?
ka,2 | ka,2 | Average diversification of countries with an export basket similar to country a. | How diversified are countries exporting goods similar to those of country a?
κα,2 | κα,2 | Average ubiquity of the products exported by countries that export product α. | How ubiquitous are the products exported by product α’s exporters?
Table S 1 Interpretation of the bipartite network description obtained from the method of reflections.
INTERPRETING HIGHER REFLECTIONS
As we iterate the method of reflections, it becomes increasingly harder to interpret the
variables generated by it. We can gain insight into what higher reflection variables stand for by
analytically solving the recursion formulas presented in (3)-(6). Analytically solving the recursion
requires us to be able to express $\vec{k}_N$ and $\vec{\kappa}_N$ as a function of the initial conditions, $\vec{k}_0$ and $\vec{\kappa}_0$.
Mathematically, we search for solutions of (3)-(4) of the form:
$$k_{a,N} = \sum_{b} C_{ab,N}(\vec{k}_0, \vec{\kappa}_0)\, k_{b,0}, \qquad \kappa_{\alpha,N} = \sum_{\beta} C_{\alpha\beta,N}(\vec{k}_0, \vec{\kappa}_0)\, \kappa_{\beta,0} \tag{7}$$
To illustrate this we calculate the elements of $\vec{k}_2$ as an example. According to the
definitions of the method shown in (3)-(6), the elements of $\vec{k}_2$ can be expressed as:
$$k_{a,2} = \frac{1}{k_{a,0}} \sum_{\alpha} M_{a\alpha}\, \kappa_{\alpha,1} = \frac{1}{k_{a,0}} \sum_{\{a\}_\alpha} \kappa_{\alpha,1} \tag{8}$$
where $\{a\}_\alpha$ is the set of the α neighbors of a. We can use (4) to rewrite (8) as
$$k_{a,2} = \frac{1}{k_{a,0}} \sum_{\{a\}_\alpha} \frac{1}{\kappa_{\alpha,0}} \sum_{\{\alpha\}_b} k_{b,0} \tag{9}$$
which can be taken into the form (7) by permuting the sums and changing the index of
the first summation to a sum over the second neighbors of a, and the index of the second
summation to a sum over the common neighbors of a and b:
$$k_{a,2} = \frac{1}{k_{a,0}} \sum_{\{\{a\}\}_b} \sum_{\{a \cap b\}_\alpha} \frac{1}{\kappa_{\alpha,0}}\, k_{b,0} \tag{10}$$
which satisfies the form presented in (7) with
$$C_{ab,2}(\vec{k}_0, \vec{\kappa}_0) = \frac{1}{k_{a,0}} \sum_{\{a \cap b\}_\alpha} \frac{1}{\kappa_{\alpha,0}} \tag{11}$$
We can interpret ka,2 from the form presented in (10) by noticing that ka,2 is a linear
combination of the elements of $\vec{k}_0$ with coefficients given by the sum, over the paths
connecting nodes a and b, of the product of the inverse of the degrees of all nodes lying in
the path, including node a but not node b. Hence the coefficients $C_{ab,2}(\vec{k}_0, \vec{\kappa}_0)$
can be interpreted as the probability that a random walker that
started at a ends up at b after two steps.
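The two-step random walk reading of (11) can be checked numerically. The sketch below builds the coefficient matrix on a small hypothetical network (the matrix and variable names are ours) and verifies that each row sums to one, as transition probabilities must, and that applying it to the degrees reproduces the second reflection obtained from the recursion.

```python
import numpy as np

# Small hypothetical bipartite network: rows are nodes a, columns are nodes alpha.
M = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

k0 = M.sum(axis=1)       # degrees of the row nodes
kappa0 = M.sum(axis=0)   # degrees of the column nodes

# C2[a, b] = (1 / k_a) * sum over common neighbors alpha of 1 / kappa_alpha,
# i.e. step a -> alpha with probability 1/k_a, then alpha -> b with probability 1/kappa_alpha.
C2 = (M / k0[:, None]) @ (M / kappa0[None, :]).T

# Second reflection computed directly from the recursion (3)-(4).
kappa1 = (M.T @ k0) / kappa0
k2 = (M @ kappa1) / k0

print(C2)
print(C2.sum(axis=1))   # each row should sum to 1
print(C2 @ k0, k2)      # the two ways of computing k2 should agree
```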
The random walker interpretation of the method of reflections is true not only for $\vec{k}_2$ but
for any N. Fig S 3 shows an example of a three node network in which some of the coefficients
associated with N=4 are presented explicitly. Hence the method of reflections is a way to
express the properties of a node in a network as a combination of the properties of all its
neighbors, the coefficients of the linear combination being the probability that two nodes are
connected by a random walker after N steps.
The coefficients of the expansion can be interpreted as a measure of similarity between
the nodes in the network, which is context dependent, as what matters in the expansion is the
relative weight of these coefficients when compared to each other.
$$k_{a,N} = \sum_{b} C_{ab,N}(\vec{k}_0, \vec{\kappa}_0)\, k_{b,0} = k_a \left( \frac{1}{k_b}\frac{1}{\kappa_\gamma}\frac{1}{k_b}\frac{1}{\kappa_\gamma} + \dots + \frac{1}{k_b}\frac{1}{\kappa_\beta}\frac{1}{k_a}\frac{1}{\kappa_\alpha} + \dots \right) + k_b(\dots) + k_c(\dots)$$
Fig S 3 Example showing how the method of reflections can be seen as an expansion of the properties of a node as a function of the
properties of other nodes in the network, with weights given by the product of the inverse of the degrees of each node traversed in the path
connecting them.
Finally, we would like to mention that while higher order reflections do extract
increasingly more relevant information about the productive structure of a country, as
measured by how they are related to income and growth, as N → ∞ all variables will
progressively converge to a similar value. Surprisingly, we find the tiny
deviations of these values to be extremely informative.
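The convergence can be seen on a toy network. The sketch below is an illustration only: the four by four matrix, the helper `two_reflections`, and the choice to standardize the values (subtract the mean, divide by the standard deviation) as a way to look at the remaining deviations are all our assumptions.

```python
import numpy as np

# Hypothetical 4 x 4 country-product network.
M = np.array([[1, 1, 1, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)
k0 = M.sum(axis=1)
kappa0 = M.sum(axis=0)

def two_reflections(k):
    """Advance a country-side variable by two levels, k_N -> k_{N+2}."""
    kappa = (M.T @ k) / kappa0
    return (M @ kappa) / k0

k = k0.copy()
spreads = []
for n in range(0, 21, 2):
    spreads.append(k.max() - k.min())
    # Standardized values expose the relative deviations that remain
    # even when the raw values are almost identical.
    z = (k - k.mean()) / k.std()
    print(n, np.round(k, 4), np.round(z, 2))
    k = two_reflections(k)
```

Each two-level step replaces every value by a weighted average of the previous values, which is why the gap between the largest and smallest value cannot grow.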
A SIMPLE EXAMPLE
In this section we explain the method of reflections using a simple example in which a
network composed of four countries and four products is considered (Fig S 4).
[Figure: bipartite diagram linking countries c1 to c4 with products p1 to p4.]
Fig S 4 A simple network used to exemplify the method of reflections.
In this example, the diversification of countries and the ubiquity of products is given by:
kc1,0=4
kc2,0=1
kc3,0=2
kc4,0=1
kp1,0=1
kp2,0=2
kp3,0=2
kp4,0=3
Next, we calculate higher reflections of the method (or iterations). The first reflection
consists of the average ubiquity of country’s products and of the average diversification of a
product’s exporters and is given by:
kc1,1=(1/4)(1+2+2+3)=2
kc2,1=(1/1)(2)=2
kc3,1=(1/2)(2+3)=2.5
kc4,1=(1/1)(3)=3
kp1,1=(1/1)(4)=4
kp2,1=(1/2)(4+1)=2.5
kp3,1=(1/2)(4+2)=3
kp4,1=(1/3)(4+2+1)=2.33
The second reflection is given by the average first reflection values of a node’s
neighbors.
kc1,2=(1/4)(4+2.5+3+2.333)=2.9583
kc2,2=(1/1)(2.5)=2.5
kc3,2=(1/2)(3+2.333)=2.67
kc4,2=(1/1)(2.333)=2.33
kp1,2=(1/1)(2)=2
kp2,2=(1/2)(2+2)=2
kp3,2=(1/2)(2+2.5)=2.25
kp4,2=(1/3)(2+2.5+3)=2.5
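The reflections of this example can be recomputed with a few lines of code. The adjacency matrix below is our reading of the network, inferred from the degrees and first reflections listed above (c1 exports all four products, c2 exports p2, c3 exports p3 and p4, c4 exports p4). The variable names are ours.

```python
import numpy as np

# Rows: countries c1..c4. Columns: products p1..p4.
M = np.array([
    [1, 1, 1, 1],   # c1 exports p1, p2, p3, p4
    [0, 1, 0, 0],   # c2 exports p2
    [0, 0, 1, 1],   # c3 exports p3, p4
    [0, 0, 0, 1],   # c4 exports p4
], dtype=float)

kc = [M.sum(axis=1)]   # kc,0: diversification
kp = [M.sum(axis=0)]   # kp,0: ubiquity
for n in range(1, 3):
    kc_next = (M @ kp[-1]) / kc[0]     # average kp,n-1 over a country's products
    kp_next = (M.T @ kc[-1]) / kp[0]   # average kc,n-1 over a product's exporters
    kc.append(kc_next)
    kp.append(kp_next)

for n in range(3):
    print(f"kc,{n} =", np.round(kc[n], 4), f" kp,{n} =", np.round(kp[n], 4))

# Countries ordered from highest to lowest second reflection.
ranking = [f"c{i + 1}" for i in np.argsort(-kc[2])]
print(ranking)
```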
We can use this example to illustrate how the method of reflections is able to
differentiate between different countries based only on information regarding which country
exports which product. In this example, the most diversified country is c1, which exports all four
products, while there are two countries, c2 and c4, that export only a single product. The sole
export of c2, however, is a relatively non-ubiquitous product that is otherwise exported only by
c1, the most diversified country, while the sole export of c4 is a product that is exported by all
countries except c2.
As we iterate the method we find that there is important information encoded in the
position of countries and products relative to one another. For example, when we look
at the values characterizing countries after the second reflection (kc,2) we can see that country
c1 comes up ahead, followed by countries c3, c2 and c4. The method places country c2 ahead of
c4 because by the second reflection it is already considering that country c2 produces a
non-ubiquitous product that is found only in diversified countries, probably signaling that country c2
has a relatively good endowment of capabilities and produces a small number of products
for other reasons, such as being of relatively small size. On the contrary, c4 produces a
product that is ubiquitous and is found in both diversified and non-diversified countries, probably
indicating that it is a simple product which is accessible to countries with relatively simple
productive structures. Hence, while both c2 and c4 produce the same number of products, the
method can differentiate between them and considers c2 to have a more complex productive
structure than c4.
While small in size this example illustrates how the method of reflections can be used to
characterize the structure of a bipartite network and how this can be applied to help the
understanding of the productive structure of countries and the sophistication of products.
SECTION 5: BIPARTITE NETWORK STRUCTURE MEASURED IN OTHER DATASETS
In this section we present two additional kc,0-kc,1 diagrams constructed using data
aggregated according to the Harmonized system and according to the North American Industry
Classification System (NAICS).
[Figure: scatter plot of kc,1 (vertical axis) against kc,0 (horizontal axis), with points labeled by three-letter country codes.]
Fig S 5 kc,0-kc,1 diagram constructed using data containing 103 countries and 1241 products aggregated according to the Harmonized System.
[Figure: scatter plot of kc,1 (vertical axis) against kc,0 (horizontal axis), with points labeled by three-letter country codes.]
Fig S 6 kc,0-kc,1 diagram constructed using data containing 150 countries and 318 products aggregated according to the NAICS.
SECTION 6: RANDOMIZING A BIPARTITE NETWORK
To decide whether the structure of a network is trivial,* we need to compare it to an
appropriate null model. The four null models we introduce in this section are an extension of
the randomization algorithms introduced by Maslov and Sneppen [16] to analyze degree
correlations in protein interaction networks. Our case differs from theirs in that we are dealing
with a bipartite network rather than with a simple graph.
The idea behind the randomization procedure is that we can create a null model starting
from the data we want to analyze by shuffling the links of the network while conserving some
of its statistical properties. The most popular version of this randomization procedure, which
was designed for simple graphs†, consists of randomizing the links in the network by permuting
the nodes at the end of a pair of links. For example, if we consider a simple graph containing
the links {a,b} and {c,d}, then an allowed randomization step would consist of replacing these
two links by the pairs {a,d} and {b,c}, given that the {a,d} and {b,c} links were not already part of
* Expected from chance.
† Simple graph: a network in which there is only one type of node and connections are strictly binary (0 or 1).
the network. The randomization procedure described above conserves the number of links in
the network as well as its degree‡ sequence and degree distribution. This is because the
randomization procedure conserves the exact number of connections of each node, making it a
good null model to compare properties of a network while controlling for the degree of nodes,
which is the most fundamental property of a network.
In the case of a bipartite network, we have two separate degree sequences, one for
each of its partitions. Here we introduce four null models to control for all possible
combinations of degree sequences. Null Model 1 is a network with the same number of nodes
and links as the original network, yet in Null Model 1 connections have been randomly
assigned. Null Model 1 is the least stringent of our Null Models and represents a network with
the same number of links as the original network, but with a random degree sequence for both
partitions. Null Model 2 controls for the degree sequence of one partition of the network, while
randomizing the target of those links in the other partition. Null Model 2 represents a network
with a diversification sequence matching the one in the observed data, yet in Null Model 2 the
products exported by a country have been randomly assigned. Null Model 2 also conserves the
total number of links in the network. Null Model 3 is symmetric to Null Model 2 in the sense
that it represents a network with the same ubiquity distribution as the one observed in the
data, but where the exporters of each product have been randomly assigned. Finally, Null
Model 4 is a model obtained by permuting links in the network such that the diversification of
countries and the ubiquity of products are exactly the same as those observed in the empirical
data.
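The four null models can be sketched as follows. This is a simplified illustration rather than the exact procedure used to produce the figures: the function names, the number of swap attempts, and the random toy matrix are our assumptions. Null Model 4 uses the link swap described above, transposed to a bipartite adjacency matrix.

```python
import numpy as np

def null_model_1(M, rng):
    """Same numbers of nodes and links; links placed uniformly at random."""
    flat = M.flatten()          # flatten() returns a copy
    rng.shuffle(flat)
    return flat.reshape(M.shape)

def null_model_2(M, rng):
    """Keep each country's diversification; randomize which products it exports."""
    return np.array([rng.permutation(row) for row in M])

def null_model_3(M, rng):
    """Keep each product's ubiquity; randomize which countries export it."""
    return null_model_2(M.T, rng).T

def null_model_4(M, rng, n_attempts=10000):
    """Keep both degree sequences by swapping pairs of links."""
    R = M.copy()
    n_rows, n_cols = R.shape
    for _ in range(n_attempts):
        a, b = rng.choice(n_rows, size=2, replace=False)
        p, q = rng.choice(n_cols, size=2, replace=False)
        # Links (a,p) and (b,q) exist and (a,q), (b,p) do not: swap the endpoints.
        if R[a, p] and R[b, q] and not R[a, q] and not R[b, p]:
            R[a, p] = R[b, q] = 0
            R[a, q] = R[b, p] = 1
    return R

rng = np.random.default_rng(0)
M = (rng.random((6, 8)) < 0.4).astype(int)   # random toy network, illustration only
M1, M2, M3, M4 = (f(M, rng) for f in (null_model_1, null_model_2, null_model_3, null_model_4))
print(M.sum(), M1.sum(), M2.sum(), M3.sum(), M4.sum())
```

A swap only moves links within the same two rows and the same two columns, which is why both degree sequences are left untouched.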
It is important to notice that as Null Models become more stringent, the number of
possible permutations that can be performed in the randomization procedure drops
substantially. The possible number of permutations that can be performed in a randomization
procedure does not only depend on the stringency of the null model, but also on the structure
of the original network. For example, if we consider a bipartite network that can be
represented by a triangular adjacency matrix (for simplicity assume that the number of
‡ Degree: the number of links a node has. Degree sequence: the list containing the degrees of all nodes in the network.
products is equal to the number of countries and that Mcp = 1 if c < p and Mcp = 0 otherwise), then there
is not a single possible permutation that could be performed using the fourth null model. For
such a case, Null Model 4 is equivalent to the original network.
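One way to check this claim is to count how many link swaps are available in a given matrix. The helper below is hypothetical (its name and approach are ours): for every pair of countries it counts the products exported by one but not the other, and a swap exists only when both counts are nonzero.

```python
import numpy as np

def count_swappable_pairs(M):
    """Number of degree-preserving link swaps available in a binary matrix."""
    M = np.asarray(M, dtype=int)
    # only_first[a, b]: number of columns linked to row a but not to row b.
    only_first = M @ (1 - M).T
    # A swap between rows a and b needs one column of each kind.
    n_swaps = only_first * only_first.T
    # Count each unordered pair of rows once.
    return int(np.triu(n_swaps, k=1).sum())

# Triangular network: Mcp = 1 if c < p, 0 otherwise.
triangular = np.triu(np.ones((5, 5), dtype=int), k=1)
print(count_swappable_pairs(triangular))

# A non-nested network for comparison: two countries with disjoint exports.
print(count_swappable_pairs(np.eye(2, dtype=int)))
```

In the triangular case every country's export basket is contained in the basket of the countries above it, so no pair of rows offers a swap.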
NULL MODEL SUMMARY
Null Model | Number of links | kc,0 sequence | kp,0 sequence | <kc,0> | <kc,1> | <kp,0> | <kp,1>
Null Model 1 | = Mcp | ≠ Mcp | ≠ Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp
Null Model 2 | = Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp
Null Model 3 | = Mcp | ≠ Mcp | = Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp
Null Model 4 | = Mcp | = Mcp | = Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp
Table S 2 Summary null model behavior. <> stands for the average of a quantity.
SECTION 7: THE KP,0-KP,1 DIAGRAM
We compare the kp,0-kp,1 diagram obtained from our data with the one from our four
null models (Fig S 7), finding that the structure of the country-product network is characterized
by a strong negative correlation between kp,0 and kp,1 and a wide range of kp,1 values that cannot be
explained by any of the four null models. This result becomes even more evident when we
study higher order reflections of the method (see SM section 8). Products from different
sectors are colored according to the ten root categories in the SITC-4 classification, showing
that while there is a correspondence between the kp,0-kp,1 diagram and the SITC-4 classification,
there are important variations among similarly classified products. For example, this graph
shows that natural resource-based products, such as minerals and fuels, exhibit a wide range of
ubiquities (kp,0) at approximately constant diversification of their exporters (kp,1), meaning that
raw materials are on average exported by poorly diversified countries regardless of being
relatively ubiquitous, like coniferous wood (kp,0=43, kp,1=115), or rare, like tin ore (kp,0=8,
kp,1=109). On the other hand, products classified as machinery show variation in the level of
diversification of their exporters (kp,1) at relatively low ubiquities (kp,0). Hence the kp,0-kp,1
diagram can separate simple machines produced in less-diversified countries, such as handheld
calculators (kp,0=7, kp,1=144), from more complex machines produced in diversified countries,
such as motorcycles (kp,0=5, kp,1=270).
Fig S 7 Method of reflections and products characteristics. A, Schematic explanation of the kp,0− kp,1 space to characterize products. B, kp,0−
kp,1 diagram for null models. C, kp,0− kp,1 diagram for the empirically observed exports data.
SECTION 8: A THIRD REFLECTION VIEW OF THE STRUCTURE OF THE COUNTRY-PRODUCT
NETWORK
Here we continue the analysis presented in the manuscript to a third layer of analysis in
which we show figures characterizing countries by kc,0,kc,1,kc,2 and products by kp,0,kp,1,kp,2 (Fig S
8-Fig S 11).
Fig S 8 Scatter plot for kc,0 and kc,2 for the original data in the year 2000 and the four null models.
Fig S 9 Scatter plot for kc,1 and kc,2 for the original data in the year 2000 and the four null models.
Fig S 10 Scatter plot for κ and κ2 for the original data in the year 2000 and the four null models.
Fig S 11 Scatter plot for κ1 and κ2 for the original data in the year 2000 and the four null models.
SECTION 9: NULL MODELS AND GDP
In this section we present scatter plots between GDP per capita and the first two
variables of the method of reflections characterizing the structure of bipartite networks created
from our four null models (Fig S 12, Fig S 13).
Fig S 12 Scatter plot between GDP and bipartite network properties for countries (k=kc,0, k1=kc,1) and Null Models 1 and 2
Fig S 13 Scatter plot between GDP and bipartite network properties for countries (k=kc,0, k1=kc,1) and Null Models 3 and 4
SECTION 10: THE METHOD OF REFLECTIONS AND COUNTRY RANKINGS (YEAR 2000)
Fig S 14 Relative ranking of countries based on the Method of Reflections for the year 2000
SECTION 11: THE METHOD OF REFLECTIONS AND POPULATION
Economic output is usually measured in per capita terms, as the goal of development is
to generate and distribute wealth in the most democratic way possible. Yet there are
other variables to which the per capita idea does not apply as directly as it does to income. One
example is diversification, which in our formalism is represented by kc,0. While in principle we
might be tempted to treat the per capita level of diversification as a good indicator of the
diversification that can be attributed to each individual in a population, such a normalization
assumes that the level of diversification grows linearly with the number of people. This is not
a careful way of measuring the amount of diversification that should be attributed to each
individual, as the number of different products a group of people can make might well depend
on the number of possible interactions, and hence grow as the square of the population, or
could depend on a more complex function that is hitherto unknown. Normalizing diversification
by the number of individuals in a population can therefore be considered naïve, as it imposes a
linear functional form on a variable that does not necessarily depend linearly on the
population.
The diversification of a country, kc,0, does however depend on the country’s population
(Table S 3 column 1). Hence, we still need a measure of the level of diversification of a
country that is independent of its number of inhabitants. In Table S 3 we present the
dependence of four of our measures of diversification (kc,0, kc,2, kc,4, kc,8) on population,
showing that higher order reflections of the method generate measures of
diversification that are independent of a country’s population, and are therefore good
indicators of the level of diversification of a country that is due to the complexity of its
economy rather than to its population.
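The regressions summarized in Table S 3 relate the logarithm of a diversification measure to the logarithm of population. A minimal sketch of such a fit is given below, using ordinary least squares with a single regressor; the function names and the numbers in the example are hypothetical and are not the data used in the table. The slope does not depend on the base of the logarithm as long as the same base is used for both variables.

```python
import math


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def log_log_fit(population, k):
    """Regress log(k) on log(population); returns (slope, intercept)."""
    log_pop = [math.log(v) for v in population]
    log_k = [math.log(v) for v in k]
    return ols_slope_intercept(log_pop, log_k)


# Hypothetical data: diversification doubles for every tenfold rise in population.
slope, intercept = log_log_fit([1e6, 1e7, 1e8], [10, 20, 40])
```

A slope close to zero, as reported for the higher order reflections, would indicate a measure that is effectively independent of population.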
VARIABLES         Log kc,0     Log kc,2     Log kc,4     Log kc,8
Log Population    0.190***     0.0168**     0.00343      0.000267
  t-test          (4.812)      (2.168)      (1.488)      (1.198)
Constant          1.272**      4.708***     5.004***     5.081***
  t-test          (2.005)      (37.63)      (134.7)      (1415)
Observations      127          127          127          127
Adjusted R2       0.150        0.029        0.010        0.003
Table S 3 Correlation between population and successive generations of measures of diversification constructed from the method of
reflections (** statistically significant at the 5% level, *** statistically significant at the 1% level).
SECTION 12: SHARES OF PRODUCTS IN THE WORLD
One critique that can be raised against our methods is that the SITC-4 classification is more
disaggregated for goods produced by richer countries, as rich countries are the ones that
created the classification system. A classification bias in that direction would overstate the level
of diversification of rich countries and understate that of poor countries.
We have shown that our results do not depend on the level of aggregation by
considering two additional datasets aggregated according to different classification systems,
which summarize all tradable goods using a different number of product classifications. Here
we complement this test of the validity of our methods by looking at the share in world trade
associated with each product in the SITC-4 classification (Fig S 15). We find that, contrary to the
critique presented above, industrialized country products have large shares in total trade,
indicating that they are not more narrowly classified than agricultural products and raw
materials (except oil) when benchmarked by their share in world trade. In simpler terms, if we
were to further disaggregate products into categories to achieve more homogeneous shares in
world trade, we would have to disaggregate cars into classes, like SUVs, sedans and compacts,
rather than melons into different types. The data therefore behave in the opposite way
to what the critique suggests.
Fig S 15 Share in world trade for products sorted by SITC-4 code.
Table S 4 and Table S 5 show the five products with the smallest and largest shares in world
trade, respectively.
SITC-4 Code   Product Names                                           World Market Share in the year 2000 (Total World Trade = 1)
6553          Knitted/crocheted fabrics elastic or rubberized         3.2x10^-8
19            Live animals of a kind mainly used for human food       5.3x10^-8
6344          Wood-based panels N.E.S.                                1.7x10^-7
3415          Coal gas, water gas, producer gas & similar gases       5.5x10^-7
2652          True hemp, raw or processed, not spun; tow and waste    8.0x10^-7
Table S 4 The five products with the smallest world share in the year 2000.
SITC-4 Code   Product Names                                                                World Market Share in the year 2000 (Total World Trade = 1)
7810          Passenger motor cars, for transport of pass. & good                          0.0494
3330          Petroleum oils & crude oils obt. from bitumen minerals                       0.0493
7764          Electronic microcircuits                                                     0.0329
7849          Other parts and accessories of motor vehicles                                0.0225
7599          Parts and accessories suitable for calculating and data processing machines  0.0214
Table S 5 The five products with the largest world share in the year 2000.
SECTION 13: NETWORK STRUCTURE, INCOME AND GROWTH
In this section we present regressions showing how the structure of the bipartite
network is connected to income and economic growth. We also compare the performance of
our structural measures to two other measures of diversity: the Hirschman-Herfindahl (H-H)
index and Entropy.
The H-H index is a measure of market concentration commonly used for antitrust
purposes, yet it has also been used as a measure of diversification. The H-H index (H) is defined
as:
$$H_c = \sum_p S_{cp}^{2} \qquad (12)$$
where Scp is the share of product p in the export basket of country c. An alternative method to
measure the diversification of a country’s export basket is to consider its entropy (E), which is
defined as:
$$E_c = -\sum_p S_{cp} \log S_{cp} \qquad (13)$$
High entropy values are characteristic of diversified export baskets, whereas low
entropy values are associated with export baskets that are concentrated in a small number of
products.
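As an illustration of these two benchmark measures, the sketch below computes the H-H index and the entropy of a single export basket from its export values. The function names and the example baskets are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```python
import math


def export_shares(exports):
    """Convert export values of one country into shares S_cp that sum to 1."""
    total = sum(exports)
    return [x / total for x in exports]


def herfindahl(shares):
    """H-H index: sum of squared export shares (1 means fully concentrated)."""
    return sum(s * s for s in shares)


def entropy(shares):
    """Entropy of the export basket; zero shares contribute nothing."""
    # Natural logarithm assumed; another base only rescales the result.
    return -sum(s * math.log(s) for s in shares if s > 0)


# Hypothetical baskets over four products.
diversified = export_shares([25.0, 25.0, 25.0, 25.0])
concentrated = export_shares([100.0, 0.0, 0.0, 0.0])
```

For the evenly spread basket the H-H index is 0.25 and the entropy is log 4, while the fully concentrated basket gives an H-H index of 1 and an entropy of 0, matching the interpretation given above.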
We present the results of our regressions as tables (Table S 6-Table S 9). To help the
reader understand the information contained in these tables, we have created a figure
explaining how to read these regression tables (Fig S 16):
Fig S 16 How to read regression tables
In this section we present regression tables between E, H, kc,0, kc,1, kc,4, kc,8, kc,12, kc,18 and
income per capita adjusted by purchasing power parity (Table S 6), and between E, H, kc,0, kc,1,
kc,4, kc,5, kc,8, kc,9, kc,18, kc,19 and economic growth for a 20 year period (Table S 7), two ten year
periods (Table S 8) and four five year periods (Table S 9). Additionally, we present regression
results for four five year periods with fixed country effects (Table S 10). In a fixed country effect
regression, dummy variables are introduced to capture all the variation between countries;
hence the quantity we look for here is the within R2, which is the variation in growth explained
by the productive structure after controlling for all between-country variation. Technically,
each dummy variable is defined as 1 for one country and 0 for all others. In fixed effect
regressions we introduce one of these variables per country considered.
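The within R2 can be illustrated with a small sketch. Including one dummy variable per country gives the same slope as subtracting each country's mean from every variable and regressing the demeaned values (the within transformation), which is what the code below does for a single regressor. This is a simplified illustration with hypothetical names and data, not the specification used in Table S 10, which includes several regressors.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract from each value the mean of its group (here, its country)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_r2(y, x, groups):
    """Slope and within R2 of y on x after removing country fixed effects."""
    yd = demean_by_group(y, groups)
    xd = demean_by_group(x, groups)
    sxx = sum(a * a for a in xd)
    sxy = sum(a * b for a, b in zip(xd, yd))
    syy = sum(b * b for b in yd)
    beta = sxy / sxx
    ss_res = sum((b - beta * a) ** 2 for a, b in zip(xd, yd))
    return beta, 1.0 - ss_res / syy


# Hypothetical panel: two countries with very different levels but the same slope.
countries = ["A", "A", "B", "B"]
x_panel = [1.0, 2.0, 1.0, 2.0]
y_panel = [12.0, 14.0, 52.0, 54.0]
beta_fe, r2_within = within_r2(y_panel, x_panel, countries)
```

In this toy panel the large difference in levels between the two countries is absorbed by the fixed effects, so the slope is 2 and the within R2 is 1.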
Table S 6 studies the relationship between the level of income in 2000, as measured by
the log of GDP per capita at purchasing power parity, and different measures of productive
structure. Columns 1 and 2 use pre-existing measures of diversification, namely the
entropy and the H-H index. The first can explain 37.7 percent of the variance in income per
capita, while the second can only account for 17.6 percent, as shown by the R2 of the
regression. Columns 3 to 8 use successive iterations of our method. Diversification kc,0 explains
34.5 percent of the variance; kc,1 explains 37.8 percent, and subsequent variables converge to
53 percent by the 8th reflection, with higher order variables adding little additional power.
Columns 9 to 11 show a “horse race” between kc,18 and the pre-existing measures, taken one at
a time or simultaneously. It shows that kc,18 contains much more information than the others
do, as reflected in the fact that adding them increases the R2 very little vis-à-vis column 8,
whereas adding kc,18 increases it much more vis-à-vis columns 1 and 2.
Table S 7 presents a cross-country regression of growth between 1985 and 2005 on initial
values of productive structure indicators. Columns 1-3 use
the entropy indicator, the H-H index and the two combined. Columns 4-8 use successive pairs
of k variables. Columns 9-11 present a horse race between the kc,18-kc,19 pair and the traditional
measures of productive structure, both separately and taken together. All regressions also
control for the initial level of GDP per capita. The results are similar to those of the previous
table. The variables we introduce do a better job at predicting the pattern of future growth, and
higher reflections of the method have the largest predictive power. Interestingly, there is
complementary information in successive measures of our variables, so that both appear
significant in the regression. kc,18-kc,19 contain more information than the traditional measures
and beat them in a horse race (columns 9-11).
Table S 8 repeats these regressions, splitting the sample into two periods of 10 years,
1985-95 and 1995-05, and finds similar results: pairs of k variables do a better job of explaining
growth than do the traditional variables, and the quality of the fit increases with each iteration.
A horse race between traditional and k variables shows that the bulk of the explanatory power
comes from the k variables, although the traditional variables retain some residual information
that is statistically significant but small. Table S 9 repeats the analysis using four 5-year
periods between 1985 and 2005 and finds similar results.
Table S 10 presents an equivalent set of regressions but controls for average fixed
country characteristics by including a dummy variable per country. This regression bases its
identification only on the within-country variation in growth and finds similar but even stronger
results. Our preferred specification (column 8) is able to explain 33.72 percent of the
within-country variance, while adding the traditional variables only increases the explanatory power to
35 percent. The two traditional variables on their own (column 3) explain only 21.79 percent of
the within-country variance, indicating that the fit increases much more when adding the k
variables to the traditional variables (contrast of columns 3 and 11) than when adding the
traditional variables to the k variables (contrast of columns 8 and 11).
INCOME (YEAR 2000)
Predicted variable in all columns: Log GDP per capita ppp (2000). All predictors are measured in 2000. Values in parentheses are t-tests.
Column   Predictor     Coefficient (t-test)    Constant (t-test)       Observations   Adjusted R2
(1)      Entropy       0.552*** (8.712)        6.696*** (30.38)        125            0.377
(2)      Herfindahl    -2.554*** (-5.250)      8.914*** (71.88)        125            0.176
(3)      kc,0          0.00859*** (8.147)      7.603*** (55.88)        125            0.345
(4)      kc,1          -0.159*** (-8.740)      12.34*** (27.49)        125            0.378
(5)      kc,4          0.116*** (11.48)        -9.796*** (-6.147)      125            0.513
(6)      kc,8          1.201*** (11.93)        -185.6*** (-11.41)      125            0.533
(7)      kc,12         12.21*** (11.99)        -1968*** (-11.94)       125            0.535
(8)      kc,18         392.6*** (11.99)        -63581*** (-11.99)      125            0.535
(9)      kc,18         324.0*** (6.854)        -52466*** (-6.853)      125            0.546
         Entropy       0.157** (1.991)
(10)     kc,18         365.8*** (9.923)        -59234*** (-9.921)      125            0.541
         Herfindahl    -0.639 (-1.552)
(11)     kc,18         315.6*** (5.859)        -51109*** (-5.858)      125            0.543
         Entropy       0.202 (1.275)
         Herfindahl    0.270 (0.329)
Table S 6 Regression coefficients for income per capita
0.000993
-0.00176
0.0114
(0.751)
97
0.195
(3.650)
0.0137
(0.776)
97
0.115
(-2.765)
-0.0273***
(0.533)
(85, 05)
(85,05)
(-0.794)
0.00660***
(2)
Growth
(1)
Growth
0.00650
(0.437)
97
0.192
(0.760)
(2.600)
0.0116
(-0.882)
0.00828**
-0.00206
(85, 05)
(3)
Growth
(-0.497)
-0.00154
(85, 05)
(4)
Growth
0.0338
(0.922)
97
0.118
(-0.749)
(2.080)
-0.000612
6.62e-05**
Table S 7 Regression coefficients for a twenty year period of growth
Observations
Adjusted R2
Constant
(1985)
kc,19
(1985)
kc,18
(1985)
kc,9
(1985)
kc,8
(1985)
kc,5
(1985)
kc,4
(1985)
kc,1
(1985)
kc,0
(1985)
Herfindahl
(1985)
Entropy
(1985)
Predicted
Variable
Predictors
GDP per
capita ppp
20 YEAR GROWTH
-0.735*
(-1.883)
97
0.206
(1.737)
(2.866)
0.0321*
0.00169***
(-0.735)
-0.00249
(85, 05)
(5)
Growth
-19.29***
(-2.807)
97
0.247
(2.713)
(3.075)
0.890***
0.0338***
(-0.688)
-0.00223
(85, 05)
(6)
Growth
-69.21***
(-3.454)
97
0.202
(3.453)
0.401***
(-1.478)
-0.00470
(85, 05)
(7)
Growth
(2.928)
-23801***
(-2.940)
97
0.274
(2.952)
1127***
38.88***
(-0.758)
-0.00233
(85, 05)
(8)
Growth
(2.603)
-21475**
(-2.610)
97
0.274
(2.618)
1017**
35.05**
(0.931)
(-0.849)
0.00200
-0.00244
(85, 05)
(9)
Growth
(2.829)
-22808***
(-2.834)
97
0.268
(2.849)
1080***
37.26***
(-0.406)
-0.00414
(-0.804)
-0.00238
(85, 05)
(10)
Growth
(2.632)
-21801***
(-2.633)
97
0.268
(2.643)
1033***
35.57***
(0.454)
(0.896)
0.00723
(-0.831)
0.00322
-0.00242
(85, 05)
(11)
Growth
0.000334
-0.00235
0.0171
(1.356)
221
0.113
0.0226
(1.602)
221
0.077
0.0153
(1.087)
221
0.109
(0.285)
(-3.890)
(2.985)
0.00422
(4.962)
-0.0325***
0.00759***
(-1.349)
-0.00246
(85-95-05)
(3)
Growth
0.00699***
(0.209)
(85-95-05)
(85-95-05)
(-1.322)
(2)
Growth
(1)
Growth
-0.0186
(-0.971)
221
0.085
(2.543)
0.000916**
(3.967)
9.75e-05***
(0.595)
0.00112
(85-95-05)
(4)
Growth
Table S 8 Regression coefficients for two ten year periods of growth
Observations
Adjusted R2
Constant
(85,95)
kc,19
(85,95)
kc,18
(85,95)
kc,9
(85,95)
kc,8
(85,95)
kc,5
(85,95)
kc,4
(85,95)
kc,1
(85,95)
kc,0
(85,95)
Herfindahl
(85,95)
Entropy
(85,95)
Predicted
Variable
Predictors
GDP per
capita ppp
10 YEAR GROWTH
-0.168***
(-6.137)
221
0.170
(5.971)
0.00329***
(5.577)
0.00102***
(-2.062)
-0.00395**
(85-95-05)
(5)
Growth
-1.178***
(-5.215)
221
0.160
(5.594)
0.0152***
(5.056)
0.00577***
(-1.695)
-0.00311*
(85-95-05)
(6)
Growth
0.102***
(3.107)
221
0.068
0.789***
(2.709)
-65.48***
(-2.705)
221
0.168
(2.705)
0.310***
(4.312)
-96.21***
(-4.308)
221
0.137
(4.306)
(3.002)
0.00458***
(-1.899)
-0.00346*
(85-95-05)
(9)
Growth
1.158***
0.455***
(-3.577)
(-0.707)
-0.00119
(85-95-05)
(8)
Growth
-0.000660***
(2.188)
0.00310**
(85-95-05)
(7)
Growth
(3.433)
-79.20***
(-3.428)
221
0.158
0.954***
(3.428)
0.375***
(-2.494)
-0.0210**
(-1.327)
-0.00229
(85-95-05)
(10)
Growth
(2.699)
-65.78***
(-2.695)
221
0.164
0.792***
(2.695)
0.311***
(-0.112)
-0.00162
(1.648)
0.00434
(-1.850)
-0.00343*
(85-95-05)
(11)
Growth
0.000292
-0.00269*
0.0166
(1.497)
451
0.090
0.00798***
(6.280)
0.0142
(1.144)
451
0.089
(0.440)
0.0236*
(1.910)
451
0.062
0.00602
(-4.970)
0.00885***
(3.760)
(-1.785)
-0.00286*
(85-90-95-00-05)
(3)
Growth
-0.0373***
(0.211)
(85-90-95-00-05)
(85-90-95-00-05)
(-1.732)
(2)
Growth
(1)
Growth
(4)
Growth
-0.0160
(-0.933)
451
0.071
0.000953***
(2.853)
(5.351)
0.000121***
(0.220)
0.000361
(85-90-95-00-05)
Table S 9 Regression coefficients for four five year periods of growth.
Observations
Adjusted R2
Constant
(85,90,95,00)
kc,19
(85,90,95,00)
kc,18
(85,90,95,00)
kc,9
(85,90,95,00)
kc,8
(85,90,95,00)
kc,5
(85,90,95,00)
kc,4
(85,90,95,00)
kc,1
(85,90,95,00)
kc,0
(85,90,95,00)
Herfindahl
(85,90,95,00)
Entropy
(85,90,95,00)
Predicted
Variable
Predictors
GDP per
capita ppp
5 YEAR GROWTH
-0.173***
(-7.646)
451
0.136
0.00260***
(5.694)
(7.074)
0.00113***
(-2.431)
-0.00393**
(85-90-95-00-05)
(5)
Growth
-0.224***
(-4.198)
451
0.071
(5.504)
(3.474)
0.00312***
0.00102***
(1.767)
0.00224*
(85-90-95-00-05)
(6)
Growth
(4.494)
-0.142**
(-2.576)
451
0.127
(2.131)
0.00280***
(4.671)
-0.163***
(-2.846)
451
0.057
0.0349
(0.889)
451
0.013
(2.359)
0.00259***
0.000632**
0.000673**
0.00759***
(6.060)
(-1.686)
-0.00257*
(85-90-95-00-05)
(9)
Growth
(-1.147)
(2.479)
0.00310**
(85-90-95-00-05)
(8)
Growth
-0.000265
(2.553)
0.00326**
(85-90-95-00-05)
(7)
Growth
(4.523)
-0.132**
(-2.355)
451
0.101
(2.234)
0.00265***
0.000647**
(-4.765)
-0.0351***
(0.207)
0.000281
(85-90-95-00-05)
(10)
Growth
(4.493)
-0.145***
(-2.611)
451
0.125
(2.365)
0.00260***
0.000676**
(0.474)
0.00636
0.00851***
(3.680)
(-1.749)
-0.00275*
(85-90-95-00-05)
(11)
Growth
-0.0581***
-0.0585***
0.467***
(7.427)
451
0.2071
(4.478)
0.514***
(8.147)
451
0.1784
(-2.842)
-0.0390***
(-7.721)
(85-90-95-00-05)
(85-90-95-00-05)
(-7.911)
0.0134***
(2)
Growth
(1)
Growth
0.429***
(6.592)
451
0.2179
(2.117)
(4.037)
0.0585**
(-8.072)
0.0247***
-0.0595***
(85-90-95-00-05)
(3)
Growth
0.594***
(9.546)
451
0.2991
(6.549)
(3.710)
0.00238***
0.000223***
(-10.11)
-0.0773***
(85-90-95-00-05)
(4)
Growth
0.588***
(8.165)
451
0.3379
(9.287)
(2.922)
0.00366***
0.000537***
(-11.28)
-0.0863***
(85-90-95-00-05)
(5)
Growth
Table S 10 Regression table for four five year periods of growth considering fixed country effects.
Observations
Within R2
Constant
(85,90,95,00)
kc,19
(85,90,95,00)
kc,18
(85,90,95,00)
kc,9
(85,90,95,00)
kc,8
(85,90,95,00)
kc,5
(85,90,95,00)
kc,4
(85,90,95,00)
kc,1
(85,90,95,00)
kc,0
(85,90,95,00)
Herfindahl
(85,90,95,00)
Entropy
(85,90,95,00)
Predicted
Variables
Predictors
GDP per
capita ppp
5 YEAR GROWTH FIXED EFFECTS
0.589***
(8.070)
451
0.3373
(8.998)
(2.801)
0.00410***
0.000611***
(-11.78)
-0.0891***
(85-90-95-00-05)
(6)
Growth
0.651***
(8.203)
451
0.1779
0.000535***
(-2.808)
(-8.337)
-0.0651***
(85-90-95-00-05)
(7)
Growth
(8.799)
0.596***
(8.310)
451
0.3372
(2.801)
0.00419***
0.000601***
(-11.89)
-0.0899***
(85-90-95-00-05)
(8)
Growth
(8.164)
0.543***
(7.315)
451
0.3494
(3.051)
0.00395***
0.000653***
(2.453)
(-11.39)
0.00706**
-0.0868***
(85-90-95-00-05)
(9)
Growth
(8.521)
0.585***
(8.133)
451
0.3415
(2.879)
0.00410***
0.000618***
(-1.435)
-0.0181
(-11.58)
-0.0884***
(85-90-95-00-05)
(10)
Growth
(8.031)
0.511***
(6.583)
451
0.3535
(3.141)
0.00389***
0.000673***
(1.410)
(2.435)
0.0360
(-11.40)
0.0142**
-0.0867***
(85-90-95-00-05)
(11)
Growth
SECTION 14: ADDITIONAL RESULTS
PRODY AND EXPY
The variables PRODY and EXPY were introduced originally by Hausmann, Hwang and
Rodrik [15] to characterize the sophistication of products and of countries’ exports starting from
trade and income data. PRODY and EXPY allow us to study the income of countries from a
product-specific perspective.
DEFINITIONS
PRODY
The PRODY of a product is the average income per capita associated with that product.
We can calculate PRODY using trade data as
$$PRODY_p = \sum_c \frac{S_{cp}}{S_p} G_c \qquad (14)$$
where Scp is the share of product p in the export basket of country c, Gc is the income of
country c measured as GDP per capita adjusted for purchasing power parity, and $S_p = \sum_c S_{cp}$.
EXPY
The EXPY of a country is the average PRODY of its exports:
$$EXPY_c = \sum_p S_{cp} \, PRODY_p \qquad (15)$$
We notice that PRODY and EXPY mix income and network information, as these variables
have definitions similar to those of the first two reflections of the method, with k0=GDP per capita
and Mcp related to the shares of products in the export baskets of countries.
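The two definitions can be sketched directly from an export matrix and a vector of incomes. The code below is an illustration only: the function name and the toy numbers are hypothetical, PRODY is taken as the share-weighted average income of a product's exporters with weights normalized to sum to one over countries, and every country and product is assumed to have nonzero total exports.

```python
def prody_expy(exports, gdp_per_capita):
    """Compute PRODY per product and EXPY per country.

    exports[c][p] is the export value of product p by country c;
    gdp_per_capita[c] is the income G_c of country c.
    """
    n_c, n_p = len(exports), len(exports[0])
    # S[c][p]: share of product p in the export basket of country c.
    S = [[x / sum(row) for x in row] for row in exports]
    # Normalization for each product: sum of its shares over all countries.
    S_p = [sum(S[c][p] for c in range(n_c)) for p in range(n_p)]
    prody = [sum(S[c][p] / S_p[p] * gdp_per_capita[c] for c in range(n_c))
             for p in range(n_p)]
    expy = [sum(S[c][p] * prody[p] for p in range(n_p)) for c in range(n_c)]
    return prody, expy


# Hypothetical example: two countries, two products.
prody_toy, expy_toy = prody_expy([[1.0, 1.0], [0.0, 2.0]], [100.0, 300.0])
```

In this example the first product is exported only by the poorer country, so its PRODY equals that country's income, while the second product's PRODY lies between the two incomes and closer to that of the country for which it represents a larger share of exports.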
EXPY, KC,0, KC,1
Here we complement our results on income by showing that k and k1 correlate with a
country’s EXPY (Fig S 17).
Fig S 17 EXPY and bipartite network structure. a, Diversification (kc,0=k) versus EXPY. b, Average ubiquity of a country’s products (kc,1=k1)
versus EXPY.
Fig S 18 PRODY and bipartite network structure. a, Ubiquity (kp,0) versus PRODY. b, Average diversification of the countries exporting a product (kp,1) versus
PRODY.
NULL MODEL BEHAVIOR FOR PRODY AND EXPY, KC,0, KC,1
Here we present the null model behavior for the relationships found between PRODY,
EXPY and the network structure (Fig S 19 - Fig S 22).
Fig S 19 Comparison between PRODY and EXPY with kc,0, kc,1, kp,0 and kp,1 for null model 1. A, PRODY vs. kp,0. B, PRODY vs. kp,1. C, EXPY vs. kc,0. D,
EXPY vs. kc,1.
Fig S 20 Comparison between PRODY and EXPY with kc,0, kc,1, kp,0 and kp,1 for null model 2. A, PRODY vs. kp,0. B, PRODY vs. kp,1. C, EXPY vs. kc,0. D,
EXPY vs. kc,1.
Fig S 21 Comparison between PRODY and EXPY with kc,0, kc,1, kp,0 and kp,1 for null model 3. A, PRODY vs. kp,0. B, PRODY vs. kp,1. C, EXPY vs. kc,0. D,
EXPY vs. kc,1.
Fig S 22 Comparison between PRODY and EXPY with kc,0, kc,1, kp,0 and kp,1 for null model 4. A, PRODY vs. kp,0. B, PRODY vs. kp,1. C, EXPY vs. kc,0. D,
EXPY vs. kc,1.
BIPARTITE NETWORK ANALYSIS AND PROXIMITY IN THE PRODUCT SPACE
We study the relationship between the analysis presented here and the proximity
between products in the product space by asking if products that are close in the κ−θ diagram
are proximate in The Product Space.
Proximity in the product space is defined as the minimum pair-wise conditional
probability of co-exporting products p1 and p2. We can express this as a function of M as:
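$$\phi_{p_1 p_2} = \min\left\{ \frac{\sum_c M_{cp_1} M_{cp_2}}{\sum_c M_{cp_1}}, \frac{\sum_c M_{cp_1} M_{cp_2}}{\sum_c M_{cp_2}} \right\} \qquad (16)$$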
We expect pairs of products co-exported by a large fraction of countries (i.e. pairs of
products having a large φ) to have a similar kp,0 and kp,1. We control for randomness by using
our four null models, as these can be used to compare the relationship between kp,0 and kp,1
and φ for networks that are similar to Mcp. The four null-models allow us to study variations in
the relationships between kp,0, kp,1 and φ that come from the network structure, rather than
from their definition.
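The proximity between two products can be computed from the country-product matrix with a short function. This is a minimal sketch with a hypothetical function name and toy matrix, assuming M is stored as a list of country rows with 0/1 entries and that both products have at least one exporter.

```python
def proximity(M, p1, p2):
    """Minimum pairwise conditional probability of co-exporting p1 and p2.

    M[c][p] is 1 if country c exports product p, else 0.
    """
    co_exporters = sum(row[p1] * row[p2] for row in M)   # countries exporting both
    exporters_p1 = sum(row[p1] for row in M)             # ubiquity of p1
    exporters_p2 = sum(row[p2] for row in M)             # ubiquity of p2
    return min(co_exporters / exporters_p1, co_exporters / exporters_p2)


# Hypothetical network: 4 countries (rows) and 3 products (columns).
M_example = [[1, 1, 0],
             [1, 1, 0],
             [1, 0, 1],
             [1, 0, 0]]
phi_01 = proximity(M_example, 0, 1)
```

Taking the minimum of the two conditional probabilities makes the measure symmetric and prevents a very ubiquitous product from appearing close to every other product.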
Proximity (φ) is a quantity associated with a pair of products. We compare φ to kp,0 and
kp,1 by measuring the Euclidean distance in the kp,0 and kp,1 space:
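$$\Delta_{p_1 p_2} = \sqrt{(k_{p_1,0} - k_{p_2,0})^2 + (k_{p_1,1} - k_{p_2,1})^2} \qquad (17)$$
$$\Delta_{p_1 p_2} = \sqrt{\Delta_0^2 + \Delta_1^2}.$$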
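For completeness, the distance between two products in the kp,0-kp,1 space can be sketched as follows. The function name and the numbers are hypothetical; kp0 and kp1 are assumed to be lists holding the first two product-level variables of the method of reflections, indexed by product.

```python
import math


def kp_distance(kp0, kp1, p1, p2):
    """Euclidean distance between products p1 and p2 in the kp,0-kp,1 space."""
    delta_0 = kp0[p1] - kp0[p2]   # difference in ubiquity
    delta_1 = kp1[p1] - kp1[p2]   # difference in the first reflection
    return math.hypot(delta_0, delta_1)


# Hypothetical values for two products.
distance = kp_distance([5.0, 2.0], [10.0, 6.0], 0, 1)
```

Because kp,0 and kp,1 are used without rescaling, the variable with the wider range dominates the distance; whether any rescaling was applied before plotting is not stated here.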
We study the relationship between the distance in the kp,0-kp,1 space and φ (Fig S 23) and
find that high proximity values are likely only among products that are close together in the
kp,0-kp,1 diagram. We notice that the null models do not give rise to proximities as high as the
ones observed in the original data, suggesting that the high observed co-production of some
pairs of products cannot be expected from chance, and hence that high proximity values
indicate similarities between the productive structures required to produce such pairs of products.
These results also show that φ>0.5 is a good threshold, as φ values above
it are extremely rare in any of the four null models.
Fig S 23 Bipartite network structure and product proximity. The five plots show proximity as a function of the Euclidean distance between
products in the kp,0-kp,1 diagram.
REFERENCES
1. PG Lind, MC González, HJ Herrmann. Cycles and clustering in bipartite networks. Phys. Rev. E 72:056127 (2005)
2. R Guimerà, M Sales-Pardo, LAN Amaral. Module identification in bipartite and directed networks. Phys. Rev. E 76:036102 (2007)
3. S Lehmann, M Schwartz, LK Hansen. Bi-clique Communities. Phys. Rev. E 78:016108 (2008)
4. K-I Goh et al. The Human Disease Network. PNAS 104:8685-8690 (2007)
5. W Souma, Y Fujiwara, H Aoyama. Complex Networks and Economics. Physica A 324:396-401 (2003)
6. A-L Barabási, R Albert. Emergence of scaling in random networks. Science 286:509-512 (1999)
7. DJ Watts, SH Strogatz. Collective dynamics of 'small-world' networks. Nature 393(6684):409-410 (1998)
8. MEJ Newman. The structure of scientific collaboration networks. PNAS 98:404-409 (2001)
9. MEJ Newman. Scientific collaboration networks. I. Network Construction and Fundamental Results. Phys. Rev. E 64:016131 (2001)
10. MEJ Newman. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E 64:016132 (2001)
11. A-L Barabási et al. Evolution of the social network of scientific collaborations. Physica A 311:590-614 (2002)
12. LAN Amaral et al. Classes of small-world networks. PNAS 97:11149-11152 (2000)
13. H Jeong, Z Néda, A-L Barabási. Measuring preferential attachment in evolving networks. Europhysics Letters 61:567-572 (2003)
14. P Gleiser, L Danon. Community Structure in Jazz. arXiv:cond-mat/0307434 (2003)
15. R Hausmann, J Hwang, D Rodrik. What you export matters. Journal of Economic Growth 12(1):1-25 (2007)
16. S Maslov, K Sneppen. Specificity and stability in topology of protein networks. Science 296:910-913 (2002)