The building blocks of economic complexity

César A. Hidalgo¹ and Ricardo Hausmann

Center for International Development and Harvard Kennedy School, Harvard University, Cambridge, MA 02138

Edited by Partha Sarathi Dasgupta, University of Cambridge, Cambridge, United Kingdom, and approved May 1, 2009 (received for review January 28, 2009)

For Adam Smith, wealth was related to the division of labor. As people and firms specialize in different activities, economic efficiency increases, suggesting that development is associated with an increase in the number of individual activities and with the complexity that emerges from the interactions between them. Here we develop a view of economic growth and development that gives a central role to the complexity of a country's economy by interpreting trade data as a bipartite network in which countries are connected to the products they export, and show that it is possible to quantify the complexity of a country's economy by characterizing the structure of this network. Furthermore, we show that the measures of complexity we derive are correlated with a country's level of income, and that deviations from this relationship are predictive of future growth. This suggests that countries tend to converge to the level of income dictated by the complexity of their productive structures, indicating that development efforts should focus on generating the conditions that would allow complexity to emerge to generate sustained growth and prosperity.

economic development | networks

For Adam Smith, the secret to the wealth of nations was related to the division of labor. As people and firms specialize in different activities, economic efficiency increases. This division of labor, however, is limited by the extent of the market: The bigger the market, the more its participants can specialize and the deeper the division of labor that can be achieved.
This suggests that wealth and development are related to the complexity that emerges from the interactions between the increasing number of individual activities that make up an economy (1–3).

Now, if all countries are connected to each other through a global market for inputs and outputs so that they can exploit a division of labor at the global scale, why have differences in Gross Domestic Product (GDP) per capita exploded over the past 2 centuries (4, 5, *)? One possible answer is that some of the individual activities that arise from the division of labor described above cannot be imported, such as property rights, regulation, infrastructure, specific labor skills, etc., and so countries need to have them locally available to produce. Hence, the productivity of a country resides in the diversity of its available nontradable "capabilities," and therefore, cross-country differences in income can be explained by differences in economic complexity, as measured by the diversity of capabilities present in a country and their interactions.

During the last 20 years, models of economic growth have often included the assumption that the variety of inputs that go into the production of the goods produced by a country affects that country's overall productivity (3, 6). There have been very few attempts, however, to bring this intuition to the data. In fact, the most frequently cited surveys of the empirical literature do not incorporate a single reference to any measure of diversity of inputs or complexity (7).

We can create indirect measures of the capabilities available in a country by thinking of each capability as a building block or Lego piece. In this analogy, a product is equivalent to a Lego model, and a country is equivalent to a bucket of Legos. Countries will be able to make products for which they have all of the necessary capabilities, just like a child is able to produce a Lego model if the child's bucket contains all of the necessary Lego pieces.
Using this analogy, the question of economic complexity is equivalent to asking whether we can infer properties such as the diversity and exclusivity of the Lego pieces inside a child's bucket by looking only at the models that a group of children, each with a different bucket of Legos, can make. Here we show that this is possible if we interpret data connecting countries to the products they export as a bipartite network and assume that this network is the result of a larger, tripartite network, connecting countries to the capabilities they have and products to the capabilities they require (Fig. 1A). Hence, connections between countries and products signal the availability of capabilities in a country just like the creation of a model by a child signals the availability of a specific set of Lego pieces.

PNAS | June 30, 2009 | vol. 106 | no. 26 | 10570–10575

Note that this interpretation says nothing of the processes whereby countries accumulate capabilities and the characteristics of an economy that might affect them. It just attempts to develop measures of the complexity of a country's economy at a point in time. However, the approach presented here can be seen as a building block of a theory that accounts for the process by which countries accumulate capabilities. A detailed analysis of capability accumulation is beyond the scope of this article but the implications of our approach will be discussed briefly in Discussion.

In this article we develop a method to characterize the structure of bipartite networks, which we call the Method of Reflections, and apply it to trade data to illustrate how it can be used to extract relevant information about the availability of capabilities in a country.
We interpret the variables produced by the Method of Reflections as indicators of economic complexity and show that the complexity of a country's economy is correlated with income and that deviations from this relationship are predictive of future growth, suggesting that countries tend to approach the level of income associated with the capability set available in them. We validate our measures of the capabilities available in a country by introducing a model and by showing empirically that our metrics are strongly correlated with the diversity of the labor inputs used in the production of a country's goods, approximated by using data on the use of labor inputs in the United States. Finally, we show that the level of complexity of a country's economy predicts the types of products that countries will be able to develop in the future, suggesting that the new products that a country develops depend substantially on the capabilities already available in that country.

Author contributions: C.A.H. and R.H. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
¹To whom correspondence should be addressed. E-mail: cesar_hidalgo@ksg.harvard.edu.
*In ref. 4, Maddison presents GDP per capita measures for 60 countries since 1820. In that year, the ratio of the 95th to the 5th percentile was 3.18 but it increased to 17.82 by the year 2000. Today, the U.S. GDP per capita is more than 60 times higher than Malawi's.
This article contains supporting information online at www.pnas.org/cgi/content/full/0900943106/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.0900943106

Methods

We look at country-product associations by using international trade data with products disaggregated according to 3 alternative data sources and classifications: first, the Standard International Trade Classification (SITC) revision 4 at the 4-digit level (see ref. 8; the data are available at www.nber.org/data, http://cid.econ.udavis.edu/data/undata/undata.html, and www.chidalgo.com/productspace/data.html); second, the COMTRADE Harmonized System at the 4-digit level; and third, the North American Industry Classification System (NAICS) at the 6-digit level (SI Appendix, Section 1). We interpret these data as bipartite networks in which countries are connected to the products they export (Fig. 1B). Mathematically, we represent this network using the adjacency matrix Mcp, where Mcp = 1 if country c is a significant exporter of product p and 0 otherwise. We consider country c to be a significant exporter of product p if its Revealed Comparative Advantage (RCA), the ratio of the share of product p in the export basket of country c to the share of product p in world trade, is at or above some threshold value, which we take as 1 in this exercise (RCAcp ≥ 1) (see SI Appendix, Section 2).

Fig. 1. Quantifying countries' economic complexity. (A) A country will be able to produce a product if it has all of the required capabilities, hence the bipartite network connecting countries to products is a result of the tripartite network connecting countries to their available capabilities and products to the capabilities they require. (B) Network visualization of a subset of Mcp in which we show Malaysia (MYS), Pakistan (PAK), Philippines (PHL), Japan (JPN), and all of the products exported by them in the year 2000 (colored circles), illustrating how countries and products are connected in Mcp. (C) kc,0–kc,1 diagram divided into 4 quadrants defined by the empirically observed averages ⟨kc,0⟩ and ⟨kc,1⟩.
Node color legend for B (SITC-4 code range: category name):
0–999: Food & live animals
1000–1999: Beverages & tobacco
2000–2999: Raw materials
3000–3999: Mineral fuels, lubricants & related materials
4000–4999: Animal & vegetable oils, fats & waxes
5000–5999: Chemicals
6000–6999: Manufactured goods by material
7000–7999: Machinery & transport equipment
8000–8999: Miscellaneous manufactured articles
9000–9999: Miscellaneous
Quadrant labels in C: Non-Diversified Countries Producing Standard Products; Diversified Countries Producing Standard Products; Non-Diversified Countries Producing Exclusive Products; Diversified Countries Producing Exclusive Products.

Method of Reflections. We characterize countries and products by introducing a family of variables capturing the structure of the network defined by Mcp (SI Appendix, Section 3). Because of the symmetry of the bipartite network, we refer to this technique as the "Method of Reflections," as the method produces a symmetric set of variables for the 2 types of nodes in the network (countries and products). The Method of Reflections consists of iteratively calculating the average value of the previous-level properties of a node's neighbors and is defined as the set of observables:

$$k_{c,N} = \frac{1}{k_{c,0}} \sum_{p} M_{cp}\, k_{p,N-1}, \tag{1}$$

$$k_{p,N} = \frac{1}{k_{p,0}} \sum_{c} M_{cp}\, k_{c,N-1}, \tag{2}$$

for N ≥ 1, with initial conditions given by the degree, or number of links, of countries and products:

$$k_{c,0} = \sum_{p} M_{cp}, \tag{3}$$

$$k_{p,0} = \sum_{c} M_{cp}. \tag{4}$$

kc,0 and kp,0 represent, respectively, the observed level of diversification of a country (the number of products exported by that country) and the ubiquity of a product (the number of countries exporting that product). Hence, we characterize each country through the vector k⃗c = (kc,0, kc,1, kc,2, ..., kc,N) and each product by the vector k⃗p = (kp,0, kp,1, kp,2, ..., kp,N). For countries, even variables (kc,0, kc,2, kc,4, ...) are generalized measures of diversification, whereas odd variables (kc,1, kc,3, kc,5, ...) are generalized measures of the ubiquity of their exports.
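As a rough illustration of the construction described in Methods, the following Python sketch computes RCA from a small export table, thresholds it at 1 to obtain Mcp, and evaluates the initial conditions of Eqs. 3 and 4. The export values, the function names (`rca_matrix`, `adjacency`, `degrees`), and the toy dimensions are hypothetical and are not taken from the trade data used in the article.

```python
def rca_matrix(exports):
    """Revealed Comparative Advantage; exports[c][p] is the value of product p exported by country c."""
    n_c, n_p = len(exports), len(exports[0])
    country_total = [sum(row) for row in exports]
    product_total = [sum(exports[c][p] for c in range(n_c)) for p in range(n_p)]
    world_total = sum(country_total)
    rca = [[0.0] * n_p for _ in range(n_c)]
    for c in range(n_c):
        for p in range(n_p):
            if country_total[c] > 0 and product_total[p] > 0:
                share_in_basket = exports[c][p] / country_total[c]  # share of p in c's exports
                share_in_world = product_total[p] / world_total     # share of p in world trade
                rca[c][p] = share_in_basket / share_in_world
    return rca


def adjacency(rca, threshold=1.0):
    """M[c][p] = 1 if country c is a significant exporter of p (RCA >= threshold), else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in rca]


def degrees(M):
    """Initial conditions: diversification k_c0 (Eq. 3) and ubiquity k_p0 (Eq. 4)."""
    k_c0 = [sum(row) for row in M]
    k_p0 = [sum(M[c][p] for c in range(len(M))) for p in range(len(M[0]))]
    return k_c0, k_p0


# Hypothetical export values: 3 countries (rows) by 4 products (columns).
exports = [
    [10.0, 10.0, 10.0, 10.0],  # exports spread evenly over all products
    [40.0, 0.0, 0.0, 0.0],     # exports concentrated in product 0
    [0.0, 0.0, 4.0, 36.0],     # exports concentrated in product 3
]
M = adjacency(rca_matrix(exports))
k_c0, k_p0 = degrees(M)
```

Note that the evenly spread country is not counted as a significant exporter of the products that dominate world trade in this toy table, because its basket share of those products falls below their world share.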
For products, even variables are related to their ubiquity and the ubiquity of other related products, whereas odd variables are related to the diversification of countries exporting those products. In network terms, kc,1 and kp,1 are known as the average nearest neighbor degree (9, 10). Higher order variables (N > 1), however, can be interpreted as a linear combination of the properties of all of the nodes in the network with coefficients given by the probability that a random walker that started at a given node ends up at another node after N steps (see SI Appendix, Section 4).

Results

We can begin understanding the type of information about countries captured by the Method of Reflections by looking at where countries are located in the space defined by the first two sets of variables produced by our method: kc,0 and kc,1. Fig. 1C shows that there is a strong negative correlation between kc,0 and kc,1 (10, 11), meaning that diversified countries tend to export less ubiquitous products. Deviations from this behavior, however, are informative. For example, whereas Malaysia and Pakistan export the same number of products, the products exported by Malaysia (kMYS,0 = 104, kMYS,1 = 18) are exported by fewer countries than those exported by Pakistan (kPAK,0 = 104, kPAK,1 = 27.5). Combining this fact with our third level of analysis, we see that Malaysian products are exported by more diversified countries than the exports of Pakistan (kMYS,2 = 163, kPAK,2 = 142; SI Appendix, Section 8). This suggests that the productive structure of Malaysia is more complex than that of Pakistan, due, as we will show shortly, to a larger number of capabilities available in Malaysia than in Pakistan.

Fig. 2. Capabilities and bipartite network structure. (A) We model the structure of Mcp by taking 2 random matrices representing the availability of capabilities in a country and the requirement of capabilities by products and consider that countries are able to produce products if they have all of the required capabilities. (B) The kc,0–kc,1 diagrams that emerge from 4 implementations of the model described in A. (C) kc,0 and kc,1 as a function of the number of capabilities (Nc) available in countries for 2 implementations of the model. (D) Average number of labor inputs required by products produced in a country as a function of the first 3 components of k⃗c.
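The iteration defined by Eqs. 1–4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `method_of_reflections` and the 3-country, 4-product matrix are hypothetical, and the sketch assumes every country and every product has at least one link so that the divisions in Eqs. 1 and 2 are defined.

```python
def method_of_reflections(M, n_max):
    """Return (k_c, k_p), where k_c[N][c] and k_p[N][p] follow Eqs. 1-4 for N = 0..n_max."""
    n_c, n_p = len(M), len(M[0])
    k_c0 = [sum(M[c]) for c in range(n_c)]                             # Eq. 3: diversification
    k_p0 = [sum(M[c][p] for c in range(n_c)) for p in range(n_p)]      # Eq. 4: ubiquity
    if min(k_c0) == 0 or min(k_p0) == 0:
        raise ValueError("every country and product needs at least one link")
    k_c = [[float(k) for k in k_c0]]
    k_p = [[float(k) for k in k_p0]]
    for _ in range(n_max):
        prev_c, prev_p = k_c[-1], k_p[-1]
        # Eq. 1: average previous-level property of the products a country exports.
        k_c.append([sum(M[c][p] * prev_p[p] for p in range(n_p)) / k_c0[c]
                    for c in range(n_c)])
        # Eq. 2: average previous-level property of the countries exporting a product.
        k_p.append([sum(M[c][p] * prev_c[c] for c in range(n_c)) / k_p0[p]
                    for p in range(n_p)])
    return k_c, k_p


# Hypothetical nested matrix: country 0 exports everything, country 2 only product 0.
M_toy = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
k_c, k_p = method_of_reflections(M_toy, n_max=2)
```

In this toy case the most diversified country has the lowest kc,1 (its products are, on average, exported by fewer countries) and the highest kc,2, which is the same qualitative reading applied to Malaysia and Pakistan above.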
In SI Appendix we show that the negative relationship presented in the kc,0–kc,1 diagram is not a consequence of variations in the level of diversification of countries and in the ubiquity of products. We show this by creating 4 null models (11) that control, with increasing stringency, for the diversification of countries and the ubiquity of products and show that these distributions, per se, are not responsible for the negative relationship observed in the data (see SI Appendix, Section 6).

Minimalistic Model. We show that the location of countries in the kc,0–kc,1 diagram is informative about the capabilities available in a country by introducing a simple model based on the assumption that country c will be able to produce product p if it has all of the required capabilities (Fig. 2A). We implement this model by considering a fixed number of capabilities in each country and represent this by using a matrix Cca whose elements are equal to 1 if country c has capability a and 0 otherwise. We represent the relationship between capabilities and the products that require them by a matrix Πpa whose elements are equal to 1 if product p requires capability a and 0 otherwise. Using the notation introduced above, together with our only assumption, we can model the structure of the Mcp matrix as:

$$M_{cp} = 1 \ \text{if} \ \sum_{a} \Pi_{pa} = \sum_{a} \Pi_{pa} C_{ca}, \quad \text{and} \ M_{cp} = 0 \ \text{otherwise}. \tag{5}$$

The simplest implementation of this model is to consider Cca = 1 with probability r and 0 with probability 1 − r, and Πpa = 1 with probability q and 0 with probability 1 − q. An emergent property of the matrix resulting from this model is that the average ubiquity of a country's products tends to decrease with its level of diversification for a wide range of parameters (Fig. 2B).
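A minimal simulation of the random implementation just described might look as follows. The helper names (`produces`, `simulate_model`), the seed handling, and the parameter values in the usage lines are assumptions made for illustration; they are not the settings used to produce Fig. 2.

```python
import random


def produces(country_caps, product_reqs):
    """Eq. 5: country makes the product only if it has every capability the product requires."""
    required = sum(product_reqs)
    matched = sum(req * cap for req, cap in zip(product_reqs, country_caps))
    return required == matched


def simulate_model(n_countries, n_products, n_capabilities, r, q, seed=0):
    """Draw C (countries x capabilities) and Pi (products x capabilities), then build M via Eq. 5."""
    rng = random.Random(seed)
    # C[c][a] = 1 with probability r: country c has capability a.
    C = [[1 if rng.random() < r else 0 for _ in range(n_capabilities)]
         for _ in range(n_countries)]
    # Pi[p][a] = 1 with probability q: product p requires capability a.
    Pi = [[1 if rng.random() < q else 0 for _ in range(n_capabilities)]
          for _ in range(n_products)]
    M = [[1 if produces(C[c], Pi[p]) else 0 for p in range(n_products)]
         for c in range(n_countries)]
    return C, Pi, M


# Illustrative (hypothetical) parameters.
C, Pi, M = simulate_model(n_countries=20, n_products=40, n_capabilities=10, r=0.7, q=0.2)
capabilities_per_country = [sum(row) for row in C]
diversification = [sum(row) for row in M]  # k_c0 of each simulated country
```

Comparing `capabilities_per_country` with `diversification` across simulated countries is one way to inspect the relationship between capability endowment and diversification that the model is meant to capture. One design detail worth flagging: under Eq. 5 a product that happens to require no capabilities is produced by every country.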
We interpret this negative relationship by considering that countries with many capabilities will be more diversified, because they can produce a wider set of products, and that because they can make products requiring many capabilities, few other countries will have all of the requisite capabilities to make them, hence diversified countries will be able to make less ubiquitous products. The model allows us to test directly whether given this set of assumptions we should expect countries with more capabilities to be more diversified and produce less ubiquitous products. Fig. 2C shows that, in the model, the diversity of a country increases with the number of capabilities it possesses, whereas the ubiquity of a country's products is a decreasing function of the number of capabilities available in that country, providing further theoretical evidence that k⃗c captures information on the availability of capabilities in a country, and therefore, about the complexity of its economy.

Fig. 3. Bipartite network structure and income (all GDPs have been adjusted by Purchasing Power Parity, PPP). A–E were constructed with data from the year 2000. (A–C) GDP per capita adjusted by purchasing power parity as a function of our first 3 measures of diversification (kc,0, kc,2, kc,4), normalized by subtracting their respective means (⟨kc,N⟩) and dividing them by their standard deviations (stdev(kc,N)). (A) kc,0. (B) kc,2. (C) kc,4. (D) Comparison between the ranking of countries based on successive measures of diversification (kc,2N). (E) Absolute value of the Pearson correlation between the log GDP per capita at PPP of countries and their local network structure characterized by kc,N. (F) Growth in GDP per capita at PPP observed between 1985 and 2005 as a function of growth predicted from kc,18 and kc,19 measured in 1985 and controlling for GDP per capita at PPP in 1985.

Direct Measurement of a Subset of Capabilities.
We provide empirical evidence that the Method of Reflections extracts information that is related to the capabilities available in a country by looking at a measurable subset of the capabilities required by products. Fig. 2D shows the average number of different employment categories required by products exported by countries versus kc,0, kc,1, and kc,2. We measure the number of employment categories that go into a product by using the data of the U.S. Bureau of Labor Statistics (see SI Appendix, Section 1). These data should work against us, because we are disregarding the fact that other countries may use different technologies to produce goods that are similarly classified.† Despite this, we find a strong positive correlation between the average number of employment categories going into the export basket of countries and our family of measures of diversification (kc,0, kc,2, kc,4, ..., kc,2N). We also find a negative correlation between the average number of employment categories and measures of the ubiquity of products made by a country (kc,1, kc,3, kc,5, ..., kc,2N+1) (Fig. 2D). This shows that more diversified countries indeed produce more complex products, in the sense that they require a wider combination of human capabilities, and that k⃗c is able to capture this information.

†Indeed, it is common for poorer countries to substitute labor for capital. For example, building a road in the U.S. is done by a relatively small team of workers, each of them specialized to operate a different machine or technique, whereas more modest economies will tend to use more workers, yet less specialized ones, because the relative cost of machines to labor is larger in poorer economies. Hence we should expect poor countries to use fewer labor inputs in the production of products than what would be reported from U.S. labor data, accentuating the effect presented in Fig. 2D.

Fig. 4. Path dependent development. Average network properties (⟨kp,0⟩, ⟨kp,1⟩; measured in 1992) of the new exports developed by a country between 1992 and 2000 as a function of the diversification of a country kc,0 and the average ubiquity of its products kc,1 measured in 1992. (A) kc,0 vs. ⟨kp,0⟩. (B) kc,1 vs. ⟨kp,0⟩. (C) kc,0 vs. ⟨kp,1⟩. (D) kc,1 vs. ⟨kp,1⟩.

Complexity of the Productive Structure, Income and Growth. We show that the information extracted by the Method of Reflections is connected to income by looking at the first 3 measures of diversification of a country (kc,0, kc,2, kc,4) versus GDP per capita adjusted for Purchasing Power Parity (PPP) (Fig. 3 A–C). To make these 3 different measures comparable we have normalized them by subtracting their respective means (⟨kN⟩) and dividing them by their respective standard deviations (stdev(kN)). As we iterate the method the relative ranking of countries defined by these variables shifts (Fig. 3D and SI Appendix, Fig. S14), making our measures of diversification and ubiquity increasingly more correlated with income (Fig. 3E and SI Appendix, Section 11). This can be illustrated by looking at the position, in the kc,N–GDP diagrams, of 3 countries that exported a similar number of products in the year 2000 despite large differences in income (Pakistan (PAK), Chile (CHL), and Singapore (SGP); Fig. 3 A–C). Higher reflections of our method are able to correctly differentiate the income level of these countries because they incorporate information about the ubiquity of the products they export and about the diversification of other countries connected indirectly to them in Mcp, altering their relative rankings (Fig. 3D and SI Appendix, Fig. S14). For example, kc,2 is able to correctly separate Singapore, Chile, and Pakistan, because it considers that in the bipartite network Singapore is connected to diversified countries mainly through nonubiquitous products, signaling the availability in Singapore of capabilities that are required to produce goods in diversified countries. In contrast, Pakistan is connected mostly to poorly diversified countries, and most of its connections are through ubiquitous products, indicating that Pakistan has capabilities that are available in most countries and that its relatively high level of diversification is probably due to its relatively large population, rather than to the complexity of its productive structure.
Indeed, we find the method of reflections to be an accurate way to control for a country’s population, as correlations between kជ c and population decrease rapidly as we iterate the method (see SI Appendix, Section 11), whereas correlations between kជ c and GDP increase as we iterate the method. This is another piece of evidence suggesting that the information captured by our method is related to factors that affect the ability to generate per capita income. Deviations from the correlation between kជ c and income are good predictors of future growth, indicating that countries tend to approach the levels of income that correspond to their measured complexity. We show this by regressing the rate of growth of income per capita on successive generations of our measures of economic complexity (i.e., kc,0,kc,1 or kc,10,kc,11) and on a country’s initial level of income log 冉 GDP共t ⫹ ⌬t兲 GDP共t兲 冊 ⫽ a ⫹ b 1GDP共t兲 ⫹ b 2k c,N共t兲 ⫹ b 3k c,N⫹1共t兲, finding that successive generations of the variables constructed in the previous section are increasingly good predictors of growth. In SI Appendix, Section 13, we present regression tables showing that these results are valid for a 20-year period (1985–2005), two 10-year Hidalgo and Hausmann Discussion Understanding the increasingly large gaps in income per capita across countries is one of the eternal puzzles of development economics. Our view is that complexity is at the root of the explanation, as argued by both Adam Smith (1) and the recent endogenous growth theories (2, 3), yet empirical research has not advanced along these dimensions because of the absence of adequate measures of complexity. Instead, it has emphasized the ACKNOWLEDGMENTS. We thank M. Andrews, A.-L. Barabási, B. Klinger, M. Kremer, N. Nunn, L. Pritchett, R. Rigobon, D. Rodrik, M. Yildirim, R. 
Zeckhauser, participants at the Center for International Development’s Seminar on Economic Policy and the Harvard Kennedy School Faculty Seminar, members of the Center for Complex Network Research at Northeastern University, and the Ratatouille Seminar Series. We acknowledge support from the Growth Lab and the Empowerment Lab at the Center for International Development. 1. Smith A (1776) An Inquiry into the Nature and Causes of the Wealth of Nations (W. Strahan and T. Cadell, London). 2. Romer P (1990) Endogenous technological change. J Pol Econ 98:S71–S102. 3. Grossman GM, Helpman E (1991). Quality ladders in the theory of growth. Rev Econ Stud 58:43– 61. 4. Maddison A (2001) The World Economy: A Millennial Perspective (Development Centre of the OECD, Paris). 5. Pritchett L (1997) Divergence, big time. J Econ Perspec 11:3–18. 6. Aghion P, Howitt PW(1998) Endogenous Growth Theory (MIT Press, Cambridge, MA) 7. Barro RJ, Sala-i-Martin X(2003) Economic Growth (MIT Press, Cambridge, MA) 8. Feenstra RC, Lipsey RE, Deng H, Ma AC, Ma H (2005) World Trade Flows: 1962–2000. NBER Working Paper 11040. Available at www.nber.org/papers/w11040. 9. Pastor-Satorras R, Vazquez A, Vespignani A (2001) Dynamical and correlation properties of the internet. Phys Rev Lett 87:258701. 10. Maslov S, Sneppen K (2002) Specificity and stability in topology of protein networks. Science 296:910 –913. 11. Newman MEJ (2002) Assortative mixing in networks. Phys Rev Lett 89:208701. 12. Hirschman AO (1945) National power and structure of foreign trade (University of California Press, Berkley, CA). 13. Herfindahl OC (1950) Concentration in the steel industry (PhD Dissertation, Columbia University, New York) 14. Saviotti PP, Frenken K (2008) Export variety and the economic performance of countries. J Evol Econ 18:201–218. 15. Hidalgo CA, Klinger B, Barabási A-L, Hausmann R (2007) The product space conditions the development of nations. Science 317:482– 487. 16. 
Hausmann R, Klinger B (2006) The structure of the product space and the evolution of comparative advantage. CID Working Paper No. 128. Available at www.cid.harvard. edu/cidwp/128.htm. 17. Hidalgo CA, Hausmann R (2008) A network view of economic development. Developing Alternatives 12(1):5–10. 18. Hirschman AO (1958) The Strategy of Economic Development (Yale Univ Press, New Haven, CT). Hidalgo and Hausmann PNAS 兩 June 30, 2009 兩 vol. 106 兩 no. 26 兩 10575 STATISTICS accumulation of a few highly aggregated factors of production, such as physical and human capital or general institutional measures, such as rule of law, disregarding their specificity and complementarity. In this article we have presented a technique that uses available economic data to develop measures of the complexity of products and of countries, and showed that (i) these measures capture information about the complexity of the set of capabilities available in a country; (ii) are strongly correlated with income per capita; (iii) are predictive of future growth; and (iv) are predictive of the complexity of a country’s future exports, making a strong empirical case that the level of development is indeed associated to the complexity of a country’s economy. This article has not emphasized the process through which countries accumulate capabilities, but has instead focused on their measurement and consequences. However, the results presented here suggest that changes in a country’s productive structure can be understood as a combination of 2 processes, (i) that by which countries find new products as yet unexplored combinations of the capabilities they already have, and (ii) the process by which countries accumulate new capabilities and combine them with other previously available capabilities to develop yet more products. 
A possible explanation for the connection between economic complexity and growth is that countries that are below the income expected from their capability endowment have yet to develop all of the products that are feasible with their existing capabilities. We can expect such countries to be able to grow more quickly, relative to those countries that can only grow by accumulating new capabilities.

This perspective also suggests that the incentive to accumulate capabilities would depend, among other things, on the expected demand that new capabilities would face, and this would depend on how new capabilities can complement existing ones to create new products. This opens up an avenue for further research on the dynamics of product and capability accumulation.

Development economics has tended to disregard the search for detailed capabilities and their patterns of complementarity, hoping that aggregate measures of physical capital (e.g., measured in dollars) or human capital (e.g., measured in years of schooling) would provide enough guidance for policy. Our line of research would justify and provide guidance to development strategies that look to promote products (or capabilities) as a way to create incentives to accumulate capabilities (or develop new products) that could themselves encourage the further coevolution of new products and capabilities, echoing ideas put forward by Albert Hirschman (18) more than 50 years ago, but adding the capacity to analyze them in practice.

periods or four 5-year periods, and that it is robust to the inclusion of other control variables such as individual country dummies (to capture any time-invariant country characteristic) and outperforms other indicators used to measure the productive structure of a country such as the Hirschman-Herfindahl (12, 13) index and entropy measures (14). A graphical example of this relationship is presented in Fig.
3f, which compares the growth predicted from the linear regression described by Eq. 6 and that observed empirically for the 1985–2005 period and N = 18.

Finally, we show that the evolution of Mcp exhibits strong path dependence, meaning that we can anticipate some of the properties of a country’s future new exports based on its current productive structure. This observation is consistent with the existence of an unobservable capability space that evolves gradually, because the ability of a country to produce a new product is limited to combinations of the capabilities it initially possesses plus any new capabilities it will accumulate. Countries with many capabilities will be able to combine new capabilities with a wide set of existing capabilities, resulting in new products of higher complexity than those of countries with few capabilities, which will be limited by this fact.

We show this using data collected between 1992 and 2000 (we choose 1992 as our starting point because the end of the Soviet Union and the unification of Germany introduce large discontinuities in the number and identity of countries) and consider as a country’s new exports those items for which that country had an RCAcp < 0.1 in the year 1992 and an RCAcp ≥ 1 by the year 2000. Fig. 4 shows that the level of diversification (kc,0) of a country and the ubiquity of its exports (kc,1) predict the average ubiquity (⟨kp,0⟩) of a country’s new exports and the average level of diversification (⟨kp,1⟩) of the countries that were hitherto exporting those products. This result is related to the idea that the productive structure of countries evolves by spreading to “nearby” products in The Product Space (15–17), which is a projection of the bipartite network studied here in which pairs of products are connected based on the probability that they are exported by the same countries.
This last set of results suggests that the proximity between products in The Product Space is related to the similarity of the requisite capabilities that go into a product, because countries tend to jump into products that require capabilities that are similar to those required by the products they already export.

SUPPLEMENTARY MATERIAL FOR: THE BUILDING BLOCKS OF ECONOMIC COMPLEXITY
Cesar A. Hidalgo, Ricardo Hausmann
Center for International Development and Harvard Kennedy School, Harvard University

TABLE OF CONTENTS
SECTION 1: SOURCE DATA
SECTION 2: REVEALED COMPARATIVE ADVANTAGE (RCA)
SECTION 3: THE COUNTRY-PRODUCT NETWORK
SECTION 4: BIPARTITE NETWORK ANALYSIS
SECTION 5: BIPARTITE NETWORK STRUCTURE MEASURED IN OTHER DATASETS
SECTION 6: RANDOMIZING A BIPARTITE NETWORK
SECTION 7: THE KP,0-KP,1 DIAGRAM
SECTION 8: A THIRD REFLECTION VIEW OF THE STRUCTURE OF THE COUNTRY-PRODUCT NETWORK
SECTION 9: NULL MODELS AND GDP
SECTION 10: THE METHOD OF REFLECTIONS AND COUNTRY RANKINGS (YEAR 2000)
SECTION 11: THE METHOD OF REFLECTIONS AND POPULATION
SECTION 12: SHARES OF PRODUCTS IN THE WORLD
SECTION 13: NETWORK STRUCTURE, INCOME AND GROWTH
SECTION 14: ADDITIONAL RESULTS
REFERENCES

SECTION 1: SOURCE DATA

All of the figures presented in the main text of this paper were constructed using international trade data taken from Feenstra, Lipsey, Deng, Ma and Mo's "World Trade Flows: 1962-2000" dataset. This dataset consists of imports and exports both by country of origin and by destination, with products disaggregated to the SITC revision 4, four-digit level. The authors built this dataset using the United Nations COMTRADE database. They cleaned that dataset by calculating exports using the records of the importing country, when available, assuming that data on imports is more accurate than data from exporters. This is likely, as imports are more tightly controlled in order to enforce safety standards and collect customs fees.
In addition, the authors correct the UN data for flows to and from the United States, Hong Kong, and China. We focus only on export data and do not disaggregate by country of destination. More information on this dataset can be found in NBER Working Paper #11040, and the dataset itself is available at www.nber.org/data and http://cid.econ.ucdavis.edu/data/undata/undata.html

We checked the validity of our results by using two additional datasets: COMTRADE classified according to the Harmonized System at the 4-digit level (1241 products, 103 countries) and the North American Industry Classification System (NAICS) (318 products, 150 countries). We found that our results are not affected by the use of data at these different levels of aggregation. We chose to work with the Feenstra dataset because, of the three datasets available, it is the only one that has been cleaned and checked thoroughly as part of a dedicated research project.

The labor data used to construct figure 2d was downloaded from the US Bureau of Labor Statistics at http://www.bls.gov/data/

SECTION 2: REVEALED COMPARATIVE ADVANTAGE (RCA)

One way to empirically estimate whether a country is a significant exporter of a product is to calculate the Revealed Comparative Advantage (RCA) that the country has in that product. RCA is a measure constructed to indicate whether a country’s share of a product’s world market is larger or smaller than the product’s share of the entire world market. Mathematically, we can rewrite the above sentence by introducing Scp as the share that country c has of the world market for product p, and Tp as the total share of product p of the world market. Using this notation, RCA can be written as

\[ \mathrm{RCA}_{cp} = \frac{S_{cp}}{T_p} \tag{1} \]

where

(2)

RCA CUTOFFS, EXPORTS AND COUNTRIES’ LEVEL OF DIVERSIFICATION

The natural cutoff used to determine whether a country has revealed comparative advantage in a product is RCA≥1.
At this point the country’s share of that product’s market is equal to or larger than the product’s share of the world market. The benchmark here is a world in which countries export an amount of each product equal to the share of that product in the world market times the size of its economy. From an empirical perspective, we can study the number of products (kc,0) for which a country has RCA as a function of the RCA cutoff. By performing this exercise we find that the RCAcp=1 cutoff lies on the phase transition of a softened step function (Figure S1).

Fig S 1 Diversification (kc,0) as a function of the RCA cutoff for all countries in the study

What is interesting about looking at kc,0(RCA) from this empirical perspective is that we can see that there are a few countries that had exports in almost all of the 772 products exported in the year 2000. For example, Germany exported 758 products with an RCA≥0.01, and 707 products with RCA≥0.1, a profile similar to that of other industrialized countries like the U.K., the U.S.A. and Italy. Hence lowering the RCA threshold shows that industrialized countries manufacture and export products in almost all of the SITC-4 categories, and that specialization patterns are empirically driven by the lack of diversification of less developed countries, rather than by the absence of more productive economies in comparatively less sophisticated sectors.

SECTION 3: THE COUNTRY-PRODUCT NETWORK

Fig S 2 shows a simple visualization of the country product network for the year 2000 in which countries are located at the center of the figure and products are grouped into root SITC4 categories along the edges of the image. This network consists of 129 countries, 772 products and 13,470 links connecting countries and products when RCAcp≥1. The large number of links in the network limits our ability to create a useful visualization of the entire set of connections.
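The construction of the country-product network from export values can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the standard Balassa form of RCA (the share of product p in country c's export basket divided by the share of product p in world trade), the function names `rca_matrix` and `country_product_network` are hypothetical, and the 3×3 export table is made-up toy data. It also assumes every country and every product has positive total exports.

```python
import numpy as np

def rca_matrix(exports):
    """Balassa RCA for a (countries x products) array of export values.

    Assumption: every row and column has a positive total.
    """
    x = np.asarray(exports, dtype=float)
    country_totals = x.sum(axis=1, keepdims=True)  # total exports of each country
    product_totals = x.sum(axis=0, keepdims=True)  # world exports of each product
    world_total = x.sum()
    # share of p in c's basket, divided by share of p in world trade
    return (x / country_totals) / (product_totals / world_total)

def country_product_network(exports, cutoff=1.0):
    """Binary matrix M_cp: 1 where RCA_cp >= cutoff, 0 otherwise."""
    return (rca_matrix(exports) >= cutoff).astype(int)

# Hypothetical toy data: 3 countries (rows) x 3 products (columns).
exports = np.array([[10.0, 0.0, 5.0],
                    [ 2.0, 8.0, 0.0],
                    [ 1.0, 1.0, 1.0]])

M = country_product_network(exports)
k_c0 = M.sum(axis=1)  # diversification: products with RCA >= 1 per country
k_p0 = M.sum(axis=0)  # ubiquity: countries with RCA >= 1 per product
print(M)
print(k_c0, k_p0)
```

Varying `cutoff` in this sketch reproduces, on toy data, the kind of kc,0-versus-cutoff curve discussed for Fig S 1.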
Fig S 2 Visualization of the country product network in which all exports with an RCA>1 are shown.

SECTION 4: BIPARTITE NETWORK ANALYSIS

A bipartite graph or network is a set of nodes and links in which nodes can be separated into two groups, or partitions, such that links only connect nodes in different partitions. While in principle many networks can be separated into different partitions (for example, every tree is a bipartite graph), here we concentrate on examples that are bipartite by definition, rather than as a property. One example of a naturally occurring bipartite network is a publication network, where nodes are researchers and papers, and links connect researchers to the papers they have authored. Another example is the movie-actor network, in which nodes are actors and movies, and links connect actors to the movies in which they have starred.

With the exception of a few studies [1,2,3,4], bipartite networks have mostly been investigated by projecting the network into one of its partitions, typically by considering nodes to be connected if they share a neighbor in the opposite partition [5,6,7,8,9,10,11,12,13,14]. For example, co-authorship networks link scientists that have co-authored one or more papers [8,9,10,11], whereas movie-actor networks connect actors that have appeared together in one or more movies. While valuable information can be obtained from these projections, there is important information that is left out by reducing the bipartite network into either one of its partitions, regardless of the sophistication of the projection method. Here we present a method to characterize the structure of a bipartite network by iteratively considering the properties of neighboring nodes.

THE METHOD OF REFLECTIONS

In this section we explain in detail the method of reflections as a general technique to study the structure of bipartite networks.
To shorten the math we adopt a different notation than the one used for the particular example of countries and products. Going forward, we indicate all variables that are related to nodes in each partition by either Latin or Greek characters.

Consider a bipartite network M described by the adjacency matrix $M_{a\alpha}$, where $M_{a\alpha}=1$ if node a is connected to node α and zero otherwise. We define the method of reflections as the recursive set of observables

\[ k_{a,N} = \frac{1}{k_{a,0}} \sum_{\alpha} M_{a\alpha}\, \kappa_{\alpha,N-1} \tag{3} \]

\[ \kappa_{\alpha,N} = \frac{1}{\kappa_{\alpha,0}} \sum_{a} M_{a\alpha}\, k_{a,N-1} \tag{4} \]

for $N>0$, with

\[ k_{a,0} = \sum_{\alpha} M_{a\alpha} \tag{5} \]

\[ \kappa_{\alpha,0} = \sum_{a} M_{a\alpha} \tag{6} \]

Following these definitions, the degree of nodes in the bipartite network is given by $k_{0}$ and $\kappa_{0}$ (in this notation we can drop the a and α indices when referring to the general concept described by the variable, as the alphabet already indicates whether the variable refers to one partition or the other, countries or products). In the example of the main text these variables are the diversification (ka,0) of countries and the ubiquity (kp,0) of products. Following from (3) and (4), the average ubiquity of a country’s exports is given by $k_{a,1}$, whereas the average diversification of a product’s exporters is given by $\kappa_{\alpha,1}$.

The recursive nature of the method of reflections allows us to characterize the structure of the bipartite network by defining N variables for each one of its partitions. For example, continuing the characterization of the country-product network into a third layer of analysis in which $k_{a,2}$, the average κ1 of a country’s exports, and $\kappa_{\alpha,2}$, the average k1 of a product’s exporters, are considered, allows us to characterize countries and products through a three dimensional phase space spanned by $(k_{a,0}, k_{a,1}, k_{a,2})$ and $(\kappa_{\alpha,0}, \kappa_{\alpha,1}, \kappa_{\alpha,2})$. In principle we can use the method of reflections to characterize countries and products by N variables. The method of reflections can be generalized by choosing different values for k0 and κ0 and iterating over them using (3) and (4).
In fact, the measure of product sophistication PRODY [15] can be seen as a special case of the method of reflections in which ka,0 is the GDP(PPP) of a country and Maα is a matrix of RCAs. In such a case PRODY=ka,1. When these variables were constructed, however, the authors were not aware that their methods were combining income information with the structure of a bipartite network.

THE VARIABLES FOR THE FIRST THREE LEVELS

Table S 1 shows how we interpret the first three pairs of variables describing the country-product network through the method of reflections:

Definition | Working Name | Short summary | Question Form
$k_{a,0}$ | Diversification | Number of products exported by country a. | How many products are exported by country a?
$\kappa_{\alpha,0}$ | Ubiquity | Number of countries exporting product α. | How many countries export product α?
$k_{a,1}$ | | Average ubiquity of the products exported by country a. | How common are the products exported by country a?
$\kappa_{\alpha,1}$ | | Average diversification of the countries exporting product α. | How diversified are the countries that export product α?
$k_{a,2}$ | | Average diversification of countries with an export basket similar to that of country a. | How diversified are countries exporting goods similar to those of country a?
$\kappa_{\alpha,2}$ | | Average ubiquity of the products exported by countries that export product α. | How ubiquitous are the products exported by product α’s exporters?

Table S 1 Interpretation of the bipartite network description obtained from the method of reflections.

INTERPRETING HIGHER REFLECTIONS

As we iterate the method of reflections, it becomes increasingly harder to interpret the variables generated by it. We can gain insight into what higher reflection variables stand for by analytically solving the recursion formulas presented in (3)-(6). Analytically solving the recursion requires us to be able to express $\vec{k}_N$ and $\vec{\kappa}_N$ as a function of the initial conditions, $\vec{k}_0$ and $\vec{\kappa}_0$.
Mathematically, given (3)-(4), we search for solutions of the form:

\[ k_{a,N} = \sum_{b} C_{ab,N}(\vec{k}_0, \vec{\kappa}_0)\, k_{b,0}, \qquad \kappa_{\alpha,N} = \sum_{\beta} C_{\alpha\beta,N}(\vec{k}_0, \vec{\kappa}_0)\, \kappa_{\beta,0} \tag{7} \]

To illustrate this we calculate the elements of $\vec{k}_2$ as an example. According to the definitions of the method shown in (3)-(6), the elements of $\vec{k}_2$ can be expressed as:

\[ k_{a,2} = \frac{1}{k_{a,0}} \sum_{\alpha} M_{a\alpha}\, \kappa_{\alpha,1} = \frac{1}{k_{a,0}} \sum_{\{a\}_\alpha} \kappa_{\alpha,1} \tag{8} \]

where $\{a\}_\alpha$ is the set of the α neighbors of a. We can use (4) to rewrite (8) as

\[ k_{a,2} = \frac{1}{k_{a,0}} \sum_{\{a\}_\alpha} \frac{1}{\kappa_{\alpha,0}} \sum_{\{\alpha\}_b} k_{b,0} \tag{9} \]

which can be taken into the form (7) by permuting the sums and changing the index of the first summation to a sum over the second neighbors of a, and the index of the second summation to a sum over the neighbors shared by a and b:

\[ k_{a,2} = \frac{1}{k_{a,0}} \sum_{\{\{a\}\}_b} k_{b,0} \sum_{\{a\cap b\}_\alpha} \frac{1}{\kappa_{\alpha,0}} \tag{10} \]

This satisfies the form presented in (7) with

\[ C_{ab,2}(\vec{k}_0, \vec{\kappa}_0) = \frac{1}{k_{a,0}} \sum_{\{a\cap b\}_\alpha} \frac{1}{\kappa_{\alpha,0}} \tag{11} \]

We can interpret ka,2 from the form presented in (10) by noticing that ka,2 is a linear combination of the elements of $\vec{k}_0$ with coefficients given by the product of the inverse degrees of all nodes lying in the path connecting nodes a and b, including node a but not node b. Hence the coefficients $C_{ab,2}(\vec{k}_0, \vec{\kappa}_0)$ can be interpreted as the probability that a random walker that started at a ends up at b after two steps.

The random walker interpretation of the method of reflections is true not only for $\vec{k}_2$ but for any N. Fig S 3 shows an example of a three node network in which some of the coefficients associated with N=4 are presented explicitly. Hence the method of reflections is a way to express the properties of a node in a network as a combination of the properties of all its neighbors, the coefficients of the linear combination being the probability that two nodes are connected by a random walker after N steps.
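The two-step case can also be restated in matrix form, which makes the random-walk reading easy to check. The diagonal matrices $D_k = \mathrm{diag}(k_{a,0})$ and $D_\kappa = \mathrm{diag}(\kappa_{\alpha,0})$ are notation introduced here for illustration only; they do not appear in the original text. The derivation uses nothing beyond definitions (3)-(6).

```latex
% Illustrative notation (an assumption, not from the original):
%   D_k = diag(k_{a,0}),  D_\kappa = diag(\kappa_{\alpha,0})
\begin{align*}
\vec{k}_2 &= D_k^{-1} M \,\vec{\kappa}_1
           = D_k^{-1} M D_\kappa^{-1} M^{T} \vec{k}_0 ,\\
C_{ab,2}  &= \frac{1}{k_{a,0}} \sum_{\alpha}
             \frac{M_{a\alpha} M_{b\alpha}}{\kappa_{\alpha,0}} ,\\
% Each row sums to one, as a two-step transition probability should:
\sum_{b} C_{ab,2}
          &= \frac{1}{k_{a,0}} \sum_{\alpha} \frac{M_{a\alpha}}{\kappa_{\alpha,0}}
             \sum_{b} M_{b\alpha}
           = \frac{1}{k_{a,0}} \sum_{\alpha} M_{a\alpha} = 1 .
\end{align*}
```

The last line uses (6) for the inner sum and (5) for the outer one, so the coefficients of each node a form a probability distribution over the nodes b reachable in two steps.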
The coefficients of the expansion can be interpreted as a measure of similarity between the nodes in the network, which is context dependent, as what matters in the expansion is the relative weight of these coefficients when compared to each other.

\[ k_{a,N} = \sum_{b} C_{ab,N}(\vec{k}_0, \vec{\kappa}_0)\, k_{b,0} = k_a\left( \frac{1}{k_b}\frac{1}{\kappa_\gamma}\frac{1}{k_b}\frac{1}{\kappa_\gamma} + \dots + \frac{1}{k_b}\frac{1}{\kappa_\beta}\frac{1}{k_a}\frac{1}{\kappa_\alpha} + \dots \right) + k_b(\dots) + k_c(\dots) \]

Fig S 3 Example showing how the method of reflections can be seen as an expansion of the properties of a node as a function of the properties of other nodes in the network, with weights given by the product of the inverse of the degrees of each node traversed in the path connecting them.

Finally, we would like to mention that while higher order reflections do extract increasingly more relevant information about the productive structure of a country, as measured by how they are related to income and growth, as N → ∞ all variables will progressively converge to a similar value. Surprisingly, we find the tiny deviations of these values to be extremely informative.

A SIMPLE EXAMPLE

In this section we explain the method of reflections using a simple example in which a network composed of four countries and four products is considered (Fig S 4).

Fig S 4 A simple network used to exemplify the method of reflections (countries c1-c4, products p1-p4).

In this example, the diversification of countries and the ubiquity of products are given by:

kc1,0=4
kc2,0=1
kc3,0=2
kc4,0=1

kp1,0=1
kp2,0=2
kp3,0=2
kp4,0=3

Next, we calculate higher reflections of the method (or iterations).
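The hand calculations of this example can be reproduced numerically. The sketch below is illustrative rather than the authors' implementation: the function name `reflections` is hypothetical, and the adjacency matrix is inferred from the degrees listed above together with the description of the example (c1 exports all four products, c2 only p2, c3 exports p3 and p4, c4 only p4). It also checks the random-walk reading of the two-step coefficients of Eq. 11.

```python
import numpy as np

def reflections(M, n_max):
    """Iterate Eqs. (3)-(4) up to order n_max.

    Returns two lists: k[N] (first partition) and kappa[N] (second partition).
    """
    M = np.asarray(M, dtype=float)
    k0 = M.sum(axis=1)       # Eq. (5): degree of each row node (diversification)
    kappa0 = M.sum(axis=0)   # Eq. (6): degree of each column node (ubiquity)
    k, kappa = [k0], [kappa0]
    for _ in range(n_max):
        k_next = (M @ kappa[-1]) / k0          # Eq. (3)
        kappa_next = (M.T @ k[-1]) / kappa0    # Eq. (4)
        k.append(k_next)
        kappa.append(kappa_next)
    return k, kappa

# Rows: c1..c4, columns: p1..p4 (inferred from the example, see lead-in).
M = np.array([[1, 1, 1, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]])

k, kappa = reflections(M, 2)
print("k_c,1 =", k[1], " k_p,1 =", kappa[1])
print("k_c,2 =", k[2], " k_p,2 =", kappa[2])

# Two-step coefficients C_ab,2: (D_k^-1 M)(D_kappa^-1 M^T)
C2 = (M / k[0][:, None]) @ (M.T / kappa[0][:, None])
print("row sums of C2:", C2.sum(axis=1))   # each row is a probability distribution

# Countries ordered by decreasing second reflection
ranking = [f"c{i + 1}" for i in np.argsort(-k[2])]
print(ranking)
```

Because `reflections` only needs a 0/1 matrix, the same function applies unchanged to any bipartite network; starting the iteration from other initial vectors would correspond to the generalization mentioned in the text.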
The first reflection consists of the average ubiquity of a country’s products and of the average diversification of a product’s exporters and is given by:

kc1,1=(1/4)(1+2+2+3)=2
kc2,1=(1/1)(2)=2
kc3,1=(1/2)(2+3)=2.5
kc4,1=(1/1)(3)=3

kp1,1=(1/1)(4)=4
kp2,1=(1/2)(4+1)=2.5
kp3,1=(1/2)(4+2)=3
kp4,1=(1/3)(4+2+1)=2.33

The second reflection is given by the average first reflection values of a node’s neighbors.

kc1,2=(1/4)(4+2.5+3+2.333)=2.9583
kc2,2=(1/1)(2.5)=2.5
kc3,2=(1/2)(3+2.333)=2.67
kc4,2=(1/1)(2.333)=2.33

kp1,2=(1/1)(2)=2
kp2,2=(1/2)(2+2)=2
kp3,2=(1/2)(2+2.5)=2.25
kp4,2=(1/3)(2+2.5+3)=2.5

We can use this example to illustrate how the method of reflections is able to differentiate between different countries based only on information regarding which country exports which product. In this example, the most diversified country is c1, which exports all four products, while there are two countries, c2 and c4, that only export a single product. The sole export of c2, however, is a relatively non ubiquitous product that is otherwise exported only by c1, the most diversified country, while the sole export of c4 is a product that is exported by all countries except c2. As we iterate the method we find that there is important information encoded in the position of countries and products relative to one another. For example, when we look at the values characterizing countries after the second reflection (kc,2) we can see that country c1 comes up ahead, followed by countries c3, c2 and c4. The method places country c2 ahead of c4 because by the second reflection it is already considering that country c2 produces a non ubiquitous product that is found only in diversified countries, probably signaling that country c2 has a relatively good endowment of capabilities and produces a small number of products for other reasons, such as being of relatively small size.
On the contrary, c4 produces a product that is ubiquitous and is found in diversified and non diversified countries, probably indicating that it is a simple product which is accessible to countries with relatively simple productive structures. Hence, while both c2 and c4 produce the same number of products, the method can differentiate between them and considers c2 to have a more complex productive structure than c4. While small in size, this example illustrates how the method of reflections can be used to characterize the structure of a bipartite network and how this can be applied to help the understanding of the productive structure of countries and the sophistication of products.

SECTION 5: BIPARTITE NETWORK STRUCTURE MEASURED IN OTHER DATASETS

In this section we present two additional kc,0-kc,1 diagrams constructed using data aggregated according to the Harmonized System and according to the North American Industry Classification System (NAICS).

Fig S 5 kc,0-kc,1 diagram constructed using data containing 103 countries and 1241 products aggregated according to the Harmonized System. (Scatter plot of countries labeled by three-letter code; horizontal axis k0, vertical axis k1.)
Fig S 6 kc,0-kc,1 diagram constructed using data containing 150 countries and 318 products aggregated according to the NAICS. (Scatter plot of countries labeled by three-letter code; horizontal axis k0, vertical axis k1.)

SECTION 6: RANDOMIZING A BIPARTITE NETWORK

To decide whether the structure of a network is trivial,* we need to compare it to an appropriate null model. The four null models we introduce in this section are an extension of the randomization algorithms introduced by Maslov and Sneppen [16] to analyze degree correlations in protein interaction networks. Our case differs from theirs in that we are dealing with a bipartite network rather than with a simple graph. The idea behind the randomization procedure is that we can create a null model starting from the data we want to analyze by shuffling the links of the network while conserving some of its statistical properties. The most popular version of this randomization procedure, which was designed for simple graphs†, consists of randomizing the links in the network by permuting the nodes at the end of a pair of links.
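A bipartite analogue of this link-swapping step can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `swap_randomize`, its parameters and the attempt limit are assumptions made for the example. A swap takes two links (a,b) and (c,d), with a, c in one partition and b, d in the other, and replaces them with (a,d) and (c,b) only if those links are not already present, so the degree of every node in both partitions is conserved.

```python
import numpy as np

def swap_randomize(M, n_swaps, seed=0):
    """Randomize a 0/1 bipartite matrix while conserving row and column sums.

    Performs up to n_swaps accepted swaps; gives up after 100 * n_swaps
    attempts so that matrices with no allowed swap still terminate.
    """
    rng = np.random.default_rng(seed)
    R = np.array(M, dtype=int)           # work on a copy of the input
    n_rows, n_cols = R.shape             # assumes at least 2 rows and 2 columns
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        a, c = rng.choice(n_rows, size=2, replace=False)
        b, d = rng.choice(n_cols, size=2, replace=False)
        # links (a,b) and (c,d) exist, links (a,d) and (c,b) do not
        if R[a, b] and R[c, d] and not R[a, d] and not R[c, b]:
            R[a, b] = R[c, d] = 0
            R[a, d] = R[c, b] = 1
            done += 1
    return R

# Hypothetical 4 x 4 test network (rows: countries, columns: products).
M = np.array([[1, 1, 1, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]])
R = swap_randomize(M, n_swaps=20, seed=1)
print(R)
print(R.sum(axis=1), R.sum(axis=0))  # same degree sequences as M

# A strictly triangular matrix admits no swap, so it is returned unchanged.
T = np.triu(np.ones((5, 5), dtype=int), k=1)
T_random = swap_randomize(T, n_swaps=10)
```

Conserving only one of the two degree sequences, or neither, would require a different shuffling rule; this sketch covers only the fully constrained case.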
For example, if we consider a simple graph containing the links {a,b} and {c,d}, then an allowed randomization step would consist of replacing these two links by the pairs {a,d} and {b,c}, given that the {a,d} and {b,c} links were not already part of the network. The randomization procedure described above conserves the number of links in the network as well as its degree‡ sequence and degree distribution. This is because the randomization procedure conserves the exact number of connections of each node, making it a good null model to compare properties of a network while controlling for the degree of nodes, which is the most fundamental property of a network.

* Expected from chance.
† Simple graph: a network in which there is only one type of node, and connections are strictly binary (0 or 1).

In the case of a bipartite network, we have two separate degree sequences, one for each of its partitions. Here we introduce four null models to control for all possible combinations of degree sequences. Null Model 1 is a network with the same number of nodes and links as the original network, yet in Null Model 1 connections have been randomly assigned. Null Model 1 is the least stringent of our Null Models and represents a network with the same number of links as the original network, but with a random degree sequence for both partitions. Null Model 2 controls for the degree sequence of one partition of the network, while randomizing the target of those links in the other partition. Null Model 2 represents a network with a diversification sequence matching the one in the observed data, yet in Null Model 2 the products exported by a country have been randomly assigned. Null Model 2 also conserves the total number of links in the network. Null Model 3 is symmetric to Null Model 2 in the sense that it represents a network with the same ubiquity distribution as the one observed in the data, but where the exporters of each product have been randomly assigned.
Finally, Null Model 4 is a model obtained by permuting links in the network such that the diversification of countries and the ubiquity of products are exactly the same as those observed in the empirical data.

It is important to notice that as Null Models become more stringent, the number of possible permutations that can be performed in the randomization procedure drops substantially. The possible number of permutations that can be performed in a randomization procedure does not only depend on the stringency of the null model, but also on the structure of the original network. For example, if we consider a bipartite network that can be represented by a triangular adjacency matrix (for simplicity assume that the number of products is equal to the number of countries and that Mcp = 1 if c<p; Mcp = 0 otherwise), then there is not a single possible permutation that could be performed using the fourth null model. For such a case, Null Model 4 is equivalent to the original network.

‡ Degree: The number of links a node has. Degree Sequence: List containing the degrees of all nodes in the network.

NULL MODEL SUMMARY

Null Model | Number of links | kc,0 sequence | kp,0 sequence | <kc,0> | <kc,1> | <kp,0> | <kp,1>
Null Model 1 | = Mcp | ≠ Mcp | ≠ Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp
Null Model 2 | = Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp
Null Model 3 | = Mcp | ≠ Mcp | = Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp
Null Model 4 | = Mcp | = Mcp | = Mcp | = Mcp | ≠ Mcp | = Mcp | ≠ Mcp

Table S 2 Summary of null model behavior. <> stands for the average of a quantity.

SECTION 7: THE KP,0-KP,1 DIAGRAM

We compare the kp,0-kp,1 diagram obtained from our data with the one from our four null models (Fig S 7), finding that the structure of the country-product network is characterized by a strong negative correlation between kp,0 and kp,1 and a wide range of kp,1 values that cannot be explained by any of the four null models. This result becomes even more evident when we study higher order reflections of the method (see SM section 7).
Products from different sectors are colored according to the ten root categories in the SITC-4 classification, showing that while there is a correspondence between the kp,0-kp,1 diagram and the SITC-4 classification, there are important variations among similarly classified products. For example, this graph shows that natural resource-based products, such as minerals and fuels, exhibit a wide range of ubiquities (kp,0) at approximately constant diversification of their exporters (kp,1), meaning that raw materials are on average exported by poorly diversified countries regardless of being relatively ubiquitous, like coniferous wood (kp,0=43, kp,1=115), or rare, as tin ore (kp,0=8, kp,1=109). On the other hand, products classified as machinery show variation in the level of diversification of their exporters (kp,1) at relatively low ubiquities (kp,0). Hence the kp,0-kp,1 diagram can separate simple machines produced in less-diversified countries, such as handheld calculators (kp,0=7, kp,1=144), from more complex machines produced in diversified countries, such as motorcycles (kp,0=5, kp,1=270).

Fig S 7 Method of reflections and product characteristics. A, Schematic explanation of the kp,0-kp,1 space to characterize products. B, kp,0-kp,1 diagram for null models. C, kp,0-kp,1 diagram for the empirically observed exports data.

SECTION 8: A THIRD REFLECTION VIEW OF THE STRUCTURE OF THE COUNTRY-PRODUCT NETWORK

Here we continue the analysis presented in the manuscript to a third layer of analysis in which we show figures characterizing countries by kc,0, kc,1, kc,2 and products by kp,0, kp,1, kp,2 (Fig S 8-Fig S 11).

Fig S 8 Scatter plot for kc,0 and kc,2 for the original data in the year 2000 and the four null models.

Fig S 9 Scatter plot for kc,1 and kc,2 for the original data in the year 2000 and the four null models.

Fig S 10 Scatter plot for κ and κ2 for the original data in the year 2000 and the four null models.
Fig S 11 Scatter plot for κ1 and κ2 for the original data in the year 2000 and the four null models.

SECTION 9: NULL MODELS AND GDP

In this section we present scatter plots between GDP per capita and the first two variables of the method of reflections characterizing the structure of bipartite networks created from our four null models (Fig S 12, Fig S 13).

Fig S 12 Scatter plot between GDP and bipartite network properties for countries (k=kc,0, k1=kc,1) and Null Models 1 and 2

Fig S 13 Scatter plot between GDP and bipartite network properties for countries (k=kc,0, k1=kc,1) and Null Models 3 and 4

SECTION 10: THE METHOD OF REFLECTIONS AND COUNTRY RANKINGS (YEAR 2000)

Fig S 14 Relative ranking of countries based on the Method of Reflections for the year 2000

SECTION 11: THE METHOD OF REFLECTIONS AND POPULATION

Economic output is usually measured in per capita terms, as the goal of development is to generate and distribute wealth in the most democratic way possible. Yet there are some other variables in which the per capita idea does not apply as directly as it does for income. One example is diversification, which in our formalism is represented by kc,0. While in principle we might be tempted to consider the per capita level of diversification as a good indicator of the diversification that can be attributed to each individual in a population, it is important to consider that such normalization assumes that the level of diversification grows linearly with the number of people. This, however, would not be a careful way of measuring the amount of diversification that should be attributed to each individual in a population, as the number of different products a group of people can make might well depend on the possible number of interactions, and hence go as the square of the population, or could depend on a more complex function that is hitherto unknown.
Normalizing diversification by the number of individuals in a population can therefore be considered naïve, as it assumes a linear functional form as the correct normalization for a variable that does not necessarily depend linearly on the population. The diversification of a country kc,0, however, does depend on a country’s population (Table S 3 column 1). Hence, we still need a variable that would give us a measure of the level of diversification of a country that is independent of its number of inhabitants. In Table S 3 we present the dependence of our first four measures of diversification (kc,0, kc,2, kc,4, kc,8) on population, showing that higher order reflections of the method generate measures of diversification that are independent of a country’s population, and are therefore good indicators of the level of diversification of a country that is due to the complexity of its economy rather than to its population.

VARIABLES | Log kc,0 | Log kc,2 | Log kc,4 | Log kc,8
Log Population | 0.190*** | 0.0168** | 0.00343 | 0.000267
t-test | (4.812) | (2.168) | (1.488) | (1.198)
Constant | 1.272** | 4.708*** | 5.004*** | 5.081***
t-test | (2.005) | (37.63) | (134.7) | (1415)
Observations | 127 | 127 | 127 | 127
Adjusted R2 | 0.150 | 0.029 | 0.010 | 0.003

Table S 3 Correlation between population and successive generations of measures of diversification constructed from the method of reflections (** statistically significant at the 5% level, *** statistically significant at the 1% level).

SECTION 12: SHARES OF PRODUCTS IN THE WORLD

One critique of our methods that can be raised is that the SITC-4 classification is more disaggregated for goods produced by richer countries, as rich countries are the ones that created the classification system. A classification bias in that direction would overstate the level of diversification of rich countries and understate that of poor countries.
We have shown that our results do not depend on the level of aggregation by considering two additional datasets aggregated according to different classification systems, which summarize all tradable goods using a different number of product categories. Here we complement this test of the validity of our methods by looking at the share in world trade associated with each product in the SITC-4 classification (Fig S 15). Contrary to the critique presented above, we find that industrialized country products have large shares in total trade, indicating that they are not more narrowly classified than agricultural products and raw materials (except oil) when benchmarked by their share in world trade. In simpler terms, if we were to further disaggregate products to achieve more homogeneous shares in world trade, we would have to split cars into classes such as SUVs, sedans and compacts, rather than split melons into different types. The data therefore behave in the opposite way to what the critique suggests.

Fig S 15 Share in world trade for products sorted by SITC-4 code.

Table S 4 and Table S 5 show the five products with the smallest and the largest shares in world trade, respectively.

SITC-4 Code   Product Name                                            World Market Share in the year 2000 (Total World Trade = 1)
6553          Knitted/crocheted fabrics elastic or rubberized         3.2x10^-8
19            Live animals of a kind mainly used for human food       5.3x10^-8
6344          Wood-based panels N.E.S.                                1.7x10^-7
3415          Coal gas, water gas, producer gas & similar gases       5.5x10^-7
2652          True hemp, raw or processed, not spun; tow and waste    8.0x10^-7

Table S 4 The five products with the smallest world share in the year 2000.

SITC-4 Code   Product Name                                                                   World Market Share in the year 2000 (Total World Trade = 1)
7810          Passenger motor cars, for transport of pass. & good                            0.0494
3330          Petroleum oils & crude oils obt. from bitumen minerals                         0.0493
7764          Electronic microcircuits                                                       0.0329
7849          Other parts and accessories of motor vehicles                                  0.0225
7599          Parts and accessories suitable for calculating and data processing machines    0.0214

Table S 5 The five products with the largest world share in the year 2000.

SECTION 13: NETWORK STRUCTURE, INCOME AND GROWTH

In this section we present regressions showing how the structure of the bipartite network is connected to income and economic growth. We also compare the performance of our structural measures to two other measures of diversity: the Hirschman-Herfindahl (H-H) index and entropy.

The H-H index is a measure of market concentration commonly used for antitrust purposes, yet it has also been used as a measure of diversification. The H-H index (H) of country c is defined as:

$$ H_c = \sum_p S_{cp}^2 \qquad (12) $$

where Scp is the share of product p in the export basket of country c. Because H measures concentration, low values of H correspond to diversified export baskets. An alternative way to measure the diversification of a country's export basket is to consider its entropy (E), which is defined as:

$$ E_c = -\sum_p S_{cp} \log S_{cp} \qquad (13) $$

High entropy values are characteristic of diversified export baskets, whereas low entropy values are associated with export baskets that are concentrated in a small number of products.

We present the results of our regressions as tables (Table S 6 to Table S 10). To help the reader understand the information contained in these tables, Fig S 16 explains how to read them.

Fig S 16 How to read regression tables

We present regression tables between E, H, kc,0, kc,1, kc,4, kc,8, kc,12, kc,18 and income per capita adjusted by purchasing power parity (Table S 6), and between E, H, kc,0, kc,1, kc,4, kc,5, kc,8, kc,9, kc,18, kc,19 and economic growth for a 20 year period (Table S 7), two ten year periods (Table S 8) and four five year periods (Table S 9). Additionally, we present regression results for four five year periods with fixed country effects (Table S 10).
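The two benchmark measures of Eqs. 12 and 13 can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used for the regressions: the function names, the use of NumPy, and the toy export matrix are hypothetical, and every country is assumed to export at least one product so that the shares are well defined.

```python
import numpy as np

def export_shares(exports):
    """Row-normalize a countries x products export matrix into shares S_cp."""
    exports = np.asarray(exports, dtype=float)
    totals = exports.sum(axis=1, keepdims=True)  # total exports per country
    return exports / totals

def herfindahl(shares):
    """H-H index per country: sum over products of squared shares (Eq. 12)."""
    return (shares ** 2).sum(axis=1)

def entropy(shares):
    """Entropy per country: -sum_p S_cp log S_cp (Eq. 13), with 0 log 0 = 0."""
    # Replace zero shares by 1 inside the log so those terms contribute 0.
    safe = np.where(shares > 0, shares, 1.0)
    return -(shares * np.log(safe)).sum(axis=1)

# Hypothetical toy data: country 0 exports a single product,
# country 1 spreads its exports evenly over four products.
X = np.array([[10.0, 0.0, 0.0, 0.0],
              [5.0, 5.0, 5.0, 5.0]])
S = export_shares(X)
H = herfindahl(S)   # concentration: high for country 0, low for country 1
E = entropy(S)      # diversification: low for country 0, high for country 1
```

In this toy case the concentrated country has the maximal H and zero entropy, while the evenly diversified country has H equal to one over the number of products and entropy equal to the log of that number, which is why the two measures enter the regressions with opposite signs.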
A fixed country effect regression means that dummy variables are introduced to capture all the variation between countries. The quantity we look at here is therefore the within R2, which is the variation in growth explained by the productive structure after controlling for all between-country variation. Technically, each dummy variable is equal to 1 for one country and 0 for all others, and in fixed effect regressions we introduce one such variable per country considered.

Table S 6 studies the relationship between the level of income in 2000, as measured by the log of GDP per capita at purchasing power parity, and different measures of productive structure. Columns 1 and 2 use pre-existing measures of diversification, namely the entropy and the H-H index. The first can explain 37.7 percent of the variance in income per capita, while the second can only account for 17.6 percent, as shown by the adjusted R2 of the regression. Columns 3 to 8 use successive iterations of our method. Diversification kc,0 explains 34.5 percent of the variance; kc,1 explains 37.8 percent, and subsequent variables converge to 53 percent by the 8th reflection, with higher order variables adding little additional power. Columns 9 to 11 show a “horse race” between kc,18 and the pre-existing measures, taken one at a time or simultaneously. kc,18 contains much more information than the others do: adding the pre-existing measures to kc,18 increases the R2 very little relative to column 8, whereas adding kc,18 to them increases it substantially relative to columns 1 and 2.

Table S 7 presents a cross-country regression of growth between 1985 and 2005 on initial values of the productive structure indicators. Columns 1-3 use the entropy indicator, the H-H index and the two combined. Columns 4-8 use successive pairs of k variables. Columns 9-11 present a horse race between the kc,18-kc,19 pair and the traditional measures of productive structure, both separately and taken together. All regressions also control for the initial level of GDP per capita. The results are similar to those of the previous table. The variables we introduce do a better job of predicting the pattern of future growth, and higher reflections of the method have the largest predictive power. Interestingly, there is complementary information in successive measures of our variables, so that both appear significant in the regression. kc,18 and kc,19 contain more information than the traditional measures and beat them in a horse race (columns 9-11).

Table S 8 repeats these regressions, splitting the sample into two periods of 10 years, 1985-95 and 1995-05, and finds similar results: pairs of k variables do a better job of explaining growth than do the traditional variables, and the quality of the fit increases with each iteration. A horse race between traditional and k variables shows that the bulk of the explanatory power comes from the k variables, although the traditional variables retain some residual information that is statistically significant but small.

Table S 9 repeats the analysis using four 5-year periods between 1985 and 2005 and finds similar results.

Table S 10 presents an equivalent set of regressions but controls for average fixed country characteristics by including a dummy variable per country. This regression bases its identification only on the within-country variation in growth and finds similar but even stronger results. Our preferred specification (column 8) is able to explain 33.72 percent of the within-country variance, while adding the traditional variables only increases the explanatory power to 35 percent. The two traditional variables on their own (column 3) explain only 21.72 percent of the within-country variance, indicating that the fit increases much more when adding the k variables to the traditional variables (contrast of columns 3 and 11) than when adding the traditional variables to the k variables (contrast of columns 8 and 11).
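The fixed country effect regression and its within R2 can be illustrated with a short sketch. It relies on the standard result that including one dummy per country is equivalent to subtracting country means from the dependent and independent variables before running ordinary least squares. The function name, the NumPy implementation and the toy panel below are assumptions made for illustration; they are not the estimation code behind Table S 10.

```python
import numpy as np

def within_r2(y, X, groups):
    """OLS with one dummy per group, computed via group demeaning.

    Returns the within R2 (share of within-group variation in y explained
    by X) and the slope coefficients on the columns of X.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    groups = np.asarray(groups)

    y_dm = y.copy()
    X_dm = X.copy()
    for g in np.unique(groups):
        mask = groups == g
        # Removing group means absorbs all between-group variation,
        # which is what the country dummies do.
        y_dm[mask] -= y[mask].mean()
        X_dm[mask] -= X[mask].mean(axis=0)

    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    resid = y_dm - X_dm @ beta
    r2 = 1.0 - (resid @ resid) / (y_dm @ y_dm)
    return r2, beta

# Hypothetical toy panel: two countries, three periods each.
# Growth equals a country-specific constant plus 2 times the regressor.
country = np.array([0, 0, 0, 1, 1, 1])
x = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
alpha = np.array([10.0, -5.0])          # fixed country effects
growth = alpha[country] + 2.0 * x
r2, beta = within_r2(growth, x, country)
```

Because the toy growth series differs across countries only through the constant, the dummies absorb all between-country variation and the regressor explains the remaining within-country variation exactly.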
29 6.696*** (30.38) 125 0.377 (8.712) 8.914*** (71.88) 125 0.176 (-5.250) -2.554*** (2000) (2000) 0.552*** (2) Log GDP per capita ppp (1) Log GDP per capita ppp 7.603*** (55.88) 125 0.345 (8.147) 0.00859*** (2000) (3) Log GDP per capita ppp Table S 6 Regression coefficients for income per capita Observations Adjusted R2 Constant (2000) kc,18 (2000) kc,12 (2000) kc,8 (2000) kc,4 (2000) kc,1 (2000) kc,0 (2000) Herfindahl (2000) Predictors Entropy Predicted Variable INCOME (YEAR 2000) 12.34*** (27.49) 125 0.378 (-8.740) -0.159*** (2000) (4) Log GDP per capita ppp -9.796*** (-6.147) 125 0.513 (11.48) 0.116*** (2000) (5) Log GDP per capita ppp -185.6*** (-11.41) 125 0.533 (11.93) 1.201*** (2000) (6) Log GDP per capita ppp -1968*** (-11.94) 125 0.535 (11.99) 12.21*** (2000) (7) Log GDP per capita ppp (11.99) -63581*** (-11.99) 125 0.535 392.6*** (2000) (8) Log GDP per capita ppp (6.854) -52466*** (-6.853) 125 0.546 324.0*** (1.991) 0.157** (2000) (9) Log GDP per capita ppp (9.923) -59234*** (-9.921) 125 0.541 365.8*** (-1.552) -0.639 (2000) (10) Log GDP per capita ppp 30 (5.859) -51109*** (-5.858) 125 0.543 315.6*** (0.329) (1.275) 0.270 0.202 (2000) (11) Log GDP per capita ppp 0.000993 -0.00176 0.0114 (0.751) 97 0.195 (3.650) 0.0137 (0.776) 97 0.115 (-2.765) -0.0273*** (0.533) (85, 05) (85,05) (-0.794) 0.00660*** (2) Growth (1) Growth 0.00650 (0.437) 97 0.192 (0.760) (2.600) 0.0116 (-0.882) 0.00828** -0.00206 (85, 05) (3) Growth (-0.497) -0.00154 (85, 05) (4) Growth 0.0338 (0.922) 97 0.118 (-0.749) (2.080) -0.000612 6.62e-05** Table S 7 Regression coefficients for a twenty year period of growth Observations Adjusted R2 Constant (1985) kc,19 (1985) kc,18 (1985) kc,9 (1985) kc,8 (1985) kc,5 (1985) kc,4 (1985) kc,1 (1985) kc,0 (1985) Herfindahl (1985) Entropy (1985) Predicted Variable Predictors GDP per capita ppp 20 YEAR GROWTH -0.735* (-1.883) 97 0.206 (1.737) (2.866) 0.0321* 0.00169*** (-0.735) -0.00249 (85, 05) (5) Growth -19.29*** (-2.807) 97 0.247 (2.713) (3.075) 
0.890*** 0.0338*** (-0.688) -0.00223 (85, 05) (6) Growth -69.21*** (-3.454) 97 0.202 (3.453) 0.401*** (-1.478) -0.00470 (85, 05) (7) Growth (2.928) -23801*** (-2.940) 97 0.274 (2.952) 1127*** 38.88*** (-0.758) -0.00233 (85, 05) (8) Growth (2.603) -21475** (-2.610) 97 0.274 (2.618) 1017** 35.05** (0.931) (-0.849) 0.00200 -0.00244 (85, 05) (9) Growth (2.829) -22808*** (-2.834) 97 0.268 (2.849) 1080*** 37.26*** (-0.406) -0.00414 (-0.804) -0.00238 (85, 05) (10) Growth 31 (2.632) -21801*** (-2.633) 97 0.268 (2.643) 1033*** 35.57*** (0.454) (0.896) 0.00723 (-0.831) 0.00322 -0.00242 (85, 05) (11) Growth 0.000334 -0.00235 0.0171 (1.356) 221 0.113 0.0226 (1.602) 221 0.077 0.0153 (1.087) 221 0.109 (0.285) (-3.890) (2.985) 0.00422 (4.962) -0.0325*** 0.00759*** (-1.349) -0.00246 (85-95-05) (3) Growth 0.00699*** (0.209) (85-95-05) (85-95-05) (-1.322) (2) Growth (1) Growth -0.0186 (-0.971) 221 0.085 (2.543) 0.000916** (3.967) 9.75e-05*** (0.595) 0.00112 (85-95-05) (4) Growth Table S 8 Regression coefficients for two ten year periods of growth Observations Adjusted R2 Constant (85,95) kc,19 (85,95) kc,18 (85,95) kc,9 (85,95) kc,8 (85,95) kc,5 (85,95) kc,4 (85,95) kc,1 (85,95) kc,0 (85,95) Herfindahl (85,95) Entropy (85,95) Predicted Variable Predictors GDP per capita ppp 10 YEAR GROWTH -0.168*** (-6.137) 221 0.170 (5.971) 0.00329*** (5.577) 0.00102*** (-2.062) -0.00395** (85-95-05) (5) Growth -1.178*** (-5.215) 221 0.160 (5.594) 0.0152*** (5.056) 0.00577*** (-1.695) -0.00311* (85-95-05) (6) Growth 0.102*** (3.107) 221 0.068 0.789*** (2.709) -65.48*** (-2.705) 221 0.168 (2.705) 0.310*** (4.312) -96.21*** (-4.308) 221 0.137 (4.306) (3.002) 0.00458*** (-1.899) -0.00346* (85-95-05) (9) Growth 1.158*** 0.455*** (-3.577) (-0.707) -0.00119 (85-95-05) (8) Growth -0.000660*** (2.188) 0.00310** (85-95-05) (7) Growth (3.433) -79.20*** (-3.428) 221 0.158 0.954*** (3.428) 0.375*** (-2.494) -0.0210** (-1.327) -0.00229 (85-95-05) (10) Growth 32 (2.699) -65.78*** (-2.695) 221 0.164 0.792*** 
(2.695) 0.311*** (-0.112) -0.00162 (1.648) 0.00434 (-1.850) -0.00343* (85-95-05) (11) Growth 0.000292 -0.00269* 0.0166 (1.497) 451 0.090 0.00798*** (6.280) 0.0142 (1.144) 451 0.089 (0.440) 0.0236* (1.910) 451 0.062 0.00602 (-4.970) 0.00885*** (3.760) (-1.785) -0.00286* (85-90-95-00-05) (3) Growth -0.0373*** (0.211) (85-90-95-00-05) (85-90-95-00-05) (-1.732) (2) Growth (1) Growth (4) Growth -0.0160 (-0.933) 451 0.071 0.000953*** (2.853) (5.351) 0.000121*** (0.220) 0.000361 (85-90-95-00-05) Table S 9 Regression coefficients for four five year periods of growth. Observations Adjusted R2 Constant (85,90,95,00) kc,19 (85,90,95,00) kc,18 (85,90,95,00) kc,9 (85,90,95,00) kc,8 (85,90,95,00) kc,5 (85,90,95,00) kc,4 (85,90,95,00) kc,1 (85,90,95,00) kc,0 (85,90,95,00) Herfindahl (85,90,95,00) Entropy (85,90,95,00) Predicted Variable Predictors GDP per capita ppp 5 YEAR GROWTH -0.173*** (-7.646) 451 0.136 0.00260*** (5.694) (7.074) 0.00113*** (-2.431) -0.00393** (85-90-95-00-05) (5) Growth -0.224*** (-4.198) 451 0.071 (5.504) (3.474) 0.00312*** 0.00102*** (1.767) 0.00224* (85-90-95-00-05) (6) Growth (4.494) -0.142** (-2.576) 451 0.127 (2.131) 0.00280*** (4.671) -0.163*** (-2.846) 451 0.057 0.0349 (0.889) 451 0.013 (2.359) 0.00259*** 0.000632** 0.000673** 0.00759*** (6.060) (-1.686) -0.00257* (85-90-95-00-05) (9) Growth (-1.147) (2.479) 0.00310** (85-90-95-00-05) (8) Growth -0.000265 (2.553) 0.00326** (85-90-95-00-05) (7) Growth (4.523) -0.132** (-2.355) 451 0.101 (2.234) 0.00265*** 0.000647** (-4.765) -0.0351*** (0.207) 0.000281 (85-90-95-00-05) (10) Growth 33 (4.493) -0.145*** (-2.611) 451 0.125 (2.365) 0.00260*** 0.000676** (0.474) 0.00636 0.00851*** (3.680) (-1.749) -0.00275* (85-90-95-00-05) (11) Growth -0.0581*** -0.0585*** 0.467*** (7.427) 451 0.2071 (4.478) 0.514*** (8.147) 451 0.1784 (-2.842) -0.0390*** (-7.721) (85-90-95-00-05) (85-90-95-00-05) (-7.911) 0.0134*** (2) Growth (1) Growth 0.429*** (6.592) 451 0.2179 (2.117) (4.037) 0.0585** (-8.072) 0.0247*** -0.0595*** 
(85-90-95-00-05) (3) Growth 0.594*** (9.546) 451 0.2991 (6.549) (3.710) 0.00238*** 0.000223*** (-10.11) -0.0773*** (85-90-95-00-05) (4) Growth 0.588*** (8.165) 451 0.3379 (9.287) (2.922) 0.00366*** 0.000537*** (-11.28) -0.0863*** (85-90-95-00-05) (5) Growth Table S 10 Regression table for four five year periods of growth considering fixed country effects. Observations Within R2 Constant (85,90,95,00) kc,19 (85,90,95,00) kc,18 (85,90,95,00) kc,9 (85,90,95,00) kc,8 (85,90,95,00) kc,5 (85,90,95,00) kc,4 (85,90,95,00) kc,1 (85,90,95,00) kc,0 (85,90,95,00) Herfindahl (85,90,95,00) Entropy (85,90,95,00) Predicted Variables Predictors GDP per capita ppp 5 YEAR GROWTH FIXED EFFECTS 0.589*** (8.070) 451 0.3373 (8.998) (2.801) 0.00410*** 0.000611*** (-11.78) -0.0891*** (85-90-95-00-05) (6) Growth 0.651*** (8.203) 451 0.1779 0.000535*** (-2.808) (-8.337) -0.0651*** (85-90-95-00-05) (7) Growth (8.799) 0.596*** (8.310) 451 0.3372 (2.801) 0.00419*** 0.000601*** (-11.89) -0.0899*** (85-90-95-00-05) (8) Growth (8.164) 0.543*** (7.315) 451 0.3494 (3.051) 0.00395*** 0.000653*** (2.453) (-11.39) 0.00706** -0.0868*** (85-90-95-00-05) (9) Growth (8.521) 0.585*** (8.133) 451 0.3415 (2.879) 0.00410*** 0.000618*** (-1.435) -0.0181 (-11.58) -0.0884*** (85-90-95-00-05) (10) Growth 34 (8.031) 0.511*** (6.583) 451 0.3535 (3.141) 0.00389*** 0.000673*** (1.410) (2.435) 0.0360 (-11.40) 0.0142** -0.0867*** (85-90-95-00-05) (11) Growth SECTION 14: ADDITIONAL RESULTS PRODY AND EXPY The variables PRODY and EXPY were introduced originally by Hausmann, Hwang and Rodrik [15] to characterize the sophistication of products and of countries’ exports starting from trade and income data. PRODY and EXPY allow us to study the income of countries from a product-specific perspective. DEFINITIONS PRODY The PRODY of a product is the average income per-capita associated with that product. 
We can calculate PRODY using trade data as

$$ \mathrm{PRODY}_p = \sum_c \frac{S_{cp}}{\sum_{c'} S_{c'p}} \, G_c \qquad (14) $$

where Scp is the share of product p in the export basket of country c, Gc is the income of country c measured as GDP per capita adjusted for purchasing power parity, and the denominator normalizes the weights so that they add up to one for each product.

EXPY

The EXPY of a country is the average PRODY of its exports:

$$ \mathrm{EXPY}_c = \sum_p S_{cp} \, \mathrm{PRODY}_p \qquad (15) $$

We notice that PRODY and EXPY mix income and network information, as these variables have a definition similar to that of the first two reflections of the method, with k0 = GDP per capita and Mcp related to the shares of products in the export baskets of countries.

EXPY, kc,0, kc,1

Here we complement our results on income by showing that k and k1 correlate with a country's EXPY (Fig S 17).

Fig S 17 EXPY and bipartite network structure. a, Diversification (kc,0=k) versus EXPY. b, Average ubiquity of a country's products (kc,1=k1) versus EXPY.

Fig S 18 PRODY and bipartite network structure. a, Ubiquity (kp,0) versus PRODY. b, Average ubiquity of a country's products (kp,1) versus PRODY.

NULL MODEL BEHAVIOR FOR PRODY AND EXPY, kc,0, kc,1

Here we present the null model behavior for the relationships found between PRODY, EXPY and the network structure (Fig S 19 to Fig S 22).

Fig S 19 Comparison between PRODY and EXPY with kc,0, kc,1, kp,0 and kp,1 for null model 1. A, PRODY vs. kp,0. B, PRODY vs. kp,1. C, EXPY vs. kc,0. D, EXPY vs. kc,1.

Fig S 20 Comparison between PRODY and EXPY with kc,0, kc,1, kp,0 and kp,1 for null model 2. A, PRODY vs. kp,0. B, PRODY vs. kp,1. C, EXPY vs. kc,0. D, EXPY vs. kc,1.

Fig S 21 Comparison between PRODY and EXPY with kc,0, kc,1, kp,0 and kp,1 for null model 3. A, PRODY vs. kp,0. B, PRODY vs. kp,1. C, EXPY vs. kc,0. D, EXPY vs. kc,1.

Fig S 22 Comparison between PRODY and EXPY with kc,0, kc,1, kp,0 and kp,1 for null model 4. A, PRODY vs. kp,0. B, PRODY vs. kp,1. C, EXPY vs. kc,0. D, EXPY vs. kc,1.

BIPARTITE NETWORK ANALYSIS AND PROXIMITY IN THE PRODUCT SPACE

We study the relationship between the analysis presented here and the proximity between products in the product space by asking if products that are close in the κ−θ diagram are proximate in The Product Space. Proximity in the product space is defined as the minimum pair-wise conditional probability of co-exporting products p1 and p2. We can express this as a function of M as:

$$ \phi_{p_1 p_2} = \min\left\{ \frac{\sum_c M_{cp_1} M_{cp_2}}{\sum_c M_{cp_1}},\; \frac{\sum_c M_{cp_1} M_{cp_2}}{\sum_c M_{cp_2}} \right\} \qquad (16) $$

We expect pairs of products co-exported by a large fraction of countries (i.e. pairs of products having a large φ) to have similar kp,0 and kp,1. We control for randomness by using our four null models, as these can be used to compare the relationship between kp,0, kp,1 and φ for networks that are similar to Mcp. The four null models allow us to study variations in the relationships between kp,0, kp,1 and φ that come from the network structure, rather than from their definition.

Proximity (φ) is a quantity associated with a pair of products. We compare φ to kp,0 and kp,1 by measuring the Euclidean distance between the two products in the kp,0-kp,1 space:

$$ \Delta_{p_1 p_2} = \sqrt{(k_{p_1,0} - k_{p_2,0})^2 + (k_{p_1,1} - k_{p_2,1})^2} = \sqrt{\Delta_0^2 + \Delta_1^2} \qquad (17) $$

where Δ0 and Δ1 denote the differences in kp,0 and kp,1 between the two products.

We study the relationship between the distance in the kp,0-kp,1 space and φ (Fig S 23) and find that high proximity values are likely only among products that are close to each other in the kp,0-kp,1 diagram. We notice that the null models do not give rise to proximities as high as the ones observed in the original data, suggesting that the high observed co-production of some pairs of products cannot be expected from chance, and hence that high proximity values indicate similarities between the productive structures required to produce such pairs of products. These results also suggest that φ>0.5 is a good threshold, as φ values above it are extremely rare in any of the four null models.
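The quantities defined in this section (Eqs. 14 to 17) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the figures: the function names, the NumPy implementation and the toy matrices are hypothetical, every product is assumed to be exported by at least one country, and kp,1 is taken to be the average diversification of the countries exporting product p, as in the method of reflections.

```python
import numpy as np

def prody(S, G):
    """PRODY_p: share-weighted average of country incomes G_c (Eq. 14)."""
    weights = S / S.sum(axis=0, keepdims=True)  # each column sums to 1
    return weights.T @ G

def expy(S, prody_p):
    """EXPY_c: share-weighted average PRODY of a country's exports (Eq. 15)."""
    return S @ prody_p

def proximity(M):
    """phi: minimum pair-wise conditional probability of co-export (Eq. 16)."""
    co = M.T @ M                 # number of countries exporting both products
    ubiquity = np.diag(co)       # kp,0: number of countries exporting each one
    # min(co / k1, co / k2) equals co / max(k1, k2)
    return co / np.maximum.outer(ubiquity, ubiquity)

def kspace_distance(kp0, kp1):
    """Euclidean distance between products in the (kp,0, kp,1) plane (Eq. 17)."""
    d0 = kp0[:, None] - kp0[None, :]
    d1 = kp1[:, None] - kp1[None, :]
    return np.sqrt(d0 ** 2 + d1 ** 2)

# Hypothetical export shares (2 countries x 2 products) and incomes.
S = np.array([[0.5, 0.5],
              [1.0, 0.0]])
G = np.array([100.0, 200.0])
prody_p = prody(S, G)
expy_c = expy(S, prody_p)

# Hypothetical binary country-product matrix Mcp (3 countries x 3 products).
M = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0]])
kc0 = M.sum(axis=1)              # diversification of each country
kp0 = M.sum(axis=0)              # ubiquity of each product
kp1 = (M.T @ kc0) / kp0          # average diversification of the exporters
phi = proximity(M)
dist = kspace_distance(kp0, kp1)
```

Plotting the off-diagonal entries of `phi` against the matching entries of `dist`, for the data and for each null model, gives the kind of comparison summarized in Fig S 23.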
Fig S 23 Bipartite network structure and product proximity. The five plots show proximity as a function of the Euclidean distance between products in the kp,0-kp,1 diagram.

REFERENCES

1 PG Lind, MC González, HJ Herrmann. Cycles and clustering in bipartite networks. Phys. Rev. E 72:056127 (2005)
2 R Guimerà, M Sales-Pardo, LAN Amaral. Module identification in bipartite and directed networks. Phys. Rev. E 76:036102 (2007)
3 S Lehmann, M Schwartz, LK Hansen. Bi-clique communities. Phys. Rev. E 78:016108 (2008)
4 K-I Goh et al. The human disease network. PNAS 104:8685-8690 (2007)
5 W Souma, Y Fujiwara, H Aoyama. Complex networks and economics. Physica A 324:396-401 (2003)
6 A-L Barabási, R Albert. Emergence of scaling in random networks. Science 286:509-512 (1999)
7 DJ Watts, SH Strogatz. Collective dynamics of 'small-world' networks. Nature 393:440-442 (1998)
8 MEJ Newman. The structure of scientific collaboration networks. PNAS 98:404-409 (2001)
9 MEJ Newman. Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E 64:016131 (2001)
10 MEJ Newman. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E 64:016132 (2001)
11 A-L Barabási et al. Evolution of the social network of scientific collaborations. Physica A 311:590-614 (2002)
12 LAN Amaral et al. Classes of small-world networks. PNAS 97:11149-11152 (2000)
13 H Jeong, Z Neda, A-L Barabási. Measuring preferential attachment in evolving networks. Europhysics Letters 61:567-572 (2003)
14 P Gleiser, L Danon. Community structure in jazz. arxiv/cond-mat/0307434 (2003)
15 R Hausmann, J Hwang, D Rodrik. What you export matters. Journal of Economic Growth 12(1):1-25 (2007)
16 S Maslov, K Sneppen. Specificity and stability in topology of protein networks. Science 296:910-913 (2002)