Michael Schlauch
GIS – Fletcher School
Parmenter, Florance
May 2013

VISUALIZING GLOBAL TRADE – FINAL PROJECT PAPER

PROJECT DESCRIPTION

The purpose of this project was to visualize global trade through the lens of individual commodities and value chains. Unlike many other projects, my objective was not so much to answer a specific research question as it was to see the data first before starting to ask questions about it. Global trade is a complex and interdependent system which I thought would be both challenging and rewarding to visualize – and the trade in primary commodities (minerals, agricultural goods, energy) is also an area of interest for me.

From the beginning, I had in mind some of the maps and information graphics made by Charles Joseph Minard as an overall aesthetic to be working towards. These maps depict geographical, numerical and directional data in a single image using flow arrows – a form ideally suited for representing trade between countries.

Example Map: Raw and Refined Sugar

DATA SOURCES

The canonical data set on global trade in commodities is the UN’s Commodity and Trade Statistical Database (Comtrade). It contains annual data from 1962 on trade as reported by UN member countries, detailing for each record the traded good (classified by HS code), the value and weight of the trade, the reporting country, and the partner country.¹ The database can be queried from comtrade.un.org using a number of parameters which are discussed in further detail below.

The reporting country is the country whose national statistical agency is reporting on the trade, and could be the exporting or importing country depending on the traded good. For this reason, Comtrade inherently has “duplicate” entries in the sense that for every trade flow there is one record reported by the importing country (e.g. the US reported import of 100 cars from Japan) and one record reported by the exporting country (e.g. Japan reported export of 100 cars to the US).
Secondary literature on the UN data set suggests that the data reported by the importing country tends to be more accurate, so for the purposes of this project, I eliminated the problem of duplicate entries by restricting my analysis to the importing country data only. There is an option on the Comtrade website to include only “Imports” (or “Exports”) in the data request. There is also an option to select which countries to include in the query – I selected “All” in each case.

HS codes come from the Harmonized Commodity Description and Coding System, an internationally standardized system used by customs and other government agencies to classify over 5,000 different traded goods. The system is relatively straightforward for a data consumer to navigate in order to find particular commodities of interest (although it is notoriously difficult to classify your goods if you are trying to do business and get your shipment through customs). The system uses a hierarchy of two-digit parent codes and 4-6 digit child codes to classify goods.

The Comtrade system contains data on all trade reported by countries, so it is necessary to filter out trade flows of smaller volume in order to reduce clutter on the maps. As can be seen from the poster, the value parameter filter was set at a variety of different levels in order to get an appropriate number of records for importing into ArcGIS. Generally, the goal was to have the query return about 25-50 records per commodity. It was also possible to have a larger number of records returned in the query and then delete any unwanted records in Excel after the data had been exported from Comtrade. In retrospect, this latter method allowed for more control of the process, although I only started using it about halfway through the project.

Importantly, there are a few tips for working with the Comtrade website, which is somewhat prone to bugs and request timeouts.
First, it operates best on a Windows machine with either Internet Explorer or Firefox as the browser. Second, for the purposes of this project, the most useful way of accessing the data is through the Basic Selection option under the Data Query tab. Most importantly, users should know that after submitting their query, a new page will open up allowing Direct Download of the results in Excel format. After downloading the results, it is tempting to use the Modify Selection option, which in theory would allow you to return to your query with your previously entered parameters still saved. However, my experience with this option was that the website seems to have a bug with storing the values from a previous query: upon requesting a new query, the request would usually stall and then time out. Instead of using the Modify Selection option, it is better to start over with a blank query by going back to the Basic Selection option under the Data Query tab. Prior to discovering this, much time was wasted waiting for queries to finish only to find they had failed.

¹ There are other fields as well, but these are the most relevant and useful ones, particularly for this project.

DATA PREPARATION

Data is exported from Comtrade in Excel/CSV format including all attributes already mentioned above. In the UN exported data format, country codes of up to 3 digits are used instead of country names, and a metadata file that cross-references these codes with country names and ISO codes is available on the Comtrade website.

The first step in data preparation was to link these Country_Code fields with geographical data, which would ultimately serve as the start and end points of the flow arrows in the final maps. To do this, I began by downloading country boundaries shapefiles from ArcGIS Online. Initially, I added two new fields to the attribute table of the shapefile and calculated XY centroids for each country polygon (one X centroid, one Y centroid).
However, this method was not ideal since it resulted in points that were located in odd locations, even though technically they were “centered” on the country (think of how Alaska and Hawaii would skew the US centroid to the western part of the continental land mass). Instead, I ended up using a “Country Boundaries” shapefile from ArcGIS Online that already had points associated with each country at locations that made more sense visually. This shapefile therefore contained the geographical data, in latitude and longitude format, that I would need to join with the Comtrade data.

I first renamed this shapefile to “Country_Reference” and exported its attribute table to Excel. In addition to the lat-long attributes, this exported “Country_Reference” table also includes the ISO 2-digit code corresponding to each country. Next, I added a column to the Excel table for “UN_Country_Code” and joined it with the Comtrade metadata file on ISO_Code in order to populate the new column with UN_Country_Code data. A few manipulations were necessary in order for this join (actually a VLOOKUP) to work properly, due to some duplicate entries and discrepancies between the shapefile and UN data:
• Deleted “India excl. Sikkim” row from Comtrade metadata table
• Deleted “USA (before 1981)” row from Comtrade metadata table
• Added “EU-27” with Country_Code and Germany’s coordinates
• Added Hong Kong Country_Code and coordinates
• Changed Ethiopia Country_Code from 230 to 231
• Changed Panama Country_Code from 590 to 591
• Changed Yemen Country_Code from 886 to 887

Several rows for small countries and territories were also edited (e.g. Puerto Rico and Guam were recoded as US rather than separate entities); however, this probably had no effect on final results, as the trade volumes for these places are too small.

The cleaned data in this “Country_Reference” Excel file therefore served as my reference table for linking all UN data with the geographical attributes I would need to create flow arrows in ArcGIS.
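For readers who would rather script this clean-up than do it by hand in Excel, the join described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the project (that was a VLOOKUP): the function names, the sample rows, the coordinates, the ISO values and the two US codes are hypothetical placeholders, and only the three code corrections and the dropped “USA (before 1981)” row are taken from the list above.

```python
# Minimal sketch of the Country_Reference join, using only the standard library.
# All sample rows are illustrative placeholders, not values copied from the
# shapefile export or from the Comtrade metadata file.

# Metadata rows whose names mark them as duplicates to be removed before joining
# (the duplicate India row mentioned above would be handled the same way).
DROP_NAMES = {"USA (before 1981)"}

# Manual code corrections listed in the paper: Ethiopia, Panama, Yemen.
CODE_FIXES = {230: 231, 590: 591, 886: 887}

# Hypothetical extract of the Comtrade metadata file.
comtrade_metadata = [
    {"Country_Code": 230, "ISO_Code": "ET", "Name": "Ethiopia"},
    {"Country_Code": 590, "ISO_Code": "PA", "Name": "Panama"},
    {"Country_Code": 886, "ISO_Code": "YE", "Name": "Yemen"},
    {"Country_Code": 842, "ISO_Code": "US", "Name": "USA"},
    {"Country_Code": 841, "ISO_Code": "US", "Name": "USA (before 1981)"},
]

# Hypothetical extract of the exported shapefile attribute table
# (X = longitude, Y = latitude; rough placeholder coordinates).
country_reference = [
    {"ISO_Code": "ET", "X": 40.0, "Y": 9.0},
    {"ISO_Code": "PA", "X": -80.0, "Y": 9.0},
    {"ISO_Code": "YE", "X": 48.0, "Y": 15.5},
    {"ISO_Code": "US", "X": -98.0, "Y": 39.0},
    {"ISO_Code": "XX", "X": 0.0, "Y": 0.0},  # no matching metadata row
]


def build_un_code_lookup(metadata_rows):
    """Map ISO code -> UN country code after applying the manual clean-up."""
    lookup = {}
    for row in metadata_rows:
        if row["Name"] in DROP_NAMES:
            continue  # skip duplicate entries that would clobber the real one
        code = CODE_FIXES.get(row["Country_Code"], row["Country_Code"])
        lookup[row["ISO_Code"]] = code
    return lookup


def add_un_codes(reference_rows, lookup):
    """Return copies of the reference rows with UN_Country_Code filled in.

    Rows without a match get None, so they are easy to spot and fix by hand.
    """
    return [
        dict(row, UN_Country_Code=lookup.get(row["ISO_Code"]))
        for row in reference_rows
    ]


lookup = build_un_code_lookup(comtrade_metadata)
reference_with_codes = add_un_codes(country_reference, lookup)

for row in reference_with_codes:
    print(row)
```

Leaving unmatched rows as None rather than silently dropping them keeps the discrepancies between the two tables visible, which is where most of the manual edits listed above came from.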
For each data set exported from UN Comtrade, the steps were as follows:
• Rename column headings from the UN data to be consistent with ArcGIS conventions (no spaces, etc.). See below.
• Add 5 columns to each Excel table, with headings titled respectively: Report_X, Report_Y, Partner_X, Partner_Y, and ID. The first four columns hold the longitude (X) and latitude (Y) coordinates of the two countries in each trade flow. The ID column is a unique reference used later for re-joining each record in the shapefiles with its trade value and weight data.
• Copy and paste a VLOOKUP function into the first 4 columns to populate each of them with latitude and longitude data from the “Country_Reference” table based on the UN_Country_Code field.
  o In many cases, this function will generate errors when it cannot locate a corresponding country record based on the UN_Country_Code. This is mostly due to the fact that some of the UN commodity data records are for trade flows where the country of origin is unknown. Since it is impossible to associate these trade records with geographical locations, they were simply removed from the data set.
• Add a sequential list of unique IDs for each record in the ID column. [1,2,3…]
• Save as “Commodity_Year_TradeValueFilter” and close. We’ll call these Excel files generically the “Commodity Tables” for the remainder of this paper.

Next, we were finally ready to use the XY to Line tool from ArcGIS. The XY to Line tool is located under Data Management Tools \ Features and asks for the following arguments:
• Input table: The relevant tab from the Excel “Commodity Tables” described above.
• Output Feature Class: Location to save the shapefile generated by the XY to Line tool.
• Start X Field: Partner_X field from “Commodity Tables.” (Since all data is as reported by the importing country, the “Partner Country” corresponds to the “Start” of the trade flow.)
• Start Y Field: Partner_Y field from “Commodity Tables.”
• End X Field: Report_X from “Commodity Tables.”
• End Y Field: Report_Y from “Commodity Tables.”
• Line Type (optional): I preferred the look of “RHUMB” lines.
• ID (optional): ID field from “Commodity Tables.” This field is not “optional” for this project since it is required for joining the trade value attributes back to the shapefile afterwards.
• Spatial Reference (optional): GCS_WGS_1984. The tool will use your current projection as default, which can cause problems since the latitude and longitude data requires a Geographic Coordinate System rather than a Projected Coordinate System.

After hitting OK, the process will run for a few moments and then generate a new shapefile. Next, it is necessary to re-join the attribute table of the new shapefile to the original trade data from the “Commodity Tables.” To do this, open the attribute table of the new shapefile and Join with the “Commodity Tables” data on the basis of the ID field. This will “re-unite” the trade value and all other UN data with the shapefile lines, and allow one to use this data in setting the formatting of the flow arrows (e.g. graduated symbols based on the USD value of the trade, as in this case).

Finally, the last stage is to format the resulting shapefile. I experimented with a number of different options here and ultimately found that graduated symbols rather than graduated colors created a more visually appealing map. For most commodities, I used line widths of range 0.1 to 4, and arrow symbols (arrow heads) of 10 width and 6 height. More on this below in the Discussion section.

DISCUSSION

The XY to Line tool was able to meet my requirements at a basic level, but it does have some limitations. In general, creating flow maps using automated software processes is imperfect due to the fact that flow maps often do not scale well, and result in “cluttered” images with multiple overlaying lines and arrow heads.
This is a problem acknowledged repeatedly in the GIS and information graphics literature, and the XY to Line tool is no exception. As can be seen from some of the maps on my poster, when the trade in certain commodities is geographically concentrated, this can lead to multiple flows “piling up” on one another. Major hubs like China and Brazil may see overlapping lines, and trade flows between various European countries can become unreadable when viewed from a global level.

Initially, my flow arrows positioned the arrow head symbols at the end of each line, making this problem particularly bad at major importing hubs, for example for many commodities going into China. However, this problem was partially solved by locating the arrow heads at the middle of the line. Occasionally, these would also overlap, but to a much lesser extent. Serendipitously, locating the arrow symbols at line midpoints creates an intuitively appealing new “bow tie” symbol for country pairs that export to one another. I did not intend to do this, but I liked the result, as two triangular arrow heads “sum” into a directional “bow tie” symbol.

North American Bow Ties (Refined Copper)

For overlapping lines, it was possible to reduce the thickness of the lines in order to better distinguish between individual lines; however, reducing the range of symbol sizes also reduces one’s ability to show the range of value in each set of trade flows, so there is a tradeoff here as well. In lieu of an automated function in ArcGIS that would resolve the cluttering problem by determining the optimal positioning and spacing of flow arrows, it seems the next best solution would be to create maps as I have done here and then manipulate them further in a program like Adobe Illustrator.
Here are a couple of examples of the cluttering problems I was faced with:

Clutter in Europe (Steel)

Overlapping Lines (Petroleum Imports, Middle East to East Asia)

CONCLUSION

Aside from the frustrations with “cluttering” documented above, overall I am very satisfied with how the final maps came out. Clutter is largely a problem on a two-dimensional, static surface like a poster, but when viewed on a computer it is relatively easy to disambiguate the various flow arrows from one another by zooming in or out – for example, centering the map on continental Europe reveals the web of trade relationships going on there instead of the blobby mass that appears on the poster (compare image below with the one above). I’ve found myself spending a lot of time just looking at these maps and discovering facets of global trade I did not know about, and what I’ve learned here will actually be very useful at my current job as a research tool. Of course, with over 5,000 goods in the HS code system, there are also still many more data sets to look at.

Steel in Europe

Beyond the detailed instructions covered above for data manipulations, I’d like to reiterate a few key recommendations for anyone attempting a similar project in the future:
1. You will save yourself a lot of time by following my suggestions above on how to use the quirky UN Comtrade website and avoid “request timeouts.”
2. Format flow arrows to have the arrow head symbol positioned in the middle of the line.
3. For any project with “noisiness” levels comparable to this one, my view is that using graduated symbols rather than graduated colors can limit the noise levels. (Though I am also color-blind and potentially biased on this point.)
4. Get started on the poster earlier than you think you need to! Formatting and layout take much longer than you might expect even if all your maps are virtually ready.

Citations

1.
“Visualizing Migration Flows and their Development in Time: Flow Maps and Beyond” by Boyandin et al. is the first paper I read that articulated why there are challenges in creating flow maps using automated software processes. It touches on some of the software solutions that have been attempted already and offers plans to develop a different approach to address the shortcomings of past efforts. Importantly for me, it spelled out the reasons that flow maps can result in imperfect visualizations, which allowed me to make sense of why certain problems were occurring and think more clinically about how to address them.
Ilya Boyandin, Enrico Bertini, Denis Lalanne. “Visualizing Migration Flows and their Development in Time: Flow Maps and Beyond.” Retrieved May 2013 from http://diuf.unifr.ch/people/lalanned/Articles/Infovis10-DC.pdf

2. The paper and data documentation by Robert Feenstra et al., produced as part of a project with the National Bureau of Economic Research, contained valuable guidance about the UN trade data, offering insight into its composition and quality. In particular, this literature provided the key recommendation for using only importing country data in order to eliminate “duplicate” entries by reporter-partner country pairs, and substantiates the assumption that importing country data is typically superior in quality to exporter country data.
Robert C. Feenstra, Robert E. Lipsey, Haiyan Deng, Alyson C. Ma, and Hengyong Mo. “World Trade Flows: 1962-2000.” Retrieved May 2013 from http://cid.econ.ucdavis.edu/data/undata/NBER-UN_Data_Documentation_w11040.pdf

3. The Global Trade and Competitiveness Atlas is a website that allows users to select a specific variable (namely export volumes, or one of several other trade/economic indicators), a particular industry (corresponding to commodities in my project) and a display format (map images or animation).
In principle, I would like to create just this; however, the actual visualization output of the website is quite poor. For example, it uses polygons rather than flow arrows to represent export volumes, and it only conveys information about the exporting country of each trade flow.
Johannes Moenius and Xin Zhao. “The Global Trade and Competitiveness Atlas.” Retrieved May 2013 from http://business.redlands.edu/global/maps/

4. A tool from “The Observatory of Economic Complexity” has similar goals to my project, and in some ways does a better job than I did in visualizing the magnitude of multiple commodity trade flows simultaneously. The visualization formats are tree maps and stacked area charts. The only missing piece here is the geographical referencing, i.e. an ability to link the data to country export-import pairs on a map using flow arrows.
Alexander Simoes. “The Observatory of Economic Complexity.” Retrieved May 2013 from http://atlas.media.mit.edu/about/

5. Charles Joseph Minard was a French engineer whose charts are often cited as prime examples of flow maps and good cartographic principles in general. His representation of Napoleon’s march into Russia is somewhat well known. More relevant for my purposes are some of his maps depicting international connectivity, for example his maps of immigration or British Coal Exports. Although he obviously took a very different approach to map-making and had very different tools at his disposal, his work serves as an example of what a good flow arrow visualization can and should ultimately look like.
Author Unknown. “Cartographia – Charles Joseph Minard.” Retrieved from https://cartographia.wordpress.com/category/charles-joseph-minard/

6. The Global Dependency Explorer offers a very different approach to visualizing global trade flows, as a radial graph rather than over a geographical layer.
Since this is a GIS class and there is no real geographical data here, I would not attempt something like this, but it is interesting to think about alternatives. One advantage of such an approach is that it avoids some of the common shortfalls with flow arrows (rendering problems, over-cluttering of arrows on top of one another when too much data is on display, etc.). This approach still has its shortcomings, however. For one, the data is not disaggregated by commodity type, which makes it less interesting. Perhaps more importantly, it is less intuitive to interpret without the geographical references.
Author Unknown. “Global Dependency Explorer.” Retrieved May 2013 from http://cephea.de/gde/