Final Paper

Michael Schlauch
GIS – Fletcher School
Parmenter, Florance
May 2013
VISUALIZING GLOBAL TRADE – FINAL PROJECT PAPER
PROJECT DESCRIPTION
The purpose of this project was to visualize global trade through the lens of individual commodities and
value chains. Unlike many other projects, my objective was not so much to answer a specific research
question as it was to see the data first before starting to ask questions about it. Global trade is a
complex and interdependent system which I thought would be both challenging and rewarding to
visualize – and the trade in primary commodities (minerals, agricultural goods, energy) is also an area of
interest for me. From the beginning, I had in mind some of the maps and information graphics made by
Charles Joseph Minard as an overall aesthetic to be working towards. These maps depict geographical,
numerical and directional data in a single image using flow arrows – a form ideally suited for
representing trade between countries.
Example Map: Raw and Refined Sugar
DATA SOURCES
The canonical data set on global trade in commodities is the UN’s Commodity and Trade Statistical
Database (Comtrade). It contains annual data from 1962 on trade as reported by UN member countries,
detailing for each record the traded good (classified by HS code), the value and weight of the trade, the
reporting country, and the partner country (see footnote 1). The database
can be queried from comtrade.un.org using a number of
parameters which are discussed in further detail below.
The reporting country is the country whose national statistical
agency is reporting on the trade, and could be the exporting
or importing country depending on the traded good. For this
reason, Comtrade inherently has “duplicate” entries in the
sense that for every trade flow there is one record reported
by the importing country (e.g. the US reported import of 100
cars from Japan) and one record reported by the exporting
country (e.g. Japan reported export of 100 cars to the US).
Secondary literature on the UN data set suggests that the
data reported by the importing country tends to be more
accurate, so for the purposes of this project, I eliminated the
problem of duplicate entries by restricting my analysis to the
importing country data only. The Comtrade website has an option to include only “Imports” (or
“Exports”) in the data request, and another to select which countries to include in the query;
I selected “All” in each case.
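The import-only restriction could also be applied after download rather than in the query form. The following is a minimal sketch in Python, assuming each record has been read into a dictionary. The field names (flow, reporter, partner, value_usd) and the sample values are hypothetical, not Comtrade's actual column headers or data.

```python
# Hypothetical records: field names and values are illustrative assumptions,
# not real Comtrade headers or trade figures.
records = [
    {"flow": "Import", "reporter": "USA", "partner": "Japan", "value_usd": 100},
    {"flow": "Export", "reporter": "Japan", "partner": "USA", "value_usd": 95},
    {"flow": "Import", "reporter": "China", "partner": "Brazil", "value_usd": 250},
]


def imports_only(rows):
    """Keep only the records reported by the importing country."""
    return [row for row in rows if row["flow"] == "Import"]


importer_rows = imports_only(records)
```

Each trade flow then appears once, as seen from the importer's side, so the mirrored Japan-to-US export record is dropped.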
HS codes come from the Harmonized Commodity Description and Coding system, an internationally
standardized system used by customs and other government agencies to classify over 5,000 different
traded goods. The system is relatively straightforward for a data consumer to navigate in order to find
particular commodities of interest (although it can be notoriously difficult to classify your goods if
you are trying to do business and get your shipment through customs). The system uses a hierarchy of
two-digit parent codes and four- and six-digit child codes to classify goods.
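To make the hierarchy concrete, here is a small Python sketch that splits a code into its parent and child levels. The function name hs_levels is hypothetical, and the code 170111 is used only as an example string under chapter 17 (sugars).

```python
def hs_levels(code):
    """Split an HS code into its two-, four- and six-digit prefixes."""
    digits = code.replace(".", "").strip()
    if not digits.isdigit() or len(digits) not in (2, 4, 6):
        raise ValueError(f"expected a 2-, 4- or 6-digit HS code, got {code!r}")
    # Keep only the levels that the given code is long enough to contain.
    return [digits[:n] for n in (2, 4, 6) if n <= len(digits)]


levels = hs_levels("170111")  # parent chapter first, most specific code last
```

Querying at the two-digit level returns a whole chapter, while the longer codes narrow the request to a specific commodity.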
The Comtrade system contains data on all trade reported by countries, so it is necessary to filter out
trade flows of smaller volume in order to reduce clutter on the maps. As can be seen from the poster,
the value parameter filter was set at a variety of different levels in order to get an appropriate number
of records for importing into ArcGIS. Generally, the goal was to have the query return about 25-50
records per commodity. It was also possible to have a larger number of records returned in the query
and then delete any unwanted records in Excel after the data had been exported from Comtrade. In
retrospect, this latter method allowed for more control of the process, although I only started using it
about halfway through the project.
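The second method (download more records, then trim) can be sketched in Python as below. This is only an illustration of the idea, not the Excel procedure I actually used. The field name value_usd, the function name largest_flows and the sample numbers are hypothetical.

```python
def largest_flows(rows, min_value=0, max_records=50):
    """Drop flows below min_value, then keep at most max_records of the largest."""
    kept = [row for row in rows if row["value_usd"] >= min_value]
    kept.sort(key=lambda row: row["value_usd"], reverse=True)
    return kept[:max_records]


# Illustrative values only.
sample = [{"id": i, "value_usd": v} for i, v in enumerate([5, 40, 12, 90, 1, 33])]
top_three = largest_flows(sample, min_value=10, max_records=3)
```

Capping the record count directly, rather than guessing a value threshold per commodity, makes it easier to land in the 25-50 record range.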
There are a few tips for working with the Comtrade website, which is somewhat prone to
bugs and request timeouts. First, it operates best on a Windows machine with either Internet Explorer
or Firefox as the browser. Second, for the purposes of this project, the most useful way of accessing the
data is through the Basic Selection option under the Data Query tab. Most importantly, users should
know that after submitting their query, a new page will open allowing Direct Download of the results
in Excel format. After downloading the results, it is tempting to use the Modify Selection option, which
in theory would allow you to return to your query with your previously entered parameters still saved.
In my experience, however, the website seems to have a bug in storing the
values from a previous query: a new request made this way would usually stall and then
time out. Instead of using the Modify Selection option, it is better to start over with a blank query by
going back to the Basic Selection option under the Data Query tab. Before discovering this, I wasted
much time waiting for queries to finish, only to find that they had failed.
Footnote 1: There are other fields as well, but these are the most relevant and useful ones, particularly for this project.
DATA PREPARATION
Data is exported from Comtrade in Excel/CSV format including all attributes already mentioned above.
In the UN exported data format, country codes of up to 3 digits are used instead of country names, and
a metadata file that cross-references these codes with country names and ISO codes is available on the
Comtrade website. The first step in data preparation was to link these Country_Code fields with
geographical data, which would ultimately serve as the start and end points of the flow arrows in the
final maps. To do this, I began by downloading country boundaries shapefiles from ArcGIS Online.
Initially, I added two new fields to the attribute table of the shapefile and calculated XY centroids for
each country polygon (one X centroid, one Y centroid). However, this method was not ideal since it
resulted in points that were located in odd locations, even though technically they were “centered” on
the country (think of how Alaska and Hawaii would skew the US centroid to the western part of the
continental land mass). Instead, I ended up using a “Country Boundaries” shapefile from ArcGIS Online
that already had points associated with each country at locations that made more sense visually.
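The centroid problem can be illustrated with a toy calculation in Python. The sketch below treats a country as a set of non-overlapping rectangles, which is a deliberate simplification. The coordinates and the function name combined_centroid are made up for illustration.

```python
def combined_centroid(rectangles):
    """Area-weighted centroid of non-overlapping rectangles (xmin, ymin, xmax, ymax)."""
    total_area = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for xmin, ymin, xmax, ymax in rectangles:
        area = (xmax - xmin) * (ymax - ymin)
        total_area += area
        sum_x += area * (xmin + xmax) / 2
        sum_y += area * (ymin + ymax) / 2
    return sum_x / total_area, sum_y / total_area


# Made-up shapes: a "mainland" and a large detached territory to the north-west.
mainland = (0, 0, 10, 5)
detached = (-20, 6, -10, 10)
cx, cy = combined_centroid([mainland, detached])
```

In this toy case the combined centroid has a negative X value, so it falls outside the mainland (which spans X from 0 to 10). This is the same effect that pulls a country's centroid toward its detached territories.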
This shapefile therefore contained my geographical data in latitude and longitude format which I would
need to join with the Comtrade data. I first renamed this shapefile to “Country_Reference” and
exported its attribute table to Excel. In addition to the lat-long attributes, this exported
“Country_Reference” table also included two-character ISO codes corresponding to each country. Next, I added
a column to the Excel table for “UN_Country_Code” and joined it with the Comtrade metadata file on
ISO_Code in order to populate the new column with UN_Country_Code data. A few manipulations were
necessary in order for this join (actually a VLOOKUP) to work properly due to some duplicate entries and
discrepancies between the shapefile and UN data:
• Deleted “India, excl. Sikkim” row from Comtrade metadata table
• Deleted “USA (before 1981)” row from Comtrade metadata table
• Added “EU-27” with Country_Code and Germany’s coordinates
• Added Hong Kong Country_Code and coordinates
• Changed Ethiopia Country_Code from 230 to 231
• Changed Panama Country_Code from 590 to 591
• Changed Yemen Country_Code from 886 to 887
Several rows for small countries and territories were also edited (e.g. Puerto Rico and Guam were
recoded as US rather than separate entities); however, this probably had no effect on the final results,
as the trade volumes for these places are too small.
The cleaned data in this “Country_Reference” Excel file therefore served as my reference table for
linking all UN data with the geographical attributes I would need to create flow arrows in ArcGIS.
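The construction of this reference table can be sketched in Python as follows. This is an illustration under assumptions rather than the Excel VLOOKUP I actually used. The names build_reference and CODE_FIXES, the field names (iso, lon, lat) and the coordinates are hypothetical. Only the three code changes come from the list above.

```python
# Code changes taken from the manual fixes above: old UN code -> new UN code.
CODE_FIXES = {230: 231, 590: 591, 886: 887}


def build_reference(shape_rows, iso_to_un):
    """Map each UN country code to the (lon, lat) point from the shapefile table."""
    reference = {}
    for row in shape_rows:
        un_code = iso_to_un.get(row["iso"])
        if un_code is None:
            continue  # no UN code found for this ISO code; needs a manual fix
        un_code = CODE_FIXES.get(un_code, un_code)
        reference[un_code] = (row["lon"], row["lat"])
    return reference


# Placeholder coordinates, not the values from the actual shapefile.
shape_rows = [
    {"iso": "ET", "lon": 40.0, "lat": 9.0},
    {"iso": "PA", "lon": -80.0, "lat": 9.0},
    {"iso": "ZZ", "lon": 0.0, "lat": 0.0},  # no match in the UN metadata
]
iso_to_un = {"ET": 230, "PA": 590}
reference = build_reference(shape_rows, iso_to_un)
```

Keeping the manual corrections in one small table makes it easy to see which codes were changed and to repeat the process.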
For each data set exported from UN Comtrade, the steps were as follows:
• Rename column headings from UN data to be consistent with ArcGIS conventions (no spaces,
etc). See below.
• Add 5 columns to each Excel table, with headings titled respectively: Report_X, Report_Y,
Partner_X, Partner_Y, and ID. The first four columns hold the longitude (X) and latitude (Y)
coordinates of the two countries in each trade flow. The ID column is a
unique reference used later for re-joining each record in the shapefiles with its
trade value and weight data.
• Copy and paste a VLOOKUP function into the first 4 columns to populate each of them
with latitude and longitude data from the “Country_Reference” table based on the
UN_Country_Code field.
    o In many cases, this function will generate errors when it cannot locate a corresponding
    country record based on the UN_Country_Code. This is mostly because
    some of the UN commodity data records are for trade flows where the country of origin
    is unknown. Since it is impossible to associate these trade records with geographical
    locations, they were simply removed from the data set.
• Add a sequential list of unique IDs for each record in the ID column. [1,2,3…]
• Save as “Commodity_Year_TradeValueFilter” and close. We’ll call these Excel files generically
the “Commodity Tables” for the remainder of this paper.
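The lookup, clean-up and ID steps above could be approximated in Python as in the sketch below. It assumes the reference table is a dictionary from UN country code to a (longitude, latitude) pair. The function name, the Reporter_Code and Partner_Code field names, the country codes and the coordinates are all hypothetical.

```python
def prepare_commodity_table(trade_rows, reference):
    """Attach coordinates to each trade record, drop unmatched ones, and number the rest."""
    prepared = []
    for row in trade_rows:
        reporter = reference.get(row["Reporter_Code"])
        partner = reference.get(row["Partner_Code"])
        if reporter is None or partner is None:
            continue  # same effect as deleting rows where VLOOKUP returned an error
        new_row = dict(row)
        new_row["Report_X"], new_row["Report_Y"] = reporter  # X = longitude, Y = latitude
        new_row["Partner_X"], new_row["Partner_Y"] = partner
        new_row["ID"] = len(prepared) + 1  # sequential IDs: 1, 2, 3, ...
        prepared.append(new_row)
    return prepared


# Hypothetical codes and placeholder coordinates.
reference = {101: (-98.0, 39.0), 102: (138.0, 36.0)}
trade_rows = [
    {"Reporter_Code": 101, "Partner_Code": 102, "Trade_Value": 500},
    {"Reporter_Code": 101, "Partner_Code": 999, "Trade_Value": 20},  # unknown origin
    {"Reporter_Code": 102, "Partner_Code": 101, "Trade_Value": 300},
]
commodity_table = prepare_commodity_table(trade_rows, reference)
```

Assigning IDs only after the unmatched rows are removed keeps the sequence free of gaps, which matches the order of the steps above.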
Next, I was finally ready to use the XY to Line tool from ArcGIS. The XY to Line tool is located under
Data Management Tools \ Features and asks for the following arguments:
• Input table: The relevant tab from the Excel “Commodity Tables” described above.
• Output Feature Class: Location to save shapefile generated by XY to Line tool.
• Start X Field: Partner_X field from “Commodity Tables.” (Since all data is as reported by the
importing country, the “Partner Country” corresponds to the “Start” of the trade flow.)
• Start Y Field: Partner_Y field from “Commodity Tables.”
• End X Field: Report_X from “Commodity Tables.”
• End Y Field: Report_Y from “Commodity Tables.”
• Line Type (optional): I preferred the look of “RHUMB” lines.
• ID (optional): ID field from “Commodity Tables.” This field is not “optional” for this project since
it is required for joining the trade value attributes back to the shapefile afterwards.
• Spatial Reference (optional): GCS_WGS_1984. The tool will use your current projection as
default, which can cause problems since the latitude and longitude data requires a Geographic
Coordinate System rather than a Projected Coordinate System.
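For anyone who prefers scripting to the tool dialog, the same arguments could be passed through arcpy, roughly as sketched below. I ran the tool from the dialog, so this script is an untested assumption on my part. The helper names xy_to_line_args and run_xy_to_line are hypothetical, and the second function only works on a machine with ArcGIS installed.

```python
def xy_to_line_args(in_table, out_feature_class):
    """Collect the XY to Line arguments in the order the tool expects them."""
    return [
        in_table,
        out_feature_class,
        "Partner_X",   # start X: the exporting (partner) country
        "Partner_Y",   # start Y
        "Report_X",    # end X: the importing (reporting) country
        "Report_Y",    # end Y
        "RHUMB_LINE",  # line type
        "ID",          # ID field, needed for the later join
    ]


def run_xy_to_line(in_table, out_feature_class):
    """Run the tool with a WGS 1984 geographic coordinate system (requires ArcGIS)."""
    import arcpy  # imported here so the rest of the sketch runs without ArcGIS

    wgs84 = arcpy.SpatialReference(4326)  # GCS_WGS_1984
    args = xy_to_line_args(in_table, out_feature_class) + [wgs84]
    return arcpy.XYToLine_management(*args)
```

Passing the spatial reference explicitly avoids the default-projection problem described in the last argument above.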
After hitting OK, the process will run for a few moments and then generate a new shapefile. Next, it is
necessary to re-join the attribute table of the new shapefile to the original trade data from the
“Commodity Tables.” To do this, open the attribute table of the new shapefile and Join with the
“Commodity Tables” data on the basis of the ID field. This will “re-unite” the trade value and all other
UN data with the shapefile lines, and allow one to use this data in setting the formatting of the flow
arrows (e.g. graduated symbols based on the USD value of the trade, as in this case).
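The logic of this join is simple enough to show in a few lines of Python. The sketch below works on plain dictionaries rather than on an actual shapefile attribute table. The function name join_on_id and the sample fields are hypothetical.

```python
def join_on_id(line_rows, commodity_rows, key="ID"):
    """Attach the trade attributes to each line record that shares the same ID."""
    by_id = {row[key]: row for row in commodity_rows}
    joined = []
    for line in line_rows:
        match = by_id.get(line[key], {})
        # Fields already on the line record win if both sides share a name.
        joined.append({**match, **line})
    return joined


lines = [{"ID": 1, "Shape": "line-1"}, {"ID": 2, "Shape": "line-2"}]
trade = [{"ID": 2, "Trade_Value": 300}, {"ID": 1, "Trade_Value": 500}]
joined_rows = join_on_id(lines, trade)
```

The order of the trade rows does not matter, since each line finds its attributes by ID. This is why the ID field cannot be left out when running XY to Line.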
Finally, the last stage is to format the resulting shapefile. I experimented with a number of different
options here and ultimately found that graduated symbols rather than graduated colors created a more
visually appealing map. For most commodities, I used line widths of range 0.1 to 4, and arrow symbols
(arrow heads) of 10 width and 6 height. More on this below in the Discussion section.
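As a rough illustration of how trade value translates into line width, the Python sketch below scales values linearly into the 0.1 to 4 range. This is a simplification: as I understand it, ArcGIS graduated symbols group values into classes rather than scaling continuously. The function name line_width is hypothetical.

```python
def line_width(value, min_value, max_value, min_width=0.1, max_width=4.0):
    """Scale a trade value linearly into the chosen range of line widths."""
    if max_value == min_value:
        return max_width  # a single flow: draw it at full width
    share = (value - min_value) / (max_value - min_value)
    return min_width + share * (max_width - min_width)


smallest = line_width(0, 0, 100)
middle = line_width(50, 0, 100)
largest = line_width(100, 0, 100)
```

Narrowing the width range (say 0.1 to 2) makes overlapping lines easier to tell apart but compresses the visible difference between small and large flows.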
DISCUSSION
The XY to Line tool was able to meet my requirements at a basic level, but it does have some limitations.
In general, creating flow maps using automated software processes is imperfect due to the fact that
flow maps often do not scale well, and result in “cluttered” images with multiple overlaying lines and
arrow heads. This is a problem acknowledged repeatedly in the GIS and information graphics literature,
and the XY to Line tool is no exception.
As can be seen from some of the maps on my poster, when the trade in certain commodities is
geographically concentrated, this can lead to multiple flows “piling up” on one another. Major hubs like
China and Brazil may see overlapping lines, and trade flows between various European countries can
become unreadable when viewed from a global level. Initially, my flow arrows positioned the arrow
head symbols at the end of each line, making this problem particularly bad at major importing hubs, for
example for many commodities going into China. However, this problem was partially solved by locating
the arrow heads at the middle of the line. Occasionally, these would also overlap, but to a much less
problematic extent. Serendipitously, locating the arrow symbols at line midpoints creates an intuitively
appealing new “bow tie” symbol for country pairs that export to one another. I did not intend to do
this, but I liked the result as two triangular arrow heads “sum” into a directional “bow tie” symbol.
North American Bow Ties (Refined Copper)
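The country pairs that produce a bow tie are simply those with a flow in each direction. A short Python sketch for finding them is below, assuming the flows are available as (exporter, importer) pairs. The function name reciprocal_pairs and the sample flows are made up for illustration.

```python
def reciprocal_pairs(flows):
    """Return the country pairs that trade in both directions, each pair listed once."""
    directed = set(flows)
    return {
        tuple(sorted((exporter, importer)))
        for exporter, importer in directed
        if exporter != importer and (importer, exporter) in directed
    }


# Illustrative flows only, written as (exporter, importer).
flows = [("USA", "Canada"), ("Canada", "USA"), ("Chile", "China")]
bow_ties = reciprocal_pairs(flows)
```

A list like this could be used to check in advance which lines on a map will show the doubled arrow head.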
For overlapping lines, it was possible to reduce the thickness of the lines in order to better distinguish
between individual lines. However, reducing the range of symbol sizes also reduces one’s ability to show
the range of value in each set of trade flows, so there is a tradeoff here as well.
In lieu of an automated function in ArcGIS that would resolve the cluttering problem by determining the
optimal positioning and spacing of flow arrows, it seems the next best solution would be to create maps
as I have done here and then manipulate them further in a program like Adobe Illustrator.
Here are a couple of examples of the cluttering problems I faced:
Clutter in Europe (Steel)
Overlapping Lines (Petroleum Imports, Middle East to East Asia)
CONCLUSION
Aside from the frustrations with “cluttering” documented above, overall I am very satisfied with how the
final maps came out. Clutter is largely a problem on a two dimensional, static surface like a poster, but
when viewed on a computer it is relatively easy to disambiguate the various flow arrows from one
another by zooming in or out – for example, centering the map on continental Europe reveals the web
of trade relationships going on there instead of the blobby mass that appears on the poster (compare
image below with the one above). I’ve found myself spending a lot of time just looking at these maps
and discovering facets of global trade I did not know about, and what I’ve learned here will actually be
very useful at my current job as a research tool. Of course, with over 5,000 goods in the HS code
system, there are also still many more data sets to look at.
Steel in Europe
Beyond the detailed instructions covered above for data manipulations, I’d like to re-iterate a few key
recommendations for anyone attempting a similar project in the future:
1. You will save yourself a lot of time by following my suggestions above on how to use the quirky
UN Comtrade website and avoid “request timeouts.”
2. Format flow arrows to have the arrow head symbol positioned in the middle of the line.
3. For any project with “noisiness” levels comparable to this one, my view is that using graduated
symbols rather than graduated colors can limit the noise levels. (Though I am also color-blind
and potentially biased on this point.)
4. Get started on the poster earlier than you think you need to! Formatting and layout takes much
longer than you might expect even if all your maps are virtually ready.
Citations
1. “Visualizing Migration Flows and their Development in Time: Flow Maps and Beyond” by
Boyandin et al. is the first paper I read that articulated why there are challenges in creating flow
maps using automated software processes. It touches on some of the software solutions that
have been attempted already and offers plans to develop a different approach to address the
shortcomings of past efforts. Importantly for me, it spelled out the reasons that flow maps can
result in imperfect visualizations, which allowed me to make sense of why certain problems
were occurring and think more clinically about how to address them.
“Visualizing Migration Flows and their Development in Time: Flow Maps and Beyond.” Ilya
Boyandin, Enrico Bertini, Denis Lalanne. Retrieved May 2013 from
http://diuf.unifr.ch/people/lalanned/Articles/Infovis10-DC.pdf
2. The paper and data documentation by Robert Feenstra et al. as part of a project with the
National Bureau of Economic Research contained valuable guidance about the UN trade data,
offering insight into its composition and quality. In particular, this literature provided the key
recommendation for using only importing country data in order to eliminate “duplicate” entries
by reporter-partner country pairs, and substantiates the assumption that importing country
data is typically superior in quality to exporter country data.
World Trade Flows: 1962-2000. Robert C. Feenstra, Robert E. Lipsey, Haiyan Deng, Alyson C.
Ma, and Hengyong Mo. Retrieved May 2013 from
http://cid.econ.ucdavis.edu/data/undata/NBER-UN_Data_Documentation_w11040.pdf
3. The Global Trade and Competitiveness Atlas is a website that allows users to select a specific
variable (namely export volumes, or one of several other trade/economic indicators), a
particular industry (corresponding to commodities in my project) and a display format (map
images or animation). In principle, I would like to create just this; however, the actual
visualization output of the website is quite poor – it uses polygons rather than flow arrows to
represent export volumes for example, and only conveys information about the exporting
country of each trade flow.
Moenius, Johannes, and Xin Zhao. “The Global Trade and Competitiveness Atlas.” Retrieved May
2013 from http://business.redlands.edu/global/maps/
4. A tool from “The Observatory of Economic Complexity” has similar goals to my project, and in
some ways does a better job than I did in visualizing the magnitude of multiple commodity trade
flows simultaneously. The visualization formats are tree maps and stacked area charts. The
only missing piece here is the geographical referencing – i.e. an ability to link the data to country
export-import pairs on a map using flow arrows.
Simoes, Alexander. “The Observatory of Economic Complexity.” Retrieved May 2013 from
http://atlas.media.mit.edu/about/
5. Charles Joseph Minard was a French engineer whose charts are often cited as prime examples of
flow maps and good cartographic principles in general. His representation of Napoleon’s march
into Russia is somewhat well known. More relevant for my purposes are some of his maps
depicting international connectivity, for example his maps of immigration or British Coal
Exports. Although he obviously took a very different approach to map-making and had very
different tools at his disposal, his work serves as an example of what a good flow arrow
visualization can and should ultimately look like.
Author Unknown. “Cartographia – Charles Joseph Minard.” Retrieved from
https://cartographia.wordpress.com/category/charles-joseph-minard/
6. The Global Dependency Explorer offers a very different approach
to visualizing global trade flows, as a radial graph rather than over
a geographical layer. Since this is a GIS class and there is no real
geographical data here, I would not attempt something like this,
but it is interesting to think about alternatives. One advantage of
such an approach is that it avoids some of the common shortfalls
with flow arrows (rendering problems, over-cluttering of arrows
on top of one another when too much data is on display, etc).
This approach still has its shortcomings, however – for one, the
data is not disaggregated by commodity type, which makes it less
interesting. Perhaps more importantly, it is less intuitive to
interpret without the geographical references.
Author Unknown. “Global Dependency Explorer.” Retrieved May 2013 from
http://cephea.de/gde/