# Global trade network analysis with Python: central players in bluefin tuna and large aircraft

Network analysis provides useful insights into complex bilateral trade data. This post presents two methods for calculating, with Python, each country’s influence in the global trade network for individual goods. Related concepts from graph theory and international trade theory are also discussed.

##### Modern goods have complex trade networks

The things we buy increasingly travel long distances and from scattered origins before they reach us. Take one man’s repeatedly failed attempt to build a toaster (from scratch). Over several decades companies have changed their production techniques, relying on global value chains, for example, to keep costs low. These changes have gradually contributed to long-term growth in global trade.

The more deeply a country becomes involved in global trade, and in global supply chains in particular, the more exposed its economy is to changes abroad. This can be good; historically, many powerful cities began as ports. However, the potential for higher returns from servicing foreign demand carries with it an increased risk of economic contagion.

It is not hard to imagine how global supply chain connections transmit effects from other countries. If a strike in France delays the delivery of a crucial intermediate good, it may cause an assembly line stoppage in Taiwan. On an aggregate level, the results do not necessarily average out, and can result in vast shifts of wealth.

It may therefore prove useful to examine the complex networks of global trade using the tools provided largely by graph theory. As an example, let’s start with a graph of the global trade of tires in 2012.

Each country that exported or imported automobile tires in 2012 is represented above by one node, labeled with its three-letter country code (for example, Germany is DEU). The precise location of a node on the graph is not critical (it is often arbitrary), but generally countries more central to the trade of tires are closer to the center of the network. Likewise, countries are generally graphed near their largest trading partners.

Each trading relationship is shown on the graph as an edge (a line connecting two nodes). If France exports tires to Aruba, the graph will include an edge connecting the two nodes labeled FRA and ABW. Trade network edges are considered directed, as the flow of goods has a direction (either imports or exports).
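For intuition, this directed, weighted structure can be sketched with plain Python dictionaries before bringing in any graph library; the country pairs and dollar values below are invented for illustration:

```python
# A tiny directed trade network as a dict of dicts:
# exports[exporter][importer] = value of the flow in USD.
# Country pairs and values here are invented for illustration.
exports = {
    'FRA': {'ABW': 1200000, 'DEU': 55000000},
    'DEU': {'FRA': 48000000, 'USA': 90000000},
    'USA': {'DEU': 70000000},
}

# Direction matters: the edge FRA -> ABW exists...
print('ABW' in exports['FRA'])
# ...but there is no reverse edge ABW -> FRA.
print('FRA' in exports.get('ABW', {}))
```

Storing the graph as `exports[exporter][importer]` makes each edge’s direction explicit: reversing a flow means looking it up under the other country’s key.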

##### Rat’s nest or Rorschach?

You may look at the above ‘visualization’ and simply see a rat’s nest. This is a correct interpretation. The graph shows the overall complexity of the trade network, not individual bilateral relationships (there are more than 4,400 edges in this network). Indeed, the automobile tire trade network is particularly large and dense: many countries produce internationally competitive tires, and every country uses them and imports at least some. In fact, the average country imports tires from many other countries. A graph of the resultant trade network is reminiscent of a gray blob and practically as useful.
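One way to quantify that denseness is the directed network’s density: the share of possible ordered country pairs that actually trade. A back-of-the-envelope sketch, where the node count (roughly 200 reporting countries) is an assumption and only the edge count comes from above:

```python
# Density of a directed network: e / (v * (v - 1)).
# v is an assumed node count (~200 reporting countries);
# e is the edge count mentioned above (more than 4400 edges).
v, e = 200, 4400
density = e / float(v * (v - 1))
print(round(density, 3))
```

On these assumptions, roughly one in nine of all possible bilateral tire-trade relationships exists, which helps explain the gray-blob look.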

More useful, however, are individual metrics of network structure. For example, which countries tend to trade only with a select subgroup of other countries? Which goods are traded in networks where one country dominates trade? These questions relate theoretically to the respective graph theory concepts of clustering and centrality.

Let’s take a look at how the Python programming language can be used to measure centrality in trade networks, and discuss two specific measures of centrality.

#### Python for trade network analysis

What follows is a more technical segment with sample code for trade network analysis using Python 2.7.

###### Let’s start by importing the packages

In [1]:

```
import networkx as nx
import csv
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.axes_grid1 import make_axes_locatable
%matplotlib inline
```

We will rely heavily on NetworkX and give it the short name nx. NumPy is used for certain calculations, and matplotlib helps with the visualizations.

##### Load the data and build the network

The example uses cleaned bilateral UN Comtrade trade data for scrap aluminum exports in 2012. The data follow the HS2002 classification system at the six-digit level of aggregation, and are sourced from WITS (subscription required for bulk download). Data are read from a csv file with the equivalent of three ‘columns’: the exporting country code, the importing country code, and the inflation-adjusted US Dollar value of exports in the one year period.

Data from the csv file are read line by line to build the network quickly. NetworkX is used to build the network, called G by convention, as a series of edges.

###### Read the data and build a network called G

In [2]:

```
G = nx.DiGraph() # create a directed graph called G

# Loop reads a csv file with scrap aluminum bilateral trade data
with open('760200_2012.csv', 'r') as csvfile:
    csv_f = csv.reader(csvfile)
    csv_f.next() # skip the header row

    # Now we build the network by adding each row of data
    # as an edge between two nodes (columns 1 and 2),
    # weighted by the export value (column 3).
    for row in csv_f:
        G.add_edge(row[0], row[1], weight=float(row[2]))
```

Let’s look at a specific bilateral trade relationship to verify that the new network, G, is correct. Exports of scrap aluminum from the U.S. to China should be quite large in 2012.

###### Check individual trade flow (edge)

In [3]:

```
usachnexp = G.edge['USA']['CHN']['weight']
print 'USA 2012 scrap aluminum exports to China, in USD: ' + str(usachnexp)
```

```
USA 2012 scrap aluminum exports to China, in USD: 1199682944
```
##### Central players can affect the market

Now that the network has been built, we can use indicators from graph theory to identify potential weaknesses and risks in the network’s structure. In this example, we will look for the presence of dominant countries in the trade network. Dominant importers or exporters have the ability to influence supply and demand and therefore price. These dominant countries are highly influential players in the trade network, a characteristic measured in graph theory as centrality.

There are several measures of centrality, and two are discussed briefly in this post. The first is eigenvector centrality, which iteratively computes the weighted and directed centrality of a node based on the centrality scores of its connections. The example below scores each importer country as a function of the import-value-weighted scores of its trading partners. That is, an importer is considered influential to a trade network (receives a high eigenvector centrality score) if it imports a lot from countries that are also influential. Mathematically, eigenvector centrality computes the left or right (left is import centrality, right is export centrality) principal eigenvector of the network’s adjacency matrix.
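To make the computation concrete, here is a pure-Python power-iteration sketch on a tiny invented network (the country codes and values are made up). Repeatedly scoring each importer by the import-value-weighted scores of its exporters, then renormalizing, converges to the same left principal eigenvector that NetworkX computes with numpy’s eigensolver:

```python
# Power-iteration sketch of (left) eigenvector centrality.
# Edge list of (exporter, importer, value); numbers are invented.
edges = [('USA', 'CHN', 5.0), ('DEU', 'CHN', 3.0),
         ('CHN', 'USA', 1.0), ('USA', 'DEU', 2.0)]
nodes = sorted({n for e in edges for n in e[:2]})

x = {n: 1.0 for n in nodes}  # initial scores
for _ in range(100):
    # An importer's new score is the import-value-weighted
    # sum of its exporters' current scores (its in-edges).
    new = {n: 0.0 for n in nodes}
    for exp, imp, w in edges:
        new[imp] += w * x[exp]
    norm = sum(v * v for v in new.values()) ** 0.5
    x = {n: v / norm for n, v in new.items()}

# CHN receives the bulk of the weighted imports here, so it
# comes out as the most central importer.
print(max(x, key=x.get))
```

The fixed point of this iteration is the principal eigenvector; the normalization step only fixes its scale.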

See the NetworkX documentation for questions on the code or this for more details on the math.

###### Calculate eigenvector centrality of imports

In [4]:

```
# Calculate eigenvector centrality of matrix G
# with the exports value as weights
ec = nx.eigenvector_centrality_numpy(G, weight='weight')

# Set this as a node attribute for each node
nx.set_node_attributes(G, 'cent', ec)

# Use this measure to determine the node color in viz
node_color = [float(G.node[v]['cent']) for v in G]
```
##### Calculate total exports

Next we calculate each country’s total exports of scrap aluminum in 2012 as the sum total of its individual exports (edges) to other nodes. In the script, total export data is assigned as a node attribute and set aside to be used as the node size in the visualization.

###### Calculate each country’s total exports

In [5]:

```
# Blank dictionary to store total exports
totexp = {}

# Calculate total exports of each country in the network
for exp in G.nodes():
    tx = sum([float(g) for f, t, g in G.out_edges_iter(exp, 'weight')])
    totexp[exp] = tx

# Average exports across all countries, used to normalize node sizes
avgexp = np.mean(totexp.values())
nx.set_node_attributes(G, 'totexp', totexp)

# Use the results later for the node's size in the graph
node_size = [float(G.node[v]['totexp']) / avgexp for v in G]
```
##### Visualization of the scrap aluminum network

NetworkX works well with matplotlib to produce the spring layout visualization. It is another rat’s nest, but you may notice a different color on one of the medium-sized nodes.

###### Create graph using NetworkX and matplotlib

In [6]:

```
# Visualization
# Calculate position of each node in G using networkx spring layout
pos = nx.spring_layout(G, k=30, iterations=8)

# Draw nodes
nodes = nx.draw_networkx_nodes(G, pos, node_size=node_size,
                               node_color=node_color, alpha=0.5)
# Draw edges
edges = nx.draw_networkx_edges(G, pos, edge_color='lightgray',
                               arrows=False, width=0.05)

nx.draw_networkx_labels(G, pos, font_size=5)
nodes.set_edgecolor('gray')

plt.text(0, -0.1,
         'Node color is eigenvector centrality; '
         'Node size is value of global exports',
         fontsize=7)
plt.title('Scrap Aluminum trade network, 2012', fontsize=12)

# Bar with color scale for eigenvalues
cbar = plt.colorbar(mappable=nodes, cax=None, ax=None, fraction=0.015, pad=0.04)
cbar.set_clim(0, 1)

# Plot options
plt.margins(0, 0)
plt.axis('off')

# Save as high quality png
plt.savefig('760200.png', dpi=1000)
```
##### Central players on the demand side: scrap aluminum and bluefin tuna

The graph above shows plenty of large exporters (the large nodes) of scrap aluminum in 2012, including the US, Hong Kong (HKG), and Germany (DEU). The demand of one country in the network, however, actually dominates the market. In 2012, the booming Chinese economy was purchasing large quantities of industrial metals, including scrap metals. The surge in demand from China was enough to cause global price increases and lead to increased levels of recycling. Since 2012, however, Chinese imports of scrap aluminum have nearly halved, as has the market price of aluminum. The recent boom-and-bust cycle in scrap aluminum prices has a single country of origin but global ripples; the downturn generates domestic consequences for the large exporters and reduces the financial incentives for recycling.

The central influence of China in the 2012 scrap aluminum trade network is captured by its high eigenvector centrality score (node color in the graph above). We could also infer the outsized influence of China from a simpler measure: the high value of its imports relative to other countries. Centrality metrics, of which there are many, often prove useful in more nuanced cases.

Another example of a central influence on a trade network can be found in Japanese demand for bluefin tuna. As shown below, Japan has very high eigenvector centrality for imports of this key ingredient in many sushi dishes.

Australia (AUS) dominates bluefin tuna exports, but by eigenvector import centrality Japan (JPN) is the influential player in the market. The first Tokyo tuna auction of 2013 saw one fish fetch a record 1.76 million USD. Indeed in 2012 Japan imported more than 100 times as much bluefin tuna as the second largest importer, Korea.

Like scrap aluminum, the story here follows the familiar boom-and-bust cycle; prices for bluefin tuna have returned to lower levels since 2012. The structure of the trade network, with one central player, introduces a higher level of price volatility. During a downturn in prices, this transmits financial consequences to fishermen throughout the world.

##### Supply-side influential players: large aircraft production

Trade network analysis can also help to identify influential exporters of goods. Cases that come to mind are rare earth minerals found only in certain countries, or large and complex transportation equipment. Commercial aircraft manufacturers, for example, are few in number (unfortunately this may have more to do with subsidies than with any limited supply of technological prowess). Very large aircraft production is dominated by two firms: Airbus, with production sites primarily in France and Germany, and its U.S. competitor, Boeing.

Instead of using eigenvector centrality to measure the influence of each exporting country in the large aircraft global trade network, let’s use a simpler method called outdegree centrality. We compute outdegree centrality for each country, $i$, as its number of outgoing (exporting) connections, $k^{out}_i$, divided by the total number of possible importers, $(v - 1)$:

$c^{out}_D (i) = k^{out}_i / (v - 1)$.

You can think of this measure as the share of importers that are serviced by each exporter. Nodes with a high outdegree centrality are considered influential exporters in the network.
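The formula is simple enough to verify by hand on a toy edge list (the flows below are invented for illustration):

```python
# Out-degree centrality: the share of the other (v - 1)
# countries that each exporter ships to. Toy, invented data.
edges = [('FRA', 'USA'), ('FRA', 'JPN'), ('FRA', 'BRA'),
         ('DEU', 'USA'), ('USA', 'JPN')]
nodes = sorted({n for e in edges for n in e})
v = len(nodes)  # 5 countries in this toy network

# Count outgoing edges per exporter
out_deg = {n: 0 for n in nodes}
for exp, imp in edges:
    out_deg[exp] += 1

# Normalize by the number of possible importers, (v - 1)
oc = {n: out_deg[n] / float(v - 1) for n in nodes}
print(oc['FRA'])  # FRA reaches 3 of the 4 possible importers -> 0.75
```

Note that unlike eigenvector centrality, this measure ignores edge weights: it counts trading relationships, not their value.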

###### Calculate outdegree centrality

In [7]:

```
oc = nx.out_degree_centrality(G) # replaces ec in the above
```

As expected, France, Germany, and the U.S. receive high outdegree centrality scores. There simply aren’t many alternative countries from which to buy your large aircraft. Beyond lack of choice for buyers, central exporters in a trade network may introduce (or represent) vulnerability and barriers to competition.

##### Network structure and (preventing) domestic consequences

Global trade is increasingly complex. Open economies are vulnerable to supply and demand shocks from the other countries in their trade network. The structure of the trade network itself determines in part the level of vulnerability and how and where supply and demand shocks may be transmitted. Certain trade networks, such as those for scrap aluminum or bluefin tuna, face dominant consumers and additional price volatility. Networks can also be subject to supply-side market structure issues, such as the virtual duopoly with very large aircraft.

Hindsight makes bubbles more visible; we easily find the previously missed warning signs once we know where to look. Decision makers aim for early detection of vulnerabilities, but face a geographically growing set of possible sources. Network analysis tools, such as centrality measures, can be applied to existing sets of complex bilateral trade data to provide new insight in the search for today’s warning signs. Such nontraditional tools may prove increasingly useful in a world where an individual cannot build a toaster from scratch, yet toasters sell down the street for \$11.99.

###### For fun:

The Toaster Project

###### Some references and further reading on networks and graph theory:

Easley and Kleinberg (2010) Networks, Crowds, and Markets: Reasoning about a Highly Connected World

Jackson (2010) Social and Economic Networks

De Benedictis (2013) Network Analysis of World Trade using the BACI-CEPII dataset

Nemeth (1985) International Trade and World-System Structure: A Multiple Network Analysis

###### Some python related resources:

Anaconda distribution for python

NetworkX

ComputSimu: Networks 1: Scraping + Data visualization + Graph stats (more useful code here)

## 37 thoughts on “Global trade network analysis with Python: central players in bluefin tuna and large aircraft”

1. Jaeho Jung says:

Hi, I wonder whether you could share the csv data you used?

Thank you.

sincerely,
Jaeho


2. Mingzhu says:

I wonder how did you obtain these outdated info, very helpful though. Good Python analysis.


3. really good article. also would like to know if you could share the csv so that we could try to replicate the results.

for row in csv_f:

is the above ‘row’ supposed to be column? i understand that the weight is the value of import value.

many thanks


1. penyelidik says:

thank you very much! i will try and post any relevant feedback. i think now the networkx app has also been updated so a few minor update on the code could be necessary. thanks again


4. Prerna Pandey says:

Hi! I just wanted to ask one thing- regarding # Use this measure to determine the node color in viz
node_color = [float(G.node[v][‘cent’]) for v in G]
So when I am implementing it, it throws an error that ‘Graph’ object has no attribute ‘node’. I then make it nodes following a stack overflow suggestion. However, next it throws an error {KeyError: ‘cent’}. My data is similar to yours in the sense that it is a trade data only but inter state instead of countries.


1. The networkx package has changed. In this case, the graph object G no longer has a method called “node”. Also the options for setting a node attribute have changed.

First, to set the eigenvector centrality as a node attribute, you’d use:

nx.set_node_attributes(G, ec, name='cent')

Then to create a list of the centrality values to convert to node colors, you’d use:

node_color = [float(G.nodes[v]['cent']) for v in G.nodes]


1. Prerna Pandey says:

Thanks a lot! It worked. However, similar error was shown when I used –
tx=sum([float(g) for exp,f,g in G.out_edges_iter(exp, ‘weight’)])
So, I corrected it by replacing G.out_edges_iter with G.out_edges and the code ran. Is this also because of the version difference? I just want to confirm whether it is correct. Thanks!


2. Prerna Pandey says:

Also, could you please explain this code- as in why take the mean
# Blank dictionary to store total exports
totexp = {}

# Calculate total exports of each country in the network
for exp in G.nodes():
tx=sum([float(g) for exp,f,g in G.out_edges_iter(exp, ‘weight’)])
totexp[exp] = tx
avgexp = np.mean(tx)
nx.set_node_attributes(G, ‘totexp’, totexp)

# Use the results later for the node’s size in the graph
node_size = [float(G.node[v][‘totexp’]) / avgexp for v in G]

I am getting a- “RuntimeWarning: divide by zero encountered in double_scalars” and the output is just an empty box.
Further, the final plot command isn’t showing any graph.


2. Yes, you are right to use G.out_edges instead of G.out_edges_iter. There are several changes to the networkx package since the original blog post.


1. Prerna Pandey says:

Thanks. Just one last doubt (hopefully) regarding total exports code. Kindly clarify that. I’ve posted it above. Thanks a lot!


3. This old website won’t let me reply to your nested comment directly, but re: the question about average exports, it looks like an error. The idea was to normalize the size of each node by dividing each country’s total exports of a product by the average exports of the product across all countries.

But that’s not what the code is doing, so it is likely an error. The correct code would normalize by the average exports across all exporters. As written, the code just returns the average across all trading partners for the last exporter in the loop.

With the runtime error, it’s a python error that likely means the value for ‘avgexp’ is zero. If you use the average across all exporters, it should work.


1. Prerna Pandey says:

What if I don’t normalize at the first place? Like, I just took the total exports . However, the output of the graph I am getting is wrong- it’s a like a violet square with no nodes. I followed your codes except for the following-
I did this –
1.) G = nx.from_pandas_edgelist(df,source=’state’,target=’state_d’, edge_attr= ‘railway_cargo’, create_using= nx.DiGraph()) instead of what you did i.e.
G = nx.DiGraph()
and
2.) for index, row in df.iterrows():
# Loop reads a csv file with scrap aluminum bilateral trade data
with open(‘760200_2012.csv’, ‘r’) as csvfile:
csv_f.next()
# Now we build the network by adding each row of data
# as an edge between two nodes (columns 1 and 2).
for row in csv_f:

Do you think this made any difference? Since the codes ran smoothly except for the version difference ones, I am not sure where is the mistake. In my data, state is the exporting state, state_d is the exporting state and railway_cargo is the trade volume- exports from state to state_d.


2. Yes, it sounds like normalizing the values will resolve it. Try to divide railway cargo by some constant so that the values are around 1-100. My guess is that the values are too big so you are just seeing a zoomed in picture of one node.


5. Prerna Pandey says:

Hey! that normalization thing worked, I just divided the total exports of each state by the average as in: / followed by the number itself instead of np.mean(tx). Thank you so much Brian!!
I just have a small doubt- so I get the concept of eigenvector centrality like it measures one’s connection to those who are highly connected. Can you please explain the intuition of taking eigenvector centrality of imports. I am confused because as per the code, ec = nx.eigenvector_centrality_numpy(G, weight=’weight’), the weight is the third column of the csv i.e. the exports, right? This is accounting for the node colour , so how come we are taking imports to calculate the eigenvector centrality?


1. Glad it worked. The intuition is G is a directed graph and eigenvector centrality looks at inward directed connections. The term exports is confusing perhaps.


1. Prerna Pandey says:

I had a look at your csv. I meant the variable is capturing exports from country in column 1 to country in column 2 – “Data are read from a csv file —————- inflation-adjusted US Dollar value of export” but since you are saying that directed graph accounts for the inward connections, I am guessing it is wrt to the importing country only i.e. the second column as per your csv file.


2. From the networkx documentation: “For directed graphs this is “left” eigenvector centrality which corresponds to the in-edges in the graph.”

The data are labeled exports because it’s the convention to use exports from the exporter country when looking at trade between two countries. The exports to country B reported by country A often don’t match the imports from country A reported by country B, even though they should match. So the convention in this strand of research (at least at the time) was to use the exports reported by the exporter country. But I see why it is confusing in this context.


6. Prerna Pandey says:

Also, is there any way to scale up the node size for all the nodes so that the difference due to the exports is retained but the overall size has increased ?? The argument for node size has already been specified as size of exports. My graph is a bit cluttered and if I am setting the figure size by #from pylab import rcParams
#rcParams[‘figure.figsize’] = 19, 16 then the dimensions increased but not the node size . In short, the final figure as per the above codes is quite small. Thanks.


1. You could take the natural log of the values or something like that, to normalize them.


1. Prerna Pandey says:

Will I lose out on something if I take natural log instead of dividing by the mean?


2. It would mean the node sizes are interpreted as natural logs. But anything that changes the ratio of node sizes would change the interpretation. Natural logs can be useful for normalizing values where the unit of measurement is a currency or the distribution is assumed to be lognormal. It’s a reasonably common approach to normalizing the values.


7. Prerna Pandey says:

Hey! I just did a small adjustment. So the node size is based on the total exports and I did normalize it. So I just ran this code-
node_size = [float((G.nodes[v][‘totexp’]/646389.96891)*6)for v in G.nodes] where 646389.96891 is the average exports and I did *3, *4……….*6, so as to scale up the sizes of all the nodes. Is it okay? As far as I can judge, there isn’t any stark difference wrt to the previous figure I had. Like the patterns are the same. I just want to confirm if there is no glaring error if I multiply the entire thing by a scalar.


1. Yea that works and is better than using natural log. There’s no change in interpretation.


1. Prerna Pandey says:

Great!! Thanks a lot for your help. As far as I searched, there has been very little work in trade wrt visualizations in networkx. I couldn’t find good documentations. Yours was the only one that actually helped. It’s a part of a research paper I am writing with 2 others. So, is there any specific way in which I should cite this blog page? Thanks.


2. Glad it helped. Good luck with the research paper. The blog post isn’t peer reviewed or really academic, so I don’t think you need to cite it. It’s more like IT support.


8. Prerna Pandey says:

Okay! Cool. Thanks 🙂


9. Efthymiosxyl says:

Hello Brian, do you know how i can fix the AttributeError: ‘Graph’ object has no attribute ‘out_edges’
in tx=sum([float(g) for exp,f,g in G.out_edges(exp, ‘weight’)]) ?


10. Efthymios Xyl says:

Hello Brian, i am doing a research for global trade and the only documentation for such networks in python is yours. Great work!!
Could you please help me find out why when i try to run a specific part of the code has the error “AttributeError: ‘Graph’ object has no attribute ‘out_edges’ ” and what i can do to fix it?

The error is presented in the following code

# Blank dictionary to store total exports
totexp = {}
# Calculate total exports of each country in the network

for exp in G.nodes():
tx=sum([float(g) for exp,f,g in G.out_edges(exp, ‘weight’)])
totexp[exp] = tx
avgexp = np.mean(tx)
nx.set_node_attributes(G, ‘totexp’, totexp)


1. Thanks for the kind feedback. I’m traveling today but can take a look tonight or tomorrow. The comments on the post from Prerna might help in the meantime. Also to have directed (out) edges the graph G would need to be a directed graph (DiGraph) so perhaps that is it. Last thing to check in the meantime is the networkx documentation in case something changed. But I’ll get back to you when I get a chance to look more closely.


2. Ok, back at my computer… the blog post is outdated as networkx has changed since it was written, and there are some small errors, but I’m not sure exactly what is causing the error message that you’re seeing. I ran the code here (https://nbviewer.jupyter.org/github/bdecon/econ_data/blob/master/micro/Trade_Network_example.ipynb) using this data (https://github.com/bdecon/econ_data/blob/master/micro/440710.csv) using networkx 2.5.1 and it worked for me. Could you take a look and let me know if you still get an error?


1. Efthymios Xyl says:

Thats works perfectly fine, your work is really valuable and thanks for the feedback!! Could you please also explain how can someone find the optimal position regarding the >>pos = nx.spring_layout(G,k=30,iterations=8), in order to achieve a fixed network without significant fluctuations. Thanks again!!


11. Efthymios Xyl says:

Hello again Brian!! After spending some time on reviewing the python script regarding many different databases i think that i came to a controversial realisation. More specifically in the section where you create the node size, it seems that for some datasets with not a large amount of observations when you divide with avgexp, the visualization has nodes that do not utilize the concept of centrality, resulting to nodes with the same colour. Nevertheless, when i used the natural log normalization, the result was way more precise and the centrality concept was way more clear in terms of the color variations. So, at this point is it a necessary to inspect the dataset and try to find the best normalization fit or do you thing that there is something that i am missing in the analysis?

# Use the results later for the node's size in the graph
totexp = [float(G.nodes[v]['totexp']) for v in G.nodes]
avgexp = sum(totexp) / len(totexp)
node_size = [20 * float(G.nodes[v]['totexp']) / avgexp for v in G.nodes]