Updated November 16, 2016
Introduction
The International Monetary Fund (IMF) Statistics Department (STA) allows API access to their economic time series. Well-known datasets such as International Financial Statistics (IFS) can be machine read through the API. This example will use Python to retrieve Direction of Trade Statistics (DOTS) data from STA’s JSON RESTful Web Service so that we can determine the United States’ share of world exports over the past 50 years.
The IMF knowledge base provides more information on the three avaiable API formats and IMF data services. For more information on the work of STA, see their PDF annual report (PDF), STA at a glance 2015.
Gathering series and dimension information
First, we will need to import the requests and pandas libraries. These will allow us to read json data, open urls, and request information from the web.
Libraries
# Import libraries
import requests
import pandas as pd
Since we are using the JSON RESTful API, we start by using the ‘Dataflow’ endpoint URL to look at what series are available and find the series id of interest. The full output is long, so I’ve removed the data unrelated to this example. The IMF has many more series than what is shown below.
Find Series Name
# Find the series id and text name.
url = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/Dataflow/'
seriesids = requests.get(url).json()
df = pd.DataFrame(seriesids['Structure']['Dataflows']['Dataflow'])
for x in range(6, 13):
items = (str(df['@id'][x]), str(df['Name'][x]['#text']))
print ': '.join(items)
We found above that the id for Direction of Trade Statistics is DOT. We can use this id to read notes about the series. We will next need to identify the dimensions of the data. For example, direction of trade data is based on a home country a flow and measure and a counterpart country. The data also has multiple frequencies and units of measurement. All of this information will be needed to later make our data request.
Find Series Details and Description
# Annotations for the series
url = "http://dataservices.imf.org/REST/SDMX_JSON.svc/DataStructure/DOT"
dotstruct = requests.get(url).json()
df = pd.DataFrame(dotstruct['Structure']['KeyFamilies']\
['KeyFamily']['Annotations'])
for x in range(0, 7):
items = (str(df['Annotation'][x]['AnnotationTitle']), \
str(df['Annotation'][x]['AnnotationText']['#text']))
print ': '.join(items)
Find Series Dimensions
# Look at structure of DOTS data to find the dimensions for our data request
url = "http://dataservices.imf.org/REST/SDMX_JSON.svc/DataStructure/DOT"
dotstruct = requests.get(url).json()
df = pd.DataFrame(dotstruct['Structure']['KeyFamilies']['KeyFamily']\
['Components']['Dimension'])
for x in range(0, 4):
items = ("Dimension", str(x+1), str(df['@codelist'][x]))
print ': '.join(items)
We can now copy the code for each dimension into the CodeList Method to get the list of possible values. For example, we will need to identify the value of the second dimension, CL_AREA_DOT, for the United States. Below, we show that the code is US. I’ve manually placed the index number for the U.S. and World codes (again to save space), however, you can replace [200, 248] with [0, 248] to get the full list of country/area codes.
Find Country Codes
# Obtain country codes
url = "http://dataservices.imf.org/REST/SDMX_JSON.svc/CodeList/CL_AREA_DOT"
country = requests.get(url).json()
df = pd.DataFrame(country['Structure']['CodeLists']['CodeList']['Code'])
for x in [200, 248]:
items = (str(df['@value'][x]), str(df['Description'][x]['#text']))
print ': '.join(items)
The series ID is DOT and the country codes (we will use this with the exporting country, CL_AREA_DOT, and the counterpart, CL_COUNTERPART_AREA_DOT) of interest are W00 for world and US for the US. We see below that the indicator of interest is TXG_FOB_USD, Goods, Value of Exports, Free on board (FOB), US Dollars.
Find Column IDs
# Obtain series info and ids
url = "http://dataservices.imf.org/REST/SDMX_JSON.svc/CodeList/CL_INDICATOR_DOT"
series = requests.get(url).json()
df = pd.DataFrame(series['Structure']['CodeLists']['CodeList']['Code'])
for x in range(0, 4):
items = (str(df['@value'][x]), str(df['Description'][x]['#text']))
print ': '.join(items)
We repeat the above steps for each dimension and record which series values are of interest to us.
Retrieving Data
The guide to STA’s API shows how we can combine information from the previous steps to call and retrieve data. For direction of trade statistics, we see that the dimensions are as follows:
- Dimension 1: CL_FREQ (the frequency of the data–we want to use monthly data) – M
- Dimension 2: CL_AREA_DOT (the primary country) – US
- Dimension 3: CL_INDICATOR_DOT (the measure–we want to look at exports free of board) – TXG_FOB_USD
- Dimension 4: CL_COUNTERPART_AREA_DOT (the counterpart country) – W00
The JSON RESTful API method for requesting the data is the CompactData Method. The format for putting together dimension and time period information is shown on the Web Service knowledge base as:
http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/{database ID}/ {item1 from dimension1}+{item2 from dimension1}{item N from dimension1}.{item1 from dimension2} +{item2 from dimension2}+{item M from dimension2}? startPeriod={start date}&endPeriod={end date}
Putting all of this information together, the URL to retrieve a JSON dictionary for 1966-2016 US exports to the world data is:
The python code which gets the data and saves it as a dictionary is as follows:
Request data from IMF API
url = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/DOT/M.US.TXG_FOB_USD.W00.?startPeriod=1981&endPeriod=2016'
data = requests.get(url).json()
usexp = pd.DataFrame(data['CompactData']['DataSet']['Series']['Obs'])
usexp.columns = ['status', 'usexports','date'];
usexp.tail()
We can repeat the above code with a different URL to obtain data on total world exports and the exports of other countries which we may want to compare with the United States. We combine the request for several series into one URL, by adding ‘+code2+code3’. For example, ‘US+JP+CN.TXG..’
Example of request with multiple columns
ourl = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/DOT/M.US+CN+JP+W00.TXG_FOB_USD.W00.?startPeriod=1972&endPeriod=2016'
odata = requests.get(ourl).json();
Cleaning the dataframe and naming rows
wexp = pd.DataFrame(odata['CompactData']['DataSet']['Series'][0]['Obs'])
wexp.columns = ['status','wexports','date']
del wexp['date']
del wexp['status']
chexp = pd.DataFrame(odata['CompactData']['DataSet']['Series'][1]['Obs'])
chexp.columns = ['status', 'chexports','date']
del chexp['date']
del chexp['status']
jpexp = pd.DataFrame(odata['CompactData']['DataSet']['Series'][2]['Obs'])
jpexp.columns = ['jpexports','date']
del jpexp['date']
usexp = pd.DataFrame(odata['CompactData']['DataSet']['Series'][3]['Obs'])
usexp.columns = ['status', 'usexports','date']
del usexp['status'];
Now we combine the two series into one dataframe and tell our script to read the export value columns as numbers.
Read as numeric
combined = pd.concat([usexp, wexp, chexp, jpexp], axis=1)
pd.to_datetime(combined.date)
combined = combined.set_index(pd.DatetimeIndex(combined['date']))
usexports = pd.to_numeric(combined.usexports)
wexports = pd.to_numeric(combined.wexports)
cexports = pd.to_numeric(combined.chexports)
jexports = pd.to_numeric(combined.jpexports)
Finally, we can calculate the U.S. percentage share of world exports. We simply divide the us exports by the world exports and multiply by 100. If using the data for economic research, we would likely take the log forms and apply some filters.
Calculate share of world exports for each country
combined['usshare'] = usexports / wexports * 100
combined['chinashare'] = cexports / wexports * 100
combined['japanshare'] = jexports / wexports * 100
combined.tail()
Out[11]:
date | usshare | chinashare | japanshare |
---|---|---|---|
2015-08 | 9.460 | 15.121 | 3.668 |
2015-09 | 8.830 | 14.455 | 3.754 |
2015-10 | 9.330 | 13.737 | 3.892 |
2015-11 | 9.018 | 14.663 | 3.645 |
2015-12 | 8.776 | 16.362 | 3.804 |
Graphing the data
Let’s use matplotlib to view the result of our work.
Graph of US share of world exports
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
txt = '''Source: International Monetary Fund.'''
# Plot US share of world exports
combined.usshare.plot(grid=True, figsize=(9, 5), color="blue", linewidth=2,)
plt.ylabel('percentage of world exports')
plt.xlabel('Year')
plt.text(20,4.5,txt)
plt.title('U.S. Share of World Exports');
The graph shows a decrease in the U.S. share of exports from nearly 20 percent in 1966 to roughly 9 percent in 2015. We can also easily examine how changes in the U.S. share of exports compare with changes in the share of Japan and China.
Graph of moving average of US, China, and Japan shares of world exports
# Calculate moving average for each share, to reduce noise
combined['ussharema'] = combined['usshare'].rolling(12,12).mean()
combined['chsharema'] = combined['chinashare'].rolling(12,12).mean()
combined['jpsharema'] = combined['japanshare'].rolling(12,12).mean()
combshares = combined[['ussharema', 'chsharema', 'jpsharema']]
shares = list(combshares);
# Plot various shares of world exports
combined[shares][120:].plot(grid=True, figsize=(9, 5), linewidth=2)
plt.ylabel('percentage of world exports')
plt.xlabel('Year')
plt.text(150,-2,txt)
plt.title('Share of World Exports', );
Export dataset to .csv
Let’s save the dataset in a portable format that can be read by any statistical software. My preference is to create a .csv file, which I will use for my U.S. Macroeconomic and Markets Dashboard.
Create csv file
combined.to_csv('us_share_exports.csv')
It appears ‘KeyFamilies’ and ‘KeyFamily’ are no longer keys in the ‘seriesids’ dict object!
LikeLiked by 1 person
Thank you for letting me know. The replacement is ‘Dataflows’ and ‘Dataflow’ and it is updated above.
LikeLike