Given the time constraint, I was not able to get the precise Tariff Data. It was extremely exhausting to download the country-paired tariff data in a time-series manner, and it was even harder to merge in a short period of time. I then took a detour to examine and predictions based on some political indicators that may bring some new interests. The content can be more fruitful if I make the gravity model computable, such as CES, or other functions with homothetic features that can predict the trade flow by each commodity. I'll continue to enhance the flavor/richness of this project in the very near future.
This project includes several visualizations on international trade flow, U.S. and China partner trade share in aggregated and disaggregated level by commodity code and state. Three detailed the tariff war timelines are also presented. Some simple regression and machine learning method are applied to answer the question how will he bilateral trade flow change if the country was assigned some different political scores. The dataset is extracted mainly from CEPII database and the USITC dynamic gravity model database. The merged data frame contains 194884 pairs of trade flows from 2005 to 2013. A precise U.S.-China bilateral trade flow and tariff data from 1992 to 2013 was also examined.
For the simple tariff analysis, the prediction is based on the current aggregate tariff level 21% for both countries, for which scenario, China's export decreased more than the U.S.'s, a valid reason from the given data is that China has never been levied high tariff by the U.S. for the past decade. Though, Similarly, for policy variables, China's export is also hurt more compared to the U.S. if higher hostility scores and trade sanction being assigned. By the end, I also present the USITC GME package designed for international trade analysis which I discovered by the end of finishing this project.
The detailed summary table is presented in the Conclusion Section.
3.1 5-digit End Use Code Goods Trade Value during the Trade War
3.2 Timeline and Lag
3.3 Visuliaztion on Tariff Trend and Adjustment Lag
4.1 Exports Value by States in 2018
4.2 Goods Exports (non-service) Relative Change in 2019 by State
6.1 Data Cleaning
6.2 Small Dataset Regression
7.1 Cleaning Data
7.2 Gravity Model Linear regression
7.3 Variance Inflation Factor (VIF)
7.4 Ridge Model
7.5 Lasso Regression
Lasso Path
7.7 Gravity Linear V.S. Regularization (Ridge V.S. Lasso)
7.8 Test and Train
7.9 Neural Networks
Visualization
Data vs Test_n_Train vs Neural Networks vs Gravity Multilinear vs Lasso vs Simple Linear
7.10 Political Score
7.11 Hostility Level
7.12 Classfication
Install some pip.
# Uncomment following line to install on colab.
#! pip install qeds fiona geopandas xgboost gensim folium pyLDAvis descartes
# uncomment to install jupyterlab-manager.
#!jupyter labextension install @jupyter-widgets/jupyterlab-manager
#install jupyter_bokeh extension
#!jupyter labextension install @bokeh/jupyter_bokeh
#!pip install gme
#!pip install geopandas
import pandas as pd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import (
linear_model, metrics, neural_network, pipeline, model_selection
)
from sklearn.metrics import mean_squared_error, r2_score
import gme as gme
import qeds
qeds.themes.mpl_style();
plotly_template = qeds.themes.plotly_template()
colors = qeds.themes.COLOR_CYCLE
%matplotlib inline
# activate plot theme
import yaml
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure
from bokeh.themes import Theme
from bokeh.io import show, output_notebook
Imports many datasets
df = pd.read_excel (r'/Users/zihao/Desktop/UBC/Econ 323/Project/TRADHIST_BITRADE_BITARIFF_3.xlsx')
df_2018_exports = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/2018_exports_all.xlsx')
df_2018_imports = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/2018_imports_all.xlsx')
df_tariff = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/U.S. China Average Tariff Change.xlsx')
df_us_partner = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/U.S. Trade Partner 2018.xlsx')
df_china_partner = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/China Trade Partner 2018.xlsx')
df_us_tariff_time = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/U.S. tariff on China.xlsx')
df_us_tariff_time = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/U.S. tariff on China.xlsx')
df_china_tariff_time = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/China tariif on U.S.xlsx')
df_gravity_model = pd.read_csv(r'/Users/zihao/Desktop/UBC/Econ 323/Project/gravity dataset/DATASET/2004-2016 gravity.csv')
df_gravity_model2 = pd.read_stata(r'/Users/zihao/Desktop/UBC/Econ 323/Project/gravity dataset/DATASET/gravdata_cepii/gravdata.dta')
import numpy as np
background_1 = tuple(np.array([253, 238, 222]) / 255)
Let's firstly look at the bilateral trade data of the United States
df_us_partner
The U.S.'s trade balance with the world is
us_world_blance = df_us_partner.iloc[223]['Trade Balance (US$ Thousand)']
us_world_blance
Then look at the U.S.'s and China's top trade partners in pie charts. Also observe the source of the U.S. trade deficit
#clean
#pull out some redudant aggregate regions from the dataset
region = ["World","Latin America & Caribbean","North America",'East Asia & Pacific',
'Europe & Central Asia','Middle East & North Africa','Sub-Saharan Africa','Other Asia, nes','South Asia']
#U.S. top 10 exports destination
df_us_top10_destination = df_us_partner.nlargest(16,'Export Partner Share (%)')
df_us_top10_destination_1 = df_us_top10_destination[['Partner Name','Export Partner Share (%)']]
df_us_top10_destination_1.drop(df_us_top10_destination_1.
loc[df_us_top10_destination_1['Partner Name'].isin(region)].index, inplace=True)
df_us_top10_destination_2 = df_us_top10_destination_1.drop(223).reset_index().drop('index',axis=1)
others_us_share = 100 - df_us_top10_destination_2['Export Partner Share (%)'].sum()
other_us_row = pd.DataFrame([['Others',others_us_share]], columns=['Partner Name','Export Partner Share (%)'])
df_us_top10_destination_3 = pd.concat([other_us_row,df_us_top10_destination_2]).set_index('Partner Name')
#U.S. top 10 imports origin
df_us_top10_origin = df_us_partner.nlargest(16,'Import Partner Share (%)')
df_us_top10_origin_1 = df_us_top10_origin[['Partner Name','Import Partner Share (%)']]
df_us_top10_origin_1.drop(df_us_top10_origin_1.
loc[df_us_top10_origin_1['Partner Name'].isin(region)].index, inplace=True)
df_us_top10_origin_2 = df_us_top10_origin_1.drop(223).reset_index().drop('index',axis=1)
others_share_imports = 100 - df_us_top10_origin_2['Import Partner Share (%)'].sum()
other_row_imports = pd.DataFrame([['Others',others_share_imports]], columns=['Partner Name','Import Partner Share (%)'])
df_us_top10_origin_3 = pd.concat([other_row_imports,df_us_top10_origin_2]).set_index('Partner Name')
#China top 10 exports destination
df_china_top10_destination = df_china_partner.nlargest(16,'Export Partner Share (%)')
df_china_top10_destination_1 = df_china_top10_destination[['Partner Name','Export Partner Share (%)']]
df_china_top10_destination_1.drop(df_china_top10_destination_1.
loc[df_china_top10_destination_1['Partner Name'].isin(region)].index, inplace=True)
df_china_top10_destination_2 = df_china_top10_destination_1.drop(215).reset_index().drop('index',axis=1)
others_china_share = 100 - df_china_top10_destination_2['Export Partner Share (%)'].sum()
other_china_row = pd.DataFrame([['Others',others_china_share]], columns=['Partner Name','Export Partner Share (%)'])
df_china_top10_destination_3 = pd.concat([other_china_row,df_china_top10_destination_2]).set_index('Partner Name')
#China top 10 imports origin
df_china_top10_origin = df_china_partner.nlargest(17,'Import Partner Share (%)')
df_china_top10_origin_1 = df_china_top10_origin[['Partner Name','Import Partner Share (%)']]
df_china_top10_origin_1.drop(df_china_top10_origin_1.
loc[df_china_top10_origin_1['Partner Name'].isin(region)].index, inplace=True)
df_china_top10_origin_2 = df_china_top10_origin_1.drop(215).reset_index().drop('index',axis=1)
others_share_china_import = 100 - df_china_top10_origin_2['Import Partner Share (%)'].sum()
other_row_china_imports = pd.DataFrame([['Others',others_share_china_import]], columns=['Partner Name','Import Partner Share (%)'])
df_china_top10_origin_3 = pd.concat([other_row_china_imports,df_china_top10_origin_2]).set_index('Partner Name')
#we also clean the data of u.s. trade balance top 10 list
df_us_balance = df_us_partner.nsmallest(12,'Trade Balance (US$ Thousand)')
df_deficit = df_us_balance[['Partner Name','Trade Balance (US$ Thousand)']]
df_deficit.drop(df_deficit.loc[df_deficit['Partner Name'].isin(region)].index, inplace=True)
df_us_balance_1 = df_deficit.drop(223).reset_index().drop('index',axis=1)
balance_other_country = us_world_blance - df_us_balance_1['Trade Balance (US$ Thousand)'].sum()
other_us_blance = pd.DataFrame([['Deficit From Other Countries',balance_other_country]], columns=['Partner Name','Trade Balance (US$ Thousand)'])
df_us_deficit = pd.concat([other_us_blance,df_us_balance_1]).set_index('Partner Name')
#for the sake of better plotting in bokeh I put others on the top
df_us_top10_destination_3
df_us_top10_origin_3
df_china_top10_destination_3
df_china_top10_origin_3
#prep for bokeh some dictionaries
us_top10_destination = df_us_top10_destination_3['Export Partner Share (%)'].to_dict()
us_top10_origin = df_us_top10_origin_3['Import Partner Share (%)'].to_dict()
china_top10_destination = df_china_top10_destination_3['Export Partner Share (%)'].to_dict()
china_top10_origin = df_china_top10_origin_3['Import Partner Share (%)'].to_dict()
us_top10_destination
df_us_deficit = df_us_deficit.reindex(['Deficit From Other Countries', 'Mexico', 'Germany', 'Japan','Ireland','Vietnam','Italy','China'])
#reorder prepare for a better visualization for pie chart
df_us_deficit
The bokeh pie chart below is based on the example given by the bokeh library.
reference from the bokeh lib http://docs.bokeh.org/en/latest/docs/gallery/pie_chart.htm
#I manually looped over
from math import pi
from bokeh.io import output_notebook, show
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum
from bokeh.models import LabelSet, ColumnDataSource
from bokeh.layouts import row
from bokeh.layouts import gridplot
#1st U.S. destination plot
x = us_top10_destination
data = pd.Series(x).reset_index(name='value').rename(columns={'index':'country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Category20c[len(x)]
s = figure(plot_height=350, title="U.S. Top 10 Export Destination by Percentage Point in 2018", toolbar_location=None,
tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))
s.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data)
data["country"] = data['country'].astype(str)
data["country"] = data["country"].str.pad(28, side = "left")
source = ColumnDataSource(data)
labels = LabelSet(x=0, y=1, text='country',
angle=cumsum('angle', include_zero=True), source=source, render_mode='canvas')
s.add_layout(labels)
s.axis.axis_label=None
s.axis.visible=False
s.grid.grid_line_color = None
#2nd U.S. origin plot
x1 = us_top10_origin
data1 = pd.Series(x1).reset_index(name='value').rename(columns={'index':'country'})
data1['angle'] = data1['value']/data1['value'].sum() * 2*pi
data1['color'] = Category20c[len(x1)]
s1 = figure(plot_height=350, title="U.S. Top 10 Imports Country of Origin by Percentage Point in 2018", toolbar_location=None,
tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))
s1.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data1)
data1["country"] = data1['country'].astype(str)
data1["country"] = data1["country"].str.pad(28, side = "left")
source1 = ColumnDataSource(data1)
labels1 = LabelSet(x=0, y=1, text='country',
angle=cumsum('angle', include_zero=True), source=source1, render_mode='canvas')
s1.add_layout(labels1)
s1.axis.axis_label=None
s1.axis.visible=False
s1.grid.grid_line_color = None
#3rd China Destination Plot
x2 = china_top10_destination
data2 = pd.Series(x2).reset_index(name='value').rename(columns={'index':'country'})
data2['angle'] = data2['value']/data2['value'].sum() * 2*pi
data2['color'] = Category20c[len(x2)]
s2 = figure(plot_height=350, title="China Top 10 Export Destination by Percentage Point in 2018", toolbar_location=None,
tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))
s2.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data2)
data2["country"] = data2['country'].astype(str)
data2["country"] = data2["country"].str.pad(28, side = "left")
source2 = ColumnDataSource(data2)
labels2 = LabelSet(x=0, y=1, text='country',
angle=cumsum('angle', include_zero=True), source=source2, render_mode='canvas')
s2.add_layout(labels2)
s2.axis.axis_label=None
s2.axis.visible=False
s2.grid.grid_line_color = None
#4th China Origin Plot
x3 = china_top10_origin
data3 = pd.Series(x3).reset_index(name='value').rename(columns={'index':'country'})
data3['angle'] = data3['value']/data3['value'].sum() * 2*pi
data3['color'] = Category20c[len(x3)]
s3 = figure(plot_height=350, title="China Top 10 Imports Country of Origin by Percentage Point in 2018", toolbar_location=None,
tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))
s3.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data3)
data3["country"] = data3['country'].astype(str)
data3["country"] = data3["country"].str.pad(28, side = "left")
source3 = ColumnDataSource(data3)
labels3 = LabelSet(x=0, y=1, text='country',
angle=cumsum('angle', include_zero=True), source=source3, render_mode='canvas')
s3.add_layout(labels3)
s3.axis.axis_label=None
s3.axis.visible=False
s3.grid.grid_line_color = None
##now we use grid to show results
#show the result by a 2*2 grid
grid = gridplot([s, s1, s2,s3], ncols=2)
# show the results
#show(row(s, s1))
output_notebook()
#reference from the bokeh lib http://docs.bokeh.org/en/latest/docs/gallery/pie_chart.html
show(grid)
You can also check the bokeh video I attached to see the detail of the plot
from IPython.display import Image
Image("/Users/zihao/Desktop/UBC/Econ 323/Project/Pie Share.png")
Top 8 deficit, slice China out.
#I chose a better visualization pallette?
labels = df_us_deficit.index.tolist()
df_normalize_percentage= df_us_deficit['Trade Balance (US$ Thousand)']/us_world_blance
sizes = df_normalize_percentage.tolist()
colors = ['#003f5c','#2f4b7c','#665191','#a05195','#d45087','#f95d6a','#ff7c43','#ffa600']
#split china out
explode = (0, 0, 0, 0,0,0,0,0.1)
fig, ax = plt.subplots(figsize = (15,10))
ax.pie(sizes, labels=labels,explode=explode, colors = colors, autopct='%1.1f%%',
shadow=True, startangle=90)
ax.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
ax.set_title('U.S. 2018 Trade Deficit Top 8 Partners',size = 20)
import matplotlib as mpl
mpl.rcParams['font.size'] = 15
df_2018_ex_im = pd.merge(df_2018_exports, df_2018_imports, on="Reporter Name")
df_2018_ex_im
df_2018_ex_im.set_index("Reporter Name",inplace=True)
df_2018_ex_im.loc["United States"]
df_2018_clean = df_2018_ex_im.dropna(axis=0)
df_2018_clean
null_sum_clean = df_2018_clean.isnull().sum()
null_sum_clean
df_2018_clean = df_2018_clean / 1000000
df_2018_clean.loc["China"]["Exports"]
background_1 = tuple(np.array([253, 238, 222]) / 255)
def scatterlabels(xvar, yvar, ax):
df_2018_clean.plot(xvar,yvar, kind="scatter", color= "blue", alpha=0.8, ax=ax, s=50)
for idx in df_2018_clean.index :
if idx == "China" :
ax.annotate("China", (df_2018_clean.loc["China",xvar], df_2018_clean.loc["China",yvar]), color="black",size=20)
elif idx == "United States" :
ax.annotate("United States", (df_2018_clean.loc["United States",xvar], df_2018_clean.loc["United States",yvar]), color="black",size=20)
return(ax)
fig, ax = plt.subplots(figsize=(20,10))
ax = scatterlabels("Exports","Imports",ax)
ax.set_xlabel('Exports',size=20)
ax.set_ylabel('Imports', size =20)
ax.set_title("2018 Imports Against Exports by Nations, by Million USD",size =20)
lims = [
np.min([ax.get_xlim(), ax.get_ylim()]), # min of both axes
np.max([ax.get_xlim(), ax.get_ylim()]), # max of both axes
]
# now plot both limits against eachother
# a 45 degree line
ax.plot(lims, lims, 'k-', alpha=0.5, zorder=0)
ax.set_aspect('equal')
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.annotate(
"45 degree line",
xy=(10, 20), xycoords="data", xytext=(2000,2000),
annotation_clip=False,
horizontalalignment="left",
arrowprops={"visible": False},
fontsize=12, fontweight="semibold", color = tuple(np.array([139,0,0]) / 255),size =10)
fig.tight_layout()
import requests
url = 'https://www.census.gov/foreign-trade/statistics/product/enduse/imports/c5700.html?'
html = requests.get(url).content
df_list = pd.read_html(html)
df_end_use = df_list[-1]
df_end_use.to_csv('my data.csv')
df_end_use = df_end_use[(df_end_use != 0).all(1)]
df_end_use_1 = df_end_use.nlargest(70,'Value 2019')
df_end_use_1
df_end_use1 = df_end_use_1.set_index('End-Use Code').T.astype(float).drop(['TOTAL'],axis=1)
df_end_use1
#let's apply the shift funciton, see which chinese industry got hurt the most. We can clearly see the last row is negative
df_end_use2=((df_end_use1 - df_end_use1.shift()) / df_end_use1.shift())
df_end_use_trans= df_end_use2.T
df_end_use_trans.columns = ['2010', '2011', '2012','2013','2014','2015','2016','2017','2018','2019']
df_china_end_use = df_end_use_trans.T
#growth rate
df_end_use_trans
#There's one outlier that ruins the whole graph
df_end_use_trans['2016'].idxmax()
#Let's see what happens to the Chinese Product exports
most_china_list = df_end_use_trans.nsmallest(10,'2019').index.tolist()
most_china_list
not_bad = list(df_end_use1.columns)
for i in not_bad:
if i in most_china_list:
not_bad.remove(i)
not_bad.remove(df_end_use_trans['2016'].idxmax())
# now read and clean the U.S. exports file
url = 'https://www.census.gov/foreign-trade/statistics/product/enduse/exports/c5700.html'
html = requests.get(url).content
df_list_us = pd.read_html(html)
df_end_use_us= df_list_us[-1]
df_end_use_us.to_csv('my data us.csv')
df_end_use_us = df_end_use_us[(df_end_use_us != 0).all(1)]
df_end_use_1_us = df_end_use_us.nlargest(70,'Value 2019')
df_end_use_1_us
df_end_use1_us = df_end_use_1_us.set_index('End-Use Code').T.astype(float).drop(['TOTAL'],axis=1)
#let's apply the shift funciton, see which chinese industry got hurt the most. We can clearly see the last row is negative
df_end_use2_us=((df_end_use1_us - df_end_use1_us.shift()) / df_end_use1_us.shift())
df_end_use_trans_us= df_end_use2_us.T
df_end_use_trans_us.columns = ['2010', '2011', '2012','2013','2014','2015','2016','2017','2018','2019']
df_china_end_use_us = df_end_use_trans_us.T
#growth rate
df_end_use_trans_us['2013'].idxmax()
We seclect the top 10 HS2 commodities that were most affected by the tariff escalation
#China exports to U.S.
df_end_use_trans.nsmallest(10,'2019')
#U.S. exports to China
df_end_use_trans_us.nsmallest(10,'2019')
most_us_list = df_end_use_trans_us.nsmallest(10,'2019').index.tolist()
most_us_list
not_bad_us = list(df_end_use1_us.columns)
for i in not_bad_us:
if i in most_us_list:
not_bad_us.remove(i)
most_us_list.remove(df_end_use_trans_us['2013'].idxmax())
fig, ax = plt.subplots(2,figsize=(30,40))
df_china_end_use[not_bad].plot(ax=ax[0], color=[(0.7, 0.7, 0.7)], lw=0.4, legend=False)
df_china_end_use[most_china_list].plot(ax=ax[0] ,lw=2.5, legend=True,alpha = 0.8)
df_china_end_use_us[not_bad_us].plot(ax=ax[1], color=[(0.7, 0.7, 0.7)], lw=0.4, legend=False)
df_china_end_use_us[most_us_list].plot(ax=ax[1] ,lw=2.5, legend=True,alpha = 0.8)
ax[0].set_title("China Exports to U.S. Growth Rate, by 5-digit-End-Use Code",size = 25)
ax[0].text(8, 0.25, "Other Products")
ax[1].set_title("U.S. Exports to China Growth Rate, by 5-digit-End-Use Code",size = 25)
ax[1].text(8, 0.25, "Other Products")
ax[1].text(3.5, 1.5, "note: (00210) Sorghum, barley, oats should also be included, but flutuates too much on the graph")
for j in range(len(ax)):
ax[j].set_facecolor(background_1)
ax[j].spines["right"].set_visible(False)
ax[j].spines["top"].set_visible(False)
ax[j].set_ylabel("Growth Rate",size = 20)
ax[j].set_xlabel('Year')
fig.set_facecolor(background_1)