Given the time constraint, I was not able to get the precise Tariff Data. It was extremely exhausting to download the country-paired tariff data in a time-series manner, and it was even harder to merge in a short period of time. I then took a detour to examine and predictions based on some political indicators that may bring some new interests. The content can be more fruitful if I make the gravity model computable, such as CES, or other functions with homothetic features that can predict the trade flow by each commodity. I'll continue to enhance the flavor/richness of this project in the very near future.
This project includes several visualizations on international trade flow, U.S. and China partner trade share in aggregated and disaggregated level by commodity code and state. Three detailed the tariff war timelines are also presented. Some simple regression and machine learning method are applied to answer the question how will he bilateral trade flow change if the country was assigned some different political scores. The dataset is extracted mainly from CEPII database and the USITC dynamic gravity model database. The merged data frame contains 194884 pairs of trade flows from 2005 to 2013. A precise U.S.-China bilateral trade flow and tariff data from 1992 to 2013 was also examined.
For the simple tariff analysis, the prediction is based on the current aggregate tariff level 21% for both countries, for which scenario, China's export decreased more than the U.S.'s, a valid reason from the given data is that China has never been levied high tariff by the U.S. for the past decade. Though, Similarly, for policy variables, China's export is also hurt more compared to the U.S. if higher hostility scores and trade sanction being assigned. By the end, I also present the USITC GME package designed for international trade analysis which I discovered by the end of finishing this project.
The detailed summary table is presented in the Conclusion Section.
3.1 5-digit End Use Code Goods Trade Value during the Trade War
3.2 Timeline and Lag
3.3 Visuliaztion on Tariff Trend and Adjustment Lag
4.1 Exports Value by States in 2018
4.2 Goods Exports (non-service) Relative Change in 2019 by State
6.1 Data Cleaning
6.2 Small Dataset Regression
7.1 Cleaning Data
7.2 Gravity Model Linear regression
7.3 Variance Inflation Factor (VIF)
7.4 Ridge Model
7.5 Lasso Regression
Lasso Path
7.7 Gravity Linear V.S. Regularization (Ridge V.S. Lasso)
7.8 Test and Train
7.9 Neural Networks
Visualization
Data vs Test_n_Train vs Neural Networks vs Gravity Multilinear vs Lasso vs Simple Linear
7.10 Political Score
7.11 Hostility Level
7.12 Classfication
Install some pip.
# Uncomment following line to install on colab.
#! pip install qeds fiona geopandas xgboost gensim folium pyLDAvis descartes
# uncomment to install jupyterlab-manager.
#!jupyter labextension install @jupyter-widgets/jupyterlab-manager
#install jupyter_bokeh extension
#!jupyter labextension install @bokeh/jupyter_bokeh
Building jupyterlab assets (build:prod:minimize)
#!pip install gme
Collecting gme Downloading gme-1.2.tar.gz (21 kB) Requirement already satisfied: pandas in ./opt/anaconda3/lib/python3.7/site-packages (from gme) (1.0.1) Requirement already satisfied: statsmodels in ./opt/anaconda3/lib/python3.7/site-packages (from gme) (0.11.0) Requirement already satisfied: scipy in ./opt/anaconda3/lib/python3.7/site-packages (from gme) (1.4.1) Requirement already satisfied: patsy in ./opt/anaconda3/lib/python3.7/site-packages (from gme) (0.5.1) Requirement already satisfied: python-dateutil>=2.6.1 in ./opt/anaconda3/lib/python3.7/site-packages (from pandas->gme) (2.8.1) Requirement already satisfied: pytz>=2017.2 in ./opt/anaconda3/lib/python3.7/site-packages (from pandas->gme) (2019.3) Requirement already satisfied: numpy>=1.13.3 in ./opt/anaconda3/lib/python3.7/site-packages (from pandas->gme) (1.18.1) Requirement already satisfied: six in ./opt/anaconda3/lib/python3.7/site-packages (from patsy->gme) (1.14.0) Building wheels for collected packages: gme Building wheel for gme (setup.py) ... done Created wheel for gme: filename=gme-1.2-py3-none-any.whl size=27290 sha256=7a1c230c9ee1836a9d82824b93cba8d57d4d32fe19ec4275c97a47a529ce4ca9 Stored in directory: /Users/zihao/Library/Caches/pip/wheels/2e/dd/b4/f7f3d707ebd7a89d1c4a15751c0878251365296ceadbe5952b Successfully built gme Installing collected packages: gme Successfully installed gme-1.2
#!pip install geopandas
['Requirement already satisfied: geopandas in ./opt/anaconda3/lib/python3.7/site-packages (0.7.0)', 'Requirement already satisfied: pyproj>=2.2.0 in ./opt/anaconda3/lib/python3.7/site-packages (from geopandas) (2.6.0)', 'Requirement already satisfied: shapely in ./opt/anaconda3/lib/python3.7/site-packages (from geopandas) (1.7.0)', 'Requirement already satisfied: fiona in ./opt/anaconda3/lib/python3.7/site-packages (from geopandas) (1.8.13.post1)', 'Requirement already satisfied: pandas>=0.23.0 in ./opt/anaconda3/lib/python3.7/site-packages (from geopandas) (1.0.1)', 'Requirement already satisfied: click-plugins>=1.0 in ./opt/anaconda3/lib/python3.7/site-packages (from fiona->geopandas) (1.1.1)', 'Requirement already satisfied: cligj>=0.5 in ./opt/anaconda3/lib/python3.7/site-packages (from fiona->geopandas) (0.5.0)', 'Requirement already satisfied: click<8,>=4.0 in ./opt/anaconda3/lib/python3.7/site-packages (from fiona->geopandas) (7.0)', 'Requirement already satisfied: munch in ./opt/anaconda3/lib/python3.7/site-packages (from fiona->geopandas) (2.5.0)', 'Requirement already satisfied: attrs>=17 in ./opt/anaconda3/lib/python3.7/site-packages (from fiona->geopandas) (19.3.0)', 'Requirement already satisfied: six>=1.7 in ./opt/anaconda3/lib/python3.7/site-packages (from fiona->geopandas) (1.14.0)', 'Requirement already satisfied: python-dateutil>=2.6.1 in ./opt/anaconda3/lib/python3.7/site-packages (from pandas>=0.23.0->geopandas) (2.8.1)', 'Requirement already satisfied: numpy>=1.13.3 in ./opt/anaconda3/lib/python3.7/site-packages (from pandas>=0.23.0->geopandas) (1.18.1)', 'Requirement already satisfied: pytz>=2017.2 in ./opt/anaconda3/lib/python3.7/site-packages (from pandas>=0.23.0->geopandas) (2019.3)']
import pandas as pd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import (
linear_model, metrics, neural_network, pipeline, model_selection
)
from sklearn.metrics import mean_squared_error, r2_score
import gme as gme
import qeds
qeds.themes.mpl_style();
plotly_template = qeds.themes.plotly_template()
colors = qeds.themes.COLOR_CYCLE
%matplotlib inline
# activate plot theme
import yaml
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure
from bokeh.themes import Theme
from bokeh.io import show, output_notebook
Imports many datasets
df = pd.read_excel (r'/Users/zihao/Desktop/UBC/Econ 323/Project/TRADHIST_BITRADE_BITARIFF_3.xlsx')
df_2018_exports = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/2018_exports_all.xlsx')
df_2018_imports = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/2018_imports_all.xlsx')
df_tariff = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/U.S. China Average Tariff Change.xlsx')
df_us_partner = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/U.S. Trade Partner 2018.xlsx')
df_china_partner = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/China Trade Partner 2018.xlsx')
df_us_tariff_time = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/U.S. tariff on China.xlsx')
df_us_tariff_time = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/U.S. tariff on China.xlsx')
df_china_tariff_time = pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/China tariif on U.S.xlsx')
df_gravity_model = pd.read_csv(r'/Users/zihao/Desktop/UBC/Econ 323/Project/gravity dataset/DATASET/2004-2016 gravity.csv')
df_gravity_model2 = pd.read_stata(r'/Users/zihao/Desktop/UBC/Econ 323/Project/gravity dataset/DATASET/gravdata_cepii/gravdata.dta')
import numpy as np
background_1 = tuple(np.array([253, 238, 222]) / 255)
Let's firstly look at the bilateral trade data of the United States
df_us_partner
Reporter Name | Partner Name | Export Share in Total Products (%) | Import Share in Total Products (%) | Trade Balance (US$ Thousand) | Export (US$ Thousand) | Import (US$ Thousand) | Import Partner Share (%) | Export Partner Share (%) | AHS Simple Average (%) | ... | MFN Dutiable Tariff Lines Share (%) | MFN Duty Free Tariff Lines Share (%) | MFN Specific Tariff Lines Share (%) | MFN AVE Tariff Lines Share (%) | MFN MaxRate (%) | MFN MinRate (%) | MFN SpecificDuty Imports (US$ Thousand) | MFN Dutiable Imports (US$ Thousand) | MFN Duty Free Imports (US$ Thousand) | No Of Tariff Agreement | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | United States | Aruba | 37.99 | 1.41 | 570308.23 | 601974.40 | 31666.17 | 0.00 | 0.04 | 0.74 | ... | 51.22 | 46.34 | 0.00 | 2.44 | 90.03 | 0 | 0.00 | 5422.55 | 4333.60 | 1.0 |
1 | United States | Afghanistan | 24.22 | 2.91 | 1197093.01 | 1226885.15 | 29792.15 | 0.00 | 0.07 | 1.47 | ... | 56.18 | 30.19 | 0.21 | 13.42 | 76.52 | 0 | 1.06 | 16866.93 | 10514.38 | 2.0 |
2 | United States | Angola | 18.71 | 0.40 | -2243739.33 | 525185.65 | 2768924.98 | 0.11 | 0.03 | 1.03 | ... | 37.84 | 56.76 | 5.41 | 0.00 | 16.50 | 0 | 2487662.61 | 2553366.97 | 210040.74 | 3.0 |
3 | United States | Anguila | 19.99 | 0.62 | 110551.09 | 115705.45 | 5154.36 | 0.00 | 0.01 | 0.73 | ... | 42.65 | 54.41 | 0.00 | 2.94 | 16.50 | 0 | 0.00 | 2396.21 | 2567.91 | 1.0 |
4 | United States | Albania | 6.71 | 5.19 | -11206.99 | 59200.12 | 70407.11 | 0.00 | 0.00 | 6.75 | ... | 65.92 | 24.97 | 0.12 | 8.99 | 350.00 | 0 | 38.91 | 65995.63 | 2639.05 | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
226 | United States | Yemen | 4.39 | 0.79 | 189077.98 | 191307.70 | 2229.72 | 0.00 | 0.01 | 3.11 | ... | 61.81 | 20.83 | 0.00 | 17.36 | 37.11 | 0 | 0.00 | 341.11 | 1852.67 | 2.0 |
227 | United States | South Africa | 56.83 | 31.65 | -3132357.73 | 5517377.25 | 8649734.98 | 0.33 | 0.33 | 0.63 | ... | 54.17 | 37.57 | 1.47 | 6.79 | 350.00 | 0 | 38849.58 | 3567909.24 | 4859075.84 | 3.0 |
228 | United States | Congo, Dem. Rep. | 9.75 | 1.41 | 27678.64 | 78002.32 | 50323.68 | 0.00 | 0.00 | 0.69 | ... | 39.52 | 60.00 | 0.00 | 0.48 | 19.70 | 0 | 0.00 | 20667.85 | 28685.94 | 2.0 |
229 | United States | Zambia | 10.35 | 1.08 | 4537.42 | 195379.27 | 190841.84 | 0.01 | 0.01 | 2.18 | ... | 59.60 | 36.36 | 0.00 | 4.04 | 350.00 | 0 | 0.00 | 138367.28 | 51342.50 | 4.0 |
230 | United States | Zimbabwe | 5.78 | 1.70 | -45324.59 | 33737.88 | 79062.47 | 0.00 | 0.00 | 3.85 | ... | 58.82 | 32.20 | 0.93 | 8.05 | 350.00 | 0 | 6198.63 | 70412.93 | 8226.68 | 1.0 |
231 rows × 34 columns
The U.S.'s trade balance with the world is
us_world_blance = df_us_partner.iloc[223]['Trade Balance (US$ Thousand)']
us_world_blance
-946129553.57
Then look at the U.S.'s and China's top trade partners in pie charts. Also observe the source of the U.S. trade deficit
#clean
#pull out some redudant aggregate regions from the dataset
region = ["World","Latin America & Caribbean","North America",'East Asia & Pacific',
'Europe & Central Asia','Middle East & North Africa','Sub-Saharan Africa','Other Asia, nes','South Asia']
#U.S. top 10 exports destination
df_us_top10_destination = df_us_partner.nlargest(16,'Export Partner Share (%)')
df_us_top10_destination_1 = df_us_top10_destination[['Partner Name','Export Partner Share (%)']]
df_us_top10_destination_1.drop(df_us_top10_destination_1.
loc[df_us_top10_destination_1['Partner Name'].isin(region)].index, inplace=True)
df_us_top10_destination_2 = df_us_top10_destination_1.drop(223).reset_index().drop('index',axis=1)
others_us_share = 100 - df_us_top10_destination_2['Export Partner Share (%)'].sum()
other_us_row = pd.DataFrame([['Others',others_us_share]], columns=['Partner Name','Export Partner Share (%)'])
df_us_top10_destination_3 = pd.concat([other_us_row,df_us_top10_destination_2]).set_index('Partner Name')
#U.S. top 10 imports origin
df_us_top10_origin = df_us_partner.nlargest(16,'Import Partner Share (%)')
df_us_top10_origin_1 = df_us_top10_origin[['Partner Name','Import Partner Share (%)']]
df_us_top10_origin_1.drop(df_us_top10_origin_1.
loc[df_us_top10_origin_1['Partner Name'].isin(region)].index, inplace=True)
df_us_top10_origin_2 = df_us_top10_origin_1.drop(223).reset_index().drop('index',axis=1)
others_share_imports = 100 - df_us_top10_origin_2['Import Partner Share (%)'].sum()
other_row_imports = pd.DataFrame([['Others',others_share_imports]], columns=['Partner Name','Import Partner Share (%)'])
df_us_top10_origin_3 = pd.concat([other_row_imports,df_us_top10_origin_2]).set_index('Partner Name')
#China top 10 exports destination
df_china_top10_destination = df_china_partner.nlargest(16,'Export Partner Share (%)')
df_china_top10_destination_1 = df_china_top10_destination[['Partner Name','Export Partner Share (%)']]
df_china_top10_destination_1.drop(df_china_top10_destination_1.
loc[df_china_top10_destination_1['Partner Name'].isin(region)].index, inplace=True)
df_china_top10_destination_2 = df_china_top10_destination_1.drop(215).reset_index().drop('index',axis=1)
others_china_share = 100 - df_china_top10_destination_2['Export Partner Share (%)'].sum()
other_china_row = pd.DataFrame([['Others',others_china_share]], columns=['Partner Name','Export Partner Share (%)'])
df_china_top10_destination_3 = pd.concat([other_china_row,df_china_top10_destination_2]).set_index('Partner Name')
#China top 10 imports origin
df_china_top10_origin = df_china_partner.nlargest(17,'Import Partner Share (%)')
df_china_top10_origin_1 = df_china_top10_origin[['Partner Name','Import Partner Share (%)']]
df_china_top10_origin_1.drop(df_china_top10_origin_1.
loc[df_china_top10_origin_1['Partner Name'].isin(region)].index, inplace=True)
df_china_top10_origin_2 = df_china_top10_origin_1.drop(215).reset_index().drop('index',axis=1)
others_share_china_import = 100 - df_china_top10_origin_2['Import Partner Share (%)'].sum()
other_row_china_imports = pd.DataFrame([['Others',others_share_china_import]], columns=['Partner Name','Import Partner Share (%)'])
df_china_top10_origin_3 = pd.concat([other_row_china_imports,df_china_top10_origin_2]).set_index('Partner Name')
#we also clean the data of u.s. trade balance top 10 list
df_us_balance = df_us_partner.nsmallest(12,'Trade Balance (US$ Thousand)')
df_deficit = df_us_balance[['Partner Name','Trade Balance (US$ Thousand)']]
df_deficit.drop(df_deficit.loc[df_deficit['Partner Name'].isin(region)].index, inplace=True)
df_us_balance_1 = df_deficit.drop(223).reset_index().drop('index',axis=1)
balance_other_country = us_world_blance - df_us_balance_1['Trade Balance (US$ Thousand)'].sum()
other_us_blance = pd.DataFrame([['Deficit From Other Countries',balance_other_country]], columns=['Partner Name','Trade Balance (US$ Thousand)'])
df_us_deficit = pd.concat([other_us_blance,df_us_balance_1]).set_index('Partner Name')
#for the sake of better plotting in bokeh I put others on the top
df_us_top10_destination_3
Export Partner Share (%) | |
---|---|
Partner Name | |
Others | 38.22 |
Canada | 18.00 |
Mexico | 15.94 |
China | 7.21 |
Japan | 4.52 |
United Kingdom | 3.98 |
Germany | 3.44 |
Korea, Rep. | 3.39 |
Netherlands | 2.92 |
Brazil | 2.38 |
df_us_top10_origin_3
Import Partner Share (%) | |
---|---|
Partner Name | |
Others | 32.44 |
China | 21.57 |
Mexico | 13.37 |
Canada | 12.47 |
Japan | 5.59 |
Germany | 4.91 |
Korea, Rep. | 2.92 |
United Kingdom | 2.36 |
Ireland | 2.21 |
India | 2.16 |
df_china_top10_destination_3
Export Partner Share (%) | |
---|---|
Partner Name | |
Others | 38.19 |
United States | 19.23 |
Hong Kong, China | 12.15 |
Japan | 5.90 |
Korea, Rep. | 4.37 |
Vietnam | 3.37 |
Germany | 3.12 |
India | 3.08 |
Netherlands | 2.93 |
df_china_top10_origin_3
Import Partner Share (%) | |
---|---|
Partner Name | |
Others | 52.42 |
Korea, Rep. | 9.58 |
Japan | 8.45 |
United States | 7.31 |
Germany | 4.98 |
Australia | 4.92 |
Brazil | 3.61 |
Vietnam | 3.00 |
Malaysia | 2.97 |
Russian Federation | 2.76 |
#prep for bokeh some dictionaries
us_top10_destination = df_us_top10_destination_3['Export Partner Share (%)'].to_dict()
us_top10_origin = df_us_top10_origin_3['Import Partner Share (%)'].to_dict()
china_top10_destination = df_china_top10_destination_3['Export Partner Share (%)'].to_dict()
china_top10_origin = df_china_top10_origin_3['Import Partner Share (%)'].to_dict()
us_top10_destination
{'Others': 38.21999999999999, 'Canada': 18.0, 'Mexico': 15.94, 'China': 7.21, 'Japan': 4.52, 'United Kingdom': 3.98, 'Germany': 3.44, 'Korea, Rep.': 3.39, 'Netherlands': 2.92, 'Brazil': 2.38}
df_us_deficit = df_us_deficit.reindex(['Deficit From Other Countries', 'Mexico', 'Germany', 'Japan','Ireland','Vietnam','Italy','China'])
#reorder prepare for a better visualization for pie chart
df_us_deficit
Trade Balance (US$ Thousand) | |
---|---|
Partner Name | |
Deficit From Other Countries | -1.556063e+08 |
Mexico | -8.376046e+07 |
Germany | -7.101312e+07 |
Japan | -7.067617e+07 |
Ireland | -4.695435e+07 |
Vietnam | -4.160220e+07 |
Italy | -3.346167e+07 |
China | -4.430553e+08 |
The bokeh pie chart below is based on the example given by the bokeh library.
reference from the bokeh lib http://docs.bokeh.org/en/latest/docs/gallery/pie_chart.htm
#I manually looped over
from math import pi
from bokeh.io import output_notebook, show
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum
from bokeh.models import LabelSet, ColumnDataSource
from bokeh.layouts import row
from bokeh.layouts import gridplot
#1st U.S. destination plot
x = us_top10_destination
data = pd.Series(x).reset_index(name='value').rename(columns={'index':'country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Category20c[len(x)]
s = figure(plot_height=350, title="U.S. Top 10 Export Destination by Percentage Point in 2018", toolbar_location=None,
tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))
s.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data)
data["country"] = data['country'].astype(str)
data["country"] = data["country"].str.pad(28, side = "left")
source = ColumnDataSource(data)
labels = LabelSet(x=0, y=1, text='country',
angle=cumsum('angle', include_zero=True), source=source, render_mode='canvas')
s.add_layout(labels)
s.axis.axis_label=None
s.axis.visible=False
s.grid.grid_line_color = None
#2nd U.S. origin plot
x1 = us_top10_origin
data1 = pd.Series(x1).reset_index(name='value').rename(columns={'index':'country'})
data1['angle'] = data1['value']/data1['value'].sum() * 2*pi
data1['color'] = Category20c[len(x1)]
s1 = figure(plot_height=350, title="U.S. Top 10 Imports Country of Origin by Percentage Point in 2018", toolbar_location=None,
tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))
s1.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data1)
data1["country"] = data1['country'].astype(str)
data1["country"] = data1["country"].str.pad(28, side = "left")
source1 = ColumnDataSource(data1)
labels1 = LabelSet(x=0, y=1, text='country',
angle=cumsum('angle', include_zero=True), source=source1, render_mode='canvas')
s1.add_layout(labels1)
s1.axis.axis_label=None
s1.axis.visible=False
s1.grid.grid_line_color = None
#3rd China Destination Plot
x2 = china_top10_destination
data2 = pd.Series(x2).reset_index(name='value').rename(columns={'index':'country'})
data2['angle'] = data2['value']/data2['value'].sum() * 2*pi
data2['color'] = Category20c[len(x2)]
s2 = figure(plot_height=350, title="China Top 10 Export Destination by Percentage Point in 2018", toolbar_location=None,
tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))
s2.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data2)
data2["country"] = data2['country'].astype(str)
data2["country"] = data2["country"].str.pad(28, side = "left")
source2 = ColumnDataSource(data2)
labels2 = LabelSet(x=0, y=1, text='country',
angle=cumsum('angle', include_zero=True), source=source2, render_mode='canvas')
s2.add_layout(labels2)
s2.axis.axis_label=None
s2.axis.visible=False
s2.grid.grid_line_color = None
#4th China Origin Plot
x3 = china_top10_origin
data3 = pd.Series(x3).reset_index(name='value').rename(columns={'index':'country'})
data3['angle'] = data3['value']/data3['value'].sum() * 2*pi
data3['color'] = Category20c[len(x3)]
s3 = figure(plot_height=350, title="China Top 10 Imports Country of Origin by Percentage Point in 2018", toolbar_location=None,
tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))
s3.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data3)
data3["country"] = data3['country'].astype(str)
data3["country"] = data3["country"].str.pad(28, side = "left")
source3 = ColumnDataSource(data3)
labels3 = LabelSet(x=0, y=1, text='country',
angle=cumsum('angle', include_zero=True), source=source3, render_mode='canvas')
s3.add_layout(labels3)
s3.axis.axis_label=None
s3.axis.visible=False
s3.grid.grid_line_color = None
##now we use grid to show results
#show the result by a 2*2 grid
grid = gridplot([s, s1, s2,s3], ncols=2)
# show the results
#show(row(s, s1))
output_notebook()
#reference from the bokeh lib http://docs.bokeh.org/en/latest/docs/gallery/pie_chart.html
show(grid)
You can also check the bokeh video I attached to see the detail of the plot
from IPython.display import Image
Image("/Users/zihao/Desktop/UBC/Econ 323/Project/Pie Share.png")
Top 8 deficit, slice China out.
#I chose a better visualization pallette?
labels = df_us_deficit.index.tolist()
df_normalize_percentage= df_us_deficit['Trade Balance (US$ Thousand)']/us_world_blance
sizes = df_normalize_percentage.tolist()
colors = ['#003f5c','#2f4b7c','#665191','#a05195','#d45087','#f95d6a','#ff7c43','#ffa600']
#split china out
explode = (0, 0, 0, 0,0,0,0,0.1)
fig, ax = plt.subplots(figsize = (15,10))
ax.pie(sizes, labels=labels,explode=explode, colors = colors, autopct='%1.1f%%',
shadow=True, startangle=90)
ax.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
ax.set_title('U.S. 2018 Trade Deficit Top 8 Partners',size = 20)
import matplotlib as mpl
mpl.rcParams['font.size'] = 15
df_2018_ex_im = pd.merge(df_2018_exports, df_2018_imports, on="Reporter Name")
df_2018_ex_im
Reporter Name | Exports | Imports | |
---|---|---|---|
0 | Afghanistan | 1.769009e+06 | 1.481318e+07 |
1 | Albania | 2.875860e+06 | 5.941287e+06 |
2 | Algeria | NaN | NaN |
3 | Andorra | 1.294482e+05 | 1.609394e+06 |
4 | Angola | 3.768468e+07 | 1.635306e+07 |
... | ... | ... | ... |
198 | Venezuela | NaN | NaN |
199 | Vietnam | NaN | NaN |
200 | Yemen | NaN | NaN |
201 | Zambia | 9.052165e+06 | 9.461739e+06 |
202 | Zimbabwe | 4.037203e+06 | 6.258856e+06 |
203 rows × 3 columns
df_2018_ex_im.set_index("Reporter Name",inplace=True)
df_2018_ex_im.loc["United States"]
Exports 1.665303e+09 Imports 2.611432e+09 Name: United States, dtype: float64
df_2018_clean = df_2018_ex_im.dropna(axis=0)
df_2018_clean
Exports | Imports | |
---|---|---|
Reporter Name | ||
Afghanistan | 1.769009e+06 | 1.481318e+07 |
Albania | 2.875860e+06 | 5.941287e+06 |
Andorra | 1.294482e+05 | 1.609394e+06 |
Angola | 3.768468e+07 | 1.635306e+07 |
Antigua and Barbuda | 2.584108e+04 | 5.691552e+05 |
... | ... | ... |
United States | 1.665303e+09 | 2.611432e+09 |
Uruguay | 7.498005e+06 | 8.893246e+06 |
Uzbekistan | 1.091901e+07 | 1.731395e+07 |
Zambia | 9.052165e+06 | 9.461739e+06 |
Zimbabwe | 4.037203e+06 | 6.258856e+06 |
131 rows × 2 columns
null_sum_clean = df_2018_clean.isnull().sum()
null_sum_clean
Exports 0 Imports 0 dtype: int64
df_2018_clean = df_2018_clean / 1000000
df_2018_clean.loc["China"]["Exports"]
2494.230194966
background_1 = tuple(np.array([253, 238, 222]) / 255)
def scatterlabels(xvar, yvar, ax):
df_2018_clean.plot(xvar,yvar, kind="scatter", color= "blue", alpha=0.8, ax=ax, s=50)
for idx in df_2018_clean.index :
if idx == "China" :
ax.annotate("China", (df_2018_clean.loc["China",xvar], df_2018_clean.loc["China",yvar]), color="black",size=20)
elif idx == "United States" :
ax.annotate("United States", (df_2018_clean.loc["United States",xvar], df_2018_clean.loc["United States",yvar]), color="black",size=20)
return(ax)
fig, ax = plt.subplots(figsize=(20,10))
ax = scatterlabels("Exports","Imports",ax)
ax.set_xlabel('Exports',size=20)
ax.set_ylabel('Imports', size =20)
ax.set_title("2018 Imports Against Exports by Nations, by Million USD",size =20)
lims = [
np.min([ax.get_xlim(), ax.get_ylim()]), # min of both axes
np.max([ax.get_xlim(), ax.get_ylim()]), # max of both axes
]
# now plot both limits against eachother
# a 45 degree line
ax.plot(lims, lims, 'k-', alpha=0.5, zorder=0)
ax.set_aspect('equal')
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.annotate(
"45 degree line",
xy=(10, 20), xycoords="data", xytext=(2000,2000),
annotation_clip=False,
horizontalalignment="left",
arrowprops={"visible": False},
fontsize=12, fontweight="semibold", color = tuple(np.array([139,0,0]) / 255),size =10)
fig.tight_layout()
import requests
url = 'https://www.census.gov/foreign-trade/statistics/product/enduse/imports/c5700.html?'
html = requests.get(url).content
df_list = pd.read_html(html)
df_end_use = df_list[-1]
df_end_use.to_csv('my data.csv')
df_end_use = df_end_use[(df_end_use != 0).all(1)]
df_end_use_1 = df_end_use.nlargest(70,'Value 2019')
df_end_use_1
End-Use Code | Value 2010 | Value 2011 | Value 2012 | Value 2013 | Value 2014 | Value 2015 | Value 2016 | Value 2017 | Value 2018 | Value 2019 | |
---|---|---|---|---|---|---|---|---|---|---|---|
141 | TOTAL | 364952634 | 399371233 | 425619083 | 440430020 | 468474895 | 483201655 | 462419992 | 505220233 | 539675591 | 452243440 |
121 | (41050) Cell phones and other household goods,... | 32612041 | 40437973 | 52820617 | 58708992 | 64102970 | 64538065 | 61465082 | 70339967 | 71872376 | 64552117 |
84 | (21300) Computers | 39343305 | 47212554 | 47638124 | 47154020 | 46246719 | 43804679 | 40407803 | 45400673 | 47178858 | 42379802 |
124 | (41120) Toys, games, and sporting goods | 27817414 | 25306178 | 25115136 | 24632369 | 24432816 | 26355499 | 25044156 | 26726633 | 28222683 | 26512710 |
108 | (40020) Apparel, textiles, nonwool or cotton | 17317807 | 19754755 | 20619014 | 21554715 | 23924860 | 25645126 | 24143354 | 24114281 | 25140494 | 24453425 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
126 | (41140) Musical instruments | 655151 | 660364 | 674404 | 647305 | 708923 | 703271 | 708904 | 717117 | 777382 | 816483 |
122 | (41100) Motorcycles and parts | 133870 | 181086 | 229314 | 254189 | 260398 | 478429 | 466435 | 568479 | 842384 | 724828 |
74 | (21110) Food, tobacco machinery | 430348 | 509068 | 551661 | 585582 | 671607 | 725335 | 755540 | 806258 | 796777 | 681196 |
92 | (22010) Parts-civilian aircraft | 303580 | 413779 | 445631 | 509563 | 535630 | 534547 | 522628 | 600075 | 676175 | 651946 |
11 | (00180) Other foods | 339091 | 431406 | 495041 | 534875 | 475411 | 496359 | 513580 | 598441 | 711488 | 642395 |
70 rows × 11 columns
df_end_use1 = df_end_use_1.set_index('End-Use Code').T.astype(float).drop(['TOTAL'],axis=1)
df_end_use1
End-Use Code | (41050) Cell phones and other household goods, n.e.c. | (21300) Computers | (41120) Toys, games, and sporting goods | (40020) Apparel, textiles, nonwool or cotton | (21400) Telecommunications equipment | (21301) Computer accessories | (41000) Furniture, household goods, etc. | (41030) Household appliances | (30230) Other parts and accessories of vehicles | (20005) Electric apparatus | ... | (12070) Tobacco, waxes, etc. | (42110) Gem stones, other | (50040) Other (movies, miscellaneous imports, and special transactions) | (12500) Plastic materials | (13020) Stone, sand, cement, etc. | (41140) Musical instruments | (41100) Motorcycles and parts | (21110) Food, tobacco machinery | (22010) Parts-civilian aircraft | (00180) Other foods |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Value 2010 | 32612041.0 | 39343305.0 | 27817414.0 | 17317807.0 | 16619084.0 | 28195127.0 | 13561429.0 | 9390599.0 | 7361394.0 | 11044289.0 | ... | 644778.0 | 1512192.0 | 161174.0 | 571061.0 | 782220.0 | 655151.0 | 133870.0 | 430348.0 | 303580.0 | 339091.0 |
Value 2011 | 40437973.0 | 47212554.0 | 25306178.0 | 19754755.0 | 17336848.0 | 27663608.0 | 13753109.0 | 10210570.0 | 9029238.0 | 12946856.0 | ... | 760624.0 | 1577880.0 | 203950.0 | 808861.0 | 799358.0 | 660364.0 | 181086.0 | 509068.0 | 413779.0 | 431406.0 |
Value 2012 | 52820617.0 | 47638124.0 | 25115136.0 | 20619014.0 | 19145602.0 | 28418685.0 | 14779785.0 | 11145330.0 | 11172211.0 | 11590260.0 | ... | 899706.0 | 1661028.0 | 225795.0 | 812741.0 | 853707.0 | 674404.0 | 229314.0 | 551661.0 | 445631.0 | 495041.0 |
Value 2013 | 58708992.0 | 47154020.0 | 24632369.0 | 21554715.0 | 20951280.0 | 28979113.0 | 15179327.0 | 12482507.0 | 11681762.0 | 12696414.0 | ... | 945442.0 | 1613486.0 | 199275.0 | 903012.0 | 995017.0 | 647305.0 | 254189.0 | 585582.0 | 509563.0 | 534875.0 |
Value 2014 | 64102970.0 | 46246719.0 | 24432816.0 | 23924860.0 | 22537644.0 | 31191054.0 | 16052809.0 | 13244246.0 | 13464650.0 | 13333796.0 | ... | 964357.0 | 1486404.0 | 204473.0 | 1183041.0 | 1088456.0 | 708923.0 | 260398.0 | 671607.0 | 535630.0 | 475411.0 |
Value 2015 | 64538065.0 | 43804679.0 | 26355499.0 | 25645126.0 | 27085100.0 | 30456135.0 | 17894345.0 | 14217862.0 | 14867660.0 | 13627422.0 | ... | 998928.0 | 1354299.0 | 249775.0 | 1041041.0 | 1301605.0 | 703271.0 | 478429.0 | 725335.0 | 534547.0 | 496359.0 |
Value 2016 | 61465082.0 | 40407803.0 | 25044156.0 | 24143354.0 | 29036921.0 | 28252467.0 | 18644665.0 | 13715844.0 | 14231925.0 | 13132186.0 | ... | 942742.0 | 1170479.0 | 285428.0 | 972701.0 | 1469631.0 | 708904.0 | 466435.0 | 755540.0 | 522628.0 | 513580.0 |
Value 2017 | 70339967.0 | 45400673.0 | 26726633.0 | 24114281.0 | 33441172.0 | 31650136.0 | 20669159.0 | 14147002.0 | 14387447.0 | 14171418.0 | ... | 1014030.0 | 1034977.0 | 293207.0 | 1311665.0 | 1714360.0 | 717117.0 | 568479.0 | 806258.0 | 600075.0 | 598441.0 |
Value 2018 | 71872376.0 | 47178858.0 | 28222683.0 | 25140494.0 | 33882831.0 | 32501200.0 | 22694251.0 | 16047425.0 | 16382779.0 | 16088867.0 | ... | 1152272.0 | 1073617.0 | 421308.0 | 1596564.0 | 1924222.0 | 777382.0 | 842384.0 | 796777.0 | 676175.0 | 711488.0 |
Value 2019 | 64552117.0 | 42379802.0 | 26512710.0 | 24453425.0 | 24443054.0 | 18709286.0 | 16920257.0 | 14069356.0 | 13012552.0 | 12948998.0 | ... | 967714.0 | 937668.0 | 925406.0 | 877651.0 | 868285.0 | 816483.0 | 724828.0 | 681196.0 | 651946.0 | 642395.0 |
10 rows × 69 columns
#let's apply the shift funciton, see which chinese industry got hurt the most. We can clearly see the last row is negative
df_end_use2=((df_end_use1 - df_end_use1.shift()) / df_end_use1.shift())
df_end_use_trans= df_end_use2.T
df_end_use_trans.columns = ['2010', '2011', '2012','2013','2014','2015','2016','2017','2018','2019']
df_china_end_use = df_end_use_trans.T
#growth rate
df_end_use_trans
2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | |
---|---|---|---|---|---|---|---|---|---|---|
End-Use Code | ||||||||||
(41050) Cell phones and other household goods, n.e.c. | NaN | 0.239971 | 0.306213 | 0.111479 | 0.091877 | 0.006787 | -0.047615 | 0.144389 | 0.021786 | -0.101851 |
(21300) Computers | NaN | 0.200015 | 0.009014 | -0.010162 | -0.019241 | -0.052805 | -0.077546 | 0.123562 | 0.039166 | -0.101720 |
(41120) Toys, games, and sporting goods | NaN | -0.090276 | -0.007549 | -0.019222 | -0.008101 | 0.078693 | -0.049756 | 0.067180 | 0.055976 | -0.060589 |
(40020) Apparel, textiles, nonwool or cotton | NaN | 0.140719 | 0.043749 | 0.045380 | 0.109959 | 0.071903 | -0.058560 | -0.001204 | 0.042556 | -0.027329 |
(21400) Telecommunications equipment | NaN | 0.043189 | 0.104330 | 0.094313 | 0.075717 | 0.201772 | 0.072063 | 0.151678 | 0.013207 | -0.278601 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
(41140) Musical instruments | NaN | 0.007957 | 0.021261 | -0.040182 | 0.095192 | -0.007973 | 0.008010 | 0.011585 | 0.084038 | 0.050298 |
(41100) Motorcycles and parts | NaN | 0.352700 | 0.266326 | 0.108476 | 0.024427 | 0.837299 | -0.025070 | 0.218774 | 0.481821 | -0.139552 |
(21110) Food, tobacco machinery | NaN | 0.182922 | 0.083669 | 0.061489 | 0.146905 | 0.079999 | 0.041643 | 0.067128 | -0.011759 | -0.145061 |
(22010) Parts-civilian aircraft | NaN | 0.362998 | 0.076978 | 0.143464 | 0.051156 | -0.002022 | -0.022297 | 0.148188 | 0.126817 | -0.035832 |
(00180) Other foods | NaN | 0.272243 | 0.147506 | 0.080466 | -0.111174 | 0.044063 | 0.034695 | 0.165234 | 0.188902 | -0.097111 |
69 rows × 10 columns
#There's one outlier that ruins the whole graph
df_end_use_trans['2016'].idxmax()
'(30000) Passenger cars, new and used'
#Let's see what happens to the Chinese Product exports
most_china_list = df_end_use_trans.nsmallest(10,'2019').index.tolist()
most_china_list
['(13020) Stone, sand, cement, etc.', '(30220) Automotive tires and tubes', '(21320) Semiconductors', '(12500) Plastic materials', '(21301) Computer accessories', '(12135) Synthetic cloth', '(01000) Fish and shellfish', '(21160) Measuring, testing, control instruments', '(12550) Chemicals-other, n.e.c.', '(15000) Iron and steel products, n.e.c.']
not_bad = list(df_end_use1.columns)
for i in not_bad:
if i in most_china_list:
not_bad.remove(i)
not_bad.remove(df_end_use_trans['2016'].idxmax())
['(41050) Cell phones and other household goods, n.e.c.', '(21300) Computers', '(41120) Toys, games, and sporting goods', '(40020) Apparel, textiles, nonwool or cotton', '(21400) Telecommunications equipment', '(41000) Furniture, household goods, etc.', '(41030) Household appliances', '(30230) Other parts and accessories of vehicles', '(20005) Electric apparatus', '(40040) Footwear', '(40000) Apparel, household goods - cotton', '(21180) Industrial machines, other', '(21190) Photo, service industry machinery', '(41200) Televisions and video equipment', '(41020) Cookware, cutlery, tools', '(16120) Industrial supplies, other', '(20000) Generators, accessories', '(40140) Other consumer nondurables', '(12540) Chemicals-organic', '(21610) Medicinal equipment', '(40050) Camping apparel and gear', '(21100) Industrial engines', '(15200) Finished metal shapes', '(40030) Apparel,household goods-nontextile', '(41210) Stereo equipment, etc', '(50030) Minimum value shipments', '(13120) Nontextile floor tiles', '(15100) Iron and steel, advanced', '(50020) U.S. goods returned, and reimports', '(21170) Materials handling equipment', '(40100) Pharmaceutical preparations', '(13110) Shingles, wallboard', '(41320) Artwork, antiques, stamps, etc.', '(41130) Photo equipment', '(40110) Books, printed matter', '(41310) Jewelry', '(30200) Engines and engine parts (carburetors, pistons, rings, and valves)', '(21030) Excavating machinery', '(40120) Toiletries and cosmetics', '(41010) Glassware, chinaware', '(21500) Business machines and equipment', '(21200) Agricultural machinery, equipment', '(21140) Wood, glass, plastic', '(12530) Chemicals-inorganic', '(12510) Chemicals-fertilizers', '(12150) Finished textile supplies', '(21120) Metalworking machine tools', '(40010) Apparel, household goods - wool', '(12550) Chemicals-other, n.e.c.', '(21600) Laboratory testing instruments', '(13100) Glass-plate, sheet, etc.', '(12070) Tobacco, waxes, etc.', '(42110) Gem stones, other', '(50040) Other (movies, miscellaneous imports, and special transactions)', '(13020) Stone, sand, cement, etc.', '(41140) Musical instruments', '(41100) Motorcycles and parts', '(21110) Food, tobacco machinery', '(22010) Parts-civilian aircraft', '(00180) Other foods']
# now read and clean the U.S. exports file
url = 'https://www.census.gov/foreign-trade/statistics/product/enduse/exports/c5700.html'
html = requests.get(url).content
df_list_us = pd.read_html(html)
df_end_use_us= df_list_us[-1]
df_end_use_us.to_csv('my data us.csv')
df_end_use_us = df_end_use_us[(df_end_use_us != 0).all(1)]
df_end_use_1_us = df_end_use_us.nlargest(70,'Value 2019')
df_end_use_1_us
End-Use Code | Value 2010 | Value 2011 | Value 2012 | Value 2013 | Value 2014 | Value 2015 | Value 2016 | Value 2017 | Value 2018 | Value 2019 | |
---|---|---|---|---|---|---|---|---|---|---|---|
133 | TOTAL | 91911081 | 104121524 | 110516616 | 121746189 | 123657203 | 115873365 | 115594784 | 129797587 | 120148141 | 106626775 |
88 | (22090) Civilian aircraft, engines, equipment,... | 5754879 | 6388494 | 8357556 | 12586701 | 13927126 | 15439704 | 14577081 | 16264439 | 18220787 | 10458939 |
83 | (21320) Semiconductors | 6410995 | 4607132 | 3892964 | 4737314 | 5510760 | 5976385 | 5955948 | 6076755 | 7117197 | 9113628 |
2 | (00100) Soybeans | 10864318 | 10508369 | 14879547 | 13304176 | 14485397 | 10494079 | 14212335 | 12230585 | 3129041 | 7991950 |
94 | (30000) Passenger cars, new and used | 3441170 | 5305700 | 5698113 | 8522223 | 11178484 | 9055568 | 8842569 | 10208474 | 6638100 | 7231631 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
111 | (41030) Household appliances | 172171 | 163815 | 161100 | 173186 | 156649 | 160376 | 162870 | 182692 | 205756 | 194372 |
55 | (12750) Industrial rubber products | 163349 | 181810 | 172857 | 204646 | 213004 | 195600 | 186298 | 210638 | 212998 | 191699 |
5 | (00210) Sorghum, barley, oats | 3877 | 241 | 284 | 96643 | 1467583 | 2116876 | 1030781 | 838817 | 521577 | 191684 |
33 | (12100) Iron and steel mill products | 175218 | 206232 | 204275 | 186074 | 216657 | 201045 | 161389 | 217445 | 220335 | 184759 |
68 | (21030) Excavating machinery | 273877 | 372710 | 339271 | 304497 | 248859 | 183482 | 163866 | 191132 | 184523 | 182237 |
70 rows × 11 columns
df_end_use1_us = df_end_use_1_us.set_index('End-Use Code').T.astype(float).drop(['TOTAL'],axis=1)
#let's apply the shift funciton, see which chinese industry got hurt the most. We can clearly see the last row is negative
df_end_use2_us=((df_end_use1_us - df_end_use1_us.shift()) / df_end_use1_us.shift())
df_end_use_trans_us= df_end_use2_us.T
df_end_use_trans_us.columns = ['2010', '2011', '2012','2013','2014','2015','2016','2017','2018','2019']
df_china_end_use_us = df_end_use_trans_us.T
#growth rate
df_end_use_trans_us['2013'].idxmax()
'(00210) Sorghum, barley, oats'
We seclect the top 10 HS2 commodities that were most affected by the tariff escalation
#China exports to U.S.
df_end_use_trans.nsmallest(10,'2019')
2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | |
---|---|---|---|---|---|---|---|---|---|---|
End-Use Code | ||||||||||
(13020) Stone, sand, cement, etc. | NaN | 0.021909 | 0.067991 | 0.165525 | 0.093907 | 0.195827 | 0.129091 | 0.166524 | 0.122414 | -0.548760 |
(30220) Automotive tires and tubes | NaN | 0.185733 | 0.284274 | 0.272377 | 0.168855 | -0.288444 | -0.257422 | -0.059118 | 0.237382 | -0.500792 |
(21320) Semiconductors | NaN | 0.555661 | -0.013246 | -0.023967 | 0.206043 | 0.144985 | 0.055277 | 0.127307 | -0.078197 | -0.454291 |
(12500) Plastic materials | NaN | 0.416418 | 0.004797 | 0.111070 | 0.310106 | -0.120030 | -0.065646 | 0.348477 | 0.217204 | -0.450288 |
(21301) Computer accessories | NaN | -0.018851 | 0.027295 | 0.019720 | 0.076329 | -0.023562 | -0.072355 | 0.120261 | 0.026890 | -0.424351 |
(12135) Synthetic cloth | NaN | 0.281204 | 0.093669 | 0.069574 | 0.056051 | 0.062712 | -0.058729 | 0.027167 | 0.082951 | -0.364455 |
(01000) Fish and shellfish | NaN | 0.110230 | 0.002442 | 0.011203 | 0.070062 | -0.097249 | -0.033415 | 0.076939 | 0.078985 | -0.342140 |
(21160) Measuring, testing, control instruments | NaN | 0.139496 | 0.139968 | 0.087184 | 0.061042 | 0.067711 | 0.031598 | 0.096015 | 0.016002 | -0.295595 |
(12550) Chemicals-other, n.e.c. | NaN | 0.551198 | 0.059874 | -0.030072 | 0.038569 | -0.051580 | -0.052498 | 0.048094 | 0.366366 | -0.294860 |
(15000) Iron and steel products, n.e.c. | NaN | 0.358181 | 0.116406 | -0.090753 | 0.158544 | -0.032102 | -0.069929 | 0.092477 | 0.153588 | -0.286197 |
#U.S. exports to China
df_end_use_trans_us.nsmallest(10,'2019')
2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | |
---|---|---|---|---|---|---|---|---|---|---|
End-Use Code | ||||||||||
(12210) Copper | NaN | 0.284117 | 0.017078 | -0.117972 | -0.136723 | -0.419547 | -0.033447 | 0.199860 | -0.213935 | -0.748894 |
(00210) Sorghum, barley, oats | NaN | -0.937839 | 0.178423 | 339.292254 | 14.185611 | 0.442423 | -0.513065 | -0.186232 | -0.378199 | -0.632491 |
(13100) Logs and lumber | NaN | 0.662561 | -0.151574 | 0.412149 | 0.160002 | -0.229387 | 0.247170 | 0.269094 | -0.107772 | -0.450961 |
(11120) Petroleum products, other | NaN | 0.574671 | 0.074208 | 0.376012 | -0.270624 | -0.105296 | -0.036773 | 0.243342 | -0.179879 | -0.435605 |
(22090) Civilian aircraft, engines, equipment, and parts | NaN | 0.110100 | 0.308220 | 0.506027 | 0.106495 | 0.108607 | -0.055870 | 0.115754 | 0.120284 | -0.425989 |
(12700) Synthetic rubber-primary | NaN | 0.085737 | 0.080699 | -0.140991 | -0.100860 | -0.293144 | -0.103461 | 0.087465 | -0.001362 | -0.402623 |
(00310) Dairy products and eggs | NaN | 0.524074 | 0.027630 | 1.013803 | -0.032044 | -0.427603 | -0.112393 | 0.503283 | -0.223736 | -0.387536 |
(12200) Aluminum and alumina | NaN | 0.336770 | -0.101715 | -0.049950 | -0.204781 | -0.232125 | -0.263336 | 0.206854 | -0.355763 | -0.373764 |
(30230) Other parts and accessories of vehicles | NaN | 0.153988 | 0.042625 | 0.761972 | 0.270673 | -0.110203 | 0.248302 | 0.166169 | 0.052884 | -0.322354 |
(10120) Hides and skins | NaN | 0.232361 | 0.127227 | 0.226956 | -0.095561 | -0.152762 | -0.233429 | -0.017189 | -0.364362 | -0.321571 |
most_us_list = df_end_use_trans_us.nsmallest(10,'2019').index.tolist()
most_us_list
['(12210) Copper', '(00210) Sorghum, barley, oats', '(13100) Logs and lumber', '(11120) Petroleum products, other', '(22090) Civilian aircraft, engines, equipment, and parts', '(12700) Synthetic rubber-primary', '(00310) Dairy products and eggs', '(12200) Aluminum and alumina', '(30230) Other parts and accessories of vehicles', '(10120) Hides and skins']
not_bad_us = list(df_end_use1_us.columns)
for i in not_bad_us:
if i in most_us_list:
not_bad_us.remove(i)
most_us_list.remove(df_end_use_trans_us['2013'].idxmax())
fig, ax = plt.subplots(2,figsize=(30,40))
df_china_end_use[not_bad].plot(ax=ax[0], color=[(0.7, 0.7, 0.7)], lw=0.4, legend=False)
df_china_end_use[most_china_list].plot(ax=ax[0] ,lw=2.5, legend=True,alpha = 0.8)
df_china_end_use_us[not_bad_us].plot(ax=ax[1], color=[(0.7, 0.7, 0.7)], lw=0.4, legend=False)
df_china_end_use_us[most_us_list].plot(ax=ax[1] ,lw=2.5, legend=True,alpha = 0.8)
ax[0].set_title("China Exports to U.S. Growth Rate, by 5-digit-End-Use Code",size = 25)
ax[0].text(8, 0.25, "Other Products")
ax[1].set_title("U.S. Exports to China Growth Rate, by 5-digit-End-Use Code",size = 25)
ax[1].text(8, 0.25, "Other Products")
ax[1].text(3.5, 1.5, "note: (00210) Sorghum, barley, oats should also be included, but flutuates too much on the graph")
for j in range(len(ax)):
ax[j].set_facecolor(background_1)
ax[j].spines["right"].set_visible(False)
ax[j].spines["top"].set_visible(False)
ax[j].set_ylabel("Growth Rate",size = 20)
ax[j].set_xlabel('Year')
fig.set_facecolor(background_1)
df_tariff.round(1)
Reporter Name | Partner Name | Trade Flow | Product Group | Indicator | 1992 | 1993 | 1994 | 1995 | 1996 | ... | 2018-09-24 00:00:00 | 2018-11-01 00:00:00 | 2019-01-01 00:00:00 | 2019-02-07 00:00:00 | 2019-05-10 00:00:00 | 2019-07-01 00:00:00 | 2019-09-01 00:00:00 | 2019-09-17 00:00:00 | 2019-12-26 00:00:00 | 2020-02-07 00:00:00 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | China | United States | Import | All Products | AHS Weighted Average (%) | 22.6 | 25.2 | 20.2 | NaN | 19.4 | ... | 18.3 | 18.2 | 16.5 | 16.5 | 20.7 | 20.7 | 21.8 | 21.1 | 20.9 | 20.9 |
1 | United States | China | Import | All Products | AHS Weighted Average (%) | 9.1 | 9.0 | NaN | 6.6 | 6.1 | ... | 12.0 | 12.0 | 12.0 | 12.0 | 17.6 | 17.6 | 21.0 | 21.0 | 21.0 | 21.0 |
2 rows × 49 columns
df_tariff_short = df_tariff.head(2).drop(['Partner Name', 'Trade Flow','Product Group','Indicator'], axis=1)
df_tariff_short.set_index("Reporter Name",inplace=True)
df_tariff_short
1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | ... | 2018-09-24 00:00:00 | 2018-11-01 00:00:00 | 2019-01-01 00:00:00 | 2019-02-07 00:00:00 | 2019-05-10 00:00:00 | 2019-07-01 00:00:00 | 2019-09-01 00:00:00 | 2019-09-17 00:00:00 | 2019-12-26 00:00:00 | 2020-02-07 00:00:00 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Reporter Name | |||||||||||||||||||||
China | 22.596588 | 25.216208 | 20.165407 | NaN | 19.432578 | 16.807201 | 16.158146 | 15.039880 | 17.136340 | 16.150239 | ... | 18.3 | 18.2 | 16.5 | 16.5 | 20.7 | 20.7 | 21.80000 | 21.10000 | 20.912747 | 20.912747 |
United States | 9.140252 | 8.979555 | NaN | 6.596073 | 6.109358 | 5.684370 | 4.911233 | 4.427894 | 4.112225 | 4.161655 | ... | 12.0 | 12.0 | 12.0 | 12.0 | 17.6 | 17.6 | 21.03562 | 21.03562 | 21.035620 | 21.026250 |
2 rows × 44 columns
df_new_tariff = df_tariff_short.T.reset_index()
df_new_tariff.to_csv(r'/Users/zihao/Desktop/UBC/Econ 323/Project/df_new_tariff.csv', index = False)
df_tariff_time= pd.read_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/Time_series_tariff.xlsx')
df_tariff_time.tail()
Year | China | United States | |
---|---|---|---|
39 | 2019-07-01 | 20.700000 | 17.60000 |
40 | 2019-09-01 | 21.800000 | 21.03562 |
41 | 2019-09-17 | 21.100000 | 21.03562 |
42 | 2019-12-26 | 20.912747 | 21.03562 |
43 | 2020-02-07 | 20.912747 | 21.02625 |
pd.to_datetime(df_tariff_time.Year, format='%Y%M%D')
0 1992-01-01 1 1993-01-01 2 1994-01-01 3 1995-01-01 4 1996-01-01 5 1997-01-01 6 1998-01-01 7 1999-01-01 8 2000-01-01 9 2001-01-01 10 2002-01-01 11 2003-01-01 12 2004-01-01 13 2005-01-01 14 2006-01-01 15 2007-01-01 16 2008-01-01 17 2009-01-01 18 2010-01-01 19 2011-01-01 20 2012-01-01 21 2013-01-01 22 2014-01-01 23 2015-01-01 24 2016-01-01 25 2017-01-01 26 2018-01-01 27 2018-02-07 28 2018-03-23 29 2018-04-02 30 2018-05-01 31 2018-07-01 32 2018-07-06 33 2018-08-23 34 2018-09-24 35 2018-11-01 36 2019-01-01 37 2019-02-07 38 2019-05-10 39 2019-07-01 40 2019-09-01 41 2019-09-17 42 2019-12-26 43 2020-02-07 Name: Year, dtype: datetime64[ns]
df_tariff_time.head()
Year | China | United States | |
---|---|---|---|
0 | 1992-01-01 | 22.596588 | 9.140252 |
1 | 1993-01-01 | 25.216208 | 8.979555 |
2 | 1994-01-01 | 20.165407 | NaN |
3 | 1995-01-01 | NaN | 6.596073 |
4 | 1996-01-01 | 19.432578 | 6.109358 |
df_tariff_time.set_index('Year',inplace=True)
df_tariff_time.head()
China | United States | |
---|---|---|
Year | ||
1992-01-01 | 22.596588 | 9.140252 |
1993-01-01 | 25.216208 | 8.979555 |
1994-01-01 | 20.165407 | NaN |
1995-01-01 | NaN | 6.596073 |
1996-01-01 | 19.432578 | 6.109358 |
df_tariff_time.rename(columns={"China": "China tariff on the U.S.'s exports", "United States": "U.S. tariff on China's exports"},inplace = True)
neworder = ["U.S. tariff on China's exports","China tariff on the U.S.'s exports"]
df_US_China = df_tariff_time.reindex(columns=neworder)
df_US_China = df_US_China.round(1)
df_US_China.head()
U.S. tariff on China's exports | China tariff on the U.S.'s exports | |
---|---|---|
Year | ||
1992-01-01 | 9.1 | 22.6 |
1993-01-01 | 9.0 | 25.2 |
1994-01-01 | NaN | 20.2 |
1995-01-01 | 6.6 | NaN |
1996-01-01 | 6.1 | 19.4 |
null_sum_tariff = df_US_China.isnull().sum()
null_sum_tariff
U.S. tariff on China's exports 1 China tariff on the U.S.'s exports 3 dtype: int64
df_trade_war = df_US_China.loc[df_US_China.index > '2017-01-01 00:00:00']
df_trade_war.head()
U.S. tariff on China's exports | China tariff on the U.S.'s exports | |
---|---|---|
Year | ||
2018-01-01 | 2.8 | 5.9 |
2018-02-07 | 3.2 | 5.9 |
2018-03-23 | 3.8 | 5.9 |
2018-04-02 | 3.8 | 8.4 |
2018-05-01 | 3.8 | 8.3 |
url = 'https://www.census.gov/foreign-trade/balance/c5700.html'
html = requests.get(url).content
df_list_us = pd.read_html(html)
df_us_trade_in_2020=df_list_us[0]
df_us_trade_in_2019 = df_list_us[1]
df_us_trade_in_2018 = df_list_us[2]
df_trade_flow = pd.concat([df_us_trade_in_2018, df_us_trade_in_2019,df_us_trade_in_2020])
df_trade_flow
Month | Exports | Imports | Balance | |
---|---|---|---|---|
0 | January 2018 | 9902.6 | 45765.6 | -35863.1 |
1 | February 2018 | 9759.9 | 39020.6 | -29260.7 |
2 | March 2018 | 12652.1 | 38327.6 | -25675.5 |
3 | April 2018 | 10503.8 | 38303.9 | -27800.1 |
4 | May 2018 | 10428.2 | 43965.7 | -33537.5 |
5 | June 2018 | 10860.1 | 44612.1 | -33752.0 |
6 | July 2018 | 10134.6 | 47120.6 | -36986.0 |
7 | August 2018 | 9285.9 | 47869.2 | -38583.3 |
8 | September 2018 | 9730.0 | 50015.0 | -40285.0 |
9 | October 2018 | 9139.9 | 52202.3 | -43062.5 |
10 | November 2018 | 8606.2 | 46500.8 | -37894.6 |
11 | December 2018 | 9144.9 | 45972.1 | -36827.2 |
12 | TOTAL 2018 | 120148.1 | 539675.6 | -419527.4 |
0 | January 2019 | 7134.3 | 41603.8 | -34469.5 |
1 | February 2019 | 8433.6 | 33194.4 | -24760.8 |
2 | March 2019 | 10426.5 | 31175.7 | -20749.1 |
3 | April 2019 | 7896.3 | 34798.9 | -26902.6 |
4 | May 2019 | 9074.5 | 39269.1 | -30194.6 |
5 | June 2019 | 9034.7 | 39002.3 | -29967.6 |
6 | July 2019 | 8733.7 | 41508.7 | -32775.0 |
7 | August 2019 | 9430.6 | 41187.3 | -31756.6 |
8 | September 2019 | 8597.7 | 40220.1 | -31622.5 |
9 | October 2019 | 8886.4 | 40145.7 | -31259.3 |
10 | November 2019 | 10103.3 | 36468.9 | -26365.7 |
11 | December 2019 | 8875.1 | 33668.4 | -24793.3 |
12 | TOTAL 2019 | 106626.8 | 452243.4 | -345616.7 |
0 | January 2020 | 7215.3 | 33280.6 | -26065.3 |
1 | February 2020 | 6815.0 | 22813.1 | -15998.1 |
2 | TOTAL 2020 | 14030.3 | 56093.6 | -42063.3 |
df_trade_flow = df_trade_flow.set_index('Month').drop(['TOTAL 2018','TOTAL 2019','TOTAL 2020'])
balance = df_trade_flow['Balance'].tolist() ## later annotate
df_trade_flow.drop('Balance',axis=1,inplace = True)
improve = (balance[9] - balance[25])/balance[9]
improve
0.6284911465892599
from datetime import datetime, timedelta
month = ['01-01-2018','02-01-2018','03-01-2018','04-01-2018','05-01-2018','06-01-2018','07-01-2018',
'08-01-2018','09-01-2018','10-01-2018','11-01-2018','12-01-2018','01-01-2019','02-01-2019','03-01-2019','04-01-2019','05-01-2019','06-01-2019',
'07-01-2019','08-01-2019','09-01-2019','10-01-2019','11-01-2019','12-01-2019','01-01-2020','02-01-2020']
#add it into the dataframe
df_trade_flow['month'] = month
df_trade_flow_clean = df_trade_flow.reset_index().drop('Month',axis=1)
df_trade_flow_clean.to_excel(r'/Users/zihao/Desktop/UBC/Econ 323/Project/df_trade_flow_clean.xlsx', index = False)
df_trade_flow_date = pd.read_excel (r'/Users/zihao/Desktop/UBC/Econ 323/Project/df_trade_flow_date.xlsx')
pd.to_datetime(df_trade_flow_date.Year, format='%Y%M%D')
df_trade_flow_date.head()
Exports | Imports | Year | |
---|---|---|---|
0 | 9902.6 | 45765.6 | 2018-01-01 |
1 | 9759.9 | 39020.6 | 2018-02-01 |
2 | 12652.1 | 38327.6 | 2018-03-01 |
3 | 10503.8 | 38303.9 | 2018-04-01 |
4 | 10428.2 | 43965.7 | 2018-05-01 |
df_trade_flow_date1 = df_trade_flow_date.set_index('Year')
The timeline is based on https://www.china-briefing.com/news/the-us-china-trade-war-a-timeline/
from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset
import matplotlib.patches as patches
from matplotlib.path import Path
style = dict(size=15, color=tuple(np.array([139,0,0]) / 255))
style2 = dict(size=20, color='gray')
style3 = dict(size=10, color='blue')
style4 = dict(size=10, color='red')
style5 = dict(size=15, color='black')
style6 = dict(size=10, color='gray')
style7 = dict(size=20, color='red')
fig, ax = plt.subplots(3,figsize=(30,40))
df_US_China.plot(ax=ax[0])
ax[0].set_title("Bilateral Tariff Rate AHS Simple Average (1992-2020)",size =40)
ax[0].set_ylabel('Effectively Applied Tariff Rate %', size =20)
ax[0].text('1996-01-01', 15, "WITS Missing Data ", ha='right', **style)
ax[0].text('2014-01-01', 5, "WITS Missing Data ", ha='right', **style)
ax[0].axvline(x="2001-12-11",linestyle = '--',color='gray')
ax[0].text('2014-01-01', 5, "WITS Missing Data ", ha='right', **style)
ax[0].annotate('China Enters WTO', xy=('2001-12-11', 20), xycoords='data',
xytext=(-30, 0), textcoords='offset points',
size=20, ha='right', va="center",
bbox=dict(boxstyle="round", alpha=0.1),
arrowprops=dict(arrowstyle="wedge,tail_width=0.5", alpha=0.1));
ax[0].text('2018-03-23', 15, "Tariff Escalation ", ha='left', **style6)
ax[0].axvline(x="2018-01-01", ymin=0 ,ymax=0.84,color="gray",linestyle = ':')
ax[0].axvline(x="2020-02-17", ymin=0 ,ymax=0.84,color="gray",linestyle = ':')
ax[0].hlines(y=22.4, xmin="2018-01-01",xmax="2020-02-17",color="gray",linestyle = ':')
ax[0].annotate("Trade Spat Period", xy=('2018-01-01', 20), xycoords='data',
xytext=(-30, 0), textcoords='offset points',
size=20, ha='right', va="center", color = 'red',
bbox=dict(boxstyle="round", alpha=0.1),
arrowprops=dict(arrowstyle="wedge,tail_width=0.5", alpha=0.1))
df_trade_war.plot(ax=ax[1])
ax[1].set_title("Zooming in the Trade War",size =40)
ax[1].set_ylabel('Effectively Applied Tariff Rate %', size =20)
ax[1].text('2018-02-07', 5.0, "Feburary 7, 2018 ", ha='right', **style3)
ax[1].text('2018-02-07', 4.5, "US Section 201 ", ha='right', **style3)
ax[1].text('2018-02-07', 4.0, "tariffs on solar panels ", ha='right', **style3)
ax[1].text('2018-03-23', 5.3, "March 23, 2018", ha='center', **style3)
ax[1].text('2018-03-23', 4.8, "US Section 232", ha='center', **style3)
ax[1].text('2018-03-23', 4.3, "tariffs on steel and aluminum", ha='center', **style3)
ax[1].text('2018-04-22', 14.5, "April 2, 2018", ha='center', **style4)
ax[1].text('2018-04-22', 14, "China's retaliation", ha='center', **style4)
ax[1].text('2018-04-22', 13.5, "US Section 232 tariffs", ha='center', **style4)
ax[1].text('2018-05-01', 12, "May 1, 2018", ha='center', **style4)
ax[1].text('2018-05-01', 11.5, "China's MFN tariff cut", ha='center', **style4)
ax[1].text('2018-05-01', 11, "on pharmeceuticals", ha='center', **style4)
ax[1].text('2018-07-01', 10, "July 1, 2018", ha='center', **style4)
ax[1].text('2018-07-01', 9.5, "China's MFN tariff cut on ", ha='center', **style4)
ax[1].text('2018-07-01', 9, "consumer goods, autos, and ITA products", ha='center', **style4)
ax[1].text('2018-07-06', 5.5, " July 6, 2018", ha='center', **style3)
ax[1].text('2018-07-06', 5, "US Section 301 tariffs ", ha='center', **style3)
ax[1].text('2018-07-06', 4.5, " ($34 billion)", ha='center', **style3)
ax[1].text('2018-07-06', 14.5, "July 6, 2018", ha='center', **style4)
ax[1].text('2018-07-06', 14, "China's retaliation", ha='center', **style4)
ax[1].text('2018-07-06', 13.5, "($34 billion)", ha='center', **style4)
ax[1].axvline(x="2018-07-06",linestyle = '--',color='lightgray')
ax[1].text('2018-07-06', 19, '1st round Retaliation', ha='center', **style5)
ax[1].axvline(x="2018-08-23",linestyle = '--',color='lightgray')
ax[1].text('2018-08-23', 21, '2nd round Retaliation', ha='center', **style5)
ax[1].text('2018-08-23', 7.3, 'August 23, 2018', ha='center', **style3)
ax[1].text('2018-08-23', 6.8, "US Section 301 tariffs", ha='center', **style3)
ax[1].text('2018-08-23', 6.3, "($16 billion)", ha='center', **style3)
ax[1].text('2018-08-23', 17.5, "August 23, 2018", ha='center', **style4)
ax[1].text('2018-08-23', 17, "China's retaliation", ha='center', **style4)
ax[1].text('2018-08-23', 16.5, "($34 billion)", ha='center', **style4)
ax[1].text('2018-09-24', 9.5, "September 24, 2018", ha='center', **style3)
ax[1].text('2018-09-24', 9, "US Section 301 tariffs", ha='center', **style3)
ax[1].text('2018-09-24', 8.5, "($200 billion)", ha='center', **style3)
ax[1].text('2018-09-24', 20, "September 24, 2018", ha='center', **style4)
ax[1].text('2018-09-24', 19.5, "China's retaliation", ha='center', **style4)
ax[1].text('2018-09-24', 19, "($60 billion)", ha='center', **style4)
ax[1].axvline(x="2018-09-24",linestyle = '--',color='lightgray')
ax[1].text('2018-09-24', 22, '3rd round Retaliation', ha='center', **style5)
ax[1].text('2018-11-01', 17, "November 1, 2018", ha='center', **style4)
ax[1].text('2018-11-01', 16.5, "China's MFN tariff cut", ha='center', **style4)
ax[1].text('2018-11-01', 16, "on industry goods", ha='center', **style4)
ax[1].text('2019-05-10', 15, "May 10, 2019", ha='center', **style3)
ax[1].text('2019-05-10', 14.5, "US Section 301 tariffs", ha='center', **style3)
ax[1].text('2019-05-10', 14, "($200 billion)", ha='center', **style3)
ax[1].text('2019-05-10', 20, "June 1, 2019", ha='center', **style4)
ax[1].text('2019-05-10', 19.5, "China's retaliation", ha='center', **style4)
ax[1].text('2019-05-10', 19, "($60 billion)", ha='center', **style4)
ax[1].axvline(x="2019-02-07",linestyle = '--',color='lightgray')
ax[1].text('2019-02-07', 19, '4th round Retaliation', ha='center', **style5)
ax[1].text('2019-09-01', 19.5, "September 1, 2019", ha='center', **style3)
ax[1].text('2019-09-01', 19, "US Section 301 tariffs", ha='center', **style3)
ax[1].text('2019-09-01', 18.5, "($300 billion)", ha='center', **style3)
ax[1].text('2019-09-01', 23, "September 1, 2019", ha='center', **style4)
ax[1].text('2019-09-01', 22.5, "China's retaliation", ha='center', **style4)
ax[1].text('2019-09-01', 22, "($75 billion)", ha='center', **style4)
ax[1].axvline(x="2019-07-01",linestyle = '--',color='lightgray')
ax[1].text('2019-07-01', 10, '5th round Retaliation', ha='center', **style5)
ax[1].annotate('Levels around 21%', xy=('2019-11-30', 21), xycoords='data', ha='center',
xytext=(0, -20), textcoords='offset points')
ax[1].annotate('', xy=('2019-09-17', 20.9), xytext=('2020-02-01', 20.9),
xycoords='data', textcoords='data',
arrowprops={'arrowstyle': '|-|,widthA=0.2,widthB=0.2', })
ax[1].axvline(x="2019-09-17",linestyle = '--',color='lightgray')
ax[1].text('2019-09-17', 12.5, 'Temporary Pause', ha='center', **style6)
df_trade_flow_date1.plot(ax=ax[2])
ax[2].set_title("2018-2020 U.S. imports and exports from China",size =40)
ax[2].set_ylabel('Million USD', size =20)
ax[2].axvspan('2018-03-28', '2018-07-01', color="grey", alpha=0.05)
ax[2].axvspan('2018-07-01', '2018-08-23', color="grey", alpha=0.1)
ax[2].axvspan('2018-08-23', '2018-09-24', color="grey", alpha=0.15)
ax[2].axvspan('2019-02-07', '2019-07-01', color="grey", alpha=0.05)
ax[2].axvspan('2019-07-01', '2019-09-01', color="grey", alpha=0.1)
ax[2].text('2018-01-01', 25000, 'trade balnce', ha='center', **style7)
ax[2].text('2018-01-01', 23000, balance[0], ha='center', **style7)
ax[2].text('2018-01-01', 21000, 'January 2018', ha='center', **style7)
ax[2].text('2018-10-01', 40000, 'trade balnce', ha='center', **style7)
ax[2].text('2018-10-01', 38000, balance[9], ha='center', **style7)
ax[2].text('2018-10-01', 36000, 'October 2018', ha='center', **style7)
ax[2].text('2019-03-01', 20000, 'trade balnce', ha='center', **style7)
ax[2].text('2019-03-01', 18000, balance[12], ha='center', **style7)
ax[2].text('2019-03-01', 16000, 'March 2019', ha='center', **style7)
ax[2].text('2020-02-01', 18000, 'trade balnce', ha='center', **style7)
ax[2].text('2020-02-01', 16000, balance[25], ha='center', **style7)
ax[2].text('2020-02-01', 14000, 'Feburary 2020', ha='center', **style7)
ax[2].text('2018-05-15', 29000, '1st', ha='center', **style2)
ax[2].text('2018-08-01', 29000, '2nd', ha='center', **style2)
ax[2].text('2018-09-10', 29000, '3rd', ha='center', **style2)
ax[2].text('2019-05-01', 29000, '4th', ha='center', **style2)
ax[2].text('2019-08-01', 29000, '5th', ha='center', **style2)
ax[2].text('2020-01-01', 45000, '**Since October 2018, trade deficit', ha='center', **style5)
ax[2].text('2020-01-01', 43500, 'has been reduced by', ha='center', **style5)
ax[2].text('2020-01-01', 42000, "{:%}".format(improve), ha='center', **style5)
ax[2].text('2018-07-01', 26000, 'Adjustment Lag', ha='center', **style2)
ax[2].annotate('', xy=('2018-03-28', 25000), xytext=('2018-10-01', 25000),
xycoords='data', textcoords='data',
arrowprops={'arrowstyle': '|-|,widthA=0.5,widthB=0.5', })
ax[2].text('2019-05-01', 26000, 'Adjustment Lag', ha='center', **style2)
ax[2].annotate('', xy=('2019-02-07', 25000), xytext=('2019-07-01', 25000),
xycoords='data', textcoords='data',
arrowprops={'arrowstyle': '|-|,widthA=0.5,widthB=0.5', })
for j in range(len(ax)):
ax[j].spines["right"].set_visible(False)
ax[j].spines["top"].set_visible(False)
ax[j].legend(fontsize=20)
ax[j].tick_params(axis='both', which='both', labelsize=20)
ax[j].tick_params(which="both", left=False)
ax[j].set_facecolor(background_1)
fig.set_facecolor(background_1)
This pair of state datasets come from USCBC, the U.S.-China Business Council State Export Report 2020. https://www.uschina.org/reports/2020-state-export-report
And the bokeh map plot is the modification based on https://towardsdatascience.com/walkthrough-mapping-basics-with-bokeh-and-geopandas-in-python-43f40aa5b7e9
import geopandas as gpd
contiguous_usa = gpd.read_file('/Users/zihao/Desktop/UBC/Econ 323/Project/cb_2018_us_state_20m.shp')
contiguous_usa.head()
STATEFP | STATENS | AFFGEOID | GEOID | STUSPS | NAME | LSAD | ALAND | AWATER | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 24 | 01714934 | 0400000US24 | 24 | MD | Maryland | 00 | 25151100280 | 6979966958 | MULTIPOLYGON (((-76.04621 38.02553, -76.00734 ... |
1 | 19 | 01779785 | 0400000US19 | 19 | IA | Iowa | 00 | 144661267977 | 1084180812 | POLYGON ((-96.62187 42.77925, -96.57794 42.827... |
2 | 10 | 01779781 | 0400000US10 | 10 | DE | Delaware | 00 | 5045925646 | 1399985648 | POLYGON ((-75.77379 39.72220, -75.75323 39.757... |
3 | 39 | 01085497 | 0400000US39 | 39 | OH | Ohio | 00 | 105828882568 | 10268850702 | MULTIPOLYGON (((-82.86334 41.69369, -82.82572 ... |
4 | 42 | 01779798 | 0400000US42 | 42 | PA | Pennsylvania | 00 | 115884442321 | 3394589990 | POLYGON ((-80.51989 40.90666, -80.51964 40.987... |
state_pop = pd.read_csv('/Users/zihao/Desktop/UBC/Econ 323/Project/state_pop_2018 copy.csv')
state_pop.head()
SUMLEV | REGION | DIVISION | STATE | NAME | Goods2018 | Service2018 | Total2018 | GoodsExports2019 | Relative_Change | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 10 | 0 | 0 | 0 | United States | 118.196 | 56.212 | 174.408 | 104.4325 | -0.12 |
1 | 40 | 3 | 6 | 1 | Alabama | 3.000 | 0.400 | 3.400 | 2.3000 | -0.23 |
2 | 40 | 4 | 9 | 2 | Alaska | 1.000 | 0.300 | 1.300 | 0.9000 | -0.10 |
3 | 40 | 4 | 8 | 4 | Arizona | 1.200 | 1.000 | 2.200 | 1.1000 | -0.08 |
4 | 40 | 3 | 7 | 5 | Arkansas | 0.400 | 0.200 | 0.600 | 0.5000 | 0.25 |
# Merge datasets
pop_states = contiguous_usa.merge(state_pop, left_on = 'NAME', right_on = 'NAME')
# Drop Alaska and Hawaii
pop_states = pop_states.loc[~pop_states['NAME'].isin(['Alaska', 'Hawaii'])]
import json
from bokeh.io import show
from bokeh.models import (CDSView, ColorBar, ColumnDataSource, CustomJS,
CustomJSFilter, GeoJSONDataSource, HoverTool,
LinearColorMapper, Slider)
from bokeh.layouts import column, row, widgetbox
from bokeh.palettes import brewer
from bokeh.plotting import figure
from bokeh.io import output_notebook
# Input GeoJSON source that contains features for plotting
geosource = GeoJSONDataSource(geojson = pop_states.to_json())
# Define color palettes
palette = brewer['RdPu'][8]
palette = palette[::-1] # reverse order of colors so higher values have darker colors
# Instantiate LinearColorMapper that linearly maps numbers in a range, into a sequence of colors.
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 28)
# Define custom tick labels for color bar.
tick_labels = {'0': '0', '0.5': '0.5',
'1':'1', '2':'2',
'4':'4', '6':'6',
'8':'8', '10':'10',
'12':'12+'}
# Create color bar.
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal',
major_label_overrides = tick_labels)
# Create figure object.
p = figure(title = 'U.S. Good and Service Export to China by States , 2018, USD Billion', plot_height = 600 ,
plot_width = 950, toolbar_location = 'below',
tools = "pan, wheel_zoom, box_zoom, reset")
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
# Add patch renderer to figure.
states = p.patches('xs','ys', source = geosource,
fill_color = {'field' :'Total2018', 'transform' : color_mapper},
line_color = 'gray', line_width = 0.25, fill_alpha = 1)
# Create hover tool
p.add_tools(HoverTool(renderers = [states],
tooltips = [('State','@NAME'),('TotalExportValue', '@Total2018'),('GoodExport', '@Goods2018'),
('ServiceExport','@Service2018')]))
# Specify layout
p.add_layout(color_bar, 'below')
output_notebook()
show(p)
file:///private/var/folders/5d/gs1fcrm524j0m4mvfqlchx5w0000gn/T/tmpldrqtmfb.html
"U.S. Exports to China by States.mov"
Texas, Carlifonia, Washington are among the top state exporters to China in 2018.
from IPython.display import Image
Image("/Users/zihao/Desktop/UBC/Econ 323/Project/U.S. Export State.png")
Though there were some states still witnessed a growth in exports to China, the major exporters, however, suffered from a decline. (As most parts of the heating map are darken) The 2019 Service Exports to China has not been released, so the contraction was only based on good exports.
# Define color palettes
palette = brewer['Blues'][8]
# Instantiate LinearColorMapper that linearly maps numbers in a range, into a sequence of colors.
color_mapper = LinearColorMapper(palette = palette, low = -2, high = 2)
# Define custom tick labels for color bar.
tick_labels = {'-2': '-2', '-1': '-1',
'-0.5':'-0.5', '-0.2':'-0.2',
'0':'0', '0.2':'0.2',
'0.4':'0.4', '0.6':'0.6',
'0.8':'0.8+'}
# Create color bar.
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal',
major_label_overrides = tick_labels)
# Create figure object.
p = figure(title = 'U.S. Good Exports to China Relative Change by Percentage, 2019', plot_height = 600 ,
plot_width = 950, toolbar_location = 'below',
tools = "pan, wheel_zoom, box_zoom, reset")
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
# Add patch renderer to figure.
states = p.patches('xs','ys', source = geosource,
fill_color = {'field' :'Relative_Change', 'transform' : color_mapper},
line_color = 'gray', line_width = 0.25, fill_alpha = 1)
# Create hover tool
p.add_tools(HoverTool(renderers = [states],
tooltips = [('State','@NAME'),('GrowthRate', '@Relative_Change'),('GoodExport2018', '@Goods2018'),
('GoodsExport2019','@GoodsExports2019')]))
# Specify layout
p.add_layout(color_bar, 'below')
output_notebook()
show(p)
You can also look at the video attached to the document "Relative change in flow.mov"
from IPython.display import Image
Image("/Users/zihao/Desktop/UBC/Econ 323/Project/Relative Change.png")
df
iso_o | iso_d | year | FLOW | |
---|---|---|---|---|
0 | ABW | AFG | 2001 | 0.0 |
1 | ABW | AFG | 2002 | 0.0 |
2 | ABW | AFG | 2003 | 0.0 |
3 | ABW | AFG | 2004 | 0.0 |
4 | ABW | AFG | 2005 | 0.0 |
... | ... | ... | ... | ... |
537583 | ZWE | ZMB | 2010 | 46558256.0 |
537584 | ZWE | ZMB | 2011 | 55171376.0 |
537585 | ZWE | ZMB | 2012 | 58787336.0 |
537586 | ZWE | ZMB | 2013 | 77869776.0 |
537587 | ZWE | ZMB | 2014 | 62405360.0 |
537588 rows × 4 columns
df = df[['year', 'iso_o', 'iso_d', 'FLOW']]
df
year | iso_o | iso_d | FLOW | |
---|---|---|---|---|
0 | 2001 | ABW | AFG | 0.0 |
1 | 2002 | ABW | AFG | 0.0 |
2 | 2003 | ABW | AFG | 0.0 |
3 | 2004 | ABW | AFG | 0.0 |
4 | 2005 | ABW | AFG | 0.0 |
... | ... | ... | ... | ... |
537583 | 2010 | ZWE | ZMB | 46558256.0 |
537584 | 2011 | ZWE | ZMB | 55171376.0 |
537585 | 2012 | ZWE | ZMB | 58787336.0 |
537586 | 2013 | ZWE | ZMB | 77869776.0 |
537587 | 2014 | ZWE | ZMB | 62405360.0 |
537588 rows × 4 columns
df_year = df.set_index(["year"])
df_year
iso_o | iso_d | FLOW | |
---|---|---|---|
year | |||
2001 | ABW | AFG | 0.0 |
2002 | ABW | AFG | 0.0 |
2003 | ABW | AFG | 0.0 |
2004 | ABW | AFG | 0.0 |
2005 | ABW | AFG | 0.0 |
... | ... | ... | ... |
2010 | ZWE | ZMB | 46558256.0 |
2011 | ZWE | ZMB | 55171376.0 |
2012 | ZWE | ZMB | 58787336.0 |
2013 | ZWE | ZMB | 77869776.0 |
2014 | ZWE | ZMB | 62405360.0 |
537588 rows × 3 columns
df_2014 = df_year.loc[2014]
df_2014.reset_index(drop=True,inplace=True)
df_2014
iso_o | iso_d | FLOW | |
---|---|---|---|
0 | ABW | AFG | 0.000000e+00 |
1 | ABW | AGO | 0.000000e+00 |
2 | ABW | ALB | 0.000000e+00 |
3 | ABW | ARE | 3.018303e+05 |
4 | ABW | ARG | 0.000000e+00 |
... | ... | ... | ... |
36966 | ZWE | VUT | 0.000000e+00 |
36967 | ZWE | WSM | 0.000000e+00 |
36968 | ZWE | YEM | 4.629711e+05 |
36969 | ZWE | ZAF | 1.235077e+08 |
36970 | ZWE | ZMB | 6.240536e+07 |
36971 rows × 3 columns
df_2014["FLOW"] = df_2014["FLOW"].replace('nan', np.nan).fillna(0)
/Users/zihao/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy """Entry point for launching an IPython kernel.
df_2014.head(6)
iso_o | iso_d | FLOW | |
---|---|---|---|
0 | ABW | AFG | 0.0000 |
1 | ABW | AGO | 0.0000 |
2 | ABW | ALB | 0.0000 |
3 | ABW | ARE | 301830.3125 |
4 | ABW | ARG | 0.0000 |
5 | ABW | ARM | 0.0000 |
df_2014.to_csv(r'/Users/zihao/Desktop/UBC/Econ 323/Project/df_2014_prep .csv', index = False)
df_2014_copy = df_2014.copy()
aggregate_exports = df_2014_copy.groupby('iso_o')['FLOW'].sum().reset_index()
aggregate_exports
iso_o | FLOW | |
---|---|---|
0 | ABW | 3.859146e+08 |
1 | AFG | 4.465384e+08 |
2 | AGO | 3.754324e+10 |
3 | ALB | 1.401560e+09 |
4 | ARE | 1.447143e+11 |
... | ... | ... |
191 | YEM | 6.121867e+09 |
192 | YUG | 0.000000e+00 |
193 | ZAF | 8.141608e+10 |
194 | ZMB | 4.169583e+09 |
195 | ZWE | 1.785179e+09 |
196 rows × 2 columns
I only selected the top 100 exporters, since low trade flow value will dramatically change the eigenvector values and distortes the node's size.
top100_aggregate_exports = aggregate_exports.nlargest(100,'FLOW')
top100 = top100_aggregate_exports['iso_o'].values.tolist()
top100_2014 = df_2014_copy[df_2014_copy ['iso_o'].isin(top100)]
top100_2014_square = top100_2014[top100_2014['iso_d'].isin(top100)]
top100_2014_square
iso_o | iso_d | FLOW | |
---|---|---|---|
374 | AGO | ARE | 0.000000e+00 |
375 | AGO | ARG | 3.211250e+03 |
377 | AGO | AUS | 3.731991e+07 |
378 | AGO | AUT | 2.813790e+04 |
379 | AGO | AZE | 1.823190e+03 |
... | ... | ... | ... |
36769 | ZMB | USA | 3.380583e+07 |
36772 | ZMB | VEN | 0.000000e+00 |
36773 | ZMB | VNM | 1.914714e+05 |
36776 | ZMB | YEM | 0.000000e+00 |
36777 | ZMB | ZAF | 2.234939e+08 |
9900 rows × 3 columns
top100_2014_square.to_csv(r'/Users/zihao/Desktop/UBC/Econ 323/Project/top100_2014_square.csv', index = False)
https://en.wikipedia.org/wiki/Eigenvector_centrality
In graph theory, eigenvector centrality (also called eigencentrality or prestige score) is a measure of the influence of a node in a network. Relative scores are assigned to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. A high eigenvector score means that a node is connected to many nodes who themselves have high scores.
$$ {\displaystyle x_{v}={\frac {1}{\lambda }}\sum _{t\in M(v)}x_{t}={\frac {1}{\lambda }}\sum _{t\in G}a_{v,t}x_{t}} $$where $M(v)$ is a set of the neighbors of $v$ and $\lambda$ is a constant. With a small rearrangement this can be rewritten in vector notation as the eigenvector equation where $${\displaystyle \mathbf {Ax} =\lambda \mathbf {x} }$$
import networkx as nx
import csv
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.axes_grid1 import make_axes_locatable
%matplotlib inline
Here I apply right eigen cenrality for export to consturct the network grah based on ComputSimu http://computsimu.blogspot.com/2014/01/networks-1-scraping-data-visualization.html and https://briandew.wordpress.com/2016/06/15/trade-network-analysis-why-centrality-matters/
G = nx.DiGraph() # create a directed graph called G
# Loop reads a csv file with scrap aluminum bilateral trade data
with open('/Users/zihao/Desktop/UBC/Econ 323/Project/top100_2014_square.csv', 'r') as csvfile:
csv_f = csv.reader(csvfile)
next(csv_f)
# Now we build the network by adding each row of data
# as an edge between two nodes (columns 1 and 2).
for row in csv_f:
G.add_edge(row[0],row[1],weight=row[2])
ec = nx.eigenvector_centrality_numpy(G, weight='weight')
nx.set_node_attributes(G,ec,'cent')
node_color = [float(G.nodes[v]['cent']) for v in G]
totexp = {}
# Calculate total exports of each country in the network
for exp in G.nodes():
tx=sum([float(g) for exp,f,g in G.out_edges(exp, 'weight')])
totexp[exp] = tx
avgexp = np.mean(tx)
nx.set_node_attributes(G, totexp, 'totexp')
# Use the results later for the node's size in the graph
node_size = [float(G.nodes[v]['totexp'])*50 / avgexp for v in G]
fig = plt.figure(figsize = (30,18))
nodes = nx.draw_networkx_nodes(G,pos, node_size=node_size, \
node_color=node_color, alpha=0.3)
# Draw edges
edges = nx.draw_networkx_edges(G, pos, edge_color='lightgray', \
arrows=False, width=0.01,)
# Visualization
# Calculate position of each node in G using networkx spring layout
pos = nx.spring_layout(G,k=500,iterations=50)
# Draw nodes
nx.draw_networkx_nodes(G,pos, node_size=node_size, node_color=node_color, alpha=0.5)
# Draw edges
nx.draw_networkx_edges(G, pos, edge_color='gray', arrows=False, width=0.05,)
# Add labels
nx.draw_networkx_labels(G,pos,font_size=15)
nodes.set_edgecolor('gray')
# Add labels and title
plt.title('Global trade network, 2014', fontsize=40)
# Bar with color scale for eigenvalues
cbar = plt.colorbar(mappable=nodes, cax=None, ax=None, fraction=0.015, pad=0.04)
plt.margins(0,0)
plt.axis('off')
plt.show()
plt.savefig("trade.pdf")
<Figure size 432x288 with 0 Axes>
This dataset captures the trade flow and the Aggregate Bilateral Data from CEPII TRADHIST. Two Centuries of Bilateral Trade and Gravity Data: 1827-2014 http://www.cepii.fr/cepii/en/bdd_modele/download.asp?id=32 CEPII Working Paper, N°2016-14, mai 2016 Michel Fouquin, Jules Hugot.
I also merged it with the WITS tariff data.
df_china = df.set_index('iso_o').loc['CHN']
df_china
iso_d | year | FLOW | |
---|---|---|---|
iso_o | |||
CHN | ABW | 2001 | 4.046692e+06 |
CHN | ABW | 2002 | 2.313028e+06 |
CHN | ABW | 2003 | 1.456348e+06 |
CHN | ABW | 2004 | 2.851764e+06 |
CHN | ABW | 2005 | 4.361033e+06 |
... | ... | ... | ... |
CHN | ZWE | 2010 | 2.227241e+08 |
CHN | ZWE | 2011 | 2.785391e+08 |
CHN | ZWE | 2012 | 2.997592e+08 |
CHN | ZWE | 2013 | 2.908296e+08 |
CHN | ZWE | 2014 | 2.699251e+08 |
2782 rows × 3 columns
df_us= df.set_index('iso_o').loc['USA']
df_us
iso_d | year | FLOW | |
---|---|---|---|
iso_o | |||
USA | ABW | 2001 | 319394048.0 |
USA | ABW | 2002 | 307090048.0 |
USA | ABW | 2003 | 286488672.0 |
USA | ABW | 2004 | 265345168.0 |
USA | ABW | 2005 | 316823584.0 |
... | ... | ... | ... |
USA | ZWE | 2010 | 47981852.0 |
USA | ZWE | 2011 | 42290668.0 |
USA | ZWE | 2012 | 37239684.0 |
USA | ZWE | 2013 | 42587544.0 |
USA | ZWE | 2014 | 32555730.0 |
2743 rows × 3 columns
df_04_16_gravity = pd.read_csv(r'/Users/zihao/Desktop/UBC/Econ 323/Project/gravity dataset/DATASET/2004-2016 gravity.csv')
df_04_16_gravity
year | country_d | iso3_d | dynamic_code_d | landlocked_d | island_d | region_d | gdp_pwt_const_d | pop_d | gdp_pwt_cur_d | ... | hostility_level_o | hostility_level_d | distance | common_language | colony_of_destination_after45 | colony_of_destination_current | colony_of_destination_ever | colony_of_origin_after45 | colony_of_origin_current | colony_of_origin_ever | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2005 | Aruba | ABW | ABW | 0 | 1 | caribbean | 3906.5203 | 0.100031 | 4093.2434 | ... | 0 | 0 | 120.05867 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 2006 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4118.1396 | 0.100830 | 4217.0669 | ... | 0 | 0 | 978.77728 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 2007 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4196.4634 | 0.101218 | 4248.4707 | ... | 0 | 0 | 8563.69630 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 2008 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4433.6772 | 0.101342 | 4441.8828 | ... | 0 | 0 | 7562.67330 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 2009 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4183.0449 | 0.101416 | 4304.9224 | ... | 0 | 0 | 16904.59600 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
746043 | 2016 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | NaN | NaN | NaN | ... | 0 | 0 | 10163.50800 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
746044 | 2016 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | NaN | NaN | NaN | ... | 0 | 0 | 11205.32000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
746045 | 2016 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | NaN | NaN | NaN | ... | 0 | 0 | 15422.45000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
746046 | 2016 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | NaN | NaN | NaN | ... | 0 | 0 | 10944.65400 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
746047 | 2016 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | NaN | NaN | NaN | ... | 0 | 0 | 16100.47300 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
746048 rows × 70 columns
df_93_04_gravity = pd.read_csv(r'/Users/zihao/Desktop/UBC/Econ 323/Project/gravity dataset/DATASET/1993-2004 gravity.csv')
df_93_04_gravity
year | country_d | iso3_d | dynamic_code_d | landlocked_d | island_d | region_d | gdp_pwt_const_d | pop_d | gdp_pwt_cur_d | ... | hostility_level_o | hostility_level_d | distance | common_language | colony_of_destination_after45 | colony_of_destination_current | colony_of_destination_ever | colony_of_origin_after45 | colony_of_origin_current | colony_of_origin_ever | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1993 | Aruba | ABW | ABW | 0 | 1 | caribbean | 3642.4900 | 0.072498 | 3967.2341 | ... | 0 | 0 | 2261.8384 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1994 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4120.8423 | 0.076700 | 4375.1250 | ... | 0 | 0 | 18336.3320 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1995 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4051.7637 | 0.080326 | 4159.1152 | ... | 0 | 0 | 13977.8780 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 1996 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4171.4653 | 0.083195 | 4185.3594 | ... | 0 | 0 | 4349.6777 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 1997 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4420.3481 | 0.085447 | 4387.0752 | ... | 0 | 0 | 1130.6757 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
716890 | 2004 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 20304.7710 | 12.867828 | 20631.4080 | ... | 0 | 0 | 11802.5430 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
716891 | 2004 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 20304.7710 | 12.867828 | 20631.4080 | ... | 0 | 0 | 12886.2270 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
716892 | 2004 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 20304.7710 | 12.867828 | 20631.4080 | ... | 0 | 0 | 10374.1750 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
716893 | 2004 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 20304.7710 | 12.867828 | 20631.4080 | ... | 0 | 0 | 7659.8032 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
716894 | 2004 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 20304.7710 | 12.867828 | 20631.4080 | ... | 0 | 0 | 11158.9170 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
716895 rows × 70 columns
df_gravity = pd.read_stata(r'/Users/zihao/Desktop/UBC/Econ 323/Project/TRADHIST_WP.dta')
df_gravity
iso_o | iso_d | year | FLOW | FLOW_0 | SOURCE_TF | GDP_o | SOURCE_GDP_o | GDP_d | SOURCE_GDP_d | ... | Indepdate_o | Indepdate_d | XCH_RATE_o | SOURCE_XCH_o | XCH_RATE_d | SOURCE_XCH_d | CONTI_o | CONTI_d | REGIO_o | REGIO_d | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADEN | 1950 | NaN | NaN | NaN | NaN | ... | NaN | 1967.0 | NaN | NaN | ASIA | MIDEST | ||||||||
1 | ADEN | 1951 | NaN | NaN | NaN | NaN | ... | NaN | 1967.0 | NaN | NaN | ASIA | MIDEST | ||||||||
2 | ADEN | 1952 | NaN | NaN | NaN | NaN | ... | NaN | 1967.0 | NaN | NaN | ASIA | MIDEST | ||||||||
3 | ADEN | 1953 | NaN | NaN | NaN | NaN | ... | NaN | 1967.0 | NaN | NaN | ASIA | MIDEST | ||||||||
4 | ADEN | 1954 | NaN | NaN | NaN | NaN | ... | NaN | 1967.0 | NaN | NaN | ASIA | MIDEST | ||||||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2510149 | ZWE | ZMB | 2010 | 46558256.0 | NaN | DOTS_IP | 6.097825e+09 | WDI | 1.311544e+10 | WDI | ... | 1980.0 | 1964.0 | NaN | 0.134909 | IFS | AFRI | AFRI | STHAFR | STHAFR | |
2510150 | ZWE | ZMB | 2011 | 55171376.0 | NaN | DOTS_IP | 6.838230e+09 | WDI | 1.481194e+10 | WDI | ... | 1980.0 | 1964.0 | NaN | 0.128406 | IFS | AFRI | AFRI | STHAFR | STHAFR | |
2510151 | ZWE | ZMB | 2012 | 58787336.0 | NaN | DOTS_IP | 7.845171e+09 | WDI | 1.578776e+10 | WDI | ... | 1980.0 | 1964.0 | NaN | 0.122987 | IFS | AFRI | AFRI | STHAFR | STHAFR | |
2510152 | ZWE | ZMB | 2013 | 77869776.0 | NaN | DOTS_IP | 8.629172e+09 | WDI | 1.715622e+10 | WDI | ... | 1980.0 | 1964.0 | NaN | 0.118546 | IFS | AFRI | AFRI | STHAFR | STHAFR | |
2510153 | ZWE | ZMB | 2014 | 62405360.0 | NaN | DOTS_IP | 8.627889e+09 | WDI | 1.644896e+10 | WDI | ... | 1980.0 | 1964.0 | NaN | 0.098773 | IFS | AFRI | AFRI | STHAFR | STHAFR |
2510154 rows × 60 columns
df_gravity_1 = df_gravity[['iso_o','iso_d','year','FLOW','GDP_o','GDP_d','IPTOT_o','IPTOT_d','XPTOT_o','XPTOT_d','TARIFF_o','TARIFF_d',
'Dist','Comlang','Contig','Curcol']]
df_gravity_2 = df_gravity[['iso_o','iso_d','year','FLOW','GDP_o','GDP_d','IPTOT_o','IPTOT_d','XPTOT_o','XPTOT_d','TARIFF_d',
'Dist','Comlang','Contig','Curcol']]
df_us_gravity_1= df_gravity_1.set_index('iso_o').loc['USA']
df_us_92 = df_us_gravity_1[df_us_gravity_1['year'] >= 1945]
df_us_92
iso_d | year | FLOW | GDP_o | GDP_d | IPTOT_o | IPTOT_d | XPTOT_o | XPTOT_d | TARIFF_o | TARIFF_d | Dist | Comlang | Contig | Curcol | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
iso_o | |||||||||||||||
USA | ABW | 1989 | 77439896.0 | 3.457829e+12 | NaN | 3.015228e+11 | 2.457325e+08 | 2.283166e+11 | 1.997382e+07 | 3.463053 | NaN | 3859.885498 | 1 | 0.0 | 0.0 |
USA | ABW | 1990 | 113874384.0 | 3.367567e+12 | NaN | 2.911732e+11 | 3.271146e+08 | 2.288749e+11 | 1.353272e+07 | 3.515915 | NaN | 3859.885498 | 1 | 0.0 | 0.0 |
USA | ABW | 1991 | 132908312.0 | 3.500775e+12 | NaN | 2.887921e+11 | 7.954365e+08 | 2.542448e+11 | 8.829330e+07 | 3.430391 | NaN | 3859.885498 | 1 | 0.0 | 0.0 |
USA | ABW | 1992 | 164151888.0 | 3.725922e+12 | NaN | 3.148645e+11 | 7.441439e+08 | 2.715215e+11 | 2.035974e+08 | 3.443393 | NaN | 3859.885498 | 1 | 0.0 | 0.0 |
USA | ABW | 1993 | 177090656.0 | 4.586433e+12 | NaN | 4.020604e+11 | 9.306661e+08 | 3.294941e+11 | 3.774600e+08 | 3.421995 | NaN | 3859.885498 | 1 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
USA | ZWE | 2010 | 47981852.0 | 9.684627e+12 | 6.097825e+09 | 1.273726e+12 | 2.673542e+09 | 8.267647e+11 | 2.070475e+09 | 1.497259 | NaN | 14160.549805 | 1 | 0.0 | 0.0 |
USA | ZWE | 2011 | 42290668.0 | 9.685374e+12 | 6.838230e+09 | 1.378072e+12 | 3.097955e+09 | 9.252765e+11 | 2.192061e+09 | NaN | NaN | 14160.549805 | 1 | 0.0 | 0.0 |
USA | ZWE | 2012 | 37239684.0 | 1.023204e+13 | 7.845171e+09 | 1.440980e+12 | 3.321382e+09 | 9.785577e+11 | 2.457760e+09 | NaN | NaN | 14160.549805 | 1 | 0.0 | 0.0 |
USA | ZWE | 2013 | 42587544.0 | 1.072587e+13 | 8.629172e+09 | 1.450988e+12 | 3.244361e+09 | 1.009660e+12 | 2.243481e+09 | NaN | NaN | 14160.549805 | 1 | 0.0 | 0.0 |
USA | ZWE | 2014 | 32555730.0 | 1.058605e+13 | 8.627889e+09 | 1.426755e+12 | 2.983128e+09 | 9.848143e+11 | 2.089376e+09 | NaN | NaN | 14160.549805 | 1 | 0.0 | 0.0 |
12516 rows × 15 columns
null_sum = df_us_92['TARIFF_o'].isnull().sum()
null_sum
776
a= df_us_92.dropna(axis=0, subset=['GDP_d'])
b = a.fillna(method="ffill")
b.isnull().sum()
iso_d 0 year 0 FLOW 0 GDP_o 0 GDP_d 0 IPTOT_o 0 IPTOT_d 0 XPTOT_o 0 XPTOT_d 0 TARIFF_o 0 Dist 0 Comlang 0 Contig 0 Curcol 0 dtype: int64
Let's only focus on the U.S.-China trade flow, looking at the significance of U.S. import tariff on Chinese goods
df_china_gravity_2= df_gravity_2.set_index('iso_o').loc['CHN']
df_china_92 = df_china_gravity_2[df_china_gravity_2['year'] >= 1992]
china_to_us = df_china_92.reset_index().set_index('iso_d').loc['USA'].reset_index().set_index('iso_o').drop(['TARIFF_d','Dist','Comlang','Contig','Curcol'],axis=1)
china_to_us.head()
iso_d | year | FLOW | GDP_o | GDP_d | IPTOT_o | IPTOT_d | XPTOT_o | XPTOT_d | |
---|---|---|---|---|---|---|---|---|---|
iso_o | |||||||||
CHN | USA | 1992 | 1.561893e+10 | 2.421164e+11 | 3.725922e+12 | 4.664825e+10 | 3.148645e+11 | 7.779882e+10 | 2.715215e+11 |
CHN | USA | 1993 | 2.234483e+10 | 2.952897e+11 | 4.586433e+12 | 7.234828e+10 | 4.020604e+11 | 1.058858e+11 | 3.294941e+11 |
CHN | USA | 1994 | 2.702731e+10 | 3.673966e+11 | 4.775738e+12 | 7.895586e+10 | 4.504536e+11 | 1.250470e+11 | 3.634313e+11 |
CHN | USA | 1995 | 3.074602e+10 | 4.638653e+11 | 4.856469e+12 | 9.274582e+10 | 4.886170e+11 | 1.469554e+11 | 4.067594e+11 |
CHN | USA | 1996 | 3.487382e+10 | 5.517650e+11 | 5.191889e+12 | 1.004244e+11 | 5.242101e+11 | 1.626788e+11 | 4.444080e+11 |
df_us_gravity_2= df_gravity_2.set_index('iso_o').loc['USA']
df_us_92 = df_us_gravity_2[df_us_gravity_2['year'] >= 1992]
us_to_china = df_us_92.reset_index().set_index('iso_d').loc['CHN'].reset_index().set_index('iso_o').drop(['TARIFF_d','Dist','Comlang','Contig','Curcol'],axis=1)
us_to_china.head()
iso_d | year | FLOW | GDP_o | GDP_d | IPTOT_o | IPTOT_d | XPTOT_o | XPTOT_d | |
---|---|---|---|---|---|---|---|---|---|
iso_o | |||||||||
USA | CHN | 1992 | 5.072527e+09 | 3.725922e+12 | 2.421164e+11 | 3.148645e+11 | 4.664825e+10 | 2.715215e+11 | 7.779882e+10 |
USA | CHN | 1993 | 7.089494e+09 | 4.586433e+12 | 2.952897e+11 | 4.020604e+11 | 7.234828e+10 | 3.294941e+11 | 1.058858e+11 |
USA | CHN | 1994 | 9.132753e+09 | 4.775738e+12 | 3.673966e+11 | 4.504536e+11 | 7.895586e+10 | 3.634313e+11 | 1.250470e+11 |
USA | CHN | 1995 | 1.021676e+10 | 4.856469e+12 | 4.638653e+11 | 4.886170e+11 | 9.274582e+10 | 4.067594e+11 | 1.469554e+11 |
USA | CHN | 1996 | 1.037000e+10 | 5.191889e+12 | 5.517650e+11 | 5.242101e+11 | 1.004244e+11 | 4.444080e+11 | 1.626788e+11 |
df_us_tariff_time =df_us_tariff_time.T.iloc[1:]
df_china_tariff_time =df_china_tariff_time.T.iloc[1:]
df_us_tariff_time.rename(columns={0:'U.S. on China'}, inplace=True)
df_china_tariff_time.rename(columns={0:'China on U.S.'}, inplace=True)
df_us_tariff_time.head()
U.S. on China | |
---|---|
1992 | 9.14025 |
1993 | 8.97955 |
1994 | 7.78781 |
1995 | 6.59607 |
1996 | 6.10936 |
df_china_tariff_time.head()
China on U.S. | |
---|---|
1992 | 22.5966 |
1993 | 25.2162 |
1994 | 20.1654 |
1995 | 19.799 |
1996 | 19.4326 |
china_to_us['U.S. Tariff']= df_us_tariff_time['U.S. on China'].values
us_to_china['China Tariff']=df_china_tariff_time['China on U.S.'].values
Finally I could get the 1992-2013 precise tariff data. Since some datas such us population and distance did not very much, I simply drop the data. (I'll use those data when doing machine leanring). Let's look at the linear regression on some key factors.
china_to_us = china_to_us[china_to_us['year']<2014]
us_to_china = us_to_china[us_to_china['year']<2014]
china_to_us.tail()
iso_d | year | FLOW | GDP_o | GDP_d | IPTOT_o | IPTOT_d | XPTOT_o | XPTOT_d | U.S. Tariff | |
---|---|---|---|---|---|---|---|---|---|---|
iso_o | ||||||||||
CHN | USA | 2009 | 1.987112e+11 | 3.247738e+12 | 9.255663e+12 | 6.444161e+11 | 1.029349e+12 | 9.063443e+11 | 6.784442e+11 | 3.18477 |
CHN | USA | 2010 | 2.478586e+11 | 3.908740e+12 | 9.684627e+12 | 9.021093e+11 | 1.273726e+12 | 1.160983e+12 | 8.267647e+11 | 3.1402 |
CHN | USA | 2011 | 2.492638e+11 | 4.676334e+12 | 9.685374e+12 | 1.086898e+12 | 1.378072e+12 | 1.285863e+12 | 9.252765e+11 | 3.03097 |
CHN | USA | 2012 | 2.694368e+11 | 5.356605e+12 | 1.023204e+13 | 1.150462e+12 | 1.440980e+12 | 1.356310e+12 | 9.785577e+11 | 2.94673 |
CHN | USA | 2013 | 2.817285e+11 | 6.070769e+12 | 1.072587e+13 | 1.246891e+12 | 1.450988e+12 | 1.428233e+12 | 1.009660e+12 | 2.94626 |
Firstly, take the log of all terms except the tariff data
china_to_us['log_flow'] = np.log(china_to_us['FLOW'])
china_to_us['log_GDP_o']= np.log(china_to_us['GDP_o'])
china_to_us['log_GDP_d']= np.log(china_to_us['GDP_d'])
china_to_us['log_IPTOT_o']= np.log(china_to_us['IPTOT_o'])
china_to_us['log_IPTOT_d']= np.log(china_to_us['IPTOT_d'])
china_to_us['log_XPTOT_o']= np.log(china_to_us['XPTOT_o'])
china_to_us['log_XPTOT_d']= np.log(china_to_us['XPTOT_d'])
us_to_china['log_flow'] = np.log(us_to_china['FLOW'])
us_to_china['log_GDP_o']= np.log(us_to_china['GDP_o'])
us_to_china['log_GDP_d']= np.log(us_to_china['GDP_d'])
us_to_china['log_IPTOT_o']= np.log(us_to_china['IPTOT_o'])
us_to_china['log_IPTOT_d']= np.log(us_to_china['IPTOT_d'])
us_to_china['log_XPTOT_o']= np.log(us_to_china['XPTOT_o'])
us_to_china['log_XPTOT_d']= np.log(us_to_china['XPTOT_d'])
df_china_to_us = china_to_us.drop(['FLOW','GDP_o','GDP_d','IPTOT_o','IPTOT_d','XPTOT_o','XPTOT_d'],axis=1)
df_us_to_china = us_to_china.drop(['FLOW','GDP_o','GDP_d','IPTOT_o','IPTOT_d','XPTOT_o','XPTOT_d'],axis=1)
df_china_to_us.head()
iso_d | year | U.S. Tariff | log_flow | log_GDP_o | log_GDP_d | log_IPTOT_o | log_IPTOT_d | log_XPTOT_o | log_XPTOT_d | |
---|---|---|---|---|---|---|---|---|---|---|
iso_o | ||||||||||
CHN | USA | 1992 | 9.14025 | 23.471750 | 26.212685 | 28.946335 | 24.565901 | 26.475409 | 25.077393 | 26.327307 |
CHN | USA | 1993 | 8.97955 | 23.829861 | 26.411222 | 29.154123 | 25.004757 | 26.719868 | 25.385628 | 26.520824 |
CHN | USA | 1994 | 7.78781 | 24.020113 | 26.629707 | 29.194571 | 25.092155 | 26.833521 | 25.551956 | 26.618856 |
CHN | USA | 1995 | 6.59607 | 24.149027 | 26.862860 | 29.211332 | 25.253128 | 26.914845 | 25.713394 | 26.731487 |
CHN | USA | 1996 | 6.10936 | 24.275002 | 27.036388 | 29.278118 | 25.332670 | 26.985159 | 25.815044 | 26.820009 |
df_us_to_china.head()
iso_d | year | China Tariff | log_flow | log_GDP_o | log_GDP_d | log_IPTOT_o | log_IPTOT_d | log_XPTOT_o | log_XPTOT_d | |
---|---|---|---|---|---|---|---|---|---|---|
iso_o | ||||||||||
USA | CHN | 1992 | 22.5966 | 22.347105 | 28.946335 | 26.212685 | 26.475409 | 24.565901 | 26.327307 | 25.077393 |
USA | CHN | 1993 | 25.2162 | 22.681879 | 29.154123 | 26.411222 | 26.719868 | 25.004757 | 26.520824 | 25.385628 |
USA | CHN | 1994 | 20.1654 | 22.935133 | 29.194571 | 26.629707 | 26.833521 | 25.092155 | 26.618856 | 25.551956 |
USA | CHN | 1995 | 19.799 | 23.047295 | 29.211332 | 26.862860 | 26.914845 | 25.253128 | 26.731487 | 25.713394 |
USA | CHN | 1996 | 19.4326 | 23.062183 | 29.278118 | 27.036388 | 26.985159 | 25.332670 | 26.820009 | 25.815044 |
y_china = df_china_to_us['log_flow']
y_us = df_us_to_china['log_flow']
y_china.head()
iso_o CHN 23.471750 CHN 23.829861 CHN 24.020113 CHN 24.149027 CHN 24.275002 Name: log_flow, dtype: float32
y_us.head()
iso_o USA 22.347105 USA 22.681879 USA 22.935133 USA 23.047295 USA 23.062183 Name: log_flow, dtype: float32
Take the varaible matrix X
X_china = df_china_to_us.drop(["log_flow",'iso_d','year'], axis=1).copy()
X_us = df_us_to_china.drop(["log_flow",'iso_d','year'], axis=1).copy()
# convert everything to be a float for later on
for i in X_china,X_us:
for col in list(i):
i[col] = i[col].astype(float)
X_china.head()
U.S. Tariff | log_GDP_o | log_GDP_d | log_IPTOT_o | log_IPTOT_d | log_XPTOT_o | log_XPTOT_d | |
---|---|---|---|---|---|---|---|
iso_o | |||||||
CHN | 9.140252 | 26.212685 | 28.946335 | 24.565901 | 26.475409 | 25.077393 | 26.327307 |
CHN | 8.979555 | 26.411222 | 29.154123 | 25.004757 | 26.719868 | 25.385628 | 26.520824 |
CHN | 7.787814 | 26.629707 | 29.194571 | 25.092155 | 26.833521 | 25.551956 | 26.618856 |
CHN | 6.596073 | 26.862860 | 29.211332 | 25.253128 | 26.914845 | 25.713394 | 26.731487 |
CHN | 6.109358 | 27.036388 | 29.278118 | 25.332670 | 26.985159 | 25.815044 | 26.820009 |
X_us.head()
China Tariff | log_GDP_o | log_GDP_d | log_IPTOT_o | log_IPTOT_d | log_XPTOT_o | log_XPTOT_d | |
---|---|---|---|---|---|---|---|
iso_o | |||||||
USA | 22.596588 | 28.946335 | 26.212685 | 26.475409 | 24.565901 | 26.327307 | 25.077393 |
USA | 25.216208 | 29.154123 | 26.411222 | 26.719868 | 25.004757 | 26.520824 | 25.385628 |
USA | 20.165407 | 29.194571 | 26.629707 | 26.833521 | 25.092155 | 26.618856 | 25.551956 |
USA | 19.798993 | 29.211332 | 26.862860 | 26.914845 | 25.253128 | 26.731487 | 25.713394 |
USA | 19.432578 | 29.278118 | 27.036388 | 26.985159 | 25.332670 | 26.820009 | 25.815044 |
A standard scatter plot
I really need to apologize that I didn't have enough time to collect the precise time-series tariff data for longer period, since the bilateral data from the World Bank dataset is exteremly tedious to clean, and for the exsiting balateral trade dataset from CEPII, the aggregate data has not been updated for years. What I'll do next is the do a machine learning on gravity model to predict the trade if no tariff war happened.
# import
from sklearn import linear_model
# construct the model instance
china_lr_model = linear_model.LinearRegression()
us_lr_model = linear_model.LinearRegression()
# fit the model
china_lr_model.fit(X_china[['U.S. Tariff']], y_china)
us_lr_model.fit(X_us[['China Tariff']], y_us)
# print the coefficients
beta_0 = china_lr_model.intercept_
beta_1 = china_lr_model.coef_[0]
beta_2 = us_lr_model.intercept_
beta_3 = us_lr_model.coef_[0]
#####
print(f"Fit model: China export flow = {beta_0:.4f} + {beta_1:.4f} tariff imposed by the United States")
print(f"Fit model: U.S. export flow = {beta_2:.4f} + {beta_3:.4f} tariff imposed by China")
# points for the line
# use np.array to define the x axis
def var_scatter_china(df, ax=None, var="U.S. Tariff"):
if ax is None:
_, ax = plt.subplots(figsize=(8, 6))
df.plot.scatter(x=var , y="log_flow", alpha=1, s=20, ax=ax)
return ax
def var_scatter_us(df, ax=None, var="China Tariff"):
if ax is None:
_, ax = plt.subplots(figsize=(8, 6))
df.plot.scatter(x=var , y="log_flow", alpha=1, s=20, ax=ax)
return ax
ax[0] = var_scatter_china(df_china_to_us)
ax[1] = var_scatter_us(df_us_to_china)
x_china = np.array([0, df_china_to_us["U.S. Tariff"].max()])
x_us = np.array([0, df_us_to_china["China Tariff"].max()])
ax[0].plot(x_china, beta_0 + beta_1*x_china)
ax[0].set_title('China Export to U.S. Linear Regression')
ax[1].plot(x_us, beta_2 + beta_3*x_us)
ax[1].set_title('U.S. Export to China Linear Regression')
Fit model: China export flow = 27.0495 + -0.4122 tariff imposed by the United States Fit model: U.S. export flow = 25.0826 + -0.1084 tariff imposed by China
Text(0.5, 1.0, 'U.S. Export to China Linear Regression')
Let's look at the mean square error of the full model and plot them out The mean squared error loss function for the general model is
$$ \frac{1}{N} \sum_{i=1}^N (y_i - X_i \beta)^2 = \frac{1}{N} \| y - X \beta\|_2^2 $$lr_model_china = linear_model.LinearRegression()
lr_model_china.fit(X_china, y_china)
lr_model_us = linear_model.LinearRegression()
lr_model_us.fit(X_us, y_us)
china_mse = metrics.mean_squared_error(y_china, china_lr_model.predict(X_china[["U.S. Tariff"]]))
print(f"The mse of the China full model is {china_mse:.4f}")
us_mse = metrics.mean_squared_error(y_us, us_lr_model.predict(X_us[["China Tariff"]]))
print(f"The mse of the U.S. full model is {us_mse:.4f}")
The mse of the China full model is 0.1155 The mse of the U.S. full model is 0.1512
ax = var_scatter_china(df_china_to_us)
x_china = np.array([1, df_china_to_us["U.S. Tariff"].max()])
ax.plot(x_china, beta_0 + beta_1*x_china)
ax.set_title('China Export to U.S. Linear Regression Full Model')
def scatter_model(mod, X, ax=None, color="gray", x="U.S. Tariff"):
if ax is None:
_, ax = plt.subplots()
ax.scatter(X[x], mod.predict(X), c=color, alpha=1, s=20)
return ax
scatter_model(lr_model_china, X_china, ax, color='gray')
scatter_model(china_lr_model, X_china[["U.S. Tariff"]], ax, color="#ff6d13")
ax.legend(["prediction line", "data", "full model",'only tariff'])
ax.text(8, 25.5, f'mse ={china_mse:.4f}' , ha='center')
Text(8, 25.5, 'mse =0.1155')
ax = var_scatter_us(df_us_to_china)
x_us = np.array([1, df_us_to_china["China Tariff"].max()])
ax.plot(x_us, beta_2 + beta_3*x_us)
ax.set_title('U.S. Export to China Linear Regression Full Model')
def scatter_model(mod, X, ax=None, color="gray", x="China Tariff"):
if ax is None:
_, ax = plt.subplots()
ax.scatter(X[x], mod.predict(X), c=color, alpha=1, s=20)
return ax
scatter_model(lr_model_us, X_us, ax, color='gray')
scatter_model(us_lr_model, X_us[["China Tariff"]], ax, color="#ff6d13")
ax.legend(["prediction line", "data", "full model",'only tariff'])
ax.text(20, 24, f'mse ={us_mse:.4f}' , ha='center')
Text(20, 24, 'mse =0.1512')
Let's make a naive prediction on the U.S. - China trade flow if the tariff rate goes up to 21% for both countries
logt_china_21 =china_lr_model.predict([[21]])[0]
print(f"If the U.S. import tariff on Chinese goods increases to 21%, the export flow is predited to be {np.exp(logt_china_21):.2f} and {logt_china_21} in log term")
logt_us_21 = us_lr_model.predict([[21]])[0]
print(f"If the China's import tariff on the U.S. goods increases to 21%, the export flow is predited to be {np.exp(logt_us_21):.2f} and and {logt_us_21} in log term")
If the U.S. import tariff on Chinese goods increases to 21%, the export flow is predited to be 97198393.07 and 18.392264737131246 in log term If the China's import tariff on the U.S. goods increases to 21%, the export flow is predited to be 8027450054.60 and and 22.806132762121553 in log term
We see this is a very naive prediction, the first reason is that there's not many samples presenting in the prediction. Second is that China has never experienced such high tariff from the U.S. for decades, therefore the shrink of the Chinese exports to the U.S. is dramatic.
Now, I'll apply some machine learning on the whole graivty model database. This dataset comes from
The United States International trade commission based on the exports of the United States. Tamara Gurevich and Peter Herman, (2018). The Dynamic Gravity Dataset: 1948-2016. USITC Working Paper 2018-02-A. https://www.usitc.gov/data/gravity/dataset.htm
CEPII TRADHIST. Two Centuries of Bilateral Trade and Gravity Data: 1827-2014 CEPII Working Paper, N°2016-14, mai 2016 Michel Fouquin, Jules Hugot. http://www.cepii.fr/cepii/en/bdd_modele/download.asp?id=32
CEPII Gravity Database by Head, K., Mayer, T. & Ries, J. (2010), The erosion of colonial trade linkages after independence Journal of International Economics, 81(1):1-14 http://www.cepii.fr/CEPII/en/bdd_modele/download.asp?id=8
Head, K. and T. Mayer, (2014), "Gravity Equations: Toolkit, Cookbook, Workhorse." Handbook of International Economics, Vol. 4,eds. Gopinath, Helpman, and Rogoff, Elsevier. http://www.cepii.fr/CEPII/en/bdd_modele/download.asp?id=8
df_gravity_model =df_gravity_model[df_gravity_model['year'] <=2014]
df_gravity_model
year | country_d | iso3_d | dynamic_code_d | landlocked_d | island_d | region_d | gdp_pwt_const_d | pop_d | gdp_pwt_cur_d | ... | hostility_level_o | hostility_level_d | distance | common_language | colony_of_destination_after45 | colony_of_destination_current | colony_of_destination_ever | colony_of_origin_after45 | colony_of_origin_current | colony_of_origin_ever | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2005 | Aruba | ABW | ABW | 0 | 1 | caribbean | 3906.5203 | 0.100031 | 4093.2434 | ... | 0 | 0 | 120.05867 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 2006 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4118.1396 | 0.100830 | 4217.0669 | ... | 0 | 0 | 978.77728 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 2007 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4196.4634 | 0.101218 | 4248.4707 | ... | 0 | 0 | 8563.69630 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 2008 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4433.6772 | 0.101342 | 4441.8828 | ... | 0 | 0 | 7562.67330 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 2009 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4183.0449 | 0.101416 | 4304.9224 | ... | 0 | 0 | 16904.59600 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
745543 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 12178.03300 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
745544 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 11158.91700 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
745545 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 10202.50000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
745546 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 12930.38800 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
745547 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 6887.58640 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
620046 rows × 70 columns
df_gravity_model = df_gravity_model.rename(columns={'iso3_d': 'iso_d', 'iso3_o': 'iso_o'})
df_gravity_model
year | country_d | iso_d | dynamic_code_d | landlocked_d | island_d | region_d | gdp_pwt_const_d | pop_d | gdp_pwt_cur_d | ... | hostility_level_o | hostility_level_d | distance | common_language | colony_of_destination_after45 | colony_of_destination_current | colony_of_destination_ever | colony_of_origin_after45 | colony_of_origin_current | colony_of_origin_ever | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2005 | Aruba | ABW | ABW | 0 | 1 | caribbean | 3906.5203 | 0.100031 | 4093.2434 | ... | 0 | 0 | 120.05867 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 2006 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4118.1396 | 0.100830 | 4217.0669 | ... | 0 | 0 | 978.77728 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 2007 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4196.4634 | 0.101218 | 4248.4707 | ... | 0 | 0 | 8563.69630 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 2008 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4433.6772 | 0.101342 | 4441.8828 | ... | 0 | 0 | 7562.67330 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 2009 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4183.0449 | 0.101416 | 4304.9224 | ... | 0 | 0 | 16904.59600 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
745543 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 12178.03300 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
745544 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 11158.91700 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
745545 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 10202.50000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
745546 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 12930.38800 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
745547 | 2014 | Zimbabwe | ZWE | ZWE | 1 | 0 | africa | 29149.7090 | 15.245855 | 29001.2250 | ... | 0 | 0 | 6887.58640 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
620046 rows × 70 columns
df_gravity_model_columns = df_gravity_model[df_gravity_model['year','iso3_d','iso3_o','iso3_d' ]]
df_gravity_model.describe()
year | landlocked_d | island_d | gdp_pwt_const_d | pop_d | gdp_pwt_cur_d | capital_cur_d | capital_const_d | gdp_wdi_cur_d | gdp_wdi_const_d | ... | hostility_level_o | hostility_level_d | distance | common_language | colony_of_destination_after45 | colony_of_destination_current | colony_of_destination_ever | colony_of_origin_after45 | colony_of_origin_current | colony_of_origin_ever | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 746048.000000 | 746048.000000 | 746048.000000 | 4.477220e+05 | 447722.000000 | 4.477220e+05 | 4.454640e+05 | 4.454640e+05 | 5.426720e+05 | 5.307520e+05 | ... | 746048.000000 | 746048.000000 | 746048.000000 | 746048.000000 | 746048.000000 | 746048.0 | 746048.000000 | 746048.000000 | 746048.0 | 746048.000000 |
mean | 2010.548042 | 0.169466 | 0.336915 | 4.848036e+05 | 37.745831 | 4.850836e+05 | 1.646679e+06 | 1.714354e+06 | 3.289086e+11 | 3.402141e+11 | ... | 0.001556 | 0.001556 | 8459.633215 | 0.399296 | 0.001713 | 0.0 | 0.002206 | 0.001713 | 0.0 | 0.002206 |
std | 3.445108 | 0.375164 | 0.472656 | 1.625387e+06 | 138.725228 | 1.624879e+06 | 5.536413e+06 | 5.547838e+06 | 1.330710e+12 | 1.325681e+12 | ... | 0.074089 | 0.074089 | 4670.073812 | 0.489754 | 0.041353 | 0.0 | 0.046919 | 0.041353 | 0.0 | 0.046919 |
min | 2005.000000 | 0.000000 | 0.000000 | 7.285319e+01 | 0.004777 | 7.251530e+01 | 7.111269e+02 | 9.787498e+02 | 2.043274e+07 | 2.918573e+07 | ... | 0.000000 | 0.000000 | 0.183362 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 |
25% | 2008.000000 | 0.000000 | 0.000000 | 1.603936e+04 | 2.062881 | 1.619303e+04 | 4.065764e+04 | 4.594060e+04 | 4.980000e+09 | 5.552580e+09 | ... | 0.000000 | 0.000000 | 4748.271000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 |
50% | 2011.000000 | 0.000000 | 0.000000 | 5.415848e+04 | 7.737316 | 5.424052e+04 | 1.604362e+05 | 1.852463e+05 | 2.095800e+10 | 2.320806e+10 | ... | 0.000000 | 0.000000 | 8146.521500 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 |
75% | 2014.000000 | 0.000000 | 1.000000 | 3.113398e+05 | 25.922239 | 3.144975e+05 | 1.046180e+06 | 1.139293e+06 | 1.625901e+11 | 1.679981e+11 | ... | 0.000000 | 0.000000 | 11934.431000 | 1.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 |
max | 2016.000000 | 1.000000 | 1.000000 | 1.713595e+07 | 1369.435670 | 1.681026e+07 | 6.937970e+07 | 6.759007e+07 | 1.803665e+13 | 1.659745e+13 | ... | 4.000000 | 4.000000 | 19868.152000 | 1.000000 | 1.000000 | 0.0 | 1.000000 | 1.000000 | 0.0 | 1.000000 |
8 rows × 62 columns
df_gravity = df_gravity[df_gravity['FLOW'].notna()]
iso_o | iso_d | year | FLOW | FLOW_0 | SOURCE_TF | GDP_o | SOURCE_GDP_o | GDP_d | SOURCE_GDP_d | ... | Indepdate_o | Indepdate_d | XCH_RATE_o | SOURCE_XCH_o | XCH_RATE_d | SOURCE_XCH_d | CONTI_o | CONTI_d | REGIO_o | REGIO_d | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2187 | 2SICIL | AUTHUN | 1838 | 492920.687500 | NaN | RIC_IP | NaN | NaN | ... | NaN | NaN | NaN | NaN | EUROP | EUROP | CTREUR | CTREUR | ||||
2188 | 2SICIL | BEL | 1834 | 38164.500000 | NaN | RIC_IP | NaN | NaN | ... | NaN | NaN | NaN | 0.040975 | HDBK | EUROP | EUROP | CTREUR | NWEUR | |||
2189 | 2SICIL | BEL | 1835 | 30052.339844 | NaN | RIC_IP | NaN | 61133280.0 | GGDC | ... | NaN | NaN | NaN | 0.040850 | HDBK | EUROP | EUROP | CTREUR | NWEUR | ||
2190 | 2SICIL | BEL | 1836 | 44906.539062 | NaN | RIC_IP | NaN | 68204648.0 | GGDC | ... | NaN | NaN | NaN | 0.040960 | HDBK | EUROP | EUROP | CTREUR | NWEUR | ||
2191 | 2SICIL | BEL | 1837 | 33957.269531 | NaN | RIC_IP | NaN | 71462456.0 | GGDC | ... | NaN | NaN | NaN | 0.040960 | HDBK | EUROP | EUROP | CTREUR | NWEUR |
5 rows × 60 columns
df_gravity= df_gravity[df_gravity['year'] >= 1945]
df_gravity = df_gravity[df_gravity['year','iso_o','iso_d','FLOW','GDP_o','GDP_d']
Now I merge the gravity data provided by CEPII which contains the trade flow and the gravity data by USITC which provides political indicators.
df_final_version = pd.merge(df_gravity,df_gravity_model , on=['iso_o', 'iso_d','year'])
df_final_version
iso_o | iso_d | year | FLOW | FLOW_0 | SOURCE_TF | GDP_o | SOURCE_GDP_o | GDP_d | SOURCE_GDP_d | ... | hostility_level_o | hostility_level_d | distance | common_language | colony_of_destination_after45 | colony_of_destination_current | colony_of_destination_ever | colony_of_origin_after45 | colony_of_origin_current | colony_of_origin_ever | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ABW | AFG | 2005 | 0.0 | NaN | DOTS_IP | 1.282048e+09 | WDI | 3.451279e+09 | WDI | ... | 0 | 0 | 13146.86300 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | ABW | AFG | 2006 | 0.0 | NaN | DOTS_IP | 1.316040e+09 | WDI | 3.835713e+09 | WDI | ... | 0 | 0 | 13146.86300 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | ABW | AFG | 2007 | 0.0 | NaN | DOTS_IP | 1.311265e+09 | WDI | 4.919677e+09 | WDI | ... | 0 | 0 | 13146.86300 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | ABW | AFG | 2008 | 0.0 | NaN | DOTS_IP | 1.518732e+09 | WDI | 5.543302e+09 | WDI | ... | 0 | 0 | 13146.86300 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | ABW | AFG | 2009 | 0.0 | NaN | DOTS_IP | 1.604113e+09 | WDI | 8.015607e+09 | WDI | ... | 0 | 0 | 13146.86300 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
319270 | ZWE | ZMB | 2010 | 46558256.0 | NaN | DOTS_IP | 6.097825e+09 | WDI | 1.311544e+10 | WDI | ... | 0 | 0 | 605.01996 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
319271 | ZWE | ZMB | 2011 | 55171376.0 | NaN | DOTS_IP | 6.838230e+09 | WDI | 1.481194e+10 | WDI | ... | 0 | 0 | 605.01996 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
319272 | ZWE | ZMB | 2012 | 58787336.0 | NaN | DOTS_IP | 7.845171e+09 | WDI | 1.578776e+10 | WDI | ... | 0 | 0 | 605.01996 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
319273 | ZWE | ZMB | 2013 | 77869776.0 | NaN | DOTS_IP | 8.629172e+09 | WDI | 1.715622e+10 | WDI | ... | 0 | 0 | 605.01996 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
319274 | ZWE | ZMB | 2014 | 62405360.0 | NaN | DOTS_IP | 8.627889e+09 | WDI | 1.644896e+10 | WDI | ... | 0 | 0 | 605.01996 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
319275 rows × 127 columns
note: the distance here is the population weighted distance, so the population density is taken into account.
df_final_version_1= df_final_version[['year','iso_o','iso_d','region_d','FLOW','GDP_o','GDP_d','distance',
'contiguity','landlocked_d','island_d','common_language','colony_of_origin_after45',
'agree_fta','member_wto_o','member_wto_d','polity_o','polity_d',
'hostility_level_o','hostility_level_d','sanction_imposition_trade']]
df_final_version_1
year | iso_o | iso_d | region_d | FLOW | GDP_o | GDP_d | distance | contiguity | landlocked_d | ... | common_language | colony_of_origin_after45 | agree_fta | member_wto_o | member_wto_d | polity_o | polity_d | hostility_level_o | hostility_level_d | sanction_imposition_trade | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2005 | ABW | AFG | middle_east | 0.0 | 1.282048e+09 | 3.451279e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | NaN | NaN | 0 | 0 | 0.0 |
1 | 2006 | ABW | AFG | middle_east | 0.0 | 1.316040e+09 | 3.835713e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | NaN | NaN | 0 | 0 | 0.0 |
2 | 2007 | ABW | AFG | middle_east | 0.0 | 1.311265e+09 | 4.919677e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | NaN | NaN | 0 | 0 | 0.0 |
3 | 2008 | ABW | AFG | middle_east | 0.0 | 1.518732e+09 | 5.543302e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | NaN | NaN | 0 | 0 | 0.0 |
4 | 2009 | ABW | AFG | middle_east | 0.0 | 1.604113e+09 | 8.015607e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | NaN | NaN | 0 | 0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
319270 | 2010 | ZWE | ZMB | africa | 46558256.0 | 6.097825e+09 | 1.311544e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 |
319271 | 2011 | ZWE | ZMB | africa | 55171376.0 | 6.838230e+09 | 1.481194e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 |
319272 | 2012 | ZWE | ZMB | africa | 58787336.0 | 7.845171e+09 | 1.578776e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 |
319273 | 2013 | ZWE | ZMB | africa | 77869776.0 | 8.629172e+09 | 1.715622e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 4.0 | 7.0 | 0 | 0 | 0.0 |
319274 | 2014 | ZWE | ZMB | africa | 62405360.0 | 8.627889e+09 | 1.644896e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 4.0 | 7.0 | 0 | 0 | NaN |
319275 rows × 21 columns
df_final_version_1[['polity_o','polity_d','sanction_imposition_trade']] = df_final_version_1[['polity_o','polity_d','sanction_imposition_trade']].fillna(value=0)
/Users/zihao/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py:2963: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self[k1] = value[k2]
df_final_version_2 = df_final_version_1.dropna(subset=['GDP_o','GDP_d'])
df_final_version_2
year | iso_o | iso_d | region_d | FLOW | GDP_o | GDP_d | distance | contiguity | landlocked_d | ... | common_language | colony_of_origin_after45 | agree_fta | member_wto_o | member_wto_d | polity_o | polity_d | hostility_level_o | hostility_level_d | sanction_imposition_trade | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2005 | ABW | AFG | middle_east | 0.0 | 1.282048e+09 | 3.451279e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 |
1 | 2006 | ABW | AFG | middle_east | 0.0 | 1.316040e+09 | 3.835713e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 |
2 | 2007 | ABW | AFG | middle_east | 0.0 | 1.311265e+09 | 4.919677e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 |
3 | 2008 | ABW | AFG | middle_east | 0.0 | 1.518732e+09 | 5.543302e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 |
4 | 2009 | ABW | AFG | middle_east | 0.0 | 1.604113e+09 | 8.015607e+09 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
319270 | 2010 | ZWE | ZMB | africa | 46558256.0 | 6.097825e+09 | 1.311544e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 |
319271 | 2011 | ZWE | ZMB | africa | 55171376.0 | 6.838230e+09 | 1.481194e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 |
319272 | 2012 | ZWE | ZMB | africa | 58787336.0 | 7.845171e+09 | 1.578776e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 |
319273 | 2013 | ZWE | ZMB | africa | 77869776.0 | 8.629172e+09 | 1.715622e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 4.0 | 7.0 | 0 | 0 | 0.0 |
319274 | 2014 | ZWE | ZMB | africa | 62405360.0 | 8.627889e+09 | 1.644896e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 4.0 | 7.0 | 0 | 0 | 0.0 |
285368 rows × 21 columns
Get rid off the rows that has FLOW = 0
df_final_version_3 = df_final_version_2.loc[~(df_final_version_2['FLOW'] == 0)]
df_final_version_3
year | iso_o | iso_d | region_d | FLOW | GDP_o | GDP_d | distance | contiguity | landlocked_d | ... | common_language | colony_of_origin_after45 | agree_fta | member_wto_o | member_wto_d | polity_o | polity_d | hostility_level_o | hostility_level_d | sanction_imposition_trade | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | 2011 | ABW | AFG | middle_east | 6.023273e+04 | 1.613070e+09 | 1.119100e+10 | 13146.86300 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 |
36 | 2005 | ABW | ARE | middle_east | 5.235400e+02 | 1.282048e+09 | 9.933899e+10 | 12761.36100 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 1 | 0.0 | -8.0 | 0 | 0 | 0.0 |
38 | 2007 | ABW | ARE | middle_east | 3.234000e+02 | 1.311265e+09 | 1.288993e+11 | 12761.36100 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 1 | 0.0 | -8.0 | 0 | 0 | 0.0 |
39 | 2008 | ABW | ARE | middle_east | 1.432072e+05 | 1.518732e+09 | 1.716075e+11 | 12761.36100 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 1 | 0.0 | -8.0 | 0 | 0 | 0.0 |
40 | 2009 | ABW | ARE | middle_east | 1.119558e+04 | 1.604113e+09 | 1.627569e+11 | 12761.36100 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 1 | 0.0 | -8.0 | 0 | 0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
319270 | 2010 | ZWE | ZMB | africa | 4.655826e+07 | 6.097825e+09 | 1.311544e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 |
319271 | 2011 | ZWE | ZMB | africa | 5.517138e+07 | 6.838230e+09 | 1.481194e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 |
319272 | 2012 | ZWE | ZMB | africa | 5.878734e+07 | 7.845171e+09 | 1.578776e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 |
319273 | 2013 | ZWE | ZMB | africa | 7.786978e+07 | 8.629172e+09 | 1.715622e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 4.0 | 7.0 | 0 | 0 | 0.0 |
319274 | 2014 | ZWE | ZMB | africa | 6.240536e+07 | 8.627889e+09 | 1.644896e+10 | 605.01996 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 4.0 | 7.0 | 0 | 0 | 0.0 |
214825 rows × 21 columns
df_final_version_3['log_flow'] = np.log(df_final_version_3['FLOW'])
df_final_version_3['log_GDP_o']= np.log(df_final_version_3['GDP_o'])
df_final_version_3['log_GDP_d']= np.log(df_final_version_3['GDP_d'])
df_final_version_3['log_distance']= np.log(df_final_version_3['distance'])
df_final_version_4 = df_final_version_3.drop(['FLOW','GDP_o','GDP_d','distance'],axis=1)
df_final_version_5 = df_final_version_4[df_final_version_4['log_flow']>0]
df_final_version_5
/Users/zihao/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy """Entry point for launching an IPython kernel. /Users/zihao/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy /Users/zihao/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy This is separate from the ipykernel package so we can avoid doing imports until /Users/zihao/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy after removing the cwd from sys.path.
year | iso_o | iso_d | region_d | contiguity | landlocked_d | island_d | common_language | colony_of_origin_after45 | agree_fta | ... | member_wto_d | polity_o | polity_d | hostility_level_o | hostility_level_d | sanction_imposition_trade | log_flow | log_GDP_o | log_GDP_d | log_distance | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | 2011 | ABW | AFG | middle_east | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 11.005971 | 21.201405 | 23.138376 | 9.483938 |
36 | 2005 | ABW | ARE | middle_east | 0 | 0 | 0 | 1 | 0 | 0 | ... | 1 | 0.0 | -8.0 | 0 | 0 | 0.0 | 6.260613 | 20.971725 | 25.321804 | 9.454177 |
38 | 2007 | ABW | ARE | middle_east | 0 | 0 | 0 | 1 | 0 | 0 | ... | 1 | 0.0 | -8.0 | 0 | 0 | 0.0 | 5.778890 | 20.994259 | 25.582296 | 9.454177 |
39 | 2008 | ABW | ARE | middle_east | 0 | 0 | 0 | 1 | 0 | 0 | ... | 1 | 0.0 | -8.0 | 0 | 0 | 0.0 | 11.872047 | 21.141142 | 25.868475 | 9.454177 |
40 | 2009 | ABW | ARE | middle_east | 0 | 0 | 0 | 1 | 0 | 0 | ... | 1 | 0.0 | -8.0 | 0 | 0 | 0.0 | 9.323275 | 21.195837 | 25.815523 | 9.454177 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
319270 | 2010 | ZWE | ZMB | africa | 1 | 1 | 0 | 1 | 0 | 1 | ... | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 | 17.656216 | 22.531199 | 23.297056 | 6.405261 |
319271 | 2011 | ZWE | ZMB | africa | 1 | 1 | 0 | 1 | 0 | 1 | ... | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 | 17.825954 | 22.645794 | 23.418699 | 6.405261 |
319272 | 2012 | ZWE | ZMB | africa | 1 | 1 | 0 | 1 | 0 | 1 | ... | 1 | 1.0 | 7.0 | 0 | 0 | 0.0 | 17.889437 | 22.783165 | 23.482500 | 6.405261 |
319273 | 2013 | ZWE | ZMB | africa | 1 | 1 | 0 | 1 | 0 | 1 | ... | 1 | 4.0 | 7.0 | 0 | 0 | 0.0 | 18.170549 | 22.878414 | 23.565626 | 6.405261 |
319274 | 2014 | ZWE | ZMB | africa | 1 | 1 | 0 | 1 | 0 | 1 | ... | 1 | 4.0 | 7.0 | 0 | 0 | 0.0 | 17.949162 | 22.878265 | 23.523529 | 6.405261 |
214776 rows × 21 columns
df_final_version_5 = df_final_version_5[df_final_version_5['year'] < 2014]
Now I'll create some variables based on the gravity model, which was inspired by Newton law of gravitation. My approach is based on the ARTNeT- GIZ Capacity Building Workshop on Introduction to Gravity Modelling: 19-21 April 2016, Ulaanbaatar, Introduction to the basic gravity model. https://www.unescap.org/sites/default/files/Day1_S2_Gravity_intro.pdf
I also introduced some more policy indicators such as hospitality level, government control into the trade cost friction, after taking the log, we can get the linear regression below. G is the gravitation constant parameter.
$$ \log(\text{Export}_{ij}) = \beta_0\log{G} + \beta_1\log\text{GDP_o} + \beta_2\log\text{GDP_d} +\beta_i \text{Policy Indicators}_i+ \beta_j \text{Geographic Indicators}_j + \epsilon $$Let’s firstly make the log price vs sqft_living to visualize the scatter plot of the data
X = df_final_version_5.drop(["log_flow", "iso_o", "iso_d",'region_d',"year"], axis=1).copy()
for col in list(X):
X[col] = X[col].astype(float)
# convert everything to be a float for later on
for col in list(X):
X[col] = X[col].astype(float)
y = df_final_version_5['log_flow']
def var_scatter(df, ax=None, var="log_GDP_o"):
if ax is None:
_, ax = plt.subplots(figsize=(15, 10))
df.plot.scatter(x=var , y="log_flow", alpha=0.35, s=1.5, ax=ax)
return ax
var_scatter(df_final_version_5);
# import
from sklearn import linear_model
# construct the model instance
gravity_lr_model = linear_model.LinearRegression()
# fit the model
gravity_lr_model.fit(X[["log_GDP_o"]], y)
# print the coefficients
beta_g_0 = gravity_lr_model.intercept_
beta_g_1 = gravity_lr_model.coef_[0]
#####
print(f"Fit model: log(flow) = {beta_g_0:.4f} + {beta_g_1:.4f} log_GDP_o")
#the full model
lr_model = linear_model.LinearRegression()
lr_model.fit(X, y)
ax = var_scatter(df_final_version_5)
def scatter_model(mod, X, ax=None, color='red', x="log_GDP_o"):
if ax is None:
_, ax = plt.subplots()
ax.scatter(X[x], mod.predict(X), c=color, alpha=0.25, s=1)
return ax
scatter_model(lr_model, X, ax, color='indianred')
scatter_model(gravity_lr_model, X[["log_GDP_o"]], ax, color='gold')
ax.legend(["data", "Multi-linear full model", "log_GDP_o model"],markerscale = 5)
ax.set_title('Simple Linear V.S. Gravity Multilinear')
gravity_mse = metrics.mean_squared_error(y, gravity_lr_model.predict(X[["log_GDP_o"]]))
gravity_R2 = metrics.r2_score(y,gravity_lr_model.predict(X[["log_GDP_o"]]))
gravity_full_mse = metrics.mean_squared_error(y,lr_model.predict(X))
gravity_full_R2 = metrics.r2_score(y,lr_model.predict(X))
print(f"The R2 score of the linear model is {gravity_R2}")
print(f"The full model mse is {gravity_full_mse:.4f} ")
print(f"The R2 score of the full model is {gravity_full_R2}")
gravity_coefs = pd.Series(dict(zip(list(X), lr_model.coef_)))
print(gravity_coefs)
Fit model: log(flow) = -10.8105 + 1.0360 log_GDP_o The R2 score of the linear model is 0.31319855485247206 The full model mse is 6.4102 The R2 score of the full model is 0.6344888113693667 contiguity 1.169724 landlocked_d -0.659235 island_d 0.125324 common_language 0.897848 colony_of_origin_after45 0.594258 agree_fta 0.476895 member_wto_o 0.644206 member_wto_d 0.147447 polity_o 0.018764 polity_d -0.002851 hostility_level_o -0.080827 hostility_level_d 0.012910 sanction_imposition_trade 0.165399 log_GDP_o 1.201219 log_GDP_d 0.894970 log_distance -1.334600 dtype: float64
Or simply look at the summary table
import statsmodels.api as sm
X1 = sm.add_constant(X)
sm.OLS(y,X1).fit().summary()
Dep. Variable: | log_flow | R-squared: | 0.634 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.634 |
Method: | Least Squares | F-statistic: | 2.114e+04 |
Date: | Mon, 20 Apr 2020 | Prob (F-statistic): | 0.00 |
Time: | 17:16:24 | Log-Likelihood: | -4.5757e+05 |
No. Observations: | 194884 | AIC: | 9.152e+05 |
Df Residuals: | 194867 | BIC: | 9.153e+05 |
Df Model: | 16 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
const | -26.0190 | 0.120 | -217.421 | 0.000 | -26.254 | -25.784 |
contiguity | 1.1697 | 0.042 | 28.053 | 0.000 | 1.088 | 1.251 |
landlocked_d | -0.6592 | 0.016 | -40.305 | 0.000 | -0.691 | -0.627 |
island_d | 0.1253 | 0.016 | 7.787 | 0.000 | 0.094 | 0.157 |
common_language | 0.8978 | 0.012 | 72.667 | 0.000 | 0.874 | 0.922 |
colony_of_origin_after45 | 0.5943 | 0.100 | 5.955 | 0.000 | 0.399 | 0.790 |
agree_fta | 0.4769 | 0.018 | 25.787 | 0.000 | 0.441 | 0.513 |
member_wto_o | 0.6442 | 0.018 | 35.100 | 0.000 | 0.608 | 0.680 |
member_wto_d | 0.1474 | 0.018 | 8.189 | 0.000 | 0.112 | 0.183 |
polity_o | 0.0188 | 0.001 | 17.460 | 0.000 | 0.017 | 0.021 |
polity_d | -0.0029 | 0.001 | -2.633 | 0.008 | -0.005 | -0.001 |
hostility_level_o | -0.0808 | 0.096 | -0.845 | 0.398 | -0.268 | 0.107 |
hostility_level_d | 0.0129 | 0.095 | 0.136 | 0.892 | -0.173 | 0.199 |
sanction_imposition_trade | 0.1654 | 0.105 | 1.569 | 0.117 | -0.041 | 0.372 |
log_GDP_o | 1.2012 | 0.003 | 435.658 | 0.000 | 1.196 | 1.207 |
log_GDP_d | 0.8950 | 0.003 | 308.677 | 0.000 | 0.889 | 0.901 |
log_distance | -1.3346 | 0.009 | -155.398 | 0.000 | -1.351 | -1.318 |
Omnibus: | 14299.664 | Durbin-Watson: | 0.607 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 23923.475 |
Skew: | -0.562 | Prob(JB): | 0.00 |
Kurtosis: | 4.297 | Cond. No. | 813. |
from patsy import dmatrices
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif = pd.DataFrame()
vif["VIF Factor"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif["features"] = X.columns
vif.round(2)
VIF Factor | features | |
---|---|---|
0 | 1.20 | contiguity |
1 | 1.32 | landlocked_d |
2 | 1.45 | island_d |
3 | 1.67 | common_language |
4 | 1.02 | colony_of_origin_after45 |
5 | 1.49 | agree_fta |
6 | 8.86 | member_wto_o |
7 | 8.46 | member_wto_d |
8 | 1.83 | polity_o |
9 | 1.82 | polity_d |
10 | 3.94 | hostility_level_o |
11 | 3.94 | hostility_level_d |
12 | 1.01 | sanction_imposition_trade |
13 | 93.23 | log_GDP_o |
14 | 96.71 | log_GDP_d |
15 | 126.67 | log_distance |
X_vif = X.drop(['log_GDP_d','log_distance'],axis=1).copy()
lr_model_vif = linear_model.LinearRegression()
lr_model_vif.fit(X_vif, y)
gravity_vif_R2 = metrics.r2_score(y,lr_model_vif.predict(X_vif))
print(f"The R2 score is {gravity_vif_R2}")
gravity_vif_coefs = pd.Series(dict(zip(list(X), lr_model_vif.coef_)))
print(gravity_vif_coefs)
The R2 score is 0.42723639753728515 contiguity 3.214427 landlocked_d -1.649783 island_d -1.619409 common_language 0.780110 colony_of_origin_after45 -0.175894 agree_fta 2.099707 member_wto_o 0.287735 member_wto_d 0.373834 polity_o -0.004910 polity_d 0.060412 hostility_level_o -0.046207 hostility_level_d 0.166451 sanction_imposition_trade 2.398556 log_GDP_o 1.059179 dtype: float64
Linear least squares with l2 regularization. $$||y - Xw||^2_2 + \alpha * ||w||^2_2$$ In this case ridge only slightly differs from the linear regression as shown in the result.
from sklearn.linear_model import Ridge
clf = linear_model.Ridge(alpha=10)
clf.fit(X, y)
ridge_mse = metrics.mean_squared_error(y,clf.predict(X))
ridge_r2 = metrics.r2_score(y,clf.predict(X))
print(f"The ridge mse is {ridge_mse :.4f} ")
print(f"The ridge R2 score is {ridge_r2}")
ridge_coefs = pd.Series(dict(zip(list(X), clf.coef_)))
print(ridge_coefs)
The ridge mse is 6.4102 The ridge R2 score is 0.6344887832813664 contiguity 1.166796 landlocked_d -0.658958 island_d 0.125310 common_language 0.897877 colony_of_origin_after45 0.585220 agree_fta 0.476808 member_wto_o 0.643853 member_wto_d 0.147402 polity_o 0.018774 polity_d -0.002850 hostility_level_o -0.079209 hostility_level_d 0.012120 sanction_imposition_trade 0.162706 log_GDP_o 1.201236 log_GDP_d 0.894967 log_distance -1.334672 dtype: float64
By running Lasso Regression with the default value 1.0, most coeficients got penalized to 0, except for distance, political status of country of origin, and bilateral GDP
lasso_gravity_model = linear_model.Lasso()
lasso_gravity_model.fit(X, y)
lasso_coefs = pd.Series(dict(zip(list(X), lasso_gravity_model.coef_)))
lr_coefs = pd.Series(dict(zip(list(X), lr_model.coef_)))
coefs = pd.DataFrame(dict(lasso=lasso_coefs, linreg=lr_coefs))
coefs
lasso | linreg | |
---|---|---|
contiguity | 0.000000 | 1.169724 |
landlocked_d | -0.000000 | -0.659235 |
island_d | -0.000000 | 0.125324 |
common_language | 0.000000 | 0.897848 |
colony_of_origin_after45 | 0.000000 | 0.594258 |
agree_fta | 0.000000 | 0.476895 |
member_wto_o | 0.000000 | 0.644206 |
member_wto_d | 0.000000 | 0.147447 |
polity_o | 0.008226 | 0.018764 |
polity_d | 0.000000 | -0.002851 |
hostility_level_o | 0.000000 | -0.080827 |
hostility_level_d | 0.000000 | 0.012910 |
sanction_imposition_trade | 0.000000 | 0.165399 |
log_GDP_o | 0.947046 | 1.201219 |
log_GDP_d | 0.628656 | 0.894970 |
log_distance | -0.000000 | -1.334600 |
lasso_gravity_model_vif = linear_model.Lasso()
lasso_gravity_model_vif.fit(X_vif, y)
lasso_coefs_vif = pd.Series(dict(zip(list(X_vif), lasso_gravity_model_vif.coef_)))
lr_coefs_vif = pd.Series(dict(zip(list(X_vif), lr_model_vif.coef_)))
coefs_vif = pd.DataFrame(dict(lasso_vif=lasso_coefs_vif, linreg_vif=lr_coefs_vif))
coefs_vif
lasso_vif | linreg_vif | |
---|---|---|
contiguity | 0.000000 | 3.214427 |
landlocked_d | -0.000000 | -1.649783 |
island_d | -0.000000 | -1.619409 |
common_language | 0.000000 | 0.780110 |
colony_of_origin_after45 | 0.000000 | -0.175894 |
agree_fta | 0.000000 | 2.099707 |
member_wto_o | 0.000000 | 0.287735 |
member_wto_d | 0.000000 | 0.373834 |
polity_o | 0.000000 | -0.004910 |
polity_d | 0.048370 | 0.060412 |
hostility_level_o | 0.000000 | -0.046207 |
hostility_level_d | 0.000000 | 0.166451 |
sanction_imposition_trade | 0.000000 | 2.398556 |
log_GDP_o | 0.853548 | 1.059179 |
Corss Validation for alpha, folds=5. We see the corss validation path is very similar to the train path. Here I select the test size as 80000, roughly one third of the sample size
# Compute lasso for many alphas (the lasso path)
from itertools import cycle
alphas = np.exp(np.linspace(10, -2, 50))
alphas, coefs_lasso, _ = linear_model.lasso_path(X, y, alphas=alphas, fit_intercept=True, max_iter=10000)
# plotting
fig, ax = plt.subplots(figsize=(12, 8))
colors = cycle(qeds.themes.COLOR_CYCLE)
log_alphas = -np.log10(alphas)
for coef_l, c, name in zip(coefs_lasso, colors, list(X)):
ax.plot(log_alphas, coef_l, c=c)
ax.set_xlabel('-Log(alpha)')
ax.set_ylabel('lasso coefficients')
ax.set_title('Lasso Path')
ax.axis('tight')
maxabs = np.max(np.abs(coef_l))
i = [idx for idx in range(len(coef_l)) if abs(coef_l[idx]) >= (0.9*maxabs)][0]
xnote = log_alphas[i]
ynote = coef_l[i]
ax.annotate(name, (xnote, ynote), color=c)
Find the MSE minimizer lasso alpha
def fit_and_report_mses(mod, X_train, X_test, y_train, y_test):
mod.fit(X_train, y_train)
return dict(
mse_train=metrics.mean_squared_error(y_train, mod.predict(X_train)),
mse_test=metrics.mean_squared_error(y_test, mod.predict(X_test)),
R2 = metrics.r2_score(y_test, mod.predict(X_test))
)
# fit the train model, apply to predict
n_test = 80000
#X is a matrix 2D, y is an array
X_train = X.iloc[:n_test, :]
X_test = X.iloc[n_test:, :]
y_train = y.iloc[:n_test]
y_test = y.iloc[n_test:]
alphas = np.exp(np.linspace(10, -5, 100))
mse = pd.DataFrame([fit_and_report_mses(linear_model.Lasso(alpha=alpha, max_iter=50000),
X_train, X_test, y_train, y_test)
for alpha in alphas]) #test for different alphas, see which one is better
mse["log_alpha"] = -np.log10(alphas)
fig, ax = plt.subplots(figsize=(10,6))
mse.plot(x="log_alpha", y="mse_test", c='r', ax=ax)
mse.plot(x="log_alpha", y="mse_train", c='blue', ax=ax)
ax.set_xlabel(r"$-\log(\alpha)$")
ax.set_ylabel("MSE")
from sklearn.model_selection import cross_val_score
mse["cv"] = [-np.mean(cross_val_score(linear_model.Lasso(alpha=alpha, max_iter=50000),
X_train, y_train, cv=5, scoring='neg_mean_squared_error'))
for alpha in alphas]
mse.plot(x="log_alpha", y="cv", c='gold', ax=ax)
lasso = linear_model.LassoCV(cv=5).fit(X_train,y_train)
alpha_minimize = -np.log10(lasso.alpha_)
print(alpha_minimize)
print(f'the lasso alpha minimizer is {lasso.alpha_}')
2.2336381795139677 the lasso alpha minimizer is 0.005839313880316532
By the minimizer alpha = 0.006. By the defualt alpha = 1, many Political related scores are penalized to 0, but It's far from ideal to analyze trade sanction policy. We could also see that lasso minimizer alpha is a really bad prediction for data, far inferior than the gravity model in the export trade flow case
lasso_alpha_model = linear_model.Lasso(alpha = 0.006)
lasso_alpha_model.fit(X, y)
lasso_alpha_coefs = pd.Series(dict(zip(list(X), lasso_alpha_model.coef_)))
coefs_new = pd.DataFrame(dict(lasso_alpha_min = lasso_alpha_coefs, lasso_alpha_default =lasso_coefs,
linreg=lr_coefs,ridge = ridge_coefs))
lasso_1_R2 = metrics.r2_score(y,lasso_gravity_model.predict(X))
lasso_min_R2 = metrics.r2_score(y,lasso_alpha_model.predict(X))
lasso_1_mse = metrics.mean_squared_error(y,lasso_gravity_model.predict(X))
lasso_min_mse = metrics.mean_squared_error(y,lasso_alpha_model.predict(X))
R_sqaure = pd.Series(dict(lasso_alpha_min = lasso_min_R2 , lasso_alpha_default = lasso_1_R2,
gravity =gravity_full_R2, ridge_gravity = ridge_r2, single_variable = gravity_R2))
MSE = pd.Series(dict(lasso_alpha_min = lasso_min_mse , lasso_alpha_default = lasso_1_mse,
gravity =gravity_full_mse, ridge_gravity = ridge_mse, single_variable = gravity_mse ))
print("R square for each model is")
print(R_sqaure)
print("MSE for each model is")
print(MSE)
coefs_new
R square for each model is lasso_alpha_min 0.634240 lasso_alpha_default 0.500732 gravity 0.634489 ridge_gravity 0.634489 single_variable 0.313199 dtype: float64 MSE for each model is lasso_alpha_min 6.414600 lasso_alpha_default 8.756027 gravity 6.410241 ridge_gravity 6.410241 single_variable 12.044946 dtype: float64
lasso_alpha_min | lasso_alpha_default | linreg | ridge | |
---|---|---|---|---|
contiguity | 0.871726 | 0.000000 | 1.169724 | 1.166796 |
landlocked_d | -0.625962 | -0.000000 | -0.659235 | -0.658958 |
island_d | 0.087903 | -0.000000 | 0.125324 | 0.125310 |
common_language | 0.898684 | 0.000000 | 0.897848 | 0.897877 |
colony_of_origin_after45 | 0.000000 | 0.000000 | 0.594258 | 0.585220 |
agree_fta | 0.437331 | 0.000000 | 0.476895 | 0.476808 |
member_wto_o | 0.583790 | 0.000000 | 0.644206 | 0.643853 |
member_wto_d | 0.091606 | 0.000000 | 0.147447 | 0.147402 |
polity_o | 0.020058 | 0.008226 | 0.018764 | 0.018774 |
polity_d | -0.001176 | 0.000000 | -0.002851 | -0.002850 |
hostility_level_o | -0.000000 | 0.000000 | -0.080827 | -0.079209 |
hostility_level_d | 0.000000 | 0.000000 | 0.012910 | 0.012120 |
sanction_imposition_trade | 0.000000 | 0.000000 | 0.165399 | 0.162706 |
log_GDP_o | 1.202756 | 0.947046 | 1.201219 | 1.201236 |
log_GDP_d | 0.893488 | 0.628656 | 0.894970 | 0.894967 |
log_distance | -1.346871 | -0.000000 | -1.334600 | -1.334672 |
d = pd.concat([R_sqaure,MSE],axis = 1)
d.columns = ['R^2','MSE']
d
R^2 | MSE | |
---|---|---|
lasso_alpha_min | 0.635133 | 6.407172 |
lasso_alpha_default | 0.500832 | 8.765536 |
gravity | 0.635377 | 6.402887 |
ridge_gravity | 0.635377 | 6.402887 |
single_variable | 0.313830 | 12.049356 |
Now add lasso alpha 1.0 and lasso alpha 0.006 to the scatter plot
#the full model
lr_model = linear_model.LinearRegression()
lr_model.fit(X, y)
ax = var_scatter(df_final_version_5)
scatter_model(lr_model, X, ax, color='indianred')
scatter_model(lasso_alpha_model, X, ax, color='gray')
scatter_model(lasso_gravity_model, X, ax, color='pink')
scatter_model(gravity_lr_model, X[["log_GDP_o"]], ax, color='gold')
ax.legend(["data", "Multi-linear full model","lasso_a =0.006","lasso_a =1","log_GDP_o model"],markerscale = 5)
ax.set_title('Simple Linear V.S. Lasso V.S. Gravity Multilinear')
Text(0.5, 1.0, 'Simple Linear V.S. Lasso V.S. Gravity Multilinear')
Use the model_selection to train, and by trail and error, I found by setting test_size = 0.25 gives the smaller mse for both train and test data. Both R^2 are roughly 0.63.
Again, compared to other R Squares
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3)
fit_and_report_mses(linear_model.LinearRegression(), X_train, X_test, y_train, y_test)
{'mse_train': 6.41744198695048, 'mse_test': 6.393686367439606, 'R2': 0.635950972867064}
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.25)
fit_and_report_mses(linear_model.LinearRegression(), X_train, X_test, y_train, y_test)
{'mse_train': 6.417193759468607, 'mse_test': 6.389683021606089, 'R2': 0.634657260248257}
mod_train_test = linear_model.LinearRegression()
mod_train_test.fit(X_train, y_train)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Two Layers, 30 and 20 neurons
from sklearn import neural_network
# two hidden layers, with N1=30 and N2=20
nn_model = neural_network.MLPRegressor((30, 20))
nn_model.fit(X, y)
from sklearn import preprocessing, pipeline
# X is the observation matrix
#regressors are continuous
#observation can be binary
# the pipeline defines any number of steps that will be applied
# to transform the `X` data and then a final step that is a model
# we can use for prediction
nn_scaled_model = pipeline.make_pipeline(
preprocessing.StandardScaler(),
neural_network.MLPRegressor((30, 20))
)
# We can now use `model` like we have used our other models all along
# Call fit
nn_scaled_model.fit(X, y)
# Call predict
mse_nn_scaled = metrics.mean_squared_error(y, nn_scaled_model.predict(X))
print(f"Scaled mse {mse_nn_scaled}")
/Users/zihao/opt/anaconda3/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py:571: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Scaled mse 5.351520187673711
/Users/zihao/opt/anaconda3/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py:571: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
R2_nn_scaled = metrics.r2_score(y, nn_scaled_model.predict(X))
print("The R^2 score is", R2_nn_scaled)
The R^2 score is 0.6947039769866588
Three Layers, 10, 20, 10 neurons
nn_scaled_model_3_layer = pipeline.make_pipeline(
preprocessing.StandardScaler(),
neural_network.MLPRegressor((10, 20,10))
)
nn_scaled_model_3_layer.fit(X, y)
mse_nn_scaled_3_layer = metrics.mean_squared_error(y, nn_scaled_model_3_layer.predict(X))
R2_nn_scaled_3 = metrics.r2_score(y, nn_scaled_model_3_layer.predict(X))
print(f"Scaled mse {mse_nn_scaled_3_layer}")
print("The R^2 score is", R2_nn_scaled_3)
Scaled mse 5.467001044014516 The R^2 score is 0.6882722283413251
/Users/zihao/opt/anaconda3/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py:571: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Two layers, 50, 40 neurons.
nn_scaled_model_2_layer_more = pipeline.make_pipeline(
preprocessing.StandardScaler(),
neural_network.MLPRegressor((50,40))
)
nn_scaled_model_2_layer_more.fit(X, y)
mse_nn_scaled_2_layer_more = metrics.mean_squared_error(y, nn_scaled_model_2_layer_more.predict(X))
R2_nn_scaled_2_layer_more = metrics.r2_score(y, nn_scaled_model_2_layer_more.predict(X))
print(f"Scaled mse {mse_nn_scaled_2_layer_more }")
print("The R^2 score is", R2_nn_scaled_2_layer_more)
/Users/zihao/opt/anaconda3/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py:571: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Scaled mse 5.082227372823879 The R^2 score is 0.7102119788823471
R_sqaure = pd.Series(dict(lasso_alpha_min = lasso_min_R2 , lasso_alpha_default = lasso_1_R2,
gravity =gravity_full_R2, ridge_gravity = ridge_r2, single_variable = gravity_R2,
train_and_test_25 = 0.634657,layer_3_10_20_10 = R2_nn_scaled_3,
layer_2_50_40 = R2_nn_scaled_2_layer_more))
MSE = pd.Series(dict(lasso_alpha_min = lasso_min_mse , lasso_alpha_default = lasso_1_mse,
train_and_test_25 = 6.389683,
gravity =gravity_full_mse, ridge_gravity = ridge_mse, single_variable = gravity_mse,
layer_3_10_20_10 = mse_nn_scaled_3_layer , layer_2_50_40 = mse_nn_scaled_2_layer_more ))
r2_mse = pd.concat([R_sqaure,MSE],axis = 1)
r2_mse.columns = ['R^2','MSE']
r2_mse
R^2 | MSE | |
---|---|---|
lasso_alpha_min | 0.634240 | 6.414600 |
lasso_alpha_default | 0.500732 | 8.756027 |
gravity | 0.634489 | 6.410241 |
ridge_gravity | 0.634489 | 6.410241 |
single_variable | 0.313199 | 12.044946 |
train_and_test_25 | 0.634657 | 6.389683 |
layer_3_10_20_10 | 0.688272 | 5.467001 |
layer_2_50_40 | 0.710212 | 5.082227 |
ax = var_scatter(df_final_version_5)
scatter_model(mod_train_test, X, ax, color='lime')
scatter_model(nn_scaled_model_2_layer_more, X, ax, color='black')
scatter_model(nn_scaled_model_3_layer, X, ax, color='purple')
scatter_model(lr_model, X, ax, color='indianred')
scatter_model(lasso_alpha_model, X, ax, color='gray')
scatter_model(lasso_gravity_model, X, ax, color='pink')
scatter_model(gravity_lr_model, X[["log_GDP_o"]], ax, color='gold')
ax.legend(["data",'test and train','NN layer_2 model 50,40','NN layer_3 model 10,20,10', "Multi-linear Gravity model",
"lasso_a =1","lasso_a =0.006","log_GDP_o model"],markerscale = 5)
ax.set_title('Data vs Test_n_Train vs Neural Networks vs Gravity Multilinear vs Lasso vs Simple Linear')
ax.grid(False)
def create_groupcount_barplot(df, group_col, figsize, **kwargs):
"call df.groupby(group_col), then count number of records and plot"
counts = df.groupby(group_col)["iso_d"].count().sort_index()
fig, ax = plt.subplots(figsize=figsize)
counts.plot(kind="bar", **kwargs)
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.set_xlabel("")
ax.set_ylabel("")
ax.set_title('Political Score from -10 to 10')
ax.grid(False)
return fig, ax
create_groupcount_barplot(df_final_version_5, "polity_d", (12, 8), color="DarkBlue", rot=0)
(<Figure size 864x576 with 1 Axes>, <matplotlib.axes._subplots.AxesSubplot at 0x1d32777510>)
What is the political score of China and the U.S.?
print(df_final_version_5.set_index('iso_d').loc['CHN']['polity_d'])
print(df_final_version_5.set_index('iso_d').loc['USA']['polity_d'])
iso_d CHN -7.0 CHN -7.0 CHN -7.0 CHN -7.0 CHN -7.0 ... CHN -7.0 CHN -7.0 CHN -7.0 CHN -7.0 CHN -7.0 Name: polity_d, Length: 1609, dtype: float64 iso_d USA 10.0 USA 10.0 USA 10.0 USA 10.0 USA 10.0 ... USA 10.0 USA 10.0 USA 10.0 USA 10.0 USA 10.0 Name: polity_d, Length: 1612, dtype: float64
Which country was set as -10? Qatar, Saudi and Bahrain ( quite surprising )
a = df_final_version_5.set_index('polity_d').loc[-10]['iso_d'].tolist()
minus_10 = []
for i in a :
if i not in minus_10:
minus_10.append(i)
minus_10
['QAT', 'SAU', 'BHR']
How does the political score differs by continent. From the plot below, most negative score come from middle east, central asia and euro asia. And China is the only negative score country in east asia
# create a region list
b = df_final_version_5['region_d'].tolist()
region_list = []
for i in b :
if i not in region_list:
region_list.append(i)
#then plot
region_count = df_final_version_5.groupby(["region_d"])["iso_d"].count()
dfgb = df_final_version_5.groupby("region_d")
region = df_final_version_5.groupby("region_d")["iso_d"].count()
fig, ax = plt.subplots(len(region_list), figsize=(20, 40))
for (i, region) in enumerate(region_list):
(
(dfgb
.get_group(region)
.groupby("polity_d")["iso_d"].count() / region_count[region] # percentage
)
.plot(kind="bar", ax=ax[i], color="blue", alpha = 0.5)
)
ax[i].set_ylabel(region,fontsize=20)
ax[i].set_xlabel("")
ax[i].set_facecolor(background_1)
ax[i].grid(False)
# set equal y limit for visual comparison
fig.set_facecolor(background_1)
fig.suptitle("Score Frequency by Region", size = 30)
fig.tight_layout()
And How does the political affects export percentage of GDP?
sns.lmplot(
data=df_final_version_5, x="polity_o", y="log_flow", height=6,
scatter_kws=dict(s=1.5, alpha=0.5)
);
plt.title(' log_flow againt polity_o')
Text(0.5, 1, ' log_flow againt polity_o')
Hostility_d varibale is coded 1–5 and denotes the level of hostility of country_d towards country_o in year t. We see most countries were not assigned to be in this score. After cleaning the data that drop the 0 value, only 263 rows left The defition of each level of hosility score is (given by USITC dynamic gravity dataset https://www.usitc.gov/data/gravity/html_documentation/dynamic_gravity_technical_documentation_v1_00.html#x1-480007.1)
1 - no militarized action taken
2 - some form of threat made
3 - some show of force or alert of force
4 - some use of more substantial force
5 - complete war
def create_groupcount_barplot(df, group_col, figsize, **kwargs):
"call df.groupby(group_col), then count number of records and plot"
counts = df.groupby(group_col)["iso_d"].count().sort_index()
fig, ax = plt.subplots(figsize=figsize)
counts.plot(kind="bar", **kwargs)
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.set_title('Trade Flow against Hostility score')
ax.grid(False)
return fig, ax
create_groupcount_barplot(df_final_version_5, "hostility_level_d", (6, 4), color="DarkBlue", rot=0)
(<Figure size 432x288 with 1 Axes>, <matplotlib.axes._subplots.AxesSubplot at 0x1dcde6a250>)
Since there're too many zeros, what if it's conditional on a score has alreaby been assigned?
df_hostility = df_final_version_5.set_index("hostility_level_d").drop(0).reset_index()
def create_groupcount_barplot_c(df, group_col, figsize, **kwargs):
"call df.groupby(group_col), then count number of records and plot"
counts = df.groupby(group_col)["iso_d"].count().sort_index()
fig, ax = plt.subplots(figsize=figsize)
counts.plot(kind="bar", **kwargs)
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.set_title('Trade Flow against Hostility score,conditional on score > 0')
ax.grid(False)
return fig, ax
create_groupcount_barplot_c(df_hostility, "hostility_level_d", (6, 4), color="DarkBlue", rot=0)
(<Figure size 432x288 with 1 Axes>, <matplotlib.axes._subplots.AxesSubplot at 0x1dce54c510>)
sns.lmplot(
data=df_hostility, x="hostility_level_d", y="log_flow", height=6,
scatter_kws=dict(s=1.5, alpha=0.5)
);
plt.title('log_flow agains hostility conditional on score has been assgined')
Text(0.5, 1, 'log_flow agains hostility conditional on score has been assgined')
From the process below, we can see the sanction on trade decision is not dependent on other classifications. So it may not cause multi-collinearity issue. The contour plot map is a visualization of the VIF result on 7.3 and 7.4
The correlation table between trade_sanction, hostility_level_o, hostility_level_d, polity_o,polity_d
df_final_version_5[["sanction_imposition_trade","hostility_level_o", "hostility_level_d",'polity_o','polity_d']].corr()
sanction_imposition_trade | hostility_level_o | hostility_level_d | polity_o | polity_d | |
---|---|---|---|---|---|
sanction_imposition_trade | 1.000000 | 0.053208 | 0.052771 | 0.011827 | 0.011438 |
hostility_level_o | 0.053208 | 1.000000 | 0.863372 | -0.005700 | -0.007234 |
hostility_level_d | 0.052771 | 0.863372 | 1.000000 | -0.006295 | -0.007205 |
polity_o | 0.011827 | -0.005700 | -0.006295 | 1.000000 | -0.049693 |
polity_d | 0.011438 | -0.007234 | -0.007205 | -0.049693 | 1.000000 |
y = df_final_version_5["sanction_imposition_trade"]
X = df_final_version_5[["hostility_level_o", "hostility_level_d"]]
X_train, X_test, y_train, y_test = model_selection.train_test_split(
X, y, test_size=0.25,
)
def plot_contours(ax, mod, xx, yy, **params):
"""
Plot the decision boundaries for a classifier with 2 features x and y.
Parameters
----------
ax: matplotlib axes object
mod: a classifier
xx: meshgrid ndarray
yy: meshgrid ndarray
params: dictionary of params to pass to contourf, optional
"""
Z = mod.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
out = ax.contourf(xx, yy, Z, **params)
return out
def fit_and_plot_decision_boundary(mod, X, y, **params):
# fit model
mod.fit(X, y)
# generate grids of first two columns of X
def gen_grid(xseries):
if xseries.nunique() < 6:
return sorted(xseries.unique())
else:
return np.linspace(xseries.min(), xseries.max(), 6)
x1, x2 = np.meshgrid(gen_grid(X.iloc[:, 0]), gen_grid(X.iloc[:, 1]))
# plot contours and scatter
fig, ax = plt.subplots()
plot_contours(ax, mod, x1, x2, **params)
x1_name, x2_name = list(X)[:2]
X.plot.scatter(x=x1_name, y=x2_name, color=y, ax=ax)
ax.set_xlabel(x1_name)
ax.set_ylabel(x2_name)
return ax
logistic_age_model = linear_model.LogisticRegression(solver="lbfgs")
logistic_age_model.fit(X_train, y_train)
beta_0 = logistic_age_model.intercept_[0]
beta_1, beta_2 = logistic_age_model.coef_[0]
print(f"Fit model: p(sanction_imposition_trade) = L({beta_0:.4f} + {beta_1:.4f} hostility_level_o + {beta_2:.4f} hostility_level_o)")
fit_and_plot_decision_boundary(
linear_model.LogisticRegression(solver="lbfgs"),
X_train, y_train, cmap=plt.cm.Greys
)
Fit model: p(sanction_imposition_trade) = L(-5.8408 + 0.2252 hostility_level_o + 0.7849 hostility_level_o)
<matplotlib.axes._subplots.AxesSubplot at 0x1d327b4190>
The package was created by the Gravity Modeling Group at the United States International Trade Commission. https://www.usitc.gov/data/gravity/gme_docs/about/ I noticed this package at the end of finishing the project. This package facilitates the estimation procedure for fixed effects and specific country's estimation I've not fully learned the GME package. Here's a simple example that is much similer to the GME instruction that estimates the small gravity dataset, for which I also created a summary table.
Implementing the merged CEPII and USITC dataset by GME
gme_data = gme.EstimationData(data_frame = df_final_version_5,
imp_var_name = 'iso_o',
exp_var_name = 'iso_d',
trade_var_name = 'log_flow',
year_var_name = 'year')
print(gme_data)
number of countries: 194 number of exporters: 194 number of importers: 193 number of years: 10 number of sectors: not_applicable dimensions: (214776, 21)
gme_model.estimate()
select specification variables: ['log_distance', 'agree_pta', 'common_language', 'contiguity', 'trade_value', 'importer', 'exporter', 'year'], Observations excluded by user: {'rows': 0, 'columns': 0} drop_intratrade: no, Observations excluded by user: {'rows': 0, 'columns': 0} drop_imp: none, Observations excluded by user: {'rows': 0, 'columns': 0} drop_exp: none, Observations excluded by user: {'rows': 0, 'columns': 0} keep_imp: all available, Observations excluded by user: {'rows': 0, 'columns': 0} keep_exp: all available, Observations excluded by user: {'rows': 0, 'columns': 0} drop_years: none, Observations excluded by user: {'rows': 0, 'columns': 0} keep_years: [1989], Observations excluded by user: {'rows': 95190, 'columns': 0} drop_missing: yes, Observations excluded by user: {'rows': 116, 'columns': 0} Estimation began at 05:40 PM on Apr 18, 2020 Omitted Columns: ['importer_fe_ARG', 'importer_fe_AUT', 'importer_fe_BOL', 'importer_fe_CHL', 'importer_fe_CHN', 'importer_fe_COL', 'importer_fe_DZA', 'importer_fe_ECU', 'importer_fe_EGY', 'importer_fe_FRA', 'importer_fe_GBR', 'importer_fe_GHA', 'importer_fe_HUN', 'importer_fe_IRL', 'importer_fe_IRN', 'importer_fe_ISR', 'importer_fe_ITA', 'importer_fe_KEN', 'importer_fe_KWT', 'importer_fe_LBY', 'importer_fe_MAR', 'importer_fe_MEX', 'importer_fe_NGA', 'importer_fe_NLD', 'importer_fe_NOR', 'importer_fe_PAK', 'importer_fe_PER', 'importer_fe_PHL', 'importer_fe_POL', 'importer_fe_SAU', 'importer_fe_SDN', 'importer_fe_SWE', 'importer_fe_TUN', 'importer_fe_URY', 'importer_fe_USA', 'importer_fe_VEN', 'importer_fe_ZAF'] Estimation completed at 05:40 PM on Apr 18, 2020
{'all': <gme.estimate.SlimResults.SlimResults at 0x1a1d1a4710>}
fixed_effects_model_2 = gme.EstimationModel(estimation_data = gme_data,
lhs_var = 'log_flow',
rhs_var = (['log_GDP_o','log_GDP_d','log_distance',
'contiguity','landlocked_d','island_d','common_language','colony_of_origin_after45',
'agree_fta','member_wto_o','member_wto_d','polity_o','polity_d',
'hostility_level_o','hostility_level_d','sanction_imposition_trade']),
fixed_effects = ['iso_d', 'iso_o'],
keep_years = [2014])
# Conduct a PPML estimation of the fixed effects model.
estimates = fixed_effects_model_2.estimate()
select specification variables: ['log_GDP_o', 'log_GDP_d', 'log_distance', 'contiguity', 'landlocked_d', 'island_d', 'common_language', 'colony_of_origin_after45', 'agree_fta', 'member_wto_o', 'member_wto_d', 'polity_o', 'polity_d', 'hostility_level_o', 'hostility_level_d', 'sanction_imposition_trade', 'log_flow', 'iso_o', 'iso_d', 'year'], Observations excluded by user: {'rows': 0, 'columns': 1} drop_intratrade: no, Observations excluded by user: {'rows': 0, 'columns': 0} drop_imp: none, Observations excluded by user: {'rows': 0, 'columns': 0} drop_exp: none, Observations excluded by user: {'rows': 0, 'columns': 0} keep_imp: all available, Observations excluded by user: {'rows': 0, 'columns': 0} keep_exp: all available, Observations excluded by user: {'rows': 0, 'columns': 0} drop_years: none, Observations excluded by user: {'rows': 0, 'columns': 0} keep_years: [2014], Observations excluded by user: {'rows': 194884, 'columns': 0} drop_missing: yes, Observations excluded by user: {'rows': 0, 'columns': 0} Estimation began at 01:14 PM on Apr 19, 2020 Omitted Columns: ['hostility_level_o', 'hostility_level_d', 'sanction_imposition_trade', 'iso_d_fe_VUT', 'iso_d_fe_WSM', 'iso_d_fe_ZAF', 'iso_d_fe_ZMB', 'iso_d_fe_ZWE', 'iso_o_fe_UZB', 'iso_o_fe_ZAF', 'iso_o_fe_ZMB', 'iso_o_fe_ZWE', 'iso_d_fe_URY', 'iso_d_fe_USA', 'iso_d_fe_UZB', 'iso_d_fe_VCT', 'iso_d_fe_VNM', 'iso_o_fe_TUV', 'iso_o_fe_VNM', 'iso_o_fe_VUT', 'iso_o_fe_WSM'] Estimation completed at 01:14 PM on Apr 19, 2020
fixed_effects_model_2.format_regression_table(format = "txt")
Variable | all | |
---|---|---|
a_agree_fta | agree_fta | 0.019*** |
a_agree_fta_se | (0.003) | |
a_colony_of_origin_after45 | colony_of_origin_after45 | 0.061*** |
a_colony_of_origin_after45_se | (0.012) | |
a_common_language | common_language | 0.060*** |
... | ... | ... |
a_polity_o_se | (2.323) | |
b_aic | AIC | 93529.945 |
b_bic | BIC | -175316.579 |
b_llf | Likelihood | -46425.972 |
b_nobs | Obs. | 18934 |
682 rows × 2 columns
results = estimates['all']
results.summary()
Dep. Variable: | log_flow | No. Iterations: | 4 |
---|---|---|---|
Model: | GLM | Df Residuals: | 18595 |
Model Family: | Poisson | Df Model: | 338 |
Link Function: | log | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -46426. |
Covariance Type: | HC1 | Deviance: | 7820.3 |
No. Observations: | 18934 | Pearson chi2: | 7.35e+03 |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
log_GDP_o | -2.3455 | 2.988 | -0.785 | 0.432 | -8.202 | 3.511 |
log_GDP_d | 0.0680 | 0.003 | 22.698 | 0.000 | 0.062 | 0.074 |
log_distance | -0.1078 | 0.002 | -49.488 | 0.000 | -0.112 | -0.104 |
contiguity | 0.0103 | 0.008 | 1.236 | 0.216 | -0.006 | 0.027 |
landlocked_d | 40.9660 | 50.652 | 0.809 | 0.419 | -58.317 | 140.249 |
island_d | 0.0011 | 0.030 | 0.037 | 0.970 | -0.058 | 0.060 |
common_language | 0.0597 | 0.003 | 18.578 | 0.000 | 0.053 | 0.066 |
colony_of_origin_after45 | 0.0610 | 0.012 | 4.917 | 0.000 | 0.037 | 0.085 |
agree_fta | 0.0185 | 0.003 | 5.351 | 0.000 | 0.012 | 0.025 |
member_wto_o | 7.4399 | 9.139 | 0.814 | 0.416 | -10.473 | 25.353 |
member_wto_d | 41.0893 | 50.652 | 0.811 | 0.417 | -58.194 | 140.373 |
polity_o | -1.9071 | 2.323 | -0.821 | 0.412 | -6.461 | 2.647 |
polity_d | -0.0038 | 0.001 | -3.698 | 0.000 | -0.006 | -0.002 |
iso_d_fe_AFG | 0.0855 | 0.026 | 3.328 | 0.001 | 0.035 | 0.136 |
iso_d_fe_ALB | -0.0431 | 0.022 | -1.919 | 0.055 | -0.087 | 0.001 |
iso_d_fe_ARE | -0.0296 | 0.017 | -1.774 | 0.076 | -0.062 | 0.003 |
iso_d_fe_ARG | -0.1136 | 0.018 | -6.318 | 0.000 | -0.149 | -0.078 |
iso_d_fe_ARM | -41.0070 | 50.652 | -0.810 | 0.418 | -140.290 | 58.276 |
iso_d_fe_ATG | 0.0437 | 0.029 | 1.530 | 0.126 | -0.012 | 0.100 |
iso_d_fe_AUS | 0.0265 | 0.030 | 0.889 | 0.374 | -0.032 | 0.085 |
iso_d_fe_AUT | -41.0841 | 50.653 | -0.811 | 0.417 | -140.368 | 58.200 |
iso_d_fe_AZE | 40.9292 | 50.652 | 0.808 | 0.419 | -58.354 | 140.212 |
iso_d_fe_BDI | -41.0643 | 50.652 | -0.811 | 0.418 | -140.348 | 58.219 |
iso_d_fe_BEL | 0.0450 | 0.016 | 2.889 | 0.004 | 0.014 | 0.075 |
iso_d_fe_BEN | -0.0198 | 0.025 | -0.803 | 0.422 | -0.068 | 0.029 |
iso_d_fe_BFA | -41.0219 | 50.652 | -0.810 | 0.418 | -140.305 | 58.261 |
iso_d_fe_BGD | -0.0001 | 0.017 | -0.007 | 0.995 | -0.033 | 0.033 |
iso_d_fe_BGR | -0.0166 | 0.020 | -0.842 | 0.400 | -0.055 | 0.022 |
iso_d_fe_BHR | -0.1098 | 0.031 | -3.539 | 0.000 | -0.171 | -0.049 |
iso_d_fe_BHS | 41.0948 | 50.652 | 0.811 | 0.417 | -58.188 | 140.378 |
iso_d_fe_BIH | 40.9759 | 50.652 | 0.809 | 0.419 | -58.307 | 140.259 |
iso_d_fe_BLR | 0.0362 | 0.021 | 1.760 | 0.078 | -0.004 | 0.077 |
iso_d_fe_BLZ | 0.0278 | 0.028 | 0.995 | 0.320 | -0.027 | 0.083 |
iso_d_fe_BOL | -41.0601 | 50.653 | -0.811 | 0.418 | -140.345 | 58.224 |
iso_d_fe_BRA | -0.0859 | 0.014 | -6.278 | 0.000 | -0.113 | -0.059 |
iso_d_fe_BRB | -0.0535 | 0.027 | -1.948 | 0.051 | -0.107 | 0.000 |
iso_d_fe_BRN | -0.1354 | 0.023 | -5.815 | 0.000 | -0.181 | -0.090 |
iso_d_fe_BTN | -0.0012 | 0.036 | -0.033 | 0.974 | -0.072 | 0.070 |
iso_d_fe_BWA | -41.0896 | 50.652 | -0.811 | 0.417 | -140.373 | 58.194 |
iso_d_fe_CAF | -40.9817 | 50.652 | -0.809 | 0.418 | -140.265 | 58.302 |
iso_d_fe_CAN | 0.0153 | 0.012 | 1.271 | 0.204 | -0.008 | 0.039 |
iso_d_fe_CHE | -41.0244 | 50.652 | -0.810 | 0.418 | -140.308 | 58.259 |
iso_d_fe_CHL | -0.0618 | 0.018 | -3.368 | 0.001 | -0.098 | -0.026 |
iso_d_fe_CHN | -0.0358 | 0.022 | -1.645 | 0.100 | -0.079 | 0.007 |
iso_d_fe_CIV | -0.0040 | 0.019 | -0.212 | 0.832 | -0.041 | 0.033 |
iso_d_fe_CMR | -0.0925 | 0.021 | -4.432 | 0.000 | -0.133 | -0.052 |
iso_d_fe_COD | -0.0062 | 0.021 | -0.298 | 0.765 | -0.047 | 0.035 |
iso_d_fe_COG | 0.0002 | 0.020 | 0.013 | 0.990 | -0.038 | 0.039 |
iso_d_fe_COL | -0.0670 | 0.015 | -4.541 | 0.000 | -0.096 | -0.038 |
iso_d_fe_COM | 41.1426 | 50.652 | 0.812 | 0.417 | -58.141 | 140.426 |
iso_d_fe_CPV | 0.0015 | 0.032 | 0.048 | 0.961 | -0.061 | 0.064 |
iso_d_fe_CRI | -0.0791 | 0.023 | -3.499 | 0.000 | -0.123 | -0.035 |
iso_d_fe_CYP | -0.0849 | 0.026 | -3.269 | 0.001 | -0.136 | -0.034 |
iso_d_fe_CZE | -41.0057 | 50.653 | -0.810 | 0.418 | -140.289 | 58.278 |
iso_d_fe_DEU | -0.0197 | 0.013 | -1.503 | 0.133 | -0.045 | 0.006 |
iso_d_fe_DJI | 0.0979 | 0.027 | 3.661 | 0.000 | 0.045 | 0.150 |
iso_d_fe_DMA | 0.0402 | 0.030 | 1.354 | 0.176 | -0.018 | 0.098 |
iso_d_fe_DNK | -0.0673 | 0.015 | -4.495 | 0.000 | -0.097 | -0.038 |
iso_d_fe_DOM | -0.0332 | 0.021 | -1.552 | 0.121 | -0.075 | 0.009 |
iso_d_fe_DZA | 40.9874 | 50.652 | 0.809 | 0.418 | -58.295 | 140.270 |
iso_d_fe_ECU | -0.0081 | 0.016 | -0.513 | 0.608 | -0.039 | 0.023 |
iso_d_fe_EGY | -0.0911 | 0.015 | -6.126 | 0.000 | -0.120 | -0.062 |
iso_d_fe_ESP | 0.0201 | 0.013 | 1.529 | 0.126 | -0.006 | 0.046 |
iso_d_fe_EST | -0.0983 | 0.023 | -4.221 | 0.000 | -0.144 | -0.053 |
iso_d_fe_ETH | -0.0067 | 0.025 | -0.268 | 0.789 | -0.056 | 0.042 |
iso_d_fe_FIN | -0.1008 | 0.019 | -5.341 | 0.000 | -0.138 | -0.064 |
iso_d_fe_FJI | 0.0180 | 0.029 | 0.616 | 0.538 | -0.039 | 0.075 |
iso_d_fe_FRA | -0.0096 | 0.012 | -0.801 | 0.423 | -0.033 | 0.014 |
iso_d_fe_GAB | -0.0185 | 0.019 | -0.951 | 0.342 | -0.057 | 0.020 |
iso_d_fe_GBR | -0.0307 | 0.031 | -0.986 | 0.324 | -0.092 | 0.030 |
iso_d_fe_GEO | -0.0028 | 0.021 | -0.137 | 0.891 | -0.044 | 0.038 |
iso_d_fe_GHA | 0.0426 | 0.018 | 2.347 | 0.019 | 0.007 | 0.078 |
iso_d_fe_GIN | 0.0551 | 0.022 | 2.531 | 0.011 | 0.012 | 0.098 |
iso_d_fe_GNB | 0.0244 | 0.031 | 0.794 | 0.427 | -0.036 | 0.085 |
iso_d_fe_GNQ | 40.9895 | 50.652 | 0.809 | 0.418 | -58.294 | 140.273 |
iso_d_fe_GRC | -0.0342 | 0.016 | -2.170 | 0.030 | -0.065 | -0.003 |
iso_d_fe_GRD | -0.0044 | 0.029 | -0.151 | 0.880 | -0.062 | 0.053 |
iso_d_fe_GTM | -0.0411 | 0.021 | -1.983 | 0.047 | -0.082 | -0.000 |
iso_d_fe_GUY | 0.0440 | 0.027 | 1.642 | 0.101 | -0.009 | 0.097 |
iso_d_fe_HKG | 0.0427 | 0.013 | 3.405 | 0.001 | 0.018 | 0.067 |
iso_d_fe_HND | -0.0458 | 0.024 | -1.912 | 0.056 | -0.093 | 0.001 |
iso_d_fe_HRV | -0.0856 | 0.019 | -4.547 | 0.000 | -0.123 | -0.049 |
iso_d_fe_HTI | 0.0041 | 0.021 | 0.195 | 0.845 | -0.038 | 0.046 |
iso_d_fe_HUN | -41.0649 | 50.652 | -0.811 | 0.418 | -140.348 | 58.218 |
iso_d_fe_IDN | -0.0136 | 0.016 | -0.861 | 0.390 | -0.045 | 0.017 |
iso_d_fe_IND | 0.0302 | 0.014 | 2.185 | 0.029 | 0.003 | 0.057 |
iso_d_fe_IRL | -0.1060 | 0.017 | -6.226 | 0.000 | -0.139 | -0.073 |
iso_d_fe_IRN | 40.9322 | 50.652 | 0.808 | 0.419 | -58.351 | 140.216 |
iso_d_fe_IRQ | 41.0076 | 50.652 | 0.810 | 0.418 | -58.276 | 140.291 |
iso_d_fe_ISL | -0.0700 | 0.026 | -2.683 | 0.007 | -0.121 | -0.019 |
iso_d_fe_ISR | -0.0688 | 0.012 | -5.578 | 0.000 | -0.093 | -0.045 |
iso_d_fe_ITA | -0.0129 | 0.013 | -0.981 | 0.327 | -0.039 | 0.013 |
iso_d_fe_JAM | 0.0049 | 0.028 | 0.176 | 0.860 | -0.050 | 0.060 |
iso_d_fe_JOR | -0.0660 | 0.017 | -3.918 | 0.000 | -0.099 | -0.033 |
iso_d_fe_JPN | 0.0343 | 0.033 | 1.055 | 0.291 | -0.029 | 0.098 |
iso_d_fe_KAZ | -0.0070 | 0.022 | -0.326 | 0.745 | -0.049 | 0.035 |
iso_d_fe_KEN | -0.0278 | 0.020 | -1.408 | 0.159 | -0.066 | 0.011 |
iso_d_fe_KGZ | -40.9613 | 50.652 | -0.809 | 0.419 | -140.245 | 58.322 |
iso_d_fe_KHM | -0.0907 | 0.020 | -4.448 | 0.000 | -0.131 | -0.051 |
iso_d_fe_KIR | 41.1072 | 50.652 | 0.812 | 0.417 | -58.176 | 140.391 |
iso_d_fe_KNA | 0.0184 | 0.029 | 0.632 | 0.528 | -0.039 | 0.076 |
iso_d_fe_KOR | 0.0448 | 0.014 | 3.184 | 0.001 | 0.017 | 0.072 |
iso_d_fe_KWT | -0.1474 | 0.018 | -8.293 | 0.000 | -0.182 | -0.113 |
iso_d_fe_LAO | -41.1680 | 50.652 | -0.813 | 0.416 | -140.451 | 58.115 |
iso_d_fe_LBN | 41.0669 | 50.652 | 0.811 | 0.418 | -58.216 | 140.350 |
iso_d_fe_LBR | 41.2016 | 50.652 | 0.813 | 0.416 | -58.082 | 140.485 |
iso_d_fe_LBY | 41.0506 | 50.652 | 0.810 | 0.418 | -58.233 | 140.334 |
iso_d_fe_LCA | 0.0019 | 0.029 | 0.065 | 0.949 | -0.056 | 0.060 |
iso_d_fe_LKA | -0.0819 | 0.028 | -2.949 | 0.003 | -0.136 | -0.027 |
iso_d_fe_LSO | -41.1165 | 50.652 | -0.812 | 0.417 | -140.400 | 58.167 |
iso_d_fe_LTU | -0.0777 | 0.023 | -3.441 | 0.001 | -0.122 | -0.033 |
iso_d_fe_LVA | -0.0722 | 0.021 | -3.422 | 0.001 | -0.113 | -0.031 |
iso_d_fe_MAC | -0.1319 | 0.023 | -5.698 | 0.000 | -0.177 | -0.087 |
iso_d_fe_MAR | -0.0472 | 0.029 | -1.612 | 0.107 | -0.105 | 0.010 |
iso_d_fe_MDA | -0.0139 | 0.023 | -0.594 | 0.552 | -0.060 | 0.032 |
iso_d_fe_MDG | -0.0324 | 0.027 | -1.187 | 0.235 | -0.086 | 0.021 |
iso_d_fe_MDV | 0.0001 | 0.027 | 0.005 | 0.996 | -0.053 | 0.053 |
iso_d_fe_MEX | -0.0086 | 0.013 | -0.667 | 0.505 | -0.034 | 0.017 |
iso_d_fe_MKD | -41.0756 | 50.652 | -0.811 | 0.417 | -140.359 | 58.208 |
iso_d_fe_MLI | -40.9877 | 50.652 | -0.809 | 0.418 | -140.271 | 58.296 |
iso_d_fe_MMR | -0.1120 | 0.018 | -6.363 | 0.000 | -0.146 | -0.077 |
iso_d_fe_MNG | -40.9866 | 50.652 | -0.809 | 0.418 | -140.270 | 58.297 |
iso_d_fe_MOZ | 0.0523 | 0.020 | 2.617 | 0.009 | 0.013 | 0.091 |
iso_d_fe_MRT | 0.0053 | 0.023 | 0.228 | 0.820 | -0.041 | 0.051 |
iso_d_fe_MUS | 0.0327 | 0.026 | 1.251 | 0.211 | -0.019 | 0.084 |
iso_d_fe_MWI | -40.9315 | 50.652 | -0.808 | 0.419 | -140.215 | 58.352 |
iso_d_fe_MYS | 0.0524 | 0.014 | 3.813 | 0.000 | 0.025 | 0.079 |
iso_d_fe_NAM | -0.0100 | 0.023 | -0.441 | 0.659 | -0.054 | 0.034 |
iso_d_fe_NER | -40.9653 | 50.652 | -0.809 | 0.419 | -140.249 | 58.318 |
iso_d_fe_NGA | -0.0731 | 0.013 | -5.427 | 0.000 | -0.099 | -0.047 |
iso_d_fe_NIC | -0.0516 | 0.026 | -2.010 | 0.044 | -0.102 | -0.001 |
iso_d_fe_NLD | 0.0791 | 0.015 | 5.431 | 0.000 | 0.051 | 0.108 |
iso_d_fe_NOR | -0.0323 | 0.014 | -2.240 | 0.025 | -0.061 | -0.004 |
iso_d_fe_NPL | -41.0664 | 50.652 | -0.811 | 0.418 | -140.350 | 58.217 |
iso_d_fe_OMN | -0.1400 | 0.021 | -6.634 | 0.000 | -0.181 | -0.099 |
iso_d_fe_PAK | -0.0414 | 0.015 | -2.837 | 0.005 | -0.070 | -0.013 |
iso_d_fe_PAN | -0.0113 | 0.019 | -0.609 | 0.542 | -0.048 | 0.025 |
iso_d_fe_PER | -0.0545 | 0.020 | -2.699 | 0.007 | -0.094 | -0.015 |
iso_d_fe_PHL | -0.0944 | 0.029 | -3.256 | 0.001 | -0.151 | -0.038 |
iso_d_fe_PLW | 40.9141 | 50.652 | 0.808 | 0.419 | -58.369 | 140.197 |
iso_d_fe_PNG | -0.0942 | 0.026 | -3.634 | 0.000 | -0.145 | -0.043 |
iso_d_fe_POL | -0.0606 | 0.015 | -4.048 | 0.000 | -0.090 | -0.031 |
iso_d_fe_PRT | 0.0291 | 0.016 | 1.775 | 0.076 | -0.003 | 0.061 |
iso_d_fe_PRY | -40.9660 | 50.653 | -0.809 | 0.419 | -140.250 | 58.318 |
iso_d_fe_QAT | -0.1454 | 0.018 | -8.216 | 0.000 | -0.180 | -0.111 |
iso_d_fe_RUS | -0.0714 | 0.013 | -5.526 | 0.000 | -0.097 | -0.046 |
iso_d_fe_RWA | -41.0554 | 50.652 | -0.811 | 0.418 | -140.339 | 58.228 |
iso_d_fe_SAU | -0.1388 | 0.020 | -7.111 | 0.000 | -0.177 | -0.101 |
iso_d_fe_SDN | 40.9774 | 50.652 | 0.809 | 0.419 | -58.306 | 140.261 |
iso_d_fe_SEN | 0.0275 | 0.021 | 1.300 | 0.193 | -0.014 | 0.069 |
iso_d_fe_SGP | 0.0467 | 0.014 | 3.247 | 0.001 | 0.019 | 0.075 |
iso_d_fe_SLB | 0.0466 | 0.033 | 1.414 | 0.157 | -0.018 | 0.111 |
iso_d_fe_SLE | 0.0519 | 0.024 | 2.161 | 0.031 | 0.005 | 0.099 |
iso_d_fe_SLV | -0.0389 | 0.023 | -1.708 | 0.088 | -0.084 | 0.006 |
iso_d_fe_SOM | 41.0233 | 50.652 | 0.810 | 0.418 | -58.260 | 140.307 |
iso_d_fe_STP | 41.0411 | 50.652 | 0.810 | 0.418 | -58.242 | 140.324 |
iso_d_fe_SUR | -0.0247 | 0.026 | -0.941 | 0.347 | -0.076 | 0.027 |
iso_d_fe_SVK | -41.0997 | 50.652 | -0.811 | 0.417 | -140.383 | 58.184 |
iso_d_fe_SVN | -0.0372 | 0.028 | -1.336 | 0.182 | -0.092 | 0.017 |
iso_d_fe_SWE | -0.0747 | 0.015 | -4.974 | 0.000 | -0.104 | -0.045 |
iso_d_fe_SWZ | -41.0821 | 50.652 | -0.811 | 0.417 | -140.365 | 58.201 |
iso_d_fe_SYC | 41.1362 | 50.652 | 0.812 | 0.417 | -58.147 | 140.420 |
iso_d_fe_TCD | -41.0563 | 50.652 | -0.811 | 0.418 | -140.340 | 58.227 |
iso_d_fe_TGO | 0.0722 | 0.021 | 3.358 | 0.001 | 0.030 | 0.114 |
iso_d_fe_THA | 0.0170 | 0.016 | 1.049 | 0.294 | -0.015 | 0.049 |
iso_d_fe_TJK | -41.0480 | 50.652 | -0.810 | 0.418 | -140.331 | 58.235 |
iso_d_fe_TKM | 0.0068 | 0.023 | 0.294 | 0.769 | -0.038 | 0.052 |
iso_d_fe_TON | -0.0023 | 0.039 | -0.060 | 0.952 | -0.078 | 0.073 |
iso_d_fe_TTO | -0.0092 | 0.027 | -0.342 | 0.733 | -0.062 | 0.044 |
iso_d_fe_TUN | -0.0305 | 0.019 | -1.617 | 0.106 | -0.068 | 0.006 |
iso_d_fe_TUR | -0.0104 | 0.012 | -0.835 | 0.404 | -0.035 | 0.014 |
iso_d_fe_TUV | 41.1241 | 50.652 | 0.812 | 0.417 | -58.159 | 140.408 |
iso_d_fe_TZA | -0.0683 | 0.018 | -3.869 | 0.000 | -0.103 | -0.034 |
iso_d_fe_UGA | -41.0437 | 50.652 | -0.810 | 0.418 | -140.327 | 58.240 |
iso_d_fe_UKR | -0.0274 | 0.015 | -1.858 | 0.063 | -0.056 | 0.002 |
iso_o_fe_AFG | 13.1758 | 16.417 | 0.803 | 0.422 | -19.003 | 45.354 |
iso_o_fe_ALB | 23.8809 | 29.266 | 0.816 | 0.415 | -33.484 | 81.245 |
iso_o_fe_ARE | -0.1779 | 0.046 | -3.908 | 0.000 | -0.267 | -0.089 |
iso_o_fe_ARG | 31.0599 | 38.017 | 0.817 | 0.414 | -43.457 | 105.577 |
iso_o_fe_ARM | 15.8795 | 19.596 | 0.810 | 0.418 | -22.530 | 54.289 |
iso_o_fe_ATG | 0.9383 | 1.242 | 0.756 | 0.450 | -1.496 | 3.372 |
iso_o_fe_AUS | 37.2582 | 45.638 | 0.816 | 0.414 | -52.196 | 126.713 |
iso_o_fe_AUT | 34.3484 | 42.044 | 0.817 | 0.414 | -48.061 | 116.758 |
iso_o_fe_AZE | 4.9544 | 6.430 | 0.771 | 0.441 | -7.648 | 17.557 |
iso_o_fe_BDI | 16.1810 | 20.030 | 0.808 | 0.419 | -23.079 | 55.441 |
iso_o_fe_BEL | 31.0773 | 37.983 | 0.818 | 0.413 | -43.373 | 105.528 |
iso_o_fe_BEN | 19.2635 | 23.658 | 0.814 | 0.415 | -27.108 | 65.635 |
iso_o_fe_BFA | 6.5565 | 8.202 | 0.799 | 0.424 | -9.520 | 22.633 |
iso_o_fe_BGD | 14.9526 | 18.364 | 0.814 | 0.416 | -21.043 | 50.948 |
iso_o_fe_BGR | 27.5455 | 33.620 | 0.819 | 0.413 | -38.353 | 93.444 |
iso_o_fe_BHR | -10.0124 | 12.064 | -0.830 | 0.407 | -33.659 | 13.634 |
iso_o_fe_BHS | 13.1377 | 16.181 | 0.812 | 0.417 | -18.579 | 44.855 |
iso_o_fe_BIH | 15.0024 | 18.467 | 0.812 | 0.417 | -21.194 | 51.199 |
iso_o_fe_BLR | 5.1282 | 6.467 | 0.793 | 0.428 | -7.547 | 17.803 |
iso_o_fe_BLZ | 1.8823 | 2.229 | 0.845 | 0.398 | -2.486 | 6.251 |
iso_o_fe_BOL | 22.3534 | 27.355 | 0.817 | 0.414 | -31.264 | 75.971 |
iso_o_fe_BRA | 34.5964 | 42.420 | 0.816 | 0.415 | -48.550 | 117.743 |
iso_o_fe_BRB | 4.0334 | 5.041 | 0.800 | 0.424 | -5.847 | 13.914 |
iso_o_fe_BRN | 7.2073 | 9.129 | 0.790 | 0.430 | -10.686 | 25.101 |
iso_o_fe_BTN | 18.9746 | 23.408 | 0.811 | 0.418 | -26.907 | 64.857 |
iso_o_fe_BWA | 22.2798 | 27.480 | 0.811 | 0.418 | -31.584 | 76.143 |
iso_o_fe_CAF | 1.7671 | 2.270 | 0.779 | 0.436 | -2.682 | 6.216 |
iso_o_fe_CAN | 37.7472 | 46.250 | 0.816 | 0.414 | -52.907 | 128.401 |
iso_o_fe_CHE | 35.4704 | 43.457 | 0.816 | 0.414 | -49.708 | 120.649 |
iso_o_fe_CHL | 33.1057 | 40.470 | 0.818 | 0.413 | -46.220 | 112.431 |
iso_o_fe_CHN | 9.6473 | 12.007 | 0.803 | 0.422 | -13.888 | 33.183 |
iso_o_fe_CIV | 16.7789 | 20.497 | 0.819 | 0.413 | -23.397 | 56.954 |
iso_o_fe_CMR | 1.2486 | 1.712 | 0.729 | 0.466 | -2.108 | 4.605 |
iso_o_fe_COD | 18.4300 | 22.720 | 0.811 | 0.417 | -26.102 | 62.962 |
iso_o_fe_COG | -0.6930 | 0.726 | -0.955 | 0.340 | -2.115 | 0.729 |
iso_o_fe_COL | 28.2351 | 34.639 | 0.815 | 0.415 | -39.661 | 96.131 |
iso_o_fe_COM | 23.9449 | 29.282 | 0.818 | 0.414 | -33.450 | 81.340 |
iso_o_fe_CPV | 20.9967 | 25.749 | 0.815 | 0.415 | -29.474 | 71.468 |
iso_o_fe_CRI | 29.1010 | 35.540 | 0.819 | 0.413 | -40.560 | 98.762 |
iso_o_fe_CYP | 27.2095 | 33.275 | 0.818 | 0.414 | -38.013 | 92.432 |
iso_o_fe_CZE | 30.6461 | 37.463 | 0.818 | 0.413 | -42.785 | 104.078 |
iso_o_fe_DEU | 39.6182 | 48.560 | 0.816 | 0.415 | -55.564 | 134.801 |
iso_o_fe_DJI | 7.2968 | 8.998 | 0.811 | 0.417 | -10.340 | 24.934 |
iso_o_fe_DMA | -1.0154 | 1.287 | -0.789 | 0.430 | -3.538 | 1.507 |
iso_o_fe_DNK | 33.7659 | 41.315 | 0.817 | 0.414 | -47.215 | 114.747 |
iso_o_fe_DOM | 25.8715 | 31.664 | 0.817 | 0.414 | -36.193 | 87.936 |
iso_o_fe_DZA | 24.6094 | 30.457 | 0.808 | 0.419 | -35.088 | 84.307 |
iso_o_fe_ECU | 21.3012 | 26.049 | 0.818 | 0.414 | -29.757 | 72.359 |
iso_o_fe_EGY | 6.5757 | 8.258 | 0.796 | 0.426 | -9.610 | 22.762 |
iso_o_fe_ESP | 37.1449 | 45.483 | 0.817 | 0.414 | -52.007 | 126.296 |
iso_o_fe_EST | 25.6984 | 31.344 | 0.820 | 0.412 | -35.740 | 87.136 |
iso_o_fe_ETH | 11.8696 | 14.821 | 0.801 | 0.423 | -17.180 | 40.919 |
iso_o_fe_FIN | 33.2265 | 40.630 | 0.818 | 0.413 | -46.412 | 112.865 |
iso_o_fe_FJI | 7.9794 | 9.806 | 0.814 | 0.416 | -11.242 | 27.201 |
iso_o_fe_FRA | 36.9444 | 45.302 | 0.816 | 0.415 | -51.852 | 125.741 |
iso_o_fe_GAB | 13.2137 | 16.281 | 0.812 | 0.417 | -18.698 | 45.125 |
iso_o_fe_GBR | 38.9442 | 47.790 | 0.815 | 0.415 | -54.728 | 132.616 |
iso_o_fe_GEO | 20.6893 | 25.289 | 0.818 | 0.413 | -28.880 | 70.259 |
iso_o_fe_GHA | 24.6297 | 30.148 | 0.817 | 0.414 | -34.463 | 83.723 |
iso_o_fe_GIN | 12.6542 | 15.587 | 0.812 | 0.417 | -17.898 | 43.206 |
iso_o_fe_GNB | 11.9798 | 14.650 | 0.818 | 0.414 | -16.736 | 40.696 |
iso_o_fe_GNQ | 3.1123 | 4.040 | 0.770 | 0.441 | -4.807 | 11.031 |
iso_o_fe_GRC | 32.7816 | 40.198 | 0.816 | 0.415 | -46.010 | 111.573 |
iso_o_fe_GRD | 0.2196 | 0.371 | 0.591 | 0.554 | -0.508 | 0.948 |
iso_o_fe_GTM | 25.7005 | 31.406 | 0.818 | 0.413 | -35.858 | 87.259 |
iso_o_fe_GUY | 14.7424 | 17.962 | 0.821 | 0.412 | -20.464 | 49.949 |
iso_o_fe_HKG | 14.3994 | 17.596 | 0.818 | 0.413 | -20.090 | 48.889 |
iso_o_fe_HND | 21.1241 | 25.765 | 0.820 | 0.412 | -29.379 | 71.627 |
iso_o_fe_HRV | 27.4678 | 33.641 | 0.817 | 0.414 | -38.471 | 93.407 |
iso_o_fe_HTI | 5.6618 | 7.113 | 0.796 | 0.426 | -8.281 | 19.604 |
iso_o_fe_HUN | 31.6019 | 38.608 | 0.819 | 0.413 | -44.072 | 107.276 |
iso_o_fe_IDN | 34.2100 | 41.842 | 0.818 | 0.414 | -47.803 | 116.223 |
iso_o_fe_IND | 36.2139 | 44.337 | 0.817 | 0.414 | -50.692 | 123.119 |
iso_o_fe_IRL | 33.0113 | 40.385 | 0.817 | 0.414 | -46.148 | 112.170 |
iso_o_fe_IRN | 9.1677 | 11.607 | 0.790 | 0.430 | -13.583 | 31.918 |
iso_o_fe_IRQ | 32.3036 | 39.886 | 0.810 | 0.418 | -45.877 | 110.484 |
iso_o_fe_ISL | 7.3953 | 9.117 | 0.811 | 0.417 | -10.474 | 25.265 |
iso_o_fe_ISR | 25.8504 | 31.683 | 0.816 | 0.415 | -36.252 | 87.953 |
iso_o_fe_ITA | 38.1861 | 46.793 | 0.816 | 0.414 | -53.532 | 129.905 |
iso_o_fe_JAM | 23.9995 | 29.416 | 0.816 | 0.415 | -33.659 | 81.658 |
iso_o_fe_JOR | 3.4653 | 4.368 | 0.793 | 0.428 | -5.097 | 12.027 |
iso_o_fe_JPN | 40.0833 | 49.079 | 0.817 | 0.414 | -56.116 | 136.282 |
iso_o_fe_KAZ | 9.4915 | 11.931 | 0.796 | 0.426 | -13.895 | 32.878 |
iso_o_fe_KEN | 27.6520 | 33.834 | 0.817 | 0.414 | -38.666 | 93.970 |
iso_o_fe_KGZ | 18.6881 | 22.890 | 0.816 | 0.414 | -26.178 | 63.554 |
iso_o_fe_KHM | 11.2697 | 13.718 | 0.822 | 0.411 | -15.618 | 38.157 |
iso_o_fe_KIR | 3.6324 | 4.432 | 0.820 | 0.412 | -5.054 | 12.319 |
iso_o_fe_KNA | 0.1010 | 0.172 | 0.589 | 0.556 | -0.235 | 0.437 |
iso_o_fe_KOR | 33.4461 | 40.899 | 0.818 | 0.413 | -46.720 | 113.612 |
iso_o_fe_KWT | -0.5709 | 0.387 | -1.475 | 0.140 | -1.330 | 0.188 |
iso_o_fe_LAO | -6.8760 | 8.194 | -0.839 | 0.401 | -22.937 | 9.185 |
iso_o_fe_LBN | 28.6131 | 35.145 | 0.814 | 0.416 | -40.274 | 97.501 |
iso_o_fe_LBR | 21.1270 | 25.813 | 0.818 | 0.413 | -29.469 | 71.723 |
iso_o_fe_LBY | 16.8471 | 20.890 | 0.806 | 0.420 | -24.099 | 57.793 |
iso_o_fe_LCA | 1.2439 | 1.660 | 0.749 | 0.454 | -2.010 | 4.498 |
iso_o_fe_LKA | 16.8975 | 20.664 | 0.818 | 0.414 | -23.605 | 57.401 |
iso_o_fe_LSO | 17.5468 | 21.561 | 0.814 | 0.416 | -24.715 | 59.808 |
iso_o_fe_LTU | 29.0540 | 35.466 | 0.819 | 0.413 | -40.463 | 98.571 |
iso_o_fe_LVA | 24.1933 | 29.519 | 0.820 | 0.412 | -33.667 | 82.053 |
iso_o_fe_MAC | 27.2476 | 33.555 | 0.812 | 0.417 | -38.524 | 93.019 |
iso_o_fe_MAR | 4.3232 | 5.397 | 0.801 | 0.423 | -6.256 | 14.902 |
iso_o_fe_MDA | 22.7405 | 27.753 | 0.819 | 0.413 | -31.658 | 77.139 |
iso_o_fe_MDG | 17.7013 | 21.637 | 0.818 | 0.413 | -24.708 | 60.111 |
iso_o_fe_MDV | 3.1177 | 3.988 | 0.782 | 0.434 | -4.700 | 10.935 |
iso_o_fe_MEX | 33.1638 | 40.643 | 0.816 | 0.415 | -46.501 | 112.828 |
iso_o_fe_MKD | 6.3847 | 7.896 | 0.809 | 0.419 | -9.093 | 21.862 |
iso_o_fe_MLI | 15.9472 | 19.695 | 0.810 | 0.418 | -22.657 | 54.551 |
iso_o_fe_MMR | 4.8762 | 6.117 | 0.797 | 0.425 | -7.114 | 16.866 |
iso_o_fe_MNG | 25.4819 | 31.306 | 0.814 | 0.416 | -35.881 | 86.845 |
iso_o_fe_MOZ | 16.8220 | 20.534 | 0.819 | 0.413 | -23.426 | 57.070 |
iso_o_fe_MRT | 0.6626 | 0.844 | 0.785 | 0.432 | -0.992 | 2.317 |
iso_o_fe_MUS | 25.7624 | 31.455 | 0.819 | 0.413 | -35.893 | 87.417 |
iso_o_fe_MWI | 15.6099 | 18.913 | 0.825 | 0.409 | -21.462 | 52.682 |
iso_o_fe_MYS | 24.3206 | 29.661 | 0.820 | 0.412 | -33.818 | 82.460 |
iso_o_fe_NAM | 18.1608 | 22.247 | 0.816 | 0.414 | -25.446 | 61.767 |
iso_o_fe_NER | 16.8845 | 20.860 | 0.809 | 0.418 | -24.003 | 57.772 |
iso_o_fe_NGA | 23.3657 | 28.891 | 0.809 | 0.419 | -33.263 | 79.995 |
iso_o_fe_NIC | 23.6688 | 28.930 | 0.818 | 0.413 | -33.037 | 80.374 |
iso_o_fe_NLD | 36.1082 | 44.134 | 0.818 | 0.413 | -50.398 | 122.614 |
iso_o_fe_NOR | 34.6336 | 42.446 | 0.816 | 0.415 | -48.564 | 117.831 |
iso_o_fe_NPL | 19.0815 | 23.501 | 0.812 | 0.417 | -26.982 | 65.145 |
iso_o_fe_OMN | -4.0700 | 4.781 | -0.851 | 0.395 | -13.442 | 5.302 |
iso_o_fe_PAK | 27.2275 | 33.329 | 0.817 | 0.414 | -38.100 | 92.555 |
iso_o_fe_PAN | 26.9731 | 33.008 | 0.817 | 0.414 | -37.725 | 91.671 |
iso_o_fe_PER | 30.5951 | 37.424 | 0.818 | 0.414 | -42.760 | 103.950 |
iso_o_fe_PHL | 29.4681 | 36.118 | 0.816 | 0.415 | -41.327 | 100.263 |
iso_o_fe_PLW | 4.3423 | 5.649 | 0.769 | 0.442 | -6.730 | 15.415 |
iso_o_fe_PNG | 16.8623 | 20.714 | 0.814 | 0.416 | -23.739 | 57.464 |
iso_o_fe_POL | 34.8784 | 42.704 | 0.817 | 0.414 | -48.826 | 118.582 |
iso_o_fe_PRT | 32.8235 | 40.128 | 0.818 | 0.413 | -45.831 | 111.478 |
iso_o_fe_PRY | 26.1029 | 31.803 | 0.821 | 0.412 | -36.234 | 88.440 |
iso_o_fe_QAT | -5.6192 | 6.609 | -0.850 | 0.395 | -18.573 | 7.335 |
iso_o_fe_RUS | 26.4097 | 32.434 | 0.814 | 0.415 | -37.163 | 89.983 |
iso_o_fe_RWA | -0.3177 | 0.157 | -2.021 | 0.043 | -0.626 | -0.010 |
iso_o_fe_SAU | -2.5185 | 2.822 | -0.893 | 0.372 | -8.049 | 3.012 |
iso_o_fe_SDN | 10.5690 | 13.343 | 0.792 | 0.428 | -15.585 | 36.723 |
iso_o_fe_SEN | 20.5736 | 25.127 | 0.819 | 0.413 | -28.678 | 69.825 |
iso_o_fe_SGP | 10.7417 | 13.119 | 0.819 | 0.413 | -14.972 | 36.455 |
iso_o_fe_SLB | 16.2288 | 19.669 | 0.825 | 0.409 | -22.325 | 54.782 |
iso_o_fe_SLE | 17.6466 | 21.618 | 0.816 | 0.414 | -24.726 | 60.019 |
iso_o_fe_SLV | 23.5677 | 28.868 | 0.816 | 0.414 | -33.017 | 80.152 |
iso_o_fe_SOM | 21.5274 | 26.603 | 0.809 | 0.418 | -30.618 | 73.673 |
iso_o_fe_STP | 5.2158 | 6.537 | 0.798 | 0.425 | -7.597 | 18.029 |
iso_o_fe_SUR | 13.9592 | 17.193 | 0.812 | 0.417 | -19.741 | 47.659 |
iso_o_fe_SVK | 30.7870 | 37.645 | 0.818 | 0.413 | -43.001 | 104.575 |
iso_o_fe_SVN | 29.1141 | 35.536 | 0.819 | 0.413 | -40.540 | 98.768 |
iso_o_fe_SWE | 35.0285 | 42.844 | 0.818 | 0.414 | -48.950 | 119.007 |
iso_o_fe_SWZ | -12.9566 | 15.829 | -0.819 | 0.413 | -43.983 | 18.070 |
iso_o_fe_SYC | 8.8650 | 10.836 | 0.818 | 0.413 | -12.375 | 30.105 |
iso_o_fe_TCD | 2.8458 | 3.867 | 0.736 | 0.462 | -4.734 | 10.426 |
iso_o_fe_TGO | 0.3866 | 0.505 | 0.765 | 0.444 | -0.604 | 1.377 |
iso_o_fe_THA | 9.4983 | 11.614 | 0.818 | 0.413 | -13.265 | 32.262 |
iso_o_fe_TJK | 0.0715 | 0.321 | 0.223 | 0.824 | -0.558 | 0.701 |
iso_o_fe_TKM | 1.9665 | 2.762 | 0.712 | 0.476 | -3.447 | 7.380 |
iso_o_fe_TON | -1.6393 | 1.848 | -0.887 | 0.375 | -5.262 | 1.983 |
iso_o_fe_TTO | 27.7390 | 33.927 | 0.818 | 0.414 | -38.760 | 94.238 |
iso_o_fe_TUN | 23.3131 | 28.513 | 0.818 | 0.414 | -32.574 | 79.200 |
iso_o_fe_TUR | 22.5063 | 27.583 | 0.816 | 0.415 | -31.558 | 76.571 |
iso_o_fe_TZA | 7.9854 | 9.892 | 0.807 | 0.420 | -11.404 | 27.375 |
iso_o_fe_UGA | 6.5171 | 8.169 | 0.798 | 0.425 | -9.496 | 22.530 |
iso_o_fe_UKR | 20.0548 | 24.523 | 0.818 | 0.413 | -28.013 | 68.123 |
iso_o_fe_URY | 29.5266 | 35.983 | 0.821 | 0.412 | -41.003 | 100.056 |
iso_o_fe_USA | 43.2280 | 53.057 | 0.815 | 0.415 | -60.768 | 147.224 |
iso_o_fe_VCT | -0.3596 | 0.304 | -1.184 | 0.236 | -0.955 | 0.236 |
china_21_tariff_change = (np.exp(logt_china_21) - china_export_2018)/china_export_2018
us_21_tariff_change = (np.exp(logt_us_21) - usa_export_2018)/usa_export_2018
tariff_21 = pd.Series(dict(China_exports_relative_change = china_21_tariff_change,
US_exports_relative_change = us_21_tariff_change))
tariff_21
China_exports_relative_change -0.999720 US_exports_relative_change -0.913481 dtype: float64
Firstly Create the list for 2019 China and the U.S. data. Due to the reason that the databse has a selected base year value, here I'm applying the GDP growth rate on 2013 data, since the original data source in 2014 might not be the full year annual data.
The data sources from World Bank Open Data
china_gdp_2018 = 6.070769e+12 *(1+0.07769)* (1+0.06905) *(1+0.06737) * (1+0.06757)*(1+0.06567)
usa_gdp_2018 = 1.072587e+13 * (1+0.02452)* (1+0.02811)* (1+0.01567) * (1+ 0.02217)* (1+ 0.02927)
9.401673e+10
china_export_2018 = 2.836604e+11*(468474.9/440430.0)*(483201.7/468474.9)*(462420.0/483201.7)*(505220.2/462420.0)*(539675.6/505220.2)
usa_export_2018 = 9.401673e+10*(123657.2/121746.2)*(115873.4/123657.2)*(115594.8/115873.4)*(129797.6/115594.8)*(120148.1/129797.6)
np.log(usa_gdp_2018)
30.12195219656553
X_china_2018 = [0,0,0,0,0,0,1,1,-7,10,0,0,0,np.log(china_gdp_2018),np.log(usa_gdp_2018),9.346115]
X_usa_2018 = [0,0,0,0,0,0,1,1,10,-7,0,0,0,np.log(usa_gdp_2018),np.log(china_gdp_2018),9.346115]
X_predict = [X_china_2018,X_usa_2018]
X_flow_2018 = [np.log(china_export_2018),np.log(usa_export_2018)]
linear_gravity_pre = lr_model.predict(X_predict)
ridge_gravity_pre = clf.predict(X_predict)
lasso_min_pre = lasso_alpha_model.predict(X_predict)
lasso_default_pre = lasso_gravity_model.predict(X_predict)
train_n_test_pre = mod_train_test.predict(X_predict)
neural_2_layer_pre = nn_scaled_model_2_layer_more.predict(X_predict)
neural_3_layer_pre = nn_scaled_model_3_layer.predict(X_predict)
Prediction = pd.Series(dict(gravity_model = linear_gravity_pre , ridge_gravity= ridge_gravity_pre,
lasso_min_alpha =lasso_min_pre , lasso_default_alpha = lasso_default_pre,
train_n_test = train_n_test_pre,
neural_2_layer = neural_2_layer_pre,neural_3_layer = neural_3_layer_pre))
Prediction
X_flow_2018
[26.574260280831858, 25.253525180617917]
X_flow_2018
[26.574260280831858, 25.253525180617917]
p = df_final_version_5.set_index('iso_o').loc['USA']
p.loc[p['iso_d'] == 'CHN']
year | iso_d | region_d | contiguity | landlocked_d | island_d | common_language | colony_of_origin_after45 | agree_fta | member_wto_o | member_wto_d | polity_o | polity_d | hostility_level_o | hostility_level_d | sanction_imposition_trade | log_flow | log_GDP_o | log_GDP_d | log_distance | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
iso_o | ||||||||||||||||||||
USA | 2005 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 0 | 0 | 1.0 | 24.017134 | 29.605314 | 27.852343 | 9.346115 |
USA | 2006 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 0 | 0 | 1.0 | 24.196566 | 29.649982 | 28.025494 | 9.346115 |
USA | 2007 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 0 | 0 | 1.0 | 24.278124 | 29.610023 | 28.196758 | 9.346115 |
USA | 2008 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 0 | 0 | 1.0 | 24.517729 | 29.711264 | 28.539131 | 9.346115 |
USA | 2009 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 3 | 3 | 1.0 | 24.633760 | 29.856256 | 28.808979 | 9.346115 |
USA | 2010 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 0 | 0 | 1.0 | 24.912704 | 29.901562 | 28.994236 | 9.346115 |
USA | 2011 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 0 | 0 | 0.0 | 25.032387 | 29.901638 | 29.173536 | 9.346115 |
USA | 2012 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 0 | 0 | 0.0 | 25.123058 | 29.956545 | 29.309351 | 9.346115 |
USA | 2013 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 0 | 0 | 0.0 | 25.266739 | 30.003679 | 29.434505 | 9.346115 |
USA | 2014 | CHN | east_asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 10.0 | -7.0 | 0 | 0 | 0.0 | 25.263077 | 29.990559 | 29.470449 | 9.346115 |
We could see that in the USITC dataset the hostility level was 3 for both countries in 2009, and we'll use 2,3 and for for simulation, and also add trade sanction to 1 for both countries as well.
X_china_2018_2 = [0,0,0,0,0,0,1,1,-7,10,2,2,1,np.log(china_gdp_2018),np.log(usa_gdp_2018),9.346115]
X_usa_2018_2 = [0,0,0,0,0,0,1,1,10,-7,2,2,1,np.log(usa_gdp_2018),np.log(china_gdp_2018),9.346115]
X_predict_2 = [X_china_2018_2,X_usa_2018_2]
X_flow_2018 = [np.log(china_export_2018),np.log(usa_export_2018)]
linear_gravity_pre_2 = lr_model.predict(X_predict_2)
ridge_gravity_pre_2 = clf.predict(X_predict_2)
lasso_min_pre_2 = lasso_alpha_model.predict(X_predict_2)
lasso_default_pre_2 = lasso_gravity_model.predict(X_predict_2)
train_n_test_pre_2 = mod_train_test.predict(X_predict_2)
neural_2_layer_pre_2 = nn_scaled_model_2_layer_more.predict(X_predict_2)
neural_3_layer_pre_2 = nn_scaled_model_3_layer.predict(X_predict_2)
Prediction_2 = pd.Series(dict(gravity_model = linear_gravity_pre_2 , ridge_gravity= ridge_gravity_pre_2,
lasso_min_alpha =lasso_min_pre_2 , lasso_default_alpha = lasso_default_pre_2,
train_n_test = train_n_test_pre_2,
neural_2_layer = neural_2_layer_pre_2,
neural_3_layer = neural_3_layer_pre_2))
Relative_Change = pd.Series(dict(gravity_model = (np.exp(linear_gravity_pre_2) -np.exp(X_flow_2018))/np.exp(X_flow_2018) ,
ridge_gravity= (np.exp(ridge_gravity_pre_2)-np.exp(X_flow_2018))/np.exp(X_flow_2018),
lasso_min_alpha = (np.exp(lasso_min_pre_2)-np.exp(X_flow_2018))/np.exp(X_flow_2018) ,
lasso_default_alpha = (np.exp(lasso_default_pre_2)-np.exp(X_flow_2018))/np.exp(X_flow_2018),
train_n_test = (np.exp(train_n_test_pre_2)-np.exp(X_flow_2018))/np.exp(X_flow_2018),
neural_2_layer =(np.exp(neural_2_layer_pre_2)-np.exp(X_flow_2018))/np.exp(X_flow_2018),
neural_3_layer = (np.exp(neural_3_layer_pre_2)-np.exp(X_flow_2018))/np.exp(X_flow_2018)))
df_2 = pd.DataFrame.from_dict(Prediction_2)
df_2['China log_exports'], df_2['U.S. log_exports'] = zip(*df_2.pop(0))
df_2_relative = pd.DataFrame.from_dict(Relative_Change)
df_2_relative['China exports percentage change'], df_2_relative['U.S. exports percentage change'] = zip(*df_2_relative.pop(0))
h_score_2 = pd.merge(df_2, df_2_relative,how='outer', left_index=True, right_index=True).round(2)
h_score_2
China log_exports | U.S. log_exports | China exports percentage change | U.S. exports percentage change | |
---|---|---|---|---|
gravity_model | 24.89 | 25.36 | -0.81 | 0.12 |
ridge_gravity | 24.89 | 25.36 | -0.82 | 0.11 |
lasso_min_alpha | 24.85 | 25.31 | -0.82 | 0.06 |
lasso_default_alpha | 23.18 | 23.43 | -0.97 | -0.84 |
train_n_test | 24.78 | 25.25 | -0.83 | -0.00 |
neural_2_layer | 23.47 | 23.25 | -0.96 | -0.86 |
neural_3_layer | 25.91 | 25.46 | -0.49 | 0.23 |
X_flow_2018
[26.574260280831858, 25.253525180617917]
Assigning Trade sanction to 1 and hostility level 3 for the U.S. and China.
X_china_2018_3 = [0,0,0,0,0,0,1,1,-7,10,3,3,1,np.log(china_gdp_2018),np.log(usa_gdp_2018),9.346115]
X_usa_2018_3 = [0,0,0,0,0,0,1,1,10,-7,3,3,1,np.log(usa_gdp_2018),np.log(china_gdp_2018),9.346115]
X_predict_3 = [X_china_2018_3,X_usa_2018_3]
X_flow_2018 = [np.log(china_export_2018),np.log(usa_export_2018)]
linear_gravity_pre_3 = lr_model.predict(X_predict_3)
ridge_gravity_pre_3 = clf.predict(X_predict_3)
lasso_min_pre_3 = lasso_alpha_model.predict(X_predict_3)
lasso_default_pre_3 = lasso_gravity_model.predict(X_predict_3)
train_n_test_pre_3 = mod_train_test.predict(X_predict_3)
neural_2_layer_pre_3 = nn_scaled_model_2_layer_more.predict(X_predict_3)
neural_3_layer_pre_3 = nn_scaled_model_3_layer.predict(X_predict_3)
Prediction_3 = pd.Series(dict(gravity_model = linear_gravity_pre_3,
ridge_gravity= ridge_gravity_pre_3,
lasso_min_alpha =lasso_min_pre_3,
lasso_default_alpha = lasso_default_pre_3,
train_n_test = train_n_test_pre_3,
neural_2_layer = neural_2_layer_pre_3,
neural_3_layer = neural_3_layer_pre_3))
Relative_change_3 = pd.Series(dict(gravity_model = (np.exp(linear_gravity_pre_3) -np.exp(X_flow_2018))/np.exp(X_flow_2018) ,
ridge_gravity= (np.exp(ridge_gravity_pre_3)-np.exp(X_flow_2018))/np.exp(X_flow_2018),
lasso_min_alpha = (np.exp(lasso_min_pre_3)-np.exp(X_flow_2018))/np.exp(X_flow_2018) ,
lasso_default_alpha = (np.exp(lasso_default_pre_3)-np.exp(X_flow_2018))/np.exp(X_flow_2018),
train_n_test = (np.exp(train_n_test_pre_3)-np.exp(X_flow_2018))/np.exp(X_flow_2018),
neural_2_layer =(np.exp(neural_2_layer_pre_3)-np.exp(X_flow_2018))/np.exp(X_flow_2018),
neural_3_layer = (np.exp(neural_3_layer_pre_3)-np.exp(X_flow_2018))/np.exp(X_flow_2018)))
df_3 = pd.DataFrame.from_dict(Prediction_3)
df_3['China log_exports'], df_3['U.S. log_exports'] = zip(*df_3.pop(0))
df_3_relative = pd.DataFrame.from_dict(Relative_change_3)
df_3_relative['China exports percentage change'], df_3_relative['U.S. exports percentage change'] = zip(*df_3_relative.pop(0))
h_score_3 = pd.merge(df_3, df_3_relative,how='outer', left_index=True, right_index=True).round(2)
h_score_3
China log_exports | U.S. log_exports | China exports percentage change | U.S. exports percentage change | |
---|---|---|---|---|
gravity_model | 24.82 | 25.30 | -0.83 | 0.04 |
ridge_gravity | 24.82 | 25.30 | -0.83 | 0.04 |
lasso_min_alpha | 24.85 | 25.31 | -0.82 | 0.06 |
lasso_default_alpha | 23.18 | 23.43 | -0.97 | -0.84 |
train_n_test | 24.68 | 25.15 | -0.85 | -0.10 |
neural_2_layer | 23.73 | 25.12 | -0.94 | -0.12 |
neural_3_layer | 25.62 | 25.08 | -0.61 | -0.16 |
Comparing the OLS Gravity model, Ridge Gravity model ($\alpha$=10), coefficients are rather similar though a regularization is given. Then by the Lasso path, $\alpha$ is set to be 0.006 to minimize the mean square error, which implies that the Gravity model in this original expression was not overfitting per se. To the contrary, the neural network model by setting 2 layers, 3 layers, or training 25% or 30% of the model could not improve the r square nor decrease mse significantly. Political indicators, by setting 0.1 as the threshold, political indicators are statistically significant, and do not create much multicollinearity issue. Therefore, I let the hostility level and trade sanction vary to predict the U.S.-China trade flow.
As a result, China's exports reduces more than that of the U.S. in all scenarios and regression or learning methods presented in this project. U.S. exports to China growth rate do not witness a dramatic fluctuation might be due to the fact that the GDP growth offsets the reduction in trade value.
tariff_21
China_exports_relative_change -0.999720 US_exports_relative_change -0.913481 dtype: float64
h_score_2.columns = pd.MultiIndex.from_product([['hostility_score_2, with trade sanction'], h_score_2.columns])
h_score_3.columns = pd.MultiIndex.from_product([['hostility_score_3, with trade sanction'], h_score_3.columns])
pd.concat([h_score_2, h_score_3], axis = 1)
hostility_score_2, with trade sanction | hostility_score_3, with trade sanction | |||||||
---|---|---|---|---|---|---|---|---|
China log_exports | U.S. log_exports | China exports percentage change | U.S. exports percentage change | China log_exports | U.S. log_exports | China exports percentage change | U.S. exports percentage change | |
gravity_model | 24.89 | 25.36 | -0.81 | 0.12 | 24.82 | 25.30 | -0.83 | 0.04 |
ridge_gravity | 24.89 | 25.36 | -0.82 | 0.11 | 24.82 | 25.30 | -0.83 | 0.04 |
lasso_min_alpha | 24.85 | 25.31 | -0.82 | 0.06 | 24.85 | 25.31 | -0.82 | 0.06 |
lasso_default_alpha | 23.18 | 23.43 | -0.97 | -0.84 | 23.18 | 23.43 | -0.97 | -0.84 |
train_n_test | 24.78 | 25.25 | -0.83 | -0.00 | 24.68 | 25.15 | -0.85 | -0.10 |
neural_2_layer | 23.47 | 23.25 | -0.96 | -0.86 | 23.73 | 25.12 | -0.94 | -0.12 |
neural_3_layer | 25.91 | 25.46 | -0.49 | 0.23 | 25.62 | 25.08 | -0.61 | -0.16 |
r2_mse.sort_values('R^2',ascending = False)
R^2 | MSE | |
---|---|---|
layer_2_50_40 | 0.695249 | 4.959282 |
layer_3_10_20_10 | 0.686665 | 5.502264 |
train_and_test_25 | 0.636677 | 6.381244 |
gravity | 0.635377 | 6.402887 |
ridge_gravity | 0.635377 | 6.402887 |
lasso_alpha_min | 0.635133 | 6.407172 |
lasso_alpha_default | 0.500832 | 8.765536 |
single_variable | 0.313830 | 12.049356 |
bokeh U.S. heat map plot is the modification based on https://towardsdatascience.com/walkthrough-mapping-basics-with-bokeh-and-geopandas-in-python-43f40aa5b7e9