For my BTC price prediction project, I had to pull data from several sources using different methods (a Python package, web scraping, and an API).
Here is the code I use to build a single CSV file:
# Main
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Web
import requests
from bs4 import BeautifulSoup
#from IPython.display import display_html
# More
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
pd.set_option('display.max_rows', 400)
# Date format (ex: 190913)
from datetime import date
#today = date.today().strftime("%Y%m%d")[2:]
# PyTrends
from pytrends.request import TrendReq
pytrend = TrendReq()
Since February 2018, Alternative.me has provided a crypto Fear & Greed Index based on several market signals (volatility, momentum/volume, social media, surveys, Bitcoin dominance, and search trends).
Note: the API can return the data as CSV, but I didn't find an easy way to read/convert it directly.
PS: there are some duplicates and missing values in this data. See below in 1.2.4.
url = 'https://api.alternative.me/fng/?limit=0&format=csv&date_format=cn'
r = requests.get(url)
fgi = str(BeautifulSoup(r.text, 'lxml'))
# Keep only the CSV rows between the header line and the closing bracket
fgi = fgi[fgi.find('date\n')+5:fgi.find('\n\t]')]
# Flatten into one list of fields: date, score, category, date, score, ...
fgi = fgi.replace('\n',',').split(',')
# Rebuild the rows by taking every third field
fgi = pd.DataFrame({'Date':fgi[::3],'Score':fgi[1:][::3],'Category':fgi[2:][::3]})
fgi['Date'] = pd.to_datetime(fgi['Date'])
fgi = fgi.set_index('Date').sort_index()
fgi.head()
| Date | Score | Category |
|---|---|---|
| 2018-02-01 | 30 | Fear |
| 2018-02-02 | 15 | Extreme Fear |
| 2018-02-03 | 40 | Fear |
| 2018-02-04 | 24 | Extreme Fear |
| 2018-02-05 | 11 | Extreme Fear |
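Since the CSV rows are embedded in a JSON-like envelope, another option is to cut out the CSV substring and hand it to `pd.read_csv` instead of slicing fields by hand. A minimal sketch, assuming the envelope still wraps the rows like this; the `raw` string below is a hypothetical mock of the response, not real API output:

```python
import io
import pandas as pd

# Hypothetical mock of the API payload: CSV rows inside a JSON-like wrapper
raw = (
    '{\n\t"data": [\n'
    'date,fng_value,fng_classification\n'
    '2018-02-01,30,Fear\n'
    '2018-02-02,15,Extreme Fear\n'
    '\n\t]\n}'
)

# Cut out the CSV part (header included), then let pandas do the parsing
start = raw.find('date')
end = raw.find('\n\t]')
csv_text = raw[start:end]

fgi = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
fgi = fgi.rename(columns={'date': 'Date',
                          'fng_value': 'Score',
                          'fng_classification': 'Category'})
fgi = fgi.set_index('Date').sort_index()
```

This keeps the type conversion (dates, integer scores) inside `read_csv` rather than doing it manually afterwards.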
In a recent notebook, I discovered Pytrends (https://github.com/GeneralMills/pytrends), a great tool for scraping Google Trends data.
(Thanks to the team for sharing it.)
You can get data back to 2004, but note that if you request too long a period (more than about 8 months), Google Trends returns weekly scores instead of daily ones.
The period from today (2019-09-13) back to the first Fear & Greed Index data point (2018-02-01) exceeds that limit, so interest_over_time()
would return weekly data. And here is the trick we're interested in: Pytrends provides build_payload()
to set the timeframe, localisation, etc. (more in the documentation).
(More: Pytrends also has a get_historical_interest() function
for hourly data, among many other possibilities.)
So what we need to do is split the period into chunks of at most 8 months, so that each request returns daily trend scores.
# Keywords list (max 5 words/expressions)
kw_list = ['bitcoin trading']
# 2018-02-01 to 2018-05-11
pytrend.build_payload(kw_list, cat=0, timeframe='2018-02-01 2018-05-11', geo='', gprop='')
gst_1 = pytrend.interest_over_time()
# 2018-05-12 to 2019-01-22
pytrend.build_payload(kw_list, cat=0, timeframe='2018-05-12 2019-01-22', geo='', gprop='')
gst_2 = pytrend.interest_over_time()
# 2019-01-23 to 2019-09-23
pytrend.build_payload(kw_list, cat=0, timeframe='2019-01-23 2019-09-23', geo='', gprop='')
gst_3 = pytrend.interest_over_time()
# Concat
frames = [gst_1, gst_2, gst_3]
gst = pd.concat(frames)
gst.head()
| date | bitcoin trading | isPartial |
|---|---|---|
| 2018-02-01 | 82 | False |
| 2018-02-02 | 98 | False |
| 2018-02-03 | 64 | False |
| 2018-02-04 | 67 | False |
| 2018-02-05 | 83 | False |
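The three hand-written timeframes above could also be generated programmatically, which helps if the date range changes. A sketch; the 240-day cap and the `daily_chunks` helper are my own assumptions, not part of Pytrends:

```python
from datetime import date, timedelta

def daily_chunks(start, end, max_days=240):
    """Split [start, end] into consecutive windows of at most
    max_days days (~8 months), formatted for build_payload()."""
    chunks = []
    cur = start
    while cur <= end:
        stop = min(cur + timedelta(days=max_days - 1), end)
        chunks.append(f'{cur:%Y-%m-%d} {stop:%Y-%m-%d}')
        cur = stop + timedelta(days=1)
    return chunks

# The full Fear & Greed period in daily-resolution windows
timeframes = daily_chunks(date(2018, 2, 1), date(2019, 9, 23))

# Hypothetical usage (network calls, not run here):
# frames = []
# for tf in timeframes:
#     pytrend.build_payload(kw_list, timeframe=tf)
#     frames.append(pytrend.interest_over_time())
# gst = pd.concat(frames)
```

Each window stays under the daily-data limit, and the windows are contiguous, so the concatenated result covers the whole period without gaps or overlaps.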
No surprise here: I've already scraped the BTC price from coinmarketcap.com a few times.
def get_btc_price(start_date='20170101', end_date='20190913'):
'''
Webscraping BTC price on coinmarketcap.com
'''
url = f'https://coinmarketcap.com/currencies/bitcoin/historical-data/?start={start_date}&end={end_date}'
    btc = BeautifulSoup(requests.get(url).text, 'lxml')
    # Parse the first HTML table on the page into a dataframe
    btc = pd.read_html(str(btc.find_all('table', class_='table')[0]))[0]
btc['Date'] = pd.to_datetime(btc['Date'])
return btc
bhp = get_btc_price(start_date='20180201', end_date='20190923')
bhp = bhp.set_index('Date').sort_index()
bhp.head()
| Date | Open* | High | Low | Close** | Volume | Market Cap |
|---|---|---|---|---|---|---|
| 2018-02-01 | 10237.30 | 10288.80 | 8812.28 | 9170.54 | 9959400448 | 154428564694 |
| 2018-02-02 | 9142.28 | 9142.28 | 7796.49 | 8830.75 | 12726899712 | 148725283812 |
| 2018-02-03 | 8852.12 | 9430.75 | 8251.63 | 9174.91 | 7263790080 | 154540000411 |
| 2018-02-04 | 9175.70 | 9334.87 | 8031.22 | 8277.01 | 7073549824 | 139433682759 |
| 2018-02-05 | 8270.54 | 8364.84 | 6756.68 | 6955.27 | 9285289984 | 117184385122 |
Let's join our three dataframes. For each, we need to check the period and the shape.
This is a necessary step, because one of the dataframes contains duplicates and missing values.
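Before joining, a quick way to surface duplicates and holes is to compare the index against a full daily `pd.date_range`. A sketch on a toy dataframe; `index_gaps` is a hypothetical helper, not part of the notebook:

```python
import pandas as pd

def index_gaps(df):
    """Report duplicated dates and missing calendar days
    in a DatetimeIndex-ed dataframe."""
    dupes = df.index[df.index.duplicated()]
    full = pd.date_range(df.index.min(), df.index.max(), freq='D')
    missing = full.difference(df.index)
    return dupes, missing

# Toy example: 2018-04-02 is duplicated, 2018-04-04 is missing
toy = pd.DataFrame(
    {'Score': [30, 15, 16, 40, 24]},
    index=pd.to_datetime(['2018-04-01', '2018-04-02', '2018-04-02',
                          '2018-04-03', '2018-04-05']))
dupes, missing = index_gaps(toy)
```

Running the same check on the real Fear & Greed dataframe would flag the duplicated and missing dates handled manually below.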
df_fgi = fgi.copy()
df_gst = gst.copy()
df_bhp = bhp.copy()
#print(df_fgi.index[-1]) >>> 2019-09-23 (+3 days)
#print(df_gst.index[-1]) >>> 2019-09-20
#print(df_bhp.index[-1]) >>> 2019-09-22 (+2 days)
df_fgi = df_fgi[:-3]
df_bhp = df_bhp[:-2]
# df_gst already ends on 2019-09-20, nothing to trim
#print(df_fgi.shape) >>> (596, 2) (Something weird with this dataframe)
#print(df_gst.shape) >>> (597, 2)
#print(df_bhp.shape) >>> (597, 6)
#df_fgi.reset_index().shape >>> (596, 3)
#df_fgi.reset_index().drop_duplicates(subset='Date', keep='last').shape >>> (594, 3)
# Found two duplicated dates: 2019-02-17, 2019-01-30
df_fgi = df_fgi.reset_index().drop_duplicates(subset='Date', keep='last')
# Found three missing dates: 2018-04-14, 2018-04-15, 2018-04-16
# I filled them with realistic values
# and reported the bug to the API team; maybe it will be fixed
val = pd.DataFrame({'Date': ['2018-04-14', '2018-04-15', '2018-04-16'],
                    'Score': ['25', '26', '24'],
                    'Category': ['Extreme Fear', 'Fear', 'Extreme Fear']})
val['Date'] = pd.to_datetime(val['Date'])
# pd.concat instead of the deprecated DataFrame.append
df_fgi = pd.concat([df_fgi, val]).set_index('Date').sort_index()
# Okay
df = df_fgi.join(df_gst).join(df_bhp)
df.head()
| Date | Score | Category | bitcoin trading | isPartial | Open* | High | Low | Close** | Volume | Market Cap |
|---|---|---|---|---|---|---|---|---|---|---|
| 2018-02-01 | 30 | Fear | 82 | False | 10237.30 | 10288.80 | 8812.28 | 9170.54 | 9959400448 | 154428564694 |
| 2018-02-02 | 15 | Extreme Fear | 98 | False | 9142.28 | 9142.28 | 7796.49 | 8830.75 | 12726899712 | 148725283812 |
| 2018-02-03 | 40 | Fear | 64 | False | 8852.12 | 9430.75 | 8251.63 | 9174.91 | 7263790080 | 154540000411 |
| 2018-02-04 | 24 | Extreme Fear | 67 | False | 9175.70 | 9334.87 | 8031.22 | 8277.01 | 7073549824 | 139433682759 |
| 2018-02-05 | 11 | Extreme Fear | 83 | False | 8270.54 | 8364.84 | 6756.68 | 6955.27 | 9285289984 | 117184385122 |
# Let's save this dataframe to CSV (date format: 190913)
today = date.today().strftime("%Y%m%d")[2:]
df.to_csv(f'{today}_bitcoin_predictions_data.csv')
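To reload the saved CSV later with the datetime index restored, `read_csv` can parse the Date column back on the way in. A minimal round-trip sketch; the sample frame and file path are illustrative, not the notebook's real output:

```python
import os
import tempfile
import pandas as pd

# Save a small dataframe the same way as above, then reload it
sample = pd.DataFrame({'Score': [30, 15]},
                      index=pd.to_datetime(['2018-02-01', '2018-02-02']))
sample.index.name = 'Date'

path = os.path.join(tempfile.gettempdir(), 'sample_bitcoin_data.csv')
sample.to_csv(path)

# index_col + parse_dates rebuild the DatetimeIndex from the CSV
reloaded = pd.read_csv(path, index_col='Date', parse_dates=True)
```

Without `parse_dates=True`, the index would come back as plain strings and date-based slicing would break.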