Datasets: Downloading Data from Google Trends

28th May 2014

Neil Lawrence

This data set collection was inspired by an IPython notebook from sahuguet, which queried Google Trends and downloaded the results. We've modified the download to cache the results of each query. Making repeated calls to the Google API results in a block for violating the terms of service; caching the data locally prevents this from happening.
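
The caching idea can be illustrated with a minimal sketch. This is not the code used inside pods: the function `cached_query`, its `fetch` argument, the JSON file format and the hashed file name are all assumptions made for illustration. Only the `refresh_data` flag mirrors the option named in the message printed by `pods.datasets.google_trends`.

```python
import hashlib
import json
import os
import tempfile


def cached_query(terms, fetch, cache_dir, refresh_data=False):
    """Return results for `terms`, calling `fetch` only when no cached copy exists.

    Hypothetical helper: `fetch` stands in for whatever function talks to the
    remote service.
    """
    # Sort the terms so the same set of terms always maps to the same file.
    ordered = sorted(terms)
    key = ", ".join(ordered)
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json"
    path = os.path.join(cache_dir, name)

    # Reuse the local copy unless the caller explicitly asks for a refresh.
    if os.path.exists(path) and not refresh_data:
        with open(path) as f:
            return json.load(f)

    # Otherwise make the (rate-limited) remote call and store the result.
    result = fetch(ordered)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(result, f)
    return result


# A stand-in for the remote call that records how often it is used.
calls = []


def fake_fetch(terms):
    calls.append(terms)
    return {term: [1, 2, 3] for term in terms}


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_query(["dogs", "cats"], fake_fetch, cache_dir)
    # Same terms in a different order: served from the cache, no second call.
    second = cached_query(["cats", "dogs"], fake_fetch, cache_dir)
```

In this sketch the second call never reaches `fake_fetch`, which is the behaviour that keeps repeated notebook runs from hitting the remote service.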

In [1]:
import pods
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
# calling without arguments uses the default query terms
data = pods.datasets.google_trends() 
Reading cached data for google trends. To refresh the cache set 'refresh_data=True' when calling this function.
Query terms:  big data, data science, machine learning

The default query terms are 'big data', 'data science' and 'machine learning'. The dictionary returned from the call contains the standard 'X' and 'Y' keys that are ready to be used in the GPy toolkit as inputs to the Gaussian process. In this case the columns of 'X' are the time (first column) and an index representing the query term (second column).

In [3]:
print(data['X'][284, :])
[126   1]

So row 284 of X holds time point 126 for the query term with index 1. Assuming the index is zero-based and follows the sorted order of the query terms, this is the 'data science' time series. The value of the time series at that point is given by the corresponding row of Y.

In [4]:
print(data['Y'][284, :])
[ 16.]
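
The stacked layout of 'X' and 'Y' can be sketched with a toy example. This is an illustration under assumptions, not the pods implementation: the helper `stack_series` is hypothetical, the time column is taken to be a simple integer index, and the series are assumed to be stacked in sorted order of their names.

```python
import numpy as np


def stack_series(series):
    """Stack equal-length series into GP-style inputs X and outputs Y.

    `series` maps a name to a list of values. X has one row per observation
    with columns (time index, series index); Y holds the matching values.
    """
    names = sorted(series)
    X_blocks, Y_blocks = [], []
    for index, name in enumerate(names):
        values = np.asarray(series[name], dtype=float)
        times = np.arange(len(values))
        # One block per series: time in column 0, series index in column 1.
        X_blocks.append(np.column_stack([times, np.full(len(values), index)]))
        Y_blocks.append(values[:, None])
    return np.vstack(X_blocks), np.vstack(Y_blocks), names


# Two toy series of three points each.
toy = {"b": [3.0, 4.0, 5.0], "a": [1.0, 2.0, 0.0]}
X, Y, names = stack_series(toy)

# Row 4 falls in the second block (rows 3 to 5), at offset 4 - 3 = 1.
print(X[4, :])
print(Y[4, :])
```

Under the same assumptions, with 158 points per series, row 284 would sit at offset 284 - 158 = 126 in the second block, which is consistent with the `[126   1]` printed above.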

The dictionary also contains a pandas data frame of the trend data, which is in line with what sahuguet originally returned.

In [5]:
data['data frame'].describe()
Out[5]:
         big data  data science  machine learning
count  158.000000    158.000000        158.000000
mean    30.240506     13.556962         21.816456
std     33.957341     10.097293         13.176817
min      3.000000      4.000000          9.000000
25%      5.000000      7.250000         14.000000
50%      7.000000      9.000000         17.000000
75%     65.000000     15.000000         24.000000
max    100.000000     52.000000         78.000000

We can also plot the trends data to see how interest in each term has changed over time.

In [6]:
data['data frame'].set_index('Date', inplace=True)
fig, ax = plt.subplots(figsize=(5,5))
data['data frame'].plot(ax=ax)
ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=45)
Out[6]:
[Figure: line plot of the 'big data', 'data science' and 'machine learning' trends against date, with x-axis tick labels rotated by 45 degrees.]

Dogs, Cats and Rabbits

Another data set we might consider downloading from Google Trends compares searches for different pets. Below we consider cats, dogs and rabbits.

In [7]:
data = pods.datasets.google_trends(['cats', 'dogs', 'rabbits'])
data['data frame'].set_index('Date', inplace=True)
fig, ax = plt.subplots(figsize=(5,5))
data['data frame'].plot(ax=ax)
ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=45)
Reading cached data for google trends. To refresh the cache set 'refresh_data=True' when calling this function.
Query terms:  cats, dogs, rabbits
Out[7]:
[Figure: line plot of the 'cats', 'dogs' and 'rabbits' trends against date, with x-axis tick labels rotated by 45 degrees.]

Here we've plotted the data in the same manner as sahuguet suggested in his original notebook, using the plotting facility of pandas.

Games Consoles

Finally, we can compare the popularity of different games consoles.

In [8]:
data = pods.datasets.google_trends(['xbox one', 'wii u', 'ps4'])
Reading cached data for google trends. To refresh the cache set 'refresh_data=True' when calling this function.
Query terms:  ps4, wii u, xbox one
In [9]:
data['data frame'].set_index('Date', inplace=True)
fig, ax = plt.subplots(figsize=(5,5))
data['data frame'].plot(ax=ax)
ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=45)
Out[9]:
[Figure: line plot of the 'ps4', 'wii u' and 'xbox one' trends against date, with x-axis tick labels rotated by 45 degrees.]