Dask Monthly Windows Downloads

We ran this query in Google BigQuery:

#standardSQL
SELECT
  COUNT(*) AS num_downloads,
  SUBSTR(_TABLE_SUFFIX, 1, 6) AS `month`
FROM `the-psf.pypi.downloads*`
WHERE file.project = 'dask'
  AND details.system.name = 'Windows'
  -- No date filter is applied, so this scans the full download history
GROUP BY `month`
ORDER BY `month` DESC
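The grouping key in the query comes from the table name rather than from a column. As a rough illustration, assuming the daily tables are named `downloadsYYYYMMDD`, the first six characters of `_TABLE_SUFFIX` give the `YYYYMM` month. The suffix values below are hypothetical, and the sketch counts tables per month, whereas the query counts download rows.

```python
from collections import Counter

# Hypothetical table suffixes, assuming daily tables named downloadsYYYYMMDD.
suffixes = ["20191230", "20191231", "20200101", "20200102", "20200103"]

# SUBSTR(_TABLE_SUFFIX, 1, 6) is 1-indexed in SQL; the Python equivalent is s[:6].
tables_per_month = Counter(s[:6] for s in suffixes)
print(tables_per_month)
```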

Then we downloaded the results as a CSV file and plotted them with pandas.

In [1]:

!cat /home/mrocklin/Downloads/results-20200113-213507.csv

num_downloads,month
12075,202001
29687,201912
27060,201911
29474,201910
23942,201909
25695,201908
29664,201907
27482,201906
33289,201905
22954,201904
53842,201903
35855,201902
33247,201901
29777,201812
32735,201811
26656,201810
24305,201809
22880,201808
7387,201807
7394,201806
3394,201805
3353,201804
2681,201803
2954,201802
2870,201801
3273,201712
3477,201711
2187,201710
1844,201709
2255,201708
2104,201707
1552,201706
2242,201705
1390,201704
4686,201703
4268,201702
4624,201701
3304,201612
3330,201611
3011,201610
3067,201609
3271,201608
3173,201607
3442,201606
1168,201605
148,201603
662,201602
364,201601

In [2]:

import pandas as pd

%matplotlib inline

In [3]:

def parse_date(m):
    # Convert an integer such as 202001 (YYYYMM) to the first day of that month
    return pd.Timestamp(year=m // 100, month=m % 100, day=1)

df = pd.read_csv("/home/mrocklin/Downloads/results-20200113-213507.csv")
df["month"] = df.month.map(parse_date)
df = df.set_index("month")
df.head()

Out[3]:

	num_downloads
month
2020-01-01	12075
2019-12-01	29687
2019-11-01	27060
2019-10-01	29474
2019-09-01	23942
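The CSV above has no row for 201604, so the monthly index has a gap. A minimal sketch of one way to detect such gaps, using a few rows copied from the CSV; the names `sample`, `full_range`, and `missing` are illustrative and not part of the original notebook. It also parses the `YYYYMM` values with `pd.to_datetime` as an alternative to mapping `parse_date` row by row.

```python
import io

import pandas as pd

# A few rows copied from the CSV above, including the rows around the gap.
sample_csv = io.StringIO(
    "num_downloads,month\n"
    "3442,201606\n"
    "1168,201605\n"
    "148,201603\n"
    "662,201602\n"
)

sample = pd.read_csv(sample_csv)
# Parse YYYYMM values in one vectorized call.
sample["month"] = pd.to_datetime(sample["month"].astype(str), format="%Y%m")
sample = sample.set_index("month").sort_index()

# Build the complete month-start range and list the months that have no row.
full_range = pd.date_range(sample.index.min(), sample.index.max(), freq="MS")
missing = full_range.difference(sample.index)
print(missing)
```

Whether a missing month means zero downloads or missing source data cannot be determined from the CSV alone, so the sketch only reports the gap and does not fill it.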

In [4]:

ax = df.plot(title="Dask Monthly Windows Downloads", legend=False, figsize=(14, 6))
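The file name `results-20200113-213507.csv` suggests the results were exported on 13 January 2020, in which case the 202001 row covers only part of a month and would show up as a sharp drop at the right edge of the plot. The sketch below, under that assumption, keeps only the months that ended before the export date; `recent`, `export_date`, and `complete` are illustrative names.

```python
import pandas as pd

# The three most recent rows from the CSV above.
recent = pd.DataFrame(
    {"num_downloads": [12075, 29687, 27060]},
    index=pd.to_datetime(["2020-01-01", "2019-12-01", "2019-11-01"]),
)
recent.index.name = "month"

# Assumption: the export date is inferred from the CSV file name (20200113).
export_date = pd.Timestamp("2020-01-13")

# Keep only months that start before the month containing the export date.
export_month_start = export_date.to_period("M").to_timestamp()
complete = recent[recent.index < export_month_start]
print(complete)
```

Plotting `complete` instead of the full frame would avoid reading the partial month as a decline in downloads.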

In [5]:

ax.get_figure().savefig("dask-windows-downloads.png")