We ran this query in Google BigQuery:
#standardSQL
SELECT
COUNT(*) AS num_downloads,
SUBSTR(_TABLE_SUFFIX, 1, 6) AS `month`
FROM `the-psf.pypi.downloads*`
WHERE file.project = 'dask'
AND details.system.name = 'Windows'
-- No date filter, so this scans the full download history
GROUP BY `month`
ORDER BY `month` DESC
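The month column comes from the table name rather than from a timestamp column. As a rough illustration, the sketch below mimics `SUBSTR(_TABLE_SUFFIX, 1, 6)` in Python. It assumes the wildcard matches daily tables whose suffix is a `YYYYMMDD` date; the suffix values here are hypothetical, and the real query counts download rows, not tables.

```python
from collections import Counter

# Hypothetical table suffixes, assuming daily tables named downloadsYYYYMMDD.
suffixes = ["20191230", "20191231", "20200101", "20200102", "20200113"]


def month_key(table_suffix: str) -> str:
    # SQL SUBSTR is 1-indexed, so SUBSTR(s, 1, 6) corresponds to s[:6] in Python.
    return table_suffix[:6]


# Count one item per suffix; the real query counts one item per download row.
per_month = Counter(month_key(s) for s in suffixes)

# Newest month first, like ORDER BY `month` DESC.
print(sorted(per_month.items(), reverse=True))
```

Because the key is a zero-padded `YYYYMM` string, sorting it as text also sorts it chronologically, which is why the `ORDER BY` works without parsing dates.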
Then we downloaded the query results as a CSV file and plotted them with Pandas.
!cat /home/mrocklin/Downloads/results-20200113-213507.csv
num_downloads,month
12075,202001
29687,201912
27060,201911
29474,201910
23942,201909
25695,201908
29664,201907
27482,201906
33289,201905
22954,201904
53842,201903
35855,201902
33247,201901
29777,201812
32735,201811
26656,201810
24305,201809
22880,201808
7387,201807
7394,201806
3394,201805
3353,201804
2681,201803
2954,201802
2870,201801
3273,201712
3477,201711
2187,201710
1844,201709
2255,201708
2104,201707
1552,201706
2242,201705
1390,201704
4686,201703
4268,201702
4624,201701
3304,201612
3330,201611
3011,201610
3067,201609
3271,201608
3173,201607
3442,201606
1168,201605
148,201603
662,201602
364,201601
import pandas as pd
%matplotlib inline
def parse_date(m):
    # m is an integer such as 202001: the year is m // 100, the month is m % 100
    return pd.Timestamp(year=m // 100, month=m % 100, day=1)
df = pd.read_csv("/home/mrocklin/Downloads/results-20200113-213507.csv")
df["month"] = df.month.map(parse_date)
df = df.set_index("month")
df.head()
month | num_downloads |
---|---|
2020-01-01 | 12075 |
2019-12-01 | 29687 |
2019-11-01 | 27060 |
2019-10-01 | 29474 |
2019-09-01 | 23942 |
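The CSV jumps from 201605 straight to 201603, so April 2016 has no row, and a line plot over a datetime index will simply connect the neighbouring points. A minimal sketch for listing such gaps, using a few rows copied from the CSV above and a month-start date range:

```python
import pandas as pd

# A few rows copied from the CSV above, around the gap in early 2016.
months = [201605, 201603, 201602, 201601]
index = pd.DatetimeIndex(
    [pd.Timestamp(year=m // 100, month=m % 100, day=1) for m in months]
)

# Every month start between the first and last observed month.
expected = pd.date_range(index.min(), index.max(), freq="MS")

# Months that should be present but have no row in the data.
missing = expected.difference(index)
print(list(missing.strftime("%Y-%m")))
```

On the full frame the same check would be `pd.date_range(df.index.min(), df.index.max(), freq="MS").difference(df.index)`.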
ax = df.plot(title="Dask Monthly Windows Downloads", legend=None, figsize=(14, 6))
ax.get_figure().savefig("dask-windows-downloads.png")
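One caveat when reading the plot: the most recent month is probably incomplete. The sketch below assumes the export date can be read off the CSV filename (`results-20200113-...`), which would mean January 2020 covers only about thirteen days. Under that assumption, one option is to drop the current month before plotting.

```python
import pandas as pd

# The five most recent rows from the CSV above.
df = pd.DataFrame(
    {"num_downloads": [12075, 29687, 27060, 29474, 23942]},
    index=pd.to_datetime(
        ["2020-01-01", "2019-12-01", "2019-11-01", "2019-10-01", "2019-09-01"]
    ),
)
df.index.name = "month"

# Assumption: the export date, inferred from the CSV filename.
export_date = pd.Timestamp("2020-01-13")

# Keep only months that started before the month of the export.
first_of_export_month = export_date.replace(day=1)
complete = df[df.index < first_of_export_month]
print(complete)
```

Plotting `complete` in place of `df` would avoid a final dip that reflects a partial month instead of a real drop in downloads.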