Airqdata: analysis tools for air quality data

Notebook version with output and functional links: https://nbviewer.jupyter.org/gist/dr-1/450c275b1ad2cbf88e9c4325c5d032bc

In [1]:
# Prepare notebook to render plots
%matplotlib inline
In [2]:
import pandas as pd

# Import the analysis package to make it available in the notebook
import airqdata

# More convenient in some cases: import individual names from the package to avoid having
# to prepend them with "airqdata."
from airqdata import (compare_nearest_irceline_sensors, compare_sensor_data, describe,
                      influencair, irceline, luftdaten)

# Limit output length for readability
pd.set_option("display.max_rows", 10)

InfluencAir / Luftdaten.info resources

Download the list of sensors from InfluencAir's Google Sheet

In [3]:
influencair.Metadata(refresh_cache=True)
sensor_info = influencair.Metadata.sensors
sensor_info.head(4)
Downloading InfluencAir sensor information
Out[3]:
   Chip ID  PM Sensor ID  Hum/Temp Sensor ID  Label             Address                                    Floor  Side
0  4021549          3445                 NaN  NaN               Avenue Princesse Elisabeth 28, Schaarbeek      2  Street
1  4022301          3803                 NaN  NaN               Rue Brogniez, Anderlecht                       2  Street
2  4020466          3805                 NaN  NaN               Rue de l'Équateur, Uccle                     NaN  Garden
3  4018142          3893                3894  Cinquantenaire I  Avenue de la Renaissance 10, Bruxelles         4  Street
In [4]:
len(sensor_info)
Out[4]:
97

Create a Sensor object and get the sensor's metadata and current measurements

In [5]:
# PM sensor in Anderlecht
demo_sensor_id = "3803"
demo_sensor = influencair.Sensor(demo_sensor_id, refresh_cache=True)
Downloading sensor 3803 metadata from luftdaten.info
In [6]:
demo_sensor.influencair_metadata
Out[6]:
Chip ID                     4022301
Label                           NaN
Address    Rue Brogniez, Anderlecht
Floor                             2
Side                         Street
Name: 1, dtype: object
In [7]:
demo_sensor.luftdaten_metadata
Out[7]:
id                                   2400692506
location.altitude                          20.0
location.country                             BE
location.id                                1917
location.latitude                       50.8380
                                       ...     
sensor.id                                  3803
sensor.pin                                    1
sensor.sensor_type.id                        14
sensor.sensor_type.manufacturer    Nova Fitness
sensor.sensor_type.name                  SDS011
Name: metadata, Length: 12, dtype: object
In [8]:
demo_sensor.luftdaten_metadata_url
Out[8]:
'https://api.luftdaten.info/v1/sensor/3803/'
In [9]:
# Luftdaten.info provides current measurements along with the sensor metadata.
demo_sensor.current_measurements
Out[9]:
{'pm10': 4.0, 'pm2.5': 3.2}

Open data history graphs of the sensor in a browser

In [10]:
demo_sensor.open_madavi_graphs()

Retrieve data history

Data are loaded from the local cache when available, otherwise downloaded from the server, and then cleaned.

In [11]:
demo_sensor.get_measurements(start_date="2018-11-04",
                             end_date="2018-11-08")
Using cached luftdaten.info data for sensor 3803 on 2018-11-04
Using cached luftdaten.info data for sensor 3803 on 2018-11-05
Using cached luftdaten.info data for sensor 3803 on 2018-11-06
Using cached luftdaten.info data for sensor 3803 on 2018-11-07
Using cached luftdaten.info data for sensor 3803 on 2018-11-08
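The log lines suggest one cache entry per sensor and day. The sketch below illustrates that general cache-or-download pattern; the one-CSV-file-per-day layout, the file naming and the `fetch_day` download function are hypothetical stand-ins, not airqdata's actual implementation, and the cleaning step is not shown.

```python
import datetime as dt
import tempfile
from pathlib import Path
from typing import Callable, List

import pandas as pd

# Hypothetical signature of a download function: (sensor_id, date) -> DataFrame
FetchDay = Callable[[str, dt.date], pd.DataFrame]


def daily_dates(start_date: str, end_date: str) -> List[dt.date]:
    """Return every calendar date from start_date to end_date, inclusive."""
    return [ts.date() for ts in pd.date_range(start_date, end_date, freq="D")]


def load_day(sensor_id: str, date: dt.date, cache_dir: Path,
             fetch_day: FetchDay) -> pd.DataFrame:
    """Read one day from the cache; on a cache miss, download and store it."""
    cache_file = cache_dir / f"{sensor_id}_{date.isoformat()}.csv"
    if cache_file.exists():
        print(f"Using cached data for sensor {sensor_id} on {date}")
        return pd.read_csv(cache_file, index_col=0)
    print(f"Downloading data for sensor {sensor_id} on {date}")
    data = fetch_day(sensor_id, date)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    data.to_csv(cache_file)
    return data


def load_range(sensor_id: str, start_date: str, end_date: str,
               cache_dir: Path, fetch_day: FetchDay) -> pd.DataFrame:
    """Concatenate the daily frames for the whole date range."""
    frames = [load_day(sensor_id, day, cache_dir, fetch_day)
              for day in daily_dates(start_date, end_date)]
    return pd.concat(frames)


# Usage with a stand-in download function that returns dummy values
download_calls = []


def fake_fetch(sensor_id: str, date: dt.date) -> pd.DataFrame:
    download_calls.append(date)
    return pd.DataFrame({"pm2.5": [1.0, 2.0]})


with tempfile.TemporaryDirectory() as tmp:
    # First call downloads all five days; second call reads them from the cache
    first = load_range("3803", "2018-11-04", "2018-11-08", Path(tmp), fake_fetch)
    second = load_range("3803", "2018-11-04", "2018-11-08", Path(tmp), fake_fetch)
```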

Now we can analyze the measurement data.

Inspect, summarize and plot data

In [12]:
demo_sensor.measurements
Out[12]:
                            pm10  pm2.5
timestamp
2018-11-04 17:14:07+00:00  15.17   9.87
2018-11-04 17:16:44+00:00  40.30  20.87
2018-11-04 17:19:12+00:00  38.33  18.20
2018-11-04 17:21:40+00:00  29.07  17.40
2018-11-04 17:24:08+00:00  31.30  15.20
...                          ...    ...
2018-11-08 23:50:03+00:00   3.57   2.73
2018-11-08 23:52:30+00:00   7.10   3.40
2018-11-08 23:54:58+00:00   5.63   3.80
2018-11-08 23:57:26+00:00   8.10   3.30
2018-11-08 23:59:53+00:00   5.47   3.73

2455 rows × 2 columns

In [13]:
describe(demo_sensor.measurements)
Out[13]:
          pm10    pm2.5
count  2455.00  2455.00
mean     21.40    14.12
std      48.05    30.70
min       1.10     0.90
1%        1.47     1.10
50%       7.23     4.73
99%     255.76   154.81
max     448.90   325.67
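The summary lists the 1st and 99th percentiles alongside the median. A similar table can be produced with plain pandas; this sketch assumes `describe` behaves roughly like `DataFrame.describe` with those percentiles, which may not be exactly how airqdata implements it. The sample values are made up for illustration.

```python
import pandas as pd


def summarize(data, percentiles=(0.01, 0.99), decimals=2):
    """Summary statistics with custom percentiles; pandas always adds the median (50%)."""
    return data.describe(percentiles=list(percentiles)).round(decimals)


# Hypothetical sample values, for illustration only
sample = pd.DataFrame({"pm10": [1.1, 7.2, 40.3, 448.9],
                       "pm2.5": [0.9, 4.7, 20.9, 325.7]})
summary = summarize(sample)
print(summary)
```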
In [14]:
demo_sensor.plot_measurements()
Out[14]:
([<Figure size 864x576 with 1 Axes>, <Figure size 864x576 with 1 Axes>],
 [<matplotlib.axes._subplots.AxesSubplot at 0x7fa5f7297da0>,
  <matplotlib.axes._subplots.AxesSubplot at 0x7fa5f71d4ef0>])

Inspect, summarize and plot hourly means

In [15]:
demo_sensor.get_hourly_means()
Out[15]:
                      pm10   pm2.5
Period
2018-11-04 17:00  206.24  136.97
2018-11-04 18:00  240.40  159.21
2018-11-04 19:00  248.83  149.59
2018-11-04 20:00  193.55  116.36
2018-11-04 21:00  144.24   88.44
...                  ...     ...
2018-11-08 19:00   18.66    8.71
2018-11-08 20:00   10.80    5.38
2018-11-08 21:00    8.94    4.74
2018-11-08 22:00    7.60    4.12
2018-11-08 23:00    6.57    3.68

103 rows × 2 columns
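An hourly mean averages all samples whose timestamps fall within the same clock hour. A minimal pandas sketch, assuming a timezone-aware `DatetimeIndex` like the one in the measurement table; airqdata's own hourly aggregation may differ in details, and the sample values are invented.

```python
import pandas as pd


def hourly_means(measurements: pd.DataFrame) -> pd.DataFrame:
    """Average all samples within each clock hour; hours without samples become NaN."""
    return measurements.resample("60min").mean()


# Hypothetical irregular samples, for illustration only
index = pd.to_datetime(["2018-11-04 17:14:07", "2018-11-04 17:44:07",
                        "2018-11-04 18:05:00"]).tz_localize("UTC")
samples = pd.DataFrame({"pm10": [10.0, 30.0, 5.0],
                        "pm2.5": [4.0, 8.0, 2.0]}, index=index)
means = hourly_means(samples)
print(means)
```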

In [16]:
describe(demo_sensor.get_hourly_means())
Out[16]:
         pm10   pm2.5
count  103.00  103.00
mean    21.87   14.43
std     46.02   29.12
min      1.61    1.18
1%       1.65    1.23
50%      7.50    4.74
99%    239.72  149.34
max    248.83  159.21
In [17]:
demo_sensor.plot_hourly_means()
Out[17]:
([<Figure size 864x576 with 1 Axes>, <Figure size 864x576 with 1 Axes>],
 [<matplotlib.axes._subplots.AxesSubplot at 0x7fa5f71297b8>,
  <matplotlib.axes._subplots.AxesSubplot at 0x7fa5f70f1cf8>])

Check distribution of sample intervals

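Time series analysis tends to be easier when samples are regularly spaced. How regular are ours? The counts below show how often each interval between consecutive timestamps occurs; ideally, all intervals would fall into a single group. Here, most intervals lie between 2 min 27 s and 2 min 34 s, with a few longer gaps of about 5 min 21 s.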

In [18]:
demo_sensor.intervals.head(10)
Out[18]:
00:02:28    1488
00:02:27     606
00:02:29     203
00:02:33      52
00:02:30      25
00:02:34      16
00:02:32      14
00:05:22      13
00:05:21       9
00:02:31       7
Name: timestamp, dtype: int64
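Interval counts like these can be computed with pandas by differencing consecutive timestamps. This sketch assumes a sorted `DatetimeIndex`; it is not necessarily how the `intervals` attribute is computed. The timestamps are the first five rows of the measurement table.

```python
import pandas as pd


def interval_counts(index: pd.DatetimeIndex) -> pd.Series:
    """Count how often each gap between consecutive timestamps occurs."""
    gaps = index.to_series().diff().dropna()
    return gaps.value_counts()


timestamps = pd.to_datetime(["2018-11-04 17:14:07", "2018-11-04 17:16:44",
                             "2018-11-04 17:19:12", "2018-11-04 17:21:40",
                             "2018-11-04 17:24:08"])
counts = interval_counts(timestamps)
print(counts)
```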

List sensors near a given location

By default, the search covers an 8 km radius around the center of Brussels.

In [19]:
luftdaten.search_proximity()
Out[19]:
           sensor_type  latitude  longitude  distance
sensor_id
17129           SDS011     50.85       4.35      0.21
17719           SDS011     50.85       4.35      0.42
17720            DHT22     50.85       4.35      0.42
17547            DHT22     50.85       4.36      0.66
17546           SDS011     50.85       4.36      0.66
...                ...       ...        ...       ...
15960           SDS011     50.89       4.43      7.54
13637           SDS011     50.92       4.36      7.57
13638            DHT22     50.92       4.36      7.57
13128            DHT22     50.92       4.36      7.60
13127           SDS011     50.92       4.36      7.60

167 rows × 4 columns
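The `distance` column is in kilometres. A common way to compute such distances from latitude/longitude pairs is the haversine great-circle formula; the sketch below uses it to filter and sort a sensor table by radius. The Earth radius of 6,371 km and the column names are assumptions, and airqdata's own distance calculation may differ. The sample positions are the rounded Antwerp coordinates from the next cell's output, so the computed distances will not match that table exactly.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not taken from airqdata


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.

    Works element-wise on scalars, NumPy arrays or pandas Series.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def search_within(sensors: pd.DataFrame, lat: float, lon: float,
                  radius: float) -> pd.DataFrame:
    """Return sensors within `radius` km of (lat, lon), nearest first."""
    distance = haversine_km(lat, lon, sensors["latitude"], sensors["longitude"])
    near = sensors.assign(distance=distance)
    near = near[near["distance"] <= radius].sort_values("distance")
    return near.round({"distance": 2})


sensors = pd.DataFrame(
    {"sensor_type": ["SDS011", "SDS011", "SDS011"],
     "latitude": [51.22, 51.21, 51.26],
     "longitude": [4.41, 4.45, 4.44]},
    index=pd.Index([5937, 14110, 10100], name="sensor_id"))
near = search_within(sensors, lat=51.22, lon=4.41, radius=4)
print(near)
```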

Using different location parameters

In [20]:
luftdaten.search_proximity(lat=51.22, lon=4.41, radius=5)  # Antwerp
Out[20]:
           sensor_type  latitude  longitude  distance
sensor_id
5937            SDS011     51.22       4.41      0.14
5938             DHT22     51.22       4.41      0.14
14110           SDS011     51.21       4.45      2.87
14111            DHT22     51.21       4.45      2.87
2588            SDS011     51.20       4.44      3.10
2589             DHT22     51.20       4.44      3.10
7454            SDS011     51.21       4.45      3.34
10100           SDS011     51.26       4.44      4.60
In [21]:
(near_sensors,
 hourly_means) = luftdaten.evaluate_near_sensors(start_date="2018-11-10",
                                                 end_date="2018-11-13",
                                                 radius=1,
                                                 quiet=True)
/usr/lib/python3.7/site-packages/matplotlib/axes/_base.py:3604: MatplotlibDeprecationWarning: 
The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.
  alternative='`bottom`', obj_type='argument')
In [22]:
hourly_means
Out[22]:
pm10 pm2.5
14452 14464 14532 16926 17107 17129 17546 17719 14452 14464 14532 16926 17107 17129 17546 17719
Period
2018-11-10 00:00 5.83 5.99 NaN 6.63 8.23 9.92 8.85 7.90 3.80 4.91 NaN 2.44 3.96 5.42 3.26 4.39
2018-11-10 01:00 4.39 5.47 NaN 6.42 7.44 9.83 7.48 7.18 3.02 4.34 NaN 2.10 3.47 4.49 2.80 3.94
2018-11-10 02:00 3.13 NaN NaN 4.16 5.21 5.88 5.51 4.68 2.06 NaN NaN 1.51 2.47 3.21 1.96 2.85
2018-11-10 03:00 2.46 NaN NaN 3.57 3.91 3.85 5.28 3.12 1.54 NaN NaN 1.05 1.67 2.04 1.46 1.91
2018-11-10 04:00 2.86 3.42 NaN 3.15 3.98 3.76 4.33 3.53 1.78 2.88 NaN 1.30 1.95 2.19 1.54 2.15
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-11-13 19:00 17.16 13.03 NaN 20.31 23.39 21.25 22.95 18.01 6.83 5.46 NaN 4.90 7.98 8.56 6.23 6.49
2018-11-13 20:00 18.69 14.94 NaN 22.80 25.76 22.17 24.37 19.19 6.84 6.12 NaN 5.55 8.43 8.11 6.12 7.04
2018-11-13 21:00 36.58 31.65 NaN 43.52 42.18 39.97 45.23 31.58 13.24 10.47 NaN 8.66 12.41 13.54 10.66 10.26
2018-11-13 22:00 21.72 27.62 NaN 27.50 26.64 24.07 28.66 24.36 8.26 9.51 NaN 6.46 8.54 8.68 7.26 8.69
2018-11-13 23:00 19.22 19.21 NaN 26.78 23.92 20.81 25.05 21.19 7.82 7.94 NaN 6.07 7.47 7.53 6.44 7.66

96 rows × 16 columns

Irceline.be resources

Get IRCELINE metadata

IRCELINE provides information about

  • the phenomena it measures
  • the stations where those phenomena are measured
  • the sensors that measure them (represented as time series)
In [23]:
irceline.Metadata()
Using cached phenomenon metadata
Using cached station metadata
Using cached time series metadata
Out[23]:
<airqdata.irceline.Metadata at 0x7fa5f70888d0>
In [24]:
irceline.Metadata.phenomena
Out[24]:
       label
id
1      Sulphur dioxide
5      Particulate Matter < 10 µm
7      Ozone
8      Nitrogen dioxide
10     Carbon Monoxide
...    ...
6001   Particulate Matter < 2.5 µm
6002   Particulate Matter < 1 µm
61102  wind direction
61110  wind speed (scalar)
62101  temperature

19 rows × 1 columns

In [25]:
irceline.Metadata.stations
Out[25]:
      label                   lat   lon
id
1030  40AL01 - LINKEROEVER  51.24  4.39
1031  40AL02 - BEVEREN      51.30  4.23
1032  40AL03 - BEVEREN      51.25  4.20
1033  40AL04 - BEVEREN      51.29  4.29
1034  40AL05 - BEVEREN      51.26  4.28
...   ...                     ...   ...
1240  47E715 - ZUIENKERKE   51.25  3.17
1241  47E716 - MARIAKERKE   51.07  3.68
1242  47E804 - Kallo        51.28  4.31
1716  47E814 - Ham          51.08  5.13
1725  T2H801 - Zwijndrecht  51.24  4.33

114 rows × 3 columns

In [26]:
irceline.Metadata.get_stations_by_name("bru")
Out[26]:
      label                                    lat   lon
id
1110  41B004 - Bruxelles (Sainte-Catherine)  50.85  4.35
1112  41B006 - Bruxelles (Parlement UE)      50.84  4.37
1711  41B008 - Brussel (Beliardstraat)       50.84  4.38
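The search for "bru" matched both "Bruxelles" and "Brussel", so it behaves like a case-insensitive substring match. A minimal pandas sketch of that kind of lookup, reusing station rows shown above; this is an illustration, not necessarily how `get_stations_by_name` is implemented.

```python
import pandas as pd


def stations_by_name(stations: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return stations whose label contains `name`, ignoring case."""
    mask = stations["label"].str.contains(name, case=False, regex=False)
    return stations[mask]


stations = pd.DataFrame(
    {"label": ["40AL01 - LINKEROEVER", "41B004 - Bruxelles (Sainte-Catherine)",
               "41B006 - Bruxelles (Parlement UE)", "41B008 - Brussel (Beliardstraat)",
               "41R012 - UCCLE"],
     "lat": [51.24, 50.85, 50.84, 50.84, 50.80],
     "lon": [4.39, 4.35, 4.37, 4.38, 4.36]},
    index=pd.Index([1030, 1110, 1112, 1711, 1122], name="id"))
matches = stations_by_name(stations, "bru")
print(matches)
```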
In [27]:
irceline.Metadata.time_series
Out[27]:
label phenomenon unit station_id station_label station_lat station_lon
id
6522 1,2-XYLENE O-XYLENE 6522 - btx, o-xyleen - pro... 1,2-XYLENE O-XYLENE µg/m³ 1112 41B006 - Bruxelles (Parlement UE) 50.84 4.37
6202 Benzene 6202 - ? - procedure, 40GK09 - GENK Benzene µg/m³ 1045 40GK09 - GENK 50.93 5.49
6274 Benzene 6274 - Synspec - procedure, 40LD01 - L... Benzene µg/m³ 1058 40LD01 - LAAKDAL 51.11 5.00
6283 Benzene 6283 - Synspec - procedure, 40LD02 - L... Benzene µg/m³ 1059 40LD02 - LAAKDAL 51.12 5.02
6487 Benzene 6487 - Airmotec 10000S plus benzeen - ... Benzene µg/m³ 1107 40ZL01 - ZELZATE 51.20 3.81
... ... ... ... ... ... ... ...
99904 wind speed (scalar) 99904 - Unknown device - p... wind speed (scalar) m/s 1198 44M705 - ROESELARE 50.95 3.15
99907 wind speed (scalar) 99907 - Unknown device - p... wind speed (scalar) m/s 1207 44R701 - GENT 51.06 3.73
99910 wind speed (scalar) 99910 - Unknown device - p... wind speed (scalar) m/s 1127 42M802 - ANTWERPEN 51.26 4.42
99916 wind speed (scalar) 99916 - Unknown device - p... wind speed (scalar) m/s 1118 41R001 - Molenbeek-Saint-Jean 50.85 4.33
99940 wind speed (scalar) 99940 - Unknown device - p... wind speed (scalar) m/s 1122 41R012 - UCCLE 50.80 4.36

505 rows × 7 columns

How many stations measure a given phenomenon?

In [28]:
irceline.Metadata.time_series["phenomenon"].value_counts()
Out[28]:
Nitrogen dioxide               91
Nitrogen monoxide              91
Particulate Matter < 10 µm     66
Particulate Matter < 2.5 µm    64
Sulphur dioxide                53
                               ..
1,2-XYLENE O-XYLENE             1
Toluene                         1
Ethylbenzene                    1
Particulate Matter < 1 µm       1
M+P-xylene                      1
Name: phenomenon, Length: 19, dtype: int64

How many phenomena does a given station measure?

In [29]:
irceline.Metadata.time_series["station_label"].value_counts().head()
Out[29]:
44R701 - GENT                    12
44N029 - HOUTEM                  11
41R012 - UCCLE                   11
41R001 - Molenbeek-Saint-Jean    11
42M802 - ANTWERPEN                9
Name: station_label, dtype: int64
In [30]:
# Show at most 6 rows for the following, wider tables
pd.set_option("display.max_rows", 6)

Where is a given phenomenon measured?

In [31]:
irceline.Metadata.query_time_series(phenomenon="ethylbenzene")
Out[31]:
label phenomenon unit station_id station_label station_lat station_lon
id
6521 Ethylbenzene 6521 - btx, ethylbenzeen - proced... Ethylbenzene µg/m³ 1112 41B006 - Bruxelles (Parlement UE) 50.84 4.37

Where is PM2.5 measured?

In [32]:
irceline.Metadata.get_pm25_time_series()
Out[32]:
label phenomenon unit station_id station_label station_lat station_lon
id
100001 Particulate Matter < 2.5 µm 100001 - Unknown d... Particulate Matter < 2.5 µm µg/m³ 1044 40GK06 - GENK 50.93 5.47
100005 Particulate Matter < 2.5 µm 100005 - Unknown d... Particulate Matter < 2.5 µm µg/m³ 1070 40RL01 - ROESELARE 50.95 3.12
100006 Particulate Matter < 2.5 µm 100006 - Unknown d... Particulate Matter < 2.5 µm µg/m³ 1066 40OB01 - OOSTROZEBEK 50.92 3.31
... ... ... ... ... ... ... ...
99997 Particulate Matter < 2.5 µm 99997 - Unknown de... Particulate Matter < 2.5 µm µg/m³ 1200 44N012 - MOERKERKE 51.25 3.36
99998 Particulate Matter < 2.5 µm 99998 - Unknown de... Particulate Matter < 2.5 µm µg/m³ 1208 44R710 - DESTELBERGE 51.06 3.78
99999 Particulate Matter < 2.5 µm 99999 - Unknown de... Particulate Matter < 2.5 µm µg/m³ 1048 40HB23 - HOBOKEN 51.17 4.34

64 rows × 7 columns

Where is PM10 measured?

In [33]:
irceline.Metadata.get_pm10_time_series()
Out[33]:
label phenomenon unit station_id station_label station_lat station_lon
id
10600 Particulate Matter < 10 µm 10600 - - procedur... Particulate Matter < 10 µm µg/m³ 1159 42R831 - BERENDRECHT 51.35 4.34
10610 Particulate Matter < 10 µm 10610 - Unknown dev... Particulate Matter < 10 µm µg/m³ 1710 42R834 - Boom 51.09 4.38
10680 Particulate Matter < 10 µm 10680 - Unknown dev... Particulate Matter < 10 µm µg/m³ 1714 40OB02 - Wielsbeke 50.91 3.38
... ... ... ... ... ... ... ...
7148 Particulate Matter < 10 µm 7148 - GRIMM - proc... Particulate Matter < 10 µm µg/m³ 1219 45R510 - CHATELINEAU 50.42 4.52
7151 Particulate Matter < 10 µm 7151 - GRIMM - proc... Particulate Matter < 10 µm µg/m³ 1220 45R511 - MARCINELLE 50.38 4.42
7162 Particulate Matter < 10 µm 7162 - GRIMM - proc... Particulate Matter < 10 µm µg/m³ 1221 45R512 - MARCHIENNE 50.41 4.40

66 rows × 7 columns

What are the closest locations to Etterbeek where IRCELINE measures NO₂?

Using a location in Etterbeek as a reference point: 50.837°N 4.39°E

In [34]:
irceline.Metadata.query_time_series("nitrogen dioxide",
                                    lat_nearest=50.837,
                                    lon_nearest=4.39)
Out[34]:
label phenomenon unit station_id station_label station_lat station_lon distance
id
10614 Nitrogen dioxide 10614 - Unknown device - proc... Nitrogen dioxide µg/m³ 1711 41B008 - Brussel (Beliardstraat) 50.84 4.38 1.06
6516 Nitrogen dioxide 6516 - AC-31M (Environnement)... Nitrogen dioxide µg/m³ 1112 41B006 - Bruxelles (Parlement UE) 50.84 4.37 1.11
6615 Nitrogen dioxide 6615 - AC-31M (Environnement)... Nitrogen dioxide µg/m³ 1119 41R002 - Ixelles 50.83 4.38 1.37
... ... ... ... ... ... ... ... ...
7047 Nitrogen dioxide 7047 - THIS 42C - procedure, ... Nitrogen dioxide µg/m³ 1202 44N029 - HOUTEM 51.02 2.58 128.25
6934 Nitrogen dioxide 6934 - API 200A 1849 - proced... Nitrogen dioxide µg/m³ 1180 43N085 - VIELSALM 50.30 6.00 128.37
6968 Nitrogen dioxide 6968 - Unknown device - proce... Nitrogen dioxide µg/m³ 1185 43N132 - Habay-La-Neuve 49.72 5.63 152.36

91 rows × 8 columns

What does the Uccle station measure?

In [35]:
irceline.Metadata.list_station_time_series("ucc")
Out[35]:
label phenomenon unit station_id station_label
id
10607 Black Carbon 10607 - - procedure, 41R012 - UCCLE Black Carbon µg/m³ 1122 41R012 - UCCLE
6619 Carbon Dioxide 6619 - This model 41H - procedu... Carbon Dioxide ppm 1122 41R012 - UCCLE
6622 Nitrogen dioxide 6622 - AC-31M (Environnement)... Nitrogen dioxide µg/m³ 1122 41R012 - UCCLE
... ... ... ... ... ...
99941 temperature 99941 - Unknown device - procedure... temperature °C 1122 41R012 - UCCLE
99939 wind direction 99939 - Unknown device - proced... wind direction degrees 1122 41R012 - UCCLE
99940 wind speed (scalar) 99940 - Unknown device - p... wind speed (scalar) m/s 1122 41R012 - UCCLE

11 rows × 5 columns

List stations near a location

By default, the search uses the coordinates and radius of Brussels; here, custom values are passed.

In [36]:
irceline.Metadata.search_proximity(lat=50.9, lon=4.4, radius=3)
Out[36]:
      label                  lat   lon  distance
id
1116  41MEU1 - MEUDON      50.90  4.39      0.75
1117  41N043 - HAREN       50.88  4.38      2.18
1232  47E008 - Grimbergen  50.93  4.40      2.92

Create a sensor object from a time series, retrieve its measurements and plot them

In [37]:
irceline_demo_sensor = irceline.Sensor("6615")  # An NO₂ sensor in Ixelles
In [38]:
irceline_demo_sensor.get_measurements(start_date="2018-11-03",
                                      end_date="2018-11-08")
Using cached IRCELINE timeseries data
In [39]:
irceline_demo_sensor.measurements.head()
Out[39]:
Nitrogen dioxide
Period
2018-11-03 00:00 51.5
2018-11-03 01:00 50.5
2018-11-03 02:00 41.5
2018-11-03 03:00 47.0
2018-11-03 04:00 44.5
In [40]:
irceline_demo_sensor.plot_measurements()
Out[40]:
([<Figure size 864x576 with 1 Axes>],
 [<matplotlib.axes._subplots.AxesSubplot at 0x7fa5f6fba668>])

Combining the sources

In [41]:
# Restore the earlier limit of 10 rows
pd.set_option("display.max_rows", 10)

Which IRCELINE sensors measuring the same phenomena are closest to a given luftdaten.info sensor?

In [42]:
nearest = irceline.find_nearest_sensors(demo_sensor, quiet=True)
nearest
Out[42]:
pm10 pm2.5
time series id 6578 6579
label Particulate Matter < 10 µm 6578 - TEOM FDMS - ... Particulate Matter < 2.5 µm 6579 - TEOM FDMS -...
phenomenon Particulate Matter < 10 µm Particulate Matter < 2.5 µm
unit µg/m³ µg/m³
station_id 1118 1118
station_label 41R001 - Molenbeek-Saint-Jean 41R001 - Molenbeek-Saint-Jean
station_lat 51 51
station_lon 4.3 4.3
distance 1.3 1.3

Compare data of a luftdaten.info sensor and the nearest IRCELINE sensors

In [43]:
combined_data, plots = compare_nearest_irceline_sensors(demo_sensor,
                                                        start_date="2018-11-03",
                                                        end_date="2018-11-10",
                                                        quiet=True)

Correlation between the compared values

In [44]:
combined_data.corr()
Out[44]:
Phenomenon pm10 Particulate Matter < 10 µm pm2.5 Particulate Matter < 2.5 µm
Sensor 3803 at 50.838°N 4.332°E 6578 at 41R001 - Molenbeek-Saint-Jean 3803 at 50.838°N 4.332°E 6579 at 41R001 - Molenbeek-Saint-Jean
Affiliation luftdaten.info & InfluencAir IRCELINE luftdaten.info & InfluencAir IRCELINE
Phenomenon Sensor Affiliation
pm10 3803 at 50.838°N 4.332°E luftdaten.info & InfluencAir 1.00 0.06 1.00 0.16
Particulate Matter < 10 µm 6578 at 41R001 - Molenbeek-Saint-Jean IRCELINE 0.06 1.00 0.10 0.97
pm2.5 3803 at 50.838°N 4.332°E luftdaten.info & InfluencAir 1.00 0.10 1.00 0.21
Particulate Matter < 2.5 µm 6579 at 41R001 - Molenbeek-Saint-Jean IRCELINE 0.16 0.97 0.21 1.00
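A luftdaten.info sensor reports every few minutes, while IRCELINE series are hourly, so the two must share a time index before they can be correlated. One way to do this in pandas, assuming timezone-aware timestamps, is to resample the high-frequency data to hourly means and join the two series; `combined_data` above may be built differently, and all values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical high-frequency readings: one every 10 minutes for 3 hours
raw_index = pd.date_range("2018-11-05 00:00", periods=18, freq="10min", tz="UTC")
raw = pd.Series(np.linspace(5.0, 40.0, 18), index=raw_index, name="low-cost")

# Hypothetical hourly reference series, timestamped at the start of each hour
ref_index = pd.date_range("2018-11-05 00:00", periods=3, freq="60min", tz="UTC")
reference = pd.Series([12.0, 30.0, 50.0], index=ref_index, name="reference")

# Bring the raw readings to hourly resolution, then align both on the same index
hourly = raw.resample("60min").mean()
combined = pd.concat([hourly, reference], axis=1)

# DataFrame.corr computes pairwise Pearson correlation by default
print(combined.corr())
```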

Compare data from any sensors

In [45]:
t_rh_sensor = luftdaten.Sensor("5562")  # Temperature and humidity sensor at Brussels Central Station
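# Sensors and phenomena are paired by position, so t_rh_sensor is listed once
# for temperature and once for humidity
combined_data, plot = compare_sensor_data(sensors=[demo_sensor, t_rh_sensor, t_rh_sensor, irceline_demo_sensor],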
                                          phenomena=["pm2.5", "temperature", "humidity", "Nitrogen dioxide"],
                                          start_date="2018-11-05",
                                          end_date="2018-11-10",
                                          hourly_means=True,
                                          quiet=True)
Using cached sensor 5562 metadata from luftdaten.info
In [46]:
combined_data.head()
Out[46]:
Phenomenon pm2.5 temperature humidity Nitrogen dioxide
Sensor 3803 at 50.838°N 4.332°E 5562 at 50.846°N 4.357°E 5562 at 50.846°N 4.357°E 6615 at 41R002 - Ixelles
Affiliation luftdaten.info & InfluencAir luftdaten.info luftdaten.info IRCELINE
Period
2018-11-05 00:00 50.34 13.95 84.15 34.5
2018-11-05 01:00 42.88 13.68 84.72 31.5
2018-11-05 02:00 35.42 13.34 85.13 29.5
2018-11-05 03:00 30.12 12.45 88.59 26.5
2018-11-05 04:00 23.71 12.54 87.87 27.5
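`combined_data` has three column levels: Phenomenon, Sensor and Affiliation. Columns can be selected by any one level with `DataFrame.xs`. The sketch below rebuilds a small frame with that column layout from the first rows shown above, assuming the levels carry the names displayed in the output; the Period index is simplified to strings.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("pm2.5", "3803 at 50.838°N 4.332°E", "luftdaten.info & InfluencAir"),
     ("temperature", "5562 at 50.846°N 4.357°E", "luftdaten.info"),
     ("humidity", "5562 at 50.846°N 4.357°E", "luftdaten.info"),
     ("Nitrogen dioxide", "6615 at 41R002 - Ixelles", "IRCELINE")],
    names=["Phenomenon", "Sensor", "Affiliation"])
index = pd.Index(["2018-11-05 00:00", "2018-11-05 01:00", "2018-11-05 02:00"],
                 name="Period")
combined = pd.DataFrame([[50.34, 13.95, 84.15, 34.5],
                         [42.88, 13.68, 84.72, 31.5],
                         [35.42, 13.34, 85.13, 29.5]],
                        index=index, columns=columns)

# All columns affiliated only with luftdaten.info (the Affiliation level is dropped)
luftdaten_only = combined.xs("luftdaten.info", axis=1, level="Affiliation")

# A single phenomenon as a Series, regardless of sensor and affiliation
humidity = combined.xs("humidity", axis=1, level="Phenomenon").iloc[:, 0]
print(luftdaten_only)
print(humidity)
```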

Correlation between the compared values

In [47]:
combined_data.corr()
Out[47]:
Phenomenon pm2.5 temperature humidity Nitrogen dioxide
Sensor 3803 at 50.838°N 4.332°E 5562 at 50.846°N 4.357°E 5562 at 50.846°N 4.357°E 6615 at 41R002 - Ixelles
Affiliation luftdaten.info & InfluencAir luftdaten.info luftdaten.info IRCELINE
Phenomenon Sensor Affiliation
pm2.5 3803 at 50.838°N 4.332°E luftdaten.info & InfluencAir 1.00 -0.06 0.32 -0.09
temperature 5562 at 50.846°N 4.357°E luftdaten.info -0.06 1.00 -0.74 0.34
humidity 5562 at 50.846°N 4.357°E luftdaten.info 0.32 -0.74 1.00 -0.59
Nitrogen dioxide 6615 at 41R002 - Ixelles IRCELINE -0.09 0.34 -0.59 1.00

Export data for use in another environment

In [48]:
# Uncomment the next line to save the measurements to a CSV file
# demo_sensor.measurements.to_csv("demo_sensor_data.csv")
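When written with `to_csv`, the timestamps become the first column of the file. In another pandas session they can be parsed back into a timezone-aware index. This sketch uses an in-memory buffer in place of a real file, and values copied from the first rows of the measurement table; the column names match that table.

```python
import io

import pandas as pd

index = pd.DatetimeIndex(["2018-11-04 17:14:07", "2018-11-04 17:16:44",
                          "2018-11-04 17:19:12"], tz="UTC", name="timestamp")
measurements = pd.DataFrame({"pm10": [15.17, 40.30, 38.33],
                             "pm2.5": [9.87, 20.87, 18.20]}, index=index)

# Write to CSV (an in-memory buffer stands in for "demo_sensor_data.csv")
buffer = io.StringIO()
measurements.to_csv(buffer)
buffer.seek(0)

# Read back: use the timestamp column as the index and parse it as UTC
restored = pd.read_csv(buffer, index_col="timestamp")
restored.index = pd.to_datetime(restored.index, utc=True)
print(restored)
```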

More advanced analysis

Because the measurements are stored as pandas objects, the full range of pandas functionality is available for further analysis.

Get more data

In [49]:
demo_sensor.get_measurements(start_date="2018-05-24",
                             end_date="2018-11-23",
                             quiet=True)
# Select PM2.5 and convert its index to local time. Series.tz_convert returns
# a new Series, so demo_sensor.measurements keeps its UTC index.
data = demo_sensor.measurements["pm2.5"].tz_convert("Europe/Brussels")

describe(data)
Out[49]:
count    90374.00
mean         7.50
std         17.83
min          0.80
1%           1.27
50%          4.30
99%         56.70
max        507.57
Name: pm2.5, dtype: float64

Summarize measurements by day of the week

In [50]:
# Produce a statistical summary of the data grouped by day of the week
grouping_variable = data.index.dayofweek
weekday_summary = (data
                   .groupby(grouping_variable)
                   .describe(percentiles=[0.01, 0.99]))

# Show day names instead of integers. pandas' dayofweek and
# calendar.day_abbr both number the days from Monday = 0.
import calendar
weekday_summary.index = [calendar.day_abbr[i]
                         for i in weekday_summary.index]
weekday_summary.index.name = "Day of the Week (Local Time)"

# Get spread values
yspread = [[(weekday_summary["mean"] - weekday_summary["1%"]),
            (weekday_summary["99%"] - weekday_summary["mean"])]]

# Plot
title = ("PM2.5 Concentration by Day of the Week\n"
         "Mean and 98% Range\n"
         + demo_sensor.label)
ax = (weekday_summary["mean"]
      .plot(kind="bar", ylim=(0, None), color="black", title=title,
            yerr=yspread, legend=True, figsize=(12, 8)))
ax.set(ylabel="Concentration in µg/m³")
ax.xaxis.grid(False);