This tutorial presents a few basic options for preparing 3D-printed visualizations from datasets that are available through the City of Toronto's Open Data Portal.
Most of the processes used are software/language agnostic, but we strongly encourage you to acquire a baseline understanding of the modern Python data science ecosystem. These are a few important pieces that you'll want to set up:
We will be using data from the City of Toronto's portal, but there are various other interesting datasets that you might consider working with. For example, you can collect and use your own biometric/self-tracking data if you have a wearable device like a Fitbit. Kaggle provides lots of awesome datasets (the Pokémon one is fun if you're working with kids): https://kaggle.com/datasets. Google has a dataset search tool: https://toolbox.google.com/datasetsearch. FiveThirtyEight has plenty of interesting political, social, and sports-related datasets: https://data.fivethirtyeight.com/. You might also familiarize yourself with the Open North community: https://github.com/opennorth.
There are various tools you can use to clean/munge/prepare your data, including Excel, LibreOffice Calc, R, and many more. We prefer pandas, a Python library for data analysis. There are lots of great tutorials that outline how to import and prepare data with pandas in an IPython/Jupyter notebook. If you're completely new to this stuff, your best bet is to do the free DataCamp tutorials: https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python. That said, we recommend either starting with something simple or spending plenty of time familiarizing yourself with a dataset in a spreadsheet application before moving to pandas dataframes (even though we have included some very basic cleaning functions in the notebook cells below).
For this exercise, we will be working with pedestrian data from the King Street Pilot Project. We scraped all the data from the monthly .pdf reports that the City of Toronto has made available here: https://www.toronto.ca/city-government/planning-development/planning-studies-initiatives/king-street-pilot/data-reports-background-materials/. The data is available in the data directory of the repository that contains this file.
Start by importing the necessary libraries:
import pandas as pd
import numpy as np
import stat as st
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.plotly as py  # plotly < 4.0; newer releases moved this module to chart_studio.plotly
import plotly.graph_objs as go
If you want, you can set a default figure size for charts:
plt.rcParams['figure.figsize'] = [10, 8]
Next, import your data. We have included a cleaned version of the dataset in the data directory. pvol is what is referred to as a "dataframe" in pandas terminology. You will often see the variable name df (e.g. df = pd.read_csv('example.csv')).
pvol = pd.read_csv('data/king_pedestrian_volume.csv')
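Display the dataframe you've created with street as its index (set_index returns a new dataframe, so pvol itself keeps the street column that later cells rely on):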
pvol.set_index('street')
street | am_bathurst_baseline | am_bathurst_january | am_bathurst_february | am_bathurst_march | am_bathurst_april | am_bathurst_may | am_bathurst_june | am_spadina_baseline | am_spadina_january | am_spadina_february | ... | pm_bay_april | pm_bay_may | pm_bay_june | pm_jarvis_baseline | pm_jarvis_january | pm_jarvis_february | pm_jarvis_march | pm_jarvis_april | pm_jarvis_may | pm_jarvis_june |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
queen | 1810 | 1640 | 1750 | 1760 | 1790 | 1960 | 1770 | 2000 | 1880 | 1790 | ... | 4890 | 8340 | 9280 | 1320 | 1140 | 1300 | 1210 | 1300 | 1450 | 1460 |
king | 2820 | 2680 | 2620 | 2590 | 2580 | 2890 | 2780 | 4150 | 3580 | 3690 | ... | 5540 | 8060 | 8190 | 3370 | 3760 | 4050 | 3930 | 4060 | 3920 | 4080 |
2 rows × 56 columns
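The behaviour of set_index is easy to trip over, so here is a minimal sketch of it. The `toy` frame is a hypothetical stand-in for pvol that reuses two values from the table above; it is not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for pvol: one data column, values copied from the table above.
toy = pd.DataFrame({
    'street': ['queen', 'king'],
    'am_bathurst_baseline': [1810, 2820],
})

# set_index returns a new dataframe by default; toy is left untouched.
indexed = toy.set_index('street')

street_still_a_column = 'street' in toy.columns   # original keeps the column
indexed_labels = list(indexed.index)              # the copy uses it as row labels
```

If you did want the change to stick, you could reassign the result or pass inplace=True, but the plotting cells further down would then need to use the index rather than pvol['street'].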
Create some objects for each street. Normally, we wouldn't want such long names (and wide dataframes), but explaining the hows and whys of reshaping data is not in the scope of this tutorial. If you're interested, read Hadley Wickham's papers on the subject (http://vita.had.co.nz/papers/tidy-data.html) or follow these instructions for reshaping data in pandas: https://pandas.pydata.org/pandas-docs/stable/reshaping.html
#### am objects
# bathurst
pvol_bathurst_am = pvol[['street',
'am_bathurst_baseline',
'am_bathurst_january',
'am_bathurst_february',
'am_bathurst_march',
'am_bathurst_april',
'am_bathurst_may',
'am_bathurst_june']]
# spadina
pvol_spadina_am = pvol[['street',
'am_spadina_baseline',
'am_spadina_january',
'am_spadina_february',
'am_spadina_march',
'am_spadina_april',
'am_spadina_may',
'am_spadina_june']]
# bay
pvol_bay_am = pvol[['street',
'am_bay_baseline',
'am_bay_january',
'am_bay_february',
'am_bay_march',
'am_bay_april',
'am_bay_may',
'am_bay_june']]
# jarvis
pvol_jarvis_am = pvol[['street',
'am_jarvis_baseline',
'am_jarvis_january',
'am_jarvis_february',
'am_jarvis_march',
'am_jarvis_april',
'am_jarvis_may',
'am_jarvis_june']]
#### pm objects
# bathurst
pvol_bathurst_pm = pvol[['street',
'pm_bathurst_baseline',
'pm_bathurst_january',
'pm_bathurst_february',
'pm_bathurst_march',
'pm_bathurst_april',
'pm_bathurst_may',
'pm_bathurst_june']]
# spadina
pvol_spadina_pm = pvol[['street',
'pm_spadina_baseline',
'pm_spadina_january',
'pm_spadina_february',
'pm_spadina_march',
'pm_spadina_april',
'pm_spadina_may',
'pm_spadina_june']]
# bay
pvol_bay_pm = pvol[['street',
'pm_bay_baseline',
'pm_bay_january',
'pm_bay_february',
'pm_bay_march',
'pm_bay_april',
'pm_bay_may',
'pm_bay_june']]
# jarvis
pvol_jarvis_pm = pvol[['street',
'pm_jarvis_baseline',
'pm_jarvis_january',
'pm_jarvis_february',
'pm_jarvis_march',
'pm_jarvis_april',
'pm_jarvis_may',
'pm_jarvis_june']]
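Because the column names follow a regular period_site_month pattern, the eight selections above could probably be generated rather than typed out. The sketch below assumes that naming pattern holds for every column; `select_site` and the `toy` frame are hypothetical names introduced only for illustration.

```python
import pandas as pd

def select_site(df, period, site, key='street'):
    """Return the key column plus every column named '<period>_<site>_*'."""
    prefix = f'{period}_{site}_'
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df[[key] + cols]

# Hypothetical stand-in for pvol, with values copied from the table above.
toy = pd.DataFrame({
    'street': ['queen', 'king'],
    'am_bathurst_baseline': [1810, 2820],
    'am_bathurst_january': [1640, 2680],
    'am_spadina_baseline': [2000, 4150],
})

toy_bathurst_am = select_site(toy, 'am', 'bathurst')
```

With the real data, select_site(pvol, 'pm', 'jarvis') should give the same columns as pvol_jarvis_pm, in whatever order they appear in the file.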
Using the standard pandas plotting functions (which rely on matplotlib), you can prepare bare-bones static charts (you might use matplotlib or seaborn if you want greater customization options). There are lots of ways to adjust the colours if you want, but we like our charts to look like life savers ;-)
pvol_bathurst_am.plot.bar(x='street',
                          rot=0,
                          width=0.85,
                          title='AM Peak Pedestrian Volume Measured at Bathurst');
If you want horizontal charts, you can feed barh to the plot method:
pvol_bathurst_am.plot.barh(x='street',
                           rot=0,
                           width=0.85,
                           title='AM Peak Pedestrian Volume Measured at Bathurst')
plt.gca().invert_yaxis();
Far more interesting and useful is the potential for creating interactive charts inside a notebook. There are various libraries you can use (such as Bokeh or Pygal), but we find Plotly to be the most well-developed. It also has an easy-to-use web portal. What we're going to do next is create a grouped bar chart using Plotly's Python library.
#### plotly-based grouped bar charts
# AM Bathurst
bath_baseline = go.Bar(
x=pvol['street'],
y=pvol['am_bathurst_baseline'],
name='AM Bathurst Baseline',
hoverinfo='y+name'
)
bath_january = go.Bar(
x=pvol['street'],
y=pvol['am_bathurst_january'],
name='AM Bathurst January',
hoverinfo='y+name'
)
bath_february = go.Bar(
x=pvol['street'],
y=pvol['am_bathurst_february'],
name='AM Bathurst February',
hoverinfo='y+name'
)
bath_march = go.Bar(
x=pvol['street'],
y=pvol['am_bathurst_march'],
name='AM Bathurst March',
hoverinfo='y+name'
)
bath_april = go.Bar(
x=pvol['street'],
y=pvol['am_bathurst_april'],
name='AM Bathurst April',
hoverinfo='y+name'
)
bath_may = go.Bar(
x=pvol['street'],
y=pvol['am_bathurst_may'],
name='AM Bathurst May',
hoverinfo='y+name'
)
bath_june = go.Bar(
x=pvol['street'],
y=pvol['am_bathurst_june'],
name='AM Bathurst June',
hoverinfo='y+name'
)
data = [bath_baseline, bath_january, bath_february, bath_march, bath_april, bath_may, bath_june]
layout = go.Layout(
barmode='group',
# bargap=0.15,
bargroupgap=0.1,
hovermode='closest'
# showlegend=False
)
am_bath_pvol_fig = go.Figure(data=data, layout=layout)
py.iplot(am_bath_pvol_fig, filename='am_bath_pvol_grouped-bar')
This chart displays monthly average pedestrian counts for the morning rush at the intersections Bathurst/Queen and Bathurst/King. The dataframe provides options for the 7-10 am and 4-7 pm peak periods at the intersections of Bathurst, Spadina, Bay, and Jarvis (at both King and Queen). Change your arguments accordingly to prepare different - or multiple - charts.
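Since only the period and the intersection change from chart to chart, the seven traces can be built in a loop. This is a sketch, not the notebook's own method: `bar_traces`, `MONTHS`, and the `toy` table are hypothetical, and the traces are written as plain dicts so the example runs without plotly installed.

```python
MONTHS = ['baseline', 'january', 'february', 'march', 'april', 'may', 'june']

def bar_traces(table, period, site):
    """Build one bar-trace dict per month for the given period and site.

    `table` can be anything indexable by column name, such as a pandas
    dataframe or a dict of lists.
    """
    traces = []
    for month in MONTHS:
        traces.append({
            'type': 'bar',
            'x': table['street'],
            'y': table[f'{period}_{site}_{month}'],
            'name': f'{period.upper()} {site.title()} {month.title()}',
            'hoverinfo': 'y+name',
        })
    return traces

# Hypothetical stand-in for pvol, with the AM Bathurst values from the table above.
toy = {
    'street': ['queen', 'king'],
    'am_bathurst_baseline': [1810, 2820],
    'am_bathurst_january': [1640, 2680],
    'am_bathurst_february': [1750, 2620],
    'am_bathurst_march': [1760, 2590],
    'am_bathurst_april': [1790, 2580],
    'am_bathurst_may': [1960, 2890],
    'am_bathurst_june': [1770, 2780],
}

traces = bar_traces(toy, 'am', 'bathurst')
```

Plotly accepts dicts like these as trace specifications, so go.Figure(data=bar_traces(pvol, 'pm', 'spadina'), layout=layout) should give the same kind of grouped chart as the longhand version.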
Remember this guy?
%%html
<img src="images/al-carbone.jpg">
Now that the Pilot Project has been running for almost a year, is there evidence that King is the "wasteland" Al Carbone claims it to be? Let's look at the data. Here are the evening (4-7 pm) pedestrian counts for Spadina… right around early dinner time:
# PM Spadina
spadina_baseline = go.Bar(
x=pvol['street'],
y=pvol['pm_spadina_baseline'],
name='PM Spadina Baseline',
hoverinfo='y+name'
)
spadina_january = go.Bar(
x=pvol['street'],
y=pvol['pm_spadina_january'],
name='PM Spadina January',
hoverinfo='y+name'
)
spadina_february = go.Bar(
x=pvol['street'],
y=pvol['pm_spadina_february'],
name='PM Spadina February',
hoverinfo='y+name'
)
spadina_march = go.Bar(
x=pvol['street'],
y=pvol['pm_spadina_march'],
name='PM Spadina March',
hoverinfo='y+name'
)
spadina_april = go.Bar(
x=pvol['street'],
y=pvol['pm_spadina_april'],
name='PM Spadina April',
hoverinfo='y+name'
)
spadina_may = go.Bar(
x=pvol['street'],
y=pvol['pm_spadina_may'],
name='PM Spadina May',
hoverinfo='y+name'
)
spadina_june = go.Bar(
x=pvol['street'],
y=pvol['pm_spadina_june'],
name='PM Spadina June',
hoverinfo='y+name'
)
data = [spadina_baseline, spadina_january, spadina_february, spadina_march, spadina_april, spadina_may, spadina_june]
layout = go.Layout(
barmode='group',
# bargap=0.15,
bargroupgap=0.1,
hovermode='closest'
# showlegend=False
)
pm_spad_pvol_fig = go.Figure(data=data, layout=layout)
py.iplot(pm_spad_pvol_fig, filename='pm_spad_pvol_grouped-bar')
And here's Bay…
# PM Bay
bay_baseline = go.Bar(
x=pvol['street'],
y=pvol['pm_bay_baseline'],
name='PM Bay Baseline',
hoverinfo='y+name'
)
bay_january = go.Bar(
x=pvol['street'],
y=pvol['pm_bay_january'],
name='PM Bay January',
hoverinfo='y+name'
)
bay_february = go.Bar(
x=pvol['street'],
y=pvol['pm_bay_february'],
name='PM Bay February',
hoverinfo='y+name'
)
bay_march = go.Bar(
x=pvol['street'],
y=pvol['pm_bay_march'],
name='PM Bay March',
hoverinfo='y+name'
)
bay_april = go.Bar(
x=pvol['street'],
y=pvol['pm_bay_april'],
name='PM Bay April',
hoverinfo='y+name'
)
bay_may = go.Bar(
x=pvol['street'],
y=pvol['pm_bay_may'],
name='PM Bay May',
hoverinfo='y+name'
)
bay_june = go.Bar(
x=pvol['street'],
y=pvol['pm_bay_june'],
name='PM Bay June',
hoverinfo='y+name'
)
data = [bay_baseline, bay_january, bay_february, bay_march, bay_april, bay_may, bay_june]
layout = go.Layout(
barmode='group',
# bargap=0.15,
bargroupgap=0.1,
hovermode='closest'
# showlegend=False
)
pm_bay_pvol_fig = go.Figure(data=data, layout=layout)
py.iplot(pm_bay_pvol_fig, filename='pm_bay_pvol_grouped-bar')
Keep in mind that we have incomplete data, and won't be able to get a really good sense of the Pilot Project's impact without year-over-year data. These numbers also don't take into account things like major events (including TIFF), the impact that the Muskoka chair seating areas have had on public space use, and whether or not people are spending more or less time waiting for streetcars. While it is easy to look at the increase in pedestrian volume as evidence of the Pilot Project's success, how do we know the weather hasn't been the biggest driver (keep in mind that the baseline was captured in October)?
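One rough way to put numbers on the change is to compute the percentage change from the baseline for each street. The sketch below uses only the PM Jarvis figures visible in the table near the top; the `toy` frame is a hypothetical stand-in for pvol, and the result is a descriptive statistic, not evidence of cause.

```python
import pandas as pd

# Hypothetical stand-in for pvol: PM Jarvis baseline and June, from the table above.
toy = pd.DataFrame({
    'street': ['queen', 'king'],
    'pm_jarvis_baseline': [1320, 3370],
    'pm_jarvis_june': [1460, 4080],
}).set_index('street')

# Percentage change from the (October) baseline to June, per street.
pct_change = (
    (toy['pm_jarvis_june'] - toy['pm_jarvis_baseline'])
    / toy['pm_jarvis_baseline'] * 100
)
```

Both streets see the same weather, so putting their changes side by side is one way to start separating a seasonal effect from a King-specific one, though two data points per street are far too few to settle the question.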
Now, let's take these same charts that we've prepared for the screen and render them as 3D models. This section assumes that you have the csv and bpy modules available in your Python ecosystem. csv is part of the Python standard library; bpy is Blender's Python API, which is normally available only inside Blender's bundled Python, so depending on your operating system and configuration you may need to install and configure it separately.
These steps have already been done, but are included for reference:
%%html
<img src="images/blender.gif">
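For readers who want a starting point, here is a minimal sketch of how the csv and bpy modules might be combined to turn a column of counts into a row of bars. It is not the script used to produce the models shown here: `bar_specs`, the dimensions, and the inline sample data are all hypothetical, and the Blender part assumes Blender 2.8 or later (where primitive_cube_add takes a size argument) and has to be run from inside Blender.

```python
import csv
import io

BAR_WIDTH = 10.0   # hypothetical bar footprint, in model units
BAR_GAP = 2.0      # hypothetical spacing between bars

def bar_specs(rows, value_key, max_height=40.0):
    """Turn CSV rows into (x_position, height) pairs for a row of bars.

    Heights are scaled so that the largest value becomes max_height.
    """
    values = [float(row[value_key]) for row in rows]
    top = max(values)
    return [(i * (BAR_WIDTH + BAR_GAP), v / top * max_height)
            for i, v in enumerate(values)]

# Inline stand-in for a CSV file; values copied from the pvol table.
sample = io.StringIO("street,pm_jarvis_june\nqueen,1460\nking,4080\n")
specs = bar_specs(list(csv.DictReader(sample)), 'pm_jarvis_june')

try:
    import bpy  # Blender's Python API; usually only importable inside Blender
except ImportError:
    bpy = None

if bpy is not None:
    for x, height in specs:
        # Add a unit cube, then stretch it into a bar resting on z = 0.
        bpy.ops.mesh.primitive_cube_add(size=1, location=(x, 0, height / 2))
        bpy.context.object.scale = (BAR_WIDTH, BAR_WIDTH, height)
```

Outside Blender the script only computes the bar positions and heights, which makes it easy to check the scaling before opening Blender at all.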
Here are some sample tiles and prototypes of tactile dashboard interfaces:
%%html
<img src="images/prototypes.jpg">
Now, we're going to switch to the recently-released 2016 Neighbourhood Profiles Dataset. (We had done a bunch of work with ward data in anticipation of the upcoming election, but it seems pretty irrelevant in light of recent events!) We're going to compare population growth between 2011 and 2016 (which is originally taken from the 2016 Census - more info here). There are plenty of interesting features of this dataset that you might consider using instead of population - language concentrations, income, citizenship, etc. We've already cleaned and processed the population data so it will play nice with QGIS. The raw and processed .csv files are in the data directory.
If you're going to use Excel or Calc to prep data for import into QGIS, here are some important steps:
Here are some additional things you can do with pandas and numpy:
Depending on the data you use, you might have to re-scale to make it printable. Refer to the following image:
%%html
<img src="images/ladder2.gif">
Import the data:
df = pd.read_csv('data/neighbourhood_pop.csv', dtype=str) # dtype str will keep the leading zeroes
df.head()
| | id | 2011 | 2016 |
---|---|---|---|
0 | 001 | 0.0341 | 0.033312 |
1 | 002 | 0.032788 | 0.032954 |
2 | 003 | 0.010138 | 0.01036 |
3 | 004 | 0.010488 | 0.010529 |
4 | 005 | 0.00955 | 0.009456 |
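The dtype=str argument matters because pandas would otherwise parse the id column as integers and drop the leading zeroes. A small sketch, using an inline string in place of the real file (the two rows are copied from the output above):

```python
import io
import pandas as pd

# Inline stand-in for data/neighbourhood_pop.csv (first two rows only).
raw = "id,2011,2016\n001,0.0341,0.033312\n002,0.032788,0.032954\n"

as_default = pd.read_csv(io.StringIO(raw))          # pandas infers the types
as_text = pd.read_csv(io.StringIO(raw), dtype=str)  # everything stays a string

default_ids = list(as_default['id'])   # parsed as integers, zeroes lost
text_ids = list(as_text['id'])         # kept exactly as written
```

Passing dtype={'id': str} instead would keep the ids as text while letting pandas parse the other columns as numbers, which would make the later string-to-float conversion unnecessary.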
Set index to ID:
df.set_index('id', inplace=True)
Convert strings to floats in order to use numpy functions:
df['2011'] = df['2011'].astype(float)
df['2016'] = df['2016'].astype(float)
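You can use numpy to convert to a square root, logarithmic, or whatever other scale you like (the two columns get different transforms here only to show both options; for a real comparison, apply the same one to each):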
df['2016'] = np.sqrt(df['2016'])
df['2011'] = np.log10(df['2011'])
df.head()
id | 2011 | 2016 |
---|---|---|
001 | -1.467246 | 0.182516 |
002 | -1.484285 | 0.181532 |
003 | -1.994048 | 0.101784 |
004 | -1.979307 | 0.102611 |
005 | -2.019997 | 0.097242 |
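After a transform like this, the values still need to be mapped onto heights your printer can actually produce. The sketch below does a simple linear (min-max) mapping; `to_print_height` and the 2 mm and 30 mm limits are hypothetical choices, not values from this tutorial.

```python
import numpy as np

def to_print_height(values, min_mm=2.0, max_mm=30.0):
    """Linearly map values onto the range [min_mm, max_mm]."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values identical: use the midpoint to avoid dividing by zero.
        return np.full_like(values, (min_mm + max_mm) / 2)
    return min_mm + (values - lo) / (hi - lo) * (max_mm - min_mm)

# 2016 population shares for the first five neighbourhoods (from df.head() above).
shares_2016 = [0.033312, 0.032954, 0.01036, 0.010529, 0.009456]
heights = to_print_height(np.sqrt(shares_2016))
```

A non-zero minimum keeps the smallest region from printing as a flat, unreadable tile; the right limits depend on your printer and on how the model will be touched or viewed.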
When you're done processing, you can output a new .csv for import into QGIS (this has already been done):
df.to_csv('data/neighbourhood_pop_scaled.csv')
%%html
<img src="images/qgis.gif">
%%html
<img src="images/shapefile.gif">
This is what your printed maps will look like:
%%html
<img src="images/models.jpg">
There are numerous software applications that you might use for preparing models prior to setting them up to print. You can likely do most of your prep in Blender, but the learning curve is steep.
Some Useful Blender shortcuts:
keys | action |
---|---|
a | select all |
c | circle select |
ctrl-lmb | lasso select |
b | border select |
ctrl-g | group selected objects |
m | when object selected, move to specific layer |
Some things to think about if you're preparing tactile models for blind users:
Printed tactile models do not have to be static! Think about how to separate your models into individual, reconfigurable/modular chunks in order to create dynamic data representations. It is easy to 3D print lego-like connectors onto the faces of your objects: https://www.thingiverse.com/search?q=lego+brick&dwh=525baba295ab0da. Additionally, attachable velcro tape gives you lots of options for creating endlessly modular graphics.
Remember, once you have a digital 3D data representation, it can usually be ported to any number of interaction contexts: