It's often useful to think about the visualisation design space before thinking about point designs. However, the visualisation design space is large. By scaffolding it, we can reassure ourselves that we haven't missed an obvious point design.

Here, I've attempted to scaffold a small part of the visualisation design space.
I've made two assumptions.
First, you have tidy data.
For the data to be tidy, each variable should form a column, each observation should form a row, and each type of observation should form a table.
Second, the data have *location*, *time*, and *category* variables, as well as a *value* variable.
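As a minimal sketch of what these two assumptions look like in practice, the snippet below reshapes a small wide table into tidy form with pandas. The table, its column names, and its numbers are all made up for illustration; they are not the data used later in this post.

```python
import pandas as pd

# Hypothetical wide table: one row per location, one column per year.
# The values are invented purely to show the reshaping.
wide = pd.DataFrame({
    'location': ['A', 'B'],
    'category': ['x', 'x'],
    '2014': [1.0, 3.0],
    '2015': [2.0, 4.0],
})

# Melt the year columns so that each variable forms a column and each
# observation (one location at one time) forms a row.
tidy = wide.melt(id_vars=['location', 'category'],
                 var_name='time', value_name='value')
tidy = tidy.sort_values(['location', 'time']).reset_index(drop=True)
print(tidy)
```

After the reshape, *location*, *time*, *category*, and *value* are each a single column, which is the shape the rest of this post assumes.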

There are three steps:

1. Think about the *location*, *time*, and *category* variables. Think about the domains of these variables; that is, think about the unique values of these variables. Think about whether you're interested in *one*, *some*, or *all* unique values of these variables.
2. Choose two variables and sketch a 3x3 grid. The rows are for the first variable. The columns are for the second variable. The first row/column has a cardinality of *one*, the second row/column has a cardinality of *some*, and the third row/column has a cardinality of *all*. The variable you didn't choose has a cardinality of *one*.
3. For each cell in the grid, sketch a single-view visualisation that represents the *value* variable for the given cardinalities of the given row/column variables.

That's it! Well, almost. It's worth sketching several single-view visualisations in Step 3. If you're struggling, then try transposing the rows and columns in Step 2.
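The three steps can also be written down as a small data structure. This is only a sketch: `scaffold`, `VARIABLES`, and `CARDINALITIES` are names I've made up for illustration, and each cell simply collects the names of the sketches you come up with.

```python
from itertools import combinations, product

VARIABLES = ['location', 'time', 'category']
CARDINALITIES = ['one', 'some', 'all']


def scaffold(row_variable, column_variable):
    """Return an empty 3x3 grid keyed by (row cardinality, column cardinality)."""
    if row_variable == column_variable:
        raise ValueError('Choose two different variables.')
    return {cell: [] for cell in product(CARDINALITIES, repeat=2)}


# Step 2: there are three ways to choose two of the three variables.
pairs = list(combinations(VARIABLES, 2))

# Step 3: each cell collects one or more sketched single-view visualisations.
grid = scaffold('location', 'time')
grid[('one', 'one')].append('single number')

# Transposing swaps the row and column cardinalities of every cell.
transposed = {(column, row): sketches
              for (row, column), sketches in grid.items()}
```

Keeping a list per cell, rather than a single value, reflects the advice to sketch several visualisations for each cell.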

Here's an example of scaffolding a small part of the visualisation design space. The data, which are from the World Bank, are GDP per capita for the G20 countries from 1990 to 2015. I used Altair to produce the single-view visualisations because it is more than a charting library: it is a declarative domain-specific language built on Vega and Vega-Lite.

In [1]:

```
import altair as alt
import pandas as pd
from pandas_datareader import wb
```

In [2]:

```
alt.renderers.enable('notebook')
```

Out[2]:

In [3]:

```
g20_countries = ['AR', 'AU', 'BR', 'CA', 'CN', 'DE', 'EU', 'FR', 'GB', 'ID', 'IN', 'IT', 'JP', 'KR', 'MX', 'RU', 'SA', 'TR', 'US', 'ZA']
```

In [4]:

```
df = wb.download(indicator='NY.GDP.PCAP.CD', country=g20_countries, start=1990, end=2015, errors='ignore').sort_index()
```

In [5]:

```
df.columns = [x.replace('.', '-') for x in df.columns] # Altair doesn't like column names that contain periods.
```

Notice that we have tidy data.

In [6]:

```
df.head()
```

Out[6]:

The *location* variable is *country*.
The unique values of this variable are the 20 names of the G20 countries.
The *time* variable is *year*.
The unique values of this variable are the 26 years from 1990 to 2015.
There isn't a *category* variable.
The *value* variable is *NY-GDP-PCAP-CD*.
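To check the domains yourself, you can count the unique values in each index level. The sketch below uses a toy stand-in rather than the downloaded data: I'm assuming the real frame has the same (*country*, *year*) MultiIndex that the later `df.loc` calls rely on, and the numbers are made up.

```python
import pandas as pd

# Toy stand-in for the downloaded frame: a (country, year) MultiIndex and
# one value column. The values are invented for illustration.
index = pd.MultiIndex.from_product(
    [['Australia', 'Canada'], ['2014', '2015']], names=['country', 'year'])
toy = pd.DataFrame({'NY-GDP-PCAP-CD': [1.0, 2.0, 3.0, 4.0]}, index=index)

# The domain of each index variable is its set of unique values.
domains = {name: toy.index.get_level_values(name).unique().tolist()
           for name in toy.index.names}

# The size of each domain is what "all" means for that variable.
print({name: len(values) for name, values in domains.items()})
```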

Here's the 3x3 grid, with rows as *location* and columns as *time*.

Location/Time | One | Some | All |
---|---|---|---|
One | ? | ? | ? |
Some | ? | ? | ? |
All | ? | ? | ? |

The easiest cell to fill is **one location, one time**.
We will use a single number.
Is a single number a single-view visualisation?
Well, it's faster and more accurate to extract information from a single number than it is to extract information from a chart that represents a single number.

In [7]:

```
def print_one_location_one_time(location='Canada', time='2015'):
    print('{}, {}: {}'.format(location, time, df.loc[(location, time), 'NY-GDP-PCAP-CD']))


print_one_location_one_time()
```

If we move from the top left to the bottom right of the 3x3 grid, then the next easiest cells to fill are **some locations, one time** and **one location, some times**.

Let's consider **some locations, one time**: the top ten countries for 2015.
We will use a bar chart.

In [8]:

```
df_2015 = df.loc[(slice(None), '2015'), :]
```

In [9]:

```
df_2015_top_10 = df_2015.sort_values('NY-GDP-PCAP-CD', ascending=False).iloc[:10]
```

In [10]:

```
alt.Chart(df_2015_top_10.reset_index()).mark_bar().encode(x=alt.X('country', sort=None), y='NY-GDP-PCAP-CD')
```

Out[10]:

Let's consider **one location, some times**: Australia, from 2006 to 2015.
We will use a line chart.

In [11]:

```
df_australia = df.loc[('Australia', slice('2006', '2015')), :]  # Index both levels of the MultiIndex with one tuple.
```

In [12]:

```
alt.Chart(df_australia.reset_index()).mark_line().encode(x='year', y='NY-GDP-PCAP-CD')
```

Out[12]:

Let's update our 3x3 grid.

Location/Time | One | Some | All |
---|---|---|---|
One | Single number | Line chart | ? |
Some | Bar chart | ? | ? |
All | ? | ? | ? |

Let's consider **all locations, one time** and **one location, all times**.
To decide how to fill these cells, we should ask "How many is *all*?"

We know there are 20 unique values of the *location* variable and 26 unique values of the *time* variable.
These cardinalities are small enough to use a bar chart and a line chart again: we would add a bar for each extra country and a point on the line for each extra year.
For larger *location* cardinalities, we might consider several bar charts, or *small multiples* - one for each group of locations.
For larger *time* cardinalities, we might consider a focus+context chart.
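For the small multiples case, the first job is deciding which locations share a chart. Here's a minimal sketch that splits a list of locations into consecutive groups. The `group_locations` helper, the placeholder location names, and the group size of 10 are all assumptions for illustration, not a recommendation.

```python
def group_locations(locations, group_size):
    """Split locations into consecutive groups, one per small multiple."""
    if group_size < 1:
        raise ValueError('group_size must be at least 1.')
    return [locations[i:i + group_size]
            for i in range(0, len(locations), group_size)]


# Hypothetical example: 25 placeholder locations, at most 10 bars per chart.
locations = ['location {}'.format(i) for i in range(1, 26)]
groups = group_locations(locations, group_size=10)
print([len(group) for group in groups])
```

Each group would then get its own bar chart. In practice you would probably sort the locations first, for example by value, so that each chart holds comparable locations.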

Let's update our 3x3 grid.

Location/Time | One | Some | All |
---|---|---|---|
One | Single number | Line chart | Focus+context chart |
Some | Bar chart | ? | ? |
All | Small multiples | ? | ? |

Let's consider **some locations, some times**: the top five countries with the largest mean GDP per capita, from 2006 to 2015.
We will use a multi-series line chart, with each series encoded using a different named colour, or *colour hue*.
We can distinguish only about six to 12 colour hues (Ware, 2008), which is enough to tell the five lines apart.

In [13]:

```
index_top_5 = df.loc[(slice(None), slice('2006', '2015')), :].groupby('country').mean().sort_values('NY-GDP-PCAP-CD', ascending=False).iloc[:5].index
```

In [14]:

```
df_top_5 = df.loc[(list(index_top_5), slice('2006', '2015')), :]  # Keep only 2006 to 2015, to match "some times".
```

In [15]:

```
alt.Chart(df_top_5.reset_index()).mark_line().encode(x='year', y='NY-GDP-PCAP-CD', color='country')
```

Out[15]:

Let's update our 3x3 grid.

Location/Time | One | Some | All |
---|---|---|---|
One | Single number | Line chart | Focus+context chart |
Some | Bar chart | Multi-series line chart | ? |
All | Small multiples | ? | ? |

We've been moving from the top left to the bottom right of the 3x3 grid.
For the final three cells, let's move from the bottom right to the top left of the 3x3 grid and consider **all locations, all times**.
We will use a matrix.

In [16]:

```
alt.Chart(df.reset_index()).mark_rect().encode(x='year', y='country', color='NY-GDP-PCAP-CD')
```

Out[16]:

Let's update our 3x3 grid.

Location/Time | One | Some | All |
---|---|---|---|
One | Single number | Line chart | Focus+context chart |
Some | Bar chart | Multi-series line chart | ? |
All | Small multiples | ? | Matrix |

We're left with **all locations, some times** and **some locations, all times**.
I think these are the hardest cells to fill.
This is because to fill these cells, we really have to think about the trade-offs.

If we move from the top left to the bottom right of the 3x3 grid, then we could create another multi-series line chart.
However, remember that we can distinguish only about six to 12 colour hues (Ware, 2008).
If we created another multi-series line chart, then we wouldn't be able to distinguish between *all* locations.
Would we accept this trade-off?

If we move from the bottom right to the top left of the 3x3 grid, then we could create another matrix.
However, the combination of colour hue, colour luminance, and colour saturation is relatively ineffective, compared to the other visual channels (Munzner, 2014).
For **all locations, all times**, we trade off effective visual channels for a compact single-view visualisation.
Would we accept this trade-off for **all locations, some times** and **some locations, all times**?

We know that the cardinalities of the *location* and *time* variables are small, so we will use a multi-series line chart for **all locations, some times** and **some locations, all times**.
However, we will use interaction to distinguish between *all* locations.
If you move the mouse over a line, then the line will be highlighted.
If you hover over a point on a line, then its country, year, and value will appear in a tooltip.
We will also reduce the opacity of each line, to distinguish between more dense and less dense bunches of lines.

In [17]:

```
highlight = alt.selection_single(on='mouseover', nearest=True)
```

In [18]:

```
alt.Chart(df.reset_index()).mark_line().encode(
    x='year',
    y='NY-GDP-PCAP-CD',
    opacity=alt.condition(~highlight, alt.value(.25), alt.value(1)),
    detail='country',
    tooltip=['country', 'year', 'NY-GDP-PCAP-CD'],
).add_selection(highlight)
```

Out[18]:

Let's update our 3x3 grid.

Location/Time | One | Some | All |
---|---|---|---|
One | Single number | Line chart | Focus+context chart |
Some | Bar chart | Multi-series line chart | Interactive multi-series line chart |
All | Small multiples | Interactive multi-series line chart | Matrix |

Here, I've attempted to scaffold a small part of the visualisation design space by emphasising nine pairwise comparisons of cardinalities and variables. By being systematic, hopefully we've reassured ourselves that we haven't missed an obvious point design.
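If you want to reuse the finished grid, one option is to record it as a lookup table. This is just a sketch of the final 3x3 grid from this post; `GRID` and `point_design` are names I've made up for illustration.

```python
# Keys are (location cardinality, time cardinality), copied from the final grid.
GRID = {
    ('one', 'one'): 'Single number',
    ('one', 'some'): 'Line chart',
    ('one', 'all'): 'Focus+context chart',
    ('some', 'one'): 'Bar chart',
    ('some', 'some'): 'Multi-series line chart',
    ('some', 'all'): 'Interactive multi-series line chart',
    ('all', 'one'): 'Small multiples',
    ('all', 'some'): 'Interactive multi-series line chart',
    ('all', 'all'): 'Matrix',
}


def point_design(location, time):
    """Look up the point design for the given cardinalities."""
    return GRID[(location, time)]


print(point_design('some', 'some'))
```

The lookup is a starting point rather than a rule: the right choice for a cell still depends on how many *all* is for your data.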

Munzner, Tamara (2014). *Visualization Analysis and Design*. A.K. Peters Visualization Series, CRC Press.

Ware, Colin (2008). *Visual Thinking for Design*. Morgan Kaufmann.