This is an example of bars, stacked, lines and area plots in one single diagram.
We display pollen count data from the site Hoya del Castillo in Spain collected by Davis and Stevenson (2007) obtained from the European pollen database (EPD). We extended this data by two additional columns: summer (JJA) and winter (DJF) temperature.
We start by importing the necessary libraries. We use pandas
to read and manage the pollen data and stratplot
for it's visualization.
import psyplot.project as psy
import pandas as pd
from psy_strat.stratplot import stratplot
import matplotlib as mpl
Adjusting the figure size and dpi improves the readability of the plots in this notebook.
%matplotlib inline
mpl.rcParams['figure.figsize'] = (16, 10)
mpl.rcParams['figure.dpi'] = 150
The data is stored as a comma-separated text file that can be loaded into a pandas.DataFrame
using pandas pandas.read_csv
function:
df = pd.read_csv('pollen-data.csv', index_col='agebp')
print(df.shape)
df.head(5)
(34, 37)
DJF Temperature | JJA Temperature | Alnus | Anthemis-type | Artemisia | Betula | Bidens-type | Carpinus | Caryophyllaceae | Cerealia-type | ... | Plantago coronopus | Plantago major/P. media | Plantago maritima | Potamogeton | Pteridium | Quercus ilex-type | Quercus suber-type | Ruppia | Sparganium | Ulmus | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
agebp | |||||||||||||||||||||
4690 | 5.6125 | 21.1750 | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | ... | 1.0 | NaN | NaN | NaN | NaN | 11.0 | NaN | NaN | NaN | NaN |
4890 | 6.3375 | 21.8250 | 1.0 | NaN | 6.0 | NaN | NaN | NaN | NaN | NaN | ... | 7.0 | 3.0 | NaN | NaN | 2.0 | 13.0 | NaN | NaN | NaN | NaN |
5087 | 6.2750 | 21.8750 | NaN | NaN | 8.0 | NaN | NaN | NaN | NaN | NaN | ... | 8.0 | 1.0 | NaN | NaN | NaN | 21.0 | NaN | 1.0 | NaN | NaN |
5278 | 4.8750 | 20.8250 | NaN | NaN | 7.0 | NaN | NaN | NaN | NaN | NaN | ... | 2.0 | NaN | NaN | NaN | NaN | 24.0 | NaN | 3.0 | NaN | NaN |
5465 | 2.2750 | 20.3125 | 1.0 | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | ... | 1.0 | NaN | NaN | NaN | NaN | 18.0 | NaN | 1.0 | NaN | NaN |
5 rows × 37 columns
This data contains 34 samples and 37 columns. Note that we chose the DataFrame to be indexed by the 'agebp'
column. This will be the vertical axis of our stratigraphic diagram that is shared between all the variables.
Now we can display this dataframe using stratplot
:
sp, groupers = stratplot(df)
You see now one plot for each column in the above dataframe. However, this figure is not very informative. The x-axis labels are hardly readable and the taxa all have different scalings. We can significantly improve this plot by grouping the variables together.
Luckily, the EPD comes with a mapping from taxon name to group names. This mapping is stored in the tab separated file epd-groups.tsv
:
groups = pd.read_csv('epd-groups.tsv', delimiter='\t', index_col=0)
groups.head(5)
groupname | |
---|---|
varname | |
Abies | Trees and shrubs |
Abies undiff. | Trees and shrubs |
Acacia | Trees and shrubs |
Acer | Trees and shrubs |
Acer cf. A. campestre | Trees and shrubs |
Using this data, we can group the columns in our DataFrame using the following function:
def grouper(col):
if 'Temperature' in col:
return 'Temperature'
else:
return groups.groupname.loc[col]
And have a look into how this functions groups our data:
group2taxon = pd.DataFrame.from_dict(df.groupby(grouper, axis=1).groups, orient='index').T
group2taxon.fillna('')
Aquatics | Dwarf shrubs | Helophytes | Herbs | Nonpollen | Temperature | Trees and shrubs | Vascular cryptogams (Pteridophytes) | |
---|---|---|---|---|---|---|---|---|
0 | Potamogeton | Ericaceae | Sparganium | Anthemis-type | Concentration pollen | DJF Temperature | Alnus | Filicopsida |
1 | Ruppia | Artemisia | JJA Temperature | Betula | Pteridium | |||
2 | Bidens-type | Carpinus | ||||||
3 | Caryophyllaceae | Corylus | ||||||
4 | Cerealia-type | Ephedra distachya-type | ||||||
5 | Chenopodiaceae | Ephedra fragilis-type | ||||||
6 | Compositae subf. Cichorioideae | Juniperus | ||||||
7 | Cruciferae | Olea | ||||||
8 | Cyperaceae | Pinus | ||||||
9 | Filipendula | Quercus ilex-type | ||||||
10 | Gramineae | Quercus suber-type | ||||||
11 | Helianthemum-type | Ulmus | ||||||
12 | Mentha-type | |||||||
13 | Plantago coronopus | |||||||
14 | Plantago major/P. media | |||||||
15 | Plantago maritima |
For our pollen diagram, we are actually only interested in Temperature, Herbs, Trees and shrubs. Therefore we can exclude the other groups from our plot using the exclude parameter of stratplot
. Additionally we transform the pollen counts into percentages to get a better scaling of the diagram. Since Trees and shrubs and Herbs should both be considered when calculating the percentages, we additionally put them into a larger Pollen group.
sp, groupers = stratplot(
df, grouper,
widths={'Temperature': 0.1, 'Pollen': 0.9},
percentages=['Pollen'],
subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
'Vascular cryptogams (Pteridophytes)'])
/Users/psommer/miniconda3/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims) /Users/psommer/miniconda3/lib/python3.6/site-packages/numpy/core/_methods.py:26: RuntimeWarning: invalid value encountered in reduce return umr_maximum(a, axis, None, out, keepdims)
This diagram already looks much better, however there are still to many taxa in there that have only very little amount of data. Therefore we use the thresh parameter to set a threshold of 1%. Every taxon now that is never above 1% will not be displayed.
Additionally we apply a new order to the columns such that we put Juniperus, Pinus and Quercus ilex-type, and then the other trees and shrubs to the left. Note that, if you run this in the psyplot GUI, you can change the order of the variables more easily without scripting. However, it is also not so difficult using the reorder
method of the grouper for the pollen variables.
sp, groupers = stratplot(
df, grouper,
thresh=1.0,
widths={'Temperature': 0.1, 'Pollen': 0.9},
percentages=['Pollen'],
subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
'Vascular cryptogams (Pteridophytes)'])
# apply a new order where we first display Juniperus, Pinus and Quercus, and then the rest
pollen_grouper = groupers[1]
first_taxa = ['Juniperus', 'Pinus', 'Quercus ilex-type']
remaining_trees = group2taxon['Trees and shrubs'][
~group2taxon['Trees and shrubs'].isin(first_taxa)].dropna().tolist()
neworder = first_taxa + remaining_trees
pollen_grouper.reorder(neworder)
/Users/psommer/miniconda3/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims) /Users/psommer/miniconda3/lib/python3.6/site-packages/numpy/core/_methods.py:26: RuntimeWarning: invalid value encountered in reduce return umr_maximum(a, axis, None, out, keepdims)
Finally, we can display the temperature columns in one single diagram because they share the same units. This is done using the all_in_one parameter. Additionally we can include a sum of the different pollen subgroups using the summed parameter.
sp, groupers = stratplot(
df, grouper,
thresh=1.0,
all_in_one=['Temperature'],
summed=['Trees and shrubs', 'Herbs'],
widths={'Temperature': 0.1, 'Pollen': 0.9},
percentages=['Pollen'],
subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
'Vascular cryptogams (Pteridophytes)'])
# apply a new order where we first display Juniperus, Pinus and Quercus, and then the rest
pollen_grouper = groupers[1]
first_taxa = ['Juniperus', 'Pinus', 'Quercus ilex-type']
remaining_trees = group2taxon['Trees and shrubs'][
~group2taxon['Trees and shrubs'].isin(first_taxa)].dropna().tolist()
neworder = first_taxa + remaining_trees
pollen_grouper.reorder(neworder)
/Users/psommer/miniconda3/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims) /Users/psommer/miniconda3/lib/python3.6/site-packages/numpy/core/_methods.py:26: RuntimeWarning: invalid value encountered in reduce return umr_maximum(a, axis, None, out, keepdims)
Last but not least, let's talk a bit about the final layout. To better distinguish herbs from Trees and shrubs, we can display this group using a bar plot by making use of the use_bars parameter of stratplot
. Furthermore we decrease the size of the groupers and increase the height of the plot using the trunc_height
parameter.
Additionally we can use the psyplot framework to do some changes to the colors, etc..:
+
This modifications make the plot look much nicer!
sp, groupers = stratplot(
df, grouper,
thresh=1.0,
trunc_height=0.1,
use_bars=['Herbs'],
all_in_one=['Temperature'],
summed=['Trees and shrubs', 'Herbs'],
widths={'Temperature': 0.1, 'Pollen': 0.9},
percentages=['Pollen'],
calculate_percentages=True,
subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
'Vascular cryptogams (Pteridophytes)'])
# apply a new order where we first display Juniperus, Pinus and Quercus, and then the rest
pollen_grouper = groupers[1]
first_taxa = ['Juniperus', 'Pinus', 'Quercus ilex-type']
remaining_trees = group2taxon['Trees and shrubs'][
~group2taxon['Trees and shrubs'].isin(first_taxa)].dropna().tolist()
neworder = first_taxa + remaining_trees
pollen_grouper.reorder(neworder)
# -- psyplot update
blue = '#1f77b4'
orange = '#ff7f0e'
green = '#2ca02c'
red = '#d62728'
# change the color of trees and shrubs to green
sp(group='Trees and shrubs').update(color=[green])
# exaggerate the trees with low counts by a factor of 4
sp(name=remaining_trees).update(exag='areax', exag_factor=4)
# mark small taxon occurences below 1% with a +
sp(maingroup='Pollen').update(occurences=1.0)
# change the color of JJA temperature to red, shorten legend labels and
# change the x- and y-label
sp(group='Temperature').update(color=[blue, red], legendlabels=['DJF', 'JJA'],
ylabel='Age BP [years]', xlabel='$^\circ$C',
legend={'loc': 'lower left'})
# change the color of the summed trees and shrubs to green and put the legend
# on the bottom
sp(group='Summed').update(color=[green, orange], legend={'loc': 'lower left'})
/Users/psommer/miniconda3/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims) /Users/psommer/miniconda3/lib/python3.6/site-packages/numpy/core/_methods.py:26: RuntimeWarning: invalid value encountered in reduce return umr_maximum(a, axis, None, out, keepdims) /Users/psommer/Documents/myplots-scripts/psy-strat/psy_strat/plotters.py:349: RuntimeWarning: invalid value encountered in greater_equal mask = (arr.values >= vmin) & (arr.values <= vmax) /Users/psommer/Documents/myplots-scripts/psy-strat/psy_strat/plotters.py:349: RuntimeWarning: invalid value encountered in less_equal mask = (arr.values >= vmin) & (arr.values <= vmax)
psy.close('all')