# load libraries
import os, sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
Now import the data, which come in various formats, for analysis. First check the folder for data. Commands starting with `!` are system commands.
!dir
Volume in drive E is New Volume
Volume Serial Number is A2BF-3D1C
Directory of E:\Yichuan\Climate_vulnerability_wh\jupyter_workspace
11/07/2016 16:17 <DIR> .
11/07/2016 16:17 <DIR> ..
26/05/2016 10:00 854 .gitignore
26/05/2016 10:04 <DIR> .ipynb_checkpoints
16/06/2016 15:06 286,727 agg_amp.csv
16/06/2016 15:06 456,560 agg_bird.csv
16/06/2016 15:06 68,377 agg_coral.csv
10/02/2016 11:06 8,536,618 amp.xlsx
10/02/2016 11:06 10,999,911 bird.xlsx
10/02/2016 11:06 4,465,794 coral.xlsx
26/05/2016 09:57 35,815 LICENSE
26/05/2016 10:03 290 README.md
25/05/2016 12:33 14,251 region.csv
11/07/2016 16:17 4,613 report.ipynb
10/05/2016 16:44 54,522,213 result_final.csv
03/03/2016 13:30 40,785,576 rl_attr.csv
15/12/2015 17:54 154,368,653 sis.csv
16/06/2016 15:06 2,036,150 wh_amp.csv
16/06/2016 15:06 22,931,649 wh_bird.csv
16/06/2016 15:06 3,343,584 wh_coral.csv
16/06/2016 17:51 2,431,705 workspace.ipynb
18 File(s) 305,289,340 bytes
3 Dir(s) 788,429,307,904 bytes free
# CCV analysis results
amp = pd.read_excel('amp.xlsx')
bird = pd.read_excel('bird.xlsx')
coral = pd.read_excel('coral.xlsx')
# check data for different structure
# the intersection result processed in arcmap, essentially: wdpaid, id_no, per_overlap, + sp attr and wh attr
# Essentially, POS invalid rows removed, duplicated dissolved and percentage overlap calculated
# for more information, consult Evernote notes
result = pd.read_csv('result_final.csv', encoding='latin1')
sis = pd.read_csv('sis.csv', sep='|')
rl = pd.read_csv('rl_attr.csv')
# no duplicates for bird and amp: the number of unique names equals the number of rows
(amp.Fullname.unique().size == amp.index.size) and (bird.Fullname.unique().size == bird.index.size)
True
# coral data has duplicates
print('unique coral names:', coral.Fullname.unique().size)
print('total rows:', coral.index.size)
unique coral names: 797
total rows: 1594
# keep only one row per species, i.e. half of the rows
coral_unique = coral.groupby('Fullname').first().reset_index()
coral_unique.index.size
797
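A minimal sketch of the same duplicate check on toy data, assuming a table shaped like `coral` with a `Fullname` column (the rows below are made up for illustration). `duplicated()` counts repeated names and `drop_duplicates` keeps the first row per name. Unlike `groupby('Fullname').first()`, it keeps the original row order and does not fill missing values from later rows.

```python
import pandas as pd

# Toy stand-in for the coral table: every species appears twice (illustrative rows).
toy = pd.DataFrame({
    "Fullname": ["Acropora aculeus", "Acropora aculeus", "Porites lutea", "Porites lutea"],
    "FINAL_SCORE": ["H", "H", "L", "L"],
})

# Rows whose Fullname has already appeared earlier in the table.
n_repeats = int(toy.Fullname.duplicated().sum())

# Keep the first row for each species.
toy_unique = toy.drop_duplicates(subset="Fullname", keep="first").reset_index(drop=True)

print(n_repeats, len(toy_unique))
```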
result.per.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x111137b8>
Note: `areakm2_x` is the actual overlap between pairs of PA and species, while `areakm2_y` refers to the size of the WH site.
result.head(5)
Unnamed: 0 | wdpaid | id_no | areakm2_x | areakm2_y | per | en_name | fr_name | status_yr | rep_area | ... | phylum_name | class_name | order_name | family_name | genus_name | species_name | category | biome_marine | biome_freshwater | biome_terrestrial | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 191 | 2057.0 | 106497.703640 | 146962.025607 | 0.724661 | Galápagos Islands | Îles Galápagos | 1978 | 140665.14 | ... | CHORDATA | MAMMALIA | CARNIVORA | OTARIIDAE | Arctocephalus | galapagoensis | EN | t | f | t |
1 | 1 | 191 | 2474.0 | 138918.354698 | 146962.025607 | 0.945267 | Galápagos Islands | Îles Galápagos | 1978 | 140665.14 | ... | CHORDATA | MAMMALIA | CETARTIODACTYLA | BALAENOPTERIDAE | Balaenoptera | acutorostrata | LC | t | f | f |
2 | 2 | 2012 | 2474.0 | 2129.930772 | 5853.472599 | 0.363875 | Everglades National Park | Parc national des Everglades\r\r\r\n | 1979 | 5929.20 | ... | CHORDATA | MAMMALIA | CETARTIODACTYLA | BALAENOPTERIDAE | Balaenoptera | acutorostrata | LC | t | f | f |
3 | 3 | 2018 | 2474.0 | 2549.486762 | 97284.250341 | 0.026207 | Kluane / Wrangell-St Elias / Glacier Bay / Tat... | Kluane / Wrangell-St Elias / Glacier Bay / Tat... | 1979 | 98391.21 | ... | CHORDATA | MAMMALIA | CETARTIODACTYLA | BALAENOPTERIDAE | Balaenoptera | acutorostrata | LC | t | f | f |
4 | 4 | 2554 | 2474.0 | 4.105942 | 5502.386948 | 0.000746 | Darien National Park | Parc national du Darien | 1981 | 5970.00 | ... | CHORDATA | MAMMALIA | CETARTIODACTYLA | BALAENOPTERIDAE | Balaenoptera | acutorostrata | LC | t | f | f |
5 rows × 26 columns
result.dtypes
Unnamed: 0 int64
wdpaid int64
id_no float64
areakm2_x float64
areakm2_y float64
per float64
en_name object
fr_name object
status_yr int64
rep_area float64
gis_area float64
country object
crit object
areakm2 float64
binomial object
kingdom_name object
phylum_name object
class_name object
order_name object
family_name object
genus_name object
species_name object
category object
biome_marine object
biome_freshwater object
biome_terrestrial object
dtype: object
The climate vulnerability analysis was done using an older version of the Red List (2009 methodology), whereas the current species range distribution data is the latest 2015-4 version. The taxonomies are therefore likely to differ. In some cases the changes are simple, for example a renamed genus or other higher taxon, while others may be extremely convoluted, involving the splitting of one previously recognised species into several or, conversely, the merging of previously distinct species. It is very difficult to reconcile such differences.
Here, I explore such mismatches between birds, amphibians and warm water reef-building corals.
print('Total birds in SIS,', (sis['class'] == 'AVES').sum())
# birds SIS
sis_bird = sis[sis['class']=='AVES']
Total birds in SIS, 10424
print('unique birds in CCV analysis:', bird.Fullname.unique().size)
print('unique birds in SIS:', sis_bird.friendly_name.unique().size)
unique birds in CCV analysis: 9856
unique birds in SIS: 10424
Differences between bird numbers, and similarly other taxa groups
sis_amp = sis[sis['class']=='AMPHIBIA']
print('Birds in CCV but not in SIS:', np.setdiff1d(bird.Fullname.unique(), sis_bird.friendly_name.unique()).size)
print('Total birds in CCV:', bird.index.size)
print('Amphibians in CCV but not in SIS:', np.setdiff1d(amp.Fullname.unique(), sis_amp.friendly_name.unique()).size)
print('Total amphibians in CCV:', amp.index.size)
print('Corals in CCV but not in SIS:', np.setdiff1d(coral_unique.Fullname.unique(), sis.friendly_name.unique()).size)
print('Total corals in CCV:', coral_unique.index.size)
Birds in CCV but not in SIS: 673
Total birds in CCV: 9856
Amphibians in CCV but not in SIS: 502
Total amphibians in CCV: 6204
Corals in CCV but not in SIS: 19
Total corals in CCV: 797
Given the relatively low number and proportion of species missing because of taxonomic changes, the result is unlikely to change significantly. For the purpose of deriving a globally consistent picture, we simply exclude these species.
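To put a number on "relatively low", the proportions can be worked out from the counts printed above. This is only arithmetic on those figures; the dictionary layout is an illustrative choice.

```python
# Counts copied from the printed output above: (missing from SIS, total in CCV).
counts = {
    "birds": (673, 9856),
    "amphibians": (502, 6204),
    "corals": (19, 797),
}

# Share of CCV species with no name match in SIS, per taxon.
shares = {taxon: missing / total for taxon, (missing, total) in counts.items()}

for taxon, share in shares.items():
    print(f"{taxon}: {share:.1%} of CCV species have no SIS name match")
```

This gives roughly 7% for birds, 8% for amphibians and 2% for corals.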
Note that the data type of the ID in the result table is different (int vs float), which affects comparisons. This needs to be explicitly addressed by rounding.
result.loc[176646].id_no
61623293.999999993
`result.id_no.astype('int64')` will not work, as it simply discards the fractional part (the value above would become 61623293 rather than 61623294). Instead, the `round(0)` method should be used.
result['id_no_int'] = result.id_no.round(0)
The effect can be easily illustrated. The difference has reduced from 1581 to 36.
(result.id_no.isin(sis.taxonid)).sum()
164315
dif_spatial = np.setdiff1d(result.id_no, sis.taxonid)
dif_spatial.size
1581
dif_spatial = np.setdiff1d(result.id_no_int, sis.taxonid)
dif_spatial.size
36
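The difference between truncating and rounding can be sketched on the single value shown earlier for row 176646. The two-element series is a toy stand-in for `result.id_no`, and the final cast to `int64` is an optional extra step, not something the notebook does.

```python
import pandas as pd

# One ID stored just below its integer value, plus one stored exactly.
ids = pd.Series([61623293.999999993, 2057.0])

truncated = ids.astype("int64")          # drops the fractional part
rounded = ids.round(0).astype("int64")   # nearest integer first, then cast

print(truncated.tolist(), rounded.tolist())
```

Only the rounded version recovers 61623294, which is why the set difference against `sis.taxonid` shrinks once `id_no_int` is used.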
Let's take a look at the difference, i.e., IDs in the intersection result but not in SIS.
dif_sis = result[result.id_no.isin(dif_spatial)][['id_no', 'class_name', 'binomial']]
dif_sis['class_name'].unique()
array(['ACTINOPTERYGII', 'MAGNOLIOPSIDA', 'LILIOPSIDA', 'GASTROPODA', 'MAMMALIA', 'SARCOPTERYGII', 'INSECTA', 'BIVALVIA'], dtype=object)
Now it is clear that the few missing species should not affect our analysis, which only looks at birds, amphibians and corals. In other words, the IDs of these taxa in the result table all have corresponding SIS entries. Given that the result table already contains all the SIS information, there is no need to join the SIS table. What needs to be addressed now is comparing the result table directly with the climate vulnerability analysis.
dif_amp_result = np.setdiff1d(amp.Fullname, result.binomial)
dif_amp_sis = np.setdiff1d(amp.Fullname, sis.friendly_name)
dif_amp_result.size, dif_amp_sis.size
(4174, 502)
# get amp_result using CCV, SIS and partialoverlap_result
amp_result = pd.merge(amp, sis_amp, left_on = 'Fullname', right_on='friendly_name')
amp_result = pd.merge(amp_result, result, left_on='taxonid', right_on='id_no_int')
amp_result.index
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 4746, 4747, 4748, 4749, 4750, 4751, 4752, 4753, 4754, 4755], dtype='int64', length=4756)
amp.index
RangeIndex(start=0, stop=6204, step=1)
# what happens if a join is made directly without going through SIS and ID_NO
test_amp = pd.merge(amp, result, left_on='Fullname', right_on='binomial')
test_amp.index
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 4762, 4763, 4764, 4765, 4766, 4767, 4768, 4769, 4770, 4771], dtype='int64', length=4772)
# the difference
test_dif = np.setdiff1d(test_amp.id_no_int.unique(), amp_result.id_no_int.unique())
test_dif.size
11
Something peculiar may be happening here, as in theory the numbers should match. The fact that there are 11 mismatching species suggests that some names in SIS still do not match those in CCV. See below the species that are missing when the CCV-SIS join is used.
test_amp[test_amp.id_no_int.isin(test_dif)]
SVP_ID | SIS_GAA_ID | GAA Family | Genus | Species | Fullname | Threatened | SUSC_A_Habitats | SUSC_A_aquatic larvae | SUSC_B_Temperature Range | ... | class_name | order_name | family_name | genus_name | species_name | category | biome_marine | biome_freshwater | biome_terrestrial | id_no_int | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
398 | 3260 | 9380 | MICROHYLIDAE | Anodonthyla | boulengerii | Anodonthyla boulengerii | NaN | L | L | L | ... | AMPHIBIA | ANURA | MICROHYLIDAE | Anodonthyla | boulengeri | LC | f | f | t | 57674.0 |
797 | 2771 | 32142 | MICROHYLIDAE | Chiasmocleis | panamensis | Chiasmocleis panamensis | NaN | L | H | H | ... | AMPHIBIA | ANURA | MICROHYLIDAE | Elachistocleis | panamensis | LC | f | t | t | 57761.0 |
798 | 2771 | 32142 | MICROHYLIDAE | Chiasmocleis | panamensis | Chiasmocleis panamensis | NaN | L | H | H | ... | AMPHIBIA | ANURA | MICROHYLIDAE | Elachistocleis | panamensis | LC | f | t | t | 57761.0 |
978 | 4258 | 51160 | SALAMANDRIDAE | Cynops | cyanurus | Cynops cyanurus | NaN | L | L | L | ... | AMPHIBIA | CAUDATA | SALAMANDRIDAE | Hypselotriton | cyanurus | LC | f | t | t | 59440.0 |
979 | 4258 | 51160 | SALAMANDRIDAE | Cynops | cyanurus | Cynops cyanurus | NaN | L | L | L | ... | AMPHIBIA | CAUDATA | SALAMANDRIDAE | Hypselotriton | cyanurus | LC | f | t | t | 59440.0 |
980 | 3529 | 51161 | SALAMANDRIDAE | Cynops | orientalis | Cynops orientalis | NaN | L | L | L | ... | AMPHIBIA | CAUDATA | SALAMANDRIDAE | Hypselotriton | orientalis | LC | f | t | t | 59442.0 |
981 | 3529 | 51161 | SALAMANDRIDAE | Cynops | orientalis | Cynops orientalis | NaN | L | L | L | ... | AMPHIBIA | CAUDATA | SALAMANDRIDAE | Hypselotriton | orientalis | LC | f | t | t | 59442.0 |
982 | 3529 | 51161 | SALAMANDRIDAE | Cynops | orientalis | Cynops orientalis | NaN | L | L | L | ... | AMPHIBIA | CAUDATA | SALAMANDRIDAE | Hypselotriton | orientalis | LC | f | t | t | 59442.0 |
983 | 3529 | 51161 | SALAMANDRIDAE | Cynops | orientalis | Cynops orientalis | NaN | L | L | L | ... | AMPHIBIA | CAUDATA | SALAMANDRIDAE | Hypselotriton | orientalis | LC | f | t | t | 59442.0 |
1306 | 2789 | 20162 | MICROHYLIDAE | Gastrophryne | pictiventris | Gastrophryne pictiventris | NaN | L | L | H | ... | AMPHIBIA | ANURA | MICROHYLIDAE | Hypopachus | pictiventris | LC | f | t | t | 57816.0 |
1307 | 2790 | 19796 | MICROHYLIDAE | Gastrophryne | usta | Gastrophryne usta | NaN | L | L | L | ... | AMPHIBIA | ANURA | MICROHYLIDAE | Hypopachus | ustus | LC | f | t | t | 57817.0 |
1393 | 3897 | 100217 | HELEOPHRYNIDAE | Heleophryne | natalensis | Heleophryne natalensis | NaN | L | L | L | ... | AMPHIBIA | ANURA | HELEOPHRYNIDAE | Hadromophryne | natalensis | LC | f | t | t | 55273.0 |
1751 | 2792 | 29294 | MICROHYLIDAE | Hyophryne | histrio | Hyophryne histrio | NaN | L | L | H | ... | AMPHIBIA | ANURA | MICROHYLIDAE | Stereocyclops | histrio | DD | f | t | t | 10634.0 |
2093 | 2137 | 29203 | BRACHYCEPHALIDAE | Ischnocnema | paulodutrai | Ischnocnema paulodutrai | NaN | H | L | L | ... | AMPHIBIA | ANURA | CRAUGASTORIDAE | Pristimantis | paulodutrai | LC | f | f | t | 56835.0 |
4522 | 5278 | 9562 | MICROHYLIDAE | Stumpffia | grandis | Stumpffia grandis | NaN | L | L | L | ... | AMPHIBIA | ANURA | MICROHYLIDAE | Rhombophryne | grandis | DD | f | f | t | 58009.0 |
4523 | 5282 | 9566 | MICROHYLIDAE | Stumpffia | tridactyla | Stumpffia tridactyla | NaN | L | L | L | ... | AMPHIBIA | ANURA | MICROHYLIDAE | Rhombophryne | tridactyla | DD | f | f | t | 58015.0 |
16 rows × 58 columns
As a result, a direct join between the RL intersection and CCV is used.
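One way to list which names fail to match in a join is a left merge with `indicator=True`. This is a minimal sketch on toy tables: the column names follow the notebook, the *Cynops cyanurus* row and its ID come from the table above, and the second species is a made-up placeholder.

```python
import pandas as pd

# Toy stand-ins for the CCV and SIS tables.
ccv = pd.DataFrame({"Fullname": ["Cynops cyanurus", "Examplea matched"]})
sis_toy = pd.DataFrame({
    "friendly_name": ["Hypselotriton cyanurus", "Examplea matched"],
    "taxonid": [59440, 1],  # 1 is a placeholder ID
})

# A left join with indicator=True adds a `_merge` column recording where each row was found.
joined = pd.merge(ccv, sis_toy, left_on="Fullname", right_on="friendly_name",
                  how="left", indicator=True)

# CCV names with no counterpart in the SIS table.
unmatched = joined.loc[joined["_merge"] == "left_only", "Fullname"].tolist()
print(unmatched)
```

The renamed genus makes the name-based join drop the species, even though its ID is present in SIS.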
## remove test variables
del amp_result, test_amp
There are several tasks outstanding:
The `wdpaid`, `id_no` look-up table is then filtered, and it can be further refined by variables such as IUCN Red List category, CCV traits etc.
def f(input_result_table, cut_off_value):
    # keep only the pairs whose percentage overlap reaches the cut-off
    new_table = input_result_table[input_result_table.per >= cut_off_value]
    # process_result is a placeholder for a function that produces outputs based on the result table
    result = process_result(new_table)
    return result
Given the EOO (extent of occurrence) nature of the Red List species distribution polygons, an overlap between a species and a WH site does not necessarily mean that the species is present. This is made worse if only part of the distribution polygon intersects with the WH site: this could be a digitisation error or simply inaccuracy due to a scale mismatch. Such 'species within WH sites' must therefore be removed. Quantiles of the percentage overlap (`per`) can be a useful way to look at this.
# join each CCV table directly to the intersection result on species name
result_amp = pd.merge(amp, result, left_on='Fullname', right_on='binomial')
result_bird = pd.merge(bird, result, left_on='Fullname', right_on='binomial')
result_coral = pd.merge(coral_unique, result, left_on='Fullname', right_on='binomial')
all_per_abc = pd.concat([result_amp.per, result_bird.per, result_coral.per])
all_per_abc.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0xc051dd8>
The large majority of overlaps are fairly big, suggesting that WH sites are generally entirely covered by species ranges. This is not surprising, as most species distribution polygons are a few orders of magnitude bigger than WH sites. Yet range-restricted and/or threatened species may have much smaller ranges.
all_per_abc.quantile(0.5)
0.9582745841568836
For illustration purposes, note that the difference between choosing a cut-off value of 10% and one of 15% is not significant (~2%, or about 1,500 fewer PA-species pairs). This should have minimal effect on the result.
(all_per_abc>0.15).sum(), (all_per_abc>0.1).sum(), (all_per_abc>0.05).sum()
(56508, 58029, 60301)
((all_per_abc>0.1).sum() - (all_per_abc>0.15).sum())/all_per_abc.index.size
0.022358440642088553
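The same comparison can be run for several candidate cut-offs at once. This is a sketch on a toy series; in the notebook the input would be `all_per_abc`, and the cut-off list is an illustrative choice.

```python
import pandas as pd

# Toy percentage overlaps standing in for `all_per_abc`.
per = pd.Series([0.001, 0.04, 0.12, 0.2, 0.6, 0.95, 1.0, 1.0])

cutoffs = [0.05, 0.10, 0.15]

# Fraction of pairs strictly above each cut-off.
retained = {c: float((per > c).mean()) for c in cutoffs}

# Quantiles give the complementary view: the overlap below which a given share of pairs falls.
quartiles = per.quantile([0.25, 0.5, 0.75])

print(retained)
print(quartiles)
```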
However, if we look at amphibians, birds and corals separately, it is easy to see that corals have a significant number of low percentage overlaps. This is because marine World Heritage sites tend to be very extensive compared with coral distributions, which are coastal. Thus it is important to examine the absolute overlap as well for corals.
result_amp.per.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0xc051f28>
result_bird.per.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0xb7b1d68>
result_coral.per.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0xd7be4a8>
result_amp.areakm2_x.hist(bins =50, log=True)
<matplotlib.axes._subplots.AxesSubplot at 0x10ca4f60>
result_bird.areakm2_x.hist(bins =50, log=True)
<matplotlib.axes._subplots.AxesSubplot at 0x1d1343c8>
result_coral.areakm2_x.hist(bins =50, log=True)
<matplotlib.axes._subplots.AxesSubplot at 0x1d267f60>
The histograms show that all three taxa have a bimodal distribution, with a significant number of values near the two poles (very large overlaps or very small ones).
result_coral.sort_values('per', ascending=True)[['en_name', 'binomial', 'areakm2_x', 'per']].head(20)
en_name | binomial | areakm2_x | per | |
---|---|---|---|---|
5363 | Darien National Park | Millepora intricata | 0.033701 | 0.000006 |
7471 | Tropical Rainforest Heritage of Sumatra | Podabacia motuporensis | 0.874951 | 0.000034 |
2550 | Tropical Rainforest Heritage of Sumatra | Ctenactis albitentaculata | 0.999503 | 0.000038 |
169 | Tropical Rainforest Heritage of Sumatra | Acropora abrolhosensis | 0.999503 | 0.000038 |
4692 | Henderson Island | Leptastrea pruinosa | 0.002691 | 0.000065 |
430 | Henderson Island | Acropora cytherea | 0.002691 | 0.000065 |
777 | Henderson Island | Acropora hyacinthus | 0.002691 | 0.000065 |
1577 | Henderson Island | Acropora subulata | 0.002691 | 0.000065 |
3214 | Henderson Island | Favia matthaii | 0.002691 | 0.000065 |
755 | Henderson Island | Acropora humilis | 0.002691 | 0.000065 |
6285 | Henderson Island | Montipora venosa | 0.002691 | 0.000065 |
7714 | Henderson Island | Porites lobata | 0.002691 | 0.000065 |
281 | Henderson Island | Acropora austera | 0.002691 | 0.000065 |
1347 | Henderson Island | Acropora retusa | 0.002691 | 0.000065 |
2389 | Henderson Island | Caulastrea furcata | 0.002691 | 0.000065 |
1362 | Henderson Island | Acropora robusta | 0.002691 | 0.000065 |
5587 | Henderson Island | Montipora australiensis | 0.002691 | 0.000065 |
4715 | Henderson Island | Leptastrea purpurea | 0.002691 | 0.000065 |
5133 | Henderson Island | Lobophyllia hemprichii | 0.002691 | 0.000065 |
7743 | Henderson Island | Porites lutea | 0.002691 | 0.000065 |
result_coral[['en_name', 'binomial', 'areakm2_x', 'per']][result_coral.en_name.isin(['Henderson Island'])].head()
en_name | binomial | areakm2_x | per | |
---|---|---|---|---|
195 | Henderson Island | Acropora aculeus | 0.002691 | 0.000065 |
281 | Henderson Island | Acropora austera | 0.002691 | 0.000065 |
350 | Henderson Island | Acropora cerealis | 0.002691 | 0.000065 |
430 | Henderson Island | Acropora cytherea | 0.002691 | 0.000065 |
480 | Henderson Island | Acropora digitifera | 0.002691 | 0.000065 |
The threshold value can be reasonably justified by using both the percentage overlap and the actual overlap. Obviously, if the entire WH site is covered by a species, then it should be counted. If the percentage overlap is too small, it could be due either to an inaccurate boundary, in which case the species should not be counted, or to a genuine overlap if the WH site is considerably larger. Therefore, by adding a test on the absolute overlap value in $km^2$, omissions due to a small percentage overlap can be reduced.
Let's test the effect of using various empirical values of `per` and `abkm2` to remove artificial overlaps caused by the inaccuracy of boundaries. This represents an optimistic estimate of the number of species in WH sites (reinforced by the nature of Red List EOO).
def test_params(per=None, abkm2=None):
    # print (rows kept, total rows) for amp, bird and coral under the given cut-offs
    tables = [result_amp, result_bird, result_coral]
    if per is not None and abkm2 is not None:  # a row is kept if it passes either test
        result = [(t[(t.per > per) | (t.areakm2_x > abkm2)].index.size, t.index.size) for t in tables]
    elif per is not None:  # only per
        result = [(t[t.per > per].index.size, t.index.size) for t in tables]
    elif abkm2 is not None:  # only abkm2
        result = [(t[t.areakm2_x > abkm2].index.size, t.index.size) for t in tables]
    else:  # no criteria given
        return None
    print(per, abkm2, result)
# # check the number of rows after applying criteria
# for each in zip(['after', 'before'], *test_params(0.15, 1)):
# print(each)
# per, abkm2, (after and before) for amp, bird and coral
print('---only per---')
test_params(0.05)
test_params(0.15)
test_params(0.25)
test_params(0.50)
print('---only abs km2---')
test_params(abkm2=1)
test_params(abkm2=5)
test_params(abkm2=10)
print('--- per or abs km2---')
test_params(0.15, 1)
test_params(0.15, 2)
test_params(0.25, 1)
test_params(0.25, 2)
test_params(0.25, 5)
---only per---
0.05 None [(4140, 4772), (50190, 54446), (5971, 8810)]
0.15 None [(3751, 4772), (47361, 54446), (5396, 8810)]
0.25 None [(3504, 4772), (45570, 54446), (5293, 8810)]
0.5 None [(3047, 4772), (41125, 54446), (3795, 8810)]
---only abs km2---
None 1 [(4689, 4772), (53712, 54446), (8445, 8810)]
None 5 [(4575, 4772), (52187, 54446), (7628, 8810)]
None 10 [(4523, 4772), (51560, 54446), (7326, 8810)]
--- per or abs km2---
0.15 1 [(4731, 4772), (54144, 54446), (8445, 8810)]
0.15 2 [(4711, 4772), (53852, 54446), (7694, 8810)]
0.25 1 [(4731, 4772), (54139, 54446), (8445, 8810)]
0.25 2 [(4701, 4772), (53704, 54446), (7694, 8810)]
0.25 5 [(4661, 4772), (53298, 54446), (7628, 8810)]
The result corroborates the distributions: the percentage overlap is a sensitive criterion and changing it has a considerable impact on the number of filtered rows, while a small increase in absolute area in $km^2$ seems to have little effect. It is observed that an increase of `per` from 15% to 25% makes little difference.
The threshold values chosen here may be a rather insignificant factor compared with the assumption made by the EOO intersection, i.e., even though a species overlaps 100% with a given WH site, it may still be absent. However, this approach remains a popular method for estimating global species conservation and offers a consistent view for comprehensively assessed species, in the absence of true area of occupancy (AOO) data.
I use 15% and 1 $km^2$ as cut-off values.
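A minimal sketch of how the two cut-offs combine, on toy site-species pairs. The column names follow the result table, while the rows and the constant names `PER_CUTOFF` and `AREA_CUTOFF` are illustrative.

```python
import pandas as pd

# Toy pairs: `per` is the fraction of the site covered, `areakm2_x` the overlap in km^2.
pairs = pd.DataFrame({
    "binomial": ["sp. A", "sp. B", "sp. C", "sp. D"],
    "per": [0.90, 0.02, 0.02, 0.20],
    "areakm2_x": [50.0, 300.0, 0.003, 0.5],
})

PER_CUTOFF = 0.15   # fraction of the WH site covered
AREA_CUTOFF = 1.0   # absolute overlap in km^2

# A pair is kept if it passes either test, so a small share of a very large site still counts.
keep = (pairs.per > PER_CUTOFF) | (pairs.areakm2_x > AREA_CUTOFF)
filtered = pairs[keep]

print(filtered.binomial.tolist())
```

Only "sp. C" is dropped: its overlap is small both as a share of the site and in absolute terms, the pattern seen in the Henderson Island coral rows.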
To be policy relevant at the site level, it is imperative that climate vulnerability information on individual World Heritage sites be obtained. This analysis illustrates how many amphibian, bird and coral species are affected by climate change in each World Heritage site, using both the total number of climate vulnerable species and their proportion. Further aggregations can be used to reveal each component within sensitivity, low adaptability and exposure.
# get filtered result
result_amp_f, result_bird_f, result_coral_f = [result_taxa[(result_taxa.per>.15)|(result_taxa.areakm2_x>1)] for \
result_taxa in [result_amp, result_bird, result_coral]]
# get species numbers
print('Filtered', 'Total-intersect', 'Total')
print(result_amp_f.Fullname.unique().size, result_amp.Fullname.unique().size, amp.Fullname.unique().size)
print(result_bird_f.Fullname.unique().size, result_bird.Fullname.unique().size, bird.Fullname.unique().size)
print(result_coral_f.Fullname.unique().size, result_coral.Fullname.unique().size, coral.Fullname.unique().size)
Filtered Total-intersect Total
2013 2030 6204
6914 6924 9856
724 727 797
Using lambda functions as aggregation arguments to get, for each WH site, the proportion and number of highly vulnerable species.
## agg_dict passed to do multiple aggregation at the same time
## x is a group of FINAL_SCOREs
agg_dict = {'FINAL_SCORE':\
{'H_vul_per': lambda x: (x=='H').sum()/x.size, \
'total_H_vul': lambda x: (x=='H').sum(),\
'total_vul':len}}
## amp
amp_v = result_amp_f.groupby(['wdpaid', 'en_name']).agg(agg_dict).reset_index()
amp_v.columns = amp_v.columns.droplevel()
# birds
bird_v = result_bird_f.groupby(['wdpaid', 'en_name']).agg(agg_dict).reset_index()
bird_v.columns = bird_v.columns.droplevel()
# corals
coral_v = result_coral_f.groupby(['wdpaid', 'en_name']).agg(agg_dict).reset_index()
coral_v.columns = coral_v.columns.droplevel()
bird_v.head()
H_vul_per | total_vul | total_H_vul | |||
---|---|---|---|---|---|
0 | 191 | Galápagos Islands | 0.129412 | 85 | 11 |
1 | 197 | Tikal National Park | 0.179661 | 295 | 53 |
2 | 2004 | Dinosaur Provincial Park | 0.126866 | 134 | 17 |
3 | 2005 | Nahanni National Park | 0.141667 | 120 | 17 |
4 | 2006 | Simien National Park | 0.096886 | 289 | 28 |
It is necessary to add the missing labels for the two columns
# rename column names. One cannot do: amp_v.columns[1] = newcolumn_name
new_columns = amp_v.columns.values
new_columns[:2] = ['wdpaid', 'en_name']
amp_v.columns = new_columns
bird_v.columns = new_columns
coral_v.columns = new_columns
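Newer pandas versions (0.25 and later) offer named aggregation, which produces flat, labelled columns directly and avoids both the nested dictionary and the manual renaming. This is a sketch on toy rows, not the notebook's own method; the helper column `is_H` and the name `site_v` are assumptions introduced here.

```python
import pandas as pd

# Toy filtered result: one row per site-species pair.
toy = pd.DataFrame({
    "wdpaid": [191, 191, 191, 197, 197],
    "en_name": ["Galápagos Islands"] * 3 + ["Tikal National Park"] * 2,
    "FINAL_SCORE": ["H", "L", "L", "H", "H"],
})

# Boolean flag for highly vulnerable species, so built-in aggregations can be used.
toy["is_H"] = toy.FINAL_SCORE == "H"

site_v = (
    toy.groupby(["wdpaid", "en_name"])
       .agg(
           H_vul_per=("is_H", "mean"),     # proportion of species scored 'H'
           total_H_vul=("is_H", "sum"),    # number of species scored 'H'
           total_vul=("is_H", "size"),     # number of species assessed
       )
       .reset_index()
)
print(site_v)
```

Because the output columns are named in the call itself, the grouping keys keep their labels after `reset_index()`.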
Graphs to show both the percentage (the number of species with FINAL SCORE =='H' against all species), and the total number of species that are highly vulnerable.
# percentage of H distribution
amp_v.H_vul_per.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x1d0b0be0>
# total number of H distribution
amp_v.total_H_vul.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x1e1715f8>
# percentage of H distribution
bird_v.H_vul_per.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x10d58b38>
# total number of H distribution
bird_v.total_H_vul.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x1ee95c50>
# percentage of H distribution
coral_v.H_vul_per.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x1eec0cf8>
# total number of H distribution
coral_v.total_H_vul.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x1dc93898>
print('Top WHs with high number of CV birds\n', bird_v.sort_values('total_H_vul', ascending=False).head(10))
print('==================')
print('Top WHs with high percentage of CV birds\n', bird_v.sort_values('H_vul_per', ascending=False).head(10))
Top WHs with high number of CV birds
| | wdpaid | en_name | H_vul_per | total_vul | total_H_vul |
|---|---|---|---|---|---|
| 92 | 61612 | Canaima National Park | 0.366987 | 624 | 229 |
| 72 | 17760 | Manú National Park | 0.266350 | 841 | 224 |
| 149 | 220296 | Central Amazon Conservation Complex | 0.370307 | 586 | 217 |
| 151 | 220298 | Central Suriname Nature Reserve | 0.411157 | 484 | 199 |
| 41 | 9614 | Sangay National Park | 0.268730 | 614 | 165 |
| 148 | 220295 | Noel Kempff Mercado National Park | 0.227979 | 579 | 132 |
| 52 | 10903 | Talamanca Range-La Amistad Reserves / La Amist... | 0.225750 | 567 | 128 |
| 15 | 2554 | Darien National Park | 0.240310 | 516 | 124 |
| 90 | 61610 | Los Katíos National Park | 0.241228 | 456 | 110 |
| 82 | 26651 | Río Abiseo National Park | 0.211240 | 516 | 109 |
==================
Top WHs with high percentage of CV birds
| | wdpaid | en_name | H_vul_per | total_vul | total_H_vul |
|---|---|---|---|---|---|
| 120 | 145576 | Heard and McDonald Islands | 0.512195 | 41 | 21 |
| 177 | 902373 | Ilulissat Icefjord | 0.500000 | 36 | 18 |
| 204 | 555512003 | Putorana Plateau | 0.472222 | 72 | 34 |
| 198 | 903139 | Surtsey | 0.431373 | 51 | 22 |
| 128 | 168239 | New Zealand Sub-Antarctic Islands | 0.423077 | 78 | 33 |
| 151 | 220298 | Central Suriname Nature Reserve | 0.411157 | 484 | 199 |
| 61 | 12207 | Giant's Causeway and Causeway Coast | 0.400000 | 100 | 40 |
| 118 | 124388 | Laponian Area | 0.396947 | 131 | 52 |
| 184 | 902489 | West Norwegian Fjords Geirangerfjord and Nær... | 0.395522 | 134 | 53 |
| 224 | 555577556 | Stevns Klint | 0.394161 | 137 | 54 |
print('Top WHs with high number of CV amphibians\n', amp_v.sort_values('total_H_vul', ascending=False).head(10))
print('==================')
print('Top WHs with high percentage of CV amphibian\n', amp_v.sort_values('H_vul_per', ascending=False).head(10))
Top WHs with high number of CV amphibians
| | wdpaid | en_name | H_vul_per | total_vul | total_H_vul |
|---|---|---|---|---|---|
| 49 | 10903 | Talamanca Range-La Amistad Reserves / La Amist... | 0.325926 | 135 | 44 |
| 133 | 220298 | Central Suriname Nature Reserve | 0.445783 | 83 | 37 |
| 67 | 17760 | Manú National Park | 0.342593 | 108 | 37 |
| 83 | 61612 | Canaima National Park | 0.360465 | 86 | 31 |
| 131 | 220296 | Central Amazon Conservation Complex | 0.318681 | 91 | 29 |
| 152 | 902347 | Cape Floral Region Protected Areas | 0.425532 | 47 | 20 |
| 162 | 903062 | Rainforests of the Atsinanana | 0.120253 | 158 | 19 |
| 127 | 220292 | Kinabalu Park | 0.260870 | 69 | 18 |
| 128 | 220293 | Gunung Mulu National Park | 0.293103 | 58 | 17 |
| 120 | 198296 | Area de Conservación Guanacaste | 0.326923 | 52 | 17 |
==================
Top WHs with high percentage of CV amphibian
| | wdpaid | en_name | H_vul_per | total_vul | total_H_vul |
|---|---|---|---|---|---|
| 153 | 902367 | Pitons Management Area | 1.000000 | 1 | 1 |
| 181 | 555547992 | Rock Islands Southern Lagoon | 1.000000 | 1 | 1 |
| 173 | 555512003 | Putorana Plateau | 1.000000 | 1 | 1 |
| 106 | 124387 | Volcanoes of Kamchatka | 1.000000 | 1 | 1 |
| 110 | 145583 | Morne Trois Pitons National Park | 0.666667 | 3 | 2 |
| 180 | 555547991 | Lena Pillars Nature Park | 0.500000 | 2 | 1 |
| 148 | 900880 | Uvs Nuur Basin | 0.500000 | 2 | 1 |
| 96 | 68918 | Whale Sanctuary of El Vizcaino | 0.500000 | 2 | 1 |
| 133 | 220298 | Central Suriname Nature Reserve | 0.445783 | 83 | 37 |
| 152 | 902347 | Cape Floral Region Protected Areas | 0.425532 | 47 | 20 |
amp_v.quantile(0.9, numeric_only=True)
wdpaid 5.555120e+08
H_vul_per 3.320513e-01
total_vul 5.800000e+01
total_H_vul 4.800000e+00
dtype: float64
amp_v.quantile(0.1, numeric_only=True)
wdpaid 2574.2
H_vul_per 0.0
total_vul 3.0
total_H_vul 0.0
dtype: float64
bird_v.quantile(0.9, numeric_only=True)
wdpaid 5.555120e+08
H_vul_per 3.503704e-01
total_vul 4.516000e+02
total_H_vul 6.720000e+01
dtype: float64
bird_v.quantile(0.1, numeric_only=True)
wdpaid 2577.800000
H_vul_per 0.058983
total_vul 56.600000
total_H_vul 9.000000
dtype: float64
For communication purposes, it may be convenient to have a single indicator/index showing whether a given WH site is considered climate vulnerable for some taxon (in this case amphibians). The question then becomes how such a cut-off value can be reasonably calculated and meaningfully applied. One way is, again, to use quantile cut-off values for the total number of highly vulnerable species and percentage overlap. However, any ranking created using quantiles only compares WH sites with one another. Ideally, this should be derived using a global metric. This is perhaps an important caveat. One needs to know about other taxa before making a decision, as they should be consistent.
Questions:
Top10, top 50, top200
WH sites with a high proportion seem to have a low total number of CV amphibians. Thus using proportion alone may not paint a full picture. It is conceivable that a combination of both number and proportion is needed to classify/rank WH sites.
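One possible way to combine the two measures is to flag sites that sit above a quantile cut-off on both. This is only a sketch of the idea: the site names, the values and the 0.5 quantile are hypothetical, and the notebook does not settle on a rule.

```python
import pandas as pd

# Toy per-site summary with the same columns as `amp_v` (values are illustrative).
site_v = pd.DataFrame({
    "en_name": ["Site A", "Site B", "Site C", "Site D", "Site E"],
    "H_vul_per": [1.00, 0.45, 0.33, 0.12, 0.05],
    "total_H_vul": [1, 37, 44, 19, 2],
})

Q = 0.5  # hypothetical quantile cut-off, applied to both measures

per_cut = site_v.H_vul_per.quantile(Q)
n_cut = site_v.total_H_vul.quantile(Q)

# A site is flagged only if it is high on proportion AND on absolute number.
site_v["high_both"] = (site_v.H_vul_per >= per_cut) & (site_v.total_H_vul >= n_cut)

flagged = site_v.loc[site_v.high_both, "en_name"].tolist()
print(flagged)
```

"Site A" has a proportion of 1.0 but only one species, so it is not flagged. This mirrors the single-species sites at the top of the percentage ranking.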
While the above methods work well, similar stats for other columns need considerable manual coding. In this study, depending on the taxon of interest, there may be as many as 20 replications (a total of ~60 if all three taxa are considered). It is therefore necessary to find a simpler and more scalable way to calculate such aggregations.
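One shape such a generalised aggregation could take is to reshape the trait columns into long format and count each score per site and trait in a single pass. This is a sketch under assumptions: the toy rows, the two trait columns and the names `long_form`, `counts` and `shares` are illustrative, and scores are assumed to be 'H', 'L' or 'U'.

```python
import pandas as pd

# Toy filtered result: one row per site-species pair, two trait columns.
toy = pd.DataFrame({
    "wdpaid": [191, 191, 197],
    "SENSITIVITY": ["H", "L", "H"],
    "EXPOSURE": ["L", "U", "H"],
})
trait_columns = ["SENSITIVITY", "EXPOSURE"]

# Long format: one row per (site, trait, score).
long_form = toy.melt(id_vars="wdpaid", value_vars=trait_columns,
                     var_name="trait", value_name="score")

# Count occurrences of each score per site and trait; absent scores become 0.
counts = (long_form.groupby(["wdpaid", "trait", "score"])
                   .size()
                   .unstack("score", fill_value=0))

# Row-wise proportions of each score.
shares = counts.div(counts.sum(axis=1), axis=0)
print(counts)
```

Adding another trait only means extending `trait_columns`; no per-column aggregation code is needed.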
# get all fields and pick those we need for the stats
print(result_amp_f.columns)
print(result_bird_f.columns)
print(result_coral_f.columns)
Index(['SVP_ID', 'SIS_GAA_ID', 'GAA Family', 'Genus', 'Species', 'Fullname', 'Threatened', 'SUSC_A_Habitats', 'SUSC_A_aquatic larvae', 'SUSC_B_Temperature Range', 'SUSC_B_Precipitation Range', 'SUSC_C_explosive breeder', 'SUSC_D_disease', 'SENSITIVITY', 'ADAPT_A_barriers', 'ADAPT_A_dispersal_distance', 'ADAPT_C_Slow_Gen_Turnpover', 'LOW_ADAPTABILITY', 'EXP_Sea Level', 'EXP_MeanTemperature', 'EXP_MeanRainfall', 'EXP_AADTemperature', 'EXP_AADRainfall', 'EXPOSURE', 'FINAL_SCORE', 'Final_Pessimistic', 'Score_HorS', 'SensXLAdaptability', 'SensXExposure', 'LAdaptXExposure', 'Threatened_and_Vulnerable', 'Unnamed: 0', 'wdpaid', 'id_no', 'areakm2_x', 'areakm2_y', 'per', 'en_name', 'fr_name', 'status_yr', 'rep_area', 'gis_area', 'country', 'crit', 'areakm2', 'binomial', 'kingdom_name', 'phylum_name', 'class_name', 'order_name', 'family_name', 'genus_name', 'species_name', 'category', 'biome_marine', 'biome_freshwater', 'biome_terrestrial', 'id_no_int'], dtype='object')
Index(['SVP_ID', 'Fullname', 'Common name', 'New seq', 'SPcRecID', '2008 RDL Cat', 'Threatened', 'FINAL_SCORE', '__hab_specialisation', '__microhabitat', '__ForestDependence', '__TemperatureRange', '__PrecipRange', '__Species dependence', '__PopnSize', '__EffectivePopnSize', 'SENSITIVITY', '__Dispersal distance limited', '__Dispersal barriers', '__Low genetic diversity', '__clutch size', '__Gen_Length', 'LOW_ADAPTABILITY', '__EXP_Sea Inundation', '__EXP_MeanTemperature', '__EXP_AADTemperature', '__EXP_MeanPrecip', '__EXP_AADPrecip', 'EXPOSURE', 'Unnamed: 0', 'wdpaid', 'id_no', 'areakm2_x', 'areakm2_y', 'per', 'en_name', 'fr_name', 'status_yr', 'rep_area', 'gis_area', 'country', 'crit', 'areakm2', 'binomial', 'kingdom_name', 'phylum_name', 'class_name', 'order_name', 'family_name', 'genus_name', 'species_name', 'category', 'biome_marine', 'biome_freshwater', 'biome_terrestrial', 'id_no_int'], dtype='object')
Index(['Fullname', 'SVProject_ID', 'genus', 'species', 'Threatened', 'FINAL_SCORE', '_Few habitats', '_narrow depth rng', '_Larval vulnerabilityH', '_ShallowH', '_PastMortalityH', '_zooxH', '_Rare', 'SENSITIVITY', '_Short dispersal distance', '_dispersal barriers', '_ColonyMaxAge', '_SlowGrowthRate', 'LOW_ADAPTABILITY', '_ProportionArag<3', '_MeanDHM', 'EXPOSURE', 'Unnamed: 0', 'wdpaid', 'id_no', 'areakm2_x', 'areakm2_y', 'per', 'en_name', 'fr_name', 'status_yr', 'rep_area', 'gis_area', 'country', 'crit', 'areakm2', 'binomial', 'kingdom_name', 'phylum_name', 'class_name', 'order_name', 'family_name', 'genus_name', 'species_name', 'category', 'biome_marine', 'biome_freshwater', 'biome_terrestrial', 'id_no_int'], dtype='object')
# Selected columns, based on the df.columns for each taxon
## amphibian
a_columns = ['SUSC_A_Habitats', 'SUSC_A_aquatic larvae',
'SUSC_B_Temperature Range', 'SUSC_B_Precipitation Range',
'SUSC_C_explosive breeder', 'SUSC_D_disease', 'SENSITIVITY',
'ADAPT_A_barriers', 'ADAPT_A_dispersal_distance',
'ADAPT_C_Slow_Gen_Turnpover', 'LOW_ADAPTABILITY',
'EXP_Sea Level',
'EXP_MeanTemperature', 'EXP_MeanRainfall', 'EXP_AADTemperature',
'EXP_AADRainfall', 'EXPOSURE', 'FINAL_SCORE']
## birds
b_columns = ['FINAL_SCORE', '__hab_specialisation',
'__microhabitat', '__ForestDependence', '__TemperatureRange',
'__PrecipRange', '__Species dependence', '__PopnSize',
'__EffectivePopnSize', 'SENSITIVITY', '__Dispersal distance limited',
'__Dispersal barriers', '__Low genetic diversity', '__clutch size',
'__Gen_Length', 'LOW_ADAPTABILITY', '__EXP_Sea Inundation',
'__EXP_MeanTemperature', '__EXP_AADTemperature', '__EXP_MeanPrecip',
'__EXP_AADPrecip', 'EXPOSURE']
## corals
c_columns = ['FINAL_SCORE', '_Few habitats', '_narrow depth rng',
'_Larval vulnerabilityH', '_ShallowH', '_PastMortalityH', '_zooxH',
'_Rare', 'SENSITIVITY', '_Short dispersal distance',
'_dispersal barriers', '_ColonyMaxAge', '_SlowGrowthRate',
'LOW_ADAPTABILITY', '_ProportionArag<3', '_MeanDHM', 'EXPOSURE']
# count the occurrences of each of H, L and U in a column, plus the proportion of H
def get_hlu(x):
    return pd.Series({'H': (x=='H').sum(), 'L': (x=='L').sum(), 'U': (x=='U').sum(), 'per_H': (x=='H').sum()/x.size})
# apply get_hlu to each df
def apply_func(df, select_columns):
    # for each column of the df apply a function
    return df[select_columns].apply(get_hlu).T
# ======= GENERIC VERSION ======
# get the unique values of the series and count the occurrences of each unique value
def generic_count(x):
    # x is a pd.Series object
    unique_values = x.unique()
    return pd.Series({value: (x==value).sum() for value in unique_values})
def apply_func_mk2(df, select_columns):
    return df[select_columns].apply(generic_count).T
# ======= about 'apply' =======
## dfgroupby.apply: takes a df as argument (for each sub df defined by groupby criteria)
## dataframe.apply: takes a series as argument (for each of the columns in the df)
## series.apply: takes a value as argument (for each value of the series)
## split into df for each unique wdpaid-en_name combinations
## for each df apply the logic
amp_vv = result_amp_f.groupby(['wdpaid', 'en_name']).apply(apply_func, a_columns).reset_index()
bird_vv = result_bird_f.groupby(['wdpaid', 'en_name']).apply(apply_func, b_columns).reset_index()
coral_vv = result_coral_f.groupby(['wdpaid', 'en_name']).apply(apply_func, c_columns).reset_index()
# amp_vvv = result_amp_f.groupby(['wdpaid', 'en_name']).apply(apply_func_mk2, a_columns).reset_index()
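The three levels of `apply` described in the comments above can be seen side by side on a small table. This is only a sketch: the `toy` frame, `count_h` and `per_site` are made-up names for illustration and are not part of the analysis.

```python
import pandas as pd

# Toy data (hypothetical): two sites, two trait columns scored H/L/U
toy = pd.DataFrame({
    'site': ['A', 'A', 'A', 'B', 'B'],
    'SENSITIVITY': ['H', 'L', 'H', 'U', 'L'],
    'EXPOSURE': ['L', 'L', 'H', 'H', 'H'],
})
trait_columns = ['SENSITIVITY', 'EXPOSURE']

def count_h(series):
    # DataFrame.apply hands over one column (a Series) at a time
    return (series == 'H').sum()

def per_site(sub_df):
    # GroupBy.apply hands over one sub-DataFrame per group
    return sub_df.apply(count_h)

# Series.apply hands over one value at a time
is_high = toy['SENSITIVITY'].apply(lambda value: value == 'H')

h_by_column = toy[trait_columns].apply(count_h)
h_by_site = toy.groupby('site')[trait_columns].apply(per_site)
print(h_by_site)
```

Selecting the trait columns before `apply` keeps the grouping column out of the function, which is also what `apply_func` achieves with `df[select_columns]`.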
## spot checks
bird_vv[bird_vv.wdpaid == 191]
| | wdpaid | en_name | level_2 | H | L | U | per_H |
---|---|---|---|---|---|---|---|
0 | 191 | Galápagos Islands | FINAL_SCORE | 11.0 | 74.0 | 0.0 | 0.129412 |
1 | 191 | Galápagos Islands | __hab_specialisation | 4.0 | 81.0 | 0.0 | 0.047059 |
2 | 191 | Galápagos Islands | __microhabitat | 1.0 | 84.0 | 0.0 | 0.011765 |
3 | 191 | Galápagos Islands | __ForestDependence | 8.0 | 77.0 | 0.0 | 0.094118 |
4 | 191 | Galápagos Islands | __TemperatureRange | 2.0 | 69.0 | 14.0 | 0.023529 |
5 | 191 | Galápagos Islands | __PrecipRange | 20.0 | 51.0 | 14.0 | 0.235294 |
6 | 191 | Galápagos Islands | __Species dependence | 0.0 | 85.0 | 0.0 | 0.000000 |
7 | 191 | Galápagos Islands | __PopnSize | 12.0 | 56.0 | 17.0 | 0.141176 |
8 | 191 | Galápagos Islands | __EffectivePopnSize | 15.0 | 53.0 | 17.0 | 0.176471 |
9 | 191 | Galápagos Islands | SENSITIVITY | 32.0 | 39.0 | 14.0 | 0.376471 |
10 | 191 | Galápagos Islands | __Dispersal distance limited | 24.0 | 61.0 | 0.0 | 0.282353 |
11 | 191 | Galápagos Islands | __Dispersal barriers | 18.0 | 67.0 | 0.0 | 0.211765 |
12 | 191 | Galápagos Islands | __Low genetic diversity | 6.0 | 79.0 | 0.0 | 0.070588 |
13 | 191 | Galápagos Islands | __clutch size | 27.0 | 52.0 | 6.0 | 0.317647 |
14 | 191 | Galápagos Islands | __Gen_Length | 47.0 | 38.0 | 0.0 | 0.552941 |
15 | 191 | Galápagos Islands | LOW_ADAPTABILITY | 59.0 | 25.0 | 1.0 | 0.694118 |
16 | 191 | Galápagos Islands | __EXP_Sea Inundation | 8.0 | 77.0 | 0.0 | 0.094118 |
17 | 191 | Galápagos Islands | __EXP_MeanTemperature | 19.0 | 51.0 | 15.0 | 0.223529 |
18 | 191 | Galápagos Islands | __EXP_AADTemperature | 5.0 | 65.0 | 15.0 | 0.058824 |
19 | 191 | Galápagos Islands | __EXP_MeanPrecip | 12.0 | 58.0 | 15.0 | 0.141176 |
20 | 191 | Galápagos Islands | __EXP_AADPrecip | 6.0 | 64.0 | 15.0 | 0.070588 |
21 | 191 | Galápagos Islands | EXPOSURE | 30.0 | 40.0 | 15.0 | 0.352941 |
bird_vv[bird_vv.wdpaid == 2008]
| | wdpaid | en_name | level_2 | H | L | U | per_H |
---|---|---|---|---|---|---|---|
132 | 2008 | Białowieża Forest | FINAL_SCORE | 60.0 | 119.0 | 0.0 | 0.335196 |
133 | 2008 | Białowieża Forest | __hab_specialisation | 0.0 | 179.0 | 0.0 | 0.000000 |
134 | 2008 | Białowieża Forest | __microhabitat | 12.0 | 167.0 | 0.0 | 0.067039 |
135 | 2008 | Białowieża Forest | __ForestDependence | 13.0 | 166.0 | 0.0 | 0.072626 |
136 | 2008 | Białowieża Forest | __TemperatureRange | 0.0 | 176.0 | 3.0 | 0.000000 |
137 | 2008 | Białowieża Forest | __PrecipRange | 152.0 | 24.0 | 3.0 | 0.849162 |
138 | 2008 | Białowieża Forest | __Species dependence | 0.0 | 179.0 | 0.0 | 0.000000 |
139 | 2008 | Białowieża Forest | __PopnSize | 0.0 | 179.0 | 0.0 | 0.000000 |
140 | 2008 | Białowieża Forest | __EffectivePopnSize | 0.0 | 179.0 | 0.0 | 0.000000 |
141 | 2008 | Białowieża Forest | SENSITIVITY | 161.0 | 18.0 | 0.0 | 0.899441 |
142 | 2008 | Białowieża Forest | __Dispersal distance limited | 5.0 | 174.0 | 0.0 | 0.027933 |
143 | 2008 | Białowieża Forest | __Dispersal barriers | 4.0 | 175.0 | 0.0 | 0.022346 |
144 | 2008 | Białowieża Forest | __Low genetic diversity | 1.0 | 178.0 | 0.0 | 0.005587 |
145 | 2008 | Białowieża Forest | __clutch size | 14.0 | 164.0 | 1.0 | 0.078212 |
146 | 2008 | Białowieża Forest | __Gen_Length | 66.0 | 113.0 | 0.0 | 0.368715 |
147 | 2008 | Białowieża Forest | LOW_ADAPTABILITY | 76.0 | 103.0 | 0.0 | 0.424581 |
148 | 2008 | Białowieża Forest | __EXP_Sea Inundation | 0.0 | 179.0 | 0.0 | 0.000000 |
149 | 2008 | Białowieża Forest | __EXP_MeanTemperature | 161.0 | 15.0 | 3.0 | 0.899441 |
150 | 2008 | Białowieża Forest | __EXP_AADTemperature | 0.0 | 176.0 | 3.0 | 0.000000 |
151 | 2008 | Białowieża Forest | __EXP_MeanPrecip | 0.0 | 176.0 | 3.0 | 0.000000 |
152 | 2008 | Białowieża Forest | __EXP_AADPrecip | 0.0 | 176.0 | 3.0 | 0.000000 |
153 | 2008 | Białowieża Forest | EXPOSURE | 161.0 | 15.0 | 3.0 | 0.899441 |
result_amp_f[result_amp_f.wdpaid == 191]
SVP_ID | SIS_GAA_ID | GAA Family | Genus | Species | Fullname | Threatened | SUSC_A_Habitats | SUSC_A_aquatic larvae | SUSC_B_Temperature Range | ... | class_name | order_name | family_name | genus_name | species_name | category | biome_marine | biome_freshwater | biome_terrestrial | id_no_int |
---|
0 rows × 58 columns
amp_vv[amp_vv.wdpaid == 191]
coral_vv[coral_vv.wdpaid == 191]
| | wdpaid | en_name | level_2 | H | L | U | per_H |
---|---|---|---|---|---|---|---|
0 | 191 | Galápagos Islands | FINAL_SCORE | 2.0 | 19.0 | 0.0 | 0.095238 |
1 | 191 | Galápagos Islands | _Few habitats | 4.0 | 17.0 | 0.0 | 0.190476 |
2 | 191 | Galápagos Islands | _narrow depth rng | 4.0 | 16.0 | 1.0 | 0.190476 |
3 | 191 | Galápagos Islands | _Larval vulnerabilityH | 2.0 | 19.0 | 0.0 | 0.095238 |
4 | 191 | Galápagos Islands | _ShallowH | 4.0 | 16.0 | 1.0 | 0.190476 |
5 | 191 | Galápagos Islands | _PastMortalityH | 11.0 | 10.0 | 0.0 | 0.523810 |
6 | 191 | Galápagos Islands | _zooxH | 15.0 | 6.0 | 0.0 | 0.714286 |
7 | 191 | Galápagos Islands | _Rare | 3.0 | 18.0 | 0.0 | 0.142857 |
8 | 191 | Galápagos Islands | SENSITIVITY | 21.0 | 0.0 | 0.0 | 1.000000 |
9 | 191 | Galápagos Islands | _Short dispersal distance | 0.0 | 18.0 | 3.0 | 0.000000 |
10 | 191 | Galápagos Islands | _dispersal barriers | 3.0 | 18.0 | 0.0 | 0.142857 |
11 | 191 | Galápagos Islands | _ColonyMaxAge | 0.0 | 21.0 | 0.0 | 0.000000 |
12 | 191 | Galápagos Islands | _SlowGrowthRate | 7.0 | 14.0 | 0.0 | 0.333333 |
13 | 191 | Galápagos Islands | LOW_ADAPTABILITY | 8.0 | 13.0 | 0.0 | 0.380952 |
14 | 191 | Galápagos Islands | _ProportionArag<3 | 3.0 | 18.0 | 0.0 | 0.142857 |
15 | 191 | Galápagos Islands | _MeanDHM | 1.0 | 19.0 | 1.0 | 0.047619 |
16 | 191 | Galápagos Islands | EXPOSURE | 3.0 | 18.0 | 0.0 | 0.142857 |
A WH site with more CCV species, or with a higher proportion of CCV species, is a site whose species are more susceptible to the adverse impacts of climate change.
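Counts and proportions can rank sites differently, so it may help to look at both. The sketch below takes the bird `FINAL_SCORE` rows from the two spot checks above (11 of 85 species for Galápagos Islands, 60 of 179 for Białowieża Forest) and adds one clearly hypothetical small site to show the two rankings diverging. The `sites` frame and its `total` column are assumptions made for this illustration.

```python
import pandas as pd

sites = pd.DataFrame({
    'en_name': ['Galápagos Islands', 'Białowieża Forest', 'Site X (hypothetical)'],
    'H': [11, 60, 5],        # species scored H on FINAL_SCORE
    'total': [85, 179, 10],  # H + L + U
})
sites['per_H'] = sites['H'] / sites['total']

# rank once by the number of H species and once by their share
by_count = sites.sort_values('H', ascending=False)['en_name'].tolist()
by_share = sites.sort_values('per_H', ascending=False)['en_name'].tolist()
print(by_count)
print(by_share)
```

A small site with few species can top the proportion ranking while contributing very few vulnerable species in absolute terms, which is one reason to report both columns.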
Questions and statements that can be made?
# dump for database
amp_vv.to_csv('agg_amp.csv')
bird_vv.to_csv('agg_bird.csv')
coral_vv.to_csv('agg_coral.csv')
result_amp_f.to_csv('wh_amp.csv')
result_bird_f.to_csv('wh_bird.csv')
result_coral_f.to_csv('wh_coral.csv')
This section looks at each taxon across the entire network. A comparison is then made to examine whether the WH network houses species that are more vulnerable to climate change. Because the result contains many duplicate species rows (different sites can host the same species), the first step is to get a unique list of species within all WH sites.
# combine species in WH result and all results, by taxa
## get a unique set of species within each taxon
amp_selected = result_amp_f.groupby('id_no_int').first().reset_index()
bird_selected = result_bird_f.groupby('id_no_int').first().reset_index()
coral_selected = result_coral_f.groupby('id_no_int').first().reset_index()
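One detail of `groupby(...).first()` is worth keeping in mind: by default it returns the first non-null value of each column within a group, which is not necessarily the first row. The toy frame below is hypothetical and only contrasts it with `drop_duplicates`.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): species 1 occurs in two sites, species 2 in one
toy = pd.DataFrame({
    'id_no_int': [1, 1, 2],
    'wdpaid': [191, 2008, 191],
    'Threatened': [np.nan, 'Y', np.nan],
})

# first non-null value per column within each species
first_non_null = toy.groupby('id_no_int').first().reset_index()

# first row per species, missing values included
first_row = toy.drop_duplicates('id_no_int').reset_index(drop=True)

print(first_non_null)
print(first_row)
```

For species-level attributes the two approaches should agree. Site-level columns such as `wdpaid` or `crit` in the deduplicated frames, however, describe only one of the sites a species occurs in.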
## function to aggregate result and apply hlu function to all columns
def get_agg_result(unique_taxon_result, taxon_columns):
return unique_taxon_result[taxon_columns].apply(get_hlu).T
amp_total = pd.merge(get_agg_result(amp_selected, a_columns), \
get_agg_result(amp, a_columns),
left_index = True, right_index=True, suffixes=('_wh', '_all'))
bird_total = pd.merge(get_agg_result(bird_selected, b_columns), \
get_agg_result(bird, b_columns),
left_index = True, right_index=True, suffixes=('_wh', '_all'))
coral_total = pd.merge(get_agg_result(coral_selected, c_columns), \
get_agg_result(coral, c_columns),
left_index = True, right_index=True, suffixes=('_wh', '_all'))
amp_total
| | H_wh | L_wh | U_wh | per_H_wh | H_all | L_all | U_all | per_H_all |
---|---|---|---|---|---|---|---|---|
SUSC_A_Habitats | 220.0 | 1783.0 | 10.0 | 0.109290 | 1509.0 | 4539.0 | 156.0 | 0.243230 |
SUSC_A_aquatic larvae | 330.0 | 1673.0 | 10.0 | 0.163934 | 955.0 | 5085.0 | 164.0 | 0.153933 |
SUSC_B_Temperature Range | 386.0 | 1626.0 | 1.0 | 0.191754 | 1519.0 | 4557.0 | 128.0 | 0.244842 |
SUSC_B_Precipitation Range | 479.0 | 1533.0 | 1.0 | 0.237953 | 1519.0 | 4557.0 | 128.0 | 0.244842 |
SUSC_C_explosive breeder | 162.0 | 1629.0 | 222.0 | 0.080477 | 316.0 | 4113.0 | 1775.0 | 0.050935 |
SUSC_D_disease | 436.0 | 1577.0 | 0.0 | 0.216592 | 1307.0 | 4897.0 | 0.0 | 0.210671 |
SENSITIVITY | 1340.0 | 606.0 | 67.0 | 0.665673 | 4453.0 | 1365.0 | 386.0 | 0.717763 |
ADAPT_A_barriers | 183.0 | 1661.0 | 169.0 | 0.090909 | 745.0 | 3900.0 | 1559.0 | 0.120084 |
ADAPT_A_dispersal_distance | 501.0 | 1493.0 | 19.0 | 0.248882 | 1569.0 | 4522.0 | 113.0 | 0.252901 |
ADAPT_C_Slow_Gen_Turnpover | 609.0 | 586.0 | 818.0 | 0.302534 | 2073.0 | 899.0 | 3232.0 | 0.334139 |
LOW_ADAPTABILITY | 997.0 | 1007.0 | 9.0 | 0.495281 | 3233.0 | 2898.0 | 73.0 | 0.521115 |
EXP_Sea Level | 0.0 | 2003.0 | 10.0 | 0.000000 | 4.0 | 6044.0 | 156.0 | 0.000645 |
EXP_MeanTemperature | 315.0 | 1697.0 | 1.0 | 0.156483 | 1515.0 | 4544.0 | 145.0 | 0.244197 |
EXP_MeanRainfall | 503.0 | 1509.0 | 1.0 | 0.249876 | 1515.0 | 4544.0 | 145.0 | 0.244197 |
EXP_AADTemperature | 212.0 | 1800.0 | 1.0 | 0.105315 | 1498.0 | 4561.0 | 145.0 | 0.241457 |
EXP_AADRainfall | 371.0 | 1641.0 | 1.0 | 0.184302 | 1515.0 | 4544.0 | 145.0 | 0.244197 |
EXPOSURE | 864.0 | 1145.0 | 4.0 | 0.429210 | 3356.0 | 2642.0 | 206.0 | 0.540941 |
FINAL_SCORE | 301.0 | 1712.0 | 0.0 | 0.149528 | 1368.0 | 4836.0 | 0.0 | 0.220503 |
a = amp_total[['H_wh', 'L_wh', 'U_wh']].sum(axis=1)/amp_total[['H_all', 'L_all', 'U_all']].sum(axis=1)
b = pd.concat([amp_total.H_wh/amp_total.H_all, a], axis=1)
b.columns = ['H_per', 'total_per']
b
# (amp_total.H_wh + amp_total.L_wh + amp_total.U_wh)/(amp_total.H_all + amp_total.L_all + amp_total.U_all)
| | H_per | total_per |
---|---|---|
SUSC_A_Habitats | 0.145792 | 0.324468 |
SUSC_A_aquatic larvae | 0.345550 | 0.324468 |
SUSC_B_Temperature Range | 0.254115 | 0.324468 |
SUSC_B_Precipitation Range | 0.315339 | 0.324468 |
SUSC_C_explosive breeder | 0.512658 | 0.324468 |
SUSC_D_disease | 0.333588 | 0.324468 |
SENSITIVITY | 0.300921 | 0.324468 |
ADAPT_A_barriers | 0.245638 | 0.324468 |
ADAPT_A_dispersal_distance | 0.319312 | 0.324468 |
ADAPT_C_Slow_Gen_Turnpover | 0.293777 | 0.324468 |
LOW_ADAPTABILITY | 0.308382 | 0.324468 |
EXP_Sea Level | 0.000000 | 0.324468 |
EXP_MeanTemperature | 0.207921 | 0.324468 |
EXP_MeanRainfall | 0.332013 | 0.324468 |
EXP_AADTemperature | 0.141522 | 0.324468 |
EXP_AADRainfall | 0.244884 | 0.324468 |
EXPOSURE | 0.257449 | 0.324468 |
FINAL_SCORE | 0.220029 | 0.324468 |
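It appears that a smaller proportion of amphibians in WH sites is climate vulnerable: WH sites hold about 32.4% of all assessed amphibian species (total_per) but only about 22.0% of those scored H on FINAL_SCORE (H_per).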
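The comparison of `H_per` and `total_per` for `FINAL_SCORE` can be reproduced by hand from the counts in `amp_total`. The variable names below are illustrative only.

```python
# counts taken from the FINAL_SCORE row of amp_total
h_wh, h_all = 301, 1368    # amphibians scored H: in WH sites, in the whole pool
n_wh, n_all = 2013, 6204   # all assessed amphibians: in WH sites, in the whole pool

h_share = h_wh / h_all          # share of all H species found in WH sites (H_per)
species_share = n_wh / n_all    # share of all species found in WH sites (total_per)

# a ratio below 1 means H species are under-represented in WH sites
representation = h_share / species_share
print(round(h_share, 6), round(species_share, 6), round(representation, 3))
```

The same ratio is obtained by dividing `per_H_wh` by `per_H_all`, so either pair of columns tells the same story.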
bird_total
| | H_wh | L_wh | U_wh | per_H_wh | H_all | L_all | U_all | per_H_all |
---|---|---|---|---|---|---|---|---|
FINAL_SCORE | 1336.0 | 5578.0 | 0.0 | 0.193231 | 2323.0 | 7533.0 | 0.0 | 0.235694 |
__hab_specialisation | 849.0 | 6059.0 | 6.0 | 0.122794 | 1530.0 | 8306.0 | 20.0 | 0.155235 |
__microhabitat | 703.0 | 6211.0 | 0.0 | 0.101678 | 1001.0 | 8855.0 | 0.0 | 0.101562 |
__ForestDependence | 1527.0 | 5386.0 | 1.0 | 0.220856 | 2575.0 | 7277.0 | 4.0 | 0.261262 |
__TemperatureRange | 1068.0 | 4401.0 | 1445.0 | 0.154469 | 1974.0 | 6118.0 | 1764.0 | 0.200284 |
__PrecipRange | 1414.0 | 4055.0 | 1445.0 | 0.204513 | 2095.0 | 5997.0 | 1764.0 | 0.212561 |
__Species dependence | 66.0 | 6848.0 | 0.0 | 0.009546 | 89.0 | 9767.0 | 0.0 | 0.009030 |
__PopnSize | 393.0 | 1794.0 | 4727.0 | 0.056841 | 1084.0 | 2319.0 | 6453.0 | 0.109984 |
__EffectivePopnSize | 558.0 | 1629.0 | 4727.0 | 0.080706 | 1410.0 | 1993.0 | 6453.0 | 0.143060 |
SENSITIVITY | 4022.0 | 585.0 | 2307.0 | 0.581718 | 6290.0 | 719.0 | 2847.0 | 0.638190 |
__Dispersal distance limited | 1180.0 | 5734.0 | 0.0 | 0.170668 | 1993.0 | 7863.0 | 0.0 | 0.202212 |
__Dispersal barriers | 314.0 | 6600.0 | 0.0 | 0.045415 | 700.0 | 9156.0 | 0.0 | 0.071023 |
__Low genetic diversity | 41.0 | 6873.0 | 0.0 | 0.005930 | 69.0 | 9787.0 | 0.0 | 0.007001 |
__clutch size | 1760.0 | 3213.0 | 1941.0 | 0.254556 | 2414.0 | 3946.0 | 3496.0 | 0.244927 |
__Gen_Length | 1730.0 | 5184.0 | 0.0 | 0.250217 | 2500.0 | 7356.0 | 0.0 | 0.253653 |
LOW_ADAPTABILITY | 3576.0 | 2092.0 | 1246.0 | 0.517211 | 5337.0 | 2507.0 | 2012.0 | 0.541498 |
__EXP_Sea Inundation | 100.0 | 6808.0 | 6.0 | 0.014463 | 163.0 | 9673.0 | 20.0 | 0.016538 |
__EXP_MeanTemperature | 1227.0 | 4222.0 | 1465.0 | 0.177466 | 1921.0 | 6066.0 | 1869.0 | 0.194907 |
__EXP_AADTemperature | 944.0 | 4505.0 | 1465.0 | 0.136535 | 1925.0 | 6062.0 | 1869.0 | 0.195312 |
__EXP_MeanPrecip | 1180.0 | 4269.0 | 1465.0 | 0.170668 | 1998.0 | 5989.0 | 1869.0 | 0.202719 |
__EXP_AADPrecip | 1070.0 | 4379.0 | 1465.0 | 0.154758 | 2152.0 | 5835.0 | 1869.0 | 0.218344 |
EXPOSURE | 3053.0 | 2406.0 | 1455.0 | 0.441568 | 4920.0 | 3082.0 | 1854.0 | 0.499188 |
coral_total
| | H_wh | L_wh | U_wh | per_H_wh | H_all | L_all | U_all | per_H_all |
---|---|---|---|---|---|---|---|---|
FINAL_SCORE | 111.0 | 582.0 | 31.0 | 0.153315 | 242.0 | 1198.0 | 154.0 | 0.151819 |
_Few habitats | 160.0 | 564.0 | 0.0 | 0.220994 | 384.0 | 1210.0 | 0.0 | 0.240903 |
_narrow depth rng | 169.0 | 529.0 | 26.0 | 0.233425 | 384.0 | 1140.0 | 70.0 | 0.240903 |
_Larval vulnerabilityH | 133.0 | 589.0 | 2.0 | 0.183702 | 274.0 | 1316.0 | 4.0 | 0.171895 |
_ShallowH | 172.0 | 527.0 | 25.0 | 0.237569 | 376.0 | 1156.0 | 62.0 | 0.235885 |
_PastMortalityH | 312.0 | 412.0 | 0.0 | 0.430939 | 644.0 | 950.0 | 0.0 | 0.404015 |
_zooxH | 669.0 | 54.0 | 1.0 | 0.924033 | 1478.0 | 114.0 | 2.0 | 0.927227 |
_Rare | 144.0 | 575.0 | 5.0 | 0.198895 | 392.0 | 1190.0 | 12.0 | 0.245922 |
SENSITIVITY | 723.0 | 1.0 | 0.0 | 0.998619 | 1592.0 | 2.0 | 0.0 | 0.998745 |
_Short dispersal distance | 65.0 | 491.0 | 168.0 | 0.089779 | 144.0 | 1042.0 | 408.0 | 0.090339 |
_dispersal barriers | 83.0 | 635.0 | 6.0 | 0.114641 | 234.0 | 1338.0 | 22.0 | 0.146801 |
_ColonyMaxAge | 11.0 | 707.0 | 6.0 | 0.015193 | 26.0 | 1546.0 | 22.0 | 0.016311 |
_SlowGrowthRate | 263.0 | 455.0 | 6.0 | 0.363260 | 586.0 | 990.0 | 18.0 | 0.367629 |
LOW_ADAPTABILITY | 366.0 | 356.0 | 2.0 | 0.505525 | 840.0 | 746.0 | 8.0 | 0.526976 |
_ProportionArag<3 | 161.0 | 521.0 | 42.0 | 0.222376 | 354.0 | 1058.0 | 182.0 | 0.222083 |
_MeanDHM | 170.0 | 513.0 | 41.0 | 0.234807 | 368.0 | 1036.0 | 190.0 | 0.230866 |
EXPOSURE | 249.0 | 445.0 | 30.0 | 0.343923 | 542.0 | 894.0 | 158.0 | 0.340025 |
Amphibians and birds in WH sites have a consistently lower proportion of CCV species than all amphibians (15.0% vs 22.1%) and all birds (19.3% vs 23.6%). Lower proportions are also observed for sensitivity, low adaptability and exposure.
Question: what does it mean? What statements can realistically be made?
Corals show nearly the same proportions in WH sites as overall, both for overall vulnerability (15.3% vs 15.2%) and for the S, A and E components (sensitivity, low adaptability and exposure).
Test statistics for significance
per region (IUCN/UNESCO regions?) / biodiversity sites
amp_total
| | H_wh | L_wh | U_wh | per_H_wh | H_all | L_all | U_all | per_H_all |
---|---|---|---|---|---|---|---|---|
SUSC_A_Habitats | 220.0 | 1783.0 | 10.0 | 0.109290 | 1509.0 | 4539.0 | 156.0 | 0.243230 |
SUSC_A_aquatic larvae | 330.0 | 1673.0 | 10.0 | 0.163934 | 955.0 | 5085.0 | 164.0 | 0.153933 |
SUSC_B_Temperature Range | 386.0 | 1626.0 | 1.0 | 0.191754 | 1519.0 | 4557.0 | 128.0 | 0.244842 |
SUSC_B_Precipitation Range | 479.0 | 1533.0 | 1.0 | 0.237953 | 1519.0 | 4557.0 | 128.0 | 0.244842 |
SUSC_C_explosive breeder | 162.0 | 1629.0 | 222.0 | 0.080477 | 316.0 | 4113.0 | 1775.0 | 0.050935 |
SUSC_D_disease | 436.0 | 1577.0 | 0.0 | 0.216592 | 1307.0 | 4897.0 | 0.0 | 0.210671 |
SENSITIVITY | 1340.0 | 606.0 | 67.0 | 0.665673 | 4453.0 | 1365.0 | 386.0 | 0.717763 |
ADAPT_A_barriers | 183.0 | 1661.0 | 169.0 | 0.090909 | 745.0 | 3900.0 | 1559.0 | 0.120084 |
ADAPT_A_dispersal_distance | 501.0 | 1493.0 | 19.0 | 0.248882 | 1569.0 | 4522.0 | 113.0 | 0.252901 |
ADAPT_C_Slow_Gen_Turnpover | 609.0 | 586.0 | 818.0 | 0.302534 | 2073.0 | 899.0 | 3232.0 | 0.334139 |
LOW_ADAPTABILITY | 997.0 | 1007.0 | 9.0 | 0.495281 | 3233.0 | 2898.0 | 73.0 | 0.521115 |
EXP_Sea Level | 0.0 | 2003.0 | 10.0 | 0.000000 | 4.0 | 6044.0 | 156.0 | 0.000645 |
EXP_MeanTemperature | 315.0 | 1697.0 | 1.0 | 0.156483 | 1515.0 | 4544.0 | 145.0 | 0.244197 |
EXP_MeanRainfall | 503.0 | 1509.0 | 1.0 | 0.249876 | 1515.0 | 4544.0 | 145.0 | 0.244197 |
EXP_AADTemperature | 212.0 | 1800.0 | 1.0 | 0.105315 | 1498.0 | 4561.0 | 145.0 | 0.241457 |
EXP_AADRainfall | 371.0 | 1641.0 | 1.0 | 0.184302 | 1515.0 | 4544.0 | 145.0 | 0.244197 |
EXPOSURE | 864.0 | 1145.0 | 4.0 | 0.429210 | 3356.0 | 2642.0 | 206.0 | 0.540941 |
FINAL_SCORE | 301.0 | 1712.0 | 0.0 | 0.149528 | 1368.0 | 4836.0 | 0.0 | 0.220503 |
Is it possible that the differences observed simply reflect the fact that the WH network happens to draw a sample from the pool of all species, so that the difference is due to random sampling and nothing else? In other words, the difference would arise not because WH sites capture species with specific traits but purely by chance.
print('amp:', amp.index.size, 'in WH', amp_selected.index.size)
print('bird:', bird.index.size, 'in WH', bird_selected.index.size)
print('coral:', coral_unique.index.size, 'in WH', coral_selected.index.size)
amp: 6204 in WH 2013
bird: 9856 in WH 6914
coral: 797 in WH 724
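These counts also show how much of each pool the WH network covers, which limits how far a WH proportion can drift from the pool proportion by chance alone. The sketch below computes the coverage and the standard finite population correction factor (N - n) / (N - 1) for each taxon. The dictionary names are illustrative.

```python
# (species in the pool, species in WH sites), taken from the printed counts
pool = {'amp': (6204, 2013), 'bird': (9856, 6914), 'coral': (797, 724)}

# share of the pool that occurs in WH sites
coverage = {taxon: in_wh / total for taxon, (total, in_wh) in pool.items()}

# finite population correction: the sampling variance of a proportion shrinks
# by this factor when drawing n of N species without replacement
fpc = {taxon: (total - in_wh) / (total - 1) for taxon, (total, in_wh) in pool.items()}

for taxon in pool:
    print(taxon, '{0:.1%}'.format(coverage[taxon]), round(fpc[taxon], 3))
```

With roughly nine in ten coral species already inside WH sites, a random draw of that size can hardly differ from the pool, so even small differences for corals need careful reading.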
amp_selected.head()
| | id_no_int | SVP_ID | SIS_GAA_ID | GAA Family | Genus | Species | Fullname | Threatened | SUSC_A_Habitats | SUSC_A_aquatic larvae | ... | phylum_name | class_name | order_name | family_name | genus_name | species_name | category | biome_marine | biome_freshwater | biome_terrestrial |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 520.0 | 4556 | 2661 | LIMNODYNASTIDAE | Adelotus | brevis | Adelotus brevis | NaN | L | L | ... | CHORDATA | AMPHIBIA | ANURA | LIMNODYNASTIDAE | Adelotus | brevis | NT | f | t | t |
1 | 596.0 | 4760 | 100011 | HYPEROLIIDAE | Afrixalus | aureus | Afrixalus aureus | NaN | L | H | ... | CHORDATA | AMPHIBIA | ANURA | HYPEROLIIDAE | Afrixalus | aureus | LC | f | t | t |
2 | 1272.0 | 3629 | 51121 | CRYPTOBRANCHIDAE | Andrias | davidianus | Andrias davidianus | Y | L | L | ... | CHORDATA | AMPHIBIA | CAUDATA | CRYPTOBRANCHIDAE | Andrias | davidianus | CR | f | t | f |
3 | 1282.0 | 3644 | 13093 | PLETHODONTIDAE | Aneides | aeneus | Aneides aeneus | NaN | H | L | ... | CHORDATA | AMPHIBIA | CAUDATA | PLETHODONTIDAE | Aneides | aeneus | NT | f | f | t |
4 | 2865.0 | 3109 | 15108 | BOMBINATORIDAE | Bombina | bombina | Bombina bombina | NaN | L | L | ... | CHORDATA | AMPHIBIA | ANURA | BOMBINATORIDAE | Bombina | bombina | LC | f | t | t |
5 rows × 58 columns
If I get repeated random samples from the pool, calculate the H_per value and then plot them...
# draw a random sample of 2013 species repeatedly and calculate per_H for FINAL_SCORE
# ===== explanation
# ## get a random sample without replacement
# a = amp[amp.SVP_ID.isin(np.random.choice(amp.SVP_ID, 2013, replace=False))]
# ## apply HLU and get the final score
# a[a_columns].apply(get_hlu).T.loc['FINAL_SCORE'].per_H
# replace=False so that each sample holds exactly 2013 distinct species; `.loc` replaces the removed `.ix`
test_a = [amp[amp.SVP_ID.isin(np.random.choice(amp.SVP_ID, 2013, replace=False))][a_columns].apply(get_hlu).T.loc['FINAL_SCORE'].per_H
          for i in range(1000)]
sns.distplot(test_a)
[Figure: histogram with density estimate of per_H (FINAL_SCORE) over 1000 random samples of amphibians]
amp_total.loc['FINAL_SCORE'].per_H_wh
0.14952806756085443
It is quite clear that the WH result (about 15.0%) is an outlier: it differs significantly from the empirical distribution of the random samples.
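To put a number on how unusual the WH value is, one option is an empirical p-value. The sketch below is a shortcut rather than the notebook's own method: instead of subsetting the dataframe, it draws the number of H species directly from a hypergeometric distribution, which describes sampling without replacement from a pool with a known number of H species. The counts come from the `FINAL_SCORE` row of `amp_total`. The seed and the number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_high = 1368                   # amphibians scored H in the whole pool
n_other = 6204 - 1368           # amphibians not scored H
n_draw = 2013                   # amphibian species found in WH sites
observed_high = 301             # of which scored H

# number of H species in each of 10,000 random draws without replacement
sim_high = rng.hypergeometric(n_high, n_other, n_draw, size=10_000)
sim_per_h = sim_high / n_draw

# share of random draws with at most as many H species as observed in WH sites
p_lower = (sim_high <= observed_high).mean()
print(sim_per_h.mean(), sim_per_h.min(), p_lower)
```

If none of the simulated draws reaches a value as low as the WH one, the honest statement is that the p-value is below one over the number of draws, not that it is exactly zero.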
Try birds and corals
test_b = [bird[bird.SVP_ID.isin(np.random.choice(bird.SVP_ID, 6914, replace=False))][b_columns].apply(get_hlu).T.loc['FINAL_SCORE'].per_H
          for i in range(1000)]
sns.distplot(test_b)
[Figure: histogram with density estimate of per_H (FINAL_SCORE) over 1000 random samples of birds]
bird_total.loc['FINAL_SCORE'].per_H_wh
0.19323112525310962
Significant result for birds too
test_c = [coral_unique[coral_unique.Fullname.isin(np.random.choice(coral_unique.Fullname, 724, replace=False))][c_columns].apply(get_hlu).T.loc['FINAL_SCORE'].per_H
          for i in range(1000)]
sns.distplot(test_c)
[Figure: histogram with density estimate of per_H (FINAL_SCORE) over 1000 random samples of corals]
coral_total.loc['FINAL_SCORE'].per_H_wh
0.15331491712707182
# a function to plot all columns using stacked horizontal bar charts
def plot_comparison(wh_species, all_species, columns):
    my_colors = ['#b2182b', '#ffffff', '#e0e0e0']
    col = len(columns)
    fig, axes = plt.subplots(col, figsize=(15, 2*col)) # width and height
    fig.subplots_adjust(hspace = 0.5)
    # get aggregated counts of H, L and U
    a = get_agg_result(wh_species, columns)
    b = get_agg_result(all_species, columns)
    for i, column in enumerate(columns):
        data = pd.concat([a[['H', 'L', 'U']].loc[column],
                          b[['H', 'L', 'U']].loc[column]], axis=1).T
        data.index = ['Species in WH sites', 'All species']
        c = data.plot.barh(ax=axes[i], stacked=True, color=my_colors)
        axes[i].set_xlim([0, all_species.index.size*1.2])
        axes[i].set_title(column)
        # annotation: H count, total count, percentage of H
        for bar, y_axis in zip([a,b], range(2)):
            c.annotate('{0:.0f}'.format(bar.loc[column].H) + ', ' + '{0:.0f}'.format(bar.loc[column].H + bar.loc[column].L + bar.loc[column].U) +
                       ', ' + '{0:.1%}'.format(bar.loc[column].per_H),
                       (bar.loc[column].sum() + 10, y_axis))
    return fig
# the same plot, restricted by default to the summary columns (final score, sensitivity, low adaptability, exposure)
def plot_comparison_sle(wh_species, all_species, columns = ['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']):
    my_colors = ['#b2182b', '#ffffff', '#e0e0e0']
    col = len(columns)
    fig, axes = plt.subplots(col, figsize=(15, 2*col)) # width and height
    fig.subplots_adjust(hspace = 0.5)
    # get aggregated counts of H, L and U
    a = get_agg_result(wh_species, columns)
    b = get_agg_result(all_species, columns)
    for i, column in enumerate(columns):
        data = pd.concat([a[['H', 'L', 'U']].loc[column],
                          b[['H', 'L', 'U']].loc[column]], axis=1).T
        data.index = ['Species in WH sites', 'All species']
        c = data.plot.barh(ax=axes[i], stacked=True, color=my_colors)
        axes[i].set_xlim([0, all_species.index.size*1.2])
        axes[i].set_title(column)
        # annotation: H count, total count, percentage of H
        for bar, y_axis in zip([a,b], range(2)):
            c.annotate('{0:.0f}'.format(bar.loc[column].H) + ', ' + '{0:.0f}'.format(bar.loc[column].H + bar.loc[column].L + bar.loc[column].U) +
                       ', ' + '{0:.1%}'.format(bar.loc[column].per_H),
                       (bar.loc[column].sum() + 10, y_axis))
    return fig
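The annotation placed next to each bar reads "H count, total count, percentage of H". As a sketch, the string building could live in a small helper. `hlu_label` is a hypothetical name and is not used anywhere in the notebook.

```python
def hlu_label(h, l, u, per_h):
    """Build the 'H, total, per_H' text placed next to each bar."""
    total = h + l + u
    return '{0:.0f}, {1:.0f}, {2:.1%}'.format(h, total, per_h)

# FINAL_SCORE for amphibians in WH sites, values taken from amp_total
label = hlu_label(301.0, 1712.0, 0.0, 0.149528)
print(label)
```

Keeping the format in one place would also make it easier to keep the plotting functions consistent with each other.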
Detailed comparisons across traits and exposures for amphibians
bird[['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']]
| | FINAL_SCORE | SENSITIVITY | LOW_ADAPTABILITY | EXPOSURE |
---|---|---|---|---|
0 | H | H | H | H |
1 | L | H | L | U |
2 | L | H | H | U |
3 | L | H | L | U |
4 | L | H | U | L |
5 | L | H | H | L |
6 | H | H | H | H |
7 | L | U | L | H |
8 | L | H | L | L |
9 | L | H | L | L |
10 | L | H | L | H |
11 | L | H | H | U |
12 | L | U | L | U |
13 | L | H | H | L |
14 | H | H | H | H |
15 | H | H | H | H |
16 | L | U | L | U |
17 | L | H | H | L |
18 | L | U | L | U |
19 | L | U | L | U |
20 | L | U | L | U |
21 | H | H | H | H |
22 | L | H | L | L |
23 | L | H | L | H |
24 | H | H | H | H |
25 | L | L | H | L |
26 | L | L | H | L |
27 | H | H | H | H |
28 | H | H | H | H |
29 | H | H | H | H |
... | ... | ... | ... | ... |
9826 | L | H | U | H |
9827 | H | H | H | H |
9828 | H | H | H | H |
9829 | H | H | H | H |
9830 | L | H | H | U |
9831 | H | H | H | H |
9832 | L | H | U | U |
9833 | L | H | H | U |
9834 | L | H | H | U |
9835 | H | H | H | H |
9836 | L | H | L | H |
9837 | L | U | L | L |
9838 | L | U | L | L |
9839 | H | H | H | H |
9840 | H | H | H | H |
9841 | L | H | H | U |
9842 | H | H | H | H |
9843 | L | H | H | U |
9844 | L | H | U | H |
9845 | L | H | U | H |
9846 | L | U | L | L |
9847 | H | H | H | H |
9848 | L | H | H | U |
9849 | L | H | U | H |
9850 | L | H | H | U |
9851 | L | H | U | L |
9852 | H | H | H | H |
9853 | H | H | H | H |
9854 | L | H | U | H |
9855 | H | H | H | H |
9856 rows × 4 columns
# amp
plot_comparison(amp_selected, amp, a_columns);
a = plot_comparison_sle(amp_selected, amp);
a.savefig('wh_network_amp.png', dpi=100, bbox_inches='tight')
Detailed comparisons across traits and exposures for birds
# bird
plot_comparison(bird_selected, bird, b_columns);
a = plot_comparison_sle(bird_selected, bird);
a.savefig('wh_network_bird.png', dpi=100, bbox_inches='tight')
[!Insignificant findings] Detailed comparisons across traits and exposures for corals
# coral
plot_comparison(coral_selected, coral_unique, c_columns);
It would be interesting to further examine subsets of WH sites, for example by region and by criteria, with a view to answering questions such as: a) do biodiversity WH sites have more climate vulnerable species? b) do WH sites in Asian regions host more CV species?
type(amp_selected.crit)
pandas.core.series.Series
# try testing a regular expression with `pd.Series.str.match`
pattern = r'.*x.*' # crit containing the character `x` indicates a biodiversity WH site
test_matching = pd.concat([amp_selected.crit, amp_selected.crit.str.match(pattern)], axis=1)
# get a filter for biodiversity WH sites
amp_wh_bio_filter = amp_selected.crit.str.match(pattern)
bird_wh_bio_filter = bird_selected.crit.str.match(pattern)
coral_wh_bio_filter = coral_selected.crit.str.match(pattern)
# reduction in the number of WH sites
print('all WH', 'only bWH', 'non bWH')
print(amp_selected.wdpaid.unique().size, amp_selected[amp_wh_bio_filter].wdpaid.unique().size, amp_selected[~amp_wh_bio_filter].wdpaid.unique().size,)
print(bird_selected.wdpaid.unique().size, bird_selected[bird_wh_bio_filter].wdpaid.unique().size, bird_selected[~bird_wh_bio_filter].wdpaid.unique().size)
print(coral_selected.wdpaid.unique().size, coral_selected[coral_wh_bio_filter].wdpaid.unique().size)
all WH only bWH non bWH
150 116 34
184 147 37
26 23
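The biodiversity filter relies on the letter `x` appearing in the criteria string, which is the case for criteria (ix) and (x). The sketch below uses made-up `crit` values to show that `str.contains` gives the same answer as the `.*x.*` pattern with `str.match`, and how a missing value behaves in each case.

```python
import pandas as pd

# Toy criteria strings (hypothetical), including one missing value
crit = pd.Series(['(vii)(ix)(x)', '(vii)(viii)', '(ix)', None])

# str.match anchors at the start, hence the leading '.*'; missing stays missing
matched = crit.str.match(r'.*x.*')

# str.contains searches anywhere; na=False turns missing values into False
contains = crit.str.contains('x', na=False)

print(matched.tolist())
print(contains.tolist())
```

A mask without missing values is safer to negate with `~` and to use for boolean indexing, since a mask that contains missing values may raise an error in pandas.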
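# compare species in biodiversity WH sites ('_bio') with species in the other natural WH sites ('_wh');
# note that for corals the second group is all WH sites, not only the non-biodiversity ones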
amp_wh_bio = pd.merge(amp_selected[amp_wh_bio_filter][a_columns].apply(get_hlu).T, \
amp_selected[~amp_wh_bio_filter][a_columns].apply(get_hlu).T, \
left_index = True, right_index=True, suffixes=('_bio', '_wh'))
bird_wh_bio = pd.merge(bird_selected[bird_wh_bio_filter][b_columns].apply(get_hlu).T, \
bird_selected[~bird_wh_bio_filter][b_columns].apply(get_hlu).T, \
left_index = True, right_index=True, suffixes=('_bio', '_wh'))
coral_wh_bio = pd.merge(coral_selected[coral_wh_bio_filter][c_columns].apply(get_hlu).T, \
coral_selected[c_columns].apply(get_hlu).T, \
left_index = True, right_index=True, suffixes=('_bio', '_wh'))
def plot_comparison_sle_bio(wh_species, all_species, columns = ['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']):
    """<non-bio>, <bio WH sites>"""
    my_colors = ['#b2182b', '#ffffff', '#e0e0e0']
    col = len(columns)
    fig, axes = plt.subplots(col, figsize=(15, 2*col)) # width and height
    fig.subplots_adjust(hspace = 0.5)
    # get aggregated counts of H, L and U
    a = get_agg_result(wh_species, columns)
    b = get_agg_result(all_species, columns)
    for i, column in enumerate(columns):
        data = pd.concat([a[['H', 'L', 'U']].loc[column],
                          b[['H', 'L', 'U']].loc[column]], axis=1).T
        data.index = ['Other natural WH sites', 'Biodiversity WH sites']
        c = data.plot.barh(ax=axes[i], stacked=True, color=my_colors)
        axes[i].set_xlim([0, all_species.index.size*1.2])
        axes[i].set_title(column)
        # annotation: H count, total count, percentage of H
        for bar, y_axis in zip([a,b], range(2)):
            c.annotate('{0:.0f}'.format(bar.loc[column].H) + ', ' + '{0:.0f}'.format(bar.loc[column].H + bar.loc[column].L + bar.loc[column].U) +
                       ', ' + '{0:.1%}'.format(bar.loc[column].per_H),
                       (bar.loc[column].sum() + 10, y_axis))
    return fig
a = plot_comparison_sle_bio(amp_selected[~amp_wh_bio_filter], amp_selected[amp_wh_bio_filter]);
a.savefig('network_bio_amp.png', dpi=100, bbox_inches='tight')
amp_wh_bio
| | H_bio | L_bio | U_bio | per_H_bio | H_wh | L_wh | U_wh | per_H_wh |
---|---|---|---|---|---|---|---|---|
SUSC_A_Habitats | 215.0 | 1664.0 | 9.0 | 0.113877 | 5.0 | 119.0 | 1.0 | 0.040 |
SUSC_A_aquatic larvae | 293.0 | 1586.0 | 9.0 | 0.155191 | 37.0 | 87.0 | 1.0 | 0.296 |
SUSC_B_Temperature Range | 385.0 | 1502.0 | 1.0 | 0.203919 | 1.0 | 124.0 | 0.0 | 0.008 |
SUSC_B_Precipitation Range | 423.0 | 1464.0 | 1.0 | 0.224047 | 56.0 | 69.0 | 0.0 | 0.448 |
SUSC_C_explosive breeder | 151.0 | 1526.0 | 211.0 | 0.079979 | 11.0 | 103.0 | 11.0 | 0.088 |
SUSC_D_disease | 412.0 | 1476.0 | 0.0 | 0.218220 | 24.0 | 101.0 | 0.0 | 0.192 |
SENSITIVITY | 1260.0 | 563.0 | 65.0 | 0.667373 | 80.0 | 43.0 | 2.0 | 0.640 |
ADAPT_A_barriers | 174.0 | 1548.0 | 166.0 | 0.092161 | 9.0 | 113.0 | 3.0 | 0.072 |
ADAPT_A_dispersal_distance | 484.0 | 1385.0 | 19.0 | 0.256356 | 17.0 | 108.0 | 0.0 | 0.136 |
ADAPT_C_Slow_Gen_Turnpover | 584.0 | 532.0 | 772.0 | 0.309322 | 25.0 | 54.0 | 46.0 | 0.200 |
LOW_ADAPTABILITY | 957.0 | 922.0 | 9.0 | 0.506886 | 40.0 | 85.0 | 0.0 | 0.320 |
EXP_Sea Level | 0.0 | 1879.0 | 9.0 | 0.000000 | 0.0 | 124.0 | 1.0 | 0.000 |
EXP_MeanTemperature | 273.0 | 1614.0 | 1.0 | 0.144597 | 42.0 | 83.0 | 0.0 | 0.336 |
EXP_MeanRainfall | 500.0 | 1387.0 | 1.0 | 0.264831 | 3.0 | 122.0 | 0.0 | 0.024 |
EXP_AADTemperature | 201.0 | 1686.0 | 1.0 | 0.106462 | 11.0 | 114.0 | 0.0 | 0.088 |
EXP_AADRainfall | 364.0 | 1523.0 | 1.0 | 0.192797 | 7.0 | 118.0 | 0.0 | 0.056 |
EXPOSURE | 814.0 | 1070.0 | 4.0 | 0.431144 | 50.0 | 75.0 | 0.0 | 0.400 |
FINAL_SCORE | 291.0 | 1597.0 | 0.0 | 0.154131 | 10.0 | 115.0 | 0.0 | 0.080 |
a = plot_comparison_sle_bio(bird_selected[~bird_wh_bio_filter], bird_selected[bird_wh_bio_filter]);
a.savefig('network_bio_bird.png', dpi=100, bbox_inches='tight')
bird_wh_bio
| | H_bio | L_bio | U_bio | per_H_bio | H_wh | L_wh | U_wh | per_H_wh |
---|---|---|---|---|---|---|---|---|
FINAL_SCORE | 1208.0 | 4834.0 | 0.0 | 0.199934 | 128.0 | 744.0 | 0.0 | 0.146789 |
__hab_specialisation | 777.0 | 5259.0 | 6.0 | 0.128600 | 72.0 | 800.0 | 0.0 | 0.082569 |
__microhabitat | 644.0 | 5398.0 | 0.0 | 0.106587 | 59.0 | 813.0 | 0.0 | 0.067661 |
__ForestDependence | 1419.0 | 4622.0 | 1.0 | 0.234856 | 108.0 | 764.0 | 0.0 | 0.123853 |
__TemperatureRange | 1062.0 | 3779.0 | 1201.0 | 0.175770 | 6.0 | 622.0 | 244.0 | 0.006881 |
__PrecipRange | 1046.0 | 3795.0 | 1201.0 | 0.173121 | 368.0 | 260.0 | 244.0 | 0.422018 |
__Species dependence | 66.0 | 5976.0 | 0.0 | 0.010924 | 0.0 | 872.0 | 0.0 | 0.000000 |
__PopnSize | 348.0 | 1446.0 | 4248.0 | 0.057597 | 45.0 | 348.0 | 479.0 | 0.051606 |
__EffectivePopnSize | 493.0 | 1301.0 | 4248.0 | 0.081595 | 65.0 | 328.0 | 479.0 | 0.074541 |
SENSITIVITY | 3494.0 | 521.0 | 2027.0 | 0.578285 | 528.0 | 64.0 | 280.0 | 0.605505 |
__Dispersal distance limited | 1084.0 | 4958.0 | 0.0 | 0.179411 | 96.0 | 776.0 | 0.0 | 0.110092 |
__Dispersal barriers | 269.0 | 5773.0 | 0.0 | 0.044522 | 45.0 | 827.0 | 0.0 | 0.051606 |
__Low genetic diversity | 31.0 | 6011.0 | 0.0 | 0.005131 | 10.0 | 862.0 | 0.0 | 0.011468 |
__clutch size | 1657.0 | 2585.0 | 1800.0 | 0.274247 | 103.0 | 628.0 | 141.0 | 0.118119 |
__Gen_Length | 1512.0 | 4530.0 | 0.0 | 0.250248 | 218.0 | 654.0 | 0.0 | 0.250000 |
LOW_ADAPTABILITY | 3238.0 | 1669.0 | 1135.0 | 0.535915 | 338.0 | 423.0 | 111.0 | 0.387615 |
__EXP_Sea Inundation | 91.0 | 5945.0 | 6.0 | 0.015061 | 9.0 | 863.0 | 0.0 | 0.010321 |
__EXP_MeanTemperature | 967.0 | 3854.0 | 1221.0 | 0.160046 | 260.0 | 368.0 | 244.0 | 0.298165 |
__EXP_AADTemperature | 892.0 | 3929.0 | 1221.0 | 0.147633 | 52.0 | 576.0 | 244.0 | 0.059633 |
__EXP_MeanPrecip | 1092.0 | 3729.0 | 1221.0 | 0.180735 | 88.0 | 540.0 | 244.0 | 0.100917 |
__EXP_AADPrecip | 986.0 | 3835.0 | 1221.0 | 0.163191 | 84.0 | 544.0 | 244.0 | 0.096330 |
EXPOSURE | 2674.0 | 2156.0 | 1212.0 | 0.442569 | 379.0 | 250.0 | 243.0 | 0.434633 |
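Splitting the natural WH sites into biodiversity sites and other natural sites does not give a markedly different picture. For the aggregate sensitivity and exposure scores the percentages of H species are close: for amphibians SENSITIVITY is 66.7% vs 64.0% and EXPOSURE 43.1% vs 40.0%; for birds they are 57.8% vs 60.6% and 44.3% vs 43.5%. LOW_ADAPTABILITY and FINAL_SCORE are somewhat higher in biodiversity sites (FINAL_SCORE 15.4% vs 8.0% for amphibians and 20.0% vs 14.7% for birds).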
Check differences according to regions
# load regions
region = pd.read_csv('region.csv')
region.unesco_reg.unique()
array(['Africa', 'Arab States', 'Asia and the Pacific', 'Europe and North America', 'Latin America and the Caribbean'], dtype=object)
# convenience variables
regions_list = ['_africa', '_asia', '_euna', '_lac', '_arab']
regions_name_list = ['Africa', 'Asia and the Pacific', 'Europe and North America', 'Latin America and the Caribbean', 'Arab States']
# africa_ids = region[region.unesco_reg == 'Africa'].wdpaid
# asia_ids = region[region.unesco_reg == 'Asia and the Pacific'].wdpaid
# euna_ids = region[region.unesco_reg == 'Europe and North America'].wdpaid
# lac_ids = region[region.unesco_reg == 'Latin America and the Caribbean'].wdpaid
# arab_ids = region[region.unesco_reg == 'Arab States'].wdpaid
# more concise version using list comprehension
africa_ids, asia_ids, euna_ids, lac_ids, arab_ids = [region[region.unesco_reg == region_name].wdpaid for region_name in \
regions_name_list]
print([len(x) for x in [africa_ids, asia_ids, euna_ids, lac_ids, arab_ids]])
[41, 70, 71, 41, 6]
# a function that returns the aggregated result for a list of wdpaids
def get_agg_result_by_wdpaidlist(taxon_result, taxon_columns, filtered_wdpaids):
    # get a unique list of species for sites, based on the filtered wdpaids
    unique_taxon_result = taxon_result[taxon_result.wdpaid.isin(filtered_wdpaids)].groupby('id_no_int').first().reset_index()
    # apply the standard aggregation method
    return unique_taxon_result[taxon_columns].apply(get_hlu).T
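The de-duplication step in `get_agg_result_by_wdpaidlist` can be illustrated on toy data. The sketch below assumes a stand-in, `get_hlu_sketch`, that counts 'H', 'L' and 'U' values and reports `per_H = H / (H + L + U)`. This matches the numbers in the tables of this notebook, but it is not necessarily how the real `get_hlu` is written, and the toy `wdpaid` and `id_no_int` values are made up.

```python
import pandas as pd

def get_hlu_sketch(series):
    """Hypothetical stand-in for get_hlu: count H/L/U and the share of H."""
    counts = series.value_counts().reindex(['H', 'L', 'U'], fill_value=0).astype(float)
    total = counts.sum()
    counts['per_H'] = counts['H'] / total if total else float('nan')
    return counts

# toy intersection result: species 1 occurs in two sites of the same region
toy = pd.DataFrame({
    'wdpaid':      [10, 11, 10, 12],
    'id_no_int':   [1, 1, 2, 3],
    'FINAL_SCORE': ['H', 'H', 'L', 'U'],
})
region_ids = [10, 11]  # made-up site ids for one region

# keep rows in the region's sites, then keep one row per species
unique_species = (toy[toy.wdpaid.isin(region_ids)]
                  .groupby('id_no_int').first().reset_index())
summary = unique_species[['FINAL_SCORE']].apply(get_hlu_sketch).T
print(summary)
```

Without the `groupby('id_no_int').first()` step, species 1 would be counted once per site and inflate the H count for the region.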
# amp in wh sites by region
amp_africa, amp_asia, amp_euna, amp_lac, amp_arab = \
[get_agg_result_by_wdpaidlist(result_amp_f, a_columns, a_region) \
for a_region in [africa_ids, asia_ids, euna_ids, lac_ids, arab_ids]]
amp_regions = [amp_africa, amp_asia, amp_euna, amp_lac, amp_arab]
# bird in wh sites by region
bird_africa, bird_asia, bird_euna, bird_lac, bird_arab = \
[get_agg_result_by_wdpaidlist(result_bird_f, b_columns, a_region) \
for a_region in [africa_ids, asia_ids, euna_ids, lac_ids, arab_ids]]
bird_regions = [bird_africa, bird_asia, bird_euna, bird_lac, bird_arab]
# coral in wh sites by region
coral_africa, coral_asia, coral_euna, coral_lac, coral_arab = \
[get_agg_result_by_wdpaidlist(result_coral_f, c_columns, a_region) \
for a_region in [africa_ids, asia_ids, euna_ids, lac_ids, arab_ids]]
coral_regions = [coral_africa, coral_asia, coral_euna, coral_lac, coral_arab]
# join all the regional tables together
from functools import reduce
# use the reduce function to merge any given number of dfs; each item is a tuple so that the suffix travels with its df
# the first element is the df; the second is the region suffix (for the accumulated result it is a dummy value, 0)
amp_region_final = reduce(lambda left, right: (pd.merge(left[0], right[0], left_index=True, right_index=True, \
suffixes = ('', right[1])), 0), \
zip(amp_regions, regions_list),
(amp_selected[a_columns].apply(get_hlu).T, 0))[0]
bird_region_final = reduce(lambda left, right: (pd.merge(left[0], right[0], left_index=True, right_index=True, \
suffixes = ('', right[1])), 0), \
zip(bird_regions, regions_list),
(bird_selected[b_columns].apply(get_hlu).T, 0))[0]
coral_region_final = reduce(lambda left, right: (pd.merge(left[0], right[0], left_index=True, right_index=True, \
suffixes = ('', right[1])), 0), \
zip(coral_regions, regions_list),
(coral_selected[c_columns].apply(get_hlu).T, 0))[0]
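The `reduce` calls above merge one regional table at a time and attach a suffix per region. As a sketch of an alternative that may be easier to read, each table's columns can be renamed with `add_suffix` first, and all tables joined column-wise in a single `pd.concat` call. The helper name `join_with_suffixes` and the toy frames are hypothetical; `join='inner'` is used to mirror the default inner join of `pd.merge` on the index.

```python
import pandas as pd

def join_with_suffixes(base, tables, suffixes):
    """Join `tables` to `base` column-wise, suffixing each table's columns."""
    renamed = [table.add_suffix(suffix) for table, suffix in zip(tables, suffixes)]
    # keep only index labels present in every table, like an inner merge
    return pd.concat([base] + renamed, axis=1, join='inner')

# toy aggregated tables sharing the same index (made-up numbers)
index = ['SENSITIVITY', 'FINAL_SCORE']
base = pd.DataFrame({'H': [5.0, 2.0], 'L': [5.0, 8.0]}, index=index)
africa = pd.DataFrame({'H': [1.0, 0.0], 'L': [2.0, 3.0]}, index=index)
asia = pd.DataFrame({'H': [4.0, 2.0], 'L': [3.0, 5.0]}, index=index)

combined = join_with_suffixes(base, [africa, asia], ['_africa', '_asia'])
print(combined.columns.tolist())
```

The resulting column order (unsuffixed columns first, then one block per region) is the same as the one produced by the `reduce` version.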
amp_region_final[['per_H' + each for each in ['', '_africa', '_asia', '_euna', '_lac', '_arab']]]
| | per_H | per_H_africa | per_H_asia | per_H_euna | per_H_lac | per_H_arab |
---|---|---|---|---|---|---|
SUSC_A_Habitats | 0.109290 | 0.089524 | 0.067434 | 0.068966 | 0.165975 | 0.000000 |
SUSC_A_aquatic larvae | 0.163934 | 0.278095 | 0.139803 | 0.183908 | 0.095436 | 0.666667 |
SUSC_B_Temperature Range | 0.191754 | 0.241905 | 0.069079 | 0.000000 | 0.300138 | 0.000000 |
SUSC_B_Precipitation Range | 0.237953 | 0.184762 | 0.231908 | 0.839080 | 0.147994 | 1.000000 |
SUSC_C_explosive breeder | 0.080477 | 0.160000 | 0.059211 | 0.045977 | 0.052559 | 0.333333 |
SUSC_D_disease | 0.216592 | 0.019048 | 0.185855 | 0.126437 | 0.410788 | 0.000000 |
SENSITIVITY | 0.665673 | 0.636190 | 0.483553 | 0.885057 | 0.791148 | 1.000000 |
ADAPT_A_barriers | 0.090909 | 0.093333 | 0.144737 | 0.045977 | 0.053942 | 0.166667 |
ADAPT_A_dispersal_distance | 0.248882 | 0.264762 | 0.199013 | 0.143678 | 0.302905 | 0.000000 |
ADAPT_C_Slow_Gen_Turnpover | 0.302534 | 0.346667 | 0.250000 | 0.327586 | 0.302905 | 0.000000 |
LOW_ADAPTABILITY | 0.495281 | 0.533333 | 0.447368 | 0.442529 | 0.515906 | 0.166667 |
EXP_Sea Level | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
EXP_MeanTemperature | 0.156483 | 0.034286 | 0.197368 | 0.189655 | 0.219917 | 0.166667 |
EXP_MeanRainfall | 0.249876 | 0.268571 | 0.100329 | 0.028736 | 0.408022 | 0.333333 |
EXP_AADTemperature | 0.105315 | 0.045714 | 0.088816 | 0.017241 | 0.181189 | 0.000000 |
EXP_AADRainfall | 0.184302 | 0.100952 | 0.208882 | 0.074713 | 0.246196 | 0.333333 |
EXPOSURE | 0.429210 | 0.306667 | 0.417763 | 0.270115 | 0.576763 | 0.500000 |
FINAL_SCORE | 0.149528 | 0.091429 | 0.101974 | 0.097701 | 0.246196 | 0.166667 |
amp_region_final.loc['FINAL_SCORE']
H                301.000000
L               1712.000000
U                  0.000000
per_H              0.149528
H_africa          48.000000
L_africa         477.000000
U_africa           0.000000
per_H_africa       0.091429
H_asia            62.000000
L_asia           546.000000
U_asia             0.000000
per_H_asia         0.101974
H_euna            17.000000
L_euna           157.000000
U_euna             0.000000
per_H_euna         0.097701
H_lac            178.000000
L_lac            545.000000
U_lac              0.000000
per_H_lac          0.246196
H_arab             1.000000
L_arab             5.000000
U_arab             0.000000
per_H_arab         0.166667
Name: FINAL_SCORE, dtype: float64
bird_region_final.loc['FINAL_SCORE']
H               1336.000000
L               5578.000000
U                  0.000000
per_H              0.193231
H_africa         209.000000
L_africa        1410.000000
U_africa           0.000000
per_H_africa       0.129092
H_asia           457.000000
L_asia          2117.000000
U_asia             0.000000
per_H_asia         0.177545
H_euna           212.000000
L_euna           868.000000
U_euna             0.000000
per_H_euna         0.196296
H_lac            703.000000
L_lac           2120.000000
U_lac              0.000000
per_H_lac          0.249026
H_arab            71.000000
L_arab           226.000000
U_arab             0.000000
per_H_arab         0.239057
Name: FINAL_SCORE, dtype: float64
coral_region_final.loc['FINAL_SCORE']
H                111.000000
L                582.000000
U                 31.000000
per_H              0.153315
H_africa          20.000000
L_africa         281.000000
U_africa           4.000000
per_H_africa       0.065574
H_asia            60.000000
L_asia           543.000000
U_asia             5.000000
per_H_asia         0.098684
H_euna            43.000000
L_euna           395.000000
U_euna             4.000000
per_H_euna         0.097285
H_lac             30.000000
L_lac             54.000000
U_lac              1.000000
per_H_lac          0.352941
H_arab            12.000000
L_arab           197.000000
U_arab            22.000000
per_H_arab         0.051948
Name: FINAL_SCORE, dtype: float64
It is very difficult to find patterns in a table full of numbers like the above. Graphs may present a much better view in terms of the underlying differences.
# amp
# a = pd.concat([each[['H', 'L', 'U']][-1:] for each in amp_regions])
## a better way to slice columns and rows: .loc is label-based indexing
## .loc returns a series; concatenate on axis 1, transpose, then replace the index so the region names show up as labels
# get df for all regions, with HLU
a = pd.concat([each[['H', 'L', 'U']].loc['FINAL_SCORE'] for each in amp_regions], axis=1).T
a.index = regions_name_list
b = a.plot.barh(stacked=True, figsize=(15,5))
# for each bar, annotate num of 'H', total num and percentage of 'H'
for region_name, region_data, y_axis in zip(regions_name_list, amp_regions, range(5)):
    b.annotate('{0:.0f}'.format(a.loc[region_name].H) + ', ' + '{0:.0f}'.format(a.loc[region_name].sum()) + ', ' + '{0:.2%}'.format(region_data.loc['FINAL_SCORE'].per_H),
               (a.loc[region_name].sum() + 10, y_axis)) # add a buffer space for x
## find amphibians species in arab states WH sites
pd.set_option('display.max_columns', 60)
example = result_amp_f[result_amp_f.wdpaid.isin(arab_ids)]
example[example.FINAL_SCORE=='H']
| | SVP_ID | SIS_GAA_ID | GAA Family | Genus | Species | Fullname | Threatened | SUSC_A_Habitats | SUSC_A_aquatic larvae | SUSC_B_Temperature Range | SUSC_B_Precipitation Range | SUSC_C_explosive breeder | SUSC_D_disease | SENSITIVITY | ADAPT_A_barriers | ADAPT_A_dispersal_distance | ADAPT_C_Slow_Gen_Turnpover | LOW_ADAPTABILITY | EXP_Sea Level | EXP_MeanTemperature | EXP_MeanRainfall | EXP_AADTemperature | EXP_AADRainfall | EXPOSURE | FINAL_SCORE | Final_Pessimistic | Score_HorS | SensXLAdaptability | SensXExposure | LAdaptXExposure | Threatened_and_Vulnerable | Unnamed: 0 | wdpaid | id_no | areakm2_x | areakm2_y | per | en_name | fr_name | status_yr | rep_area | gis_area | country | crit | areakm2 | binomial | kingdom_name | phylum_name | class_name | order_name | family_name | genus_name | species_name | category | biome_marine | biome_freshwater | biome_terrestrial | id_no_int |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1132 | 3152 | 6187 | ALYTIDAE | Discoglossus | pictus | Discoglossus pictus | NaN | L | L | L | H | L | L | H | H | L | L | H | L | L | H | L | H | H | H | H | H | H | NaN | NaN | NaN | 126160 | 4322 | 55270.0 | 124.424076 | 124.424076 | 1.0 | Ichkeul National Park | Parc national de l'Ichkeul | 1980 | 126.0 | 124.424076 | TUN | (x) | 124.424076 | Discoglossus pictus | ANIMALIA | CHORDATA | AMPHIBIA | ANURA | ALYTIDAE | Discoglossus | pictus | LC | f | t | t | 55270.0 |
The Latin America and the Caribbean region contains the highest number of climate vulnerable amphibians (178 out of a total of 723, or 24.62%), compared with about 10% for WH sites in Africa, Asia and the Pacific, and Europe and North America. At the other end of the spectrum, WH sites in the Arab States region host very few amphibians (six species), and only one of them is climate vulnerable (Discoglossus pictus in Ichkeul National Park, based on the IUCN Red List data).
# bird
a = pd.concat([each[['H', 'L', 'U']].loc['FINAL_SCORE'] for each in bird_regions], axis=1).T
a.index = regions_name_list
b = a.plot.barh(stacked=True, figsize=(15,5))
# note: the y axis runs from bottom up; annotation coordinates are in data units
# for each bar, annotate num of 'H' and percentage of 'H'
for region_name, region_data, y_axis in zip(regions_name_list, bird_regions, range(5)):
    b.annotate('{0:.0f}'.format(a.loc[region_name].H) + ', ' + '{0:.2%}'.format(region_data.loc['FINAL_SCORE'].per_H),
               (a.loc[region_name].sum() + 10, y_axis)) # add a buffer space for x
Similarly, for birds in the global WH network, the Latin America and the Caribbean region has both the largest number of species (2,823) and the largest number of climate vulnerable species (703, or 24.90%). The proportion in the other regions ranges from 12.91% in Africa to 23.91% in the Arab States.
# coral
a = pd.concat([each[['H', 'L', 'U']].loc['FINAL_SCORE'] for each in coral_regions], axis=1).T
a.index = regions_name_list
b = a.plot.barh(stacked=True, figsize=(15,5))
# for each bar, annotate num of 'H' and percentage of 'H'
for region_name, region_data, y_axis in zip(regions_name_list, coral_regions, range(5)):
    b.annotate('{0:.0f}'.format(a.loc[region_name].H) + ', ' + '{0:.2%}'.format(region_data.loc['FINAL_SCORE'].per_H),
               (a.loc[region_name].sum() + 10, y_axis)) # add a buffer space for x
The above regional comparisons across amphibians, birds and corals indicate a large variation in the number and proportion of climate vulnerable species in the World Heritage network.
Need to apply the same methodology across all attributes, i.e. sensitivity, low-adaptability and each of the traits...
# enumerate to get index number, which is needed in the plot
for i, column in enumerate(a_columns):
print(i, column)
0 SUSC_A_Habitats
1 SUSC_A_aquatic larvae
2 SUSC_B_Temperature Range
3 SUSC_B_Precipitation Range
4 SUSC_C_explosive breeder
5 SUSC_D_disease
6 SENSITIVITY
7 ADAPT_A_barriers
8 ADAPT_A_dispersal_distance
9 ADAPT_C_Slow_Gen_Turnpover
10 LOW_ADAPTABILITY
11 EXP_Sea Level
12 EXP_MeanTemperature
13 EXP_MeanRainfall
14 EXP_AADTemperature
15 EXP_AADRainfall
16 EXPOSURE
17 FINAL_SCORE
# a function to plot all columns using stacked horizontal bar charts
def plot_all_attributes(taxon_regions, region_namelist, columns):
    my_colors = ['#b2182b', '#ffffff', '#e0e0e0']
    # create a figure with the correct number of axes
    col = len(columns)
    fig, axes = plt.subplots(nrows = col, figsize=(15, 5*col)) # width and height
    fig.subplots_adjust(hspace = 0.3)
    # for each axes, plot a column result for all regions, for comparisons
    for i, column in enumerate(columns):
        # a df of all regions HLUs
        a = pd.concat([each[['H', 'L', 'U']].loc[column] for each in taxon_regions], axis=1).T
        a.index = region_namelist
        # so it also works for col=1. If col=1, axes is not an array and thus does not support indexing
        if col>1:
            b = a.plot.barh(ax=axes[i], stacked=True, color=my_colors)
            axes[i].set_title(column)
        else:
            b = a.plot.barh(ax=axes, stacked=True, color=my_colors)
            axes.set_title(column)
        # add num of H, total num and percentage as annotation
        for region_name, region_data, y_axis in zip(region_namelist, taxon_regions, range(len(region_namelist))):
            b.annotate('{0:.0f}'.format(a.loc[region_name].H) + ', ' + '{0:.0f}'.format(a.loc[region_name].H + a.loc[region_name].L + a.loc[region_name].U) + \
                       ', ' + '{0:.2%}'.format(region_data.loc[column].per_H),
                       (a.loc[region_name].sum() + 10, y_axis))
    return fig
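The `if col>1` branch in `plot_all_attributes` exists because `plt.subplots` returns a single axes object rather than an array when only one subplot is requested. A possible simplification, sketched below on made-up data, is to pass `squeeze=False`, which always returns a 2D array of axes. The helper name `plot_stacked_rows` and the toy table are hypothetical, and the `Agg` backend line is only there so that the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

def plot_stacked_rows(tables, titles):
    """Draw one stacked horizontal bar chart per table, one subplot per row."""
    n = len(tables)
    # squeeze=False always returns a 2D array of axes, even when n == 1
    fig, axes = plt.subplots(nrows=n, ncols=1, figsize=(8, 3 * n), squeeze=False)
    for ax, table, title in zip(axes[:, 0], tables, titles):
        table.plot.barh(ax=ax, stacked=True)
        ax.set_title(title)
    return fig

# toy H/L/U counts for two made-up regions
toy = pd.DataFrame({'H': [1, 3], 'L': [4, 2], 'U': [0, 1]},
                   index=['Region A', 'Region B'])
fig_one = plot_stacked_rows([toy], ['FINAL_SCORE'])
fig_two = plot_stacked_rows([toy, toy], ['FINAL_SCORE', 'SENSITIVITY'])
print(len(fig_one.axes), len(fig_two.axes))
```

With this approach the single-column and multi-column cases share one code path, at the cost of indexing the axes array as `axes[:, 0]`.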
# need to add `;` to suppress the automatic behaviour that plots the same graph twice
a = plot_all_attributes(amp_regions, regions_name_list, ['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']);
a.savefig('region_amp.png', dpi=100, bbox_inches='tight')
Detailed regional comparisons across traits and exposures for amphibians
# need to add `;` to suppress the automatic behaviour that plots the same graph twice
plot_all_attributes(amp_regions, regions_name_list, a_columns);
# test for single column and it also works now
plot_all_attributes(amp_regions, regions_name_list, ['FINAL_SCORE']);
a = plot_all_attributes(bird_regions, regions_name_list, ['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']);
a.savefig('region_bird.png', dpi=100, bbox_inches='tight')
Detailed regional comparisons across traits and exposures for birds
plot_all_attributes(bird_regions, regions_name_list, b_columns);
plot_all_attributes(coral_regions, regions_name_list, c_columns);
Sensitivity for corals looks suspicious: do nearly all species across all regions have high sensitivity?
(coral_unique.SENSITIVITY == 'H').sum(), coral_unique.index.size
(796, 797)
For a given taxon, a WH site is labelled 'highly vulnerable' when it falls in the top 25% of sites for both the total number and the percentage of highly vulnerable species.
# threshold value according to the number of H species
thres_total = [np.percentile(each[each.level_2 == 'FINAL_SCORE'].H, 75) for each in [amp_vv, bird_vv, coral_vv]]
# threshold value according to the percentage of H in relation to total HLU
thres_per = [np.percentile(each[each.level_2 == 'FINAL_SCORE'].per_H, 75) for each in [amp_vv, bird_vv, coral_vv]]
thres_per, thres_total
([0.14285714285714285, 0.28225806451612906, 0.14732142857142858], [1.0, 47.0, 27.0])
# the number of sites as 'highly vulnerable' for amp
((amp_vv[amp_vv.level_2=='FINAL_SCORE'].H > thres_total[0]) & (amp_vv[amp_vv.level_2=='FINAL_SCORE'].per_H > thres_per[0])).sum()
30
(amp_vv[amp_vv.level_2=='FINAL_SCORE'].H > thres_total[0]).sum(), (amp_vv[amp_vv.level_2=='FINAL_SCORE'].per_H > thres_per[0]).sum()
(42, 47)
# amp, bird, coral number of cv sites
[((each[each.level_2=='FINAL_SCORE'].H > thres_total[i]) & (each[each.level_2=='FINAL_SCORE'].per_H > thres_per[i])).sum() for i, each in enumerate([amp_vv, bird_vv, coral_vv])]
[30, 28, 1]
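The thresholding rule used above can be sketched on toy data. The site ids and values below are made up; only the rule itself (strictly above the 75th percentile on both the number and the percentage of H species) is taken from the cells above.

```python
import numpy as np
import pandas as pd

# toy per-site summary: number (H) and share (per_H) of highly vulnerable species
sites = pd.DataFrame({
    'wdpaid': [1, 2, 3, 4, 5, 6, 7, 8],
    'H':      [0, 1, 2, 3, 5, 8, 13, 21],
    'per_H':  [0.00, 0.05, 0.30, 0.10, 0.15, 0.20, 0.40, 0.35],
})

# 75th percentile of each measure
thres_h = np.percentile(sites.H, 75)
thres_p = np.percentile(sites.per_H, 75)

# a site is flagged only if it is strictly above both thresholds
flagged = sites[(sites.H > thres_h) & (sites.per_H > thres_p)]
print(thres_h, thres_p, flagged.wdpaid.tolist())
```

Because both conditions must hold, the flagged set can be smaller than the top quarter on either measure alone, which is consistent with 30 amphibian sites passing both tests against 42 and 47 passing each one separately.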