Working with Digital Herbarium using Python

The Python programming language is a popular tool for communicating with a computer. It is a general-purpose language widely used not only in software development but also in scientific computing, data analysis and engineering.

Among the reasons for Python's popularity are programmer productivity, code readability and simplicity, as well as dynamic typing, support for object-oriented programming and cross-platform portability.

Python's syntax encourages readable code. Blocks of a program are delimited by indentation rather than by braces; by convention, each level is indented by four spaces. As a result, the layout of the text mirrors the structure of the program. PEP 8, the official style guide for Python code, gives further recommendations for writing readable and easily understandable Python code.

Reproducing the computations below requires Python 3.5+ installed on your operating system, along with several Python packages (e.g. Pandas and NumPy). To work with spatially distributed data, you will also need to install the pyshp package (I/O operations with ESRI shapefiles) and the shapely package (geometric primitives and their interactions).
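Before running anything, it can help to check that the environment meets these requirements. The sketch below is one way to do it. It assumes the packages are importable as pandas, numpy, shapefile (the import name of pyshp) and shapely. The helper missing_modules is a hypothetical name introduced here and is not part of any package.

```python
import importlib.util
import sys

# Minimum Python version assumed by this document
REQUIRED_PYTHON = (3, 5)

# Import names of the packages mentioned in the text
# (pyshp is imported as `shapefile`)
REQUIRED_MODULES = ['pandas', 'numpy', 'shapefile', 'shapely']


def missing_modules(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    if sys.version_info < REQUIRED_PYTHON:
        print('Python {}.{}+ is required'.format(*REQUIRED_PYTHON))
    missing = missing_modules(REQUIRED_MODULES)
    print('Missing packages:', ', '.join(missing) if missing else 'none')
```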

Windows users can start with the Anaconda distribution. It is designed to provide a convenient toolset for scientific computing and data analysis on top of Python, so it is a good starting point for building a computational environment for your research.

Executable code is given in blocks starting with In[xxx]. These blocks can be executed either in the Jupyter interactive environment or by copying them into a separate text file (with a .py extension) and running it with the Python interpreter. In a script, bare expressions such as search_request_url must be wrapped in print() to display their values.

This document was created with Jupyter, an interactive environment that allows mixing code and nicely formatted text.

Making simple queries

First, we need to prepare the computational environment. Its most important component is the Pandas package, which should be installed before any manipulation of the data. Pandas provides a convenient data container, the DataFrame class. A DataFrame supports sophisticated data selection and basic input/output (I/O) operations.

We assume that Pandas is already installed, so let's import it.

In [1]:
import pandas as pd  # By convention, pandas is imported under the alias `pd`.

Let's define a variable called HERBARIUM_SEARCH_URL. It points to the URI to which HTTP search queries will be sent. This URI is intended to be permanent, so it is unlikely to change in the near future.

Note: One can use the HTTPS protocol instead. In this case, just replace http with https in HERBARIUM_SEARCH_URL.

In [2]:
HERBARIUM_SEARCH_URL = 'http://botsad.ru/hitem/json/'

It is worth noting that naming variables is a very important part of any programming process. One might think it is unimportant and use shorthand names instead, assuming everything will be clear, at least to oneself. That is not a good reason, even if you are writing code just for yourself. Best practice is to choose names that are clear to everyone on the one hand, and as short as possible on the other.

Let us organize the search parameters as a tuple of (name, value) pairs, as follows. A full description of all available search parameters is given in the HTTP API documentation.

In [3]:
search_parameters_set = (('collectedby', 'Bakalin'),
                         ('identifiedby', 'Bakalin'),
                         ('colstart', '2016-01-01'),
                         ('colend', '2016-12-30')
                         )

Let's import the functions needed to make HTTP requests. Python's ecosystem provides many tools for this. One can use third-party packages or the modules included in Python's standard library (following Python's "batteries included" philosophy).

In [4]:
try:
    # Python 3.x
    from urllib.parse import quote
    from urllib.request import urlopen
except ImportError:
    # Python 2.x
    from urllib import quote
    from urllib import urlopen

Now we can use the variables defined above to compose the search URI (according to the HTTP API rules):

In [5]:
# Percent-encode each value and join the name=value pairs with '&'
search_request_url = HERBARIUM_SEARCH_URL + '?' + '&'.join(
    name + '=' + quote(value.strip()) for name, value in search_parameters_set)
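As an alternative sketch, the standard library function urllib.parse.urlencode can do the joining and percent-encoding in one call. Passing quote_via=quote (available since Python 3.5) makes it encode spaces as %20, as quote does, instead of the default '+'. One difference remains: urlencode calls quote with safe='', so a '/' inside a value is encoded as well.

```python
from urllib.parse import quote, urlencode

HERBARIUM_SEARCH_URL = 'http://botsad.ru/hitem/json/'
search_parameters_set = (('collectedby', 'Bakalin'),
                         ('identifiedby', 'Bakalin'),
                         ('colstart', '2016-01-01'),
                         ('colend', '2016-12-30'))

# urlencode() percent-encodes every name and value and joins the pairs with '&'
query_string = urlencode([(name, value.strip()) for name, value in search_parameters_set],
                         quote_via=quote)
search_request_url = HERBARIUM_SEARCH_URL + '?' + query_string
print(search_request_url)
```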

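According to the URI specification, non-ASCII characters in a URI must be percent-encoded: each byte is written as '%' followed by two hexadecimal digits. If your query includes non-ASCII characters (e.g. Cyrillic names), the helper function quote performs this encoding, which is why each parameter value is passed through it.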
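For example, take the Cyrillic surname Крестов, which appears in the queries later in this document. quote encodes it byte by byte from its UTF-8 representation, and unquote reverses the transformation:

```python
from urllib.parse import quote, unquote

surname = 'Крестов'
encoded = quote(surname)  # each UTF-8 byte is written as %XX
print(encoded)  # %D0%9A%D1%80%D0%B5%D1%81%D1%82%D0%BE%D0%B2

# unquote() restores the original string
assert unquote(encoded) == surname
```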

For simple queries, one can assign search_request_url directly, e.g.:

search_request_url = 'http://botsad.ru/hitem/json/?collectedby=bakalin'

Which way to choose is up to you, but good scripting practice is to build search_request_url in separate steps. Keeping the parameters in search_parameters_set makes your code more structured and organized. This pays off when you compose complicated queries or run a series of consecutive queries with different parameters, e.g.:

list_of_search_pars = [search_parameters_set1, search_parameters_set2, search_parameters_set3,]

list_of_search_urls = [search_request_url1, search_request_url2, search_request_url3, ]
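The two lists above are schematic. A runnable sketch could wrap the URL-building expression in a function and apply it to each parameter set. The function name build_search_url and the per-year parameter sets below are hypothetical examples.

```python
from urllib.parse import quote

HERBARIUM_SEARCH_URL = 'http://botsad.ru/hitem/json/'


def build_search_url(parameters, base_url=HERBARIUM_SEARCH_URL):
    """Compose a search URL from a sequence of (name, value) pairs."""
    return base_url + '?' + '&'.join(name + '=' + quote(value.strip())
                                     for name, value in parameters)


# Hypothetical parameter sets: one query per collection year
list_of_search_pars = [
    (('collectedby', 'Bakalin'), ('colstart', '2015-01-01'), ('colend', '2015-12-31')),
    (('collectedby', 'Bakalin'), ('colstart', '2016-01-01'), ('colend', '2016-12-31')),
]
list_of_search_urls = [build_search_url(pars) for pars in list_of_search_pars]
for url in list_of_search_urls:
    print(url)
```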

As a sanity check, one can print the current value of search_request_url.

In [6]:
search_request_url
Out[6]:
'http://botsad.ru/hitem/json/?collectedby=Bakalin&identifiedby=Bakalin&colstart=2016-01-01&colend=2016-12-30'

Now we are ready to send the search request to the server. Loading the data and converting it into a Python dictionary takes the following four lines of code:

In [7]:
import json
server_response = urlopen(search_request_url)
data = json.loads(server_response.read().decode('utf-8'))
server_response.close()
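The same steps can be wrapped in a reusable function. In this sketch, fetch_json is a hypothetical name and the 30-second timeout is an arbitrary choice. The with statement closes the connection even if reading or decoding fails. The timeout keeps an unresponsive server from blocking the script indefinitely. The with form requires Python 3, which this document assumes anyway.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=30):
    """Download `url` and decode its UTF-8 JSON body into Python objects."""
    # The connection is closed when the `with` block exits, even on errors;
    # `timeout` is given in seconds.
    with urlopen(url, timeout=timeout) as server_response:
        return json.loads(server_response.read().decode('utf-8'))
```

With this helper, cell 7 reduces to `data = fetch_json(search_request_url)`.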

The variable data now holds a Python dictionary whose fields are described in the official API documentation.

Before processing the data, check the errors and warnings fields. If everything went fine, errors is empty; warnings is usually empty too, but may contain non-critical messages.

In [8]:
data['errors'], data['warnings']
Out[8]:
([], [])

Both lists are empty, so the data were loaded successfully, and we can start a typical data analysis with Pandas.
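This check can also be automated. The sketch below assumes the response layout shown above, a dictionary with errors, warnings and data keys. The helper name check_response is hypothetical. The format of the individual messages is not described here, so they are simply converted to strings.

```python
import warnings


def check_response(data):
    """Return the records from a decoded server response.

    Raises RuntimeError if the server reported errors; server warnings
    are re-emitted as Python warnings.
    """
    if data['errors']:
        raise RuntimeError('Search request failed: {}'.format(data['errors']))
    for message in data['warnings']:
        warnings.warn(str(message))
    return data['data']
```

For example, `records = check_response(data)` either returns the list of records or stops the script with an informative error.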

In [9]:
print("The number of obtained records is ", len(data['data']))
The number of obtained records is  179

Now, data['data'] is a list of records, each represented by a Python dictionary. Dictionaries are convenient containers for structured data, but a Pandas DataFrame is better suited for analysis. Converting the list into a DataFrame is quite simple:

In [10]:
search_df = pd.DataFrame(data['data'])
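Here is a small offline sketch of what the conversion does. It uses hypothetical records that mimic a few fields of data['data']. Each dictionary becomes a row and each key becomes a column. A key missing from a record produces a NaN value in that row.

```python
import pandas as pd

# Hypothetical records shaped like the items of data['data']
records = [
    {'id': 1, 'genus': 'Porella', 'altitude': '1900'},
    {'id': 2, 'genus': 'Riccardia', 'altitude': '1700-1900'},
    {'id': 3, 'genus': 'Aneura'},  # no 'altitude' key in this record
]

df = pd.DataFrame(records)  # one row per dictionary, one column per key
print(df.shape)  # (3, 3)
print(df)        # the altitude of record 3 is shown as NaN
```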

search_df is an instance of the DataFrame class, which has many helper methods for inspecting data. For example, the .info() and .describe() methods give a general overview of the data.

In [11]:
search_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 179 entries, 0 to 178
Data columns (total 41 columns):
acronym                     179 non-null object
additionals                 179 non-null object
altitude                    179 non-null object
branch                      179 non-null object
collection_finished         179 non-null object
collection_started          179 non-null object
collectors                  179 non-null object
country                     179 non-null object
country_id                  177 non-null float64
created                     179 non-null object
details                     179 non-null object
dethistory                  179 non-null object
devstage                    179 non-null object
district                    179 non-null object
family                      179 non-null object
family_authorship           179 non-null object
fieldid                     179 non-null object
genus                       179 non-null object
genus_authorship            179 non-null object
gpsbased                    179 non-null bool
id                          179 non-null int64
identification_finished     179 non-null object
identification_started      179 non-null object
identifiers                 179 non-null object
images                      179 non-null object
infraspecific_authorship    179 non-null object
infraspecific_epithet       179 non-null object
infraspecific_rank          179 non-null object
itemcode                    179 non-null object
latitude                    179 non-null float64
longitude                   179 non-null float64
note                        179 non-null object
region                      179 non-null object
short_note                  179 non-null object
significance                179 non-null object
species_authorship          179 non-null object
species_epithet             179 non-null object
species_fullname            179 non-null object
species_id                  179 non-null int64
species_status              179 non-null object
updated                     179 non-null object
dtypes: bool(1), float64(3), int64(2), object(35)
memory usage: 56.2+ KB

One can also inspect the contents of the DataFrame:

In [12]:
search_df
Out[12]:
acronym additionals altitude branch collection_finished collection_started collectors country country_id created ... note region short_note significance species_authorship species_epithet species_fullname species_id species_status updated
0 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... SaPa, Phan Xi Pan National Park. Lao Cai Province (Austin) Stephani subciliata Pallavicinia subciliata (Austin) Stephani 420693 Approved 2017-06-13
1 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... SaPa, Phan Xi Pan National Park[Vietnam] Lao Cai Province (Austin) Stephani subciliata Pallavicinia subciliata (Austin) Stephani 420693 Approved 2017-06-13
2 VBGI [{'species_authorship': '', 'infraspecific_aut... 1700-1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien Son National ... Lai Chau Province Furuki flavovirens Riccardia flavovirens Furuki 24180 Approved 2017-06-26
3 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province Furuki pumila Riccardia pumila Furuki 24296 From plantlist 2017-07-07
4 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin NaN 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province (Lehm. & Lindenb.) Trevis. campylophylla Porella campylophylla (Lehm. & Lindenb.) Trevis. 588613 Approved 2017-07-14
5 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin NaN 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province (Lehm. & Lindenb.) Trevis. acutifolia Porella acutifolia (Lehm. & Lindenb.) Trevis. 470689 Approved 2017-07-11
6 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province sp. Asterella sp. 588815 Approved 2017-08-14
7 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province (Lehm. & Lindenb.) Trevis. acutifolia Porella acutifolia (Lehm. & Lindenb.) Trevis. 470689 Approved 2017-07-11
8 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province (Lehm. & Lindenb.) Trevis. campylophylla Porella campylophylla (Lehm. & Lindenb.) Trevis. 588613 Approved 2017-07-14
9 VBGI [{'species_authorship': '(Scop.) Nees', 'infra... 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien Son National ... Lai Chau Province (L.) Dumort. pinguis Aneura pinguis (L.) Dumort. 24020 Approved 2017-06-13
10 VBGI [] 1700-1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien Son National ... Lai Chau Province Furuki flavovirens Riccardia flavovirens Furuki 24180 Approved 2017-06-26
11 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province sp. Marchantia sp. 589009 Approved 2017-08-08
12 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... SaPa, Phan Xi Pan National Park Lao Cai Province sp. Preissia sp. 590106 Approved 2017-06-13
13 VBGI [] 1700-1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien Son National ... Lai Chau Province Furuki flavovirens Riccardia flavovirens Furuki 24180 Approved 2017-06-26
14 VBGI [{'species_authorship': '(Grolle & Váňa) Váňa ... 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province Furuki pumila Riccardia pumila Furuki 24296 From plantlist 2017-07-07
15 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park... Lai Chau Province Furuki pumila Riccardia pumila Furuki 24296 From plantlist 2017-07-07
16 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range. Hoang Lien Son National ... Lai Chau Province (S. Hatt.) Furuki yakusimensis Lobatiriccardia yakusimensis (S. Hatt.) Furuki 24100 Approved 2017-06-21
17 VBGI [] 1900 Bryophyte herbarium 2016-03-15 V.A. Bakalin Vietnam 134.0 2017-04-07 ... SaPa, Phan Xi Pan National Park Lao Cai Province (Thunb.) Grolle japonicum Conocephalum japonicum (Thunb.) Grolle 184206 From plantlist 2017-06-13
18 VBGI [] 2100 Bryophyte herbarium 2016-03-16 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, border of the Hoang Lien... Lai Chau Province (Austin) Stephani subciliata Pallavicinia subciliata (Austin) Stephani 420693 Approved 2017-06-13
19 VBGI [] 2100 Bryophyte herbarium 2016-03-16 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, border of the Hoang Lien... Lai Chau Province Schiffn. consanguinea Metzgeria consanguinea Schiffn. 589938 Approved 2017-06-13
20 VBGI [] 2100 Bryophyte herbarium 2016-03-16 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, border of the Hoang Lien... Lai Chau Province Mitt. crispula Calycularia crispula Mitt. 13454 Approved 2017-06-13
21 VBGI [] 2100 Bryophyte herbarium 2016-03-16 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province (Lehm. & Lindenb.) Trevis. acutifolia Porella acutifolia (Lehm. & Lindenb.) Trevis. 470689 Approved 2017-07-11
22 VBGI [] 2100 Bryophyte herbarium 2016-03-16 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Hoang Lien Son Range, Hoang Lien National Park. Lai Chau Province Furuki pumila Riccardia pumila Furuki 24296 From plantlist 2017-07-07
23 VBGI [] 3143 Bryophyte herbarium 2016-03-17 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Sa Pa District, San Sa Ho Communne, Hoang Lien... Lao Cai Province (Schiffn.) Steph. maxima Aneura maxima (Schiffn.) Steph. 589001 Approved 2017-06-13
24 VBGI [] 3143 Bryophyte herbarium 2016-03-17 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Sa Pa District, San Sa Ho Comunne, Hoang Lien ... Lao Cai Province gem. (Mitt.) Konstant. setosa Schistochilopsis setosa (Mitt.) Konstant. 589942 Approved 2017-11-07
25 VBGI [] 3143 Bryophyte herbarium 2016-03-17 V.A. Bakalin Vietnam 134.0 2017-04-07 ... SaPa, Phan Xi Pan National Park, around the peak. Lao Cai Province Schiffn. consanguinea Metzgeria consanguinea Schiffn. 589938 Approved 2017-06-13
26 VBGI [] 3143 Bryophyte herbarium 2016-03-17 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Sa Pa District, San Sa Ho Comunne, Hoang Lien ... Lao Cai Province sp. Aneura sp. 589627 Approved 2017-06-13
27 VBGI [{'species_authorship': 'S. Hatt.', 'infraspec... 3143 Bryophyte herbarium 2016-03-17 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Sa Pa District, San Sa Ho Comunne, Hoang Lien ... Lao Cai Province (Mitt.) Konstant. setosa Schistochilopsis setosa (Mitt.) Konstant. 589942 Approved 2017-11-07
28 VBGI [] 1350 Bryophyte herbarium 2016-03-18 V.A. Bakalin Vietnam 134.0 2017-04-07 ... SaPa, Ta Phin area. Lao Cai Province (Austin) Stephani subciliata Pallavicinia subciliata (Austin) Stephani 420693 Approved 2017-06-13
29 VBGI [] 1350 Bryophyte herbarium 2016-03-18 V.A. Bakalin Vietnam 134.0 2017-04-07 ... Sa Pa District, Ta Phin Commune, Hoang Lien So... Lao Cai Province (Taylor) Trevis. obtusata Porella obtusata (Taylor) Trevis. var. macrolo... 590629 Approved 2017-08-02
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
149 VBGI [] 81 Bryophyte herbarium 2016-10-16 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-05-12 ... Khasansky District, Kravtsovka Stream Valley. Дальний Восток|Russian Far East spor. (Mont.) Grolle leptophylla Asterella leptophylla (Mont.) Grolle 62971 Approved 2017-08-14
150 VBGI [] 810 Bryophyte herbarium 2016-10-15 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-05-23 ... Shkotovsky District, Northern slope of Falaza ... Дальний Восток|Russian Far East Lindb. & Arnell laxa Calycularia laxa Lindb. & Arnell 588423 Approved 2017-06-13
151 VBGI [{'species_authorship': 'Szweyk., Buczk. & Odr... 124 Bryophyte herbarium 2016-10-16 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-05-25 ... Khasansky District, Kravtsovka Stream Valley. Дальний Восток|Russian Far East (Steph.) R.M. Schust. & Inoue erimonus Hattorianthus erimonus (Steph.) R.M. Schust. &... 590071 Approved 2017-06-13
152 VBGI [{'species_authorship': 'Spruce', 'infraspecif... 571 Bryophyte herbarium 2016-10-15 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-07-07 ... Shkotovsky District, Northern slope of Falaza ... Дальний Восток|Russian Far East (Hedw.) Carruth. palmata Riccardia palmata (Hedw.) Carruth. 24263 Approved 2017-07-07
153 VBGI [] 35 Bryophyte herbarium 2016-10-16 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-07-12 ... Nadezhdensky District, Razdol’naya River Valley. Дальний Восток|Russian Far East (Steph.) S. Hatt. caespitans Porella caespitans (Steph.) S. Hatt. 588551 Approved 2017-07-12
154 VBGI [] 571 Bryophyte herbarium 2016-10-15 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-07-24 ... Shkotovsky District, Northern slope of Falaza ... Дальний Восток|Russian Far East (Steph.) S. Hatt. fauriei Porella fauriei (Steph.) S. Hatt. 588934 Approved 2017-07-24
155 VBGI [] 35 Bryophyte herbarium 2016-10-16 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-07-27 ... Nadezhdinsky District, Razdol'naya River Valley Дальний Восток|Russian Far East Lindb. grandiloba Porella grandiloba Lindb. 588822 Approved 2017-07-27
156 VBGI [] 90 Bryophyte herbarium 2016-04-28 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Barabash Area Дальний Восток|Russian Far East ant., arch. (L.) Raddi hemisphaerica Reboulia hemisphaerica (L.) Raddi subsp. hemis... 590127 Approved 2017-09-15
157 VBGI [] 90 Bryophyte herbarium 2016-04-28 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Barabash Area Дальний Восток|Russian Far East Lindb. vernicosa Porella vernicosa Lindb. 588792 Approved 2017-09-15
158 VBGI [{'species_authorship': '(Stephani) S. Hatt.',... 90 Bryophyte herbarium 2016-04-28 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Barabash Area Дальний Восток|Russian Far East Steph. taradakensis Frullania taradakensis Steph. 264954 Approved 2017-09-15
159 VBGI [] 90 Bryophyte herbarium 2016-04-28 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Barabash Area Дальний Восток|Russian Far East (Stephani) S. Hatt. chinensis Porella chinensis (Stephani) S. Hatt. 588836 Approved 2017-09-15
160 VBGI [{'species_authorship': '(L.) Dumort.', 'infra... 12 Bryophyte herbarium 2016-04-28 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Romashka Area Дальний Восток|Russian Far East L. fluitans Riccia fluitans L. 497877 From plantlist 2017-09-15
161 VBGI [] 12 Bryophyte herbarium 2016-04-28 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Romashka Area Дальний Восток|Russian Far East (L.) Dumort. pinguis Aneura pinguis (L.) Dumort. 24020 Approved 2017-09-15
162 VBGI [] 12 Bryophyte herbarium 2016-04-28 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Romashka Area Дальний Восток|Russian Far East (Nees) Nees irrigua Scapania irrigua (Nees) Nees 550233 Approved 2017-09-15
163 VBGI [] 12 Bryophyte herbarium 2016-04-28 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Romashka Area Дальний Восток|Russian Far East Steph. otaruensis Cephalozia otaruensis Steph. 588535 Approved 2017-09-15
164 VBGI [] 150 Bryophyte herbarium 2016-04-29 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Mramornaya Mt. Дальний Восток|Russian Far East ant., arch. (L.) Raddi hemisphaerica Reboulia hemisphaerica (L.) Raddi subsp. hemis... 590127 Approved 2017-09-15
165 VBGI [] 150 Bryophyte herbarium 2016-04-29 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Mramornaya Mt. Дальний Восток|Russian Far East sp. Riccia sp. 588829 Approved 2017-09-15
166 VBGI [] 150 Bryophyte herbarium 2016-04-29 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Mramornaya Mt. Дальний Восток|Russian Far East sp. Riccia sp. 588829 Approved 2017-09-15
167 VBGI [] 150 Bryophyte herbarium 2016-04-29 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Mramornaya Mt. Дальний Восток|Russian Far East per. juv. Mitt. infusca Plectocolea infusca Mitt. var. recondita Bakalin 590883 Approved 2017-09-15
168 VBGI [{'species_authorship': '(A. Evans) Schljakov'... 150 Bryophyte herbarium 2016-04-29 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Mramornaya Mt. Дальний Восток|Russian Far East per. (Steph.) Inoue truncatum Pedinophyllum truncatum (Steph.) Inoue 588820 Approved 2017-09-15
169 VBGI [] 150 Bryophyte herbarium 2016-04-29 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Mramornaya Mt. Дальний Восток|Russian Far East (DC.) Steph. autumnalis Jamesoniella autumnalis (DC.) Steph. 590059 Approved 2017-09-15
170 VBGI [{'species_authorship': '(Gottsche) Mizut.', '... 150 Bryophyte herbarium 2016-04-29 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Mramornaya Mt. Дальний Восток|Russian Far East Steph. taradakensis Frullania taradakensis Steph. 264954 Approved 2017-09-14
171 VBGI [{'species_authorship': '(Steph.) Inoue', 'inf... 150 Bryophyte herbarium 2016-04-29 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Mramornaya Mt. Дальний Восток|Russian Far East ant., arch., per. (Sm.) Schiffn. divaricata Cephaloziella divaricata (Sm.) Schiffn. 588472 Approved 2017-09-14
172 VBGI [] 150 Bryophyte herbarium 2016-04-29 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Mramornaya Mt. Дальний Восток|Russian Far East (Gottsche) Mizut. sandvicensis Trocholejeunea sandvicensis (Gottsche) Mizut. 590069 Approved 2017-09-14
173 VBGI [] 170 Bryophyte herbarium 2016-04-30 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Gryaznaya River Basin Дальний Восток|Russian Far East Stotler & Crotz azurea Calypogeia azurea Stotler & Crotz 101182 Approved 2017-09-14
174 VBGI [] 170 Bryophyte herbarium 2016-04-30 V.A. Bakalin Russia 162.0 2017-08-22 ... Khasansky District, Gryaznaya River Basin Дальний Восток|Russian Far East (Schrad.) Hazsl. rivularis Chiloscyphus rivularis (Schrad.) Hazsl. 333777 Approved 2017-09-14
175 VBGI [] 35 Bryophyte herbarium 2016-10-16 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-09-24 ... Nadezhdinsky District, Razdol'naya River Valley Дальний Восток|Russian Far East ant. Rev. by Yu.S. Mamontov 24.06.2017: Ok! Steph. muscicola Frullania muscicola Steph. 264739 Approved 2017-09-24
176 VBGI [] 639 Bryophyte herbarium 2016-10-15 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-10-30 ... Shkotovsky District, northern slope of Falaza ... Дальний Восток|Russian Far East per. (Hook.) Gray taylorii Mylia taylorii (Hook.) Gray 590048 Approved 2017-10-30
177 VBGI [] 81 Bryophyte herbarium 2016-10-16 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-10-30 ... Khasansky District, Kravtsovka Stream Valley Дальний Восток|Russian Far East gem. Bertol. paleacea Marchantia paleacea Bertol. 350001 Approved 2017-10-30
178 VBGI [] 35 Bryophyte herbarium 2016-10-16 K.G. Klimova & V.A. Bakalin Russia 162.0 2017-10-30 ... Deciduous forest (Quercus mongolica) on the ri... Дальний Восток|Russian Far East per., ant. A. Evans ornata Cololejeunea ornata A. Evans 325431 Approved 2017-10-30

179 rows × 41 columns

DataFrames also make complex data filtering straightforward.

Let's filter our dataset, keeping only the records with a defined (i.e. non-empty) altitude:

In [13]:
altitude_only = search_df[search_df.altitude != ''].copy()
In [14]:
altitude_only.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 179 entries, 0 to 178
Data columns (total 41 columns):
acronym                     179 non-null object
additionals                 179 non-null object
altitude                    179 non-null object
branch                      179 non-null object
collection_finished         179 non-null object
collection_started          179 non-null object
collectors                  179 non-null object
country                     179 non-null object
country_id                  177 non-null float64
created                     179 non-null object
details                     179 non-null object
dethistory                  179 non-null object
devstage                    179 non-null object
district                    179 non-null object
family                      179 non-null object
family_authorship           179 non-null object
fieldid                     179 non-null object
genus                       179 non-null object
genus_authorship            179 non-null object
gpsbased                    179 non-null bool
id                          179 non-null int64
identification_finished     179 non-null object
identification_started      179 non-null object
identifiers                 179 non-null object
images                      179 non-null object
infraspecific_authorship    179 non-null object
infraspecific_epithet       179 non-null object
infraspecific_rank          179 non-null object
itemcode                    179 non-null object
latitude                    179 non-null float64
longitude                   179 non-null float64
note                        179 non-null object
region                      179 non-null object
short_note                  179 non-null object
significance                179 non-null object
species_authorship          179 non-null object
species_epithet             179 non-null object
species_fullname            179 non-null object
species_id                  179 non-null int64
species_status              179 non-null object
updated                     179 non-null object
dtypes: bool(1), float64(3), int64(2), object(35)
memory usage: 57.5+ KB
In [15]:
print('Average altitude is {} meters above sea level'.format(pd.to_numeric(altitude_only.altitude, errors='coerce').mean()))
Average altitude is 688.9428571428572 meters above sea level

As the .info() output shows, altitude has the dtype object. This means the column holds arbitrary Python objects (here, strings). We therefore used pd.to_numeric with errors='coerce' to convert it to floating-point numbers, which is needed for the .mean() method to work. Values that cannot be parsed as a single number become NaN and are skipped by .mean(), so the average above is computed over the plain numeric values only. Altitude strings can take forms such as "700-900", "100" or "300 m a.s.l.", so one may wish to handle all these cases smartly. Regular expressions are a convenient tool for extracting numbers from such strings.
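A minimal sketch of such parsing is shown below. The helper parse_altitude is hypothetical. Taking the midpoint of a range such as "700-900" is an assumption; the lower or upper bound may suit a given analysis better. Because the function averages all the numbers it finds, strings that contain unrelated numbers would need a stricter pattern.

```python
import math
import re

import pandas as pd

# Matches one integer or decimal number, e.g. '700' or '1.5'
NUMBER_PATTERN = re.compile(r'\d+(?:\.\d+)?')


def parse_altitude(value):
    """Convert an altitude string such as '100', '700-900' or '300 m a.s.l.' to meters.

    Averages all numbers found, which gives the midpoint for a range;
    returns NaN if the string contains no number.
    """
    numbers = [float(n) for n in NUMBER_PATTERN.findall(str(value))]
    if not numbers:
        return math.nan
    return sum(numbers) / len(numbers)


altitudes = pd.Series(['100', '700-900', '300 m a.s.l.', ''])
print(altitudes.apply(parse_altitude).tolist())  # [100.0, 800.0, 300.0, nan]
```

On the real data, `altitude_only.altitude.apply(parse_altitude).mean()` would also include the range values that pd.to_numeric turns into NaN, so its result can differ from the average above.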

Let's consider other filter conditions: we want to find all records collected at altitudes above 1 km after August 1, 2016.

Before filtering, we convert the altitudes into numbers and the collection dates into Pandas datetime values, so that they can be compared:

In [16]:
altitude_only.altitude = pd.to_numeric(altitude_only.altitude, errors='coerce')
altitude_only.collection_started = pd.to_datetime(altitude_only.collection_started)
In [17]:
deadline = pd.to_datetime('2016-08-01')
In [18]:
altitude_only[(altitude_only.altitude > 1000) & (altitude_only.collection_started>deadline)]
Out[18]:
acronym additionals altitude branch collection_finished collection_started collectors country country_id created ... note region short_note significance species_authorship species_epithet species_fullname species_id species_status updated
48 VBGI [{'species_authorship': '(Hook.) Gray', 'infra... 1460.0 Bryophyte herbarium 2016-08-02 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East (S. Hatt. & Inoue) S. Hatt. & Mizut. nana Apotreubia nana (S. Hatt. & Inoue) S. Hatt. & ... 573460 Approved 2017-08-07
52 VBGI [{'species_authorship': '(Huds.) H. Buch', 'in... 1570.0 Bryophyte herbarium 2016-08-04 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East Lindb. & Arnell laxa Calycularia laxa Lindb. & Arnell 588423 Approved 2017-06-13
53 VBGI [{'species_authorship': '(Schreb.) Berggr.', '... 1570.0 Bryophyte herbarium 2016-08-04 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East (S. Hatt. & Inoue) S. Hatt. & Mizut. nana Apotreubia nana (S. Hatt. & Inoue) S. Hatt. & ... 573460 Approved 2017-08-07
54 VBGI [] 1570.0 Bryophyte herbarium 2016-08-04 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East Lindb. & Arnell laxa Calycularia laxa Lindb. & Arnell 588423 Approved 2017-06-13
55 VBGI [] 1480.0 Bryophyte herbarium 2016-08-06 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East (Hook.) Gray taylorii Mylia taylorii (Hook.) Gray 590048 Approved 2017-11-07
56 VBGI [{'species_authorship': '(Wahlenb.) Dumort.', ... 1640.0 Bryophyte herbarium 2016-08-06 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East Lindb. & Arnell laxa Calycularia laxa Lindb. & Arnell 588423 Approved 2017-06-13
57 VBGI [] 1640.0 Bryophyte herbarium 2016-08-06 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East (S. Hatt. & Inoue) S. Hatt. & Mizut. nana Apotreubia nana (S. Hatt. & Inoue) S. Hatt. & ... 573460 Approved 2017-08-07
58 VBGI [{'species_authorship': '(Limpr.) Trevis.', 'i... 1780.0 Bryophyte herbarium 2016-08-08 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East Lindb. & Arnell laxa Calycularia laxa Lindb. & Arnell 588423 Approved 2017-06-13
59 VBGI [] 1370.0 Bryophyte herbarium 2016-08-08 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East (Gottsche) Limpr. neesiana Pellia neesiana (Gottsche) Limpr. 425171 Approved 2017-06-13
60 VBGI [{'species_authorship': 'Lindb.', 'infraspecif... 1640.0 Bryophyte herbarium 2016-08-09 V.A. Bakalin Russia 162.0 2017-04-07 ... Baidzhalsky Mountain System, Yarap River Middl... Дальний Восток|Russian Far East (Schrank) Kuwah. pubescens Apometzgeria pubescens (Schrank) Kuwah. 363552 Approved 2017-06-19

10 rows × 41 columns
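Two details of the filter above are easy to get wrong. First, conditions on a Series must be combined with the element-wise operator & rather than Python's and, which raises a ValueError. Second, each condition needs its own parentheses, because & binds more tightly than comparison operators such as >. Here is a small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical records whose columns have already been converted
df = pd.DataFrame({
    'altitude': [900.0, 1460.0, 1570.0],
    'collection_started': pd.to_datetime(['2016-08-05', '2016-03-15', '2016-08-04']),
})
deadline = pd.to_datetime('2016-08-01')

# Each comparison yields a boolean Series; & combines them element by element
mask = (df.altitude > 1000) & (df.collection_started > deadline)
print(df[mask])  # only row 2 (1570 m, collected 2016-08-04) passes both tests

# Python's `and` needs a single True/False value, so it fails on a Series
try:
    (df.altitude > 1000) and (df.collection_started > deadline)
except ValueError as error:
    print('ValueError:', error)
```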

Complex queries

The current version of the HTTP API (see HTTP-API Description) does not support OR queries. For example, one cannot build a single query URL that finds all records collected in spring or fall (but not in summer).
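A single broader query, such as a whole year selected with colstart and colend, may return a manageable number of records. In that case, a seasonal OR condition can instead be applied locally with the element-wise | operator of Pandas. Below is a sketch on hypothetical dates, assuming Northern Hemisphere seasons (spring = March to May, fall = September to November):

```python
import pandas as pd

# Hypothetical collection dates of already-downloaded records
dates = pd.Series(pd.to_datetime(['2016-04-10', '2016-07-20', '2016-10-05', '2016-01-15']))

month = dates.dt.month
# Spring OR fall; between() includes both bounds by default
spring_or_fall = month.between(3, 5) | month.between(9, 11)
print(dates[spring_or_fall].dt.strftime('%Y-%m-%d').tolist())  # ['2016-04-10', '2016-10-05']
```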

Such queries can be emulated by sending two or more consecutive queries to the database and combining their results with Pandas. The amount of data is not a concern here, since the Herbarium database is unlikely to be very large.

Let's illustrate how an OR-type query can be split into two simple queries. We consider two simple queries, search_query1 and search_query2, and aim to build the complex one (i.e. search_query1 OR search_query2):

In [19]:
search_query1 = (('collectedby', 'Крестов'),
                 ('identifiedby', 'Крестов') 
                 )
search_query2 = (('collectedby', 'Баркалов'),
                 ('identifiedby', 'Пименова')
                 )
In [20]:
# Make the search queries one after another...
datastore = []  # storage for the DataFrames returned by each query
for sp in [search_query1, search_query2]:  # we have just two queries
    # build the search URL for each query
    search_request_url = HERBARIUM_SEARCH_URL + '?' + '&'.join(map(lambda x: x[0] + '=' + quote(x[1].strip()), sp))
    with urlopen(search_request_url) as server_response:  # the connection is closed on exit
        data = json.loads(server_response.read().decode('utf-8'))
    datastore.append(pd.DataFrame(data['data']))  # store the results of each query

# Combine the results with Pandas (records are deduplicated by their unique ID);
# reset_index() keeps the old index as an extra 'index' column:
df_combined = pd.concat(datastore).drop_duplicates('id').reset_index()
In [21]:
df_combined.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 536 entries, 0 to 535
Data columns (total 42 columns):
index                       536 non-null int64
acronym                     536 non-null object
additionals                 536 non-null object
altitude                    536 non-null object
branch                      536 non-null object
collection_finished         536 non-null object
collection_started          536 non-null object
collectors                  536 non-null object
country                     536 non-null object
country_id                  534 non-null float64
created                     536 non-null object
details                     536 non-null object
dethistory                  536 non-null object
devstage                    536 non-null object
district                    536 non-null object
family                      536 non-null object
family_authorship           536 non-null object
fieldid                     536 non-null object
genus                       536 non-null object
genus_authorship            536 non-null object
gpsbased                    536 non-null bool
id                          536 non-null int64
identification_finished     536 non-null object
identification_started      536 non-null object
identifiers                 536 non-null object
images                      536 non-null object
infraspecific_authorship    536 non-null object
infraspecific_epithet       536 non-null object
infraspecific_rank          536 non-null object
itemcode                    536 non-null object
latitude                    536 non-null float64
longitude                   536 non-null float64
note                        536 non-null object
region                      536 non-null object
short_note                  536 non-null object
significance                536 non-null object
species_authorship          536 non-null object
species_epithet             536 non-null object
species_fullname            536 non-null object
species_id                  536 non-null int64
species_status              536 non-null object
updated                     536 non-null object
dtypes: bool(1), float64(3), int64(3), object(35)
memory usage: 172.3+ KB
In [22]:
df_combined.shape
Out[22]:
(536, 42)

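Original dimensions of the combined DataFrames. Together they hold 179 + 359 = 538 rows, so two duplicate records (presumably matching both queries) were dropped; the 42nd column of df_combined is the 'index' column added by reset_index():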

In [23]:
datastore[0].shape, datastore[1].shape
Out[23]:
((179, 41), (359, 41))
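The combining step used above can be wrapped into a small reusable helper. This is a minimal sketch in which synthetic DataFrames stand in for real query results; the function name combine_or and the sample data are hypothetical and not part of the Herbarium API. Unlike the cell above, it passes drop=True to reset_index, so no extra 'index' column appears.

```python
import pandas as pd


def combine_or(results, key="id"):
    """Emulate an OR-query: merge the DataFrames returned by separate
    queries and keep a single copy of every record (identified by `key`)
    that matched more than one of them."""
    combined = pd.concat(results, ignore_index=True)
    # drop_duplicates keeps the first occurrence of each key value
    return combined.drop_duplicates(subset=key).reset_index(drop=True)


# Synthetic stand-ins for the results of two separate queries;
# record 3 matches both of them.
by_collector = pd.DataFrame({"id": [1, 2, 3], "genus": ["Betula", "Salix", "Abies"]})
by_identifier = pd.DataFrame({"id": [3, 4], "genus": ["Abies", "Larix"]})

either = combine_or([by_collector, by_identifier])
print(either["id"].tolist())  # → [1, 2, 3, 4]
```

With the tutorial's data, combine_or(datastore) would give the same 536 records as df_combined, minus the 'index' column.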

Searching within Polygonal Areas

The HTTP API supports rectangular geographic queries, i.e. searching for herbarium records located within a specified bounding box. The internal database structure doesn't allow polygonal queries, but they can be emulated with Python and third-party packages: first request the records within the polygon's bounding box, then keep only the points that lie inside the polygon itself.

Let us consider a hypothetical problem: comparing the species diversity of plants collected on Sakhalin Island with that of plants collected within a 200-km circle around Petropavlovsk-Kamchatsky. One way to handle the problem is to restrict search results to a specified area using an ESRI shapefile. We will need the pyshp and shapely packages to read shapefiles and process spatial data.

If they aren't installed in your environment yet, install them with pip (e.g. pip install pyshp shapely) or another Python package manager. Note that pyshp is imported under the name shapefile.

We also assume that the ESRI shapefiles are stored in a shapefiles folder inside your current working directory.

The basic workflow for reading a shapefile is then as follows:

In [24]:
import shapefile
import numpy as np  # NumPy is a dependency of Pandas, so it is already installed (pd.np is deprecated and removed in pandas 2.0)

sakhalin_shp = shapefile.Reader("shapefiles/sakhalin.shp")

If no errors occurred, we can plot the loaded data:

In [25]:
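contour_sakhalin = np.array(sakhalin_shp.shapes()[0].points)  # convert the contour points to a NumPy array
In [26]:
import matplotlib.pyplot as plt  # explicit import instead of the discouraged `from pylab import *`

plt.plot(contour_sakhalin[:, 0], contour_sakhalin[:, 1])  # longitude vs. latitude
plt.gca().set_aspect('equal')
plt.title('Sakhalin Island')
plt.show()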

The contour stored in the shapefile (the coastline) consists of 835 points:

In [27]:
contour_sakhalin.shape
Out[27]:
(835, 2)

The bounding box of the shapefile is read from its header and is available via the bbox attribute, in [lon_min, lat_min, lon_max, lat_max] order:

In [28]:
sakhalin_shp.bbox
Out[28]:
[141.63803100585938, 45.88860321044922, 144.75164794921875, 54.424713134765625]

Let's build a search URL in the same way as before, passing the bounding box as the lonl, latl, lonu and latu parameters:

In [29]:
query_sakhalin_bbox = tuple(zip(['lonl', 'latl', 'lonu', 'latu'], map(str, sakhalin_shp.bbox)))
print(query_sakhalin_bbox)
(('lonl', '141.63803100585938'), ('latl', '45.88860321044922'), ('lonu', '144.75164794921875'), ('latu', '54.424713134765625'))
In [30]:
within_sakhalin_request_url = HERBARIUM_SEARCH_URL + '?' + '&'.join(map(lambda x: x[0] + '=' + quote(x[1].strip()), query_sakhalin_bbox))
In [31]:
within_sakhalin_request_url  # it's a good idea to inspect the URL before sending a request
Out[31]:
'http://botsad.ru/hitem/json/?lonl=141.63803100585938&latl=45.88860321044922&lonu=144.75164794921875&latu=54.424713134765625'
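The map/lambda/quote expression used to build search URLs can also be written with the standard library's urllib.parse.urlencode. A minimal sketch follows; the base URL is assumed from the inspected URL above, and build_search_url is a hypothetical helper name. Passing quote_via=quote makes spaces encode as %20, as quote does, rather than '+' (the default of urlencode).

```python
from urllib.parse import quote, urlencode

# Assumption: the base search URL, taken from the inspected URL above.
BASE_URL = "http://botsad.ru/hitem/json/"


def build_search_url(params, base_url=BASE_URL):
    """Build a search URL from (name, value) pairs.

    Values are stripped and percent-encoded as UTF-8 (so Cyrillic
    names work); quote_via=quote encodes spaces as %20.
    """
    cleaned = [(name, str(value).strip()) for name, value in params]
    return base_url + "?" + urlencode(cleaned, quote_via=quote)


sakhalin_bbox_values = ["141.63803100585938", "45.88860321044922",
                        "144.75164794921875", "54.424713134765625"]
# Prints the same URL as Out[31] above
print(build_search_url(zip(["lonl", "latl", "lonu", "latu"], sakhalin_bbox_values)))
# Cyrillic values are percent-encoded as UTF-8
print(build_search_url([("collectedby", "Крестов")]))
```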

Fetching the records within the Sakhalin Island bounding box:

In [32]:
with urlopen(within_sakhalin_request_url) as server_response:  # the connection is closed on exit
    sakhalin_data_in_bbox = pd.DataFrame(json.loads(server_response.read().decode('utf-8'))['data'])

The next step applies fine filtering with a shapely Polygon instance, keeping only the records whose coordinates lie inside the coastline contour:

In [33]:
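from shapely.geometry import Polygon, Point
# The ring must be closed: repeat the FIRST point at the end if needed (shapely also closes open rings itself)
is_closed = np.array_equal(contour_sakhalin[0], contour_sakhalin[-1])
closed_sakhalin_contour = contour_sakhalin if is_closed else np.vstack([contour_sakhalin, contour_sakhalin[:1]])
sakhalin_poly = Polygon(closed_sakhalin_contour)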
In [34]:
sakhalin_filtered = sakhalin_data_in_bbox[[sakhalin_poly.contains(Point(x,y)) for x,y in zip(sakhalin_data_in_bbox.longitude, sakhalin_data_in_bbox.latitude)]]
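Conceptually, the two stages above are a cheap bounding-box test (done by the server) followed by an exact point-in-polygon test (done here by shapely's Polygon.contains). The sketch below illustrates the idea in pure Python with the even-odd ray-casting algorithm; it is not shapely's implementation, the function names and sample records are hypothetical, and it treats longitude/latitude as planar coordinates, which is reasonable for a region like Sakhalin that doesn't cross the 180th meridian. Points lying exactly on the boundary may be classified either way.

```python
def in_bbox(lon, lat, bbox):
    """True if the point lies in [lon_min, lat_min, lon_max, lat_max]."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max


def point_in_polygon(lon, lat, vertices):
    """Even-odd ray casting: cast a ray from the point towards +lon and
    count the polygon edges it crosses; an odd count means inside.
    `vertices` is a list of (lon, lat) pairs; the ring may be open or closed."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # the edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def filter_records(records, bbox, vertices):
    """Two-stage filter: bounding-box test first, then point-in-polygon."""
    coarse = [r for r in records if in_bbox(r["longitude"], r["latitude"], bbox)]
    return [r for r in coarse if point_in_polygon(r["longitude"], r["latitude"], vertices)]


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
records = [
    {"id": 1, "longitude": 1.0, "latitude": 1.0},  # inside the triangle
    {"id": 2, "longitude": 3.0, "latitude": 3.0},  # in the bbox, outside the triangle
    {"id": 3, "longitude": 5.0, "latitude": 1.0},  # outside the bbox
]
print([r["id"] for r in filter_records(records, (0.0, 0.0, 4.0, 4.0), triangle)])  # → [1]
```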
In [35]:
sakhalin_filtered
Out[35]:
acronym additionals altitude branch collection_finished collection_started collectors country country_id created ... note region short_note significance species_authorship species_epithet species_fullname species_id species_status updated
0 VBGI [] 792 2016-10-01 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. F.Schmidt ssiori Prunus ssiori F.Schmidt 506834 From plantlist 2017-06-13
1 VBGI [] 1042 2016-10-01 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. Nakai axillare Vaccinium axillare Nakai 225901 From plantlist 2017-06-13
2 VBGI [] 792 2016-10-01 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. F.Schmidt rugosa Ilex rugosa F.Schmidt 44334 From plantlist 2017-06-13
3 VBGI [] 1042 2016-10-01 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. (Koidz.) H.Ohba nipponica Cerasus nipponica (Koidz.) H.Ohba 499916 From plantlist 2017-06-13
4 VBGI [] 1042 2016-10-01 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. (Cham. & Schltdl.) M.Roem. sambucifolia Sorbus sambucifolia (Cham. & Schltdl.) M.Roem. 518523 From plantlist 2017-06-13
5 VBGI [] 1042 2016-10-01 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. A.Gray smallii Vaccinium smallii A.Gray 226494 From plantlist 2017-06-13
6 VBGI [] 792 2016-10-01 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. Jancz. latifolium Ribes latifolium Jancz. 251865 From plantlist 2017-06-13
7 VBGI [] 1042 2016-10-01 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. Hultén beauverdiana Spiraea beauverdiana Hultén 518694 From plantlist 2017-06-13
8 VBGI [] 192 2016-09-29 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. Cham. ermanii Betula ermanii Cham. 68288 From plantlist 2017-06-13
9 VBGI [] 192 2016-09-29 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. Cham. ermanii Betula ermanii Cham. 68288 From plantlist 2017-06-13
10 VBGI [] 26 2016-09-30 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. Blume crispula Quercus crispula Blume 588219 Recently added 2017-06-13
11 VBGI [] 26 2016-09-30 Пименова Е.А. Russia 162 2017-01-16 ... Сахалинская обл. Blume crispula Quercus crispula Blume 588219 Recently added 2017-06-13
12 VBGI [] 250 2016-09-29 Пименова Е.А. Russia 162 2017-02-09 ... одиночное крупное дерево Сахалинская обл., L. regia Juglans regia L. 265223 From plantlist 2017-06-13
13 VBGI [] 250 2016-09-29 Пименова Е.А. Russia 162 2017-02-09 ... одиночное крупное дерево Сахалинская обл., L. regia Juglans regia L. 265223 From plantlist 2017-06-13
14 VBGI [] 170 2016-06-09 2016-06-09 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область / Sakhalin Oblast (L.) Scop. odoratum Galium odoratum (L.) Scop. 523442 From plantlist 2017-06-13
15 VBGI [] 120 2016-06-21 2016-06-21 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область Rupr. macroptera Euonymus macroptera Rupr. 588266 Approved 2017-06-13
16 VBGI [] 160 2016-06-16 2016-06-16 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область (F.Schmidt) Maxim. sachalinensis Euonymus sachalinensis (F.Schmidt) Maxim. 114805 From plantlist 2017-06-13
18 VBGI [] 50 2015-06-28 2015-06-28 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область / Sakhalin Oblast (L.) Schott fragrans Dryopteris fragrans (L.) Schott 212858 From plantlist 2017-06-13
19 VBGI [] 380 2015-08-13 2015-08-13 Корзников К.А., Попова К.Б. Russia 162 2017-02-10 ... Сахалинская область / Sakhalin Oblast (Kunze) C. Presl tripteron Polystichum tripteron (Kunze) C. Presl 216977 From plantlist 2017-06-13
20 VBGI [] 40 2016-07-16 2016-07-16 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область / Sakhalin Oblast H. Lév. fauriei Cardamine fauriei H. Lév. 588274 Recently added 2017-06-13
21 VBGI [] 40 2016-07-16 2016-07-16 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область / Sakhalin Oblast H. Lév. fauriei Cardamine fauriei H. Lév. 588274 Recently added 2017-06-13
22 VBGI [] 40 2016-07-16 2016-07-16 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область / Sakhalin Oblast H. Lév. fauriei Cardamine fauriei H. Lév. 588274 Recently added 2017-06-13
23 VBGI [] 70 2015-06-27 2015-06-27 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область / Sakhalin Oblast (L.) DC. amplexifolius Streptopus amplexifolius (L.) DC. 330252 From plantlist 2017-06-13
24 VBGI [] 140 2015-07-12 2015-07-12 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область / Sakhalin Oblast (Thunb.) Makino cordatum Cardiocrinum cordatum (Thunb.) Makino 329621 From plantlist 2017-06-13
25 VBGI [] 15 2015-06-26 2015-06-26 Корзников К.А. Russia 162 2017-02-10 ... Сахалинская область / Sakhalin Oblast (L.) Ker Gawl. camschatcensis Fritillaria camschatcensis (L.) Ker Gawl. 329686 From plantlist 2017-06-13
27 VBGI [] 30 2015-06-28 2015-06-28 Корзников К.А. Russia 162 2017-02-12 ... Сахалинская область / Sakhalin Oblast (L.) Sw. lunaria Botrychium lunaria (L.) Sw. 381813 From plantlist 2017-06-13
28 VBGI [] 150 2016-06-21 2016-06-21 Корзников К.А. Russia 162 2017-02-12 ... Сахалинская область / Sakhalin Oblast (F. Schmidt) Sarg. sachalinense Phellodendron sachalinense (F. Schmidt) Sarg. 537480 From plantlist 2017-06-13
29 VBGI [] 60 2016-07-17 2016-07-17 Корзников К.А. Russia 162 2017-02-12 ... Сахалинская область / Sakhalin Oblast (F. Schmidt) Sarg. sachalinense Phellodendron sachalinense (F. Schmidt) Sarg. 537480 From plantlist 2017-06-13
30 VBGI [] 60 2016-07-17 2016-07-17 Корзников К.А. Russia 162 2017-02-12 ... Сахалинская область / Sakhalin Oblast (F. Schmidt) Sarg. sachalinense Phellodendron sachalinense (F. Schmidt) Sarg. 537480 From plantlist 2017-06-13
31 VBGI [] 70 2016-09-17 2016-09-17 Корзников К.А. Russia 162 2017-02-12 ... Сахалинская область / Sakhalin Oblast Nakai repens Skimmia repens Nakai 537822 From plantlist 2017-06-13
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
536 VBGI [] 2016-07-07 Храпко О.В. Russia 162 2017-10-12 ... Сахалинская область / Sakhalin Oblast (C. Presl) Fraser-Jenk. & Jermy expansa Dryopteris expansa (C. Presl) Fraser-Jenk. & J... 212819 From plantlist 2017-10-12
537 VBGI [] 2008-07-25 Храпко О.В. Russia 162 2017-10-12 ... Сахалинская область / Sakhalin Oblast (L.) Kuhn aquilinum Pteridium aquilinum (L.) Kuhn 205696 From plantlist 2017-10-12
538 VBGI [] 2014-07-04 Храпко О.В. Russia 162 2017-10-13 ... Сахалинская область / Sakhalin Oblast (L.) Roth filix-femina Athyrium filix-femina (L.) Roth 61481 From plantlist 2017-10-13
539 VBGI [] 2008-07-15 Храпко О.В., Царенко Н.А. Russia 162 2017-10-13 ... Сахалинская область / Sakhalin Oblast Rupr. sinense Athyrium sinense Rupr. 61829 From plantlist 2017-10-13
540 VBGI [] 2008-07-15 Храпко О.В., Царенко Н.А. Russia 162 2017-10-13 ... Сахалинская область Rupr. sinense Athyrium sinense Rupr. 61829 From plantlist 2017-10-13
541 VBGI [] 1995-07-10 Дудкин Р.В., Тесленко В.В. Russia 162 2017-10-13 ... Сахалинская область / Sakhalin Oblast (L.) Schott fragrans Dryopteris fragrans (L.) Schott 212858 From plantlist 2017-10-13
542 VBGI [] 2012-07-09 Храпко О.В. Russia 162 2017-10-16 ... Сахалинская область / Sakhalin Oblast (Fern.) Tagawa asiaticum Osmundastrum asiaticum (Fern.) Tagawa 591052 Approved 2017-10-16
543 VBGI [] 2017-07-04 Храпко О.В. Russia 162 2017-10-16 ... Сахалинская область / Sakhalin Oblast (Fern.) Tagawa asiaticum Osmundastrum asiaticum (Fern.) Tagawa 591052 Approved 2017-10-16
544 VBGI [] 2008-07-12 Храпко О.В., Царенко Н.А., Богачева А.В. Russia 162 2017-10-16 ... Сахалинская область / Sakhalin Oblast (Fern.) Tagawa asiaticum Osmundastrum asiaticum (Fern.) Tagawa 591052 Approved 2017-10-16
547 SAKH [] 52-112 Herbarium of Vascular Plants 1994-06-10 А. Таран Russia 162 2017-10-18 ... Сахалинская область / Sakhalin Oblast Ching chinensis Huperzia chinensis Ching 336147 From plantlist 2017-10-18
548 SAKH [] 52-112 Herbarium of Vascular Plants А. Таран Russia 162 2017-10-18 ... Сахалинская область / Sakhalin Oblast L. annotinum Lycopodium annotinum L. 336427 From plantlist 2017-10-18
551 VBGI [] 382 1995-07-11 Дудкин Р.В., Тесленко В.В. Russia 162 2017-10-19 ... Сахалинская область / Sakhalin Oblast Huds. viride Asplenium viride Huds. 60611 From plantlist 2017-10-19
552 VBGI [] 2012-07-09 Храпко О.В. Russia 162 2017-10-19 ... Сахалинская область / Sakhalin Oblast (C. Presl) Fraser-Jenk. & Jermy expansa Dryopteris expansa (C. Presl) Fraser-Jenk. & J... 212819 From plantlist 2017-10-19
553 VBGI [] 2016-07-07 Храпко О.В. Russia 162 2017-10-19 ... Сахалинская область / Sakhalin Oblast (C. Presl) Fraser-Jenk. & Jermy expansa Dryopteris expansa (C. Presl) Fraser-Jenk. & J... 212819 From plantlist 2017-10-19
554 VBGI [] 2012-07-09 Храпко О.В. Russia 162 2017-10-19 ... Сахалинская область / Sakhalin Oblast (C. Presl) Fraser-Jenk. & Jermy expansa Dryopteris expansa (C. Presl) Fraser-Jenk. & J... 212819 From plantlist 2017-10-19
558 VBGI [] 1997-10-03 Недолужко В.А., Денисов Н.И. Russia 162 2017-10-19 ... Сахалинская область / Sakhalin Oblast Tzvelev amurensis Leptorumohra amurensis Tzvelev 215409 From plantlist 2017-10-19
560 VBGI [] 1980-06-24 Недолужко В.А. Russia 162 2017-10-20 ... Сахалинская область / Sakhalin Oblast (L.) Sw. lunaria Botrychium lunaria (L.) Sw. 381813 From plantlist 2017-10-20
561 VBGI [] 1980-06-25 Недолужко В.А. Russia 162 2017-10-20 ... Сахалинская область / Sakhalin Oblast (Rupr.) Underw. robustum Botrychium robustum (Rupr.) Underw. 381844 From plantlist 2017-10-20
562 VBGI [] 1966-07-27 Недолужко В.А., Стародубцев В.Н. Russia 162 2017-10-20 ... Сахалинская область / Sakhalin Oblast (L.) Bernh. fragilis Cystopteris fragilis (L.) Bernh. 204147 From plantlist 2017-10-20
563 VBGI [] 1991-07-22 Храпко О.В. Russia 162 2017-10-20 ... Сахалинская область / Sakhalin Oblast (L.) Newman dryopteris Gymnocarpium dryopteris (L.) Newman 204208 From plantlist 2017-10-20
564 VBGI [] 2015-07-18 Храпко О.В. Russia 162 2017-10-20 ... Сахалинская область / Sakhalin Oblast (L.) Newman dryopteris Gymnocarpium dryopteris (L.) Newman 204208 From plantlist 2017-10-20
565 VBGI [] 2006-08-14 Галанин А.В. Russia 162 2017-10-20 ... Сахалинская обл. (Spenn.) Fée braunii Polystichum braunii (Spenn.) Fée 216417 From plantlist 2017-10-20
566 VBGI [] 1966-08-24 Павлова Н.С., Панков Russia 162 2017-10-31 ... Сахалинская область / Sakhalin Oblast R. Br. acrostichoides Cryptogramma acrostichoides R. Br. 485069 From plantlist 2017-10-31
567 VBGI [] 1971-07-11 Егорова Е.М. Russia 162 2017-11-08 ... Сахалинская область / Sakhalin Oblast (Turcz. ex Kunze) Sa. Kurata sibiricum Diplazium sibiricum (Turcz. ex Kunze) Sa. Kurata 62611 From plantlist 2017-11-08
568 VBGI [] 1980-06-24 Недолужко В.А. Russia 162 2017-11-09 ... Сахалинская область / Sakhalin Oblast (S.G. Gmel.) Ångström lanceolatum Botrychium lanceolatum (S.G. Gmel.) Ångström 381807 From plantlist 2017-11-09
569 VBGI [] 1986-07-31 Недолужко В.А., Стародубцев В.Н. Russia 162 2017-11-09 ... Сахалинская область / Sakhalin Oblast (Rupr.) Underw. robustum Botrychium robustum (Rupr.) Underw. 381844 From plantlist 2017-11-09
570 VBGI [] 1987-09-11 Стеценко Н.М. Russia 162 2017-11-09 ... Сахалинская область / Sakhalin Oblast (Rupr.) Underw. robustum Botrychium robustum (Rupr.) Underw. 381844 From plantlist 2017-11-09
571 VBGI [] 1969-09-25 Алексеева Л.М. Russia 162 2017-11-09 ... Сахалинская область / Sakhalin Oblast (Rupr.) Underw. robustum Botrychium robustum (Rupr.) Underw. 381844 From plantlist 2017-11-09
572 VBGI [] 1974-08-09 Ворошилова Г.И., Гвоздева И., Карпова Е., Опри... Russia 162 2017-11-09 ... Сахалинская область / Sakhalin Oblast (Rupr.) Underw. robustum Botrychium robustum (Rupr.) Underw. 381844 From plantlist 2017-11-09
573 VBGI [] 1406 Bryophyte herbarium 2006-08-17 V.A. Bakalin Russia 162 2017-11-10 ... Central part of Sakhalin Island. Nabilsky Rang... Дальний Восток|Russian Far East Rev. by V.A. Bakalin: Ok! Jun 2016 Steph. japonica Nardia japonica Steph. 588500 Approved 2017-11-10

549 rows × 41 columns

In [36]:
sakhalin_data_in_bbox.shape
Out[36]:
(574, 41)

Now let's find the IDs of the records that lie inside the bounding box of Sakhalin Island but outside its contour (coastline) defined in the shapefile; there should be 574 - 549 = 25 of them:

In [37]:
set(sakhalin_data_in_bbox.id.values) - set(sakhalin_filtered.id.values)
Out[37]:
{1403,
 1412,
 1432,
 1434,
 1438,
 1439,
 1529,
 10015,
 10016,
 10373,
 19819,
 19820,
 19823,
 19912,
 20830,
 20832,
 30563,
 31010,
 31032,
 31079,
 31080,
 31152,
 31153,
 31154,
 31161}
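The set difference returns only IDs. To inspect the excluded records as full rows (coordinates, collectors and so on), a boolean mask built with Series.isin can be used instead. A minimal sketch on synthetic data; bbox_records and polygon_records are hypothetical stand-ins for sakhalin_data_in_bbox and sakhalin_filtered.

```python
import pandas as pd

# Synthetic stand-ins for `sakhalin_data_in_bbox` and `sakhalin_filtered`
bbox_records = pd.DataFrame({"id": [10, 11, 12, 13],
                             "latitude": [46.1, 47.0, 50.2, 53.9],
                             "longitude": [142.0, 143.1, 142.5, 141.7]})
polygon_records = bbox_records[bbox_records["id"].isin([10, 12])]

# Records inside the bounding box but outside the polygon, as full rows
excluded = bbox_records[~bbox_records["id"].isin(polygon_records["id"])]
print(excluded[["id", "latitude", "longitude"]])
```

On the tutorial's data this would read sakhalin_data_in_bbox[~sakhalin_data_in_bbox.id.isin(sakhalin_filtered.id)].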

Inspecting the positions of the excluded points (e.g. ID=1412, see http://botsad.ru/hitem/1412), one can see that all of them were collected near the coastline; their exclusion is probably caused by inaccuracies in the coordinates of the herbarium records and in the coastline geometry. So this isn't a true error, but such cases should be taken into account when filtering by polygonal areas.
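One way to take such near-coast records into account is to readmit excluded points that lie within a small tolerance of the contour. A minimal pure-Python sketch, assuming planar lon/lat coordinates; the tolerance is in degrees (0.01 degree of latitude is about 1.1 km, and a degree of longitude is shorter by a factor of cos(latitude)). The function names, tolerance and sample points are hypothetical. With shapely, a similar effect can be obtained via sakhalin_poly.buffer(tolerance).contains(Point(lon, lat)) or sakhalin_poly.distance(Point(lon, lat)).

```python
import math


def distance_to_segment(px, py, ax, ay, bx, by):
    """Planar distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment (A == B)
        return math.hypot(px - ax, py - ay)
    # Position of P's projection onto line AB, clamped to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_contour(px, py, contour):
    """Smallest planar distance (in coordinate units, degrees here) from
    a point to the edges of a closed contour of (lon, lat) pairs."""
    n = len(contour)
    return min(
        distance_to_segment(px, py, *contour[i], *contour[(i + 1) % n])
        for i in range(n)
    )


# Hypothetical: readmit excluded records within `tolerance` degrees of the contour
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
excluded_coords = {101: (1.005, 0.5), 102: (2.0, 2.0)}  # id -> (lon, lat), synthetic
tolerance = 0.01
readmitted = [rid for rid, (lon, lat) in excluded_coords.items()
              if distance_to_contour(lon, lat, square) <= tolerance]
print(readmitted)  # → [101]
```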

Let's find herbarium records collected in the vicinity of Petropavlovsk-Kamchatsky. First, we set the coordinates of the city center and define a bounding box that contains the 200-km circle around this point:

In [38]:
kamchatka_bbox = [151.1, 47.8, 172.0, 58.3]  # [lon_min, lat_min, lon_max, lat_max]
petropavlovsk_coords = (53.145992, 158.683548)  # (latitude, longitude), the order geopy expects
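The bounding box above is hard-coded with a generous margin. A box that just encloses the circle can also be computed. A minimal sketch, assuming a spherical Earth of radius 6371 km and using the usual bounding-coordinates formula for a spherical cap: the latitude half-width equals the angular radius d = r / R, and the longitude half-width is asin(sin(d) / cos(lat)). Because the later filter uses an ellipsoidal distance, a small extra margin may be prudent. The function name bbox_around_point is hypothetical; it returns values in the API's lonl, latl, lonu, latu order.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation


def bbox_around_point(lat, lon, radius_km, radius_earth=EARTH_RADIUS_KM):
    """Return [lon_min, lat_min, lon_max, lat_max] enclosing a circle of
    `radius_km` around (lat, lon) on a sphere.

    Assumes the circle neither contains a pole nor crosses the 180th meridian.
    """
    d = radius_km / radius_earth  # angular radius, radians
    dlat = math.degrees(d)
    # Widest longitude extent of a spherical cap; simply d / cos(lat)
    # would slightly underestimate it.
    dlon = math.degrees(math.asin(math.sin(d) / math.cos(math.radians(lat))))
    return [lon - dlon, lat - dlat, lon + dlon, lat + dlat]


petropavlovsk_bbox = bbox_around_point(53.145992, 158.683548, 200.0)
print([round(v, 3) for v in petropavlovsk_bbox])
```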

The Earth isn't a perfect sphere, so we need an additional tool to estimate distances between geographic points accurately. The geopy package provides the necessary distance computations.
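For comparison, the simplest distance model treats the Earth as a sphere. A minimal sketch of the haversine (great-circle) formula, assuming a mean radius of 6371 km; haversine_km is a hypothetical helper, not part of geopy. geopy's geodesic uses an ellipsoidal model (WGS-84 by default), so the spherical result typically differs from it by a few tenths of a percent, which matters only for records close to the 200-km limit.

```python
import math

MEAN_EARTH_RADIUS_KM = 6371.0  # spherical approximation


def haversine_km(lat1, lon1, lat2, lon2, radius=MEAN_EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) points on a sphere, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


# One degree of longitude along the equator: 2 * pi * 6371 / 360 km
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))  # → 111.19
```

A spherical variant of the filter below would read petropavlovsk_data_in_bbox[[haversine_km(lat, lon, *petropavlovsk_coords) < 200.0 for lat, lon in zip(petropavlovsk_data_in_bbox.latitude, petropavlovsk_data_in_bbox.longitude)]].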

In [39]:
from geopy.distance import geodesic  # `vincenty` was removed in geopy 2.0; `geodesic` replaces it
query_kamchatka_bbox = tuple(zip(['lonl', 'latl', 'lonu', 'latu'], map(str, kamchatka_bbox)))
near_petropavlovsk_kamchatsky_url = HERBARIUM_SEARCH_URL + '?' + '&'.join(map(lambda x: x[0] + '=' + quote(x[1].strip()), query_kamchatka_bbox))
with urlopen(near_petropavlovsk_kamchatsky_url) as server_response:
    petropavlovsk_data_in_bbox = pd.DataFrame(json.loads(server_response.read().decode('utf-8'))['data'])
petropavlovsk_data_in_bbox.shape
Out[39]:
(306, 41)
In [40]:
# keep the records whose geodesic distance to the city center is less than 200 km
petropavlovsk_filtered = petropavlovsk_data_in_bbox[[geodesic((lat, lon), petropavlovsk_coords).km < 200.0
                                                     for lat, lon in zip(petropavlovsk_data_in_bbox.latitude,
                                                                         petropavlovsk_data_in_bbox.longitude)]]
In [41]:
petropavlovsk_filtered.shape
Out[41]:
(146, 41)

Once the datasets are obtained, we can carry out some analysis, e.g. a comparison of the two areas:

In [42]:
print('The number of unique genera in 200 km circle around the Petropavlovsk-Kamchatsky city:', len(petropavlovsk_filtered.genus.unique()))
The number of unique genera in 200 km circle around the Petropavlovsk-Kamchatsky city: 64
In [43]:
print('The number of unique species in 200-km circle around the Petropavlovsk-Kamchatsky city:', len(petropavlovsk_filtered.species_id.unique()))
The number of unique species in 200-km circle around the Petropavlovsk-Kamchatsky city: 78
In [44]:
print('The number of unique genera on Sakhalin Island:', len(sakhalin_filtered.genus.unique()))
The number of unique genera on Sakhalin Island: 136
In [45]:
print('The number of unique species on Sakhalin Island:', len(sakhalin_filtered.species_id.unique()))
The number of unique species on Sakhalin Island: 211

Let's compute the relative frequency of each genus (the share of records belonging to it) with the value_counts() method:

Genus frequencies near Petropavlovsk-Kamchatsky:

In [47]:
petropavlovsk_freq = petropavlovsk_filtered.genus.value_counts() / len(petropavlovsk_filtered)
petropavlovsk_freq
Out[47]:
Riccardia          0.061644
Aneura             0.061644
Dryopteris         0.047945
Moerckia           0.047945
Peltolepis         0.041096
Botrychium         0.041096
Nardia             0.041096
Conocephalum       0.034247
Gymnocarpium       0.027397
Calycularia        0.027397
Pellia             0.027397
Cystopteris        0.020548
Calamagrostis      0.020548
Lunathyrium        0.020548
Polystichum        0.020548
Preissia           0.020548
Sauteria           0.020548
Athyrium           0.020548
Phegopteris        0.020548
Marchantia         0.020548
Agrostis           0.013699
Cryptogramma       0.013699
Stellaria          0.013699
Blasia             0.013699
Draba              0.013699
Euphrasia          0.013699
Oreopteris         0.013699
Salix              0.013699
Saxifraga          0.006849
Gentianella        0.006849
                     ...   
Lophozia           0.006849
Papaver            0.006849
Huperzia           0.006849
Abies              0.006849
Andromeda          0.006849
Myrica             0.006849
Dryas              0.006849
Anaphalis          0.006849
Arnica             0.006849
Ermania            0.006849
Myosotis           0.006849
Chamaedaphne       0.006849
Parnassia          0.006849
Cardaminopsis      0.006849
Eriophorum         0.006849
Sagina             0.006849
Erigeron           0.006849
Urtica             0.006849
Fritillaria        0.006849
Trollius           0.006849
Metasolenostoma    0.006849
Oxytropis          0.006849
Trientalis         0.006849
Mannia             0.006849
Douglasia          0.006849
Equisetum          0.006849
Tephroseris        0.006849
Cardamine          0.006849
Rubus              0.006849
Saussurea          0.006849
Name: genus, Length: 64, dtype: float64

Genus frequencies on Sakhalin Island:

In [48]:
sakhalin_freq = sakhalin_filtered.genus.value_counts() / len(sakhalin_filtered)
sakhalin_freq
Out[48]:
Porella           0.071038
Riccardia         0.040073
Conocephalum      0.038251
Dryopteris        0.030965
Gymnocarpium      0.027322
Jungermannia      0.025501
Preissia          0.023679
Scapania          0.023679
Lophozia          0.023679
Peltolepis        0.021858
Leptorumohra      0.021858
Asarum            0.021858
Marchantia        0.020036
Nardia            0.020036
Mesoptychia       0.020036
Cephalozia        0.018215
Calypogeia        0.018215
Frullania         0.016393
Blepharostoma     0.016393
Phegopteris       0.014572
Botrychium        0.014572
Sauteria          0.014572
Mylia             0.014572
Sphenolobus       0.012750
Lejeunea          0.012750
Woodsia           0.010929
Orthocaulis       0.010929
Pellia            0.010929
Leiocolea         0.010929
Reboulia          0.009107
                    ...   
Primula           0.001821
Vitis             0.001821
Brylkinia         0.001821
Fritillaria       0.001821
Larix             0.001821
Pteridium         0.001821
Spiraea           0.001821
Lophoziopsis      0.001821
Hierochloe        0.001821
Seligeria         0.001821
Ilex              0.001821
Lycopodium        0.001821
Orthothecium      0.001821
Thuidiaceae       0.001821
Artemisia         0.001821
Geranium          0.001821
Androsace         0.001821
Schistidium       0.001821
Plagiochasma      0.001821
Asplenium         0.001821
Solenostoma       0.001821
Malva             0.001821
Cardiocrinum      0.001821
Corydalis         0.001821
Calycularia       0.001821
Beckmannia        0.001821
Matteuccia        0.001821
Tetraplodon       0.001821
Monotropastrum    0.001821
Sanguisorba       0.001821
Name: genus, Length: 136, dtype: float64

Shannon entropy (in bits) of the genus frequency distributions, H = -sum(p_i * log2(p_i)):

In [49]:
shannon_sakhalin = - sum(np.log2(sakhalin_freq.values) * sakhalin_freq.values)
shannon_sakhalin
Out[49]:
6.3251019029259146
In [50]:
shannon_petropavlovsk = - sum(np.log2(petropavlovsk_freq.values) * petropavlovsk_freq.values)
shannon_petropavlovsk
Out[50]:
5.564519271608396

Relative values of the entropy, i.e. H divided by its theoretical maximum log2(N), which is reached when all N genera are equally frequent:

In [51]:
shannon_sakhalin_relative = shannon_sakhalin / np.log2(len(sakhalin_freq))
shannon_sakhalin_relative
Out[51]:
0.89243528249808335
In [52]:
shannon_petropavlovsk_relative = shannon_petropavlovsk / np.log2(len(petropavlovsk_freq))
shannon_petropavlovsk_relative
Out[52]:
0.92741987860139929
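The two measures above can be packaged as small functions that work on a plain sequence of genus names. A minimal sketch using collections.Counter; shannon_entropy and relative_entropy are hypothetical helper names, and the relative measure is left undefined (nan) when only one genus is present, since log2(1) = 0. Assuming the genus column has no missing values, shannon_entropy(sakhalin_filtered.genus) should agree with Out[49] up to floating-point rounding.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy H = -sum(p_i * log2(p_i)) of the label frequencies, in bits."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def relative_entropy(labels):
    """H divided by its maximum log2(N), N being the number of distinct labels.

    Equals 1.0 when all labels are equally frequent; nan for N = 1.
    """
    n = len(set(labels))
    return shannon_entropy(labels) / math.log2(n) if n > 1 else float("nan")


genera = ["Riccardia", "Aneura", "Dryopteris", "Moerckia"]  # four equally frequent genera
print(shannon_entropy(genera), relative_entropy(genera))  # → 2.0 1.0
```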

Note: these results can be confusing. The relative genus diversity (evenness) near Petropavlovsk-Kamchatsky is higher than on Sakhalin Island, although the absolute Shannon entropy is higher for Sakhalin. This could be caused by many factors, such as peculiarities of how the Herbarium database is being filled (filling is still in progress) or simple statistical ambiguity: records in the database are collected in a non-random and/or irregular way, which in turn may lead to spurious conclusions. So be careful when drawing any conclusions.

In [53]:
import datetime
print("Date of last code execution: ", datetime.datetime.now())
Date of last code execution:  2017-11-12 11:31:02.205075