Python is a popular general-purpose programming language, widely used not only in software development but also in scientific computing, data analysis, and engineering.
Among the reasons for Python's popularity are programmer productivity, code readability, and simplicity, as well as dynamic typing, support for the object-oriented programming (OOP) style, and cross-platform portability.
Python encourages readable code: every block of a program is set off from the others by indentation, which makes the source easy to read. Recommendations on how to write Python code are collected in PEP 8, a style guide that aims to help users write readable and easily understandable Python code.
Reproducing the computations presented below requires Python 3.5+ to be installed on your operating system, as well as some Python packages (e.g. Pandas, NumPy).
To work with spatially distributed data, you will additionally need to install the pyshp (a tool for I/O operations with ESRI shapefiles) and shapely (geometry primitives and their interactions) packages.
Windows users can start with the Anaconda distribution. It was designed to provide a convenient toolset for scientific computing and data analysis on top of Python, so it is a good starting point for building a computational environment for your investigations.
Executable lines of code are given in blocks that start with In[xxx]; these blocks can be executed either in the Jupyter interactive environment or by copying them into a separate text file (with the .py extension) and running it with the Python interpreter.
This document was created with Jupyter, an interactive environment that allows mixing code and text in a nicely styled way.
To get started, we need to prepare the computational environment. Its most important component is the Pandas package; it is highly recommended to install it before doing any data manipulation. Once installed, Pandas supplies us with a convenient data container, the DataFrame class, which in turn allows sophisticated data selection and basic input/output (I/O) operations.
We assume that Pandas has already been installed. So, let's import it.
import pandas as pd # This is a common convention: every time you need pandas, it is better to import it as `pd`.
Let's define a variable called HERBARIUM_SEARCH_URL. This variable will point to the URI to which we will send HTTP search queries. This URI is assumed to be a permanent address, so it is unlikely to change in the near future.
Note: One can use the HTTPS protocol instead. In this case, just replace http with https in HERBARIUM_SEARCH_URL.
HERBARIUM_SEARCH_URL = 'http://botsad.ru/hitem/json/'
It is worth noting that variable naming is an important part of programming. One might think that it doesn't matter and use shorthand notation instead, assuming that everything will remain clear, at least to oneself. But this isn't a good reason to do so, even if you are writing code just for yourself. Best practice is to choose variable names that are intuitively clear to everyone on the one hand, and as short as possible on the other.
Let us organize the search parameters as a tuple of (name, value) tuples, as follows (for a full description of all available search parameters, follow the link):
search_parameters_set = (('collectedby', 'Bakalin'),
('identifiedby', 'Bakalin'),
('colstart', '2016-01-01'),
('colend', '2016-12-30')
)
Let's import the functions needed to make HTTP requests automatically. Python's ecosystem provides many tools for this: one can use third-party packages or just the ones included in the standard Python distribution (a.k.a. the "batteries included" pack).
try:
    # Python 3.x
    from urllib.parse import quote
    from urllib.request import urlopen
except ImportError:
    # Python 2.x
    from urllib import quote
    from urllib import urlopen
Now we can use the variables defined above to compose the search URI (according to the HTTP-API rules):
search_request_url = HERBARIUM_SEARCH_URL + '?' + '&'.join(map(lambda x: x[0] + '=' + quote(x[1].strip()), search_parameters_set))
According to the URI specification, non-ASCII characters in a URI must be percent-encoded, i.e. written as sequences that start with the '%' symbol. So, if your query includes non-ASCII characters, you will need the helper function quote to do this encoding.
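As a small illustration of percent-encoding, the sketch below applies quote to an ASCII value, a non-ASCII value, and a value with a space. It assumes Python 3; the sample values are borrowed from the queries and records shown in this document.

```python
from urllib.parse import quote, unquote

# ASCII letters, digits and the characters "_", "." and "-" are left untouched.
ascii_value = quote('2016-01-01')

# Non-ASCII characters are converted to UTF-8 bytes, each written as %XX.
cyrillic_value = quote('Крестов')

# A space is not allowed in a URI, so it becomes %20.
spaced_value = quote('V.A. Bakalin')

print(ascii_value)     # 2016-01-01
print(cyrillic_value)  # %D0%9A%D1%80%D0%B5%D1%81%D1%82%D0%BE%D0%B2
print(spaced_value)    # V.A.%20Bakalin

# unquote reverses the encoding.
print(unquote(cyrillic_value))  # Крестов
```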
For simple queries, one can assign search_request_url directly, e.g.:
search_request_url = 'http://botsad.ru/hitem/json?collectedby=bakalin'
Which way to choose is up to you, but scripting best practice suggests building search_request_url from separate parts. Using search_parameters_set keeps the code structured and organized when, for instance, you make complicated search queries or perform a series of consecutive queries with different parameters:
list_of_search_pars = [search_parameters_set1, search_parameters_set2, search_parameters_set3,]
list_of_search_urls = [search_request_url1, search_request_url2, search_request_url3, ]
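One possible way to produce such a list of URLs is sketched below. The helper name build_search_url and the two sample parameter sets are assumptions made for illustration; they are not part of the Herbarium API. The sketch relies on urlencode from the standard library (Python 3.5+) instead of the manual join. Unlike a bare quote call, urlencode also encodes '/' inside values.

```python
from urllib.parse import quote, urlencode

HERBARIUM_SEARCH_URL = 'http://botsad.ru/hitem/json/'


def build_search_url(parameters, base_url=HERBARIUM_SEARCH_URL):
    """Compose a search URL from an iterable of (name, value) pairs."""
    cleaned = [(name, value.strip()) for name, value in parameters]
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + '?' + urlencode(cleaned, quote_via=quote)


# Hypothetical parameter sets, used only to show the pattern.
list_of_search_pars = [
    (('collectedby', 'Bakalin'), ('colstart', '2016-01-01')),
    (('collectedby', 'Bakalin'), ('colend', '2016-12-30')),
]
list_of_search_urls = [build_search_url(pars) for pars in list_of_search_pars]

for url in list_of_search_urls:
    print(url)
```

Keeping the parameters as data and the URL construction in one function means that a new query only requires a new tuple of pairs.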
As a self-check, one can print out the current value of search_request_url.
search_request_url
'http://botsad.ru/hitem/json/?collectedby=Bakalin&identifiedby=Bakalin&colstart=2016-01-01&colend=2016-12-30'
Now we are ready to send the search request to the server. In general, loading the data and transforming it into a Python dictionary takes the following four lines of code:
import json
server_response = urlopen(search_request_url)
data = json.loads(server_response.read().decode('utf-8'))
server_response.close()
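The same four steps can also be wrapped in a small function, so that the connection is closed even if reading or decoding fails. This is only a sketch for Python 3: the names fetch_json and decode_json_payload and the 30-second timeout are assumptions, and the demonstration at the end uses a hand-made payload rather than real server output.

```python
import json
from urllib.request import urlopen


def decode_json_payload(raw_bytes, encoding='utf-8'):
    """Turn the raw bytes of a server response into a Python object."""
    return json.loads(raw_bytes.decode(encoding))


def fetch_json(url, timeout=30):
    """Download `url` and decode its JSON body.

    The `with` statement closes the connection automatically,
    even if an exception is raised while reading or decoding.
    """
    with urlopen(url, timeout=timeout) as server_response:
        return decode_json_payload(server_response.read())


# Offline demonstration on a hand-made payload (not real server output).
sample = decode_json_payload(b'{"errors": [], "warnings": [], "data": []}')
print(sample['errors'], sample['warnings'])
```

With this helper, the loading step reduces to data = fetch_json(search_request_url).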
So, the variable data stores a Python dictionary with the fields defined in the official docs.
Before processing the data, one should check the errors and warnings fields; if everything went fine, both are left empty, or only warnings is non-empty.
data['errors'], data['warnings']
([], [])
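In a script, this check can be made automatic. The sketch below assumes that errors and warnings are lists of messages, as the empty lists in the output above suggest; the function name check_response and the sample responses are hypothetical.

```python
def check_response(data):
    """Raise if the server reported errors; print and return any warnings."""
    errors = data.get('errors', [])
    if errors:
        raise RuntimeError('Search failed: ' + '; '.join(map(str, errors)))
    warnings = data.get('warnings', [])
    for message in warnings:
        print('Warning:', message)
    return warnings


# Hand-made responses that only imitate the assumed structure.
ok_response = {'errors': [], 'warnings': [], 'data': []}
bad_response = {'errors': ['hypothetical error message'], 'warnings': [], 'data': []}

check_response(ok_response)        # passes silently
try:
    check_response(bad_response)   # raises RuntimeError
except RuntimeError as exc:
    print(exc)
```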
Therefore, the data was loaded successfully, and we can start a typical data evaluation process using Pandas.
print("The number of obtained records is ", len(data['data']))
The number of obtained records is 179
Now, data['data'] is a list of Python dictionaries, one per record; such containers are convenient for structured data, but a Pandas DataFrame is more appropriate for this purpose.
Let us convert it to a DataFrame object; it is quite simple:
search_df = pd.DataFrame(data['data'])
search_df is an instance of the DataFrame class, which has a lot of helper methods for getting information about the data. Two of them are the .info() and .describe() methods; they give a general summary of the data.
search_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 179 entries, 0 to 178
Data columns (total 41 columns):
acronym                     179 non-null object
additionals                 179 non-null object
altitude                    179 non-null object
branch                      179 non-null object
collection_finished         179 non-null object
collection_started          179 non-null object
collectors                  179 non-null object
country                     179 non-null object
country_id                  177 non-null float64
created                     179 non-null object
details                     179 non-null object
dethistory                  179 non-null object
devstage                    179 non-null object
district                    179 non-null object
family                      179 non-null object
family_authorship           179 non-null object
fieldid                     179 non-null object
genus                       179 non-null object
genus_authorship            179 non-null object
gpsbased                    179 non-null bool
id                          179 non-null int64
identification_finished     179 non-null object
identification_started      179 non-null object
identifiers                 179 non-null object
images                      179 non-null object
infraspecific_authorship    179 non-null object
infraspecific_epithet       179 non-null object
infraspecific_rank          179 non-null object
itemcode                    179 non-null object
latitude                    179 non-null float64
longitude                   179 non-null float64
note                        179 non-null object
region                      179 non-null object
short_note                  179 non-null object
significance                179 non-null object
species_authorship          179 non-null object
species_epithet             179 non-null object
species_fullname            179 non-null object
species_id                  179 non-null int64
species_status              179 non-null object
updated                     179 non-null object
dtypes: bool(1), float64(3), int64(2), object(35)
memory usage: 56.2+ KB
Also, one can inspect the DataFrame's content:
search_df
acronym | additionals | altitude | branch | collection_finished | collection_started | collectors | country | country_id | created | ... | note | region | short_note | significance | species_authorship | species_epithet | species_fullname | species_id | species_status | updated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | SaPa, Phan Xi Pan National Park. | Lao Cai Province | (Austin) Stephani | subciliata | Pallavicinia subciliata (Austin) Stephani | 420693 | Approved | 2017-06-13 | |||
1 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | SaPa, Phan Xi Pan National Park[Vietnam] | Lao Cai Province | (Austin) Stephani | subciliata | Pallavicinia subciliata (Austin) Stephani | 420693 | Approved | 2017-06-13 | |||
2 | VBGI | [{'species_authorship': '', 'infraspecific_aut... | 1700-1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien Son National ... | Lai Chau Province | Furuki | flavovirens | Riccardia flavovirens Furuki | 24180 | Approved | 2017-06-26 | |||
3 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | Furuki | pumila | Riccardia pumila Furuki | 24296 | From plantlist | 2017-07-07 | |||
4 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | NaN | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | (Lehm. & Lindenb.) Trevis. | campylophylla | Porella campylophylla (Lehm. & Lindenb.) Trevis. | 588613 | Approved | 2017-07-14 | ||||
5 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | NaN | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | (Lehm. & Lindenb.) Trevis. | acutifolia | Porella acutifolia (Lehm. & Lindenb.) Trevis. | 470689 | Approved | 2017-07-11 | ||||
6 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | sp. | Asterella sp. | 588815 | Approved | 2017-08-14 | ||||
7 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | (Lehm. & Lindenb.) Trevis. | acutifolia | Porella acutifolia (Lehm. & Lindenb.) Trevis. | 470689 | Approved | 2017-07-11 | |||
8 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | (Lehm. & Lindenb.) Trevis. | campylophylla | Porella campylophylla (Lehm. & Lindenb.) Trevis. | 588613 | Approved | 2017-07-14 | |||
9 | VBGI | [{'species_authorship': '(Scop.) Nees', 'infra... | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien Son National ... | Lai Chau Province | (L.) Dumort. | pinguis | Aneura pinguis (L.) Dumort. | 24020 | Approved | 2017-06-13 | |||
10 | VBGI | [] | 1700-1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien Son National ... | Lai Chau Province | Furuki | flavovirens | Riccardia flavovirens Furuki | 24180 | Approved | 2017-06-26 | |||
11 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | sp. | Marchantia sp. | 589009 | Approved | 2017-08-08 | ||||
12 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | SaPa, Phan Xi Pan National Park | Lao Cai Province | sp. | Preissia sp. | 590106 | Approved | 2017-06-13 | ||||
13 | VBGI | [] | 1700-1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien Son National ... | Lai Chau Province | Furuki | flavovirens | Riccardia flavovirens Furuki | 24180 | Approved | 2017-06-26 | |||
14 | VBGI | [{'species_authorship': '(Grolle & Váňa) Váňa ... | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | Furuki | pumila | Riccardia pumila Furuki | 24296 | From plantlist | 2017-07-07 | |||
15 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park... | Lai Chau Province | Furuki | pumila | Riccardia pumila Furuki | 24296 | From plantlist | 2017-07-07 | |||
16 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range. Hoang Lien Son National ... | Lai Chau Province | (S. Hatt.) Furuki | yakusimensis | Lobatiriccardia yakusimensis (S. Hatt.) Furuki | 24100 | Approved | 2017-06-21 | |||
17 | VBGI | [] | 1900 | Bryophyte herbarium | 2016-03-15 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | SaPa, Phan Xi Pan National Park | Lao Cai Province | (Thunb.) Grolle | japonicum | Conocephalum japonicum (Thunb.) Grolle | 184206 | From plantlist | 2017-06-13 | |||
18 | VBGI | [] | 2100 | Bryophyte herbarium | 2016-03-16 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, border of the Hoang Lien... | Lai Chau Province | (Austin) Stephani | subciliata | Pallavicinia subciliata (Austin) Stephani | 420693 | Approved | 2017-06-13 | |||
19 | VBGI | [] | 2100 | Bryophyte herbarium | 2016-03-16 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, border of the Hoang Lien... | Lai Chau Province | Schiffn. | consanguinea | Metzgeria consanguinea Schiffn. | 589938 | Approved | 2017-06-13 | |||
20 | VBGI | [] | 2100 | Bryophyte herbarium | 2016-03-16 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, border of the Hoang Lien... | Lai Chau Province | Mitt. | crispula | Calycularia crispula Mitt. | 13454 | Approved | 2017-06-13 | |||
21 | VBGI | [] | 2100 | Bryophyte herbarium | 2016-03-16 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | (Lehm. & Lindenb.) Trevis. | acutifolia | Porella acutifolia (Lehm. & Lindenb.) Trevis. | 470689 | Approved | 2017-07-11 | |||
22 | VBGI | [] | 2100 | Bryophyte herbarium | 2016-03-16 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Hoang Lien Son Range, Hoang Lien National Park. | Lai Chau Province | Furuki | pumila | Riccardia pumila Furuki | 24296 | From plantlist | 2017-07-07 | |||
23 | VBGI | [] | 3143 | Bryophyte herbarium | 2016-03-17 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Sa Pa District, San Sa Ho Communne, Hoang Lien... | Lao Cai Province | (Schiffn.) Steph. | maxima | Aneura maxima (Schiffn.) Steph. | 589001 | Approved | 2017-06-13 | |||
24 | VBGI | [] | 3143 | Bryophyte herbarium | 2016-03-17 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Sa Pa District, San Sa Ho Comunne, Hoang Lien ... | Lao Cai Province | gem. | (Mitt.) Konstant. | setosa | Schistochilopsis setosa (Mitt.) Konstant. | 589942 | Approved | 2017-11-07 | ||
25 | VBGI | [] | 3143 | Bryophyte herbarium | 2016-03-17 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | SaPa, Phan Xi Pan National Park, around the peak. | Lao Cai Province | Schiffn. | consanguinea | Metzgeria consanguinea Schiffn. | 589938 | Approved | 2017-06-13 | |||
26 | VBGI | [] | 3143 | Bryophyte herbarium | 2016-03-17 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Sa Pa District, San Sa Ho Comunne, Hoang Lien ... | Lao Cai Province | sp. | Aneura sp. | 589627 | Approved | 2017-06-13 | ||||
27 | VBGI | [{'species_authorship': 'S. Hatt.', 'infraspec... | 3143 | Bryophyte herbarium | 2016-03-17 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Sa Pa District, San Sa Ho Comunne, Hoang Lien ... | Lao Cai Province | (Mitt.) Konstant. | setosa | Schistochilopsis setosa (Mitt.) Konstant. | 589942 | Approved | 2017-11-07 | |||
28 | VBGI | [] | 1350 | Bryophyte herbarium | 2016-03-18 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | SaPa, Ta Phin area. | Lao Cai Province | (Austin) Stephani | subciliata | Pallavicinia subciliata (Austin) Stephani | 420693 | Approved | 2017-06-13 | |||
29 | VBGI | [] | 1350 | Bryophyte herbarium | 2016-03-18 | V.A. Bakalin | Vietnam | 134.0 | 2017-04-07 | ... | Sa Pa District, Ta Phin Commune, Hoang Lien So... | Lao Cai Province | (Taylor) Trevis. | obtusata | Porella obtusata (Taylor) Trevis. var. macrolo... | 590629 | Approved | 2017-08-02 | |||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
149 | VBGI | [] | 81 | Bryophyte herbarium | 2016-10-16 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-05-12 | ... | Khasansky District, Kravtsovka Stream Valley. | Дальний Восток|Russian Far East | spor. | (Mont.) Grolle | leptophylla | Asterella leptophylla (Mont.) Grolle | 62971 | Approved | 2017-08-14 | ||
150 | VBGI | [] | 810 | Bryophyte herbarium | 2016-10-15 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-05-23 | ... | Shkotovsky District, Northern slope of Falaza ... | Дальний Восток|Russian Far East | Lindb. & Arnell | laxa | Calycularia laxa Lindb. & Arnell | 588423 | Approved | 2017-06-13 | |||
151 | VBGI | [{'species_authorship': 'Szweyk., Buczk. & Odr... | 124 | Bryophyte herbarium | 2016-10-16 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-05-25 | ... | Khasansky District, Kravtsovka Stream Valley. | Дальний Восток|Russian Far East | (Steph.) R.M. Schust. & Inoue | erimonus | Hattorianthus erimonus (Steph.) R.M. Schust. &... | 590071 | Approved | 2017-06-13 | |||
152 | VBGI | [{'species_authorship': 'Spruce', 'infraspecif... | 571 | Bryophyte herbarium | 2016-10-15 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-07-07 | ... | Shkotovsky District, Northern slope of Falaza ... | Дальний Восток|Russian Far East | (Hedw.) Carruth. | palmata | Riccardia palmata (Hedw.) Carruth. | 24263 | Approved | 2017-07-07 | |||
153 | VBGI | [] | 35 | Bryophyte herbarium | 2016-10-16 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-07-12 | ... | Nadezhdensky District, Razdol’naya River Valley. | Дальний Восток|Russian Far East | (Steph.) S. Hatt. | caespitans | Porella caespitans (Steph.) S. Hatt. | 588551 | Approved | 2017-07-12 | |||
154 | VBGI | [] | 571 | Bryophyte herbarium | 2016-10-15 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-07-24 | ... | Shkotovsky District, Northern slope of Falaza ... | Дальний Восток|Russian Far East | (Steph.) S. Hatt. | fauriei | Porella fauriei (Steph.) S. Hatt. | 588934 | Approved | 2017-07-24 | |||
155 | VBGI | [] | 35 | Bryophyte herbarium | 2016-10-16 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-07-27 | ... | Nadezhdinsky District, Razdol'naya River Valley | Дальний Восток|Russian Far East | Lindb. | grandiloba | Porella grandiloba Lindb. | 588822 | Approved | 2017-07-27 | |||
156 | VBGI | [] | 90 | Bryophyte herbarium | 2016-04-28 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Barabash Area | Дальний Восток|Russian Far East | ant., arch. | (L.) Raddi | hemisphaerica | Reboulia hemisphaerica (L.) Raddi subsp. hemis... | 590127 | Approved | 2017-09-15 | ||
157 | VBGI | [] | 90 | Bryophyte herbarium | 2016-04-28 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Barabash Area | Дальний Восток|Russian Far East | Lindb. | vernicosa | Porella vernicosa Lindb. | 588792 | Approved | 2017-09-15 | |||
158 | VBGI | [{'species_authorship': '(Stephani) S. Hatt.',... | 90 | Bryophyte herbarium | 2016-04-28 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Barabash Area | Дальний Восток|Russian Far East | Steph. | taradakensis | Frullania taradakensis Steph. | 264954 | Approved | 2017-09-15 | |||
159 | VBGI | [] | 90 | Bryophyte herbarium | 2016-04-28 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Barabash Area | Дальний Восток|Russian Far East | (Stephani) S. Hatt. | chinensis | Porella chinensis (Stephani) S. Hatt. | 588836 | Approved | 2017-09-15 | |||
160 | VBGI | [{'species_authorship': '(L.) Dumort.', 'infra... | 12 | Bryophyte herbarium | 2016-04-28 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Romashka Area | Дальний Восток|Russian Far East | L. | fluitans | Riccia fluitans L. | 497877 | From plantlist | 2017-09-15 | |||
161 | VBGI | [] | 12 | Bryophyte herbarium | 2016-04-28 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Romashka Area | Дальний Восток|Russian Far East | (L.) Dumort. | pinguis | Aneura pinguis (L.) Dumort. | 24020 | Approved | 2017-09-15 | |||
162 | VBGI | [] | 12 | Bryophyte herbarium | 2016-04-28 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Romashka Area | Дальний Восток|Russian Far East | (Nees) Nees | irrigua | Scapania irrigua (Nees) Nees | 550233 | Approved | 2017-09-15 | |||
163 | VBGI | [] | 12 | Bryophyte herbarium | 2016-04-28 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Romashka Area | Дальний Восток|Russian Far East | Steph. | otaruensis | Cephalozia otaruensis Steph. | 588535 | Approved | 2017-09-15 | |||
164 | VBGI | [] | 150 | Bryophyte herbarium | 2016-04-29 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Mramornaya Mt. | Дальний Восток|Russian Far East | ant., arch. | (L.) Raddi | hemisphaerica | Reboulia hemisphaerica (L.) Raddi subsp. hemis... | 590127 | Approved | 2017-09-15 | ||
165 | VBGI | [] | 150 | Bryophyte herbarium | 2016-04-29 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Mramornaya Mt. | Дальний Восток|Russian Far East | sp. | Riccia sp. | 588829 | Approved | 2017-09-15 | ||||
166 | VBGI | [] | 150 | Bryophyte herbarium | 2016-04-29 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Mramornaya Mt. | Дальний Восток|Russian Far East | sp. | Riccia sp. | 588829 | Approved | 2017-09-15 | ||||
167 | VBGI | [] | 150 | Bryophyte herbarium | 2016-04-29 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Mramornaya Mt. | Дальний Восток|Russian Far East | per. juv. | Mitt. | infusca | Plectocolea infusca Mitt. var. recondita Bakalin | 590883 | Approved | 2017-09-15 | ||
168 | VBGI | [{'species_authorship': '(A. Evans) Schljakov'... | 150 | Bryophyte herbarium | 2016-04-29 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Mramornaya Mt. | Дальний Восток|Russian Far East | per. | (Steph.) Inoue | truncatum | Pedinophyllum truncatum (Steph.) Inoue | 588820 | Approved | 2017-09-15 | ||
169 | VBGI | [] | 150 | Bryophyte herbarium | 2016-04-29 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Mramornaya Mt. | Дальний Восток|Russian Far East | (DC.) Steph. | autumnalis | Jamesoniella autumnalis (DC.) Steph. | 590059 | Approved | 2017-09-15 | |||
170 | VBGI | [{'species_authorship': '(Gottsche) Mizut.', '... | 150 | Bryophyte herbarium | 2016-04-29 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Mramornaya Mt. | Дальний Восток|Russian Far East | Steph. | taradakensis | Frullania taradakensis Steph. | 264954 | Approved | 2017-09-14 | |||
171 | VBGI | [{'species_authorship': '(Steph.) Inoue', 'inf... | 150 | Bryophyte herbarium | 2016-04-29 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Mramornaya Mt. | Дальний Восток|Russian Far East | ant., arch., per. | (Sm.) Schiffn. | divaricata | Cephaloziella divaricata (Sm.) Schiffn. | 588472 | Approved | 2017-09-14 | ||
172 | VBGI | [] | 150 | Bryophyte herbarium | 2016-04-29 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Mramornaya Mt. | Дальний Восток|Russian Far East | (Gottsche) Mizut. | sandvicensis | Trocholejeunea sandvicensis (Gottsche) Mizut. | 590069 | Approved | 2017-09-14 | |||
173 | VBGI | [] | 170 | Bryophyte herbarium | 2016-04-30 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Gryaznaya River Basin | Дальний Восток|Russian Far East | Stotler & Crotz | azurea | Calypogeia azurea Stotler & Crotz | 101182 | Approved | 2017-09-14 | |||
174 | VBGI | [] | 170 | Bryophyte herbarium | 2016-04-30 | V.A. Bakalin | Russia | 162.0 | 2017-08-22 | ... | Khasansky District, Gryaznaya River Basin | Дальний Восток|Russian Far East | (Schrad.) Hazsl. | rivularis | Chiloscyphus rivularis (Schrad.) Hazsl. | 333777 | Approved | 2017-09-14 | |||
175 | VBGI | [] | 35 | Bryophyte herbarium | 2016-10-16 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-09-24 | ... | Nadezhdinsky District, Razdol'naya River Valley | Дальний Восток|Russian Far East | ant. Rev. by Yu.S. Mamontov 24.06.2017: Ok! | Steph. | muscicola | Frullania muscicola Steph. | 264739 | Approved | 2017-09-24 | ||
176 | VBGI | [] | 639 | Bryophyte herbarium | 2016-10-15 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-10-30 | ... | Shkotovsky District, northern slope of Falaza ... | Дальний Восток|Russian Far East | per. | (Hook.) Gray | taylorii | Mylia taylorii (Hook.) Gray | 590048 | Approved | 2017-10-30 | ||
177 | VBGI | [] | 81 | Bryophyte herbarium | 2016-10-16 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-10-30 | ... | Khasansky District, Kravtsovka Stream Valley | Дальний Восток|Russian Far East | gem. | Bertol. | paleacea | Marchantia paleacea Bertol. | 350001 | Approved | 2017-10-30 | ||
178 | VBGI | [] | 35 | Bryophyte herbarium | 2016-10-16 | K.G. Klimova & V.A. Bakalin | Russia | 162.0 | 2017-10-30 | ... | Deciduous forest (Quercus mongolica) on the ri... | Дальний Восток|Russian Far East | per., ant. | A. Evans | ornata | Cololejeunea ornata A. Evans | 325431 | Approved | 2017-10-30 |
179 rows × 41 columns
Let's filter our dataset, keeping only the records that have a defined altitude (i.e. a non-empty altitude parameter):
altitude_only = search_df[search_df.altitude != ''].copy()
altitude_only.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 179 entries, 0 to 178
Data columns (total 41 columns):
acronym                     179 non-null object
additionals                 179 non-null object
altitude                    179 non-null object
branch                      179 non-null object
collection_finished         179 non-null object
collection_started          179 non-null object
collectors                  179 non-null object
country                     179 non-null object
country_id                  177 non-null float64
created                     179 non-null object
details                     179 non-null object
dethistory                  179 non-null object
devstage                    179 non-null object
district                    179 non-null object
family                      179 non-null object
family_authorship           179 non-null object
fieldid                     179 non-null object
genus                       179 non-null object
genus_authorship            179 non-null object
gpsbased                    179 non-null bool
id                          179 non-null int64
identification_finished     179 non-null object
identification_started      179 non-null object
identifiers                 179 non-null object
images                      179 non-null object
infraspecific_authorship    179 non-null object
infraspecific_epithet       179 non-null object
infraspecific_rank          179 non-null object
itemcode                    179 non-null object
latitude                    179 non-null float64
longitude                   179 non-null float64
note                        179 non-null object
region                      179 non-null object
short_note                  179 non-null object
significance                179 non-null object
species_authorship          179 non-null object
species_epithet             179 non-null object
species_fullname            179 non-null object
species_id                  179 non-null int64
species_status              179 non-null object
updated                     179 non-null object
dtypes: bool(1), float64(3), int64(2), object(35)
memory usage: 57.5+ KB
print('Average altitude is {} meters above sea level'.format(altitude_only.altitude.apply(pd.to_numeric, args=('coerce',)).mean()))
Average altitude is 688.9428571428572 meters above sea level
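As you can see from the .info() output, the altitude column has the object dtype, which means that its values could be quite arbitrary, e.g. strings, arrays, etc. We applied pd.to_numeric with the 'coerce' option to force the values to a numeric (floating-point) type; values that cannot be parsed as a single number, such as "1700-1900", become NaN and are skipped by .mean(). This conversion is required in order to use the .mean() method.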
Altitude is a string that could take one of several forms: "700-900", "100", "300 m a.s.l.", etc. One may therefore wish to handle all of these cases smartly. Regular expressions are a convenient tool for extracting the numbers.
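A minimal sketch of such number extraction is given below. The function name parse_altitude is hypothetical, and representing a range by its midpoint is an assumption made for illustration; other conventions (e.g. taking the lower bound) are equally possible. Values with thousands separators, such as "1,500", are not handled.

```python
import re

# One or more digits, optionally followed by a decimal part.
NUMBER_PATTERN = re.compile(r'\d+(?:\.\d+)?')


def parse_altitude(text):
    """Return the altitude in meters as a float, or None if no number is found.

    Assumption: a range such as "700-900" is represented by its midpoint.
    """
    numbers = [float(match) for match in NUMBER_PATTERN.findall(text)]
    if not numbers:
        return None
    return sum(numbers) / len(numbers)


for raw in ('700-900', '100', '300 m a.s.l.', ''):
    print(repr(raw), '->', parse_altitude(raw))
# '700-900' -> 800.0
# '100' -> 100.0
# '300 m a.s.l.' -> 300.0
# '' -> None
```

Such a function could be passed to the .apply() method of the altitude column in place of pd.to_numeric, so that ranges contribute to the average instead of being dropped.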
Let's consider another filtering condition: we want to find all records collected at altitudes higher than 1 km and after 1 August 2016.
To work with dates in Pandas, we need to use datetime objects.
altitude_only.altitude = altitude_only.altitude.apply(pd.to_numeric, args=('coerce',))
altitude_only.collection_started = pd.to_datetime(altitude_only.collection_started)
deadline = pd.to_datetime('2016-08-01')
altitude_only[(altitude_only.altitude > 1000) & (altitude_only.collection_started>deadline)]
acronym | additionals | altitude | branch | collection_finished | collection_started | collectors | country | country_id | created | ... | note | region | short_note | significance | species_authorship | species_epithet | species_fullname | species_id | species_status | updated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
48 | VBGI | [{'species_authorship': '(Hook.) Gray', 'infra... | 1460.0 | Bryophyte herbarium | 2016-08-02 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | (S. Hatt. & Inoue) S. Hatt. & Mizut. | nana | Apotreubia nana (S. Hatt. & Inoue) S. Hatt. & ... | 573460 | Approved | 2017-08-07 | |||
52 | VBGI | [{'species_authorship': '(Huds.) H. Buch', 'in... | 1570.0 | Bryophyte herbarium | 2016-08-04 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | Lindb. & Arnell | laxa | Calycularia laxa Lindb. & Arnell | 588423 | Approved | 2017-06-13 | |||
53 | VBGI | [{'species_authorship': '(Schreb.) Berggr.', '... | 1570.0 | Bryophyte herbarium | 2016-08-04 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | (S. Hatt. & Inoue) S. Hatt. & Mizut. | nana | Apotreubia nana (S. Hatt. & Inoue) S. Hatt. & ... | 573460 | Approved | 2017-08-07 | |||
54 | VBGI | [] | 1570.0 | Bryophyte herbarium | 2016-08-04 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | Lindb. & Arnell | laxa | Calycularia laxa Lindb. & Arnell | 588423 | Approved | 2017-06-13 | |||
55 | VBGI | [] | 1480.0 | Bryophyte herbarium | 2016-08-06 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | (Hook.) Gray | taylorii | Mylia taylorii (Hook.) Gray | 590048 | Approved | 2017-11-07 | |||
56 | VBGI | [{'species_authorship': '(Wahlenb.) Dumort.', ... | 1640.0 | Bryophyte herbarium | 2016-08-06 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | Lindb. & Arnell | laxa | Calycularia laxa Lindb. & Arnell | 588423 | Approved | 2017-06-13 | |||
57 | VBGI | [] | 1640.0 | Bryophyte herbarium | 2016-08-06 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | (S. Hatt. & Inoue) S. Hatt. & Mizut. | nana | Apotreubia nana (S. Hatt. & Inoue) S. Hatt. & ... | 573460 | Approved | 2017-08-07 | |||
58 | VBGI | [{'species_authorship': '(Limpr.) Trevis.', 'i... | 1780.0 | Bryophyte herbarium | 2016-08-08 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | Lindb. & Arnell | laxa | Calycularia laxa Lindb. & Arnell | 588423 | Approved | 2017-06-13 | |||
59 | VBGI | [] | 1370.0 | Bryophyte herbarium | 2016-08-08 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | (Gottsche) Limpr. | neesiana | Pellia neesiana (Gottsche) Limpr. | 425171 | Approved | 2017-06-13 | |||
60 | VBGI | [{'species_authorship': 'Lindb.', 'infraspecif... | 1640.0 | Bryophyte herbarium | 2016-08-09 | V.A. Bakalin | Russia | 162.0 | 2017-04-07 | ... | Baidzhalsky Mountain System, Yarap River Middl... | Дальний Восток|Russian Far East | (Schrank) Kuwah. | pubescens | Apometzgeria pubescens (Schrank) Kuwah. | 363552 | Approved | 2017-06-19 |
10 rows × 41 columns
The current version of the HTTP API (see HTTP-API Description) doesn't support OR-type queries. One can't build a single query URL that, for example, finds all records collected in spring or fall (but not in summer).
Such queries can be emulated with Pandas by sending two or more consecutive queries to the database and combining the results. The extra data transfer is unlikely to be a problem here, because the Herbarium database is not expected to be very large.
Let's illustrate how to split an OR-type query into two simple queries.
We will consider two simple queries, search_query1 and search_query2, and build a compound one from them (i.e. search_query1 OR search_query2):
search_query1 = (('collectedby', 'Крестов'),
('identifiedby', 'Крестов')
)
search_query2 = (('collectedby', 'Баркалов'),
('identifiedby', 'Пименова')
)
from functools import reduce # `reduce` was moved into `functools` in Python3, so we need to import it
# Run the search queries one after another...
datastore = []  # storage for the DataFrames corresponding to the queries
for sp in [search_query1, search_query2]:  # we have only two queries
    # build the search URL for each query
    search_request_url = HERBARIUM_SEARCH_URL + '?' + '&'.join(map(lambda x: x[0] + '=' + quote(x[1].strip()), sp))
    server_response = urlopen(search_request_url)
    data = json.loads(server_response.read().decode('utf-8'))
    data = pd.DataFrame(data['data'])
    datastore.append(data)  # store the results of each query
    server_response.close()  # close the connection to the server
# Combine results using Pandas (combining is based on uniqueness of ID):
df_combined = pd.concat(datastore).drop_duplicates('id').reset_index()
df_combined.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 536 entries, 0 to 535
Data columns (total 42 columns):
index                       536 non-null int64
acronym                     536 non-null object
additionals                 536 non-null object
altitude                    536 non-null object
branch                      536 non-null object
collection_finished         536 non-null object
collection_started          536 non-null object
collectors                  536 non-null object
country                     536 non-null object
country_id                  534 non-null float64
created                     536 non-null object
details                     536 non-null object
dethistory                  536 non-null object
devstage                    536 non-null object
district                    536 non-null object
family                      536 non-null object
family_authorship           536 non-null object
fieldid                     536 non-null object
genus                       536 non-null object
genus_authorship            536 non-null object
gpsbased                    536 non-null bool
id                          536 non-null int64
identification_finished     536 non-null object
identification_started      536 non-null object
identifiers                 536 non-null object
images                      536 non-null object
infraspecific_authorship    536 non-null object
infraspecific_epithet       536 non-null object
infraspecific_rank          536 non-null object
itemcode                    536 non-null object
latitude                    536 non-null float64
longitude                   536 non-null float64
note                        536 non-null object
region                      536 non-null object
short_note                  536 non-null object
significance                536 non-null object
species_authorship          536 non-null object
species_epithet             536 non-null object
species_fullname            536 non-null object
species_id                  536 non-null int64
species_status              536 non-null object
updated                     536 non-null object
dtypes: bool(1), float64(3), int64(3), object(35)
memory usage: 172.3+ KB
df_combined.shape
(536, 42)
Dimensions of the two original DataFrames before they were combined:
datastore[0].shape, datastore[1].shape
((179, 41), (359, 41))
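The two result sets hold 179 + 359 = 538 rows, while the combined frame has 536, so two records matched both queries and were dropped as duplicates. The same deduplication idea can be sketched without Pandas. The records below are hand-made stand-ins for server responses, and combine_or is a hypothetical helper name, not part of the Herbarium API.

```python
def combine_or(*result_sets, key="id"):
    """Merge several lists of records, keeping the first record seen for each key."""
    merged = {}
    for records in result_sets:
        for record in records:
            # setdefault keeps the earlier record when the same id appears again
            merged.setdefault(record[key], record)
    return list(merged.values())


# Hypothetical records standing in for the responses to two simple queries
result1 = [{"id": 1, "genus": "Betula"}, {"id": 2, "genus": "Salix"}]
result2 = [{"id": 2, "genus": "Salix"}, {"id": 3, "genus": "Abies"}]

combined = combine_or(result1, result2)
print(len(result1) + len(result2), "->", len(combined))  # 4 -> 3
```

Keying on the record ID is what makes the union behave like a logical OR: a record that satisfies both queries is counted once.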
The HTTP API supports rectangular geographic queries, i.e. searching for herbarium records located within a specified bounding box. The internal database structure doesn't allow true polygonal queries, but they can be emulated with Python and its third-party packages.
Let us consider a hypothetical problem: comparing species diversity based on plants collected on Sakhalin Island and within a 200-km circle around the city of Petropavlovsk-Kamchatsky.
One way to handle the problem is to use an ESRI shapefile to restrict the search results to a specified area.
We will need the pyshp and shapely packages to read shapefiles and process spatial data. If they aren't yet installed in your computational environment, install them with the pip tool or another Python package manager.
We will also assume that the ESRI shapefiles are stored in the shapefiles folder of your current directory.
In this case, the basic workflow for reading a shapefile is the following:
import shapefile
import numpy as np  # Note: NumPy is installed as a dependency of Pandas; import it directly, because the pandas.np alias was removed in pandas 2.0
sakhalin_shp = shapefile.Reader("shapefiles/sakhalin.shp")
If no errors occurred, one can plot loaded data:
contour_sakhalin = np.array(sakhalin_shp.shapes()[0].points) #convert contour points to numpy array
import matplotlib.pyplot as plt  # explicit namespace instead of `from pylab import *`
plt.plot(contour_sakhalin[:, 0], contour_sakhalin[:, 1])
plt.gca().set_aspect('equal')
plt.title('Sakhalin Island')
plt.show()
The contour of the shapefile -- a coastline -- consists of 835 points.
contour_sakhalin.shape
(835, 2)
The bounding box of the shapefile is available directly through the reader's bbox attribute:
sakhalin_shp.bbox
[141.63803100585938, 45.88860321044922, 144.75164794921875, 54.424713134765625]
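The four numbers are the minimum longitude, minimum latitude, maximum longitude and maximum latitude of the contour. As a small illustration of what such a box is, it can be recomputed from any list of vertices. The contour below is made up for the example, and bounding_box is a hypothetical helper, not a pyshp function.

```python
def bounding_box(points):
    """Return [min_x, min_y, max_x, max_y] for an iterable of (x, y) pairs."""
    xs, ys = zip(*points)
    return [min(xs), min(ys), max(xs), max(ys)]


# Hypothetical contour given as (longitude, latitude) vertices
toy_contour = [(142.0, 46.0), (143.5, 50.2), (141.8, 54.1), (144.7, 49.0)]
toy_bbox = bounding_box(toy_contour)
print(toy_bbox)  # [141.8, 46.0, 144.7, 54.1]
```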
Let's build a search URL, following the same steps as before:
query_sakhalin_bbox = tuple(zip(['lonl', 'latl', 'lonu', 'latu'], map(str, sakhalin_shp.bbox)))
print(query_sakhalin_bbox)
(('lonl', '141.63803100585938'), ('latl', '45.88860321044922'), ('lonu', '144.75164794921875'), ('latu', '54.424713134765625'))
within_sakhalin_request_url = HERBARIUM_SEARCH_URL + '?' + '&'.join(map(lambda x: x[0] + '=' + quote(x[1].strip()), query_sakhalin_bbox))
within_sakhalin_request_url  # it is good practice to inspect a URL before sending a request
'http://botsad.ru/hitem/json/?lonl=141.63803100585938&latl=45.88860321044922&lonu=144.75164794921875&latu=54.424713134765625'
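The same URL-building expression is repeated for every query, so it may be convenient to wrap it in a function. The sketch below uses urlencode from the standard library; build_search_url is a hypothetical helper name, and the base URL is copied from the request printed above. One difference from the hand-written join is an assumption worth noting: urlencode also escapes the "/" character in values, while quote with default arguments leaves it as is.

```python
from urllib.parse import urlencode, quote

# Base URL as it appears in the request shown above
base_url = "http://botsad.ru/hitem/json/"


def build_search_url(base, params):
    """Join (name, value) pairs into a percent-encoded query string."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    return base + "?" + urlencode(params, quote_via=quote)


bbox_params = [
    ("lonl", "141.63803100585938"),
    ("latl", "45.88860321044922"),
    ("lonu", "144.75164794921875"),
    ("latu", "54.424713134765625"),
]
url = build_search_url(base_url, bbox_params)
print(url)
```

Non-ASCII values such as collector names are percent-encoded as UTF-8 in the same way.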
Getting data within Sakhalin Island bounding box:
server_response = urlopen(within_sakhalin_request_url)
sakhalin_data_in_bbox = pd.DataFrame(json.loads(server_response.read().decode('utf-8'))['data'])
server_response.close()  # close the connection to the server
The next step applies fine filtering with the help of a Polygon class instance:
from shapely.geometry import Polygon, Point
closed_sakhalin_contour = np.vstack([contour_sakhalin, contour_sakhalin[0]])  # close the ring by repeating the first vertex (shapely also closes rings automatically)
sakhalin_poly = Polygon(closed_sakhalin_contour)
sakhalin_filtered = sakhalin_data_in_bbox[[sakhalin_poly.contains(Point(x,y)) for x,y in zip(sakhalin_data_in_bbox.longitude, sakhalin_data_in_bbox.latitude)]]
sakhalin_filtered
acronym | additionals | altitude | branch | collection_finished | collection_started | collectors | country | country_id | created | ... | note | region | short_note | significance | species_authorship | species_epithet | species_fullname | species_id | species_status | updated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | VBGI | [] | 792 | 2016-10-01 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | F.Schmidt | ssiori | Prunus ssiori F.Schmidt | 506834 | From plantlist | 2017-06-13 | |||||
1 | VBGI | [] | 1042 | 2016-10-01 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | Nakai | axillare | Vaccinium axillare Nakai | 225901 | From plantlist | 2017-06-13 | |||||
2 | VBGI | [] | 792 | 2016-10-01 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | F.Schmidt | rugosa | Ilex rugosa F.Schmidt | 44334 | From plantlist | 2017-06-13 | |||||
3 | VBGI | [] | 1042 | 2016-10-01 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | (Koidz.) H.Ohba | nipponica | Cerasus nipponica (Koidz.) H.Ohba | 499916 | From plantlist | 2017-06-13 | |||||
4 | VBGI | [] | 1042 | 2016-10-01 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | (Cham. & Schltdl.) M.Roem. | sambucifolia | Sorbus sambucifolia (Cham. & Schltdl.) M.Roem. | 518523 | From plantlist | 2017-06-13 | |||||
5 | VBGI | [] | 1042 | 2016-10-01 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | A.Gray | smallii | Vaccinium smallii A.Gray | 226494 | From plantlist | 2017-06-13 | |||||
6 | VBGI | [] | 792 | 2016-10-01 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | Jancz. | latifolium | Ribes latifolium Jancz. | 251865 | From plantlist | 2017-06-13 | |||||
7 | VBGI | [] | 1042 | 2016-10-01 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | Hult‚n | beauverdiana | Spiraea beauverdiana Hult‚n | 518694 | From plantlist | 2017-06-13 | |||||
8 | VBGI | [] | 192 | 2016-09-29 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | Cham. | ermanii | Betula ermanii Cham. | 68288 | From plantlist | 2017-06-13 | |||||
9 | VBGI | [] | 192 | 2016-09-29 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | Cham. | ermanii | Betula ermanii Cham. | 68288 | From plantlist | 2017-06-13 | |||||
10 | VBGI | [] | 26 | 2016-09-30 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | Blume | crispula | Quercus crispula Blume | 588219 | Recently added | 2017-06-13 | |||||
11 | VBGI | [] | 26 | 2016-09-30 | Пименова Е.А. | Russia | 162 | 2017-01-16 | ... | Сахалинская обл. | Blume | crispula | Quercus crispula Blume | 588219 | Recently added | 2017-06-13 | |||||
12 | VBGI | [] | 250 | 2016-09-29 | Пименова Е.А. | Russia | 162 | 2017-02-09 | ... | одиночное крупное дерево | Сахалинская обл., | L. | regia | Juglans regia L. | 265223 | From plantlist | 2017-06-13 | ||||
13 | VBGI | [] | 250 | 2016-09-29 | Пименова Е.А. | Russia | 162 | 2017-02-09 | ... | одиночное крупное дерево | Сахалинская обл., | L. | regia | Juglans regia L. | 265223 | From plantlist | 2017-06-13 | ||||
14 | VBGI | [] | 170 | 2016-06-09 | 2016-06-09 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область / Sakhalin Oblast | (L.) Scop. | odoratum | Galium odoratum (L.) Scop. | 523442 | From plantlist | 2017-06-13 | ||||
15 | VBGI | [] | 120 | 2016-06-21 | 2016-06-21 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область | Rupr. | macroptera | Euonymus macroptera Rupr. | 588266 | Approved | 2017-06-13 | ||||
16 | VBGI | [] | 160 | 2016-06-16 | 2016-06-16 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область | (F.Schmidt) Maxim. | sachalinensis | Euonymus sachalinensis (F.Schmidt) Maxim. | 114805 | From plantlist | 2017-06-13 | ||||
18 | VBGI | [] | 50 | 2015-06-28 | 2015-06-28 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область / Sakhalin Oblast | (L.) Schott | fragrans | Dryopteris fragrans (L.) Schott | 212858 | From plantlist | 2017-06-13 | ||||
19 | VBGI | [] | 380 | 2015-08-13 | 2015-08-13 | Корзников К.А., Попова К.Б. | Russia | 162 | 2017-02-10 | ... | Сахалинская область / Sakhalin Oblast | (Kunze) C. Presl | tripteron | Polystichum tripteron (Kunze) C. Presl | 216977 | From plantlist | 2017-06-13 | ||||
20 | VBGI | [] | 40 | 2016-07-16 | 2016-07-16 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область / Sakhalin Oblast | H. Lév. | fauriei | Cardamine fauriei H. Lév. | 588274 | Recently added | 2017-06-13 | ||||
21 | VBGI | [] | 40 | 2016-07-16 | 2016-07-16 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область / Sakhalin Oblast | H. Lév. | fauriei | Cardamine fauriei H. Lév. | 588274 | Recently added | 2017-06-13 | ||||
22 | VBGI | [] | 40 | 2016-07-16 | 2016-07-16 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область / Sakhalin Oblast | H. Lév. | fauriei | Cardamine fauriei H. Lév. | 588274 | Recently added | 2017-06-13 | ||||
23 | VBGI | [] | 70 | 2015-06-27 | 2015-06-27 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область / Sakhalin Oblast | (L.) DC. | amplexifolius | Streptopus amplexifolius (L.) DC. | 330252 | From plantlist | 2017-06-13 | ||||
24 | VBGI | [] | 140 | 2015-07-12 | 2015-07-12 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область / Sakhalin Oblast | (Thunb.) Makino | cordatum | Cardiocrinum cordatum (Thunb.) Makino | 329621 | From plantlist | 2017-06-13 | ||||
25 | VBGI | [] | 15 | 2015-06-26 | 2015-06-26 | Корзников К.А. | Russia | 162 | 2017-02-10 | ... | Сахалинская область / Sakhalin Oblast | (L.) Ker Gawl. | camschatcensis | Fritillaria camschatcensis (L.) Ker Gawl. | 329686 | From plantlist | 2017-06-13 | ||||
27 | VBGI | [] | 30 | 2015-06-28 | 2015-06-28 | Корзников К.А. | Russia | 162 | 2017-02-12 | ... | Сахалинская область / Sakhalin Oblast | (L.) Sw. | lunaria | Botrychium lunaria (L.) Sw. | 381813 | From plantlist | 2017-06-13 | ||||
28 | VBGI | [] | 150 | 2016-06-21 | 2016-06-21 | Корзников К.А. | Russia | 162 | 2017-02-12 | ... | Сахалинская область / Sakhalin Oblast | (F. Schmidt) Sarg. | sachalinense | Phellodendron sachalinense (F. Schmidt) Sarg. | 537480 | From plantlist | 2017-06-13 | ||||
29 | VBGI | [] | 60 | 2016-07-17 | 2016-07-17 | Корзников К.А. | Russia | 162 | 2017-02-12 | ... | Сахалинская область / Sakhalin Oblast | (F. Schmidt) Sarg. | sachalinense | Phellodendron sachalinense (F. Schmidt) Sarg. | 537480 | From plantlist | 2017-06-13 | ||||
30 | VBGI | [] | 60 | 2016-07-17 | 2016-07-17 | Корзников К.А. | Russia | 162 | 2017-02-12 | ... | Сахалинская область / Sakhalin Oblast | (F. Schmidt) Sarg. | sachalinense | Phellodendron sachalinense (F. Schmidt) Sarg. | 537480 | From plantlist | 2017-06-13 | ||||
31 | VBGI | [] | 70 | 2016-09-17 | 2016-09-17 | Корзников К.А. | Russia | 162 | 2017-02-12 | ... | Сахалинская область / Sakhalin Oblast | Nakai | repens | Skimmia repens Nakai | 537822 | From plantlist | 2017-06-13 | ||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
536 | VBGI | [] | 2016-07-07 | Храпко О.В. | Russia | 162 | 2017-10-12 | ... | Сахалинская область / Sakhalin Oblast | (C. Presl) Fraser-Jenk. & Jermy | expansa | Dryopteris expansa (C. Presl) Fraser-Jenk. & J... | 212819 | From plantlist | 2017-10-12 | ||||||
537 | VBGI | [] | 2008-07-25 | Храпко О.В. | Russia | 162 | 2017-10-12 | ... | Сахалинская область / Sakhalin Oblast | (L.) Kuhn | aquilinum | Pteridium aquilinum (L.) Kuhn | 205696 | From plantlist | 2017-10-12 | ||||||
538 | VBGI | [] | 2014-07-04 | Храпко О.В. | Russia | 162 | 2017-10-13 | ... | Сахалинская область / Sakhalin Oblast | (L.) Roth | filix-femina | Athyrium filix-femina (L.) Roth | 61481 | From plantlist | 2017-10-13 | ||||||
539 | VBGI | [] | 2008-07-15 | Храпко О.В., Царенко Н.А. | Russia | 162 | 2017-10-13 | ... | Сахалинская область / Sakhalin Oblast | Rupr. | sinense | Athyrium sinense Rupr. | 61829 | From plantlist | 2017-10-13 | ||||||
540 | VBGI | [] | 2008-07-15 | Храпко О.В., Царенко Н.А. | Russia | 162 | 2017-10-13 | ... | Сахалинская область | Rupr. | sinense | Athyrium sinense Rupr. | 61829 | From plantlist | 2017-10-13 | ||||||
541 | VBGI | [] | 1995-07-10 | Дудкин Р.В., Тесленко В.В. | Russia | 162 | 2017-10-13 | ... | Сахалинская область / Sakhalin Oblast | (L.) Schott | fragrans | Dryopteris fragrans (L.) Schott | 212858 | From plantlist | 2017-10-13 | ||||||
542 | VBGI | [] | 2012-07-09 | Храпко О.В. | Russia | 162 | 2017-10-16 | ... | Сахалинская область / Sakhalin Oblast | (Fern.) Tagawa | asiaticum | Osmundastrum asiaticum (Fern.) Tagawa | 591052 | Approved | 2017-10-16 | ||||||
543 | VBGI | [] | 2017-07-04 | Храпко О.В. | Russia | 162 | 2017-10-16 | ... | Сахалинская область / Sakhalin Oblast | (Fern.) Tagawa | asiaticum | Osmundastrum asiaticum (Fern.) Tagawa | 591052 | Approved | 2017-10-16 | ||||||
544 | VBGI | [] | 2008-07-12 | Храпко О.В., Царенко Н.А., Богачева А.В. | Russia | 162 | 2017-10-16 | ... | Сахалинская область / Sakhalin Oblast | (Fern.) Tagawa | asiaticum | Osmundastrum asiaticum (Fern.) Tagawa | 591052 | Approved | 2017-10-16 | ||||||
547 | SAKH | [] | 52-112 | Herbarium of Vascular Plants | 1994-06-10 | А. Таран | Russia | 162 | 2017-10-18 | ... | Сахалинская область / Sakhalin Oblast | Ching | chinensis | Huperzia chinensis Ching | 336147 | From plantlist | 2017-10-18 | ||||
548 | SAKH | [] | 52-112 | Herbarium of Vascular Plants | А. Таран | Russia | 162 | 2017-10-18 | ... | Сахалинская область / Sakhalin Oblast | L. | annotinum | Lycopodium annotinum L. | 336427 | From plantlist | 2017-10-18 | |||||
551 | VBGI | [] | 382 | 1995-07-11 | Дудкин Р.В., Тесленко В.В. | Russia | 162 | 2017-10-19 | ... | Сахалинская область / Sakhalin Oblast | Huds. | viride | Asplenium viride Huds. | 60611 | From plantlist | 2017-10-19 | |||||
552 | VBGI | [] | 2012-07-09 | Храпко О.В. | Russia | 162 | 2017-10-19 | ... | Сахалинская область / Sakhalin Oblast | (C. Presl) Fraser-Jenk. & Jermy | expansa | Dryopteris expansa (C. Presl) Fraser-Jenk. & J... | 212819 | From plantlist | 2017-10-19 | ||||||
553 | VBGI | [] | 2016-07-07 | Храпко О.В. | Russia | 162 | 2017-10-19 | ... | Сахалинская область / Sakhalin Oblast | (C. Presl) Fraser-Jenk. & Jermy | expansa | Dryopteris expansa (C. Presl) Fraser-Jenk. & J... | 212819 | From plantlist | 2017-10-19 | ||||||
554 | VBGI | [] | 2012-07-09 | Храпко О.В. | Russia | 162 | 2017-10-19 | ... | Сахалинская область / Sakhalin Oblast | (C. Presl) Fraser-Jenk. & Jermy | expansa | Dryopteris expansa (C. Presl) Fraser-Jenk. & J... | 212819 | From plantlist | 2017-10-19 | ||||||
558 | VBGI | [] | 1997-10-03 | Недолужко В.А., Денисов Н.И. | Russia | 162 | 2017-10-19 | ... | Сахалинская область / Sakhalin Oblast | Tzvelev | amurensis | Leptorumohra amurensis Tzvelev | 215409 | From plantlist | 2017-10-19 | ||||||
560 | VBGI | [] | 1980-06-24 | Недолужко В.А. | Russia | 162 | 2017-10-20 | ... | Сахалинская область / Sakhalin Oblast | (L.) Sw. | lunaria | Botrychium lunaria (L.) Sw. | 381813 | From plantlist | 2017-10-20 | ||||||
561 | VBGI | [] | 1980-06-25 | Недолужко В.А. | Russia | 162 | 2017-10-20 | ... | Сахалинская область / Sakhalin Oblast | (Rupr.) Underw. | robustum | Botrychium robustum (Rupr.) Underw. | 381844 | From plantlist | 2017-10-20 | ||||||
562 | VBGI | [] | 1966-07-27 | Недолужко В.А., Стародубцев В.Н. | Russia | 162 | 2017-10-20 | ... | Сахалинская область / Sakhalin Oblast | (L.) Bernh. | fragilis | Cystopteris fragilis (L.) Bernh. | 204147 | From plantlist | 2017-10-20 | ||||||
563 | VBGI | [] | 1991-07-22 | Храпко О.В. | Russia | 162 | 2017-10-20 | ... | Сахалинская область / Sakhalin Oblast | (L.) Newman | dryopteris | Gymnocarpium dryopteris (L.) Newman | 204208 | From plantlist | 2017-10-20 | ||||||
564 | VBGI | [] | 2015-07-18 | Храпко О.В. | Russia | 162 | 2017-10-20 | ... | Сахалинская область / Sakhalin Oblast | (L.) Newman | dryopteris | Gymnocarpium dryopteris (L.) Newman | 204208 | From plantlist | 2017-10-20 | ||||||
565 | VBGI | [] | 2006-08-14 | Галанин А.В. | Russia | 162 | 2017-10-20 | ... | Сахалинская обл. | (Spenn.) Fée | braunii | Polystichum braunii (Spenn.) Fée | 216417 | From plantlist | 2017-10-20 | ||||||
566 | VBGI | [] | 1966-08-24 | Павлова Н.С., Панков | Russia | 162 | 2017-10-31 | ... | Сахалинская область / Sakhalin Oblast | R. Br. | acrostichoides | Cryptogramma acrostichoides R. Br. | 485069 | From plantlist | 2017-10-31 | ||||||
567 | VBGI | [] | 1971-07-11 | Егорова Е.М. | Russia | 162 | 2017-11-08 | ... | Сахалинская область / Sakhalin Oblast | (Turcz. ex Kunze) Sa. Kurata | sibiricum | Diplazium sibiricum (Turcz. ex Kunze) Sa. Kurata | 62611 | From plantlist | 2017-11-08 | ||||||
568 | VBGI | [] | 1980-06-24 | Недолужко В.А. | Russia | 162 | 2017-11-09 | ... | Сахалинская область / Sakhalin Oblast | (S.G. Gmel.) Ångström | lanceolatum | Botrychium lanceolatum (S.G. Gmel.) Ångström | 381807 | From plantlist | 2017-11-09 | ||||||
569 | VBGI | [] | 1986-07-31 | Недолужко В.А., Стародубцев В.Н. | Russia | 162 | 2017-11-09 | ... | Сахалинская область / Sakhalin Oblast | (Rupr.) Underw. | robustum | Botrychium robustum (Rupr.) Underw. | 381844 | From plantlist | 2017-11-09 | ||||||
570 | VBGI | [] | 1987-09-11 | Стеценко Н.М. | Russia | 162 | 2017-11-09 | ... | Сахалинская область / Sakhalin Oblast | (Rupr.) Underw. | robustum | Botrychium robustum (Rupr.) Underw. | 381844 | From plantlist | 2017-11-09 | ||||||
571 | VBGI | [] | 1969-09-25 | Алексеева Л.М. | Russia | 162 | 2017-11-09 | ... | Сахалинская область / Sakhalin Oblast | (Rupr.) Underw. | robustum | Botrychium robustum (Rupr.) Underw. | 381844 | From plantlist | 2017-11-09 | ||||||
572 | VBGI | [] | 1974-08-09 | Ворошилова Г.И., Гвоздева И., Карпова Е., Опри... | Russia | 162 | 2017-11-09 | ... | Сахалинская область / Sakhalin Oblast | (Rupr.) Underw. | robustum | Botrychium robustum (Rupr.) Underw. | 381844 | From plantlist | 2017-11-09 | ||||||
573 | VBGI | [] | 1406 | Bryophyte herbarium | 2006-08-17 | V.A. Bakalin | Russia | 162 | 2017-11-10 | ... | Central part of Sakhalin Island. Nabilsky Rang... | Дальний Восток|Russian Far East | Rev. by V.A. Bakalin: Ok! Jun 2016 | Steph. | japonica | Nardia japonica Steph. | 588500 | Approved | 2017-11-10 |
549 rows × 41 columns
sakhalin_data_in_bbox.shape
(574, 41)
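The bounding-box query returned 574 records, and 549 of them passed the polygon test. To show what a containment test does conceptually, here is a pure-Python ray-casting sketch. It is an illustration only, not shapely's implementation: it treats longitude and latitude as planar coordinates, and the result for points lying exactly on the boundary should not be relied upon.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd rule: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around, so the ring is closed implicitly
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, square))  # True
print(point_in_polygon(5.0, 2.0, square))  # False
```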
Now we will find the IDs of the points that fall within the bounding box of Sakhalin Island but outside its contour (coastline) defined in the shapefile.
set(sakhalin_data_in_bbox.id.values) - set(sakhalin_filtered.id.values)
{1403, 1412, 1432, 1434, 1438, 1439, 1529, 10015, 10016, 10373, 19819, 19820, 19823, 19912, 20830, 20832, 30563, 31010, 31032, 31079, 31080, 31152, 31153, 31154, 31161}
Inspecting the positions of the excluded points, e.g. ID=1412 (see http://botsad.ru/hitem/1412), one can conclude that all excluded records were collected near the coastline; their exclusion is probably caused by positioning errors in the herbarium records and in the coastline points. So this isn't a true error, but such cases should be taken into account when filtering by polygonal areas.
Let's find herbarium records collected in the proximity of Petropavlovsk-Kamchatsky. First, we set the coordinates of the city center and define a bounding box that includes a 200-km circle around this point:
kamchatka_bbox = [151.1, 47.8, 172.0, 58.3]
petropavlovsk_coords = (53.145992, 158.683548)
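The box above is given as fixed numbers. As a rough sketch, a box enclosing a circle of a given radius can also be estimated from the center point. The helper name bbox_around, the constant of about 111.32 km per degree of latitude, and the 5% safety margin are assumptions made for this illustration; the approximation is not suitable near the poles or across the 180° meridian.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude, in km


def bbox_around(lat, lon, radius_km, margin=1.05):
    """Approximate [lonl, latl, lonu, latu] enclosing a circle around (lat, lon)."""
    dlat = margin * radius_km / KM_PER_DEGREE_LAT
    # One degree of longitude shrinks with the cosine of the latitude
    dlon = margin * radius_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return [lon - dlon, lat - dlat, lon + dlon, lat + dlat]


approx_bbox = bbox_around(53.145992, 158.683548, 200.0)
print([round(v, 2) for v in approx_bbox])
```

The estimated box lies inside the hard-coded one, which is harmless: a larger box only means more records are downloaded before the distance filter is applied.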
The Earth isn't a perfect sphere, so we need an additional tool to estimate distances between geographically distributed points.
The Geopy package provides the necessary functionality for distance computation.
from geopy.distance import geodesic  # `vincenty` was removed in geopy 2.0; `geodesic` is its replacement
query_kamchatka_bbox = tuple(zip(['lonl', 'latl', 'lonu', 'latu'], map(str, kamchatka_bbox)))
near_petropavlovsk_kamchatsky_url = HERBARIUM_SEARCH_URL + '?' + '&'.join(map(lambda x: x[0] + '=' + quote(x[1].strip()), query_kamchatka_bbox))
server_response = urlopen(near_petropavlovsk_kamchatsky_url)
petropavlovsk_data_in_bbox = pd.DataFrame(json.loads(server_response.read().decode('utf-8'))['data'])
petropavlovsk_data_in_bbox.shape
(306, 41)
petropavlovsk_filtered = petropavlovsk_data_in_bbox[[geodesic((lat, lon), petropavlovsk_coords).km < 200.0 for lat,lon in zip(petropavlovsk_data_in_bbox.latitude, petropavlovsk_data_in_bbox.longitude)]]
petropavlovsk_filtered.shape
(146, 41)
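If Geopy is not available, a spherical approximation may be good enough for a rough 200-km filter. The sketch below implements the haversine formula with the standard library only. It assumes a spherical Earth with a mean radius of 6371.0088 km, so its results differ slightly from the ellipsoidal distance computed by Geopy.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical simplification


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


petropavlovsk = (53.145992, 158.683548)
one_degree_north = (54.145992, 158.683548)
# One degree along a meridian is roughly 111.2 km on this sphere
print(round(haversine_km(petropavlovsk, one_degree_north), 1))
```

Note the argument order: like the filtering expression above, the function expects (latitude, longitude), whereas the shapefile contour stores (longitude, latitude).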
Once the datasets are obtained, we can carry out some analysis (e.g. a comparison):
print('The number of unique genera within a 200-km circle around Petropavlovsk-Kamchatsky:', len(petropavlovsk_filtered.genus.unique()))
The number of unique genera within a 200-km circle around Petropavlovsk-Kamchatsky: 64
print('The number of unique species within a 200-km circle around Petropavlovsk-Kamchatsky:', len(petropavlovsk_filtered.species_id.unique()))
The number of unique species within a 200-km circle around Petropavlovsk-Kamchatsky: 78
print('The number of unique genera on Sakhalin Island:', len(sakhalin_filtered.genus.unique()))
The number of unique genera on Sakhalin Island: 136
print('The number of unique species on Sakhalin Island:', len(sakhalin_filtered.species_id.unique()))
The number of unique species on Sakhalin Island: 211
Let's count frequencies:
from collections import Counter
Genera frequencies near Petropavlovsk-Kamchatsky:
petropavlovsk_freq = petropavlovsk_filtered.genus.value_counts() / len(petropavlovsk_filtered)
petropavlovsk_freq
Riccardia          0.061644
Aneura             0.061644
Dryopteris         0.047945
Moerckia           0.047945
Peltolepis         0.041096
Botrychium         0.041096
Nardia             0.041096
Conocephalum       0.034247
Gymnocarpium       0.027397
Calycularia        0.027397
Pellia             0.027397
Cystopteris        0.020548
Calamagrostis      0.020548
Lunathyrium        0.020548
Polystichum        0.020548
Preissia           0.020548
Sauteria           0.020548
Athyrium           0.020548
Phegopteris        0.020548
Marchantia         0.020548
Agrostis           0.013699
Cryptogramma       0.013699
Stellaria          0.013699
Blasia             0.013699
Draba              0.013699
Euphrasia          0.013699
Oreopteris         0.013699
Salix              0.013699
Saxifraga          0.006849
Gentianella        0.006849
...
Lophozia           0.006849
Papaver            0.006849
Huperzia           0.006849
Abies              0.006849
Andromeda          0.006849
Myrica             0.006849
Dryas              0.006849
Anaphalis          0.006849
Arnica             0.006849
Ermania            0.006849
Myosotis           0.006849
Chamaedaphne       0.006849
Parnassia          0.006849
Cardaminopsis      0.006849
Eriophorum         0.006849
Sagina             0.006849
Erigeron           0.006849
Urtica             0.006849
Fritillaria        0.006849
Trollius           0.006849
Metasolenostoma    0.006849
Oxytropis          0.006849
Trientalis         0.006849
Mannia             0.006849
Douglasia          0.006849
Equisetum          0.006849
Tephroseris        0.006849
Cardamine          0.006849
Rubus              0.006849
Saussurea          0.006849
Name: genus, Length: 64, dtype: float64
Genera frequencies on Sakhalin Island:
sakhalin_freq = sakhalin_filtered.genus.value_counts() / len(sakhalin_filtered)
sakhalin_freq
Porella           0.071038
Riccardia         0.040073
Conocephalum      0.038251
Dryopteris        0.030965
Gymnocarpium      0.027322
Jungermannia      0.025501
Preissia          0.023679
Scapania          0.023679
Lophozia          0.023679
Peltolepis        0.021858
Leptorumohra      0.021858
Asarum            0.021858
Marchantia        0.020036
Nardia            0.020036
Mesoptychia       0.020036
Cephalozia        0.018215
Calypogeia        0.018215
Frullania         0.016393
Blepharostoma     0.016393
Phegopteris       0.014572
Botrychium        0.014572
Sauteria          0.014572
Mylia             0.014572
Sphenolobus       0.012750
Lejeunea          0.012750
Woodsia           0.010929
Orthocaulis       0.010929
Pellia            0.010929
Leiocolea         0.010929
Reboulia          0.009107
...
Primula           0.001821
Vitis             0.001821
Brylkinia         0.001821
Fritillaria       0.001821
Larix             0.001821
Pteridium         0.001821
Spiraea           0.001821
Lophoziopsis      0.001821
Hierochloe        0.001821
Seligeria         0.001821
Ilex              0.001821
Lycopodium        0.001821
Orthothecium      0.001821
Thuidiaceae       0.001821
Artemisia         0.001821
Geranium          0.001821
Androsace         0.001821
Schistidium       0.001821
Plagiochasma      0.001821
Asplenium         0.001821
Solenostoma       0.001821
Malva             0.001821
Cardiocrinum      0.001821
Corydalis         0.001821
Calycularia       0.001821
Beckmannia        0.001821
Matteuccia        0.001821
Tetraplodon       0.001821
Monotropastrum    0.001821
Sanguisorba       0.001821
Name: genus, Length: 136, dtype: float64
Shannon's information measure (entropy, in bits) of the genus frequencies:
shannon_sakhalin = - sum(np.log2(sakhalin_freq.values) * sakhalin_freq.values)
shannon_sakhalin
6.3251019029259146
shannon_petropavlovsk = - sum(np.log2(petropavlovsk_freq.values) * petropavlovsk_freq.values)
shannon_petropavlovsk
5.564519271608396
Relative values of the information measures (each divided by its theoretical maximum, the base-2 logarithm of the number of genera):
shannon_sakhalin_relative = shannon_sakhalin / np.log2(len(sakhalin_freq))
shannon_sakhalin_relative
0.89243528249808335
shannon_petropavlovsk_relative = shannon_petropavlovsk / np.log2(len(petropavlovsk_freq))
shannon_petropavlovsk_relative
0.92741987860139929
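To make the two measures easier to interpret, here is a small self-contained sketch that computes them for made-up genus labels. The function names and the sample lists are hypothetical; the formulas are the same as in the cells above, H = -sum(p * log2(p)) and H / log2(N), where N is the number of distinct genera.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(labels):
    """Entropy divided by its maximum, log2(number of distinct labels)."""
    n_classes = len(set(labels))
    if n_classes < 2:
        return 0.0  # convention chosen here: a single class carries no diversity
    return shannon_entropy(labels) / math.log2(n_classes)


even = ["Betula", "Salix", "Abies", "Larix"]       # every genus equally frequent
skewed = ["Betula", "Betula", "Betula", "Salix"]   # one genus dominates
print(shannon_entropy(even), normalized_entropy(even))  # 2.0 1.0
print(round(shannon_entropy(skewed), 4), round(normalized_entropy(skewed), 4))
```

A relative value of 1 means that all genera are equally frequent; the more one genus dominates the sample, the lower the value.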
Note: The results may look confusing: Sakhalin Island has more genera and a higher absolute Shannon measure, yet the relative genus diversity near Petropavlovsk-Kamchatsky (0.93) is greater than on Sakhalin Island (0.89). This could have many causes, such as peculiarities of how the Herbarium database is being filled (this work is still in progress) or plain statistical uncertainty. Records in the database are collected in a non-random and/or irregular way, which in turn may lead to misleading conclusions. So be careful when drawing any conclusions from such comparisons.
import datetime
print("Date of last code execution: ", datetime.datetime.now())
Date of last code execution: 2017-11-12 11:31:02.205075