NBN Package

This notebook summarises the data available via the rnbn package. The vignette provided with the package provides a better overview of the package --- this notebook just demonstrates what needs to be done to get started with it.

Setup

rnbn is straight forward to get; we just need to install it from CRAN.

In [15]:
library(ggplot2)
library(ggmap)
library(dplyr)
install.packages('rnbn')
library(rnbn)
Warning message:
: package ‘ggplot2’ was built under R version 3.2.3Warning message:
: package ‘ggmap’ was built under R version 3.2.3
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

The downloaded binary packages are in
	/var/folders/y_/wsxrd3b16k99k229_7f38sph0000gn/T//RtmpX3ipxz/downloaded_packages

Logging In

The NBN data service needs a username and password. The vignette contains instructions on how to get one. It's not strictly necessary to run nbnLogin; all the functions in the package will call nbnLogin automatically if you aren't already logged in.

In this case I'm hiding my details in my environment to stop them getting printed out to the console.

In [3]:
#Change these if running.
nbn.user <- Sys.getenv("NBN_USER")
nbn.password <- Sys.getenv("NBN_PASSWORD")

nbnLogin(nbn.user,nbn.password)

Data Organisation

The NBN set is principally designed to log observations of species in different locations at different times. This is an abstraction over the underlying data generated from studies and data sets.

Data are principally organised by TVKs (Taxon Version Keys). A Taxon is simply a thing that can have observations logged about it. The package includes a function to get the TVKs for names of species to make the mapping easier.

This shows one of several views of the NBN data we can access; the taxon-level information which includes more details on the species and some aggregate stats.

In [4]:
head(getTVKQuery(query="badger"))
Out[4]:
entityTypesearchMatchTitledescriptpExtendedNametaxonVersionKeynamelanguageKeytaxonOutputGroupKeytaxonOutputGroupNameorganismKeyranknameStatusversionFormgatewayRecordCounthrefptaxonVersionKey
1taxonBadgerMeles meles (Linnaeus, 1758), TERRESTRIAL MAMMAL, 62006 record(s)Meles meles (Linnaeus, 1758), TERRESTRIAL MAMMALNBNSYS0000164968BadgerenNHMSYS0000080085terrestrial mammalNBNORG0000049711SpeciesSynonymWell-formed62006https://data.nbn.org.uk/Taxa/NHMSYS0000080191NHMSYS0000080191
2taxonBadger FleaParaceras melis (Walker, 1856), INSECT - FLEA (SIPHONAPTERA), 472 record(s)Paraceras melis (Walker, 1856), INSECT - FLEA (SIPHONAPTERA)NBNSYS0000164969Badger FleaenNHMSYS0000629161insect - flea (Siphonaptera)NBNORG0000013049SpeciesRecommendedWell-formed472https://data.nbn.org.uk/Taxa/NBNSYS0000013055NBNSYS0000013055
3taxona Badger fleaChaetopsylla (Chaetopsylla) trichosa Kohaut, 1903, INSECT - FLEA (SIPHONAPTERA), 2 record(s)Chaetopsylla (Chaetopsylla) trichosa Kohaut, 1903, INSECT - FLEA (SIPHONAPTERA)NHMSYS0020322599a Badger fleaenNHMSYS0000629161insect - flea (Siphonaptera)NBNORG0000061712SpeciesRecommendedWell-formed2https://data.nbn.org.uk/Taxa/NHMSYS0000545919NHMSYS0000545919
4taxonEurasian BadgerMeles meles (Linnaeus, 1758), TERRESTRIAL MAMMAL, 62006 record(s)Meles meles (Linnaeus, 1758), TERRESTRIAL MAMMALNHMSYS0000332260Eurasian BadgerenNHMSYS0000080085terrestrial mammalNBNORG0000049711SpeciesRecommendedWell-formed62006https://data.nbn.org.uk/Taxa/NHMSYS0000080191NHMSYS0000080191

Lists are OK, but usually we just want the top result:

In [5]:
getTVKQuery(query="badger", top=T)
Out[5]:
entityTypesearchMatchTitledescriptpExtendedNametaxonVersionKeynamelanguageKeytaxonOutputGroupKeytaxonOutputGroupNameorganismKeyranknameStatusversionFormgatewayRecordCounthrefptaxonVersionKey
1taxonBadgerMeles meles (Linnaeus, 1758), TERRESTRIAL MAMMAL, 62006 record(s)Meles meles (Linnaeus, 1758), TERRESTRIAL MAMMALNBNSYS0000164968BadgerenNHMSYS0000080085terrestrial mammalNBNORG0000049711SpeciesSynonymWell-formed62006https://data.nbn.org.uk/Taxa/NHMSYS0000080191NHMSYS0000080191

Armed with a Taxon Version Key, you can then interrogate the NBN database for observations of that species.

In [6]:
mushroom <- getTVKQuery(query="mushroom", top=T)
occurrences <- getOccurrences(gridRef='SK38',silent=T,acceptTandC=T)
nrow(occurrences)
Out[6]:
149701

We can plot this a little nicer using ggmap. Before we do, we need densities (there is a lot of overlap between points). We'll make a 10x10 grid and count the observations in each grid. To start with, compute the centres of each grid rectangle:

In [7]:
bins <- 10
x <- seq(from=min(occurrences$longitude), to=max(occurrences$longitude), length.out=bins)
y <- seq(from=min(occurrences$latitude), to=max(occurrences$latitude), length.out=bins)
binwidth.x <- (max(occurrences$longitude) - min(occurrences$longitude) ) / bins
binwidth.y <- (max(occurrences$latitude) - min(occurrences$latitude) ) / bins
binwidth <- c(binwidth.x, binwidth.y)
grid <- expand.grid(x=x,y=y)
head(grid)
Out[7]:
xy
1-1.550153.316
2-1.533553.316
3-1.51753.316
4-1.500453.316
5-1.483853.316
6-1.467253.316

Functions to give us the upper and lower bound of a bin given its centre-point:

In [8]:
bin.centre <- function(coord,d){ binwidth * floor(coord/binwidth[d]) }
bin.lower  <- function(coord,d){ coord - binwidth[d]/2   }
bin.upper  <- function(coord,d){ coord + binwidth[d]/2   }
bin.lower(x,1)
Out[8]:
  1. -1.55757343940197
  2. -1.54099782756053
  3. -1.52442221571909
  4. -1.50784660387765
  5. -1.49127099203622
  6. -1.47469538019478
  7. -1.45811976835334
  8. -1.4415441565119
  9. -1.42496854467046
  10. -1.40839293282902

Perform the count for each square. We do this by subsetting occurrences by each region and taking the size of it (using sum on the logical is faster than actually subsetting and calling length).

In [16]:
density <- mapply(function(x,y){
        with(occurrences,
             list(x=x,y=y,z=sum(
                    longitude > bin.lower(x,1) &
                    longitude <= bin.upper(x,1) & 
                    latitude > bin.lower(y,2) & 
                    latitude  <= bin.upper(y,2))
            ))}, x=grid$x,y=grid$y) %>% t
In [17]:
density <- as.data.frame(cbind(grid, z=unlist(density[,3])))

# density <- within(density, {
#     lx = bin.lower(x)
#     ly = bin.lower(y)
#     ux = bin.upper(x)
#     uy = bin.upper(y)
#     }
#     )
head(density)
Out[17]:
xyz
1-1.550153.31671
2-1.533553.316151
3-1.51753.316198
4-1.500453.316111
5-1.483853.3161
6-1.467253.3166

How does that look?

In [18]:
ggplot(density, aes(fill=z)) + 
    geom_rect(aes(xmin=bin.lower(x,1),ymin=bin.lower(y,2),xmax=bin.upper(x,1),ymax=bin.upper(y,2))) +
    scale_fill_distiller(palette = "Spectral")

Now we have our counts for each rectangle, we can plot them over a map. We'll use ggmap for this:

In [19]:
location <- c(median(x),median(y))
map <- get_map(location, zoom=12, source="google")

ggmap(map) + 
    geom_rect(data=density,
              inherit.aes=F,
              aes(xmin=bin.lower(x,1),ymin=bin.lower(y,2),xmax=bin.upper(x,1),ymax=bin.upper(y,2),fill=z,alpha=z)) +
    scale_fill_distiller(palette = "Spectral")
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=53.360781,-1.475524&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false