In [1]:
list.of.packages <- c("gdata","R.utils","rgdal","raster","rbioclim","stringr","stringi",
"proj4","data.table","FSA","plyr","wordcloud","KernSmooth","gdtools")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages) > 0) install.packages(new.packages)

lapply(list.of.packages, FUN = function(X) {
do.call("require", list(X))
})


Loading required package: gdata

gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.

gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.

Attaching package: ‘gdata’

The following object is masked from ‘package:stats’:

nobs

The following object is masked from ‘package:utils’:

object.size

The following object is masked from ‘package:base’:

startsWith

R.methodsS3 v1.8.1 (2020-08-26 16:20:06 UTC) successfully loaded. See ?R.methodsS3 for help.

R.oo v1.24.0 (2020-08-26 16:11:58 UTC) successfully loaded. See ?R.oo for help.

Attaching package: ‘R.oo’

The following object is masked from ‘package:R.methodsS3’:

throw

The following objects are masked from ‘package:gdata’:

ll, trim

The following objects are masked from ‘package:methods’:

getClasses, getMethods

The following objects are masked from ‘package:base’:

R.utils v2.10.1 (2020-08-26 22:50:31 UTC) successfully loaded. See ?R.utils for help.

Attaching package: ‘R.utils’

The following objects are masked from ‘package:gdata’:

env, resample

The following object is masked from ‘package:utils’:

timestamp

The following objects are masked from ‘package:base’:

cat, commandArgs, getOption, inherits, isOpen, nullfile, parse,
warnings

rgdal: version: 1.5-23, (SVN revision 1121)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
Path to GDAL shared files: /usr/share/gdal/2.2
GDAL binary built with GEOS: TRUE
Loaded PROJ runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
Path to PROJ shared files: (autodetected)

Attaching package: ‘rgdal’

The following object is masked from ‘package:R.oo’:

getDescription

Attaching package: ‘raster’

The following objects are masked from ‘package:R.utils’:

extract, resample

The following objects are masked from ‘package:R.oo’:

extend, trim

The following objects are masked from ‘package:gdata’:

resample, trim

Attaching package: ‘rbioclim’

The following object is masked from ‘package:raster’:

getData

Attaching package: ‘proj4’

The following object is masked from ‘package:rgdal’:

project

Attaching package: ‘data.table’

The following object is masked from ‘package:raster’:

shift

The following objects are masked from ‘package:gdata’:

first, last

## FSA v0.8.32. See citation('FSA') if used in publication.
## Run fishR() for related website and fishR('IFAR') for related book.


1. TRUE
2. TRUE
3. TRUE
4. TRUE
5. TRUE
6. TRUE
7. TRUE
8. TRUE
9. TRUE
10. TRUE
11. TRUE
12. TRUE
13. TRUE
14. TRUE
Skipping install of 'rbioclim' from a github remote, the SHA1 (662b653b) has not changed since last install.
Use force = TRUE to force installation



As part of this exercise, we'll have to download climate and topographic information. By default, files are downloaded to the current working directory. If you'd like to use a different folder, replace "getwd()" below with the desired path, using forward slashes and enclosed in quotes (e.g. "I:/DataArc/").

In [2]:
path <- getwd()
setwd(path)
set.seed(1601720)
rasterOptions(maxmemory=1.1e9)

# Set up necessary directories
if (!dir.exists("WorldClim")) {dir.create("WorldClim")}
if (!dir.exists("DEM")) {dir.create("DEM")}


We'll also load the "Translator.R" file, which contains a custom function designed to convert an output JSON from the dataARC API into a parsable data.table or CSV. To use it, simply call "arc.dataconvert()" on a JSON or compressed JSON (.json.gz). If you'd like to export the converted table, add the argument "export = filepath", where filepath is the desired output file in .csv or .gz format. Users who want greater control over the data structure can modify the "Translator.R" file directly. If you are accessing this Jupyter notebook through a binder, a pre-processed output has been provided due to memory limits.
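For reference, a hypothetical usage sketch of the converter described above (the filename "my_results.json.gz" is a stand-in for whatever file the dataARC API returned; the function itself comes from "Translator.R", sourced in the next cell):

```r
# Convert a (compressed) dataARC JSON into a data.table:
arc <- arc.dataconvert("my_results.json.gz")

# Or convert and export the table in one step via the "export" argument:
arc <- arc.dataconvert("my_results.json.gz", export = "my_results.csv")
```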

In [3]:
source("Translator.R")


First we need to download the necessary rasters from the internet. This particular getData function from the rbioclim library first checks to see if the files already exist locally in the required directory and if not downloads them from the UC Davis repository. The provided resolutions are 2.5 arcminutes (around 1957 m at this latitude). They will be downsampled in a later step.

We start with the modern climate rasters. For time's sake we are only downloading the pre-processed bioclimatic variables for annual values, although the user is free to change the download to "tmin", "tmax", or "prec" for monthly values of the raw variables. Unfortunately, rbioclim does not provide a way to extract only a single variable, so this could take a while (~10 min for both sets, depending on the computer).

In [4]:
clim_p <- recursive.getData("pres",path="WorldClim",var="bio")
clim_p <- clim_p$pres[[1]]
plot(clim_p)

importing dataset pres


Notice that we're only selecting the first raster of the downloaded rasterBrick (the [[1]] subset). This means that the preprocessing will only be applied to the mean annual temperature raster. If the user would like to work with different climatic variables, they may change the subset according to the indices listed at: https://www.worldclim.org/data/bioclim.html

Next come the climate predictions. WorldClim provides predictions based on the CMIP5 models (the CC prefix selects the CCSM4 model), with predictions for 2050 and 2070 (final two digits of the download code) under four different climate scenarios (representative concentration pathways): 2.6, 4.5, 6.0, and 8.5 (first two digits of the download code). RCP 8.5 is generally taken as the worst-case scenario, but it is increasingly seen as unrepresentative of the most likely outcome given trends in CO2 emissions (https://www.nature.com/articles/d41586-020-00177-3). In this example we use 2.6, one of the best-case scenarios, due to its proximity to the pathway targeted by the Paris Agreement, although the user is free to change both the year and the pathway. For example, to get the 8.5 pathway prediction for 2070, we would download CC8570.

In [5]:
clim_f <- recursive.getData("CC2650",path="WorldClim",var="bio")
clim_f <- clim_f$CC2650[[1]]
plot(clim_f)

importing dataset CC2650



It's fine if you get a bunch (19) of Warnings. We'll reproject the data so it won't be an issue.

Finally, the DEM is a little trickier. Iceland lies outside the range covered by the SRTM STS-99 mission and does not have publicly available national lidar coverage. The resource used in this demonstration was derived from topographic maps digitized from declassified Russian intelligence sources. We use the 15 arcsecond resolution data (around 196 m at this latitude), though 1", 3", and 5" versions are also available from the website.

Due to the 2 GB memory limit on binders, we provide a pre-upsampled DEM. If this file is present in the main directory, the program will load it as the DEM. If not, it will check whether the raw Russian data has been downloaded and, if so, use that. If it hasn't been, the cell downloads and uncompresses the necessary file.

In [6]:
if (file.exists("Provided_DEM.tif")) {
dem <- raster("Provided_DEM.tif")
} else {
if (!file.exists("DEM/15-C.tif")) {
# The download URL was truncated in the original; substitute the address of
# the 15" Iceland DEM archive here.
download.file(url = "<DEM archive URL>",
destfile="DEM/isl_2010.zip")
unzip("DEM/isl_2010.zip",exdir="DEM")
}
dem <- raster("DEM/15-C.tif")
}
plot(dem)


Finally, if you don't want to keep the originally downloaded WorldClim and DEM files, you can delete them by running the cell below. If you are running this script locally, plan to use them again, and are not short of storage space, we recommend keeping them to drastically reduce processing time in the future.

In [7]:
writeRaster(dem,"Iceland_DEM.tif",format="GTiff",overwrite=TRUE)
writeRaster(clim_f,"Iceland_Clim_Future.tif",format="GTiff",overwrite=TRUE)
writeRaster(clim_p,"Iceland_Clim_Present.tif",format="GTiff",overwrite=TRUE)

dem <- raster("Iceland_DEM.tif")
clim_f <- raster("Iceland_Clim_Future.tif")
clim_p <- raster("Iceland_Clim_Present.tif")
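The cell above saves and reloads the processed rasters but does not itself remove the raw downloads. A minimal cleanup sketch, assuming the "WorldClim" and "DEM" folders created earlier hold nothing else you need:

```r
# Remove the raw download directories. This is irreversible; only run it
# once the processed GeoTIFFs above have been written successfully.
unlink("WorldClim", recursive = TRUE)
unlink("DEM", recursive = TRUE)
```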



# Raster Preprocessing

Now that we have downloaded the necessary files, we need to prepare them for analysis. For this, we need to address three issues:

1. The rasters are unprojected, with the XY coordinates (nominally planar variables) actually standing for latitude/longitude (angular coordinates on a spheroid),
2. The rasters have different extents, all larger than the area of interest (Iceland),
3. The rasters are of different resolutions and have different origins.

First, we'll define the projection we want and a bounding box that will cover the area of interest. Projections transform spherical/geodesic data (latitude and longitude relative to the WGS1984 spheroid) to planar data (XY coordinates on a flat surface). Since we're interested in performing operations related to area (such as densities) or distance, we need a conformal projection that minimizes both types of error.

Iceland uses EPSG:9040---a Lambert Conformal Conic projection defined with the ISL2016 datum, but for most practical purposes WGS1984 will do. The init file for proj4 included in this binder release does not have access to the complete list of EPSG codes, and as such the simple EPSG import has been commented out in this script. Users may employ that line if they are running the notebook or associated scripts locally. The projections, however, are identical.

In [8]:
#proj <- CRS("+init=epsg:9040 +datum=WGS84")
proj <- CRS("+proj=lcc +lat_0=52 +lon_0=10 +lat_1=35 +lat_2=65 +x_0=4000000 +y_0=2800000
+datum=WGS84 +units=m +no_defs")


Next we need to crop the rasters to make them more manageable. Ideally, the cropped area should be no larger than our area of interest, since the time needed to process a raster increases with the product of its length and its width---so large rasters get very slow very quickly. The smaller the better.

We'll define a blank raster that describes the area we're interested in in the target projection, then we'll transform its extent to the original projections of the downloaded data, and finally we'll crop.

In [9]:
ext <- raster()  # Dummy target extent raster
extent(ext) <- extent(2400000,3000000,4300000,4700000)
crs(ext) <- proj

ext_proj <- projectExtent(ext,dem) # The DEM
dem <- crop(dem,ext_proj)
dem[dem <= 0] <- NA # Since the ocean is reported as having Z = 0---we want it as NA

ext_proj <- projectExtent(ext,clim_f) # The WorldClim data.
clim_f <- crop(clim_f,ext_proj)
clim_p <- crop(clim_p,ext_proj)


Now that the rasters are much smaller, we can reproject them to the target projection. We set the target resolution of the DEM at 200 m since it's a friendlier number; bilinear interpolation is performed during resampling, so we're drawing elevation estimates from an otherwise smooth model. We'll crop the model one more time with an unprojected target extent to make sure that all of our final analysis rasters are exactly the same size and have the same origin. If you're working off of the Jupyter Binder with the provided DEM, this step is skipped.

In [10]:
if (!file.exists("Provided_DEM.tif")) {
dem <- projectRaster(dem,crs=crs(ext),res=200)
dem <- crop(dem,ext)
}
plot(dem)


For the climate rasters, we want to downsample them so that the final products are of the same (smaller/finer) resolution as the DEM. So we'll project them to the closest integer resolution in meters, resample them to match the DEM's resolution and origin, and crop them with an unprojected extent.

NOTE THAT RESAMPLING DOES NOT GENERATE NEW INFORMATION

You cannot get "more accurate" estimates for a particular location by downsampling. Rather, you assume that the general trend at the coarser resolution holds across each coarse pixel, and estimate that tendency at a particular point. It cannot capture variation at finer scales.
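To make the point concrete, here is a minimal sketch with toy rasters (not the notebook's data): bilinear resampling only interpolates between the coarse cell centres, so every "new" value is a weighted average of existing ones.

```r
library(raster)

# A coarse 4 x 4 raster "downsampled" onto an 8 x 8 grid covering the same extent.
coarse <- raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 4)
values(coarse) <- 1:16

fine <- raster(nrows = 8, ncols = 8, xmn = 0, xmx = 4, ymn = 0, ymx = 4)

# The finer raster reproduces the coarse trend; no sub-pixel detail is created.
fine <- resample(coarse, fine, method = "bilinear")
```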

In [11]:
clim_f <- projectRaster(clim_f,crs=crs(ext),res=1957) #Project
clim_p <- projectRaster(clim_p,crs=crs(ext),res=1957)

clim_f <- resample(clim_f,dem) # Resample
clim_p <- resample(clim_p,dem)

clim_f <- crop(clim_f,ext) # and Crop
clim_p <- crop(clim_p,ext)

plot(clim_f)


Before we can finally move on to the analyses, we should (optionally) clean up our directories of files we won't use anymore and save the final data rasters so we don't have to re-generate them each time.

In [12]:
writeRaster(dem,"Iceland_DEM.tif",format="GTiff",overwrite=TRUE)
writeRaster(clim_f,"Iceland_Clim_Future.tif",format="GTiff",overwrite=TRUE)
writeRaster(clim_p,"Iceland_Clim_Present.tif",format="GTiff",overwrite=TRUE)


Now, if you've already generated these files and for some reason need to restart your session, you just need to re-import those rasters using the raster() function (use stack() for the climate rasters if you kept more than one bioclim variable).
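A sketch of that re-import, assuming the filenames written earlier in this notebook:

```r
library(raster)

dem    <- raster("Iceland_DEM.tif")
clim_p <- raster("Iceland_Clim_Present.tif")  # use stack() instead of raster()
clim_f <- raster("Iceland_Clim_Future.tif")   # if you saved multiple bioclim layers
```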

# DataARC Preprocessing

Now we can finally import the DataARC output. A pre-processed, compressed CSV generated from all dataARC observations in Iceland as of March 3, 2021 has been provided in the binder containing the Jupyter notebooks. If you'd like to use your own dataARC output (be it from your own selections, or from a more-recent version), please run this locally using the "arc.dataconvert" function.

For manipulations we could use R's base data.frame class, but it can be very slow when exporting and indexing large datasets such as DataARC. Instead, we'll use data.table for most of the operations. We'll also "lapply" the trimws function over the columns to strip whitespace and make things parsable. More on "lapply" later. From here on, make sure that the following files are in the current working directory:

1. OUT_WITH_CONCEPTS_COMBINATORS_RELATED.csv, containing the dataARC observations
2. CONCEPT_DATA_FRAME.csv, containing the concept hashes
3. DATASET_HASH_OUT.csv, containing the hashes for each original dataset
In [13]:
if (file.exists("Provided_Results.gz")) {
arc <- fread("Provided_Results.gz") # Load the pre-processed table provided with the binder
} else {
arc <- arc.dataconvert("results.json.gz")
}

setnames(arc,names(arc),c("NUMBER","ID","Y","X","DATASET","CATEGORY","ENTRY","CONCEPT","COMBINATOR","RELATED","CONTEXT"))
arc[, names(arc) := lapply(.SD,trimws,which='left')]

tail(arc)

A data.table: 6 × 11
NUMBER ID Y X DATASET CATEGORY ENTRY CONCEPT COMBINATOR RELATED CONTEXT
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>

The nice thing about data.tables is that we can manipulate the data in place without having to generate a copy first. For example, if any entries in the X and Y columns don't contain numbers, the columns are imported as text. We'll use as.numeric to ensure that they're treated as numbers.

In [14]:
arc[,c("X","Y") := lapply(.SD,as.numeric), .SDcols = c("X","Y")]


Data.tables are like data.frames, but far more powerful. Most of the action occurs in the indexing parameters, of the form [i, j, by], where "i" is a number or a logical condition to filter rows, "j" specifies how columns will be manipulated and returned, and "by" lists the features by which to group; the special symbol .SD indicates which columns to use for operations (controlled via .SDcols).

Notice that we used the ':=' symbol and didn't assign the output to a new variable or re-assign it to the original arc variable. The := symbol changes it in-place. .SD tells it to iteratively apply that function to all columns defined under .SDcols. We could have created new columns by simply changing the name of the strings in the first vector in the j slot.
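A minimal, self-contained illustration of these slots (toy data, not the dataARC table):

```r
library(data.table)

dt <- data.table(site = c("A", "A", "B"), x = c(1, 2, 3), y = c(4, 5, 6))

# i filters rows, j computes, by groups:
dt[x > 1, .(mean_x = mean(x)), by = site]

# := adds or modifies columns in place -- no copy, no reassignment:
dt[, x2 := x * 2]

# .SD with .SDcols applies a function across a chosen set of columns:
dt[, c("x", "y") := lapply(.SD, function(v) v * 10), .SDcols = c("x", "y")]
```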

DataARC also reports coordinates in unprojected decimal degrees according to the WGS1984 spheroid. We need to transform these values to the appropriate projection.

The function we use to project the raw coordinates expects unprojected coordinates in radians, so we first need to convert X and Y from decimal degrees according to 180° = π rad.

In [15]:
arc[, `:=`(X = X * pi / 180, Y = Y * pi / 180)]

# Now we can convert to our projection system (a single ptransform call
# returns both coordinate vectors)
xy <- ptransform(list(arc$X, arc$Y), src.proj=CRS("+proj=longlat +datum=WGS84"),
dst.proj=proj)
arc[, `:=`(X = xy[[1]], Y = xy[[2]])]
arc[1:25,.(ID,X,Y)]

A data.table: 25 × 3
ID                        X        Y
<chr>                     <dbl>    <dbl>
5FD9474424B939001C692303  2495120  4446383
⋮ (the remaining rows repeat the same site ID with its projected coordinates)

# Plotting dataARC Observations

Let's start off by making a plot of all the sites in the datasets. For this, we'll group by ID (since each individual site has multiple observations) in the "by" slot, and take the mean of the X and Y columns. In theory, X and Y should be the same for every entry with the same ID, so we could also use min() or max(), or unique(). We'll use the .SD feature so that we can lapply the mean over each column.

In [16]:
output <- arc[,lapply(.SD,mean), by=ID, .SDcols = c("X","Y")][,.(X,Y)]
output <- na.omit(output)


Now we convert this from a data.table to a SpatialPointsDataFrame object (a vector format, akin to a shapefile with point values), and plot it onto the DEM using the raster package's plot function.

In [17]:
output <- SpatialPointsDataFrame(output[,.(X,Y)],output,proj4string=proj)
plot(dem)