
Bioclim variable rasters: CHELSA
Samuel M. Owens
2024-10-01
Source: vignettes/020_retrieve_bioclim_variables.Rmd
Overview
This vignette will walk through the process of downloading and tidying the climate data used in this analysis. We will download the data from CHELSA, a high-resolution (30 arc-seconds, ~1 km) land surface dataset provided by the Swiss Federal Institute for Forest, Snow and Landscape Research. The specific data we are interested in are the suite of 19 “bioclimatic” variables: biologically relevant climate features derived from temperature and precipitation. CHELSA also makes the same variables available as climate change projections, using a subset of the available CMIP6 models. I will use three SSP scenarios (126, 370, and 585) from the GFDL-ESM4 model (read ahead for an explanation of these).
First, I will download the data from CHELSA into a local library. I will then tidy and harmonize the datasets and put them in the format required by MaxEnt. We will then choose the covariates for our models by assessing the level of collinearity between the rasters. I will also consider biological relevance when selecting the final set of covariates.
Note: I have added a number of if() statements throughout my vignettes, usually to prevent data from being downloaded again unnecessarily. To run one of these gated chunks, simply change the FALSE within the if() statement to TRUE.
Setup
library(tidyverse) #data manipulation
library(here) #making directory pathways easier on different instances
library(devtools)
# spatial
library(terra)
# devtools::install_github("danlwarren/ENMTools")
library(ENMTools) # env covariates collinearity
library(patchwork)
Note: I will set the global options of this document so that only certain code chunks are rendered in the final .html file. I will set eval = FALSE so that none of the code is re-run (preventing files from being overwritten during knitting), and will simply override this option in the chunks that produce plots.
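As a minimal sketch, this is how such a global option can be set in the vignette’s setup chunk (the structure here is illustrative):
# set global knitr options: do not evaluate any code chunks when knitting
knitr::opts_chunk$set(eval = FALSE)
# individual chunks that render plots can then override this by setting
# eval = TRUE in their own chunk headers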
1. Retrieve CHELSA bioclim rasters
First, I will download the bioclim data into a local directory. I recommend downloading these data into a separate directory from this package. Since I am using a directory outside of the immediate package, I will create an object to define the file path at the beginning of each chunk. This pathing will still use here::here() to start at the package root folder. I will begin with the historical climate data from CHELSA. The URLs to the bioclim data, as downloaded from CHELSA, are stored in data-raw/CHELSA within the package directory. I will be using two different versions of the CHELSA bioclim variables for my analysis:
- Historical: time period 1981-2010
- Climate change projections: time period 2041-2070, GFDL-ESM4 model, SSP scenarios 126, 370, and 585
The CHELSA download directory for each group of bioclim variables also contains 27 other distinct climatologies, available for both the historical period and the climate change models. The URLs for these were also downloaded, as some of them might be applicable to this analysis. For now, we will download only the 19 bioclim variables. If you would like to download any of the additional climatologies available from CHELSA, see the code in the Appendix section of this vignette. Those chunks should be run after an initial run of all chunks preceding section 2.2; afterwards, continue from the chunk “loop to tidy rasters” in section 2.2.
1.1 Historical
The best approach I found to download these data was to create a loop that opens each URL in a web browser (be ready: it literally opens every selected URL from the list in your browser at once).
if(FALSE) {
# historical data
chelsa_historic_URLs <- read_table(file = file.path(here::here(), "data-raw", "CHELSA", "chelsa_1981-2010_bioclim_URLs.txt"),
col_names = FALSE) %>%
as.data.frame() %>%
dplyr::select("X1") %>%
dplyr::rename("URL" = "X1")
# select only URLs I am interested in
chelsa_historic_URLs <- slice(.data = chelsa_historic_URLs, 2:20)
# loop to download historical URLs- utils::download.file() did not work with these data
for(i in 1:nrow(chelsa_historic_URLs)) {
file.tmp <- chelsa_historic_URLs[i, ]
utils::browseURL(url = file.tmp)
}
}
With the current loop construction, these files can only be downloaded to my PC’s downloads folder, so they will need to be manually moved to the destination directory (outside of the package root folder, because they are large). These files were stored manually in maxent/historical_climate_rasters/chelsa2.1_30arcsec/originals.
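As a convenience, a short sketch like the one below could move those files out of the downloads folder; the downloads path is an assumption and will differ by machine.
# hypothetical path to the browser's download folder- adjust for your machine
downloads_dir <- file.path(Sys.getenv("HOME"), "Downloads")
# destination directory outside of the package root
dest_dir <- file.path(dirname(here::here()), "maxent", "historical_climate_rasters", "chelsa2.1_30arcsec", "originals")
# list the CHELSA .tif files that landed in the downloads folder
chelsa_files <- list.files(downloads_dir, pattern = "^CHELSA_.*\\.tif$", full.names = TRUE)
# copy them to the destination, then remove the downloaded copies
file.copy(from = chelsa_files, to = dest_dir)
file.remove(chelsa_files)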
1.2 CMIP6
Next, I will download data from the provided CMIP6 models for three different SSP scenarios (see link for an explanation of these scenarios). The SSP scenarios represent different assumptions about global development and greenhouse gas emissions. I will be using SSP scenarios 126, 370, and 585. SSP126 represents a more ideal scenario in which all countries cooperate to reduce emissions. SSP370 represents a middle-high emissions scenario, coupled with an emphasis on regional rivalry and global conflict (which reduce cooperation and drive up emissions). SSP585 represents a high-emissions scenario in which countries cooperate and improve infrastructure, but place no emphasis on reducing emissions.
I will use the GFDL-ESM4 CMIP6 model from NOAA. According to the CHELSA documentation, if all five available models are not used together, then CMIP6 model usage should follow the given priority (see the documentation linked above). These climate rasters should be moved to their respective folders within maxent/future_climate_rasters/chelsa2.1_30arcsec once the download is complete.
SSP126
if(FALSE) {
# ssp126 data
chelsa_126_URLs <- read_table(
file = file.path(here::here(), "data-raw", "CHELSA", "chelsa_2041-2070_GFDL-ESM4_ssp126_bioclim_URLs.txt"),
col_names = FALSE
) %>%
as.data.frame() %>%
dplyr::select("X1") %>%
dplyr::rename("URL" = "X1")
# select only URLs I am interested in
chelsa_126_URLs <- slice(.data = chelsa_126_URLs, 1:19)
# loop to download ssp126 URLs
for(i in 1:nrow(chelsa_126_URLs)) {
file.tmp <- chelsa_126_URLs[i, ]
utils::browseURL(url = file.tmp)
}
}
SSP370
if(FALSE) {
# ssp370 data
chelsa_370_URLs <- read_table(file = file.path(here::here(), "data-raw", "CHELSA", "chelsa_2041-2070_GFDL-ESM4_ssp370_bioclim_URLs.txt"),
col_names = FALSE) %>%
as.data.frame() %>%
dplyr::select("X1") %>%
dplyr::rename("URL" = "X1")
# select only URLs I am interested in
chelsa_370_URLs <- slice(.data = chelsa_370_URLs, 1:19)
# loop to download ssp370 URLs
for(i in 1:nrow(chelsa_370_URLs)) {
file.tmp <- chelsa_370_URLs[i, ]
utils::browseURL(url = file.tmp)
}
}
SSP585
if(FALSE) {
# ssp585 data
chelsa_585_URLs <- read_table(
file = file.path(here::here(), "data-raw", "CHELSA", "chelsa_2041-2070_GFDL-ESM4_ssp585_bioclim_URLs.txt"),
col_names = FALSE
) %>%
as.data.frame() %>%
dplyr::select("X1") %>%
dplyr::rename("URL" = "X1")
# select only URLs I am interested in
chelsa_585_URLs <- slice(.data = chelsa_585_URLs, 1:19)
# loop to download ssp585 URLs
for(i in 1:nrow(chelsa_585_URLs)) {
file.tmp <- chelsa_585_URLs[i, ]
utils::browseURL(url = file.tmp)
}
}
2. Tidy for MaxEnt
Next, I need to tidy these rasters for MaxEnt. I will need to mask out any data that is over the oceans or other bodies of water. I will also need to create downsized versions of these variables that can be used in step 3 to assess the level of correlation between the different variables. Lastly, I will rename the files and convert them to the .asc (ASCII grid) format required by MaxEnt.
I will mask the CHELSA bioclim rasters using the 2015 accessibility-to-cities layer (ref_atc_mask; Weiss et al., 2018). That layer already has the oceans and other water bodies cut out, making it a good choice for masking. I will use the CHELSA bio1 layer (ref_chelsa_bio1_main) as a reference for resampling the new layers after setting the CRS and extent.
2.1 Load in files
First, I will load in the reference layers.
# set path to external directory
mypath <- file.path(dirname(here::here()), "maxent/historical_climate_rasters/chelsa2.1_30arcsec")
# 1st chelsa bioclim layer as reference
ref_chelsa_bio1_main <- terra::rast(x = file.path(mypath, "originals", "CHELSA_bio1_1981-2010_V.2.1.tif"))
ref_atc_mask <- terra::rast(x = file.path(mypath, "originals", "2015_accessibility_to_cities_v1.0.tif"))
Next, we will get a list of the files that need to be modified, along with an object containing new, simplified names for the output files.
# I will load in the files and then get the new names I would like to give them
# load in bioclim layers to be cropped- the original .tif files
env.files <- list.files(path = file.path(mypath, "originals"), pattern = "\\.tif$", full.names = TRUE)
# output file names
output.files <- list.files(path = file.path(mypath, "originals"), pattern = "\\.tif$", full.names = FALSE) %>%
# get rid of filetype endings
gsub(pattern = ".tif", replacement = "") %>%
# crop off ending
gsub(pattern = "_V.2.1", replacement = "") %>%
tolower()
# more edits to file names
output.files <- output.files %>%
gsub(pattern = "chelsa_", replacement = "") %>%
# specifically edit the access to cities naming
gsub(pattern = "2015_accessibility_to_cities_v1.0", replacement = "atc_2015")
Note: In this version of the vignette, I have isolated only the variables I selected for the final analysis. If you would like to tidy and compare all variables, simply skip this next chunk.
if(FALSE) {
env.files <- grep("bio2|bio11|bio12|bio15", env.files, value = TRUE)
output.files <- grep("bio2|bio11|bio12|bio15", output.files, value = TRUE)
}
Now that we have our files and reference layers loaded in and names chosen, we can begin reformatting these layers.
2.2 Prepare reference layers
In this chunk, we will create 2 reference layers. ref_chelsa_bio1_main will be used for resampling the data, while ref_atc_mask will be used for masking. First, I will crop the layers, set their origins, and resample them to the same resolution. I will also keep the rasters consistently in EPSG:4326 for now.
# resolutions
res(ref_chelsa_bio1_main)
res(ref_atc_mask)
# they are identical
# ext of the reference layers
ext(ref_chelsa_bio1_main)
ext(ref_atc_mask)
# the extents will need to be corrected
terra::crs(ref_chelsa_bio1_main, parse = TRUE)
terra::crs(ref_atc_mask, parse = TRUE)
# crs of reference layers is epsg:4326 so no change needed
# check origin
terra::origin(ref_chelsa_bio1_main)
terra::origin(ref_atc_mask)
# the origin of ref_atc_mask will need to be changed to match the others
I will need to change the origin and extent of these reference layers. I will crop them to the same extent first, then unify their origins. Their extents will be matched to the smaller layer, ref_atc_mask, while their origins will be matched to the bioclim layer ref_chelsa_bio1_main.
# project
## UNNECESSARY NOW THAT THE CRS IS CORRECT
# ref_chelsa_bio1_main <- terra::project(x = ref_chelsa_bio1_main, y = "EPSG:4326", method = "bilinear", threads = TRUE)
# ref_atc_mask <- terra::project(x = ref_atc_mask, y = "EPSG:4326", method = "bilinear", threads = TRUE)
# set extent
# set ext to the smallest whole-number extent shared between the layers
ext.obj <- terra::ext(-180, 179, -60, 83)
# create main layer for future cropping, crop to new ext
ref_chelsa_bio1_main <- terra::crop(x = ref_chelsa_bio1_main, y = ext.obj)
# create a mask layer specifically for the cropping and masking
ref_atc_mask <- terra::crop(x = ref_atc_mask, y = ext.obj)
# unify origins
# set origin of ref_atc_mask
terra::origin(ref_atc_mask) <- c(-0.0001396088, -0.0001392488)
# resample mask to origin of main layer- most rasters will already have the origin of the bio1 layer
ref_atc_mask <- terra::resample(x = ref_atc_mask, y = ref_chelsa_bio1_main, method = "bilinear", threads = TRUE)
# check
terra::ext(ref_chelsa_bio1_main)
terra::ext(ref_atc_mask)
terra::origin(ref_chelsa_bio1_main)
terra::origin(ref_atc_mask)
crs("EPSG:4326", parse = TRUE)
2.3 Tidying historical data
Here is where we get into tidying the data. We will crop the rasters using the extent object created from the reference layers. We will follow up the cropping with masking, which converts any cell of the input raster that is NA in the mask layer (ref_atc_mask) to NA. I will also ensure the CRS is correct and resample the rasters to the resolution of the reference layer (ref_chelsa_bio1_main) so that they all match. The output files will be saved in v1.
mypath <- file.path(dirname(here::here()), "maxent/historical_climate_rasters/chelsa2.1_30arcsec")
In this first chunk, I will:
- crop the layers to the correct extent
- mask the layers to the outlines of the continents
- set the CRS to EPSG:4326
- set the origin to match the reference layers
- crop the layers again to ensure the aggregation doesn’t increase the extent
This chunk will take a very long time to run… so maybe find a good book to read.
# view list of file formats (GDAL drivers) supported by terra- use AAIGrid for .asc
terra::gdal(drivers = TRUE)
# loop to crop extent for all files
for(a in seq_along(env.files)) {
# load each raster into temp object
rast.hold <- terra::rast(env.files[a])
# begin edits
# crop new rasters to extent
rast.hold <- terra::crop(x = rast.hold, y = ext.obj)
# mask the bioclim layers by ref_atc_mask
rast.hold <- terra::mask(x = rast.hold, mask = ref_atc_mask)
# project crs
rast.hold <- terra::project(x = rast.hold, y = "EPSG:4326", method = "bilinear", threads = TRUE)
# set origin
terra::origin(rast.hold) <- c(-0.0001396088, -0.0001392488)
#resample to fit the extent/resolution of the main layer ref_chelsa_bio1_main
#use bilinear interpolation, since values are continuous
rast.hold <- terra::resample(x = rast.hold, y = ref_chelsa_bio1_main, method = "bilinear", threads = TRUE)
#write out the new resampled rasters!
terra::writeRaster(
x = rast.hold,
filename = file.path(mypath, "v1", paste0(output.files[a], "_global", ".tif")),
filetype = "GTiff",
overwrite = FALSE
)
# remove object once it's done
rm(rast.hold)
}
We will now convert the rasters to the .asc format required by MaxEnt.
# directory of files to modify
env.files <- list.files(path = file.path(mypath, "v1"), pattern = "\\.tif$", full.names = TRUE)
# output file names
output.files <- list.files(path = file.path(mypath, "v1"), pattern = "\\.tif$", full.names = FALSE) %>%
gsub(pattern = ".tif", replacement = ".asc")
Note: In this version of the vignette, I have isolated only the variables I selected for the final analysis. If you would like to tidy and compare all variables, simply skip this next chunk.
if(FALSE) {
env.files <- grep("atc_2015_global|bio2_1981-2010_global|bio11_1981-2010_global|bio12_1981-2010_global|bio15_1981-2010_global", env.files, value = TRUE)
output.files <- grep("atc_2015_global|bio2_1981-2010_global|bio11_1981-2010_global|bio12_1981-2010_global|bio15_1981-2010_global", output.files, value = TRUE)
}
# loop to convert to .ascii
for(a in seq_along(env.files)){
# holding object
rast.hold <- terra::rast(env.files[a])
# write output
terra::writeRaster(
x = rast.hold,
filename = file.path(mypath, "v1_maxent", output.files[a]),
filetype = "AAIGrid",
overwrite = FALSE
)
# remove object once it's done
rm(rast.hold)
}
Finally, I will aggregate the selected rasters to a lower resolution, with roughly 10 km grid cells. We aggregate to 10 km because we will be downsampling the resolution of our analysis to account for survey bias.
# loop to aggregate and convert to .ascii
for(a in seq_along(env.files)){
# holding object
rast.hold <- terra::rast(env.files[a])
# aggregate to 10km scale. cells are roughly 1km, so aggregate by factor of 10
# the na.rm argument is incredibly important because if any of the cells are NAs, the aggregated cell will be an NA value.
rast.hold <- terra::aggregate(x = rast.hold, fact = 10, fun = "mean", na.rm = TRUE, threads = TRUE)
# crop again to ensure aggregation doesn't increase extent
rast.hold <- terra::crop(x = rast.hold, y = ext.obj)
# write output
terra::writeRaster(
x = rast.hold,
filename = file.path(mypath, "v1_maxent_10km", output.files[a]),
filetype = "AAIGrid",
overwrite = FALSE
)
# remove object once it's done
rm(rast.hold)
}
These data can now be used in a MaxEnt model!
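As a quick sanity check before modeling, we can confirm that the tidied layers share the same geometry. This is a minimal sketch using two of the layers written out above; adjust the file names to the variables you kept.
# load two tidied layers and confirm that they align
bio11 <- terra::rast(file.path(mypath, "v1", "bio11_1981-2010_global.tif"))
bio12 <- terra::rast(file.path(mypath, "v1", "bio12_1981-2010_global.tif"))
# compareGeom() errors if the extent, dimensions, or CRS disagree
terra::compareGeom(bio11, bio12)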
2.4 Tidying CMIP6 data
Here, we will go through all of the above steps in section 2.3 for the CMIP6 versions of the rasters.
# file path to directory
mypath <- file.path(dirname(here::here()), "maxent/future_climate_rasters/chelsa2.1_30arcsec")
SSP126
First, I will get the file names and create folders to hold the new copies for each SSP scenario.
# v1 for tidied rasters
dir.create(file.path(mypath, "2041-2070_ssp126_GFDL", "v1"))
# v1_maxent for the ascii versions
dir.create(file.path(mypath, "2041-2070_ssp126_GFDL", "v1_maxent"))
# 10km aggregated versions of ascii
dir.create(file.path(mypath, "2041-2070_ssp126_GFDL", "v1_maxent_10km"))
# I will load in the files and then get the new names I would like to give them
# load in bioclim layers to be cropped- the original .tif files
env.files <- list.files(path = file.path(mypath, "2041-2070_ssp126_GFDL", "originals"), pattern = "\\.tif$", full.names = TRUE)
# output file names
output.files <- list.files(path = file.path(mypath, "2041-2070_ssp126_GFDL", "originals"), pattern = "\\.tif$", full.names = FALSE) %>%
# get rid of the version and filetype endings
gsub(pattern = "_V.2.1.tif", replacement = "") %>%
# crop out some detail
gsub(pattern = "-esm4_ssp", replacement = "_") %>%
tolower()
# more edits to file names
output.files <- output.files %>%
gsub(pattern = "chelsa_", replacement = "")
Note: In this version of the vignette, I have isolated only the variables I selected for the final analysis. If you would like to tidy and compare all variables, simply skip this next chunk.
if(FALSE) {
env.files <- grep("bio2|bio11|bio12|bio15", env.files, value = TRUE)
output.files <- grep("bio2|bio11|bio12|bio15", output.files, value = TRUE)
}
Tidy
# view list of file formats (GDAL drivers) supported by terra- use AAIGrid for .asc
terra::gdal(drivers = TRUE)
# loop to crop extent for all files
for(a in seq_along(env.files)) {
# load each raster into temp object
rast.hold <- terra::rast(env.files[a])
# begin edits
# crop new rasters to extent
rast.hold <- terra::crop(x = rast.hold, y = ext.obj)
# mask the bioclim layer by ref_atc_mask
rast.hold <- terra::mask(x = rast.hold, mask = ref_atc_mask)
# set crs
rast.hold <- terra::project(x = rast.hold, y = "EPSG:4326", method = "bilinear", threads = TRUE)
# set origin
terra::origin(rast.hold) <- c(-0.0001396088, -0.0001392488)
#resample to fit the extent/resolution of the main layer ref_chelsa_bio1_main
#use bilinear interpolation, since values are continuous
rast.hold <- terra::resample(x = rast.hold, y = ref_chelsa_bio1_main, method = "bilinear", threads = TRUE)
#write out the new resampled rasters!
terra::writeRaster(
x = rast.hold,
filename = file.path(mypath, "2041-2070_ssp126_GFDL", "v1", paste0(output.files[a], "_global", ".tif")),
filetype = "GTiff",
overwrite = FALSE
)
# remove object once it's done
rm(rast.hold)
}
# directory of files to modify
env.files <- list.files(path = file.path(mypath, "2041-2070_ssp126_GFDL", "v1"), pattern = "\\.tif$", full.names = TRUE)
# output file names
output.files <- list.files(path = file.path(mypath, "2041-2070_ssp126_GFDL", "v1"), pattern = "\\.tif$", full.names = FALSE) %>%
gsub(pattern = ".tif", replacement = ".asc")
Note: In this version of the vignette, I have isolated only the variables I selected for the final analysis. If you would like to tidy and compare all variables, simply skip this next chunk.
if(FALSE) {
env.files <- grep("bio2_2041-2070_gfdl_126_global|bio11_2041-2070_gfdl_126_global|bio12_2041-2070_gfdl_126_global|bio15_2041-2070_gfdl_126_global", env.files, value = TRUE)
output.files <- grep("bio2_2041-2070_gfdl_126_global|bio11_2041-2070_gfdl_126_global|bio12_2041-2070_gfdl_126_global|bio15_2041-2070_gfdl_126_global", output.files, value = TRUE)
}
The code chunk above was added later, so from this point on I only grab the rasters I need downstream.
# loop to convert to .ascii
for(a in seq_along(env.files)) {
# holding object
rast.hold <- terra::rast(env.files[a])
# write output
terra::writeRaster(
x = rast.hold,
filename = file.path(mypath, "2041-2070_ssp126_GFDL", "v1_maxent", output.files[a]),
filetype = "AAIGrid",
overwrite = FALSE
)
# remove object once it's done
rm(rast.hold)
}
# loop to aggregate and convert to .ascii
for(a in seq_along(env.files)){
# holding object
rast.hold <- terra::rast(env.files[a])
# aggregate to 10km scale. cells are roughly 1km, so aggregate by factor of 10
rast.hold <- terra::aggregate(x = rast.hold, fact = 10, fun = "mean", na.rm = TRUE, threads = TRUE)
# crop again to ensure aggregation doesn't increase extent
rast.hold <- terra::crop(x = rast.hold, y = ext.obj)
# write output
terra::writeRaster(
x = rast.hold,
filename = file.path(mypath, "2041-2070_ssp126_GFDL", "v1_maxent_10km", output.files[a]),
filetype = "AAIGrid",
overwrite = FALSE
)
# remove object once it's done
rm(rast.hold)
}
3. Assess Collinearity
Now that these different climatologies have been tidied for use, we need to decide which variables are the most logical to use in a model for Lycorma delicatula. This can be done by assessing the level of collinearity between the variables and assessing their biological relevance to SLF.
Before using any of these climatologies in a MaxEnt model, we need to assess their level of collinearity (the degree to which the layers are correlated with each other). I expect most of them to be correlated, as they are all derived from either temperature or precipitation. I will use a threshold of a Pearson’s correlation of 0.7 or lower to select the variables for downstream models. I will only perform the correlation analysis on the historical versions of the rasters, as the climate change versions of the variables are likely to display the same trends. I will use this analysis to decide which covariates go into both the historical and projected (climate change) models.
While it is very important to reduce collinearity, it cannot be the only metric used to select covariates for species distribution models. The biology of the target species is arguably more important. A MaxEnt model that has not been informed by the biology of Lycorma delicatula will likely produce misleading or wrong predictions of its realized niche, so our covariate selection also needs to be informed by the literature. For example, there is evidence that minimum winter temperature has a significantly negative impact on egg survival (Lee et al., 2011).
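For intuition, the 0.7 threshold can be previewed on a small random sample of cells before running the full analysis in section 3.1. This is a minimal sketch, not the workflow used below, and assumes two of the tidied historical layers from v1.
# quick preview of a pairwise Pearson correlation on sampled cells
mypath_hist <- file.path(dirname(here::here()), "maxent/historical_climate_rasters/chelsa2.1_30arcsec")
bio11 <- terra::rast(file.path(mypath_hist, "v1", "bio11_1981-2010_global.tif"))
bio12 <- terra::rast(file.path(mypath_hist, "v1", "bio12_1981-2010_global.tif"))
# randomly sample cell values, dropping NA (masked ocean) cells
samp <- terra::spatSample(c(bio11, bio12), size = 10000, method = "random", na.rm = TRUE)
# Pearson correlation between the sampled values of the two layers
cor(samp[, 1], samp[, 2], method = "pearson")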
We will downsize the rasters from the v1 folder to perform a collinearity analysis between the layers. This will only be done for the historical variables. We will use terra::aggregate() to combine cells in a 2x2 grid fashion and take the mean of the 4 cells.
mypath <- file.path(dirname(here::here()), "maxent/historical_climate_rasters/chelsa2.1_30arcsec")
#list env layers, load
env.files <- list.files(path = file.path(mypath, "v1"), pattern = "\\.tif$", full.names = TRUE)
output.files <- list.files(path = file.path(mypath, "v1"), pattern = "\\.tif$", full.names = FALSE)
# I had to modify "pattern" to not include any .tif.aux.xml files
# loop to downsample rasters for colinearity analysis
# code adapted from Huron et al. (2022)
for(a in seq_along(env.files)){
# holding object
rast.hold <- terra::rast(env.files[a])
rast.hold <- terra::aggregate(
rast.hold,
fact = 2, # downsample by a factor of 2, i.e. aggregate cells in 2x2 blocks
fun = "mean", # take mean of these cells
# expand = TRUE, # a carryover from raster::aggregate, terra does this automatically
na.rm = TRUE,
filename = file.path(mypath, "v1_downsampled", output.files[a]), overwrite = FALSE
)
# remove object once it's done
rm(rast.hold)
}
3.1 Create and tidy correlation matrix
We will use ENMTools::raster.cor.matrix() to perform the correlation analysis. This chunk will also take a while, so it’s time to break out that book again.
if(FALSE){
mypath <- file.path(dirname(here::here()), "maxent/historical_climate_rasters/chelsa2.1_30arcsec")
# load downsampled layers and stack for raster.cor.matrix command
# list of layer paths
env.files <- list.files(path = file.path(mypath, "v1_downsampled"), pattern = "\\.tif$", full.names = TRUE) %>%
# only grab bioclim variables
grep(pattern = "bio", value = TRUE)
# stack downsampled layers into a single multi-layer SpatRaster
env <- terra::rast(env.files)
# use terra::nlyr() to check that the stacking was successful
nlyr(env)
# create a correlation matrix for picking model layers
env.cor <- ENMTools::raster.cor.matrix(env = env, method = "pearson")
# write out correlations as .csv
write.csv(x = env.cor, file = file.path(here::here(), "vignette-outputs", "data-tables", "env_cor_chelsa_downsampled_v2.csv"), row.names = TRUE)
}
The correlation values have been saved as a 24x24 matrix. Now that the values are in a .csv file, we need to tidy the data into a more readable format. We will simplify the column and row names and reorder them. We will also convert the correlations to their absolute values so that we can compare the layers on a 0-1 scale.
env.cor2 <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "env_cor_chelsa_downsampled_v2.csv"))
env.cor2 <- as.data.frame(env.cor2)
# convert rownames column to rownames
row.names(env.cor2) <- env.cor2$X
# get rid of X column
env.cor2 <- env.cor2[, which(names(env.cor2) != "X")]
# reorder columns and take absolute values of correlations
env.cor3 <- env.cor2 %>%
dplyr::select(order(colnames(.))) %>%
abs(.)
# simplify colnames
colnames(env.cor3) <- gsub(pattern = "_V.2.1", replacement = "", x = colnames(env.cor3))
colnames(env.cor3) <- gsub(pattern = "_1981.2010", replacement = "", x = colnames(env.cor3))
colnames(env.cor3) <- gsub(pattern = "CHELSA_", replacement = "", x = colnames(env.cor3))
# simplify rownames
rownames(env.cor3) <- gsub(pattern = "_V.2.1", replacement = "", x = rownames(env.cor3))
rownames(env.cor3) <- gsub(pattern = "_1981.2010", replacement = "", x = rownames(env.cor3))
rownames(env.cor3) <- gsub(pattern = "CHELSA_", replacement = "", x = rownames(env.cor3))
# factor levels to order row names
var.names <- c("bio1", "bio2", "bio3", "bio4", "bio5", "bio6", "bio7", "bio8", "bio9", "bio10", "bio11", "bio12", "bio13", "bio14", "bio15", "bio16", "bio17", "bio18", "bio19")
# reorder columns
env.cor4 <- env.cor3 %>%
dplyr::select(bio1, bio2:bio9, bio10:bio19) %>%
# reorder rows
mutate(var = rownames(.)) %>% # add column of row names to reorder
arrange(factor(var, levels = var.names)) %>%
dplyr::select(!var) # take out temp column
# write tidied data
write.csv(x = env.cor4, file = file.path(here::here(), "vignette-outputs", "data-tables", "env_cor_chelsa_downsampled_abs_v2.csv"), row.names = TRUE)
The correlations are now more intuitive to read, so they are ready to be analyzed.
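Before plotting, it can also help to list the offending pairs programmatically. This is a minimal sketch, assuming the tidied absolute-value matrix env.cor4 from above.
# flag the variable pairs whose absolute correlation exceeds 0.7
cor.mat <- as.matrix(env.cor4)
# keep only one triangle so each pair is reported once, and drop the diagonal
cor.mat[upper.tri(cor.mat, diag = TRUE)] <- NA
high.pairs <- which(cor.mat > 0.7, arr.ind = TRUE)
# assemble a readable table of the pairs above the threshold
data.frame(var1 = rownames(cor.mat)[high.pairs[, "row"]], var2 = colnames(cor.mat)[high.pairs[, "col"]], correlation = cor.mat[high.pairs])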
3.2 Covariate selection
Now that the values are in a more readable format and on an absolute scale, we can create heatmaps to display them and look for trends. I will create a heatmap of the raw values and a binarized version showing which pairs exceed the 0.7 threshold.
env.cor5 <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "env_cor_chelsa_downsampled_abs_v2.csv"))
row.names(env.cor5) <- env.cor5$X
env.cor5 <- as.matrix(env.cor5[, which(names(env.cor5) != "X")])
# Make into triangle
#env.cor5[upper.tri(env.cor5, diag = FALSE)] <- NA
# pivot correlation heatmap for plotting
# code adapted from Huron et al. (2022)
env.cor5 <- env.cor5 %>%
tibble::as_tibble() %>%
mutate(covar = colnames(.)) %>%
dplyr::select(covar, everything()) %>% # put covar at front
pivot_longer(cols = -covar, names_to = "var") %>% # pivot longer for plotting
dplyr::select(var, covar, everything())
I need to set factor levels to order the variable names, along with a lookup vector to rename them.
# factor levels to order row names
var.names <- c("bio1", "bio2", "bio3", "bio4", "bio5", "bio6", "bio7", "bio8", "bio9", "bio10", "bio11", "bio12", "bio13", "bio14", "bio15", "bio16", "bio17", "bio18", "bio19")
# factor levels to rename abbreviations to actual names
var.renames <- c(
"bio1" = "annual mean temperature",
"bio2" = "mean diurnal range",
"bio3" = "isothermality",
"bio4" = "temperature seasonality",
"bio5" = "max temp of warmest month",
"bio6" = "min temp of coldest month",
"bio7" = "temperature annual range",
"bio8" = "mean temp of wettest quarter",
"bio9" = "mean temp of driest quarter",
"bio10" = "mean temp of warmest quarter",
"bio11" = "mean temp of coldest quarter",
"bio12" = "annual precipitation",
"bio13" = "precip of wettest month",
"bio14" = "precip of driest month",
"bio15" = "precipitation seasonality",
"bio16" = "precip of wettest quarter",
"bio17" = "precip of driest quarter",
"bio18" = "precip of warmest quarter",
"bio19" = "precip of coldest quarter"
)
# plot correlation matrix
(env.cor_plot <- ggplot(data = env.cor5) +
geom_tile(aes(x = factor(var, levels = var.names), # reorder according to var.names factor obj
y = factor(covar, levels = var.names),
fill = value)) +
viridis::scale_fill_viridis(discrete = FALSE, direction = -1, limits = c(0,1), name = "Correlation", na.value = "white") +
theme_minimal() +
guides(x = guide_axis(angle = 45)) +
labs(x = "", y = "") +
labs(title = "Pearson's correlations: CHELSA bioclim covariate rasters") +
theme(legend.position = "bottom") +
scale_y_discrete(labels = var.renames) +
coord_equal(ratio = 1)
)
I have found that choosing covariates can be rather ad hoc, so through much practice I developed what I believe is a best-fit methodology for choosing them. First, I look at the different groupings of covariates (those based on temperature and those based on precipitation) and choose a representative from each group. The ideal representative is highly correlated with most of the others in its group, which indicates its predictive power, and represents a generality of its group (i.e., a yearly mean). In this case, I chose mean temperature of the coldest quarter (bio 11) and annual precipitation (bio 12). Besides its correlation values, mean temperature of the coldest quarter is highly impactful for the survival of SLF eggs (Lee et al., 2011).
I will save this plot and produce a binarized version before I continue choosing variables. This way, the cutoff is clearer and the choice is more obvious.
# save correlation heatmap
ggsave(
env.cor_plot, filename = file.path(here::here(), "vignette-outputs", "figures", "env_cor_CHELSA_v2.jpg"),
height = 8,
width = 10,
device = jpeg,
dpi = 500
)
# binary heatmap
env.cor6 <- env.cor5
# cut values into ranges above and below 0.7
env.cor6$value <- cut(env.cor6$value, breaks = c(0, 0.709, 1), labels = c("0.7 or below", "above 0.7"))
# 0.7 threshold heatmap
(env.cor_binary_plot <- ggplot(data = env.cor6) +
geom_tile(aes(x = factor(var, levels = var.names), # reorder according to var.names factor obj
y = factor(covar, levels = var.names),
fill = value)) +
theme_minimal() +
guides(x = guide_axis(angle = 45),
fill = guide_legend(title = "Correlation")) +
labs(x = "", y = "") +
labs(
title = "Pearson's correlations: CHELSA bioclim covariate rasters (binarized)"
) +
scale_fill_manual(values = c("yellow2", "azure4")) +
theme(legend.position = "bottom") +
scale_y_discrete(labels = var.renames) +
coord_equal(ratio = 1)
)
Indeed, the choice is much clearer on this plot. Mean temp of the coldest quarter (bio 11) has a correlation above 0.7 with all other temperature-based variables except mean diurnal range (bio 2). Meanwhile, annual precipitation (bio 12) is highly correlated with all precipitation-based variables except precipitation seasonality (bio 15). These two exceptions, bio 2 and bio 15, were therefore also chosen.
4. Visualize rasters
mypath <- file.path(dirname(here::here()), "maxent/historical_climate_rasters/chelsa2.1_30arcsec")
# load in bio 1
bio1_df <- terra::rast(x = file.path(mypath, "v1_downsampled", "bio1_1981-2010_global.tif")) %>%
terra::as.data.frame(., xy = TRUE)
bio1_plot <- ggplot() +
geom_tile(data = bio1_df, aes(x = x, y = y, fill = `CHELSA_bio1_1981-2010_V.2.1`)) +
xlab("longitude") +
ylab("latitude") +
theme_minimal() +
theme(legend.position = "none") +
labs(title = "CHELSA bio 1") +
coord_quickmap()
bio1_plot
The final 4 variables selected for SDM of Lycorma delicatula are “mean diurnal range” (bio 2), “mean temperature of the coldest quarter” (bio 11), “annual precipitation” (bio 12), and “precipitation seasonality” (bio 15). These covariates can be projected for climate change and are biologically relevant to our target species. In the next vignette, we will crop these covariates to the appropriate spatial scale to run a global-scale model. We will also create other necessary datasets for this model.
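As a minimal sketch of the endpoint of this vignette, the four selected historical covariates could be stacked like this (file names follow the naming convention from section 2.3):
# stack the four selected historical covariates
selected <- c("bio2", "bio11", "bio12", "bio15")
env.final <- terra::rast(file.path(mypath, "v1", paste0(selected, "_1981-2010_global.tif")))
# should report 4 layers
nlyr(env.final)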
References
Huron, N. A., Behm, J. E., & Helmus, M. R. (2022). Paninvasion severity assessment of a U.S. grape pest to disrupt the global wine market. Communications Biology, 5(1), 655. https://doi.org/10.1038/s42003-022-03580-w
Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., Kessler, M. (2017). Climatologies at high resolution for the earth’s land surface areas. Scientific Data, 4, 170122. https://doi.org/10.1038/sdata.2017.122
Karger D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E, Linder, H.P., Kessler, M. (2018): Data from: Climatologies at high resolution for the earth’s land surface areas. EnviDat. https://doi.org/10.16904/envidat.228.v2.1
Krasting, J. P., John, J. G., Blanton, C., McHugh, C., Nikonov, S., Radhakrishnan, A., Rand, K., Zadeh, N. T., Balaji, V., Durachta, J., Dupuis, C., Menzel, R., Robinson, T., Underwood, S., Vahlenkamp, H., Dunne, K. A., Gauthier, P. P., Ginoux, P., Griffies, S. M., … Zhao, M. (2018). NOAA-GFDL GFDL-ESM4 model output prepared for CMIP6 CMIP [Dataset]. Earth System Grid Federation. https://doi.org/10.22033/ESGF/CMIP6.1407
Lee, J.-S., Kim, I.-K., Koh, S.-H., Cho, S. J., Jang, S.-J., Pyo, S.-H., & Choi, W. I. (2011). Impact of minimum winter temperature on Lycorma delicatula (Hemiptera: Fulgoridae) egg mortality. Journal of Asia-Pacific Entomology, 14(1), 123–125. https://doi.org/10.1016/j.aspen.2010.09.004
Weiss, D. J., Nelson, A., Gibson, H. S., Temperley, W., Peedell, S., Lieber, A., Hancher, M., Poyart, E., Belchior, S., Fullman, N., Mappin, B., Dalrymple, U., Rozier, J., Lucas, T. C. D., Howes, R. E., Tusting, L. S., Kang, S. Y., Cameron, E., Bisanzio, D., … Gething, P. W. (2018). A global map of travel time to cities to assess inequalities in accessibility in 2015. Nature, 553(7688), 333–336. https://doi.org/10.1038/nature25181
Appendix
Additional environmental covariates
Along with the 19 traditional bioclimatic variables, CHELSA has created about 29 additional variables that may help to explain the SLF invasion in North America. I will include 4 metrics of growing degree days. Several studies have attempted to characterize the lower threshold for egg development and have found values between 8-13°C (Maino et al., 2022). Maino et al. also found that egg mortality increased significantly below 10°C (Maino et al., 2022). We will include various measures of degree days with a 10°C threshold. Here are the additional variables I will add to the models:
- GDD10: Growing degree days heat sum above 10°C
- NGD10: Number of growing degree days above 10°C
- GDGFGD10: First growing degree day above 10°C
- GDDLGD10: Last growing degree day above 10°C
Download historical DD variables
if(FALSE) {
# historical data
chelsa_historic_URLs <- read_table(file = file.path(here::here(), "data-raw", "CHELSA", "chelsa_1981-2010_bioclim_URLs.txt"),
col_names = FALSE) %>%
as.data.frame() %>%
dplyr::select("X1") %>%
dplyr::rename("URL" = "X1")
# select only DD URLs
chelsa_historic_URLs <- slice(.data = chelsa_historic_URLs, c(32, 35, 38, 55))
# loop to open the historical DD URLs in a browser
for(i in 1:nrow(chelsa_historic_URLs)) {
file.tmp <- chelsa_historic_URLs[i, ]
utils::browseURL(url = file.tmp)
}
}
Files were placed in maxent/historical_climate_rasters/chelsa2.1_30arcsec/originals.