Run 'Rn' model trained on SLF native range in east Asia

Overview

This vignette trains the last regional-scale model, the Rn model, which is based on the native range of SLF in east Asia. This vignette follows the structure of the last two vignettes and trains the last regional-scale model.

Setup

CURRENT MODEL VERSION: v3

# general tools
library(tidyverse)  #data manipulation
library(here) #making directory pathways easier on different instances
# here::here() starts at the root folder of this package.
library(devtools)

# SDMtune and dependencies
library(SDMtune) # main package used to run SDMs
library(dismo) # package underneath SDMtune
library(rJava) # for running MaxEnt
library(plotROC) # plots ROCs

# spatial data handling
library(raster) 
library(terra) 

library(viridis)

Note: I will be setting the global options of this document so that only certain code chunks are rendered in the final .html file. I will set the eval = FALSE so that none of the code is re-run (preventing files from being overwritten during knitting) and will simply overwrite this in chunks with plots.

SDMtune will run MaxEnt through java via the rJava package. You will need to ensure that your machine has the proper version of java installed (x32 or x64) for your operating system.

checkMaxentInstallation(verbose = TRUE)

This chunk sets the java memory allocation (Xmx). I will increase the memory allocation from 512m (the default) to 2GB of memory.

# xmx sets java memory allocation
options(java.parameters = "-Xmx2048m")

# xss sets java stack size
# options(java.parameters = c("-Xss2560k", "-Xmx2048m"))

1. Format Data for native regional Model

Input Data- env covariates

I will load in the datasets I will need for the MaxEnt models. These are labeled at the beginning of the object name by the parameter they will be used for in the maxent() function (x, p or a). I will begin by loading in the covariate data and then by loading in the points datasets. The env_covariates dataset labeled native contains a raster of each of the environmental covariates used to train the model- these were the covariates that were masked using the K-G climate zones in the last vignette.

# path to directory
mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/historical_climate_rasters/chelsa2.1_30arcsec")


# the env covariate scale used to train the model
x_native_env_covariates_list <- list.files(path = file.path(mypath, "v1_maxent_10km"), pattern = "\\_regional_native_KG.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

# the scale used to make xy predictions
x_global_hist_env_covariates_list <- list.files(path = file.path(mypath, "v1_maxent_10km"), pattern = "\\.asc$", full.names = TRUE) %>%
    grep("bio2_1981-2010_global.asc|bio11_1981-2010_global.asc|bio12_1981-2010_global.asc|bio15_1981-2010_global.asc", ., value = TRUE)

The CMIP6 versions of these covariates will only be used for projection purposes.

# path to directory
  mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/future_climate_rasters/chelsa2.1_30arcsec")

# the env covariates for performing xy predictions for global slf and IVR points

# SSP126
x_global_126_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp126_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

# SSP370
x_global_370_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp370_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

# SSP585
x_global_585_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp585_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

I will create rasters of the environmental covariates, stack and gather summary statistics. I will also shorten their names and exclude possible operators from layer names (for example, using the dash symbol was found to interfere with SDMtune making predictions for tables downstream).

# layer name object. Check order of layers first
env_layer_names <- c("bio11", "bio12", "bio15", "bio2")

# native range rasters
# stack env covariates
x_native_env_covariates <- terra::rast(x = x_native_env_covariates_list)

# attributes
nlyr(x_native_env_covariates)
names(x_native_env_covariates)
minmax(x_native_env_covariates)
# ext(x_native_env_covariates)
# crs(x_native_env_covariates)

# I will change the name of the variables because they are throwing errors in SDMtune
names(x_native_env_covariates) <- env_layer_names



# global rasters
# stack env covariates
x_global_hist_env_covariates <- terra::rast(x = x_global_hist_env_covariates_list)

# attributes
nlyr(x_global_hist_env_covariates)
names(x_global_hist_env_covariates)
minmax(x_global_hist_env_covariates)
# ext(x_global_hist_env_covariates)
# crs(x_global_hist_env_covariates)

# I will change the name of the variables because they are throwing errors in SDMtune
names(x_global_hist_env_covariates) <- env_layer_names
# confirmed- SDMtune doesnt like dashes in column names (it is read as a mathematical operation)

# SSP126
x_global_126_env_covariates <- terra::rast(x = x_global_126_env_covariates_list)

# attributes
nlyr(x_global_126_env_covariates)
names(x_global_126_env_covariates)
minmax(x_global_126_env_covariates)
# ext(x_global_126_env_covariates)
# crs(x_global_126_env_covariates)

names(x_global_126_env_covariates) <- env_layer_names


# SSP370
x_global_370_env_covariates <- terra::rast(x = x_global_370_env_covariates_list)

# attributes
nlyr(x_global_370_env_covariates)
names(x_global_370_env_covariates)
minmax(x_global_370_env_covariates)
# ext(x_global_370_env_covariates)
# crs(x_global_370_env_covariates)

names(x_global_370_env_covariates) <- env_layer_names


# SSP585
x_global_585_env_covariates <- terra::rast(x = x_global_585_env_covariates_list)

# attributes
nlyr(x_global_585_env_covariates)
names(x_global_585_env_covariates)
minmax(x_global_585_env_covariates)
# ext(x_global_585_env_covariates)
# crs(x_global_585_env_covariates)

names(x_global_585_env_covariates) <- env_layer_names

rm(x_native_env_covariates_list)
rm(x_global_hist_env_covariates_list)
rm(x_global_126_env_covariates_list)
rm(x_global_370_env_covariates_list)
rm(x_global_585_env_covariates_list)

1.2 Input data- presences / absences (training and testing)

I need to also load in the SLF presence dataset created in vignette 030 and the background points dataset created in vignette 040.

# slf presences
p_slf_points <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "slf_all_coords_final_2024-08-05.csv")) %>%
  dplyr::select(-species)

# training presences
p_slf_points_native_train <- read.csv(
  file = file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_train_slf_presences_v3.csv")
  )
# test presences
p_slf_points_native_test <- read.csv(
  file = file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_test_slf_presences_v3.csv")
  )

# entire eastern USA background point set
a_regional_native_background_points <- read.csv(
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_background_points_v3.csv")
  )

Model Training object

Now I will create the “samples with data” object that SDMtune requires.

regional_native_train <- SDMtune::prepareSWD(
  species = "Lycorma delicatula",
  env = x_native_env_covariates,
  p = p_slf_points_native_train, 
  a = a_regional_native_background_points,
  verbose = TRUE # print helpful messages
  )

regional_native_train@coords # coordinates
regional_native_train@pa # presence / absence (background counted as absence)
regional_native_train@data # extracted data from

SDMtune::swd2csv(swd = regional_native_train, file_name = c(
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_train_slf_presences_with_data_v3.csv"),
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_background_points_with_data_v3.csv")
  ))

Model testing object

The model will be tested using all presences from N America.

regional_native_test <- SDMtune::prepareSWD(
  species = "Lycorma delicatula",
  env = x_global_hist_env_covariates, # need the global covariates for China and N America
  p = p_slf_points_native_test, 
  a = a_regional_native_background_points,
  verbose = TRUE # print helpful messages
  )

regional_native_test@coords # coordinates
regional_native_test@pa # presence / absence (background counted as absence)
regional_native_test@data # extracted data from

SDMtune::swd2csv(swd = regional_native_test, file_name = c(
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_test_slf_presences_with_data_v3.csv"),
  file.path(here::here(), "vignette-outputs", "data-tables", "NULL.csv")
  ))

# remove second copy of background data points
file.remove(file.path(here::here(), "vignette-outputs", "data-tables", "NULL.csv"))

2. Train native Regional Model

First, I will train a maxEnt model. This and the other regional scale model will NOT be cross-validated. Cross-validation randomly selects a percentage of the presence data for training and testing, without regard for spatial location. This process makes sense for the global model, which is fed ALL presence data. Due to the spatial scale of the regional models, it makes more sense to spatially separate the training and testing data if given the opportunity.

I will use the following list of hyperparameters to train the initial model, which I selected via tuning for the global model.

ALL feature classes (fc) used (l = linear, q = quadratic, p = product, h = hinge, t = threshold)
regularization multiplier (reg) set to 1.5 (more regularized than default of 1)
iterations = 5000. This is the max number of iterations for the optimization algorithm to perform before stopping training. Increasing this number from the default of 500 allows the algorithm to make more refined predictions.

regional_native_model <- SDMtune::train(
  method = "Maxent",
  data = regional_native_train,
  fc = "lqpht", # feature classes set to ALL
  reg = 1.5,
  iter = 5000, # number of iterations
  progress = TRUE
)

Summary Statistics

This function produces all summary statistics for this model. For the complete and annotated workflow used to create this function, see 051_compute_MaxEnt_summary_statistics_workflow.Rmd.

# output directory
mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_native_v3")

scari::compute_MaxEnt_summary_statistics(
  model.obj = regional_native_model, 
  model.name = "regional_native", 
  mypath = mypath, 
  create.dir = TRUE, # create subdirectory
  env.covar.obj = x_native_env_covariates, # env covariates raster stacked
  train.obj = regional_native_train, # training data used to create model
  test.obj = regional_native_test, # data you wish to use to test the model
  plot.type = c("cloglog", "logistic"), # types of univariate and marginal response curves to be created
  jk.test.type = c("train", "test") # types of jackknife curves to be created
  )

3. Create Outputs for Analysis

3.1 Create distribution map for area of interest

Lastly, I will use the SDMtune::predict() function to predict the suitability for the range of interest. I will threshold the map by the MTP, MTSS and 10_percentile thresholds. The workflow for this function is wrapped into the function create_MaxEnt_suitability_maps().

I will retrieve the specific value for the MTSS threshold for this model and use it to define a what is considered unsuitable in this raster. Then, I will reclassify the mean raster so that anything below this threshold is now “unsuitable”. I will use the mean cloglog threshold for all iterations. I will create a binary version of the mean raster, with unsuitable regions, which are below the MTSS training threshold, as 0, and those above the threshold as 1. Then, I can reclassify averaged raster of MaxEnt predictions. I will load in the averaged prediction for this model and reclassify the raster using a temporary raster layer that I will create. This raster will only have values where the model predicts the climate to be unsuitable, so that it can be plotted overtop the averaged raster to mask unsuitable areas

mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_native_v3")

regional_native_model <- read_rds(file = file.path(mypath, "regional_native_model.rds"))

scari::create_MaxEnt_suitability_maps(
  model.obj = regional_native_model,
  model.name = "regional_native", 
  mypath = mypath, 
  create.dir = FALSE, 
  env.covar.obj = x_global_hist_env_covariates, 
  describe.proj = "globe_1981-2010", # name of area or time period describe.proj to
  clamp.pred = TRUE,
  map.thresh = TRUE, # whether thresholded versions of these maps should be created
  map.thresh.extra = "MTP",
  thresh = c("MTP", "10_percentile", "MTSS"),
  summary.file = file.path(mypath, "regional_native_summary.csv")
)

I will also create suitability maps for the projected 2041-2070 climate data.

3.2 Predict suitability for all SLF presences

mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_native_v3")

# slf presence data
slf_presences <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "slf_all_coords_final_2024-08-05.csv")) %>%
  dplyr::select(-species)

scari::predict_xy_suitability(
  xy.obj = slf_presences,
  xy.type = "Lycorma delicatula",
  env.covar.obj = x_global_hist_env_covariates,
  model.obj = regional_native_model,
  mypath = mypath,
  predict.type = c("cloglog", "logistic"),
  output.name = "regional_native_slf_all_coords_1981-2010"
)

3.3 Predict suibability for IVR regions

I will perform the same action as above, but for the locations of important wineries around the world.

mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_native_v3")

# load in all IVR points
load(file = file.path(here::here(), "data", "wineries.rda")) 

IVR_regions <- wineries %>%
  dplyr::select(x, y)

scari::predict_xy_suitability(
  xy.obj = IVR_regions,
  xy.type = "IVR regions",
  env.covar.obj = x_global_hist_env_covariates,
  model.obj = regional_native_model,
  mypath = mypath,
  predict.type = c("cloglog", "logistic"),
  output.name = "regional_native_wineries_1981-2010",
  buffer.pred = TRUE
)

Now that all of my models have been trained, I can move on to make check for the level of extrapolation (exdet) and the most important covariate (MIC) for each model’s predictions. This is an important step when applying models to climate change and invasive species, due to the non-equilibrium nature of these applications.

References

Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists: Statistical explanation of MaxEnt. Diversity and Distributions, 17(1), 43–57. https://doi.org/10.1111/j.1472-4642.2010.00725.x
Feng, X. (2022). Shandongfx/nimbios_enm [HTML]. https://github.com/shandongfx/nimbios_enm (Original work published 2018).
Gallien, L., Douzet, R., Pratte, S., Zimmermann, N. E., & Thuiller, W. (2012). Invasive species distribution models – how violating the equilibrium assumption can create new insights. Global Ecology and Biogeography, 21(11), 1126–1136. https://doi.org/10.1111/j.1466-8238.2012.00768.x
Maryam Bordkhani. (n.d.). Threshold rule [Online post].
Radosavljevic, A., & Anderson, R. P. (2014). Making better Maxent models of species distributions: Complexity, overfitting and evaluation. Journal of Biogeography, 41(4), 629–643. https://doi.org/10.1111/jbi.12227
Sobek-Swant, S., Kluza, D. A., Cuddington, K., & Lyons, D. B. (2012). Potential distribution of emerald ash borer: What can we learn from ecological niche models using Maxent and GARP? Forest Ecology and Management, 281, 23–31. https://doi.org/10.1016/j.foreco.2012.06.017
Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3), 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
Steven Phillips. (2017). A Brief Tutorial on Maxent. http://biodiversityinformatics.amnh.org/open_source/maxent/.
Steven J. Phillips, Miroslav Dudík, & Robert E. Schapire. (2023). Maxent software for modeling species niches and distributions (Version 3.4.3 (Java)) [Computer software]. http://biodiversityinformatics.amnh.org/open_source/maxent/.
Srivastava, V., Roe, A. D., Keena, M. A., Hamelin, R. C., & Griess, V. C. (2021). Oh the places they’ll go: Improving species distribution modelling for invasive forest pests in an uncertain world. Biological Invasions, 23(1), 297–349. https://doi.org/10.1007/s10530-020-02372-9
VanDerWal, J., Shoo, L. P., Graham, C., & Williams, S. E. (2009). Selecting pseudo-absence data for presence-only distribution modeling: How far should you stray from what you know? Ecological Modelling, 220(4), 589–594. https://doi.org/10.1016/j.ecolmodel.2008.11.010
Vignali, S., Barras, A. G., Arlettaz, R., & Braunisch, V. (2020). SDMtune: An R package to tune and evaluate species distribution models. Ecology and Evolution, 10(20), 11488–11506. https://doi.org/10.1002/ece3.6786

Samuel M. Owens¹

2024-08-09

Overview

Setup

1. Format Data for native regional Model

Input Data- env covariates

1.2 Input data- presences / absences (training and testing)

Model Training object

Model testing object

2. Train native Regional Model

Summary Statistics

3. Create Outputs for Analysis

3.1 Create distribution map for area of interest

3.2 Predict suitability for all SLF presences

3.3 Predict suibability for IVR regions

References

Run 'Rn' model trained on SLF native range in east Asia

Samuel M. Owens1

2024-08-09

Overview

Setup

1. Format Data for native regional Model

Input Data- env covariates

1.2 Input data- presences / absences (training and testing)

Model Training object

Model testing object

2. Train native Regional Model

Summary Statistics

3. Create Outputs for Analysis

3.1 Create distribution map for area of interest

3.2 Predict suitability for all SLF presences

3.3 Predict suibability for IVR regions

References

Samuel M. Owens¹