Run 'Ri.NAmerica' model trained on SLF invaded region in North America

Overview

In the last vignette, I cropped the covariate layer to the extent of each regional model training area and chose background points from those layers to characterize these covariates. In this vignette, I will train our first regional-scale model based on the invaded North American range. This another step toward my goal of analyzing the realized niche of Lycorma delicatula at multiple spatial scales. This output will be combined with the other regional models and compared directly to the global scale model to determine the impact of climate change on the suitability of SLF in important viticultural regions.

Just to reiterate, I hypothesize that by creating models at multiple spatial scales, I can have more confidence in suitability predictions made for specific SLF populations and important viticultural regions, and can predict the risk of specific populations spreading now and under climate change. I also hypothesize that the regional-scale models, once ensembled into a single prediction, will make more refined predictions than a global-scale model could.

Setup

CURRENT MODEL VERSION: v7

# general tools
library(tidyverse)  #data manipulation
library(here) #making directory pathways easier on different instances
# here() starts at the root folder of this package.
library(devtools)

# SDMtune and dependencies
library(SDMtune) # main package used to run SDMs
library(dismo) # package underneath SDMtune
library(rJava) # for running MaxEnt
library(plotROC) # plots ROCs

# spatial data handling
library(raster) 
library(terra) 

library(viridis)

Note: I will be setting the global options of this document so that only certain code chunks are rendered in the final .html file. I will set the eval = FALSE so that none of the code is re-run (preventing files from being overwritten during knitting) and will simply overwrite this in chunks with plots.

SDMtune will run MaxEnt through java via the rJava package. You will need to ensure that your machine has the proper version of java installed (x32 or x64) for your operating system.

checkMaxentInstallation(verbose = TRUE)

This chunk sets the java memory allocation (Xmx). I will increase the memory allocation from 512m (the default) to 2GB of memory.

# xmx sets java memory allocation
options(java.parameters = "-Xmx2048m")

# xss sets java stack size
# options(java.parameters = c("-Xss2560k", "-Xmx2048m"))

1. Format Data for Model

I will load in the datasets I will need for the MaxEnt models. These are labeled at the beginning of the object name by the parameter they will be used in the SDMtune::train() function (x, p or a). I will begin by loading in the covariate data and then by loading in the points datasets. The env_covariates dataset labeled invaded contains a raster of each of the environmental covariates used to train the model- these were the covariates that were masked using the K-G climate zones in the last vignette.

1.1 Input Data- env covariates

# path to directory
mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/historical_climate_rasters/chelsa2.1_30arcsec")


# the env covariate scale used to train the model
x_invaded_env_covariates_list <- list.files(path = file.path(mypath, "v1_maxent_10km"), pattern = "\\_regional_invaded_KG.asc$", full.names = TRUE)  %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

# the scale used to make xy predictions
x_global_hist_env_covariates_list <- list.files(path = file.path(mypath, "v1_maxent_10km"), pattern = "\\.asc$", full.names = TRUE) %>%
    grep("bio2_1981-2010_global.asc|bio11_1981-2010_global.asc|bio12_1981-2010_global.asc|bio15_1981-2010_global.asc", ., value = TRUE)

The CMIP6 versions of these covariates will only be used for projection purposes.

# path to directory
  mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/future_climate_rasters/chelsa2.1_30arcsec")


# the env covariates for performing xy predictions for global slf and IVR points

# SSP126
x_global_126_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp126_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

# SSP370
x_global_370_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp370_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

# SSP585
x_global_585_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp585_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

I will create rasters of the environmental covariates, stack and gather summary statistics. I will also shorten their names and exclude possible operators from layer names (for example, using the dash symbol was found to interfere with SDMtune making predictions for tables downstream).

# layer name object. Check order of layers first
env_layer_names <- c("bio11", "bio12", "bio15", "bio2")

# easternUSA rasters
# stack env covariates
x_invaded_env_covariates <- terra::rast(x = x_invaded_env_covariates_list)

# attributes
nlyr(x_invaded_env_covariates)
names(x_invaded_env_covariates)
minmax(x_invaded_env_covariates)
# ext(x_invaded_env_covariates)
# crs(x_invaded_env_covariates)

# I will change the name of the variables because they are throwing errors in SDMtune
names(x_invaded_env_covariates) <- env_layer_names



# global rasters
# stack env covariates
x_global_hist_env_covariates <- terra::rast(x = x_global_hist_env_covariates_list)

# attributes
nlyr(x_global_hist_env_covariates)
names(x_global_hist_env_covariates)
minmax(x_global_hist_env_covariates)
# ext(x_global_hist_env_covariates)
# crs(x_global_hist_env_covariates)

# I will change the name of the variables because they are throwing errors in SDMtune
names(x_global_hist_env_covariates) <- env_layer_names
# confirmed- SDMtune doesnt like dashes in column names (it is read as a mathematical operation)

# SSP126
x_global_126_env_covariates <- terra::rast(x = x_global_126_env_covariates_list)

# attributes
nlyr(x_global_126_env_covariates)
names(x_global_126_env_covariates)
minmax(x_global_126_env_covariates)
# ext(x_global_126_env_covariates)
# crs(x_global_126_env_covariates)

names(x_global_126_env_covariates) <- env_layer_names


# SSP370
x_global_370_env_covariates <- terra::rast(x = x_global_370_env_covariates_list)

# attributes
nlyr(x_global_370_env_covariates)
names(x_global_370_env_covariates)
minmax(x_global_370_env_covariates)
# ext(x_global_370_env_covariates)
# crs(x_global_370_env_covariates)

names(x_global_370_env_covariates) <- env_layer_names


# SSP585
x_global_585_env_covariates <- terra::rast(x = x_global_585_env_covariates_list)

# attributes
nlyr(x_global_585_env_covariates)
names(x_global_585_env_covariates)
minmax(x_global_585_env_covariates)
# ext(x_global_585_env_covariates)
# crs(x_global_585_env_covariates)

names(x_global_585_env_covariates) <- env_layer_names

rm(x_invaded_env_covariates_list)
rm(x_global_hist_env_covariates_list)
rm(x_global_126_env_covariates_list)
rm(x_global_370_env_covariates_list)
rm(x_global_585_env_covariates_list)

1.2 Input data- presences / absences (training and testing)

I need to also load in the SLF presence and background points datasets created in vignette 060.

# slf presences
p_slf_points <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "slf_all_coords_final_2024-08-05.csv")) %>%
  dplyr::select(-species)

# entire eastern USA background point set
a_regional_invaded_background_points <- read.csv(file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_background_points_v7.csv"))

I will need to combine the above covariates, presence and background point datasets into two different datasets to feed into MaxEnt: one for training and another for testing the model. These dataset will need to contain point-wise values for each of the predictor covariates at the presence and background point coordinates. SDMtune takes an SWD (sample with data) S4 object for these purposes, containing the presences and background points with associated covariate data to be fed into the model. First, I will create the training dataset. I will need to perform some extra edits on the slf presences. These presences need to be filtered into the training and testing datasets. I will filter the presences from the invaded range in North America and will use these points to train the invaded model, while the presences from the invaded range in Asia will be used to validate / test the model.

Model training object

p_slf_invaded_train <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_train_slf_presences_v7.csv"))

Now I will create the “samples with data” object that SDMtune requires.

regional_invaded_train <- SDMtune::prepareSWD(
  species = "Lycorma delicatula",
  env = x_invaded_env_covariates,
  p = p_slf_invaded_train, 
  a = a_regional_invaded_background_points,
  verbose = TRUE # print helpful messages
  )

regional_invaded_train@coords # coordinates
regional_invaded_train@pa # presence / absence (background counted as absence)
regional_invaded_train@data # extracted data from 


# OLD method
#regional_trainTest <-  SDMtune::trainValTest(
#  x = regional_invaded_SWD,
#  test = 0.2,
#  only_presence = TRUE,
#  seed = 6
#)

I usually look at how many records were dropped before I save it, because SDMtune gives an undefined warning about how many samples were discarded. In this case, we only lost 3 presence records, so this is acceptable. I will also save the output to be used later.

SDMtune::swd2csv(swd = regional_invaded_train, file_name = c(
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_train_slf_presences_with_data_v7.csv"),
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_background_points_with_data_v7.csv")
  ))

Model testing object

The testing / validation SWD object will be created in the same fashion, except using a subset of the presences from the invaded range in Asia. These presences will be selected using a shapefile of the Japan/SK extent as a mask. we chose to test using the invaded region over the native region because Sobek-Swant et al found that their invaded MaxEnt models performed poorly when projected to the native range for EAB. Additionally, the initial versions of this model performed very poorly when tested on the native range (Sobek-Swant et al, 2012).

p_slf_invaded_test <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_test_slf_presences_v7.csv"))

regional_invaded_test <- SDMtune::prepareSWD(
  species = "Lycorma delicatula",
  env = x_global_hist_env_covariates, 
  p = p_slf_invaded_test, 
  a = a_regional_invaded_background_points,
  verbose = TRUE # print helpful messages
  )

regional_invaded_test@coords # coordinates
regional_invaded_test@pa # presence / absence (background counted as absence)
regional_invaded_test@data # extracted data from

SDMtune::swd2csv(swd = regional_invaded_test, file_name = c(
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_test_slf_presences_with_data_v7.csv"),
  file.path(here::here(), "vignette-outputs", "data-tables", "NULL.csv")
  ))

# remove second copy of background data
file.remove(file.path(here::here(), "vignette-outputs", "data-tables", "NULL.csv"))

2. Train Invaded Regional Model

First, I will train a MaxEnt model. This and the other regional-scale models will NOT be cross-validated. Cross-validation randomly selects a percentage of the presence data for training and testing, without regard for spatial location. This process makes sense for the global model, which is fed ALL presence data. Due to the spatial scale of the regional models, it makes more sense to spatially separate the training and testing data if given the opportunity.

I will use the following list of hyperparameters to train the initial model, which I selected via tuning for the global model.

ALL feature classes (fc) used (l = linear, q = quadratic, p = product, h = hinge, t = threshold)
regularization multiplier (reg) set to 1.5 (more regularized than default of 1)
iterations = 5000. This is the max number of iterations for the optimization algorithm to perform before stopping training. Increasing this number from the default of 500 allows the algorithm to make more refined predictions.

regional_invaded_model <- SDMtune::train(
  method = "Maxent",
  data = regional_invaded_train,
  fc = "lqpht", # feature classes set to ALL
  reg = 1.5,
  iter = 5000, # number of iterations
  progress = TRUE
)

Summary Statistics

This function produces all summary statistics for this model. For the complete and annotated workflow used to create this function, see 051_compute_MaxEnt_summary_statistics_workflow.Rmd.

# output directory
mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_invaded_v7")

scari::compute_MaxEnt_summary_statistics(
  model.obj = regional_invaded_model, 
  model.name = "regional_invaded", 
  mypath = mypath, 
  create.dir = TRUE, # create subdirectory
  env.covar.obj = x_invaded_env_covariates, # env covariates raster stacked
  train.obj = regional_invaded_train, # training data used to create model
  test.obj = regional_invaded_test, # data you wish to use to test the model
  plot.type = c("cloglog", "logistic"), # types of univariate and marginal response curves to be created
  jk.test.type = c("train", "test") # types of jackknife curves to be created
  )

3. Create Outputs for Analysis

3.1 Create distribution map for area of interest

Lastly, I will use the SDMtune::predict() function to predict the suitability for the range of interest. I will threshold the map by the MTP, MTSS and 10_percentile thresholds.

I will retrieve the specific value for the MTSS threshold for this model and use it to define a what is considered unsuitable in this raster. Then, I will reclassify the mean raster so that anything below this threshold is now “unsuitable”. I will use the mean cloglog threshold for all iterations. I will create a binary version of the mean raster, with unsuitable regions, which are below the MTSS training threshold, as 0, and those above the threshold as 1. Then, I can reclassify averaged raster of MaxEnt predictions. I will load in the averaged prediction for this model and reclassify the raster using a temporary raster layer that I will create. This raster will only have values where the model predicts the climate to be unsuitable, so that it can be plotted overtop the averaged raster to mask unsuitable areas

mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_invaded_v7")

regional_invaded_model <- read_rds(file = file.path(mypath, "regional_invaded_model.rds"))

scari::create_MaxEnt_suitability_maps(
  model.obj = regional_invaded_model,
  model.name = "regional_invaded", 
  mypath = mypath, 
  create.dir = FALSE, 
  env.covar.obj = x_global_hist_env_covariates, 
  describe.proj = "globe_1981-2010", # name of area or time period describe.proj to
  clamp.pred = TRUE,
  thresh = c("MTP", "MTSS"),
  map.thresh = TRUE, # whether thresholded versions of these maps should be created
  map.thresh.extra = "MTP",
  summary.file = file.path(mypath, "regional_invaded_summary.csv")
)

I will also create suitability maps for ssp scenarios of the projected 2041-2070 climate data.

3.2 Predict suitability for known SLF presences

mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_invaded_v7")

# slf presence data
slf_presences <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "slf_all_coords_final_2024-08-05.csv")) %>%
  dplyr::select(-species)

scari::predict_xy_suitability(
  xy.obj = slf_presences,
  xy.type = "Lycorma delicatula",
  env.covar.obj = x_global_hist_env_covariates,
  model.obj = regional_invaded_model,
  mypath = mypath,
  predict.type = c("cloglog", "logistic"),
  output.name = "regional_invaded_slf_all_coords_1981-2010"
)

I will repeat the same process for the CMIP6 data.

3.3 Predict suibability for IVR regions

I will perform the same action as above, but for the locations of important wineries around the world.

mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_invaded_v7")

# load in all IVR points
load(file = file.path(here::here(), "data", "wineries.rda")) 

IVR_regions <- wineries %>%
  dplyr::select(x, y)

scari::predict_xy_suitability(
  xy.obj = IVR_regions,
  xy.type = "IVR regions",
  env.covar.obj = x_global_hist_env_covariates,
  model.obj = regional_invaded_model,
  mypath = mypath,
  predict.type = c("cloglog", "logistic"),
  output.name = "regional_invaded_wineries_1981-2010",
  buffer.pred = TRUE
)

I will do the same for the SSP scenarios.

We have trained the Ri.NAmerica model, so now we will move on to train the other two regional-scale models.

References

Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists: Statistical explanation of MaxEnt. Diversity and Distributions, 17(1), 43–57. https://doi.org/10.1111/j.1472-4642.2010.00725.x
Feng, X. (2022). Shandongfx/nimbios_enm [HTML]. https://github.com/shandongfx/nimbios_enm (Original work published 2018).
Gallien, L., Douzet, R., Pratte, S., Zimmermann, N. E., & Thuiller, W. (2012). Invasive species distribution models – how violating the equilibrium assumption can create new insights. Global Ecology and Biogeography, 21(11), 1126–1136. https://doi.org/10.1111/j.1466-8238.2012.00768.x
Maryam Bordkhani. (n.d.). Threshold rule [Online post].
Radosavljevic, A., & Anderson, R. P. (2014). Making better Maxent models of species distributions: Complexity, overfitting and evaluation. Journal of Biogeography, 41(4), 629–643. https://doi.org/10.1111/jbi.12227
Sobek-Swant, S., Kluza, D. A., Cuddington, K., & Lyons, D. B. (2012). Potential distribution of emerald ash borer: What can we learn from ecological niche models using Maxent and GARP? Forest Ecology and Management, 281, 23–31. https://doi.org/10.1016/j.foreco.2012.06.017
Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3), 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
Steven Phillips. (2017). A Brief Tutorial on Maxent. http://biodiversityinformatics.amnh.org/open_source/maxent/.
Steven J. Phillips, Miroslav Dudík, & Robert E. Schapire. (2023). Maxent software for modeling species niches and distributions (Version 3.4.3 (Java)) [Computer software]. http://biodiversityinformatics.amnh.org/open_source/maxent/.
Srivastava, V., Roe, A. D., Keena, M. A., Hamelin, R. C., & Griess, V. C. (2021). Oh the places they’ll go: Improving species distribution modelling for invasive forest pests in an uncertain world. Biological Invasions, 23(1), 297–349. https://doi.org/10.1007/s10530-020-02372-9
VanDerWal, J., Shoo, L. P., Graham, C., & Williams, S. E. (2009). Selecting pseudo-absence data for presence-only distribution modeling: How far should you stray from what you know? Ecological Modelling, 220(4), 589–594. https://doi.org/10.1016/j.ecolmodel.2008.11.010
Vignali, S., Barras, A. G., Arlettaz, R., & Braunisch, V. (2020). SDMtune: An R package to tune and evaluate species distribution models. Ecology and Evolution, 10(20), 11488–11506. https://doi.org/10.1002/ece3.6786

Samuel M. Owens¹

2024-08-02

Overview

Setup

1. Format Data for Model

1.1 Input Data- env covariates

1.2 Input data- presences / absences (training and testing)

Model training object

Model testing object

2. Train Invaded Regional Model

Summary Statistics

3. Create Outputs for Analysis

3.1 Create distribution map for area of interest

3.2 Predict suitability for known SLF presences

3.3 Predict suibability for IVR regions

References

Run 'Ri.NAmerica' model trained on SLF invaded region in North America

Samuel M. Owens1

2024-08-02

Overview

Setup

1. Format Data for Model

1.1 Input Data- env covariates

1.2 Input data- presences / absences (training and testing)

Model training object

Model testing object

2. Train Invaded Regional Model

Summary Statistics

3. Create Outputs for Analysis

3.1 Create distribution map for area of interest

3.2 Predict suitability for known SLF presences

3.3 Predict suibability for IVR regions

References

Samuel M. Owens¹