Run 'Ri.Asia' model trained on SLF invaded range in Japan and South Korea

Overview

This vignette trains the second regional-scale model, the Ri.Asia model, which is based on the invaded range in east Asia. This vignette follows the same structure as vignette 070. In the next vignette, we will train the final regional-scale model based on the native range for SLF in east Asia.

Setup

CURRENT MODEL VERSION: v2

# general tools
library(tidyverse)  #data manipulation
library(here) #making directory pathways easier on different instances
# here::here() starts at the root folder of this package.
library(devtools)

# SDMtune and dependencies
library(SDMtune) # main package used to run SDMs
library(dismo) # package underneath SDMtune
library(rJava) # for running MaxEnt
library(plotROC) # plots ROCs

# spatial data handling
library(raster) 
library(terra) 

library(viridis)

Note: I will be setting the global options of this document so that only certain code chunks are rendered in the final .html file. I will set the eval = FALSE so that none of the code is re-run (preventing files from being overwritten during knitting) and will simply overwrite this in chunks with plots.

SDMtune will run MaxEnt through java via the rJava package. You will need to ensure that your machine has the proper version of java installed (x32 or x64) for your operating system.

checkMaxentInstallation(verbose = TRUE)

This chunk sets the java memory allocation (Xmx). I will increase the memory allocation from 512m (the default) to 2GB of memory.

# xmx sets java memory allocation
options(java.parameters = "-Xmx2048m")

# xss sets java stack size
# options(java.parameters = c("-Xss2560k", "-Xmx2048m"))

1. Format Data for Model

I will load in the datasets I will need for the MaxEnt models. These are labeled at the beginning of the object name by the parameter they will be used in the SDMtune::train() function (x, p or a). I will begin by loading in the covariate data and then by loading in the points datasets. The env_covariates dataset labeled invaded_asian contains a raster of each of the environmental covariates used to train the model- these were the covariates that were masked using the K-G climate zones in the last vignette.

1.1 Input Data- env covariates

# path to directory
mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/historical_climate_rasters/chelsa2.1_30arcsec")


# the env covariate scale used to train the model
x_invaded_asian_env_covariates_list <- list.files(path = file.path(mypath, "v1_maxent_10km"), pattern = "\\_regional_invaded_asian_KG.asc$", full.names = TRUE)  %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

# the scale used to make xy predictions
x_global_hist_env_covariates_list <- list.files(path = file.path(mypath, "v1_maxent_10km"), pattern = "\\.asc$", full.names = TRUE) %>%
    grep("bio2_1981-2010_global.asc|bio11_1981-2010_global.asc|bio12_1981-2010_global.asc|bio15_1981-2010_global.asc", ., value = TRUE)

The CMIP6 versions of these covariates will only be used for projection purposes.

# path to directory
  mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/future_climate_rasters/chelsa2.1_30arcsec")

# the env covariates for performing xy predictions for global slf and IVR points

# SSP126
x_global_126_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp126_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

# SSP370
x_global_370_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp370_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

# SSP585
x_global_585_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp585_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
  # dont include Access to cities
  grep(pattern = "atc_2015", value = TRUE, invert = TRUE)

I will create rasters of the environmental covariates, stack and gather summary statistics. I will also shorten their names and exclude possible operators from layer names (for example, using the dash symbol was found to interfere with SDMtune making predictions for tables downstream).

# layer name object. Check order of layers first
env_layer_names <- c("bio11", "bio12", "bio15", "bio2")

# easternUSA rasters
# stack env covariates
x_invaded_asian_env_covariates <- terra::rast(x = x_invaded_asian_env_covariates_list)

# attributes
nlyr(x_invaded_asian_env_covariates)
names(x_invaded_asian_env_covariates)
minmax(x_invaded_asian_env_covariates)
# ext(x_invaded_asian_env_covariates)
# crs(x_invaded_asian_env_covariates)

# I will change the name of the variables because they are throwing errors in SDMtune
names(x_invaded_asian_env_covariates) <- env_layer_names



# global rasters
# stack env covariates
x_global_hist_env_covariates <- terra::rast(x = x_global_hist_env_covariates_list)

# attributes
nlyr(x_global_hist_env_covariates)
names(x_global_hist_env_covariates)
minmax(x_global_hist_env_covariates)
# ext(x_global_hist_env_covariates)
# crs(x_global_hist_env_covariates)

# I will change the name of the variables because they are throwing errors in SDMtune
names(x_global_hist_env_covariates) <- env_layer_names
# confirmed- SDMtune doesnt like dashes in column names (it is read as a mathematical operation)

# SSP126
x_global_126_env_covariates <- terra::rast(x = x_global_126_env_covariates_list)

# attributes
nlyr(x_global_126_env_covariates)
names(x_global_126_env_covariates)
minmax(x_global_126_env_covariates)
# ext(x_global_126_env_covariates)
# crs(x_global_126_env_covariates)

names(x_global_126_env_covariates) <- env_layer_names


# SSP370
x_global_370_env_covariates <- terra::rast(x = x_global_370_env_covariates_list)

# attributes
nlyr(x_global_370_env_covariates)
names(x_global_370_env_covariates)
minmax(x_global_370_env_covariates)
# ext(x_global_370_env_covariates)
# crs(x_global_370_env_covariates)

names(x_global_370_env_covariates) <- env_layer_names


# SSP585
x_global_585_env_covariates <- terra::rast(x = x_global_585_env_covariates_list)

# attributes
nlyr(x_global_585_env_covariates)
names(x_global_585_env_covariates)
minmax(x_global_585_env_covariates)
# ext(x_global_585_env_covariates)
# crs(x_global_585_env_covariates)

names(x_global_585_env_covariates) <- env_layer_names

rm(x_invaded_asian_env_covariates_list)
rm(x_global_hist_env_covariates_list)
rm(x_global_126_env_covariates_list)
rm(x_global_370_env_covariates_list)
rm(x_global_585_env_covariates_list)

1.2 Input data- presences / absences (training and testing)

I need to also load in the SLF presence and background points datasets created in vignette 060.

# slf presences
p_slf_points <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "slf_all_coords_final_2024-08-05.csv")) %>%
  dplyr::select(-species)

# training presences
p_slf_points_invaded_asian_train <- read.csv(
  file = file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_asian_train_slf_presences_v2.csv")
  )
# test presences
p_slf_points_invaded_asian_test <- read.csv(
  file = file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_asian_test_slf_presences_v2.csv")
  )

# entire eastern USA background point set
a_regional_invaded_asian_background_points <- read.csv(
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_asian_background_points_v2.csv")
  )

I showed the process for selecting the presences in the last vignette, but this time I will just load in the presences dataset created in vignette 060 (regional model setup).

Model training object

Now I will create the “samples with data” object that SDMtune requires.

regional_invaded_asian_train <- SDMtune::prepareSWD(
  species = "Lycorma delicatula",
  env = x_global_hist_env_covariates,
  p = p_slf_points_invaded_asian_train, 
  a = a_regional_invaded_asian_background_points,
  verbose = TRUE # print helpful messages
  )

regional_invaded_asian_train@coords # coordinates
regional_invaded_asian_train@pa # presence / absence (background counted as absence)
regional_invaded_asian_train@data # extracted data from

I usually look at how many records were dropped before I save it, because SDMtune gives an undefined warning about how many samples were discarded. In this case, we only lost 3 presence records, so this is acceptable. I will also save the output to be used later.

SDMtune::swd2csv(swd = regional_invaded_asian_train, file_name = c(
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_asian_train_slf_presences_with_data_v2.csv"),
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_asian_background_points_with_data_v2.csv")
  ))

Model testing object

The testing / validation SWD object will be created in the same fashion, except using the presences from the invaded range in North America.

regional_invaded_asian_test <- SDMtune::prepareSWD(
  species = "Lycorma delicatula",
  env = x_global_hist_env_covariates, 
  p = p_slf_points_invaded_asian_test, 
  a = a_regional_invaded_asian_background_points,
  verbose = TRUE # print helpful messages
  )

regional_invaded_asian_test@coords # coordinates
regional_invaded_asian_test@pa # presence / absence (background counted as absence)
regional_invaded_asian_test@data # extracted data from

SDMtune::swd2csv(swd = regional_invaded_asian_test, file_name = c(
  file.path(here::here(), "vignette-outputs", "data-tables", "regional_invaded_asian_test_slf_presences_with_data_v2.csv"),
  file.path(here::here(), "vignette-outputs", "data-tables", "NULL.csv")
  ))

# remove second copy of background data points
file.remove(file.path(here::here(), "vignette-outputs", "data-tables", "NULL.csv"))

2. Train Invaded Regional Model

First, I will train a maxEnt model. This and the other regional scale model will NOT be cross-validated. Cross-validation randomly selects a percentage of the presence data for training and testing, without regard for spatial location. This process makes sense for the global model, which is fed ALL presence data. Due to the spatial scale of the regional models, it makes more sense to spatially separate the training and testing data if given the opportunity.

I will use the following list of hyperparameters to train the initial model, which I selected via tuning for the global model.

ALL feature classes (fc) used (l = linear, q = quadratic, p = product, h = hinge, t = threshold)
regularization multiplier (reg) set to 1.5 (more regularized than default of 1)
iterations = 5000. This is the max number of iterations for the optimization algorithm to perform before stopping training. Increasing this number from the default of 500 allows the algorithm to make more refined predictions.

regional_invaded_asian_model <- SDMtune::train(
  method = "Maxent",
  data = regional_invaded_asian_train,
  fc = "lqpht", # feature classes set to ALL
  reg = 1.5,
  iter = 5000, # number of iterations
  progress = TRUE
)

Summary Statistics

This function produces all summary statistics for this model. For the complete and annotated workflow used to create this function, see 051_compute_MaxEnt_summary_statistics_workflow.Rmd.

# output directory
mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_invaded_asian_v2")

scari::compute_MaxEnt_summary_statistics(
  model.obj = regional_invaded_asian_model, 
  model.name = "regional_invaded_asian", 
  mypath = mypath, 
  create.dir = TRUE, # create subdirectory
  env.covar.obj = x_invaded_asian_env_covariates, # env covariates raster stacked
  train.obj = regional_invaded_asian_train, # training data used to create model
  test.obj = regional_invaded_asian_test, # data you wish to use to test the model
  plot.type = c("cloglog", "logistic"), # types of univariate and marginal response curves to be created
  jk.test.type = c("train", "test") # types of jackknife curves to be created

  )

3. Create Outputs for Analysis

3.1 Create distribution map for area of interest

Lastly, I will use the SDMtune::predict() function to predict the suitability for the range of interest. I will threshold the map by the MTP, MTSS and 10_percentile thresholds. The workflow for this function is wrapped into the function create_MaxEnt_suitability_maps().

I will retrieve the specific value for the MTSS threshold for this model and use it to define a what is considered unsuitable in this raster. Then, I will reclassify the mean raster so that anything below this threshold is now “unsuitable”. I will use the mean cloglog threshold for all iterations. I will create a binary version of the mean raster, with unsuitable regions, which are below the MTSS training threshold, as 0, and those above the threshold as 1. Then, I can reclassify averaged raster of MaxEnt predictions. I will load in the averaged prediction for this model and reclassify the raster using a temporary raster layer that I will create. This raster will only have values where the model predicts the climate to be unsuitable, so that it can be plotted overtop the averaged raster to mask unsuitable areas

mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_invaded_asian_v2")

regional_invaded_asian_model <- read_rds(file = file.path(mypath, "regional_invaded_asian_model.rds"))

scari::create_MaxEnt_suitability_maps(
  model.obj = regional_invaded_asian_model,
  model.name = "regional_invaded_asian", 
  mypath = mypath, 
  create.dir = FALSE, 
  env.covar.obj = x_global_hist_env_covariates, 
  describe.proj = "globe_1981-2010", # name of area or time period describe.proj to
  clamp.pred = TRUE,
  map.thresh = TRUE, # whether thresholded versions of these maps should be created
  map.thresh.extra = "MTP",
  thresh = c("MTP", "10_percentile", "MTSS"),
  summary.file = file.path(mypath, "regional_invaded_asian_summary.csv")
)

I will also create suitability maps for the projected 2041-2070 climate data.

3.2 Predict suitability for all SLF presences

I will get projected suitability values, calculated on the cloglog scale (between 0 and 1) for each of the SLF presence points. These suitability values will be added back to the original data frame and saved. First, I will use SDMtune::prepareSWD() to extract raster values from each layer used to build the model. I will save this output.

During data analysis, I will create a scatter plot of these suitability values in the global vs regional models.

mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_invaded_asian_v2")

# slf presence data
slf_presences <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "slf_all_coords_final_2024-08-05.csv")) %>%
  dplyr::select(-species)

scari::predict_xy_suitability(
  xy.obj = slf_presences,
  xy.type = "Lycorma delicatula",
  env.covar.obj = x_global_hist_env_covariates,
  model.obj = regional_invaded_asian_model,
  mypath = mypath,
  predict.type = c("cloglog", "logistic"),
  output.name = "regional_invaded_asian_slf_all_coords_1981-2010"
)

3.3 Predict suibability for IVR regions

I will perform the same action as above, but for the locations of important wineries around the world.

mypath <- file.path(here::here() %>% 
                       dirname(),
                     "maxent/models/slf_regional_invaded_asian_v2")

# load in all IVR points
load(file = file.path(here::here(), "data", "wineries.rda")) 

IVR_regions <- wineries %>%
  dplyr::select(x, y)

scari::predict_xy_suitability(
  xy.obj = IVR_regions,
  xy.type = "IVR regions",
  env.covar.obj = x_global_hist_env_covariates,
  model.obj = regional_invaded_asian_model,
  mypath = mypath,
  predict.type = c("cloglog", "logistic"),
  output.name = "regional_invaded_asian_wineries_1981-2010",
  buffer.pred = TRUE
)

We have the Ri.Asia model trained, so now we will move on to train the final regional-scale model based on the native range in east Asia.

References

Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists: Statistical explanation of MaxEnt. Diversity and Distributions, 17(1), 43–57. https://doi.org/10.1111/j.1472-4642.2010.00725.x
Feng, X. (2022). Shandongfx/nimbios_enm [HTML]. https://github.com/shandongfx/nimbios_enm (Original work published 2018).
Gallien, L., Douzet, R., Pratte, S., Zimmermann, N. E., & Thuiller, W. (2012). Invasive species distribution models – how violating the equilibrium assumption can create new insights. Global Ecology and Biogeography, 21(11), 1126–1136. https://doi.org/10.1111/j.1466-8238.2012.00768.x
Maryam Bordkhani. (n.d.). Threshold rule [Online post].
Radosavljevic, A., & Anderson, R. P. (2014). Making better Maxent models of species distributions: Complexity, overfitting and evaluation. Journal of Biogeography, 41(4), 629–643. https://doi.org/10.1111/jbi.12227
Sobek-Swant, S., Kluza, D. A., Cuddington, K., & Lyons, D. B. (2012). Potential distribution of emerald ash borer: What can we learn from ecological niche models using Maxent and GARP? Forest Ecology and Management, 281, 23–31. https://doi.org/10.1016/j.foreco.2012.06.017
Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3), 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
Steven Phillips. (2017). A Brief Tutorial on Maxent. http://biodiversityinformatics.amnh.org/open_source/maxent/.
Steven J. Phillips, Miroslav Dudík, & Robert E. Schapire. (2023). Maxent software for modeling species niches and distributions (Version 3.4.3 (Java)) [Computer software]. http://biodiversityinformatics.amnh.org/open_source/maxent/.
Srivastava, V., Roe, A. D., Keena, M. A., Hamelin, R. C., & Griess, V. C. (2021). Oh the places they’ll go: Improving species distribution modelling for invasive forest pests in an uncertain world. Biological Invasions, 23(1), 297–349. https://doi.org/10.1007/s10530-020-02372-9
VanDerWal, J., Shoo, L. P., Graham, C., & Williams, S. E. (2009). Selecting pseudo-absence data for presence-only distribution modeling: How far should you stray from what you know? Ecological Modelling, 220(4), 589–594. https://doi.org/10.1016/j.ecolmodel.2008.11.010
Vignali, S., Barras, A. G., Arlettaz, R., & Braunisch, V. (2020). SDMtune: An R package to tune and evaluate species distribution models. Ecology and Evolution, 10(20), 11488–11506. https://doi.org/10.1002/ece3.6786

Samuel M. Owens¹

2024-08-08

Overview

Setup

1. Format Data for Model

1.1 Input Data- env covariates

1.2 Input data- presences / absences (training and testing)

Model training object

Model testing object

2. Train Invaded Regional Model

Summary Statistics

3. Create Outputs for Analysis

3.1 Create distribution map for area of interest

3.2 Predict suitability for all SLF presences

3.3 Predict suibability for IVR regions

References

Run 'Ri.Asia' model trained on SLF invaded range in Japan and South Korea

Samuel M. Owens1

2024-08-08

Overview

Setup

1. Format Data for Model

1.1 Input Data- env covariates

1.2 Input data- presences / absences (training and testing)

Model training object

Model testing object

2. Train Invaded Regional Model

Summary Statistics

3. Create Outputs for Analysis

3.1 Create distribution map for area of interest

3.2 Predict suitability for all SLF presences

3.3 Predict suibability for IVR regions

References

Samuel M. Owens¹