
Run 'Rn' model trained on SLF native range in east Asia
Samuel M. Owens1
2024-08-09
Source:vignettes/090_run_regional_native_MaxEnt_model.Rmd
090_run_regional_native_MaxEnt_model.Rmd
Overview
This vignette trains the last regional-scale model, the Rn model, which is based on the native range of SLF in east Asia. This vignette follows the structure of the last two vignettes and trains the last regional-scale model.
Setup
CURRENT MODEL VERSION: v3
# general tools
library(tidyverse) #data manipulation
library(here) #making directory pathways easier on different instances
# here::here() starts at the root folder of this package.
library(devtools)
# SDMtune and dependencies
library(SDMtune) # main package used to run SDMs
library(dismo) # package underneath SDMtune
library(rJava) # for running MaxEnt
library(plotROC) # plots ROCs
# spatial data handling
library(raster)
library(terra)
library(viridis)
Note: I will be setting the global options of this
document so that only certain code chunks are rendered in the final
.html file. I will set the eval = FALSE
so that none of the
code is re-run (preventing files from being overwritten during knitting)
and will simply overwrite this in chunks with plots.
SDMtune
will run MaxEnt through java via the
rJava
package. You will need to ensure that your machine
has the proper version of java installed (x32 or x64) for your operating
system.
checkMaxentInstallation(verbose = TRUE)
This chunk sets the java memory allocation (Xmx
). I will
increase the memory allocation from 512m (the default) to 2GB of
memory.
# xmx sets java memory allocation
options(java.parameters = "-Xmx2048m")
# xss sets java stack size
# options(java.parameters = c("-Xss2560k", "-Xmx2048m"))
1. Format Data for native regional Model
Input Data- env covariates
I will load in the datasets I will need for the MaxEnt models. These
are labeled at the beginning of the object name by the parameter they
will be used for in the maxent()
function (x, p or a). I
will begin by loading in the covariate data and then by loading in the
points datasets. The env_covariates
dataset labeled
native
contains a raster of each of the environmental
covariates used to train the model- these were the covariates that were
masked using the K-G climate zones in the last vignette.
# path to directory
mypath <- file.path(here::here() %>%
dirname(),
"maxent/historical_climate_rasters/chelsa2.1_30arcsec")
# the env covariate scale used to train the model
x_native_env_covariates_list <- list.files(path = file.path(mypath, "v1_maxent_10km"), pattern = "\\_regional_native_KG.asc$", full.names = TRUE) %>%
# dont include Access to cities
grep(pattern = "atc_2015", value = TRUE, invert = TRUE)
# the scale used to make xy predictions
x_global_hist_env_covariates_list <- list.files(path = file.path(mypath, "v1_maxent_10km"), pattern = "\\.asc$", full.names = TRUE) %>%
grep("bio2_1981-2010_global.asc|bio11_1981-2010_global.asc|bio12_1981-2010_global.asc|bio15_1981-2010_global.asc", ., value = TRUE)
The CMIP6 versions of these covariates will only be used for projection purposes.
# path to directory
mypath <- file.path(here::here() %>%
dirname(),
"maxent/future_climate_rasters/chelsa2.1_30arcsec")
# the env covariates for performing xy predictions for global slf and IVR points
# SSP126
x_global_126_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp126_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
# dont include Access to cities
grep(pattern = "atc_2015", value = TRUE, invert = TRUE)
# SSP370
x_global_370_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp370_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
# dont include Access to cities
grep(pattern = "atc_2015", value = TRUE, invert = TRUE)
# SSP585
x_global_585_env_covariates_list <- list.files(path = file.path(mypath, "2041-2070_ssp585_GFDL", "v1_maxent_10km"), pattern = "\\_global.asc$", full.names = TRUE) %>%
# dont include Access to cities
grep(pattern = "atc_2015", value = TRUE, invert = TRUE)
I will create rasters of the environmental covariates, stack and gather summary statistics. I will also shorten their names and exclude possible operators from layer names (for example, using the dash symbol was found to interfere with SDMtune making predictions for tables downstream).
# layer name object. Check order of layers first
env_layer_names <- c("bio11", "bio12", "bio15", "bio2")
# native range rasters
# stack env covariates
x_native_env_covariates <- terra::rast(x = x_native_env_covariates_list)
# attributes
nlyr(x_native_env_covariates)
names(x_native_env_covariates)
minmax(x_native_env_covariates)
# ext(x_native_env_covariates)
# crs(x_native_env_covariates)
# I will change the name of the variables because they are throwing errors in SDMtune
names(x_native_env_covariates) <- env_layer_names
# global rasters
# stack env covariates
x_global_hist_env_covariates <- terra::rast(x = x_global_hist_env_covariates_list)
# attributes
nlyr(x_global_hist_env_covariates)
names(x_global_hist_env_covariates)
minmax(x_global_hist_env_covariates)
# ext(x_global_hist_env_covariates)
# crs(x_global_hist_env_covariates)
# I will change the name of the variables because they are throwing errors in SDMtune
names(x_global_hist_env_covariates) <- env_layer_names
# confirmed- SDMtune doesnt like dashes in column names (it is read as a mathematical operation)
# SSP126
x_global_126_env_covariates <- terra::rast(x = x_global_126_env_covariates_list)
# attributes
nlyr(x_global_126_env_covariates)
names(x_global_126_env_covariates)
minmax(x_global_126_env_covariates)
# ext(x_global_126_env_covariates)
# crs(x_global_126_env_covariates)
names(x_global_126_env_covariates) <- env_layer_names
# SSP370
x_global_370_env_covariates <- terra::rast(x = x_global_370_env_covariates_list)
# attributes
nlyr(x_global_370_env_covariates)
names(x_global_370_env_covariates)
minmax(x_global_370_env_covariates)
# ext(x_global_370_env_covariates)
# crs(x_global_370_env_covariates)
names(x_global_370_env_covariates) <- env_layer_names
# SSP585
x_global_585_env_covariates <- terra::rast(x = x_global_585_env_covariates_list)
# attributes
nlyr(x_global_585_env_covariates)
names(x_global_585_env_covariates)
minmax(x_global_585_env_covariates)
# ext(x_global_585_env_covariates)
# crs(x_global_585_env_covariates)
names(x_global_585_env_covariates) <- env_layer_names
1.2 Input data- presences / absences (training and testing)
I need to also load in the SLF presence dataset created in vignette 030 and the background points dataset created in vignette 040.
# slf presences
p_slf_points <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "slf_all_coords_final_2024-08-05.csv")) %>%
dplyr::select(-species)
# training presences
p_slf_points_native_train <- read.csv(
file = file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_train_slf_presences_v3.csv")
)
# test presences
p_slf_points_native_test <- read.csv(
file = file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_test_slf_presences_v3.csv")
)
# entire eastern USA background point set
a_regional_native_background_points <- read.csv(
file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_background_points_v3.csv")
)
Model Training object
Now I will create the “samples with data” object that SDMtune requires.
regional_native_train <- SDMtune::prepareSWD(
species = "Lycorma delicatula",
env = x_native_env_covariates,
p = p_slf_points_native_train,
a = a_regional_native_background_points,
verbose = TRUE # print helpful messages
)
regional_native_train@coords # coordinates
regional_native_train@pa # presence / absence (background counted as absence)
regional_native_train@data # extracted data from
Model testing object
The model will be tested using all presences from N America.
regional_native_test <- SDMtune::prepareSWD(
species = "Lycorma delicatula",
env = x_global_hist_env_covariates, # need the global covariates for China and N America
p = p_slf_points_native_test,
a = a_regional_native_background_points,
verbose = TRUE # print helpful messages
)
regional_native_test@coords # coordinates
regional_native_test@pa # presence / absence (background counted as absence)
regional_native_test@data # extracted data from
SDMtune::swd2csv(swd = regional_native_test, file_name = c(
file.path(here::here(), "vignette-outputs", "data-tables", "regional_native_test_slf_presences_with_data_v3.csv"),
file.path(here::here(), "vignette-outputs", "data-tables", "NULL.csv")
))
# remove second copy of background data points
file.remove(file.path(here::here(), "vignette-outputs", "data-tables", "NULL.csv"))
2. Train native Regional Model
First, I will train a maxEnt model. This and the other regional scale model will NOT be cross-validated. Cross-validation randomly selects a percentage of the presence data for training and testing, without regard for spatial location. This process makes sense for the global model, which is fed ALL presence data. Due to the spatial scale of the regional models, it makes more sense to spatially separate the training and testing data if given the opportunity.
I will use the following list of hyperparameters to train the initial model, which I selected via tuning for the global model.
- ALL feature classes (fc) used (l = linear, q = quadratic, p = product, h = hinge, t = threshold)
- regularization multiplier (reg) set to 1.5 (more regularized than default of 1)
- iterations = 5000. This is the max number of iterations for the optimization algorithm to perform before stopping training. Increasing this number from the default of 500 allows the algorithm to make more refined predictions.
regional_native_model <- SDMtune::train(
method = "Maxent",
data = regional_native_train,
fc = "lqpht", # feature classes set to ALL
reg = 1.5,
iter = 5000, # number of iterations
progress = TRUE
)
Summary Statistics
This function produces all summary statistics for this model. For the
complete and annotated workflow used to create this function, see
051_compute_MaxEnt_summary_statistics_workflow.Rmd
.
# output directory
mypath <- file.path(here::here() %>%
dirname(),
"maxent/models/slf_regional_native_v3")
scari::compute_MaxEnt_summary_statistics(
model.obj = regional_native_model,
model.name = "regional_native",
mypath = mypath,
create.dir = TRUE, # create subdirectory
env.covar.obj = x_native_env_covariates, # env covariates raster stacked
train.obj = regional_native_train, # training data used to create model
test.obj = regional_native_test, # data you wish to use to test the model
plot.type = c("cloglog", "logistic"), # types of univariate and marginal response curves to be created
jk.test.type = c("train", "test") # types of jackknife curves to be created
)
3. Create Outputs for Analysis
3.1 Create distribution map for area of interest
Lastly, I will use the SDMtune::predict()
function to
predict the suitability for the range of interest. I will threshold the
map by the MTP
, MTSS
and
10_percentile
thresholds. The workflow for this function is
wrapped into the function
create_MaxEnt_suitability_maps()
.
I will retrieve the specific value for the MTSS threshold for this model and use it to define a what is considered unsuitable in this raster. Then, I will reclassify the mean raster so that anything below this threshold is now “unsuitable”. I will use the mean cloglog threshold for all iterations. I will create a binary version of the mean raster, with unsuitable regions, which are below the MTSS training threshold, as 0, and those above the threshold as 1. Then, I can reclassify averaged raster of MaxEnt predictions. I will load in the averaged prediction for this model and reclassify the raster using a temporary raster layer that I will create. This raster will only have values where the model predicts the climate to be unsuitable, so that it can be plotted overtop the averaged raster to mask unsuitable areas
scari::create_MaxEnt_suitability_maps(
model.obj = regional_native_model,
model.name = "regional_native",
mypath = mypath,
create.dir = FALSE,
env.covar.obj = x_global_hist_env_covariates,
describe.proj = "globe_1981-2010", # name of area or time period describe.proj to
clamp.pred = TRUE,
map.thresh = TRUE, # whether thresholded versions of these maps should be created
map.thresh.extra = "MTP",
thresh = c("MTP", "10_percentile", "MTSS"),
summary.file = file.path(mypath, "regional_native_summary.csv")
)
I will also create suitability maps for the projected 2041-2070 climate data.
3.2 Predict suitability for all SLF presences
mypath <- file.path(here::here() %>%
dirname(),
"maxent/models/slf_regional_native_v3")
# slf presence data
slf_presences <- read.csv(file = file.path(here::here(), "vignette-outputs", "data-tables", "slf_all_coords_final_2024-08-05.csv")) %>%
dplyr::select(-species)
scari::predict_xy_suitability(
xy.obj = slf_presences,
xy.type = "Lycorma delicatula",
env.covar.obj = x_global_hist_env_covariates,
model.obj = regional_native_model,
mypath = mypath,
predict.type = c("cloglog", "logistic"),
output.name = "regional_native_slf_all_coords_1981-2010"
)
3.3 Predict suibability for IVR regions
I will perform the same action as above, but for the locations of important wineries around the world.
# load in all IVR points
load(file = file.path(here::here(), "data", "wineries.rda"))
IVR_regions <- wineries %>%
dplyr::select(x, y)
scari::predict_xy_suitability(
xy.obj = IVR_regions,
xy.type = "IVR regions",
env.covar.obj = x_global_hist_env_covariates,
model.obj = regional_native_model,
mypath = mypath,
predict.type = c("cloglog", "logistic"),
output.name = "regional_native_wineries_1981-2010",
buffer.pred = TRUE
)
Now that all of my models have been trained, I can move on to make check for the level of extrapolation (exdet) and the most important covariate (MIC) for each model’s predictions. This is an important step when applying models to climate change and invasive species, due to the non-equilibrium nature of these applications.
References
Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists: Statistical explanation of MaxEnt. Diversity and Distributions, 17(1), 43–57. https://doi.org/10.1111/j.1472-4642.2010.00725.x
Feng, X. (2022). Shandongfx/nimbios_enm [HTML]. https://github.com/shandongfx/nimbios_enm (Original work published 2018).
Gallien, L., Douzet, R., Pratte, S., Zimmermann, N. E., & Thuiller, W. (2012). Invasive species distribution models – how violating the equilibrium assumption can create new insights. Global Ecology and Biogeography, 21(11), 1126–1136. https://doi.org/10.1111/j.1466-8238.2012.00768.x
Maryam Bordkhani. (n.d.). Threshold rule [Online post].
Radosavljevic, A., & Anderson, R. P. (2014). Making better Maxent models of species distributions: Complexity, overfitting and evaluation. Journal of Biogeography, 41(4), 629–643. https://doi.org/10.1111/jbi.12227
Sobek-Swant, S., Kluza, D. A., Cuddington, K., & Lyons, D. B. (2012). Potential distribution of emerald ash borer: What can we learn from ecological niche models using Maxent and GARP? Forest Ecology and Management, 281, 23–31. https://doi.org/10.1016/j.foreco.2012.06.017
Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3), 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
Steven Phillips. (2017). A Brief Tutorial on Maxent. http://biodiversityinformatics.amnh.org/open_source/maxent/.
Steven J. Phillips, Miroslav Dudík, & Robert E. Schapire. (2023). Maxent software for modeling species niches and distributions (Version 3.4.3 (Java)) [Computer software]. http://biodiversityinformatics.amnh.org/open_source/maxent/.
Srivastava, V., Roe, A. D., Keena, M. A., Hamelin, R. C., & Griess, V. C. (2021). Oh the places they’ll go: Improving species distribution modelling for invasive forest pests in an uncertain world. Biological Invasions, 23(1), 297–349. https://doi.org/10.1007/s10530-020-02372-9
VanDerWal, J., Shoo, L. P., Graham, C., & Williams, S. E. (2009). Selecting pseudo-absence data for presence-only distribution modeling: How far should you stray from what you know? Ecological Modelling, 220(4), 589–594. https://doi.org/10.1016/j.ecolmodel.2008.11.010
Vignali, S., Barras, A. G., Arlettaz, R., & Braunisch, V. (2020). SDMtune: An R package to tune and evaluate species distribution models. Ecology and Evolution, 10(20), 11488–11506. https://doi.org/10.1002/ece3.6786